Predicting Employee Intent-to-Stay from Engagement Survey Data: An Interpretable, Class-Imbalanced Machine Learning Case Study
Keywords:
employee retention, intent-to-stay, human resource analytics, machine learning, CRISP-DM
Abstract
Employee retention is a strategic need for capital-intensive firms such as state-owned power enterprises, whose service continuity across the upstream-to-downstream value chain depends on a stable and engaged workforce. Most predictive HR studies focus on attrition. A proactive approach, however, requires identifying employees whose commitment to remain is not firmly established, so that retention initiatives can begin before disengagement escalates. This research develops an interpretable machine learning framework, based on the CRISP-DM methodology, to predict employees' intent to stay. The framework is applied to the organization-wide Employee Engagement Survey (32,907 respondents; 102 engineered predictors spanning 53 engagement items and demographic attributes). The pipeline comprises preprocessing, exploratory analysis, dimensionality reduction (PCA and t-SNE), K-Means clustering, supervised classification, multi-metric evaluation, and permutation-based interpretability. The target is highly imbalanced: only 6% of employees reported less than full commitment to stay. Evaluation therefore focuses on ROC AUC, recall, and precision-recall AUC (PR AUC) rather than accuracy. Of the six algorithms evaluated, Logistic Regression achieved the best balance (ROC AUC = 0.853, recall = 0.758, PR AUC = 0.318), correctly identifying about three-quarters of the employees who were not fully committed. The interpretability analysis identified proximity to retirement age, confidence in the company's future, a sense of vitality at work, and achievement of career goals as the most influential determinants. The contribution is not a novel algorithm but the insight this analysis reveals: proximity to retirement is the dominant factor, which causes a simplistic model to disproportionately flag senior employees. This illustrates that proactive intent-to-stay prediction should be treated as interpretable decision support rather than as a purely accuracy-driven exercise.
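The argument for preferring ROC AUC, recall, and PR AUC over accuracy can be shown with a toy example. The sketch below is a minimal, standard-library-only illustration on synthetic data, not the authors' pipeline. It uses 100 hypothetical employees, 6 of whom are labeled 1 ("not fully committed") to mirror the 6% rate. The risk scores, the 0.5 decision threshold, and the helper function names are all assumptions made for illustration. PR AUC is approximated here by step-wise average precision, because the abstract does not state which PR AUC estimator was used.

```python
"""Why accuracy misleads on a ~6% minority class: a toy, stdlib-only sketch.

Assumptions: 1 = "not fully committed to stay" (minority class), 0 = "fully
committed". All data below are synthetic, not taken from the survey.
"""
from itertools import groupby


def accuracy(y_true, y_pred):
    # Share of all predictions that match the label.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred):
    # Share of actual minority-class employees that the model flags.
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return hits / sum(y_true)


def roc_auc(y_true, scores):
    # Probability that a random positive outranks a random negative
    # (Mann-Whitney formulation); tied scores count as one half.
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    # Step-wise area under the precision-recall curve: sum over distinct
    # score thresholds of (recall gained) * (precision at that threshold).
    total_pos = sum(y_true)
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    tp = fp = 0
    prev_recall = ap = 0.0
    for _, group in groupby(ranked, key=lambda pair: pair[0]):
        for _, label in group:  # tied scores form a single threshold
            tp += label
            fp += 1 - label
        rec = tp / total_pos
        ap += (rec - prev_recall) * tp / (tp + fp)
        prev_recall = rec
    return ap


# Synthetic cohort: 100 employees, 6 of whom (6%) are not fully committed.
y_true = [1] * 6 + [0] * 94

# Majority-class baseline: constant score, never flags anyone.
baseline_scores = [0.0] * 100
baseline_pred = [0] * 100

# Hypothetical classifier risk scores (illustrative values only).
model_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2] + [0.5] * 10 + [0.1] * 84
model_pred = [1 if s >= 0.5 else 0 for s in model_scores]

for name, pred, scores in [
    ("baseline", baseline_pred, baseline_scores),
    ("model", model_pred, model_scores),
]:
    print(
        f"{name:8s} accuracy={accuracy(y_true, pred):.3f} "
        f"recall={recall(y_true, pred):.3f} "
        f"ROC AUC={roc_auc(y_true, scores):.3f} "
        f"PR AUC={average_precision(y_true, scores):.3f}"
    )
```

Running it prints:

```
baseline accuracy=0.940 recall=0.000 ROC AUC=0.500 PR AUC=0.060
model    accuracy=0.880 recall=0.667 ROC AUC=0.965 PR AUC=0.785
```

The majority-class baseline scores higher on accuracy (0.94 vs 0.88) but flags no at-risk employee. Its PR AUC equals the 6% prevalence, because a classifier with no skill achieves a PR AUC equal to the positive rate. For this reason PR AUC on imbalanced data is read against the base rate rather than against 1.0. On that scale, the reported PR AUC of 0.318 is roughly five times the 6% chance level.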
Copyright (c) 2026 Bagus Satrio Diharjo, Ira Puspitasari, Agustinus Titis Iswara

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.


