mahayasa/ctgan-enn-cs


Optimized Customer Churn Prediction Using Tabular GAN-Based Hybrid Sampling Method and Cost-Sensitive Learning

Introduction

We aim to improve the performance of several classical machine learning algorithms, namely Decision Trees (DT), Logistic Regression (LR), and Support Vector Machines (SVM), on customer churn prediction tasks by combining CTGAN-ENN hybrid sampling with cost-sensitive learning.

Objectives

  • Optimize customer churn prediction by extending CTGAN-ENN with a cost-sensitive learning perspective
  • Evaluate prediction performance using the F1-score, AUC, and G-Mean metrics
  • Investigate how robust classical machine learning algorithms are in this area
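
As a minimal sketch of the three evaluation metrics named above, the following pure-Python helpers compute the F1-score, G-Mean, and AUC for a binary churn label. The function names and the toy labels are illustrative assumptions, not code from this repository. G-Mean is taken here as the geometric mean of sensitivity and specificity.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1
        elif t == positive:
            fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall (0.0 when undefined)."""
    tp, fp, _, fn = confusion_counts(y_true, y_pred, positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def auc(y_true, scores, positive=1):
    """Probability that a random positive is scored above a random negative.

    Ties count as half a win. This pairwise form is O(n^2) and only meant
    for small illustrative inputs.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        return 0.0
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical data): 3 churners, 5 non-churners.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.1, 0.2, 0.3, 0.35, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print("F1:", f1_score(y_true, y_pred))
print("G-Mean:", g_mean(y_true, y_pred))
print("AUC:", auc(y_true, scores))
```

F1 and G-Mean depend on the chosen decision threshold (0.5 here), whereas AUC is computed from the raw scores.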

Methodology

CTGAN is used to generate synthetic data that augments the minority class. The augmented dataset is then processed with ENN (Edited Nearest Neighbours), which removes noisy or ambiguous instances by identifying and eliminating overlapping data points. The detailed framework results can be accessed here.
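
To illustrate the ENN cleaning step described above, here is a minimal pure-Python sketch. It assumes the synthetic minority samples have already been generated and merged into `X` and `y`. It follows the classic rule of dropping any sample whose label disagrees with the majority vote of its `k` nearest neighbours, applied to every class. The function name, the toy points, and the choice `k=3` are assumptions for illustration. The framework's actual ENN settings may differ.

```python
import math
from collections import Counter


def edited_nearest_neighbours(X, y, k=3):
    """Keep only samples whose label matches the majority of their k neighbours."""
    keep = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        # Euclidean distance from sample i to every other sample.
        dists = sorted(
            (math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i
        )
        neighbour_labels = [y[j] for _, j in dists[:k]]
        majority_label, _ = Counter(neighbour_labels).most_common(1)[0]
        if majority_label == yi:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


# Hypothetical toy data: two well-separated clusters plus one synthetic
# minority point (label 1) that landed inside the majority cluster.
X = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),   # class 0
    (5.0, 5.0), (5.0, 6.0), (6.0, 5.0), (6.0, 6.0),   # class 1
    (0.5, 0.5),                                       # overlapping class 1 sample
]
y = [0, 0, 0, 0, 1, 1, 1, 1, 1]

X_clean, y_clean = edited_nearest_neighbours(X, y, k=3)
print(len(X), "->", len(X_clean), "samples after ENN")
```

In this toy case only the overlapping point is removed, because all of its nearest neighbours belong to the other class.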



The objective of CostLearnGAN is to fine-tune the classifier's class_weight hyperparameter. This involves adjusting the weights assigned to the classes in the loss function to handle class imbalance.
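
One way to picture this tuning step is a grid search over `class_weight` with scikit-learn. The sketch below is an assumption-laden illustration, not the repository's code: the synthetic dataset, the candidate weight grid, the F1 scoring choice, and the use of Logistic Regression are all placeholders. In the full framework the search would presumably run on the CTGAN-ENN resampled training data.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Hypothetical imbalanced dataset standing in for a churn table
# (about 10% positive class).
X, y = make_classification(
    n_samples=1000, n_features=10, weights=[0.9, 0.1], random_state=0
)

# Candidate class weights (illustrative values): None means equal cost,
# "balanced" weights classes inversely to their frequency, and the dicts
# make errors on the minority class (label 1) progressively more costly.
param_grid = {
    "class_weight": [None, "balanced", {0: 1, 1: 2}, {0: 1, 1: 5}, {0: 1, 1: 10}]
}

search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid,
    scoring="f1",
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X, y)

print("best class_weight:", search.best_params_["class_weight"])
print("cross-validated F1:", round(search.best_score_, 3))
```

`DecisionTreeClassifier` and `SVC` accept the same `class_weight` parameter, so the estimator can be swapped without changing the grid.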

Results

On average, the CostLearnGAN framework surpasses CTGAN-ENN on the AUC-ROC, F1-Score, and G-Mean evaluation metrics. The results also show that CTGAN-ENN-CS is more robust than CTGAN-ENN across all classical machine learning algorithms.

Key Findings

  • Cost-sensitive learning improved the hybrid sampling method for classical machine learning algorithms (DT, SVM, LR)
  • CostLearnGAN improved customer churn prediction performance on the AUC, F1-Score, and G-Mean metrics
  • CostLearnGAN showed the most robust performance across all algorithms

Future Work

Experiment with other hybrid combinations built on CTGAN, such as adding an anomaly detection method to ensure that the synthetic data produced are not outliers.


This research was conducted as part of the ASEAN GMS grant and at the AIDA (Applied Intelligence and Data Analytics) lab in the College of Computing, Khon Kaen University, Thailand. The study was also conducted in collaboration with Rebecca Lab, Feng Chia University, Taiwan.

Cite this work

@misc{costlearngan,
  author = {I Nyoman Mahayasa Adiputra and Paweena Wanchai and Pei-Chun Lin},
  title = {Optimized customer churn prediction using tabular generative adversarial network (GAN)-based hybrid sampling method and cost-sensitive learning},
  year = {2025},
  url = {https://doi.org/10.7717/peerj-cs.2949}
}
