Header menu link for other important links
X

Feature selection techniques to counter class imbalance problem for aging related bug prediction

Lov Kumar, Ashish Sureka,
Published in Association for Computing Machinery
2018
Abstract

Aging-Related Bugs (ARBs) occur in long running systems due to error conditions caused because of accumulation of problems such as memory leakage or unreleased files and locks. Aging-Related Bugs are hard to discover during software testing and also challenging to replicate. Automatic identification and prediction of aging related fault-prone files and classes in an object oriented system can help the software quality assurance team to optimize their testing efforts. In this paper, we present a study on the application of static source code metrics and machine learning techniques to predict aging related bugs. We conduct a series of experiments on publicly available dataset from two large open-source software systems: Linux and MySQL. Class imbalance and high dimensionality are the two main technical challenges in building effective predictors for aging related bugs. We investigate the application of five different feature selection techniques (OneR, Information Gain, Gain Ratio, RELEIF and Symmetric Uncertainty) for dimensionality reduction and five different strategies (Random Under-sampling, Random Oversampling, SMOTE, SMOTEBoost and RUSBoost) to counter the effect of class imbalance in our proposed machine learning based solution approach. Experimental results reveal that the random under-sampling approach performs best followed by RUSBoost in-terms of the mean AUC metric. Statistical significance test demonstrates that there is a significant difference between the performance of the various feature selection techniques. Experimental results shows that Gain Ratio and RELEIF performs best in comparison to other strategies to address the class imbalance problem. We infer from the statistical significance test that there is no difference between the performances of the five different learning algorithms.

About the journal
JournalData powered by TypesetACM International Conference Proceeding Series
PublisherData powered by TypesetAssociation for Computing Machinery
Open AccessNo