Classification of Control and Neurodegenerative Disease Subjects Using Tree Based Classifiers

Syed Ahsin Ali Shah
Nazneen Habib
Wajid Aziz
Ehsan Ullah Khan
Malik Sajjad Ahmed Nadeem


Background: Medical researchers are developing non-invasive methods for the early detection of neurodegenerative diseases (NDDs), at a stage when pharmacological interventions can still prevent further disease progression. NDDs are associated with degradation of complex gait dynamics and motor activity. Classifying gait data with machine learning techniques can assist physicians in the early diagnosis of these neural disorders, before clinical manifestations of the disease become apparent.

Aims: The present study was undertaken to classify control and NDD subjects using decision-tree-based classifiers (Random Forest (RF), J48 and REPTree).

Methodology: The data used in the study comprise 16 control, 20 Huntington’s Disease (HD), 15 Parkinson’s Disease (PD), and 13 Amyotrophic Lateral Sclerosis (ALS) subjects, taken from a publicly available PhysioNet database. The age range was 20-74 years for control subjects, 36-70 years for HD subjects, 44-80 years for PD subjects, and 29-71 years for ALS subjects. There were 13 attributes associated with the data. Important features/attributes of the data were selected using the correlation-based feature selection (CFS) subset evaluation method. Three tree-based machine learning algorithms (RF, J48 and REPTree) were used to classify the control and NDD subjects. The performance of the classifiers was evaluated using Precision, Recall, F-Measure, mean absolute error (MAE) and root mean squared error (RMSE).
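As an illustration of the quantities named in the methodology, the following is a minimal sketch, not the authors' implementation. It assumes the standard CFS merit heuristic (mean feature-class correlation rewarded, mean feature-feature correlation penalised) and the usual binary definitions of Precision, Recall and F-Measure. It further assumes that the mean absolute error (MAE) and root mean squared error (RMSE) are computed between the predicted probability of the positive class and the 0/1 label. The function names and the toy labels are hypothetical and are not taken from the study.

```python
import math


def cfs_merit(feature_class_corr, feature_feature_corr):
    """Merit of a feature subset under the CFS heuristic.

    feature_class_corr: absolute feature-class correlations, one per feature.
    feature_feature_corr: absolute correlations, one per pair of features
    in the subset (empty when the subset holds a single feature).
    """
    k = len(feature_class_corr)
    if k == 0:
        return 0.0
    mean_cf = sum(feature_class_corr) / k
    mean_ff = (sum(feature_feature_corr) / len(feature_feature_corr)
               if feature_feature_corr else 0.0)
    # High feature-class correlation raises the merit; redundancy lowers it.
    return k * mean_cf / math.sqrt(k + k * (k - 1) * mean_ff)


def precision_recall_f(y_true, y_pred, positive=1):
    """Precision, Recall and F-Measure for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return precision, recall, f_measure


def mae_rmse(y_true, prob_positive):
    """MAE and RMSE between 0/1 labels and predicted positive-class probabilities."""
    errors = [p - t for t, p in zip(y_true, prob_positive)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse


# Toy values (1 = NDD, 0 = control), made up purely for illustration.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
prob_ndd = [0.9, 0.8, 0.4, 0.2, 0.1, 0.6]

precision, recall, f_measure = precision_recall_f(y_true, y_pred)
mae, rmse = mae_rmse(y_true, prob_ndd)
print(f"Precision={precision:.3f} Recall={recall:.3f} F={f_measure:.3f}")
print(f"MAE={mae:.3f} RMSE={rmse:.3f}")
```

Under this heuristic, a subset of two perfectly redundant features scores no higher than one of them alone, which is why a filter of this kind can shrink the attribute set with little loss of information.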

Results: To evaluate the performance of the tree-based classifiers, two settings of the data were used: complete features and selected features. In classifying control vs HD subjects, RF provided the most robust separation, with a classification accuracy of 84.79% using complete features and 83.94% using selected features. In classifying control vs PD subjects and control vs ALS subjects, RF also provided the best separation, with classification accuracies of 86.51% and 94.95%, respectively, using complete features and 85.19% and 93.64%, respectively, using selected features.

Conclusion: Variability analysis of physiological signals provides a valuable non-invasive tool for quantifying the dynamics of physiological systems in healthy subjects and for examining alterations in the control mechanisms of these systems with aging and disease. It is concluded that the selected features encode adequate information about the neural control of gait. Moreover, the selected features, together with tree-based machine learning algorithms, can play a vital role in the early detection of NDDs, when pharmacological interventions are still possible.

Keywords: Decision tree classifiers, machine learning, neurodegenerative diseases, stride interval.

How to Cite
Shah, S. A. A., Habib, N., Aziz, W., Khan, E. U., & Nadeem, M. S. A. (2020). Classification of Control and Neurodegenerative Disease Subjects Using Tree Based Classifiers. Journal of Pharmaceutical Research International, 32(11), 63-73.
Original Research Article

