A Novel Method to Improve the Efficiency of Classification Phase of a Decision Tree

Naga Muneiah Janapati
Ch. D. V. Subba Rao

Research on classification algorithms in machine learning has so far focused mainly on speeding up training and on improving the technical performance measures of the constructed models; little attention has been paid to the runtime efficiency of the classification phase, which is essential in some time-critical applications. In this paper, we take the computational complexity of a decision tree's classification phase as the primary criterion. We propose a novel approach that predicts the class label of an unseen instance from a decision tree in less time than regular tree traversal. In the proposed method, the constructed decision tree is represented as a set of arrays, and the class label is then found by performing bitwise operations between the array elements and the test instance. Empirical results on various UCI data sets show that the proposed method outperforms the standard method and five other benchmark classifiers, and that its classification is at least four times faster than the regular traversal method.
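The array-and-bitwise idea from the abstract can be illustrated with a minimal sketch. The encoding below is an assumption for illustration only, not the paper's exact representation: each root-to-leaf path of the tree is stored as two bit masks over a fixed set of binary attribute tests (`care` marks which tests the path examines, `want` marks the outcomes it requires), so classifying an instance reduces to one AND and one comparison per leaf instead of a pointer-chasing traversal.

```python
# Illustrative sketch (not the paper's exact method): a decision tree is
# flattened into per-leaf bit masks, and classification is done with
# bitwise AND operations on an integer encoding of the test instance.

def encode_instance(satisfied_tests, n_tests):
    """Pack the boolean outcomes of the binary attribute tests into one int."""
    bits = 0
    for i in range(n_tests):
        if satisfied_tests[i]:
            bits |= 1 << i
    return bits

def classify(instance_bits, paths):
    """paths: list of (care_mask, want_mask, class_label), one per leaf."""
    for care, want, label in paths:
        # AND isolates the tests this path cares about; equality with
        # `want` means the instance follows this exact path to its leaf.
        if instance_bits & care == want:
            return label
    return None

# Toy tree over 3 binary tests: t0: x < 5, t1: y == 'a', t2: z > 0
paths = [
    (0b011, 0b011, "A"),  # t0 true  and t1 true  -> A
    (0b011, 0b001, "B"),  # t0 true  and t1 false -> B
    (0b101, 0b100, "C"),  # t0 false and t2 true  -> C
    (0b101, 0b000, "D"),  # t0 false and t2 false -> D
]

x, y, z = 3, "a", -1
inst = encode_instance([x < 5, y == "a", z > 0], 3)
print(classify(inst, paths))  # -> A
```

With the paths held in flat arrays, this loop is branch-light and cache-friendly, which is the kind of property the paper's reported speedup over pointer-based traversal relies on.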

Keywords
Data mining, Classification, Decision trees

How to cite
Janapati, Naga Muneiah, and Subba Rao, Ch. D. V., "A Novel Method to Improve the Efficiency of Classification Phase of a Decision Tree", ELCVIA: Electronic Letters on Computer Vision and Image Analysis, vol. 19, no. 3, pp. 38-54, https://raco.cat/index.php/ELCVIA/article/view/375322.