
Large-scale data analysis on aviation accident database using different data mining techniques

Published online by Cambridge University Press:  22 November 2016

A.B. Arockia Christopher*
Affiliation:
Department of Information Technology, VSB Engineering College, Karur, Tamil Nadu, India
V. Shunmughavel Vivekanandam*
Affiliation:
Department of Computer Science and Engineering, VSB Engineering College, Karur, Tamil Nadu, India
A.B. Antony Anderson*
Affiliation:
Department of Computer Science and Engineering, PVP College of Technology and Engineering for Women, Dindigul, Tamil Nadu, India
S. Markkandeyan*
Affiliation:
Department of IT, RVS College of Engineering, Dindigul, Tamil Nadu, India
V. Sivakumar*
Affiliation:
Department of Electrical and Electronics Engineering, VSB Engineering College, Karur, Tamil Nadu, India

Abstract

Data mining is an iterative process in which progress is defined by discovery through either automatic or manual methods. A data cleaning procedure is proposed to improve the quality of classification tasks in the knowledge discovery process by taking into account both redundant and conflicting data. The redundancy check is performed on the original dataset and the resultant dataset is preserved. This resultant dataset is then checked for conflicting data and, if any are found, they are corrected and updated on the original aircraft dataset. This updated dataset is then classified using a variety of classifiers such as Bayes, functions, lazy, MISC, rules and decision trees. The performance of the updated datasets on these classifiers is examined, and the results show a significant improvement in classification accuracy after redundancy and conflicts are removed. The corrected conflicts are updated in the original dataset, and when the performance of the classifiers is evaluated, a great improvement is observed. This paper aims to address how data mining techniques can be used to understand complex system accidents in the aviation domain. Decision trees are considered to be one of the most powerful and popular approaches in knowledge discovery and data mining. The objective is to develop a classification model for aviation risk investigation and reduction using a decision tree induction method that enhances the ability to form decision trees and thereby shows that the classification accuracy of decision trees is greater. Different feature selectors are used in this study in order to reduce the number of initial attributes.
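
The cleaning procedure described in the abstract (redundancy check, then conflict check and correction) can be illustrated with a minimal Python sketch. The abstract does not say how conflicts are corrected, so the majority-vote rule used here is an assumption, as are the function names, the record layout of `(features, label)` tuples, and the toy attribute values; none of them come from the paper or its dataset.

```python
from collections import Counter, defaultdict


def remove_redundant(records):
    """Drop exact duplicate (features, label) records, keeping the first copy."""
    seen = set()
    unique = []
    for features, label in records:
        key = (features, label)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


def find_conflicts(records):
    """Return feature tuples that appear with more than one distinct label."""
    labels_by_features = defaultdict(set)
    for features, label in records:
        labels_by_features[features].add(label)
    return {f: ls for f, ls in labels_by_features.items() if len(ls) > 1}


def clean(original):
    """Redundancy check, conflict check, then correction on the original data.

    Assumption (not from the paper): a conflicting record is relabelled
    with the label that occurs most often for its feature tuple in the
    original dataset.
    """
    deduped = remove_redundant(original)
    conflicts = find_conflicts(deduped)

    # Count labels in the original data for each conflicting feature tuple.
    counts = defaultdict(Counter)
    for features, label in original:
        if features in conflicts:
            counts[features][label] += 1
    corrected = {f: c.most_common(1)[0][0] for f, c in counts.items()}

    # Apply the corrections to the original records, then drop the
    # duplicates that the relabelling may have created.
    updated = [(f, corrected.get(f, label)) for f, label in original]
    return remove_redundant(updated), conflicts


# Hypothetical toy records: (flight phase, weather) -> outcome label.
records = [
    (("takeoff", "fog"), "fatal"),
    (("takeoff", "fog"), "fatal"),      # redundant copy
    (("takeoff", "fog"), "non-fatal"),  # conflicts with the rows above
    (("cruise", "clear"), "non-fatal"),
    (("landing", "storm"), "fatal"),
]

cleaned, conflicts = clean(records)
```

On this toy input the cleaned dataset keeps three records, with the conflicting `("takeoff", "fog")` record relabelled to the majority label. The cleaned records could then be passed to any classifier; other correction rules, such as dropping conflicting records entirely, would fit the same structure.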

Type
Research Article
Copyright
Copyright © Royal Aeronautical Society 2016 

