Large-scale data analysis on aviation accident database using different data mining techniques

A.B. Arockia Christopher; V. Shunmughavel Vivekanandam; A.B. Antony Anderson; S. Markkandeyan; V. Sivakumar

doi:10.1017/aer.2016.107

Published online by Cambridge University Press: 22 November 2016

A.B. Arockia Christopher, Department of Information Technology, VSB Engineering College, Karur, Tamil Nadu, India. Email: abchristomca@yahoo.com
V. Shunmughavel Vivekanandam, Department of Computer Science and Engineering, VSB Engineering College, Karur, Tamil Nadu, India. Email: shunsvel@gmail.com
A.B. Antony Anderson, Department of Computer Science and Engineering, PVP College of Technology and Engineering for Women, Dindigul, Tamil Nadu, India. Email: tony_br@rediffmail.com
S. Markkandeyan, Department of IT, RVS College of Engineering, Dindigul, Tamil Nadu, India. Email: markkan_s@rediffmail.com
V. Sivakumar, Department of Electrical and Electronics Engineering, VSB Engineering College, Karur, Tamil Nadu, India. Email: sivamugunthan13@gmail.com

Abstract

Data mining is an iterative process in which progress is defined by discovery through either automatic or manual methods. A data cleaning procedure is proposed to improve the quality of classification tasks in the knowledge discovery process by taking into account both redundant and conflicting data. The redundancy check is performed on the original dataset and the resultant dataset is preserved. This resultant dataset is then checked for conflicting data and, if any are found, they are corrected and updated in the original aircraft dataset. This updated dataset is then classified using a variety of classifiers such as Bayes, functions, lazy, MISC, rules and decision trees. The performance of the updated dataset on these classifiers is examined, and the results show a significant improvement in classification accuracy after redundancy and conflicts are removed. Once the corrected conflicts are written back to the original dataset and the classifiers are re-evaluated, a large improvement is observed. This paper aims to address how data mining techniques can be used to understand complex system accidents in the aviation domain. Decision trees are considered to be one of the most powerful and popular approaches in knowledge discovery and data mining. The objective is to develop a classification model for aviation risk investigation and reduction using a decision tree induction method that enhances the ability to form decision trees and thereby shows that the classification accuracy of decision trees is greater. Different feature selectors are used in this study in order to reduce the number of initial attributes.
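
The cleaning procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how conflicting records are corrected, so the majority-vote rule used here is an assumption, as are the function names and the toy records (flight phase, weather, outcome), which are hypothetical and not taken from the aircraft dataset.

```python
from collections import Counter, defaultdict


def remove_redundant(records):
    """Drop exact duplicate (features, label) records, keeping the first occurrence."""
    seen = set()
    unique = []
    for features, label in records:
        key = (tuple(features), label)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


def find_conflicts(records):
    """Return the feature tuples that occur with more than one class label."""
    labels_by_features = defaultdict(set)
    for features, label in records:
        labels_by_features[tuple(features)].add(label)
    return {f for f, labels in labels_by_features.items() if len(labels) > 1}


def resolve_conflicts(original, conflicts):
    """Relabel conflicting records in the original data.

    ASSUMPTION: the correction rule is a majority vote over the original
    records; ties go to the label seen first. The paper may use another rule.
    """
    votes = defaultdict(Counter)
    for features, label in original:
        votes[tuple(features)][label] += 1
    corrected = []
    for features, label in original:
        f = tuple(features)
        if f in conflicts:
            label = votes[f].most_common(1)[0][0]
        corrected.append((f, label))
    return corrected


# Hypothetical toy records: ((flight phase, weather), outcome)
records = [
    (("takeoff", "storm"), "fatal"),
    (("takeoff", "storm"), "fatal"),      # redundant copy
    (("takeoff", "storm"), "non-fatal"),  # conflicts with the records above
    (("cruise", "clear"), "non-fatal"),
]

unique = remove_redundant(records)                # step 1: redundancy check
conflicts = find_conflicts(unique)                # step 2: conflict check
corrected = resolve_conflicts(records, conflicts) # step 3: update the original
cleaned = remove_redundant(corrected)             # dataset passed to classifiers
print(cleaned)
```

The order of operations follows the abstract: duplicates are removed first, conflicts are detected on the de-duplicated data, and corrections are then applied to the original dataset before classification.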

Keywords

data mining; feature selection; classification techniques; decision tree; aviation

Information

Type: Research Article
Information: The Aeronautical Journal, Volume 120, Issue 1234, December 2016, pp. 1849–1866

DOI: https://doi.org/10.1017/aer.2016.107
Copyright: Copyright © Royal Aeronautical Society 2016
