The Art of Feature Engineering: Essentials for Machine Learning

Pablo Duboue

doi:10.1017/9781108671682

Cited by 72 (Crossref citations)

This book has been cited by the following publications. This list is generated based on data provided by Crossref.

Lukyanenko, Roman; Storey, Veda C.; and Pastor, Oscar 2020. Article 8. Fondements des technologies de l’information d’après la philosophie systémiste de la réalité de Bunge. Mεtascience, Vol. N° 2, Issue 2, p. 175.

Patel, Jayesh 2020. The Democratization of Machine Learning Features. p. 136.

Patel, Jayesh 2020. Unification of Machine Learning Features. p. 1201.

Lukyanenko, Roman; Storey, Veda C.; and Pastor, Oscar 2021. Foundations of information technology based on Bunge’s systemist philosophy of reality. Software and Systems Modeling, Vol. 20, Issue 4, p. 921.

Rodríguez-Collado, Alejandro and Rueda, Cristina 2021. Electrophysiological and Transcriptomic Features Reveal a Circular Taxonomy of Cortical Neurons. Frontiers in Human Neuroscience, Vol. 15.

Wang, Xinlei and Zhi, Jianing 2021. A machine learning-based analytical framework for employee turnover prediction. Journal of Management Analytics, Vol. 8, Issue 3, p. 351.

Cabral, J. B.; Lares, M.; Gurovich, S.; Minniti, D.; and Granitto, P. M. 2021. Drifting features: Detection and evaluation in the context of automatic RR Lyrae identification in the VVV. Astronomy & Astrophysics, Vol. 652, p. A151.

Sikelis, Konstantinos; Tsekouras, George E.; and Kotis, Konstantinos 2021. Ontology-Based Feature Selection: A Survey. Future Internet, Vol. 13, Issue 6, p. 158.

Lawler, Robin; Liu, Yao-Hao; Majaya, Nessa; Allam, Omar; Ju, Hyunchul; Kim, Jin Young; and Jang, Seung Soon 2021. DFT-Machine Learning Approach for Accurate Prediction of pKa. The Journal of Physical Chemistry A, Vol. 125, Issue 39, p. 8712.

Arias, Victor A.; Vargas-Machuca, Juan; Zegarra, Fabio C.; and Coronado, Alberto M. 2021. Convolutional Neural Network Classification for Machine Tool Wear Based on Unsupervised Gaussian Mixture Model. p. 1.

Betancourt, Clara; Stomberg, Timo; Roscher, Ribana; Schultz, Martin G.; and Stadtler, Scarlet 2021. AQ-Bench: a benchmark dataset for machine learning on global air quality metrics. Earth System Science Data, Vol. 13, Issue 6, p. 3013.

D'Amato, Vincenzo; Oneto, Luca; Camurri, Antonio; and Anguita, Davide 2021. Keep it Simple: Handcrafting Feature and Tuning Random Forests and XGBoost to face the Affective Movement Recognition Challenge 2021. p. 1.

Casusol, Alexis J.; Zegarra, Fabio C.; Vargas-Machuca, Juan; and Coronado, Alberto M. 2021. Optimal window size for the extraction of features for tool wear estimation. p. 1.

Zhao, Jingyuan; Nan, Jinrui; Wang, Junbin; Ling, Heping; Lian, Yubo; and Burke, Andrew 2022. Battery Diagnosis: A Lifelong Learning Framework for Electric Vehicles. p. 1.

Mayer, Stephanie; van Herwijnen, Alec; Techel, Frank; and Schweizer, Jürg 2022. A random forest model to assess snow instability from simulated snow stratigraphy. The Cryosphere, Vol. 16, Issue 11, p. 4593.

Mehrabi, Mohammad Javad and Rafe, Vahid 2022. Using deep reinforcement learning to search reachability properties in systems specified through graph transformation. Soft Computing, Vol. 26, Issue 18, p. 9635.

Zegarra, Fabio C.; Vargas-Machuca, Juan; and Coronado, Alberto M. 2022. Tool wear and remaining useful life (RUL) prediction based on reduced feature set and Bayesian hyperparameter optimization. Production Engineering, Vol. 16, Issue 4, p. 465.

Wagner, Gerit; Lukyanenko, Roman; and Paré, Guy 2022. Artificial intelligence and the conduct of literature reviews. Journal of Information Technology, Vol. 37, Issue 2, p. 209.

Chicco, Davide; Oneto, Luca; Tavazzi, Erica; and Ouellette, Francis 2022. Eleven quick tips for data cleaning and feature engineering. PLOS Computational Biology, Vol. 18, Issue 12, p. e1010718.

Mangla, Chaitanya; Holden, Sean B.; and Paulson, Lawrence C. 2022. Automated Reasoning. Vol. 13385, p. 559.

Pablo Duboue, Textualization Software Ltd.

Publisher: Cambridge University Press
Online publication date: May 2020
Print publication year: 2020
Online ISBN: 9781108671682
DOI: https://doi.org/10.1017/9781108671682

Subjects: Communications and Signal Processing, Engineering, Computer Science, Pattern Recognition and Machine Learning


Book description

When machine learning engineers work with data sets, they may find the results aren't as good as they need. Instead of improving the model or collecting more data, they can use the feature engineering process to improve results by modifying the data's features to better capture the nature of the problem. This practical guide to feature engineering is an essential addition to any data scientist's or machine learning engineer's toolbox, providing new ideas on how to improve the performance of a machine learning solution. Beginning with the basic concepts and techniques, the text builds up to a unique cross-domain approach that spans data on graphs, texts, time series, and images, with fully worked-out case studies. Key topics include binning, out-of-fold estimation, feature selection, dimensionality reduction, and encoding variable-length data. The full source code for the case studies is available on a companion website as Python Jupyter notebooks.

Reviews

'Pablo Duboue is a true grandmaster of the art and science of feature engineering. His foundational contributions to the creation of IBM Watson were a critical component of its success. Now readers can benefit from his expertise. His book provides deep insights into how to develop, assess, combine, and enhance machine learning features. Of particular interest to advanced practitioners is his discussion of feature engineering and deep learning; there is a pervasive myth in the industry that deep learning and big data have made feature engineering obsolete, but the book explains why that is often incorrect for real-world computing applications and explains the relationship between building effective features and deep neural network architectures. The book engages with countless other basic and advanced topics in the area of machine learning and feature engineering, making it a valuable resource for machine learning practitioners of all levels of experience.'

J. William Murdock - IBM

'Feature engineering is the process of identifying, selecting and evaluating input variables to statistical and machine learning models for a given problem. Pablo Duboue's The Art of Feature Engineering introduces the process with rich detail from a practitioner’s point of view, and adds new insights through four input data scenarios for the same prediction task. Highly recommended!'

Nelson Correa - Andinum Inc.

'TAoFE is a comprehensive handbook - sure to be a hit with data science practitioners. With highly accessible and didactic explanations of complex concepts, the book represents the state-of-the-art, and shows in practical terms how it applies to a wide range of real-world case studies.'

Gavin Brown - University of Manchester

'This book provides a large catalogue of feature manipulation techniques along with non-trivial examples to illustrate their applicability and impact on performance. It could be suitable as a textbook for an upper-level undergrad or graduate text mining or multimodal data analysis class. Recent graduates starting in the field of data mining and text analysis will find this a useful text.'

Wlodek Zadrozny - University of North Carolina


Accessibility compliance for the PDF of this book is currently unknown and may be updated in the future.


Contents

Frontmatter
pp i-iv

Dedication
pp v-vi

Contents
pp vii-x

Preface
pp xi-xii

Part One - Fundamentals
pp 1-2

1 - Introduction
pp 3-33

2 - Features, Combined: Normalization, Discretization and Outliers
pp 34-58

3 - Features, Expanded: Computable Features, Imputation and Kernels
pp 59-78

4 - Features, Reduced: Feature Selection, Dimensionality Reduction and Embeddings
pp 79-111

5 - Advanced Topics: Variable-Length Data and Automated Feature Engineering
pp 112-136

Part II - Case Studies
pp 137-138

6 - Graph Data
pp 139-162

7 - Timestamped Data
pp 163-185

8 - Textual Data
pp 186-211

9 - Image Data
pp 212-232

10 - Other Domains: Video, GIS and Preferences
pp 233-245

Bibliography
pp 246-269

Index
pp 270-274
