Decision-making in the face of uncertainty is a significant challenge in machine learning, and the multi-armed bandit model is a commonly used framework to address it. This comprehensive and rigorous introduction to the multi-armed bandit problem examines all the major settings, including stochastic, adversarial, and Bayesian frameworks. A focus on both mathematical intuition and carefully worked proofs makes this an excellent reference for established researchers and a helpful resource for graduate students in computer science, engineering, statistics, applied mathematics and economics. Linear bandits receive special attention as one of the most useful models in applications, while other chapters are dedicated to combinatorial bandits, ranking, non-stationary problems, Thompson sampling and pure exploration. The book ends with a peek into the world beyond bandits with an introduction to partial monitoring and learning in Markov decision processes.
This book will help readers understand fundamental and advanced statistical models and deep learning models for robust speaker recognition and domain adaptation. This useful toolkit enables readers to apply machine learning techniques to address practical issues, such as robustness under adverse acoustic environments and domain mismatch, when deploying speaker recognition systems. Presenting state-of-the-art machine learning techniques for speaker recognition and featuring a range of probabilistic models, learning algorithms, case studies, and new trends and directions for speaker recognition based on modern machine learning and deep learning, this is the perfect resource for graduates, researchers, practitioners and engineers in electrical engineering, computer science and applied mathematics.
This chapter starts Part II (domain-dependent feature engineering) by describing the creation of the base WikiCities dataset employed here and in the next three chapters. The dataset is used to predict the population of cities from a semantic graph built from Wikipedia infoboxes. Semantic graphs were chosen as an example of handling and representing variable-length raw data as fixed-length feature vectors, particularly using the techniques discussed in Chapter 5. The intention behind this dataset is to provide a task that can be attacked with regular features as well as with time series, textual and image features. The chapter discusses how the dataset came to be and presents an Exploratory Data Analysis over it, resulting in a base, incomplete featurization. From there, a first featurization is produced, followed by an error analysis process that includes feature ablation and mutual information feature utility. From this error analysis, a second featurization is proposed, and an error analysis using feature stability concludes the exercise. All insights are captured in the two final feature sets, one conservative and the other expected to have higher performance.
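As a concrete illustration of the mutual information feature utility step mentioned above, the following minimal sketch ranks numeric features by their mutual information with the population target. The file name and column names are hypothetical placeholders, not the book's actual code or data.

```python
# Hypothetical sketch: ranking features by mutual information with the target.
# The CSV file and column names are illustrative only.
import pandas as pd
from sklearn.feature_selection import mutual_info_regression

df = pd.read_csv("wikicities_features.csv")       # hypothetical featurized dataset
X = df.drop(columns=["population"]).fillna(0.0)   # simple fill-in for missing values
y = df["population"]

mi = mutual_info_regression(X, y, random_state=0)
utility = pd.Series(mi, index=X.columns).sort_values(ascending=False)
print(utility.head(10))                           # most informative features first
```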
This chapter closes Part I by presenting advanced topics, including dealing with variable-length feature vectors, Feature Engineering and Deep Learning, and automatic Feature Engineering (either supervised or unsupervised). It bridges the purely domain-independent techniques and starts drilling into problems of domain-specific importance. Variable-length feature vectors have always been a problem for fixed-size-vector ML. In general, techniques involve truncation, computing the most general tree and encoding paths on it, or simply destructive projection into a smaller plane. The chapter briefly delves into some Deep Learning concepts and what they entail for feature engineering. Automated Feature Learning using FeatureTools (the Data Science Machine) and genetic programming is covered, as well as Instance Engineering and Unsupervised Feature Engineering (in the form of autoencoders).
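Of the techniques listed above, truncation is the simplest to picture. A minimal sketch, assuming numeric sequences and a chosen fixed length (both invented for illustration), could look like this:

```python
# Minimal sketch (not from the book): turning variable-length sequences into
# fixed-length vectors by truncating long ones and zero-padding short ones.
import numpy as np

def to_fixed_length(sequences, length):
    """Truncate or zero-pad each sequence to exactly `length` entries."""
    out = np.zeros((len(sequences), length))
    for i, seq in enumerate(sequences):
        seq = seq[:length]              # truncate sequences that are too long
        out[i, :len(seq)] = seq         # pad short sequences with zeros
    return out

X = to_fixed_length([[1, 2, 3, 4, 5], [7, 8]], length=4)
print(X)   # [[1. 2. 3. 4.] [7. 8. 0. 0.]]
```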
Extending the dataset from Chapter 6 with the full Wikipedia page for each settlement, this chapter exemplifies natural language processing techniques for working with textual data. Textual problems are a domain that involves a large number of correlated features, with feature frequencies strongly biased by a power law. Such behaviour is very common for many naturally occurring phenomena besides text. The very nature of dealing with sequences means this domain also involves variable-length feature vectors. A central theme in this chapter is context and how to supply it within the enhanced feature vector to increase the signal-to-noise ratio available to the ML. As the target information (population) often appears within the target page (in exact form in as many as 53% of the cases), this problem is closely related to the NLP problem of Information Extraction. The chapter showcases different ways to approach the problem using words-as-features in the bag-of-words paradigm, then proceeds to heavy feature selection using mutual information, stemming, and modelling context with bigrams and skip-bigrams. More advanced topics include feature hashing and embeddings using word2vec.
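A hedged sketch of two of the ideas above, bag-of-words with bigram context and feature hashing, using scikit-learn; the example documents are made up and the book's own pipeline may differ:

```python
# Sketch: bag-of-words features with bigrams, hashed into a fixed-size space.
from sklearn.feature_extraction.text import HashingVectorizer

docs = [
    "The city has a population of 124,355 inhabitants.",   # toy documents
    "A small town in the mountains, population unknown.",
]
vec = HashingVectorizer(ngram_range=(1, 2),    # unigrams and bigrams as features
                        n_features=2**18,      # size of the hashed feature space
                        alternate_sign=False)
X = vec.transform(docs)                        # sparse matrix, one row per document
print(X.shape)
```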
This chapter deals with the topic of feature expansion and imputation, with a particular emphasis on computable features. While poor domain modelling may result in too many features being added to the model, there are times when plenty of value can be gained by generating features from existing ones. The excess features can then be removed using feature selection techniques (discussed in the next chapter). Computable Features are particularly useful if we know the underlying ML model is unable to perform certain operations over the features, like multiplying them (e.g., if the ML involves simple linear modelling). Another type of feature expansion involves calculating a best-effort approximation of values missing in the data (Feature Imputation). The most straightforward expansion for features happens when the raw data contains multiple items of information under a single column (Decomposing Complex Features). The chapter concludes by borrowing ideas from a technique used in SVMs called the kernel trick: the types of projections that practitioners have found useful lend themselves to being applied directly, without the use of kernels.
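The following small sketch illustrates three of the operations named above on a toy table whose column names and values are entirely hypothetical: a computable feature (a ratio a linear model could not form on its own), a simple imputation, and the decomposition of a complex column.

```python
# Illustrative sketch with made-up columns: computed feature, imputation,
# and decomposing a complex feature stored in a single column.
import pandas as pd

df = pd.DataFrame({
    "area": [10.0, None, 3.5],
    "population": [120000, 4500, 900],
    "coords": ["45.4;-75.7", "40.7;-74.0", "51.5;-0.1"],   # complex feature
})

df["area"] = df["area"].fillna(df["area"].median())          # feature imputation
df["density"] = df["population"] / df["area"]                # computable feature
df[["lat", "lon"]] = df["coords"].str.split(";", expand=True).astype(float)  # decomposition
print(df)
```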
This chapter discusses Feature Engineering techniques that look holistically at the feature set, replacing or enhancing the features based on their relation to the whole set of instances and features. Techniques such as normalization, scaling, dealing with outliers and generating descriptive features are covered. Scaling and normalization are the most common; they involve finding the maximum and minimum and changing the values to ensure they lie in a given interval (e.g., [0, 1] or [−1, 1]). Discretization and binning involve, for example, analyzing a feature that is an integer (any number from -1 trillion to +1 trillion), realizing that it only takes the values 0, 1 and 10, and simplifying it into a symbolic feature with three values (value0, value1 and value10). Descriptive features gather information about the shape of the data; the discussion centres around using tables of counts (histograms) and general descriptive features such as maximum, minimum and averages. Outlier detection and treatment refers to looking at the feature values across many instances and realizing that some values lie very far from the rest.
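A minimal sketch of the scaling and discretization ideas above, on assumed toy values rather than any dataset from the book:

```python
# Sketch: min-max scaling to [0, 1] and collapsing an integer feature that only
# takes three distinct values into a symbolic feature.
import numpy as np

x = np.array([0, 1, 10, 1, 0, 10], dtype=float)

scaled = (x - x.min()) / (x.max() - x.min())        # scale to [0, 1]

values = np.unique(x)                               # only 0, 1 and 10 occur
binned = np.array([f"value{int(v)}" for v in x])    # symbolic feature: value0, value1, value10
print(scaled, binned)
```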
This chapter starts with basic definitions, such as types of machine learning (supervised vs. unsupervised learning, classifiers vs. regressors), types of features (binary, categorical, discrete, continuous), metrics (precision, recall, f-measure, accuracy, overfitting), and raw data, and then defines the machine learning cycle and the feature engineering cycle. The feature engineering cycle hinges on two types of analysis: exploratory data analysis at the beginning of the cycle and error analysis at the end of each iteration. Domain modelling and feature construction conclude the chapter, with particular emphasis on feature ideation techniques.
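The metrics named above follow their standard definitions; the sketch below computes them by hand on toy labels (the predictions are made up purely for illustration):

```python
# Precision, recall, f-measure and accuracy on a toy binary classification.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))   # true positives
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))   # false positives
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))   # false negatives

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f_measure = 2 * precision * recall / (precision + recall)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(precision, recall, f_measure, accuracy)
```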
Building on the dataset presented in the previous chapter, this chapter explores using historical information (as represented by previous DBpedia versions) to perform feature engineering with historical features. Working with historical data not only makes all Feature Engineering more complex, but the whole concept of “truth,” the immutability of the target class, is challenged: great features for a particular class may become merely acceptable features for a different one. Topics covered include imputing timestamped data, lagged features and moving window averaging of the data. Due to the unavailability of population data for cities, a second dataset revolving around countries is introduced to perform population prediction using time series ARIMA models over 50 years of data, as provided by the World Bank. The chapter exemplifies different methods to blend machine learning with time series models, including using their output as another feature or training a model to predict their errors.
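Lagged features and moving window averages, as mentioned above, are easy to picture on a toy yearly series; the column names and values below are invented for illustration:

```python
# Hypothetical sketch of lagged and moving-window features over yearly counts.
import pandas as pd

ts = pd.DataFrame({
    "year": range(2010, 2016),
    "population": [100, 103, 107, 110, 116, 121],
})

ts["pop_lag1"] = ts["population"].shift(1)                   # lagged feature (previous year)
ts["pop_ma3"] = ts["population"].rolling(window=3).mean()    # moving window average
print(ts)
```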
This is the last chapter using the dataset from Chapter 6; this time the dataset is expanded with 80 thousand satellite images obtained from NASA, covering relative humidity density as it relates to vegetation. This image information is not visible-spectrum information, and thus pre-trained models are not useful for this problem. Instead, traditional computer vision techniques are showcased, involving histograms, local feature extraction, gradients and histograms of gradients. This domain exemplifies how to deal with a high level of nuisance noise and how to normalize features so that small changes in the way the information was acquired do not completely throw off the underlying machine learning mechanism. These takeaways are valuable for anybody working with sensor data where the acquisition process has a high level of uncertainty. The domain also exemplifies working with large amounts of low-level sensor data.
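As a hedged sketch of the gradient-histogram idea above, the following computes a normalized histogram of gradient orientations for a single synthetic tile; the random image stands in for real satellite data and is not from the book:

```python
# Sketch: gradient magnitudes and a normalized orientation histogram as a
# fixed-length feature vector for one (synthetic) image tile.
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((64, 64))                        # stand-in for one satellite tile

gy, gx = np.gradient(img)                         # vertical and horizontal gradients
magnitude = np.hypot(gx, gy)
orientation = np.arctan2(gy, gx)                  # angles in [-pi, pi]

hist, _ = np.histogram(orientation, bins=9, range=(-np.pi, np.pi), weights=magnitude)
hist = hist / (hist.sum() + 1e-12)                # normalize against acquisition differences
print(hist)
```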
This chapter presents a staple of Feature Engineering: the automatic reduction of features, either by direct selection or by projection to a smaller feature space. Central to Feature Engineering are efforts to reduce the number of features, as uninformative features bloat the ML model with unnecessary parameters. In turn, too many parameters either produce suboptimal results, as they are easy to overfit, or require large amounts of training data. These efforts either explicitly drop certain features (feature selection) or map the feature vector, if it is sparse, into a lower, denser dimension (dimensionality reduction). The chapter also covers some algorithms that perform feature selection as part of their inner computation (embedded feature selection or regularization). Feature selection takes the spotlight within Feature Engineering due to its intrinsic utility for Error Analysis: techniques such as feature ablation using wrapper methods are used as the starting step before a feature drill-down. Moreover, as feature selection helps build understandable models, it intertwines with Error Analysis, since the analysis profits from such understandable models.
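A minimal sketch of feature ablation as a wrapper method, assuming a synthetic regression dataset and a linear model chosen purely for illustration: drop one feature at a time and measure the change in cross-validated score.

```python
# Sketch: wrapper-style feature ablation on synthetic data.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=200, n_features=6, n_informative=3, random_state=0)
baseline = cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2").mean()

for i in range(X.shape[1]):
    X_ablated = np.delete(X, i, axis=1)           # remove one feature
    score = cross_val_score(LinearRegression(), X_ablated, y, cv=5, scoring="r2").mean()
    print(f"feature {i}: drop in R^2 when removed = {baseline - score:.4f}")
```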