Extending the dataset from chapter 6 with the full Wikipedia page for each settlement, this chapter exemplifies natural language processing techniques for working with textual data. Textual problems are a domain that involves a large number of correlated features, with feature frequencies strongly biased by a power law. Such behaviour is very common for many naturally occurring phenomena besides text. The very nature of dealing with sequences means this domain also involves variable-length feature vectors. A central theme in this chapter is context and how to supply it within the enhanced feature vector to increase the signal-to-noise ratio available to the ML. Because the target information (population) often appears within the target page itself (in exact form in as many as 53% of the cases), this problem is closely related to the NLP problem of Information Extraction. The chapter showcases different ways to approach the problem using words-as-features in the bag-of-words paradigm, then proceeds to heavy feature selection using mutual information, stemming, and modelling context with bigrams and skip-bigrams. More advanced topics include feature hashing and embeddings using word2vec.
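As a minimal sketch of the bag-of-words paradigm with bigram context, the following counts unigrams and adjacent word pairs in plain Python; the whitespace tokenizer and the helper name are illustrative, not code from the chapter:

```python
from collections import Counter

def bag_of_words(text, use_bigrams=True):
    """Turn raw text into sparse count features (toy sketch)."""
    tokens = text.lower().split()
    feats = Counter(tokens)                # unigram counts
    if use_bigrams:
        # bigrams inject local context into the otherwise orderless bag
        feats.update(zip(tokens, tokens[1:]))
    return feats

feats = bag_of_words("the population of the town")
print(feats["the"])                  # 2
print(feats[("population", "of")])   # 1
```

In practice the resulting vocabulary is huge and power-law distributed, which is why the chapter pairs this representation with aggressive feature selection.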
This chapter deals with the topic of feature expansion and imputation, with a particular emphasis on computable features. While poor domain modelling may result in too many features being added to the model, there are times when plenty of value can be gained by generating features from existing ones. The excess features can then be removed using feature selection techniques (discussed in the next chapter). Computable Features are particularly useful when we know the underlying ML model is unable to perform certain operations over the features, such as multiplying them (e.g., if the ML involves a simple linear model). Another type of feature expansion involves calculating a best-effort approximation of values missing in the data (Feature Imputation). The most straightforward expansion happens when the raw data contains multiple items of information under a single column (Decomposing Complex Features). The chapter concludes by borrowing ideas from a technique used in SVMs called the kernel trick: the types of projections that practitioners have found useful lend themselves to being applied directly, without the use of kernels.
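As a toy illustration of a computable feature, assuming a linear model that cannot represent multiplicative interactions on its own, one can append the product of two existing features so the interaction becomes learnable (feature names are hypothetical):

```python
def expand_with_product(row):
    """Append the product x1*x2 as a computed feature.

    A purely linear model w1*x1 + w2*x2 cannot express the interaction
    x1*x2; precomputing it as a third column makes it available.
    """
    x1, x2 = row
    return [x1, x2, x1 * x2]

print(expand_with_product([3.0, 4.0]))  # [3.0, 4.0, 12.0]
```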
Criminal law falls short in cases where an AI functionally commits a crime and there are no individuals who are criminally liable. This chapter explores potential solutions to this problem, with a focus on holding AI directly criminally liable where it is acting autonomously and irreducibly. Conventional wisdom holds that punishing AI is incongruous with basic criminal law principles such as the capacity for culpability and the requirement for a voluntary act. Drawing on analogies to corporate and strict criminal liability, the chapter shows AI punishment cannot be categorically ruled out with quick theoretical arguments. AI punishment could result in general deterrence and expressive benefits, and it need not run afoul of negative limitations such as punishing in excess of culpability. Ultimately, however, punishing AI is not justified, because it might entail significant costs and would certainly require radical legal changes. Modest changes to existing criminal laws that target persons, together with potentially expanded civil liability, are a better solution to AI crime.
This chapter discusses Feature Engineering techniques that look holistically at the feature set, replacing or enhancing features based on their relation to the whole set of instances and features. Techniques such as normalization, scaling, dealing with outliers and generating descriptive features are covered. Scaling and normalization are the most common: they involve finding the maximum and minimum and rescaling the values to ensure they lie in a given interval (e.g., [0, 1] or [−1, 1]). Discretization and binning involve, for example, analyzing a feature that is an integer (any number from −1 trillion to +1 trillion) and realizing that it only takes the values 0, 1 and 10, so it can be simplified into a symbolic feature with three values (value0, value1 and value10). Descriptive features gather information about the shape of the data; the discussion centres on using tables of counts (histograms) and general descriptive statistics such as maximum, minimum and averages. Outlier detection and treatment refers to looking at feature values across many instances and noticing that some values lie very far from the rest.
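The min–max rescaling described above can be sketched in a few lines; this is a simplified illustration (real code must guard against the degenerate all-equal case and must reuse the training-set minimum and maximum at prediction time):

```python
def min_max_scale(values, lo=0.0, hi=1.0):
    """Linearly rescale values so they span [lo, hi]."""
    mn, mx = min(values), max(values)
    # (v - mn) / (mx - mn) maps the range to [0, 1]; then stretch to [lo, hi]
    return [lo + (v - mn) * (hi - lo) / (mx - mn) for v in values]

print(min_max_scale([10, 15, 20]))  # [0.0, 0.5, 1.0]
```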
AI is generating patentable inventions without a person involved who qualifies as an inventor. Yet, there are no rules about whether such an invention could be patented, who or what could qualify as an inventor, and who could own the patents. There are laws that require inventors be natural persons, but they predate inventive AI and were never intended to prohibit patents. AI-generated inventions should be patentable because this will incentivize the development of inventive AI and result in more benefits for everyone. When an AI invents, it should be listed as an inventor because listing a person would be unfair to legitimate inventors. Finally, an AI’s owner should own any patents on its output in the same way that people own other types of machine output. The chapter proceeds to address a host of challenges that would result from AI inventorship, ranging from ownership of AI-generated inventions and displacement of human inventors to the need for consumer protection policies.
This chapter starts with basic definitions, such as types of machine learning (supervised vs. unsupervised learning, classifiers vs. regressors), types of features (binary, categorical, discrete, continuous), metrics (precision, recall, f-measure, accuracy, overfitting), and raw data, and then defines the machine learning cycle and the feature engineering cycle. The feature engineering cycle hinges on two types of analysis: exploratory data analysis at the beginning of the cycle, and error analysis at the end of each feature engineering cycle. Domain modelling and feature construction conclude the chapter, with particular emphasis on feature ideation techniques.
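The metrics listed above follow standard definitions; a small sketch computing them from raw true-positive, false-positive and false-negative counts (the counts are made up for illustration):

```python
def prf(tp, fp, fn):
    """Precision, recall and F1 from confusion-matrix counts.

    precision = tp / (tp + fp)   # of the items we flagged, how many were right
    recall    = tp / (tp + fn)   # of the items we should have flagged, how many we got
    F1 is their harmonic mean.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

p, r, f = prf(tp=8, fp=2, fn=8)
print(p, r, f)  # precision=0.8, recall=0.5, F1 ≈ 0.615
```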
Building on the dataset presented in the previous chapter, this chapter explores using historical information (as represented by previous DBpedia versions) to perform feature engineering with historical features. Working with historical data not only makes all Feature Engineering more complex, it also challenges the whole concept of “truth”: the immutability of the target class. Great features for a particular class have to become acceptable features for a different class. Topics covered include imputing timestamped data, lagged features and moving-window averaging of the data. Due to the unavailability of population data for cities, a second dataset revolving around countries is introduced to perform population prediction using time series ARIMA models over 50 years of data, as provided by the World Bank. The chapter exemplifies different methods to blend machine learning with time series models, including using their output as another feature or training a model to predict their errors.
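Lagged features and moving-window averaging can be sketched as follows; this is an illustrative toy (the chapter works with DBpedia versions and World Bank series, not this helper), producing, for each usable time step, the previous values plus a trailing mean:

```python
def lag_features(series, lags=(1, 2), window=3):
    """For each time step t, emit [series[t-1], series[t-2], trailing mean].

    Rows can only start once enough history exists for both the deepest
    lag and the full averaging window.
    """
    rows = []
    start = max(max(lags), window - 1)
    for t in range(start, len(series)):
        lagged = [series[t - k] for k in lags]
        moving_avg = sum(series[t - window + 1:t + 1]) / window
        rows.append(lagged + [moving_avg])
    return rows

print(lag_features([1, 2, 3, 4, 5]))
# [[2, 1, 2.0], [3, 2, 3.0], [4, 3, 4.0]]
```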
This is the last chapter using the dataset from chapter 6; this time the dataset is expanded with 80 thousand satellite images obtained from NASA, covering relative humidity density as it relates to vegetation. This image information lies outside the visible spectrum, and thus pre-trained models are not useful for this problem. Instead, traditional computer vision techniques are showcased, involving histograms, local feature extraction, gradients and histograms of gradients. This domain exemplifies how to deal with a high level of nuisance noise, and how to normalize your features so that small changes in the way the information was acquired do not completely throw off the underlying machine learning mechanism. These takeaways are valuable for anybody working with sensor data where the acquisition process has a high level of uncertainty. The domain also exemplifies working with large amounts of low-level sensor data.
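The gradient-histogram idea can be illustrated on a tiny grayscale grid; this toy only looks at horizontal differences and fixed bins (the chapter's actual pipeline over the NASA imagery is far richer):

```python
def gradient_histogram(img, bins=4, max_mag=8.0):
    """Histogram of horizontal gradient magnitudes for a 2-D grid.

    Binning the magnitudes makes the feature robust: a small change in
    acquisition shifts counts between neighbouring bins gradually
    rather than producing an entirely different feature vector.
    """
    hist = [0] * bins
    for row in img:
        for a, b in zip(row, row[1:]):
            mag = abs(b - a)
            idx = min(int(mag / max_mag * bins), bins - 1)
            hist[idx] += 1
    return hist

print(gradient_histogram([[0, 1, 5], [2, 2, 9]]))  # [2, 0, 1, 1]
```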
This chapter presents a staple of Feature Engineering: the automatic reduction of features, either by direct selection or by projection to a smaller feature space. Central to Feature Engineering are efforts to reduce the number of features, as uninformative features bloat the ML model with unnecessary parameters. In turn, too many parameters either produce suboptimal results, as they are easy to overfit, or require large amounts of training data. These efforts either explicitly drop certain features (feature selection) or map the feature vector, if it is sparse, into a lower, denser dimension (dimensionality reduction). The chapter also covers some algorithms that perform feature selection as part of their inner computation (embedded feature selection or regularization). Feature selection takes the spotlight within Feature Engineering due to its intrinsic utility for Error Analysis: some techniques, such as feature ablation using wrapper methods, are used as the starting step before a feature drill-down. Moreover, as feature selection helps build understandable models, it intertwines with Error Analysis, which profits from such understandable models.
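Wrapper-style feature ablation can be sketched as a loop that drops each feature in turn and measures the score drop; here `score_fn` stands in for a full train/evaluate cycle, and the toy scoring function with made-up feature names is purely illustrative:

```python
def ablation_ranking(features, score_fn):
    """Rank features by how much the score falls when each is removed.

    A bigger drop means the feature carried more signal; features whose
    removal barely hurts are candidates for elimination.
    """
    full = score_fn(features)
    drops = {}
    for f in features:
        remaining = [g for g in features if g != f]
        drops[f] = full - score_fn(remaining)
    return sorted(drops, key=drops.get, reverse=True)

# Toy stand-in: each feature contributes a fixed amount of accuracy.
contrib = {"len": 0.05, "caps": 0.01, "digits": 0.10}

def score(feats):
    return 0.5 + sum(contrib[f] for f in feats)

print(ablation_ranking(list(contrib), score))  # ['digits', 'len', 'caps']
```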
This chapter showcases three domains different from those in the previous chapters, to highlight particular techniques and problems not covered so far. Each domain uses a small dataset assembled just for the case study and undergoes only one featurization process. The first domain is video, where a mouse-pointer tracking problem is studied to showcase two key issues in feature engineering for video: speed of processing and reusability of results from a previous feature extraction. The second domain is geographical information systems, where bird migration data is studied to showcase the value of key points as physical summaries of paths and bi-dimensional clustering of points using R-trees. Finally, preference data is studied via movie preferences to showcase large-scale imputation of missing features.
Let $\{D_M\}_{M\geq 0}$ be the $n$-vertex random directed graph process, where $D_0$ is the empty directed graph on $n$ vertices, and subsequent directed graphs in the sequence are obtained by the addition of a new directed edge uniformly at random. For each $\varepsilon > 0$, we show that, almost surely, any directed graph $D_M$ with minimum in- and out-degree at least 1 is not only Hamiltonian (as shown by Frieze), but remains Hamiltonian when edges are removed, as long as at most $1/2-\varepsilon$ of both the in- and out-edges incident to each vertex are removed. We say such a directed graph is $(1/2-\varepsilon)$-resiliently Hamiltonian. Furthermore, for each $\varepsilon > 0$, we show that, almost surely, each directed graph $D_M$ in the sequence is not $(1/2+\varepsilon)$-resiliently Hamiltonian.
This improves a result of Ferber, Nenadov, Noever, Peter and Škorić who showed, for each $\varepsilon > 0$, that the binomial random directed graph $D(n,p)$ is almost surely $(1/2-\varepsilon)$-resiliently Hamiltonian if $p=\omega(\log^8n/n)$.
For fixed graphs $F_1,\ldots,F_r$, we prove an upper bound on the threshold function for the property that $G(n,p) \to (F_1,\ldots,F_r)$. This establishes the 1-statement of a conjecture of Kohayakawa and Kreuter.
We are interested in understanding how socially desirable traits like sympathy, reciprocity, and fairness can survive in environments that include aggressive and exploitative agents. Social scientists have long theorized about ingrained motivational factors as explanations for departures from self-seeking behaviors by human subjects. Some of these factors, namely reciprocity, have also been studied extensively in the context of agent systems as tools for promoting cooperation and improving social welfare in stable societies. In this paper, we evaluate how other factors like sympathy and parity can be used by agents to seek out cooperation possibilities while avoiding exploitation traps in more dynamic societies. We evaluate the relative effectiveness of agents influenced by different social considerations when they can change who they interact with in their environment using both an experimental framework and a predictive analysis. Such rewiring of social networks not only allows possibly vulnerable agents to avoid exploitation but also allows them to form gainful coalitions to leverage mutually beneficial cooperation, thereby significantly increasing social welfare.
A software release game was formulated by Zeephongsekul and Chiera [Zeephongsekul, P. & Chiera, C. (1995). Optimal software release policy based on a two-person game of timing. Journal of Applied Probability 32: 470–481] and was reconsidered by Dohi et al. [Dohi, T., Teraoka, Y., & Osaki, S. (2000). Software release games. Journal of Optimization Theory and Applications 105(2): 325–346] in a framework of two-person nonzero-sum games. In this paper, we further point out the faults in the above literature and revisit the Nash equilibrium strategies in the software release games from the viewpoints of both silent and noisy types of games. It is shown that the Nash equilibrium strategies in the silent and noisy software release games exist under some parametric conditions.
A classical result of Erdős and, independently, of Bondy and Simonovits [3] says that the maximum number of edges in an $n$-vertex graph not containing $C_{2k}$, the cycle of length $2k$, is $O(n^{1+1/k})$. Simonovits established a corresponding supersaturation result for copies of $C_{2k}$, showing that there exist positive constants $C, c$ depending only on $k$ such that every $n$-vertex graph $G$ with $e(G) \geq Cn^{1+1/k}$ contains at least $c(e(G)/v(G))^{2k}$ copies of $C_{2k}$, this number of copies being tightly achieved by the random graph (up to a multiplicative constant).
In this paper we extend Simonovits' result to a supersaturation result for $r$-uniform linear cycles of even length in $r$-uniform linear hypergraphs. Our proof is self-contained and includes the $r = 2$ case. As an auxiliary tool, we develop a reduction lemma from general host graphs to almost-regular host graphs that can be used for other supersaturation problems, and may therefore be of independent interest.