This chapter summarizes the motivation for this book and its central messages. It also includes individual essays in which each author presents their point of view on the core issues and challenges.
This chapter’s first three sections drill down into the Analysis Rubric’s “understandability” element, focusing on the challenges that arise when applications must explain how a conclusion was reached, show causal relationships, or provide results that others can reproduce. Given the immense influence of data science findings, the chapter concludes with a discussion of the great care needed to communicate them without being misleading.
This chapter first defines data science, its primary objectives, and several related terms. It continues by describing the evolution of data science from the fields of statistics, operations research, and computing. The chapter concludes with historical notes on the emergence of data science and related topics.
This chapter focuses on the challenges that may occur when applying statistical, machine learning, and operations research techniques to new data science problems. It discusses theoretical, inductive-bias, and scale-related challenges.
We present and discuss a runtime architecture that integrates sensorial data and classifiers with a logic-based decision-making system in the context of an e-Health system for the rehabilitation of children with neuromotor disorders. In this application, children perform a rehabilitation task in the form of games. The main aim of the system is to derive, from the available sensors and classifiers (e.g., eye trackers, motion sensors, emotion recognition techniques), a set of parameters describing the child’s current level of cognitive and behavioral performance (e.g., engagement, attention, task accuracy), and to take decisions accordingly. These decisions are typically aimed at improving the child’s performance, either by triggering appropriate re-engagement stimuli when their attention is low, or by changing the game or making it more difficult when the child is losing interest because the task is too easy. Alongside state-of-the-art techniques for emotion recognition and head pose estimation, we use a runtime variant of a probabilistic and epistemic logic programming dialect of the Event Calculus, known as the Epistemic Probabilistic Event Calculus. In particular, the probabilistic component of this symbolic framework provides a natural interface with the machine learning techniques. We give an overview of the architecture and its components, and show some of its characteristics through a running example and experiments.
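As a much-simplified illustration of the kind of decision described above, the sketch maps per-step probability estimates to one of the three actions using fixed thresholds. It is not the Epistemic Probabilistic Event Calculus: there is no event or fluent reasoning and no epistemic component. The `Observation` fields, the action names, and the 0.7 thresholds are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Observation:
    """Hypothetical per-step estimates, each a probability in [0, 1]."""
    p_low_attention: float  # e.g. derived from eye-tracking / head-pose classifiers
    p_too_easy: float       # e.g. derived from task accuracy and engagement estimates


def decide(obs: Observation,
           attention_threshold: float = 0.7,
           easy_threshold: float = 0.7) -> str:
    """Pick one action per step; low attention takes priority over boredom."""
    if obs.p_low_attention >= attention_threshold:
        return "re_engagement_stimulus"
    if obs.p_too_easy >= easy_threshold:
        return "increase_difficulty"
    return "continue"


def run(stream: Iterable[Observation]) -> List[str]:
    """Apply the decision rule to a stream of observations, one action per step."""
    return [decide(obs) for obs in stream]


if __name__ == "__main__":
    steps = [Observation(0.2, 0.1), Observation(0.85, 0.3), Observation(0.1, 0.9)]
    print(run(steps))  # ['continue', 're_engagement_stimulus', 'increase_difficulty']
```

Giving low attention priority is a design choice of the sketch: re-engaging a distracted child comes before judging whether the task is too easy.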
Over the past two decades, society has seen incredible advances in digital technology, resulting in the wide availability of cheap and easy-to-use software for creating highly sophisticated fake visual content. This democratisation of content creation, paired with the ease of sharing via social media, means that ill-intended fake images and videos pose a significant threat to society. To minimise this threat, it is necessary to be able to distinguish between real and fake content; to date, however, human perceptual research indicates that people have an extremely limited ability to do so. Computational techniques generally fare better at these tasks, yet remain imperfect. Moreover, this challenge is best considered an arms race: as scientists improve detection techniques, fraudsters find novel ways to deceive. We believe that it is crucial to continue to raise awareness of the visual forgeries afforded by new technology and to examine both human and computational ability to sort the real from the fake. In this article, we outline three considerations for how society deals with future technological developments, aimed at helping to secure the benefits of that technology while minimising its possible threats. We hope these considerations will encourage interdisciplinary discussion and collaboration that ultimately goes some way towards limiting the proliferation of harmful content and helps to restore trust online.
Digital image correlation (DIC) techniques were used to evaluate strain distributions along tensile gage lengths immediately after yielding of a medium manganese steel (7 wt% Mn) in samples cold rolled in the range of 1–6 pct. With increasing cold work, DIC confirmed that the yielding behavior transitioned from nucleation and propagation of a single localized deformation zone (Lüders band) to uniform deformation, that is, with no evidence of strain localization. At intermediate amounts of cold work, a unique yielding behavior was evident in which the initially low, positive strain hardening rate increased with tensile strain until the onset of conventional strain hardening (i.e., a strain hardening rate that decreases with strain). The intermediate yielding behavior was associated with the development of multiple non-propagating regions of strain localization, an observation that would not have been evident without DIC.
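The two quantities in this abstract, local strain along the gage and the strain hardening rate, can be illustrated with a short sketch. It assumes a 1D axial displacement profile u(x) sampled along the gage (a simplification of a full DIC field), flags as "localized" any segment whose strain exceeds an arbitrary 1.5 times the gage average, and computes θ = dσ/dε by forward differences. The function names, the threshold, and the toy data are hypothetical; this is not the authors' processing pipeline.

```python
from typing import List, Sequence


def local_axial_strain(x: Sequence[float], u: Sequence[float]) -> List[float]:
    """Engineering strain between neighbouring DIC points along the gage: eps ~= du/dx."""
    return [(u[i + 1] - u[i]) / (x[i + 1] - x[i]) for i in range(len(x) - 1)]


def localized_segments(strain: Sequence[float], factor: float = 1.5) -> List[int]:
    """Indices of segments whose strain exceeds `factor` times the gage-average strain."""
    mean = sum(strain) / len(strain)
    return [i for i, eps in enumerate(strain) if eps > factor * mean]


def hardening_rate(stress: Sequence[float], strain: Sequence[float]) -> List[float]:
    """Strain hardening rate theta = d(sigma)/d(eps), by forward differences."""
    return [(stress[i + 1] - stress[i]) / (strain[i + 1] - strain[i])
            for i in range(len(stress) - 1)]


def hardening_trend(theta: Sequence[float]) -> List[str]:
    """'rising' where theta increases; 'falling' marks conventional strain hardening."""
    return ["rising" if b > a else "falling" for a, b in zip(theta, theta[1:])]


if __name__ == "__main__":
    # Toy displacement field: segment 2 stretches about three times as much as the others.
    x = [0.0, 1.0, 2.0, 3.0, 4.0]      # position along the gage (mm)
    u = [0.0, 0.01, 0.02, 0.05, 0.06]  # axial displacement (mm)
    print(localized_segments(local_axial_strain(x, u)))  # [2]
    # Toy curve (stress in MPa, strain in %): theta first rises, then falls.
    theta = hardening_rate([300, 310, 325, 335], [0, 1, 2, 3])
    print(theta, hardening_trend(theta))  # [10.0, 15.0, 10.0] ['rising', 'falling']
```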
This paper investigates how socio-spatial aspects of creativity, operationalized as the causal relations between the built environment and perceived creativity in university campuses’ public spaces, are currently applied in practice. It also discusses practitioners’ perceptions of research-generated evidence on socio-spatial aspects of creativity according to three effectiveness aspects: credibility, relevance, and applicability. The “research-generated evidence” is herein derived from data-driven knowledge generated by multi-disciplinary methodologies (e.g., self-reported perceptions, participatory tools, geospatial analysis, observations). Through a thematic analysis of interviews with practitioners involved in the (re)development of public spaces in inner-city campuses and science parks in Amsterdam, Utrecht, and Groningen, we concluded that concepts relating to socio-spatial aspects of creativity were addressed only at the decision-making level for Utrecht Science Park. Correspondingly, while most practitioners considered the presented evidence relevant for practice, perceptions of credibility and applicability vary according to institutional goals, practitioners’ habits in practice, and their roles and phases of involvement in projects. The newfound interrelationships between the three effectiveness aspects highlighted (a) institutional fragmentation issues in campus and public space projects, (b) the research-practice gap related to such projects, which extends beyond the university campus context, and (c) insights into the relationship between evidence generated through research-based, data-driven knowledge and urban planning practice, policy, and governance related to knowledge environments.
We concluded that if research-generated evidence on socio-spatial aspects of creativity is to be integrated into the evidence-based practice of campuses’ public spaces, alignment among researchers, the multiple actors involved, policy framing, and goal achievement is fundamental.
Let $V$ be a finite-dimensional vector space over $\mathbb{F}_p$. We say that a multilinear form $\alpha \colon V^k \to \mathbb{F}_p$ in $k$ variables is $d$-approximately symmetric if the partition rank of the difference $\alpha (x_1, \ldots, x_k) - \alpha (x_{\pi (1)}, \ldots, x_{\pi (k)})$ is at most $d$ for every permutation $\pi \in \textrm{Sym}_k$. In a work concerning the inverse theorem for the Gowers uniformity $\|\!\cdot\! \|_{\mathsf{U}^4}$ norm in the case of low characteristic, Tidor conjectured that any $d$-approximately symmetric multilinear form $\alpha \colon V^k \to \mathbb{F}_p$ differs from a symmetric multilinear form by a multilinear form of partition rank at most $O_{p,k,d}(1)$ and proved this conjecture in the case of trilinear forms. In this paper, somewhat surprisingly, we show that this conjecture is false. In fact, we show that approximately symmetric forms can be quite far from symmetric ones, by constructing a multilinear form $\alpha \colon \mathbb{F}_2^n \times \mathbb{F}_2^n \times \mathbb{F}_2^n \times \mathbb{F}_2^n \to \mathbb{F}_2$ which is 3-approximately symmetric, while the difference between $\alpha$ and any symmetric multilinear form is of partition rank at least $\Omega (\sqrt [3]{n})$.
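The abstract uses partition rank without defining it. For reference, the standard definition from the literature, the approximate-symmetry condition, and the two competing statements can be written as follows. Here $x_S = (x_j)_{j \in S}$, each $\beta_i$ and $\gamma_i$ is multilinear in its variables, and $\alpha^{\pi}(x_1,\dots,x_k) = \alpha(x_{\pi(1)},\dots,x_{\pi(k)})$ is shorthand introduced only for this display.

```latex
\begin{align*}
\operatorname{prk}(\alpha) &= \min\Bigl\{ r : \alpha(x_1,\dots,x_k) = \textstyle\sum_{i=1}^{r} \beta_i(x_{S_i})\,\gamma_i(x_{[k]\setminus S_i}),\ \emptyset \neq S_i \subsetneq [k] \Bigr\},\\
\alpha \text{ is } d\text{-approximately symmetric} &\iff \operatorname{prk}(\alpha - \alpha^{\pi}) \le d \ \text{ for all } \pi \in \mathrm{Sym}_k,\\
\text{Tidor's conjecture:}\quad & \exists\, \sigma \text{ symmetric with } \operatorname{prk}(\alpha - \sigma) = O_{p,k,d}(1),\\
\text{counterexample } (k=4,\ p=2,\ d=3):\quad & \operatorname{prk}(\alpha - \sigma) = \Omega(\sqrt[3]{n}) \ \text{ for every symmetric } \sigma.
\end{align*}
```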
Let $A \subseteq \{0,1\}^n$ be a set of size $2^{n-1}$, and let $\phi \,:\, \{0,1\}^{n-1} \to A$ be a bijection. We define the average stretch of $\phi$ as
$${\sf avgStretch}(\phi) = \mathbb{E}\left[{\sf dist}(\phi(x), \phi(x'))\right],$$
where ${\sf dist}$ denotes the Hamming distance and the expectation is taken over uniformly random $x,x' \in \{0,1\}^{n-1}$ that differ in exactly one coordinate.
In this paper, we continue the line of research studying mappings on the discrete hypercube with small average stretch. We prove the following results.
For any set $A \subseteq \{0,1\}^n$ of density $1/2$ there exists a bijection $\phi _A \,:\, \{0,1\}^{n-1} \to A$ such that ${\sf avgStretch}(\phi _A) = O\left(\sqrt{n}\right)$.
For $n = 3^k$ let ${A_{\textsf{rec-maj}}} = \{x \in \{0,1\}^n \,:\,{\textsf{rec-maj}}(x) = 1\}$, where ${\textsf{rec-maj}} \,:\, \{0,1\}^n \to \{0,1\}$ is the recursive majority-of-3’s function. There exists a bijection $\phi _{{\textsf{rec-maj}}} \,:\, \{0,1\}^{n-1} \to{A_{\textsf{rec-maj}}}$ such that ${\sf avgStretch}(\phi _{{\textsf{rec-maj}}}) = O(1)$.
Let ${A_{{\sf tribes}}} = \{x \in \{0,1\}^n \,:\,{\sf tribes}(x) = 1\}$. There exists a bijection $\phi _{{\sf tribes}} \,:\, \{0,1\}^{n-1} \to{A_{{\sf tribes}}}$ such that ${\sf avgStretch}(\phi _{{\sf tribes}}) = O(\log n)$.
These results answer the questions raised by Benjamini, Cohen, and Shinkar (Isr. J. Math 2016).
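For small $n$, the average stretch can be computed by brute force, taking the distance to be Hamming distance. The sketch enumerates every $x \in \{0,1\}^m$ (with $m = n-1$) and every single-bit flip. Each edge is counted twice, once from each endpoint, which leaves the average unchanged because Hamming distance is symmetric. It also implements recursive majority of 3's and a naive lexicographic bijection onto $A$. That bijection is only a test input and is not one of the paper's low-stretch constructions. The names `avg_stretch`, `rec_maj`, and `naive_bijection` are hypothetical.

```python
from itertools import product
from typing import Callable, Tuple

Bits = Tuple[int, ...]


def hamming(a: Bits, b: Bits) -> int:
    """Number of coordinates in which a and b differ."""
    return sum(ai != bi for ai, bi in zip(a, b))


def avg_stretch(phi: Callable[[Bits], Bits], m: int) -> float:
    """Mean of hamming(phi(x), phi(x')) over x in {0,1}^m and x' = x with one bit flipped."""
    total, count = 0, 0
    for x in product((0, 1), repeat=m):
        for i in range(m):
            x_flip = x[:i] + (1 - x[i],) + x[i + 1:]
            total += hamming(phi(x), phi(x_flip))
            count += 1
    return total / count


def rec_maj(x: Bits) -> int:
    """Recursive majority of 3's on n = 3^k bits."""
    if len(x) == 1:
        return x[0]
    t = len(x) // 3
    votes = [rec_maj(x[j * t:(j + 1) * t]) for j in range(3)]
    return int(sum(votes) >= 2)


def naive_bijection(f: Callable[[Bits], int], n: int) -> Callable[[Bits], Bits]:
    """Send the j-th point of {0,1}^(n-1) to the j-th point of A = {y : f(y) = 1},
    both in lexicographic order. Illustrative only; not a low-stretch construction."""
    domain = list(product((0, 1), repeat=n - 1))
    image = [y for y in product((0, 1), repeat=n) if f(y) == 1]
    if len(image) != len(domain):
        raise ValueError("A must have exactly 2^(n-1) elements")
    table = dict(zip(domain, image))
    return lambda x: table[x]


if __name__ == "__main__":
    # Appending a fixed 1 maps {0,1}^4 onto A = {y : y_5 = 1}; every edge maps to distance 1.
    print(avg_stretch(lambda x: x + (1,), 4))  # 1.0
    phi = naive_bijection(rec_maj, 9)
    print(avg_stretch(phi, 8))
```

Because any bijection sends distinct neighbours to distinct points, the average stretch is always at least 1. The results above bound how far above 1 a well-chosen bijection must go.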
Labeled graphs are particularly well adapted to representing objects in topology-based geometric modeling. Graph transformation theory is therefore used to implement modeling operations and check their consistency. This article defines a class of graph transformation rules dedicated to embedding computations. Here, objects are defined as a particular subclass of labeled graphs in which arc labels encode their topological structure (i.e., cell subdivision: vertex, edge, face) and node labels encode their embedding (i.e., relevant data: vertex positions, face colors, volume density). Object consistency is defined by labeling constraints that must be preserved by modeling operations modifying topology and/or embedding. Dedicated graph transformation variables allow us to access the existing embedding from the underlying topological structure (e.g., collecting all the points of a face) in order to compute the new embedding using user-provided functions (e.g., computing the barycenter of several points). To ensure the safety of the defined operations, we provide syntactic conditions on rules guaranteeing that the object consistency constraints are preserved.
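The running example in the abstract is to collect the points of a face and then apply a user-provided function such as a barycenter. It can be sketched on a plain labeled graph. This is a minimal illustration under our own assumptions, not the article's rule language or variables. The arc labels `a0` (joining the two ends of an edge) and `a1` (joining the two nodes that meet at a corner) follow a generalized-map-style encoding chosen for the example. `LabeledGraph`, `orbit`, `collect_points`, `barycenter`, and `square_face` are hypothetical names.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple

Point = Tuple[float, float]


class LabeledGraph:
    """Node labels hold positions; each arc label pairs nodes symmetrically."""

    def __init__(self) -> None:
        self.arcs: Dict[int, Dict[str, int]] = {}
        self.position: Dict[int, Point] = {}

    def add_node(self, node: int, pos: Point) -> None:
        self.arcs.setdefault(node, {})
        self.position[node] = pos

    def link(self, a: int, b: int, label: str) -> None:
        # Store the arc in both directions so that following a label is symmetric.
        self.arcs[a][label] = b
        self.arcs[b][label] = a


def orbit(g: LabeledGraph, start: int, labels: Iterable[str]) -> Set[int]:
    """Nodes reachable from `start` by following only arcs carrying one of `labels`."""
    labels = list(labels)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for label in labels:
            nxt = g.arcs[node].get(label)
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def collect_points(g: LabeledGraph, start: int,
                   cell_labels: List[str], vertex_labels: List[str]) -> List[Point]:
    """One position per vertex sub-orbit of the cell containing `start`.

    `vertex_labels` must be a subset of `cell_labels`, so each sub-orbit lies inside the cell.
    """
    remaining = orbit(g, start, cell_labels)
    points: List[Point] = []
    while remaining:
        rep = min(remaining)  # deterministic representative of the next vertex
        points.append(g.position[rep])
        remaining -= orbit(g, rep, vertex_labels)
    return points


def barycenter(points: List[Point]) -> Point:
    """Example of a user-provided embedding function."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def square_face() -> LabeledGraph:
    """A 2x2 square face encoded with 8 nodes (two per corner, two per edge)."""
    pos = {0: (0.0, 0.0), 7: (0.0, 0.0), 1: (2.0, 0.0), 2: (2.0, 0.0),
           3: (2.0, 2.0), 4: (2.0, 2.0), 5: (0.0, 2.0), 6: (0.0, 2.0)}
    g = LabeledGraph()
    for node, p in pos.items():
        g.add_node(node, p)
    for a, b in [(0, 1), (2, 3), (4, 5), (6, 7)]:  # a0: the two ends of each edge
        g.link(a, b, "a0")
    for a, b in [(1, 2), (3, 4), (5, 6), (7, 0)]:  # a1: the two nodes at each corner
        g.link(a, b, "a1")
    return g


if __name__ == "__main__":
    face = collect_points(square_face(), 0, ["a0", "a1"], ["a1"])
    print(face, barycenter(face))  # four corners, barycenter (1.0, 1.0)
```

Taking one value per vertex sub-orbit, rather than one per node, keeps each corner from being counted twice. That matters for functions like a barycenter whenever corners carry different numbers of nodes.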
Modelling the distributional semantics of a morphologically rich language such as Arabic must take into account its introflexive, fusional, and inflectional nature, attributes that shape its combinatorial sequences and substitutional paradigms. To evaluate such word distributional models, the benchmarks used thus far for Arabic have mimicked those for English. This paper reports on a benchmark that we designed to reflect linguistic patterns in both Contemporary Arabic and Classical Arabic, the former being a cover term for written and spoken Modern Standard Arabic and the latter for pre-modern Arabic. The analogy items included in this benchmark were chosen transparently to capture the major features of nouns and verbs; derivational and inflectional morphology; high-, medium-, and low-frequency patterns and lexical items; and the morphosemantic, morphosyntactic, and semantic dimensions of the language. All categories included in this benchmark were carefully selected to ensure proper representation of the language. The benchmark consists of 45 roots of the trilateral, all-consonantal, and semivowel-inclusive types; six morphosemantic patterns (’af‘ala; ifta‘ala; infa‘ala; istaf‘ala; tafa‘‘ala; and tafā‘ala); five derivations (the verbal noun, the active participle, and the Masculine-Feminine, Feminine-Singular-Plural, and Masculine-Singular-Plural contrasts); morphosyntactic transformations (perfect and imperfect verbs conjugated for all pronouns); and lexical semantics (synonyms, antonyms, and hyponyms of nouns, verbs, and adjectives), as well as capital cities and currencies. All categories include an equal proportion of high-, medium-, and low-frequency items. To validate the proposed benchmark, we developed a set of embedding models from different textual sources.
We then tested them intrinsically using the proposed benchmark and extrinsically using two natural language processing tasks: Arabic Named Entity Recognition and Text Classification. The evaluation leads us to conclude that the proposed benchmark genuinely reflects this morphologically rich language and discriminates between word embedding models.
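Analogy benchmarks of this kind are commonly scored with the 3CosAdd rule. For an item a : b :: c : d, the model's answer is the vocabulary word (other than a, b, c) whose vector is most cosine-similar to b - a + c. The abstract does not state which scoring rule the authors used, so the sketch assumes 3CosAdd. The four-dimensional toy vectors are invented for illustration and are not taken from the benchmark. The same applies to the simplified ASCII transliterations: kataba/katib (kātib) and darasa/daris (dāris), standing for perfect verb and active participle.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def solve_analogy(emb: Dict[str, Vector], a: str, b: str, c: str) -> str:
    """3CosAdd: the word d (excluding a, b, c) maximizing cos(d, b - a + c)."""
    target = [vb - va + vc for va, vb, vc in zip(emb[a], emb[b], emb[c])]
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(emb[w], target))


def accuracy(emb: Dict[str, Vector], items: List[Tuple[str, str, str, str]]) -> float:
    """Fraction of items a : b :: c : d for which 3CosAdd returns d."""
    hits = sum(solve_analogy(emb, a, b, c) == d for a, b, c, d in items)
    return hits / len(items)


# Toy 4-d vectors: [perfect-verb pattern, active-participle pattern, root k-t-b, root d-r-s].
TOY_EMBEDDINGS: Dict[str, Vector] = {
    "kataba": [1.0, 0.0, 1.0, 0.0],
    "katib": [0.0, 1.0, 1.0, 0.0],
    "darasa": [1.0, 0.0, 0.0, 1.0],
    "daris": [0.0, 1.0, 0.0, 1.0],
    "kitab": [0.0, 0.0, 1.0, 0.0],
    "madrasa": [0.0, 0.0, 0.0, 1.0],
}

if __name__ == "__main__":
    print(solve_analogy(TOY_EMBEDDINGS, "kataba", "katib", "darasa"))  # daris
    items = [("kataba", "katib", "darasa", "daris"), ("darasa", "daris", "kataba", "katib")]
    print(accuracy(TOY_EMBEDDINGS, items))  # 1.0
```

With real embeddings, per-category accuracies (for example, per morphosemantic pattern or frequency band) are what would let such a benchmark discriminate between models.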
Canopy habitats are intrinsically difficult for researchers to access. The current scarcity of climatic data from forest canopies limits our understanding of the conditions and environmental variability of these diverse and dynamic habitats. We present 307 days of climate records collected between 2019 and 2020 in the tropical rainforest canopy of Yasuní National Park, Ecuador. We monitored climate at a 10-min temporal resolution in the middle crowns of eight canopy trees. The distance between canopy climate stations ranged from 700 m to 10 km. Air temperature, relative humidity, leaf wetness, and photosynthetically active radiation (PAR) were measured at every canopy climate station, whereas global radiation, rainfall, and wind speed were measured at different subsets of the stations. We processed the eight data series to omit erroneous records resulting from sensor failures or interruptions of the solar power supply. In addition to the eight original data series, we present three derived data series: two aggregating canopy climate for valleys and for ridges (four stations each), and one overall average (all eight stations). This last derived series covers 306 days, while the shortest original series covers 22 days and the longest 296 days. In addition to the data, we present two open-source tools developed in RStudio that facilitate data visualization (a dashboard) and data exploration (a filtering app) of the original and aggregated records.
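The derived valley, ridge, and overall series can be thought of as per-timestamp means across a group of stations. The sketch assumes that omitted records are stored as `None` and averages whichever stations have a valid value at each 10-min timestamp. The abstract does not say how the authors handled timestamps where only some stations report. The `min_stations` parameter, the station names, and the toy values are hypothetical. The authors' own tools were built in R; Python is used here only for illustration.

```python
from typing import Dict, List, Optional

# One station's record: ISO timestamp string -> value, with None marking an omitted record.
Series = Dict[str, Optional[float]]


def aggregate(stations: Dict[str, Series], members: List[str],
              min_stations: int = 1) -> Series:
    """Per-timestamp mean over the `members` stations, skipping omitted records.

    A timestamp gets None if fewer than `min_stations` members have a valid value.
    """
    timestamps = sorted({t for s in members for t in stations[s]})
    out: Series = {}
    for t in timestamps:
        valid = [v for v in (stations[s].get(t) for s in members) if v is not None]
        out[t] = sum(valid) / len(valid) if valid and len(valid) >= min_stations else None
    return out


if __name__ == "__main__":
    # Toy air-temperature records (deg C) for two valley stations and one ridge station.
    records = {
        "valley_1": {"2019-10-01T12:00": 24.0, "2019-10-01T12:10": None},
        "valley_2": {"2019-10-01T12:00": 26.0, "2019-10-01T12:10": 25.0},
        "ridge_1": {"2019-10-01T12:00": 22.0},
    }
    print(aggregate(records, ["valley_1", "valley_2"]))  # valley mean per timestamp
    print(aggregate(records, list(records)))             # overall mean per timestamp
```

Raising `min_stations` trades coverage for representativeness. With a threshold above 1, a single surviving station no longer stands in for the whole group.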