Let’s say you just had a great dining experience at a new restaurant that opened down the street. Or you just saw the worst movie of your life. Many people turn to social media to talk about these experiences and share their opinions (we discuss why people do this in Chapters 2–4). Some may write a lengthy review on their blog. Others may write shorter reviews and post them to a review site (like Yelp or Rotten Tomatoes). And still others may choose to engage in a lengthy back-and-forth discussion about the merits and pitfalls of the experience in an online discussion forum.
Expressing opinions in this way is not new behavior. In the past, we referred to this as word-of-mouth behavior. Neighbors talking to neighbors about their new cars. Co-workers having conversations around the water cooler about a new computer or the latest events unfolding in their favorite television programs. But there are two fundamental differences between offline word-of-mouth activity and online conversations occurring in social media.
In the previous chapter, we focused on opinion formation and the various factors that influence that process. In this chapter and the next, we look at what happens after you have formed an opinion. Do you share it with others? And, if so, why do you share it and what specifically do you say?
By “opinion expression,” we refer to an individual’s decision to communicate his or her opinions to others. This decision stage serves as a filter between an individual’s underlying opinions and the sharing of those opinions. As with any filter, not everything that encounters it is going to pass through. An individual may decide not to share any opinion at all or to voice a modified (or moderated) version of his or her actual opinion. In other words, the opinion expression decision can be broken down into two components: (1) whether to share an opinion and (2) what opinion to share.
Imagine you are chatting with some new friends, co-workers, or neighbors whom you have met fairly recently. In such unfamiliar situations, most people err on the side of caution and steer the conversation to safe topics like the weather, local restaurants, new movies, or weekend plans. That way, you don’t run much risk of inadvertently offending the people whom you just met by expressing a potentially contentious opinion.
We present research aiming to build tools for the normalization of User-Generated Content (UGC). We argue that processing this type of text requires revisiting the initial steps of Natural Language Processing, since UGC (micro-blog, blog, and, generally, Web 2.0 user-generated texts) presents a number of nonstandard communicative and linguistic characteristics – often closer to oral and colloquial language than to edited text. We present a corpus of Spanish UGC text drawn from three different sources (Twitter, consumer reviews, and blogs) and describe its main characteristics. We motivate the need for UGC text normalization by analyzing the problems found when processing this type of text through a conventional language processing pipeline, particularly in the tasks of lemmatization and morphosyntactic tagging. Our aim in this paper is to leverage existing spell- and grammar-correction engines and endow them with automatic normalization capabilities in order to pave the way for the application of standard Natural Language Processing tools to typical UGC text. In particular, we propose a strategy for automatically normalizing UGC by adding a module on top of a pre-existing spell-checker that selects the most plausible correction from an unranked list of candidates provided by the spell-checker. To build this selector module we train four language models, each containing a different type of linguistic information, traded off against its capacity to generalize. Our experiments show that the models trained on truecase and lowercase word forms are more discriminative than the others at selecting the best candidate. We also experimented with parametrized combinations of the models, both by optimizing directly on the selection task and by linearly interpolating the models. The resulting parametrized combinations obtain results close to those of the best-performing model but do not improve on them, as measured on the test set. The precision of the selector module in ranking the expected correction first on the test corpora reaches 82.5% for Twitter text (baseline 57%) and 88% for non-Twitter text (baseline 64%).
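To make the selector's role concrete, here is a minimal sketch of how such a module might rerank an unranked candidate list using a context language model. All names (score_candidate, select_correction) and the toy bigram counts are illustrative assumptions; they do not reproduce the paper's actual models or implementation.

```python
# Minimal sketch: ranking spell-checker candidates with a language model.
# The counts below are toy stand-ins for a model trained on lowercase word forms.
import math
from collections import defaultdict

BIGRAM_COUNTS = defaultdict(int, {
    ("no", "one"): 40, ("no", "once"): 1,
    ("one", "knows"): 25, ("once", "knows"): 1,
})
UNIGRAM_COUNTS = defaultdict(int, {"no": 50, "one": 60, "once": 10, "knows": 30})
VOCAB_SIZE = 1000  # assumed vocabulary size for add-one smoothing

def bigram_logprob(prev, word):
    """Add-one smoothed bigram log-probability."""
    return math.log((BIGRAM_COUNTS[(prev, word)] + 1) /
                    (UNIGRAM_COUNTS[prev] + VOCAB_SIZE))

def score_candidate(left_context, candidate, right_context):
    """Score a candidate correction by the bigrams it forms with its context."""
    score = 0.0
    if left_context:
        score += bigram_logprob(left_context[-1].lower(), candidate.lower())
    if right_context:
        score += bigram_logprob(candidate.lower(), right_context[0].lower())
    return score

def select_correction(left_context, candidates, right_context):
    """Pick the candidate the language model finds most plausible in context."""
    return max(candidates, key=lambda c: score_candidate(left_context, c, right_context))

# The spell-checker flags a misspelling and returns an unranked candidate list;
# the selector module picks the best fit for the surrounding words.
print(select_correction(["no"], ["one", "once"], ["knows"]))  # -> "one"
```

In the paper's setting, the role of these toy counts would be played by language models trained on large corpora, for example over truecase or lowercase word forms.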
Before we jump into how organizations can build their social media intelligence capabilities, we first need to understand the science of opinions. Behind every social media comment posted is a person with an opinion. However, not everyone with an opinion chooses to share that opinion online. The opinions we see posted to social media are an outcome of a two-stage process. In the first stage, we form our opinions (opinion formation). Then, in the second stage, we share our opinions with others (opinion expression). However, we don’t share all of our opinions. Instead, the opinions that we ultimately post online are those that somehow merit sharing. In other words, the opinion expression stage can be thought of as an opinion filter that allows some of our opinions to pass through to be posted on social media while our other opinions remain unshared (see Figure 2.1).
To illustrate the distinction between opinion formation and expression, here’s an offline example that most of us are familiar with: the Monday morning quarterback. We all know one, and chances are we know more than one. The Monday morning quarterback has well-developed opinions about Sunday’s football games and wants to share those opinions with anyone willing to listen: Which plays should have been run. Which players should have been substituted. Which calls were blown by the officials and should have been challenged.
The preferential attachment network with fitness is a dynamic random graph model. New vertices are introduced consecutively and a new vertex is attached to an old vertex with probability proportional to the degree of the old one multiplied by a random fitness. We concentrate on the typical behaviour of the graph by calculating the fitness distribution of a vertex chosen proportional to its degree. For a particular variant of the model, this analysis was first carried out by Borgs, Chayes, Daskalakis and Roch. However, we present a new method, which is robust in the sense that it does not depend on the exact specification of the attachment law. In particular, we show that a peculiar phenomenon, referred to as Bose–Einstein condensation, can be observed in a wide variety of models. Finally, we also compute the joint degree and fitness distribution of a uniformly chosen vertex.
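As a rough illustration of these dynamics, the sketch below simulates one simple variant of the model. The starting graph, the uniform fitness distribution, and the one-edge-per-new-vertex rule are assumptions made here for brevity, not the exact specification analysed in the paper.

```python
import random

def grow_graph(n, fitness_sampler=random.random):
    """Grow a graph on n vertices: each new vertex attaches to an old vertex
    chosen with probability proportional to (its degree) * (its fitness)."""
    fitness = [fitness_sampler(), fitness_sampler()]
    degree = [1, 1]            # start from a single edge between vertices 0 and 1
    edges = [(0, 1)]
    for v in range(2, n):
        weights = [degree[u] * fitness[u] for u in range(v)]
        target = random.choices(range(v), weights=weights, k=1)[0]
        edges.append((v, target))
        degree[target] += 1
        degree.append(1)
        fitness.append(fitness_sampler())
    return edges, degree, fitness

# Empirical analogue of the quantity studied: the fitness of a vertex chosen
# with probability proportional to its degree (a size-biased sample).
edges, degree, fitness = grow_graph(2_000)
chosen = random.choices(range(len(degree)), weights=degree, k=1)[0]
print(degree[chosen], fitness[chosen])
```

Repeating the size-biased draw many times gives an empirical estimate of the degree-biased fitness distribution that the abstract analyses.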
Learner corpora have become prominent in language teaching and learning, enhancing data-driven learning (DDL) pedagogy by promoting ‘learning driven data’ in the classroom. This study explores the potential of a local learner corpus by investigating the effects of two types of DDL activities, one relying on a native-speaker corpus (NSC) and the second combining native-speaker and learner corpora. Both types of activities aimed at improving second language writers’ knowledge of linking adverbials and were based on a preliminary analysis of adverbial use in the local learner corpus produced by the 31 study participants. Quantitative and qualitative data, obtained from writing samples, pre/post-tests, and questionnaires, were converged through concurrent triangulation. The results showed an increase in frequency, diversity, and accuracy in all participants’ use of adverbials, but more significant improvement was made by the students who were exposed to the corpus containing their own writing. The findings of this study are thus interpreted as suggesting that combining learner and native-speaker data is a feasible and effective practice that can be readily integrated into DDL-based instruction with positive impact.
We give some negative results on the expressiveness of naïve set theory (NS) in LP and in the four variants of minimally inconsistent LP defined in Crabbé (2011): $\mathrm{LP}_{m}$, $\mathrm{LP}_{=}$, $\mathrm{LP}_{\subseteq}$, and $\mathrm{LP}_{\supseteq}$. We show that NS in LP cannot prove the existence of sets that behave like singleton sets, Cartesian pairs, or infinitely ascending linear orders. We show that NS is close to trivial in $\mathrm{LP}_{m}$ and $\mathrm{LP}_{\subseteq}$, in the sense that its only minimally inconsistent model is a one-element model. We show that NS in $\mathrm{LP}_{=}$ and $\mathrm{LP}_{\supseteq}$ has the same limitations we give for NS in LP.
One of the many new features of English language learners’ dictionaries derived from the technological developments that have taken place over recent decades is the presence of corpus-based examples to illustrate the use of words in context. However, empirical studies have generally not been able to produce conclusive evidence about their actual worth. In Frankenberg-Garcia (2012a), I argued that these studies – and indeed learners’ dictionaries themselves – do not distinguish sufficiently between examples meant to aid language comprehension and examples that focus on enhancing language production. The present study reports on an experiment with secondary school students carried out to test the usefulness of separate corpus examples for comprehension and production. The results support the need for different types of examples for comprehension and production, and provide evidence in support of data-driven learning, particularly if learners have access to more than one example.
This study examines the role of guided induction as an instructional approach in paper-based data-driven learning (DDL) in the context of an ESL grammar course during an intensive English program at an American public university. Specifically, it examines whether corpus-informed grammar instruction is more effective through inductive, data-driven learning or through traditional deductive instruction. In the study, 49 participants completed two weeks of ESL grammar instruction on the passive voice in English. The learners participated in one of three instructional treatments: a data-driven learning treatment, a deductive instructional treatment using corpus-informed teaching materials, and a deductive instructional treatment using traditional (i.e., non-corpus-informed) materials. Results from pre-test, post-test, and delayed post-test indicated that the DDL group significantly improved their grammar ability with the passive voice, while the other two treatment groups did not show significant gains. The findings from this study suggest that in this learning context there are measurable benefits to teaching ESL grammar inductively using paper-based DDL.
Although Boolean Constraint Technology has made tremendous progress over the last decade, the efficacy of state-of-the-art solvers is known to vary considerably across different types of problem instances, and is known to depend strongly on algorithm parameters. This problem was addressed by means of a simple, yet effective approach using handmade, uniform, and unordered schedules of multiple solvers in ppfolio, which showed very impressive performance in the 2011 Satisfiability Testing (SAT) Competition. Inspired by this, we take advantage of the modeling and solving capacities of Answer Set Programming (ASP) to automatically determine more refined, that is, nonuniform and ordered solver schedules from the existing benchmarking data. We begin by formulating the determination of such schedules as multi-criteria optimization problems and provide corresponding ASP encodings. The resulting encodings are easily customizable for different settings, and the computation of optimum schedules can mostly be done in the blink of an eye, even when dealing with large runtime data sets stemming from many solvers on hundreds to thousands of instances. Also, the fact that our approach can be customized easily enabled us to swiftly adapt it to generate parallel schedules for multi-processor machines.
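The paper itself computes optimal schedules via ASP encodings; purely as an illustration of the underlying objective, the following sketch (with made-up solver names and runtimes) shows how a schedule of (solver, time-slice) pairs can be scored by the number of benchmark instances it solves within a fixed budget.

```python
# Illustration only: the paper determines schedules with ASP encodings.
# Here we merely score candidate schedules against hypothetical runtime data.

# Hypothetical runtimes (in seconds) of each solver on each benchmark instance.
RUNTIMES = {
    "solverA": {"i1": 3, "i2": 50, "i3": 6},
    "solverB": {"i1": 40, "i2": 4, "i3": 60},
}

def solved_instances(schedule, runtimes):
    """Count instances solved when each scheduled solver runs for its time slice."""
    instances = next(iter(runtimes.values())).keys()
    return sum(
        any(runtimes[solver][instance] <= time_slice
            for solver, time_slice in schedule)
        for instance in instances
    )

# Two ways of splitting a 10-second budget across the solvers: a uniform split
# (in the spirit of ppfolio) and a nonuniform one.
uniform = [("solverA", 5), ("solverB", 5)]
nonuniform = [("solverB", 4), ("solverA", 6)]
print(solved_instances(uniform, RUNTIMES))     # -> 2
print(solved_instances(nonuniform, RUNTIMES))  # -> 3
```

A real schedule additionally fixes the order in which solvers run, which affects how quickly solutions are found; the paper's ASP encodings determine such ordered, nonuniform schedules from existing benchmarking data.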
A crucial part of the contemporary interest in logicism in the philosophy of mathematics resides in its idea that arithmetical knowledge may be based on logical knowledge. Here, an implementation of this idea is considered that holds that knowledge of arithmetical principles may be based on two things: (i) knowledge of logical principles and (ii) knowledge that the arithmetical principles are representable in the logical principles. The notions of representation considered here are related to theory-based and structure-based notions of representation from contemporary mathematical logic. It is argued that the theory-based versions of such logicism are either too liberal (the plethora problem) or committed to intuitively incorrect closure conditions (the consistency problem). Structure-based versions, on the other hand, must respond to a charge of begging the question (the circularity problem) or explain how one may have knowledge of structure in advance of knowledge of axioms (the signature problem). This discussion is significant because it gives us a better idea of what a notion of representation must look like if it is to aid in realizing some of the traditional epistemic aims of logicism in the philosophy of mathematics.
Corpora have been suggested as valuable sources for teaching English for academic purposes (EAP). Since previous studies have mainly focused on corpus use in classroom settings, more research is needed to reveal how students react to using corpora on their own and what should be provided to help them become autonomous corpus users, considering that their ultimate goal is to be independent scholars and writers. In the present study, conducted in an engineering lab at a Korean university over 22 weeks, data on students’ experiences and evaluations of consulting general and specialized corpora for academic writing were collected and analyzed. The findings show that, while both corpora served the participants well as reference sources, the specialized corpus was particularly valued for its direct help in academic writing because, as non-native English-speaking graduate engineering students, the participants wanted to follow the writing conventions of their discourse community. The participants also showed disparate attitudes toward the time taken for corpus consultation due to differences in factors such as academic experience, search purposes, and writing tasks. The article concludes with several suggestions for better corpus use with EAP students regarding the compilation of a corpus, corpus training, corpus competence, and academic writing.
Corpus linguistics has established that language is highly patterned. The use of patterned language has been linked to processing advantages with respect to listening and reading, which has implications for perceptions of fluency. The last twenty years have seen an increase in the integration of corpus-based language learning, or data-driven learning (DDL), as a supporting feature in teaching English as a foreign/second language (EFL/ESL). Most research has investigated student attitudes towards DDL as a tool to facilitate writing. Other studies, though notably fewer, have taken a quantitative perspective on the efficacy of DDL as a tool to facilitate the inductive learning of grammar rules. The purpose of this study is three-fold: (1) to present an EFL curriculum designed around DDL with the aim of improving spoken fluency; (2) to gauge how effective students were at employing newly discovered phrases in an appropriate manner; and (3) to investigate student attitudes toward such an approach to language learning. Student attitudes were investigated via a questionnaire and then triangulated through interviews and student logs. The findings indicate that students believe DDL to be a useful and effective tool in the classroom. However, students do note some difficulties related to DDL, such as encountering unfamiliar vocabulary and cut-off concordance lines. Finally, questions are raised as to the students’ ability to embed learned phrases in a pragmatically appropriate way.
Stream processing has been a very active and diverse area of research as well as of commercial and open-source development. This diversity of technologies has brought with it new terminology, concepts, and infrastructure necessary for designing and implementing sophisticated applications.
We start this chapter by describing some of the application domains where stream processing technologies have been successfully employed (Section 2.2), focusing on the distinctive characteristics of these applications that make them suitable for the use of stream processing technologies.
These application scenarios allow us to illustrate the motivating requirements that led to the development of multiple information flow processing systems, a class that groups multiple technical approaches to continuous data processing (Section 2.3). We discuss some of its broad subcategories, including active databases, Continuous Query (CQ) systems, publish–subscribe systems, and Complex Event Processing (CEP) systems. All of them are precursors that have helped shape the stream processing paradigm.
We then switch the focus to the conceptual foundations and the architectural support behind stream processing technology and the applications it supports (Section 2.4). We also include an overview of analytics, that is, the algorithms and knowledge discovery techniques that form the basis of most innovative Stream Processing Applications (SPAs), as well as a historical perspective on the research that led to the development of several Stream Processing Systems (SPSs).
Finally, we include a survey of academic, open-source, and commercially available SPS implementations and describe their different characteristics.