Information and Communication Technologies (ICTs) have profoundly altered many aspects of life, including the nature of entertainment, work, communication, education, health care, industrial production and business, social relations and conflicts. As a consequence, they have had a radical and widespread impact on our moral lives and hence on contemporary ethical debates. Consider the following list: PAPA (privacy, accuracy, intellectual property and access); ‘the triple A’ (availability, accessibility and accuracy of information); ownership and piracy; the digital divide; infoglut and research ethics; safety, reliability and trustworthiness of complex systems; viruses, hacking and other forms of digital vandalism; freedom of expression and censorship; pornography; monitoring and surveillance; security and secrecy; propaganda; identity theft; the construction of the self; panmnemonic issues and personal identity; new forms of agency (artificial and hybrid), of responsibility and accountability; roboethics and the moral status of artificial agents; e-conflicts; the re-prioritization of values and virtues…these are only some of the pressing issues that characterize the ethical discourse in our information societies. They are the subject of information and computer ethics (ICE), a new branch of applied ethics that investigates the transformations brought about by ICTs and their implications for the future of human life and society, for the evolution of moral values and rights, and for the evaluation of agents' behaviours.
Since the seventies, ICE has been a standard topic in many curricula. In recent years, there has been a flourishing of new university courses, international conferences, workshops, professional organizations, specialized publications and research centres.
This chapter examines some metaphysical assumptions of Aristotle and Wiener that can be seen as philosophical roots of today's information and computer ethics. It briefly describes Floridi's new 'macroethics', which he calls information ethics to distinguish it from the general field of information ethics that includes, for example, agent ethics, computer ethics, Internet ethics, journalism ethics, library ethics, bioengineering ethics, neurotechnology ethics, etc. In his book Cybernetics: or Control and Communication in the Animal and the Machine, Wiener viewed animals and computerized machines as cybernetic entities. Beginning in 1950, with the publication of The Human Use of Human Beings, Wiener assumed that cybernetic machines would join humans as active participants in society. In Moor's computer ethics theory, respect for 'core values' is a central aspect of his 'just consequentialism' theory of justice, as well as of his influential analysis of human privacy.
This chapter explores the philosophical theories that attempt to provide guidance when determining the best way to balance the rights of the individual user of Information and Communication Technologies (ICTs) with the legitimate public concerns of the societies impacted by these technologies. It looks at the philosophical justifications for free speech and privacy, as these play out in the use of ICTs, where anonymous speech can exacerbate the conflicts regarding pornography, hate speech, privacy and political dissent. The Principle of Harm can be used to distinguish hate speech from the heated and vigorous debates that race and religion can both engender. The Gossip 2.0 phenomenon is a place where free speech and privacy collide head-on. This is antisocial networking, where the dark side of social interaction is leveraged by ICTs to gain a force and audience far beyond the restroom walls where this kind of communication is typically found.
Although natural language processing has come far, the technology has not achieved a major impact on society. Is this because of some fundamental limitation that cannot be overcome? Or because there has not been enough time to refine and apply theoretical work already done? Editors Madeleine Bates and Ralph Weischedel believe it is neither; they feel that several critical issues have never been adequately addressed in either theoretical or applied work, and they have invited capable researchers in the field to do that in Challenges in Natural Language Processing. This volume will be of interest to researchers of computational linguistics in academic and non-academic settings and to graduate students in computational linguistics, artificial intelligence and linguistics.
This is a collection of new papers by leading researchers on natural language parsing. In the past, the problem of how people parse the sentences they hear - determine the identity of the words in these sentences and group these words into larger units - has been addressed in very different ways by experimental psychologists, by theoretical linguists, and by researchers in artificial intelligence, with little apparent relationship among the solutions proposed by each group. However, because of important advances in all these disciplines, research on parsing in each of these fields now seems to have something significant to contribute to the others, as this volume demonstrates. The volume includes some papers applying the results of experimental psychological studies of parsing to linguistic theory, others which present computational models of parsing, and a mathematical linguistics paper on tree-adjoining grammars and parsing.
First published in 1993, this thesis is concerned with the design of efficient algorithms for listing combinatorial structures. The research described here gives some answers to the following questions: which families of combinatorial structures have fast computer algorithms for listing their members? What general methods are useful for listing combinatorial structures? How can these be applied to those families which are of interest to theoretical computer scientists and combinatorialists? Amongst those families considered are unlabelled graphs, first order one properties, Hamiltonian graphs, graphs with cliques of specified order, and k-colourable graphs. Some related work is also included, which compares the listing problem with the difficulty of solving the existence problem, the construction problem, the random sampling problem, and the counting problem. In particular, the difficulty of evaluating Pólya's cycle polynomial is demonstrated.
Semantic interpretation and the resolution of ambiguity presents an important advance in computer understanding of natural language. While parsing techniques have been greatly improved in recent years, the approach to semantics has generally been ad hoc and has had little theoretical basis. Graeme Hirst offers a new, theoretically motivated foundation for conceptual analysis by computer, and shows how this framework facilitates the resolution of lexical and syntactic ambiguities. His approach is interdisciplinary, drawing on research in computational linguistics, artificial intelligence, Montague semantics, and cognitive psychology.
Computer vision is a rapidly growing field which aims to make computers 'see' as effectively as humans. In this book Dr Shapiro presents a new computer vision framework for interpreting time-varying imagery. This is an important task, since movement reveals valuable information about the environment. The fully-automated system operates on long, monocular image sequences containing multiple, independently-moving objects, and demonstrates the practical feasibility of recovering scene structure and motion in a bottom-up fashion. Real and synthetic examples are given throughout, with particular emphasis on image coding applications. Novel theory is derived in the context of the affine camera, a generalisation of the familiar scaled orthographic model. Analysis proceeds by tracking 'corner features' through successive frames and grouping the resulting trajectories into rigid objects using new clustering and outlier rejection techniques. The three-dimensional motion parameters are then computed via 'affine epipolar geometry', and 'affine structure' is used to generate alternative views of the object and fill in partial views. The use of all available features (over multiple frames) and the incorporation of statistical noise properties substantially improves existing algorithms, giving greater reliability and reduced noise sensitivity.
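The corner-tracking stage described above can be sketched in a few lines. This is only an illustrative example, not Shapiro's system: it assumes the OpenCV library and a hypothetical list `frames` of greyscale images from a monocular sequence, and it omits the trajectory clustering, outlier rejection and affine structure-and-motion recovery that the book develops.

```python
# Illustrative sketch only (not the book's implementation): detect corner
# features in the first frame and track them through successive frames.
# Assumes OpenCV and a hypothetical list `frames` of greyscale images.
import cv2

def track_corners(frames, max_corners=200):
    """Return one trajectory (list of (x, y) positions) per tracked corner."""
    pts = cv2.goodFeaturesToTrack(frames[0], maxCorners=max_corners,
                                  qualityLevel=0.01, minDistance=7)
    trajectories = [[tuple(p.ravel())] for p in pts]
    prev = frames[0]
    for frame in frames[1:]:
        nxt, status, _err = cv2.calcOpticalFlowPyrLK(prev, frame, pts, None)
        for traj, p, ok in zip(trajectories, nxt, status.ravel()):
            if ok:                      # keep only successfully tracked points
                traj.append(tuple(p.ravel()))
        pts, prev = nxt, frame
    return trajectories
```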
Coping with uncertainty is a necessary part of ordinary life and is crucial to an understanding of how the mind works. For example, it is a vital element in developing artificial intelligence that will not be undermined by its own rigidities. There have been many approaches to the problem of uncertain inference, ranging from probability to inductive logic to nonmonotonic logic. This book seeks to provide a clear exposition of these approaches within a unified framework. The principal market for the book will be students and professionals in philosophy, computer science, and AI. Among the special features of the book are a chapter on evidential probability, which has not received a basic exposition before; chapters on nonmonotonic reasoning and theory replacement, matters rarely addressed in standard philosophical texts; and chapters on Mill's methods and statistical inference that cover material sorely lacking in the usual treatments of AI and computer science.
Abstract In this paper we apply various clustering algorithms to dialect pronunciation data. At the same time, we propose several evaluation techniques that should be used in order to deal with the instability of the clustering techniques. The results have shown that three hierarchical clustering algorithms are not suitable for the data we are working with. The rest of the tested algorithms have successfully detected a two-way split of the data into the Eastern and Western dialects. At the aggregate level that we used in this research, no further division of sites can be asserted with high confidence.
INTRODUCTION
Dialectometry is a multidisciplinary field that uses various quantitative methods in the analysis of dialect data. Very often these techniques include classification algorithms, such as the hierarchical clustering algorithms used to detect groups within a certain dialect area. Although known for their instability (Jain and Dubes, 1988), clustering algorithms are often applied without evaluation (Goebl, 2007; Nerbonne and Siedle, 2005) or with only partial evaluation (Moisl and Jones, 2005). Very small differences in the input data can produce substantially different groupings of dialects (Nerbonne et al., 2008). Without proper evaluation, it is very hard to determine whether the results of the applied clustering technique are an artifact of the algorithm or the detection of real groups in the data.
The aim of this paper is to evaluate algorithms used to detect groups among language dialect varieties measured at the aggregate level. The data used in this research is dialect pronunciation data that consists of various pronunciations of 156 words collected all over Bulgaria.
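To illustrate the instability issue discussed above (this is a generic sketch, not the evaluation procedure proposed in the paper), the following assumes a hypothetical symmetric site-by-site distance matrix `dist` with a zero diagonal, clusters the sites hierarchically, and checks how stable the two-way split is when the distances are slightly perturbed.

```python
# Generic sketch, not the paper's procedure: hierarchical clustering of sites
# from a hypothetical symmetric distance matrix `dist` (zero diagonal), plus a
# simple noise-perturbation check of how stable the two-way split is.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform
from sklearn.metrics import adjusted_rand_score

def two_way_split(dist, method="average"):
    """Cut a hierarchical clustering of the sites into two groups."""
    return fcluster(linkage(squareform(dist), method=method),
                    t=2, criterion="maxclust")

def split_stability(dist, noise=0.01, runs=100, seed=0):
    """Mean Adjusted Rand Index between the original two-way split and the
    splits obtained after adding small symmetric noise to the distances."""
    rng = np.random.default_rng(seed)
    base = two_way_split(dist)
    scores = []
    for _ in range(runs):
        jitter = rng.normal(0.0, noise * dist.mean(), size=dist.shape)
        noisy = np.abs(dist + (jitter + jitter.T) / 2.0)
        np.fill_diagonal(noisy, 0.0)
        scores.append(adjusted_rand_score(base, two_way_split(noisy)))
    return float(np.mean(scores))
```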
Abstract In this experimental study, we aim to arrive at a global picture of the mutual intelligibility of various Dutch language varieties by carrying out a computer-controlled lexical decision task in which ten target varieties are evaluated – the Belgian and Netherlandic Dutch standard languages as well as four regional varieties of each country. We auditorily presented real as well as pseudo-words in various varieties of Dutch to Netherlandic and Belgian test subjects, who were asked to decide as quickly as possible whether the items were existing Dutch words or not. The experiment's working assumption is that the faster the subjects react, the better the intelligibility of (the language variety of) the word concerned.
INTRODUCTION
Research framework
When speakers of different languages or language varieties communicate with each other, one group (generally the economically and culturally weaker one) often switches to the language or language variety of the other, or both groups of speakers adopt a third, common lingua franca. However, if the languages or language varieties are so much alike that the degree of mutual comprehension is sufficiently high, both groups of speakers might opt for communicating in their own language variety.
This type of interaction between closely related language varieties, for which Haugen (1966) coined the term semicommunication and which Braunmüller and Zeevaert (2001) refer to as receptive multilingualism, has been investigated between speakers of native Indian languages in the United States (Pierce, 1952), between Spaniards and Portuguese (Jensen, 1989), between speakers of Scandinavian languages (Zeevaert, 2004; Gooskens, 2006; Delsing, 2007) and between Slovaks and Czechs (Budovičová, 1987).
Abstract In this paper we relate linguistic, geographic and social distances to each other in order to get a better understanding of the impact the Dutch-German state border has had on the linguistic characteristics of a sub-area of the Kleverlandish dialect area. This area used to be a perfect dialect continuum. We test three models for explaining today's pattern of linguistic variation in the area. In each model a different variable is used as the determinant of linguistic variation: geographic distance (continuum model), the state border (gap model) and social distance (social model). For the social model we use perceptual data on friends, relatives and shopping locations. Testing the three models reveals that nowadays the dialect variation in the research area is closely related to the existence of the state border and to the social structure of the area. The geographic spatial configuration hardly plays a role anymore.
INTRODUCTION
The Dutch-German state border south of the river Rhine was established in 1830. Before that time, the administrative borders in this region frequently changed. The Kleverlandish dialect area, which extends from Duisburg in Germany to Nijmegen in The Netherlands, crosses the state border south of the Rhine. The area is demarcated by the Uerdingen line in the south, the diphthongisation line of the West Germanic ‘i’ in the west, and the border with the Low Saxon dialects of the Achterhoek area in the north-east. The geographic details of the area can be found in Figure 1 (the state border is depicted with a dashed-dotted line).
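The comparison of the three models outlined in the abstract can be illustrated roughly as follows. The variable names and the use of simple correlations are assumptions made for the purpose of the example; the paper's own statistical analysis may differ.

```python
# Illustrative sketch only: correlate pairwise linguistic distances with each
# model's predictor. `ling`, `geo` and `social` are hypothetical 1-D arrays of
# per-location-pair distances; `border` is 1 if a pair straddles the state
# border, else 0. Not the authors' actual analysis.
from scipy.stats import pearsonr

def compare_models(ling, geo, border, social):
    """Return (correlation, p-value) of linguistic distance with each predictor."""
    return {
        "continuum model (geographic distance)": pearsonr(ling, geo),
        "gap model (state border)": pearsonr(ling, border),
        "social model (social distance)": pearsonr(ling, social),
    }
```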
Abstract In this paper the role of concept characteristics in lexical dialectometric research is examined in three consecutive logical steps. First, a regression analysis of data taken from a large lexical database of Limburgish dialects in Belgium and The Netherlands is conducted to illustrate that concept characteristics such as concept salience, concept vagueness and negative affect contribute to the lexical heterogeneity in the dialect data. Next, it is shown that the relationship between concept characteristics and lexical heterogeneity influences the results of conventional lexical dialectometric measurements. Finally, a dialectometric procedure is proposed which downplays this undesired influence, thus making it possible to obtain a clearer picture of the ‘truly’ regional variation. More specifically, a lexical dialectometric method is proposed in which concept characteristics form the basis of a weighting schema that determines to what extent concept-specific dissimilarities can contribute to the aggregate dissimilarities between locations.
BACKGROUND AND RESEARCH QUESTIONS
An important assumption underlying most if not all methods of dialectometry is that the automated analysis of the differences in language use between different locations, as they are recorded by dialectologists in large scale surveys, can reveal patterns which directly reflect regional variation. In this paper, in which we focus on lexical variation, we want to address one factor, viz. concept characteristics, which we will claim complicates this picture.
The argumentation which underlies our claim consists of three consecutive logical steps. As a first step, we analyse data taken from a large lexical database of Limburgish dialects in Belgium and The Netherlands, in which we more particularly zoom in on the names for concepts in the field of ‘the human body’.
This is the report of a panel discussion held in connection with the special session on computational methods in dialectology at Methods XIII: Methods in Dialectology on 5 August 2008 at the University of Leeds. We scheduled this panel discussion in order to reflect on what the introduction of computational methods has meant to our subfield of linguistics, dialectology (in alternative divisions of linguistic subfields also known as variationist linguistics), and on whether the dialectologists' experience is typical of such introductions in other humanities studies. Let us emphasise that we approach the question as working scientists and scholars in the humanities rather than as methodology experts or as historians or philosophers of science; that is, we wished to reflect on how the introduction of computational methods has gone in our own field in order to conduct our own future research more effectively, or, alternatively, to suggest to colleagues in neighbouring disciplines which aspects of computational studies have been successful, which have not been, and which might have been introduced more effectively. Since we explicitly wished not only to reflect on how things have gone in dialectology, but also to compare our experiences to others', we invited panellists with broad experience in linguistics and other fields.
We introduce the chair and panellists briefly.
John Nerbonne chaired the panel discussion. He works on dialectology, but also on grammar, and on applications such as language learning and information extraction and information access. He works in Groningen, and is past president of the Association for Computational Linguistics (2002).
Abstract In this study 91 local Swedish dialects were analysed based on vowel pronunciation. Acoustic measurements of vowel quality were made for 18 vowels of 1,014 speakers by means of principal component analysis of vowel spectra. Two principal components were extracted, explaining more than ¾ of the total variance in the vowel spectra. Plotting vowels in the PC1-PC2 plane showed a solution with a strong resemblance to vowels in a formant plane. Per-location averages of all speakers were calculated and factor analysis was run with the 91 locations as data cases and the two acoustic components of the 18 words as variables. Nine factors were extracted, corresponding to distinct geographic distribution patterns. The factor scores of the analysis revealed co-occurrence of a number of linguistic features.
INTRODUCTION
The traditional method of identifying dialect areas has been the so-called isogloss method, where researchers choose some linguistic features that they find representative for the dialect areas and draw lines on maps based on different realisations of these features. One problem with the isogloss method is that isoglosses rarely coincide, and a second is that the choice of linguistic features is subjective and depends on what the researcher chooses to emphasise. Dialectometric research has been trying to avoid these problems by aggregating over large data sets and using more objective data-driven methods when determining dialect areas (Séguy, 1973; Goebl, 1982; Heeringa, 2004; Nerbonne, 2009).
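The data-driven, aggregate approach taken in the study above can be sketched roughly as follows. This is not the study's actual pipeline: it assumes a hypothetical matrix `spectra` of vowel spectra (one row per vowel token) and a hypothetical matrix `location_means` of per-location averages, and uses scikit-learn's PCA and factor analysis as stand-ins for the statistical software actually employed.

```python
# Rough sketch only, not the study's pipeline: reduce vowel spectra to two
# principal components, then factor-analyse per-location averages. The input
# matrices (`spectra`, `location_means`) are hypothetical placeholders.
from sklearn.decomposition import PCA, FactorAnalysis

def reduce_spectra(spectra, n_components=2):
    """Project vowel spectra (tokens x spectral bins) onto PC1 and PC2."""
    pca = PCA(n_components=n_components)
    scores = pca.fit_transform(spectra)              # tokens x 2
    return scores, float(pca.explained_variance_ratio_.sum())

def location_factor_scores(location_means, n_factors=9):
    """Factor-analyse per-location averages (locations x acoustic variables)."""
    fa = FactorAnalysis(n_components=n_factors)
    return fa.fit_transform(location_means)          # factor scores per location
```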
Abstract Levenshtein distance, also known as string edit distance, has been shown to correlate strongly with both perceived distance and intelligibility in various Indo-European languages (Gooskens and Heeringa, 2004; Gooskens, 2006). We apply Levenshtein distance to dialect data from Bai (Allen, 2004), a Sino-Tibetan language, and Hongshuihe (HSH) Zhuang (Castro and Hansen, accepted), a Tai language. In applying Levenshtein distance to languages with contour tone systems, we ask the following questions: 1) How much variation in intelligibility can tone alone explain? and 2) Which representation of tone results in the Levenshtein distance that shows the strongest correlation with intelligibility test results? This research evaluates six representations of tone: onset, contour and offset; onset and contour only; contour and offset only; target approximation (Xu and Wang, 2001); autosegments of H and L; and Chao's (1930) pitch numbers. For both languages, the more fully explicit onset-contour-offset and onset-contour representations showed significantly stronger inverse correlations with intelligibility. This suggests that, for cross-dialectal listeners, the optimal representation of tone in Levenshtein distance should be at a phonetically explicit level and include information on both onset and contour.
INTRODUCTION
The Levenshtein distance algorithm measures the phonetic distance between closely related language varieties by counting the cost of transforming the phonetic segment string of one cognate into another by means of insertions, deletions and substitutions. After Kessler (1995) first applied the algorithm to dialect data in Irish Gaelic, Heeringa (2004) showed that cluster analysis based on Levenshtein distances agreed remarkably well with expert consensus on Dutch dialect groupings.
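As a minimal illustration of the algorithm (with unit costs; the studies cited use their own cost schemes, normalisations and tone representations), a Levenshtein distance over phonetic segment sequences can be computed as follows. The example pronunciations are invented.

```python
# Minimal Levenshtein (string edit) distance over phonetic segment sequences,
# with unit costs for insertion, deletion and substitution. Illustrative only;
# the cited studies use their own cost schemes and representations.
def levenshtein(a, b):
    """Edit distance between two sequences of phonetic segments."""
    prev = list(range(len(b) + 1))
    for i, seg_a in enumerate(a, start=1):
        curr = [i]
        for j, seg_b in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                      # deletion
                            curr[j - 1] + 1,                  # insertion
                            prev[j - 1] + (seg_a != seg_b)))  # substitution
        prev = curr
    return prev[-1]

# Invented example: two hypothetical pronunciations of the same cognate.
print(levenshtein(["m", "ɛ", "l", "k"], ["m", "e", "l", "ə", "k"]))  # -> 2
```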
We are pleased to launch the first of several special issues designed to highlight cutting-edge research, methods, applications, literature, and websites in key fields of humanities and arts computing. The current double issue on variationist linguistics and computational humanities is an exemplar of what we hope to accomplish, especially in shortening the time it takes for important papers to move from initial presentation to publication. It has been prepared under the guest editorship of John Nerbonne, Professor of Humanities Computing, and Charlotte Gooskens, Associate Professor of Scandinavian Languages and Literature, both at the University of Groningen, The Netherlands, together with Sebastian Kürschner, who holds a tenure-track position (‘Juniorprofessur’) in variationist linguistics and language contact at the University of Erlangen-Nürnberg, Germany, and Renée van Bezooijen, Researcher at the University of Groningen, The Netherlands. This issue also introduces a roundtable discussion that we intend to become a regular feature of these special editions. The aim of the forum is to assess contributions to the field and link them to the broader interests of humanities and arts computing, as well as to highlight opportunities for connection and research within and among disciplines.
Over the next year, we will publish two additional thematic issues. Volume 3.1 will focus on humanities GIS. The past decade has witnessed an explosion of interest in the application of geo-spatial technologies to history, literature, and other arts and humanities disciplines. The special issue will highlight leading presentations from an August 2008 conference at the University of Essex and will include two new features – book reviews and website/tool reviews.
In this paper we ask two questions, which superficially seem to ask the same thing but in actual fact do not. First, we ask to what degree two languages (or language varieties) A and B resemble each other. The second question is how well a listener of variety B understands a speaker of variety A.
When we ask to what degree two language varieties resemble one another, or how different they are (which is basically the same question), it should be clear that the answer cannot be expressed in a single number. Languages differ from each other not in just one dimension but in a great many respects. They may differ in their sound inventories, in the details of the sounds in the inventory, in their stress, tone and intonation systems, in their vocabularies, and in the way they build words from morphemes and sentences from words. Last, but not least, they may differ in the meanings they attach to the forms in the language, in so far as the forms in two languages may be related to each other. In order to express the distance between two languages, we need a weighted average of the component distances along each of the dimensions identified (and probably many more). So, linguistic distance is a multidimensional phenomenon and we have no a priori way of weighing the dimensions.
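Schematically (this is a sketch of the idea just described, not a formula given in the paper), such a weighted overall distance between varieties A and B could be written as

\[
D(A,B) \;=\; \sum_{i=1}^{n} w_i \, d_i(A,B), \qquad \sum_{i=1}^{n} w_i = 1,
\]

where d_i is the distance along the i-th dimension (sound inventory, prosody, lexicon, morphology, syntax, semantics, and so on) and the weights w_i are precisely what we have no a priori way of fixing.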
The answer to the question how well listener B understands speaker A can be expressed as a single number. If listener B does not understand speaker A at all, the number would be zero. If listener B gets every detail of speaker A's intentions, the score would be maximal.