Data and evidence

András Kertész; Csilla Rákosi

Part III Data and evidence

12 The problem (P)_III

In Part II, we argued for the hypothesis (SP)_II which says that the p-model enables us to define the concepts of ‘data’ and ‘evidence’. Part III will be devoted to the realisation of (SP)_II in that we will show how the two notions which the book centres on – namely, ‘data’ and ‘evidence’ – can be defined with the help of the p-model.

Therefore, this part of the book has to solve the following problem:

(P)_III How does the p‐model define the concepts of ‘data’ and ‘evidence’?

In Chapter 13, we show how constitutive features of linguistic data and evidence can be identified. In Chapter 14, we will summarise the solution to (P)_III.

In discussing what notion of ‘data’ and ‘evidence’ follows from the tenets of the p-model, our starting point will be that the standard view of linguistic data and evidence (SVLD) interprets data as statements capturing experiences directly, whose reliability is secured by general methodological rules. According to this view, from the set of data one can select a special subset called ‘evidence’, whose certainty is guaranteed by experience and intersubjective testability. Evidence is supposed to provide a firm base for the testing of theories. However, this view is untenable for several reasons:

(46)
1. (a) It has become generally acknowledged in the philosophy of science that experience by itself cannot guarantee the truth of a statement, and it only leads to fallible judgements affected by subjective factors.
2. (b) Although intersubjective testability reduces subjectivity immanent in individual experience as a source of knowledge, it does not eliminate it completely. If n persons evaluate a phenomenon in the same way, it does not follow that the n+1th person will also agree. This means that the criterion of intersubjectivity is built on induction (that is, on a type of plausible inference), since it infers from a finite number of cases to an infinite number of cases.
3. (c) Phenomena require interpretation. The object of one's experience has to be described with the help of a category system, that is, a theoretical apparatus. Consequently, it is not directly the experience to which one compares the hypotheses of the theory, but statements gained by processing the observed phenomena with the help of the conceptual apparatus of the theory. The thesis of the theory‐ladenness of observation refers to this finding.¹

13 The concept of ‘datum’ and ‘evidence’

13.1 Overview

As already mentioned in the previous chapter, the task of the present chapter is to provide definitions of the central notions ‘datum’ and ‘evidence’ with the help of the p-model. By ‘datum’, we will mean a statement with a positive plausibility value originating from some direct source. Following this, three different kinds of ‘evidence’ will be obtained from the p-model: weak, strong and relative evidence.

13.2 The concept of ‘datum’

According to the p-model, data cannot be reduced to their information content; rather, they have a structure consisting of two components: a statement capturing an information content and a plausibility value (see Rescher 1979: 69):

(47) A datum is a statement with a positive plausibility value originating from some direct source.

In the sense of (47), one cannot state that data are in all cases true with certainty. Usually, they can be considered to be only more or less reliable ‘truth-candidates’, which are supported by their source to a certain extent without being rendered true with certainty by it (Rescher 1976: 8, 1977a: 213, 1987: 307).
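The two-component structure in (47) can be sketched as a small data type. The following Python fragment is only a minimal illustration and not part of the p-model's own apparatus: the names `Statement` and `is_datum`, the numeric scale (0 for neutral plausibility, 1 for truth with certainty) and the sample value 0.7 are assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    content: str          # the information content
    plausibility: float   # assumed scale: 0 = neutral, 1 = true with certainty
    direct_source: bool   # True if the value stems from a direct source


def is_datum(s: Statement) -> bool:
    """(47): a positive plausibility value originating from a direct source."""
    return s.direct_source and s.plausibility > 0


# Hypothetical values, chosen only for illustration.
judged = Statement("Sentence X is correct.", 0.7, direct_source=True)
inferred = Statement("Sentence X is correct.", 0.7, direct_source=False)

print(is_datum(judged))    # True
print(is_datum(inferred))  # False
```

On this sketch, the same statement with the same value counts as a datum only when its plausibility stems from a direct source rather than from an inference.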

It is important to remark that in the sense of (47), the source on the basis of which the plausibility value is assigned to a datum is not necessarily identical to the entity usually referred to as ‘data source’, because the former is often more complex. This complexity is due to the circumstance that in linguistic theories, data usually contain more information than can be gained from the corpus, intuitive judgements, responses of participants in experiments etc., because they involve the results of a certain amount of linguistic analysis as well. If a datum is an ‘example’ which says only that a sentence or string of words etc. belongs to a given language or that it can be found in a corpus, then the p-context has to be extended by another datum which captures the linguistic structure of this sentence. Without this, the ‘example’ cannot be involved in the process of linguistic theorising.


Nevertheless, it is very important to emphasise that this otherwise rather open characterisation of the notion of datum is by no means unconstrained. Thus, we do not consider just any piece of information to be a datum. The plausibility of data is initially determined by the reliability of their source. The more reliable the source on the basis of which a plausibility value is assigned to a datum, the higher the initial plausibility value that can be assigned to it. A datum must possess a certain degree of plausibility when it comes into contact with other hypotheses of the theory. On this, we agree with Rescher completely:

We do not intend the conception of datahood to ‘open the floodgates’ in an indiscriminate way. Not everything is a datum: the concept is to have some logico-epistemological bite…Data are…not created as equal. All of them have some degree of positive plausibility in the epistemic circumstances at issue, but need certainly not be the same degree. (Rescher 1976: 10; emphasis as in the original)

In the sense of (47), hypotheses obtained not by inference but originating from some linguistic paper count as data as well. The source of such data is the reference to the linguistic paper at issue. At the same time, if one gives a detailed account of a psycholinguistic experiment instead of only summarising its outcome, then one is not endowed with data (receiving their plausibility value from some direct source). Instead, statements summarising the results of the experiment obtain their plausibility value from indirect sources, namely, by plausible inferences presented in the experimental report. In such cases, those statements count as data in the sense of (47) which serve as the starting points of the presented argumentation (systems of inductive generalisations, analogical inferences) such as authenticated and interpreted perceptual data etc. For more on this, see Chapter 16 and Section 19.6.

Examples 30(1)–(4)

(1) Statements capturing introspective judgements based on the linguistic intuition of native speakers are data. What counts as a datum in the sense of (47) is not the linguistic expression or utterance whose grammaticality is judged, but the statement expressing the judgement about the grammaticality of this expression or utterance. Accordingly, the statement
1. (a) The sentence Wer_i meint Lydia, liebt t_i Jakob? is correct.
or the statement
1. (b) The sentence Wer_i meint Lydia, liebt t_i Jakob? is more correct than the sentence Wer_i meint Lydia, dass t_i Jakob liebt?.
are data if the judgements are based on the linguistic intuition (introspection) of the linguist X or on an experiment carried out on a total of 100 informants. Nevertheless, neither the sentence
1. (c) Wer_i meint Lydia, liebt t_i Jakob?
can be considered to be a datum, nor do (a) or (b), if the latter's plausibility stems from an indirect source, that is, if they are inferred from hypotheses of the theory.¹

We interpret judgements as plausible statements. This means that we do not deem them true with certainty. This decision is motivated by the problems raised in Schütze (1996), which centre on the complexity and the resulting uncertainty of judgements. One reason for this uncertainty is that non-linguist native speakers do not judge grammaticality directly but rather provide acceptability judgements. Acceptability judgements, however, are products of linguistic performance (see Schütze 1996: 24). Therefore, even if one evaluates the statement “Sentence X is unacceptable” to be true with certainty on the basis of the judgement of person Y, it cannot be stated that the statement “Sentence X is not grammatically correct” is also true with certainty, because the unacceptability may result not only from linguistic competence but also from some other factor.

Linguists often evaluate the grammaticality of sentences in a direct way when providing ‘introspective data’ for their theory. The term ‘(un)grammatical’, however, refers not only to their intuitive judgement about the correctness of the given sentence but inevitably presupposes some theory of grammar, too, which serves as a reference point against which violations of syntactic rules can be evaluated (and other kinds of violations of the rules of the language can be excluded). Thus, such statements are data, too, but the source on the basis of which they receive their plausibility value is more complex and cannot be reduced to the linguist's linguistic intuition alone.
(2) An example from Featherston (2007: 275ff.) illustrates impressively the uncertainty of grammaticality judgements and the problems which arise from ignoring this uncertainty. Several syntacticians agreed on the grammaticality judgement according to which the sentences (a)–(c) are not better than (d):
1. (a) Wen_i meint Lydia, liebt Jakob t_i?
  
  whom thinks Lydia loves Jakob t_i
  
  ‘Who does Lydia think Jakob loves?’
2. (b) Wer_i meint Lydia, liebt t_i Jakob?
  
  who thinks Lydia loves t_i Jakob
  
  ‘Who does Lydia think loves Jakob?’
3. (c) Wen_i meint Lydia, dass Jakob t_i liebt?
  
  whom thinks Lydia that Jakob t_i loves
  
  ‘Who does Lydia think that Jakob loves?’
4. (d) Wer_i meint Lydia, dass t_i Jakob liebt?
  
  who thinks Lydia that t_i Jakob loves
  
  ‘Who does Lydia think that loves Jakob?’
As Featherston remarks, Haider, Grewendorf, Stechow & Sternefeld, Bayer and Lutz presuppose that native speakers judge (a)–(d) similarly. Therefore, on the basis of their linguistic intuition, they regarded the statement
1. (A) The sentences (a)–(d) are equally grammatical.
as a certainly true datum. From this they concluded that A clearly and perfectly refutes the following hypothesis:
1. (T) In standard German, there is a that-trace effect. That is, subjects and objects behave differently in complement clauses with a complementiser insofar as only the latter can be extracted.
Featherston refers to a paper by Grewendorf (see Grewendorf 1995) in which Grewendorf writes, some years later, that in standard German, A is a datum that is true with certainty, while this is not the case in North German dialects. Conversely, Fanselow (1987) holds that A cannot be treated as a datum in South German.

Featherston's experiments (see Featherston 2007: 276f.), however, yielded the result that A is not true with certainty – what is more, it is not even a plausible statement. Therefore, it cannot be treated as a datum. The participants of the experiments judged (a) and (b) to be better than (c) or (d). Consequently, on the basis of this experiment as a source one has to judge A to be implausible. Therefore, A cannot support the non-existence of the that-trace effect as part of an indirect source. Quite the contrary seems to be the case: Featherston's experiments supply data in favour of T, since they can be interpreted as sources making A′ plausible:
1. (A′) Sentences (a)–(b) are better than (c)–(d).
(3) Corpus data are statements capturing characteristics of utterances in some corpus; thus, they do not necessarily have the same structure. They may, for example, have the form ‘The utterance U containing the linguistic phenomenon P can be found in corpus C.’, or ‘The structure S can be identified in corpus C and it has the characteristics X’, etc. They often have the form ‘The sentence in Example x is contained in Corpus C’ and occur together with further data such as ‘The sentence in Example x contains the occurrence of the construction S’. They have to be interpreted as data in the above sense, that is, as plausible statements originating from a direct source. This source, however, is not the corpus alone. Rather, it is a compound of (sub-)sources such as the historical document itself, the linguist's linguistic intuition and his/her skills in the application of some linguistic theory, etc. The reliability of each of these sub-sources influences the plausibility of the datum. Therefore, corpus data usually cannot be considered to be certainly true statements. Not only are factors bearing on the genuineness of the historical document relevant; difficulties or uncertainties in the identification and interpretation of the given linguistic phenomenon must be taken into account as well. There are several reasons for this:
1. (a) Both the occurrence of the investigated structure in the corpus (‘positive evidence’) and the lack of its occurrence (‘negative evidence’)² raise serious problems, which are unsolvable or at least difficult to handle with the methodologies at the researchers’ disposal:
  1. It is often difficult to decide how to differentiate relevant occurrences from insignificant, isolated, unreliable, dubious information.
  2. Linguists inevitably have to fall back on their own linguistic intuition when processing corpora. The interpretation of the utterances in the corpus, the identification of utterances containing the given linguistic phenomenon (syntactic structure, semantic or pragmatic phenomenon etc.) and the selection of the reliable examples require the active involvement of linguistic intuition; therefore, these operations contain subjective elements as well. For example, if a cognitive linguist searches for metaphors in a corpus, then he/she cannot rely on observable, ‘objective’ criteria. Therefore, he/she has to decide which expressions he/she is willing to consider – at least partly – metaphorical on the basis of his/her own linguistic intuition.
  3. The absence of a certain form or construction does not necessarily mean that this form/construction is faulty. It may happen that the phenomenon at issue appears rarely, and it can be found in a larger corpus or in a corpus covering another linguistic register.
2. (b) ‘Quantitative data’ indicate that a given phenomenon's occurrence is statistically significant. Of course, the identification of the occurrences raises the problems mentioned in (a), too. The reliability of such examples also depends on the range of data which their relative frequency is compared to, as well as on the statistical method applied. Thus, for example, the representativeness of the corpus is a factor that has to be checked in order to obtain reliable results.
3. (c) In connection with questionnaires, manifestations of the ‘observer's paradox’ (see Section 4.2) lead to uncertainty as well.

Meurers (2007) enumerates several factors showing that the use of corpus data presupposes the involvement of the linguist's linguistic intuition. Therefore, such data cannot be deemed certainly true statements:

The linguist has to decide which sentences of the corpus are relevant to a particular research problem. He/she has to select and reconstruct data. This requires the analysis of the sentences in the corpus with the help of the given theory and the identification of the syntactical, pragmatic etc. parameters.
One needs to decide whether a piece of information from the corpus is part of the language one is studying. This means that the normal, grammatical sentences have to be identified.
Corpora are finite representations of language use. Therefore, it is possible that they do not contain information relevant to the actual research problem.

(4) Adherents of the standard view of linguistic data and evidence (SVLD) (see Section 3.3) – followers of both the introspective and the corpus-based tradition – do not use statements related to ungrammatical (deviant) sentences produced during spontaneous language use as data. They simply presuppose that such sentences can be easily identified and do not contain any relevant information; therefore, they are useless and have to be dismissed from linguistic research. In contrast, Foster (2007) regards them as valuable and reliable (although not certainly true) data. She argues that

naturalistic ungrammatical sentences are of interest to linguists studying language production, language loss and language learning. (Foster 2007: 73)

She points out that in order to create computer parsers that are capable of parsing ‘real’ sentences stemming from language production (and can eventually create these and similar sentences), one has to make a distinction not only between grammatical and ungrammatical sentences but between sentences which contain frequently and routinely committed, ‘realistic’ errors and which can be easily understood on the one hand, and sentences which contain invented, unrealistic mistakes on the other hand. Foster is of the opinion that this dual task can be accomplished only by relying on one's linguistic intuition. From this it follows that one has to deal with data which are not true with certainty but are inevitably affected by subjective factors, that is, with uncertain data:

Of course the decision over whether a sentence contains an error is a subjective one because it depends on a person's opinion about what constitutes an error but I would argue that some level of subjectivity is unavoidable in any linguistic task whether that be applying a grammaticality judgement or building a treebank. (Foster 2007: 80)

We may conclude from Foster's argumentation that the standard view of linguistic data and evidence (SVLD) is built on a paradox: corpora can be regarded as perfectly reliable, objectively controllable sources only after their unreliable, faulty elements have been eliminated – and this can be done only by relying on our subjective opinion.

To summarise these considerations, the following characteristics of data can be highlighted:

(48)
1. (a) In linguistic theories, data are, according to (47), statements that are supported by some direct source at one's disposal but which are often not true with certainty. Rather, they are only plausible, that is, truth‐candidates.
2. (b) The initial plausibility of data is based on the weighing of the reliability of their source at the beginning of the argumentation process or cycle. This may, however, dynamically change at later stages of the cyclic and prismatic argumentation process.
3. (c) Data are theory‐dependent (p‐context dependent). First, from different manifestations of linguistic behaviour one can obtain statements capturing characteristics of these manifestations with the help of the conceptual apparatus of a theory. Second, different theories (plausible argumentation processes) may judge the plausibility of statements differently.
4. (d) Data are ‘given’ in the sense that – since they receive a plausibility value from direct sources – their initial plausibility is judged not with the help of inferences constructed within the given argumentation process but directly on the basis of the reliability of their source.
5. (e) If the reliability of a data source is called into question, then the usability of this source as well as the plausibility of the statements originating from it have to be re‐evaluated. This means that information concerning the reliability of the source and the relationship between the source and the statements stemming from it have to be integrated into the argumentation process. In this way the data stemming from this source will lose their data status (but not necessarily their plausibility).
6. (f) Data supply the theory with plausibility values. These values can be used to determine the plausibility of other statements with the help of plausible inferences.

Examples 31(1)–(2)

(1) The p-context-dependence (theory-dependence) of data and the well-motivatedness of interpreting data as plausible statements rather than as instances or products of linguistic behaviour can be nicely illustrated with the help of the analysis of different interpretations of the annotations and degrees of ‘badness’ as summarised by Schütze (1996: 44ff.):
1. (a) In the generative literature, ungrammatical sentences are usually marked by asterisks. This, however, leaves open several possibilities of interpretation. According to Householder (1973: 370ff.), for example, ‘*X’ may be interpreted as
  1. ‘I would never say X’;
  2. ‘I have never seen or heard a sentence of the type of X and hereby wager you can't find an example’;
  3. ‘This is quite comprehensible, and I have heard people say it, but they were all K’s [foreigners, Southerners, etc.]; in my dialect we would say Y instead.’
2. (b) The use of the question mark is ambiguous, too, since ‘?X’ may indicate that
  1. X shows interspeaker variation, i.e. X is good for some people but bad for others;
  2. most speakers rate X marginal;
  3. X has a very literary ring, but it is not unacceptable.
  Schütze cites Andrews (1990: 203), who developed a system of symbols in order to make the meaning of the markings used explicit. This system involves a six-point scale:
  
  ✓: completely acceptable and natural;
  
  ?: acceptable, but perhaps somewhat unnatural;
  
  ??: doubtful, but perhaps acceptable;
  
  ?*: worse, but not totally unacceptable;
  
  *: thoroughly unacceptable;
  
  **: horrible.
The application of this system means that it is not the linguistic examples themselves that one treats as data but rather statements about their acceptability such as ‘Sentence X is doubtful, but perhaps acceptable’ on the basis of the linguistic intuition of speaker Y as a direct source. Since the linguistic intuition of (single) native speakers cannot be regarded as a completely reliable source, one obtains plausible statements which may serve as data in the theory at issue. These data are theory-dependent, because they involve elements of the above system, which is, of course, only one of the possible scales which have been elaborated and can be applied in collecting and evaluating judgements from informants. In this way, acceptability judgements become integrated into the process of linguistic theorising, and they may function as starting points for the estimation of the plausibility value of hypotheses about the grammaticalness of the given sentence or structure. Namely, the degree of acceptability of a sentence cannot be equated with the degree of its grammaticalness, because the former is also determined by semantic, pragmatic, and general cognitive factors. Consequently, the plausibility of ‘X is ungrammatical’ is lower than the plausibility of ‘X is unacceptable’ on the basis of the linguistic intuition of a native speaker, and the plausibility of the statement ‘Structure Y is ungrammatical/bad’ is even lower. In such cases, the source on the basis of which the statements ‘X is ungrammatical’ and ‘Structure Y is ungrammatical/bad’ are evaluated is not direct, because these statements are inferred from acceptance judgement data and other information about other possible reasons of badness; therefore, in such p-contexts they are not data.
(2) Another aspect of the theory-dependence (p-context dependence) of data is that it may happen that a datum is evaluated to be true with certainty in one theory (p-context) but it may be deemed to be implausible (and, of course, not regarded as a datum) in another. Schütze (1996: 38f.) mentions several such cases in order to illustrate the unreliability of unsystematically collected judgement data.

One of his examples is the case of the alleged that-trace effect of adjunct wh-words. Lasnik & Saito (1984) deemed sentences like Why do you think that he left? ambiguous because in their opinion, why may relate to the reason for the thinking or to the reason for the leaving. In the terminology of the p-model this means that in this p-context (theory), the statement ‘The sentence Why do you think that he left? is ambiguous’ is a datum which is true with certainty on the basis of the two linguists’ linguistic intuition as a direct source. In the p-context (theory) of Aoun et al. (1987), however, similar statements are held to be implausible and not treated as data on the basis of two indirect sources. The first indirect source (system of inferences) refers to the uncertainty and inaccuracy of the judgement of informants who claim that they have two readings of such sentences. The second indirect source (system of inferences) bases the refusal to accept the existence of a second reading on the argument that if such sentences were grammatical, then certain similar sentences such as Who remembers what we bought why? should be acceptable, too, which is, however, not the case.

Since the sources used by the two theories do not judge the plausibility/implausibility of the statements at issue unequivocally, they cannot be regarded as a set of sources on the basis of which a plausibility value could be assigned to the statements at issue. Schütze (1996: 39) tries to re-evaluate the reliability of the sources mentioned. He comes to the conclusion that all sources used by the authors are unreliable. Since the judgements provided by the speakers he has surveyed are strongly divided, he concludes that without carefully conducted experiments it is not possible to make a decision. Therefore, the statement ‘The sentence Why do you think that he left? is ambiguous’ loses its data status because it has lost its plausibility value.

13.3 The concept of ‘evidence’

The p-model makes the introduction of several concepts of ‘evidence’ possible. Thus, it provides us with tools that enable us to grasp the relationship between data and other hypotheses of the theory in a subtle way. As a first approximation, evidence is a datum whose function is to contribute to the judgement and comparison of the plausibility of rival hypotheses, that is, incompatible (contradictory or contrary) hypotheses (see Section 10.5). We distinguish between three types of evidence.

Weak evidence for a hypothesis h simply means that one can build inference(s) on the given datum that make(s) h plausible (in the extreme case true with certainty). Weak evidence against a hypothesis h can be defined in a similar way. It means a datum on which one can build inference(s) that make(s) h implausible (in the extreme case false with certainty):

(49)
1. (a) A datum e is weak evidence for hypothesis h, if the p‐context contains statements that extend e into an indirect source on the basis of which a positive plausibility value can be assigned to h.
2. (b) A datum e is weak evidence against hypothesis h, if the p‐context contains statements that extend e into an indirect source on the basis of which a positive plausibility value can be assigned to ∼h.

Weak evidence for h is a datum that is suitable to assign a positive plausibility value to h as part of an indirect source, thereby partially supporting its truth. At the same time, however, (49) allows that e serves as weak evidence for a rival of h as well. Therefore, its use may lead to informational overdetermination.
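The two clauses of (49) can be made concrete with a short Python sketch. It is a simplified illustration under stated assumptions, not the p-model's formal definition: an indirect source is reduced to a single inference record, `e` is taken to be a datum already, and the names `Inference`, `neg`, `is_weak_evidence_for` and `is_weak_evidence_against`, the statement labels and the numeric values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Inference:
    premises: frozenset   # labels of the statements used as premises
    conclusion: str       # label of the statement made plausible
    value: float          # value conferred on the conclusion (assumed scale: 0 < v <= 1)


def neg(h: str) -> str:
    """Label of the negation of h (a hypothetical naming convention)."""
    return h[1:] if h.startswith("~") else "~" + h


def is_weak_evidence_for(e: str, h: str, p_context: list) -> bool:
    """(49)(a): some inference built on e assigns h a positive value."""
    return any(e in inf.premises and inf.conclusion == h and inf.value > 0
               for inf in p_context)


def is_weak_evidence_against(e: str, h: str, p_context: list) -> bool:
    """(49)(b): some inference built on e assigns ~h a positive value."""
    return is_weak_evidence_for(e, neg(h), p_context)


# Illustrative p-context: the same datum e supports h and a rival of h.
p_context = [
    Inference(frozenset({"e", "p1"}), "h", 0.6),
    Inference(frozenset({"e", "p2"}), "h_rival", 0.5),
]

print(is_weak_evidence_for("e", "h", p_context))        # True
print(is_weak_evidence_for("e", "h_rival", p_context))  # True
```

Both calls succeed, which mirrors the point that weak evidence alone does not decide between rivals.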

Examples 32(1)–(2)

(1) Wurzel (1981) aims to decide between the following rival hypotheses concerning the structure of German affricates:
1. (h₁) The labial affricate is biphonemic, that is, its phoneme structure is /pf/.
2. (h₂) The labial affricate is monophonemic, that is, its phoneme structure is /p^f/.
The datum e₁ is weak evidence for hypothesis h₁:
1. (e₁) Pfropfen is an existing German word,
since with the help of e₁ as well as the following two plausible statements an indirect source can be created that makes h₁ plausible:
1. (p₁) If a vowel /V/ follows a formative‐initial /C₃C₂C₁/ consonant cluster, then the consonant cluster /C₂C₁/ and /C₁/ also occur in the same position.
2. (p₂) There exist German formatives with the structure /frɔ_ / and /rɔ_ /.
With the help of the statements mentioned, we gain the following plausible inferences:

It is easy to see that, on similar grounds, e₁ is weak evidence for h₁’s rival h₂, too.
(2) As we have already seen in Example 30(4), adherents of the standard view of linguistic data and evidence (SVLD) do not consider statements about ungrammatical sentences which have been obtained from spontaneous language use as relevant data. The main reason for this lies in their conviction that sentences of this kind result from the linguistic performance of the native speaker; therefore, it is not possible to infer properties of linguistic competence from them. As opposed to this, Arppe & Järvikivi (2007) – similarly to Foster (2007) – argue that data of this type, too, are relevant insofar as they may serve as (weak) evidence for the grammaticality of linguistic constructions:
Slips of the Tongue, i.e., output errors that are produced by normal native speakers unintentionally and spontaneously (and sometimes unconsciously) are another well-documented case in point. These errors are not random, but instead follow the way the language system is organised…On many occasions, the speaker in fact corrects the error – therefore even the corrections can tell us quite a lot about how our linguistic system is organised…(Arppe & Järvikivi 2007: 101f.; emphasis as in the original)

Relative evidence stipulates stricter requirements:

(50)
1. (a) A datum e is relative evidence for hypothesis h, if
  1. (i) e is weak evidence for hypothesis h;
  2. (ii) the inference(s) connecting the premises and h allow(s) us to assign a higher plausibility value to h than the plausibility value of h’s rivals assigned by inferences also using e as a premise.
2. (b) A datum e is relative evidence against hypothesis h, if
  1. (i) e is weak evidence against hypothesis h;
  2. (ii) the plausible inference(s) connecting the premises and ∼h allow(s) us to assign a higher plausibility value to ∼h than the plausibility value of h assigned by the inferences also using e as a premise.

(50)(a) allows e to be weak evidence for rivals of h, too. In the latter case, however, e cannot be regarded as relative evidence for h’s rival, since according to (50)(a)(ii), the plausibility of h is higher than that of its rivals. To put it another way: whereas a datum may be relative evidence only for one of the rivals, it may function as weak evidence for rivals of h as well.

Example 33

In Example 32(1), e₁ is relative evidence for hypothesis h₂, since with the help of hypothesis p₃ a further indirect source can be obtained which makes h₂ plausible, while in the case of h₁, this is not the case:

(p₃) If the phoneme cluster /C₁C₂_/ occurs formative‐initially, then there exists a formative in which the phoneme cluster /_C₂C₁/ occurs formative‐finally.
(p₄) There exist German formatives with the structure /_rpf/.

With the help of these statements, the following plausible inferences can be formulated:

Here there is only one indirect source supporting h₁, while there are two for h₂. Moreover, the latter provide h₂ with a higher plausibility value than the first source in Example 32(1) does with h₁.
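The comparison required by (50)(a) can be sketched in Python as follows. This is a rough illustration under explicit assumptions: the values from several inferences are aggregated by taking the maximum (a simplification chosen here, not a rule stated in the text), the numeric values are invented, and the premise labels `q1` and `q2` are hypothetical placeholders for the premises of the first source supporting h₂.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Inference:
    premises: frozenset   # labels of the statements used as premises
    conclusion: str       # label of the hypothesis made plausible
    value: float          # value conferred on the conclusion (assumed scale: 0 < v <= 1)


def value_via(e: str, h: str, p_context: list) -> float:
    """Highest value assigned to h by inferences using e as a premise; 0.0 if none."""
    return max((inf.value for inf in p_context
                if e in inf.premises and inf.conclusion == h), default=0.0)


def is_relative_evidence_for(e: str, h: str, rivals: list, p_context: list) -> bool:
    """(50)(a): e is weak evidence for h, and h outranks each rival via e."""
    own = value_via(e, h, p_context)
    return own > 0 and all(own > value_via(e, r, p_context) for r in rivals)


# One source for h1, two for h2; all values are invented for illustration.
p_context = [
    Inference(frozenset({"e1", "p1", "p2"}), "h1", 0.4),
    Inference(frozenset({"e1", "q1", "q2"}), "h2", 0.4),
    Inference(frozenset({"e1", "p3", "p4"}), "h2", 0.6),
]

print(is_relative_evidence_for("e1", "h2", ["h1"], p_context))  # True
print(is_relative_evidence_for("e1", "h1", ["h2"], p_context))  # False
```

With these assumed values, e₁ remains weak evidence for both hypotheses but counts as relative evidence only for h₂.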

The third type is strong evidence:

(51)
1. (a) A datum e is strong evidence for hypothesis h, if
  1. (i) e is weak evidence for hypothesis h; and
  2. (ii) e is not weak evidence for any of h’s rivals.
2. (b) A datum e is strong evidence against hypothesis h, if
  1. (i) e is weak evidence against hypothesis h; and
  2. (ii) e is not weak evidence against any of h’s rivals.

(51)(a) requires that e make only h plausible: while e makes h plausible, it either makes h’s rivals implausible or cannot be extended into an indirect source on the basis of which a positive plausibility value could be assigned to the rivals of h in the given p-context. Similarly, (51)(b) requires that e make only h implausible. Consequently, relative as well as strong evidence may contribute to the decision among rival hypotheses and to the treatment of p-inconsistency.³

Examples 34(1)–(2)

(1) Bowdle & Gentner (1999) mention the following rival hypotheses in cognitive metaphor research:
1. (h₁) Metaphors are based on the comparison of two conceptual domains and on the matching of their related properties.
2. (h₂) Metaphors are stable, unidirectional mappings between two conceptual domains.
3. (h₃) Metaphors are special cases of categorisation.
4. (h₄) Conventional metaphors are special cases of categorisation, while novel metaphors are special cases of analogy.
According to the authors, results of psycholinguistic experiments comparing the interpretation and processing of conventional and novel metaphors make h₁–h₃ equally implausible because these hypotheses describe all metaphors with the same processing model. The results support only h₄ and therefore function as strong evidence for h₄ in the above sense.
(2) Hoffmann (2007) presents a case study illustrating that it may happen that a problem cannot be solved solely by relying on corpus data since they serve as weak evidence for both the given hypothesis and its rival; experimental data, however, may function as strong evidence. He characterises the problematic nature of the starting p-context as follows:
the corpus results allow for two competing hypotheses: on the one hand it is possible that sequences such as the way which the satire is achieved in…is just an accidental gap in the data that speakers would use just as they might the world which I was working in…On the other hand, it might be a construction that is not provided by the grammar, i.e., a systematic gap on par with pied piping with that/ø-relativisers…, which is generally considered ungrammatical…(Hoffmann 2007: 94; emphasis as in the original)
That is, the statement
1. (e₁) The corpus contains neither manner adjunct PPs with the structure which + P or P + that/ø, nor locative PPs with this structure.
is interpreted by Hoffmann as weak evidence for hypotheses h₁ and h₂ alike:
1. (h₁) Manner adjunct PPs and locative adjunct PPs of the structure which + P, P + that/ø are grammatical but rare.
2. (h₂) Manner adjunct PPs and locative adjunct PPs of the structure which + P, P + that/ø are ungrammatical.
The reason for this decision is that both hypotheses can be supported by plausible inferences which make use of e₁ as a premise:

In order to take a well-founded decision between h₁ and h₂, Hoffmann extends the p-context with new data:
1. (e₂) Participants in an introspection experiment ranked the acceptability of the structures in the following order:
  
  V + P > which + P (locative adjunct) > which + P (manner adjunct) >>>> P + that/ø = word order violations = agreement errors etc.
Referring to Sorace & Keller (2005), Hoffmann makes use of e₂ as follows:

To sum up, e₂ is strong evidence for hypothesis h₁ and against h₂ in the sense of (51), since inferences were set up that made h₁ plausible and h₂ implausible.
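
The contrast between e₁ and e₂ in Example 34(2) can be restated as a small Python sketch of (51). It is a simplification under stated assumptions: each datum is reduced to the set of hypotheses for which, and against which, it is weak evidence in the given p-context, and plausibility values are ignored. The class and function names are hypothetical; the only content taken from the text is which hypotheses e₁ and e₂ support or undermine.

```python
from dataclasses import dataclass, field
from typing import Iterable, Set


@dataclass
class Datum:
    """Hypothetical record of what a datum is weak evidence for and against."""
    weak_for: Set[str] = field(default_factory=set)
    weak_against: Set[str] = field(default_factory=set)


def is_strong_evidence_for(e: Datum, h: str, rivals: Iterable[str]) -> bool:
    """(51)(a): weak evidence for h and for none of h's rivals."""
    return h in e.weak_for and not any(r in e.weak_for for r in rivals)


def is_strong_evidence_against(e: Datum, h: str, rivals: Iterable[str]) -> bool:
    """(51)(b): weak evidence against h and against none of h's rivals."""
    return h in e.weak_against and not any(r in e.weak_against for r in rivals)


# e1: the corpus gap is weak evidence for both h1 and h2.
e1 = Datum(weak_for={"h1", "h2"})
# e2: the experimental ranking makes h1 plausible and h2 implausible.
e2 = Datum(weak_for={"h1"}, weak_against={"h2"})

e1_decides = (is_strong_evidence_for(e1, "h1", ["h2"])
              or is_strong_evidence_for(e1, "h2", ["h1"]))
e2_for_h1 = is_strong_evidence_for(e2, "h1", ["h2"])
e2_against_h2 = is_strong_evidence_against(e2, "h2", ["h1"])
```

On this encoding e₁ cannot decide between the rivals, whereas e₂ counts as strong evidence for h₁ and against h₂.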

Motivation and background

It is easy to see that none of the notions of ‘evidence’ gained by the p-model satisfies the criteria laid down by the standard view of the analytical philosophy of science (SVAPS). The former do not provide criteria whose fulfilment could ensure that the datum in question perfectly supports or refutes the given hypothesis. First, since the connection between the datum and the hypothesis is in most cases established by plausible inferences relying on plausible premises, they cannot guarantee the truth or falsity of the hypothesis. Second, nothing excludes the possibility of gaining evidence against the given hypothesis on the basis of other data.

(49)–(51) are related to situations in which the datum does not guarantee but only supports or questions the given hypothesis to a certain extent. However, as a limiting case (51)(a) also covers situations in which it is guaranteed that the hypothesis is true with certainty. This is the case if the evidence e is certainly true and the p-context contains statements with the help of which e can be extended into a demonstrative inference the conclusion of which is the hypothesis h. These requirements are, of course, extremely strict stipulations.

To sum up what we have said in connection with (49)–(51), we can highlight the following properties of evidence:

(52)
1. (a) Data may be tools for eliminating informational overdetermination insofar as they may contribute to the evaluation and comparison of the plausibility of hypotheses which are the sources of p‐inconsistency.
2. (b) Data are not evidence in general but they count as evidence with respect to some hypothesis. That is, they may be weak/relative/strong evidence for or against a given hypothesis h.
3. (c) During the argumentation process, a statement may lose its evidence status. First, statements playing the role of evidence are data, that is, plausible or certainly true statements which get their plausibility value from a direct source. Therefore, when their source loses its reliability, their data status will vanish as well. Second, further sources may evaluate them as implausible. In such cases, they do not have a plausibility value on the basis of all sources which assign a plausibility value to them because the latter do not judge them unequivocally as required in Section 9.3. Thus, they are evidence for this hypothesis on the basis of a part of this set of sources but they cannot function as evidence for it on the basis of the set of all sources. Third, a premise of the inference connecting the evidence and the hypothesis supported/questioned by it may lose its plausibility as a result of changes in the p‐context.
4. (d) Evidence is theory‐dependent. On the one hand, it is a theory‐dependent datum. On the other hand, it depends on the methodology of the given theory whether the evidence and the (type of) inference connecting it with the given hypothesis is considered to be legitimate.

13.4 Conclusions

(47)–(51) introduce concepts at a highly abstract level. The reason for this is – as is also emphasised in the current literature on linguistic data and evidence – the enormous diversity of linguistic data (see Kertész & Rákosi 2009c). It is well known that the source of this diversity is that linguistics is far from being a homogeneous discipline. Rather, it is a network of theories which involve various combinations of several, in certain cases radically different, methods relying on ontological and methodological background assumptions originating from the ‘empirical’ natural sciences, from hermeneutics, or philosophy. In some cases, these theories behave complementarily, while in other cases they overlap or are in conflict with each other (see Penke & Rosenbach 2004a; Kepser & Reis 2005a; Lehmann 2004: 189f.). From this it follows that a number of data types can be found in linguistics that are characteristic of other disciplines. Consequently, the notions of ‘linguistic datum’ and ‘linguistic evidence’ do not have differentia specifica which would definitively distinguish them from the genus proximum of ‘scientific datum/scientific evidence’. Therefore, the highly general definitions given in (47)–(51) cannot be specified for ‘linguistics’ as such, but must be narrowed down to specific – that is, documented – argumentation processes implemented by the methods of specific theories.

14 The solution to (P)_III

The definitions provided in (47)–(51) lead to a radical reinterpretation of the concept of ‘evidence’ in several respects:

(a) The concept of ‘datum’ in (47) serves as a basis for the introduction of the three types of evidence (i.e. weak, strong and relative), allowing us to make subtle differentiations in the relationship between a datum and two or more rival hypotheses. It may be the case that indirect sources can be built upon a datum which make at least one of the rival hypotheses plausible (weak evidence). It may happen that by comparing the strengths of the indirect sources which can be built on this datum one finds that one of the rival hypotheses is made more plausible than its rival (that is, this datum seems to vote for one of the rivals as relative evidence). Finally, it is also possible that the datum is one of the premises of inferences which make one of the rivals plausible and the other one implausible (strong evidence); this may, of course, legitimise a stronger preference for the given hypothesis. Consequently, as we have put it, data may be tools for eliminating informational overdetermination insofar as they may contribute to the evaluation and comparison of the plausibility of hypotheses which are the sources of p-inconsistency. Moreover, with weak evidence, data may be the sources of p-inconsistency as well.

To sum up, with the help of the concept ‘evidence’ it can be clarified which data can be used as elements of indirect sources, either for a given hypothesis or against it. The identification of data which are strong evidence for/against a hypothesis is especially important because this is a first step towards the (re)solution of the p-problematicness of the p-context.

(b) It is crucial that ‘evidence’ as a concept does not classify data in the traditional sense. As we emphasised in Section 13.3, data are not evidence in general but they count as evidence relative to some hypothesis. Moreover, evidence is strongly p-context- and theory-dependent. It is not ‘objective’ in the sense of the standard view of the analytical philosophy of science (SVAPS), it is not ‘given’ at the outset of inquiry, it is not ‘primary’ to the theory etc.

Consequently, in the present part of the book we have obtained the following solution to (P)_III:

(SP)_III The p‐model defines the concepts of ‘data' and ‘evidence' as introduced in Chapter 13.
