What is referred to as compound spelling in the present study is treated under varying names in the literature: Morton Ball’s (1951: 3) use of compounding to refer to spelling only results in an unfortunate terminological overlap with the word formation type. The most frequently used term seems to be hyphenation (e.g. in Bauer 2003: 134 and the GPO Style Manual 2008), which unfortunately suggests the use of a hyphen. Since hyphenation emerges as the most marked of the three spelling variants (cf. 7.1), this terminology is particularly unusual, considering that generally the unmarked constructions from sets of oppositions are used to refer to a whole dimension in a hyperonymous function (e.g. neutral length as against marked shortness; cf. Leech 1981: 113–115). Furthermore, hyphenation is an ambiguous term, because it frequently refers to the splitting of words at the end of lines. As a consequence, the present account uses the term compound spelling as a general, neutral and less ambiguous term, which deals with the principles underlying the way the constituents of English compounds are combined in writing.
In order to discuss the spelling of English compounds, it is necessary to determine first what kinds of construction are recognised as belonging in that category. In spite of the multitude of books and articles dealing with compounds, Faiß’ (1981: 132) observation that there does not seem to be a generally recognised definition of what constitutes a compound still holds true more than thirty years later. This is particularly problematic in view of the large number of adjacent categories which compounds need to be distinguished from (cf. 2.1–2.4), so that most previous research has focused on the centre of the category (i.e. nominal noun+noun compounds). In the following, an attempt will therefore be made to explore the boundaries of the compound concept.
As the name indicates, compounds are compounded lexemes and therefore consist of more than one constituent. Bauer (1983: 29) defines a compound as “a lexeme containing two or more potential stems that has not subsequently been subjected to a derivational process”. Since “it is quite common to find compound prepositions and compounds in other minor categories” (Bauer 2003: 137), it makes sense to expand the definition of the constituents beyond lexical stems, as in Quirk et al.’s (1985: 1567) definition of the compound as “a lexical unit consisting of more than one base … and functioning both grammatically and semantically as a single word”. This implicit inclusion of grammatical constituents results in the following preliminary definition (cf. 2.6 for the final version):
A compound is a complex lexeme which consists of at least two constituents occurring as free lexemes each and which contains no affixation on the highest structural level.
However, this definition still leaves room for interpretation. Compounds need to be set apart both from other lexemes (most of which are regarded as the result of word formation processes) and from phrases (which are regarded as the result of syntactic processes), although there seems to be a tendency to regard these two domains as ever more gradient (e.g. Erman and Warren 2000: 53). The following sections discuss the criteria which are commonly used in the literature to distinguish compounds from other linguistic entities. Furthermore, they give an overview of the types of compound recognised in the present study based on length, word formation type, part of speech and spelling.
2.1 Compounds versus Phrases
It is the distinction between compounds and phrases that seems to represent the most important difficulty – at least considering the vast amount of literature devoted to the topic. According to Donalies (2003: 79), the comparison may be complicated by differences between the categories, with words being formed according to word formation rules and phrases being formed according to syntactic rules. However, the general assumption that compounding has its origin in a univerbation process applied to syntactic structures in Proto-Indo-European, whose output served as the basis for analogical formations without corresponding syntactic structures (Kastovsky 2009: 328–329), would rather lend support to the currently more common view in linguistics (e.g. in valency grammar and construction grammar) that there is no clear dividing line between syntax and the lexicon (cf. e.g. Herbst and Schüller 2008: 1). From a synchronic perspective, a more important reason for the difficulty experienced in the distinction between compounds and phrases may therefore be that “the lack of inflectional morphemes in English … makes surface forms of English compounds and free syntactic groups identical in terms of their morphological forms” (Lieber and Štekauer 2009: 5), e.g. the English compound blackberry and the phrase black berry compared to their German equivalents Blaubeere and blaue Beere. Particularly problematic are syntactically permissible constructions with initial adjectives that are still considered compounds by many scholars (e.g. the adj+n compound black eye as against the adj+n phrase black shoes). Furthermore, English has hardly any denominal adjectives denoting material (such as wooden; cf. Giegerich 2004: 7), so that nouns are often used with an adjective-like function in noun+noun constructions whose first noun denotes the material of the second one (e.g. steel bridge). Similarly, while English derives adjectives from place names for countries (e.g. Spain – Spanish; Italy – Italian), this is not usually the case for the names of towns, for which gradient noun+noun constructions (e.g. an Oxford don) are used as well.
The most influential discussion of possible criteria for the distinction between compounds and phrases can be found in Bauer (1998). While the discussion is restricted to noun+noun combinations, many of the criteria considered there can be extended to compounds combining other parts of speech. The following sections present a critical discussion of the most commonly used criteria in the linguistic literature, in which the discussion of formal aspects is followed by the consideration of syntactic, structural and finally semantic criteria.
2.1.1 Formal Criteria
2.1.1.1 Orthographic Unity
A study which sets out to test compound spelling can obviously not base its definition of the compound on orthographic unity, but it is still necessary to discuss this criterion, because it is so frequently mentioned in the literature. Indeed, some linguists (e.g. Morton Ball 1951: 3) exclude open compounds from their compound definition altogether. However, if orthographic unity were a necessary requirement for compound status, orthographic variants with identical phonological form and meaning but different spelling would call for very distinct categorisation: girlfriend and girl-friend would have to be classified as compounds and their variant girl friend as a phrase (Donalies 2003: 80). Possibly for that reason most accounts of English compound spelling merely regret the fact that English compounds cannot be defined as an uninterrupted sequence of characters, but still admit open (and also hyphenated) spellings on the grounds that some patterns producing otherwise prototypical compounds (such as V+ing + N, e.g. nursing home) commonly use open spelling (Schmid 2011: 122). Yet while orthographic unity is no necessary defining criterion for compounds, it is a sufficient one: constructions consisting of sequences of letters which are not interrupted by a space will generally be interpreted as a single lexeme (cf. Schmid 2011: 132) and thus as compounds if their constituents are lexemes in their own right. As soon as characters other than the hyphen enter the sequence, this is no longer necessarily the case (cf. contractions such as don’t). With both solid and hyphenated spelling indicating word status, orthographic unity is thus a good criterion for two of the three compound spelling variant types, but not for compounds in general (a large number of which are spelled open; cf. 7.2). While orthographic unity can thus not be used as a clear defining criterion for compounds, it is, however, an exclusive criterion for phrases, as syntactic groups are neither hyphenated nor written solid (Faiß 1981: 135).
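The three spelling variants and the one-directional nature of the criterion can be illustrated with a minimal Python sketch. The function names `spelling_variant` and `indicates_word_status` are hypothetical labels introduced here for illustration only, and the rule that any space yields an open classification is a simplifying assumption that ignores mixed spellings.

```python
def spelling_variant(written_form: str) -> str:
    """Classify a written form as 'open', 'hyphenated' or 'solid'.

    Simplifying assumption: a space anywhere makes the form open,
    otherwise a hyphen anywhere makes it hyphenated.
    """
    if " " in written_form:
        return "open"
    if "-" in written_form:
        return "hyphenated"
    return "solid"


def indicates_word_status(written_form: str) -> bool:
    """Orthographic unity as a sufficient (not necessary) criterion.

    Solid and hyphenated spellings signal word status; an open spelling
    permits no conclusion, since it may be a compound or a phrase.
    """
    return spelling_variant(written_form) in {"solid", "hyphenated"}


for form in ["girlfriend", "girl-friend", "girl friend"]:
    print(form, spelling_variant(form), indicates_word_status(form))
```

A return value of `False` from `indicates_word_status` should be read as "undecided" rather than as "phrase", which mirrors the point that open spelling is compatible with compound status.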
2.1.1.2 Fore-Stress
The most common test criterion for compound status in the literature is presumably fore-stress, which distinguishes between compounds like ˈgreenhouses, with stress on the first constituent, and phrases like ˌgreen ˈhouses, with stress on the second (cf. also 5.4). Marchand (1969: 25) sees a connection between primary stress on the first constituent and the permanent lexical relation expressed in compounds, and links primary stress on the second constituent to a mere syntactic relation. While this is in line with the fact that Germanic languages – including English – usually place word stress on the first syllable, the existence of French borrowings with word-final stress (e.g. champagne, magazine) in English has established “a precedent for end-stressed nouns in the lexicon” (Giegerich 2004: 6). This might explain why fore-stress is no unproblematic test criterion for compound status in English anymore (cf. Plag, Kunter and Lappe 2007; Bell and Plag 2012):
Some compounds, such as blackcurrant, full stop (‘period’) or hotdog, are stressed on the second constituent by many speakers (Huddleston and Pullum 2002: 451, 1650). The same applies to inversion compounds such as heir apparent (Faiß 1981: 133) and to almost all copulative compounds (Schmid 2011: 145).
Speakers may apply stress placement inconsistently to the same construction (Bauer 1983: 102–104) and dictionaries may also differ in their treatment of the phenomenon: thus churchwarden has initial stress in one dictionary but final stress in others (cf. Bauer 1998: 70).
For some lexical items, such as ice cream, there are even generally recognised alternative stress patterns (Bauer 1983: 102–104).
Some authors argue that there is an association between particular stress patterns and particular semantic relations: thus ‘B made of A’ (stone wall) sometimes calls forth end stress, while ‘B used for A’ (pruning shears) calls forth fore-stress, but it is unclear why one of these relations should be considered more lexical than the other (Bauer 1998: 71).
Combinations that are ‘too long’ always have two stresses, e.g. concert performance (Marchand 1960a: 16).
Taking everything into account, it seems that word stress is most distinctive in adjective+noun compounds (e.g. blackbird vs. black bird) but that fore-stress is no criterion which can distinguish all compounds from phrases. In a reversal of perspectives, however, fore-stress can be used to distinguish phrases from compounds, since phrases never have fore-stress (cf. also Giegerich 2004: 21 and Faiß 1981: 133).
2.1.1.3 Length
Another possible formal test criterion is length. Bauer (2003: 134) states that some linguists “seem happy enough to concede girlfriend (however spelt) as a single lexeme but are less happy with longer compounds”, e.g. morphology textbook vs. morphology textbook cover and morphology textbook cover box. Although processing of long compounds should be easier in written than in spoken language, Schmid (2011: 208) finds only a small number of compounds with four or more constituents in his written corpus. A statement such as “and then we looked it up in the airline cabin crew safety training manual” is uncommon even in technical language, since extremely long combinations with potential compound status are likely to be replaced by abbreviations or acronyms (Schmid 2011: 206). Phrases, by contrast, can be very long, e.g. due to multiple embedding. Length is thus no absolute but rather a gradient criterion.
2.1.2 Syntactic Criteria
2.1.2.1 Part-of-Speech Specification
Words and thus also compounds can be assigned a part of speech based on their inflection and the context in which they occur (Plag 2003: 8). Since compounds do not cross phrasal boundaries, they either constitute a whole phrase on their own (e.g. a verbal compound in the simple present or past tense, such as the verb phrase in They double-checked the calculations) or they occur within a phrase (e.g. the noun fairy tale acting as the head of the noun phrase One could call it a modern fairy tale). The mere occurrence in a particular syntactic slot is not enough to classify a construction as a compound. While some constructions may be considered adjective compounds in the premodifying slot of a noun phrase (e.g. full+length in a full-length portrait), this is not automatically the case: last year’s in There were fifty nominees for last year’s prize is considered phrasal in view of the fact that the genitive inflection cannot be used with adjectives. Without the help of other criteria, however, the line is sometimes difficult to draw: for the sentence Next to me sat a smiling child, few people would argue that smiling child is a compound. Yet if somebody wished to do so, they could consider the construction a noun based on the part of speech of the head child, and the fact that inflection can be added (e.g. smiling children, smiling child’s, smiling children’s) would seem to support such an analysis. Part-of-speech specification is presumably most useful in the classification of constructions consisting of grammatical words, e.g. in+as+much+as, whose classification in the LDOCE example sentence Ann was guilty, inasmuch as she knew what the others were planning as a complex conjunction is supported by its possible replacement with the conjunction because. As a consequence, part-of-speech assignment is best used in combination with other criteria to distinguish compounds from phrases.
2.1.2.2 Syntactic Ill-Formedness
Another possible compound criterion is syntactic ill-formedness: some constructions consisting of free constituents on the highest level of analysis, such as forward+looking or with+out, cannot be described by means of syntactic rules, as one would rather expect the order looking+forward or with followed by a noun phrase. As a consequence, they can be considered compounds (cf. Dressler 2005: 28). The opposite, syntactic well-formedness, by contrast, has no such implications, since the order of the constituents in compounds may also correspond to syntactically well-formed phrases (e.g. in greenhouse ‘a glass building for plants’). Syntactic ill-formedness is thus a possible but not necessary criterion for compound status.
2.1.2.3 Positional Mobility
Positional mobility as yet another criterion of wordhood could also be considered as potentially distinguishing between compounds and phrases: for instance, both police officers and lollipop ladies “can be used in different places in the sentence” (Bauer 1983: 105), e.g. as subjects or objects in Police officers stopped lollipop ladies vs. Lollipop ladies stopped police officers. However, in such cases it is actually whole phrases (including determiners if the noun is not uncountable or pluralised) which change their position. Since some syntactic phrases, particularly with adverbial function, are also relatively mobile within the sentence (e.g. For that reason, he did not come vs. He did not come for that reason), positional mobility is no valid criterion to distinguish compounds from phrases.
2.1.2.4 Uninterruptability
While it is possible to interrupt phrases (by extending the noun phrase a girl to a nice girl and even a nice young girl, or the verb phrase was going to was happily going), “items cannot be inserted between formatives within a word” (Bauer 1983: 105), so that a compound such as the noun girl+friend cannot be extended to ?girl+nice+friend. At first sight, this statement seems to be contradicted by some examples that Bauer (1983: 106) gives of “complex words of the form AB such that there is also a complex word of the form ACB”, in which “the element C forms a unit either with A or with B”. However, library book’s supposed extension to library comic-book and city office’s supposed extension to city insurance office have to be rejected as instances of interruptability, because these sequences are better analysed as new compounds with partly new constituents (i.e. comic-book and insurance office as new complex constituents). Whether a construction can be interrupted thus depends on whether it represents a unified concept. While uninterruptability can be accepted as a valid criterion to distinguish phrases from compounds, there is an exception in the form of conjoints (cf. 2.1.2.7), in which the shared constituent in “two linked units of equal status” (Quirk et al. 1985: 46) is only physically expressed once, e.g. when the compound iron bars is superficially interrupted by and and steel in the construction iron and steel bars.
In practice, however, uninterruptability is not always easy to determine, particularly in syntactically well-formed combinations of adjective and noun whose status as a technical term with unified meaning (cf. 2.1.4.3) depends on previous lexical knowledge (e.g. of modern man as an anatomic term). While it should be possible to insert an adjective if such constructions are phrases, the test is limited by the tendency of English adjectives to occur in a particular order. Table 2.1 combines and slightly modifies the descriptions in Endley (2010: 96–97), DeCapua (2008: 94–95) and Swan (2005: 11):
Table 2.1 Order of English adjectives within the noun phrase
| evaluation | size | condition | age | shape | colour | origin | material | function or classification | NOUN |
|---|---|---|---|---|---|---|---|---|---|
| nice | small | dusty | modern | round | black | German | woollen | political | things |
Testing requires an adjective which can at least theoretically be inserted between a potential compound’s adjective and noun constituents, and which should therefore come from a category to the right of the tested construction’s adjective in Table 2.1. For instance, the status of the construction modern man cannot be tested by the insertion of nice, since the sequence ?modern nice man violates the ordering constraint, which permits no conclusions concerning interruptability – in contrast to the insertion of German, which results in a semantic change in modern German man.
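The selection of a suitable test adjective can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the category order is copied from Table 2.1, the toy lexicon contains nothing but the example adjectives from that table, and the names `ADJECTIVE_ORDER`, `CATEGORY_OF`, `usable_test_adjectives` and `is_valid_insertion` are hypothetical helpers introduced here.

```python
# Category order as given in Table 2.1 (left to right).
ADJECTIVE_ORDER = [
    "evaluation", "size", "condition", "age", "shape",
    "colour", "origin", "material", "function or classification",
]

# Toy lexicon (assumption): only the example adjectives from Table 2.1.
CATEGORY_OF = {
    "nice": "evaluation", "small": "size", "dusty": "condition",
    "modern": "age", "round": "shape", "black": "colour",
    "German": "origin", "woollen": "material",
    "political": "function or classification",
}


def usable_test_adjectives(construction_adjective: str) -> list[str]:
    """Return the adjectives whose category lies to the right of the
    tested construction's adjective in the Table 2.1 ordering."""
    position = ADJECTIVE_ORDER.index(CATEGORY_OF[construction_adjective])
    later_categories = set(ADJECTIVE_ORDER[position + 1:])
    return [adj for adj, cat in CATEGORY_OF.items() if cat in later_categories]


def is_valid_insertion(construction_adjective: str, inserted: str) -> bool:
    """True if inserting `inserted` after the construction's adjective
    respects the ordering constraint, so that the test is informative."""
    return inserted in usable_test_adjectives(construction_adjective)


print(is_valid_insertion("modern", "nice"))    # ordering violated
print(is_valid_insertion("modern", "German"))  # ordering respected
```

The sketch only decides whether an insertion is admissible as a test. Whether the resulting sequence (e.g. modern German man) involves a change in meaning still has to be judged by a competent speaker.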
2.1.2.5 Syntactic Isolation of Constituents
According to Bauer (1998: 72–74), “elements within the word should not be available to the syntax”. For instance, the status of a construction as an adj+n compound can be tested by attempting comparison if that is permitted by the adjective on its own. As a consequence, special education can be classified as a compound, because the sequence more special education would result in a change in meaning. Yet the principle of syntactic isolation is broken in attested examples of compounding such as So, I hear you’re a real cat-lover. How many do you have now?, where how many refers to cat, the modifier of the compound (Bauer 1998: 72). The principle is also broken in the case of some derivatives, whose status as words is uncontested, e.g. in what sharply distinguishes Chomskyan practice from that of his structuralist forbears, where his refers anaphorically to Chomsky, the base of the suffixation (Bauer 1998: 72). Furthermore, some constructions which are structured in parallel to (other) compounds comprise internal inflection, e.g. games mistress with an initial plural (Bauer 1998: 72–73), or genitive compounds such as bull’s-eye (cf. 5.5.1). While all this seems to imply that the syntactic isolation of constituents is no valid criterion to distinguish compounds from phrases, one may argue that the inflection is actually part of these latter compounds and can therefore not be modified or deleted to form a commonly used singular or non-possessive form (?game mistress, ?bull-eye). Another group of seemingly contradictory compounds is represented by constructions like sons-in-law, mothers-to-be and lookers-on, which are usually pluralised in the middle of a hyphenated sequence of letters (Swan 2005: 517).
This group can be included by modifying the principle of syntactic isolation of constituents in such a way that compounds are expected to take inflection only at their head (Donalies 2003). However, modern-day usage also seems to permit word-final placement of the genitive for such items (son-in-law’s, mother-to-be’s, looker-on’s; cf. e.g. www.englishforums.com/English/InflectedPeriphrasticGenitive/bvpjk/post.htm, 18 August 2017). Since neither word-final inflection nor head-only inflection seems to apply consistently, the central principle which can be derived from all of the foregoing discussion is that compounds only take each type of inflection once for all their constituents. However, the principle needs to be refined even further if one considers that compounds with internal inflection may occur in syntactic contexts requiring the same type of inflection at the end of the compound, e.g. news bulletin in the plural context Robin enjoys watching news bulletins. In order to avoid having to evaluate a construction’s status as a compound differently depending on whether the context requires the same type of inflection as at the end of the first constituent or another type of inflection or none, the most precise formulation of the principle of syntactic isolation of constituents is that each type of inflection may only be applied once to the base form of a compound (which may contain inflection at the end of the first constituent). The only exception to this principle consists in phrases of the type ‘name + and/or + name’ and their variants with more constituents linked by commas, which take a single inflection (like compounds) in spite of the fact that two inflections are theoretically possible. For instance, the book Cohesion in English is usually referred to as Halliday and Hasan’s book and not as ?Halliday’s and Hasan’s book, and the form Jack or Jill’s seems to be more common than Jack’s or Jill’s (Google search, 04 September 2015).
2.1.2.6 No Replacement of the Head by the Pro-form One
While the syntactic isolation of compounds’ constituents theoretically also covers the impossibility of replacing a compound’s head by the pro-form one, e.g. in *That’s not an oak tree but an elm one, it is listed separately here because of its frequent discussion in the literature (e.g. Bauer 1998: 76–78). While Schmid (2011: 132) uses this criterion to distinguish phrases from compounds, Bauer (1998) provides a list of counterexamples, e.g. I told you to bring me a steel bar but you have brought me an iron one or I wanted a sewing machine, but he bought a knitting one, which renders the distinctiveness of this criterion for the discrimination of compounds and phrases slightly doubtful.
2.1.2.7 No Coordination
Another traditional criterion, similar to syntactic isolation, posits that no coordination should be possible with the constituents of a compound (e.g. butter+cup), so that neither *bread and buttercups nor *buttercup and saucer are possible combinations (Bauer 1998: 74). However, the impossibility of coordinating the flower buttercup may be the result of the compound’s idiomaticity: since coordination requires a parallel semantic relationship between the coordinated elements (e.g. like that between buttercup and the hypothetical flower honeycup in the presumably possible combination butter(-) and honeycups), finding possible items to coordinate becomes increasingly difficult with increasing idiomaticity (Bauer 1998: 74–75). Where such a parallel structure exists, however – and that seems to be the case for most constructions considered compounds – coordination is possible at least in noun+noun compounds (cf. Bauer’s (1998: 74–75) iron and steel bars and steel bars and weights), but possibly also in verbal compounds (deep freeze and fry) and adjectival compounds (high maintenance and performance). The non-coordination of constituents is thus no absolute criterion for the distinction between compounds and phrases, either.
2.1.3 Structural Criteria
2.1.3.1 Internal Stability
According to the structural criterion of internal stability, lexical constituents “cannot be reordered within the word” without resulting in impossible variants of compounds (cf. the scrambled versions of forget-me-not: ?not-me-forget and ?not-forget-me) or existing but distinct words with a change in meaning (when garden city is reordered to city garden; cf. Bauer 1983: 105–107). The constituents of copulative compounds (cf. 2.5.2) are sometimes claimed to represent an exception, but it seems that these are rarely reordered in actual usage: thus singer-songwriter occurs twenty-four times in the British National Corpus, compared to zero hits for *songwriter-singer. While one may therefore generalise that compounds are internally stable, the consideration of phrase structure leads to the same result: a phrase such as the nice young girl cannot be randomly reordered. The result would be quite unusual in some cases (?the young nice girl) and ungrammatical in the majority of instances (?nice the girl young/?girl young the nice etc.). It is therefore possible to conclude that internal stability is a quality of both compounds and phrases and therefore no distinctive criterion.
2.1.3.2 Right-Headedness
Donalies (2003: 84–85) and Adams (2001: 3) consider right-headedness as a potential indicator of compoundhood, but constructions of the type mother-in-law with an initial head lead Donalies (2003) to discard this criterion. Furthermore, neither phrase compounds nor copulative compounds (cf. 2.5.2), which are considered compounds by several treatments of word formation (e.g. Bauer 1983; Dressler 2005), fulfil this criterion. Left-headedness is observable in some phrases allowing postmodification (e.g. the noun phrase girls united), but since noun and adjective phrases are usually right-headed (e.g. a strong desire; extremely happy), headedness is no useful distinction between compounds and phrases.
2.1.3.3 Listedness
The listedness of compounds is one of the most important defining criteria mentioned in the literature and may be interpreted either in a lexicographical way (e.g. by Bauer 1998: 67) or with regard to storage in the mental lexicon (e.g. by Schmid 2011: 122). The criticism which can be applied is similar in both cases: if listedness were the only criterion for compoundhood, the addition of newly formed (and therefore at least initially unlisted) compounds after the determination of a status quo would be impossible. Conversely, both dictionaries and the mental lexicon may also list longer entities such as idioms and whole sentences (e.g. proverbs). Since rule-produced entities such as syntactic units may be listed (Langacker 1987: 29), listedness “cannot be used to set off compounds from anything else” (Bauer 1998: 68) and is therefore discarded as a criterion here.
2.1.4 Semantic Criteria
2.1.4.1 Idiomaticity
Many linguists use idiomaticity (as one aspect of listedness) to determine compound status (cf. Bauer 1998: 67): thus Kruisinga (1932: 1581) defines a compound as “a combination of two words forming a unit which is not identical with the combined forms or meanings of its elements”, and, according to Marchand (1960a: 18), compounds “denote an intimate, permanent relationship between the two significates to the extent that the compound is no longer to be understood as the sum of the constituent elements” (e.g. butterfly, which is clearly idiomatic). However, many compounds can actually be interpreted in a literal sense, e.g. passenger seat (‘a seat for passengers’) or oven-ready (‘ready for the oven’). Furthermore, “[a]ny syntactic group may have a meaning that is not the mere additive result of the constituents” (Marchand 1960a: 80). Thus both the Old English compound hēafod-gim ‘head-gem, eye’ and the parallel syntactic construction hēafdes gim ‘head’s gem, eye’ with an inflected first constituent have idiomatic meaning (Terasawa 1994: 73). Taking everything into account, idiomaticity cannot clearly distinguish compounds and phrases from each other.
2.1.4.2 Semantic Specificity
A related criterion is semantic specificity. According to Faiß (1981: 134), “[m]any scholars hold that a compound is semantically more restricted or more specified than a parallel syntactic group is”. Thus a revolving door is not simply a door that revolves but a particular kind of door (Faiß 1978: 25), and the phrase a dancing girl differs from the compound dancing-girl by the latter’s professional status (Hansen et al. 1990: 50). However, it is difficult to “formalise this intuitive distinction” in a more general way (Bauer 1978: 43) – particularly since only a small number of compounds contrast with a parallel syntactic group. As a consequence, the criterion of semantic specificity has only limited applicability for the present study.
2.1.4.3 Unified Semantic Concept
The idea that compounds refer to a unified semantic concept is very common in the literature (e.g. Plag 2003: 7) and seems to be generally accepted. For Schmid (2011: 142), the most important cognitive function of compounding is that compounds establish links between concepts, e.g. when ‘bar’ and ‘man’ are linked in the compound barman in such a way that a new concept (‘a man who serves drinks in a bar’) emerges. While it is certainly true that compounds such as fog+horn express a single idea (in this particular case one comparable to a siren), the opposite is not necessarily true, because a single idea can also be expressed by a construction which is very clearly a phrase, e.g. ‘the smell of fresh rain in a forest in the fall’ or ‘the woman who lives next door’ – for which the English language has no equivalent compounds (Plag 2003: 7). Since one may, however, agree that all compounds as ‘complex lexemes’ refer to a unified semantic concept, whereas the majority of English phrases do not represent a single idea, the ‘unified semantic concept’ test can be used to determine potential candidates for compoundhood and will therefore be included in the final definition (cf. 2.6). For grammatical compounds such as without, the analogical requirement is a unified syntactic function, reflected in the assignment of a joint part of speech.
To determine in practice whether a construction in an English text represents a unified semantic concept, the interruptability test (cf. 2.1.2.4) can be carried out. If a sequence can only be interrupted with a change in meaning, it is considered to refer to a single idea. This test requires a very high command of English, and an inverse correlation between the number of constructions accepted as compounds and a speaker’s level of English can be expected, because the failure to imagine possible interrupting items may prompt less advanced speakers to classify borderline cases as compounds.
2.2 Compounds versus Other Lexemes
Besides the distinction from phrases on the syntactic level, compounds need to be set apart from other lexemes in the morphological dimension.
2.2.1 Compounds versus Simplex Lexemes
In the majority of cases, the distinction between compounds and simplex lexemes should not pose any practical problems, as a simplex like tree will rarely provoke uncertainty regarding potential categorisation as a compound. However, there are two types of exception: on the one hand, so-called fossilised compounds (Dressler Reference Dressler, Libben and Jarema2005: 40) or obscured compounds (Götz Reference Götz1971) are no longer recognisable as compounds, so that e.g. lord and lady cannot be analysed into constituents in present-day English anymore. As a consequence, one may argue in favour of their classification as simplex lexemes from a synchronic perspective. Conversely, unmotivatable but transparent lexemes (cf. Sanchez Reference Sanchez2008: 87) could theoretically be analysed into free pseudo-constituents, e.g. forget into for+get. This is for example the case of mush+room, a popular etymological interpretation of the French loanword mousseron (cf. Oxford English Dictionary [OED] s.v. mushroom). For the purposes of the present study (cf. also 4.1), the concept of the compound is therefore restricted to compounds which can be analysed into motivating – and thus semantically relevant – constituents in present-day English. While excluding pseudo-analyses such as forget and mushroom, the present approach includes all compounds whose parts appear morpho-semantically relevant, without consideration of their actual etymological origin.
2.2.2 Compounds versus Derivatives
In the majority of cases, the distinction between compounds and affixations is relatively unproblematic: compounds (e.g. pen friend) are formed from freely occurring constituents, and derivatives (e.g. befriend or friendship) contain at least one bound lexical affix. However, the definition of compounds cannot rely solely on the absence of lexical affixes, since some compounds (e.g. ozone-friendly) contain prefixes or suffixes within the compound’s constituents. Yet this is not the only problem with regard to the dividing line between compounding and derivation:
1. The status of a particular morpheme as a prefix is not always easy to determine, because some prefixes (e.g. after-) are formally and semantically identical with free morphemes (in this case, the preposition after). For the sake of consistency, constructions containing the affixes listed in Table A.8 on the highest level of analysis are therefore considered prefixations rather than compounds in the present study (unless the meaning of the morpheme in question is different from the meaning of the affix, e.g. in the case of postcard or adland, which clearly refer to mail and advertising rather than to the meanings of the listed affixes).
2. Compounds containing one or more combining forms of Greek or Latin origin (Plag 2003: 74), such as the neoclassical compound biology, are special in that their constituents cannot occur as free lexemes in their own right. Since this contradicts the requirements of the compound definition at the beginning of this chapter, such items are not considered compounds in the present study.
3. The same is true of lexemes of the type cranberry, which contain morphs that are unique in the language (cf. Aronoff 1976: 15). Although berry is a noun in its own right, cran never occurs on its own with the meaning it has within the complex word (i.e. not the homonymous Scotch form of crane recorded in the OED). Once again, this contradicts the vital requirement for compound status, “the declaration of independence” of a construction’s constituents (Bauer 2005: 97).
4. Compound-final man forms a large number of compounds with solid spelling (e.g. fireman, policeman) and has relatively general semantics, with its meaning corresponding to little more than the suffix -er. It can therefore be considered an affixoid (cf. Bußmann 2002 s.v. Affixoid, Suffixoid), but since the important difference from suffixation is that man also occurs as a free lexeme in its own right, such constructions are regarded as regular compounds here.
5. The absence of lexical affixes on the highest level of analysis is a particularly important criterion for the distinction between compounds and affixations. It plays a role in the categorisation of a number of borderline cases, particularly constructions ending in -ed or -er (e.g. bite-sized and do-gooder). Since bite and sized occur as lexemes in their own right in the English language and are also listed in the OED, in contrast to the verb ?to bitesize (from which bite-sized could have been derived), the complex adjective bite-sized is conferred compound status. The potential constituent ?gooder, by contrast, does not seem to exist as an English word and is not listed in the OED, which makes the derivation of do-gooder from the existent phrase to do good by means of suffixation the more plausible choice. However, even if all constituents of a potential compound exist as individual words (e.g. black and marketeer), the semantics of their combination need to be taken into account: since black marketeer does not refer to a marketeer with black skin but to someone selling objects on the black market, it is considered a suffixation of a phrase, i.e. [black market] + -eer, and not a compound. In the literature, the term synthetic compound is sometimes used in order to accept as compounds constructions which combine an initial noun with a final deverbal noun whose status as a word in its own right is doubtful: thus goer in church+goer and swallower in sword+swallower are “possible, but not established English words”, which “function as building blocks in word-formation” through the simultaneous application of derivation and compounding (Booij Reference Booij2007: 92).
Usually, derivatives are spelled solid, which should simplify the discrimination between compounds and derivatives, but there are some exceptions: we find hyphenation in some prefixations (e.g. co-operate or re-cover), particularly if there are orthographic or semantic reasons, such as the avoidance of sequences of identical letters or an unusual meaning (cf. 5.1.1), and also in some suffixations (e.g. bell-less, shell-like; cf. Ritter 2005a: 56) – but no spacing.
2.2.3 Compounds versus Other Word Formations
In most cases, the difference between compounds and other types of word formation beside derivatives is relatively clear, as demonstrated by these classical examples:
Compounds differ from acronyms in that the latter (e.g. BBC) are usually capitalised throughout and consist of constituents which have been shortened so extremely (British Broadcasting Corporation) that they cannot be considered recurring free lexemes anymore.
Blends such as brunch also contain constituents that have been shortened in such a way that they do not meet the requirement of representing recurring free lexemes (breakfast lunch).
Clippings (e.g. lab from laboratory) as shortenings are unlikely to be confused with the more complex compounds. When clippings occur as parts of potential compounds, however (e.g. language+lab; cf. 2.5), one may disagree whether such constructions are better classified as compounds consisting of two free constituents (language and lab) or as clippings of a longer compound (language+laboratory). Whenever the clippings in such constructions recur as free words in English, the present study assigns compound status to them, because they fulfil the requirements of the compound definition (cf. 2.6).
Back-formations (e.g. to edit from Latin editor; cf. OED) are not necessarily complex – but when they are, e.g. in the case of to baby-sit (from baby-sitter), the line is difficult to draw. Since they consist of free lexemes and have no affixation on the highest level of analysis, back-formations are not singled out by the compound definition used here (cf. 2.6). However, that is not necessary either, because back-formations are not distinct from compounds in their synchronic structure, only in their history (Huddleston and Pullum 2012: 286). A synchronic account of compounding therefore permits a certain extent of overlap between the two categories. As long as the result is compatible with the compound definition used, a compound may have undergone various word formation processes (cf. also Huddleston and Pullum 2002: 1660).
Conversions are frequently simple lexemes, e.g. the verb to bottle, which goes back to the noun bottle (cf. e.g. Sanchez Reference Sanchez2008: 93–94). When phrases or sentences consisting of more than two components are transformed into a construction with a single part of speech (e.g. in an I-don’t-care-what-you-do attitude), this could be regarded as an instance of conversion. The present study will, however, follow the relatively common view that hyphens or concatenation in such constructions are an indication of a unified idea (cf. also Schmid Reference Schmid2011: 122) and categorise them as phrase compounds (cf. 2.5.2, 5.2 and Huddleston and Pullum Reference Huddleston and Pullum2002: 1660).
2.3 Compounds versus Multi-word Items
Since compounds consist of two or more free constituents, they need to be considered in relation to multi-word items, which always use open spelling. Jackson and Amvela (2002: 63–64) categorise compounds as one of the three main types of multi-word lexeme beside multi-word verbs and idioms.
Within the multi-word verbs, Quirk et al. (Reference Quirk, Greenbaum, Leech and Svartvik1985: 1150) distinguish phrasal verbs (bring up; sit down), prepositional verbs (call for; look at) and phrasal-prepositional verbs (check up on; get away with). While early English grammars often treat phrasal verbs as compounds and occasionally use hyphenation (e.g. came-in and takes-away in Solomon Lowe’s Reference Lowe1737 English Grammar Reformd …; cf. Sundby Reference Sundby, Ramisch and Wynne1997: 227), phrasal verbs are generally considered syntactic phenomena and exclusively spelled open in present-day English. The present study follows this established convention and excludes verbal combinations of verb and adverb and/or preposition from the category of compounds.
The category of idioms can be considered as consisting of “grammatical units larger than a word which are idiosyncratic in some respect” (Croft and Cruse Reference Croft and Cruse2004: 230). Some of the various existing idiom definitions (cf. Croft and Cruse Reference Croft and Cruse2004: 230–236) are gradient towards compounding, e.g. Cruse’s (Reference Cruse1986: 37) requirement for idioms to “consist of more than one lexical constituent” and to represent “a single minimal semantic constituent”. The presumably most common definition is that idioms are constructions whose meaning cannot be predicted from the meaning of the several orthographic words composing them (Palmer Reference Palmer1981: 36; cf. also Lipka Reference Lipka2002: 90), e.g. to kick the bucket, to bury the hatchet or to let the cat out of the bag (Jackson and Amvela Reference Jackson and Amvela2002: 65–66). Although idioms in this sense represent a semantic unit, they do not necessarily function grammatically like one: in contrast to the majority of compounds, idioms do not add inflection at their end. There is thus no past tense ?kick the bucketed (Palmer Reference Palmer1981: 80) – as against the past tense freeze-dried of the compound freeze-dry (whose first element is verbal and could carry inflection if the construction were a phrase). In view of the co-hyponymy of idioms and compounds, constructions which are sometimes referred to as idioms (e.g. a red herring) but take final inflection are considered compounds in the present study, while not excluding that they can be idioms at the same time.
Two additional types of multi-word item are discussed by Moon (1997: 45–47), namely fixed phrases (of course; at least; in fact; how do you do; excuse me; you know) and prefabs (the thing is that). These lexical chunks seem to have a very strong pragmatic function, particularly in spoken language, where their joint storage in the mental lexicon may save processing time.
Those fixed phrases which are formally whole sentences including a verb (e.g. How do you do?) and cannot be categorised as a single part of speech are not classified as compounds here. That the dividing line for the shorter fixed phrases is much more difficult to draw is reflected in their varying lexicographic treatment. For instance, the combinations in fact, by far and at least cannot be found in the electronic LDOCE, but the OED lists them as phrases. As a consequence, frequently recurring combinations of grammatical words need to be considered with particular care (cf. also Quirk et al. 1985: 672–673). Since concatenation of such combinations has occurred in the past, e.g. in the complex preposition into (cf. OED s.v. into), which is frequently regarded as a compound, and since change is still in progress (thus of course is lemmatised in some reference works such as LDOCE but only listed as a subentry in the OED), such combinations need to be treated following an item-based approach.
Erman and Warren (2000: 31) define prefabs as combinations of “at least two words favored by native speakers in preference to an alternative combination which could have been equivalent had there been no conventionalization” and distinguish the following subtypes:
lexical prefabs (idioms, compounds, habitual collocations, phrasal and prepositional verbs)
grammatical prefabs (e.g. quantifiers such as a great deal of; determiners such as that sort of; tense such as be going to; introductors such as there is)
pragmatic prefabs (e.g. discourse markers such as and then; feedback signals such as yeah quite; hedges such as I should think)
reducibles (e.g. it’s or I’m) as a category that they consider more debatable.
While this very general definition of prefabs results in a very heterogeneous group of types, the distinction of prefabs from compounds is simplified by the fact that compounds are explicitly classified as a subcategory of prefab by Erman and Warren (Reference Erman and Warren2000). Since they define words orthographically, “teacup spelt as one word would not be considered a prefab, but tea cup spelt as two words would” (Erman and Warren Reference Erman and Warren2000: 32). If one follows their definition, compounds should be considered prefabs, provided that they are spelled open and are sufficiently frequent. Based on that criterion, all open compounds listed in dictionaries can be considered prefabs – but not ad hoc compounds or relatively unestablished compounds.
Yet another addition to the set of multi-word items is constituted by the category of collocation, which overlaps to a large extent with fixed phrases and prefabs. There are two main definitions of collocation: the quantitative Firthian definition “[y]ou shall know a word by the company it keeps” (Firth 1957: 11) is based on lexical items’ high frequency of co-occurrence (cf. McEnery, Xiao and Tono 2006: 82) and is purely statistical. Hausmann’s (1985: 118) qualitative approach, by contrast, posits that a collocation such as to take a shower is a typical, specific and characteristic relation between a relatively context-independent base (shower) and a collocator (take) that can only be understood in relation to a particular base (cf. Hausmann 2004: 311–312). Within both approaches to collocation, compounding may be considered a special type of collocation: thus Quirk et al. (1985: 1537) refer to low-frequency compounds as a “collocation [that] seems relatively unestablished” and Hausmann (2004: 317) states that a subset of compounds can be interpreted as collocations: thus the base Dach ‘roof’ of German Schiebedach ‘sunroof’ is relatively straightforward and likely to be translated by a direct equivalent into other languages, whereas the collocator schiebe(n) ‘push’ is unpredictable and e.g. translated into French by the meaning component ‘open’ in toit ouvrant (literally, ‘opening roof’) and the even more distant sun expressing the concept in English sunroof. As a consequence, the present approach considers compounds part of the hyperonymous category of collocation, with frequency playing no clear delimitating role (as in the case of prefabs).
The longest type of multi-word item is represented by proverbs (Schmitt Reference Schmitt2000: 99), e.g. Out of sight, out of mind. Since they represent full sentences ending with a punctuation mark (usually a full stop) and cannot be assigned a single part of speech, they are very clearly different from compounds.
To sum up, there is no general consensus regarding the relation between compounding and phraseology in the literature. In their comparison between the two, Granger and Paquot (Reference Granger, Paquot, Granger and Meunier2008: 32–33) – who also provide a detailed categorisation of phraseological units (Granger and Paquot Reference Granger, Paquot, Granger and Meunier2008: 42–44) – find that phraseological approaches differ in their inclusion of compounds, with the traditional view tending to exclude either all or most compounds, whereas even hyphenated compounds may be categorised as multi-word units by alternative approaches. For a summary of the approach used in the present study, cf. 2.6.
2.4 Compounds versus Names
Names are a very special and easily recognisable category: they are always nouns, they are always capitalised and they refer to individual entities or people rather than having a generic meaning, the widely accepted view being that names “may have reference, but not sense” (Lyons Reference Lyons1977: 219). Since names may e.g. combine with other names to form longer names (e.g. two first names, such as Mary Jane, or a first name and a second name, e.g. Elvis Presley), the question arises whether such combinations should be considered compounds.
An important argument for the consideration of complex names as compounds is that their constituents may occur on their own and recur in the language. In combinations of first names, all three main compound spelling variants can be observed: there is open spelling (Mary Jane), hyphenation (Mary-Lou) and solid spelling (Maryanne). All three variants may coexist for a particular complex name (Mary Lou, Mary-Lou and Marylou), but in spite of the general liberties regarding the form of names, certain spellings seem to be avoided, e.g. solid ?Malcolmchristopher, which suggests that the principles applying to compounds (e.g. length, frequency etc.) may also be at work here. If two second names are combined (e.g. Henderson-Smith; cf. Morton Ball Reference Morton Ball1939: 30), both open and hyphenated spellings are possible in English (Soanes Reference Soanes2011). Combinations of first and second names, such as Malcolm Jones, only seem to use open spelling – which makes them similar to phrases.
Yet an informal query among four native speakers of different varieties of English suggests that the relation between combined first and second names only superficially resembles that between the constituents of determinative compounds: someone who failed to understand a famous actor’s first name in a conversation about films might ask: “Sorry, which Douglas did you mean? Kirk or Michael?”, but an analogous question with compounds would be “Sorry, which room did you mean? Bedroom or bathroom?” rather than ?“Sorry, which room did you mean? Bed or bath?” Furthermore, it is possible to ask for a surname (“Which Michael did you mean? Douglas or Jackson?”) but not for a compound’s head in a parallel question (?“Which bath did you mean? Room or tub?”). In view of this difference in structure and the fact that “either noun in a personal name may be used alone to indicate the person (object) referred to” in more or less intimate and symmetrical ways, the present study follows Morton Ball (1939: 30) in her classification of combinations of first and second name as appositions, which are typically noun phrases with reference identity (e.g. Anna and my best friend in the apposition Anna, my best friend, was here last night; cf. Quirk et al. 1985: 1300–1301). Names thus do not represent a unified category, and a distinction is made here between appositional combinations of first and second names and the combinations of two first names or two second names, which are considered compounds if the usual conditions for compound status apply (cf. also Morton Ball 1939: 30).
2.5 Compound Types
Much of the confusion regarding the delimitation of the category of compounds may be due to the fact that it potentially involves a large number of very different subtypes. The following sections present the various types which are commonly discussed in the literature, based on length, word formational structure, part of speech and spelling. Most accounts of compounding focus on noun+noun and noun+adjective constructions and otherwise frequently restrict themselves to “some major patterns” (Biber et al. Reference Biber, Johannson, Leech, Conrad and Finegan1999: 325). The following overview, by contrast, also attempts to provide a detailed discussion of minor compounding patterns, even though exhaustiveness may not have been achieved. Where the status of a category is controversial, the present approach follows Bauer’s (Reference Bauer1998: 65) tradition of the “lumper” by accepting as compounds all categories classified as such in the literature, as long as they are compatible with the preliminary compound definition given earlier.
2.5.1 Length
Many English compounds – in any case the most typical ones – contain only two constituents (Quirk et al. Reference Quirk, Greenbaum, Leech and Svartvik1985: 1567; Schmid Reference Schmid2011: 121), but each constituent of a determinative compound (cf. 2.5.2) may itself be a compound, e.g. [[motor + cycle] + [outlet + store]], and the existence of longer compounds seems to be generally accepted in the literature. Coordinative compounds (e.g. secretary-treasurer-editor or Metro-Goldwyn-Mayer; cf. Huddleston and Pullum Reference Huddleston and Pullum2002: 1648 and 2.5.2) can also consist of more than two constituents, and phrase compounds such as love-lies-bleeding (cf. 2.5.2) even have more than two constituents by definition. According to Quirk et al. (Reference Quirk, Greenbaum, Leech and Svartvik1985: 1567), English compounds can involve any number of constituents, but while there is no clear cut-off point, readers seem less prepared to accept a construction as a compound with growing complexity (Donalies Reference Donalies2003: 78). As a consequence, some scholars may disagree with Plag’s (Reference Plag2003: 133) example university teaching award committee member or Adams’ (Reference Adams2001: 79) claim that UK film industry task force appointment controversy represents a compound.
2.5.2 Structure
Table 2.2 summarises the various compound categories recognised in the present study from the perspective of compound structure. It draws on Adams (2001: 82), Bauer (1983: 30–31, 212–213, 233), Marchand (1969: 11–127, 380–389) and Quirk et al. (1985: 1570–1578).
Table 2.2 Compound types based on word formation
The status of phrase compounds is frequently disputed in the literature (e.g. by Meibauer Reference Meibauer2003: 185; Adams Reference Adams2001: 3; Plag Reference Plag2003: 136), because they are supposedly not formed according to the usual rules of word formation and rather similar to conversion (Quirk et al. Reference Quirk, Greenbaum, Leech and Svartvik1985: 1563, 1569). However, other accounts (e.g. Bauer Reference Bauer1983: 206–207) do recognise them as a type of compound. Among the small number of established phrase compounds, there are many family terms ending in -in-law, plant names (e.g. forget-me-not) and coordinated constructions (e.g. bread-and-butter; cf. Schmid Reference Schmid2011: 133–134). Since phrase compounds in general do not contradict the compound definition adopted here (cf. 2.6), and since some phrase compounds cannot be syntactic due to their ill-formedness (cf. the missing article in a pain-in-(the-)stomach gesture; Bauer Reference Bauer1983: 207), phrase compounds are also recognised as a subcategory of compounds in the present study.
The status of genitive compounds such as bull’s-eye (cf. also 5.1.3.1 and 5.5.1.2) is by no means undisputed, either (cf. Sauer Reference Sauer and Bammesberger1985: 309): they are frequently denied compound status (cf. Bauer Reference Bauer1983: 240–241) due to the fact that their constituents are linked by inflection, which makes them relatively phrase-like. However, interruption and reordering may result in semantic changes compared to phrases, e.g. if the compound bull’s-eye ‘target’ is contrasted with the modified phrases a bull’s blue eye and the eye is a bull’s. As a consequence, the approach adopted here follows Adams (Reference Adams2001: 80), who recognises a compound category “[n]oun-genitive s + noun”, whose members need not necessarily be as idiomatic as the example given earlier, e.g. potter’s wheel.
By contrast, a number of categories considered compounds by other accounts of English compounding (cf. the sources of Table 2.4) were not generally included, as they might contradict the preliminary definition:
Neoclassical compounds or combining form compounds such as biology consist of bound roots (bio-, -logy) by definition, which contradicts the requirement of freely occurring constituents for the whole category.
Rhyme-motivated compounds consist of two rhyming elements (e.g. hoity-toity). According to Bauer (Reference Bauer1983: 213), one of these may not exist independently, and judging from his examples, this may extend to both parts. While this contradicts the present approach’s requirement of free occurrence, rhyme-motivated compounds are accepted when their constituents occur on their own, e.g. in brain-drain.
Ablaut-motivated compounds consist of two elements differing only in their stressed vowel, e.g. shilly-shally. If at least one element does not occur freely (e.g. the first part of wishy-washy; cf. OED), they cannot be accepted as compounds here – but if both constituents do (e.g. riff-raff; cf. OED), such items are recognised as compounds.
Clipped compounds contain one or more clippings, e.g. optical art or situation comedy (Bauer 1983: 233–237). Since the clipping must have taken place after the compounding process in the foregoing examples (cf. 2.2.3), they are not compounds according to the present approach. However, if the constituents of such constructions occur on their own, usually as informal clippings (e.g. in language laboratory or parachute jump), they are considered compounds here.
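The free-occurrence requirement that decides the borderline categories above can be illustrated with a minimal sketch. It assumes a toy word list standing in for a dictionary lookup (the study itself refers to the OED); the set `FREE_LEXEMES` and the function name are hypothetical and serve illustration only.

```python
# Toy stand-in for a dictionary lookup (assumption: a real check would
# consult a reference work such as the OED for each constituent).
FREE_LEXEMES = {"brain", "drain", "riff", "raff", "washy", "black", "berry"}


def has_free_constituents(constituents, lexicon=FREE_LEXEMES):
    """Return True if every constituent occurs as a free lexeme in the lexicon."""
    return all(part.lower() in lexicon for part in constituents)


# Examples taken from the categories discussed above.
for parts in (["brain", "drain"], ["riff", "raff"],
              ["wishy", "washy"], ["bio", "logy"], ["cran", "berry"]):
    verdict = "accepted" if has_free_constituents(parts) else "rejected"
    print("+".join(parts), verdict)
```

The sketch covers only the formal requirement of free occurrence; the semantic criteria of the compound definition (cf. 2.6) would have to be checked separately.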
Last but not least, the existing compound categories from the literature could be complemented and/or refined by the following potential structural compound types:
By analogy to Aronoff’s (Reference Aronoff1976: 15) cranberry morph, the term cranberry compound could be coined to designate constructions such as cranberry, which contain constituents that are unique in the language but at the same time contrast with compounds containing two (or more) free constituents, e.g. black+berry or blue+berry. Since cran does not occur on its own in the language (except in linguistics texts pointing out this fact, as in the present sentence, which would skew any corpus search), this category cannot be part of the compound definition used here but might complement other types of approach.
The categories of copulative compounds and dvandva compounds could be further refined by considering some recurring but descriptively neglected patterns:
– In dictionary titles, the order of the constituents expresses directionality: thus English–Irish is interpreted to refer to a dictionary providing translation equivalents of English words into Irish, whereas the opposite is the case of Irish–English. The relation expressed is thus ‘from … into …’.
– In addition to the meaning component ‘against’, the order of the constituents in football matches expresses that the match takes place in the first of the two locations (e.g. Scotland in the Scotland–France match).
– In the term “yes-no interrogative clauses” (Quirk et al. 1985: 724), yes and no are connected by an ‘or’ relation instead of the usual ‘and’ relation found in copulative and dvandva compounds.
2.5.3 Part of Speech
The classification of compounds according to the parts of speech involved in their formation is of particular interest with regard to spelling conventions. The overview in Table 2.4, which draws heavily on Bauer (Reference Bauer1983: 201–216) and occasionally on Adams (Reference Adams1973, Reference Adams2001), Aarts (Reference Aarts2011: 34–35) and Tournier (Reference Tournier1985: 113–119), seeks to provide examples for all three spelling variants when these are attested as examples in the reference works on word formation or in the dictionaries used in the present study (particularly MED and LDOCE; cf. 4.1). Table 2.4 shows the large variety of compound types in English. It lists compounds with a maximum of three constituents and thus more types than most other accounts of English compound formation but is still far from being exhaustive. Usually, a single example is given, except in order to draw attention to large differences between the examples. Additional compound types were added whenever the compounds encountered in the dictionaries could not be classified into any of the existing categories. The names of some categories (e.g. gerund + n or n + deverbal noun) were modified (in this particular case into n-ing + n and n + n-ing), and numbers were classified as numerals rather than adjectives. Since the part of speech of compound constituents is often difficult to determine, the empirical study described here uses a very limited set of parts of speech, which subsumes all the grammatical parts of speech under a hyperonymous concept (cf. 5.6.1). In Table 2.4, however, an intermediate approach between the more traditional part-of-speech classification used for the whole compounds and that followed for the constituents in the empirical study is applied in order to do justice to the more detailed constituent-based compound type classifications found in the literature. Table 2.3 lists the parts of speech, phrases and clauses that are distinguished in Table 2.4. 
In addition, the genitive and the word and are used for recurring patterns.
Table 2.3 Labels used for compound classification in Table 2.4
| Label | Example | Explanation |
|---|---|---|
| active declarative clause | love lies bleeding | |
| adj | new | |
| adj-ed | masked (ball) | adjectives which are formally identical with a past participle – in compounds whose paraphrase tends away from a verbal meaning (not ‘the ball is masked’) |
| adv | long(-playing) | lexical adverbs (which may e.g. be formally identical with an adjective or are derived from adjectives by means of the suffix -ly) |
| det | an | corresponds to the category determiner in Quirk et al. (1985) |
| imperative clause | forget me not | |
| interjection | ha | |
| n | cable | |
| n-ing | (bicycle) repairing | nouns which are formally identical with a present participle |
| noun phrase | the cuff | |
| n-proper | Oxford | |
| num | two | |
| particle | out, so | grammatical words in several parts of speech: prepositions, conjunctions and grammatical adverbs |
| prepositional phrase | about town | technically the same as particle + n, but the label represents compound structure more accurately by indicating closer phrase-internal links between constituents |
| pron | something | |
| to-inf. clause | to be | |
| v | meet | |
| v-ed | (drug-)related | past participles of verbs – e.g. in compounds whose paraphrase has verbal meaning |
| v-ing | (not-)withstanding | present participles of verbs – e.g. in compounds whose paraphrase has verbal meaning |
| wh-clause | what’s it | |
Note that in Table 2.4, the simultaneous presence of open and solid spelling (e.g. in the triconstituent compound Wellington airport) is placed in the middle of the scale for want of a better location, whereas combinations of hyphenation with one of the other two types are situated in more iconic positions.
Table 2.4 Compound types based on part of speech
| PoS of compound | PoS of constituents | Open examples | Hyphenated examples | Solid examples |
|---|---|---|---|---|
| n | n + n | cable television | meter-maid | manservant |
| | | killer app | hunter-gatherer | spoonbill |
| | | MP3 player | | |
| n | n + n + n | law enforcement agent | | |
| n | n + ’s + n | mama’s boy | bull’s-eye | |
| n | n + n-ing | night flying | bicycle-repairing | |
| n | n-ing + n | fishing rod | | |
| n | n + n-proper | man Friday | | |
| n | n-proper + n | Oxford accent | | |
| n | n-proper + n + n | | Wellington airport | |
| n | n-proper + n-proper | Mary Jane | Cadbury-Schweppes | Marylou |
| n | n-proper + ’s + n | Adam’s apple | | |
| n | n + and + n | kith and kin | whisky-and-soda | |
| n | n + adj | heir apparent | knight-errant | |
| | | machine washable | | |
| n | n + adv | centre forward | | |
| n | n + particle + n | | morning-after pill | |
| n | n + particle | | looker-on | checkout |
| | | | passer-by | |
| n | n + prepositional phrase | man of God | mother-of-pearl | |
| | | man about town | | |
| n | n + to-inf. clause | | mother-to-be | |
| n | n + num | number one | | |
| n | v + n | install program | goggle-box | pickpocket |
| n | v + v | | make-believe | hearsay |
| | | | look-see | |
| | | | has-been | |
| n | v + v + v | | might-have-been | |
| n | v + and + v | meet and greet | | |
| n | v + adv | | get-together | lookalike |
| n | v + particle | sod all | drop-out | cookout |
| n | v + particle + n | lighting up time | | |
| | | | jumping-off point | |
| n | v + interjection | | heave-ho | |
| n | adj + n | new town | fast-food | software |
| | | | | outerwear |
| n | adj-ed + n | masked ball | | |
| n | adj + adj | | creepy-crawly | |
| n | adj-ed + pron | loved one | | |
| n | adj + n + n | hot cross bun | | |
| | | | hot-water bottle | |
| n | adj + adj + n | obsessive compulsive disorder | | |
| n | adj-ed + particle | | grown-up | |
| n | adv + v | | also-ran | |
| n | adv + v-ing + n | | long-playing record | |
| n | adv + v-ed + n | | ill-gotten gains | |
| n | adv + adj-ed + n | less developed country | | |
| n | adv + particle | | close-up | |
| n | pron + n | It girl | she-goat | |
| n | particle + n | in box | in-crowd | |
| | | | twice-winner | |
| n | particle + v | | to-do | |
| n | particle + particle | | once-over | |
| n | prepositional phrase + n-ing | | in-line skating | |
| n | num + n | | | hundredweight |
| n | num + pron | | | twentysomething |
| n | num + particle | | eleven-plus | |
| n | num + num | | one-two | |
| n | num + particle + num | | four-by-four | |
| n | det + particle + n | no through road | | |
| n | interjection + interjection | | hoo-ha | |
| n | active declarative clause | keep fit | I-spy | |
| | | | love-lies-bleeding | |
| n | imperative clause | | forget-me-not | |
| n | wh-clause | | | whatsit |
| v | n + n + v | | crystal ball-gaze | |
| v | n + v | hand wash | lip-read | brainwash |
| v | v + n | | | leapfrog |
| v | v + v | dare say | trickle-irrigate | typewrite |
| | | make do | stir-fry | |
| v | adj + n | | bad-mouth | deadhead |
| v | adj + v | warm iron | free-associate | whitewash |
| v | adv + v | | left-click | |
| v | num + v | | second-guess | |
| adj | n + n | | king-size | borderline |
| adj | n + and + n | | meat-and-potatoes | |
| adj | n + v-ing | | ocean-going | |
| adj | n + v-ed | | drug-related | |
| adj | n + adj | medium dry | lime-green | childproof |
| | | HIV positive | | |
| adj | n + particle | | bottom-up | |
| adj | n + prepositional phrase | | matter-of-fact | |
| adj | v + n | | roll-neck (sweater) | breakneck |
| adj | v + v | | stop-go (economics) | |
| adj | v + and + v | nip and tuck | kiss-and-tell | |
| adj | v + adj | | feel-good | |
| adj | v + adv | | go-ahead | |
| adj | v + particle | | see-through (blouse) | |
| adj | adj + n | | red-brick (university) | wholemeal |
| adj | adj + adj | | deaf-mute | |
| adj | adj-ed + particle | messed up | hoped-for | |
| adj | adj + particle | high up | | |
| adj | adj + prepositional phrase | hard of hearing | honest-to-goodness | |
| adj | adv + adj-ed | | | newborn |
| adj | adv + v | | long-stay | |
| adj | adv + v-ed + particle | | long-drawn-out | |
| adj | adv + adv | | | faraway |
| adj | adv + particle | | far-off | nearby |
| adj | adv + prepositional phrase | | just-in-time | |
| adj | pron + adv | | me-too | |
| adj | prepositional phrase | in depth (study) | before-tax (profits) | indoor |
| | | | off-the-cuff | |
| adj | particle + n | | off-centre | |
| adj | particle + v-ing | | | ongoing |
| adj | particle + adj | | all-important | |
| adj | particle + particle | | on-off | |
| adj | particle + and + particle | | out-and-out | |
| adj | num + n | | five-star | |
| | | | 16th-century | |
| adj | num + adj-ed | | one-sided | |
| adj | num + adv | | first-ever | |
| adj | num + particle + num | | nine-to-five | |
| | | | one-on-one | |
| adj | num + particle | | one-off | |
| adj | num + num | | fifty-fifty | |
| adj | det + n | | no-nonsense | |
| adj | det + v | | no-go | |
| adj | active declarative clause | | one-size-fits-all | |
| adv | n + particle | inside out | | |
| | | head first | | |
| adv | v + v | | | maybe |
| adv | adv + adv | | double-quick | |
| adv | adv + particle | high up | | nearby |
| adv | adv + det + adj | | | nevertheless |
| | | | | nonetheless |
| adv | particle + n | | | indeed |
| adv | particle + prepositional phrase | | up-to-the-minute | |
| adv | prepositional phrase | of course | off-the-record | indoors |
| adv | num + num | | fifty-fifty | |
| adv | det + n | no place | | meanwhile |
| adv | det + particle | | | nowhere |
| prep | adv + particle | | | nearby |
| prep | particle + v-ing | | | notwithstanding |
| prep | particle + particle | because of | | onto |
| pron | pron + n | | | somebody |
| pron | pron + pron | one another | | anyone |
| pron | det + n | | | nobody |
| pron | det + pron | no one | no-one | |
| conj | particle + adv | | | whenever |
| conj | particle + particle | so that | | |
| num | num + adj | | 80-odd | |
| num | num + pron | | | twentysomething |
| num | num + num | two hundred | twenty-two | |
| det | det + det | | | another |
| interjection | n + n | | | fiddlesticks |
| interjection | interjection + interjection | tsk tsk | uh-oh | |
| interjection | declarative clause | thank you | | |
| interjection | imperative clause | | | farewell |
The listing of part-of-speech-based compound types in Table 2.4 follows Marchand’s (1960a: 20) view that “[c]ompounding occurs in all word classes” by including compound verbs (which are e.g. not considered by Quirk et al. 1985), compounds formed from grammatical morphemes (which are denied compound status e.g. by Schmid 2011: 128) as well as compound adverbs, determiners, numerals and interjections. However, a comparison of the compounds found in the literature with the parts of speech in Quirk et al. (1985: 67–77) suggests that some parts of speech are never compounded, namely modal verbs, primary verbs and the members of the category ‘unclassified’ (i.e. the negative particle not and the infinitive marker to). Nonetheless, patterns which are missing from Table 2.4 are not automatically impossible (for a discussion of potential word formations, cf. e.g. Burgschmidt 1977; Bauer 2001): as a considerable number of patterns in Table 2.4 are based on the empirical study’s compounds with identical spelling in five to six dictionaries, future research is likely to yield even more compound types to add to the list.
2.5.4 Spelling
When separate lexemes or grammatical words are combined into compounds, a distinction has to be made in writing which is not required in oral speech (cf. Quirk et al. 1985: 1614): while varying vowel and consonant length, different degrees of loudness or pauses of different length may blur word or constituent boundaries in spoken language (e.g. between a nice drink and an ice(d) drink; Quirk et al. 1985: 1614), the “visual indicators of word limits” permit no gradience, and writers of English are usually forced to make an absolute decision between “total separation, hyphenation, and total juxtaposition” (Quirk et al. 1985: 1614), the three main types of English compound spelling.
The constituents of so-called open compounds (Merriam-Webster 2001: 99) are separated by one or more spaces, depending on the number of constituents (e.g. cable television and central nervous system). Alternative terms are spaced compound (Juhasz, Inhoff and Rayner 2005) or separated compound (Huddleston and Pullum 2002: 1759–1760), the first of which seems to be more common. Since – as we have seen – the “status of spaced two-word segments is uncertain” (Sundby 1997: 225), some accounts of English compound spelling (e.g. Morton Ball 1951: 3) and some dictionaries from the nineteenth and early twentieth centuries denied compound status to open sequences (cf. Morton Ball 1939), but most present-day treatments of compounding accept and include open compounds.
In hyphenated compounds (Merriam-Webster 2001: 99), the constituents are separated by one or more hyphens, depending on the number of constituents (e.g. hunter-gatherer and forget-me-not). The alternative terms hyphened compound, hypheme (Morton Ball 1951: 3), half-compound and occasional compound (Sundby 1997: 225) seem to be uncommon. While some psycholinguistic studies (e.g. Juhasz et al. 2005) restrict compound spelling to the distinction between open and solid spelling and do not consider hyphenated compounds, this is relatively unusual.
Solid compounds (Merriam-Webster 2001: 99), such as software and twentysomething, constitute orthographic words and thus uninterrupted sequences of letters (cf. Plag 2003: 4). They are the only type of compound that is not excluded by any compound definition. Strumpf and Douglas (1988: 52) use the alternative term closed compound, which is common compared to less established alternatives such as juxtaposed compound (Huddleston and Pullum 2002: 1759–1760), solideme (Morton Ball 1951: 3) or absolute compound (Sundby 1997: 225).
Of the three spelling variants, solid spelling is the only one that can be considered immaterial. The hyphen obviously has a form, and although the gap in open compounds is not filled, it occupies a space which could have been used otherwise. The three types of compound spelling in English make use of the application or non-application of two principles:
a) concatenation vs. non-concatenation
b) use of a hyphen vs. non-use of a hyphen.
The combination of these two options with their two possible values (i.e. the presence or absence of each feature) yields the three most common types of compound spelling in English, namely
1. Open (= non-concatenation + non-use of a hyphen), e.g. boy friend
2. Hyphenated (= concatenation + use of a hyphen), e.g. boy-friend
3. Solid (= concatenation + non-use of a hyphen), e.g. boyfriend.
A fourth variant can be derived from the logical combination of these parameters, namely non-concatenation + use of a hyphen, e.g. boy- friend. While this variant does not seem to occur on its own in the English language, there are contexts in which it may be used, namely in combinations of identically structured hyphenated compounds, the first of which is incompletely realised on the formal level because of the ellipsis of the shared second constituent, e.g. in car- and ship-owners (Morton Ball 1939: 96) or in boy- and girl-friends. Longer sequences are conceivable, with the proviso that the shared second constituent is always retained in the last of the compounds (cf. Morton Ball 1939: 96), e.g. in These are paragraph-, sentence-, or clause-boundaries (Nunberg 1990: 69). However, this phenomenon, which is termed “floating hyphens” by Butcher (1992: 154) and “elliptical compounds” by Morton Ball (1939: 96), is frequently regarded as undesirable (cf. Quirk et al. 1985: 1347), and Butcher (1992: 154) suggests its avoidance by rewording.Footnote 3
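The two binary features and the four combinations they generate can be made concrete in a small sketch. The following Python snippet is purely illustrative and not part of the present study's method: the names `Spelling`, `variant` and `features` are hypothetical, and the helper assumes a two-constituent compound whose constituents appear unchanged at the beginning and end of the written form.

```python
from enum import Enum
from typing import Tuple


class Spelling(Enum):
    OPEN = "open"
    HYPHENATED = "hyphenated"
    SOLID = "solid"
    FLOATING_HYPHEN = "floating hyphen"  # the fourth, logically derived variant


def variant(concatenated: bool, hyphen: bool) -> Spelling:
    """Map the two binary features onto a spelling variant."""
    if concatenated:
        return Spelling.HYPHENATED if hyphen else Spelling.SOLID
    return Spelling.FLOATING_HYPHEN if hyphen else Spelling.OPEN


def features(first: str, second: str, written: str) -> Tuple[bool, bool]:
    """Read both features off a written two-constituent compound.

    Assumption: `written` starts with `first` and ends with `second`;
    whatever stands between them is treated as the link.
    """
    if (len(written) < len(first) + len(second)
            or not written.startswith(first)
            or not written.endswith(second)):
        raise ValueError("written form does not match the constituents")
    link = written[len(first):len(written) - len(second)]
    concatenated = " " not in link   # no space between the constituents
    hyphen = "-" in link             # a hyphen occurs between the constituents
    return concatenated, hyphen


for form in ("boy friend", "boy-friend", "boyfriend", "boy- friend"):
    print(repr(form), variant(*features("boy", "friend", form)).value)
```

In this sketch, hyphenation counts as concatenation because no space intervenes between the constituents, which mirrors the feature assignment in the list above.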
While the present study focuses on the three most common ways of spelling English compounds, alternatives using punctuation marks other than the hyphen exist or existed in the past:
Between the fourteenth and eighteenth centuries, the equals sign < = > was sometimes used instead of a hyphen, at least in end-of-line hyphenation (McDermott 1990: 13). Especially in the United States, it is still used by proofreaders to “signify the instruction to insert a hyphen” and therefore now corresponds to an alternative variant with a different pragmatic focus (Clark 1990: 196).
A conventional (but not extremely frequent) alternative to the hyphen is the oblique stroke or slash < / >. Since slashes are not normally flanked by spaces (although usage may vary), Huddleston and Pullum (2002: 1731) define them as part of word-level punctuation, i.e. “the marking of word boundaries and the use of punctuation marks … within a word”. Slashes are used in very specific types of compound, e.g. some noun+noun compounds describing a double title or function, such as bar/restaurant (Merriam-Webster 2001: 100–101). Quirk et al. (1985: 1570) extend their use to coordinate compounds without restricting part of speech and give adjectival examples, e.g. aural/oral (approach). Interestingly, the slash has an additive function rather than its usual alternative function in these compounds. In addition, the oblique stroke may be preferred over the hyphen “where one or more elements consist of more than one word, e.g. Bedford/Milton Keynes boundary” (Butcher 1992: 152).
Single or double quotation marks may be used in the spelling of long English compounds (Morton Ball 1951: 6), e.g. those including a film title such as “Gone with the Wind” remake (Google search, June 2017). The pairwise occurrence of quotation marks (Meyer 1987: 4–6) opens a unified slot, whose closing is indicated by the second part of the pair. While quotation marks are clear delimiters of long components, they combine with spaces or hyphens and thus cannot be considered a completely new spelling variant.
In programming languages, underscores are often used in order to connect the parts of a lexical entity, e.g. window_id_format (cf. Venezky 1999: 41).
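As a rough illustration of the underscore as a linking device, the sketch below splits such an identifier into its constituents and re-spells it with the three linking options discussed above. The function names `constituents` and `respell` are hypothetical, and the assumption that every underscore marks exactly one constituent boundary is a simplification.

```python
from typing import List


def constituents(identifier: str) -> List[str]:
    """Split an underscore-linked identifier into its parts.

    Empty parts (from leading, trailing or doubled underscores) are dropped.
    """
    return [part for part in identifier.split("_") if part]


def respell(identifier: str, link: str) -> str:
    """Re-join the parts with another link: ' ' (open), '-' (hyphenated), '' (solid)."""
    return link.join(constituents(identifier))


name = "window_id_format"
print(constituents(name))
for link in (" ", "-", ""):
    print(repr(respell(name, link)))
```

The underscore behaves like the hyphen here in that it links the parts without a space, so that the whole sequence remains a single token for the programming language.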
From a systemic and logical point of view, any punctuation mark could be used to link the constituents of compounds – except those which usually indicate syntactic boundaries and need to be followed by a space. However, even though commas, full stops, colons, semi-colons, question marks and exclamation marks break up the unity of the compound, which would seem to prevent their use, there are some exceptions even here: thus the original compound list from the Longman Dictionary of Contemporary English (cf. 4.1) contained the two comma-separated items all-singing, all-dancing and two up, two down, and in the American spelling of Mr. Right, a compound-internal full stop occurs at the end of the abbreviated first constituent. Furthermore, in metalinguistic language use, compounds with undetermined spelling may be concatenated with a question mark between the constituents (Rot?wein ‘red?wine’; Jacobs 2007: 54).
Yet another possible means of marking constituent boundaries is the modification of the standard font: thus it is customary to italicise foreign phrases within compounds (cf. Fowler’s 1921: 10 example “an ex officio member”), and superscript is occasionally combined with solid spelling when chemical elements and figures are conjoined (e.g. Sr90; GPO Style Manual 2008: 82).
2.6 Summary
The discussion and analysis of the spelling of English compounds are complicated by the fact that compounds represent a very heterogeneous category which seems to defy a general and generally accepted definition. As a consequence, research on English compounds is typically based on the respective scholar’s own definition. Based on the previous sections’ discussion of how compounds can be distinguished from syntactic constructions, other lexemes, multi-word items and names, the preliminary definition can now be refined: the present study defines English compounds as complex lexemes which
refer to a unified semantic concept
consist of at least two constituents that occur as free, synchronically recognisable and semantically relevant lexemes each
contain no affixation on the highest structural level
can be assigned a joint part of speech
cannot be interrupted by the insertion of lexical material
only once permit the application of each type of inflection to their base form
and (which is only important for specific subsets)
do not follow the pattern ‘name + and/or + name’ or its variants with more constituents
are not verbs consisting of a verb followed by an adverb and/or a preposition
do not combine a personal name and another lexical entity with reference identity.
It is interesting to observe the large proportion of syntactic criteria used in the delimitation of compounds, which means that syntactic criteria are applied to delimit the boundaries of a particular type of lexeme in contrast to syntactic constructions – i.e. to distinguish compounds from phrases. Conversely, and as we have seen, lexical criteria (such as orthography and stress) may be used to distinguish phrases from compounds.
The detailed definition given earlier is unusual in its combination of very general principles to follow and very specific patterns to avoid. In spite of its relative precision, the definition cannot prevent a certain degree of overlap with other categories – as in other accounts of compounding, which usually conclude that there is a continuum from obvious compounds to obvious syntactic groups (cf. e.g. Schmid 2011: 133; Bauer 1998: 83; Mondorf 2009: 381; Quirk et al. 1985: 1570): while a small number of the adjacent categories discussed earlier (e.g. acronyms and proverbs) can be clearly set off from compounds, compounds cannot be distinguished from collocations, since the category of compounds is hyponymous to that of collocation and thus completely integrated into it. More commonly, however, there is partial overlap, e.g. with conversion: while most compounds are not conversions (e.g. the noun bed+room), and while many conversions are not compounds (e.g. the verb to bottle), the part-of-speech assignment in phrase compounds (e.g. the adjective do-it-yourself) can be likened to conversion. Furthermore, the presence of complex bases in shortenings (such as clippings or back-formations) may result in the emergence of borderline cases with simultaneous category membership (e.g. language+lab and to baby+sit), so that only simple shortenings can be set off from compounds. While the present study’s wide compound definition results in borderline cases with gradience towards collocation or valency construction (e.g. the interjection thank you or the preposition because of), this does not affect the central empirical results, which are based on a sample of nouns, verbs, adjectives and adverbs and should therefore also be compatible with narrower compound definitions that e.g. do not accept grammatical compounds.
Table 2.5 summarises the defining criteria for compounds. Those introduced by because apply to all compounds, whereas criteria introduced by if only apply to some compounds (e.g. those with fore-stress) when contrasted with a particular adjacent category.
Table 2.5 The delimitation of compounds
A fundamental question with regard to the definition of the compound concept is whether there is an essence of compoundhood, from which all other criteria can be derived. A criterion which is so basic that it cannot be avoided is the exclusive composition from at least two free and recurringFootnote 4 constituents on the highest level of analysis. Another central starting point is the presence of a unified semantic idea (cf. also Lipka’s 1977: 155, 161 and Schmid’s 2008 discussion of hypostatisation), since the consideration of a construction’s status as a compound would be futile if it lacked a unified semantic concept. Furthermore, compounds are inherently characterised by their function as single minimal syntactic units. While these basic requirements permit the direct derivation of some compound criteria (e.g. the assignment of a joint part of speech and the single addition of each type of inflection based on the unified syntactic function, or the interpretation of orthographic unity as a formal reflection of semantic unity), other compound criteria cannot be derived from the more basic criteria (e.g. fore-stress or syntactic ill-formedness). We can therefore distinguish between defining criteria, i.e. “conditions that have to be fulfilled” in order to achieve category membership, and classifying criteria, which are “applicable to varying degrees to different kinds of” category members (Handl 2008: 51). While all of the present account’s defining criteria are listed in the compound definition given earlier, the other criteria discussed in this chapter (e.g. fore-stress, right-headedness or listedness) are optional and therefore only mentioned in the complementary category delimitation in Table 2.5 (if at all).
The difficulties in distinguishing compounds from other constructions can be attributed to the fact that there is often no reversible relation: thus criteria such as stress or spelling can frequently be used to delimit the contrasting category, e.g. that of phrases (with no fore-stress, solid spelling or hyphenation in syntactic constructions), but the criteria in question do not apply to all compounds, as some items considered compounds by the present approach and various other researchers (e.g. apple pie) have back stress and open spelling, like phrases.Footnote 5 Nonetheless, many of these less distinctive criteria (including the unified semantic concept) can still contribute to the categorisation of individual items by means of clustering: the more compound criteria beyond the defining ones apply to a construction, the more indisputable the construction’s status as a compound.
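The interplay between defining and classifying criteria can be pictured with a minimal sketch. The Python snippet below is only an illustration under stated assumptions and not a procedure used in the present study: the names `Candidate`, `is_compound` and `cluster_score` are hypothetical, the criterion labels merely abbreviate the definition given earlier, and the fore-stress value for software is an assumption made for the sake of the example.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Candidate:
    form: str
    defining: Dict[str, bool]     # conditions that all have to be fulfilled
    classifying: Dict[str, bool]  # optional criteria, applicable to varying degrees


def is_compound(c: Candidate) -> bool:
    """Category membership requires every defining criterion to hold."""
    return all(c.defining.values())


def cluster_score(c: Candidate) -> int:
    """Count the optional criteria that apply; a higher count means a clearer case."""
    return sum(c.classifying.values())


# Hypothetical shorthand for an item that meets every defining criterion.
ALL_DEFINING = {
    "unified_semantic_concept": True,
    "two_free_constituents": True,
    "no_top_level_affixation": True,
    "joint_part_of_speech": True,
    "uninterruptible": True,
    "single_inflection": True,
}

# apple pie: back stress and open spelling, as stated in the text.
apple_pie = Candidate("apple pie", dict(ALL_DEFINING),
                      {"fore_stress": False, "non_open_spelling": False})
# software: solid spelling as in Table 2.4; fore-stress is assumed here.
software = Candidate("software", dict(ALL_DEFINING),
                     {"fore_stress": True, "non_open_spelling": True})

for c in (apple_pie, software):
    print(c.form, is_compound(c), cluster_score(c))
```

Both items count as compounds in this sketch, but only the second gains additional support from the optional criteria, which corresponds to the clustering idea described above.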