Plain language summary
Metrical poetry (poetry with a designated pattern of stresses, lengths, or some other organizing principle, as opposed to “free verse”) is connected to both culture and language. Different kinds of meters carry individual meanings and associations, layered on top of the words – certain meters might suggest “heroic” or “funny” or “impressive.” Poems, and so meters, spread among people (or, more broadly, between cultures), and meters change over time; we wanted to study this. However, this is difficult because of the huge differences in how meters work across history and between languages. Since these ideas of change and spread suggest an evolutionary metaphor, we tried using techniques from biology and represented meters like DNA sequences. This turned out to work well. In this article, we explain how and why, and present some interesting findings drawn from very different kinds of metrical poetry throughout history. We show that our method is sensitive to minute metrical differences by tracing an odd poem by the ancient poet Catullus. It also proves useful in detecting broad structural changes in a meter that spread across a dozen European languages during the Renaissance. Finally, we show it can simultaneously handle different modern meters in different languages, paving the way to a global history of poetic forms.
Introduction
All poetic forms come from somewhere, even if, perhaps because of their familiarity, it is natural to think they have been with us forever. Indeed, a prosodic template can be copied in a culture for centuries or millennia; but it can also be altered by individuals, imported from foreign traditions, adapted from vernacular folk songs, or fundamentally changed by the pressures of language evolution. Yet these relationships are notoriously difficult to trace across languages and times: much of the evidence is too fragmentary, or depends too heavily on written heritage or on modern ethnographic observations. National schools of metrical studies are also disparate, and rarely agree with each other, while literary scholarship over the course of the 20th century distanced itself from metrics, which are now more closely associated with linguistics and phonology. Research on the history of versification that was built on top of historical linguistics in the works of Antoine Meillet (1923), Roman Jakobson (1929) and, later, Mikhail Gasparov (1996), was in decline, but interest is presently resurgent, following the expansion of computational and quantitative methods that allow new angles of inquiry (De Sisto 2023; Polilova 2018; Šeļa, Plecháč, and Lassche 2022).
This work is a continuation of the historical line of thinking about meters and metrical variation, and aims primarily to describe relationships between the forms of poetic texts. We use local sequence alignment to identify regions of structural similarities in poems; the method relies on encoding poetic texts as strings of prosodic features using a simple four-letter alphabet that can be used with any language and is independent of the organizing verse principle (quantitative, syllabic, accentual). These sequences are then aligned to derive a distance measure based on weighted symbol (mis)matches. The resulting relationships do not inherently imply any evolutionary connection between texts, but the structural similarity is illuminating: it might identify and localize emerging metrical derivatives, show continuity in forms across times and traditions, or signal formal descent, which can be useful in historical studies. While the prosodic and structural features of verse today are increasingly used for authorship attribution (Nagy 2021; Plecháč 2021), this work focuses on detecting patterns that transcend individual variation and shape cultural history on a large scale: the often unseen, but powerful force of poetic meters.
Poetic meters organize speech into periodic units, causing utterances to be perceived in systemic relation to each other (Frog 2021). Meters can be defined based on various prosodic systems, the most familiar being patterns of stressed and unstressed syllables (accentual-syllabic), but they can also be defined based on syllable length (quantitative meters, as in classical Latin), or simple syllable count (isosyllabic meters in Romance languages). In addition, prosody is only one component of what some might call a “meter.” Old English verse specifies both a prosodic pattern and a system of alliteration, and, to take a better-known case, many rhyming forms are defined by both prosody and sonic correspondences.
Meters act as rules, with varying levels of fuzziness and variation, undoubtedly constrained by cognition and languages’ natural prosody (deCastro-Arrazola 2018a; 2018b; Rubin 1995). These rules, even when scholars and poets formulate them prescriptively, depend on shifting social conventions and are always tested against the affordances of a language (Shapir 2000). This entails variation in the poetic realization of abstract patterns: extra-metrical stresses, additional syllables and other irregularities test the boundaries of metrical inertia and audience expectation. Metrical patterns are reaffirmed by individual usages that copy forms from the past: taken together, poems reveal governing tendencies that are not easily overwritten by individual flair. This important cultural aspect of metrical forms has prompted research into bottom-up approaches to meter recognition and verse regularity (Plecháč and Birnbaum 2023; Porter 2018; Šeļa and Gronas 2022) that supports clustering according to the shared structural principles that emerge from data. This descriptive approach might be considered a counterpoint to generative frameworks that abstract metrical variation under the premise of “deep structures” and language-specific “correspondence rules” (Fabb and Halle 2008), drawing away from the historical and cultural component of poetic meters.
We evaluate our method on the non-trivial task of meter recognition, using a labelled cross-linguistic corpus, showing the method’s ability to recognize similarity in metrical organization simultaneously across different languages. Classification, however, is not our primary goal: we highlight the method’s potential for historical research in three case studies that span multiple languages, versification systems and time periods (from classical Latin quantitative verse to accentual-syllabic Romantic poetry). In the hope that the methodology will find further use in the community, we release an accompanying metronome library for Python.Footnote 1
Method
The guiding metaphor for the development of the Metronome tools was the idea of the genome, and the four nucleotide bases of DNA that are the basis for all cellular life. While consisting of only four symbols, genomes vary hugely in length and are capable of expressing minor variation, such as among human siblings, or vast differences – the same symbols define a cactus, a dolphin or a hummingbird. In the same way, the alphabet for the metronome was chosen to be as simple as possible, allowing the same kinds of analyses that are performed with DNA: comparison between very disparate samples, grouping related families into clades or clusters, and expressing or locating changes over time. We use the following symbols: “S” and “w” for strong and weak syllables, “.” for word breaks, and “|” for the end of a line (the specific glyphs are only mnemonics; any would work). For example, the metronome of the line “Once upon a midnight dreary, while I pondered, weak and weary” might be expressed as S.wS.w.Sw.Sw.S.w.Sw.S.w.Sw.|. Defining prosody in this way (strong/weak vs stressed/unstressed) allows the same alphabet to encode quantitative traditions, like classical Latin, as well as modern accentual traditions. Note that, for simplicity, we mostly rely on orthographic boundaries between words, not on phonological units. However, this is not a requirement for the method to work; it depends on the encoding choices: for traditions where elision is important, such as classical Latin or some Romance forms, a word break can be omitted in the metronome representation – see, e.g., Note 3 for how this was handled for Renaissance meters.
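To make the encoding concrete, here is a minimal Python sketch that assembles a metronome string from per-word strength patterns. It assumes the scansion has already been done; the function names `encode_line` and `encode_poem` are illustrative only and are not taken from the metronome package.

```python
from typing import List

STRONG, WEAK, WORD_BREAK, LINE_END = "S", "w", ".", "|"


def encode_line(words: List[str]) -> str:
    """Join per-word strength patterns into one metronome line.

    Each item is the scanned pattern of one word, e.g. "wS" for "upon".
    A word break follows every word; the line-end symbol closes the line.
    """
    for pattern in words:
        if not pattern or set(pattern) - {STRONG, WEAK}:
            raise ValueError(f"bad syllable pattern: {pattern!r}")
    return "".join(p + WORD_BREAK for p in words) + LINE_END


def encode_poem(lines: List[List[str]]) -> str:
    """Concatenate encoded lines into a single sequence for alignment."""
    return "".join(encode_line(line) for line in lines)


# "Once upon a midnight dreary, while I pondered, weak and weary"
raven = ["S", "wS", "w", "Sw", "Sw", "S", "w", "Sw", "S", "w", "Sw"]
print(encode_line(raven))  # S.wS.w.Sw.Sw.S.w.Sw.S.w.Sw.|
```

Omitting a word break where elision applies would simply mean merging two adjacent patterns into one item before encoding.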
In bioinformatics, there are several strategies for sequence alignment, depending on the task. For genomes (or metronomes) that are very different in length and possibly very dissimilar, the best way to compare sequences is with a “local sequence alignment.” This approach finds the shared region between two sequences that is the most similar. In contrast, a “global sequence alignment” is a better choice where the strings are closely related. We apply the well-known Smith–Waterman algorithm (Smith and Waterman 1981) for local sequence alignment, which uses a flexible alphabet and allows different bonuses and penalties for symbol (mis)matches. Smith–Waterman is agnostic regarding the sequence alphabet – it is commonly applied both to nucleic acid sequences (four symbols) and to proteins (20 symbols) – and produces a match score that depends on a bonus/penalty matrix. In our work, the best local match score is normalized using the self-score of the shorter sequence, producing a comparable similarity (or, as its complement, a distance) in [0,1].
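The normalization step can be illustrated with a small pure-Python sketch of Smith–Waterman scoring. This is not the package's implementation: it assumes uniform match/mismatch values and a linear gap penalty (all three numbers are placeholders), and the function names are hypothetical.

```python
def local_score(a: str, b: str, match: float = 1.0,
                mismatch: float = -1.0, gap: float = -1.0) -> float:
    """Best Smith-Waterman local alignment score (linear gap penalty)."""
    prev = [0.0] * (len(b) + 1)
    best = 0.0
    for ca in a:
        cur = [0.0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # Local alignment: a cell never drops below zero.
            cur.append(max(0.0, diag, prev[j] + gap, cur[j - 1] + gap))
            best = max(best, cur[j])
        prev = cur
    return best


def similarity(a: str, b: str) -> float:
    """Local score divided by the self-score of the shorter sequence."""
    shorter = a if len(a) <= len(b) else b
    self_score = local_score(shorter, shorter)
    return local_score(a, b) / self_score if self_score else 0.0


def distance(a: str, b: str) -> float:
    return 1.0 - similarity(a, b)


print(similarity("SwSw", "wSwS"))  # 0.75: the shared region "wSw" scores 3 of 4
```

With a uniform match bonus, no local alignment can outscore the shorter sequence aligned against itself, which is why the ratio stays within [0,1].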
The general intuition behind the bonuses and penalties (which determined the initial values) was based on empirical understanding of poetic meter – for example, matching line lengths are rewarded more than matching syllable strengths, and mismatches between word breaks and syllables are not penalized at all (although matching word breaks are rewarded, which allows the algorithm to group schemes that have customary caesura positions, discussed more in Appendix A). To optimize the bonus/penalty matrix, we then performed a series of iterative classification tasks on the cross-language dataset, and visually inspected the clustering results of several corpus subsets with known metrical schemes. The metronome package leverages methods from the Biopython package (Cock et al. 2009), which provides highly optimized routines for sequence alignment. Our code adds an assortment of sampling strategies, with the core of the package being a parallel method that produces an $n \times n$ distance matrix from $n$ input samples (which have been encoded as metronomes). These distance matrices can be used directly as inputs to hierarchical clustering / dendrogram methods (such as hclust in R) or to spatial clustering methods like UMAP (using a “pre-calculated” metric).
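As a rough illustration of the matrix-building step, the following sketch fills a symmetric $n \times n$ matrix from any pairwise distance function. It is serial rather than parallel, and `standin_distance` uses Python's `difflib` purely as a placeholder: it is not the alignment-based Metronome distance. Both function names are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import Callable, List, Sequence


def standin_distance(a: str, b: str) -> float:
    # Placeholder only: difflib's ratio is NOT the Metronome alignment score.
    return 1.0 - SequenceMatcher(None, a, b, autojunk=False).ratio()


def distance_matrix(samples: Sequence[str],
                    dist: Callable[[str, str], float] = standin_distance
                    ) -> List[List[float]]:
    """Symmetric n x n matrix; each unordered pair is computed once."""
    n = len(samples)
    matrix = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        matrix[i][j] = matrix[j][i] = dist(samples[i], samples[j])
    return matrix


poems = ["wS.wS.wS.wS.|", "w.Sw.S.wS.wS.|", "Sw.Sw.Sw.Sw.|"]
for row in distance_matrix(poems):
    print([round(value, 2) for value in row])
```

Because only the upper triangle is computed, the pairs are independent of each other and could be distributed across worker processes.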
Evaluation
We evaluate the performance of the algorithm in a non-trivial cross-linguistic task by looking at clustering performance on samples of individual poems in twelve meters, and coming from four languages (Czech, German, Russian and Classical Latin). We compare this performance to three baselines: simple edit distance, local alignment with an unweighted substitution matrix, and a machine-learning approach based on the frequencies of prosodic n-grams.
Data and sampling
Our modern language subset (67,940 works) covers the six most common meters: iambic and trochaic pentameter, tetrameter, and tetrameter/trimeter (alternating lines of tetrameter and trimeter, sometimes called “common meter” in English). The Czech subset (44,416 works), German subset (15,172 works) and Russian subset (8,352 works) come from the PoeTree dataset (Plecháč et al. 2023). Metrical annotation in Czech was performed using machine learning methods, and the output was manually verified (Plecháč 2016); in German it was produced using a rule-based algorithm (Bobenhausen and Hammerich 2015); a similar approach was implemented for Russian, using the accentual classes of words to infer rhythm. The Classical Latin subset (2,588 works) covers the six most common meters from that corpus: elegiac couplets, hendecasyllables and stichic hexameter (making up the bulk of samples and authors), as well as a few senarii, scazons and alcaics. The largest language / meter set is Czech iambic pentameter (over 17,000 works) and the smallest is Latin alcaics (38 works, all from Horace).
Since the samples are of such disparate size, we downsampled to just 10 examples of each language / meter combination, yielding 240 random samples per evaluation run. We then classified samples by meter (twelve meter labels) which implies a random baseline accuracy of about 8 percent.
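The sampling arithmetic can be checked with a short sketch: 18 modern language / meter combinations plus 6 Latin ones give 24 groups, hence 240 samples at 10 per group. The code assumes records are `(language, meter, text)` tuples; the toy corpus and the `downsample` helper are illustrative and do not come from the paper's repository.

```python
import random
from collections import defaultdict


def downsample(records, per_group=10, seed=0):
    """Draw up to `per_group` random items per (language, meter) pair."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for lang, meter, text in records:
        groups[(lang, meter)].append(text)
    sample = []
    for lang, meter in sorted(groups):
        items = groups[(lang, meter)]
        picked = rng.sample(items, min(per_group, len(items)))
        sample.extend((lang, meter, text) for text in picked)
    return sample


MODERN = ["iambic pentameter", "iambic tetrameter",
          "iambic tetrameter/trimeter", "trochaic pentameter",
          "trochaic tetrameter", "trochaic tetrameter/trimeter"]
LATIN = ["elegiac couplets", "hendecasyllables", "hexameter",
         "senarii", "scazons", "alcaics"]

# Toy corpus with placeholder texts, just to exercise the sampler.
corpus = [(lang, meter, f"{lang}-{meter}-{i}")
          for lang in ("Czech", "German", "Russian")
          for meter in MODERN for i in range(40)]
corpus += [("Latin", meter, f"Latin-{meter}-{i}")
           for meter in LATIN for i in range(38)]

sample = downsample(corpus)
print(len(sample))                          # 240
print(len({meter for _, meter, _ in sample}))  # 12 labels, so chance is 1/12
```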
Classification methods
For the alignment-based methods, which operate in a similar way to Metronome, we created an $n \times n$ distance matrix (alignment distance), to which we applied a k-Nearest Neighbors (kNN) classifier (k=7). These alignment-based methods are as follows. For the Naïve sequence aligner, we take the best local alignment, but without permitting indels (insertions or deletions inside the aligned areas). This is similar to a local Hamming distance, and is a very unforgiving alignment (since word breaks are included), used only as a baseline. Next we added indels to the Naïve aligner, making it more like a local Levenshtein distance but with uniform match/mismatch bonuses and penalties – this is similar to the way local sequence aligners would work on DNA sequences. Finally, we used the full Metronome configuration with a variable penalty matrix. The fourth method, and the only one that was not alignment-based, was a standard machine-learning approach, in which we applied an SVM classifier to the frequencies of the 500 most-frequent symbol 9-grams, using an 80/20 train/test split. Nine-grams were chosen because the symbol alphabet is very small, and so it is advantageous to the classifier to be able to match almost an entire line. Based on testing, the best results were obtained with the 500 most-frequent 9-grams, reduced by SVD to 50 dimensions and then z-scaled. It must be noted here that the SVM classifier is hampered greatly by the presence of word breaks. For most meters, word breaks are not part of the metrical constraints; they are only relevant to some questions of style and to a few traditions that have mandatory or customary caesurae in certain places (they are very important in classical Latin and also in modern alexandrines, for example). In other words, in terms of predicting the name of the metrical pattern, the word breaks are noise. If the aim were simply to create an optimal classifier, machine-learning approaches that simply discard the word breaks perform at least as well as Metronome (and are much faster), at the expense of nuance and interpretability.
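For readers unfamiliar with classifying from a precomputed distance matrix, a minimal kNN sketch follows. It assumes leave-one-out evaluation, which is a choice made here for brevity and not necessarily the protocol used in the evaluation; the helper names and the toy one-dimensional "distances" are hypothetical.

```python
from collections import Counter


def knn_predict(dist_row, labels, k=7, exclude=None):
    """Majority label among the k nearest samples in one matrix row."""
    order = sorted((d, i) for i, d in enumerate(dist_row) if i != exclude)
    votes = Counter(labels[i] for _, i in order[:k])
    # On a tied vote, the label seen first (i.e. nearest) wins.
    return votes.most_common(1)[0][0]


def loo_accuracy(matrix, labels, k=7):
    """Leave-one-out accuracy: each sample is classified by the others."""
    hits = sum(knn_predict(row, labels, k, exclude=i) == labels[i]
               for i, row in enumerate(matrix))
    return hits / len(labels)


# Toy data: two well-separated groups on a line stand in for two meters.
points = [0, 1, 2, 10, 11, 12]
labels = ["iamb"] * 3 + ["trochee"] * 3
matrix = [[abs(p - q) for q in points] for p in points]
print(loo_accuracy(matrix, labels, k=3))  # 1.0
```

Note that k must stay small relative to the group size: with only three samples per label, k=7 would let the other group outvote the true neighbors.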
Results
The classification tests were run 50 times each, and the full distributions and medians are reported in Figure 1. As discussed above, the unexpectedly poor performance of the SVM classifier (usually a solid benchmark) is mainly due to the presence of word breaks, but it is also clear that the custom penalty matrix used in Metronome provides a solid increase in performance across languages and traditions, compared to the standard string aligner. Classification accuracy, however, is merely a proxy for the real utility of Metronome as a tool to understand metrical relationships between poems. In the following section we provide a few small examples of this potential.

Figure 1. Classification performance of Metronome vs SVM and two baseline alignment algorithms.
Note: Smoothed distribution of accuracy results over 50 random subsamples, with median scores.
Showcases
Showcase A: Mutations in classical Latin meter
The first showcase simply highlights the sensitivity of the metronome to minor metrical variations. During a manual inspection of the dendrogram of the poems of Catullus, it was observed that one hendecasyllable poem, Catullus Carmen 55, was quite different to all of the others of that meter. This can be seen in Figure 2 – notice how poem 55 branches off from the rest of the hendecasyllable group much “earlier” (indicating more difference), and yet is still clearly part of the overall clade. Thinking that this might be an algorithmic error, we inspected the text, and discovered that 55 is a metrical oddity. In 55, Catullus often collapses the central double breve of the choriamb (– ∪ ∪ –) to form a molossus (– – –), yielding a line of ten syllables instead of the standard 11 (hendecasyllable is, literally, “11 syllables”). This variation can be seen in Figure 3, which shows the start of 55, alongside its metronome transformation, contrasted with 41, a normal hendecasyllable with a standard choriamb in every line. This license occurs only twice more in all of Catullus (in 58b), making 55 a very odd animal indeed.Footnote 2 It is precisely this kind of “loose” family grouping, based on the meters as they are written, not as they are defined by textbooks, that highlights the exciting potential of the metronome as a tool to provide insights from a “distant reading” of poetic meter.

Figure 2. Cladogram of a selection of poems by Catullus. Carmen 55, while still composed in hendecasyllables, is visibly different to the rest of that clade.

Figure 3. A visual comparison of the metronome strings (formatted to add line breaks) for the beginning of Carmina 55 (variant with collapsed choriamb) and 41 (standard hendecasyllable).
The raw data for this showcase were provided by David Chamberlain, who maintains an extensive digital selection of fully scanned Greek and Latin verse under an open license (Chamberlain 2023). Minor post-processing was all that was required to convert it to the Metronome string format.
Showcase B: European diffusion of a “Renaissance meter”
The second showcase focuses on a typological comparison of different implementations of Renaissance meter. During the Renaissance, a new metrical form spread across Europe by replacing previous forms, or by causing readjustments to the pre-existing templates. The meter was characterized by an obligatory stress on the 10th syllable of a line; in Italian, because of the penultimate stress in words, lines usually ended with one additional unstressed syllable, making an 11-syllable line. This meter – the endecasillabo – became widely known in Europe through Petrarch’s poetry. Italian, and likely French, isosyllabic examples later shaped the staple meter of modern English: the iambic pentameter.
In each tradition that adopted the innovation, the meter underwent some changes and adaptations to the recipient language and versification style, which led to different implementations of the same poetic form (De Sisto 2020). The new meter probably originated in the Occitan tradition (Beltrami 1986; Billy 2000; Di Girolamo and Fratta 1999) and first reached Italy from there. The Europe-wide renown of Petrarch and his endecasillabo quickly spread the new meter to other Romance traditions, like Spanish, and vernaculars of the Italian peninsula, like Neapolitan, Venetian and Sicilian. French poetry was already using decasyllabic verse (amour courtois) (Hudson 1919), but this form was adjusted to the Italian trend of composing sonnets in endecasillabo (Hudson 1919; Key, Cave, and Bowie 2006), before being replaced by the alexandrine meter. Occitan verse also influenced Catalan and Portuguese poetry, which already had a pre-existing decasyllabic form.
Later in the Renaissance, this new metrical form spread independently to English and to Dutch poetry. Subsequently, the Dutch Renaissance meter influenced German and Frisian poets. All West Germanic adaptations developed strict iambic cadence from the contact with isosyllabic forms. This was the beginning of modernity in European poetry: the rise of accentual-syllabic meters everywhere east of France.
The metrical alignment (see Figure 4) of small samples from many Renaissance traditions clearly distinguishes between the three previously described ‘directions’ of adoption of the form: (1) Italian peninsula and Spanish; (2) Occitan, Catalan and French; and, finally, (3) Germanic traditions that pivoted to regular iambic meter (the values on the graph show how ‘free’ the samples are with regard to the placement of stresses – note the low values in the Germanic samples, suggesting the prominence of the accentual pattern). While the finer historical relationships between traditions may remain opaque to this method (which depends on the individual samples, and technical issues with word boundaries like synalepha),Footnote 3 overall it captures the key structural and linguistic similarities emerging from the implementation of the same form.

Figure 4. A metronome-based cladogram of various samples of Renaissance meter.
Note: The inset number is the entropy-based variability from the regular metrical form (see Šeļa and Gronas 2022). Shakespeare is the most regular, de La Torre the least.
The data employed in this showcase were originally compiled by Mirella De Sisto (2020); the samples were selected from the works of poets who can be considered representative of the Renaissance tradition of their respective languages. Manual annotation was performed in collaboration with experts in the phonology and metrics of the respective languages.
Showcase C: Modern expansion: accentual-syllabic verse
During the 18th and 19th centuries, foot-based accentual-syllabic verse, which emerged through interaction with isosyllabic Renaissance meter, rapidly spread from Germany to the North and East (Gasparov 1996, 207–9; Kazartsev 2022) and was critical for establishing newly emerging national literatures. Through highly influential (and highly edited) folk song collections (Abrahams 1993; Leerssen 2012), like the Volkslieder (1786), regular trochaic meters became closely associated with oral traditions throughout Europe, while iambic verse, from German courtly odes to Shakespeare’s dramatic genius, provided a basis for high-status secular literature. This formed a cornerstone for the enduring cultural – and, as a result, semantic – opposition of iambic and trochaic meters (Šeļa, Plecháč, and Lassche 2022).
As with any technology, conquest and power played a major role in metrical adoption, and it is not a coincidence that “fashionable” accentual-syllabic verse spread quickly through the territories, communities and languages of the Austro-Hungarian and Russian empires, also following the unification of literary languages and school systems (Gasparov 1996, 238–9). New verse often disrupted already existing versification, or weakened local alternatives (for example, syllabic verse in Polish, Czech and Ukrainian).
The last showcase (Figure 5) demonstrates the possibility of cross-linguistic alignment of distinct modern poetic forms in Czech, German and Russian. We are able to recognize the structural unity of the same meters (colors), detect the broader divide between trochee and iamb (inner and outer regions on the plot), yet preserve language-specific variation (point shapes) – the latter occurs partly because of differences in rhythm, and partly because word boundaries also encode differences in word lengths. German verse was a source for both Slavic literatures at different times, and the evidence here of structural similarity is a step toward better understanding the individual metrical lineages, and tracing formal connections through global literary history.

Figure 5. UMAP cluster of 3222 poems in Czech, German and Russian from the PoeTree corpus, in the six most common European meters.
Note: Metronome distance is used as the clustering metric.
The data presented in Figure 5 are a subset of the same data we used in our evaluation (cross-linguistic meter recognition) and come from annotated corpora of Czech, German and Russian poetry (all part of PoeTree).
Discussion
We move now to consider some limitations, and opportunities for further work. Our approach was mainly informed by poetic forms with isosyllabic meters and recurrent prosodic patterns, so we expect it to struggle with forms that emerged around other principles. For example, alliteration patterns that are part of the versification system in Old English and Old Norse (dróttkvætt) cannot currently be traced and would require a different approach to encoding the “alphabet” for alignment. Tonic verse, which can be composed of lines of different syllable lengths, regulated by a constant number of stressed positions, might also be difficult to analyze, since the algorithm is sensitive to line length. The same can be said of poetic forms in the same meter that are distinguished based on the shape of stanza and/or a pattern of rhymes. For example, both ottava rima and English sonnets can be written in iambic pentameter, yet represent distinct traditions. In this case, the expansion of the alphabet would be straightforward: adding a symbol for a stanza boundary would allow us to trace basic stanza composition (but make the analysis less broadly applicable). Additionally, the general methodology of Metronome (alignment of abstracted schemes) might be applied to rhyme patterns instead of rhythm sequences: one can envision, for example, a reconstruction of the history and typology of European sonnets based on classic rhyme-to-letter encoding: abab|abab|cdc|cdc.
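The rhyme-to-letter encoding mentioned above is simple to sketch. The example assumes that each line has already been reduced to some rhyme key (for instance its final sounds, here written as made-up strings) and that stanzas are separated with “|”; the `rhyme_scheme` helper is hypothetical and handles at most 26 distinct rhymes.

```python
from string import ascii_lowercase
from typing import Dict, List


def rhyme_scheme(stanzas: List[List[str]]) -> str:
    """Letter each rhyme key by first appearance; '|' separates stanzas."""
    letters: Dict[str, str] = {}
    parts = []
    for stanza in stanzas:
        encoded = []
        for key in stanza:
            if key not in letters:
                if len(letters) == len(ascii_lowercase):
                    raise ValueError("more than 26 distinct rhymes")
                letters[key] = ascii_lowercase[len(letters)]
            encoded.append(letters[key])
        parts.append("".join(encoded))
    return "|".join(parts)


sonnet = [
    ["-ore", "-ight", "-ore", "-ight"],
    ["-ore", "-ight", "-ore", "-ight"],
    ["-ay", "-een", "-ay"],
    ["-ay", "-een", "-ay"],
]
print(rhyme_scheme(sonnet))  # abab|abab|cdc|cdc
```

Strings of this kind could then be aligned in the same way as rhythm sequences, although the alphabet would be larger and the scoring would need separate tuning.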
Our current approach is also under-investigated in relation to caesurae – fixed-position word boundaries that can play a significant role in defining meters, like the French alexandrine (a meter of 12 syllables, where a caesura consistently bisects the verse, 6+6). To address this, we performed metronome analysis on simulated variations of the alexandrine (see Appendix B), to see if caesura-based forms can be distinguished from plain syllabics in our current approach. Preliminary results are very promising: even under conservative restrictions, the metronome tends to distinguish pseudo-poems that are governed by caesurae vs pseudo-poems of the same line-length that are not.
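One way such pseudo-poems might be generated is sketched below. This does not reproduce the procedure of Appendix B: the word-length range (1 to 3 syllables), the random syllable strengths and all function names are assumptions made only to show the contrast between a fixed 6+6 division and an unconstrained 12-syllable line.

```python
import random


def random_words(n_syll: int, rng: random.Random) -> list:
    """Split n_syll syllables into random words of 1-3 syllables."""
    words = []
    while n_syll > 0:
        size = rng.randint(1, min(3, n_syll))
        words.append("".join(rng.choice("Sw") for _ in range(size)))
        n_syll -= size
    return words


def pseudo_line(rng: random.Random, caesura: bool) -> str:
    """A 12-syllable line; with caesura, a word must end at syllable 6."""
    if caesura:
        words = random_words(6, rng) + random_words(6, rng)
    else:
        words = random_words(12, rng)
    return "".join(w + "." for w in words) + "|"


def break_after_sixth(line: str) -> bool:
    """True if a word break directly follows the sixth syllable."""
    count = 0
    for ch in line:
        if ch in "Sw":
            count += 1
        elif ch == "." and count == 6:
            return True
    return False


rng = random.Random(1)
with_caesura = [pseudo_line(rng, True) for _ in range(200)]
plain = [pseudo_line(rng, False) for _ in range(200)]
print(sum(map(break_after_sixth, with_caesura)) / 200)  # always 1.0
print(sum(map(break_after_sixth, plain)) / 200)         # chance level only
```

Plain lines will sometimes place a break after the sixth syllable by accident, so any distinction has to come from the consistency of the position across a whole pseudo-poem.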
Finally, the method depends on the availability of raw scansions of large collections of poetry. This can be tricky, since scansion itself depends on existing metrical theory, shared within a community, a consensus that is rarely maintained across national traditions of scholarship. On one hand, theory-agnostic scansion is hardly ever possible, but on the other hand, the presence of theory in automated scansion systems can create self-fulfilling cycles, where empirical patterns are forced to fit a few anticipated schemes (for review see De Sisto et al. 2024), which regularizes and conceals actual variation.
In the authors’ other work we often advocate for the utility of bottom-up approaches to meter recognition, since any pattern that is repeated enough times, any strong organizational principle, will always make itself visible; the very nature of systematic meter will cause it to leave its mark on any observation or measurement, be it taken on a text’s prosody, morphology, or syntax (Gasparov and Tarlinskaja 2008). Meters are, essentially, quite simple technologies of rule iteration: the more you repeat one, the more obvious it becomes, even for such crude measuring instruments. This makes even a “sub-optimal” automated scansion, made without expert knowledge of the underlying theory and tradition, usable and useful.
In conclusion, this study shows how the alignment of metrical strings can be enlightening in a variety of contexts – from tracing slight deviations in well-established meters, to recognizing large groups of similar forms across languages and times. Our proposed method works well with both long and short sequences, and is applicable in a wide range of poetic traditions (our showcases feature three major ones: quantitative, syllabic and accentual-syllabic). The method’s sensitivity to large and small structural differences in verse organization highlights its potential in both historical and cross-linguistic comparative research.
Data availability statement
The repository may be found at https://github.com/bnagy/metronome-paper. All code and data are available under CC-BY, except where restricted by upstream licenses. The code repository includes full reproduction data and code for the evaluation, as well as various supplemental figures and explanations.
Acknowledgements
Data visualisations are produced with ggplot (Wickham 2016) and cladograms use ggtree (Yu et al. 2016).
Author contribution
B.N. wrote the metronome software package and performed the evaluation and Classical Latin case-study. A.Š. conceptualized use cases for metrical alignment, provided a theoretical framework and was responsible for the accentual-syllabic showcase. P.P. prepared the modern language corpus. M.D.S. was responsible for the ‘Renaissance’ meter showcase. All authors contributed equally to the preparation and editing of the final article.
Funding statement
B.N. was supported by Poland’s National Science Centre, project 2020/39/O/HS2/02931. A.Š. was supported by the project “Large-Scale Text Analysis and Methodological Foundations of Computational Stylistics” (SONATA-BIS 2017/26/E/HS2/01019). M.D.S. was supported by the research project CIGE/2022/114 funded by the Valencian Government (GE 2023, Conselleria d’Innovació, Universitats, Ciència i Societat Digital). P.P. was supported by the project “European Poetry: Distant Reading (2023–2025)” (Czech Science Foundation GA23-07727S).
Competing interests
The authors declare none.
Ethical standard
The authors affirm this research did not involve human participants. No artificial intelligence (AI) tool was employed in this research nor during the writing process.
Appendix A
Here, we briefly detail the custom bonus/penalty matrix that differentiates Metronome from the default Smith–Waterman alignment (in most descriptions of the algorithm this is referred to as a substitution matrix). For DNA sequences with the four labels A, G, C, T, the default is to award 1 for a match and $-1$ for a mismatch. For protein sequence alignment, since some amino acids are “closer” to others, there is a range of different matrices to score pairwise matches among the 20 standard amino acids, but they need not concern us here. For our four metronome symbols, we began with the intuition that a (mis)match at the end of a line was the most important, since the algorithm will begin to favour pairs (of poems) whose lines have matching regular lengths. We then awarded a greater bonus for matches in the metrically strong syllable (S) than the weak (w), and eventually decided to award a small bonus (but no penalty) for matching word breaks (.) to help match poems with fixed or customary caesurae (Table A1). From there we performed a series of benchmarks, tuning the algorithm based on what looked to us like improvements in the way poems were grouped. This is a subjective process, using our own expertise and based on the corpora we have available; it is quite likely that some analyses would benefit from tuning the default substitution matrix, and this can be easily done in the metronome software package.
Table A1. The bonus/penalty matrix for metronome symbol (mis)matches

Note: The bonuses and penalties are symmetric, so only the lower triangular form is shown.
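The scoring idea described in this appendix can be illustrated with a minimal sketch of Smith–Waterman local alignment over metronome-style strings. The numeric values below are placeholders chosen only to respect the ordering described in the text (line end above strong syllable above weak syllable above word break, and no penalty for an unmatched word break); they are not the values of Table A1. The line-end symbol `|`, the mismatch and gap penalties, and the function names are likewise assumptions made for illustration, and none of this should be read as the API of the metronome package.

```python
# Hypothetical scores: NOT the values of Table A1, only the ordering described in the text.
MATCH = {"|": 4, "S": 3, "w": 2, ".": 1}
MISMATCH = -2  # assumed penalty for a mismatch among S, w and line end
GAP = -2       # assumed linear gap penalty


def score(a: str, b: str) -> int:
    """Symmetric bonus/penalty lookup for a pair of symbols."""
    if a == b:
        return MATCH[a]
    if "." in (a, b):
        return 0  # word breaks earn a bonus when matched but are never penalised
    return MISMATCH


def smith_waterman(x: str, y: str) -> int:
    """Return the best local alignment score between two scansion strings."""
    prev = [0] * (len(y) + 1)
    best = 0
    for i in range(1, len(x) + 1):
        curr = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            curr[j] = max(
                0,                                        # start a new local alignment
                prev[j - 1] + score(x[i - 1], y[j - 1]),  # (mis)match
                prev[j] + GAP,                            # gap in y
                curr[j - 1] + GAP,                        # gap in x
            )
            best = max(best, curr[j])
        prev = curr
    return best


# Two invented lines that differ only in one word break.
line_a = "wS.wS.wS|"
line_b = "wSwS.wS|"
print(smith_waterman(line_a, line_a), smith_waterman(line_a, line_b))
```

Because the lookup is symmetric, the score does not depend on the order of the two strings, which mirrors the note under Table A1.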
Appendix B
Our analysis of modern meters in this article did not require the detection of caesurae, that is, systemic word boundaries in the line (often coinciding with a syntactic pause) that help to define the meter in some traditions, like syllabic French verse and classical Latin. The standard modern example is the French 12-syllable alexandrine with a central caesura following a strong syllable. It is important to know whether the metronome can distinguish forms based on the presence, or placement, of caesurae. To test this, we generated pseudo-scansion sequences under five different conditions that imitate, albeit naïvely, verse prosody.
Each generated poem has a size drawn from a Poisson distribution with $\lambda = 14$, so that the length of all poems in the synthetic tests will average 14 lines, the length of a sonnet.
1. Iambic hexameters with alexandrine caesura (alex): each line is strictly a sequence of “wSwSwS.wSwSwS”, with an obligatory word boundary after the 6th syllable. All other word boundaries in the line are determined randomly from a distribution of probable word-lengths;
2. Plain iambic hexameter (Iamb-6): same as above, but with no hard-coded caesura; word boundaries are assigned randomly;
3. Classic French alexandrine (alexFrench): each line has a general structure of “xxxxxS.xxxxxS”, where x is any (w or S) symbol, and word boundaries are determined freely in each hemistich;
4. “Romantic” alexandrine (alexRomantic): has a general structure of “xxxS.xxxS.xxxS”; this form is associated with Victor Hugo and Romantic experiments with the classic meter;
5. Plain 12-syllable meter (12syl): a hypothetical meter that does not have any restrictions, except line length.
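The five conditions can be sketched as a small generator. This is a minimal illustration under stated assumptions, not the code used for the tests: it uses the Poisson poem length with $\lambda = 14$ and the word-length ratios 1:3:1:0.25 of the synthetic tests, but the line-end symbol `|`, the independent drawing of words within each hemistich, the truncation of a word that would overflow its segment, the minimum of one line per poem, and all function names are hypothetical choices.

```python
import math
import random

WORD_LENGTHS = [1, 2, 3, 4]
WORD_WEIGHTS = [1, 3, 1, 0.25]  # ratios used for the synthetic tests

# 'x' is a free position (w or S at random); '.' is an obligatory word boundary.
TEMPLATES = {
    "alex": "wSwSwS.wSwSwS",
    "Iamb-6": "wSwSwSwSwSwS",
    "alexFrench": "xxxxxS.xxxxxS",
    "alexRomantic": "xxxS.xxxS.xxxS",
    "12syl": "xxxxxxxxxxxx",
}


def poisson(lam: float, rng: random.Random) -> int:
    """Sample a Poisson variate with Knuth's method (adequate for small lambda)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def random_breaks(n_syllables: int, rng: random.Random) -> set:
    """Syllable counts after which a chance word boundary falls inside a segment."""
    breaks, pos = set(), 0
    while True:
        pos += rng.choices(WORD_LENGTHS, WORD_WEIGHTS)[0]
        if pos >= n_syllables:  # assumption: an overflowing word is cut at the segment end
            return breaks
        breaks.add(pos)


def make_line(form: str, rng: random.Random) -> str:
    """Build one pseudo-scansion line for the given condition."""
    segments = []
    for segment in TEMPLATES[form].split("."):
        breaks = random_breaks(len(segment), rng)
        out = []
        for i, symbol in enumerate(segment, start=1):
            out.append(rng.choice("wS") if symbol == "x" else symbol)
            if i in breaks:
                out.append(".")
        segments.append("".join(out))
    return ".".join(segments) + "|"  # '|' is an assumed line-end symbol


def make_poem(form: str, rng: random.Random) -> str:
    """Concatenate a Poisson-distributed number of lines (at least one)."""
    n_lines = max(1, poisson(14, rng))
    return "".join(make_line(form, rng) for _ in range(n_lines))


rng = random.Random(42)
for form in TEMPLATES:
    print(form, make_line(form, rng))
```

Splitting each template at its obligatory boundaries keeps the caesura fixed while every other word boundary remains a matter of chance.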
We generate lines under simple assumptions and do not follow the structure of any language in particular, nor do we try to plausibly reproduce word-length distribution and stress placement within words (so, e.g., unlikely words with prosodic structures like “ww” or “SSS” can occur). To distribute word boundaries within a line, we draw 1-, 2-, 3-, and 4-syllable words with the corresponding probability ratios 1:3:1:0.25. This ratio leads to a probability of 45 percent that there will be a word boundary after the 6th syllable in a non-alexandrine line, simply by chance. This is reasonably high, so the metronome needs to prove its sensitivity in tracing regular differences.
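The chance of an accidental boundary after a given syllable can be worked out with a short recursion. The sketch below assumes the simplest reading of the procedure: words are drawn independently from the start of the line with the ratios 1:3:1:0.25, and a boundary falls after syllable n whenever some word ends there. The exact figure depends on details of the generator that the text does not specify (for example, how the end of the line is handled), so this calculation need not reproduce the reported 45 percent exactly. The function name is hypothetical.

```python
from fractions import Fraction

RATIOS = {1: Fraction(1), 2: Fraction(3), 3: Fraction(1), 4: Fraction(1, 4)}
TOTAL = sum(RATIOS.values())
PROBS = {length: ratio / TOTAL for length, ratio in RATIOS.items()}


def boundary_probability(position: int) -> Fraction:
    """Chance that some word ends exactly after `position` syllables."""
    u = [Fraction(1)]  # u[0] = 1: the start of the line counts as a boundary
    for n in range(1, position + 1):
        # A boundary after syllable n means a word of length k ended there,
        # having started right after an earlier boundary at n - k.
        u.append(sum(p * u[n - k] for k, p in PROBS.items() if k <= n))
    return u[position]


print(float(boundary_probability(6)))
```

Under this assumption the value is a little under one half, the same order of magnitude as the figure quoted above, which supports the point that chance boundaries at the caesura position are frequent.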
While this setup is unrealistic, it is transparent, and allows us to control metrical principles independent of linguistic regularities. Figure B1 shows the resulting clustering of the “poems”: it is clear that the forms that employ caesurae can be distinguished from those that do not. The alignment works better for “syllabic” alexandrines because, unlike iambics, they have an additional distinctive feature – two fixed stressed positions versus an unregulated 12-syllable line.

Figure B1. Clustering of simulated pseudo-poems that represent five conditions of the alexandrine form: three syllabic and two accentual-syllabic.
Further tests confirm the outcome: for 100 runs with 20 poems generated for each form, we measure the cluster efficiency via the Adjusted Rand Index (ARI). The median ARI for iambic meters is 0.55, indicating a somewhat noisy recognition of caesura-based forms. The median ARI for syllabics is 0.95, an almost perfect clustering of the three varieties. Thus, we expect that the main structuring principle of caesura-based forms should be discoverable, even in very noisy settings.
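For readers unfamiliar with the measure, the Adjusted Rand Index can be computed from pair counts alone. The sketch below is a plain-Python version of the standard formula; the labels in the usage example are invented, and how cluster labels were obtained from the alignment scores in the actual tests is not shown here.

```python
from collections import Counter
from math import comb
from statistics import median


def adjusted_rand_index(truth, predicted) -> float:
    """Pair-counting ARI between two labelings of the same items."""
    n = len(truth)
    if n < 2:
        return 1.0  # no pairs to compare
    cells = Counter(zip(truth, predicted))
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in Counter(truth).values())
    sum_cols = sum(comb(c, 2) for c in Counter(predicted).values())
    expected = sum_rows * sum_cols / comb(n, 2)  # agreement expected by chance
    maximum = (sum_rows + sum_cols) / 2
    if maximum == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (sum_cells - expected) / (maximum - expected)


# Invented labels: true forms versus cluster assignments from three runs.
truth = ["alex", "alex", "alex", "Iamb-6", "Iamb-6", "Iamb-6"]
runs = [
    [0, 0, 0, 1, 1, 1],  # perfect recovery
    [0, 0, 1, 1, 1, 1],  # one poem misplaced
    [0, 1, 0, 1, 0, 1],  # close to random
]
print(median(adjusted_rand_index(truth, run) for run in runs))
```

A score of 1 means the clusters reproduce the forms exactly, while scores near 0 are what random assignment would produce, so the median over many runs summarises how reliably the forms are separated.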
 
 






 
              