Hamiltonian Sets of Polygonal Paths in Assembly Graphs

Alexander Guterman; Nataša Jonoska; Elena Kreines; Artem Maksaev; Natalia Ostroukhova

doi:10.1017/S0013091526101357

Part of: Low-dimensional topology; Discrete mathematics in relation to computer science; Graph theory

Published online by Cambridge University Press: 05 March 2026

Alexander Guterman*: Affiliation:
Department of Mathematics, Bar-Ilan University, Ramat-Gan 5290002, Israel
Nataša Jonoska: Affiliation:
Department of Mathematics and Statistics, University of South Florida, Tampa, FL 33620, USA (jonoska@usf.edu)
Elena Kreines: Affiliation:
School of Mathematical Sciences, Tel Aviv University, Tel Aviv 6997801, Israel; Department of Mathematics, Ben Gurion University of the Negev, P.O.B. 653, Beer-Sheva 8410501, Israel (kreines@bgu.ac.il)
Artem Maksaev: Affiliation:
Faculty of Computer Science, HSE University, 20 Myasnitskaya Ulitsa, Moscow 101000, Russia (artmak95@mail.ru)
Natalia Ostroukhova: Affiliation:
Moscow Center of Fundamental and Applied Mathematics, Moscow 119991, Russia (natosova@gmail.com)
*: Corresponding author: Alexander Guterman, email: alexander.guterman@biu.ac.il

Abstract

We provide four equivalent combinatorial conditions for a simple assembly graph (rigid vertex graph where all vertices are of degree 1 or 4) to have the largest number of Hamiltonian sets of polygonal paths relative to its size. These conditions serve to prove the conjecture that such a maximum, which is equal to $F_{2n+1}-1$, where $F_k$ denotes the $k$th Fibonacci number, is achieved only for special assembly graphs, called tangled cords.

Keywords

simple assembly graphs; Hamiltonian sets; polygonal paths; Eulerian transversals; Fibonacci sequence

MSC classification

Primary: 05C10: Planar graphs; geometric and topological aspects of graph theory

Secondary: 57M15: Relations with graph theory; 68R15: Combinatorics on words

Information

Type: Research Article
Information: Proceedings of the Edinburgh Mathematical Society, First View, pp. 1-17

DOI: https://doi.org/10.1017/S0013091526101357
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright: © The Author(s), 2026. Published by Cambridge University Press on Behalf of The Edinburgh Mathematical Society.

1. Introduction

Simple assembly graphs were introduced in [Reference Angeleska, Jonoska, Saito and Landweber4], as a tool to describe a certain process of DNA recombination in some species of ciliates. The spatial molecular structure at the moment of recombination at a single molecular locus is modelled by a rigid vertex of degree $4$. Multiple recombinant processes can be observed on a single DNA nanochromosome; therefore, the whole process is modelled by a so-called assembly graph, a special graph with rigid vertices, see rigorous definitions in § 2 (we also refer to [Reference Angeleska, Jonoska, Saito and Landweber4] for the detailed and self-contained information). An assembled gene after the recombination is modelled by a polygonal path within the assembly graph, i.e., a path that visits a vertex exactly once and makes a ‘turn’ at every vertex (except at the initial and terminal points). A set of polygonal paths that visit every vertex exactly once corresponds to a set of genes that are formed during the recombination. Since every vertex should be visited by a polygonal path once and only once (a recombination happens only once), a set of such polygonal paths forms a so-called Hamiltonian set of polygonal paths. The maximal number of such Hamiltonian sets of polygonal paths gives the maximal number of sets of genes which a given molecule with specific recombination sites can encode. The question arises: given a molecule with $n$ recombination sites, what is the maximal number of sets of resulting DNA segments that can be obtained? This translates the question above to the problem of determining the maximal number of Hamiltonian sets of polygonal paths in an assembly graph with $n$ vertices, and further characterizing those graphs where this maximum can be achieved.

The notion of a simple assembly graph is closely related to several mathematical structures, in particular, to the notion of ribbon graphs or dessins d’enfants, which investigates cellularly embedded graphs on oriented surfaces, see [Reference Shabat and Voevodsky14]. There are several papers devoted to the possible genus range of the dessins d’enfants appearing from simple assembly graphs, see [Reference Angeleska, Jonoska and Saito3, Reference Buck, Dolzhenko, Jonoska, Saito and Valencia5]. Moreover, topological classification and enumeration of RNA structures by genus is provided in [Reference Andersen, Penner, Reidys and Waterman1]. In parallel, simple assembly graphs have been studied extensively in knot theory in various contexts, see [Reference Buck, Dolzhenko, Jonoska, Saito and Valencia5] and references therein. A simple assembly graph is related to a virtual knot diagram, while a turn at a vertex in a polygonal path corresponds to a smoothing of a vertex in the diagram [Reference Kauffman9]. In topological graph theory, a rigid vertex graph corresponds to a graph with a fixed rotation system of the edges at every vertex, and the polygonal paths correspond to so-called A-trails, for the detailed and self-contained exposition see [Reference Andersen, Bouchet and Jackson2]. Within this context, the Hamiltonian sets of polygonal paths in assembly graphs correspond to Hamiltonian sets of A-trails in rigid vertex graphs with all vertices of degree 4.

An approach to characterize graphs using words, along with the notion of word-representable graphs, was first introduced by Kitaev and Pyatkin in 2008, see [Reference Kitaev and Pyatkin13]. Since then, this theory has undergone significant development, as evidenced by subsequent works such as [Reference Fleischmann, Haschke and Löck7, Reference Kitaev, Charlier, Leroy and Rigo10–Reference Kitaev and Lozin12] and references therein. Sharing these ideas, a combinatorial framework linking simple assembly graphs and double occurrence words, i.e., the words in which each letter appears exactly twice, was later proposed in [Reference Angeleska, Jonoska and Saito3]. Developing this direction further, in 2013, the number of Hamiltonian sets of polygonal paths in a simple assembly graph was estimated to be less than or equal to $ F_{2n+1}-1$, where $F_k$ is the $k$-th Fibonacci number [Reference Burns, Dolzhenko, Jonoska, Muche and Saito6, Theorem 4.1], and it was shown that the bound is tight [Reference Burns, Dolzhenko, Jonoska, Muche and Saito6, Corollary 4.4]. The given graph example achieving the bound had a specific structure and was named a tangled cord. The following conjecture was stated in [Reference Burns, Dolzhenko, Jonoska, Muche and Saito6, Conjecture 4.5]:

Conjecture 1.1.

The upper bound for the maximal number of Hamiltonian sets of polygonal paths, which equals $F_{2n+1}-1$, is achieved only by the tangled cord with $n$ rigid vertices of degree 4.

The main result of our paper is a proof of this conjecture. In order to do this, we provide a characterization of the graphs with the maximal number of Hamiltonian sets of polygonal paths in terms of combinatorics of the subwords of the corresponding double occurrence word. Our method is based on the careful analysis of the structure of the corresponding extremal word, which leads to the characterization of the corresponding extremal graph.

All necessary definitions and notions, such as (simple) assembly graph, polygonal paths, double occurrence words, tangled cord, etc., are presented in § 2. In § 3, we recall known facts about the number of Hamiltonian sets of polygonal paths and formulate the main results of this paper. Proposition 3.6 gives four equivalent conditions for a graph that provides the largest possible number of Hamiltonian sets of polygonal paths. Conjecture 1.1 is proven in § 4.

2. Terminology and notation

The definitions and notations below follow those introduced in [Reference Angeleska, Jonoska and Saito3, Reference Angeleska, Jonoska, Saito and Landweber4, Reference Burns, Dolzhenko, Jonoska, Muche and Saito6]. In this paper, we consider finite undirected graphs with vertices of degree 1 or 4. Loops and multiple edges are permitted (however, multiple loops are prohibited). For such a graph $\Gamma$, let $V(\Gamma)$ and $E(\Gamma)$ denote the sets of vertices and edges of $\Gamma$, respectively. Throughout the text, by degree of a vertex we mean the number of half-edges, i.e., parts of edges in a local neighbourhood of a vertex, incident to this vertex, thus all multiple edges are counted separately and a loop is counted twice. To be more precise, if there is a loop incident to $v \in V(\Gamma)$, the loop is represented with two half-edges (e.g., $e_1 $ and $ e_2$ in Figure 1(b) and (d)), therefore adding two to the degree of $v$.

Figure 1.

Some diagrams for a rigid vertex $v$ of degree 4.

For a tuple of half-edges $(e_1, e_2, e_3, e_4)$, we define its cyclic order as the following set consisting of all its cyclic permutations and their reverses:

\begin{align*} (e_1, e_2, e_3, e_4)^{cyc} = \{& (e_1, e_2, e_3, e_4), (e_2, e_3, e_4,e_1), (e_3, e_4,e_1, e_2), (e_4,e_1, e_2, e_3), \\ & (e_4, e_3, e_2, e_1), (e_3, e_2, e_1, e_4), (e_2, e_1, e_4, e_3), (e_1,e_4,e_3,e_2) \}. \end{align*}

In other words, we consider all elements of the set $(e_1, e_2, e_3, e_4)^{cyc}$ equivalent and usually refer to a single element of this ordering, mostly $(e_1, e_2, e_3, e_4)$. We say that a vertex $v \in V(\Gamma)$ of degree 4 is rigid if a cyclic order of half-edges incident to this vertex is fixed (in fact there are three ways to do this). Throughout the text, we depict graphs as locally embedded to the surface at each vertex. Each vertex $v$ is considered as a small disk such that incident edges are attached to the boundary of the disk. The cyclic order of half-edges is usually specified as we read them on the diagram following the boundary of the disk. The most general type of a rigid vertex is depicted in Figure 1(a); however, there are many other possibilities, where loops and multiple edges are involved (see, e.g., Figure 1(b–e)).

Suppose that for a vertex $v$ of degree 4, the following cyclic order of half-edges is specified: $(e_1, e_2, e_3, e_4)$. Then we say that $e_1$ and $e_2$ are neighbours in $v$, and so are $e_2$ and $e_3$, and also $e_3$ and $e_4$, as well as $e_4$ and $e_1$. Note that half-edges of a loop can be neighbours as in Figure 1(b) and (d).
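As an illustration of the cyclic order and of the neighbour relation, here is a minimal Python sketch. It assumes that half-edges are represented by strings; the names `cyclic_order` and `are_neighbours` are hypothetical and serve only this example.

```python
def cyclic_order(half_edges):
    """Return all cyclic rotations of the tuple and of its reverse."""
    t = tuple(half_edges)
    r = t[::-1]
    n = len(t)
    return {t[i:] + t[:i] for i in range(n)} | {r[i:] + r[:i] for i in range(n)}


def are_neighbours(order, a, b):
    """True if half-edges a and b are adjacent in the given cyclic order."""
    i, j = order.index(a), order.index(b)
    return (i - j) % len(order) in (1, len(order) - 1)


order = ("e1", "e2", "e3", "e4")
print(len(cyclic_order(order)))            # 8 equivalent tuples
print(are_neighbours(order, "e4", "e1"))   # True
print(are_neighbours(order, "e1", "e3"))   # False: e1 and e3 are opposite
```

The neighbour relation does not depend on which of the eight equivalent tuples is chosen as the representative, so it is well defined for a rigid vertex.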

An assembly graph is a finite connected graph in which all vertices are of degree $1$ or $4$ and all vertices of degree $4$ are rigid. A vertex of degree $1$ is called an endpoint.

A path in a graph is an alternating sequence of vertices and edges that starts and ends with a vertex, and the edges in this sequence are incident to the vertices that stand next to them. Throughout the text, we consider that a path $v_0e_1v_1 \dots v_{\ell-1}e_{\ell}v_\ell$ and its reverse $v_\ell e_{\ell}v_{\ell-1} \dots v_1e_1v_0$ are equal. If there are no edges in a path and so the path consists of a single vertex, it is called a singleton.

Definition 2.1 (see [Reference Angeleska, Jonoska and Saito3, Definition 3.3])

A transverse path in an assembly graph or simply a transversal is a path $v_0 e_1 v_1 e_2 \ldots e_n v_n$, where $v_0, v_n$ are endpoints, satisfying the following conditions:

(1) All the edges $e_1, \dots, e_n$ are pairwise distinct.
(2) Each $e_i$ is not a neighbour of $e_{i-1}$ in the rigid vertex $v_{i-1}$ ($i = 2, \dots, n$). When $e_i$ is a loop ($v_{i-1} = v_i$), both half-edges of $e_i$ are marked as $e_i^{(1)}$ and $e_i^{(2)}$ in such a way that $e_{i-1}$ and $e_i^{(1)}$ are not neighbours in $v_i$, as well as $e_i^{(2)}$ and $e_{i+1}$.

An example of a transverse path can be seen in Figure 2. Note that at the vertex $v_1$, the transverse path first follows the ‘right’ half-edge of the loop $e_1$ and then returns through the ‘left’ half-edge of $e_1$. Meanwhile, at $v_3$, the path first follows the ‘left’ half-edge of the loop $e_4$ and then returns through its ‘right’ half-edge.

Figure 2.

Consecutively drawing transverse path $v_0\,e_0\,v_1\,e_1\,v_1\,e_2\,v_2\,e_3\,v_3\,e_4\,v_3\,e_5\,v_2\,e_6\,v_4$.

Two transverse paths are equivalent if they are either identical or one is the reverse of the other. A simple assembly graph is an assembly graph having an Eulerian transverse path, i.e., a transverse path visiting each edge exactly once. Note that in a simple assembly graph, there is a unique equivalence class of transverse Eulerian paths, see [Reference Angeleska, Jonoska and Saito3, Lemma 3.6]. Following [Reference Angeleska, Jonoska and Saito3], we say that two graphs are isomorphic as simple assembly graphs if there exists a graph isomorphism between the two graphs that extends to a bijection on half-edges preserving their respective cyclic order at all rigid vertices. In particular, a simple assembly graph is isomorphic to the simple assembly graph obtained from the original by taking all edges along the transversal in reverse order. This means that simple assembly graphs are uniquely determined by the transverse Eulerian path, see [Reference Angeleska, Jonoska and Saito3, Definition 3.2 and Lemma 3.7].

A path $v_0e_1v_1 \dots v_{\ell-1}e_{\ell}v_\ell$ in an assembly graph is called polygonal (see [Reference Burns, Dolzhenko, Jonoska, Muche and Saito6, Section 2]) if all $v_0, \dots, v_{\ell}$ are pairwise distinct 4-degree vertices of $\Gamma$ and $e_{i}$ and $e_{i+1}$ are neighbours in $v_{i}$, for all $i = 1, \dots, \ell-1$. Note that a cycle (particularly, a loop) is not a polygonal path since all the vertices in it must be distinct, while a singleton vertex is a polygonal path. Examples of polygonal paths are presented in Figure 3.

Figure 3.

An example of Hamiltonian set of polygonal paths with three paths: $\{v_1 e_1 v_2 e_{15} v_7, \, v_5 e_4 v_4 e_{13} v_8 e_{10} v_3, \,v_6\}$. Note that $v_6$ is a singleton.

Informally, if we consider each vertex as a crossroad, then turning left or right at every vertex forms a polygonal path, while driving straight through at every vertex gives a transverse path.

Two paths are called vertex-disjoint if they have no common vertex. A pairwise vertex-disjoint set of polygonal paths in an assembly graph $\Gamma$ is called Hamiltonian (see [Reference Burns, Dolzhenko, Jonoska, Muche and Saito6, Section 2]) if its union covers all $4$-degree vertices of $\Gamma$ (see Figure 3). We denote by $\mathcal{C}(\Gamma)$ the collection of all Hamiltonian sets of polygonal paths in $\Gamma$.

Let $\Sigma =\{a_1, a_2, \ldots\}$ be a (finite) alphabet and $w = w_1 w_2 \ldots w_{N-1} \, w_{N}$ be a word of length $N$ in the alphabet $\Sigma$. The length of $w$ is denoted by $|w|$, so $|w| = N$. We set $w[i] = w_i$, i.e., the $i$th symbol of the word $w$. By $w[i:j]$ $(i\le j)$ we denote the subword $w_i \, w_{i+1} \, \ldots \, w_{j-1} \, w_j$ and we have $|w[i:j]| = j - i +1$. Let $a \in \Sigma$ occur exactly twice in a fixed word $w$, in positions $i$ and $j$, i.e., $w[i] = w[j] = a$, $i < j$. We define two functions ${\mathfrak o}_1$ and ${\mathfrak o}_2$ that give the first and second occurrence of $a$ in $w$, respectively, i.e., ${\mathfrak o}_1(a) = i$ and ${\mathfrak o}_2(a) =j$. In the following, the use of ${\mathfrak o}_1(a)$ or ${\mathfrak o}_2(a)$ presumes that the word $w$ is clear from the context and the letter $a$ appears in $w$ exactly twice.

An assembly word, or a double occurrence word (DOW), in the alphabet $\Sigma$ is a non-empty word in which each symbol of the alphabet occurs either exactly twice or does not occur at all. Note that this combinatorial definition is used for Gauss words in the theory of plane curves. Double occurrence words are said to be equivalent if they can be obtained one from another by renaming the alphabet symbols and/or writing the word in the reverse order. There is a one-to-one correspondence between the isomorphism classes of simple assembly graphs and the equivalence classes of double occurrence words, as proved in [Reference Angeleska, Jonoska and Saito3, Lemma 3.8]. Below we briefly recall this construction for completeness and further use.

Listing the vertices of degree 4 visited by a transverse path in a simple assembly graph gives a DOW over the alphabet $V(\Gamma)$, since every transverse path must visit every vertex of degree 4 twice. Hence, assembly words are equivalent if and only if they correspond to isomorphic assembly graphs, as stated in [Reference Angeleska, Jonoska and Saito3, Lemma 3.8].

For a simple assembly graph $\Gamma$, we denote a representative of the corresponding class of assembly words by $w_{\Gamma}$. Vice versa, for an assembly word $w$, we denote the corresponding simple assembly graph by $\Gamma_w$, see Figure 5.

The number of degree 4 vertices in a simple assembly graph $\Gamma$ is denoted by $\mathcal{V}(\Gamma)$. We have $2\cdot \mathcal{V}(\Gamma_w) = |w|$ since $w$ is a DOW. We also denote the cardinality of a set $X$ by $|X|$.

Definition 2.2. Let $w$ be a word in the alphabet $\Sigma$ and $\sigma \subseteq \Sigma$ be a non-empty subset of letters. Denote by $w\setminus \sigma$ the sequence of non-empty subwords obtained from $w$ by deleting the letters belonging to $\sigma$ (notice that those subwords are not necessarily assembly words even if $w$ is assembly). Denote by $w(\sigma)$ the concatenation of all occurrences of letters from $\sigma$ in $w$, keeping the order in which the letters occur in $w$.

Example 2.3. For the word $w = 134 2 134 85 67 5 7 28 6$ and $\sigma = \{2,5,8\}$, we have:

\begin{equation*}w\setminus \sigma = (134,\, 134, \, 67,\, 7, \, 6), \, w(\sigma) = 285528.\end{equation*}
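The two operations of Definition 2.2 can be sketched in Python as follows. This is a minimal illustration that assumes words are stored as lists of integers; the function names `delete_letters` and `restrict` are hypothetical.

```python
def delete_letters(w, sigma):
    """Sequence of non-empty subwords left after deleting the letters of sigma from w."""
    pieces, current = [], []
    for letter in w:
        if letter in sigma:
            if current:            # close the current non-empty subword
                pieces.append(current)
            current = []
        else:
            current.append(letter)
    if current:
        pieces.append(current)
    return pieces


def restrict(w, sigma):
    """Concatenation of all occurrences of letters of sigma, in the order of w."""
    return [letter for letter in w if letter in sigma]


# The word and the subset from Example 2.3.
w = [1, 3, 4, 2, 1, 3, 4, 8, 5, 6, 7, 5, 7, 2, 8, 6]
sigma = {2, 5, 8}
print(delete_letters(w, sigma))   # [[1, 3, 4], [1, 3, 4], [6, 7], [7], [6]]
print(restrict(w, sigma))         # [2, 8, 5, 5, 2, 8]
```

Adjacent deleted letters, such as $85$ and $28$ in this word, produce no empty subword, in agreement with the definition.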

Definition 2.4 (see [Reference Burns, Dolzhenko, Jonoska, Muche and Saito6, Definition 4.2])

A tangled cord of order $n$ is the simple assembly graph that corresponds to the following double occurrence word:

\begin{equation*} w_{TC_n}= 1213243\dots (n-1)(n-2)n(n-1)n. \end{equation*}

The tangled cord is denoted by $TC_n$.

Each word $w_{TC_n}$, $n \geqslant 2$, is obtained from the previous assembly word $w_{TC_{n-1}}$ by replacing the last occurrence of the letter $(n-1)$ with the subword $n(n-1)n$. This operation, in terms of assembly graphs, is represented in Figure 4.

Figure 4.

How the tangled cord $TC_{n}$ is inductively constructed from $TC_{n-1}$. The visualization provides the ‘correct’ cyclic order of half-edges in each rigid vertex.

These are examples of assembly words $w_{TC_n}$ for small $n$:

\begin{equation*} w_{TC_1}=11,\ w_{TC_2}=1212,\ w_{TC_3}=121323, \ w_{TC_4}=12132434.\end{equation*}

Remark 2.5. Note that the word $w_{TC_n}$ is symmetric, i.e., if we write it in reverse order and rename symbols in ascending order, we obtain the same word.
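The closed form of Definition 2.4, the inductive construction and the symmetry of Remark 2.5 can be compared with a short Python sketch. It assumes words are lists of integers; all function names are hypothetical, and `canonical` renames letters in order of first occurrence, which is one way to read "rename symbols in ascending order".

```python
def tangled_cord(n):
    """Closed form 1 2 1 3 2 4 3 ... n (n-1) n of the tangled cord word."""
    w = [1]
    for k in range(2, n + 1):
        w += [k, k - 1]
    w.append(n)
    return w


def tangled_cord_inductive(n):
    """Start from 11 and replace the last occurrence of k-1 with k (k-1) k."""
    w = [1, 1]
    for k in range(2, n + 1):
        last = len(w) - 1 - w[::-1].index(k - 1)   # index of the last k-1
        w = w[:last] + [k, k - 1, k] + w[last + 1:]
    return w


def canonical(w):
    """Rename letters as 1, 2, 3, ... in order of first occurrence."""
    names = {}
    return [names.setdefault(x, len(names) + 1) for x in w]


for n in range(1, 5):
    print(n, "".join(map(str, tangled_cord(n))))

# Both constructions agree, and the reversed word is the same word after renaming.
assert all(tangled_cord(n) == tangled_cord_inductive(n) for n in range(1, 10))
assert all(canonical(tangled_cord(n)[::-1]) == tangled_cord(n) for n in range(1, 10))
```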

3. Hamiltonian sets of polygonal paths

We denote the Fibonacci sequence by $(F_n)$, namely, it is the sequence defined by $F_n = F_{n-1}+F_{n-2}$ for $n\ge 2$ and $F_0=0, F_1=1$. The following upper bound is known for the number of Hamiltonian sets of polygonal paths in a simple assembly graph.

Theorem 3.1 (see [Reference Burns, Dolzhenko, Jonoska, Muche and Saito6, Theorem 4.1])

Let $\Gamma$ be a simple assembly graph, $\mathcal{V}(\Gamma)=n $, and $\mathcal{C}$ be a collection of Hamiltonian sets of polygonal paths in $\Gamma$. Then $|\mathcal{C}| \leqslant F_{2n+1}-1.$

It was proved in [Reference Burns, Dolzhenko, Jonoska, Muche and Saito6, Theorem 4.3, Corollary 4.4] that for the tangled cord $TC_n$, the equality $|\mathcal{C}| = F_{2n+1}-1$ is true, and therefore, the bound is tight for all $n$. It was also conjectured in the same paper [Reference Burns, Dolzhenko, Jonoska, Muche and Saito6, Conjecture 4.5] that the tangled cord is the only graph for which the equality holds, see Conjecture 1.1.

In the definition below, we follow the notation introduced in [Reference Guterman, Kreines and Ostroukhova8, Remark 2.2]. Namely, we enumerate all the edges in the same order as they occur when going through the transversal.

Definition 3.2. Let $\Gamma$ be a simple assembly graph, $\mathcal{V}(\Gamma) = n$, and let us fix the word $w_\Gamma$ by listing all the vertices of degree 4 along the Eulerian transversal of $\Gamma$. Then we enumerate the edges of $\Gamma$ by the following rule: $e_i = (w_\Gamma[i], w_\Gamma[i+1])$, $i = 1, 2, \dots, 2n-1$. Note that the edges incident to the endpoints of $\Gamma$ are not enumerated.

Example 3.3. Figure 5 shows a graph with the assembly word $w= 112323 $ and edges labelled in the order they are encountered along the transversal.

Figure 5.

Enumerated edges in a graph with the assembly word $ 112323 $.
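The enumeration of Definition 3.2 is easy to reproduce from the word alone. The following Python sketch assumes the word is given as a list of integers and uses a hypothetical helper `enumerate_edges`; the keys are the 1-based indices $i$ of the edges $e_i$.

```python
def enumerate_edges(w):
    """Map i to the edge e_i = (w[i], w[i+1]) for i = 1, ..., 2n-1 (1-based)."""
    return {i: (w[i - 1], w[i]) for i in range(1, len(w))}


# The assembly word 112323 of Example 3.3.
edges = enumerate_edges([1, 1, 2, 3, 2, 3])
for i, e in edges.items():
    print(f"e_{i} = {e}")
```

For this word the sketch lists $2n-1 = 5$ edges, the first of which, $e_1 = (1,1)$, is a loop.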

We define a correspondence between Hamiltonian sets of polygonal paths of a simple assembly graph $\Gamma$ and binary strings (words in the alphabet $\{0, 1\}$).

Definition 3.4. Let $\Gamma$ be a simple assembly graph, $\mathcal{V}(\Gamma) = n$, and $e_1, e_2, \dots, e_{2n-1}$ be the enumerated edges of its Eulerian transversal, see Definition 3.2. We consider the map $\Phi_\Gamma\colon \mathcal{C}(\Gamma)\rightarrow \{0,1\}^{2n-1}$ such that for $\gamma \in \mathcal{C}(\Gamma)$ and $i = 1, 2, \dots, 2n-1$,

\begin{equation*}\Phi_\Gamma(\gamma)[i] = \begin{cases} 1, & \text{if}\ e_i\ \text{belongs to some path from}\ \gamma;\\ 0, & \text{otherwise.} \end{cases}\end{equation*}

Proposition 3.5. For a simple assembly graph $\Gamma$ with $\mathcal{V}(\Gamma) = n$, the map $\Phi_{\Gamma}$ satisfies the following properties:

(1) $\Phi_{\Gamma}$ is injective.
(2) $\Phi_{\Gamma}(\gamma)$ is a string with no two consecutive ones for any $\gamma \in \mathcal{C}(\Gamma)$.
(3) $\Phi_\Gamma(\gamma)\neq \underbrace{10101\ldots01}_{2n-1}$ for any $\gamma \in \mathcal{C}(\Gamma)$.

Proof. (1) Assume the contrary. Let $\gamma, \gamma' \in \mathcal{C}(\Gamma)$ be such that $\Phi_{\Gamma}(\gamma) = \Phi_{\Gamma} (\gamma')$. Then $\gamma$ and $\gamma'$ consist of exactly the same edges. Since both sets of paths are Hamiltonian (and in particular, vertex-disjoint), we get that $\gamma$ and $\gamma'$ coincide up to singletons. Each vertex in $\Gamma$ that is not incident to any edge in $\gamma$ (respectively, $\gamma'$) must be a singleton in $\gamma$ (respectively, $\gamma'$) and hence $\gamma =\gamma'$.

(2) Let $\gamma = \{\gamma_1, \gamma_2, \dots, \gamma_s\}$ be a Hamiltonian set of polygonal paths in $\Gamma$. Consider all edges belonging to paths in $\gamma$. We claim that among these edges, there are no two consecutive edges in an Eulerian transversal of $\Gamma$. Indeed, let $u,v,w$ be vertices of $\Gamma$ such that the edges $(u, v) \in \gamma_i$ and $(v, w) \in \gamma_j$ are consecutive in the transversal. Then $i = j$, since otherwise $\gamma_i$ and $\gamma_j$ have $v$ as a common vertex, which contradicts the definition of a Hamiltonian set. Therefore both edges belong to the same path: $(u, v) \in \gamma_i$ and $(v, w) \in \gamma_i$, which contradicts the polygonality of $\gamma_i$ since by the assumption $(u, v)$ and $(v, w)$ are consecutive. So, there are no consecutive edges among all edges belonging to any path in $\gamma$. Therefore there is no binary string with two consecutive $1$ in the image of $\Phi_{\Gamma}$.

(3) Observe that a polygonal path with $k$ vertices visits $k-1$ edges. Therefore a Hamiltonian set of polygonal paths in $\Gamma$, which visits $n$ vertices, contains at most $n-1$ edges and so would correspond to a binary string with at most $n-1$ occurrences of $1$s. The binary string $\underbrace{10101\ldots01}_{2n-1}$ has $n$ occurrences of $1$s.
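Properties (2) and (3) restrict the possible images of $\Phi_\Gamma$ to binary strings of length $2n-1$ without consecutive ones, other than the alternating string. A brute-force count of these candidate strings for small $n$ can be sketched in Python as follows; the function names `fib` and `admissible_strings` are hypothetical, and the sketch counts candidate strings only, not Hamiltonian sets of any particular graph.

```python
from itertools import product


def fib(k):
    """Fibonacci numbers with F_0 = 0 and F_1 = 1."""
    a, b = 0, 1
    for _ in range(k):
        a, b = b, a + b
    return a


def admissible_strings(n):
    """Binary strings of length 2n-1 with no two consecutive ones, except 1010...01."""
    m = 2 * n - 1
    alternating = tuple(1 if i % 2 == 0 else 0 for i in range(m))
    return [
        s for s in product((0, 1), repeat=m)
        if s != alternating and all(not (s[i] and s[i + 1]) for i in range(m - 1))
    ]


for n in range(1, 7):
    # The number of candidates matches F_{2n+1} - 1 for these small values of n.
    print(n, len(admissible_strings(n)), fib(2 * n + 1) - 1)
```

Since $\Phi_\Gamma$ is injective by property (1), this count is an upper bound for $|\mathcal{C}(\Gamma)|$.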

For any edge $e \in E(\Gamma)$, we denote by $v_1(e),v_2(e)\in V(\Gamma)$ the vertices incident to $e$, meaning that $e = (v_1(e),v_2(e))$. In particular, if $e$ is a loop, we still write $v_1(e)$ and $v_2(e)$, both of which denote the unique vertex incident to $e$.

Proposition 3.6. Let $\Gamma$ be a simple assembly graph, $\mathcal{V}(\Gamma) = n$, and $e_1, e_2, \dots, e_{2n-1}$ be the enumerated edges of its Eulerian transversal in the sense of Definition 3.2. Let us fix the word $w_\Gamma$ in the alphabet $\{1, 2, \dots, n\}$. Then the following statements are equivalent:

(1) The number of all Hamiltonian sets of polygonal paths for the graph $\Gamma$ is maximal, i.e., $|\mathcal{C}(\Gamma)| = F_{2n+1}-1.$
(2) Any $1 \leqslant k \leqslant n-1$ pairwise non-consecutive edges $e_{i_1},\ldots,e_{i_k}$ form a vertex-disjoint set of polygonal paths.
(3) For any $1 \leqslant k \leqslant n-1$ pairwise non-consecutive edges $e_{i_1},\ldots,e_{i_k}$, at least one vertex of $\Gamma$ occurs exactly once in the list $(v_1(e_{i_1}),v_2(e_{i_1}),v_1(e_{i_2}),v_2(e_{i_2}), \ldots, v_1(e_{i_k}),v_2(e_{i_k}))$.
(4) For any proper non-empty subset $\sigma \subset \{1,2,\dots,n\}$, at least one word from $w_\Gamma\setminus \sigma$ has an odd length.

Proof. $(1)\Leftrightarrow (2)$ The equivalence of Item (1) and Item (2) was proven in [Reference Burns, Dolzhenko, Jonoska, Muche and Saito6, Theorem 4.1]; however, we provide the argument here for completeness. By Proposition 3.5, Item (1), the map $\Phi_{\Gamma}$ is injective. Moreover, according to Proposition 3.5, Item (2), there is no binary string with two consecutive ones in the image of $\Phi_{\Gamma}$. Thus the number of Hamiltonian sets of polygonal paths does not exceed the number of binary strings of length $2n-1$ with no consecutive ones. It is well known that the number of binary strings of length $m$ with no consecutive ones equals $F_{m+2}$ (see, e.g., [15, Sequence A000045]), which gives $F_{2n+1}$ for $m = 2n-1$. By Proposition 3.5, Item (3), the binary string $\underbrace{10101\ldots01}_{2n-1}$ does not belong to the image of $\Phi_{\Gamma}$.

Therefore, $|\mathcal{C}(\Gamma)| \leqslant F_{2n+1}-1$, and the equality holds if and only if any $k \leqslant n-1$ pairwise non-consecutive edges of the Eulerian transversal of $\Gamma$ form a vertex-disjoint set of polygonal paths in $\Gamma$.

Indeed, $|\mathcal{C}(\Gamma)| = F_{2n+1}-1$ if and only if the image of $\Phi_\Gamma$ equals all the binary strings of length $2n-1$ with no consecutive ones, excluding the string $\underbrace{10101\ldots01}_{2n-1}$. This means that any binary string in $\{0, 1\}^{2n-1}$ with $k \leqslant n-1$ ones and without consecutive ones has a preimage $\gamma \in \mathcal{C}(\Gamma)$ under $\Phi_\Gamma$, which is equivalent to the fact that any choice of $k \leqslant n-1$ non-consecutive edges from the Eulerian transversal of $\Gamma$ provides a vertex-disjoint set of polygonal paths $\gamma'$ ( $\gamma$ is obtained from $\gamma'$ by adding all non-visited vertices as singletons).

$(2)\Rightarrow (3)$ Choose $1 \leqslant k \leqslant n-1$ arbitrary pairwise non-consecutive edges $e_{i_1},\ldots,e_{i_k}$ from the transversal of $\Gamma$. These edges form a vertex-disjoint set of polygonal paths in $\Gamma$. Consider any path $\alpha$ in this set and let $v$ be its initial vertex. Since the paths are vertex-disjoint, $v$ does not belong to any other path, and since all vertices of a polygonal path are pairwise distinct, $v$ is incident to exactly one edge of $\alpha$. Hence $v$ occurs exactly once among all the endpoints of $e_{i_1},\ldots,e_{i_k}$.

$(3)\Rightarrow (2)$ Suppose that the statement of Item (3) is true. Choose $1 \leqslant k \leqslant n-1$ arbitrary pairwise non-consecutive edges $e_{i_1},\ldots, e_{i_k}$ of the Eulerian transversal of $\Gamma$. These edges together with the vertices incident to them induce a subgraph $\Gamma'$ of $\Gamma$, possibly, disconnected. Since the edges $e_{i_1},\ldots,e_{i_k}$ are pairwise non-consecutive, their endpoints have only degrees 1 or 2 in $\Gamma'$. Therefore, $\Gamma'$ is a union of vertex-disjoint polygonal paths and cycles (without repeating vertices).

If $\Gamma'$ has no cycles, we have a vertex-disjoint set of polygonal paths. Assume that $\Gamma'$ has a cycle $C$. Then $C$ consists of some $e_{j_1},\ldots, e_{j_l}\in \{e_{i_1},\ldots, e_{i_k}\}$ and all their endpoints. Hence for the edges $e_{j_1},\ldots, e_{j_l}$ it holds that $1\le l\le k\le n-1$, but the sequence $(v_1(e_{j_1}),v_2(e_{j_1}), \ldots, v_1(e_{j_l}),v_2(e_{j_l}))$ contains each vertex twice (since $C$ is a cycle), which contradicts the conditions of Item (3).

$(3)\Rightarrow (4)$ Suppose, by contradiction, that there is a proper non-empty subset $\sigma \subset \{1,2,\dots,n\}$ such that all of the words from $w_\Gamma \setminus \sigma$ have even length. From each word $u = u_1u_2\ldots u_{2t} \in w_\Gamma \setminus \sigma$, we choose $t$ non-consecutive edges of the Eulerian transversal of $\Gamma$, namely, $(u_1,u_2)$, $(u_3,u_4),\ \dots, \ (u_{2t-1}, u_{2t})$. Since $\sigma$ is proper, at least one edge is chosen, and since $|\sigma| \geqslant 1$, we have no more than $(2n-2)/2 = n-1$ edges chosen in total. Also, every letter from $\{1, 2, \dots, n\} \setminus \sigma$ occurs exactly twice among all endpoints of the chosen edges, so no vertex occurs exactly once, contradicting Item (3).

$(4)\Rightarrow (3)$ Assume, by contradiction, that Item (3) does not hold, i.e., there exists $ k$, $1 \leqslant k \leqslant n-1$, and there exists a set $\mathcal{X} =\{e_{i_1}, e_{i_2}, \dots, e_{i_k}\}\subset E(\Gamma)$ such that the edges from $\mathcal{X}$ are pairwise non-consecutive in the transversal and each element in the sequence

\begin{equation*} \mathcal{Y}=(v_1(e_{i_1}),v_2(e_{i_1}),v_1(e_{i_2}),v_2(e_{i_2}), \ldots, v_1(e_{i_k}),v_2(e_{i_k}))\end{equation*}

occurs exactly twice.

To obtain a contradiction, we construct a proper non-empty subset $\sigma \subset \{1,2,\dots,n\}$ such that all words from $w_\Gamma \setminus \sigma$ have even length.

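Let $\delta$ be the set of all distinct elements from $\mathcal{Y}$. Let us consider $\sigma = \{1,2,\dots, n\} \setminus \delta$, i.e., $\sigma$ is the set of all 4-degree vertices of $\Gamma$ that are not endpoints of the edges from $\mathcal{X}$. Since $\mathcal{Y}$ has $2k$ entries and each of its elements occurs exactly twice, we have $|\delta| = k$ and hence $|\sigma| = n - k$. As $1 \leqslant k \leqslant n-1$, the subset $\sigma$ is proper and non-empty.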

Consider an arbitrary word $u \in w_\Gamma \setminus \sigma$. We show that $|u|$ is even. Let $u$ be represented as a sequence of letters $ u= u_1u_2\ldots u_{\ell}$. Note that all the letters $u_1, u_2, \dots, u_{\ell}$ correspond to vertices that belong to edges from $\mathcal{X}$, moreover, from the edges $(u_{j-1},u_j)$ and $(u_{j},u_{j+1})$ exactly one belongs to $\mathcal{X}$, for every $j = 2, \dots, \ell - 1$.

The word $u$ starts with $u_1$, hence $(u_1,u_2) \in \mathcal{X}$. However, the word $u$ is a subword of $w_\Gamma$, and by assumption, the edges in $ \mathcal{X}$ are non-consecutive in the Eulerian transversal of $\Gamma$, therefore, $(u_2,u_3) \notin \mathcal{X}$. Further, $u_3$ belongs to some edge from $\mathcal{X}$, therefore, $(u_3,u_4) \in \mathcal{X}$. Continuing this argument, we have the alternation of the inclusions:

\begin{equation*} (u_1, u_2) \in \mathcal{X},\ (u_2,u_3) \notin \mathcal{X}, \ (u_3,u_4) \in \mathcal{X} \dots \end{equation*}

The symmetric argument provides:

\begin{equation*} (u_\ell, u_{\ell - 1}) \in \mathcal{X},\ (u_{\ell - 1}, u_{\ell - 2}) \notin \mathcal{X}, \ (u_{\ell - 2},u_{\ell - 3}) \in \mathcal{X} \dots \end{equation*}

Therefore, $\ell = |u|$ is even. Since $u$ was an arbitrary word from $w_\Gamma \setminus \sigma$, we obtain a contradiction. This completes the proof.
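For small words, $|\mathcal{C}(\Gamma_w)|$ can be computed by brute force directly from the word. The Python sketch below rests on an assumption drawn from the argument for $(3)\Rightarrow(2)$: a choice of pairwise non-consecutive transversal edges is a vertex-disjoint set of polygonal paths exactly when the chosen edges contain no cycle (a loop or a double edge counts as a cycle). The function name `count_hamiltonian_sets` is hypothetical, and acyclicity is tested with a simple union-find structure.

```python
def count_hamiltonian_sets(w):
    """Count selections of transversal edges that are non-consecutive and acyclic."""
    m = len(w) - 1                                   # number of enumerated edges, 2n-1
    edges = [(w[i], w[i + 1]) for i in range(m)]
    count = 0
    for mask in range(1 << m):                       # bit i set means e_{i+1} is chosen
        if mask & (mask >> 1):                       # two consecutive edges chosen
            continue
        parent = {v: v for v in set(w)}

        def find(v):
            while parent[v] != v:
                v = parent[v]
            return v

        acyclic = True
        for i, (a, b) in enumerate(edges):
            if (mask >> i) & 1:
                ra, rb = find(a), find(b)
                if ra == rb:                         # the edge closes a cycle (or is a loop)
                    acyclic = False
                    break
                parent[ra] = rb
        count += acyclic
    return count


print(count_hamiltonian_sets([1, 2, 1, 2]))               # F_5 - 1 = 4
print(count_hamiltonian_sets([1, 2, 1, 3, 2, 3]))         # F_7 - 1 = 12
print(count_hamiltonian_sets([1, 2, 1, 3, 2, 4, 3, 4]))   # F_9 - 1 = 33
```

Each admissible selection corresponds to one Hamiltonian set, obtained by adding the unvisited vertices as singletons. For the word $112323$ of Example 3.3 the same function returns a value below the bound $12$.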

Definition 3.7. An assembly word $w$ with $|w| = 2n$ is maximal if $|\mathcal{C}(\Gamma_{w})| = F_{2n+1}-1$.

According to Proposition 3.6, Item (4), an assembly word is maximal if and only if after deleting any non-empty proper subset of symbols from the word, the remaining sequence of subwords contains at least one subword of an odd length.
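This criterion can be tested exhaustively for short words. The following Python sketch checks Item (4) of Proposition 3.6 over all proper non-empty subsets $\sigma$ and then lists, for $n \leqslant 4$, the maximal words among all double occurrence words whose letters first occur in increasing order. The helper names `piece_lengths`, `is_maximal` and `canonical_dows` are hypothetical, and the search is exponential in $n$, so it is meant for small cases only.

```python
from itertools import combinations


def piece_lengths(w, sigma):
    """Lengths of the non-empty subwords left after deleting the letters of sigma."""
    lengths, run = [], 0
    for letter in w:
        if letter in sigma:
            if run:
                lengths.append(run)
            run = 0
        else:
            run += 1
    if run:
        lengths.append(run)
    return lengths


def is_maximal(w):
    """Item (4): every proper non-empty sigma leaves at least one subword of odd length."""
    letters = sorted(set(w))
    for r in range(1, len(letters)):
        for sigma in combinations(letters, r):
            if all(length % 2 == 0 for length in piece_lengths(w, set(sigma))):
                return False
    return True


def canonical_dows(n):
    """All double occurrence words on 1..n with first occurrences in increasing order."""
    def extend(prefix, next_new, open_letters):
        if len(prefix) == 2 * n:
            yield prefix
            return
        for a in sorted(open_letters):               # second occurrence of a letter
            yield from extend(prefix + [a], next_new, open_letters - {a})
        if next_new <= n:                            # first occurrence of a new letter
            yield from extend(prefix + [next_new], next_new + 1,
                              open_letters | {next_new})
    yield from extend([], 1, frozenset())


maximal_words = {n: [w for w in canonical_dows(n) if is_maximal(w)]
                 for n in range(1, 5)}
for n, words in maximal_words.items():
    print(n, words)
```

For these small values of $n$ the only word that survives is the tangled cord word $w_{TC_n}$, while a word such as $112323$ is rejected because deleting $\{2,3\}$ leaves the single subword $11$ of even length.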

Definition 3.8. A composition of two assembly words $u$ and $v$ without common letters (denoted by $u \circ v$) is simply a concatenation $ w = uv$.

Remark 3.9. A composition of two assembly words $u$ and $v$ can be interpreted in terms of simple assembly graphs as follows (see also Angeleska, Jonoska and Saito [3, Definition 3.9]). We choose an orientation of the Eulerian transversal for each graph induced by its assembly word. Now, a composition of two directed simple assembly graphs $\Gamma_u$ and $\Gamma_v$ is the directed simple assembly graph obtained by identifying the terminal vertex of $\Gamma_u$ with the initial vertex of $\Gamma_v$. This composition is denoted by $\Gamma_u\circ \Gamma_v$.

The following corollary shows that a composition of two assembly words is never maximal.

Corollary 3.10. If $w = u\circ v$, where $u$ and $v$ are both assembly words, then $w$ is not maximal.

Proof. Suppose $w$ is maximal. Let $\sigma$ denote the set of all letters of $u$; it is a non-empty proper subset of the letters of $w$. Then the sequence $w \setminus \sigma$ consists of the single word $v$. Since $v$ is an assembly word, its length is even, which contradicts Proposition 3.6, Item (4).
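Whether an assembly word is a composition can be detected in a single pass. The sketch below is an illustration under the assumption that every letter of $w$ occurs exactly twice; the name `composition_split` is hypothetical. It looks for a proper prefix in which every letter has already occurred twice, since such a prefix $u$ and the remaining suffix $v$ give $w = u \circ v$.

```python
def composition_split(w):
    """Return the smallest p with 0 < p < len(w) such that w[:p] is an assembly word, or None.

    Assumes every letter of w occurs exactly twice.
    """
    open_letters = set()
    for p, letter in enumerate(w, start=1):
        # A first occurrence opens the letter, the second occurrence closes it.
        open_letters ^= {letter}
        if not open_letters and p < len(w):
            return p
    return None
```

If a split point $p$ is returned, then deleting the letters of the prefix `w[:p]` leaves the single even-length word `w[p:]`, which is exactly the set $\sigma$ used in the proof above.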

4. Proof of Conjecture 1.1

In this section, we prove Conjecture 1.1, which specifies the structure of a maximal assembly word. We start by showing that each maximal word in some way contains a tangled cord. Then we show that, in order to satisfy Proposition 3.6, a maximal word contains nothing but a tangled cord. We emphasize that everything is done in terms of assembly words, not assembly graphs, and we only return to the graph terms in Theorem 4.7 and Corollary 4.8.

The following lemma shows that a maximal word $w$ always contains a tangled cord that starts with the first letter of $w$ and ends with its last letter.

Lemma 4.1. Let $w$ be an assembly word in the alphabet $\Sigma$, $|w|=2n$, and suppose that $w$ cannot be represented as a composition of two assembly words. Then there exist $s \geqslant 1$ and $\sigma = \{t_1, t_2, \ldots, t_{s}\} \subseteq \Sigma$ such that ${\mathfrak o}_1(t_1) = 1 $, ${\mathfrak o}_2(t_s) = 2n$ and $w(\sigma) = w_{TC_s}$ up to a renaming of the alphabet letters, i.e., $w(\sigma)$ equals $t_1t_2t_1t_3t_2\dots t_s t_{s-1}t_s$.

Proof. We construct such a tangled cord step by step, making the next step as long as the last letter of the tangled cord obtained so far is not the last letter of $w$.

We start with the first letter of $w$, set $t_1 = w[1]$.

If ${\mathfrak o}_2(t_1)=2n$, we set $s = 1$ and the required word is $w_{TC_1} = t_1t_1$.

Suppose that we already constructed the subword

\begin{equation*}w_{TC_{k}} = w(\{t_1, t_2, \dots, t_{k-1}, t_{k}\})\end{equation*}

of length $2k$ such that for every $1 \leqslant i \leqslant k$, the letter $t_i$ is chosen in such a way that

\begin{equation*}w(\{t_1, t_2, \dots, t_{i-1}, t_{i}\}) = w_{TC_{i}} = t_1t_2t_1t_3t_2\dots t_{i} t_{i-1}t_{i},\end{equation*}

and $ {\mathfrak o}_2(t_{i})$ is the maximal possible with this property.

If ${\mathfrak o}_2(t_{k})=2n$ then the required tangled cord is constructed. Otherwise we add the next letter as follows.

Consider the subword $w[({\mathfrak o}_2(t_{k})+1) : 2n]$. By the hypothesis of the lemma, $w$ is not a composition of two assembly words, so the prefix $w[1 : {\mathfrak o}_2(t_{k})]$ is not an assembly word, i.e., some letter occurs in it exactly once. Hence there exists a letter $t_{k+1}$ such that ${\mathfrak o}_1(t_{k+1}) \lt {\mathfrak o}_2(t_k) \lt {\mathfrak o}_2(t_{k+1})$.

Note that $1 \lt {\mathfrak o}_2(t_1) \lt {\mathfrak o}_2(t_2) \lt \ldots \lt {\mathfrak o}_2(t_k) \lt {\mathfrak o}_2(t_{k+1}).$ There exists $i$, $1 \leqslant i \leqslant k$, such that ${\mathfrak o}_2(t_{i-1}) \lt {\mathfrak o}_1(t_{k+1}) \lt {\mathfrak o}_2(t_i)$; here we formally set ${\mathfrak o}_2(t_{0}) = 1$. Then let us consider

\begin{equation*}w(\{t_1, t_2, \dots, t_i, t_{k+1}\}) = t_1t_2t_1t_3t_2\dots t_i t_{i-1}t_{k+1} t_i t_{k+1} = w_{TC_{i+1}}.\end{equation*}

Assume first that $i \lt k$. Then, up to renaming of letters, the words $w(\{t_1, t_2, \dots, t_i, t_{i+1}\})$ and $w(\{t_1, t_2, \dots, t_i, t_{k+1}\})$ are both equal to $w_{TC_{i+1}}$. However, ${\mathfrak o}_2(t_{k+1}) \gt {\mathfrak o}_2(t_{i+1})$, contradicting the choice of $t_{i+1}$. This contradiction implies that $i = k$, so $w(\{t_1, t_2, \dots, t_k, t_{k+1}\}) = w_{TC_{k+1}}$, proving that there exists at least one letter $t_{k+1}$ with the desired property. Now we choose $t_{k+1}$ in such a way that $w(\{t_1, t_2, \dots, t_k, t_{k+1}\}) = w_{TC_{k+1}}$ and ${\mathfrak o}_2(t_{k+1})$ is the maximal possible. Thus we are able to continue the process.

Since $w$ has finite length, the described process stops at step $s$, returning $\sigma = \{t_1, t_2, \dots, t_s\}$ such that ${\mathfrak o}_1(t_1) = 1 $, ${\mathfrak o}_2(t_s) = 2n$ and $w(\sigma) = t_1t_2t_1t_3t_2\dots t_s t_{s-1}t_s = w_{TC_s}$. This completes the proof.
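The construction in this proof is effectively a greedy procedure, and it can be sketched in Python as follows. This is an illustration only: the function name `framing_tangled_cord` is hypothetical, positions are 1-based as in the text, and the input is assumed to be an assembly word (every letter occurs exactly twice). At each step the code picks, among the letters $x$ with ${\mathfrak o}_2(t_{k-1}) \lt {\mathfrak o}_1(x) \lt {\mathfrak o}_2(t_k) \lt {\mathfrak o}_2(x)$, the one with the largest second occurrence.

```python
def framing_tangled_cord(w):
    """Greedy construction following the proof: return [t_1, ..., t_s], or None if a step fails.

    Assumes every letter of w occurs exactly twice.
    """
    first, second = {}, {}
    for pos, letter in enumerate(w, start=1):  # 1-based positions
        if letter in first:
            second[letter] = pos
        else:
            first[letter] = pos

    cord = [w[0]]   # t_1 is the first letter of w
    lower = 1       # plays the role of o_2(t_0) = 1
    while second[cord[-1]] != len(w):
        upper = second[cord[-1]]
        candidates = [x for x in first if lower < first[x] < upper < second[x]]
        if not candidates:
            return None  # no letter extends the cord, so w is a composition
        # Choose the candidate whose second occurrence is as far right as possible.
        cord.append(max(candidates, key=lambda x: second[x]))
        lower = upper
    return cord
```

For a word that is a composition, such as $1122$, no candidate exists at the first step and the sketch returns `None`.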

Definition 4.2. Let $w$ be a word (not necessarily an assembly word) in the alphabet $\Sigma$, $|w| = N$. We say that $w$ contains a framing tangled cord of order $s$ if there exist $t_1, t_2, \dots, t_s \in \Sigma$ such that $w[1] = t_1$, $w[N] = t_s$, and $w(\{t_1, t_2, \dots, t_s\}) = w_{TC_s}$ (up to a renaming of the alphabet letters), i.e., it equals $t_1t_2t_1t_3t_2\dots t_s t_{s-1}t_s$.

Remark 4.3. Note that if $w$ contains a framing tangled cord, then it may contain two or more different framing tangled cords by choices of different $s$ and $t_2,\dots,t_{s-1}$. For example, the word $w = 1 2 3 4 1 5 2 6 4 5 3 6$ has framing tangled cords $w(\{1, 3, 6 \}) = 131636$, $w(\{1,4,6\}) = 141646$, $w(\{1,2,5,6\}) = 12152656$.
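The three framing tangled cords in this example can be checked mechanically. The Python sketch below is illustrative; the helper names `restrict`, `tangled_cord_word`, `normalize` and `is_framing_cord` are hypothetical. It computes $w(\sigma)$, renames letters in order of first appearance, and compares the result with $w_{TC_s}$.

```python
def restrict(w, sigma):
    """The word w(sigma): keep only the letters of w that belong to sigma."""
    return [x for x in w if x in sigma]

def tangled_cord_word(s):
    """The word t_1 t_2 t_1 t_3 t_2 ... t_s t_{s-1} t_s over the letters 1..s."""
    word = [1]
    for j in range(2, s + 1):
        word += [j, j - 1]
    return word + [s]

def normalize(u):
    """Rename letters as 1, 2, 3, ... in order of first appearance."""
    names = {}
    return [names.setdefault(x, len(names) + 1) for x in u]

def is_framing_cord(w, sigma):
    """True if the letters of sigma form a framing tangled cord of w."""
    u = restrict(w, sigma)
    return (bool(u) and u[0] == w[0] and u[-1] == w[-1]
            and normalize(u) == tangled_cord_word(len(set(u))))
```

With `w = "123415264536"`, the calls `is_framing_cord(w, set("136"))`, `is_framing_cord(w, set("146"))` and `is_framing_cord(w, set("1256"))` all succeed, while a set such as $\{1,2,6\}$ fails because $w(\{1,2,6\}) = 121266$ is not a tangled cord.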

Corollary 4.4. An assembly word contains a framing tangled cord if and only if this word cannot be represented as a composition of two assembly words.

Proof. Necessity. Follows from Lemma 4.1: an assembly word that cannot be represented as a composition of two assembly words always contains a framing tangled cord.

Sufficiency. Suppose that an assembly word $w$ contains a framing tangled cord $TC_s$ of order $s$ with letters $t_1, \dots, t_s$. Assume, for the sake of contradiction, that $w$ can be represented as a composition of two words $w = u\circ v$. Consider $ K_{max} = \max \{{\mathfrak o}_2(t_i) \, | \, 1 \leqslant i \leqslant s, {\mathfrak o}_2(t_i) \leqslant |u|\}$, i.e., the largest position of a second occurrence of a letter from the framing tangled cord within $u$. This maximum is well defined, since $t_1 = w[1]$ lies in $u$ and hence both of its occurrences lie in $u$. Note that $w[K_{max}] = t_k $ for some $k = 1, 2, \dots, s$. We have $k \lt s,$ since for the framing tangled cord, ${\mathfrak o}_2(t_s)= |w| \gt |u|.$ Now, for the letter $t_{k+1}$ it follows from the tangled cord structure that ${\mathfrak o}_1 (t_{k+1}) \lt {\mathfrak o}_2 (t_k) = K_{max} \leqslant |u| \lt {\mathfrak o}_2 (t_{k+1})$. This contradicts the representation of $w$ as the composition $w=u \circ v$, since the letter $t_{k+1}$ occurs both in $u$ and $v$.

Below we show that if an assembly word $w$ contains a framing tangled cord, then one can remove some letters of the tangled cord from $w$ so that all the remaining subwords are of even length.

Example 4.5. Consider an assembly word $w$ containing a framing tangled cord of order $3$. Assume that the parity of the lengths of the subwords between the letters of the cord is as given in Figure 6. To obtain subwords of even length only, one should remove the letters $t_2$ and $t_3$.

Figure 6. Deleting $t_2$ and $t_3$ yields subwords of even length only.

Lemma 4.6. Let $w$ be a word of even length (not necessarily an assembly word) in which each symbol occurs at most twice. Suppose $|w| \gt 2s$ and $w$ contains a framing tangled cord of order $s$, whose letters are $t_1, t_2, \dots, t_s$. Then there exists non-empty $\sigma \subseteq \{t_1, t_2, \dots, t_s\}$ such that all the words in $w \setminus \sigma$ have even length.

Proof. The proof is by induction on $s$. Let $|w| = 2n$.

The base: $s = 1$. Hence $w[1] = w[2n] = t_1$, and we set $\sigma = \{t_1\}$. Then $w \setminus \sigma$ contains a unique word, and its length is $2n-2$, which is even.

Induction step. Suppose that the lemma’s statement holds for all positive integers less than $s$. Then our goal is to prove it for $s$. Recall that by the conditions, we have: $w[1] = t_1$, $w[2n] = t_s$, and $w(\{t_1, t_2, \dots, t_s\}) = w_{TC_s}=t_1t_2t_1t_3t_2\dots t_s t_{s-1}t_s$. We distinguish the following cases.

Case 1. ${\mathfrak o}_2(t_i)$ is even for some $1 \leqslant i \lt s$. Note that ${\mathfrak o}_1(t_i)$ and ${\mathfrak o}_2(t_i)$ are well defined since the framing tangled cord is an assembly word. Then consider the word $w' = w[1: {\mathfrak o}_2(t_i)]$ of even length. It contains a framing tangled cord of order $i$. Indeed, $w'(\{t_1, t_2, \dots, t_i\}) = w_{TC_i}=t_1t_2t_1t_3t_2\dots t_i t_{i-1}t_i$. Also, ${\mathfrak o}_1(t_{i+1}) \lt {\mathfrak o}_2(t_i)$, hence $|w'| \gt 2i$. By the induction hypothesis, one can find a non-empty $\sigma \subseteq \{t_1, t_2, \dots, t_i\}$ such that all the words in $w' \setminus \sigma$ have even length. Then clearly all the words in $w \setminus \sigma$ have even length (since ${\mathfrak o}_2(t_i)$ and $|w|$ are both even), as required. See Figure 7.

Figure 7. Illustration for Lemma 4.6, Case 1, $s=5$, $i=3$. The framing tangled cord for $w'$ is bolded.

Case 2. ${\mathfrak o}_1(t_j)$ is odd for some $1 \lt j \leqslant s$. This reduces to Case 1 by reversing the word $w$. Indeed, since $|w|$ is even, position $p$ becomes position $|w| + 1 - p$, so odd positions become even and vice versa; moreover, first occurrences of symbols become second occurrences. Note that a reversed tangled cord is also a tangled cord, i.e., the symmetric argument is valid, see Remark 2.5. See Figure 8.

Figure 8. Illustration for Lemma 4.6, Case 2, $s=4$, $j=3$. The framing tangled cord for $w'$ is bolded.

Case 3. The remaining case corresponds to the following conditions. The number ${\mathfrak o}_1(t_j)$ is even for all $1 \lt j \leqslant s$ and the number ${\mathfrak o}_2(t_i)$ is odd for all $1 \leqslant i \lt s$. We set $\sigma = \{t_1, t_2, \dots, t_s\}$ and prove that $w \setminus \sigma$ contains only the words of even length. See Figure 9.

Figure 9. Illustration for Lemma 4.6, Case 3, $s=4$. Odd and even positions alternate, so all the subwords between the cord’s letters are of even length.

We note that each of the elements of $w \setminus \sigma$, except the first one $w[{\mathfrak o}_1(t_1) +1 : {\mathfrak o}_1(t_2) - 1]$ and the last one $w[{\mathfrak o}_2(t_{s-1}) +1 : {\mathfrak o}_2(t_s) - 1]$, is situated exactly between first and second occurrences of two distinct letters of $TC_s$: it is either $w[{\mathfrak o}_1(t_j) +1 : {\mathfrak o}_2(t_{j-1}) - 1]$ for some $1 \lt j \leqslant s$ or $w[{\mathfrak o}_2(t_{i}) +1: {\mathfrak o}_1(t_{i+2}) - 1]$ for some $1 \leqslant i \lt s - 1$.

Since for $1 \lt j \leqslant s$ every first occurrence of $t_j$ is even and for $1\leqslant i \lt s$ every second occurrence of $t_i$ is odd, we have that the number of letters between these occurrences is even. Indeed,

\begin{align*} &\left|w[{\mathfrak o}_1(t_j) +1 : {\mathfrak o}_2(t_{j-1}) - 1]\right| = {\mathfrak o}_2(t_{j-1}) - {\mathfrak o}_1(t_j) - 1, \quad \text{ is even for}\ j \gt 1;\\ &\left|w[{\mathfrak o}_2(t_{i}) +1: {\mathfrak o}_1(t_{i+2}) - 1]\right| = {\mathfrak o}_1(t_{i+2}) - {\mathfrak o}_2(t_i) - 1, \quad \text{is even for}\ i \lt s-1.\\ \end{align*}

Finally we note that ${\mathfrak o}_1(t_1) = 1$ and ${\mathfrak o}_1(t_2)$ is even, hence

\begin{equation*}\left|w[{\mathfrak o}_1(t_1) +1: {\mathfrak o}_1(t_2) - 1]\right| = {\mathfrak o}_1(t_2) - {\mathfrak o}_1(t_1) - 1 \end{equation*}

is even. Similarly, ${\mathfrak o}_2(t_s) = 2n$ and ${\mathfrak o}_2(t_{s-1})$ is odd, hence

\begin{equation*}\left|w[{\mathfrak o}_2(t_{s-1}) +1: {\mathfrak o}_2(t_s) - 1]\right| = {\mathfrak o}_2(t_s) - {\mathfrak o}_2(t_{s-1}) - 1\end{equation*}

is even.

Thus all the words in $w \setminus \sigma$ have even length. This completes the proof.
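The case analysis of this proof translates directly into a recursive procedure. The following Python sketch is an illustration under the hypotheses of Lemma 4.6 (even length, $|w| \gt 2s$, and `cord` listing the letters $t_1, \dots, t_s$ of a framing tangled cord of $w$); the names `even_split_subset` and `blocks_after_removal` are hypothetical, and the inputs are not validated.

```python
def blocks_after_removal(w, sigma):
    """Return the non-empty maximal blocks of w left after deleting the letters in sigma."""
    blocks, current = [], []
    for letter in w:
        if letter in sigma:
            if current:
                blocks.append(current)
            current = []
        else:
            current.append(letter)
    if current:
        blocks.append(current)
    return blocks

def even_split_subset(w, cord):
    """Return a non-empty subset of the cord letters whose removal leaves only even blocks."""
    w, cord = list(w), list(cord)
    s = len(cord)
    if s == 1:
        return {cord[0]}  # base of the induction
    # 1-based positions of the two occurrences of each cord letter.
    occ = {t: [p for p, x in enumerate(w, start=1) if x == t] for t in cord}
    # Case 1: the second occurrence of some t_i with i < s is at an even position.
    for i in range(1, s):
        end = occ[cord[i - 1]][1]
        if end % 2 == 0:
            return even_split_subset(w[:end], cord[:i])
    # Case 2: the first occurrence of some t_j with j > 1 is at an odd position;
    # reversing w and the cord turns this into Case 1.
    if any(occ[t][0] % 2 == 1 for t in cord[1:]):
        return even_split_subset(w[::-1], cord[::-1])
    # Case 3: the parities alternate, so the whole cord is removed.
    return set(cord)
```

For the word $w = 123415264536$ from Remark 4.3 with the cord $t_1 = 1$, $t_2 = 3$, $t_3 = 6$, the result can be checked with `blocks_after_removal`: every remaining block has even length.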

Now we characterize the maximal words (see Definition 3.7): a double occurrence word $w$ of length $2n$ is maximal if and only if $w=w_{TC_n}$ up to renaming of symbols.

Theorem 4.7. Let $w$ be an assembly word in the alphabet $\Sigma = \{1,2,\dots,n\}$ and $|w| = 2n$. Then the following statements are equivalent:

1) For any proper non-empty subset $\sigma \subset \Sigma$, at least one word from $w\setminus \sigma$ has odd length.
2) $w = w_{TC_n}$ up to renaming of symbols.

Proof. We prove the equivalence by consecutively applying facts from above.

1. $1 \Rightarrow 2$. First, we note that according to Proposition 3.6, $w$ is a maximal word (see Definition 3.7). Hence, due to Corollary 3.10, $w$ cannot be represented as a composition of two assembly words. From Lemma 4.1, it follows that $w$ contains a framing tangled cord $TC_s$ of order $s$. Finally, if $|w| \gt 2s$, then Lemma 4.6 yields a non-empty subset $\sigma$ of the $s \lt n$ letters of this cord such that all the words in $w \setminus \sigma$ have even length. Such $\sigma$ is a proper subset of $\Sigma$, which contradicts Item 1. Hence $|w| = 2s = 2n$, thus $w = w_{TC_n}$ up to renaming of symbols.
2. $2\Rightarrow 1$. It is known (see Burns, Dolzhenko, Jonoska, Muche and Saito [6, Corollary 4.4]) that $w_{TC_n}$ is a maximal word. Hence Item 1 follows from Proposition 3.6.
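For small $n$ the equivalence can also be confirmed by exhaustive search. The Python sketch below is only a sanity check and not a substitute for the proof; the function names are hypothetical, and $w \setminus \sigma$ is again read as the sequence of non-empty maximal blocks left after deleting the letters of $\sigma$. The function `counterexamples` lists all assembly words of length $2n$ on which the two items disagree.

```python
from itertools import combinations, permutations

def blocks_after_removal(w, sigma):
    """Return the non-empty maximal blocks of w left after deleting the letters in sigma."""
    blocks, current = [], []
    for letter in w:
        if letter in sigma:
            if current:
                blocks.append(current)
            current = []
        else:
            current.append(letter)
    if current:
        blocks.append(current)
    return blocks

def satisfies_item_1(w):
    """Item 1: every non-empty proper subset of letters leaves at least one odd block."""
    alphabet = sorted(set(w))
    for size in range(1, len(alphabet)):
        for sigma in combinations(alphabet, size):
            if all(len(b) % 2 == 0 for b in blocks_after_removal(w, set(sigma))):
                return False
    return True

def is_tangled_cord_word(w):
    """Item 2: w equals t_1 t_2 t_1 t_3 t_2 ... t_n t_{n-1} t_n up to renaming."""
    names = {}
    renamed = [names.setdefault(x, len(names) + 1) for x in w]
    n = len(names)
    expected = [1]
    for j in range(2, n + 1):
        expected += [j, j - 1]
    expected.append(n)
    return renamed == expected

def counterexamples(n):
    """All assembly words over {1, ..., n} on which Items 1 and 2 disagree."""
    letters = [x for x in range(1, n + 1) for _ in range(2)]
    return [w for w in set(permutations(letters))
            if satisfies_item_1(w) != is_tangled_cord_word(w)]
```

The number of words grows as $(2n)!/2^n$ and each word requires checking $2^n - 2$ subsets, so this check is practical only for very small $n$.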

Application of Theorem 4.7 to the result of Proposition 3.6 implies our main result.

Corollary 4.8. Let $\Gamma$ be a simple assembly graph, $\mathcal{V}(\Gamma) = n$. The number of all Hamiltonian sets of polygonal paths for the graph $\Gamma$ is maximal, i.e., $|\mathcal{C}(\Gamma)| = F_{2n+1}-1$ if and only if $\Gamma = TC_n$ (up to an isomorphism).

This proves Conjecture 1.1. As a corollary, we obtain that each non-maximal word can be split into subwords of even length via the removal of some tangled cord. In fact, the corollary below states that the subword formed by the minimal set of symbols that can be removed from a word such that the remaining subwords are of even length always forms a tangled cord.

Corollary 4.9. Let $w$ be an assembly word in the alphabet $\Sigma$, $|w| = 2n$. Suppose that $w \neq w_{TC_n}$ (so it is not a maximal word), and let $\sigma \subset \Sigma$ be a non-empty subset of letters of minimal size such that $w \setminus \sigma$ is a collection of subwords of even length only. Then $\Gamma_{w(\sigma)}$ is a tangled cord.

Proof. Suppose that $|\sigma| = k$ and $w(\sigma) = s_1 s_2 \dots s_{2k}$ is not a tangled cord. Note that each letter from $\sigma$ occurs in $w(\sigma)$ exactly twice, i.e., $s_i$ are not all distinct. Consider the following representation of $w$:

\begin{equation*}w = S_1\,s_1\,S_2\,s_2 \dots S_{2k} \, s_{2k} \, S_{2k+1}.\end{equation*}

Here for each $i\in \{1,2, \dots, 2k+1\}$ we have that $S_i$ is either an empty subword or $S_i\in w \setminus \sigma.$

Note that $w(\sigma)$ is not a maximal word by Theorem 4.7. Then, according to Proposition 3.6, there exists a proper non-empty subset $\gamma \subset \sigma$ such that $w(\sigma) \setminus \gamma$ consists of subwords of even length only.

Let $|\gamma| = l \lt k$ and note that $w(\gamma)$ can be written as $w(\gamma) = (w(\sigma))(\gamma) = g_1 g_2 \dots g_{2l}$, $g_i\in \Sigma$. We have a representation of $w(\sigma):$

\begin{equation*}w(\sigma) = G_1 g_1 G_2 g_2 \dots G_{2l} g_{2l} G_{2l+1}.\end{equation*}

Here for each $j\in \{1,2, \dots, 2l+1\}$ we have that $G_j$ is either an empty subword or $G_j\in w (\sigma) \setminus \gamma.$ Note that $G_1 = s_1 s_2\dots s_{|G_1|}$, $g_1 = s_{|G_1|+1}$, in general, $g_i = s_{|G_1|+\dots + |G_i|+i}$ and

\begin{equation*}G_i = w(\sigma)\left[|G_1|+\dots + |G_{i-1}|+ (i-1)+1: |G_1|+\dots + |G_i|+i -1 \right].\end{equation*}

Finally, consider the representation of the initial word $w$ with the letters from $\gamma$ and subwords:

\begin{equation*}w = U_1 g_1 U_2 g_2 \dots U_{2l} g_{2l} U_{2l+1}.\end{equation*}

Here for each $t\in \{1,2, \dots, 2l+1\}$ we have that $U_t$ is either an empty subword or $U_t\in w \setminus \gamma.$ For example, $U_1 = S_1s_1S_2s_2\dots s_{|G_1|}S_{|G_1|+1}$. Note that $|U_i|$ is even. To be more precise,

\begin{equation*} |U_i| = |G_i| + \sum_{r = |G_1| + \ldots + |G_{i-1}| + i}^{|G_1| + \ldots + |G_{i}| + i} |S_r|, \quad i = 1, 2, \dots, 2l+1. \end{equation*}

Here each $|S_r|$ is even by the choice of $\sigma$, and $|G_i|$, which is the number of letters among $s_1, s_2, \dots, s_{2k}$ that occur in $U_i$, is even by the choice of $\gamma$.

Therefore, $w \setminus \gamma$ consists of subwords of even length only, which contradicts the fact that the size of $\sigma$ is minimal. The obtained contradiction implies that $\Gamma_{w(\sigma)}$ is the tangled cord $TC_k$.
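Corollary 4.9 can be illustrated on a concrete word by searching for all minimal sets $\sigma$ directly. The Python sketch below is an illustration only; the names `minimal_even_split_sets` and `is_tangled_cord_word` are hypothetical, and the search is a plain brute force over subsets of increasing size.

```python
from itertools import combinations

def blocks_after_removal(w, sigma):
    """Return the non-empty maximal blocks of w left after deleting the letters in sigma."""
    blocks, current = [], []
    for letter in w:
        if letter in sigma:
            if current:
                blocks.append(current)
            current = []
        else:
            current.append(letter)
    if current:
        blocks.append(current)
    return blocks

def is_tangled_cord_word(u):
    """True if u equals t_1 t_2 t_1 t_3 t_2 ... t_s t_{s-1} t_s up to renaming."""
    names = {}
    renamed = [names.setdefault(x, len(names) + 1) for x in u]
    s = len(names)
    expected = [1]
    for j in range(2, s + 1):
        expected += [j, j - 1]
    expected.append(s)
    return renamed == expected

def minimal_even_split_sets(w):
    """All non-empty proper subsets of minimal size whose removal leaves only even blocks."""
    alphabet = sorted(set(w))
    for size in range(1, len(alphabet)):
        found = [set(sigma) for sigma in combinations(alphabet, size)
                 if all(len(b) % 2 == 0 for b in blocks_after_removal(w, set(sigma)))]
        if found:
            return found
    return []  # no such subset exists, i.e., w satisfies Item 1 of Theorem 4.7
```

For $w = 123415264536$ no single letter works, and the only minimal set is $\sigma = \{3, 6\}$, for which $w(\sigma) = 3636$ is the tangled cord word of order $2$ up to renaming.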

Acknowledgements

The authors are cordially thankful to the referees for their valuable suggestions improving the presentation of the results and for the important references.

The research of E. Kreines was partially supported by ISF Grant 1092/22; she is also grateful to colleagues at Tel Aviv University for a warm working atmosphere. N. Jonoska was partially supported by the grants NSF DMS-2054321, CCF-2107267, CCF-2505771, and the W.M. Keck Foundation. The work of A. Maksaev was supported by the HSE University Basic Research Program.

References

[1] Andersen, J. E., Penner, R. C., Reidys, C. M. and Waterman, M. S., Topological classification and enumeration of RNA structures by genus, J. Math. Biol. 67(5) (2013), 1261–1278.

[2] Andersen, L. D., Bouchet, A. and Jackson, B., Orthogonal A-trails of 4-regular graphs embedded in surfaces of low genus, J. Comb. Theory, Ser. B 66(2) (1996), 232–246.

[3] Angeleska, A., Jonoska, N. and Saito, M., DNA recombinations through assembly graphs, Discrete Appl. Math. 157(14) (2009), 3020–3037.

[4] Angeleska, A., Jonoska, N., Saito, M. and Landweber, L. F., RNA-guided DNA assembly, J. Theor. Biol. 248(4) (2007), 706–720.

[5] Buck, D., Dolzhenko, E., Jonoska, N., Saito, M. and Valencia, K., Genus Ranges of 4-Regular Rigid Vertex Graphs, Electron. J. Combin. 22(3) (2015), 3–47.

[6] Burns, J., Dolzhenko, E., Jonoska, N., Muche, T. and Saito, M., Four-regular graphs with rigid vertices associated to DNA recombination, Discrete Appl. Math. 161(10-11) (2013), 1378–1394.

[7] Fleischmann, P., Haschke, L. and Löck, T. et al. Word-representable graphs from a word’s perspective, Acta Inform. 61 (2024), 383–400.

[8] Guterman, A. E., Kreines, E. M. and Ostroukhova, N. V., Double occurrence words: their graphs and matrices, J. Math. Sci. (New-York) 249 (2020), 139–157.

[9] Kauffman, L. H., Virtual knot theory, Eur. J. Combin. 20(7) (1999), 663–690.

[10] Kitaev, S., A comprehensive introduction to the theory of word-representable graphs, In: Charlier, E., Leroy, J. and Rigo, M. (eds.), Developments in Language Theory. DLT 2017, Lecture Notes in Computer Science, Volume 10396 (Springer, Cham, 2017).

[11] Kitaev, S., Patterns in Permutations and Words (Springer, 2011).

[12] Kitaev, S. and Lozin, V., Words and Graphs (Springer, 2015).

[13] Kitaev, S. and Pyatkin, A. V., On representable graphs, J. Autom. Lang. Comb. 13(1) (2008), 45–54.

[14] Shabat, G. B. and Voevodsky, V. A., Drawing curves over number fields, The Grothendieck Festschrift III, Progress in Mathematics, 88 (1990), 199–277.

[15] The On-Line Encyclopedia of Integer Sequences, http://oeis.org/.
