## 1. Problem description

Design criteria for stellarators stem from many physical phenomena as well as many disciplines. For instance, designs must consider fusion performance metrics from multiple models and scales, engineering specifications that determine the realizability of the design, and financial timelines that set the pace of research and construction. The collection of these criteria give rise to highly constrained design optimization problems with many, potentially competing, objectives. Any design decision ultimately comes with a trade-off. For instance, an improvement in particle confinement may increase the burden on engineers to build more complex coils, and tightening of financial constraints may simplify the design and worsen some aspects of transport. When making design decisions, such as how to weight or balance objectives in an optimization or how tightly to enforce a constraint, it is critical to understand how each choice affects other aspects of the design.

To understand trade-offs that appear in stellarator design, we turn to the use of multi-objective optimization (MOO) (Ehrgott Reference Ehrgott2005; Miettinen Reference Miettinen2012; Emmerich & Deutz Reference Emmerich and Deutz2018). MOO treats problems of the form

where $\boldsymbol {x}\in \mathbb {R}^{{n_{\boldsymbol {x}}}}$ is a vector of design variables, $\boldsymbol {\varOmega }\subseteq \mathbb {R}^{{n_{\boldsymbol {x}}}}$ is a compact set and $F_1,\ldots, F_{{n_{\boldsymbol {F}}}}$ are, typically differentiable, objectives. MOO explores the ‘Pareto optimal’ space of solutions – those which are neither better nor worse than one another and provide a distinct trade-off in the value of objectives – as well as develops an understanding of how a reduction in one objective may require worsening of another.

In this study, we introduce multi-objective optimization in the context of stellarator design. We discuss the basics of MOO, as well as a practical solution method for solving MOO problems: the $\boldsymbol {\epsilon }$-constraint method. We also present a continuation method which leverages a local expansion of the Pareto front to explore it efficiently. We apply these MOO methods to bring insight into the selection of two common design parameters: the aspect ratio of an ideal magnetohydrodynamic (MHD) equilibrium and the total length of the electromagnetic coils. The aspect ratio has long been considered to have a trade-off with the degree to which a configuration is quasi-symmetric. A conjecture by Garren & Boozer (Reference Garren and Boozer1991) states that quasi-symmetry can only be achieved through second order in the inverse aspect ratio, and hence that decreasing aspect ratio would result in a worsening of the degree of quasi-symmetry. Our numerical experiments suggest that this is indeed the case, but the trade-off is modest. We also explore the relationship between the total allowable length of the coil pack and the ability for the coils to reproduce a target magnetic field. There is a natural trade-off here since longer more complex coils can reproduce more intricate fields than the same number of shorter coils (Landreman Reference Landreman2017; Wechsung *et al.* Reference Wechsung, Landreman, Giuliani, Cerfon and Stadler2022*b*).

Upon computing the Pareto front, a stellarator designer can visualize the trade-off between multiple objectives and decide how to balance this trade-off in their design. Since the Pareto front provides a set of solutions rather than a single solution, an engineer or physicist can choose between multiple Pareto optimal designs by evaluating them under additional design criteria.

This paper is structured as follows: in § 2, we review MOO, as well as two methods for solving MOO problems. Subsequently, in § 3, we apply the optimization methods to explore two trade-off problems in stellarator design. Finally, in § 4, we look beyond our case studies to the more general use of MOO for stellarator design and discuss future directions.

We use bold characters, such as $\boldsymbol {x}$, to denote vectors. We denote a vector $\boldsymbol {x}$ with entry $j$ removed as $\bar {\boldsymbol {x}}^j = (x_1,\ldots,x_{j-1},x_{j+1},\ldots,x_{n_{\boldsymbol {x}}})$. We compare vectors using vector inequalities: if $\boldsymbol {x} \le \boldsymbol {y}$, then $x_j \le y_j$ for $j=1,\ldots,{n_{\boldsymbol {x}}}$.

## 2. Multi-objective optimization

Multi-objective optimization problems typically do not have a single solution, but rather an entire set of solutions. These are points which balance a trade-off between objectives: improvement in one objective implies a worsening of the other. The set of solutions to a MOO problem is called the Pareto front. In this section, we briefly formalize this notion, before introducing practical solution techniques for MOO problems.

The idea of a trade-off can be formalized through the notion of non-dominance. A point $\boldsymbol {x}\in \boldsymbol {\varOmega }$ dominates $\boldsymbol {y}\in \boldsymbol {\varOmega }$, with respect to the objectives $\boldsymbol {F}$, if $\boldsymbol {F}(\boldsymbol {x}) \le \boldsymbol {F}(\boldsymbol {y})$ and $\boldsymbol {F}(\boldsymbol {x}) \neq \boldsymbol {F}(\boldsymbol {y})$. Informally, $\boldsymbol {x}$ is at least as good as $\boldsymbol {y}$ in all the objectives. If $\boldsymbol {x}$ is not dominated by any other point in $\boldsymbol {\varOmega }$, then $\boldsymbol {x}$ is called efficient, or Pareto optimal, and $\boldsymbol {F}(\boldsymbol {x})$ is a non-dominated point. Plainly, $\boldsymbol {x}$ is efficient if there is no other point that is at least as good in all objectives and better in at least one objective. For example, if $\boldsymbol {x}$ is efficient, then another point $\boldsymbol {y}$ could still satisfy $F_1(\boldsymbol {y}) < F_1(\boldsymbol {x})$, but then it would have to hold that $F_j(\boldsymbol {y}) > F_j(\boldsymbol {x})$ for some $j\neq 1$. The set of all efficient points forms the efficient set and the set of all non-dominated points forms the non-dominated set, which is more commonly known as the Pareto front. Finding the Pareto front is the goal of multi-objective optimization. A visualization of an efficient set and Pareto front are shown in figure 1. Intuitively, efficient points are those which cannot be compared by their objective values alone. For instance, $\boldsymbol {x}$ may have less complex coil shapes than $\boldsymbol {y}$, but may not reproduce a target magnetic field as well. Based off of these facts alone, neither $\boldsymbol {x}$ nor $\boldsymbol {y}$ is a dominant configuration and additional information would be needed to determine which configuration is preferable.

In the absence of convex objectives, finding globally Pareto efficient solutions is difficult. Instead, we settle for searching for weakly, locally efficient points, which is all most MOO algorithms can guarantee in this setting, given a finite sample size. Here, $\boldsymbol {x}\in \boldsymbol {\varOmega }$ is weakly efficient (Ehrgott Reference Ehrgott2005) if there is no $\boldsymbol {y}\in \boldsymbol {\varOmega }$ such that $\boldsymbol {y}$ strictly dominates $\boldsymbol {x}$, $\boldsymbol {F}(\boldsymbol {y})<\boldsymbol {F}(\boldsymbol {x})$. The set of weakly efficient points contains all efficient points but can also contain ‘extensions’ from the edge of the Pareto front, see figure 2. A point $\boldsymbol {x}\in \boldsymbol {\varOmega }$ is locally efficient (Emmerich & Deutz Reference Emmerich and Deutz2018) if there exists a non-empty open ball containing $\boldsymbol {x}$ such that $\boldsymbol {F}(\boldsymbol {x})$ is non-dominated in the intersection of the ball and $\boldsymbol {\varOmega }$. A local Pareto front can be found by solving (1.1) with local optimization methods, rather than globally solving (1.1) to compute the true Pareto front. In figure 2, the dashed line indicates a local efficient set, which is Pareto optimal within the shaded region. We find that finding weakly, locally efficient solutions is enough to provide insight into the trade-offs of interest.

MOO problems are most commonly solved by either directly applying MOO algorithms (Deb *et al.* Reference Deb, Pratap, Agarwal and Meyarivan2002; Knowles Reference Knowles2006; Wagner *et al.* Reference Wagner, Emmerich, Deutz and Ponweiser2010; Daulton *et al.* Reference Daulton, Eriksson, Balandat and Bakshy2022; Chang & Wild Reference Chang and Wild2023) or by reformulating the MOO problem as a series of scalar optimization problems that can be solved with scalar optimization methods. Algorithmic approaches attempt to find and explore the efficient set by taking steps to jointly minimize a combination of the objectives. These methods are a practical option for problems with many objectives, since the algorithm can weigh how to efficiently explore the high dimensional Pareto front. Scalarization methods, however, allow for more user involvement and incorporation of problem specific information, such as derivatives, which many MOO algorithms do not use. Scalarization methods can be formulated and solved efficiently because they rely on scalar optimization methods, and allow the user to control the exploration of the efficient set. A particularly popular scalarization approach is the ‘linearization method’ (Ehrgott Reference Ehrgott2005). The linearization method finds efficient points by solving the linearized problem

where the user-selected weight vector, $\boldsymbol {\alpha }$, must be non-negative and sum to one. The linearization method, however, has two major drawbacks (Das & Dennis Reference Das and Dennis1996): (1) if the Pareto front is non-convex, then no weight vector $\boldsymbol {\alpha }$ exists such that the solution to the linearized problem, (2.1), lies on the non-convex portion of the Pareto front; (2) a uniformly spaced selection of weight vectors does not uniformly explore the Pareto front. For these reasons, it is worthwhile to look beyond the linearization method when solving non-convex MOO problems.

In the remainder of this section, we discuss a scalarization method, the $\boldsymbol {\epsilon }$-constraint method, which overcomes the drawbacks of the linearization method and is particularly useful for trade-offs in stellarator design. We also discuss a continuation method for locally exploring the Pareto front efficiently.

### 2.1. The $\epsilon$-constraint method

Multi-objective optimization problems can be transformed into a series of scalar optimization problems via scalarization methods. One such scalarization method, the $\boldsymbol {\epsilon }$-constraint method (Ehrgott Reference Ehrgott2005), finds (weakly) Pareto optimal points by minimizing a single objective subject to upper bound inequality constraints on all other objectives, with an upper bound parameter $\boldsymbol {\epsilon }\in \mathbb {R}^{{n_{\boldsymbol {F}}}-1}$. To find points on the Pareto front, the $\boldsymbol {\epsilon }$-constraint method solves problems of the form

where the user selects the index $j$ and vector $\boldsymbol {\epsilon }$. The $\boldsymbol {\epsilon }$-constraint method theoretically guarantees that it can be used to find any point on the Pareto front: if $\boldsymbol {x}$ is a (local) solution to (2.2), then it is weakly (locally) efficient. Furthermore, $\boldsymbol {x}$ is efficient if and only if it is a solution to (2.2) for all choices of $j\in \{1,\ldots,{n_{\boldsymbol {F}}}\}$ (Ehrgott Reference Ehrgott2005).

The $\boldsymbol {\epsilon }$-constraint method is particularly relevant to stellarator optimization since we often have bounds on the range of the objectives, which allows us to easily select the upper bound parameter, $\boldsymbol {\epsilon }$. For example, devices often have an aspect ratio between $3$ and $10$, a coil must be at least as long as the minor circumference of the plasma, and it should not be longer than a few times that. By selecting $\boldsymbol {\epsilon }$, targeted areas of the Pareto front can be searched. Thisis more interpretable than using other scalarization techniques, such as a weighted sum of objectives, which requires selecting abstract weights that do not have any clear relation to the Pareto front.

### 2.2. Continuation methods

Continuation methods (Hillermeier *et al.* Reference Hillermeier2001; Schtze, Dell'Aere & Dellnitz Reference Schütze, Dell'Aere and Dellnitz2005; Gkaragkounis *et al.* Reference Gkaragkounis, Papoutsis-Kiachagias, Asouti and Giannakoglou2018; Peitz, Ober-Blöbaum & Dellnitz Reference Peitz, Ober-Blöbaum and Dellnitz2019; Vasilopoulos *et al.* Reference Vasilopoulos, Asouti, Giannakoglou and Meyer2021) in MOO explore the Pareto front locally around a given efficient point. By using optimality conditions and the implicit function theorem (Krantz & Parks Reference Krantz and Parks2002), continuation methods build local models of the Pareto front, which they use to estimate nearby efficient points. Local exploration of the Pareto front with continuation methods comes at a relatively low computational cost when compared to restarting a scalarization solver, such as the $\boldsymbol {\epsilon }$-constraint method, from scratch. However, when combined, continuation methods and scalarization solvers make a powerful pair: the scalarization methods find efficient points over distinct parts of the Pareto front and the low-cost continuation method is used to fill in the space between points. In this section, we discuss a predictor–corrector continuation method based off of the $\boldsymbol {\epsilon }$-constraint method, which first predicts an estimate of an efficient point using a Taylor expansion and subsequently uses the $\boldsymbol {\epsilon }$-constraint method to correct the predictions, see Algorithm . While other predictor–corrector continuation schemes (Schütze *et al.* Reference Schütze, Dell'Aere and Dellnitz2005; Peitz *et al.* Reference Peitz, Ober-Blöbaum and Dellnitz2019; Vasilopoulos *et al.* Reference Vasilopoulos, Asouti, Giannakoglou and Meyer2021) write the expansion in terms of weights, or target points, we expand the Pareto front in terms of $\boldsymbol {\epsilon }$ because it is easily interpreted and allows for the $\boldsymbol {\epsilon }$-constraint method to be used in the corrector step.

The key idea of the predictor step is to notice that the $\boldsymbol {\epsilon }$-constraint method allows us to parametrically describe weakly efficient points in terms of $\boldsymbol {\epsilon }$, i.e. we can write weakly efficient points as $\boldsymbol {x}(\boldsymbol {\epsilon })$. In fact, as we will show, under reasonable conditions, $\boldsymbol {x}(\boldsymbol {\epsilon })$ is a continuously differentiable map. The predictor step leverages this parametric representation to approximate the efficient set by a Taylor expansion. Given an efficient point $\boldsymbol {x}$, which is a solution to (2.2) with parameter $\boldsymbol {\epsilon }$, the predictor step locally approximates the efficient set by

where ${\rm \Delta} \boldsymbol {\epsilon }$ is a small change in $\boldsymbol {\epsilon }$, and $\nabla _{\boldsymbol {\epsilon }} \boldsymbol {x}$ is the Jacobian of $\boldsymbol {x}(\boldsymbol {\epsilon })$ with respect to $\boldsymbol {\epsilon }$. Though less accurate, a zeroth-order Taylor expansion could be used in place of the first-order expansion. A zeroth-order expansion simplifies the predictor–corrector method, Algorithm , by using the last known efficient point $\boldsymbol {x}(\boldsymbol {\epsilon })$ as a prediction of a nearby efficient point $\boldsymbol {x}(\boldsymbol {\epsilon } + {\rm \Delta} \boldsymbol {\epsilon })$, and is equivalent to warm starting the $\boldsymbol {\epsilon }$-constraint method with $\boldsymbol {x}(\boldsymbol {\epsilon })$. The zeroth-order predictor is useful when the Jacobian $\nabla _{\boldsymbol {\epsilon }}\boldsymbol {x}$ is intractable to approximate or the cost of running the $\boldsymbol {\epsilon }$-constraint method is low. However, if $\nabla _{\boldsymbol {\epsilon }}\boldsymbol {x}$ can be approximated, then the first-order expansion will give more accurate predictions of $\boldsymbol {x}(\boldsymbol {\epsilon }+{\rm \Delta} \boldsymbol {\epsilon })$ than the zeroth-order expansion, and should be used. A visualization of the first-order predictor corrector method is shown in figure 3. In the remainder of this section, we will discuss conditions which determine when the expansion, (2.3), exists and a method of computing the Jacobian $\nabla _{\boldsymbol {\epsilon }}\boldsymbol {x}$.

The existence of the expansion, (2.3), and the computation of the Jacobian $\nabla _{\boldsymbol {\epsilon }}\boldsymbol {x}$ rely on applying the implicit function theorem (Krantz & Parks Reference Krantz and Parks2002) to the first order necessary conditions for optimality of (2.2), the Karush–Kuhn–Tucker (KKT) conditions (Nocedal & Wright Reference Nocedal and Wright1999). The KKT conditions are necessary conditions for a point $\boldsymbol {x}$ to be an optimal solution to an inequality constrained optimization problem, such as the $\boldsymbol {\epsilon }$-constraint problem (2.2). In the case of the $\boldsymbol {\epsilon }$-constraint method, points satisfying the KKT conditions for (2.2)are also weakly locally efficient.

Suppose that $\boldsymbol {x}$ is an optimal solution to (2.2), and $\mathcal {A}\subseteq \{1,\ldots,j-1,j+1,\ldots {n_{\boldsymbol {F}}}\}$ is the set of $m$ active constraints, i.e. the set of constraints $i$ for which $F_i(\boldsymbol {x})=\epsilon _i$. The KKT conditions with the linear independence constraint qualification (LICQ) (Nocedal & Wright Reference Nocedal and Wright1999) state that there exist $\boldsymbol {\lambda }\in \mathbb {R}^{m}$ such that $\boldsymbol {\lambda } \ge 0$ and

Notice that (2.4) and (2.5) show that if a constraint is active and $\lambda _i >0$, then the solution of the $\boldsymbol {\epsilon }$-constraint method is dependent on $\epsilon _i$ in the sense that a small change in $\epsilon _i$ could change the solution to the system of equations. In this case, it is reasonable to write $\boldsymbol {x}$ as a function of $\boldsymbol {\epsilon }_{\mathcal {A}} = (\epsilon _i)_{i\in \mathcal {A}}$ to stress the dependence of solutions on the parameters, $\boldsymbol {x}(\boldsymbol {\epsilon }_{\mathcal {A}})$. The implicit function theorem gives us the mathematical leeway to do so.

Informally, the implicit function theorem states that if there exists a point $\boldsymbol {x}^*$, with Lagrange multipliers $\boldsymbol {\lambda }^*$ and constraint parameters $\boldsymbol {\epsilon }_{\mathcal {A}}^*$, which satisfy the KKT conditions, and if the Jacobian, $\boldsymbol {D}(\boldsymbol {x}^*,\boldsymbol {\lambda }^*)$, of the left-hand side of (2.4) and (2.5) with respect to $(\boldsymbol {x},\boldsymbol {\lambda })$ is invertible, then $\boldsymbol {x}(\boldsymbol {\epsilon }_{\mathcal {A}}), \boldsymbol {\lambda }(\boldsymbol {\epsilon }_{\mathcal {A}})$ are continuously differentiable functions of $\boldsymbol {\epsilon }_{\mathcal {A}}$ around $\boldsymbol {\epsilon }_{\mathcal {A}}^*$. Taking the form of a continuously differentiable function, the space of solutions to the $\boldsymbol {\epsilon }$-constraint problem (2.2) can be approximated with the first-order Taylor expansion, (2.3). The Jacobian $\nabla _{\boldsymbol {\epsilon }}\boldsymbol {x}$ can be computed by solving the system

The system (2.7) is derived by treating $\boldsymbol {x}$ and $\boldsymbol {\lambda }$ as a function of $\boldsymbol {\epsilon }$ and differentiating (2.4) and (2.5) with respect to $\boldsymbol {\epsilon }$. Importantly, $\boldsymbol {D}$ depends on second derivatives of the objective functions. If $\boldsymbol {D}$ is invertible, then the implicit function theorem guarantees existence of the expansion (2.3), and a unique solution exists to (2.7) (Krantz & Parks Reference Krantz and Parks2002).

Invertibility of the Jacobian matrix $\boldsymbol {D}$ occurs naturally when $\boldsymbol {x},\boldsymbol {\lambda }$ satisfy the KKT conditions, and $\boldsymbol {x}$ is a strict local minimum: $\nabla ^2 F_j(\boldsymbol {x}) + \sum _{i\neq j}\lambda _i \nabla ^2 F_i(\boldsymbol {x})$ is positive definite over the orthogonal subspace to $\{\boldsymbol {\nabla } F_i\}$ for all $i\in \mathcal {A}$ such that $\lambda _i>0$. However, the expansion does not exist if there are no active constraints: if all $F_i(\boldsymbol {x}) < \epsilon _i$, then a small variation in any $\boldsymbol {\epsilon }_i$ should not affect the solution. For example, in figure 1, the predictor–corrector method cannot be used beyond the tips of either branch of the efficient set since the KKT conditions cease to hold. Furthermore, if some constraint $i$ is active but the associated Lagrange multiplier $\lambda _i =0$, then a small increase in $\epsilon _i$ will not change the solution to (2.2), but a small decrease would. In this case, $\lambda _i$ is only sub-differentiable (Clarke Reference Clarke1990) with respect to $\epsilon _i$. While an expansion can be derived in this case, we ignore it for simplicity, as this case only appears at the edges of the efficient set or at saddle points of $F_j$. We are primarily concerned with the case when all active constraints have strictly positive Lagrange multipliers, since in this case, there is a trade-off to be made between objectives: reduction in $F_j$ comes at the cost of increase in $\epsilon _i$.

Now that we have established when a local expansion of the efficient set exists, and how to compute it, we can discuss the predictor–corrector method. The predictor step of the predictor–corrector method takes as input an efficient point $\boldsymbol {x}$ and a step size ${\rm \Delta} \boldsymbol {\epsilon }$. Subsequently, we compute the Lagrange multiplier $\boldsymbol {\lambda }$. If some components of $\boldsymbol {\lambda }$ are strictly positive, then we compute $\boldsymbol {D}(\boldsymbol {x},\boldsymbol {\lambda })$, solve (2.7) for $\nabla _{\boldsymbol {\epsilon }}\boldsymbol {x}$, then predict a weakly efficient point via the expansion, (2.3).

When a simulation is used to compute $\boldsymbol {F}$, it may not be possible to evaluate $\boldsymbol {F}$ at the predicted point since the simulation can ‘fail’. For example, the Variational Moments Equilibrium Code (VMEC) may fail to converge if the predicted point represents an ideal MHD equilibrium with self-intersecting flux surfaces. To proceed with the corrector step, the predicted point must evaluate under $\boldsymbol {F}$. To ensure that such a point can be found, the predicted point can be modified using a backtracking line search (Nocedal & Wright Reference Nocedal and Wright1999). The backtracking procedure iteratively reduces ${\rm \Delta} \boldsymbol {\epsilon }$ to $\gamma {\rm \Delta} \boldsymbol {\epsilon }$ for $\gamma \in (0,1)$ and predicts the value of $\boldsymbol {x}(\boldsymbol {\epsilon }+{\rm \Delta} \boldsymbol {\epsilon })$, until $\boldsymbol {F}$ can be evaluated at the predicted point $\boldsymbol {x}(\boldsymbol {\epsilon }+{\rm \Delta} \boldsymbol {\epsilon })$ or ${\rm \Delta} \boldsymbol {\epsilon }$ is smaller than some tolerance $\tau _{\varDelta }$.

For small enough ${\rm \Delta} \boldsymbol {\epsilon }$, the prediction will typically be a very good estimate of an efficient point. Nonetheless, it is helpful to finely resolve the point, particularly as this allows us to compute the subsequent predictor step accurately. The corrector step corrects the prediction by simply solving the $\boldsymbol {\epsilon }$-constraint problem, (2.2), with new constraint parameters $\boldsymbol {\epsilon } + {\rm \Delta} \boldsymbol {\epsilon }$. This method is detailed in Algorithm .

#### 2.2.1. Computation of the Lagrange multiplier

In practice, we rarely have a point $\boldsymbol {x}$ that exactly satisfies any constraint with equality. Typically, an inequality constraint may approximately be satisfied with equality, i.e. $\overline {F}^j(\boldsymbol {x}) = \boldsymbol {\epsilon } - \nu$ for a small value of $\nu > 0$. In this case, theoretically the constraint is not active and so the Lagrange multipliers are equal to zero. However, the numerics suggest that the constraint is active and the Lagrange multiplier should not be zero. In practice, we determine the active set $\mathcal {A}$ as the set of all constraints with $\nu _i< \tau$ for some small tolerance $\tau$. The Lagrange multiplier can then be computed by solving the KKT conditions for a perturbed version of (2.2) where the constraint right hand side, $\epsilon _i$, is shifted to $\epsilon _i - \nu$ for all constraints in $\mathcal {A}$. In this way, the constraints hold with exact equality, since $\overline {F}^j(\boldsymbol {x}) = \boldsymbol {\epsilon } - \nu$, and we are justified in computing the potentially non-zero Lagrange multipliers. The Lagrange multipliers are then estimated as the solution to

Importantly, this procedure affects the prediction step in that the expansion is taken around $\boldsymbol {\epsilon } + \nu$. However, this is accounted for in step 1 of Algorithm .

## 3. Numerical experiments

In this section, we use MOO to bring insight into the selection of two common stellarator design parameters: the aspect ratio of an ideal MHD equilibrium and the total length of the electromagnetic coils.

Our first experiment attempts to determine if there is a trade-off between achieving precise quasi-symmetry in a ‘stage-one’ stellarator design (an ideal MHD equilibrium) and having low aspect ratio. The Garren–Boozer conjecture (Garren & Boozer Reference Garren and Boozer1991) suggests that exact quasi-symmetry is only possible at high aspect ratio. In addition, precise quasi-symmetry has been achieved throughout a volume in high aspect ratio stellarators (Giuliani *et al.* Reference Giuliani, Wechsung, Landreman, Stadler and Cerfon2022*b*; Landreman Reference Landreman2022; Landreman & Paul Reference Landreman and Paul2022; Wechsung *et al.* Reference Wechsung, Landreman, Giuliani, Cerfon and Stadler2022*b*). However, it is unclear at what rate quasi-symmetry decays as the aspect ratio is increased or if this trade-off applies to precise quasi-symmetry at all. Our first experiment answers the following question: ‘To what extent does the aspect ratio limit the degree to which quasi-symmetry can be achieved throughout a volume?’

Our second experiment considers a trade-off in the ‘stage-two’ design problem, where coils are optimized to fit a target magnetic field. A target magnetic field can be recreated arbitrarily well by coils which have no constraint on their length. However, when restricted to have a reasonably short length for engineering purposes, coils may not be able to reproduce a target magnetic field. In this problem, we aim to understand how reduction in the allowable coil length worsens the reproduction of the target magnetic field.

### 3.1. Problem 1: the aspect ratio and quasi-symmetry trade-off

We seek a plasma boundary shape, parametrized by $\boldsymbol {x}$, of an ideal MHD equilibrium for a quasi-helical (QH) stellarator configuration that has minimal aspect ratio $A(\boldsymbol {x})$ and deviation from quasi-symmetry $Q_{M,N}(\boldsymbol {x})$,

Including bound constraints on the aspect ratio $A_l=3, A_u=10$, restricts our decision space to a realistic range of configurations. For convenience, we collect the ${n_{\boldsymbol {F}}}=2$ objectives into the vector $\boldsymbol {F}(\boldsymbol {x}) :=(A(\boldsymbol {x}),Q_{M,N}(\boldsymbol {x}))$.

The violation of quasi-symmetry is defined as the quasi-symmetry-ratio-residual objective employed by Landreman & Paul (Reference Landreman and Paul2022). The objective measures the departure from quasi-symmetry throughout the plasma volume as the sum of flux surface averages across surfaces $(s_1,\ldots,s_{n_s})$,

The helicity parameters, $M,N$, determine the type of quasi-symmetry that $Q_{M,N}$ measures: $M=1,N=0$ being quasi-axisymmetry, and $M=1, N=\pm k{n_{\textrm {fp}}}$ with non-zero integers $k$ being quasi-helical symmetry (Landreman & Paul Reference Landreman and Paul2022). For our experiments, the helicity parameters were set to $M=1,N=-{n_{\textrm {fp}}}$ for the ${n_{\textrm {fp}}}=4$ field period magnetic field. Discretization of the flux surface average over the poloidal and toroidal angles results in an objective with sum-of-squares structure, $Q_{M,N}(\boldsymbol {x}) = \sum _{i=1}^{{n_{Q}}} Q_{M,N,i}^2(\boldsymbol {x})$, where $Q_{M,N,i}$ measures the violation of quasi-symmetry at a point throughout the volume. For a given plasma boundary, $\boldsymbol {x}$, we compute the aspect ratio using the definition in the VMEC code,

*a–c*)\begin{equation} A(\boldsymbol{x}) = R_{\text{major}}(\boldsymbol{x})/R_{\text{minor}}(\boldsymbol{x}) \quad R_{\text{minor}} = \sqrt{\bar{C}_A(\boldsymbol{x})/{\rm \pi}} \quad R_{\text{major}(\boldsymbol{x})} = \frac{V(\boldsymbol{x})}{2{\rm \pi}^2R_{\text{minor}}^2(\boldsymbol{x})}, \end{equation}

where $\bar {C}_A$ is the average cross-sectional area of the surface and $V$ is the volume enclosed by the surface.

The decision variables, $\boldsymbol {x}$, are Fourier amplitudes that describe the shape of the plasma boundary. The plasma boundary is represented in terms of the standard cylindrical coordinates $(R,\phi,Z)$ where $R,Z$ are parametrized as a Fourier series in the poloidal and toroidal angles $\theta$ and $\phi$,

The number of modes describing the surface ${n_\textrm {sm}} = \{0,1,2,\ldots \}$ can be increased to achieve more intricate boundary representations. Field period symmetry with ${n_{\textrm {fp}}}$ periods and stellarator symmetry have been assumed. The major radius $R_{0,0}$ is held fixed throughout the optimization to fix the scale of the design. The Fourier amplitudes are collected into the decision variable via $\boldsymbol {x} = (R_{0,1},\ldots, Z_{0,0},\ldots )$. The total number of decision variables satisfies ${n_{\boldsymbol {x}}} = 4{n_\textrm {sm}}^2 + 4{n_\textrm {sm}}$.

Numerical experiments were performed using SIMSOPT (Landreman *et al.* Reference Landreman, Medasani, Wechsung, Giuliani, Jorge and Zhu2021) to handle variables, compute objectives and interface with VMEC which computed the ideal MHD equilibria from the plasma boundary representation. The $\boldsymbol {\epsilon }$-constraint method was used to find points along the Pareto front where quasi-symmetry was set to be the target function for minimization while the aspect ratio was constrained by $\epsilon$. The bound constraints on the aspect ratio made specifying $\epsilon$ straightforward: $\epsilon$ was set to linearly spaced values between $A_l$ and $A_u$. The $\boldsymbol {\epsilon }$-constraint problem (2.2) was solved by reformulating the constrained optimization problem with a quadratic penalty method (Nocedal & Wright Reference Nocedal and Wright1999), see § A. The penalty parameter was increased from an initial value of $1$ by a factor of $10$ at each iteration, and the quadratic penalty sub-problems were solved by applying a Gauss–Newton optimization routine. The optimization was run until the the change in $Q_{M,N}(\boldsymbol {x})$ was of the order of machine precision, $10^{-16}$, or the norm of the gradient reached $10^{-10}$. Upon taking a step with size of the order of $10^{-10}$, the optimization algorithm would terminate due to significant computational noise in the output of VMEC. SIMSOPT was used to compute forward difference gradients via MPI-based concurrent function evaluations. To avoid local minima, the number of Fourier modes, ${n_\textrm {sm}}$, was increased iteratively from $1$ to $5$, reaching ${n_{\boldsymbol {x}}}=120$ variables. The $\epsilon$-constraint problem was solved with the penalty method after each increase of ${n_\textrm {sm}}$.

To reduce the computational overhead of using the $\epsilon$-constraint method, we used the predictor–corrector method, introduced in § 2.2, to explore the Pareto front between solutions of the $\epsilon$-constraint method. For the high dimensional decision space of interest, the full Hessians $\nabla ^2 Q_{M,N}$ and $\nabla ^2 A$ are too expensive to compute with finite differences. Instead, we approximated the Hessian $\nabla ^2 Q_{M,N}$ with a Gauss–Newton Hessian approximation and use a diagonal second-order central difference approximation to $\nabla ^2 A$.

Figure 4 shows the Pareto front for the problem (3.1). Figure 5 shows three-dimensional renderings of the three solutions highlighted by the square, star and diamond markers in figure 4, as well as contour plots of the field strength in Boozer coordinates. From figure 4, it is clear that low values of the quasi-symmetry violation, $Q_{1,-4}(\boldsymbol {x}) < 10^{-3}$, can be achieved at all aspect ratios in our range. There is a slight trend indicating that quasi-symmetry may be achieved more precisely at higher aspect ratios. Nonetheless, by viewing the contour plots of the magnetic field strength in Boozer coordinates in figure 5, we see that configurations from all parts of the Pareto front have visibly precise quasi-symmetry.

While it seems that precise quasi-symmetry can be achieved at all aspect ratios considered, it is not clear how well the reduction in quasi-symmetry translates to an improvement in particle confinement. After all, achieving quasi-symmetry is a proxy for the true goal of achieving good confinement. To this end, we computed the fraction of alpha particles lost from $10$ distinct Pareto optimal configurations, with aspect ratios 3, 3.5, 4, 4.5, 4.9, 5.6, 6.0, 8.5, 8.7 and 9. To compute the losses, we scaled the Pareto optimal configurations to the ARIES-CS reactor (Najmabadi *et al.* Reference Najmabadi, Raffray, Abdel-Khalik, Bromberg, Crosatti, El-Guebaly, Garabedian, Grossman, Henderson and Ibrahim2008) scale ($1.7$ m minor radius and $B_{0,0}(s=0) = 5.7$ T field strength on axis), and traced $5000$ particles born on the $s=1/4$ flux surface as well $5000$ particles born on the $s=1/2$ flux surface until a terminal time of $0.1$ seconds. The reactor scaling and particle count were also used by Bader *et al.* (Reference Bader, Drevlak, Anderson, Faber, Hegna, Likin, Schmitt and Talmadge2019) and Landreman & Paul (Reference Landreman and Paul2022). Particles were traced according to the vacuum guiding center equations in Boozer coordinates and were deemed lost if they crossed the $s=1$ flux surface. The loss fractions are shown in table 1. Not a single particle born on the $s=1/4$ flux surface was lost from nine of the ten configurations, and the remaining configuration, with aspect ratio $5.6$, lost only $0.22\,\%$ of the alpha particles. A slightly larger fraction of the alpha particles born on the $s=1/2$ flux surface were lost, between $0\,\%$ and $2.6\,\%$for each configuration. The loss fraction of alpha particles born on the $s=1/2$ flux surface increases slightly as the aspect ratio increases up until $A\approx 6$, then drops to approximately zero on the right tail of the Pareto front. It seems that there is no clear trend between the aspect ratio of the Pareto optimal designs and the loss fractions for particles born on either of the two surfaces. Thus, while there is a slight trade-off between quasi-symmetry and aspect ratio, the trade-off between confinement and aspect ratio may be more complex.

Spatially, we may expect or hope that the set of desirable configurations forms a compact, connected region. However, we find the contrary – the efficient set is not connected. Qualitatively, the large gap in the Pareto front in figure 4 hints that the efficient set is undergoing a ‘branch change’. Branched curves, or branches, are parametric families of solutions to (2.4)–(2.6) which can be computed by continuation. Two branches can either be connected by a non-smoothness or be entirely disconnected sets, as in the case of figure 1. Branch changes are reflected in the Pareto front via a non-smoothness or gap where the Pareto front transitions from representing points on one branch to representing points on another branch. Quantitatively, we measure the branch change by evaluating the Lagrange multipliers at the left edge of the gap in the Pareto front (roughly $A = 6.11$) in figure 4; a branch change is indicated by $\lambda = 0$, since this implies that the aspect ratio can be increased without causing a reduction in the quasi-symmetry. Indeed, we find that the Lagrange multipliers are approximately zero and hence that the efficient set corresponding to the small piece of the Pareto front, with aspect ratio greater than or equal to $8.5$, is disconnected from the remainder of the efficient set. Within the gap in the Pareto front, configurations still can achieve exceptional levels of quasi-symmetry, as shown by the grey points in figure 4. The increase in quasi-symmetry in the gap may not be a physical phenomena, but may instead be due to poor numerical refinement in VMEC, local optimization or the representation of decision variables. It is interesting to see nonetheless that the set of desirable configurations is disparate.

All the Pareto optimal configurations found achieve exceptional levels of quasi-symmetry and particle confinement. From this, it is clear that the aspect ratio is not limiting the device performance. However, engineering criteria on coil shape, fabrication tolerances and placement tolerances may exhibit a greater trade-off with particle confinement, since the magnetic field generated by coils may not align well with the target magnetic field computed by ideal MHD. In the next section, we consider the trade-off between coil length and the ability for coils to reproduce a target magnetic field.

### 3.2. Problem 2: the coil length and quadratic flux trade-off

Given the shape of the last close flux surface, $\mathcal {S}$, the ‘stage-two’ stellarator optimization problem seeks to find magnetic coils with magnetic field $\boldsymbol {B}$, such that the field is orthogonal to the surface normal, $\boldsymbol {B}\boldsymbol {\cdot } \boldsymbol {n}=0$ (Merkel Reference Merkel1987; Zhu *et al.* Reference Zhu, Hudson, Song and Wan2017). Coils with a tight constraint on their length will not be able to form intricate shapes and reduce the normal component of the magnetic field to zero, whereas coils which are longer can form arbitrary shapes that better minimize $\boldsymbol {B}\boldsymbol {\cdot } \boldsymbol {n}$. Long coils, on the other hand, are undesirable from a financial and engineering standpoint: longer coils are more expensive to build, often have higher curvature and are more difficult to fit into the space around the device. In this experiment, we seek to understand the trade-off between coil length and the ability of coils to reproduce a target magnetic field.

We develop insight into this trade-off by solving the multi-objective problem

where $J_{\boldsymbol {n}}$ is the quadratic flux,

$L$ is the total length of the ${n_{C}}$ coils considered,

and the decision variables $\boldsymbol {x}$ represent coil shape parameters and the coil currents. The quadratic flux is a standard metric used to find coils in the stage-two problem (Merkel Reference Merkel1987; Landreman Reference Landreman2017; Zhu *et al.* Reference Zhu, Hudson, Song and Wan2017; Singh *et al.* Reference Singh, Kruger, Bader, Zhu, Hudson and Anderson2020; Kruger *et al.* Reference Kruger, Zhu, Bader, Anderson and Singh2021; Glas *et al.* Reference Glas, Padidar, Kellison and Bindel2022; Wechsung *et al.* Reference Wechsung, Giuliani, Landreman, Cerfon and Stadler2022*a*). To solve the bi-objective coil problem, (3.5), we use the coil optimization framework implemented in SIMSOPT. The modular coils are described by a Fourier representation of the Cartesian coordinates with ${n_\textrm {cm}}$ modes. The Fourier representation of the $x$-coordinate of the $i$th coil is

with analogous forms for the $y$ and $z$ coordinates. Each coil is additionally equipped with a current, though the current of the first coil is held fixed at $I_1 = 10^{5}$ A throughout the optimization. The number of Fourier amplitudes per coil is $3(2{n_\textrm {cm}} + 1)$ making the total number of design variables ${n_{\boldsymbol {x}}} = {n_{C}} 3(2{n_\textrm {cm}} + 1) + {n_{C}} -1$. The coils considered satisfy field-period symmetry, as well as stellarator symmetry. Thus, the quadratic flux is only computed over a half-field-period, the total coil length is only taken over the coils in a half-field-period and the decision variables $\boldsymbol {x}$ only represent the ${n_{C}}$ coils in a half-field-period. The plasma boundary shape $\mathcal {S}$ used in this experiment is that of the ${n_{\textrm {fp}}}=2$ field-period quasi-axisymmetric (QA) configuration from Landreman & Paul (Reference Landreman and Paul2022), which we will refer to as LP-QA. The ${n_\textrm {cm}}=5$ Fourier modes were used to describe the coil shapes.

To avoid ill-posedness of the objectives due to non-uniqueness of the underlying coil parametrization, a penalty on the variation of the arc length was used to encourage the coils to have uniform incremental arc length. The arc length variation objective discretizes each coil into $n_{\textrm {av}}$ segments and computes the variance of the set of arc lengths $\{l_i\}_{i=1}^{n_{\textrm {av}}}$ for each coil, i.e. for a single coil, $V =\textrm {Var}[\{l_i\}_{i=1}^{n_{\textrm {av}}}]$. The total arc length variation over the coil set is

The bi-objective problem was solved by applying the $\boldsymbol {\epsilon }$-constraint method, where quadratic flux was the objective and the total coil length was the constraint. The predictor–corrector method was not used to investigate this trade-off, since the computational expense of solving the coil optimization problem with the $\boldsymbol {\epsilon }$-constraint method was minimal. However, a warm-starting procedure or ‘zeroth-order’ predictor–corrector method, was used in which solutions of the $\boldsymbol {\epsilon }$-constraint method were used as a good initial guess of efficient points with similar values of $\boldsymbol {\epsilon }$. The $\boldsymbol {\epsilon }$-constraint problems were solved with a quadratic penalty method (Nocedal & Wright Reference Nocedal and Wright1999), see § A. The penalty parameter was increased from an initial value of $1$ by a factor of $10$ at each iteration, and the quadratic penalty sub-problems were solved by the L-BFGS-B algorithm (Zhu *et al.* Reference Zhu, Byrd, Lu and Nocedal1997) until the norm of the gradient reached $10^{-14}$ or $30\ 000$ iterations ellapsed. The arc length variation (3.9) was included as a penalty term with fixed penalty parameter $10^{-4}$.

Figure 6 shows the Pareto front for the bi-objective coil optimization problem (3.5) when ${n_{C}}=4$ coils are used per half-field-period. Figure 6 also shows three-dimensional renderings of the two Pareto optimal coils sets, highlighted as the star and diamond in the plot of the Pareto front. Figure 6 shows that increasing the coil length substantially improves the ability for coils to reduce the quadratic flux up until the average coil length over the minor circumference of the plasma boundary is approximately $6$, after which we numerically find no improvement. In addition, we find that when the average coil length over the minor circumference of the plasma boundary exceeds roughly $6$, the coils become exceedingly complex: coils curvature becomes large and coils begin to pass under one another, competing for space. At these coil lengths, coil curvature and the coil-to-coil separation should also be constrained.

While figure 6 shows that constraining the total coil length limits the ability for coils to reproduce a magnetic field, it does not show how the limits on the total coil length relate to a loss in quasi-symmetry or loss in confinement. In figure 7, we measure the extent to which the Pareto optimal coil configurations from figure 6 generate quasi-symmetric magnetic fields. To do so, we fit a quadratic flux minimizing (QFM) surface (Dewar, Hudson & Price Reference Dewar, Hudson and Price1994) to the coil-generated magnetic field $\boldsymbol {B}$, evaluate VMEC using the QFM surface as the boundary shape and compute the quasi-symmetric metric $Q_{1,0}$. The QFM surface is constrained to have identical volume to the LP-QA target surface. We find the QFM surface by solving the following problem in SIMSOPT:

Figure 7 shows that the quasi-axisymmetry degrades significantly, by four orders of magnitude, as the coil length constraints are tightened. Unlike the longer coil sets, the shorter coil sets generate significant coil ripple, degrading the quasi-symmetry. While it may not be possible to entirely avoid the loss of quasi-symmetry when coil constraints are tightened, the loss may be diminished by using single-stage optimization approaches to design the coils. Part of the loss of quasi-symmetry is inherent to the two-stage approach to coil design: the coils are optimized to reproduce a target magnetic field, but not to generate a quasi-symmetric field. Single-stage optimization approaches (Giuliani *et al.* Reference Giuliani, Wechsung, Cerfon, Stadler and Landreman2022*a*,Reference Giuliani, Wechsung, Landreman, Stadler and Cerfon*b* ; Jorge *et al.* Reference Jorge, Goodman, Landreman, Rodrigues and Wechsung2023), however, directly optimize coils for quasi-symmetry, ensuring that they achieve optimal quasi-symmetry levels over the space of coils satisfying the coil length constraint. Single-stage approaches would hence improve the unfortunate trade-off between coil length and quasi-symmetry.

## 4. Discussion

Understanding trade-offs in stellarator designs is particularly important in designing high-performance devices that satisfy the multitude of physical, engineering and financial criteria. Throughout this study, we have shown how MOO can be used to investigate trade-offs and develop insight into the role of design parameters. By quantifying the Pareto front, stellarator designers can develop an intuitive understanding of how one design criteria can affect other aspects of the design. Furthermore, the Pareto front provides a host of candidate configurations from which engineers and physicists can choose.

The example problems considered here were biobjective problems, but many stellarator design problems may have three or more competing objectives. MOO methods are useful in this context, however, the dimension of the Pareto front grows with the number of objectives, which makes visualizing the Pareto front difficult and thoroughly exploring the Pareto front often intractable. For this reason, we generally recommend exploring trade-offs between two or three objectives at a time. For three objectives, scalarization methods (Ehrgott Reference Ehrgott2005; Miettinen Reference Miettinen2012) and other gradient-based MOO methods (Fliege & Svaiter Reference Fliege and Svaiter2000; Schäffler, Schultz & Weinzierl Reference Schäffler, Schultz and Weinzierl2002; Désidéri Reference Désidéri2012; Gkaragkounis *et al.* Reference Gkaragkounis, Papoutsis-Kiachagias, Asouti and Giannakoglou2018; Peitz & Dellnitz Reference Peitz and Dellnitz2018) are an appropriate choice since they allow the use of derivatives and for the user to dictate which part of the Pareto front is explored. For problems with more than three objectives, parallel algorithmic approaches will find promising configurations most efficiently (Deb *et al.* Reference Deb, Pratap, Agarwal and Meyarivan2002; Knowles Reference Knowles2006; Wagner *et al.* Reference Wagner, Emmerich, Deutz and Ponweiser2010; Daulton *et al.* Reference Daulton, Eriksson, Balandat and Bakshy2022; Chang & Wild Reference Chang and Wild2023).

Looking beyond the two trade-offs considered here, there are a host of other trade-offs that appear in stellarator design that deserve attention. There is a natural trade-off, for instance, between coil complexity and quasi-symmetry that should be explored using a direct single-state optimization method, such as with the near-axis expansion (Giuliani *et al.* Reference Giuliani, Wechsung, Cerfon, Stadler and Landreman2022*a*). In addition, it is not well understood how stability criteria trade off with particle confinement criteria, or how design flexibility for multipurpose coils trades off with volume and quasi-symmetry (Lee *et al.* Reference Lee, Paul, Stadler and Landreman2022). Ideally, these problems should be solved and the Pareto optimal solutions should be tabulated in a way that allows practitioners to easily survey a multitude of configurations and analyse their strengths and weaknesses as a holistic set.

## Acknowledgements

*Editor Per Helander thanks the referees for their advice in evaluating this article.*

## Funding

This work was generously supported by a grant from the Simons Foundation (No. 560651, D.B.).

## Declaration of interest

The authors report no conflict of interest.

## Data availability

Code and data can be found at https://doi.org/10.5281/zenodo.7838063.

## Appendix A. Quadratic penalty methods

The quadratic penalty method (QPM) (Nocedal & Wright Reference Nocedal and Wright1999) solves inequality constrained optimization problems, such as (2.2), by solving a sequence of unconstrained optimization problems. The unconstrained optimization problems minimize a ‘merit function’, which captures the effects of the constraints through a smooth penalty on constraint violation.

For example, to solve (2.2), the QPM solves a sequence of unconstrained problems of the form

where $\mu _k>0$ is the ‘penalty parameter’. As $\mu _k\to \infty$, the sequence of minima $\{\boldsymbol {x}_k\}$ of the penalty function can converge to minima of the $\boldsymbol {\epsilon }$-constraint problem, (2.2). Heuristically, $\mu _0$ can be set to $1$ and increased by a factor of $10$ after each optimization, until the solution of (A1) reaches a desired level of constraint violation. Solutions of one iteration of (A1) should be used to warm start the optimization of subsequent iterations.

A practical drawback of the quadratic penalty approach is that achieving highly accurate solutions of (2.2) requires increasing $\mu _k$ to large values, and thus solving ill-conditioned optimization problems. This can be circumvented by using augmented Lagrangian methods (Nocedal & Wright Reference Nocedal and Wright1999), in place of the QPM. While somewhat inefficient in terms of sample complexity, the QPM is easy to implement and finds approximate solutions of the constrained optimization problem quite well in practice.