Role Detection in Bicycle-Sharing Networks Using Multilayer Stochastic Block Models

In urban spatial networks, there is an interdependency between neighborhood roles and the transportation methods between neighborhoods. In this paper, we classify docking stations in bicycle-sharing networks to gain insight into the human mobility patterns of three major United States cities. We propose novel time-dependent stochastic block models (SBMs), with degree-heterogeneous blocks and either mixed or discrete block membership, which classify nodes based on their time-dependent activity patterns. We apply these models to (1) detect the roles of bicycle-sharing docking stations and (2) describe the traffic within and between blocks of stations over the course of a day. Our models successfully uncover work, home, and other districts; they also reveal activity patterns in these districts that are particular to each city. Our work has direct application to the design and maintenance of bicycle-sharing systems, and it can be applied more broadly to community detection in temporal and multilayer networks with heterogeneous degrees.


Introduction
Transportation systems and commuting patterns shape and reveal the functional regions in a city [1,2]. The continuing development of network-analysis methods and the increasing availability of transportation data (from a multitude of transportation modes) gives exciting opportunities to improve understanding of urban dynamics at both large and small scales. One prominent example is bicycle-sharing systems, which are an emerging vehicle of urban transportation that can adapt quickly to the needs of travelers. The number of bicycle-sharing programs worldwide has grown rapidly, from 5 in 2005 to 1571 in 2018 [3]. Over 50 systems were launched in the United States alone from 2010 to 2016, and over 20 bicycle-sharing systems have been launched in France since 2005 [4,5]. Many existing bicycle-sharing systems are also growing. For example, the number of stations in New York City's 'Citi Bike' system has more than doubled since it began in 2013. Docked bicycle-sharing systems follow a general structure: Groups of bicycles are parked at 'stations' (which are also called 'docks' or 'hubs') throughout a coverage area, and users withdraw and return bicycles to these stations on demand, with a cost that depends on usage time. A growing portion of bicycle-share systems are dockless (as are the increasingly prominent e-scooters), so users can park bicycles at any location in a coverage area. In the present paper, we analyze docked systems, but we consider how to adapt our models to dockless systems in Section 7.
Data from bicycle-sharing systems are available from many cities throughout the world [6][7][8][9][10]. Such data, which include detailed temporal records and GPS-tracked routes in some cases, are helpful for capturing commuting behavior [11]. Therefore, it is valuable to analyze bicycle-sharing data to increase our understanding of urban flows and the properties of human com-muting. Bicycle-sharing is used often for 'last-mile' transportation, bridging the gap between public transportation and a final destination [12], and insights into the dynamics of bicycle-sharing systems can help transit systems evolve to meet the needs of changing cities [13,14]. Studying bicycle-sharing systems is also helpful for maintaining and expanding them [15]. For example, it can facilitate efficient redistribution of bicycles, a question that has received much research attention [16][17][18][19].
In the present paper, we propose two models of temporal network connectivity to capture the functional roles of bicycle-sharing stations. We use the lens of mesoscale-structure detection in time-dependent networks [20][21][22]. We examine trip histories from bicycle-sharing systems in the form of multilayer networks [23][24][25] in which each layer is a network of trips that start in a given hour.
We aim to partition each of our networks based on a relational equivalence of nodes (a perspective with a rich history in the social-networks literature [26,27]), rather than on high internal traffic within sets of nodes [7]. Data that has been aggregated over long periods of time can shed light on 'community' structure in the latter sense [6] through a partition of a network into contiguous spatial clusters [7]. However, it ignores how bicycle-sharing usage relates to travel patterns during a day [11]. By contrast, our models are designed to detect functional roles of bicycle-sharing stations based on time-dependent behavior.
The models that we introduce in this paper are time-dependent extensions of stochastic block models (SBMs) [28][29][30][31]. We include parameters to describe intra-block and inter-block traffic for each hour, but we fix the block assignment of each node over time. That is, we treat a bicycle-sharing network as a temporal multilayer network with fixed node identities across layers [23,24]. We use the terminology 'time-dependent' to emphasize the time-dependent nature of the application of interest and associated interpretations of our results, but our models are relevant for any multilayer network with the same set of nodes across layers and no interlayer edges. We also note that the layers do not need to be ordered.
We introduce mixed-membership and discrete-membership versions of our model, where nodes can be members of multiple blocks or exactly one block, respectively. Both versions of our model are degree-corrected. In an SBM without degree correction, the expected weight of an edge is determined by the block memberships of its incident nodes. By contrast, in a degreecorrected SBM, the expected weight of an edge also depends on the activity levels (i.e., the degrees) of its incident nodes. We extend the degree correction introduced by Karrer and Newman in [30] to our multilayer SBMs (see Section 3). Incorporating degree correction when modeling networks with heterogeneous degrees can increase the performance of a model while only slightly increasing its complexity [30].
In Section 6, we illustrate that degree correction enables us to identify blocks based on their functional roles in a network, rather than based on their levels of activity. This is especially important for bicycle-sharing networks in which stations have heterogeneous numbers of parking spaces for bicycles and different neighborhoods have different baseline levels of bicycle usage. Our models are applicable to both directed and undirected networks, but we consider only directed examples in the present paper. Additionally, although our models are inspired by the analysis of bicycle-sharing systems, they are applicable more generally to multilayer networks in which nodes have heterogeneous degrees and belong to fixed classes.
The SBMs that we introduce have parameters that describe block-toblock activity that are distinct for a given period of time; however, blockmembership (i.e., community-membership) parameters are fixed across all time periods. Most related SBMs that have been introduced differ in one or both of these conditions. Yang et al. [42] introduced a discrete-membership SBM in which the parameters for block-to-block activity are fixed over time but block memberships evolve over time. Xu and Hero [45] and Matias et al. [46] proposed related models (for unweighted and weighted networks, respectively). However, unlike the model in Yang et al., their models do not fix block-to-block activity parameters over time. Zhang et al. developed a time-dependent block model with degree correction in which block membership does not change over time and the edge dynamics are described by a continuous-time Markov process, such that edges are added or removed between block pairs at constant rates over time [43]. Mixed-membership SBMs with time-evolving communities have also been developed [47,48]. See Rossetti and Cazabet for a survey of community-detection methods for timedependent networks [49].
Community detection has been applied to urban bicycle sharing using a variety of approaches [6,7,36,[50][51][52][53][54]. Zhu et al. [55] applied k-means cluster-ing to undirected, time-dependent usage data from bicycle-sharing systems and other urban systems in New York City. Austwick et al. [6] examined modularity optimization with a directed and spatially-corrected null model to identify communities of stations in several cities. However, they detected communities in time-aggregated data, and their discussion pointed out that there are significant limitations when examining community structure while ignoring time-dependent behavior for bicycle-sharing applications. Munoz-Mendez et al. [7] identified communities by hour for bicycle-sharing data from London using an InfoMap algorithm [51] that respects the directed nature of edges in the underlying trip networks. The changes that they discovered in communities over time highlight the importance of time of day in the usage of bicycle-sharing systems.
There is some recent work that is closely related to ours. Matias et al. [56] constructed a time-dependent, discrete-membership SBM with fixed blocks over time and applied it to bicycle-sharing networks in London. They detected some functional blocks, but their approach does not incorporate degree correction. Xie and Wang [52] employed an approach that does not use an SBM directly, yet they were able to successfully partition a bicyclesharing network to find home and work roles of bicycle-sharing stations. They used the ratio of in-degree to out-degree at different times to discover homework splits during peak commute times. In the present paper, we observe such splits using our two-block models. A similarity of their approach to ours is that it corrects for degree; a key difference is that they relied on human supervision to determine peak hours, whereas our models implicitly increase the weights of more-active time periods in our likelihood function. Etienne and Latifa [5] used a Poisson mixture model to cluster bicycle-sharing stations in Paris based on their time-dependent usage profiles. They were able to capture time-dependent activity for each group, distinguish between incoming and outgoing activity, control for the overall activity level of a given station (via degree correction), and associate identified groups with their role in the city. A major difference between their approach and ours is that we distinguish activity between blocks, which allows us to detect behavior like last-mile commuting that occurs within blocks. We also consider mixed membership, which they proposed as a subject for future work.
Our paper proceeds as follows. In Section 2, we list our data sources and present basic statistical analysis of the data. In Section 3, we introduce the two versions of our time-dependent SBM -a discrete model and a mixed-membership model -and we show that they are equivalent up to a constraint. In Section 4, we describe the estimation algorithms for our discrete and mixed-membership SBMs. In Section 5, we generate synthetic networks using both of our models and a related model by Matias et al. [46], and we evaluate the fits of our models and theirs to such generated data. In Section 6, we present the results of our models for the bicycle-sharing networks in Los Angeles, San Francisco, and New York City. We discuss the implications of our work for bicycle-sharing systems and suggest areas of further study in Section 7. We show some additional details of our work in the Appendix. We provide code and data to implement our models and replicate the results in our paper at https://github.com/jcarlen/tdsbm_ supplementary_material.

Data
We examine United States bicycle-sharing systems in Los Angeles, the Bay Area, and New York City. For Los Angeles, we study only the system's downtown region, which is self-contained in the sense that very few trips connect it to stations that are not downtown. Similarly, for the Bay Area, we consider only the San Francisco network and exclude stations in Mountain View, Palo Alto, and San Jose. (If we applied our models to these nearly disconnected components, it would be better to treat them as separate networks than to model them jointly.) Each of our three focal systems covers a dense urban area, but they vary in their sizes and daily usage. Because of this variation and how the data were reported, we study different time periods for each system. We also selected our time periods to exclude days that are likely to be extremely hot or extremely cold. Each of the bicycle-sharing systems that we study has an open-data portal, from which we downloaded our data. After cleaning (see our discussion in the next paragraph), our data consist of the following:  [59].
A trip consists of a user checking out a bicycle from a fixed location (a station that includes multiple parking spaces) and returning it to a station. The data for each trip include the starting time; the ending time; and the starting and ending locations by station ID, latitude, and longitude. Each data set also has a few additional fields about the users; these details include whether they have memberships in the bicycle-sharing system, but we do not use this information in our investigation. We cleaned the data by removing anomalous trips, including extremely short and extremely long trips, 1 and trips to or from a station that were used for testing or maintenance (as indicated in the data). We also excluded a very small number of stations (two in Los Angeles and six in New York City) that did not have at least one departure and at least one arrival during the given time period. Finally, we excluded one station in New York City that was accessible to other stations only by ferry; it was involved in only nine trips during the given time period. In total, cleaning removed 7.1 % of the trips in Los Angeles, 4.5 % of the trips in San Francisco, and 1.4 % of the trips in New York City.
We retain self-edges, which represent trips that start and end at the same station. Although it is common to remove self-edges when analyzing networks [60], we expect the self-edges in our data to have a very similar data-generating mechanism as trips to nearby stations. As in [30], including self-edges also simplifies some aspects of parameter estimation. When fitting our models, we include only weekday trips, as we observe that weekday and weekend activity follow distinct patterns (see Figure 1). Moreover, weekend activity does not reflect commuting behavior. From the data sets, we construct multilayer networks that are both weighted and directed. In our networks, nodes represent stations, edge values encode the number of directed trips from one station to another that begin in a specified time period, and each of the 24 layers consists of the trips that start in a certain hour.

Preliminary Data Analysis
When examining the bicycle-sharing data sets, we discovered strong daily usage patterns, which motivate the development of our time-dependent SBMs. In Figure 1, we illustrate these usage patterns by plotting the number of trips 1 We take extremely short trips to be those that last two minutes or less. We take extremely long trips to be those that last 90 minutes or more in Los Angeles and San Francisco and 120 minutes or more in New York City (because of the larger coverage areas of stations in New York City,). by starting hour for each city. In each city, weekday activity spikes during morning and evening commuting hours (and Los Angeles also has a mid-day spike), whereas weekend trips peak in early afternoon. Similar patterns were observed previously for bicycle-sharing systems in New York and many other cities [5,6,52,55,61].
For individual stations, the morning and evening peaks for arriving trips (i.e., in-edges) and departing trips (i.e., out-edges) are often unbalanced. One direction has a stronger morning peak, and the opposite direction has a stronger evening peak. In the bottom panel of Figure 1, we show an example of this imbalance for the six busiest stations in San Francisco. (In Appendix 7.1, we further explore this issue using singular-value decomposition (SVD) of the trip data.) These patterns suggest the primary ways in which stations are used in commuting, and they motivate our time-dependent identification of stations into 'home' and 'work' types.
Another characteristic of our data that we incorporate into the design of our models is the strong positive Pearson correlation coefficient between the total (summed over all time periods) in-degree and out-degree of each station: 0.99 in New York, 0.98 in San Francisco, and 0.91 in Los Angeles. Accordingly, we apply degree correction to the overall degree of each station, rather than to the in-degrees and out-degrees separately (see Section 3.1). This correlation is an intrinsic feature of docked bicycle-sharing systems, because at some point bicycles must be returned to a station before any new trips can originate there. (The use of trucks to redistribute bicycles within a system can loosen this requirement.) More generally, the strong correlation also relates to a famous axiom of human mobility: for each current of travel, there is a countercurrent [1].

Our Stochastic Block Models
In this section, we introduce our time-dependent mixed-membership stochastic block model (TDMM-SBM) and time-dependent discrete stochastic block model (TDD-SBM). Stochastic block models are a popular class of statistical network models [31]. The motivating principle of SBMs is a notion of stochastic equivalence in which edges whose incident nodes have the same block membership are identically distributed. It is a standard assumption of SBMs that edge weights are independent, given the block memberships of its nodes. More formally, an unweighted (i.e., Bernoulli) random graph Y (with adjacency matrix A) from an SBM with K blocks is defined by where G (with components g i P t1, 2, . . . , Ku) is a vector of block assignments for the nodes of Y and η is a KˆK matrix of block-to-block edge probabilities. This definition allows directed graphs, in which η can be asymmetric. For early presentations of SBMs, see [28,29,62,63]. More recent advances have added flexibility to SBMs. Examples include mixed-membership and overlapping-membership SBMs [64,65], models with covariates [66], degreecorrected SBMs [30], and Bayesian implementations [31]. Applications of SBM to time-dependent networks include (1) discrete-membership [42,43] and mixed-membership [47,48] versions in which nodes can change blocks and (2) discrete-membership versions with fixed blocks over time, but without degree-correction [45,56].

Time-Dependent Mixed-Membership Stochastic Block Model (TDMM-SBM)
We now describe the framework for our mixed-membership SBM. Let i, j P N (with |N | " N ) be nodes, which represent bicycle stations; let g, h P K (with |K| " K) be blocks. We treat both the number of nodes and the number of blocks as fixed and given. Our data is a three-dimensional array of size pN, N, T q, where T is the number of time slices (i.e., time layers). We consider hourly groupings of the trips based on their starting times, and we do not include interlayer edges. 2 The quantity A ijt is the observed number of trips from station i to station j with a starting time that is at least t and less than t`1. LetÃ ij " ř 23 t"0 A ijt denote the edge weights of the associated time-aggregated matrix. Our network is a directed multilayer network, so we count each trip that both starts and ends at a node i during hour t (i.e., self-edges) exactly once in A iit .
For each node i, there is a latent length-K vector of real numbers C ig P r0, 1s. These numbers, which we will estimate, represent the mixed-membership block assignment of each node. The block-assignment parameter C ig indicates the 'strength' of node i in block g. For each ordered pair g, h of blocks and each time t P t0, 1, . . . , 23u (where t " 0 represents the hour that starts at midnight), there is a parameter ω ght , which we call the 'block-connectivity' parameter or 'block-to-block' parameter, that represents the directed activity from block g to block h during hour t. Note that ω ght need not be equal to ω hgt if a network is directed; in our application, this captures any asymmetries in the numbers of trips with respect to reversing origins and destinations. We also define the notationω gh " ř 23 t"0 ω ght for our time-aggregated matrix. For each pair of nodes, i and j, we assume that the number of trips that depart from i at time t and arrive at j is Poisson-distributed with mean µ ijt " ř g,h C ig ω ght C jh . Our use of the Poisson distribution follows [30] and [31], facilitates computation, and is standard for modeling count data (although overdispersion is a concern). For identifiability, we apply the constraint ř i C ig " 1 for all g. This does not constrain the set of possible models in terms of realizable mean edge activities µ ijt . Consider a model with unconstrained parameters ω ght and C ig . The model with parameters ω 1 ght and C 1 ig such that C 1 ig " C ig ř j C jg and ω 1 ght " ω ght´ř j C jg¯´ř j C jh¯i s an equivalent parameterization, because the means of the distributions of edge weights are equal to the those of the model with unconstrained parameters. That is, µ 1 ijt " Given that ř i C ig " 1, we can think of C ig as the proportion of the total activity of block g from the activity of node i. We interpret C ig relative to C ih as how strongly block g is associated with node i relative to how strongly block h is associated with node i. The expected total number of trips at node i is ř g C ig ř h,t pω ght`ωhgt q. Accordingly, ř g C ig is a measure of the activity of node i in which we do not weight each C ig term by the total activity of the corresponding block. We use these quantities when plotting the block assignments that we infer in our data using the TDMM-SBM (see the left panels of Figures 3 and 6) because they help ensure that we do not overlook blocks with important usage patterns but relatively lower activity. The parameter C ig is analogous to the degree-correction parameter for SBMs that was introduced in [30], but we apply it to mixed block membership. We elaborate on this connection in Subsection 3.2, where we introduce a model that specifies that nodes have only one block.
We now compute the likelihood function that we will optimize to obtain the maximum-likelihood estimate (MLE). We assume conditional independence between the Poisson-distributed hourly numbers of trips along each edge, given model parameters, so the likelihood of the data is where ω and C give the model parameters (i.e., ω " tω ght u and C " tC ig u). Note that µ ijt " ř g,h C ig ω ght C jh is a function of these parameters, the set N of nodes in the network is fixed and given, and the number K of blocks is also fixed and given. The unnormalized log-likelihood is although we omit the addition of the constant´ř i,j,t log pA ijt !q because it does not affect maximum of the function. (In (3) and throughout this paper, we use 'log' to denote the natural logarithm.)

Time-Dependent Discrete Stochastic Block Model (TDD-SBM)
We derive a single-membership SBM from our mixed-membership SBM by making the extra assumption that, for each node i P N , it is the case that C ig ą 0 for only one block g P K. (This SBM is thus the 'discrete version' of our model.) For our single-membership SBM, we introduce some new notation to aid our description and promote consistency with the notation in other work [30,67]. For a given node i, the block g for which C ig ą 0 is the block g i that includes node i. We introduce a scalar parameter θ i " C ig i for each node i; it is a multilayer extension of the degree-correction parameter in Karrer and Newman [30]. The expectation of the Poisson distribution of the value of an edge from node i to node j at time t is We retain the sum constraints of our mixed-membership model, so ř iPg θ i " 1 for all g.
We compute optimal values for the parameters ω and θ " tθ i u iPN . As in our TDMM-SBM, take N and K to be fixed and given. Again dropping the constant term´ř i,j,t log pA ijt !q, the log-likelihood of our single-membership SBM is This resembles the degree-corrected SBM in equation (14) of [30], but we have adapted it to directed networks and summed over time layers.
We find explicit formulas for the conditional MLEs of θ i and ω ght , given the block memberships g i . In the following calculations, removal of t from the subscript of a parameter and insertion of a tilde designates a sum over all t. Specifically, we defineÃ ij " where we have used the blockwise sum constraints on θ to simplify ř iPg,jPh θ i θ j to 1. Therefore, the conditional MLE for w ght iŝ where m ght is the sum of the weights of the edges from nodes in block g to nodes in block h during hour t. That is, m ght " ř iPg,jPh A ijt . We then differentiate with respect to θ i to obtain where k i " ř j´Ã ij`Ãji¯i s the degree of node i (i.e., the sum of the indegree and the out-degree 3 of i) over all time periods;m gh " ř 23 t"0 m ght ; and κ g " ř h pm gh`mhg q is the sum of the in-degrees and out-degrees of all nodes in block g over all time periods. The term 2m gg in the equation for κ g implies that we count each intra-block edge twice: once for emanating from g and once for arriving at g. Similarly, k i includes the term 2Ã ii , so we count self-edges twice in this term.
The parameter MLEĈ ig i "θ i in the TDD-SBM for directed, multilayer networks is analogous to the degree-correction parameter in the degreecorrected SBM for undirected networks with one layer [30]. Indeed, the MLE for the degree correction parameter of a node in the latter model is the degree of that node divided by the sum of the degrees of all nodes in its block. Another similarity between a degree-corrected SBM and our TDD-SBM is that at the MLE of the TDD-SBM, the expectation of the (time-aggregated) degree of a node i is equal to the degree of node i from the observed data. That is, ř t ř j pµ ijt`µjit q " k i . (See Appendix 7.2 for the proof.) For our mixed-membership SBM, we are not aware of such a precise relationship between the data and the expected value of model statistics, although there does appear to be a strong positive correlation between the time-aggregated node degrees and the sum of the mixed-membership parameters ( ř g C ig for all i).
We now substitute the above conditional parameter estimates into the unnormalized log-likelihood of the TDD-SBM to obtain wherem is the total number of edges in the network. By a similar calculation as one in [30], we obtain where k in,it and k out,it are the in-degrees and out-degrees of node i during hour t, the quantity κ in,gt " ř iPg k in,it is the number of edges that arrive at g in hour t, and κ out,gt " ř iPg k out,it is the number of edges that leave g in hour t. Including only the terms that depend on block assignments yields the objective function which we maximize to obtain the MLE of the block assignments. Unlike the directed SBM of Zhu et al. [67], we do not have two degreecorrection parameters for each station. Their directed SBM includes a parameter θ in i to modify in-degree and a parameter θ out i to modify out-degree. The validity of using undirected degree-correction parameters tθ i u iPN to model directed networks depends on the presence of a large correlation between the time-aggregated in-degrees and out-degrees of the nodes. In this situation, including distinct θ in i and θ out i parameters would lead to overfitting. We found Pearson correlations between in-degree and out-degree of over 0.9 for all of the bicycle-sharing networks that we examined (see Section 2).
Our model is able to capture the directedness of travel at a given time because the ω ght parameters are not constrained to be symmetric. To increase the applicability of our work to other networks (which may not have such large correlations), we provide estimation code in our implementation materials (see Section 4) for a version of our TDD-SBM with directed degreecorrection parameters θ in i and θ out i .

Computations
In this section, we describe the algorithms that we use for the TDMM-SBM and the TDD-SBM.

Inference using the TDMM-SBM
Let ω " tω ght u be the KˆKˆT array of block-connectivity parameters, and let C " tC ig u be the matrix of node-strength parameters. We estimate the model parameters using a two-step gradient descent. 4 First, we move in the direction of the gradient with respect to ω and update the block-connectivity parameters. Second, we move along the direction of the gradient with respect to C and update the node-strength parameters. In our algorithm, we let ω pnq and C pnq denote the n th updates of the block-connectivity and node-strength parameters, respectively. We initialize the algorithm with random values ω p0q and C p0q with components distributed according to exppXq, where X is a Gaussian random variable with mean 0 and variance 1. (That is, we draw random values from a log-normal distribution.) We denote the mean activity along edge pi, jq with initial parameters ω p0q and C p0q by µ p0q ijt . We scale the parameters so that the TDMM-SBM at the starting point of the optimization has the same mean number of trips as the data. Specifically, we multiply the block-connectivity parameters ω p0q ght by´ř t ω p0q ght¯´1´ři,j,t A ijt¯{ K 2 and normalize C 0 to satisfy the constraint ř i C p0q ig " 1 for each block g. This yields which is the total number of edges in the network. Without this scaling, the initial parameters can have very small magnitudes, such that the mean total number of trips from the TDMM-SBM with these initial parameters is much smaller than the total number of trips in the data. In that situation, the initial likelihood is much smaller than the likelihood at the MLE.
To ensure that our estimated parameters are nonnegative, we use the following change of variables: exppω pnq q " ω pnq and exppC pnq q " C pnq . We can then write the gradient descent as where h pnq and η pnq are small positive numbers. From the definitions ofC pnq andω pnq , we write We take the exponential of a vector to signify the application of the exponential to each component of the vector. Let h p0q " η p0q " ∆ ą 0 be the fixed initial step size. For our application, we choose ∆ " 10´4. We generate two candidate updates for ω pn`1q for the first step in our algorithm using h pn`1q " 1.2 h pnq and h pn`1q " 0.8 h pnq , and we choose the one that gives the ω pn`1q value that yields the larger pC pnq , ω pn`1q q. Similarly, we choose the one of η pn`1q " 1.2 η pnq or η pn`1q " 0.8 η pnq that gives the C pn`1q value that yields the larger pC pn`1q , ω pn`1q q.
We compute the gradient of the log-likelihood function using the chain rule. Recall that we compute the log-likelihood in two parts. One part is the computation of the mean µ ijt " ř K g,h C ig ω t gh C jh of the number of trips from node i to node j at time t. We then insert the expression for the mean into " ř t ř i,j pA ijt logpµ ijt q´µ ijt q. We compute the derivative of with respect to µ ijt to obtain The derivative of µ ijt with respect to C kg is where δ ab denotes the Kronecker delta (i.e., δ ab " 1 if a " b and δ ab " 0 if a ‰ b). The derivative of µ ijt with respect to ω ght is Using the above calculations, we see that the derivatives of with respect to C kg and ω ght are We run gradient descent until four significant digits of the base-10 floatingpoint representation of the log-likelihood (3) do not change for 600 consecutive steps. For the networks that we examine, this usually takes between 600 and 5000 iterations, with models with more blocks generally needing more iterations to reach this stopping criterion. Because of the non-convexity of the log-likelihood function (3), we are not guaranteed to reach a global optimum. Most of the time, our method converges to an interesting local optimum (which may also be a global optimum) that appears to reveal functional roles of bicycle stations (see Section 6). Our results produce recognizable block-connectivity parameters ω ght (e.g., home-work commute patterns and leisure-usage patterns) and the parameters C ig indicate known spatial divisions of the stations (e.g., residential versus commercial districts). In some cases, however, our algorithm converges to an uninteresting local optimum; for example, sometimes the block-assignment parameters C ig for each station appear as if they are assigned independently at random. To improve our results, we run our algorithm repeatedly (specifically, 10 times for each network) and store the estimate with the largest likelihood. We compare the parameters that we obtain from gradient descent to those that we obtain by running a Hamiltonian Monte Carlo (HMC) sampling method in a Bayesian framework with weak priors. (We implement this sampling method in Stan [68,69].) The log-likelihoods that result from our gradient-descent method are as good or better than those that we obtain with the HMC method. The HMC method is more computationally and memory intensive than our gradient-descent method, although it may be preferable in applications in which one has meaningful prior information about parameters. Improving our optimization method and investigating trade-offs between accuracy and efficiency are worthwhile topics for future work. For instance, it likely will be beneficial to adapt optimization methods for related timedependent SBMs [42,45,47,48,56] to the optimization of our SBMs.
The repository at https://github.com/jcarlen/tdsbm_supplementary_ material has our Python implementation of our gradient-descent method and code for our inference in R using Stan.

Inference using the TDD-SBM
To fit our TDD-SBM, we use a Kernighan-Lin-type (KL-type) algorithm [70] that we base on the one in [30]. (In [30], Karrer and Newman noted that their KL-type algorithm is faster than node-switching algorithms and achieves results that are similar to those from such algorithms for their time-independent degree-corrected SBM.) Given K blocks, we initialize the KL-type algorithm by assigning each node to a block uniformly at random, so each node has a probability of 1 K of being assigned to a given block. The algorithm then calculates the best possible block reassignment for any single node with respect to the associated change in log-likelihood (either the largest increase or the smallest decrease). Subsequently, we make the best reassignment for a different node, which we again choose uniformly at random, with respect to change in log-likelihood. The algorithm cycles through all nodes; a key feature of the algorithm is that a node that has been reassigned cannot move again until all other nodes have moved. One set of sequential reassignment of all nodes constitutes one step of the algorithm. The algorithm then searches all of the states (with respect to block membership of nodes) that have occurred during the step, and it selects the state with the maximum log-likelihood of any state during that step. This state is the starting point for the next step of the algorithm. A single run of the algorithm finishes when a step does not increase the log-likelihood beyond a preset tolerance value near 0. (In practice, we use 1ˆ10´4.) To find block assignments that are as good as possible, we do many runs of the algorithm for each network; this process is easily parallelizable. In our examples, we use 50 runs per network. We initialize each run randomly, as described above.
Another key feature of the KL-type algorithm is that changes in the block membership of nodes affect only the terms of the objective function that involve the origin and destination blocks of the change. The simplified objective function (7) is a sum over block-pair terms over T time layers. Consequently, we do not need to recalculate the full objective function at each step. This estimation method is practical for networks with up to a few thousand nodes, depending on available computing resources and on whether one can execute runs of the algorithm in parallel.
We implement our KL-type algorithm for TDD-SBM in R using Rcpp [71,72]. The back-end calculations are in C++ for speed, and we return results in R to enable visualization and other downstream tasks. Our implementation can also estimate time-independent SBMs, including directed and/or degree-corrected ones, a variant of TDD-SBM variant without degree correction, and a TDD-SBM with directed degree-correction parameters θ in i and θ out i . This facilitates comparison of the results of inference from various related SBMs. See https://github.com/jcarlen/sbm for our R package sbmt for parameter estimation for the TDD-SBM. We provide code (which uses the sbmt package) at https://github.com/jcarlen/tdsbm_ supplementary_material to replicate our examples in Section 6.

Evaluation of Model Fitting for Synthetic Networks
In this section, we generate data using three network models -our TDD-SBM, a variant of our TDD-SBM without degree correction, and our TDMM-SBM. For networks that we generate from discrete-membership models, we estimate model parameters using the TDD-SBM, the TDD-SBM variant without degree correction, and the Poisson-process SBM (PPSBM) of Ma-tias et al. [56]. 5 (Their model also does not have degree correction.) For the networks that we generate from the TDMM-SBM, we estimate model parameters using the TDMM-SBM and the TDD-SBM. We compare how well these estimated models fit the generated data. We illustrate three key points. First, our estimation methods are able to recover data-generating parameters for generated examples. Second, incorporating degree correction in our models facilitates the identification of blocks with similar temporal activity patterns, as opposed to ones with similar activity levels (i.e., similar degrees), when nodes have heterogeneous degrees. Third, fitting our discrete-membership model (TDD-SBM) to data that is generated from our mixed-membership model (TDMM-SBM) can result in worse parameter estimates with respect to the likelihood of observed data than when fitting to these data using our TDMM-SBM. That is, one can lose information when modeling mixed-membership networks as discrete-membership networks. We set up our generated networks as follows. We construct TDD-SBM and TDMM-SBM networks with 16 time layers to facilitate comparison to the PPSBM. In Figure 2, we show the mean edge weight between nodes in a given pair of blocks for each time layer in the TDD-SBM. We design the underlying curves to resemble patterns that we observed when studying the urban bicycle-sharing networks. Using these mean edge weights and a given number of nodes in each block, we calculate the expected sums of the edge weights from one block to another that originate in a given layer (the MLE of ω in (5)). These are the block-connectivity parameters ω " tω ght u that we use to generate our data.
We construct both two-block and three-block networks, where the former only uses values in the upper 2ˆ2 panel of Figure 2. For simplicity, we divide block membership equally between blocks for the TDD-SBM. For example, if there are 30 nodes and two blocks, 15 nodes belong to each block. We assume that the degree-correction parameters θ take one of five distinct values t1x, 2x, . . . , 5xu, which are distributed evenly within each block; the value of x satisfies the blockwise sum constraints on θ. With these parameters and conditions, we construct multilayer networks using our TDD-SBM and TDMM-SBM. In each case, we produce 100 instantiations of the networks. All of the networks that we generate are directed, and we allow self-edges. We also generate and fit networks using a version of our TDD-SBM without degree correction. Discussion of this model is beyond the scope of this paper, but we implemented parameter estimation for it in our sbmt package. Its log-likelihood is which is the log-likelihood of the TDD-SBM without the degree-correction terms θ.
For the networks that we generate (using the TDD-SBM or the TDD-SBM without degree correction), we estimate and compare their model parameters by fitting them using the TDD-SBM, the TDD-SBM without degree correction, and the PPSBM of Matias et al. [56] (which is implemented in the ppsbm package in R [73]). PPSBM is a time-dependent SBM model that is similar to TDD-SBM without degree correction. It differs in that it assumes continuous-time networks and either adaptively splits the data into equalsized bins (to use an approach based on non-parametric histograms) or uses a kernel-intensity estimator of parameters over time. For our comparison, we implement the former approach because the data input structure aligns well with our preprocessed data. We estimate the parameters of the PPSBM using a variational expectation-maximization (EM) algorithm. The ppsbm package includes a method to select the number of blocks using integrated classification likelihood (ICL).
In Table 1, we show the results of networks that we generate using the TDD-SBM and the TDD-SBM without degree correction. The first two columns describe the number of blocks (K) and number of nodes (N ) that we use to construct networks. (Recall that we pre-determine these values in our models, so they are given.) We generate these networks using our TDD-SBM with and without degree correction as indicated in the 'Gen DC' column. We indicate the model type that we use to fit the data in the 'Fit Method' column, and we indicate whether it incorporates degree correction in the 'Fit DC' column. Table 1: Evaluation of parameter-estimation methods for data that we generate using the TDD-SBM and the TDD-SBM without degree correction. We estimate parameters using both of these models and also using the PPSBM of Matias et al. [56]. For each experiment, we indicate the number of blocks (K) and the number of nodes (N ) in the networks. The 'Gen DC' column indicates whether or not we use degree correction when generating the data, and the 'Fit DC' column indicates whether or not we use degree correction when we fit the networks. The 'Fit Method' column indicates which model we use for fitting. To the right of the vertical line, we show the mean values over 100 instantiations of the networks. We show the associated standard deviations in parentheses. The 'ARI' column gives the adjusted Rand index between the inferred block assignments and the ground-truth assignments. The 'MAPE' column gives the mean absolute-percentage error, expressed as a fraction, between the block-connectivity parameters ω andω. In the 'Gen LLIK' column, we give the unnormalized log-likelihoods of the generated networks under the data-generating model. In the 'Diff LLIK' column, we give the result of subtracting the unnormalized log-likelihoods of the generated networks using the data-generating model from the unnormalized log-likelihoods using the estimated model.  (8)  We evaluate the success of each estimation method at detecting the true blocks assignments by calculating the adjusted Rand index (ARI) using the implementation in the fossil package in R [74]. We report these values in the 'ARI' column of Table 1. The ARI is a function of two group-membership vectors and achieves its maximum value of 1 when they match perfectly (up to label permutations). We expect an ARI of 0 when we compare two uniformly random permutations of block assignments that preserve the block sizes.
We evaluate the ability of the estimation methods to recover the true value of the parameter ω using the mean absolute-percentage error (MAPE) betweenω and ω. The formula for the MAPE is MAPE " 1 ght ř ght |ω ght´ωght ω ght |, which we report in the 'MAPE' column of Table 1. We also compare the unnormalized log-likelihoods of the generated networks under the data-generating model (in the 'Gen LLIK' column) to the unnormalized log-likelihoods under the estimated model. Given the similarities of the candidate models, this value provides an overall measure of how well we estimate the Poisson means of the edge weights. We show the differences between these two quantities in the 'Diff LLIK' column, where a positive value signifies that the data is more likely under the estimated model. We find perfect matches (i.e., an ARI of 1 with standard deviation of 0) between the true and estimated block-membership assignments in all cases when the data-generation and estimation methods match and when the estimation model (TDD-SBM) is a generalization of the network-generation model (TDD-SBM without degree correction). We also find perfect matches when the data is generated from the TDD-SBM without degree correction and is estimated by the PPSBM; this illustrates the similarity of these two models. In all of those cases, there is very little error inω, as indicated by the small MAPE values, and the likelihood of the generated data under the estimated model is slightly larger than under the data-generating model. (Recall that we fit the former to the generated data.) This indicates that our TDD-SBM estimation method is effective in these examples.
When we generate networks using the TDD-SBM with degree correction, the two methods without degree correction (i.e., the TDD-SBM variant and PPSBM) do a poor job of fitting the data. In the three-block networks that we generate, we obtain large values of both ARI and MAPE. Even when we recover the blocks as well as in the two-block examples, we obtain much smaller log-likelihoods of the generated networks under the estimated models than under the data-generating model. This is the case because using a model without degree correction implies that all edges between a given pair of blocks are identically distributed, regardless of the nodes to which the edges are attached. This does not accurately reflect our generated data, which has heterogeneous degrees within blocks.
For the PPSBM estimation method, the results in Table 1 reflect parameter estimates when we set the number of blocks to be equal to the value (which is two or three in our examples) in our data-generation mechanism. When we apply the block-selection method of Matias et al. [56] to models with between one and four blocks (using the implementation in the ppsbm package), we identify the true number of blocks when we generate the data without degree-correction. However, when we apply it to data that we generate using the TDD-SBM with degree correction and let it search for up to four blocks, we overestimate the number of blocks. Interestingly, the PPSBM method attempts to classify nodes based both on their roles and on degreecorrection factors. SBMs without degree correction determine roles based both on overall degree and on connectivity patterns.
We are interested in how successful our mixed-membership model (i.e., TDMM-SBM) is able to recover data-generating parameters and how much log-likelihood we lose when we use a TDD-SBM to fit a network that we generate using the TDMM-SBM with degree correction. For each combination of number of blocks (K) and number of nodes (N ), we draw values of C uniformly at random from t0, 1, . . . , 5u and then normalize each block so that ř i C ig " 1. We set the values of ω equal to those that we used above for the degree-corrected TDD-SBM with corresponding numbers of blocks and nodes. We also include results for networks that we generate from parameters of TDMM-SBM that we determine from fitting to the Los Angeles bicyclesharing network with two and three blocks. In those examples, N " 61, because that is the number bicycle-docking stations in that network. We summarize our results in Table 2.
Because block membership is mixed, we do not use ARI to compareĈ with C. Instead, we calculate the blockwise absolute error (BAE), which is the sum of absolute error averaged over the number of blocks. That is, To evaluate the error ofω, we calculate the MAPE. We also calculate a 'pairwise' MAPE ofω (which we denote by MAPE p in Table 2), in which we average the absolute percentage error over each pair of blocks. That is, MAPE p " 1 . This quantity is less susceptible than MAPE to having very large values when individual values of ω ght are near 0.
We compare the log-likelihoods of the generated networks under the datagenerating TDMM-SBM ('Gen LLIK') to their log-likelihoods under the estimated parameters, and we report the difference (the latter minus the former) in the 'Diff LLIK' column of Table 2. Finally, we fit a degree-corrected TDD-SBM to the generated networks and calculate the log-likelihood of the networks under that model. We report this quantity minus the log-likelihood from the 'Gen LLIK' column in the 'Diff LLIK Discrete' column. Table 2: Evaluation of parameter-estimation methods for data that we generate using the TDMM-SBM. For each experiment, we indicate the number of blocks (K) and the number of nodes (N ) in the networks. To the right of the vertical line, we show the mean values over 100 instantiations of the networks. We show the associated standard deviations in parentheses. The 'BAE' column gives the blockwise absolute errors for the block-membership parameters. The 'MAPE' column gives the mean absolute-percentage error, expressed as a fraction, between the block-connectivity parameters ω andω. We show '-' in this column when the result is extremely large due to values of ω ght near 0. The 'MAPE p ' column gives the pairwise MAPE ofω. In the 'Gen LLIK' column, we give the unnormalized log-likelihoods of the generated networks under the data-generating TDMM-SBM. In the 'Diff LLIK' column, we give the result of subtracting 'Gen LLIK' from the unnormalized log-likelihoods of the generated networks using the estimated TDMM-SBM. In the 'Diff LLIK Discrete' column, we give the result of subtracting 'Gen LLIK' from the unnormalized log-likelihoods of the generated networks using the degree-corrected TDD-SBM. We find that the likelihood of the generated data when we use the estimated parameters is slightly larger than the likelihood of the data when we use the data-generating parameters. (Because of the randomness in the networks that we generate, we expect that a network that we generate using a given model will have a larger likelihood when we use the parameters that we fit to that network than when we use the data-generating parameters.) Additionally, the likelihood of the generated data under parameters that we estimate using the TDD-SBM is significantly smaller than those that we obtain under the data-generating TDMM-SBM parameters, especially for the three-block models. (See the 'Diff LLIK Discrete' column of Table 2.) In short, we gain a lot of information by using our mixed-membership model in this case.
The BAE, MAPE, and MAPE p values (i.e., the 'error values') tend to be small for the two-block models, indicating that we are recovering the data-generating parameters fairly well in these cases. (We do not report MAPE values for generated networks when we use the estimated parameters of TDMM-SBM of Los Angeles, which has N " 61 bicycle-sharing stations, because a few values of ω that are very close to 0 drastically inflate those values.) However, the values of BAE, MAPE, and MAPE p are much larger for the three-block models (except for the BAE for Los Angeles). The fact that the estimated parameters achieve similar likelihoods as the data-generating parameters despite fairly large values of BAE, MAPE, and MAPE p suggests that the TDMM-SBM is not completely identifiable in these cases. Although the BAE error values are not as large for the networks that we generate using parameter values that we estimate from the data from Los Angeles. it is important to be aware of this identifiability limitation when interpreting the parameters of our TDMM-SBM. This is especially the case for ω, given the large MAPE and MAPE p values ofω that we obtained for all three-block TDMM-SBMs.
The identifiability of time-dependent SBMs is a relatively unexplored research topic. Matias and Miele [46] pointed out that the time-dependent SBM of Xu and Hero [45] is not fully identifiable, and they introduced a demonstrably identifiable (up to label permutations) time-dependent SBM. They also established that identifiability is an inherent limitation of time-dependent SBMs in which block-membership parameters and block-connectivity parameters vary over time, unless one applies additional constraints or penalization. As we described in Section 3, our models only allow the block-connectivity parameters to vary. Further research is necessary to establish the conditions for identifiability of our time-dependent SBMs.

Results on Bicycle-Sharing Networks
We apply our models to bicycle-sharing networks in Los Angeles, San Francisco, and New York City. (See Section 2 for full descriptions of these data sets.) The networks that we examine in downtown Los Angeles and San Francisco are relatively small; they had 61 stations and 35 stations, respectively, at the time that we collected our data. The stations in these networks are concentrated in downtown areas, where residential buildings and highrise office buildings are interspersed. The New York City network is much larger than the other two networks. It includes about 600 stations, which encompass a range of commercial areas, residential neighborhoods, parks, and manufacturing areas.
Before discussing our results in detail, we highlight some of our key insights. We present results for two-block models for the Los Angeles (see Section 6.1) and San Francisco (see Section 6.2) bicycle-sharing networks that illustrate the ability of our models to detect functional roles in the form of 'home' and 'work' roles of bicycle-sharing stations. These results facilitate the ability to compare bicycle-sharing systems across cities. We fit discretemembership and mixed-membership models with two, three, and four blocks to each of our data sets; and we fit five-block and six-block models in several cases. We provide the output of these models and all code to recreate our results at https://github.com/jcarlen/tdsbm_supplementary_ material. In most cases, we find that additional blocks do not illuminate new functional roles beyond the home and work roles. For example, using block models with additional blocks for Los Angeles and San Francisco result in small variations of these roles but increase the noise in the time variation of the block-connectivity parametersω ght . In Section 6.3, we present five-block TDMM-SBM and TDD-SBM fits of a subset of the New York City bicyclesharing network that illustrates another role (a 'park' block) that does not appear distinctly when we employ models with fewer blocks. A method to select the number of blocks is beyond the scope of our paper, but we discuss aspects of this model-selection problem in Appendix 7.4 and illustrate it by fitting mixed-membership models with up to 10 blocks on the Los Angeles bicycle-sharing network. In the right panel, we scale the size of each node based on its degree.

Downtown Los Angeles
In Figure 3, we show the mixed-membership (TDMM-SBM) and degreecorrected discrete (TDD-SBM) block assignments of two-block models of the downtown Los Angeles bicycle-sharing network. For the TDMM-SBM, we scale the size of each node i in our plots based on its value of ř g C ig . We refer to these sums as 'C total' values. These values correlate strongly with node degree (specifically, the sum of the in-degree and the out-degree), which is evident in the similarity of node sizes in the left and right panels of Figure  3. For both models, we observe that home and work blocks are interspersed geographically. As part of our discussion in this subsection, we will describe our method for determining the block labels in Figure 3.
The TDMM-SBM result reveals a group of stations (which we color in gray) in the left panel of Figure 3 that are neither strongly home-identified nor strongly work-identified. Instead, they possess a roughly even mixture of the two types. For this network, the TDD-SBM output is very similar to what we obtain from a discretization of the TDMM-SBM output (where we discretize by assigning each node i to the block with the maximum value of its C ig parameter), but this is not true for all of our bicycle-sharing networks.
Our model does not yield 'home' and 'work' labels for each block on its own, so we use the time-dependent block-connectivity parameter estimateŝ ω ght to assign these labels. We assign the labels heuristically under the assumption that the 'home' block is the origin of many trips to the work block in the morning and the 'work' block is the origin of many trips to the home block in the evening. Figure 4, which showsω ght for each possible value of g and h, with the hour t on the horizontal axis, supports our labeling. Given our labeling, we observe a clear peak in home-to-work traffic in the mornings and work-to-home traffic in the evenings. We make similar 'home' and 'work' assignments for San Francisco and New York City. In Los Angeles, the traffic within the work block peaks in the middle of the day. This may represent lunchtime errands, leisure activity, or tourist activity, as there are many tourist attractions in the downtown area. The traffic within the home block has a mild evening peak and has the least activity overall among the block pairs. To further validate our block labels, we use the zoning map for downtown Los Angeles from the Los Angeles Department of City Planning [75]. 6 Zon-ing ordinances determine the allowable uses of city land. They distinguish land that is available for commercial uses, industrial uses, residential uses, park districts, and other uses. In the background of Figure 5, we show a simplified version of the underlying zoning map; in this map, we have grouped similar designations). The industrial areas house a mixture of manufacturing and commercial uses. Public facilities include government buildings, public schools, parking under freeways, and police and fire stations [76]. In downtown Los Angeles, manufacturing and industrial areas are split cleanly from residential areas, whereas residential and commercial areas are intermixed across the bicycle-sharing system's coverage area.  are in or near zones for pure residential use or mixed residential and commercial use. We find that many stations that are not predominantly homeidentified or work-identified align with mixed-use commercial/residential zones. The discrete-role plot (see the right panel of Figure 3) has a stripe of 'home' stations that cut diagonally through the 'work' stations. In Figure 5, we see that this aligns roughly with areas that are zoned for purely residential use. By contrast, industrial and public facility zones tend to host stations that are mostly work-identified, although some of the most strongly work-identified stations are in mixed-used areas.
One station that seems to deviate from the overall pattern is the heavilytrafficked station at Union Station. Although it is adjacent to a public facility zone with many government buildings, it is also strongly home-identified. This may seem surprising on its surface, but this classification is consistent with other home-identified stations because Union Station is a major transit hub for the Los Angeles metropolitan area. Accordingly, many morning trips originate there, as commuters transition from other forms of transportation, and many evening trips conclude there. This type of activity pattern that is sensibly associated with home-identified stations. Such idiosyncrasies of transit hubs also arise in our results for San Francisco and New York City.

San Francisco
In Figure 6, we compare the two-block TDMM-SBM and two-block degreecorrected TDD-SBM for San Francisco. As we saw for Los Angeles, the San Francisco blocks are interspersed geographically and stations vary from strongly home-identified ones to strongly work-identified ones. The two most strongly home-identified stations are located at a major transit hub, the San Francisco Caltrain Station on 4th Street. In the right panel, we scale the size of each node based on its degree.
In Figure 7, we show the estimated traffic between the 'home' and 'work' blocks for the TDMM-SBM and the TDD-SBM. As in the downtown Los Angeles network, we observe inter-block commuting. However, unlike in downtown LA, the TDD-SBM reveals intra-block morning and evening peaks in both the home and work blocks. This may be due to last-mile commuting, such as using bicycle-sharing facilities to get to or from a train station. We find that the ten most-popular trips within blocks (i.e., home-to-home travel or work-to-work travel when there are two blocks) that originate during the heavy commuting hours of 7-9 am and 4-6 pm all begin at major public transportation stops in the morning and end at those stops in the evening. Recognizing last-mile usage is important for integrating bicycle sharing with nearby public transportation. One possible reason that we do not observe a similar phenomenon in downtown LA is that San Franciscans are more likely than residents of Los Angeles to use public transportation [77]. The intra-block morning and evening peaks may also arise from the intermixing of commercial and residential uses of land. In other words, some travel within blocks may also constitute commuting. Figure 7: Estimated time-dependent block-connectivity parametersω ght for the two-block TDMM-SBM and two-block degree-corrected TDD-SBM for San Francisco.

Comparison to SBM Variations
We briefly compare our above results for Los Angeles and San Francisco to results that we obtain for these networks by employing a time-independent SBM and a time-dependent SBM without degree correction. We do this to illustrate that both time-dependence and degree correction are necessary to achieve our goal of detecting functional roles of bicycle-sharing stations.
To fit the time-independent SBM, we calculate the time-independent adjacency matrixÃ ij " ř 23 t"0 A ijt and fit the two-block degree-corrected TDD-SBM to the associated time-aggregated network with a single layer. For the downtown Los Angeles network, we observe a clear geographically-based division in the results of the time-independent SBM (see Figure 8). For the San Francisco network, the differences between the blocks of the time-dependent SBM and time-independent SBM are less noticeable, although they are still present. This confirms that our time-dependent SBMs detect behavior that is not evident in the time-aggregated data. We now fit the variant of our two-block TDD-SBM without degree correction (see Section 5) to the Los Angeles and San Francisco bicycle-sharing networks. In Figure 9, we see in this case that stations are now divided into one group of small-degree stations and one group of large-degree stations, where 'degree' in the above sentence is the sum of the in-degree and the out-degree. These blocks differ significantly from those that we obtained using the TDD-SBM with degree correction. (See Figures 3 and 6.) For example, the TDD-SBM for Los Angeles with degree correction classifies the the high-traffic node at Union Station as a 'home' station because of its frequent departures in the morning and frequent arrivals in the evening, whereas the the TDD-SBM without degree correction groups Union Station with other high-traffic stations that are predominantly 'work' stations in the degree-corrected model. Figure 9: Estimated blocks from a TDD-SBM variant without degree correction for bicycle-sharing networks in (left) Los Angeles and (right) San Francisco. We scale the size of each node based on its degree.

New York City
We fit the TDMM-SBM and the degree-corrected TDD-SBM to the New York City bicycle-sharing network with two through six blocks. Our results suggest that there is a limit to the size of networks for which our timedependent SBMs can recover blocks that are based on functional roles of stations (rather than on geographical locations). Therefore, in the present subsection, we apply our models to a subset of the New York City bicyclesharing network to gain insight about the functional roles of stations in the subset. We select this subset based on the results of applying our threeblock TDD-SBM to the entire New York City network. We examine the subnetwork that is induced by the stations that are in one of these three blocks. We explore three-block models of the entire New York City network in Appendix 7.3. In those results, we infer that stations are assigned to blocks along geographical lines, rather than to blocks with distinct functions. We examine the New York City bicycle-sharing network on a scale that is smaller than the full system and fit models to the subset of stations and trips that lie within the Manhattan (home) block of the three-block TDD-SBM (see Figure A.4) that we examine in Appendix 7.3. We refer to this subnetwork as the 'Manhattan' network, even though it does not include all of the stations in the borough of Manhattan. This network consists of 166 stations and 256,840 trips. In Figure 10, we show our results for a five-block TDMM-SBM and five-block TDD-SBM on the Manhattan network. The area without stations in the middle of each panel of Figure 10 is Central Park, which has stations on its perimeter but not in its interior.
The blocks that we infer using the five-block TDMM-SBM and five-block TDD-SBM outline similar subregions. These models yield block-membership parameters that capture the residential and commercial sections of the area much better than the three-block TDD-SBM and TDMM-SBM of the full New York City network. One can see this by comparing the five-block network results with the underlying zoning map for the area in Figure A.6. The stations in residential zones tend to have larger block-membership parameters for 'home' blocks than for 'work' blocks, and the opposite is true for stations in commercial zones. We label the five detected blocks as (clockwise from top left) 'home (west)', 'park', 'home (east)', 'work', and 'mixed'. We base these labels on the land usage of the underlying areas and the time-dependent block-connectivity parametersω ght (see Figure 11). The TDMM-SBM block assignments demarcate similar geographical subregions for the blocks as the TDD-SBM block assignments. However, when we use the TDMM-SBM, the subregions tend to blend together, rather than having sharp boundaries. (See Figure 10.) We highlight the appearance of the 'park' block, which we have not observed in previous examples and has distinctive behavior. The park block is similar to a residential block in terms of its spike in morning traffic to the work block and its spike in evening traffic from the work block, but it has distinct intra-block activity that peaks in the afternoon. The intra-block activity resembles weekend activity in the New York City bicycle-sharing system as a whole (see Figure 1); this reflects leisure use of the bicycles. Bicycles near Central Park (which also places them near several major museums) are likely to be used by tourists and other non-commuters during the day for leisure or for travel to nearby attractions.
In Figure 11, we show the values of the block-connectivity parametersω ght for the five-block TDD-SBM and the five-block TDMM-SBM. Our estimates ofω ght for these models illustrate important differences in the behavior of different blocks that we can observe only by using a time-dependent model. 7 We observe some overlap in the time-dependent behavior of blocks, so there is potential overfitting. For example, the home (east), home (west), and mixed blocks have traffic that is similar to that in blocks other than their own. However, examining the Manhattan network with models with fewer than five blocks do not cleanly distinguish the 'park' block of stations from other residential stations. One reason that our time-dependent SBMs for the Manhattan network perform better (with respect to detecting functionally meaningful blocks) than models that we apply to the entire New York City network (see Appendix 7.3) is the dependence of station-to-station trip counts on the distance between stations. Although our SBMs correct for the overall activity of each station, they do not normalize expected edge values by the distance between stations. In a small geographical area, such as the coverage areas of the Los Angeles and San Francisco networks, this is a reasonable choice, as all stations are within 'biking distance' of each other. However, when examining a system as large as New York City's, the lack of distance correction weakens the functional groupings that we obtain with our time-dependent SBMs. Intra-block trips dwarf inter-block trips (see Figure A.5), and it seems more reasonable to interpret each block as its own ecosystem.

Conclusions and Discussion
We presented two types of time-dependent stochastic block models and used them to reveal interesting structures in urban bicycle-sharing networks. We formulated these SBMs to account both for degree heterogeneity and for a balance between cumulative in-degrees and out-degrees of bicycle stations over the course of a day. (The latter feature reflects a classical axiom of Ravenstein [78] that every current of human mobility has an associated countercurrent [1].) Our SBMs group stations based on their activity over time, allowing us to identify them with home and work roles. Work stations are characterized by inflow from home stations in the morning and outflow to home stations in the afternoon and evening, and residential stations have the opposite characterization. It is also sometimes possible to identify other roles, such as a Manhattan park block that combines residential and leisure/touristic behavior.
We illustrated through case studies in Los Angeles, San Francisco, and New York City how our mixed-membership and discrete SBMs can provide complementary insights about transportation patterns in bicycle-sharing systems. We found that many bicycle-sharing stations in our three focal cities serve a mixture of roles, which we captured with our mixed-membership SBM. However, in some cases, we observed that discrete-membership SBMs that use fewer parameters provide a clearer picture of the usage patterns that are associated with each block. We also demonstrated the importance of time-dependence and degree correction in our models by comparing our results to time-independent and SBMs without degree correction.
We evaluated our block labels by comparing them to city zoning maps. The home-work structure that we detected generally aligns well with the underlying zones, but with important deviations near major transit hubs. It is common to evaluate the results of community detection by comparison with so-called 'ground-truth' communities [22,35] (although it is crucial to encourage caution with respect to such evaluations [79]). The time-dependent commute flows that we detected with our SBMs enabled us to identify and label the functional roles of blocks of bicycle-sharing stations without a corresponding zoning map. In the future, it will be worthwhile to compare the traffic patterns that we detected to activity patterns from other mobility data, such as taxis, subway systems, e-scooters, and geo-tagged mobile apps [55].
There are several worthwhile ways to improve our models and algorithms.
One valuable extension is to allow distributions other than the Poisson distribution for the numbers of trips between pairs of stations. Additionally, although it is a convenient simplification to assume independence between the number of trips that start each hour, it may be beneficial to relax this assumption. For example, they are not independent if there are stations that run out of bicycles at some point. As we have discussed at length, mixed-membership and discrete-membership SBMs can reveal different insights, as can examining SBMs with different numbers of blocks. This helps illustrate why it is important to consider model selection in greater depth. It is also worth considering other estimation techniques for both our mixedmembership and discrete-membership models, especially if one alters the assumption of conditionally independent Poisson edge distributions that we used to simplify our parameter estimation. Variational inference has been employed for many related models [42,46,47,56], and it is likely that it can also be used in our setting to improve performance (especially for large networks). Another important direction for future work is the exploration of different methods for preprocessing data to include only the most important edges. The two most apparent ways to do this are (1) eliminating potentially insignificant edges by thresholding and (2) choosing time layers that reduce variance. The preferential-attachment model of Zhu et al. [67] gives one possible approach for eliminating insignificant edges. The way that one splits the times of a day can improve both accuracy and computational efficiency by reducing the total number of parameters. For example, in the cities that we studied, bicycle trips occurred sporadically between 1 am and 5 am, so it may be desirable lump all of these time layers into one time interval to decrease the number of parameters by three and thereby decrease variance. There exist methods that aim to find suitable ways to segment time periods [80], and trying to find the best ones to use in different situations is an active area of research.
Broadening our models to incorporate spatial data [81] is another natural direction to build on our work. For example, it may enable one to identify functional blocks in geographically diffuse networks such the New York City bicycle-sharing network. The radiation, intervening-opportunities, and gravity models have had some success at modeling human mobility over various distances [1]. These models put more weight on longer trips, and some mobility models take into account opportunities that lie between origin and destination locations. Some mobility models also possess statistical justifica-tion based on entropy arguments, and it is worthwhile to investigate methods to incorporate them into SBMs. Mobility models have already been incorporated into null models in time-dependent modularity objective functions in the work of Sarzynska et al. [82], who found that radiation and gravity null models perform better than the usual Newman-Girvan null model (which is a variant of a configuration model [83]) for spatial networks. One can use similar ideas to incorporate spatial information into SBMs. The value of using spatial null models for bicycle-sharing systems has been examined previously [6,61], so this is a very interesting direction to pursue.
It is also worthwhile to use our time time-dependent SBMs and extensions of them to study other types of mobility data. One example is dockless vehicle-sharing networks, such as e-scooter-sharing programs. If we view the usage of stations as a proxy for a spatial function of demand for bicycles, then data from dockless systems may better approximate such a spatially varying function. One possibility is to partition a city into a grid (including comparing computations that use different levels of granularity) and measure the usage in each region, taking care to recognize irregularities from transit hubs. Depending on how heavily these systems are used in commuting, we may discover functional blocks other than ones for home 'home' and 'work'. Looking even further forward, it will also be possible to tailor our methods to analyze multimodal transportation systems and other urban flows, which are particularly suitable for analysis using multilayer networks [14,23,24,84].

Singular vectors for bicycle-sharing data
To inform our modeling approach, we explore the imbalance between morning and evening activity by station for each network. We calculate the singularvalue decomposition (SVD) of the matrices of in-degree and out-degree for each station by hour. To be explicit, entry i, j of the matrix of in-degrees is equal to the total number of trips that arrive at station i in hour j, and we construct the matrix of out-degrees analogously for departing trips. We show results for Los Angeles, San Francisco, and New York City in figures A.1, A.2, and A.3, respectively. The first two principle components either strengthen both observed peaks or weaken one peak while strengthening the other. Across the three data sets, the first two singular vectors explain between 88% and 97% percent of the variation in the corresponding matrix, supporting the importance of peak morning and peak evening commutes for classifying stations.   7.2 Proof that the expected node degrees of the MLE network of the TDD-SBM equal the observed node degrees Suppose that we generate a network from the MLE of our time-dependent discrete-membership stochastic block model (TDD-SBM). We prove that the expected value of the degree of a node (i.e., the sum of its in-degree and outdegree over all time periods) is the same as the degree of the node in the observed data. That is, we prove that ř j ř 23 t"0 pµ ijt`µjit q " k i " ř j ř 23 t"0 A ijt . Let i and j be nodes in the network, and let t P t0, 1, . . . , 23u index the time layers. Additionally, recall that g denotes block g in the SBM and that κ g is the sum of the in-degrees and the out-degrees of all nodes in block g over all time periods. We calculate We are not aware of a relationship between the expected degrees and degrees of the observed data for our mixed-membership SBM.

TDMM-SBM and TDD-SBM for the New York
City Bicycle-Sharing Network In Figure A.4, we compare our results from a three-block TDMM-SBM and a three-block TDD-SBM for the New York City bicycle-sharing network. In initial calculations, we found that a two-block TDD-SBM divides the network along the East River into a Manhattan block and Brooklyn block and that a two-block TDMM-SBM divides the network slightly farther north in Lower Manhattan. We do not show the outputs of the two-block models, but we provide the reproduction code, estimated parameters, and associated visualizations at https://github.com/jcarlen/tdsbm_supplementary_material. In the right panel, we scale the area of each node based on the sum of its in-degree and out-degree (divided by the maximum value of that sum).
In Figure A.5, we compare the inferred inter-block traffic, as captured by the values ofω ght , for the three-block TDMM-SBM and three-block TDD-SBM. We observe prominently that all intra-block traffic has two peaks and much higher hourly trip counts than inter-block traffic. The double peaks are reminiscent of the overall system activity in Figure 1. This may be due in part to last-mile commuting, as we saw in the San Francisco bicycle-sharing network. However, for a system this geographically large, the double peaks and minimal inter-block traffic suggests that it is useful (and important) to consider each block as its own ecosystem. We also find strong similarity between results from our TDD-SBM and a three-block time-independent SBM for time-aggregated data for New York City (not shown), providing further evidence that our time-dependent SBMs are not capturing time-dependent roles for New York City when we consider the its entire bicycle-sharing sys-tem. Consequently, we choose the labels of these blocks (see Figures A.4 and A.5) based on the primary borough and zone type of each block's stations, as indicated in the zoning map for this part of New York City [85] in Figure  A.6. In Figure A.6, we illustrate that there is general overlap, although it is far from perfect, between (1) the Upper Manhattan ('home') block and residential areas or parks and between (2) the Lower Manhattan ('work') block and commercial or manufacturing areas. All stations in Brooklyn are in the third block, which contains mostly residential areas. We show only the blocks that we infer using our TDD-SBM in Figure A.6, but the same reasoning motivates our labels for the three-block TDMM-SBM. No block has exclusively commercial or residential areas.
We examined several time-dependent SBMs for New York City with more than three blocks to try to discover functional blocks, but we found that the blocks were still primarily geographical. In some cases, the TDMM-SBM with four or more blocks found functional divisions within smaller geographical areas (subdividing the blocks in Figure A.4), but none of our models detected system-wide 'home' or 'work' blocks. See our code and figures at https:// github.com/jcarlen/tdsbm_supplementary_material to fit and visualize time-dependent SBMs of the New York City network with a number of blocks other than three.
Across our models of this network, the distance between pairs of stations is the feature that is associated most strongly with whether they belong to the same block. As we noted above, the detected blocks represent primarily geographical divisions, with most of the traffic occurring within blocks. Our time-dependent models do not detect functional blocks in New York City, in contrast to the results of applying them to the Los Angeles and San Francisco networks. We believe that this occurs because the New York City bicyclesharing network is geographically larger than the San Francisco and Los Angeles bicycle-sharing networks. Our models only detect functional blocks if such divisions correlate with traffic between stations.

Model Selection
Although statistically rigorous model selection is outside the scope of our paper, we briefly compare the number of parameters in our mixed-membership SBM (TDMM-SBM) and discrete-membership SBM (TDD-SBM). This is valuable for considering model-selection criteria, such as the Akaike information criterion (AIC) and Bayesian information criterion (BIC), that penalize a model based on its number of parameters. For a network with N nodes, K blocks, and T time layers, the number of parameters in the TDMM-SBM is KN´K`T K 2 (8) and the number of parameters in the TDD-SBM is The first term of (8) comes from the fact that each node in the TDMM-SBM has K parameters (C ig , with g P t1, . . . , Ku) that express the strength of membership in each block. By contrast, each node in the TDD-SBM has one parameter for block membership and one degree-correction parameter. Therefore, given a value of N , the first term in (8) increases linearly with the number of blocks, whereas the corresponding term in (9) is fixed. Otherwise, formulas (8) and (9) are equivalent. The´K term in each formula arises from the identifiability constraints in each model. As we described in Section 3, these constraints are ř i C ig " 1 (for each block g) for the TDMM-SBM and ř iPg θ i " 1 (for each block g) for TDD-SBM. The last term (i.e., T K 2 ) in each formula is the total number of ω ght terms in the model (see Section 3.1).
To illustrate these model-selection criteria, we give the unnormalized loglikelihoods and numbers of parameters (N p ) in TDMM-SBM and TDD-SMB of the Manhattan subnetwork (which has N " 166 nodes) with 2-6 blocks (see Table A.1). In this example, the TDMM-SBM outperforms the TDD-SBM with respect to log-likelihood when the two models have the same number of parameters. This result makes sense because of the additional constraint of the TDD-SBM that each station must belong to exactly one block.   ) in models of the Manhattan network, which has N " 166 nodes.

TDMM-SBM TDD-SBM
In Figure A.7, we plot the Akaike information criterion (AIC) [86], which is given by AIC " p2ˆN p q´p2ˆlog-likelihoodq , versus the number of blocks in the TDMM-SBM for the Los Angeles bicyclesharing network with 2-10 blocks. We consider models with a larger numbers of blocks in the Los Angeles network than we did in the Manhattan network because the former's smaller number of stations allows faster computations.
The AIC is a cost function for comparing the relative qualities of statistical models. Given two models, one regards the model with a smaller AIC as the 'better' model. The AIC takes into account both the likelihood and the complexity of a model, where one measures the latter based on the number of parameters in a model. In Figure A.7, we see that the AIC of the TDMM-SBM for the Los Angeles bicycle-sharing network decreases as we increase the number of blocks from 2 to 10. According to the AIC criterion, one would select the 10-block model. However, plots of the block-connectivity parametersω ght for the TDMM-SBM of the Los Angeles network with seven or more blocks are noisier and no more informative than ones from the TDMM-SBM with fewer blocks. They suggest strongly that there is overfitting in those cases. This is in line with experiments by Funke and Becker [87] who found that the AIC overestimates the number of blocks for several SBM variants.
Our time-dependent SBMs are incomplete descriptions of the data-generating process in the bicycle-sharing networks. For example, they do not account for the distances between stations. Accordingly, the AIC tends to recommend a model with more blocks to compensate for this incompleteness, but (as we just discussed) visual inspection reveals the likely overfitting that results from including too many blocks. Our goal is to gain insight into the bicyclesharing systems, and we prefer models with fewer blocks that describe station roles in broad but interpretable term over less-interpretable models with more blocks but smaller AIC values. Choosing appropriate model-selection criteria for SBMs deserves additional consideration and is an active research area [88][89][90][91][92]. We leave such an investigation for future work.