Social mixing and network characteristics of COVID-19 patients before and after widespread interventions: A population-based study

SARS-CoV-2 rapidly spreads among humans via social networks, with social mixing and network characteristics potentially facilitating transmission. However, limited data on topological structural features has hindered in-depth studies. Existing research is based on snapshot analyses, preventing temporal investigations of network changes. Comparing network characteristics over time offers additional insights into transmission dynamics. We examined confirmed COVID-19 patients from an eastern Chinese province, analyzing social mixing and network characteristics using transmission network topology before and after widespread interventions. Between the two time periods, the percentage of singleton networks increased from 38.9 to 62.8; the average shortest path length decreased from 1.53 to 1.14; the average betweenness reduced from 0.65 to 0.11; the average cluster size dropped from 4.05 to 2.72; and the out-degree had a slight but nonsignificant decline from 0.75 to 0.63. Results show that nonpharmaceutical interventions effectively disrupted transmission networks, preventing further disease spread. Additionally, after applying descriptive and agent-based modeling approaches, we found that the networks' dynamic structure provided more information than solely examining infection curves. In summary, we investigated social mixing and network characteristics of COVID-19 patients during different pandemic stages, revealing transmission network heterogeneities.

All epidemiological information and laboratory confirmations were collected by specialists in provincial or municipal CDCs or hospitals in Zhejiang. Extra effort was made to correct typographical errors (such as names with the same pronunciation) and to recover additional information, such as family relations, that was missing from the raw records.

Notation of network and definitions of network characteristics
We use a 2-tuple $(N, g)$ to represent a network object, where $N$ is the set of node indexes and the adjacency matrix $g$ is a real-valued $n \times n$ matrix whose entry $g_{ij}$ represents the relation between $i$ and $j$. In a transmission network with specified transmission directions, $g_{ij} = 1$ means that $i$ is the source case of $j$ and $j$ is a secondary case of $i$. The out-degree of a node is the number of edges directed out from it, and the in-degree of a node is the number of edges that end at it. We let $d^+_i$ and $d^-_i$ denote the out-degree and in-degree of the $i$th node, respectively. Thus, if $d^+_i > 0$ and $d^-_i = 0$, the $i$th node is an index case, which is the origin of a cluster. On the other hand, if $d^+_i = 0$ and $d^-_i > 0$, the $i$th node is a terminal case that induces no secondary case. If both $d^+_i = 0$ and $d^-_i = 0$, the $i$th node does not belong to any cluster, and it is marked as a singleton in the network. We let $S$ denote the set of singletons. A node can become a singleton for two reasons. First, it may have been an imported case that did not infect anybody. Second, its transmission linkage may have been unclear, so that epidemiologists were able to identify neither its source case nor its secondary cases. For example, there was a large outbreak within a prison, but we were unable to identify an explicit transmission chain, and thus most of the involved cases were considered singletons. In calculating network characteristics, we ignore singletons; i.e., we focus on the sub-network $N \setminus S$.
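The node roles defined above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `classify_nodes`, the role labels, and the edge-list input format are assumptions made for the example and do not come from the study.

```python
def classify_nodes(n, edges):
    """Classify nodes 0..n-1 of a directed transmission network.

    edges: iterable of (i, j) pairs, meaning g_ij = 1 (i is the source case of j).
    Returns (out_degree, in_degree, roles).
    """
    out_degree = [0] * n
    in_degree = [0] * n
    for i, j in edges:
        out_degree[i] += 1
        in_degree[j] += 1

    roles = {}
    for v in range(n):
        if out_degree[v] == 0 and in_degree[v] == 0:
            roles[v] = "singleton"      # belongs to no cluster
        elif in_degree[v] == 0:
            roles[v] = "index"          # origin of a cluster
        elif out_degree[v] == 0:
            roles[v] = "terminal"       # induces no secondary case
        else:
            roles[v] = "intermediate"   # has both a source and secondary cases
    return out_degree, in_degree, roles


# Hypothetical toy network: case 0 infects 1 and 2, case 1 infects 3, case 4 is unlinked.
out_degree, in_degree, roles = classify_nodes(5, [(0, 1), (0, 2), (1, 3)])
```

In this toy network, case 0 is the index case, case 1 is intermediate, cases 2 and 3 are terminal, and case 4 is a singleton that would be excluded from the network measures.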
Throughout this article, we mainly consider four basic graphical measures of the transmission network: 1) average out-degree, 2) average shortest path length, 3) diameter of clusters, and 4) sizes of clusters, which characterize the number of secondary cases, the cohesion of the transmission occurrence, the generations of the epidemic spread, and the developed size of one clustered epidemic event, respectively. The average out-degree is the sum of the out-degrees of non-singleton cases divided by their total number, that is, $\sum_{j \in N \setminus S} d^+_j / |N \setminus S|$, where $|\cdot|$ denotes the cardinality of a set. For the average shortest path length (ASPL), we first define $\mathrm{path}(i, j) = 1$ if there exists a directed path beginning at the $i$th node and ending at the $j$th node, and $\mathrm{path}(i, j) = 0$ otherwise. The distance between the $i$th and the $j$th nodes, $\mathrm{dist}(i, j)$, is defined as the length of the shortest path between them (i.e., the geodesic). If $\mathrm{path}(i, j) = 0$, we set the distance between them to 0.
The average shortest path length is defined as [3]
$$\mathrm{ASPL} = \frac{\sum_{i,j \in N,\, i \neq j} \mathrm{dist}(i, j)}{\sum_{i,j \in N,\, i \neq j} \mathrm{path}(i, j)}.$$
Moreover, the betweenness centrality of node $v$ is defined as
$$C_B(v) = \sum_{i \neq v \neq j} \frac{g_{ivj}}{g_{ij}},$$
where $g_{ij}$ is the total number of shortest paths from node $i$ to node $j$ and $g_{ivj}$ is the total number of shortest paths from node $i$ to node $j$ via node $v$. Therefore, $C_B(v)$ quantifies the information transported through node $v$. Furthermore, because information originates from the source node of a tree-shaped network rather than passing through it, the betweenness centrality of the source node is always zero. Similarly, the betweenness centrality of a terminal node inside a tree-shaped network is also zero. Therefore, in a transmission network (composed of tree-shaped sub-networks), only the intermediate nodes between the source and terminal nodes contribute to the measure of betweenness.
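As a hedged illustration of the ASPL and betweenness definitions, the sketch below computes both by breadth-first search. The betweenness uses the usual unnormalized pair-dependency form (shortest paths through $v$ divided by all shortest paths, summed over ordered pairs), which is assumed here to match the definition above. The function names are hypothetical, and the cubic-time loop is intended for small networks only.

```python
from collections import deque


def bfs_counts(adj, src):
    """Shortest directed distances and shortest-path counts from src."""
    dist = {src: 0}
    sigma = {src: 1}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in adj.get(u, ()):
            if w not in dist:                # first time w is reached
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:       # u precedes w on a shortest path
                sigma[w] += sigma[u]
    return dist, sigma


def aspl_and_betweenness(nodes, edges):
    """Return (ASPL, {node: betweenness}) for a directed network."""
    adj = {}
    for i, j in edges:
        adj.setdefault(i, []).append(j)
    info = {v: bfs_counts(adj, v) for v in nodes}

    total_dist, n_paths = 0, 0
    betweenness = {v: 0.0 for v in nodes}
    for i in nodes:
        dist_i, sigma_i = info[i]
        for j, d_ij in dist_i.items():
            if j == i:
                continue
            total_dist += d_ij               # numerator: sum of dist(i, j)
            n_paths += 1                     # denominator: sum of path(i, j)
            for v in nodes:
                if v in (i, j) or v not in dist_i:
                    continue
                dist_v, sigma_v = info[v]
                # v lies on a shortest i -> j path when the distances add up
                if j in dist_v and dist_i[v] + dist_v[j] == d_ij:
                    betweenness[v] += sigma_i[v] * sigma_v[j] / sigma_i[j]
    aspl = total_dist / n_paths if n_paths else 0.0
    return aspl, betweenness


# Hypothetical toy cluster: 0 -> 1, 0 -> 2, 1 -> 3
aspl, betweenness = aspl_and_betweenness([0, 1, 2, 3], [(0, 1), (0, 2), (1, 3)])
```

For this toy cluster there are four connected ordered pairs with distances 1, 1, 2 and 1, so the ASPL is 1.25, and only the intermediate case 1 has nonzero betweenness, in line with the remark about tree-shaped networks.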
The diameter of a connected transmission network is the maximum distance within it; i.e., the distance from the index case to the farthest terminal case.
In the context of epidemiology, the diameter of a transmission network is the maximum number of generations of virus spread within it. Since the whole transmission network can be decomposed into a collection of clusters or components, we calculate the average diameter of clusters by averaging the diameter of each cluster. Lastly, the size of a sub-network is the number of nodes within it. Thus, the average size of clusters is calculated by averaging the sizes of clusters.
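The cluster-level measures can be sketched as follows. The example treats a cluster as a weakly connected component of the directed network and its diameter as the longest shortest directed path inside it; these choices and the helper names `find_clusters` and `cluster_diameter` are assumptions made for illustration.

```python
from collections import deque


def find_clusters(edges):
    """Group non-singleton nodes into weakly connected components (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]    # path halving
            x = parent[x]
        return x

    for i, j in edges:
        parent[find(i)] = find(j)
    groups = {}
    for v in list(parent):
        groups.setdefault(find(v), set()).add(v)
    return list(groups.values())


def cluster_diameter(cluster, edges):
    """Longest shortest directed path (number of generations) within one cluster."""
    adj = {}
    for i, j in edges:
        if i in cluster:
            adj.setdefault(i, []).append(j)
    longest = 0
    for src in cluster:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj.get(u, ()):
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        longest = max(longest, max(dist.values()))
    return longest


# Hypothetical toy network with two clusters
edges = [(0, 1), (0, 2), (1, 3), (4, 5)]
clusters = find_clusters(edges)
sizes = sorted(len(c) for c in clusters)
diameters = sorted(cluster_diameter(c, edges) for c in clusters)
average_size = sum(sizes) / len(sizes)
average_diameter = sum(diameters) / len(diameters)
```

Here the two clusters have sizes 4 and 2 and diameters 2 and 1, giving an average size of 3.0 and an average diameter of 1.5.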

Agent-based transmission network model for simulations
A detailed description of how we build the agent-based transmission network model is presented in the main text. Here we provide supplementary material on its settings. Key parameters for the outbreak reconstruction are summarized in Table S1. As described in the main text, we first build a social network that considers the household, geographical, and random connections between people. Specifically, we use our observational data to give realistic settings for parameters such as household size and family-based age distribution. In addition, we construct a social connection network consistent with the age-specific contact rate matrix explored in detail by Zhang et al. [4] and assign weights to different connections.
In terms of the transmission processes, we consider both pre-symptomatic infectiousness and post-symptomatic viral shedding. More precisely, patients are able to transmit COVID-19 before showing symptoms [5] (pre-symptomatic infectiousness) and lose infectiousness afterwards because of an insufficient viral load [5,6]. Thus, we assume five compartments in our model: susceptible, exposed, pre-symptomatic infectious, post-symptomatic infectious, and removed. Every node is initially susceptible. After exposure to known infectious cases, a node is infected with a given probability ($1 - (1 - \beta)^n$ in the main text) and is then transferred to the exposed state. In the exposed state, cases are non-infectious. After a period of time (which we assume to be a proportion of the incubation period, the duration from infection to symptom onset), cases are transferred to the pre-symptomatic infectious state. Afterwards, cases show symptoms and move to the post-symptomatic infectious state.
Once cases develop symptoms, they are assigned a removal period, the duration between symptom onset and isolation. After the removal period, cases are quarantined and no longer participate in the transmission processes, i.e., they move to the removed state. Note that in the very early stage of the outbreak, case finding was relatively slow. Therefore, the removal period can be longer than the post-symptomatic infectious period, as cases could be physically free while having already lost their infectiousness.
Settings for those periods, as well as for age-dependent heterogeneity, follow previous studies [5-7]. For the removal of infected cases, we set the removal period based on our observational data (see Fig. S8 (a)).
Moreover, we incorporated a dynamic change of contact pattern through the pre-outbreak, lockdown, and resumption phases based on previously reported contact matrices observed during the pre-outbreak and outbreak periods [4] (see Construction of daily age-specific contact matrix, Fig. S2 and Fig. S3).
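The compartment progression described in this section can be sketched as follows. This is a simplified illustration, not the simulation code of the study: the durations passed to `state_on_day` are hypothetical inputs, the state labels are chosen for the example, and the sketch does not model the loss of infectiousness that may occur before removal.

```python
def infection_probability(beta, n_contacts):
    """Probability of infection after n exposures, each with per-contact rate beta."""
    return 1.0 - (1.0 - beta) ** n_contacts


def state_on_day(day, infected_day, latent_days, incubation_days, removal_days):
    """Compartment of a single case on a given day.

    latent_days: time from infection to the start of pre-symptomatic infectiousness
    incubation_days: time from infection to symptom onset
    removal_days: time from symptom onset to isolation
    All durations are hypothetical inputs for this sketch.
    """
    if infected_day is None or day < infected_day:
        return "susceptible"
    elapsed = day - infected_day
    if elapsed < latent_days:
        return "exposed"
    if elapsed < incubation_days:
        return "pre-symptomatic infectious"
    if elapsed < incubation_days + removal_days:
        return "post-symptomatic infectious"
    return "removed"


# Two exposures with beta = 0.05 give 1 - 0.95**2 = 0.0975
p = infection_probability(0.05, 2)
# Hypothetical case infected on day 10, latent 3 days, incubation 5 days, removed 4 days after onset
trajectory = [state_on_day(d, 10, 3, 5, 4) for d in (9, 11, 14, 16, 19)]
```

The trajectory above passes through all five compartments in order, from susceptible on day 9 to removed on day 19.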

Construction of daily age-specific contact matrix
According to Zhang et al. [4], we can obtain the age-specific contact matrices for both the baseline period and the outbreak period, denoted by $c^{\mathrm{base}}$ and $c^{\mathrm{outbreak}}$. However, neither the intermediate state in between nor the contact pattern in the post-lockdown period was observed. Therefore, we assume that the contact rate decreases as a time-dependent function following Tan et al. [8] (Fig. S3). At the beginning, we assume that the contact rate declined only on a very small scale ($\epsilon$) before January 10th. For $1 \le d \le 16$, the monotonic decline followed a logistic curve. If $\lambda_m$ is chosen as $2 \log(\epsilon/(1 - \epsilon))/m$ and $\epsilon$ is sufficiently small (e.g., $\epsilon = 0.01$), $m$ can be viewed as the duration of the decreasing process [8] (as illustrated in Fig. S2), and the curve is scaled for each pair of age groups $(i, j)$ by the percentage of decrease. We set $m = 13$ and $t_0 = 3$. On the other hand, for the resumption process, for $32 \le d \le 62$, we invert the decreasing process, with $m = 30$ and $\lambda_m$ as defined above. In a nutshell, we obtain a series of contact matrices $c^{(d)}$ for every day $d$, presented in Fig. S3.
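The exact decline and resumption equations are not reproduced in this text, so the following sketch rests on an assumed logistic form that is consistent with the stated choice of $\lambda_m$: a weight $w(d) = 1/(1 + \exp(\lambda_m (d - t_0 - m/2)))$, which equals $\epsilon$ at $d = t_0$ and $1 - \epsilon$ at $d = t_0 + m$, applied as $c^{(d)}_{ij} = c^{\mathrm{base}}_{ij} - (c^{\mathrm{base}}_{ij} - c^{\mathrm{outbreak}}_{ij})\, w(d)$. The start of the resumption at day 32 and its target level (the average of the baseline and outbreak levels) are likewise assumptions drawn from the surrounding description, and the function names are hypothetical.

```python
import math

EPS = 0.01  # the small epsilon used in the text


def logistic_weight(d, t0, m, eps=EPS):
    """Assumed logistic weight: eps at d = t0, rising to 1 - eps at d = t0 + m."""
    lam = 2.0 * math.log(eps / (1.0 - eps)) / m   # lambda_m as given in the text (negative)
    return 1.0 / (1.0 + math.exp(lam * (d - t0 - m / 2.0)))


def contact_on_day(d, c_base, c_outbreak):
    """Contact number for one pair of age groups on day d (scalar sketch)."""
    c_resumed = (c_base + c_outbreak) / 2.0
    if d <= 0:
        return c_base
    if d < 16:    # decline phase: m = 13, t0 = 3
        return c_base - (c_base - c_outbreak) * logistic_weight(d, 3, 13)
    if d <= 32:   # lowest contact level
        return c_outbreak
    if d <= 62:   # resumption phase: m = 30, assumed to start at day 32
        return c_outbreak + (c_resumed - c_outbreak) * logistic_weight(d, 32, 30)
    return c_resumed


# Hypothetical entry: 10 contacts per day at baseline, 2 during the outbreak
series = [contact_on_day(d, 10.0, 2.0) for d in range(0, 71)]
```

Applying the same function to every entry of the two matrices would give one contact matrix per day under these assumptions.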

Graphical characteristics of the observed transmission data
We collected data on 1349 confirmed SARS-CoV-2 infections identified in Zhejiang Province as well as their baseline information and epidemiological tracing notes.From information collected through contact tracing, we partially recovered the infector-infectee transmission chains between cases.If one case had more than one potential source of infection, we sampled only one.
Sensitivity analyses address the variation introduced by sampling a source case. Thereafter, a transmission network can be constructed by combining all transmission pairs. We then computed four basic graphical measures to assess the transmission network quantitatively: 1) average out-degree, 2) average shortest path length, 3) diameter of clusters, and 4) sizes of clusters. These measures show heterogeneity related to demographic factors such as age and to household transmission. Older adult cases contributed more to the transmission processes than younger adult and adolescent cases (top-right corner in Fig. S4 (a)). Notably, the average out-degree (i.e., the number of secondary infections) induced by cases aged between 40 and 59 is considerably higher than that induced by cases from other age groups (Fig. S4 (b)). Furthermore, those aged between 40 and 59 were more frequently identified as index cases in the transmission network, while cases from other age groups were substantially more likely to be positioned as terminal cases. In terms of household transmission, 54.5% of terminal transmission occurred within the household, while household transmission accounted for only 39.3% of non-terminal transmission.
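The step of sampling a single source for cases with several potential sources can be sketched as below. The text does not state the sampling scheme, so uniform sampling is an assumption here, as are the function name and the input format (a mapping from each case to its candidate sources).

```python
import random


def sample_transmission_pairs(candidate_sources, seed=None):
    """Pick one source per case, uniformly at random (assumed scheme).

    candidate_sources: dict mapping a case id to a collection of potential source ids.
    Returns a list of (source, case) transmission pairs.
    """
    rng = random.Random(seed)
    pairs = []
    for case in sorted(candidate_sources):
        sources = sorted(candidate_sources[case])
        if sources:                       # cases without a known source add no edge
            pairs.append((rng.choice(sources), case))
    return pairs


# Hypothetical tracing notes: case 3 has two potential sources, case 4 has none
candidates = {1: [0], 2: [0], 3: [1, 2], 4: []}
pairs = sample_transmission_pairs(candidates, seed=1)
```

Re-running the sampling with different seeds and recomputing the network measures is one way to gauge how sensitive the results are to the choice of source case.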

Dynamic epidemiological and graphical characteristics of the transmission network
Between the periods before and during the outbreak (that is, period I and period II), there were changes in both epidemiological and graphical aspects. First, the average removal period (the duration from symptom onset to isolation, which reflects the speed of case finding) decreased over time, starting from nearly 20 days and ending at virtually zero (Fig. S8 (a)). In aggregate, the average removal period was 6.41 days in period I and 4.18 days in period II. Moreover, the curve of imported cases over time peaked at the declaration of the lockdown, with most of them from period I (Fig. S8 (b)). On the graphical side, the secondary infections induced by cases of all ages showed a clear distinction between periods (Fig. S6 (b)). Cases aged between 40 and 59, the age group with the largest average out-degree in period I, showed a considerable decrease in this quantity in period II. However, for all other ages, the number of secondary infections increased. Together, these opposing changes resulted in a relatively homogeneous distribution of average out-degree across age groups in period II. In addition, the age-related heterogeneity of cases' positions in the network also changed between periods (Fig. S6 (a)). Except for those below 20, all age groups had an increasing proportion of index cases between periods I and II. Conversely, the proportion of terminal cases dropped in nearly all age groups. Household transmission often resulted in terminal transmission during both periods, and that proportion rose considerably from period I to period II. Furthermore, from period I to II, some graphical measures, such as the average out-degree, the average shortest path length, and the average size of clusters, declined noticeably. Both large spreading events (each with an out-degree of at least 3) and large clusters (with a size of at least 5) became less common in period II. Most shortest path lengths between cases were below 2, also suggesting that clusters became much more cohesive around their index cases and less forked.

Reconstruction of the transmission network
Using the agent-based transmission network model described in the main text and in the section "Agent-based transmission network model for simulations", under the realistic settings given in Table S1, we reconstructed a transmission network for the outbreak in Zhejiang from January 8th to February 23rd. As shown in Fig. S9, the daily number of new-onset cases in the reconstruction fitted the observed number well.

Sensitivity analysis on the split-point of time periods
January 20th as the split-point
We chose the time before January 20th as the first period and the time after January 21st as the second period, and repeated the analysis in the main text.
The results show that some network attributes changed significantly across periods. In detail, the proportion of singletons significantly increased. Compared to the result in the main text using January 23rd as the split-point, at a significance level of 0.05, two results are statistically different: the decrease in the proportion of super-spreaders became insignificant, and the decrease in the average diameter of clusters became significant. This is because, after re-allocating the time between January 21st and January 23rd to period II, the number of super-spreading events in period II and the average diameter of clusters in period I increased significantly.

Distribution of family size
According to the observed data, the distribution of family size is illustrated in Fig. S12.

Age distribution by different family size
According to the observed data, the age distribution in a sample of families of different sizes is presented in Table S2, in which each number is an age in years.

Removal period: Weibull distribution with mean $\mu_t$ on day $t$ and scale $\mu_t/\Gamma(1.2)$, assigned according to Fig. S8 (a).
Infection rate: infection rate per contact ($\beta$); peak at 5% [11] (see Table S3).
Susceptibility: susceptibility in each age group; see Fig. S1 [12].
Contact number: daily contact number to each age group; the contact matrix on each day is calculated from the contact matrices in Shanghai before and during the outbreak, following the pattern of decreasing and resuming (see Fig. S3).

(24, 34, 41, 60), (10, 19, 41, 44)

Table S3: Adjusted relative risk by day from symptom onset [2] and the derived transmissibility with peak at 5%.

Fig. S3: Average contact number by day, constructed based on the baseline contact matrix in Shanghai [4] and the contact rate function in Fig. S2. The shaded area represents the period between January 24th and February 10th, when Zhejiang adopted the highest-level response.
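A removal period drawn from this Weibull setting can be sketched as follows. The shape parameter is not stated explicitly; the value 5 used below is inferred from the stated scale, because a Weibull mean equals scale times $\Gamma(1 + 1/k)$ and $\Gamma(1.2)$ corresponds to $1/k = 0.2$. That inference, the function name, and the example mean are assumptions for this sketch.

```python
import math
import random

SHAPE = 5.0  # inferred from scale = mean / Gamma(1.2), i.e. 1 / shape = 0.2 (assumption)


def sample_removal_period(mean_t, rng=random):
    """Draw a removal period (days) from a Weibull distribution with mean mean_t."""
    scale = mean_t / math.gamma(1.0 + 1.0 / SHAPE)
    # random.weibullvariate takes (scale, shape) in that order
    return rng.weibullvariate(scale, SHAPE)


# Example with the period II average removal period of 4.18 days as the daily mean
rng = random.Random(2020)
draws = [sample_removal_period(4.18, rng) for _ in range(20000)]
empirical_mean = sum(draws) / len(draws)
```

With a large number of draws, the empirical mean should sit close to the chosen daily mean, which is a quick check that the scale has been set consistently with the mean.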

We assumed that the social contact frequency stayed close to the baseline level until January 10th (3 days from January 8th) and started to drop afterwards. Because the Zhejiang provincial government upgraded its infectious disease alert category to the highest level on January 23rd, 2020, we assumed that the social contact frequency dropped to its lowest level afterwards. The provincial government started reopening on February 10th, 33 days from January 8th, after which the social contact frequency increased. Because COVID-19 cases were still being reported sporadically, we assumed that the social contact frequency in the following month equaled the average of the contact levels in the baseline period and the outbreak period. Briefly, let $c^{(d)}$ be the contact matrix on day $d$. Then $c^{(0)} = c^{\mathrm{base}}$, $c^{(d)} = c^{\mathrm{outbreak}}$ for $16 \le d \le 32$, and $c^{(d)} = (c^{\mathrm{base}} + c^{\mathrm{outbreak}})/2$ for $d \ge 63$. We denote the average contact number of the $i$th age group to the $j$th age group at time $d$ as $c^{(d)}_{ij}$.

Fig. S7: Distribution of network characteristics across periods. (a) Distribution of out-degree of non-singletons; (b) distribution of shortest path length; (c) distribution of betweenness centrality of non-singletons; (d) distribution of diameter of clusters; (e) distribution of size of clusters. For each network attribute, we use the Pearson $\chi^2$ test to compare the distribution across periods and the Benjamini-Hochberg procedure to adjust the p-values.
Fig. S13: Age distribution in Zhejiang in 2020
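The Benjamini-Hochberg adjustment mentioned in the caption of Fig. S7 can be sketched in plain Python as below. This is the standard step-up procedure; the function name and the example p-values are hypothetical and are not results from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda k: p_values[k])   # indexes by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of the adjusted values
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values for five network attributes
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.20, 0.005])
```

An attribute would then be reported as changing significantly across periods when its adjusted p-value falls below the chosen significance level, for example 0.05.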

Table S2: Age distribution by different family size