Homophily in the formation and development of learning networks among university students

Abstract Students’ personal learning networks can be a valuable resource for success in higher education: they offer opportunities for academic and personal support and provide sources of information related to exams or homework. We study the determinants of learning networks using a panel study among university students in their first and second year of study. A long-standing question in social network analysis has been whether the tendency of individuals with similar characteristics to form ties is a result of preferences (“choice homophily”) or rather of selective opportunities (“induced homophily”). We expect a latent preference for homophilic learning partnerships with regard to attributes such as gender, ability, and social origin. We estimate recently developed temporal exponential random graph models to control for previous network structure and study changes in learning ties among students. The results show that, especially for males, same-gender partnerships are preferred over heterogeneous ties, while chances for tie formation decrease with the difference in academic ability among students. Social origin is a significant factor in the cross-sectional exploration but appears to be less important in the formation of new (strong) partnerships during the course of studies.


Introduction
Social networks have long been acknowledged as a resource for human capital formation (Coleman, 1988). Most notably, following Granovetter (1973), interpersonal networks have been studied as a predictor of labor market outcomes (e.g., Lin, 1999; Krug & Rebien, 2012). However, social networks can also be a valuable resource for educational success (e.g., Sacerdote, 2001; Zimmerman, 2003; Hasan & Bagde, 2013). For instance, studies on peer effects have established that interacting with high-achieving co-students can positively affect an individual's grades in secondary as well as tertiary education (Hanushek et al., 2003; Lavy et al., 2012; Lomi et al., 2011). The mechanisms behind such peer effects are usually regarded as straightforward: academically able students as roommates, friends, or learning partners can be consulted in case of questions or personal knowledge gaps; they may, for example, explain a scientific theory or recommend additional literature (see, e.g., Hasan & Bagde, 2013). However, besides the effects of peer ability, structural network properties also appear to predict academic success (Calvó-Armengol et al., 2009). For instance, being in a central position within a network increases a student's chances of receiving important information flowing across the network. This information can relate directly not only to homework or the preparation of exams but also to organizational issues facilitating life on campus. Finally, social integration is known to increase student satisfaction and prevent college dropout (Tinto, 1975).
While the consequences of social networks have extensively been studied, much less is known about how student networks in higher education form and evolve. In this paper, we investigate the formation and development of learning ties among undergraduate students, drawing on a two-wave panel survey in which students listed their acquaintances and learning partners from a complete list of all their co-students enrolled in the same study program. Our aim is to analyze to what degree student networks are structured by factors, such as gender, social origin, or academic ability. Specifically, we want to know if these segmentations can be attributed to individual preferences or rather to network structural effects.
Homophily or assortative mating, that is, the tendency of individuals to interact with partners with whom they share similarities, is a common finding in many social networks (McPherson et al., 2001). In the context of college students, homophily is often found with regard to race and ethnicity (e.g., Mayer & Puller, 2008), gender (e.g., Godley, 2008), geographic origin (e.g., Lee et al., 2011), or socioeconomic status (e.g., Wimmer & Lewis, 2010). Structural factors such as having the same study program or academic year can also predict student tie formation (Pilbeam & Denyer, 2009). These processes are of importance for educational research because the simple combination of homophily and social influence can produce and sustain not only patterns of cultural diversity (Axelrod, 1997) but also societal closure and barriers between groups even in the absence of explicit enmity across group lines (Centola et al., 2007). For instance, educational inequality can be reproduced if high-status and high-achieving students tend to stick with their kind, so that knowledge as well as cultural and social capital are prevented from diffusing to students of lower-class background (see DiMaggio & Garip, 2012).
A classical problem in social network analysis is the question whether homophily can be causally attributed to individual preferences or whether it occurs as the result of network structural and other processes (see, e.g., Kandel, 1978;Goodreau et al., 2009;Kossinets & Watts, 2009). In other words, "choice homophily" has to be separated from "induced homophily" (McPherson & Smith-Lovin, 1987). In the following, we refer to the outcome, that is, an observed pattern of ties between actors of similar characteristics when we use the term homophily. By contrast, the mechanisms producing homophily are explicitly referred to as, for example, individual preferences for similar partners.
At least three alternative mechanisms have to be ruled out before homophily can be assumed to be due to individuals' preferences for similar partners (Steglich et al., 2010, see also Rivera et al., 2010). First, homophily with regard to a specific attribute (e.g., race) can be the "byproduct" of an individual's preference for homophilic ties with regard to other attributes (e.g., social status, geographic origin, or leisure-time interests, see Wimmer & Lewis, 2010;Stark & Flache, 2012). Students might prefer friends with a similar socioeconomic background (or other characteristics such as a similar taste in music), and as a result, student networks might also be segregated by race and ethnicity because of the correlation between these and the causally relevant variables.
As a second mechanism, homophily can be the product of the opportunity structure provided by the individual's network (Blau, 1977;Kossinets & Watts, 2009). Compositional differences in the social context of individuals can affect the chances to meet people with specific characteristics. As a simple example, skewed distributions of gender or social status in university courses increase the probability of homophilic ties. As a more structural effect commonly observed in social network analysis, triadic closure can promote the formation of ties. This effect exists if I am more likely to meet and interact with a friend of a friend compared with someone with whom I do not share any mutual friends. Friends of friends can, for various reasons (including the mechanisms described here), be similar to me in terms of social status, ethnicity, or other variables. Hence, segmentations within networks can be reinforced.
Finally, influence or contagion mechanisms can produce homophily as a cross-sectional descriptive result although no causal effects in the form of individual preferences exist (Kandel, 1978). For instance, a conservative student who befriends a liberal co-student might change his or her political attitudes over time and become more liberal because of his peer's influence (e.g., Newcomb, 1961). An ex-post cross-sectional survey would classify this as a homophilic tie with regard to political attitudes. However, inferring from such a cross-sectional view that preference for like-minded acquaintances initiated the friendship would evidently be wrong.
Our goal in this paper is to assess to what degree either individual preferences or network structural factors are responsible for segmentation tendencies among students' learning networks with regard to gender, socioeconomic status, and academic ability. Our strategy relies on the longitudinal character of our dataset, controlling for previous acquaintances and learning networks among college freshmen, to analyze the determinants of new ties in the students' second year. The general idea is that ruling out the competing mechanisms outlined above facilitates inferences about latent preferences for homophily. We explicitly address the first two of these mechanisms (homophily as a by-product and homophily as a consequence of network structural factors) in the statistical models, whereas the question of coevolution of networks and character traits (peer influence) is accounted for by focusing on time-constant covariates; this methodological choice will be discussed in greater detail below. We first provide a descriptive overview of students' networks using traditional social network analysis methods. In the next step, we use recently developed temporal exponential random graph models (TERGMs) controlling for the lagged dependent network structure to study the change in students' learning relationships.

Theoretical considerations
The dependent variable of interest is the formation of learning ties, defined as voluntary meetings of two students outside the class schedule with the purpose of learning. Learning networks consist of specific social relationships among students. They represent students' personal peer networks in which they cooperate (e.g., share tasks) or collaborate (i.e., actively interact) with selected fellow students with objectives such as academic learning, receiving relevant information, or preparing course assignments. Corresponding peer interaction can be predicted by characteristics of students, groups, and tasks (Webb, 1989).
Having efficient social networks at their disposal is a vital resource for learners (Coleman, 1988). Forms of cooperative and collaborative learning have repeatedly proved to be more effective than individualistic learning across subject areas and across age groups (Johnson & Johnson, 1987;2002;Hattie, 2009). Several social mechanisms can be expected to contribute to these findings. First, learning ties help to distribute study-relevant information, and they also enhance available educational resources by the shared usage of goods and services, for example, books or scientific software (Lin, 2001). Second, following cognitive load theory, it can be argued that learning by an individual is less effective and efficient than learning by a group of individuals particularly in a situation of high-task complexity (Kirschner et al., 2009). Dividing the processing of information across individuals means dividing information across a larger reservoir of cognitive capacity. Third, learning ties to fellow students can be related to motivation and effort by the provision of social and emotional support and eventually better social integration (Wilcox et al., 2005), which in turn is related to lower rates of dropout from higher education (Tinto, 1975).
Consequently, we can expect learning ties to be of high relevance for educational outcomes. However, only few empirical studies have so far looked into their formation from a social network perspective. While many studies have analyzed social networks among school pupils (e.g., Shrum et al., 1988;Goodreau et al., 2009;Steglich et al., 2010;Stehlé et al., 2013;Krivitsky & Handcock, 2014;Smith et al., 2014;Kruse et al., 2016), research on university students is less abundant. As Biancani and McFarland (2013, p. 176) conclude in their review, "the descriptive literature on social networks among students is quite thin." Most of the existing literature deals with friendship ties in higher education (Van Duijn et al., 2003;Godley, 2008;Mayer & Puller, 2008;Wimmer & Lewis, 2010). Only recently, some research has also focused on learning ties among students (Rienties et al., 2013). The literature on friendship ties among students consistently produces evidence of homophily with regard to characteristics, such as race and ethnicity, gender, or socioeconomic background. However, the mechanisms in the formation and maintenance of learning ties might differ from friendship networks. This is particularly conceivable with regard to students' ability. Moreover, the existing findings on homophily in school students' learning networks cannot be immediately extrapolated to those of students in higher education. University students are more mature and possess, on average, a higher level of self-control and strategic planning. Therefore, functional criteria might become relatively more important for the formation of interindividual learning ties.
We consider the initiation of a learning partnership to be a rational choice in a high-cost situation, since meeting with others outside class requires the investment of a significant amount of time and effort (e.g., Windzio & Bicer, 2013). There are several returns to this investment that students might expect when considering meeting with others in their free time to study together. On the one hand, these meetings can have direct positive effects on the learning progress, since learning partners might be able to explain theories or methods or recommend literature in preparation of exams (Hasan & Bagde, 2013). From this argument, one could infer that partners' characteristics (such as academic performance) impact the utility of learning ties. Thus, choosing the "wrong" partner could result in considerable opportunity costs. On the other hand, being connected to other students can also be beneficial with respect to information flowing through students' networks (Calvó-Armengol et al., 2009). For instance, if one student gets their hands on last year's exam from an older fellow student, they might pass it on to some of their co-students via phones or social media. Being in a learning group and being connected to fellow students increases the chances to receive information of this sort through the network. From this perspective, one might argue that any connection is better than no connection at all. Of course, the expected utility from learning partnerships may go beyond study-related benefits. For instance, social and recreational motives can obviously also be important when meeting with fellow students to learn together.
Following established models of friendship choice (e.g., Zeng & Xie, 2008;Currarini et al., 2009), we conceive the selection of a learning partner as a function of preference and opportunity. Preference is understood as latent preference, a psychological disposition to favor one choice alternative over another. This means that people might not be aware of their preferences and therefore direct observation of the "stated preference" (e.g., via survey questions) is not appropriate. Rather, the classical concept in economics of "revealed preference" is adapted, in the sense that an observed realization of a choice (here, the nomination of a learning partner) is interpreted as revealing the individual's preference if the opportunity structure is appropriately taken into consideration (see, e.g., Currarini et al., 2010). For instance, if in an experimental setting establishing an unconstrained choice situation, a consumer chooses brand A over brand B, it is reasonable to infer that this reveals his or her preference. In the real world, availability of brands A and B in the consumer's everyday environment might differ, and this as well as other aspects of the opportunity structure have to be controlled for.
We consider two types of structural effects that can shape opportunities for the selection of learning partners. The first is related to the organizational environment of the study course. For instance, if within a specific study program, two students have chosen the same minor subject, we assume they are more likely to meet on campus and should thus have an increased chance of forming a learning tie. The second type of structural effect concerns the individual's place within the network at a previous point in time. We have to consider the composition of the network environment, which becomes salient in "friends of friends" effects such as triadic closure as described above. If, for instance, most of my friends' friends are male, I am more likely to make new friends with males in the future, even in the absence of any preference for a specific gender. Moreover, for future learning partners, we hold that past networks of acquaintances are an important factor to be

In sum, we expect that similarity is perceived as a clue for shared interest and easier cooperation. Therefore, we assume a latent preference for homophilic ties with regard to gender and social origin. Our first hypothesis thus states that same-gender learning ties are more likely to be formed compared with gender-heterogeneous ties if the opportunity structure is taken into account. Likewise, as our second hypothesis, we expect that students of similar social background are ceteris paribus more likely to engage in learning meetings compared with students of different social origin. For our third hypothesis, we assume "ability asymmetry" to be the dominant mechanism behind the expected tendency to meet with students with similar ability. This is where our research on learning ties deviates from previous studies on friendship networks, although the outcome (homophily) is the same. We see (perceived) ability as a proxy for future academic performance and hence expect all students to favor high-ability partners.
Owing to the two-sided nature of learning partnership choice, a desired tie is less likely to be reciprocated if the difference in ability is large. Therefore, our third hypothesis reads: the likelihood of new learning ties decreases with the dissimilarity of students regarding academic ability.

Data
Our data come from a two-wave survey among college students from a large German university. A detailed description of the data can be found in Hillmert & Lang (2015). The study population consisted of all first- and second-year students enrolled in Sociology during the time of the survey (N = 307). With participation rates of 77% among first- and 88% among second-year students, the first wave (conducted in early 2014) included 255 participants. In the second wave, carried out in spring 2015, 144 of those (or 56%) responded again and completed the questionnaire. Among the nonrespondents in the second wave were presumably many university dropouts or students who changed their subject, which is frequent in this early stage. We can infer this from the fact that many invitations for the follow-up survey sent to the students' university e-mail accounts bounced back, but we have no definite breakdown of the reasons for nonresponse. While this is a considerable dropout rate, we have no reason to believe that our sample was biased by nonresponse, since the mechanisms under study presumably do not differ by participation group. Descriptive comparisons lend some support to this assumption. There are no significant differences in the distribution of our main variables of interest between panel participants and dropouts. 1 In addition to removing survey dropouts from the models, we also present a specification where all respondents who participated in at least one wave are included. In this (directed network) specification, nonrespondents can still be picked by active study participants as learning partners.
A special feature of the survey was the fact that network data were gathered asking study participants to name all their partners from a complete list of their 306 co-students. By contrast, most other studies use survey questions where participants nominate a limited number of (e.g., up to five) friends from their memory. We collected data on four types of networks: acquaintances, acquaintances the respondents knew already before starting their study course, friends, and learning partners. For our application, two of these networks are relevant: learning partners and acquaintances the respondents knew already before university. The two respective survey questions read: "Who among the mentioned fellow students did you already know before beginning your current study program?" and "How often do you meet with the following students outside of university courses to learn together?" (italics in the questionnaire). In addition to the two networks of learning partners from the two survey waves, we use the network of acquaintances before university as an additional time step (termed "wave 0" since it supposedly represents the opportunity structure before the first observation).
Frequency of learning meetings for each partner was asked on a seven-point scale ranging from "once in two months or less often" to "every day" on a list of all 307 students. We thus have weighted network data regarding learning partners (previous acquaintances are binary). While the chosen methodology (TERGMs) cannot currently estimate ordinal networks, we can differentiate "strong" from "weak" ties. In the following, "strong" learning ties are defined as regular meetings at least twice a month. The threshold for this classification is based on the observed distribution of the variable, such that approximately half of all ties fall into the respective category.
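The dichotomization described above can be illustrated with a minimal sketch. It assumes a median split on the observed frequency codes, which is only one way to obtain a threshold at which roughly half of all ties count as "strong"; the tie list, the node labels, and the function name are hypothetical and not taken from the survey data.

```python
from statistics import median

# Hypothetical frequency codes on the seven-point scale
# (1 = "once in two months or less often", 7 = "every day").
weighted_ties = {
    ("A", "B"): 2,
    ("A", "C"): 5,
    ("B", "C"): 1,
    ("B", "D"): 6,
    ("C", "D"): 3,
    ("D", "E"): 4,
}

def split_strong_weak(ties):
    """Return (threshold, strong_ties) using a median split (an assumption)."""
    threshold = median(ties.values())
    # Ties at or above the threshold are treated as "strong".
    strong = {pair for pair, weight in ties.items() if weight >= threshold}
    return threshold, strong

threshold, strong_ties = split_strong_weak(weighted_ties)
print(threshold, sorted(strong_ties))
```

In this toy example, three of the six ties fall at or above the threshold, mirroring the idea that about half of all ties are classified as strong.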
Since we analyze survey data from individual respondents, an important issue is how to deal with disagreement between partners over the nature of their relationship. It is well known from other studies that partnership nominations are often not mutual. For instance, Brewer & Webster (2000), asking students to list all their friends from a residence hall, found that their respondents forgot 20% of their friends on average. In the widely utilized friendship module from the National Longitudinal Survey of Adolescent Health (Add Health), only 40% of friendship nominations are reciprocated (Mouw & Entwisle, 2006). In our data, a little over 50% of all learning ties are mutual. Non-mutual ties are partly due to the fact that respondents could name learning partners who did not themselves participate in the survey. If we restrict the sample to participants of both waves, a third of all ties is reciprocated. Since our theoretical concept of learning ties is two-sided, we use the undirected (weak) ties as the main operationalization throughout the paper, but we provide replications with mutual and strong ties as robustness checks in the Appendix.
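The difference between the undirected and the mutual operationalization can be made concrete with a short sketch. The nominations below are invented for illustration, and the function names are hypothetical; the code only shows how the two tie definitions and a simple reciprocity rate could be derived from directed nominations.

```python
# Hypothetical directed nominations: (ego, alter) means ego named alter.
nominations = {
    ("A", "B"), ("B", "A"),
    ("A", "C"),
    ("C", "D"), ("D", "C"),
    ("E", "A"),
}

def undirected_ties(noms):
    """A tie exists if at least one of the two students named the other."""
    return {frozenset(pair) for pair in noms if pair[0] != pair[1]}

def mutual_ties(noms):
    """A tie exists only if both students named each other."""
    return {frozenset((i, j)) for (i, j) in noms if i != j and (j, i) in noms}

def reciprocity(noms):
    """Share of directed nominations that are returned by the alter."""
    return sum((j, i) in noms for (i, j) in noms) / len(noms)

print(len(undirected_ties(nominations)), len(mutual_ties(nominations)))
print(round(reciprocity(nominations), 3))
```

The undirected definition keeps every nominated pair, whereas the mutual definition drops one-sided nominations such as those directed at students who did not take part in the survey.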
Regarding our independent variables of interest, social origin is operationalized by parental education. Each parent is assigned a value on a five-point scale ranging from zero (no education) to five (university degree). For each respondent, the mean of both parental values is used. Gender is coded 0 for males and 1 for females. For academic ability, two variables were initially taken into consideration. The first is the grade point average (GPA) from the respondents' (upper-level) high school diploma (German "Abitur"), also referred to as university entrance degree. The second is the mean value from self-reported grades for recent exams and papers during participants' course of study. For performance while attending university (recent grades), an interesting question might concern the coevolution of networks and grades (see, e.g., Lomi et al., 2011). For instance, do low-achieving students get better over time if they meet and learn together with high-achieving partners? We do not address this latter question, however, since the actual performance at university was not available to us. Self-reported recent grades can also be affected by unobserved heterogeneity regarding the most recent exam or term paper as a result of, for example, some courses having higher requirements than others or lecturers with different grading standards. Moreover, we sought to circumvent questions of endogeneity and direction of causality by focusing only on non-influenceable (time-constant) determinants. In doing so, we can be more confident that we identify individual preferences for a specific partner characteristic which are not biased by influence, contagion, or feedback effects. Therefore, we used high school diploma GPA as an invariable indicator of ability.
As additional control variables, we considered minor subject (all students were enrolled in Sociology, but for some of them it was their minor subject) and academic year (first or second year in the first wave). Furthermore, we coded respondents who themselves or at least one of whose parents was born abroad as students of ethnic minority or immigrant background. Sample sizes for particular ethnic groups (e.g., students of Turkish or Russian descent) were too small to produce meaningful results with regard to questions about same-ethnic ties. Ethnic minority status is therefore considered a proxy for shared experiences as students from immigrant families rather than a test for homophily based on ethnicity. There were generally few missing data regarding demographics and other covariates; only high school diploma grades and parental education had notable amounts of missing values (6% and 13%, respectively). We used a chained equations algorithm for multiple imputation implemented in the mice package (Van Buuren & Groothuis-Oudshoorn, 2011) for R (R Core Team, 2018) to impute missing data.

Temporal exponential random graph models
Exponential random graph models (ERGMs, also known as p* models) are a frequently used family of models to study the determinants of ties between nodes with specific characteristics in a network (for an introduction, see Robins et al., 2007). The basic idea is to simulate a large number of random networks based on the observed network parameters (such as the number of ties) and then compare, say, the number of observed same-gender ties with the respective number obtained from the simulations. One is usually interested in knowing whether an effect is found significantly more often in the real world compared with the simulated data. The advantage of ERGMs is their capability to control for various network parameters, such as transitivity or triadic closure. This can to some degree address the question whether homophily is promoted by the structure of the network. The ERGM can be expressed as the probability of a network (i.e., an adjacency matrix) N given a vector of model parameters θ:

$$P(N \mid \theta) = \frac{\exp\{\theta^{\top} h(N)\}}{c(\theta)} \tag{1}$$

where h(N) is a vector of statistics (such as degrees or triangle counts) corresponding to θ and c(θ) is a normalizing constant.
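To make the roles of θ, h(N), and c(θ) tangible, the following sketch evaluates the ERGM probability by brute-force enumeration for a three-node network. Everything in it is an illustrative assumption: the node labels, the gender attribute, the choice of statistics (number of ties and number of same-gender ties), and the parameter values, which are not estimates from this study.

```python
from itertools import combinations, product
from math import exp

gender = {"A": "f", "B": "f", "C": "m"}   # hypothetical node attribute
nodes = sorted(gender)
dyads = list(combinations(nodes, 2))      # all possible undirected ties

def h(network):
    """Statistics vector: (number of ties, number of same-gender ties)."""
    edges = len(network)
    same = sum(gender[i] == gender[j] for i, j in network)
    return (edges, same)

def weight(network, theta):
    """Unnormalized weight exp(theta' h(N))."""
    return exp(sum(t * s for t, s in zip(theta, h(network))))

def all_networks():
    """Enumerate every undirected network on the node set."""
    for mask in product([0, 1], repeat=len(dyads)):
        yield frozenset(d for d, present in zip(dyads, mask) if present)

def ergm_probability(network, theta):
    c = sum(weight(n, theta) for n in all_networks())  # normalizing constant
    return weight(frozenset(network), theta) / c

theta = (-1.0, 0.5)  # illustrative values only
print(ergm_probability([("A", "B")], theta))
print(ergm_probability([("A", "C")], theta))
```

With a positive second parameter, the network containing only the same-gender tie A-B is more probable than the one containing only the mixed tie A-C. Enumeration is feasible only for toy networks, because the number of possible networks grows exponentially with the number of dyads; this is why simulation-based estimation is used in practice.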
Despite the possibility to separate network structural effects from the influence of nodal covariates (such as gender or social origin), causal inference is hardly possible with cross-sectional ERGMs (Goodreau et al., 2009, p. 122). In a cross-sectional design, we have to consider that network structure can be influenced by preferences at t-1. For instance, we might find that a student with an academic family background has many other students with highly educated parents in his surrounding network, and thus, his opportunity structure is biased toward homophilic ties, but it might of course be that the student actively sought to affiliate with this group in the past because of a preference for same-status ties. Wimmer and Lewis (2010, p. 593) call this "homophily-based self-selection" which results in a different network composition and thus changed the opportunity structure in the future. Moreover, a frequent finding is that homophilic relationships last longer compared with, for example, male-female friendships or interracial marriages (Rivera et al., 2010, p. 95;Krivitsky & Handcock, 2014). If this is the case, cross-sectional homophily can be found even in the absence of differences in the likelihood of tie creation between homophilic and heterogeneous relationships. Accordingly, the limits of cross-sectional network data for the identification of choice homophily have long been recognized (e.g., Kandel, 1978). Using longitudinal data allows us to study the emergence of new (and the abrogation of old) ties when past network structure is taken into account. For instance, if we only consider new ties between students who previously did not share any mutual friends, are same-status ties more likely to emerge compared with heterogeneous partnerships? If this is the case, we have stronger reasons to assume a latent preference for homophilic ties as compared with cross-sectional analyses.
Recently, several methods have been developed for the dynamic analysis of network data. One approach that builds on ERGMs is the TERGM (see Hanneke et al., 2010). A few applications of TERGMs in the social sciences have so far been published (examples include Cranmer et al., 2014; Czarna et al., 2016). In the TERGM, the joint probability of a sequence of networks can be expressed as the product of the individual probabilities (as defined in (1)) for the networks $N^{t}$ at time points t. Following the notation in Leifeld et al. (2018), if T is the total number of time steps a network is observed consecutively, and k is the number of time points in the past on which a current network $N^{k+1}$ depends, then

$$P(N^{k+1}, \ldots, N^{T} \mid N^{1}, \ldots, N^{k}, \theta) = \prod_{t=k+1}^{T} P(N^{t} \mid N^{t-k}, \ldots, N^{t-1}, \theta)$$

In its basic form, this is a "pooled" ERGM where all observed networks are modeled jointly. To model temporal dependency, a term for the lagged network can be added to the h(.) function of network statistics. For this model, the interpretation changes to the existence/nonexistence of ties conditional on their previous existence. In our case, since we treat the network of acquaintances before university as t = 0 in addition to our two surveys on learning partnerships, we have a total number of time points T = 3, and we estimate both a pooled model and a model with a network autocorrelation memory term. In addition, we estimate separable TERGMs (STERGMs), which differentiate between the formation and stability/dissolution of ties. For the formation model, the adjacency matrix is replaced by a transition matrix where the entries denote whether a tie has changed from 0 (at t − 1) to 1 (at t) compared with the previous wave. For the dissolution model, the matrix differentiates stable (1) from other ties (0), comparing the present with the previous network. Another popular approach for longitudinal network analysis is stochastic actor-oriented models (SAOM, see Steglich et al., 2010), also known by their software as SIENA.
SAOMs are frequently applied to model the co-evolution of social networks and individual characteristics, such as school performance, attitudes, or delinquent behavior (e.g., Van Duijn et al., 2003;Leszczensky & Pink, 2015). A basic question for applied research on longitudinal networks is whether to use TERGMs or SAOMs. A comparison of the two models can be found in Desmarais & Cranmer (2017). While the models are overall very similar, there are a few differences that need consideration. SAOMs estimate a function for tie evolution and an additional function for behavior. Since we exclude questions of influence or behavior from our analysis, this feature provides no added value for our analysis. SAOMs also make more restrictive assumptions regarding the form of network change as compared with the TERGM: In a number of micro-steps between two observed time steps, an actor considers whether to create or abrogate one tie at a time in the SAOM. While this is not fully implausible for our application, we also have to take into account that the temporal structure of university courses presumably has a strong influence on the timing of new learning partnerships, so the less restrictive assumptions by the TERGM might be more appropriate. We specify TERGMs in the following but also report results from SAOMs in the appendix. 2 For estimation, we rely on Markov chain Monte Carlo maximum likelihood estimation (see Desmarais & Cranmer, 2012), implemented as the "mtergm" function in the xergm package for R (Leifeld et al., 2018). These models allow us to directly compare effects from dynamic network models with the static results obtained from the pooled ERGMs. We first estimate a pooled model with all three waves. Next, we introduce a term for the lagged dependent network, since the existence of ties in the second year is likely to depend on whether contact was already present in the first year (or before university). 
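The transition matrices used in the separable models and the lagged-network term can be sketched as follows. The two small adjacency matrices are invented, the function names are hypothetical, and counting the ties present in both waves is just one plausible way to operationalize a lagged-network statistic; it is not necessarily the exact term computed by the estimation software.

```python
def formation_matrix(prev, curr):
    """Entry is 1 where a tie is absent at t-1 and present at t, else 0."""
    n = len(prev)
    return [[int(prev[i][j] == 0 and curr[i][j] == 1) for j in range(n)]
            for i in range(n)]

def stability_matrix(prev, curr):
    """Entry is 1 where a tie is present at both t-1 and t, else 0."""
    n = len(prev)
    return [[int(prev[i][j] == 1 and curr[i][j] == 1) for j in range(n)]
            for i in range(n)]

def lagged_tie_count(prev, curr):
    """Number of undirected ties at t that already existed at t-1."""
    n = len(prev)
    return sum(prev[i][j] * curr[i][j] for i in range(n) for j in range(i + 1, n))

# Hypothetical undirected networks for four students at t-1 and t.
prev = [[0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 0]]
curr = [[0, 1, 0, 1],
        [1, 0, 0, 0],
        [0, 0, 0, 1],
        [1, 0, 1, 0]]

formation = formation_matrix(prev, curr)
stability = stability_matrix(prev, curr)
print(formation)
print(stability)
print(lagged_tie_count(prev, curr))
```

In the example, two ties are newly formed, one tie persists, and one tie is dissolved, so the formation and stability matrices isolate different parts of the change between the two waves.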
An important methodological issue concerns our theoretical assumption that the acquaintance network is a primary "pool" for choosing learning partners. Social contacts often do not emerge from nothing but are rather based on previous (loose) acquaintances or their friends. Since we have information on current and previous acquaintances (before entering university), we can model this assumption in the dynamic analyses. We add a third time step (wave 0), consisting of the acquaintance network before learning ties were formed in wave 1. Hence, learning ties at wave 1 are analyzed considering whom the students already knew before searching for learning partners. This means that in the TERGM with an autocorrelation memory term, the lagged network for learning ties at t = 1 is the acquaintance network at t = 0, whereas for learning ties at t = 2, it is the learning network at t = 1. Table 1 provides a summary of the model coefficients included in the TERGMs and their expected range in light of the theoretical discussion presented above.
Autoregressive network memory term. Expected effect: positive. If a tie between two nodes existed in the past, the probability that this tie persists in the present wave is expected to be higher.
Figure 1 visualizes undirected learning ties for all participants of both waves (N = 141). The plotting algorithm uses wave 1 data for all ties to locate nodes on the diagram and retains their position on the subsequent graphs. This way, individual nodes can be compared over the two waves. The visual inspection already yields several interesting points. First, network density increases considerably from wave 1 to 2. Thus, there is a large amount of variance over time (creation of new and abrogation of old ties) to be potentially explained by TERGMs. It is also obvious that while some students are already embedded in extensive networks, others maintain no learning ties to the surveyed co-students. Several outliers report a large number of ties compared with the average (2.6 ties in wave 1 and 3.6 in wave 2). We investigated these outliers in greater detail to check whether our results depend on their inclusion or exclusion (which is not the case). It should also be noted that, for instance, for the dark node with many ties at the top of the right panel in Figure 1, almost all of these ties are actually reciprocated by the respective partners in the survey. This is thus apparently not a survey artifact. Figure 1 also reveals some network clustering by gender: female nodes are shown in light and male nodes in dark color. Several same-gender clusters can be identified for both males and females. A table with descriptive statistics is given in the appendix.
Figure 1. Learning ties among a sample of university students, panel waves 1 (t = 1) and 2 (t = 2). Note. Light-colored (yellow) nodes represent female students, whereas dark (blue) nodes represent male students.
We further explore the likelihood of tie formation for different ego-alter combinations of gender as well as for differences in parental education and high school diploma GPA in Figure 2. The plot shows the proportion of observed ties compared with the expected value for each category. The expected values come from a series of random graph simulations (Erdős-Rényi models) with the observed number of ties and node covariates.
As the results show, male-male learning ties are considerably more often observed in our data than what would have been expected from a random choice model. Female-female partnerships are also significantly more often observed than expected, but the difference is much smaller. In wave 2, female homophily has disappeared but male homophily is still strong. Similarly, the larger the difference in parental education, the lower the probability of a tie compared with a random graph (see lower panel in Figure 2). For academic ability, in contrast, the differences are statistically significant but less obvious.
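The comparison with random graphs can be illustrated with a minimal Python sketch. It assumes a simple reading of the procedure: ties are redistributed uniformly at random over all dyads while the number of ties and the node attributes are held fixed, and the share of same-group ties is averaged over the simulations. The function names and the toy data are hypothetical and reproduce neither the survey data nor the original simulation code.

```python
import random
from itertools import combinations


def same_group_share(edges, group):
    """Proportion of ties whose two endpoints carry the same group label."""
    return sum(group[i] == group[j] for i, j in edges) / len(edges)


def expected_share(group, n_edges, n_sims=1000, seed=0):
    """Average same-group share over random graphs with a fixed number of ties."""
    rng = random.Random(seed)
    dyads = list(combinations(range(len(group)), 2))
    shares = []
    for _ in range(n_sims):
        # Draw n_edges distinct dyads uniformly at random (fixed-size random graph).
        simulated_edges = rng.sample(dyads, n_edges)
        shares.append(same_group_share(simulated_edges, group))
    return sum(shares) / n_sims


# Hypothetical toy data (assumption): three female and three male students.
group = ["f", "f", "f", "m", "m", "m"]
observed_edges = [(0, 1), (1, 2), (3, 4), (4, 5), (0, 3)]

observed_share = same_group_share(observed_edges, group)
expected = expected_share(group, n_edges=len(observed_edges))
ratio = observed_share / expected  # values above 1 point toward homophily
print(observed_share, expected, ratio)
```

A ratio above 1 means that same-group ties occur more often than under random tie allocation; the same logic extends to categories of GPA or parental education differences.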

TERGM results
Can these descriptive results be reproduced when network structural factors are considered? Table 2 displays results from the TERGMs. The first model is a pooled ERGM, while the second model adds a memory term for network autocorrelation. Models 3 and 4 come from a STERGM, differentiating between formation (Model 3) and dissolution (Model 4) of ties. For these models, the network is represented by the transition matrices, that is, the change in ties from t to t + 1 rather than the ties themselves. For instance, between wave 1 and 2, a total of 270 ties were newly formed, 90 previously existing ties were dissolved, and 165 remained stable.
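To make the transition matrices concrete, the following Python sketch derives formation, stability, and dissolution indicators from two consecutive adjacency matrices of an undirected network. It is only an illustration under simple assumptions (binary, symmetric matrices with the same node order); the function names and the four-node example are hypothetical and unrelated to the estimation software used for Table 2.

```python
import numpy as np


def transition_matrices(prev, curr):
    """Split the change between two binary networks into three indicator matrices."""
    prev = np.asarray(prev, dtype=bool)
    curr = np.asarray(curr, dtype=bool)
    formation = (~prev & curr).astype(int)  # tie absent at t - 1, present at t
    stable = (prev & curr).astype(int)      # tie present at both time points
    dissolved = (prev & ~curr).astype(int)  # tie present at t - 1, absent at t
    return formation, stable, dissolved


def count_ties(adj):
    """Count undirected ties once by summing the upper triangle."""
    return int(np.triu(adj, k=1).sum())


# Hypothetical four-node example (assumption, not survey data).
prev = np.array([[0, 1, 1, 0],
                 [1, 0, 0, 0],
                 [1, 0, 0, 0],
                 [0, 0, 0, 0]])
curr = np.array([[0, 1, 0, 0],
                 [1, 0, 0, 1],
                 [0, 0, 0, 1],
                 [0, 1, 1, 0]])

formation, stable, dissolved = transition_matrices(prev, curr)
print(count_ties(formation), count_ties(stable), count_ties(dissolved))
```

Applied to two survey waves, the three counts correspond to the numbers of newly formed, stable, and dissolved ties reported above.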
All models contain a network statistic for the number of edges. In order to take network clustering ("friend of a friend" effect) into account, we include the geometrically weighted edgewise shared partner (GWESP) statistic (see Goodreau et al., 2009). The GWESP counts the number of shared partners for two individuals that share a tie, placing lower weight on additional partners with a decay parameter alpha. If alpha is fixed at a value of zero, the GWESP is equivalent to a count of all edges that are in at least one triangle. At higher values, greater weight is put on edges with more than one shared partner. We found that fixing alpha at a value of 1 leads to better model fit (see below) as compared with other values such as 0 or 0.69 (which is the default value in RSiena). The statistically significant coefficient for GWESP in all models suggests that triadic closure and related clustering phenomena play a role for tie creation as well as dissolution.
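The role of the decay parameter can be illustrated with a short Python sketch. It assumes the common definition of the statistic, in which an edge with i shared partners receives the weight exp(alpha) · [1 − (1 − exp(−alpha))^i]; the function name and the toy network are hypothetical, and the code is not the implementation used in the estimation software.

```python
import math

import numpy as np


def gwesp(adj, alpha):
    """Geometrically weighted edgewise shared partner statistic (undirected)."""
    adj = np.asarray(adj, dtype=int)
    shared = adj @ adj  # shared[i, j] = number of common partners of i and j
    upper = np.triu_indices_from(adj, k=1)
    # Shared-partner counts for existing ties only, each tie counted once.
    esp = shared[upper][adj[upper] == 1]
    weights = math.exp(alpha) * (1.0 - (1.0 - math.exp(-alpha)) ** esp)
    return float(weights.sum())


# Hypothetical toy network (assumption): the tie 0-1 has two shared partners
# (nodes 2 and 3); the other four ties have one shared partner each.
adj = np.array([[0, 1, 1, 1],
                [1, 0, 1, 1],
                [1, 1, 0, 0],
                [1, 1, 0, 0]])

print(gwesp(adj, alpha=0.0))  # counts ties with at least one shared partner
print(gwesp(adj, alpha=1.0))  # second shared partner adds a discounted amount
```

With alpha = 0, every tie embedded in at least one triangle contributes exactly 1; with alpha = 1, the tie with two shared partners contributes more than 1 but less than 2, which reflects the declining weight of additional partners.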
Sharing the same minor subject or starting in the same year also influences opportunity structure with regard to potential learning partners. These two effects turn out to be significant in all models except for the term for the same minor subject in the dissolution model. There are also a few differences between the pooled (Model 1) and the model with an autoregressive network term (Model 2). For instance, two students of ethnic minority background are more likely to share a tie compared with a random model, but if the past network structure is taken into account, this effect becomes insignificant.
With regard to our main independent variables of interest, significant effects for homophily can be observed in all but the dissolution models. Same-gender ties are significantly more likely compared with different-gender ties. For high school diploma GPA, taken as an indicator of academic ability, our models confirm that greater difference between ego and alter decreases the chances of tie formation. The same is true for parental education. This indicates that gender, academic ability, and social status play a role for initiations of learning partnerships and this effect cannot be explained by network structural factors. We argue that this can indeed be interpreted as revealing a preference among students for learning ties with partners of the same gender or similar social status, since plausible alternative explanations based on biased opportunity structure have been ruled out. In contrast, homophilic ties are not more likely to persist compared with dissimilar partnerships. In the persistence/dissolution model, gender, academic ability, and parental education are not significant.
Note. Left panels refer to wave 1 data, whereas right panels refer to wave 2 data. Difference in high school GPA and parental education is computed by comparing which third of the empirical distribution ego and alter are located in.
As an example of how the model estimates can be read, consider the effect of two students sharing the same gender on the probability that a new tie forms (column "Formation" in Table 2). For a more intuitive interpretation, the conditional log-odds can be converted as follows: if two students are of different gender and all other characteristics have a value of 0 as well (i.e., the students do not share any friends (of friends) and are not enrolled in the same minor subject), the odds of a mutual tie being formed between waves 1 and 2 are exp(−5.75) ≈ 0.0032, which corresponds to a probability of about 0.32% (cf. coefficient for edges). If, however, both students share the same gender, the odds of an edge forming increase to exp(−5.75 + 0.32) ≈ 0.0044, a probability of about 0.44%. While this difference is statistically significant, it is also evident that the formation of specific new learning ties between students over this period of time is comparatively rare, regardless of gender.
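The conversion from coefficients to probabilities can be sketched in a few lines of Python. The two coefficients are the ones quoted in the text for the formation model; the sketch assumes that all other model terms equal zero, and the function name is hypothetical. Because the baseline odds are very small, odds and probabilities are nearly identical here.

```python
import math


def tie_probability(log_odds):
    """Convert conditional log-odds into a probability (inverse logit)."""
    return 1.0 / (1.0 + math.exp(-log_odds))


edges_coef = -5.75       # coefficient for edges, as quoted in the text
same_gender_coef = 0.32  # coefficient for same gender, as quoted in the text

p_different_gender = tie_probability(edges_coef)
p_same_gender = tie_probability(edges_coef + same_gender_coef)
odds_ratio = math.exp(same_gender_coef)  # multiplicative change in the odds

print(f"different gender: {p_different_gender:.4%}")
print(f"same gender:      {p_same_gender:.4%}")
print(f"odds ratio:       {odds_ratio:.3f}")
```

The odds ratio does not depend on the baseline, whereas the absolute probabilities do: with shared partners or a common minor subject, both probabilities would be higher.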
In order to assess how well our models reproduce the observed data, we present goodness-of-fit plots in Figure 3. These plots compare the observed distribution of node degrees, the number of edgewise shared partners, and the geodesic distance (solid lines) with 100 simulated networks based on the TERGM results of Model 1 (pooled model, upper panels in Figure 3) and Model 2 (with autoregressive network term, lower panels). If the observed values fall within the range of the distribution of the simulated values, this can be taken to indicate that the model reproduces the data well. In addition, receiver operating characteristic (ROC) and precision-recall (PR) curves are shown and compared with the respective curves from random models (transparent lines). These statistics are calculated by predicting the existence (1) versus nonexistence (0) of a tie in wave 2 based on the model fitted on the entire dataset (including wave 0). Thus, while these are not real out-of-sample predictions (since both survey waves are needed to fit the TERGM with autoregressive term), the plots enable us to directly compare the pooled ERGM with the TERGM with lagged network term regarding their predictive accuracy. It is apparent that the TERGM with autoregressive term (lower panels in Figure 3) has a much higher predictive performance in terms of the area under both the ROC curve and the PR curve. The pooled model also underestimates node degrees and the number of edgewise shared partners, while simulations based on the TERGM reproduce the observed distribution reasonably well.
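As an illustration of the predictive comparison, the area under the ROC curve can be computed directly from predicted tie probabilities and observed tie indicators. The Python sketch below uses the rank-based formulation (the share of tie/non-tie pairs in which the tie receives the higher score); the function name and the toy values are hypothetical and are not taken from the fitted models.

```python
def roc_auc(observed, scores):
    """Area under the ROC curve via pairwise comparison of ties and non-ties."""
    ties = [s for y, s in zip(observed, scores) if y == 1]
    non_ties = [s for y, s in zip(observed, scores) if y == 0]
    # A correctly ordered pair counts 1, a tied score counts 0.5.
    wins = sum((t > n) + 0.5 * (t == n) for t in ties for n in non_ties)
    return wins / (len(ties) * len(non_ties))


# Hypothetical dyads (assumption): 1 = tie observed in wave 2, 0 = no tie.
observed = [1, 0, 1, 0, 0]
predicted = [0.9, 0.2, 0.6, 0.7, 0.1]

print(roc_auc(observed, predicted))
```

A value of 0.5 corresponds to random prediction and 1 to a perfect ranking of ties above non-ties. Because ties are rare in sparse networks, the PR curve is usually the more demanding of the two criteria.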
A few alternative model specifications are presented as robustness checks in the Appendix. First, we restrict the network to strong ties, that is, learning partnerships with a frequency of meeting at least twice a month (Table A1). A number of differences can be observed compared with the specification in Table 2. Most notably, the effect for parental education is only significant in the pooled model, but no longer in the models taking into account the temporal structure of the data. This might suggest that parental education plays a less prominent role in the emergence of strong learning partnerships than what the cross-sectional descriptive evidence would suggest. In addition, we restrict the network to reciprocated ties, that is, to ties where both partners named each other in the survey (Table A2). Here, although the number of observed ties is much lower, the effects are virtually identical to those in the models with undirected ties (Table 2).
Another important issue concerns the amount of missing data in our survey. Given an attrition rate of 44%, all models presented so far rely only on those study participants who responded to both waves of the survey. An alternative is presented in Table A3, where panel dropouts are also included in the models. These individuals are actively present during wave 1 but not during wave 2, although they can still be selected as partners by other respondents. This is why we employ a directed network in Table A3 as opposed to the undirected specification, which has been considered theoretically more plausible in the previous sections. The models thus also include a term for reciprocity. The number of nodes now increases from 141 to 252 in Table A3, and we control for whether the individual dropped out at wave 2 in the form of an additional covariate. Interestingly, the overall results look largely similar to the previous analyses, despite the differences in modeling. We still find homophily regarding gender, social background, and academic ability when new ties are formed, while controlling for the previous network structure as well as other individual characteristics. The only important difference relates to the effect of gender homophily, which in this model is important not only for the creation of new ties but also for the stability of existing ties (see STERGM results in Table A3), while the latter effect is not present in any of the other models. This indicates that the question of how to measure tie dissolution in the presence of substantial nonresponse rates might challenge interpretations of edge stability, but it does not appear to be an issue for tie formation, which is the main theoretical focus here.
Finally, the TERGM results are largely mirrored by SAOM (RSiena) analyses (Table A4). Again, the likelihood of learning tie formation increases with similarity with regard to high school diploma GPA and for same-gender partnerships. Parental education, by contrast, is not statistically significant in the SAOMs, challenging the cross-sectional descriptive findings (Figure 2) and the TERGM results from Table 2. Together with the less stable effect for parental education in the robustness tests (Table A1), we conclude that evidence for a preference for social status homophily is much weaker in our temporal models compared with what one might anticipate from the descriptive figures. The opposite is true for academic ability: while the descriptive effects are rather small, we find robust evidence for the notion that the more dissimilar two students are in terms of academic ability, the less likely the formation of a learning tie.

Summary and conclusion
A classical question in social network analysis is concerned with the separation of "choice homophily" from "induced homophily." Our application deals with the formation of learning partnerships in university. We expect a latent individual preference for same-gender ties and partnerships with students of similar social origin because similarity may be perceived as a clue for shared interests and easier cooperation. For academic ability, we formulate an argument assuming an effect of "ability asymmetry": all students probably prefer able learning partners, but since learning networks are a typical case of two-sided matching, the likelihood of tie formation can be expected to decrease with the dissimilarity in academic ability. As a consequence, learning ties are likely to be homophilic also with respect to academic ability.
There are at least three different mechanisms that can lead to "induced homophily," which is not due to people's actual preferences for partners who are similar to themselves: homophily regarding a specific characteristic as a by-product of homophily regarding another characteristic, homophily induced by network structural effects, and homophily as a result of peer-influence processes. In this paper, we set up a research design to rule out these three alternative mechanisms in order to reasonably infer that there is a latent preference for similar partners. To rule out network structural effects, we exploit temporal variation from a longitudinal survey and control for previous network structure when looking at newly formed relationships. Regarding the possibility of homophily as a by-product, we control for additional characteristics (such as ethnicity or minor academic subject) when modeling the effects of our focal variables. Finally, the question of "selection versus influence" is addressed by restricting the study to time-invariant characteristics. Processes of peer influence or the coevolution of network and individual characteristics thus cannot affect our main parameters of interest. Of course, this also means that some potentially interesting questions, for example, whether there is peer influence regarding academic achievement, are beyond the scope of this paper.
We base our models on a two-wave survey of learning networks among university students. Descriptive evidence shows that especially for males, same-gender relations are formed more frequently than heterogeneous ties. Learning partnerships also become less frequent with the difference between partners in terms of academic ability and parental education. To investigate whether this cross-sectional homophily can be attributed to differences in opportunity structure, we specify recently developed TERGMs. The results suggest that selective opportunities and network structural effects cannot explain why learning ties are formed more frequently between students of the same gender, similar academic ability, and similar social status. Even when we consider the network structure at t = 1 as well as the network of acquaintances that students already had when entering university ("wave 0"), new ties (e.g., between two formerly unrelated students) are significantly more likely to be observed when partners share the same gender or have a similar high school diploma GPA or parental education. Having ruled out network structural effects that can lead to "induced homophily," we conclude that a latent preference for homophilic ties has been revealed with this design. However, there is no evidence that these homophilic ties also last longer compared with dissimilar ties.
Several limitations to this study are evident. We have restricted our analyses to time-constant independent variables to avoid issues of endogeneity that may arise as a consequence of feedback or contagion processes. Such processes could be especially interesting in the context of coevolutionary processes of learning networks and academic performance. That is, homophily as a result of people influencing each other (for example, students becoming more similar in performance by learning together) cannot be detected with our data. Moreover, the emergence mechanism of how two students form a learning partnership is left in a "black box" since we observe only the outcome. This is relevant because we theorized that low-achieving students have a preference for high-achieving learning partners which is probably not reciprocated, leading to an observed pattern of homophily. We cannot empirically distinguish this proposed mechanism from a "preference for homophily" with the current design. Finally, mechanisms of how learning partnerships are formed might differ between contextual units such as fields of study or countries and also over a longer observation period (extending to later stages of study or subsequent master's degree courses).
A few important insights can be generated that cross-sectional models might overlook. First, homophily with regard to academic ability is rather small in the descriptive results, but when the temporal structure of the data is considered, a robust negative effect of the dissimilarity of students regarding ability on the likelihood of new tie formation emerges. The opposite is true for the effect of social status, which is measured in our analysis by parental education. The cross-sectional view reveals that larger differences in social status are associated with much lower chances for learning ties between students. However, in the panel models conditioning on previous learning partners and acquaintances, this effect is much smaller and not statistically significant in several model specifications, including the models of strong learning ties and the SAOM/Siena models. This might indicate that a preference for similar partners with regard to their social status is less prevalent compared with, for example, preferences for same-gender partners. Instead, homophily regarding social status might more often be the product of network structural effects or preferences for homophily in other domains. All in all, we believe our proposed approach provides a useful framework for future analyses on social networks in education.
Note: Undirected networks, only mutual ties (reciprocated partner nominations). Columns 3 and 4 are STERGMs. The dissolution model (column 4) only converged without the terms for minority students and difference in parental education due to high multicollinearity between the covariates for the low number of dissolved ties (65 in total). *** p < 0.001, ** p < 0.01, * p < 0.05.
Note: Undirected networks. Models did not converge when adding wave 0 data (compare TERGMs, Table 2). *** p < 0.001, ** p < 0.01, * p < 0.05.