Measuring reciprocity: Double sampling, concordance, and network construction

Abstract
Reciprocity, the mutual provisioning of support/goods, is a pervasive feature of social life. Directed networks provide a way to examine the structure of reciprocity in a community. However, measuring social networks involves assumptions about what relationships matter and how to elicit them, which may impact observed reciprocity. In particular, the practice of aggregating multiple sources of data on the same relationship (e.g., “double-sampled” data, where both the “giver” and “receiver” are asked to report on their relationship) may have pronounced impacts on network structure. To investigate these issues, we examine concordance (ties reported by both parties) and reciprocity in a set of directed, double-sampled social support networks. We find low concordance in people’s responses. Taking either the union (including any reported ties) or the intersection (including only concordant ties) of double-sampled relationships results in dramatically higher levels of reciprocity. Using multilevel exponential random graph models of social support networks from 75 villages in India, we show that these changes cannot be fully explained by the increase in the number of ties produced by layer aggregation. Respondents’ tendency to name the same people as both givers and receivers of support plays an important role, but this tendency varies across contexts and relationship types. We argue that no single method should necessarily be seen as the “correct” choice for aggregation of multiple sources of data on a single relationship type. Methods of aggregation should depend on the research question, the context, and the relationship in question.


Introduction
Reciprocity, the mutual provisioning of support/goods, is recognized as a core feature of social relations across the social and behavioral sciences. Whether in sociology (Gouldner, 1960), anthropology (Mauss, 1954), economics (Polanyi, 1957), or evolutionary biology (Trivers, 1971), reciprocity and exchange are seen as fundamental structuring forces of social groups. Dyadic reciprocity, where two partners mutually support each other, is a pervasive social norm dictating expected behavior in many contexts. Social scientists have also worked to identify reciprocity in practice, as an empirically observed behavioral pattern. However, understanding the importance of reciprocity in "real life," and how it might vary across cultures and contexts, presents fundamental methodological challenges, both statistical and in terms of data collection. In the last two decades, developments in network science have begun to address the analytical challenge of reciprocity, namely the non-independence of reciprocal ties (Snijders, 2002; Safdari et al., 2021).
Here, we examine how common methods of network data collection and aggregation in the social sciences may impact the measurement of dyadic reciprocity.

Reciprocity
Classic anthropological work has highlighted the prevalence of a normative expectation of reciprocity cross-culturally, with many relationships understood as being characterized by mutual support and obligation, leading to long-term balanced exchange (Malinowski, 1922, 1926; Mauss, 1954; Sahlins, 1972). This has been reinforced by modeling work that demonstrates the many conditions under which reciprocity can be sustained (Trivers, 1971; Kranton, 1996). Quantitative empirical work has further demonstrated that reciprocity can explain much of the observed patterns of food sharing in humans and other primates (Jaeggi & Gurven, 2013). High levels of reciprocity have been found across a wide range of social and cultural contexts (Baldassarri, 2015; Gurven, 2004; Kasper & Borgerhoff Mulder, 2015; Ready et al., 2020b; Vaquera & Kao, 2008; Szell et al., 2010). Such observations have convincingly established that reciprocity is pervasive across cultures, settings, and relationship types.
What we do not yet have a comprehensive understanding of, though, is how and why reciprocity varies across contexts, both cross-culturally and across domains such as friendship, food sharing, or lending. While contemporary methods of network analysis permit the contribution of reciprocity to tie formation to be estimated, the degree of reciprocity observed in empirical networks is also necessarily contingent on methods of data collection. A crucial barrier to such comparative work is differences in how data are collected and derived. Here, we outline how methodological decisions in survey-based research can impact resultant measures of reciprocity, hindering comparative efforts. We focus on two issues: informant recall and "double sampling."

Informant accuracy and double sampling
While some network datasets are gathered through direct observation (e.g., records of phone calls), many are based on self-reports, because exchanges cannot always be directly observed and because some kinds of ties, such as feelings of closeness or friendship, are perceptual. This raises the issue of the accuracy of respondents' reporting of their relationships. Much work has been done to assess the accuracy of people's reports of their networks and has generally found it to be somewhat wanting (Marsden, 1990). A series of papers by Killworth et al. (Killworth & Bernard, 1976; Bernard & Killworth, 1977; Bernard et al., 1979, 1982), for example, damningly concluded that "informants are inaccurate. [...] [T]here appears to be systematic distortion of how informants recall just about everything" (Bernard et al., 1984). Inaccuracies in measurement are well recognized as impacting network structure and therefore network summary statistics (Marsden, 1990; Feld & Carter, 2002; Wang et al., 2012).
These "inaccuracies" to some extent reflect respondents' subjective perception of their networks, termed "cognitive social structures" (Krackhardt, 1987; Freeman et al., 1987). These cognitive schemas help people organize and remember their social worlds and are reflected in the responses they give to social support name generators. Subsequent work has therefore attempted to identify which relationships are likely to be forgotten or potentially falsely recalled, by which types of respondents. For example, respondents are more likely to name alters who are better connected (Marin, 2004) or of higher status (Grippa & Gloor, 2009; Ball & Newman, 2013; Shakya et al., 2017). Respondents may also be more likely to recall relationships that are strong (Marin, 2004) or that involve common partners (Brashears, 2013). Different types of relationships may also be more or less prone to biased reporting. For instance, people tend to remember reciprocity in "liking" relationships, and transitivity in influence relationships (De Soto, 1960). Certain types of relationships may be more likely to be remembered, and persons with many ties may be more likely to forget some of them (Bell et al., 2007). People's network positions and roles may also impact their perception of the network structure (Marineau et al., 2018; Ready et al., 2020a).
Figure 1. Reciprocity and concordance in directed networks. If a network is directed and double-sampled, assuming that A reports one tie with B, then there are eight possible graphs. Two of these possibilities produce reciprocity across layers, that is, reciprocal ties that would not exist if the data were single-sampled.
One way to estimate the "inaccuracies" in people's nominations is to look at the agreement (concordance) between people nominally reporting on the same relationship. When there is complete sampling of a network, undirected ties or partnerships of various forms (e.g., friends, drug co-use partners, and sexual partners) should be reported by both members. However, studies have found far from perfect concordance when both parties are queried, with agreement generally ranging from 40% to 60% (Marsden, 1990), though some more recent work has found higher values (adams & Moody, 2007). It is crucial to note, however, that inconsistent reports do not necessarily reflect errors or inaccuracies in recall: they may legitimately reflect different perceptions of and investment in relationships (Carley & Krackhardt, 1996).
Past studies of concordance have generally focused on relationships that are presumed to be symmetric and so undirected (Marsden, 1990; adams & Moody, 2007). In such cases, concordance and reciprocity are essentially equivalent. As such, these studies can speak only to reciprocity of nomination, not reciprocity of exchange. The latter is what is more generally of interest in the social sciences (beyond questions of measurement) and is what we concern ourselves with here.
In directed networks, reciprocity and concordance are not equivalent (Figure 1). Reciprocal ties occur within single-layer, directed networks when Ali says Beth gives to him, and Beth says Ali gives to her (Figure 1, bottom left). Concordant reports are generally not possible because individuals are not asked to report on the same set of ties (e.g., Ali and Beth each report who they give food to, but not who gives food to them). In some social network surveys, however, directed relationships are asked about in ways that facilitate a study of concordance between partners. A common practice with asymmetric relationships (i.e., ones that are inherently directed) is to "double-sample" (Nolin, 2008) by asking respondents not only who they turn to for various types of support but also who turns to them. Assuming complete sampling, this means that the same directed relationship should be reported on twice, once by the giver and once by the receiver.
Comparing network "layers" representing the same relationship in different directions (e.g., giving vs. receiving food) allows concordance to be measured in directed networks. Concordance occurs where both reporters agree on a tie in one direction and can be evaluated by comparing reports across double-sampled layers. For example, if Ali says he gives food to Beth, and Beth says Ali gives food to her, these are concordant reports for the tie A → B. Reciprocity occurs where ties appear in both directions, which, as shown in Figure 1, can occur in multiple ways with double-sampled data. First, reciprocal ties can occur as they do in single-sampled networks (e.g., A reports A → B and B reports B → A), irrespective of the aggregation of the two layers. Second, and uniquely for double-sampled data, reciprocal ties can also arise from the aggregation of the two layers, as when Ali reports that he both gives to and receives from Beth.
Double sampling is generally seen as a technique for dealing with the potential for bias and inaccuracy in people's reporting, with the assumption being that multiple insights into the same relationship will jointly be more accurate than a single report. This is similar to viewing each name generator as a layer of a multiplex network (Kivelä et al., 2014). When relationships are reported on by multiple individuals, different approaches can be taken to deal with apparent inconsistencies in reporting. The simplest techniques involve considering the two networks in isolation (Koster, 2018; Simpson, in press), or combining them, either by taking the union (i.e., including all reported ties) (Nolin, 2010) or, conversely, by taking the intersection (i.e., taking only concordant ties) (Krackhardt & Kilduff, 1990). These two decision rules have been referred to as allowing "unilateral nominations" for the union and requiring "mutual assent" for the intersection (Lee & Butts, 2018). Other techniques try to assess and then account for differences in each informant's accuracy, as with cultural consensus approaches (Romney & Weller, 1984; An & Schramski, 2015), while others use a more explicitly inferential and Bayesian approach (Butts, 2003; Lee & Butts, 2020; Newman, 2018; Young et al., 2021; Redhead et al., 2021). Aggregation techniques may also depend on whether networks are weighted or unweighted and how many sources of information are being integrated. Aggregation procedures will have important consequences for network density and possibly for other important network measures. Here, we consider the impact of simple aggregation techniques on the measurement of reciprocity.

The problem
If we wish to draw conclusions about observed differences in reciprocity across contexts (or even in similar contexts where different methods have been used), we need to ensure that we are making appropriate comparisons. However, comparison across different networks is a challenging problem (Faust & Skvoretz, 2002). With researchers taking potentially many different approaches when sampling, eliciting, and aggregating social support relationships, it is unclear how comparable measures of reciprocity calculated on different networks are. Biases in respondents' nominations, and researchers' attempts to account for those potential biases, may affect calculations of reciprocity. In particular, insofar as reciprocal relationships are stronger, they may be inherently easier to recall. Further, if people have a desire for or an expectation of balanced, reciprocal relationships (Heider, 1958; Blau, 1964; Freeman, 1992; Krackhardt & Kilduff, 1999), then they may be particularly likely to report that the people they turn to for support similarly come to them.
In this paper, we ask to what extent measures of reciprocity are impacted by differences in the elicitation approach and aggregation technique used. We do so by drawing on a set of social support survey datasets that double-sampled relationships. These data allow us, first of all, to examine concordance in nominations across the various relationship types, and second, to consider how different decisions about how to treat non-concordant ties impact the resulting measures of reciprocity. Such work is a necessary precursor to any effort to compare measures of reciprocity from different networks, which may have been constructed in different ways.

Data
We primarily rely on social support network data from 75 villages in Karnataka, India (Banerjee et al., 2013), with double sampling of four relationship types: borrowing and lending money, borrowing and lending household items, getting and giving advice, and visiting or hosting visitors.
These networks allow us to assess the variation across networks that were elicited using similar techniques in similar contexts. In this project, the social support network questions were asked of essentially all adult household members in a subsample of all households. Overall, roughly 46% of households and 25% of individuals (including children) per village were included. For our analyses, we consider networks of all surveyed individuals (i.e., we do not aggregate to households, and we exclude individuals who were not surveyed). Surveys were conducted by interviewers using pen and paper, with the survey form effectively limiting responses to four nominations for each question (though this limit was rarely reached by respondents).
For purposes of comparison, we also draw on several additional datasets that include double-sampled relationships, including two social support networks from two villages in Tamil Nadu, India (Power & Ready, 2019), country food and meal sharing networks from an Inuit village in the Canadian Arctic (Ready, 2016), and nine exchange and support networks from Mpimbwe, Tanzania (Kasper & Borgerhoff Mulder, 2015). These networks generally have higher coverage than the Karnataka data and provide an important counterpoint, given differences in the substance of the questions asked. The networks from the two villages in Tamil Nadu are quite comparable in substance and style to those from Karnataka, but with better coverage (94% and 97% of adult residents). The networks from the Canadian Arctic are based on interviews with household heads about common sharing partners at the household level, with 75% coverage of households. Finally, the networks from Tanzania are based on retrospective questions with concrete time horizons (e.g., "In the last seven days..."), asking about a range of exchange relationships, and cover 84% of adult household heads. These three datasets were gathered by interviewers using pen and paper surveys. All questions were free-list name generators where respondents could give as many nominations as they wished. The original survey prompts used in these studies are provided in Table S1 and summary statistics for the resulting networks are provided in Table S2.
Throughout the analyses, in order to distinguish between the two "layers" of double-sampled questions in a consistent manner, we use the term "incoming" layer to refer to name generators where the respondent reported receiving assistance or material aid (e.g., who lends you household items?). "Outgoing" layers refer to name generators where the respondent reported providing assistance or material aid to others (e.g., to whom do you lend household items?). "Incoming" and "outgoing" therefore refer to the direction of the name generator from the perspective of the person answering the question. In key papers employing these datasets, the union of the double-sampled questions has generally been used as the network of interest (Banerjee et al., 2013; Kasper & Borgerhoff Mulder, 2015).

Concordance
We calculate concordance between all dyads, first for all ties reported across either the incoming or the outgoing network layers (Table 1 column "Overall" and Figure 2 show the distribution of values for the Karnataka dataset). We find that concordance across layers is generally low, with an average of 10% across all networks, decidedly lower than the 40%-60% range outlined in Marsden (1990) for a variety of undirected networks (e.g., "best friends" or "closest intimates") or the more recent (undirected) social, sexual, and drug-use networks presented in adams & Moody (2007), which are higher still. As shown in Figure 2, concordance is quite low in the Karnataka dataset, regardless of the relationship type. Table 1 columns "Incoming" and "Outgoing" show the proportion of edges in each individual layer that are contributing to concordant ties in the combined set of layers. For instance, if Layer 1 had 20 edges, half of which were also present in Layer 2, and Layer 2 had 100 edges, the values would be 9% (10/110) concordance overall across all unique reported edges, 50% (10/20) in Layer 1, and 10% (10/100) in Layer 2. The values reported in Table 1 demonstrate the extent to which the double-sampled layers capture different sets of edges: in none of the networks we examine can one of the layers be considered to be largely a subset of the other (the highest overlap is 43%, for the incoming meal sharing layer in Kangiqsujuaq).
Table 1. Proportion of concordant ties in each double-sampled relationship in the datasets. "Incoming" refers to the name generator in which respondents report receiving a type of support, while "Outgoing" refers to respondents reporting giving support to others. The column "Overall" reports the proportion of concordant ties in the union of the incoming and outgoing layers.
Bicycle 0.07 0.11 0.14
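The three concordance proportions described above can be sketched in a few lines. This is a minimal illustration rather than the analysis code used for Table 1: the function names `orient_reports` and `concordance` are hypothetical, and each layer is assumed to be a set of (giver, receiver) pairs after re-orienting the raw nominations so that both layers describe the same direction of exchange.

```python
def orient_reports(reports, layer):
    """Turn (respondent, alter) nominations into (giver, receiver) ties.

    layer="incoming": respondent says the alter gives to them -> (alter, respondent)
    layer="outgoing": respondent says they give to the alter  -> (respondent, alter)
    """
    if layer == "incoming":
        return {(alter, resp) for resp, alter in reports}
    if layer == "outgoing":
        return {(resp, alter) for resp, alter in reports}
    raise ValueError("layer must be 'incoming' or 'outgoing'")


def concordance(incoming, outgoing):
    """Share of concordant ties overall (in the union) and within each layer."""
    concordant = incoming & outgoing   # ties reported by both parties
    union = incoming | outgoing        # all unique reported ties

    def share(part, whole):
        return len(part) / len(whole) if whole else 0.0

    return {
        "overall": share(concordant, union),
        "incoming": share(concordant, incoming),
        "outgoing": share(concordant, outgoing),
    }


# The worked example from the text: Layer 1 has 20 edges, 10 of which also
# appear in Layer 2, and Layer 2 has 100 edges in total.
layer1 = {(i, 1000 + i) for i in range(20)}
layer2 = {(i, 1000 + i) for i in range(10)} | {(i, 2000 + i) for i in range(90)}
result = concordance(layer1, layer2)
# overall = 10/110, incoming (Layer 1) = 10/20, outgoing (Layer 2) = 10/100
```

Treating layers as sets makes the union and intersection used later in the paper one-line operations on the same objects.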
Generally, concordance in the outgoing layer is equal to or higher than that of the associated incoming layer, with the exception of the Kangiqsujuaq networks. This difference reflects the relative number of nominations in each of these layers: people often gave more nominations in response to questions asking from whom they received support than to questions about to whom they gave support. The latter observation brings our attention to other differences in respondents' nominations for incoming and outgoing prompts. In the Karnataka advice and loan networks, the incoming layers tend to have a more skewed degree distribution than the outgoing layers, with a few individuals receiving many nominations as providers of help. This is reflected in the heightened in-degree centralization for these layers (Figure 3), where centralization is defined as the tendency of a single point to be more central than other points in the network and the measure we use assesses the difference from the most centralized possible graph of the same size (i.e., a star) (Freeman, 1979). For certain relationships, when people are asked "who would you go to for support," this is likely to pick up a few key "hubs" or "stars" who have many ties. In contrast, when people are asked "who comes to you for support," these structures are not detected because these high-activity people do not list substantially more alters. This phenomenon is also present in some of the other datasets, such as the Kangiqsujuaq food sharing data, where some households are consistently named by others as a source of food (centralization incoming layer = 0.26, outgoing layer = 0.12). This asymmetry appears to depend on the relationship type. For example, we do not see it across the incoming versus outgoing layers in the Karnataka household items and visitation networks, or the Kangiqsujuaq meal sharing data.
Aggregation procedures will have an important impact on these asymmetries; in particular, they will be erased by considering only concordant ties.
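To make the centralization measure concrete, the sketch below computes Freeman-style in-degree centralization as the summed gap between the largest in-degree and every node's in-degree, divided by the value that sum takes in a star of the same size. The exact normalization behind Figure 3 is not stated in the text, so the denominator (n - 1)^2 (a directed star without self-ties) is an assumption, as is the function name. Edges are (nominator, nominee) pairs, so in-degree counts nominations received.

```python
def in_degree_centralization(edges, nodes):
    """Freeman-style in-degree centralization, scaled so that a star equals 1."""
    nodes = list(nodes)
    n = len(nodes)
    if n < 2:
        raise ValueError("need at least two nodes")

    # Count nominations received, ignoring duplicates and self-nominations.
    indeg = {v: 0 for v in nodes}
    for source, target in set(edges):
        if source != target:
            indeg[target] += 1

    max_deg = max(indeg.values())
    observed = sum(max_deg - d for d in indeg.values())
    # Assumed maximum: in a star, n - 1 nodes all nominate one hub,
    # giving a summed gap of (n - 1) * (n - 1).
    return observed / ((n - 1) ** 2)


# A star (everyone names node 0) is maximally centralized; a directed cycle,
# where every node receives exactly one nomination, is not centralized at all.
star = {(i, 0) for i in range(1, 5)}
cycle = {(i, (i + 1) % 5) for i in range(5)}
star_value = in_degree_centralization(star, range(5))    # 1.0
cycle_value = in_degree_centralization(cycle, range(5))  # 0.0
```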

Reciprocity
We calculate the proportion of reciprocal ties for each individual layer, as well as for the networks produced by the intersection and the union of the two layers (Figure 4 and Tables S2 and S3). Reciprocity is broadly comparable in outgoing and incoming layers (i.e., the reciprocity observed in giving a loan vs. asking for a loan is similar). However, when either the intersection or union is taken, we see a substantial increase in reciprocity in most of the networks. Due to the low concordance in the datasets, taking the union results in networks with considerably more edges than either of the contributing layers and taking the intersection results in networks with substantially fewer edges. For example, in the Karnataka village networks, while the individual layers have an average density of 0.57%, the average density in the union networks is 1.03%, and the average density in the intersection networks is 0.11%. Despite the clear consequences of these aggregation procedures for the density of these merged networks, both show a dramatic increase in reciprocity. What might account for these increases?
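As an illustration of how the layer-level and aggregated values relate, the sketch below computes density and reciprocity for each layer, the union, and the intersection. It assumes an edgewise definition of reciprocity (the share of directed ties whose reverse tie is also present), which may differ from the exact statistic reported in Figure 4; the function names and the toy names are hypothetical. Layers are sets of (giver, receiver) pairs.

```python
def reciprocity(edges):
    """Share of directed ties whose reverse tie is also present."""
    edges = {(a, b) for a, b in edges if a != b}
    if not edges:
        return 0.0
    return sum(1 for a, b in edges if (b, a) in edges) / len(edges)


def density(edges, n_nodes):
    """Share of possible directed ties (no self-ties) that are present."""
    edges = {(a, b) for a, b in edges if a != b}
    return len(edges) / (n_nodes * (n_nodes - 1))


def aggregate_summary(incoming, outgoing, n_nodes):
    """Density and reciprocity for both layers and both aggregates."""
    networks = {
        "incoming": incoming,
        "outgoing": outgoing,
        "union": incoming | outgoing,
        "intersection": incoming & outgoing,
    }
    return {
        name: {"density": density(e, n_nodes), "reciprocity": reciprocity(e)}
        for name, e in networks.items()
    }


# Toy example: Ali reports both receiving from Beth (incoming layer) and
# giving to Beth (outgoing layer). Neither layer has a reciprocal tie on its
# own, but the union does.
incoming = {("beth", "ali"), ("carl", "dana"), ("erin", "ali")}
outgoing = {("ali", "beth"), ("carl", "dana"), ("dana", "finn")}
summary = aggregate_summary(incoming, outgoing, n_nodes=6)
# summary["union"]["reciprocity"] is 2/5: only the Ali-Beth pair is mutual.
```

In this toy case the reciprocal tie in the union comes entirely from one respondent's two answers, which is the across-layer pattern the text goes on to examine.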
We undertake further analyses to determine whether the observed changes in reciprocity are due primarily to the mathematics of aggregation. To do this, we fit multilevel exponential random graph models (Stewart & Schweinberger, 2018) to each layer of the double-sampled relationship types in the Karnataka dataset. We include terms for the number of edges, the number of nodes with in- and out-degree of zero through two (which encompasses the majority of nodes), reciprocity, and transitivity (geometrically weighted edgewise shared partnerships), to effectively match the core structure of each of the individual layers. We then use the coefficients from the MLERGMs to simulate networks for each of the layers (e.g., borrow money and lend money) and generate aggregates of the two simulated layers, for each of the 75 villages, 100 times. These simulated aggregated networks provide a baseline against which to compare the observed aggregated networks. In the case of the union networks, the increase in reciprocity could potentially be driven by the increase in density (i.e., more reciprocity could occur simply because there are more edges in the networks). Figure 5 shows reciprocity in the Karnataka dataset for the individual layers and aggregated networks, compared to reciprocity in the simulated networks. The results show that the dramatic increases in reciprocity observed in the empirical union networks are not reflected in the simulated union networks. Thus, the increase in reciprocity seen in the empirical union networks cannot simply be a consequence of increased network density.
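The logic of the simulated baseline (simulate each layer independently, aggregate, and compare reciprocity) can be sketched with a much cruder simulator than the one used here. The code below is not an MLERGM: it substitutes independent Bernoulli layers, which match only density and ignore the degree, reciprocity, and transitivity terms in the fitted models. All function names and parameters are hypothetical. Under this stand-in, the expected reciprocity of the union is roughly the union's density, which shows why added edges alone produce little reciprocity in sparse networks.

```python
import random


def reciprocity(edges):
    """Share of directed ties whose reverse tie is also present."""
    if not edges:
        return 0.0
    return sum(1 for a, b in edges if (b, a) in edges) / len(edges)


def bernoulli_layer(n_nodes, layer_density, rng):
    """Stand-in simulator: each ordered pair is a tie with fixed probability."""
    return {
        (i, j)
        for i in range(n_nodes)
        for j in range(n_nodes)
        if i != j and rng.random() < layer_density
    }


def simulated_union_reciprocity(n_nodes, d_in, d_out, n_sims=100, seed=0):
    """Reciprocity of the union of two independently simulated layers."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_sims):
        union = bernoulli_layer(n_nodes, d_in, rng) | bernoulli_layer(n_nodes, d_out, rng)
        values.append(reciprocity(union))
    return values
```

For instance, `simulated_union_reciprocity(200, 0.0057, 0.0057)` uses the average Karnataka layer density reported above with an arbitrary network size; comparing the resulting distribution with an observed union value mirrors, in simplified form, the comparison shown in Figure 5.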
It is informative to consider the source of the reciprocal ties in the union networks: reciprocal ties could occur either because the two parties report the directed relationship in a single layer, or because one party reports the tie across the two layers (Figure 1). In Table 2, we show the proportion of reciprocal ties in the union networks where ties are reported by both parties ("within-layer reciprocity," Figure 1). Note that many entries are quite low, implying that much of the observed reciprocity in the union networks comes from a single reporter asserting a reciprocal relationship. However, this is highly variable across the datasets we examine. For instance, the networks from Kangiqsujuaq and Mtakula tend to have a much higher proportion of reciprocal ties that involve reports from both parties than the Indian networks (although many of the Mtakula networks are so sparse that the data are relatively uninformative).
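One way to separate the two sources of reciprocity is to check, for each mutual dyad in the union, whether both directions already appear within a single layer. The sketch below does this under the assumption that layers are sets of (giver, receiver) pairs; the function name is hypothetical and the code is illustrative rather than the procedure used to produce Table 2.

```python
def within_layer_share(incoming, outgoing):
    """Share of mutual dyads in the union that are mutual within one layer.

    Returns None when the union contains no mutual dyads.
    """
    union = incoming | outgoing
    mutual_dyads = {
        frozenset((a, b)) for a, b in union if a != b and (b, a) in union
    }
    if not mutual_dyads:
        return None

    def mutual_in(layer, dyad):
        a, b = tuple(dyad)
        return (a, b) in layer and (b, a) in layer

    within = sum(
        1 for d in mutual_dyads if mutual_in(incoming, d) or mutual_in(outgoing, d)
    )
    return within / len(mutual_dyads)


# Ali and Beth each report receiving from the other (mutual within the
# incoming layer). Carl reports both receiving from and giving to Dana, so
# that dyad is mutual only across layers, on the word of one reporter.
incoming = {("beth", "ali"), ("ali", "beth"), ("dana", "carl")}
outgoing = {("carl", "dana")}
share = within_layer_share(incoming, outgoing)  # 0.5
```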
The union network for advice in the Karnataka data has a lower overall level of reciprocity than the other networks in this dataset, but Table 2 suggests that when reciprocity is present in this network, it is somewhat more likely to have been reported by both parties. To explore this further, we calculate the probability of repeat naming in the Karnataka dataset (i.e., for all people listed by respondents in the first of two double-sampled questions, how many were listed again by the same respondent in the complementary question). These are often quite high: for example, sharing household items in Karnataka has a repeat nomination rate of 72% on average across the 75 villages. In the advice network, the rate of repeat nominations is low compared to other tie types in the same dataset (advice = 41%, visiting = 73%, and money = 58%) suggesting that people perceive advice relationships as being less reciprocal.
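The repeat nomination rate can be sketched as follows. The input format (a mapping from each respondent to the set of alters named in each question) and the function name are assumptions for illustration, and the sketch pools nominations across all respondents in a network, which may differ from how the village-level averages reported here were computed.

```python
def repeat_nomination_rate(first, second):
    """Share of alters named in the first question who are named again by the
    same respondent in the complementary question.

    `first` and `second` map respondent -> set of named alters.
    Returns None if nobody was named in the first question.
    """
    named = 0
    repeated = 0
    for respondent, alters in first.items():
        named += len(alters)
        repeated += len(alters & second.get(respondent, set()))
    return repeated / named if named else None


# Ali names Beth and Carl as lenders but only Beth as a borrower; Dana names
# different people in each question. One of three first-question nominations
# is repeated.
lends_to_me = {"ali": {"beth", "carl"}, "dana": {"erin"}}
i_lend_to = {"ali": {"beth"}, "dana": {"finn"}}
rate = repeat_nomination_rate(lends_to_me, i_lend_to)  # 1/3
```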
For the intersection networks, in comparison to the unaggregated networks, we see higher levels of reciprocity in the Karnataka networks despite lower density. Here again we use the simulated networks to provide a baseline for comparison. Starting from the simulated union networks, we randomly select the proportion of edges that were concordant in the associated empirical network and calculate reciprocity on the resulting subsetted network, effectively matching the empirical intersection network size (Figure 5). Of the few edges that are retained in these simulated intersection networks, fewer still are reciprocal; the vast majority of the simulated networks have no reciprocal ties. This is what we would expect purely from the effect of the decrease in density, given the removal of edges. However, in the empirical data, reciprocity still increases substantially in the intersection networks, similar to the increase seen in the union networks. This suggests that concordant ties have a similar rate of reciprocity to non-concordant ties. The greater range of reciprocity values for the intersection networks reflects a heightened sensitivity to the particular set of edges that are retained, due to the small number of edges.
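The subsetting step used for the intersection baseline can be sketched as a random draw of edges from a union network. This is an illustration under assumptions: the function names are hypothetical, reciprocity is the edgewise share used in the other sketches, and the union passed in would be a simulated network rather than the toy one shown.

```python
import random


def reciprocity(edges):
    """Share of directed ties whose reverse tie is also present."""
    if not edges:
        return 0.0
    return sum(1 for a, b in edges if (b, a) in edges) / len(edges)


def subset_reciprocity(union_edges, concordant_share, rng):
    """Keep a random `concordant_share` of edges and return their reciprocity."""
    edges = sorted(union_edges)  # fixed order so a seeded draw is reproducible
    keep = round(concordant_share * len(edges))
    return reciprocity(set(rng.sample(edges, keep)))


# Even a fully reciprocal union loses most of its reciprocity when only 10%
# of edges are kept at random, because an edge and its reverse are rarely
# both retained.
pairs = {(2 * i, 2 * i + 1) for i in range(50)}
union = pairs | {(b, a) for a, b in pairs}
rng = random.Random(1)
draws = [subset_reciprocity(union, 0.10, rng) for _ in range(200)]
mean_subset_reciprocity = sum(draws) / len(draws)
```

Random thinning therefore pushes reciprocity down, which is why the increase observed in the empirical intersection networks stands out against this baseline.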

Discussion
Decisions on how to integrate multiple perspectives on what is nominally the same relationship have a dramatic effect on the structure of the resulting networks. Here, we have focused on reciprocity because of its theoretical and substantive importance, but other fundamental network measures also change through the process of aggregation. Perhaps most obviously, the density of a network is impacted by the decision to include or censor different sets of edges. However, as we have shown in our analysis of the Karnataka networks, changes in network density are insufficient to explain the changes in reciprocity that we observe. Instead, the changes are linked to the interdependence between double-sampled name generators. Contrary to previous work looking at concordance in undirected networks, here we have found that people's responses across questions about directed relationships show very low levels of concordance. The highest level of concordance in the set of networks we examined was 23%; most of the networks had levels closer to 10%. At first glance, this suggests remarkably low agreement on the existence of directed social ties. However, non-concordance does not necessarily mean that the relationships reported are not "real" or meaningful.
The interdependence of name generators is shaped by various factors including cognitive processes, aspects of the study design, and expectations for reciprocity in different kinds of relationships. First of all, conflicting reports between informants not only reflect potential informant inaccuracy (some ties may be forgotten or falsely reported) but also suggest that different people perceive their relationships differently (Krackhardt, 1987;Freeman et al., 1987;Comola & Fafchamps, 2014). For more subjective relationships that involve personal feelings and perceptions or hypothetical situations, givers and receivers may have different understandings of their relationship. While a person may feel that they could turn to another for advice, the potential advice-giver may not be aware that they are seen in this light. Even for more concrete relationships (like sharing food or household items), conflicting reports may reflect that a tie is differently valued by giver and receiver. Consider our finding that some prompts result in a few "stars" being named by many respondents. While "stars" may not list all of those who named them as provisioning support, this may accurately reflect that they perceive those relationships to be relatively unimportant, though they may be quite important to the receiver.
Different procedures for aggregating networks-taking the union (the most expansive set of edges) or taking the intersection (only concordant edges)-could imply conflicting presumptions about the "accuracy" of people's responses (and the sources of discrepancy between respondents), and about the dependence between name generators. For concrete prompts (e.g., "who did you lend money to in the past week?"), taking the union presumes that some respondents may be forgetful, but that all respondents are reporting "truthfully." Taking the intersection presumes that some respondents may exaggerate, and that concordant ties are most likely to be "truthful." Equally, if responses are seen as largely accurate but reflecting different subjective valuations of those relationships, then the union and intersection networks imply different analytical interests: the union seeks to capture the widest set of possible ties, however important, while the intersection focuses attention only on those that are mutually recognized. Given the low concordance in nominations we observe, these two sets of expectations result in very different networks, with the intersection comprising only a very small subset of reports.
Despite these differences, in the set of networks we have examined, the union and intersection procedures both generally result in levels of reciprocity that are substantially higher than what is observed in the individual layers. In the Karnataka network dataset, for example, both the union and the intersection result in reciprocity values that are roughly three times larger than what is seen in the individual layers. This is because respondents have a tendency to name the same people as both givers and receivers of exchanges (Table 2).
To some extent, this tendency to repeat nominations may be due to priming (i.e., naming the same people because they have already come to mind in the previous response). However, the extent of repeat nominations appears to depend not only on the study context but also on the type of relationship involved. In the Karnataka dataset, although the individual layers have similar levels of reciprocity across all relationship types, the increase in reciprocity in the advice networks after aggregation is less than that seen in the others (Figure 5). One potential explanation for this pattern is that certain relationship types may be more hierarchical and that people tolerate more "imbalance" in these relationships. For example, people may be more likely to ask more senior persons for advice and may not expect to be asked for advice by them in return. The Karnataka networks indicate that people believe that many of their relationships are reciprocal (Appadurai, 1985), but also that they have differing expectations of reciprocity for different kinds of interactions (such as providing advice vs. lending household items). Although it is not clear from these data the extent to which these differences reflect actual behavior, it is clear that double-sampled relationships are potentially revealing of people's expectations about what kinds of relationships should be reciprocal.
We have focused on how network elicitation and aggregation might impact observed reciprocity, but we are leaving questions about why these processes impact network structure largely unanswered (e.g., due to priming, due to expectations of reciprocity, etc.). To some extent, this is due to the inherent ambiguity of the data that are being elicited. For instance, we have suggested that it is the nature of the double-sampled questions that is leading to potentially inflated reciprocity values, but it is also possible that "actual" reciprocity makes the reporting of concordant ties more likely. In the Karnataka data, the latter explanation cannot fully account for the patterns we have observed (because we do not see higher levels of reciprocity in the intersection than in the union networks), but it is worth emphasizing that the complexities and subjectivities of the data we are considering make it particularly difficult to adjudicate between alternative explanations.
With the multilevel exponential random graph models, we attempted to identify some of the potential mechanisms that could account for the seemingly inflated reciprocity values observed in the union and intersection networks. Much more could be done with such models to identify the mechanisms at play more fully. We have already suggested that some of the patterns observed may be driven by individuals with high in-degree, which could potentially be modeled with receiver effect terms. Future work should develop generative models aimed at better capturing the processes that produce the interdependence or apparent disagreement across layers and respondents. This could be done by assessing the "credibility" of reporters (An & Schramski, 2015) or by using Bayesian approaches to assess consensus or disagreement about the existence or strength of particular ties (Butts, 2003; Lee & Butts, 2020; Newman, 2018; Young et al., 2021; Redhead et al., 2021).

Conclusion
Double sampling is generally seen as being of value insofar as it will help to create more complete and more accurate representations of people's social connections, thanks to the multiple perspectives it provides. Here, we have challenged this notion by showing that different aggregation techniques can result in very different network structures. What is clear from our analysis is that double-sampled reports are not independent, and that double sampling has the potential to reveal important insights into how people understand their relationships. So, what should researchers consider when designing studies that gather or use directed (and potentially double-sampled) network data?
First, are you concerned with actual interactions or with people's perception of their networks? As emphasized in the cognitive networks literature (Krackhardt, 1987; Freeman et al., 1987; Carley & Krackhardt, 1996; Marin, 2004), people's beliefs about their relationships, even if they are inaccurate representations of "actual" interactions, can be highly relevant to understanding their behavior. In these circumstances, non-concordance may not be of particular concern, as each respondent's nominations can potentially be treated as an accurate representation of their own perception. However, in such cases, it is important to reflect on what the aggregate of people's perceptions actually represents, and how it may (or may not) have real-world consequences (Krackhardt, 1987; Freeman et al., 1987).
Second, what are the local norms of reciprocity for the tie type of interest? As we have suggested here, strong norms of reciprocity may lead people to "inflate" their responses by repeatedly naming individuals. This can result in drastically increased reciprocity in aggregated networks. This should be especially true for relationship types that are expected to be mutually supportive and balanced, while relationships that are expected to be hierarchical and imbalanced may be less impacted. Again, however, the "inflation" here is not necessarily an inaccuracy. For instance, a person who asks a favor once a year may report a reciprocal relationship that is true but not remembered by a partner who is much more active. Close attention also needs to be paid to the impacts of question order and priming on the repeat naming of alters in network questionnaires.
Third, how concentrated is the resource being exchanged or the activity undertaken? The presence of "star" structures, where a few individuals have a large number of ties, may often be linked to an uneven distribution of resources between individuals (e.g., there may be relatively few money lenders in a community). Relationships that are more constrained by time or physical distance, such as visiting or borrowing everyday items, may be less likely to show such asymmetries. If such stars are not anticipated but appear in the data, this may reflect biases in people's responses, such as a bias in naming patterns related to prestige or renown (e.g., if elected officials are often named). Indeed, previous research suggests that a low ratio of self-reported to alter-reported interactions may be linked to leadership, with perceived leaders (even informal leaders) being more likely to underreport their activity (Grippa & Gloor, 2009). We encourage network researchers to reflect on the relationships between network structure, other contextual factors such as resource distribution, and responses in social network surveys. Asymmetries revealed through double sampling may shed light on these relationships and can potentially be used to inform the choice of aggregation technique.
Recent research has suggested that, between the union and the intersection, the intersection may provide more accurate depictions of social structure, particularly in sparse graphs (Lee & Butts, 2018). But in datasets like those considered here, this could mean discarding up to 90% of the data. Given the strong presumptions inherent in the union and the intersection, other more nuanced aggregation techniques should clearly be considered. Above, we suggested that more complex models have the potential to help researchers better understand the data-generating process and the potential biases within it; such models may then also be useful in helping researchers account for these biases to arrive at a representation of the network that is a better basis for inference (Redhead et al., 2021). However, given the subjectivity involved in the perception of relationships, aggregation methods, whether simple or complex, cannot always recover an objective "ground truth." There is no "one size fits all" solution here: the researcher must consider the particular context and relationships in question in order to determine which approach to aggregation is most principled.
The need to consider the particularities of each network further highlights that comparisons across networks need to be undertaken with caution. Slightly different prompts may produce very different networks (e.g., "who would you ask for advice?" vs. "who did you ask for advice in the last week?"). As we have shown here, comparing single layers to aggregated networks may be particularly misleading. Even within networks aggregated in the same way, different norms may lead to more "inflation" of reciprocity with double sampling. Researchers should pay close attention to the sociocultural setting, data collection approaches (including the elicitation technique such as name generators vs. roster methods, the particular prompt, question order, etc.), and data aggregation techniques, in order to determine if direct comparison across datasets is warranted. We note that the issues discussed here should also be considered when aggregating multiple relationship types into a single network (e.g., different kinds of exchanges aggregated to a single "social support" network).
Overall, then, we encourage researchers to think carefully about how respondents will engage with the questions that are posed to them and how this will impact the observed network(s) under study. Research designs can help assess (within-study) measurement validity and contribute to a better understanding of some of the issues raised in this paper. Such efforts should be undertaken early, at the stage of designing and piloting survey instruments. As one of the reviewers suggested, using multidimensional scales to assess relationship strength may provide more detailed data that can hopefully help tease out the reasons for lack of concordance between respondents. For instance, in response to a question such as "who do you ask for advice?" one might consider a scaled answer such as "could be asked if needed," "have previously asked," and "asked within the last month." One would expect concordance to correlate with such a scale.
In conclusion, we see great value in double sampling, despite its complexities. We emphasize that there is no one "right" way to collect and aggregate data on people's relationships. Rather, appropriate methods to represent a network depend on the research question and require careful reflection about the properties of ties that are considered to be important to the outcome of interest. What appear to be methodological decisions are thus also theoretical ones.