An early and important decision in the research process, regardless of what is being studied, is the development of strategies for sampling and data collection. Many of the sampling issues associated with ego network research are general (e.g., representativeness), while others are more unique to egocentric methodology. Here, we provide an introduction to the common sampling designs used in ego network research, providing more detail where ego network methodology diverges from standard approaches (e.g., link-tracing designs such as respondent-driven sampling).
Another critical decision in any research project is the mode of data collection. Egocentric data are typically collected using surveys administered by an interviewer in person or over the phone, or, more recently, using self-administered surveys conducted online. In this chapter, face-to-face, telephone, and web-based modalities are compared in the context of ego network research. Issues such as social desirability, response quality, and interviewer effects are discussed for each mode of data collection. We also provide an introduction to observational and archival methods for collecting ego network data, offering real-world examples of each approach. Observational and archival methods have distinct advantages – principally the ability to directly access information about networks without self-report biases. However, they also have nontrivial limitations and special considerations, which are discussed in this chapter.
Finally, we address the question of ethics. Network research does not have a close fit with the origins and intentions of principles of human subjects research, such as those established by The Belmont Report (National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research 1979). Most notably, inherent in the egocentric approach is observational reporting of egos about their alters, raising the question of whether alters qualify as human subjects and are covered by informed consent policies.
3.1 Sampling Methods for Ego Network Research
Sampling methods refer to the procedures through which some members of a population are selected for inclusion in a research sample, while others are excluded. Collecting data on ego or personal networks proceeds in two steps: First, one of many different methods is implemented to draw a probability or convenience sample of ego respondents. Second, these ego respondents are asked to enumerate members of their networks using one or more questions called generators (e.g., “Name the people you are close to”). These generators ultimately determine the nonrandom sample of alters included in the research (see Chapter 4 for detail).
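The two-step logic described above can be sketched in a few lines. This is a hedged illustration only: the population, its tie structure, and the generator criterion are all invented, and a real study would replace each with its own sampling frame and generator question.

```python
import random

# Hypothetical population of 100 people; each person's contact list is
# generated at random purely for illustration.
rng = random.Random(42)
population = {i: {"contacts": rng.sample(range(100), 5)} for i in range(100)}

def sample_egos(population, n, seed=0):
    """Step 1: draw a probability (here, simple random) sample of egos."""
    return random.Random(seed).sample(sorted(population), n)

def name_generator(population, ego):
    """Step 2: a generator question elicits alters. These alters form a
    nonrandom sample fixed entirely by the generator's criterion."""
    return list(population[ego]["contacts"])

egos = sample_egos(population, 10)
alter_sample = {ego: name_generator(population, ego) for ego in egos}
```

Only the ego sample carries known selection probabilities; the alter sample inherits whatever the generator happens to elicit, which is why Chapter 4 treats generator design at length.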
Sampling decisions always require trade-offs between the objectives of a study and the resources that are available. They are important in any research design. However, as Barrat and colleagues (2008) caution, network studies, in particular, require a critical eye that begins with an informed understanding of how data are gathered. For both egocentric and sociocentric network data, the key issue is drawing boundaries, as discussed in Chapters 2 and 4. What is the most useful boundary that encompasses the set of nodes crucial to the research? Importantly, no sample can be perfect given that influential and central alters can come from any of a vast array of domains of social life. Consequently, some significant network members will always be outside the sample (Borgatti et al. 2013). In the end, having a sample of cases with known properties and with a clear understanding of the limits of generalizability to the population of interest is critical. In this sense, the development and assessment of the quality of an egocentric dataset does not differ from traditional research methods.
3.1.1 Sampling Egos from a Large Population
One of the most common approaches for obtaining egocentric data in the general population uses a probability sample design that focuses on selection of egos. Indeed, the ability to use probability-based methods that yield representative samples of large populations is a significant advantage of the ego network approach over sociocentric network research. No special strategies for sampling ego respondents from large populations are required – any of a number of different methods common in survey research can be used (e.g., simple random sampling, stratified sampling, cluster sampling). The key is to draw the sample of egos in a way that represents the population at large. These sampling methods are well developed and not discussed here at length since many classic references are available (e.g., Kish 1965; Cochran 1977; Kalton 1983; for a more formal treatment specific to networks, see Frank 2005).
While the cleanest approach is a simple random sample, resources often prohibit this, particularly in egocentric network designs that are already high-cost and high-burden. For example, in Wellman’s (1979) study of primary ties among East Yorkers, a sample survey of 845 adults over eighteen years of age was drawn from a specific geographical area rather than the Canadian population as a whole to narrow the sampling frame. Similarly, the General Social Survey’s initial module on “important matters” networks of the US population in 1985 (Marsden 1987) used a multi-stage area probability sample of census blocks to select US households. This covered about 97 percent of the resident population, excluding those living in college dorms and military quarters. The path-breaking National Health and Social Life Survey (NHSLS) also used this approach (Laumann et al. 1994) to test the theory that both social characteristics and social networks influence sexual behavior.
Since resource constraints often prohibit the use of pure random samples in network research (or otherwise), researchers need to carefully consider the biases created by other approaches. For example, at the block level, early GSS surveys moved to a quota design, where interviewers canvassed for any respondents living in the selected block, while adhering to sex, age, and employment status quotas. This design, while substantially more resource-efficient, introduced potential biases associated with the overinclusion of people who happened to be at home when the interviewer came to their house. This source of bias requires correction to minimize error. More broadly, the basic point is that perfect designs are expensive, time-consuming, and often impractical. However, without a clear sampling plan that foresees and accounts for such potential biases, the conclusions derived from the data analyses cannot be evaluated by either the researchers themselves or their readers.
3.1.2 Targeted Sampling for Subgroups or “Hidden” Populations
For some research questions, a sample of the general population of egos is not possible or even desirable. Instead, sampling targets a narrow subset of the population that belongs to a specific group or is experiencing a particular event, episode, or condition, and then obtains the personal networks of those egos. In this case, finding an appropriate seed location from which to draw ego respondents is essential. For example, in the Indianapolis Network Mental Health Study (Pescosolido et al. 1998), the goal was to understand the nature, dynamics, and impact of social networks among individuals making their first contact with the mental health treatment system. In the INMHS, all individuals entering two major treatment facilities, one public and one private, were screened. If they fit the inclusion criteria, they were sent a letter of invitation to participate in the study and were contacted after that point by research staff. This continued until the target size of the sample or time frame for selection was reached. Given the rare nature of the event (i.e., while many people used each treatment system, most were repeat users rather than “first-timers”), the selection continued across a number of years.
Distinct sampling approaches for egocentric data collection may be necessary in these kinds of situations – that is, where potential respondents are difficult to locate from a practical standpoint or may even be intentionally concealing themselves. There are a number of factors that make a population relatively “hidden” from researchers. In such cases, there may be no meaningful sampling frame for identifying potential respondents, or the characteristic of concern may not be located in a particular site (e.g., recent immigrants or young widows). More traditionally, populations are classified as hidden because the behavior of interest is clandestine, illegal, and/or stigmatized (e.g., secret clubs, drug users, college students with sexually transmitted infections). Studies focusing on hidden populations are important because traditional sampling methods rarely capture enough of these kinds of respondents to address research questions about them.
Link-tracing designs are a common approach to sampling participants in hidden populations. These rely on the existence of a pattern of contact between members of the population (Frank & Snijders 1994), using “seed” respondents to identify and gain access to additional respondents known to the seed person. That is, “the social relation itself is used as a chain of connection for building the group” (Scott 1991: 59). Because of the ethical issues associated with seed respondents providing identifying information about other potential participants without their consent, respondents themselves typically recruit their alters into the study. Though this avoids breaches of confidentiality, which are especially problematic when studying stigmatized populations, this feature of snowball sampling makes it difficult to accurately assess response rates and nonresponse biases (Heckathorn 2002).
Link-tracing designs have long been used to reveal the network structure of hidden populations, such as in Kadushin’s (1968) early comparative study of power elites (see also Coleman 1958). However, the transition from link-tracing for studying social networks to link-tracing as a convenience sampling method led to a discussion of problems of inference associated with this approach (Erickson 1979). Notably, because the characteristics of the sample will be inherently biased by the selection of a particular set of seed respondents and their typically homophilous networks (Heckathorn 2002), seeds should ideally be randomly chosen. For example, a researcher could approach every nth person to walk through the door of a randomly selected group of sites (e.g., bars, support groups) frequented by members of a hidden population. Likewise, link-tracing designs can create biases associated with volunteerism (i.e., members of hidden populations who agree to participate, as well as their network members, are likely to differ systematically from those who do not). These methods are also disproportionately likely to include in the sample respondents with large networks and network members with particular kinds of affiliations (e.g., friends, kin) that are especially likely to be nominated by seeds (Erickson 1979). Also, areas of high redundancy can be suspect, leading Borgatti and coauthors (2013) to suggest that bridging ties be considered for further steps out to avoid capturing only one dense segment of a network.
More recently, researchers have engaged in efforts to differentiate distinct but related link-tracing approaches – and their associated strengths and weaknesses – that are often erroneously referred to as snowball sampling (Magnani et al. 2005). In a true snowball sampling approach, an initial set of seed respondents (sometimes referred to as indexes; Valente 2010: 42) is asked about their social ties, often of some particular type (e.g., people with whom the seed respondent did intravenous drugs). In turn, these alters are interviewed, and all of their alters are enumerated and subsequently interviewed (i.e., alters’ alters). This is repeated until the researchers conclude that a saturation point has been reached. Snowball sampling is typically used to generate whole network data rather than egocentric network data. However, it is a useful starting point for discussion since alternative approaches are essentially adaptations of the original snowball sampling design.
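The wave-by-wave logic of true snowball sampling can be sketched as a breadth-first expansion. The tie structure below is hypothetical, and the stopping rule (no new names appear) is a simple stand-in for the saturation judgment a real study would make:

```python
def snowball_sample(ties, seeds, max_waves=10):
    """Snowball sampling sketch: interview the seeds, enumerate ALL of
    their alters, interview those alters in turn, wave by wave."""
    sampled = set(seeds)
    frontier = list(seeds)
    for _ in range(max_waves):
        next_frontier = []
        for ego in frontier:
            for alter in ties.get(ego, ()):   # follow every nominated alter
                if alter not in sampled:
                    sampled.add(alter)
                    next_frontier.append(alter)
        if not next_frontier:                 # no new names: saturation
            break
        frontier = next_frontier
    return sampled

# Invented tie structure: seed "s" names a and b, who name further alters.
ties = {"s": ["a", "b"], "a": ["c"], "b": ["c", "d"], "c": [], "d": ["e"]}
```

Because every nominated alter is followed, the sample balloons quickly and stays anchored to the seeds' dense local neighborhoods, which is exactly the inferential weakness the variations below try to address.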
A number of variations on snowball sampling, also characterized as link-tracing designs, have been used to address specific limitations of this approach and are more conducive to the collection of ego network data. The chain link or chain referral design is a form of snowball sampling where initial seeds report their ties and nominate only one person for the next step outwards. The initial subjects serve as seeds through which wave one subjects are recruited; wave one subjects in turn recruit wave two subjects; and the sample subsequently expands wave by wave until the desired sample size is reached (Heckathorn 2011). Relative to snowball sampling, the chain referral method requires more steps out from the initial seed respondent, and a larger number of starter seeds to attain an adequate sample size. Consequently, the sample as a whole is generally more socially distant from seed respondents, and seed respondents themselves may be more diverse. Theoretically, this will increase the representativeness and the variation in the sample. Another advantage of the chain referral method is that the researcher can ask the seed respondent to refer a particular kind of alter (e.g., a woman, a person over the age of fifty, a drug user who has never been in treatment), which provides some degree of control in shaping the characteristics of the sample. This allows the researcher to select for specific attributes to achieve a sample that is believed to reflect the target population as a whole (Biernacki & Waldorf 1981).
A closely related approach often used in network research is the random walk design. This is a form of link-tracing where the seeds provide multiple network contacts, but only a subset (often one) of each seed’s alters is randomly selected from the nominated ties. This method is used to obtain probability samples of large social networks in hidden populations or in cases where no known sampling frame exists. In this case, the sampling unit is the network as a whole rather than the individual, and the goal is to estimate properties of the network. Consequently, the random walk design is especially appropriate when the process or outcome being studied is clearly linked to structural characteristics of networks, such as disease transmission or diffusion of information. An advantage of the random walk design over some other link-tracing methods is that it introduces an element of randomness that permits the derivation of unbiased indicators and estimates of their precision (Heckathorn 2002). However, this requires a series of additional procedures that constitute what is often called respondent-driven sampling.
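The random walk design differs from the snowball sketch above in one key step: each respondent reports several contacts, but only one, chosen at random, is recruited next. The following sketch uses an invented contact structure to illustrate that single change:

```python
import random

def random_walk_sample(ties, seed_node, steps, rng=None):
    """Random walk link-tracing sketch: recruit one randomly chosen
    contact per respondent, forming a single referral chain."""
    rng = rng or random.Random(0)
    chain = [seed_node]
    current = seed_node
    for _ in range(steps):
        contacts = list(ties.get(current, ()))
        if not contacts:               # dead end: no further referral
            break
        current = rng.choice(contacts)  # the random pick is what supports
        chain.append(current)           # probability-based inference
    return chain
```

The random selection at each step is the "element of randomness" noted above: over a long chain, the probability of reaching any given person becomes calculable from the tie structure, which is the property respondent-driven sampling exploits.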
Respondent-driven sampling (RDS) methods were established largely in response to the criticism that link-tracing designs are essentially a form of convenience sampling and that biases associated with them prohibit any claims of representativeness (Heckathorn 1997). The key feature of RDS, which is similar to chain referral or random walk designs, is that it is implemented in a way that allows for the calculation of selection probabilities. That is, seeds in the initial wave become limited recruiters in the next sampling stage. Specifically, each participant is given two to four coupons to enlist more participants under the following requirements: (1) documentation of who recruited whom; (2) rationing of coupons (three per seed is recommended); (3) information on personal networks is gathered and reported; and (4) recruiters and recruits must know one another through a preexisting relationship. When these conditions are met, long recruitment chains are likely.
The principal advantage of RDS is that Markov chain theory can be used to derive population estimates and sampling weights. Contrary to conventional wisdom, this allows bias from the convenience sample of initial subjects to be progressively attenuated at a geometric rate as the sample expands across waves. The final estimates from these procedures are asymptotically unbiased (see Heckathorn 1997, 2002; Salganik & Heckathorn 2004). The implication is that this sampling method could potentially become reliable when the number of waves is sufficiently large, even if the initial sample is not random. However, in cases where different groups (e.g., racial or ethnic groups) are sufficiently segregated (referred to as the inbreeding terms), the number of waves has to be large for each group. That is, when boundaries separating groups are impermeable, RDS should be used to draw samples from within, not across, groups. Additional and more detailed information about RDS designs and the calculation of estimates and sample weights is available (Salganik & Heckathorn 2004; Volz & Heckathorn 2008; Gile & Handcock 2010).
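The degree-weighting logic behind these population estimates can be shown in miniature. The sketch below follows the spirit of the Volz-Heckathorn (RDS-II) estimator: because a recruitment chain reaches well-connected people more often, each respondent is down-weighted by their reported personal network size. This is a simplified illustration, not the full estimation machinery described in the references above:

```python
def rds_proportion(sample):
    """Degree-weighted proportion estimate (RDS-II style sketch).

    `sample` is a list of (has_trait, degree) pairs, where degree is the
    respondent's reported personal network size. Each respondent counts
    with weight 1/degree, offsetting the oversampling of high-degree
    people by the recruitment chains."""
    total = sum(1.0 / degree for _, degree in sample)
    trait = sum(1.0 / degree for has_trait, degree in sample if has_trait)
    return trait / total
```

For instance, if the one trait-positive respondent in a sample of three reports twice the network size of the others, the naive proportion of 1/3 is corrected downward, reflecting that large-network respondents were easier to reach.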
3.2 Data Collection Modes for Ego Network Surveys
Following sampling decisions, another important consideration for network researchers using survey designs is the mode of data collection. Decisions about mode of survey administration have nontrivial consequences for data quality in any research study. However, the complex steps and procedures required to collect data on ego networks using surveys likely exacerbate the consequences of different administration modes.
Egocentric data collection through surveys is a high-burden task requiring multiple steps, including tools for generating a list of alter names (i.e., name generators), a series of questions designed to elicit information about each alter named (i.e., name interpreters), and another tool for gathering information about ties between alters (i.e., density, or adjacency, matrix). The name generating process carries a particularly high cognitive burden for respondents, who are asked to process difficult and subjective questions (e.g., Who are you close to? Who do you talk to about important matters?; Bearman & Parigi 2004). Respondents must then scan a large cognitive network – upwards of 300 meaningful social ties or 1,500 acquaintances (Freeman & Thompson 1989; Killworth et al. 1990; McCarty et al. 2001) – for alters that meet the generator criteria, and return these to the interviewer. The name interpretation and matrix tasks are also burdensome because they are time-consuming and repetitive, potentially leading to cognitive shortcutting (e.g., providing the same response for successive pairs without processing the question). In general, the more cognitively complex and burdensome the survey instrument, the higher the stakes associated with methods for collecting the data.
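The three survey components named above map onto a simple data structure for a single ego. In this hedged sketch the alter names, the "kin" interpreter, and the reported ties are all invented; it also shows why the matrix task grows so quickly, since every pair of alters must be asked about:

```python
from itertools import combinations

alters = ["A", "B", "C", "D"]                                 # name generator output
interpreters = {a: {"kin": a in ("A", "B")} for a in alters}  # name interpreters
reported = {("A", "B"), ("B", "C")}                           # density matrix responses

def density(alters, ties):
    """Share of possible alter-alter pairs reported as connected.
    Four alters already imply six pair questions for the respondent."""
    pairs = list(combinations(alters, 2))
    linked = sum(1 for p in pairs if p in ties or p[::-1] in ties)
    return linked / len(pairs)
```

With n alters the matrix task requires n(n-1)/2 judgments, which is one concrete source of the repetition and cognitive shortcutting discussed above.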
Prior to discussing the advantages and disadvantages of face-to-face, telephone, and online administration modes, describing some of the most common problems affecting data quality in survey research is in order. Mode of administration affects the following conditions:
Social desirability bias is the tendency for respondents to provide inaccurate answers to maintain or achieve a favorable impression of themselves. It may involve underreporting sensitive or negative behaviors and opinions (e.g., drug use) or over-reporting positive or prosocial ones (e.g., volunteering), both with respect to egos themselves and their alters. Because respondents are likely to want to appear well liked, socially active, and supported, social desirability can be an important source of bias in egocentric research.
Satisficing refers to behaviors that reduce the cognitive burden of completing surveys, but adversely affect response quality. The respondent must carefully read or listen to the question, interpret its meaning, retrieve information from memory, and formulate and correctly return a summary judgment on the basis of that memory (Krosnick 1999). Many respondents are unable or unwilling to exert this level of effort for the entire duration of a survey interview. Consequently, respondents often engage in various satisficing practices to complete the interview while reducing their cognitive burden, including executing all the cognitive steps required, but with less effort, or shortcutting some of the cognitive steps and providing answers that are inaccurate but defensible. Satisficing is more common when the interview task is difficult, and is therefore highly relevant to ego network research. Examples include nonresponse and “don’t know” responses, or nondifferentiation (e.g., providing the same response to a name interpreter for all alters).
Interviewer effects occur when variations in the way that surveys are administered systematically influence responses. Some interviewers are more enthusiastic, provide positive reinforcement, and prompt for valid responses. Others can seem bored or disengaged, move through the interview too quickly, and can even encourage satisficing or truncating the number of alters given (Marsden 2003; Eagle & Proeschold-Bell 2015). These differences in interviewers’ behaviors or attitudes can result in substantial variation in response quality. Interviewer effects are of particular concern when a multiple name generator approach is used or when the instrument is especially long or complex because both the interviewer and the respondent can become burned out and seek cognitive shortcuts.
Face-to-face, telephone, and online administration of egocentric network surveys have distinct advantages and drawbacks with respect to these potential problems.
3.2.1 Face-to-Face Administration
Because of the complexity and high burden on respondents associated with egocentric network surveys, face-to-face interviewing is the preferred and most common administration mode. Most of the classic ego network data were collected through face-to-face interviews. Barry Wellman’s East York study of community ties in Toronto, Canada was conducted in 1968 using face-to-face administration of paper surveys with 845 adults, and later in-depth interviews with a subset of the initial probability sample. A decade later, Fischer’s Northern California Community Study (Fischer 1977a), conducted in fifty communities in and around San Francisco, used face-to-face interviews to collect extensive ego network data from 1,050 individuals, recording responses on paper. More recently, studies such as the National Longitudinal Study of Adolescent to Adult Health (Add Health) have used face-to-face administration for the in-home interview, despite the availability and efficiency of web-based administration (Harris et al. 2009). The nationally representative Add Health study began in 1994 and is now entering its fifth wave of data collection. Most questions are read aloud in person by the interviewer, including the ego network component. However, to acquire data about drug use, sexual behavior, and other sensitive topics, there are computer-assisted self-administered components. Multi-mode administration is ideal for long and complicated instruments with sensitive questions that are likely to cause social desirability bias. Though large-scale population or community studies such as these have produced invaluable insights into social behavior, they are extremely expensive and complicated to implement, often requiring collaboration between multiple academic and government institutions (Golder & Macy 2014).
Face-to-face administration continues to be the gold standard for ego network research because it is less cognitively challenging for the respondent than telephone or self-administered (e.g., web-based) surveys (Bowling 2005), easing the burden of egocentric data collection on respondents. Face-to-face administration requires only that the respondent use basic verbal and listening skills. While telephone interviews require the same skills, they have a greater auditory burden since nonverbal cues are unreadable. Moreover, respondents must remember response options since hand cards are unavailable and there is no other written text to which they can refer (Christian, Dillman, & Smyth 2008). Self-administered surveys, however, are the most burdensome (Bowling 2005) because the respondent must use visual, reading, and numeric skills, and must also rely on writing and keyboard skills (e.g., during the name generator task or open-ended responses), depending on whether the survey is pen-and-paper or web-based.
Interviews that are administered face-to-face, involving a back and forth between interviewer and respondent, offer a number of other important benefits. It is a significant advantage to have interviewers on hand to give directions and answer questions (Vehovar et al. 2008). Face-to-face interviews also perform better than telephone and online surveys with regard to satisficing because of the reduced social distance between the interviewer and respondent. Drawing on three national mode experiments, Holbrook, Green, and Krosnick (2003) compared levels of satisficing in face-to-face and telephone interviews. They measured satisficing using three outcomes: choosing a no-opinion response (e.g., “don’t know”), failing to differentiate between multiple items in a response scale, and acquiescing toward agreement regardless of content. They found that respondents were significantly less likely to engage in these satisficing behaviors in face-to-face interviews compared to telephone interviews. Research suggests that interviewers develop rapport with respondents in face-to-face interviews, but do so to a lesser degree in phone interviews and not at all in online surveys (Drolet & Morris 2000). Because this rapport leads to better cooperation on complex tasks and respondent investment in the data collection process, satisficing is less likely to occur. Also, interviewers can pick up on nonverbal cues that respondents are losing focus or interest, and can respond with positive reinforcement or other techniques for drawing them back in (Shuy 2002). Finally, relative to web-based surveys, face-to-face (and telephone) surveys provide opportunities for probing no-opinion and nonresponse (Heerwegh 2009).
Of course, face-to-face data collection has disadvantages as well. Most notably, it is significantly more expensive and time-consuming than other modes of survey administration. Moreover, egocentric studies have been shown to be vulnerable to interviewer effects (Marsden 2003), which are eliminated when self-administered surveys are used. Face-to-face data collection is also more susceptible to social desirability bias compared to other modes, particularly when obtaining sensitive data (Tourangeau & Yan 2007). Consequently, network researchers have sought alternatives to face-to-face data collection, and conducted evaluations of the reliability and validity of data collected through other modes.
3.2.2 Telephone Administration
Telephone-administered interviews have been a popular mode of data collection for many decades (Holbrook et al. 2003). This mode is appealing to network researchers because it shares some of the advantages of face-to-face data collection (e.g., presence of an interviewer to answer questions and provide feedback), but at a substantially lower cost. For example, the Social Capital-USA (SC-USA) survey was collected via random digit dialed telephone interviews in 2004, with companion studies conducted in China and Taiwan (Lin, Fu, & Chen 2013). This national study of 3,000 currently or previously employed adults in the United States used position generators to evaluate access to social capital through egocentric networks. These data have been leveraged to answer important questions about network-based social capital, explaining the relative advantages of white men over women and other racial/ethnic groups in the labor market (McDonald, Lin, & Ao 2009; McDonald 2011). The SC-USA instrument was narrow in focus and fairly brief, taking an average of only thirty-five minutes to complete. Moreover, because the position generator is less complex and burdensome to administer than name generators (see Chapter 5), this instrument was ideally suited to telephone administration.
The principal advantage of telephone administration relative to the face-to-face mode is cost and efficiency. It is possible to complete a larger number of surveys more quickly and with less administration. It is easier to supervise interviewers when they are working in one location, leading to better standardization and quality control capabilities, and reducing interviewer effects. Additionally, surveys administered over the phone may be less susceptible to social desirability bias since there is more social distance between the interviewer and respondent (Kogovšek & Ferligoj 2005a).
However, telephone administration has a number of serious drawbacks. Random digit dialing – a feature that is critical to the efficiency of telephone interviewing – faces technology challenges (Kempf & Remington 2007). These include voicemail, caller ID, call blocking, and exclusive use of cell phones, which threaten population coverage, response rates, and participation rates. In particular, response rates for telephone surveys are very low (~50 percent, on average), and have continued to trend downward since the mid-1970s (Curtin, Presser, & Singer 2005). Finally, respondents being interviewed by phone could be engaged in various other tasks simultaneous to the interview (Holbrook et al. 2003). Multitasking siphons cognitive resources away from the interview task, threatening the quality of the data produced. Further, the advent of mobile technology means that area codes cannot be used to select respondents from a geographical area nor to geocode the respondent residence. This problem negates a major early advantage of telephone administration and requires that place data be added to the survey instrument.
Despite the challenges of telephone interviewing, this mode produces moderately reliable and valid ego network data. Kogovšek and colleagues (2002) compared the reliability and validity of egocentric data collected using face-to-face and telephone interviews. Contrary to their hypotheses, the authors concluded that the telephone administration mode with all interpreter questions asked by alter (i.e., rather than asking one question about every alter in turn) yielded better validity estimates than the face-to-face interviews. The authors speculated that these patterns are attributable to the sensitive nature of questions about personal relationships, arguing that respondents might feel more comfortable answering such questions over the phone. Additionally, for test-retest reliability, performance of telephone relative to face-to-face modes varied depending on the specific measure. For more cognitively demanding and objective name interpreters (e.g., frequency of contact), face-to-face administration produced better reliability, while telephone interviews performed better for subjective measures that could be answered quickly (e.g., closeness). These comparisons regarding validity and test-retest reliability were later confirmed by the same research group using a meta-analysis (Kogovšek & Ferligoj 2005).
Research suggests that telephone administration of ego network surveys also produces fewer errors than web-based administration. Kogovšek (2006) compared web and telephone administration of an ego network module that included three name generators and nine name interpreters. She concluded that the test-retest reliability of network measurement is better when data are collected via the telephone than via web-based surveys. She attributed her findings to the presence of an interviewer answering questions and setting the pace of survey administration, reducing the likelihood of errors and satisficing.
In all, these studies indicate that telephone administration of ego network surveys may continue to be an acceptable replacement for face-to-face administration. However, as technology rapidly changes, researchers should give careful consideration to the use of traditional telephone approaches to data collection. It is not known how telephone administration would compare to the face-to-face mode when the social network module is only one component of a longer instrument, as is typically the case, or when the network module is particularly lengthy and complex. In these cases, face-to-face is likely to outperform telephone administration (Holbrook et al. 2003). Given these contingencies and the broader challenges associated with telephone interviewing (Bowling 2005), face-to-face remains the ideal mode when there are sufficient resources and time for an in-person interview.
3.2.3 Web-Based Administration
With an ever-increasing proportion of people gaining access to the Internet globally, many researchers are turning to web-based survey designs (Fricker & Schonlau 2002). The Internet has been a transformative development in social science research, with substantial advantages associated with greater economies of scale (Golder & Macy 2014). The ability to obtain very large samples quickly and at relatively low cost is facilitated by companies providing access to web panels of survey participants. Some of these panels are opt-in, such as Amazon’s Mechanical Turk, which provides an online global labor market for self-selected respondents paid per survey (Buhrmester, Kwang, & Gosling 2011). Others, such as the Gallup panel, recruit individuals through random digit dialing or address-based sampling to achieve a nationally representative pool of respondents who are not compensated for participation (Rao, Kaminska, & McCutcheon 2010). Researchers can then pay to access these panels. For example, GfK’s KnowledgePanel® meets quality standards for federal and peer-reviewed studies; it is built from a complete list of US households, and respondents in selected households without web-enabled technology receive equipment and a service connection, while others receive monetary compensation. TESS, funded by the National Science Foundation, is a competitive, no-cost option open to graduate students, postdocs, and faculty for survey-based social science research using the GfK panel.
Recently, researchers using ego network designs have begun to leverage such web panels. O’Malley and coauthors (2012) developed a web-based survey instrument for collecting information on up to eight alters using name generators adapted from the General Social Survey and the National Social Life, Health, and Aging Project (Burt 1984; Laumann et al. 1995). They used a longitudinal, probability-based Gallup panel of households in the United States to access 6,000 randomly selected potential respondents. A total of 3,232 respondents agreed to participate (a 54 percent response rate), completing name generator, name interpreter, and adjacency matrix tasks, in addition to providing information on health and social behavior. Analysis of these data yielded important insights, suggesting that larger networks are associated both with engagement in prosocial behaviors and with better health status.
Web-based surveys have clear advantages, most notably the low cost and speed of data collection. They can be completed with little administration and at a fraction of the cost of face-to-face interviews. Moreover, web-based surveys eliminate the problem of interviewer effects (Manfreda, Vehovar, & Hlebec 2004), which can be particularly pronounced in ego network research because interviewers and respondents alike experience fatigue (Eagle & Proeschold-Bell 2015). This can lead interviewers to speed up the pace of the interview (compromising comprehension and cognitive processing) and to consciously or unconsciously give cues that truncate the number of alters named (Marsden 2003). Finally, web-based surveys are less vulnerable to social desirability bias than interviewer-administered surveys. Internet administration increases reporting of sensitive information, as well as reporting accuracy, relative to telephone and face-to-face administration (Duffy et al. 2005; Tourangeau & Yan 2007; Kreuter, Presser, & Tourangeau 2008).
Nonetheless, web-based surveys have real disadvantages for egocentric designs. Most problematic is that there is no interviewer to guide the respondent through the complicated data collection task. Likewise, it is impossible to control the environment in which respondents complete web-based surveys, leading to errors associated with distractions, outside sources of information, or multitasking (e.g., checking Facebook while completing a survey; Rand 2012). Some respondents in web panels click randomly to efficiently complete a task – an extreme form of satisficing – necessitating precautions to identify and correct for these behaviors (Golder & Macy 2014). Moreover, missing data are a significant issue, with web-based surveys yielding high rates of nonresponse, “don’t know” responses, and incomplete interviews (Heerwegh 2009). These can be related to technical issues that arise during web-based surveys (e.g., incompatibility among different browsers, software versions, and Internet settings). However, incompletion is more often due to respondent burnout, which is correlated with network size (Manfreda et al. 2004; Vehovar et al. 2008). Thus, when collecting data on larger networks online, there are likely to be biases caused by egos with larger networks abandoning the survey during the name interpreter or adjacency matrix tasks.
Matzat and Snijders (2010) conducted two studies comparing egocentric data collection via the web and face-to-face interviews. They found that respondents answered more “mechanically” in the online survey, and that this compromised the quality of network data. Specifically, there were high dropout rates during the network instrument in the web-based survey – 27 percent in the online mode compared to only 7 percent offline. Moreover, satisficing behaviors were more common in the web-based survey. There were more missing data in the name interpreter section, and respondents were significantly more likely to engage in nondifferentiation in the web-based survey relative to face-to-face administration. Mode also affected network measures, with web-based surveys yielding smaller network size and larger density because respondents provided fewer names and tended to report ties among all alters. Overall, Matzat and Snijders (2010) concluded that traditional network tools (e.g., the name generator, name interpreters) are not effective in an online format, and that alternative techniques for collecting egocentric data using web-based surveys are needed to hold respondents’ attention and motivate them to exert cognitive effort.
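The size-density pattern Matzat and Snijders describe follows directly from how ego network density is computed: the number of observed alter-alter ties divided by the number of possible pairs. A minimal sketch (the respondents and alter names below are hypothetical) shows why naming fewer alters and marking every pair as tied drives density toward its maximum:

```python
from itertools import combinations

def ego_network_density(alters, ties):
    """Density of an ego network: observed alter-alter ties divided by
    the number of possible alter pairs (ego-alter ties are excluded)."""
    n = len(alters)
    if n < 2:
        return 0.0
    possible = n * (n - 1) / 2
    observed = sum(1 for pair in combinations(alters, 2)
                   if frozenset(pair) in ties)
    return observed / possible

# A hypothetical satisficing web respondent: few alters, every pair marked tied.
web_alters = ["A", "B", "C"]
web_ties = {frozenset(p) for p in combinations(web_alters, 2)}

# A hypothetical face-to-face respondent: more alters, sparser ties.
ftf_alters = ["A", "B", "C", "D", "E", "F"]
ftf_ties = {frozenset(("A", "B")), frozenset(("B", "C")), frozenset(("D", "E"))}

print(ego_network_density(web_alters, web_ties))  # 1.0
print(ego_network_density(ftf_alters, ftf_ties))  # 0.2
```

In other words, the two satisficing behaviors compound: truncated name generation shrinks the denominator while nondifferentiated tie reports inflate the numerator.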
Making the most of web-based surveys for egocentric research. While face-to-face interviewing is the preferred method for ego network research, it is not always possible to use this mode of administration. Researchers have recently concluded that reasonably reliable and valid network data can be generated using self-administered surveys via the Internet (Coromina & Coenders 2006; Kogovšek 2006; Vehovar et al. 2008). In response to the concerns described above, web-based surveys for collecting ego network data have begun to incorporate rich visual components and response methods designed to be more interactive and to keep respondents engaged (e.g., Hogan et al. 2016). Examples of software for web-based collection of egocentric data include EgoWeb 2.0, OpenEddi, Social Mirror, and netCanvas. Other efforts, described below, have focused on improving web-based collection of ego network data to obtain the best possible results.
Published evaluations of web-based surveys for egocentric data collection suggest that it is critical to provide detailed and specific instructions for the name generator task (Manfreda et al. 2004). Respondents to web-based surveys are especially prone to errors in this section of the network instrument. Common mistakes include alluding to multiple alters in one space (e.g., “parents” or “family”), typing multiple names in one box, or providing the same name multiple times. Such errors are relatively common (e.g., about eight percent in one study; Vehovar et al. 2008), and are difficult and time-consuming to correct at the data cleaning stage, but can be minimized with clear and effective instructions.
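The error types listed above are mechanical enough that many can be flagged automatically before manual cleaning. The following is a rough sketch, not a production cleaning routine: the entry list, the keyword list of collective labels, and the separator pattern are all illustrative assumptions that a real project would need to adapt to its own instrument and language.

```python
import re

# Hypothetical raw name-generator entries illustrating common web-survey errors.
raw_entries = ["John", "Mary", "my parents", "John ", "Ann and Sue", ""]

def flag_entries(entries):
    """Flag entries needing manual review: collective labels (e.g., 'parents'),
    multiple names in one box, duplicates, and blanks. The collective-label
    vocabulary and separator regex are illustrative, not exhaustive."""
    collective = {"parents", "family", "friends", "coworkers"}
    seen, flags = set(), []
    for entry in entries:
        name = entry.strip().lower()
        if not name:
            flags.append((entry, "blank"))
        elif any(word in collective for word in name.split()):
            flags.append((entry, "collective label"))
        elif re.search(r"\band\b|,|&", name):
            flags.append((entry, "multiple names in one box"))
        elif name in seen:
            flags.append((entry, "duplicate"))
        else:
            seen.add(name)
            flags.append((entry, "ok"))
    return flags

for entry, flag in flag_entries(raw_entries):
    print(repr(entry), "->", flag)
```

Flagging at entry time (e.g., as client-side validation) rather than at the cleaning stage is preferable, since only the respondent can resolve the ambiguity.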
Other research indicates that the number of empty boxes presented in the name generator task influences the number of alters named (Manfreda et al. 2004). For example, a nontrivial percentage of respondents will fill all the boxes because they believe this is what the researcher wants. Manfreda and colleagues (2004) found that when they supplied thirty spaces for names of alters from whom respondents could borrow things, 15 percent of respondents named thirty alters, though it would be highly unusual to have a lending network of this size. Vehovar and colleagues (2008) confirmed the strong relationship between number of boxes and network size by randomizing respondents to different conditions. Thus, use of a graphical interface for adding alters may be preferable to using empty boxes for obtaining alter names in a web-based survey. Alternatively, some suggest imposing a limit on the number of alters that can be named in web-based surveys (Manfreda et al. 2004; Gerich & Lehner 2006), but this solution has its own drawbacks, discussed in Chapter 4. In short, serious consideration must be given to the formatting of any data collection instrument that is not face-to-face.
Additionally, the number of alters named by a respondent is strongly associated with the likelihood of missing data on the name interpreter task. Manfreda and colleagues (2004) found that respondents providing five or fewer alters mostly completed the name interpreter section, but nonresponse increased thereafter with each additional alter, as shown in Figure 3.1 (see also Vehovar et al. 2008). It is critical, therefore, to increase motivation to complete the name interpreter section of the interview, particularly for those with large networks. Alternatively, imposing a limit on the number of alters ego can name also reduces this problem (Manfreda et al. 2004; Gerich & Lehner 2006).
Figure 3.1 Name interpreter completion as a function of number of alters named in web-based ego network surveys (grayscale)
Note: From Manfreda et al. (2004)
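The nonresponse pattern in Figure 3.1 reflects how quickly respondent burden grows with each alter named: name interpreter items grow linearly with network size, while the alter-alter tie (adjacency matrix) task grows with the number of pairs. A back-of-the-envelope sketch, assuming for illustration a module with nine name interpreters (the count in the instrument Kogovšek (2006) tested):

```python
def interview_burden(n_alters, n_interpreters):
    """Approximate item counts for an ego network module: name interpreter
    items scale linearly with the number of alters, while the alter-alter
    tie task scales with the number of pairs, n*(n-1)/2."""
    interpreter_items = n_alters * n_interpreters
    tie_items = n_alters * (n_alters - 1) // 2
    return interpreter_items, tie_items

# Burden grows quickly: (interpreter items, tie items) for growing networks.
for n in (5, 10, 20):
    print(n, interview_burden(n, 9))  # 5 -> (45, 10); 10 -> (90, 45); 20 -> (180, 190)
```

A respondent naming twenty alters thus faces roughly 370 items in the network module alone, which helps explain why dropout concentrates among egos with large networks.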
The organization of the name interpreter task also has important implications for data quality. Name interpreters can be administered by question or by alter. A meta-analysis of three network studies found that for web-based surveys, overall reliability and validity are stronger when responses are elicited by question (Coromina & Coenders 2006), contrary to findings for telephone administration of ego network tools (Kogovšek et al. 2002). Vehovar and colleagues (2008) found that dropout and nonresponse rates were significantly higher for the alter-wise than for the question-wise format for name interpreters, suggesting that responding by question is less cognitively burdensome.
Finally, some visual features of web-based surveys, such as presenting multiple items in the form of grids, are not conducive to holding respondents’ attention, and may reduce the level of cognitive processing (Tourangeau, Couper, & Conrad 2004). This design format has been associated with increased item intercorrelation and nondifferentiation (i.e., not using all of the response categories available), as when respondents click mechanically without thinking about their answers. Because grids are often used for name interpreters and evaluation of alter-alter ties in online surveys, they threaten the quality of these aspects of egocentric data collection.
In contrast, network visualization can be used to improve the survey experience and enhance data quality (Coromina & Coenders 2006). For example, Tubaro, Casilli, and Mounier (2013) developed a graphic interface that allows respondents to draw their networks directly, reducing the amount of time required to complete the name generator tool and some name interpreters (see Figure 3.2). Similarly, alter-alter ties can be elicited by having respondents draw lines between nodes containing alters’ names (Fagan & Eddens 2014). These strategies reduce the length of the interview and the tedium of adjacency matrix tasks that ask about every pair of alters (see Chapter 6). In addition to being more efficient and engaging, such tools provide the immediate gratification of a holistic view of respondents’ networks, creating motivation to complete the network tasks. These tools are discussed in greater depth in Chapter 11.
Figure 3.2 Online participant-aided sociogram for collecting egocentric network data (grayscale)
Notes: Buttons at right are used to add and move alters, create groups, designate alter attributes, and connect alters to one another; from Tubaro et al. (2013).
3.3 Observational and Archival Methods for Egocentric Data Collection
Ego network data can also be generated using observational and archival methods, though these are used much less frequently than survey methods. Early anthropological work employed social network concepts and methods to synthesize complex patterns of relationships observed directly in the course of ethnographic field work (Barnes 1954; Mitchell 1969), constituting some of the foundational contributions to a social network perspective. In addition, archives have long been used to generate network data, and are becoming particularly valuable in the age of social media. The Internet constitutes a vast resource of publicly available electronic archives – such as Facebook or Twitter interactions – that are being leveraged to answer important research questions.
3.3.1 Generating Ego Networks through Observation
Social network data can be obtained by directly observing relationships or interactions between individuals, either as an unobtrusive observer or as a participant, usually for a substantial period of time. This approach can generate reliable data that are unaffected by self-reporting, though the data reflect the observer’s interpretation rather than respondents’ own perceptions, as in the survey format. The observed ties between actors, as well as characteristics of ties or individuals, can then be entered into conventional statistical software for analysis and visualization. Often, observational network data are enhanced with qualitative insight into the meanings or motivations of network structure or dynamics, including why ties form or dissolve, and how alters influence egos’ attitudes, beliefs, or behaviors (Bellotti 2014).
An early example of the use of observation to generate network data comes from William Foote Whyte’s Street Corner Society (1943). In this three-year ethnography, Whyte shadowed one man (“Doc”) through his everyday life in a Boston neighborhood that was home to working-class first- and second-generation Italian immigrants. In addition to mapping Doc’s egocentric network by recording his interactions (see Figure 3.3), Whyte eventually expanded his observations to include a number of Doc’s alters’ and alters’ alters’ interactions, constructing the network of the entire slum. As the leader of a group of criminally involved corner boys, and a bridge to other gangs and subgroups, Doc provided unique insight into the politics and organization of social life in a low-income urban ethnic community. More recently, Uzzi (1997) and Mische (2008) employed similar methods to construct the ego networks of women’s garment firms in the New York City apparel industry, and networks of youth activists engaging in political participation in Brazil, respectively.
Figure 3.3 Doc’s egocentric network, as observed by Whyte (1943) (grayscale)
The principal advantage of observational methods is the naturalness of the data, which are not subject to many of the errors and biases inherent in survey research (e.g., social desirability, satisficing, interviewer effects). In short, actual interactions are charted in real time and do not rely on people’s descriptions or recollections (Marshall & Rossman 1995). Consequently, network data collected through observation may be more accurate and, because they are collected in natural settings, provide unique insight into the context and meaning of network ties or exchanges between network members (Bellotti 2014). In many cases, researchers gain valuable but unanticipated information from observations that would not have been captured by a survey.
However, ethnographic observation of social networks also has limitations. Only a small number of networks can be observed or recorded, because doing so is extremely time-consuming. Further, bias is difficult to assess, since it is not possible to measure how the observer might have changed subjects’ behaviors or interactions. This is particularly true when participant observation is employed and the researcher is an identified or concealed member of the networks being observed (Merriam & Tisdell 2015). Likewise, researchers’ observations may be biased by their own immersion in a particular community or culture, since the researcher must record and interpret what is observed. Finally, observations are limited to interactions – we cannot directly observe relational attitudes, such as who trusts whom.
3.3.2 Obtaining Ego Networks through Archival Research
The term “archival data” refers to any information that existed prior to a research project and was not produced for the purposes of research. Archival data take many forms, including public records, legal documents, memoirs, and audio or video recordings. More recently, email chains and social media exchanges have been used to generate network data. Though not produced by the researcher, archival data are still open to subjectivity, since the researcher chooses which documents or information to include and which to exclude (Brettell 1998).
Archives have primarily been used to generate sociocentric network data. A classic example is Padgett and Ansell’s (1993) study of 15th-century Florentine families. Using archival data compiled by historians, including tax reports, employment data, and property records, the authors coded nine types of relationships between powerful families (e.g., intermarriage, business ties, political relations). Using these data, Padgett and Ansell explained the Medici family’s rise to political power, documenting how the Medici exploited a structural position spanning structural holes during a historical period of uncertainty. Another classic dataset generated using archival sources is the Southern women affiliation network. These data were constructed from an ethnography (Davis, Gardner, & Gardner 1941) of social stratification in 1930s Natchez, Mississippi. The study included information about the coattendance of eighteen Southern women at fourteen social events, ascertained via newspapers, guest lists, and interviews. These data have been used to develop new network measures and test theories about the structure of small groups and the identification of group members as core or peripheral (Freeman 2003).
Ego network data have infrequently been obtained using historical archives, but research by Edwards and Crossley (2009) provides a recent example. Drawing on speeches, newspaper articles, and personal written correspondence, the researchers constructed the egocentric social network of the militant British suffragette Helen Watts between 1909 and 1914 (see Figure 3.4). Through a mixed-methods approach that combined formal structural analysis with insights from the content of written correspondence, the researchers identified alters who were key figures in Watts’ radicalization, as well as alters who became involved in the suffragette movement as a result of their contact with Watts. Thus, they were able to demonstrate that activism is both a consequence of social network contacts and a cause of social network dynamics.
Figure 3.4 Egocentric social network of suffragette Helen Watts, produced using archival data (grayscale)
Note: From Edwards and Crossley (2009)
Archival methods have a number of advantages. Like data generated through observation, archival data are not subject to self-reporting bias. From a practical standpoint, archival research can be very cost-effective, since no primary data collection is required. Archival research also provides opportunities to address unique empirical questions that cannot be examined any other way. For example, archives make it possible to explore historical events or periods from a network perspective, and to trace change (or stability) over time in how personal networks are structured or how they function. A researcher can also more easily study sensitive or inflammatory topics and hidden populations using archival data (e.g., what kinds of people joined the Ku Klux Klan; Fryer & Levitt 2012), since individuals are unlikely to report on these activities in a survey and may be difficult to access.
Yet, there are limitations associated with archival methods for egocentric research. As with observational methods, the time-consuming and laborious nature of gathering and coding archival data often constrains the number of networks that can be examined. Consequently, archival research often constitutes a case study that is difficult to replicate. Moreover, since the data preexist the research project, key variables of interest may not be available, limiting the kinds of questions that can be addressed. Finally, because media reporting and the recording of even official data are conducted through human processes, they are subject to error and bias that must be considered (Gitlin 1980).
3.3.3 Archival Social Media Data
In the past decade, online social networks have been used to generate sociocentric and egocentric network data using, for example, contacts or exchanges on Facebook or Twitter. “Web scraping” constitutes a form of archival data collection because the researcher obtains relational data after the fact (even if very soon after) that were generated for a purpose other than research (i.e., socialization or information sharing online). Software for obtaining social network data online has been developed and used successfully in social science research. Such packages include NodeXL, for scraping, analyzing, and visualizing various types of social media data in network form, and NameGenWeb, for downloading Facebook friendship relations as egocentric networks (Hogan 2010; see detailed descriptions in Hansen, Shneiderman, & Smith 2010).
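Once relational records have been scraped, assembling them into an egocentric network is conceptually simple: treat each recorded interaction as an edge, then extract the subgraph around a focal actor. The sketch below uses the third-party networkx library with entirely hypothetical interaction records; the tools named above provide their own pipelines, and this only illustrates the underlying data structure.

```python
import networkx as nx

# Hypothetical scraped interaction records (e.g., who replied to whom),
# standing in for data exported by tools such as NodeXL or NameGenWeb.
interactions = [
    ("ego", "alice"), ("ego", "bob"), ("ego", "carol"),
    ("alice", "bob"), ("carol", "dave"),
]

G = nx.Graph(interactions)

# Extract the egocentric network: the focal actor plus alters one step away.
# Note that dave, two steps from ego, is excluded at radius=1.
ego_net = nx.ego_graph(G, "ego", radius=1)
print(sorted(ego_net.nodes()))        # ['alice', 'bob', 'carol', 'ego']
print(round(nx.density(ego_net), 3))  # 0.667
```

The same edge-list representation scales from toy examples like this one to the very large scraped networks discussed below, which is part of what makes archival online data attractive.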
Brooks and colleagues (2014), for example, explored associations between the structure of Facebook friendship networks and perceptions of social capital in the online environment. They used a modified version of Hogan’s (2010) NameGenWeb to obtain online egocentric social network data from 238 university students. In addition to leveraging archival Facebook data, they collected self-report data from participants using a web-based survey. They found that Facebook networks are more compartmentalized than offline networks, and tend to be characterized by nonoverlapping cohesive clusters corresponding to different foci of activity (e.g., high school friends, college friends, family, coworkers). Consequently, relationships between structural properties of online ego networks and perceptions of Facebook social capital were the opposite of what would be expected based on findings from offline networks. In contrast, using similar methods and a much larger sample, Dunbar and colleagues (2015) found structural similarities between online networks on Facebook and Twitter and offline face-to-face networks. Specifically, they found that online personal communities have a layered core-periphery structure that mirrors offline network structure, and are similar with respect to the number of alters in each layer (e.g., 5, 15, 50, and 150) and the frequency of contact with those alters.
Online social networks have enormous potential for providing insight into natural interactions using web links, email threads, and other digital traces left behind by web users. Vast amounts of naturalistic data exist online and can be scraped directly from online sources. This allows researchers to efficiently generate or obtain big network datasets that would be impossible to construct using survey methodology. In addition, because individuals’ online networks through social media tend to be much larger than ego networks collected using name generators, they provide opportunities for examining structural properties that are not meaningful for smaller networks (Brooks et al. 2014).
However, online egocentric data have their challenges. When there is no respondent to report on the meaning of different ties or interactions, these may be misinterpreted. For example, what does “friendship” mean on Facebook (e.g., what resources does a Facebook friend provide?), and how does this vary from person to person (Lewis et al. 2008)? Moreover, online relationships typically involve less intensive contact and are easier to maintain than offline ones, and therefore may have unique costs and benefits. This raises questions about the generalizability of research on online social networks to relationships and interactions that occur in other contexts (Hogan 2008).
Additionally, web scraping raises ethical challenges that potentially violate the Common Rule and institutional review board (IRB) guidelines (US Department of Health & Human Services 1991). For example, it is difficult to obtain informed consent from individuals contributing data to Twitter or Facebook, and confidentiality or anonymity often cannot be guaranteed, even when sensitive information is being collected (Vitak, Shilton, & Ashktorab 2016). In an article aptly titled “But the data is already public …,” Zimmer (2010) recounts a series of projects in which AOL, Facebook, and other social media sites provided access to data for research purposes in which individuals and organizations were easily identifiable. This revelation has three important implications. First, it underscores the fragility of the presumed privacy of potential research subjects on the Internet. Second, it reveals gaps in the professional and institutional protections for human subjects: these breaches occurred despite researchers’ efforts to deidentify the data, after full review by the institution’s IRB, and in spite of the implementation of “terms of use” for secondary data users. Third, these cases demonstrate that the use of public network data is fraught with difficulties, even as debates continue, expectations change, and individuals increasingly demand and gain control over restrictions placed on their own data (Bos et al. 2009). This leads us to a discussion of ethical considerations in ego network research more broadly.
3.4 The Question of Ethics
Including human subjects in research raises important ethical issues. The problem for network science, as Charles Kadushin (2012) argued, is that social network research involving humans does not have a close fit with the origins and intentions of human subjects regulations. Specifically, the Belmont Report (National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research 1979) is out of date, directed toward medical interventions and deceptive research, and fundamentally out of line with the premise of network research. On the latter point, Kadushin argued that “the collection of names of either individuals or social units is not incidental to the research but the very point” (2012: 188). Further, network science must now confront big data and computational social science, where disciplines that lack a history of contact with human subjects face novel ethical issues unlikely to have been considered previously. To date, there has been little agreement among researchers about fundamental ethical and scientific issues, or about human subjects protection relevant to network research. However, as Morris (2004) concludes, the implications of human subjects protections for network research are profound.
3.4.1 The Belmont Report
Despite some of the issues specific to network studies, the basic principles of The Belmont Report apply to any research that seeks to understand behavior and involves humans in any way. Klovdahl provides the essential definition of “human subject” from the Common Federal Rule: “a human subject means a living individual, about whom an investigator (whether professional or student) conducting research obtains: 1) data through intervention or interaction with the individual, or 2) identifiable private information” (Klovdahl 2005: 127). When this is the case, the following principles from The Belmont Report apply:
Respect for persons concerns the autonomy of all potential human subjects and the requirement to treat them with courtesy and respect. This includes the use of informed consent, which asserts that researchers must be wholly truthful about the goals, procedures, and potential consequences of a research project.
Beneficence refers to researchers’ obligation to do no harm. They must maximize the benefits of the research for the participants, science, and society as a whole. At the same time, they are required to anticipate and minimize to the extent possible any risks to those who participate in the research.
Justice reflects the requirement that research be administered using fair and reasonable procedures, and avoid exploitation of participants. It also ensures that both the costs and benefits of research for potential participants be distributed fairly and equally.
To these, Salganik (Reference Salganik2017) has recently added another principle based on social changes since The Belmont Report was released, and on recommendations from The Menlo Report (Dittrich & Kenneally Reference Dittrich and Kenneally2012).
Respect for Law and Public Interest asserts that researchers must understand the relevance of existing laws, terms of service and contracts, and comply with the rules set therein. In addition, in accordance with transparency-based accountability, researchers must be clear about and take responsibility for goals, methods, and results throughout the research process.
Principles of The Belmont Report have been translated into different standards in each country. However, regardless of how they are implemented in practice, adherence to these principles reflects the position that some research is unacceptable, no matter the potential to contribute to scientific understanding (see Kadushin Reference Kadushin2012 for a brief history of ethics violations in research).
3.4.2 Ethical Issues in Network Research
Focusing more narrowly on the ethics of social network studies, it is critical to have a general understanding of how the principles of The Belmont Report apply to network science research involving human subjects. However, the specific issues that arise during the course of planning and conducting network research are to some degree unique to each project, and to the standards of each institutional review board. That said, we provide ethical guidelines that distill Klovdahl’s (Reference Klovdahl2005) requirements for social network studies down to five points. As he specifically focused on the network spread of infectious disease, these statements are just a springboard for a fuller discussion of ethical issues in egocentric research.
First, Klovdahl (Reference Klovdahl2005) argued that no surgical, pharmaceutical, or other medical treatment can be withheld. To generalize this point, social network research cannot withhold any services or benefits to which individuals would normally have access. In research that involves any kind of intervention, a control group would have to meet the minimal standard of interventions that are typically provided (e.g., “treatment as usual,” standard education). This reflects the question of who benefits, as well as the mandate to do no harm. The issue of withholding benefits is becoming more relevant for the study of social networks, as researchers have begun developing and testing “network interventions” for the diffusion of beneficial information, innovations, or social norms through networks. This raises questions about how and when to offer “treatment as usual,” especially when the question at hand is whether the “treatment” ever reaches non-treated individuals or sectors of the network. Additionally, the question of who could potentially benefit from network interventions is sometimes ambiguous. For instance, outcomes may primarily benefit organizations rather than individuals (e.g., a study of the diffusion of innovations that increase the productivity of corporations), though an argument could be made that individual employees are indirectly affected.
Second, network research based on personal interviews with primary participants requires voluntary and informed consent (Klovdahl Reference Klovdahl2005). While full reviews by IRBs and the requirement to obtain informed consent are standard in human subjects research, these can be waived under certain circumstances. For studies that involve data collection on the personal networks of individuals, informed consent is always required for the ego respondent. However, an issue that is central to egocentric research, in particular, and network science, in general, is how to classify alters or other members of the network (Bos et al. Reference Bos, Karahalios, Musgrove-Chavez and Poole2009). More specifically, observation and proxy reporting are inherent in the process of asking ego respondents to provide information about alters. Klovdahl (Reference Klovdahl2005) and others concluded that alters are, in fact, human subjects under the Common Rule, in part because private information about them is likely to be collected. This particularly applies to objective attributes of alters, such as their HIV status, rather than to ego’s feelings about them. That said, the Common Rule allows a waiver of consent if four conditions are met: (1) risks are minimal, meaning no greater than those participants are exposed to in daily life or during routine physical and psychological examinations; (2) the research cannot practicably be conducted without the waiver or alteration of standard consent procedures; (3) waiving consent has no adverse effects on the egos’ or alters’ rights and welfare; and (4) any pertinent information about the study is provided to egos and alters later, if appropriate. Because these conditions typically apply to alters in ego network studies, especially when no identifying information about alters is collected (e.g., using first names only, or pseudonyms), most researchers agree that alters do not need to provide consent.
Moreover, as Klovdahl (Reference Klovdahl2005) noted, seeking consent from alters could actually weaken the confidentiality protection for ego respondents, creating risk where none existed before. An important exception is link-tracing sampling designs for gaining access to hidden populations, described above. In this case, ego respondents must approach alters, who then contact the researchers themselves and give consent to participate in the study. While this may seem stringent, and does affect the completion rate among alters, ethical considerations override these concerns, just as they do in accessing egos.
Third, Klovdahl (Reference Klovdahl2005) argued that effective means for the protection of the confidentiality of research data should be in place. Consistent with this, many existing secondary datasets that contain network data (e.g., Add Health) have specific data-handling protocols for public use, including requiring that analyses be conducted on computers that are not connected to the Internet or to a network. For original data collection, it is important that any researcher and staff involved in the study complete the necessary human subjects training, be included in the IRB application, and adhere to strict protocols regarding the sharing and storage of data and information about participants. Also, while network scientists often leverage visualization approaches for describing the spatial interrelationships among data points (Green et al. Reference Green, Hoppa, Young and Blanchard2003; Lee et al. Reference Lee, Li, Shi, Cheung and Thornton2006; Lee et al. Reference Lee, Park, Kay, Christakis, Oltvai and Barabasi2008), the presentation and publication of data with node labels has to be justified as not violating protections for confidentiality and anonymity. Well-known examples of identifiable actors in networks include historical figures like the Florentine families (Padgett & Ansell Reference Padgett and Ansell1993), countries (e.g., Alderson & Beckfield Reference Alderson and Beckfield2004), and known terrorists and terrorist organizations (Asal & Rethemeyer Reference Asal and Rethemeyer2015).
Fourth, and related to the above point, ego and alter data should be deidentified at the earliest possible date, and no data retained beyond the end of a project should contain information permitting the identification of any particular ego or alter (Klovdahl Reference Klovdahl2005). This condition applies, in particular, when full names of egos and alters are collected. Often, researchers must obtain names or other information to identify the same alters over time, or to identify alters that appear in multiple ego networks. In these cases, participants’ names must be converted to network node numbers as soon as possible. While attributes can continue to be associated with both egos and alters, it is important to consider whether an individual can be identified on the basis of those attributes, even after traditional identifying information has been removed. For example, if there is only one male Latino teacher in a small school, color coding or otherwise demarcating gender and race in the school’s faculty network violates common standards of anonymity. Moreover, if the research team involves members of an organization or community, they should not have access to the identified files or to preliminary analyses that identify individuals. Along these lines, some researchers have devised “data dusting” procedures that alter very small aspects of the data that will not influence analyses or findings, but which mask the identification of individuals (Prithiviraj & Porkodi Reference Prithiviraj and Porkodi2015).
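The two deidentification steps just described — replacing names with node numbers while preserving the identity of alters who reappear across networks or waves, and checking whether attribute combinations still single out an individual — can be sketched in a few lines of Python. The data structures here (`ego_networks` as a mapping of ego name to alter names, and `attributes` as a mapping of node to attribute dictionary) are illustrative assumptions, not a standard format; this is a minimal sketch of the logic, not a substitute for an approved data-handling protocol.

```python
from collections import Counter

def pseudonymize(ego_networks):
    """Replace ego and alter names with node numbers at the earliest possible point.

    `ego_networks` is an illustrative structure: a dict mapping each ego's
    name to a list of alter names. The same person receives the same node ID
    whether appearing as ego or as alter, so ties can still be traced across
    networks after the name-to-ID lookup table is secured or destroyed.
    """
    ids = {}  # name -> node ID; store securely, destroy when linkage is done

    def node_id(name):
        if name not in ids:
            ids[name] = len(ids)  # assign IDs in order of first appearance
        return ids[name]

    deidentified = {}
    for ego, alters in ego_networks.items():
        eid = node_id(ego)  # number the ego before their alters
        deidentified[eid] = [node_id(a) for a in alters]
    return deidentified, ids


def unique_attribute_combos(attributes):
    """Flag attribute combinations held by exactly one node, which could
    re-identify a person even after names are removed (e.g., the only
    male Latino teacher in a small school's faculty network)."""
    counts = Counter(tuple(sorted(attrs.items())) for attrs in attributes.values())
    return [dict(combo) for combo, n in counts.items() if n == 1]


networks = {"Ana": ["Bo", "Cy"], "Bo": ["Ana"]}
anon, lookup = pseudonymize(networks)
# "Bo" maps to the same node ID whether he appears as ego or as alter

faculty = {
    "t1": {"gender": "M", "race": "Latino"},
    "t2": {"gender": "F", "race": "Latino"},
    "t3": {"gender": "F", "race": "Latino"},
}
risky = unique_attribute_combos(faculty)
# only one male Latino teacher exists, so that combination is flagged
```

In practice, a check like `unique_attribute_combos` would be run before any publication or visualization that displays node attributes, with flagged combinations suppressed, coarsened, or perturbed along the lines of the “data dusting” procedures mentioned above.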
Fifth, Klovdahl (Reference Klovdahl2005) suggested that no identifying data should be shared outside the project without further IRB approval. In this contemporary era of open-sourced data, funding agencies often require that data be made available at the end of a project. This raises ethical issues that need to be considered at the start of the research project rather than as an afterthought. For many kinds of administrative data (e.g., health records), part of the “consent to treat” includes clauses about the future use of data for research purposes. This model – where participants are informed during the initial consent process that their deidentified data may at some point be shared with other researchers – might be appropriate for basic science research, as well. In addition, all of the other points regarding protections for human subjects described above do apply once secondary data are obtained for analysis. In some cases (e.g., applied research), data are proprietary or protections cannot be guaranteed. In such cases, the data may not meet the criteria for scientific publication, since they cannot be shared with other researchers who seek to replicate or challenge the findings. In network science, findings from data that cannot be shared have been published in the scientific literature, and the ethicality of this practice remains an open question.
In sum, since no human activity is risk free, researchers are required to assess and minimize risk, explain the risks that are present, and offer solutions in situations where risks become more than minimal (e.g., emergency protocols). It is likely that each research project brings unique concerns in terms of form and substance. In the end, the onus is on researchers to be clear about where their study lands regarding ethical parameters, but according to both Klovdahl (Reference Klovdahl2005) and Kadushin (Reference Kadushin2012), well-designed social network research should be able to meet the Common Rule. Going one step further, Salganik (Reference Salganik2017) sees the IRB regulations and approval process as a “floor,” not a “ceiling.” That said, cutting-edge research will likely always encounter unanswered ethical questions.
3.5 Conclusion
This chapter has provided an introduction to basic considerations for constructing probability and targeted samples of ego respondents, and for using link-tracing designs to access networks of hidden populations. Our goal has been to provide an overview of basic sampling considerations, necessitating the sacrifice of depth for breadth. That said, we have provided references to more detailed discussions of the costs and benefits of various sampling approaches published elsewhere.
An additional aim of this chapter has been to review data collection modes for surveys measuring ego networks. Collecting such data is time-consuming and often costly with respect to respondent burden and project resources, increasing the temptation to use resource-efficient data collection modalities, such as self-administered online surveys. However, this can introduce specific biases that are more problematic in egocentric study designs than in other types of research. In light of growing Internet usage coupled with the public’s decreasing willingness to participate in research studies, we have provided some guidelines for appropriate use of online ego network data collection. In addition to addressing these survey research methods, we have also provided brief descriptions of observational and archival methods for collecting ego network data, highlighting some of the benefits and drawbacks of these approaches.
Finally, in this chapter we have considered ethical questions pertaining to human subjects research, both broadly and with respect to issues that are unique to ego networks research. We have argued that ethical principles and requirements are guidelines that have to be tailored to the particulars of specific research designs, including egocentric research. In the end, it is not whether one faces ethical dilemmas in research but, rather, how one acknowledges them, collaborates with others for advice, and develops a well-defined set of procedures to protect those who are willing to participate in the advancement of knowledge that, in turn, may improve the conditions of life.