Work–family (WF) research continues to grow in popularity worldwide, encouraging advances in WF researchers’ methods and practices. Generally, WF research involves many different methodological considerations that have been previously reviewed (cf. Casper, De Hauw, & Wayne, 2013; Casper, Eby, Bordeaux, Lockwood, & Lambert, 2007; Lapierre & McMullan, 2016); however, the fact that WF research is increasingly being conducted in the global context introduces additional methodological complexities that warrant discussion. Therefore, the purpose of this chapter is to provide an overview of several key research considerations specific to global WF research (i.e., WF research conducted in countries and regions beyond the United States or comparative studies that include multiple countries or regions). First, I discuss how global WF research examining single- and multiple-country contexts has been conducted to date. This is followed by descriptions of the sampling techniques and types of participants typically involved in such research, as well as the research designs commonly used. Last, brief summaries of data collection methods relevant to global WF research are presented.
Country Considerations
The number of different countries that have been examined within the WF literature is quite large. However, important distinctions must be made between WF studies conducted in single-country (i.e., all participants are from one country or cultural region) versus multiple-country (i.e., participants are from two or more countries or cultural regions) contexts.
Single-Country Studies
WF studies most commonly examine participants within a single national context. Such research has typically been conducted within industrialized Anglophone contexts similar to the United States, such as the United Kingdom and Australia. Beyond these contexts, single-country WF research is prolific across the European continent (e.g., the Netherlands, Spain, Finland) and East Asia (e.g., China, Japan, and South Korea). Less commonly, WF research has occurred in South and Southeast Asia (e.g., India, Pakistan, Malaysia), the Middle East (e.g., Lebanon, Iran, Israel), and Latin America (e.g., Brazil, Mexico, the Dominican Republic). Almost no research has been conducted in Africa, with a few studies conducted in Nigeria and Kenya serving as the main exceptions. These single-country studies are necessary to build a foundational understanding of how WF constructs and processes operate in places whose cultural, political, and economic contexts differ from those of the oft-studied Anglophone countries. However, it is difficult to compare findings across studies because researchers often employ different sampling strategies and measures. Furthermore, results may not generalize over time if the country in question has undergone significant political, economic, legislative, or social changes that can affect individuals’ employment and family roles.
Multiple-Country Studies
WF researchers have also included participants from different countries within a single study, usually for cross-national and cross-cultural comparative purposes. The number of different countries included depends largely on the type of data collected; studies that collect primary data typically do so using two or three different countries, whereas archival data from large-scale surveys are typically used when examining a larger number of countries.
Multiple-country WF studies are beneficial because they allow for direct comparison of how WF constructs operate in different cultural, economic, political, and religious contexts. Comparisons between Anglophone and East Asian countries are common (see Shockley, Douek, Smith, Yu, Dumani, & French, 2017, for a review). The theoretical basis for such comparisons often lies in collectivism-individualism (i.e., the degree to which cultures value collective distribution of resources, collective action, and cohesiveness with one’s organization and family; House, Hanges, Javidan, Dorfman, & Gupta, 2004). Researchers often use country affiliation to infer participants’ cultural values, given that Anglo countries are designated as highly individualistic and East Asian countries as collectivistic (House et al., 2004; Matsumoto & Yoo, 2006). The premise behind these cultural comparisons is that the cultural value of interest (e.g., collectivism) affects the way work and family interact and/or the pattern of correlates of WF constructs. For example, researchers often explain weaker relationships between WF conflict and individual well-being among East Asian versus Western samples by arguing that individuals from East Asian cultures may perceive work success as more integral to meeting family responsibilities and therefore do not view conflict as detrimental in the same way that those in individualistic countries such as the United States do (Aryee, Fields, & Luk, 1999). Other cross-cultural WF researchers have chosen specific countries because little research has been conducted in those areas. For example, Wang, Lawler, and Shi (2011) surveyed participants from China, Thailand, Kenya, and India to examine how family-friendly organizational policies relate to WF conflict both for the overall combined sample and across countries.
Lastly, multiple-country WF research has also been conducted in countries that have similar cultural values and share geographic boundaries. Such studies are important because they examine whether these cultural and geographic similarities translate into similarities in WF constructs and processes. For example, Öun’s (2012) study of four Nordic countries found significantly lower levels of WF conflict among participants from Finland than among those from Denmark, Norway, or Sweden (among which there were no significant differences). This study highlights the importance of considering national context in WF research even when such contexts are traditionally viewed as culturally similar; as globalization increases over time, meaningful changes in what constitutes a country’s culture are likely to occur (Matsumoto & Yoo, 2006).
At an even higher level of analysis, multiple-country WF research also compares different groups of countries. This involves sorting countries into categories based on meaningful country-level characteristics. This approach was used in Spector et al.’s (2004) study, which grouped countries into Anglophone, Latin American, and East Asian clusters for analysis. Such studies allow for a more macro-level perspective on how WF constructs operate across a truly globalized context, providing insight into whether and how WF processes can be aggregated across many different country contexts.
Important differences exist in how WF researchers account for participants from different countries in multiple-country studies. Some studies account for country membership by conducting separate analyses by country (e.g., Lallukka et al., 2010) or by examining country membership as a moderating variable (e.g., Barnes-Farrell et al., 2008). However, other studies simply analyze the entire multiple-country sample without such considerations, effectively treating it as a single sample (e.g., Hill, Erickson, Fellows, Martinengo, & Allen, 2014). Combining samples in this way is problematic because it does not allow for the comparison of results across samples and thus contributes very little to understanding WF on a global scale. Researchers are therefore urged to account in some way for differences among participants in terms of country affiliation. Researchers should also consider controlling for relevant country-level variables that may explain why country differences are observed in multiple-country WF research; doing so would provide stronger evidence that observed effects are due to cultural factors rather than other country-level factors. For example, economic, legal, and technological differences across countries (e.g., prevalence of part-time labor, mandated parental leave laws, availability of infrastructure to support telecommuting) represent credible bases for differences in how effectively individuals manage work and family (Joplin, Shaffer, Francesco, & Lau, 2003).
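To make the contrast between pooling and country-sensitive analysis concrete, the minimal sketch below fits a simple least-squares slope of well-being on WF conflict separately for each country and for the pooled sample. The two countries ("A" and "B") and all data values are hypothetical, chosen only to show how a pooled estimate can average away country differences; they are not drawn from any study cited here. A real analysis would usually test the difference formally, for example with a country × WF conflict interaction term.

```python
from statistics import mean

def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs (simple linear regression)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def slopes_by_country(rows):
    """Fit a separate slope for each country label in (country, x, y) rows."""
    groups = {}
    for country, x, y in rows:
        xs, ys = groups.setdefault(country, ([], []))
        xs.append(x)
        ys.append(y)
    return {country: ols_slope(xs, ys) for country, (xs, ys) in groups.items()}

# Hypothetical toy data: (country, WF conflict score, well-being score).
records = [
    ("A", 1, 5.0), ("A", 2, 4.0), ("A", 3, 3.0), ("A", 4, 2.0),
    ("B", 1, 4.0), ("B", 2, 3.8), ("B", 3, 3.6), ("B", 4, 3.4),
]

# Pooling ignores country membership entirely.
pooled = ols_slope([x for _, x, _ in records], [y for _, _, y in records])
# Separate analyses by country preserve the between-country contrast.
by_country = slopes_by_country(records)

print(f"pooled slope: {pooled:.2f}")
for country, slope in sorted(by_country.items()):
    print(f"country {country} slope: {slope:.2f}")
```

With these invented values the pooled slope is -0.60, while the country-specific slopes are -1.00 and -0.20: the single pooled estimate describes neither country well.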
Participant and Sampling Considerations
Consideration of participant demographics and sampling methods within global WF studies is critical given the extreme diversity of individuals around the world. This is especially important for WF research given that individuals’ work and family backgrounds undoubtedly play an important role in forming hypotheses, interpreting results, and determining whether results can be compared across studies and contexts. Specifically, meaningful comparisons across countries within a single study are difficult when the samples differ in critical ways (e.g., comparing physicians to front-line hotel workers). Such sample differences, rather than country or cultural differences, may be what actually drives any observed differences (Spector, Liu, & Sanchez, 2015). The next section outlines several specific participant and sampling characteristics that researchers should keep in mind when conducting and interpreting global WF research.
Diversity within specific countries. Potential confounding factors may exist within WF studies conducted in countries that are extremely diverse or that include participants who are not “the norm” for a particular country. These characteristics have typically corresponded to within-country differences in region, race, and religion. For example, WF research conducted in India typically identifies participants as being from a specific region, such as the Northern (Larson, Verma, & Dworkin, 2001), Western (Baral & Bhargava, 2011), or Southern (Namasivayam & Zhao, 2007) regions. These regional distinctions are important given India’s large population, widely varying physical terrain, and deep political history, which have all contributed to major regional differences in what exactly constitutes Indian culture and identity (cf. Mohammada, 2007). Concerning race, WF studies conducted in New Zealand typically involve White participants, consistent with the country’s position in the Anglo country sphere (e.g., O’Driscoll et al., 2003). However, Haar, Roche, and Taylor’s (2012) study in New Zealand examined the relationship between WF conflict and turnover intentions among the indigenous Maori people. This distinction is important given the major socioeconomic and cultural differences between the Anglo and Maori people within New Zealand, which likely correspond to differences in how they manage work and family responsibilities. Religious distinctions have also been made in single-country WF research. For example, Sharabi (2010) used religion to differentiate between Jewish and Muslim Israelis and found that Jewish Israelis valued work less and family more than Muslim Israelis.
The examples above all highlight the importance of clearly describing relevant participant characteristics and how well they represent the broader population within a country. If authors do not clearly report such distinctions, readers and subsequent researchers may erroneously conflate findings based on unique subpopulations with findings from other studies conducted in that country.
Expatriate and immigrant samples. The studies reviewed thus far assume that participants are native to the countries in which they are surveyed. However, WF researchers have also examined individuals residing in countries other than their native ones (e.g., Van der Zee, Ali, & Salomé, 2005), allowing for better understanding of how WF processes operate among individuals who are adjusting from one national context to another (Caligiuri, Hyland, & Joshi, 1998). Such research is relatively uncommon and mostly concerns expatriates (i.e., those living and working in countries they are not natively from). For example, Shih, Chiang, and Hsu (2010) examined the relationships of WF conflict with job performance and job satisfaction among Taiwanese expatriates in China. On a larger scale, Shaffer, Harrison, Gilley, and Luk (2001) surveyed expatriates in forty-six different countries to examine how WF conflict related to their thoughts of withdrawing from their expatriate assignments.
WF research has also been conducted on immigrants, allowing for further exploration of how WF processes operate among people in unfamiliar national and cultural contexts (e.g., Grzywacz et al., 2007; Malinen & Johnston, 2011). Although immigrants are comparable to expatriates in that both live and work in non-native countries, immigrants face additional difficulties that have implications for WF processes. For example, immigrants may experience ambivalence and stress because providing financially for their families back home requires them to live apart from those families in a foreign country (Grzywacz, Quandt, Arcury, & Marín, 2005). Also, whereas expatriates begin their assignments knowing that living abroad is temporary, immigrants have a greater sense of permanence regarding their life in a new culture. This has implications for acculturation and adjustment, which may be more of a stressor for immigrants and impair their ability to manage work and family effectively in a new cultural context. Additionally, expatriates are often supported financially (and generously) by their organizations, whereas this is often not the case for immigrants. Furthermore, immigrants’ children will likely grow up internalizing their parents’ culture as well as the culture of their current country; differences between immigrant parents’ and their children’s cultures may introduce additional parenting struggles and thus higher family demands.
Global WF Research Designs
Cross-sectional studies. Cross-sectional designs are extremely common in the global WF literature, involving both single and multiple countries. Studies with this design are severely limited in their ability to support causal inferences (Casper et al., 2007). However, cross-sectional approaches may represent the only realistic means of conducting WF research in certain country or participant contexts that make more complicated designs difficult to use. For example, it may only be feasible to collect data from refugees or immigrants at one point in time if they face job and life changes that make them difficult to reach again in the future.
Longitudinal studies. A much smaller portion of the global WF literature employs longitudinal designs, and such studies are typically conducted within single countries. The main benefit of longitudinal over cross-sectional studies is their ability to track the development of variables over time, allowing for examination of dynamic processes that would not be detectable at a single point in time (e.g., how WF conflict changes over time in relation to changes in work hours). Longitudinal studies are inherently time-consuming and require a certain degree of confidence from researchers that participant attrition will be manageable. Thus, conducting longitudinal research in industrialized countries with professional employees may simply be less risky, and thus more appealing, to global WF researchers. This is problematic because the decision to use longitudinal approaches may introduce biases in terms of which countries and participants are ultimately examined. Indeed, existing longitudinal studies have mostly been conducted with white-collar employees in European countries, such as Finland (Mauno, 2010), the Netherlands (Andres, Moelker, & Soeters, 2012), and Norway (Innstrand, Langballe, Espnes, Falkum, & Aasland, 2008). Longitudinal WF research beyond Europe has been conducted in Canada (Bouchard & Poirier, 2011; Leiter & Durup, 1996), China (Lu, 2011), and Israel (Westman, Etzion, & Gattenio, 2008). This lack of regional diversity limits the extent to which longitudinal findings can be compared across country contexts, and conducting such studies with underrepresented countries and participants represents an important avenue for future research.
Experimental and quasi-experimental studies. Experiments that focus on WF issues are rare in the literature, and no true experimental study exists examining WF constructs in a global context. Quasi-experiments are similarly rare in the global WF literature. One exception is Wilson, Polzer-Debruyne, Chen, and Fernandes (2007), who examined the extent to which WF conflict differed among factory workers in New Zealand who underwent different forms of training. The lack of experimental and quasi-experimental designs in the global WF context is not surprising, considering that WF constructs are not readily amenable to manipulation. Indeed, this trend mirrors the broader WF literature (Casper et al., 2007). However, manipulating the salience of cultural concepts provides an interesting avenue for future cross-cultural WF research. For example, researchers could expose participants from the same country to different conditions that prime a cultural value to be more or less salient (e.g., collectivism in East Asian countries) and then have them report on their WF experiences. If making collectivism more salient significantly affects reported mean levels of WF conflict, this would shed some light on the causal relationships between these variables.
Meta-analyses. Meta-analyses combine the results of multiple studies to calculate effect sizes that are more precise, and more generalizable to a larger population, than those of any single study alone. Meta-analyses also allow for examination of whether methodological differences (e.g., the country a study was conducted in, the occupational class of participants) are meaningful determinants of differing results across studies. Whereas meta-analyses that examine WF issues are abundant in the literature (e.g., Amstad, Meier, Fasel, Elfering, & Semmer, 2011; Byron, 2005; Ford, Heinen, & Langkamer, 2007), only one published WF meta-analysis currently exists that explicitly considers national context. Allen, French, Dumani, and Shockley (2015) examined whether mean levels of work-to-family and family-to-work conflict differed based on the countries in which studies were conducted by coding each study for country-level variables (e.g., gross domestic product, unemployment rate, gender gap, OECD work–life balance scores, collectivism-individualism, gender egalitarianism values). Their results suggested that mean levels of work-to-family and family-to-work conflict did not differ among countries based on country-level unemployment rate, gross domestic product, gender egalitarianism, or OECD work–life balance rankings. They did, however, find that participants in studies conducted in individualistic countries reported significantly less family-to-work conflict (but not work-to-family conflict) than those in studies conducted in collectivistic countries. The same pattern of results was observed for studies conducted in the United States versus outside the United States and for studies conducted in countries with a higher versus lower gender economic gap.
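The kind of mean-level comparison described above (e.g., individualistic versus collectivistic countries) can be illustrated with a simplified subgroup analysis. The minimal sketch below pools hypothetical study means of family-to-work conflict (FWC) within two cultural clusters using fixed-effect inverse-variance weights (the sampling variance of a mean is SD²/n) and computes the between-groups Q statistic, which is referred to a chi-square distribution with (number of groups minus 1) degrees of freedom. All study values are invented, and this is not a reconstruction of Allen et al.'s (2015) analysis, which may have used different weighting and moderator tests; a random-effects model is usually preferred when studies are heterogeneous.

```python
from math import fsum

# Hypothetical study-level data: (cultural cluster, mean FWC, SD, sample size).
studies = [
    ("individualistic", 2.1, 0.8, 200),
    ("individualistic", 2.3, 0.9, 150),
    ("collectivistic", 2.6, 0.7, 180),
    ("collectivistic", 2.5, 0.8, 220),
]

def weighted_mean(rows):
    """Fixed-effect pooled mean; each study is weighted by n / SD**2."""
    weights = [n / sd ** 2 for _, _, sd, n in rows]
    total_weight = fsum(weights)
    pooled = fsum(w * m for w, (_, m, _, _) in zip(weights, rows)) / total_weight
    return pooled, total_weight

def subgroup_analysis(rows):
    """Return the pooled mean per cluster and the between-cluster Q statistic."""
    clusters = {}
    for row in rows:
        clusters.setdefault(row[0], []).append(row)
    grand_mean, _ = weighted_mean(rows)
    summaries = {name: weighted_mean(members) for name, members in clusters.items()}
    # Q_between: weighted squared deviations of cluster means from the grand mean.
    q_between = fsum(w * (m - grand_mean) ** 2 for m, w in summaries.values())
    means = {name: m for name, (m, _) in summaries.items()}
    return means, q_between

means, q_between = subgroup_analysis(studies)
CHI2_CRIT_DF1 = 3.841  # chi-square critical value for alpha = .05, df = 1

for name, m in means.items():
    print(f"{name}: pooled mean FWC = {m:.2f}")
print(f"Q_between = {q_between:.1f}; exceeds critical value: {q_between > CHI2_CRIT_DF1}")
```

Using the overall weighted mean as the reference point makes Q_between the weighted spread of cluster means around the grand mean; with two clusters it has one degree of freedom.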
A second meta-analysis, presented at the Work and Family Researchers Network conference (Shockley, Shen, DeNunzio, Arvan, & Knudsen, 2014), focused on gender and WF conflict and tested national indicators of gender egalitarianism (defined in four ways: Project GLOBE values and practices ratings, the World Economic Forum’s Global Gender Gap Index, and the United Nations Development Programme’s Gender Inequality Index) as moderators of the relationship between gender and WF conflict. There was no evidence of a significant moderating effect for any of the indicators or for either direction of WF conflict. Further review and recommendations for meta-analyses are provided in Chapter 10 of this handbook (Dumani, French, & Allen).
Data Collection Methods
Primary survey data. Global WF researchers have typically taken one of two approaches to using surveys in their research: using previously created scales or creating new items for a specific study. When using preexisting scales, researchers should make important determinations concerning measurement equivalence, that is, the extent to which a procedure or scale maintains its construct validity when adapted or used in different cultural or participant contexts (cf. Riordan & Vandenberg, 1994). Because the majority of scales commonly used in global WF research were initially validated in Western countries and contain items written in English (e.g., Carlson, Kacmar, & Williams, 2000; Netemeyer, Boles, & McMurrian, 1996), they may not assess WF constructs in the same ways across different countries, participants, or translations. This can affect the extent to which people in different contexts interpret and conceptualize the content of survey items in the same ways (Milfont & Fischer, 2010). Chapter 8 of this handbook (Korabik & van Rhijn) contains a deeper discussion of translation and measurement equivalence issues.
Archival survey data. The use of archival data in global WF research is also common, especially in cross-national studies that sample from a large number of countries, such as Spector et al.’s (2004) twenty-country comparison of WF conflict using data from the Collaborative International Study of Managerial Stress. Other researchers have used archival data from other large-scale datasets, such as the European Social Survey (e.g., Gallie & Russell, 2009; Kasearu, 2009) and the International Social Survey Programme (e.g., Öun, 2012). An overview of global WF archival datasets is provided in Table 7.1. The clear benefits of these datasets stem from their large sample sizes and the variety of countries surveyed, which allow researchers to conduct secondary analyses without having to expend the time or money required to collect the data themselves. However, a drawback of these datasets is the lack of flexibility in which variables are included. In other words, the research questions these datasets can inform are limited to those involving the variables that were collected.
| Dataset Name | Years Included | Design | Countries Included | Relevant WF Variables | Publicly Available? |
|---|---|---|---|---|---|
| Collaborative International Study of Managerial Stress (CISMS) – Phase 1 | 1997–2000 | Cross-sectional | Argentina, Australia, Belgium, Brazil, Bulgaria, Canada, China, Colombia, Ecuador, Estonia, France, Germany, Hong Kong, India, Israel, Japan, Mexico, New Zealand, Peru, Poland, Portugal, Romania, Slovenia, South Africa, Spain, Sweden, Taiwan, Ukraine, Uruguay, United Kingdom, United States | Pressure | No |
| Collaborative International Study of Managerial Stress (CISMS) – Phase 2 | 2003–2005 | Cross-sectional | Argentina, Australia, Bolivia, Bulgaria, Canada, Chile, China, Estonia, Finland, Greece, Hong Kong, Japan, Netherlands, New Zealand, Peru, Poland, Puerto Rico, Romania, Slovenia, South Korea, Spain, Taiwan, Turkey, Ukraine, United Kingdom, United States | W-F & F-W Conflict (behavior, strain, time), Family-supportive org. perceptions, Family-supportive org. policies (availability), Family-supportive supervision | No |
| European Quality of Life Survey (EQLS) | 2003, 2007, 2012, 2016 (ongoing) | Cross-sectional | All European Union countries at time of survey | Balance (time), W-F & F-W Conflict (strain, time) | Yes (www.eurofound.europa.eu/surveys/european-quality-of-life-surveys) |
| European Social Survey (ESS) | 2004, 2010 (ongoing) | Cross-sectional | All European Union countries at time of survey | W-F & F-W Conflict (strain, time), Satisfaction with time balance | Yes (www.europeansocialsurvey.org/) |
| European Working Conditions Survey (EWCS) | 2005, 2010, 2015 (ongoing) | Cross-sectional | All European Union countries at time of survey | Balance (time), W-F Conflict (time), Flexibility (schedule) | Yes (www.eurofound.europa.eu/surveys/european-working-conditions-surveys) |
| EU-Households, Work, and Flexibility | 2000–2003 | Cross-sectional | Bulgaria, Czech Republic, Hungary, Netherlands, Romania, Slovenia, Sweden, United Kingdom | W-F & F-W Conflict, Flexibility (place, schedule) | Yes (www.abdn.ac.uk/socsci/research/new-europe-centre/work-care-projects/households-work-and-flexibility-project-320.php) |
| EU-Project FamWork | 2003–2005 | Cross-sectional | Austria, Belgium, Finland, Italy, Germany, Portugal, Netherlands, Switzerland | W-F & F-W Conflict (time), Family-supportive org. culture, Flexibility (schedule) | Yes (www.improving-ser.jrc.it/default/) |
| Generations and Gender Survey (GGS) | Varies by country (ongoing) | Longitudinal | Australia, Austria, Belgium, Bulgaria, Czech Republic, Estonia, France, Georgia, Germany, Hungary, Italy, Japan, Lithuania, Netherlands, Norway, Poland, Romania, Russia, Sweden | W-F & F-W Conflict (strain, time), Flexibility (schedule) | Yes (www.ggp-i.org) |
| International Social Survey Programme (ISSP) | 2002, 2005, 2012, 2015 (ongoing) | Cross-sectional | Argentina, Australia, Austria, Belgium, Bulgaria, Canada, Chile, China, Croatia, Cyprus, Czech Republic, Denmark, Dominican Republic, Finland, France, Germany, Iceland, India, Ireland, Israel, Japan, Latvia, Lithuania, Mexico, Netherlands, New Zealand, Norway, Philippines, Poland, Portugal, Russia, Slovakia, Slovenia, South Africa, South Korea, Spain, Sweden, Switzerland, Taiwan, Turkey, United States, United Kingdom, Venezuela | W-F & F-W Conflict, Flexibility (schedule) | Yes (www.issp.org) |
| Project 3535 | 2003 | Longitudinal | Australia, Canada, China, India, Indonesia, Israel, Spain, Taiwan, Turkey, United States | W-F & F-W Conflict (strain, time), W-F & F-W Enrichment, Family-supportive org. policies (availability, use) | No |
Qualitative methods. Qualitative global WF studies also exist and are extremely helpful in gaining a rich understanding of the particular cultures and backgrounds of the countries and participants included. Such research can involve structured or semi-structured personal interviews (e.g., Ng, Fosh, & Naylor, 2002) and ethnographic techniques (Poster & Prasad, 2005). The benefits of qualitative data in global WF research were highlighted in Grzywacz et al.’s (2007) study of immigrant Latinos residing in the United States. Although participants reported low mean levels of WF conflict on the survey, examination of their interview responses indicated that they actually did experience high degrees of strain-based conflict. It is unclear why this conflict was not captured in the survey data, which included both time-based and strain-based WF conflict items. This example illustrates how qualitative data can provide insights above and beyond survey data by offering greater detail and flexibility in what participants can report. More information regarding qualitative research can be found in Chapter 9 of this handbook (Wong & Lun).
Conclusion
This chapter reviewed several key methodological considerations specific to WF research conducted in a global context. Because of the wide range of countries, participants, designs, and data collection methods that such studies can include, continued work is vital to understanding how variations in these methodological considerations affect the results and interpretation of global WF research. Of utmost importance for future researchers is the need to determine the extent to which cultural values are consistent among individuals from the same country (Spector et al., 2015); this is especially important for research conducted in countries with large populations and established ethnic diversity (e.g., China, India), which may require additional considerations such as regional location within a country or specific demographic subgroups. Furthermore, increasing globalization and recent technological advancements call into question the relevance of traditional assumptions about different countries that are often incorporated into hypothesis development in cross-cultural research. Despite arguments that culture is largely stable and resistant to change (Hofstede, 2001), further research is needed to determine whether this is truly the case today given increased migration and cross-cultural communication. In addition to these issues and those presented in this chapter, currently unknown methodological issues may emerge alongside advances in global WF theory. Consideration of methodology thus constitutes an important area of focus for the entire global WF research community.
Research examining the interplay between work and family roles has flourished over the past few decades (Allen, 2012; Eby, Casper, Lockwood, Bordeaux, & Brinley, 2005). What started as an area that relied predominantly on North American samples has increasingly adopted a cross-cultural perspective to understand the prevalence of work–family experiences and the generalizability of findings across different cultural contexts (Allen, 2012). Given the challenges of obtaining representative samples in certain cultures and conducting cross-cultural research (Beins, 2011), the progress of the cross-cultural work–family literature has been slow but steady (Casper, Allen, & Poelmans, 2014). One way to critically examine how individuals from different cultural contexts experience work and family domains is to conduct meta-analytic studies.
Modern meta-analysis began to take shape in the 1970s, and Glass coined the term in 1976 (Shadish, 2015). Historically, mathematical approaches to quantitatively combining varying observations date back to the seventeenth century, when they mostly helped determine the value of possible gambles in games of chance (O’Rourke, 2007). Mathematicians and astronomers began summarizing results from different studies in the eighteenth and nineteenth centuries, and in the twentieth century statisticians began combining results from different clinical trials. Today, meta-analysis refers to a systematic set of methods for searching for and coding comparable studies, statistically synthesizing the resulting data, and interpreting and presenting the results (Schmidt, 2015).
Meta-analysis helps researchers make sense of the existing literature, aiding in the accumulation and organization of scientific knowledge on a topic. It offers several advantages over narrative reviews. Narrative reviews can be an inefficient way to extract and report information, especially when a body of literature is large or its findings conflict. Moreover, narrative reviews are considered more subjective and thus more vulnerable to bias. Whereas narrative reviews tend to focus on statistical significance versus nonsignificance, meta-analysis permits computing both the magnitude and the variability of the effect size for a relationship between two variables across all studies (Glass, 2015).
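To make the contrast with narrative reviews concrete, the sketch below computes a sample-size-weighted mean correlation and splits its observed variability into the part expected from sampling error and a residual. It follows the "bare-bones" logic described by Schmidt and Hunter (2014), correcting for sampling error only (no corrections for unreliability or range restriction). The function name and the example correlations are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BareBonesResult:
    k: int                     # number of independent samples
    total_n: int               # total sample size across samples
    mean_r: float              # sample-size-weighted mean correlation
    observed_var: float        # sample-size-weighted variance of observed rs
    sampling_error_var: float  # variance expected from sampling error alone
    residual_var: float        # variance left after removing sampling error


def bare_bones(rs, ns):
    """Sampling-error-only ("bare-bones") synthesis of correlations."""
    if not rs or len(rs) != len(ns):
        raise ValueError("need one sample size per correlation")
    k = len(rs)
    total_n = sum(ns)
    # Weight each correlation by its sample size.
    mean_r = sum(n * r for r, n in zip(rs, ns)) / total_n
    observed_var = sum(n * (r - mean_r) ** 2 for r, n in zip(rs, ns)) / total_n
    # Expected sampling error variance, based on the average sample size.
    mean_n = total_n / k
    sampling_error_var = (1 - mean_r ** 2) ** 2 / (mean_n - 1)
    # Negative residual estimates are truncated at zero.
    residual_var = max(observed_var - sampling_error_var, 0.0)
    return BareBonesResult(k, total_n, mean_r, observed_var,
                           sampling_error_var, residual_var)


# Hypothetical WIF-job satisfaction correlations from four samples.
result = bare_bones([-0.25, -0.31, -0.18, -0.40], [120, 340, 95, 210])
print(f"mean r = {result.mean_r:.3f}, residual SD = {result.residual_var ** 0.5:.3f}")
```

A residual variance near zero suggests that most between-study variability could be sampling error, which leaves little room for moderators such as culture; a sizeable residual is what motivates the moderator analyses discussed in this chapter.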
Many meta-analytic studies are de facto international because they include all relevant publications (De Leeuw & Hox, 2003). However, beyond including studies from different countries and cultures, a meta-analysis that aims to extend our understanding of global work and family phenomena must explicitly hypothesize and test cross-cultural variation. In this respect, a meta-analyst adopting a cross-cultural lens may follow two approaches. First, he or she may focus explicitly on the mean levels of the constructs of interest in certain cultural contexts. Second, he or she may examine relationships involving the constructs of interest across cultural contexts (i.e., assess culture as a moderator). These two approaches to using meta-analysis to understand the global work–family interface mirror those typically observed in primary cross-cultural work–family studies (see Shockley, Douek, Smith, Yu, Dumani, & French, 2017).
The first approach examines the direct role of culture, typically by comparing mean levels or prevalence rates. These comparisons may be made across individual countries or across countries clustered according to cultural values (e.g., collectivism) and/or regions (e.g., Latin America). For example, in a primary study, Allen, Shockley, and Biga (2010) reported mean differences in work–life effectiveness (i.e., the absence of work-to-family conflict) across countries clustered into high, medium, and low bands on the cultural values of gender egalitarianism, collectivism, humane orientation, and performance orientation. The second approach examines the moderating role of culture. In this case, the researcher investigates whether relationships, such as that between work demands and work–family conflict (WFC), vary across cultural contexts. For example, in a primary study, Lallukka, Chandola, Roos, Cable, Sekine, Kagamimori et al. (2010) investigated whether the relationships between bidirectional WFC and health behaviors differed across samples of British, Finnish, and Japanese employees. Again, countries or country clusters may be used for such comparisons.
The purpose of this chapter is to provide an in-depth overview of using meta-analysis to understand the work–family interface from a cross-cultural perspective. Through this chapter, we hope to convey to readers: (1) the research questions about global work and family issues that a meta-analysis can answer, compared with individual studies, (2) effective strategies for conducting a meta-analysis, and (3) the importance of meta-analysis for global work and family research. We aim to achieve these objectives by providing a succinct yet comprehensive summary of existing meta-analytic studies on work and family research and by identifying future research directions. To encourage readers to use meta-analysis to address gaps in the literature, we also present an overview of how to conduct a comprehensive meta-analysis, with an emphasis on specific lessons we have learned from conducting our own cross-cultural meta-analytic studies (Allen, French, Dumani, & Shockley, 2015; French, Dumani, Allen, & Shockley, 2015).
Throughout the chapter, we use the terms “culture” and “country” interchangeably to encompass both cultural context and country characteristics. Culture and country do not always reference the same underlying mechanism because of imposed national boundaries, political differences within a country, and the existence of subcultures (Schaffer & Riordan, 2003). However, country is typically used as a proxy for culture in cross-cultural research, and in work–family research in particular, culture is rarely assessed at the individual level. This precludes fine-grained distinctions between “culture” and “country” (Allen, French, Dumani, & Shockley, 2015).
Meta-Analysis in Work and Family Research
The popularity of meta-analysis in the field of industrial-organizational (I-O) psychology is unprecedented (DeGeest & Schmidt, 2011), and the number of meta-analytic studies in work and family research has risen accordingly. Given that national context is typically regarded as the “elephant in the room” for work and family research (Ollier-Malaterre, Valcour, Den Dulk, & Kossek, 2013) and that most primary work–family studies originate from North America (Powell, Francesco, & Ling, 2009), it is not surprising that the majority of work–family meta-analyses do not empirically investigate culture. Table 8.1 provides a brief description of the extant meta-analytic studies published in the organizational work and family literature. We identified twenty-three meta-analyses and examined each in terms of its objective, its coding scheme (whether or not the country/nationality of the sample was coded), and its acknowledgment of potential cultural differences in findings, in moderator analyses, in the discussion of findings, or both.
| Reference | Study Objective | Country of Sample Coded? | Moderators Examined | National Culture as a Moderator? | Acknowledgment of Potential Cultural Differences |
|---|---|---|---|---|---|
| Kossek & Ozeki, 1998 | Examine the relationship between WIF/FIW and job satisfaction and life satisfaction | No | Gender, marital status, dual-earner status, specific measurement of study variables | No | None |
| Kossek & Ozeki, 1999 | Examine the relationship between WIF/FIW and six work outcomes | No | None | No | None |
| Allen et al., 2000 | Examine the relationship between WIF and work-related, nonwork-related, and stress-related outcomes | Yes | No explicit moderators | No | Mentions that the relationship between WIF and job satisfaction was significant for studies conducted outside the US |
| Byron, 2005 | Examine individual, work-related, and nonwork-related antecedents of WFC, WIF, and FIW | No | Gender, parenting status, measurement of antecedents | No | None |
| Mesmer-Magnus & Viswesvaran, 2005 | Investigate the convergence of WIF and FIW measures and determine their potential incremental prediction of outcome variables | No | None | No | None |
| Mesmer-Magnus & Viswesvaran, 2006 | Examine the relationship between different facets of family-friendly work environments and WIF, FIW | No | None | No | None |
| Ford et al., 2007 | Test two cross-domain path models: (1) work-domain antecedents → WIF → family satisfaction and (2) family-domain antecedents → FIW → job satisfaction | Only North American samples included | Gender, time spent at work, family characteristics (marital status, parental status), dual-earner status | No | Discussed culture as a potential moderator because of possible differences in the meaning of work and family and in cognitive attributional judgments, as well as intracultural differences in work and family values |
| Michel & Hargis, 2008 | Compare indirect-effect work–family conflict models and direct-effect segmentation models in explaining job and family satisfaction | No | None | No | None |
| Michel et al., 2009 | Use meta-analytic path analyses to examine theoretical work–family models and an integrative model | No | None | No | None |
| Hoobler et al., 2010 | Examine the relationships among WIF, FIW, work performance, and objective (e.g., salary) and subjective (e.g., career satisfaction) career outcomes | No | Age | No | None |
| McNall et al., 2010 | Examine the relationships between WFE, FWE, and work-related, nonwork-related, and health-related outcomes | No | Gender, construct labeling of positive spillover (e.g., enrichment, spillover, facilitation, enhancement) | No | None |
| Michel, Kotrba, et al., 2010 | Test three competing models of antecedents of WIF and FIW | No | None | No | None |
| Michel, Mitchelson, et al., 2010 | Examine different categories of antecedents (e.g., role stressors, social support, work/family characteristics, personality) of WIF and FIW | No | Marital status, parental status, gender | No | None |
| Amstad et al., 2011 | Examine the work-related, family-related, and domain-unspecific outcomes associated with WIF and FIW from both cross-domain and matching-domain perspectives | Yes | Time spent at work, parental status | No; the authors explicitly stated that sample origin was not tested as a moderator because almost all studies were from the US | No, but economic changes in non-US countries were mentioned in the discussion |
| Kossek et al., 2011 | Examine the relationship between different types of perceived work social support and WIF | No | None | No | Yes; the potential impact of cultural context on the type and source of social support is mentioned in the discussion |
| Michel et al., 2011 | Examine the unique and combined effects of the five-factor model of personality on WIF, FIW, WFE, and FWE | No | None | No | None |
| Shockley & Singla, 2011 | Incorporate domain-specificity and source-attribution perspectives to understand the relationships among WIF, FIW, WFE, FWE, job satisfaction, and family satisfaction | Yes | Gender | No | None |
| Allen et al., 2012 | Examine the relationships between dispositional variables and WIF, FIW | No | Gender, parental status, marital status | No | None |
| Allen et al., 2013 | Examine the relationship between flexible work arrangements and WIF, FIW | No | Gender, marital status, parental status, work hours | No | None |
| Butts et al., 2013 | Test several models linking work–family support policy availability and use to family-supportive perceptions, WIF, and work attitudes | Yes (US vs. non-US) | Number of work-support policies, gender, marital status, dependent care status | No | None |
| Allen et al., 2015 | Investigate country-level cultural, institutional, and economic factors to understand mean differences in WIF and FIW | Yes | N/A | N/A | Yes |
| Nohe et al., 2015 | Use path analyses to determine the directionality of the relationship between WIF/FIW and work-related strain | Yes | Gender, time lag between measurement waves, study type (published vs. unpublished), model type (cross-lagged vs. common factor) | No | None |
| Litano et al., 2016 | Examine the relationships between leader-member exchange and WIF, FIW, WFE, and FWE | Yes | Cultural context, job autonomy, leader-member exchange measurement, study type | Yes (individualism-collectivism and power distance) | Yes |
Note that the information displayed in Table 8.1 is based on what the authors explicitly reported in their published work. Some authors may in fact have coded country as part of their sample characteristics but not reported doing so in their articles; in such cases, the table still indicates that country of sample was not coded.
Several observations can be made about how meta-analytic studies in work and family research have progressed over time. First, the construct of WFC has been studied more extensively than alternative inter-role constructs, such as work–family enrichment (WFE) or work–family balance. Earlier meta-analyses focused primarily on identifying the most consistent predictors and outcomes of WFC (e.g., Allen, Herst, Bruck, & Sutton, 2000; Kossek & Ozeki, 1998). In recent years, the complexity of meta-analytic research questions has paralleled the growth in primary studies on the topic. The focus has shifted toward testing cross-domain models of work-to-family conflict (WIF) and family-to-work conflict (FIW) (e.g., Ford, Heinen, & Langkamer, 2007), comparing alternative models of WFC (Michel et al., 2009), examining relationships based on specific theoretical perspectives, such as the cross-domain versus matching-domain hypotheses (Amstad, Meier, Fasel, Elfering, & Semmer, 2011), and using path models to determine the temporal directionality of the relationship between WFC and strain (Nohe, Meier, Sonntag, & Michel, 2015). The nomological network of WFC and WFE has also expanded over time. Recent meta-analyses have explored the relationships between WFC and personality variables (Allen, Johnson, Saboe, Cho, Dumani, & Evans, 2012), flexible work arrangements (Allen, Johnson, Kiburz, & Shockley, 2013), work–family policies (Butts, Casper, & Yang, 2013), and leadership interactions (Litano, Major, Landers, Streets, & Bass, 2016).
Although the research questions posed by work and family meta-analyses have grown more complex, several major limitations remain because few primary studies exist for many specific relationships. For example, job attitudes (e.g., job satisfaction, organizational commitment, turnover intentions) and health-related outcomes (e.g., psychological strain) associated with WFC are often studied, yet evidence on performance (e.g., organizational citizenship, family performance) and career success correlates is limited. Allen and colleagues (2000), for instance, identified only two independent samples reporting relationships between WIF and career satisfaction, and a decade later Amstad and colleagues (2011) reported only four correlations for the same relationship. Such a small number of correlations or effect sizes makes it impossible to study moderators, including those related to culture. Even when a larger number of primary studies exists for a given relationship (e.g., WIF with job satisfaction or turnover intentions), the moderators examined tend to be restricted to variables typically used as controls in primary studies, such as demographic variables (e.g., gender, marital status, parental status) and work/family characteristics (e.g., work hours, dependent care status). Only a handful of meta-analyses have examined the potential moderating roles of how measures are operationalized, the time lag between measures, and study type (published vs. unpublished studies).
Current meta-analyses in work and family research are empirically inconclusive about the extent to which observed relationships generalize cross-culturally. Among the twenty-three meta-analyses identified in Table 8.1, only eight explicitly indicated that the country/nationality of primary samples was part of the structured coding procedure. However, in five of these eight studies, the authors were unable to conduct further analyses by country because of the limited number of non-US samples. Similarly, in Byron’s (2005) meta-analysis, country of origin was not used as a moderator because most of the included studies were conducted in the United States and Canada.
Cross-cultural differences are also largely ignored conceptually in the development and discussion of most published meta-analyses. Only six of the meta-analyses mentioned potential cross-cultural differences in their interpretation of findings or study limitations. For example, Kossek and colleagues (2011) examined the relationship between different types of perceived support at work and WIF. The authors did not explicitly mention country as one of their coding criteria, but in the discussion section they argued that cultural context matters in these relationships because culture directly affects both the types and the sources of support at work. Ford and colleagues (2007) included only studies conducted in North America in their meta-analysis but highlighted, as theoretical implications, the potential role of cultural context in the meaning of work and family and in cognitive attributional judgments. Allen et al. (2000) coded for country in their meta-analysis and tested the relationship between WIF and job satisfaction for studies conducted outside the United States (the relationship remained significant). We applaud these authors for advancing cross-cultural thinking despite limited empirical data from non-US countries. While more primary studies from outside the United States accumulate, we strongly believe that acknowledging cross-cultural differences and providing a rationale for potential differences across tested relationships are important for both theory development and practical implications.
Currently, two meta-analyses in work and family research incorporate cultural variables. Allen and colleagues (2015) investigated relative cross-cultural mean differences in levels of both WIF and FIW. The authors focused on the cultural dimensions of gender egalitarianism and individualism/collectivism to define cultural context. They also examined other national-level contextual variables, including institutional factors (i.e., Organisation for Economic Co-operation and Development [OECD] work–family balance rankings and the US/non-US distinction as a proxy for national work–family policies) and national economic factors (i.e., gross domestic product, unemployment rate, and gender gap index score). The authors identified eighteen studies with fifty-three independent samples assessing WIF and/or FIW across two or more countries. No significant differences in WIF or FIW were found between high and low gender egalitarian cultures, and there were no significant mean differences in WIF between collectivistic and individualistic cultures. However, individualistic cultures reported less FIW than collectivistic cultures. This difference was driven by Asian collectivistic societies, which reported greater FIW than Anglo societies. The same finding emerged when US samples were compared to non-US samples. The authors also found that countries with a higher gender gap reported greater FIW than countries with a lower gender gap.
The findings of Allen and colleagues’ (2015) meta-analysis are important because they bring three key issues to light. First, understudied macro-level factors such as cultural context appear to have more critical implications for FIW than for WIF. The authors attribute this finding to the stronger association of WIF with work-related factors, relative to nonwork-related factors, identified in previous meta-analyses. Second, those implications may differ across specific countries that share the same cultural dimension (e.g., Asian versus Latin American cultures, although both are collectivistic). Lastly, assumptions such as the United States experiencing more WFC because of its lack of national work–family policies might not be supported by the empirical literature.
In another recent meta-analysis, Litano and colleagues (2016) tested whether the cultural dimensions of individualism/collectivism and power distance moderated the negative relationship between leader-member exchange (i.e., the quality of the reciprocal relationship between a supervisor and a subordinate) and WIF. Culture was not examined as a moderator for FIW, WFE, or FWE because of limited country variation across samples. In line with Allen et al.’s (2015) findings, the authors did not find support for individualism/collectivism as a moderator of the relationship between leader-member exchange and WIF. However, the negative relationship between leader-member exchange and WIF was stronger in low power distance cultures than in high power distance cultures. This finding suggests that the degree to which high-quality work relationships reduce WFC may depend on the specific cultural context. Given that current work and family theories, such as the Job Demands-Resources model, are used to explain the negative relationship between high-quality relationships and WIF, incorporating culture-specific variables to explain both the boundary conditions of such relationships and their mediating mechanisms is important for extending our theoretical models cross-culturally.
As the aforementioned meta-analyses make evident, the field needs a conceptually and methodologically sophisticated understanding of the cross-cultural work–family interface. The following section outlines suggestions to help interested readers conduct their own cross-cultural meta-analyses and thereby contribute to this aim.
Conducting a Comprehensive Cross-Cultural Meta-Analysis
Conducting a thorough and rigorous meta-analysis takes ample time, effort, and thought. Several books (e.g., Borenstein, Hedges, Higgins, & Rothstein, 2009; Schmidt & Hunter, 2014) and published articles (e.g., Aguinis, Pierce, Bosco, Dalton, & Dalton, 2011; Aytug, Rothstein, Zhou, & Kern, 2012) provide guidelines and best practices for conducting a comprehensive and rigorous meta-analysis. For the purposes of this chapter, we highlight best practices for meta-analyses with a cross-cultural focus during each phase of the study.
Clarifying the objective. The objective of any meta-analytic study is important because it determines what information the authors will collect, from where, and how it will be analyzed. For a cross-cultural meta-analysis, two important points should be clarified within the objective. First, is the focus on examining mean differences across countries/cultures or on examining country/culture as a moderator? If the meta-analysis focuses on mean differences, the authors may convert means to a scale that reflects the percentage of the scale maximum in order to compare means across diverse rating scales (see Meyer, Stanley, Jackson, McInnis, Maltin, & Sheppard, 2012). This approach allows many studies to be included. However, researchers must ensure that rating scales are conceptually equivalent to warrant mean comparisons. For example, Meyer and colleagues (2012) used this approach but focused on only one popular, validated measure of commitment. If researchers wish to include diverse scales (as is common in work–family meta-analyses), we recommend including only studies that report means for multiple countries using the same items (see, for example, Allen et al., 2015). Although this latter approach limits the size of the meta-analytic comparisons, it helps to ensure conceptual clarity that might be muddled by comparing diverse scales. These means are then used to compute a standardized difference score within each study, and the standardized scores from all studies are then meta-analytically combined (e.g., Allen et al., 2015).
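The two strategies in the preceding paragraph can be sketched as follows. `pomp` rescales a mean to the percent of maximum possible (POMP) score on its response scale, which we assume here as one way to express means as a percentage of the scale maximum (the exact rescaling used by Meyer et al., 2012, may differ). `standardized_difference` computes a pooled-SD standardized mean difference (Cohen's d) for two country samples within the same study, and `fixed_effect_pool` combines those d values with inverse-variance weights. The inverse-variance weighting is a standard textbook choice assumed for illustration, not necessarily the procedure used by Allen et al. (2015); all function names and numbers are hypothetical.

```python
import math


def pomp(mean, scale_min, scale_max):
    """Percent of maximum possible: 0 = scale floor, 100 = scale ceiling."""
    return 100 * (mean - scale_min) / (scale_max - scale_min)


def standardized_difference(m1, sd1, n1, m2, sd2, n2):
    """Cohen's d for sample 1 minus sample 2, using the pooled SD."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var)


def d_variance(d, n1, n2):
    """Large-sample sampling variance of d."""
    return (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))


def fixed_effect_pool(ds, variances):
    """Inverse-variance weighted mean d and its standard error."""
    weights = [1 / v for v in variances]
    pooled = sum(w * d for w, d in zip(weights, ds)) / sum(weights)
    return pooled, math.sqrt(1 / sum(weights))


# Hypothetical FIW means from two studies that used the same items in two countries:
# (country A mean, SD, n, country B mean, SD, n), all on 1-5 scales.
studies = [(2.9, 0.8, 150, 2.6, 0.9, 140), (3.1, 1.0, 90, 2.7, 0.9, 110)]
ds = [standardized_difference(*s) for s in studies]
variances = [d_variance(d, s[2], s[5]) for d, s in zip(ds, studies)]
pooled_d, se = fixed_effect_pool(ds, variances)
print(f"pooled d = {pooled_d:.3f} (SE = {se:.3f}); POMP of 3.1 on a 1-5 scale = {pomp(3.1, 1, 5):.1f}")
```

A random-effects model would usually be preferred when true country differences are expected to vary across studies; the fixed-effect version is shown only because it is shorter.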
If the meta-analysis focuses on country or culture as a moderator, single-country studies can be included by treating country (or a cultural value assigned to it) as a categorical or continuous between-study moderator (e.g., Litano et al., 2016).
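A minimal sketch of both moderator strategies follows, treating country-level culture as a between-study moderator. The categorical version computes sample-size-weighted mean correlations within cultural groups and reports a group only when it has a minimum number of samples (the default of three is an arbitrary threshold we chose for illustration). The continuous version estimates a sample-size-weighted least-squares slope of correlations on an imputed cultural score. Both are descriptive sketches under these assumptions; formal tests (e.g., a between-groups Q statistic or a random-effects meta-regression) are not shown, and the data are hypothetical.

```python
from collections import defaultdict


def subgroup_means(samples, min_k=3):
    """samples: (group_label, r, n) tuples. Returns {label: (k, weighted mean r)}."""
    groups = defaultdict(list)
    for label, r, n in samples:
        groups[label].append((r, n))
    out = {}
    for label, rows in groups.items():
        if len(rows) < min_k:
            continue  # too few samples to interpret this subgroup
        total_n = sum(n for _, n in rows)
        out[label] = (len(rows), sum(r * n for r, n in rows) / total_n)
    return out


def weighted_slope(xs, ys, ws):
    """Weighted least-squares slope of ys (e.g., rs) on xs (e.g., culture scores)."""
    sw = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / sw
    y_bar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    return sxy / sxx


# Hypothetical correlations tagged with a power distance band and a placeholder score:
# (band, r, n, score)
data = [("low", -0.35, 200, 3.2), ("low", -0.30, 150, 3.5), ("low", -0.40, 120, 3.0),
        ("high", -0.15, 180, 5.1), ("high", -0.20, 90, 5.4), ("high", -0.10, 160, 5.0)]
print(subgroup_means([(band, r, n) for band, r, n, _ in data]))
xs = [score for _, _, _, score in data]
ys = [r for _, r, _, _ in data]
ws = [n for _, _, n, _ in data]
print(weighted_slope(xs, ys, ws))
```

The minimum-k filter reflects the earlier point that a handful of correlations is not enough to study moderators; with sparse non-US data, many cultural subgroups will simply drop out.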
The second question to address is whether the study will use self-reported or imputed cultural values. Given that only a limited number of work–family studies collect self-reported cultural values (Shockley et al., 2017), the pool of eligible studies with self-reported cultural values would likely be too small for meta-analytic methods. Fortunately, several large cross-cultural studies and databases provide cultural values for specific countries. These values may be imputed into the meta-analysis to allow cross-cultural comparisons. For example, in our work we have imputed cultural values from the GLOBE study and economic values from the World Economic Forum. The GLOBE study provides numeric values representing several cultural dimensions for sixty-two societies (House et al., 2004). Similarly, the World Economic Forum provides country-specific economic information, such as unemployment rate and GDP, for most countries around the world. We imputed values into our meta-analysis by matching the country associated with each sample to its corresponding information in the cross-cultural study or database (Allen et al., 2015; French et al., 2015).
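The matching step described above amounts to a lookup keyed on each sample's country (or, for frameworks such as GLOBE, its society). The sketch below attaches country-level values to sample records and returns unmatched samples separately so that exclusions can be documented. The table contents are placeholders, not actual GLOBE or World Economic Forum figures, and the society names and field names are hypothetical.

```python
# Placeholder values for illustration only; substitute the published scores.
# In practice, values from separate sources (e.g., cultural and economic)
# would be merged into one table beforehand.
SOCIETY_VALUES = {
    "Country A": {"collectivism": 4.2, "gdp_per_capita": 41000},
    "Country B": {"collectivism": 5.6, "gdp_per_capita": 9000},
    # Some frameworks split a country into societies; key them separately.
    "Country C (Region 1)": {"collectivism": 4.0, "gdp_per_capita": 60000},
    "Country C (Region 2)": {"collectivism": 4.4, "gdp_per_capita": 60000},
}


def impute_country_values(samples, table):
    """Attach country/society-level values to each sample dict.

    Each sample needs an "id" and a "society" key. Returns (matched, unmatched_ids).
    """
    matched, unmatched = [], []
    for sample in samples:
        values = table.get(sample["society"])
        if values is None:
            unmatched.append(sample["id"])  # e.g., society not covered by the framework
            continue
        matched.append({**sample, **values})
    return matched, unmatched


samples = [{"id": "S1", "society": "Country A", "r": -0.22},
           {"id": "S2", "society": "Country C (Region 2)", "r": -0.31},
           {"id": "S3", "society": "Country D", "r": -0.18}]
matched, unmatched = impute_country_values(samples, SOCIETY_VALUES)
print(len(matched), unmatched)
```

Keeping the unmatched IDs, rather than silently dropping them, makes it easy to report how many samples a framework with limited country coverage excluded.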
If imputed values are used, consideration should be given to the frameworks that could be used and the coding each would require. The choice of framework should suit the study purpose and should ideally be based on up-to-date, empirically validated information. For example, we used House et al.’s (2004) GLOBE study because it reflects the most recently developed set of cultural values and contains dimensions commonly used in previous WFC research (e.g., Spector et al., 2007). However, using the GLOBE study meant we could include only data from countries represented among the sixty-two GLOBE societies, and some countries needed to be coded as more specific “societies” within a country (e.g., Germanic vs. French Switzerland). Other frameworks, such as those of Hofstede (1980) and Triandis (1989), have also been used to explain cross-cultural differences in the work–life interface (Ollier-Malaterre & Foucreault, 2016).
Locating sources. To capture all relevant primary studies, meta-analytic searches should be conducted across multiple databases (e.g., PsycINFO, ABI/INFORM Global, Google Scholar, JSTOR). In a cross-cultural meta-analysis, country or a country characteristic (e.g., culture, policies) is likely to be included as a second-level substantive variable; that is, studies are nested within countries. It is therefore ideal to have substantial variation in both the number of countries and the number of studies within each country, so that the meta-analysis has sufficient power to detect country-level effects. A thorough literature search helps ensure such variability. One way to achieve sufficient variation is to combine traditional search terms (e.g., work–family) with country names (see, for example, Shockley et al., 2017). This strategy may also be useful for targeting research within a particular country or region of interest. Expanding the search to include dissertations, conference submissions, and specific journals that publish cross-cultural work is also important to ensure representativeness. For example, in addition to traditional database searches, Amstad et al. (2011) examined two leading European journals in the field of work and organizational psychology to specifically include studies conducted outside North America. Inter-library loan services are helpful during this process, as many potentially relevant articles will be located overseas, in book chapters, or in dissertation records that may be inaccessible through local library resources. If a full text cannot be located or additional information is needed, such as a correlation table or measure items, the authors may be contacted directly.
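Pairing traditional search terms with country names, as suggested above, can be scripted so that the same core query is reused for every country. The sketch below builds one Boolean string per country and tallies coded samples per country to check how much between-country variation the search has produced. Boolean syntax, quoting, and the handling of hyphens differ across databases, so the generated strings are an assumption to adapt for each database; the term and country lists are illustrative.

```python
from collections import Counter


def build_queries(core_terms, countries):
    """One Boolean query per country: (term1 OR term2 ...) AND "country"."""
    core = " OR ".join(f'"{term}"' for term in core_terms)
    return [f'({core}) AND "{country}"' for country in countries]


def studies_per_country(coded_samples):
    """Count coded samples by country to gauge between-country variation."""
    return Counter(sample["country"] for sample in coded_samples)


core_terms = ["work-family conflict", "family-work conflict", "work-family enrichment"]
for query in build_queries(core_terms, ["Kenya", "Nigeria", "Brazil"]):
    print(query)

coverage = studies_per_country([{"country": "Kenya"}, {"country": "Brazil"}, {"country": "Brazil"}])
print(coverage.most_common())
```

Rerunning the tally as coding proceeds shows early whether the search is yielding enough countries, and enough studies per country, for country-level analyses.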
As a final data collection step, unpublished data can be gathered through relevant listserv announcements or by contacting authors who regularly study the work–family phenomenon of interest.
We recommend that researchers closely document their search process to maintain methodological clarity and rigor. For example, in our work we created a spreadsheet indicating which studies were screened (authors, title, outlet, type of study), where each study was identified, and whether or not it was included. If a study was excluded, we also recorded why. This information can be used to create a PRISMA flow chart that depicts the screening and exclusion process step by step (see Figure 8.1; Moher, Liberati, Tetzlaff, & Altman, 2009). We have found that such charts are increasingly expected by editors and journals that publish high-quality meta-analyses.
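A screening log of the kind described above can be summarized directly into the counts a PRISMA flow chart needs. The sketch assumes one row per record with hypothetical fields `source`, `duplicate`, `included`, and `exclusion_reason`; it does not reproduce every box of the PRISMA template (e.g., separate title/abstract and full-text screening stages).

```python
from collections import Counter


def prisma_counts(log):
    """Summarize a screening log into PRISMA-style counts."""
    identified = Counter(row["source"] for row in log)
    unique = [row for row in log if not row["duplicate"]]
    excluded = Counter(row["exclusion_reason"] for row in unique if not row["included"])
    return {
        "identified_by_source": dict(identified),
        "duplicates_removed": len(log) - len(unique),
        "screened": len(unique),
        "excluded_by_reason": dict(excluded),
        "included": sum(row["included"] for row in unique),
    }


log = [
    {"source": "PsycINFO", "duplicate": False, "included": True, "exclusion_reason": None},
    {"source": "Google Scholar", "duplicate": True, "included": False, "exclusion_reason": None},
    {"source": "Google Scholar", "duplicate": False, "included": False,
     "exclusion_reason": "no correlation reported"},
    {"source": "listserv", "duplicate": False, "included": False,
     "exclusion_reason": "single country only"},
]
print(prisma_counts(log))
```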
Coding. In coding data for a cross-cultural meta-analysis, it is essential at a minimum to code the country in which the data were collected, a basic sample description, and the data collection year, in addition to the basic information needed for analysis (e.g., sample size, effect size). For many cultural/country frameworks, this information will be needed to map each study onto imputed values. Most studies report the country in which data were collected. When this information is not reported, the researcher may choose to use unambiguous author affiliation as a substitute (e.g., all authors are from the United States). If the country is not clear from author affiliation, primary study authors can be contacted to clarify where the data were collected. The sample description can also help identify ambiguous samples. For example, data from crowdsourcing websites such as Mechanical Turk can come from multiple countries. In such cases, contacting the authors can clarify the extent to which the data are multinational, and the authors may be able to provide separate effect sizes for single-nation subsamples within their data set. If sample information is not verified, researchers should report how often author affiliation was used as a proxy, to maintain methodological transparency.
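The decision sequence for coding country can be written as an explicit rule so that it is applied consistently and its use can be reported. The sketch below assumes a hypothetical order of evidence (country reported in the article, then country confirmed by the primary authors, then unambiguous author affiliation) and never falls back on affiliation for crowdsourced samples. `proxy_rate` gives the share of samples coded from affiliation alone, the figure the paragraph above recommends reporting. Function names and labels are our own.

```python
def resolve_country(reported=None, author_countries=(), confirmed_by_author=None,
                    crowdsourced=False):
    """Return (country, provenance) following a fixed order of evidence."""
    if reported:
        return reported, "reported"
    if confirmed_by_author:
        return confirmed_by_author, "author contact"
    unique = set(author_countries)
    # Affiliation is used only when unambiguous, and never for crowdsourced
    # samples, which may span several countries.
    if len(unique) == 1 and not crowdsourced:
        return unique.pop(), "affiliation proxy"
    return None, "unresolved"


def proxy_rate(provenances):
    """Share of samples whose country rests on author affiliation alone."""
    return sum(p == "affiliation proxy" for p in provenances) / len(provenances)


coded = [resolve_country("Japan"),
         resolve_country(None, ["Finland", "Finland"]),
         resolve_country(None, ["United States"], crowdsourced=True)]
print(coded, proxy_rate([prov for _, prov in coded]))
```

Samples that come back `"unresolved"` are the ones to follow up with the primary authors.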
The year of data collection is often omitted from the final manuscripts of primary studies. When it cannot be determined, a consistent estimate can be used. This estimate might be data-driven, such as the mean or median length of time between data collection and publication among the studies in the database that do report the year of data collection. It might also follow an a priori rule: for example, using the publication year minus two years to account for the time needed for analysis, write-up, and revisions (see, for example, French et al., 2015).
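Both estimation strategies can be applied mechanically once the year of data collection has been coded wherever it is reported. The sketch below assumes studies are stored as dictionaries with the hypothetical keys `pub_year` and `collection_year`; it estimates the typical lag from studies that report both years, also accepts a fixed a priori lag such as the two-year rule, and flags imputed years so that they can be reported.

```python
from statistics import mean, median


def estimate_lag(studies, rule="median"):
    """Typical publication lag among studies that report their data collection year."""
    lags = [s["pub_year"] - s["collection_year"]
            for s in studies if s.get("collection_year") is not None]
    if rule == "median":
        return median(lags)
    if rule == "mean":
        return mean(lags)
    raise ValueError("rule must be 'median' or 'mean'")


def fill_collection_years(studies, lag):
    """Return (year, imputed) pairs, estimating missing years as publication year minus lag."""
    filled = []
    for s in studies:
        if s.get("collection_year") is not None:
            filled.append((s["collection_year"], False))
        else:
            filled.append((round(s["pub_year"] - lag), True))
    return filled


studies = [
    {"pub_year": 2015, "collection_year": 2012},
    {"pub_year": 2016, "collection_year": 2015},
    {"pub_year": 2014, "collection_year": 2012},
    {"pub_year": 2017},                          # collection year not reported
]
data_driven = fill_collection_years(studies, estimate_lag(studies))   # median lag of 2 years
a_priori = fill_collection_years(studies, lag=2)                      # publication year minus two
```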
A clear codebook is essential for distinguishing relevant from irrelevant information and for taking possible country differences into account. A meta-analysis codebook outlines the variables that will be collected, how they will be recorded (e.g., mean income or median income), and any decision rules to be used during the coding process. For example, a researcher might be interested in recording the average income of each sample. However, currencies and their value differ across countries, so raw values are not directly comparable. Before coding, the researcher should think carefully about such raw values, whose meaning varies across countries, and identify a coding scheme that yields comparable codes. In this case, we recommend coding both the raw value (income as reported in the study) and a rescaled value (e.g., average income in US dollars), so that income is both recorded accurately and recomputed consistently. The same level of detail extends to conceptual scales. Construct labels do not always match construct operationalizations, that is, how the variable is actually measured. For example, we found that study authors use different labels for the construct of WFC, and that constructs labeled WFC are at times alternative inter-role constructs, such as work–family balance or work–life conflict. Per our codebook, we included only measures in which at least 75% of the items were in line with our definition of WFC, to ensure that our conceptual and operational definitions of WFC were clear and aligned with one another. Thinking in advance about common construct-labeling issues within your area of meta-analytic focus, and building a way to identify such discrepancies into the coding process, helps ensure that the same constructs are recorded across all studies.
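The recommendation to keep both the raw and the rescaled income value can be built into the coding record itself. In this sketch the conversion factors are placeholders, not real exchange rates; keying them by currency and year is our assumption, reflecting that a currency's value also changes over time, and a real codebook would name its source of rates (or of purchasing-power adjustments) explicitly.

```python
from dataclasses import dataclass

# Placeholder conversion factors (local currency units per US dollar), keyed by
# (currency, year). These numbers are illustrative only, not real exchange rates.
PLACEHOLDER_UNITS_PER_USD = {
    ("USD", 2012): 1.0,
    ("JPY", 2012): 80.0,
    ("EUR", 2012): 0.8,
}


@dataclass
class IncomeCode:
    raw_value: float      # income exactly as reported in the primary study
    currency: str
    year: int
    usd_value: float      # rescaled value used for cross-country comparison


def code_income(raw_value, currency, year, rates=PLACEHOLDER_UNITS_PER_USD):
    """Record the raw income and a rescaled US-dollar value side by side."""
    units_per_usd = rates[(currency, year)]   # a KeyError signals a missing rate
    return IncomeCode(raw_value, currency, year, raw_value / units_per_usd)


japan_sample = code_income(4_000_000, "JPY", 2012)
```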
In most instances, it is warranted to code both the label a study uses for a construct (e.g., WFC) and a recoded variable based on consistent operationalization or measurement (e.g., work–family balance).
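The same record can store a study's construct label next to a recoded construct derived from its items. The sketch below applies a threshold rule like the 75% criterion described earlier, assuming coders first classify each scale item against the codebook definitions; the item classifications, the "review" fallback, and the function name are hypothetical.

```python
from collections import Counter


def code_construct(label, item_codes, target="WFC", threshold=0.75):
    """Keep the study's own label and add a recoded construct based on its items.

    item_codes holds the codebook construct assigned to each scale item.
    The measure is recoded to the most common item construct if that construct
    accounts for at least `threshold` of the items; otherwise it is flagged for review.
    """
    construct, count = Counter(item_codes).most_common(1)[0]
    share = count / len(item_codes)
    recoded = construct if share >= threshold else "review"
    return {"label": label, "recoded": recoded, "share": share, "include": recoded == target}


# A measure labeled WFC whose items mostly tap work-family balance
mislabeled = code_construct("WFC", ["balance", "balance", "balance", "WFC"])
# A measure with a different label whose items match the WFC definition
aligned = code_construct("work-family interference", ["WFC", "WFC", "WFC", "balance"])
```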
Cross-cultural studies and databases of cultural values typically provide one value per country. To simplify imputation, effect sizes should be separated by country. When an effect size combines participants from multiple countries and cannot be separated (e.g., 65% of the sample is from Japan and 35% is from China), a cultural value can be imputed as the average of the country values weighted by the proportion of the sample from each country.
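The weighted imputation for mixed-country samples amounts to a proportion-weighted average of the country scores. The scores below are placeholders rather than values from any published cultural framework, and the function name is hypothetical.

```python
def weighted_cultural_value(composition, country_scores):
    """Impute a cultural value for a sample drawn from several countries.

    composition maps country -> proportion (or count) of the sample;
    country_scores maps country -> that country's score on the cultural dimension.
    """
    total = sum(composition.values())
    return sum(share * country_scores[country]
               for country, share in composition.items()) / total


placeholder_scores = {"Japan": 60.0, "China": 40.0}   # illustrative, not real scores
mixed_sample_value = weighted_cultural_value({"Japan": 0.65, "China": 0.35}, placeholder_scores)
# 0.65 * 60 + 0.35 * 40 = 53 (up to floating-point rounding)
```

Dividing by the total lets the same function accept either proportions or raw subsample sizes.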
As a final note, meta-analytic coding can be a draining process. Mistakes are likely even for the most conscientious coder armed with a clear codebook. This is especially true for complex cross-cultural meta-analyses, which are often large in scale and include a diverse range of operationalizations and methodological approaches. Therefore, it is important that data be coded by multiple coders so that errors are detected and corrected. In addition, double coding helps ensure that any necessary judgment calls are made consistently across independent coders.
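When two coders work independently, their codes can be compared field by field to list the disagreements that need to be resolved. The sketch below also reports percent agreement and Cohen's kappa, a common chance-corrected agreement index; kappa is our addition for illustration, and the example codes are hypothetical.

```python
from collections import Counter


def compare_coders(codes_a, codes_b):
    """Return (indices of disagreements, percent agreement, Cohen's kappa)."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both coders must code the same, non-empty set of studies")
    n = len(codes_a)
    disagreements = [i for i, (a, b) in enumerate(zip(codes_a, codes_b)) if a != b]
    p_observed = 1 - len(disagreements) / n
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    # Agreement expected by chance from each coder's marginal category frequencies
    p_expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / n ** 2
    # Kappa is undefined (None) if both coders used one and the same category throughout
    kappa = None if p_expected == 1 else (p_observed - p_expected) / (1 - p_expected)
    return disagreements, p_observed, kappa


coder_1 = ["Japan", "United States", "United States", "China"]
coder_2 = ["Japan", "United States", "China", "China"]
to_reconcile, agreement, kappa = compare_coders(coder_1, coder_2)
```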
Analyzing and interpreting results. Cross-cultural studies often include more than two countries that can be compared. Before meta-analyzing the data, it is important to aggregate effect sizes as necessary to avoid over-counting samples or countries. For example, Spector and colleagues (2007) collected data on WFC and its correlates across 36 countries. For our analyses comparing mean WFC in US versus non-US countries (Allen et al., 2015), we first computed thirty-five mean-difference effect sizes, one for the difference in mean WFC between the United States and each of the thirty-five non-US countries. Because these effect sizes are dependent (all use the same US comparison sample), we then aggregated them into one summary effect size using formulas that account for dependencies (Borenstein et al., 2009). As a second example, some studies in our meta-analysis on WFC and support reported correlations between three facets of WFC (time-, strain-, and behavior-based conflict) and one measure of social support (French et al., 2015). In this case, the relationships between each facet of WFC and social support are dependent, because they are computed on the same sample. We combined the three effect sizes using formulas that account for dependencies (Borenstein et al., 2009; Schmidt & Hunter, 2014) to form one composite effect size representing the overall relationship between WFC and support for that sample. The cultural framework will largely determine when and how samples must be aggregated to avoid over-counting. It is critical that authors review effect sizes to ensure that no sample is double-counted.
This is especially the case in cross-cultural meta-analysis, where double-counting a particular country may have a substantial impact on the results. A full discussion of dependent effect size aggregation is beyond the scope of this chapter; we direct the interested meta-analyst to textbook resources for a full discussion of the use and implementation of aggregation formulas (e.g., Borenstein et al., 2009; Schmidt & Hunter, 2014).
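For the second example above (several WFC facets correlated with one support measure in the same sample), Borenstein et al. (2009) describe averaging the dependent effect sizes and computing the variance of that mean with a term for their covariance. The sketch below follows that logic in Fisher's z metric under two simplifying assumptions we add for illustration: every facet correlation comes from the same n, and a single assumed correlation between the facet-level effect sizes applies to all pairs. That correlation is rarely reported and is a judgment call; setting it to 1 yields the largest, most conservative variance. The function name and example values are hypothetical.

```python
import math


def composite_dependent_correlations(rs, n, inter_corr):
    """Combine m correlations computed on the same sample into one composite.

    Works in Fisher's z: z_i = atanh(r_i), with sampling variance V = 1 / (n - 3).
    Variance of the mean of the m dependent z values:
        (1 / m**2) * (sum_i V_i + sum_{i != j} r_ij * sqrt(V_i * V_j))
    Here every V_i equals V and every r_ij equals inter_corr (an assumption),
    so the covariance term is m * (m - 1) * inter_corr * V.
    """
    m = len(rs)
    z_values = [math.atanh(r) for r in rs]
    v = 1.0 / (n - 3)
    z_bar = sum(z_values) / m
    var_z_bar = (m * v + m * (m - 1) * inter_corr * v) / m ** 2
    return {"r": math.tanh(z_bar), "z": z_bar, "var_z": var_z_bar}


# Hypothetical time-, strain-, and behavior-based WFC correlations with support (n = 200)
composite = composite_dependent_correlations([-0.20, -0.25, -0.15], n=200, inter_corr=0.5)
```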
Meta-analyses are computed using either fixed- or random-effects models (Borenstein et al., 2009). The fixed-effects model assumes that there is a single true population effect size, so that any deviation from this effect size represents random error. In contrast, the random-effects model assumes that there are multiple true population effect sizes, which form a distribution. When conducting cross-cultural meta-analyses within the work–family field, the random-effects model is almost certainly the correct analytic choice. Indeed, the very notion of cross-cultural differences in effect sizes implies a population of true effect sizes rather than a single true effect size common to all studies.
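To make the contrast between the two models concrete, the sketch below computes a fixed-effect mean and a random-effects mean using the DerSimonian–Laird estimator of the between-study variance (tau squared). This is one common estimator among several and not necessarily the one used in the meta-analyses cited here; Schmidt and Hunter's psychometric approach, for instance, estimates the variance of true effects differently. Inputs are assumed to be effect sizes already in a common metric (e.g., Fisher's z) with their sampling variances; the example values are hypothetical.

```python
import math


def random_effects_dl(effects, variances):
    """Fixed-effect and DerSimonian-Laird random-effects summaries (needs at least two studies)."""
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    fixed_mean = sum(w * y for w, y in zip(weights, effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect mean
    q = sum(w * (y - fixed_mean) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c)          # estimated variance of true effects
    re_weights = [1.0 / (v + tau2) for v in variances]
    re_mean = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    re_se = math.sqrt(1.0 / sum(re_weights))
    return {"fixed_mean": fixed_mean, "q": q, "tau2": tau2, "re_mean": re_mean, "re_se": re_se}


# Two hypothetical country samples with clearly different effects
summary = random_effects_dl([0.1, 0.5], [0.01, 0.01])
```

When the observed effects are no more dispersed than sampling error alone would produce, tau squared is truncated to zero and the random-effects mean coincides with the fixed-effect mean; otherwise the added between-study variance widens the standard error of the mean.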
Regarding the presentation of results, beginning with the simplest analyses and then moving to more complex ones can help readers follow and interpret the findings. Estimating random-effects models for the proposed relationships across all samples is a good first step for obtaining average effects across the set of studies. Next, country or cultural dimension can be incorporated as a categorical or continuous moderator of the effect size of interest. To isolate the effect of each potential moderator, we also recommend testing moderators one at a time. With this approach, cultural values should be chosen carefully, because testing several cultural moderators on the same relationship inflates the family-wise error rate, increasing the chance of finding significant moderation when none exists in the population (i.e., a Type I error). If several comparisons are made and Type I error inflation is a concern, the researcher might consider an alpha correction.
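If several cultural moderators are tested on the same relationship, one option for an alpha correction is the Holm step-down procedure, which controls the family-wise error rate while being less conservative than a plain Bonferroni correction. The sketch below adjusts a set of moderator p values; Holm is our choice for illustration, and the moderator names and p values are hypothetical.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])   # smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Multiply the k-th smallest p by (m - k + 1), cap at 1, keep the sequence monotone
        running_max = max(running_max, min(1.0, (m - rank) * p_values[i]))
        adjusted[i] = running_max
    return adjusted


moderator_p = {"individualism": 0.01, "power distance": 0.04, "gender egalitarianism": 0.03}
adjusted = dict(zip(moderator_p, holm_adjust(list(moderator_p.values()))))
significant = [name for name, p in adjusted.items() if p < 0.05]
```

Here only the first moderator survives the correction, even though all three unadjusted p values fall below .05.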
When interpreting results, three pieces of information are important: the mean effect size, its confidence interval, and its credibility interval. In random-effects models, the mean effect size represents the average of the distribution of true population effect sizes, providing an estimate of the overall magnitude and direction of a relationship (Aguinis et al., 2011). The confidence interval conveys the precision of the estimated mean true effect size. A narrow confidence interval indicates that the mean effect size is precisely estimated, whereas a wide one indicates that it is estimated imprecisely. Typically, an effect is interpreted as significantly different from zero when its confidence interval does not include zero. The credibility interval indicates the range within which most true effect sizes in the population are expected to fall. Thus, a narrow credibility interval represents a homogeneous distribution of true effect sizes, whereas a wide one indicates a wide range of true effect sizes.
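The two intervals answer different questions, which is easiest to see when they are computed side by side. The sketch below assumes that the random-effects mean, its standard error, and the estimated standard deviation of true effects are already available from a random-effects analysis; it uses a 95% confidence interval and an 80% credibility interval, the latter being a common convention in psychometric meta-analysis (some authors instead report a prediction interval that also incorporates the standard error of the mean). The function name and example numbers are hypothetical.

```python
from statistics import NormalDist


def interpret_random_effects(mean, se, sd_true, ci_level=0.95, cred_level=0.80):
    """Confidence interval (precision of the mean) and credibility interval (spread of true effects)."""
    z_ci = NormalDist().inv_cdf(0.5 + ci_level / 2)
    z_cred = NormalDist().inv_cdf(0.5 + cred_level / 2)
    ci = (mean - z_ci * se, mean + z_ci * se)
    cred = (mean - z_cred * sd_true, mean + z_cred * sd_true)
    return {
        "ci": ci,
        "credibility": cred,
        "differs_from_zero": not (ci[0] <= 0.0 <= ci[1]),
    }


result = interpret_random_effects(mean=0.30, se=0.05, sd_true=0.10)
```

In this example the confidence interval (about .20 to .40) excludes zero, while the wider credibility interval (about .17 to .43) shows that true effects still vary noticeably across populations.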
In general, cross-cultural differences are distal in nature. Many country-level attributes (e.g., economic prosperity, culture, policy) influence individual experiences through more proximal, individually enacted norms and values (e.g., work and family labor hours, work and family centrality). Because country-level attributes are distal, cross-cultural differences in effect sizes are often small and difficult to detect statistically (e.g., French et al., 2015; Litano et al., 2016). Within-country variation may also contribute to wide credibility intervals, indicating that additional moderators could be examined. These issues underscore the importance of a thorough literature search that yields sufficient within- and between-country studies to estimate effects as accurately as possible.
Lessons Learned and Next Steps
We have encountered many challenges conducting and publishing cross-cultural meta-analytic work. We use this section to share a few essential lessons that are particularly pertinent to the aspiring cross-cultural researcher interested in using meta-analytic methods. Specifically, we discuss the importance of casting a wide search net, inclusion of non-English studies, and measurement invariance.
First, a meta-analysis is only as good as the primary studies it includes. The quality of a meta-analysis therefore depends largely on the selection of primary studies, the operationalization of the main variables, the coding plan, and the thoroughness of the methodology. For example, if a meta-analysis includes only published studies, the magnitude of observed correlations may be overestimated because of what is known as the file drawer problem: unpublished studies are more likely than published studies to include nonsignificant effects (e.g., Aguinis et al., 2011). Further, inclusion of unpublished manuscripts is expected for publication in rigorous journal outlets; excluding such studies will need to be justified and, at the very least, explicitly stated as a limitation. As we mentioned in the previous section, we believe that adopting multiple strategies to locate primary studies is important and that, whenever possible, conference submissions, dissertations, unpublished papers, and raw data sets should be screened for inclusion.
Second, all of the work–family meta-analyses listed in Table 8.1 limit primary studies to those written in English. Excluding non-English studies is a particularly critical issue when a meta-analysis aims to examine cross-cultural variation, because it may directly restrict country-level variance and study representativeness. At the same time, including such studies without excellent translation is likely to introduce confounds, such as incorrectly interpreted variables or results. We recommend that researchers conducting cross-cultural meta-analyses be cognizant of this issue and explicitly report the inclusion or exclusion of such studies in their manuscripts. Specifically, it can be helpful to report the number of studies written in a language other than English, the languages in which they were written, and the reasoning behind their inclusion or exclusion. For example, if there are few non-English studies relative to the total number of studies and including them would introduce no new countries, excluding non-English sources might be an acceptable strategy. Working with colleagues who are native or fluent speakers of other languages is also an excellent strategy for accessing information in non-English sources.
Third, a truly cross-cultural meta-analysis relies on the assumption that the data are comparable across studies so that effect sizes can be combined (De Leeuw & Hox, 2003). This assumption implies that variables of interest are perceived and understood in the same way by participants from different countries. However, studies comparing multiple countries only sporadically conduct explicit tests of measurement invariance, and there is some evidence that WFC might not mean the same thing to individuals from different cultures. For example, Yang, Chen, Choi, and Zou (2000) found that the factor loadings of items on family demand, work demand, and WFC scales were not comparable between participants from the United States and China. Therefore, we recommend that researchers report which measures display measurement invariance across which countries, as this information has implications for the strength of the conclusions drawn from the meta-analysis (Allen et al., 2015). For primary cross-cultural studies, multiple-group confirmatory factor analysis and between-groups structural equation modeling can be used to test explicitly for measurement equivalence (Schaffer & Riordan, 2003; see also Chapter 11 of this Handbook, by Korabik & Rhijn).
Future Directions in Cross-Cultural Work–Family Meta-Analysis
Moving forward, there are many ways in which meta-analysis can illuminate cross-cultural work–family relationships. First, meta-analyses have been conducted almost exclusively using correlational methods, collapsing across different types of designs (see Nohe et al., 2015, for an exception). However, we know that WFC is dynamic, changing daily (e.g., Shockley & Allen, 2013), over time (e.g., Matthews, Winkel, & Wayne, 2014), and across life stages (e.g., Allen & Finkelstein, 2014). Given cross-cultural differences in the perception of time (e.g., future orientation; House et al., 2004), there may be interesting cultural differences in the timing of WFC episodes or in the relationship between change in WFC and change in its correlates over time. Experimental and quasi-experimental studies are also not accounted for in existing work–family meta-analyses. Although experimental studies in work–family research are scarce, a meta-analysis that replicates experimental findings across different countries or cultures would be valuable for establishing theoretical and empirical boundary conditions for existing findings.
Second, existing cross-cultural work uses culture as a theoretical framework for explaining findings, and meta-analytic work directly imputes the cultural values associated with a given country (e.g., Shockley et al., 2017; Allen et al., 2015). However, research has shown that this approach ignores the substantial cultural variability within a given country and may create problems, especially when different regions or subcultures within the same country hold different cultural values (Taras, Kirkman, & Steel, 2010). For example, two Brazilian states, São Paulo and Rio Grande do Sul, have been shown to have distinct characteristics in terms of time orientation and attachment to money and possessions (Lenartowicz, Johnson, & White, 2003). These distinct characteristics may lead to differences in how work and family roles are experienced in the two subcultures, potentially important differences that a blanket, country-level cultural framework ignores. Future cross-cultural meta-analyses might explore subcultures more thoroughly, either by meta-analyzing self-reported cultural values or by meta-analyzing subregions or societies. In addition to varying within countries, culture also changes over time (Olivas-Luján, Harzing, & McCoy, 2004), and imputed cultural values should reflect the timeframe within which participant data were collected. We hope to see more primary cross-cultural studies that assess culture at the individual level by asking participants to report on their own values and/or their perceptions of the culture.
Third, additional cultural dimensions are needed to understand the work–life interface. Ollier-Malaterre and Foucreault (2017) offer an excellent source on extant typologies and dimensions of culture used in work–life research. In addition to their suggestions, we believe that understudied dimensions such as performance orientation and assertiveness (GLOBE; House et al., 2004), inner versus outer direction (the extent to which individuals in a culture seek to control their environment versus choose to live in harmony with it), and specificity versus diffusion (Trompenaars, 1993) may have critical implications for understanding work–family experiences. For example, performance orientation, the extent to which cultures value competition and materialism over quality of life and relationships, may further influence work and family roles: individuals from high performance orientation cultures might experience more WIF and less work–family balance than those from low performance orientation cultures. Similarly, the specificity/diffusion dimension, which reflects the extent to which cultures treat private and work life as separate (specific) or interdependent (diffuse) domains, can be expected to have a direct impact on individuals’ segmentation and integration preferences.
Fourth, we have a limited understanding of the positive side of the work–family interface. Only three of the meta-analyses we reviewed focused on WFE or positive spillover, and no meta-analyses have been conducted on work–family balance. Both WFE and work–family balance are understudied relative to WFC, even though culture is likely to shape them as well. For example, in cultures where individuals have more autonomy in their work and feel more empowered (e.g., low versus high power distance cultures), positive effects of WFE are likely to be observed. Cultures that rely on rigid, gender-based roles and value earnings and recognition over job security and freedom might also pose further challenges for work–family balance. Consequently, we call for additional primary cross-cultural research on these constructs so that meta-analytic methods may eventually be applied.
Lastly, we propose expanding the scope of work–family meta-analyses beyond interrole phenomena such as conflict, enrichment, and balance. For example, there is a healthy, interdisciplinary body of research on parents’ working conditions and child outcomes (Cho & Ciancetta, 2016). A cross-cultural meta-analysis of cultural similarities and differences in the relationship between parents’ work demands and child well-being would have implications not only for work–family research but also for policy makers and organizational decision makers across the globe. Time allocation across work and family domains could also be examined meta-analytically. Primary studies currently show interesting differences by country in how time is allocated between work and family (e.g., Craig & Mullan, 2010), and time allocation matters for well-being (Dahm, Glomb, Manchester, & Leroy, 2015). Future meta-analytic work could focus on cross-cultural differences in time allocation, as well as on culture as a moderator of the relationship between time allocation and domain outcomes (e.g., job satisfaction, family satisfaction). Additionally, as primary studies with cross-cultural comparisons accumulate, we believe that a multilevel approach to meta-analysis (e.g., Van Den Noortgate & Onghena, 2003) that allows testing interactions between country-level variables (e.g., culture) and individual-level variables (e.g., segmentation/integration preferences, perceived organizational work–family culture) will also contribute significantly to our understanding of global work–family phenomena.
Concluding Thoughts
Meta-analytic approaches offer a comprehensive look at work–family phenomena, and their flexibility in terms of design affords opportunities for multinational research that would be difficult, if not impossible, to conduct with primary studies alone. However, meta-analysis is not free of challenges, and conducting a theoretically informative and rigorous meta-analytic study takes time and careful thinking. Our chapter outlined existing meta-analyses, suggestions, lessons learned, and future directions for those who aspire to inform cross-cultural work–family research using meta-analysis. We hope this chapter stimulates research in this vein and provides future meta-analysts with valuable guidance.
Over the last few decades, managing both work and family roles has increasingly become an issue, if not a challenge, for individuals around the globe. At the same time, interest in studying issues that stem from the work–family interface has also increased (Poelmans, Greenhaus, & Maestro, 2013). Amid the burgeoning research on the work–family interface, attempts have been made to investigate and compare various work–family phenomena across cultures, with quantitative studies being the dominant approach (see, for example, the meta-analysis by Allen, French, Dumani, & Shockley, 2015). However, when investigating the work–family interface across cultures, it is important also to take into account the different meanings that constructs may have in different cultural contexts, which may affect how the findings of quantitative studies are understood and interpreted (Ollier-Malaterre & Foucreault, 2017).
Qualitative research is a valuable tool for discovering the local meanings and individual experiences of the work–family interface in different cultures. The goal of qualitative research is to help investigators understand and clarify specific phenomena of interest within a particular sociocultural context (Seale, Gobo, Gubrium, & Silverman, 2004). Qualitative research is especially appropriate for exploring relatively under-investigated or highly complex phenomena, as well as for understanding contradictions and discrepancies found in existing studies (Goldberg & Allen, 2015). This is achieved by collecting data in the form of local respondents’ narratives and then analyzing and interpreting those textual data, with an emphasis on perceptions and interpretations from the respondents’ perspectives.
The unique contribution of cross-cultural, qualitative research on the work–family interface is twofold. First, as an end in itself, cross-cultural, qualitative research can be used to explore concepts or phenomena of the work–family interface across different cultures and to compare the similarities and/or differences in those experiences and perceptions across cultural groups. This follows the derived etic approach of cross-cultural research (Smith, Fischer, Vignoles, & Bond, 2013), in which comparisons of the culturally universal (etic) elements of the phenomenon of interest are achieved by integrating various culturally specific (emic) studies conducted in the cultures involved (e.g., Shordike et al., 2010).
Second, as a means, the findings from qualitative research can lay the foundation for quantitative studies by establishing the functional equivalence of concepts across cultures, as well as by highlighting the unique (non-equivalent) components of concepts in different cultural contexts. Functional equivalence concerns whether a given psychological construct can be said to exist across cultures; extensive qualitative work such as interviews or focus groups is essential for building an in-depth understanding of each cultural context of interest before one can make an informed judgement as to whether quantitative comparison on the construct is meaningful (Smith et al., 2013). Alternatively, cross-cultural, qualitative research may complement quantitative research by clarifying unexpected quantitative results through interviews and/or focus groups with local respondents.
Review of Existing Cross-Cultural Qualitative Research
Global qualitative work–family studies can be divided into two groups: those that focus on a single culture and cross-cultural comparisons that include multiple countries or cultures. Most of these studies fall into the former, mono-cultural category. Several researchers have focused on gaining a general understanding of the work–family experiences of female IT professionals in India (Valk & Srinivasan, 2011), airline employees in Hong Kong (Ng, Fosh, & Naylor, 2002), life insurance agents in Singapore (Chan, 2002), young Chinese urban professionals (Coffey, Anderson, Zhao, Liu, & Zhang, 2009), Russian cosmonauts (Johnson, Asmaro, Suedfeld, & Gushin, 2012), entrepreneurs in New Zealand (Kirkwood & Tootell, 2008), Austrian managers (Kasper, Meyer, & Schmidt, 2005), and business travelers in the United Kingdom (Nicholas & McDowall, 2012). Somech and Drach-Zahavy (2007) conducted novel research on work–family coping strategies using an Israeli sample; Burgess, Henderson, and Strachan (2007) interviewed HR professionals in Australia about the informal family-friendly practices their companies offered; and Ba’ (2011) explored integration and segmentation behaviors in a dual-earner UK sample.
Although some findings of these studies are unique to their cultural or occupational context, many identify themes similar to those noted in Beigi and Shirmohammadi’s (2017) review of the qualitative work–family literature: parenthood, gender differences, family-friendly policies and non-traditional work arrangements, and coping strategies. One interesting finding comes from Grzywacz et al. (2007), who conducted both qualitative and quantitative research on the experiences of work–family conflict among Latino immigrant workers in the United States poultry processing industry. The qualitative interviews suggested that the workers, particularly the women, experienced a great deal of work–family conflict, largely because of the physically demanding nature of the job. However, in the quantitative portion of the study, both male and female workers reported low mean scores on typical quantitative work–family conflict measures. This is one example of important findings that would have been masked had the question been examined only quantitatively, and it speaks to the need for future research on how the meaning of work–family conflict itself may vary across cultures. Grzywacz, Gopalan, and Carlos Chavez speculate further on this in Chapter 25 of this handbook.
There are fewer qualitative studies with an explicit cross-cultural agenda; we review each below. Using focus groups along with archival data from five cultures (China, Hong Kong, Singapore, Mexico, and the United States), Joplin, Francesco, Shaffer, and Lau (2003) explored the influence of macro-level cultural factors on the work–family interface. Their findings suggested that higher rates of economic, social, technological, and legal change in the macro environment would lead to higher levels of work–family conflict. Using semi-structured interviews, Wong and Goodwin (2009) explored the impact of work on family in Hong Kong, China, and the United Kingdom. They found that about half of the respondents from each culture did not experience work interference with family; however, the protective factors mentioned varied across the three cultures. For instance, more Hong Kong Chinese respondents reported that a manageable workload helped prevent work from interfering with family, whereas more participants from the United Kingdom mentioned that non-standard work helped prevent work-to-family conflict.
The remaining studies compare two cultures. Using focus groups in Hong Kong and Singapore, Thein, Austen, Currie, and Lewin (2010) explored the concept of work–family balance with professional women in these two cultures. They found that most respondents from both cultures did not perceive much work–family conflict, a phenomenon partly attributed to the specific economic conditions in both cultures: given the high cost of living, it is a normative expectation that women hold full-time jobs to help meet the financial and material needs of their families. Peus and Traut-Mattausch (2008) interviewed female managers in the United States and Germany. They noted that German women were more skeptical about combining work and family and, despite more generous state-sponsored work–life policies, felt that such policies were detrimental because they create discrimination against women in hiring. Poster and Prasad (2005) studied work–family relations in high-tech firms in India and the United States. They noted that work–family experiences vary in each country largely as a function of the differing social contracts between workers and the state and of the tendency for people to either segment or integrate their work and family lives. Lastly, Brough, O’Driscoll, and Biggs (2009) focused on the impact of parental leave on work–family balance in an Australian and New Zealand sample. Most of the cultural differences they observed centered on the fact that significantly more New Zealand than Australian parents had access to parental leave. In sum, these studies yield important insights into the influence of culture on the work–family interface and lay the foundation for further research on relevant constructs.
Best Practices for Conducting Cross-cultural, Qualitative Work–Family Research
Although guidelines and signposts for conducting qualitative research have been widely published (e.g., Goldberg & Allen, 2015), this section highlights several issues specific to conducting cross-cultural, qualitative research on the work–family interface.
Defining culture and the specific research question. Various attempts have been made to define culture; common to these definitions is an emphasis on the meanings (e.g., Rohner, 1984) or way of life (Berry, Poortinga, Segall, & Dasen, 1992) shared by a group of people, and on the transmission of these shared meanings or way of life within the group. These emphases suggest that cultures are differentiable from one another according to the views and understandings shared by members within each group. In other words, the concept of culture can be applied to many groups beyond nations, such as those that share common ethnicities, languages, sexual orientations, or religions (Frisby, 1998; Matsumoto & Juang, 2017). Although most cross-cultural research has focused on nations (Smith et al., 2013), it is possible to conduct cross-cultural research based on other groups, such as comparisons across states or provinces within a nation, depending on the theoretical assumptions the researchers hold about the culture of these groups (see, for example, Chapter 22 in this handbook; Eby, Vande Griek, Maupin, Allen, Gilreath, & Martinez).
Having decided upon the specific cultures of interest, researchers can then determine which factors associated with these cultures are likely to affect work–family outcomes, and design research questions and interview protocols around these factors. For instance, the meaning of family may differ across cultures: people in individualistic cultures may interpret the term family to mean only the nuclear family, whereas people in more collectivistic cultures may include broader kinship relations and extended family (Georgas, 2006). One consequence of these different conceptions of family is that the scope of familial responsibilities may also vary across cultures, which in turn leads to different perceptions of what constitutes work–family conflict in individualistic versus collectivistic cultural contexts. Caring for aging parents, for example, may be evaluated more positively and construed as part of one’s core familial duties in collectivistic cultures (Löckenhoff, Lee, Buckner, Moreira, Martinez, & Sun, 2015).
Designing the research. Compared with a mono-cultural study, cross-cultural qualitative research raises several additional issues that must be addressed to ensure the validity and reliability of the data, such as the equivalence of interview or focus group questions in each culture studied, culturally appropriate data collection methods, and the transcription, translation, and interpretation of data. A detailed set of general guidelines for conducting qualitative and quantitative cross-cultural studies has been laid out by Smith et al. (2013). Many of their suggestions center on ensuring that the research materials (e.g., interview protocols) and data collection procedures (e.g., instructions given by interviewers) are equivalent in each cultural setting, so that the results obtained from each culture are minimally influenced by extraneous variables. As such, a collaborative, cross-cultural research team with researchers from each participating culture provides valuable resources and personnel to meet the various needs that arise in such research (Mitteness & Barker, 2004).
After the core set of questions and follow-up probes has been designed, the questions need to be translated into the local language of each culture studied. Two common approaches have been used in cross-cultural research to ensure the linguistic equivalence of instruments before and after translation: back translation (Brislin, Lonner, & Thorndike, 1973) and the committee approach. In back translation, a bilingual translator translates the materials, which are typically written in English, into the target language, and a second bilingual translator then translates this version back into English. An independent native English speaker then compares the original and back-translated English versions for inconsistencies, and the translation is improved accordingly. Back translation is cost-effective and easy to implement, but it may sometimes produce problems such as awkward phrasing introduced by an individual translator. This shortfall can be addressed by the committee approach, in which a group of bilinguals work together to translate the materials and discuss the appropriateness of the translation. The back-translation and committee approaches may also be combined to take advantage of both (see Smith et al., 2013). See Chapter 11 of this handbook for additional discussion of translation processes (Korabik & Rhijn).
Sampling. The target population is an important consideration in qualitative research. In work–family studies, researchers must decide, for example, whether to include part-time workers, people who work from home, or shift workers in addition to those in full-time employment. Researchers also have to decide whether to include individuals who are cohabiting or who live in single-parent or step families. These decisions depend on the researchers’ theoretical assumptions and on the specific cultural backgrounds of the groups studied. The inclusion criteria for prospective respondents can then be clearly defined.
Sample size has long been a contested issue in qualitative research. Although samples in qualitative research are generally smaller than those required for quantitative research, the breadth and scope of the topic under investigation, as well as the heterogeneity of the population, should be considered when deciding on sample size (Baker & Edwards, 2012). The attainment of saturation (Glaser & Strauss, 1967), which occurs when including new data yields no new insight, is sometimes regarded as a criterion for determining sample size. It is usually applied when a grounded-theory approach is adopted, with data generated from a few in-depth interviews (Goldberg & Allen, 2015; Nelson, 2017). Other guidelines in the literature suggest sample sizes ranging from six to fifty for qualitative interviews (Baker & Edwards, 2012). The analytical technique is also a factor to consider when deciding on sample size, as different analytic approaches require different amounts of data. When a quantitative component is included, sample size calculations can be conducted to ensure sufficient power to detect themes of a given prevalence (Fugard & Potts, 2015) or sufficient statistical power for quantitative analyses such as chi-square tests or logistic regression.
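As a rough illustration of the power calculation for theme prevalence mentioned above, the following Python sketch assumes a simple binomial model in the spirit of Fugard and Potts (2015): each interviewee independently expresses a theme with a given population prevalence, and the researcher wants a specified probability (here an assumed default of 0.8) of observing the theme at least a minimum number of times. The function names and defaults are hypothetical choices for illustration, not taken from that paper.

```python
from math import comb


def prob_theme_observed(n: int, prevalence: float, min_count: int) -> float:
    """Probability that a theme held by `prevalence` of the population
    appears in at least `min_count` of `n` interviews (binomial model)."""
    if not 0.0 < prevalence <= 1.0:
        raise ValueError("prevalence must be in (0, 1]")
    # P(X >= k) = 1 - P(X <= k - 1) for X ~ Binomial(n, prevalence)
    below = sum(
        comb(n, i) * prevalence**i * (1 - prevalence) ** (n - i)
        for i in range(min_count)
    )
    return 1.0 - below


def min_sample_size(prevalence: float, min_count: int = 1,
                    power: float = 0.8, max_n: int = 1000) -> int:
    """Smallest n giving at least `power` probability of observing the theme
    `min_count` or more times (the 0.8 default is an assumption)."""
    for n in range(min_count, max_n + 1):
        if prob_theme_observed(n, prevalence, min_count) >= power:
            return n
    raise ValueError("no n up to max_n reaches the requested power")


# Example: a theme held by 10% of the population, to be seen at least once
print(min_sample_size(prevalence=0.1))
```

Under these assumptions, a theme held by 10% of the population needs 16 interviews to have at least an 80% chance of appearing once; requiring it to appear more than once, or targeting rarer themes, raises the number quickly.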
Data collection. Qualitative interviews and focus groups are two common data collection methods in qualitative research. Both methods are compatible with a phenomenological, interpretative approach, which emphasizes understanding social phenomena from individuals’ own perspectives (Kvale, 1996). With qualitative interviews, researchers can explore “how and why people behave, think and make meaning as they do” (Ambert, Adler, Adler, & Detzner, 1995, p. 880). With focus groups, face-to-face interactions among a representative group of individuals, guided by a moderator, can generate new thinking and in-depth discussion of a topic of interest (Krueger & Casey, 2000). Data generated via focus group discussions may also be more useful for gauging whether a statement reflects broad consensus or merely the idiosyncratic view of one individual.
The decision on which data collection method to adopt in a qualitative study also depends on issues of privacy and the sensitivity of topics. For instance, in Wong and Goodwin’s (2009) study, the aim was to investigate individuals’ personal experiences of how their work influenced their marriages. The topic was deemed rather personal and sensitive, and one that individuals might not feel comfortable discussing with a group of strangers in a focus group; therefore, qualitative interviews were adopted as the data collection method. In contrast, Thein et al.’s (2010) study focused on professional women’s perceptions of work–life balance and how various support structures helped them manage work–life balance. Interactions among participants were helpful in eliciting ideas about issues related to the broader socio-economic context (e.g., government support to women), which may not have been brought up if only individual interviews had been used.
When qualitative interviews are adopted as the method of data collection, the mode of interviewing (e.g., face-to-face or phone interviews) is another important consideration. The choice of mode depends in part on the theoretical approach and the research question of the study; however, if the non-verbal, visual cues afforded only by face-to-face interviews are not the focus of the study, there may be no significant difference in quality between face-to-face and phone interviews (Novick, 2008; Sturges & Hanrahan, 2004). Specific cultural issues may also need to be taken into consideration. For instance, in places where many people live in cramped conditions and place greater value on the privacy of their homes, such as Hong Kong and urban China (Wong, 2003), phone interviews may afford respondents a relatively high degree of privacy because interviewers do not need to ‘intrude’ into their homes. The safety of interviewers may also need to be considered, as field visits may involve a degree of risk depending on the neighborhoods or locations in which respondents live or work. When the perceived risk is high, phone interviews may be a better option. Although it may be easier to ensure respondents’ privacy in face-to-face interviews, where the interviewer can physically check the suitability of the location, the same can be achieved in phone interviews by confirming with respondents before the interview begins that they are speaking in private, without others present.
Another advantage of a cross-cultural research team is that local researchers are available for data collection (e.g., conducting the qualitative interviews or focus group discussions in each locality). Sharing a common cultural background with respondents may help the researcher establish rapport and cater to respondents’ cultural and linguistic backgrounds, for example by probing for interpretations of local slang and jargon or by interpreting non-verbal cues in high-context cultures (Hall & Hall, 2000). Because the quality of data from qualitative interviews or focus groups depends a great deal on the quality of the interviewers or moderators, all interviewers and moderators should be well trained and familiar with the study’s standardized protocol, and should discuss the process and potential issues with the research team before data collection begins.
Data analysis. Common qualitative analytical approaches in work–family research include content analysis, thematic analysis, and template analysis. These approaches help researchers derive meaning directly from the data (Mojtaba, Turnnen, & Bondas, 2013). Content analysis is a systematic coding and categorizing approach that classifies and quantifies large amounts of textual data, using predetermined categories to count the frequency of each code in a reliable manner (Green, 2004; Mojtaba et al., 2013). This approach elucidates the phenomenon of interest by identifying the prevalence of the various themes among respondents, and it can be used in conjunction with quantitative statistical analysis to compare response frequencies. For example, in Johnson and colleagues’ (2012) study of retired Russian cosmonauts, the relative frequencies of different types of work–family conflicts and interactions experienced in the pre-career, career, and post-career periods were compared using data analyzed with a content analysis approach. However, the predetermined, fixed coding scheme used in content analysis may not be suited to picking up unique, emic components in cross-cultural, qualitative studies.
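The frequency-counting logic of content analysis, and its blind spot for emic material, can be sketched in Python. The example assumes invented coded segments tagged with a culture label and uses, purely for illustration, the three forms of work–family conflict described by Greenhaus and Beutell (1985) as the predetermined categories; any code outside that scheme is pooled as "unclassified". The chi-square function computes only the Pearson statistic and its degrees of freedom; a p-value would additionally require the chi-square distribution (for example, `scipy.stats.chi2.sf`).

```python
from collections import Counter

# Illustrative predetermined categories: the three forms of work-family
# conflict described by Greenhaus and Beutell (1985). A real study would
# use its own coding scheme.
CODING_SCHEME = ("time_based", "strain_based", "behavior_based")


def code_frequencies(segments, scheme=CODING_SCHEME):
    """Count coded segments per culture.

    `segments` is an iterable of (culture, code) pairs. Codes outside the
    fixed scheme are pooled as "unclassified", which is where emic content
    tends to disappear in a purely deductive content analysis.
    """
    counts = {}
    for culture, code in segments:
        key = code if code in scheme else "unclassified"
        counts.setdefault(culture, Counter())[key] += 1
    return counts


def to_table(counts, columns):
    """Arrange counts as rows (cultures, sorted) by columns (codes)."""
    cultures = sorted(counts)
    return cultures, [[counts[c][col] for col in columns] for c in cultures]


def pearson_chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a
    cultures x codes contingency table (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    if grand == 0:
        raise ValueError("table is empty")
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            if expected == 0:
                raise ValueError("drop empty rows or columns before testing")
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return stat, dof


# Invented segments, for illustration only
segments = [
    ("Hong Kong", "time_based"),
    ("Hong Kong", "caring_for_parents"),  # outside the fixed scheme
    ("United Kingdom", "strain_based"),
]
counts = code_frequencies(segments)
print(to_table(counts, CODING_SCHEME + ("unclassified",)))
print(pearson_chi_square([[10, 20], [20, 10]]))  # invented counts
```

For the invented 2 × 2 table, the statistic is about 6.67 with 1 degree of freedom. Counting how many segments land in "unclassified" for each culture offers one crude signal that a fixed scheme may be missing culture-specific content.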
Thematic analysis, on the other hand, can be used to identify and analyze patterns in qualitative data. With this approach, researchers familiarize themselves with the data, collate and code quotes from the data, search for themes among the codes, review the themes, and then define and name the themes (Braun & Clarke, 2006). Thus, this approach focuses more on identifying common themes than on the specific frequencies of their occurrence (Braun & Clarke, 2006). For instance, Nicholas and McDowall (2012) used thematic analysis to analyze data collected via semi-structured interviews with business travelers from the United Kingdom. The authors identified four main themes describing how respondents achieved work–family balance, including accepting their lifestyle choice and role, and negotiation. The findings were then interpreted within the framework of work–family border theory. In their cross-cultural study of female professionals from Hong Kong and Singapore, Thein and colleagues (2010) analyzed the data collected from focus groups in the two locations with a grounded-theory approach. Fourteen categories were identified and linked to their model of work–family balance.
Like thematic analysis, template analysis allows more flexibility than content analysis in interpreting qualitative data. With template analysis, themes in the initial template can be defined a priori (King, 1998) based on previous research and theorizing about the phenomena of interest. These initial themes are then modified or discarded, and new themes are added, during close readings of all the qualitative data until a final template is constructed, which is then used to code all the data. In the final template, themes or codes are organized hierarchically (King, 1998), with related codes clustered together under higher-order codes. Higher-level codes represent broad themes, and lower-level codes represent more focused themes or finer distinctions. Template analysis also allows researchers to quantify and analyze the qualitative data statistically by counting the frequency of each theme. This technique was used in Wong and Goodwin’s (2009) three-culture study to analyze data collected from semi-structured interviews in each location. In this study, an initial template was defined a priori based on previous research and theorizing regarding work–family conflict (Greenhaus & Beutell, 1985) and used as a basis for the subsequent analysis. The production of the final template and the actual analysis, however, were not constrained by the initial template; emic elements were allowed to emerge from the data from the different cultures involved.
Both culturally universal and culturally specific themes or codes regarding work–family interactions were identified in the study. For instance, across the three cultures, separation between work and family was mentioned as a protective factor against work–family conflict; however, more respondents from the United Kingdom mentioned part-time employment and more respondents from Hong Kong mentioned having reasonable work hours as buffers against work–family conflict, reflecting factors afforded by the respective cultural contexts.
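The hierarchical structure of a template, and the roll-up from lower-level to higher-order codes when counting theme frequencies, can be sketched as a small Python data structure. The code labels below are hypothetical and only loosely echo themes mentioned above; the `revise` helper (also hypothetical) illustrates adding an emergent, emic theme during close reading while leaving the initial template untouched.

```python
from collections import Counter

# Hypothetical template: higher-order codes -> lower-level codes.
# Labels loosely echo themes mentioned in the text and are illustrative only.
TEMPLATE = {
    "protective_factors": [
        "work_family_separation",
        "part_time_employment",
        "reasonable_work_hours",
    ],
    "work_interference": ["time_based", "strain_based"],
}


def revise(template, parent, child):
    """Return a copy of the template with an emergent lower-level code added
    under `parent` (created if absent); the original is left unchanged."""
    revised = {p: list(children) for p, children in template.items()}
    revised.setdefault(parent, [])
    if child not in revised[parent]:
        revised[parent].append(child)
    return revised


def frequencies(coded_units, template):
    """Count lower-level codes and roll them up to their higher-order codes."""
    coded_units = list(coded_units)
    parent_of = {c: p for p, children in template.items() for c in children}
    unknown = set(coded_units).difference(parent_of)
    if unknown:
        raise ValueError(f"codes missing from template: {sorted(unknown)}")
    lower = Counter(coded_units)
    higher = Counter()
    for code, n in lower.items():
        higher[parent_of[code]] += n
    return lower, higher


# An emic theme found during close reading is added to the final template
final = revise(TEMPLATE, "protective_factors", "extended_family_help")
lower, higher = frequencies(
    ["reasonable_work_hours", "extended_family_help", "strain_based"], final
)
print(higher)
```

Raising an error for codes absent from the template is a deliberate choice in this sketch: it forces the analyst to revise the template explicitly rather than letting unanticipated themes vanish silently.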
Once the initial coding is completed, typically by the principal investigator, a portion of the data can be coded independently by other researchers. Such multiple coding can help ensure the rigor and reliability of the coding procedure (Armstrong, Gosling, Weinman, & Marteau, 1997; Barbour, 2001). In cross-cultural studies, however, the focus is not only on achieving a high degree of concordance among coders, but also on examining the content of coding disagreements to uncover any culture-specific meanings of the topic of interest (Barbour, 2001). Discussing discrepancies in the coding results can yield new insights and understanding. In Wong and Goodwin’s (2009) study, the reliability of the coding was established by having approximately one-third of the transcripts from each culture coded independently by a researcher native to the respective culture. Each set of codes was then compared with the coding conducted by the chief investigator on the same transcripts. Discrepancies were discussed and negotiated, and inter-rater reliability was calculated as inter-rater agreement (Miles & Huberman, 1994). Inter-rater agreement was used because the unit of analysis was the thematic unit, and each respondent’s transcript could be coded into multiple themes.
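The inter-rater agreement index attributed above to Miles and Huberman (1994) is commonly written as agreements divided by the sum of agreements and disagreements. The Python sketch below applies it to thematic units under an assumed counting rule: within each transcript, a theme assigned by both coders counts as one agreement and a theme assigned by only one coder counts as one disagreement. It also returns the discrepancies themselves, since in cross-cultural work these are material for discussion rather than mere noise. Transcript IDs and theme labels are hypothetical.

```python
def agreement(coder_a, coder_b):
    """Agreement = agreements / (agreements + disagreements).

    `coder_a` and `coder_b` map transcript IDs to the set of themes each
    coder assigned. Assumed counting rule: a theme assigned by both coders
    is one agreement; a theme assigned by only one coder is one disagreement.
    Returns the ratio and the per-transcript discrepancies for discussion.
    """
    agreements = disagreements = 0
    discrepancies = {}
    for transcript in sorted(coder_a.keys() | coder_b.keys()):
        a = coder_a.get(transcript, set())
        b = coder_b.get(transcript, set())
        agreements += len(a & b)
        disagreements += len(a ^ b)
        if a ^ b:
            discrepancies[transcript] = {"only_a": a - b, "only_b": b - a}
    total = agreements + disagreements
    if total == 0:
        raise ValueError("neither coder assigned any themes")
    return agreements / total, discrepancies


# Hypothetical codings of the same two transcripts
principal = {"HK-01": {"workload", "work_hours"}, "HK-02": {"separation"}}
native_coder = {"HK-01": {"workload"}, "HK-02": {"separation", "family_help"}}
ratio, to_discuss = agreement(principal, native_coder)
print(ratio)
print(to_discuss)
```

Here each transcript contributes one agreement and one disagreement, so the ratio is 0.5, and `to_discuss` lists exactly which themes each coder saw that the other did not.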
In most cross-cultural studies, qualitative data from the different cultures studied are transcribed verbatim and then translated into English for subsequent analysis. However, coding the transcripts in the original language can help preserve the original meanings of the qualitative data. For example, in Wong and Goodwin’s (2009) three-culture study, the interview data were transcribed verbatim in the original languages, and the subsequent analyses were conducted on these transcripts. This approach was possible because the principal investigator and other researchers in the study were bilingual and spoke the local languages of the cultures studied. This analytical method helped preserve the original meaning of the data gathered in different cultural contexts and allowed emic elements to emerge from the data.
Recommendations for Future Qualitative Cross-Cultural Work–Family Research
Although cross-cultural, qualitative work–family research is relatively rare, it has great potential to contribute to work–family theory and practice. It provides a tool to investigate and understand similarities and differences in perceptions and experiences of the work–family interface across diverse cultural contexts. Understanding which elements of work–family relations are culturally similar and which are culturally specific can aid the development and refinement of culturally sensitive models of the work–family interface, which can then be further validated using quantitative data with larger samples. With this mixed-method approach, the results of the qualitative work can help establish the equivalence of constructs across the cultures studied and refine items to be used in quantitative surveys. This is critical for ensuring the validity of measures adopted in large-scale cross-cultural research, which is often quite costly. It also helps lay the groundwork for subsequent unpackaging studies (Smith et al., 2013), which aim to explain observed cultural differences in the work–family interface with relevant individual-level variables across cultures.
Another advantage of qualitative research is that it can be used to investigate and explain unexpected findings observed in cross-cultural, quantitative studies. In this type of mixed-method approach, the qualitative study supplements the findings of the quantitative study. For instance, in Allen and colleagues’ (2015) meta-analysis, no significant difference was found in levels of work-to-family conflict between individuals from individualistic and collectivistic cultures, whereas individuals from cultures higher in individualism reported lower levels of family-to-work conflict than their counterparts in cultures higher in collectivism. These findings were inconsistent with their hypotheses that individuals from more collectivistic cultures would report less work-to-family and family-to-work conflict than their counterparts from less collectivistic cultures. A cross-cultural, qualitative study involving individuals from cultures along the individualism-collectivism continuum may help further investigate these unexpected results.
Similarly, Lu, Huang, and Bond (2016) quantitatively examined how cultural values influence work centrality. They found that perceived independence at work predicted work centrality for males but not for females; it is not clear which cultural variable would account for such a difference. Lu et al. (2016) conjectured that the difference could be explained by differences in gender role socialization, but they also acknowledged that “in the work–life balance literature, although the role of gender has been stressed, when gender similarity arises or gender difference emerges is still not clearly understood” (p. 290). While gender role socialization offers a possible explanation, it is important to note that gender role ideology differs across cultures and changes as societies undergo change (Gibbons, Hamby, & Dennis, 1997). In this case, a qualitative study could assess this conjecture by examining how the perceived relative importance and independence of work differ across genders in different cultural contexts.
Given the relative dearth of cross-cultural work–family research, there are many qualitative research opportunities moving forward. One possibility is to examine the meaning of work–family balance across cultures. Cultural values and beliefs influence individuals’ perceptions of work and family, and perceptions of work–family conflict may vary across cultural contexts (Grzywacz et al., 2007). Along the same lines, the meaning and experience of work–family balance may also vary across cultures. For instance, in cultures that emphasize egalitarian gender roles, marital partnership or equality may be perceived as an integral strategy in striving for work–family balance (Zimmerman, Haddock, Current, & Ziemba, 2003). A related issue concerning spousal and family support may also be of interest. Spousal support can have various benefits for marital relationships (Cutrona, 1996), such as preventing emotional withdrawal in times of stress. However, differing emphases on egalitarianism across cultures may influence the types of spousal and family support that are valued in the work–family interface, and these specific types of support (e.g., sharing housework and childcare, sharing emotions, and respecting and valuing both partners’ work–life goals) may be difficult to uncover without qualitative work. Another interesting topic in work–family research across cultures is work–family boundaries (Ashforth, Kreiner, & Fugate, 2000), specifically, how individuals negotiate the physical and psychological boundaries between the work and family domains.
Segmentation and integration of work and family roles have been suggested as two possible strategies for managing the work–family interface (Rothbard, Phillips, & Dumas, 2005). Although Ashforth et al. (2000) proposed that there may be cultural differences in segmentation-integration practices, this proposition has been examined empirically in only one study, which compared American and Indian workers in high-tech firms and found greater integration in the United States and more segmentation in India (Poster & Prasad, 2005). Given that previous qualitative work has found that people’s boundary management strategies differ based on other contextual factors such as employment conditions and identity (Ba’, 2011), extending this line of investigation to additional cultural and employment contexts seems worthwhile.
From a practical point of view, empirical cross-cultural, qualitative research not only can deepen our understanding of cultural influences on the work–family interface and of the local meanings and experiences of work and family in different cultural contexts, but can also provide insights to policy-makers and multinational companies devising culturally sensitive and appropriate work–family policies and interventions (e.g., De Cieri & Bardoel, 2009). For instance, it has been suggested that work–family balance has long been a less recognized issue in Chinese societies, given the high importance placed on work (Lewis, Gambles, & Rapoport, 2007); consequently, for flexible work–family arrangements to be effective, local managers’ mindsets have to be trained and attuned accordingly (De Cieri & Bardoel, 2009).
Time use studies analyze how individuals allocate their time. Data on time use allow researchers to document and understand human behavior and population lifestyles and to inform development and planning policies (Harvey & Pentland, 1999). Time is a scarce resource that each person uses differently, and through the study of time we can analyze whether these differences are voluntary or forced and predict changes in future population behavior (Durán, 2010). Time use data have been widely used to analyze changes over time and differences between cultures. For example, Bianchi et al. (2006) used U.S. time use data from 1965 onward to analyze the time allocation patterns of changing American families. They found that decreases in housework and increases in multitasking help explain how mothers’ time with children has remained steady despite increases in paid work. Gershuny (2000) found convergence in time use patterns both across time and across countries with different social and cultural backgrounds.
In this chapter we introduce time use data as a resource for studying work and family. While we broadly introduce different types of time use data, our primary focus is on time diary data. We review the historical relevance of time use studies, and we describe data collection methods. We discuss challenges for comparing time diary data across time and space. Finally, we include a case study illustrating how time diary data may be used to compare family time in the United States and Spain.
The Historical Relevance of Time Use Survey Methodology
Time diary techniques were used as early as the turn of the twentieth century in New York and London (Pember-Reeves, 1913; Bevans, 1913), with the earliest time diaries collected in the United States and the Soviet Union in the 1920s (Strumilin, 1925; Szalai, 1966; Kneeland, 1929). According to Szalai (1984), the first known time diary study was that of Bevans (1913), who examined the non-work time of employed men in New York City and surrounding areas of New York State. In an effort to understand and alleviate poverty, Pember-Reeves (1913) used time diaries to document the experiences of working-class women in London in 1909–1913. In the early 1920s in the Soviet Union, Strumilin undertook a study of industrial workers for the purposes of economic planning (Zuzanek, 1980; Szalai, 1984). These early studies laid the foundation for future time diary data collection.
The first large-scale time diary studies in the United States were funded under the 1925 Purnell Act, which provided federal funding for agricultural research, and were fielded by the United States Department of Agriculture. These studies focused on the time use efficiency of farm and town homemakers (Gershuny & Harms, 2016). Using these data, Kneeland (1929) argued that women’s unpaid domestic work should be considered part of the national economy (Gershuny & Harms, 2016). During the Great Depression, Russian exile Pitirim Sorokin, in collaboration with Clarence Berger (Sorokin & Berger, 1939), conducted a large-scale study of unemployed or underemployed young adults in the Boston area of the United States, recording individual episodes of what people were doing and whom they were with (Michelson, 2015).
The collection and analysis of time diary data became more widespread in the 1960s (Andorka, 1987). In the mid-1960s, Szalai began a foundational study (Andorka, 1987) sampling the populations of twelve countries (Bulgaria, Czechoslovakia, German Democratic Republic, Hungary, Poland, Soviet Union, Yugoslavia, France, Belgium, German Federal Republic, Peru, and the United States) with a time diary survey that recorded what individuals were doing, when they did the activities, where they were, and whom they were with. Since Szalai’s early work in the area, hundreds of surveys have been fielded in over 100 countries around the world (Fisher, 2015).
Methods of Collecting Time Use Data
Information on time use is mainly collected via stylized questions or through time diaries. Stylized questions elicit information about the frequency of, and time spent performing, a series of pre-established activities (e.g., paid work, housework, childcare, eldercare). By contrast, a time diary is a record of the sequence and duration of activities carried out by an individual over a specified period, typically twenty-four hours (Converse, 1968). The time diary format allows researchers to know what individuals do at all times of the day, rather than only how much time is spent in pre-established activities, which omits the timing and sequencing of activities throughout the day. Time diaries often collect additional information about with whom activities are performed (e.g., with a partner, with children, or with friends), where activities take place (e.g., at home or at work), and even how respondents feel during activities (e.g., happy or stressed), providing contextual information for analyzing the use of time at the intersection of time of day and place (Durán & Rogero-García, 2009).
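To make the contrast with stylized questions concrete, the Python sketch below represents a single diary day as a sequence of episodes carrying the contextual fields described above (with whom, where). It assumes, for simplicity, that the diary day starts at minute 0 and runs for 1,440 minutes with no gaps or overlaps, and that each episode records one primary activity; real surveys differ in their start times, time slots, and handling of secondary activities. All names and the sample diary are illustrative.

```python
from collections import defaultdict
from dataclasses import dataclass

DAY_MINUTES = 24 * 60  # assumed 24-hour diary day


@dataclass(frozen=True)
class Episode:
    """One diary episode; times are minutes from the start of the diary day."""
    start: int
    end: int  # exclusive
    activity: str
    with_whom: frozenset = frozenset()
    location: str = "home"


def validate_diary(episodes, day_minutes=DAY_MINUTES):
    """Return episodes in time order, checking they cover the day exactly."""
    ordered = sorted(episodes, key=lambda ep: ep.start)
    cursor = 0
    for ep in ordered:
        if ep.start != cursor or ep.end <= ep.start:
            raise ValueError(f"gap, overlap, or empty episode at minute {cursor}")
        cursor = ep.end
    if cursor != day_minutes:
        raise ValueError(f"diary covers {cursor} of {day_minutes} minutes")
    return ordered


def minutes_by_activity(episodes):
    """Total minutes per activity: roughly what a stylized question asks for."""
    totals = defaultdict(int)
    for ep in episodes:
        totals[ep.activity] += ep.end - ep.start
    return dict(totals)


def minutes_with(episodes, companion):
    """Minutes spent in any activity while `companion` was present."""
    return sum(ep.end - ep.start for ep in episodes if companion in ep.with_whom)


# Illustrative diary (invented for this example)
diary = validate_diary([
    Episode(0, 420, "sleep"),
    Episode(420, 480, "childcare", frozenset({"children"})),
    Episode(480, 1020, "paid_work", location="work"),
    Episode(1020, 1080, "meal", frozenset({"partner", "children"})),
    Episode(1080, 1320, "leisure"),
    Episode(1320, 1440, "sleep"),
])
print(minutes_by_activity(diary))
print(minutes_with(diary, "children"))  # childcare plus the family meal
```

Note that time with children (120 minutes here) is not the same as time in childcare (60 minutes): the family meal counts toward the former only, a distinction that is recoverable from the episode's contextual fields but not from a stylized childcare question.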
Diaries of activities provide unique opportunities to learn about daily life and how it has changed over time and how it varies across cultures. Despite their complexity, time diary data allow for much more flexibility than other types of time use data and allow researchers to approach the data from many different perspectives. They allow researchers to examine how daily life is structured and provide the basis for contrasting theories about the nature of social change in modern societies (Robinson, Reference Robinson and Pentland1999). Time diary data also provide an opportunity for researchers to investigate the sequence and order of activities, tapping into issues of fragmentation and quality of time (Bianchi et al., Reference Bianchi, Robinson and Milkie2006).
Diaries are considered the most valid and reliable data source for the study of time use (Gershuny & Sullivan, Reference Gershuny and Sullivan1998; Gershuny, Reference Gershuny2000; Marini & Shelton, Reference Marini and Shelton1993; Durán & Rogero-García, Reference Durán and Rogero-García2009; Robinson, Reference Robinson and Pentland1999; Bonke, Reference Bonke2005), and are becoming the standard method for collecting information on the organization of daily life and time spent on market work, non-market work, and leisure (Sevilla, Reference Sevilla2014). For example, the use of diaries avoids the difficulty of estimating the time spent on a particular activity and the over- or under-reporting of activities due to social desirability bias, that is overestimating the time spent in socially desirable activities (e.g., reading a book to children) or underestimating less socially acceptable activities (e.g., watching TV) (Press & Townsley, Reference Press and Townsley1998; Juster et al., Reference Juster, Ono and Stafford2003; Hofferth, Reference Hofferth2006). Research comparing time diary and stylized estimates (i.e., questions that ask respondents to estimate how much time they spend in certain activities) of time devoted to work activity (Juster & Stafford, Reference Juster and Stafford1991; Robinson & Bostrom, Reference Robinson and Bostrom1994) show that stylized estimates may overstate time spent in paid work (but see Frazis & Stewart, Reference Frazis and Stewart2004). Such comparisons have also been made for household tasks (Bianchi et al., Reference Bianchi, Milkie, Sayer and Robinson2000; Presser & Robinson, Reference Presser and Robinson2000), religious activity (Presser and Stinson, Reference Presser and Stinson1998), and reading to children (Hofferth, Reference Hofferth1999), all of which provide evidence suggesting that stylized estimates are more likely to overstate activity participation compared to time diary estimates. 
Moreover, diaries distinguish between main and secondary activities and capture the relative importance of one activity versus another for the respondent (Durán & Rogero, Reference Durán and Rogero-García2009). Finally, the sum of all activities collected by means of a diary is twenty-four hours (1,440 minutes), while stylized questions sometimes exceed this amount (Juster et al., Reference Juster, Ono and Stafford2003).
Data Collection by Time Diaries
Time diaries spanning a 24-hour period generally take one of three forms, and the choice depends on the research objectives and the resources available. In the open diary format, respondents record the sequence of activities and the start and end time of each. Because there is no fixed interval for which respondents must report what they are doing, the burden on respondents is lower than with other formats. Diaries with fixed intervals require respondents to report the activity performed in each specified time interval (EUROSTAT recommends ten-minute intervals). Longer intervals mean less work for the respondent, but some detail is lost, especially for short activities (Gershuny, 2000). Finally, in simplified diaries, activities are predefined (i.e., participants select from a list of activities) and the respondent indicates the times of day at which they performed each activity.
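To make the difference between the open and fixed-interval formats concrete, the sketch below converts an open-format diary (episodes with start and end times) into ten-minute slots. It is a minimal illustration under stated assumptions, not a EUROSTAT procedure: the `Episode` class and the rule that each slot takes the activity occupying most of its minutes are choices made here for illustration. The function also checks that the episodes tile exactly 1,440 minutes, and the toy diary shows how a three-minute activity disappears at ten-minute resolution.

```python
from dataclasses import dataclass

DAY_MINUTES = 24 * 60  # a complete diary day: 1,440 minutes
SLOT_MINUTES = 10      # ten-minute interval, as recommended by EUROSTAT


@dataclass
class Episode:
    start: int     # minutes since the diary's start time (inclusive)
    end: int       # minutes since the diary's start time (exclusive)
    activity: str


def to_fixed_slots(episodes, slot=SLOT_MINUTES):
    """Convert an open-format diary into fixed-interval slots.

    Assumed rule: each slot gets the activity that occupies the most minutes
    inside it; ties go to the activity that started earlier.
    """
    if DAY_MINUTES % slot != 0:
        raise ValueError("slot length must divide 1,440 minutes")
    ordered = sorted(episodes, key=lambda ep: ep.start)
    # A complete diary tiles the day: no gaps, no overlaps, 1,440 minutes in total.
    if not ordered or ordered[0].start != 0 or ordered[-1].end != DAY_MINUTES:
        raise ValueError("diary must cover exactly 1,440 minutes")
    for prev, nxt in zip(ordered, ordered[1:]):
        if prev.end != nxt.start:
            raise ValueError(f"gap or overlap at minute {prev.end}")

    slots = []
    for slot_start in range(0, DAY_MINUTES, slot):
        slot_end = slot_start + slot
        best, best_overlap = None, 0
        for ep in ordered:
            overlap = min(ep.end, slot_end) - max(ep.start, slot_start)
            if overlap > best_overlap:  # strict ">" keeps the earlier episode on ties
                best, best_overlap = ep.activity, overlap
        slots.append(best)
    return slots


diary = [
    Episode(0, 424, "sleep"),
    Episode(424, 427, "phone call"),  # a 3-minute activity
    Episode(427, 1440, "other"),
]
slots = to_fixed_slots(diary)
print(len(slots))             # 144
print("phone call" in slots)  # False: lost at ten-minute resolution
```

Other assignment rules (for example, recording the activity in progress at the start of each slot) are equally plausible; each trades off detail differently for short activities.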
In addition to collecting information about primary activities, time diary data collections often include additional contextual information. The most common types of contextual information include whether respondents are doing a simultaneous activity and what the activity is, with whom they carry out the activity, and where it is done. How respondents feel during an activity and whether they are using electronic devices are also sometimes, though less commonly, collected.
Time use surveys also differ in the number of members per household who participate in the survey. Some surveys include all household members, others sample those individuals in a certain age range, while others sample only one member of the household. In each case, the surveys generally collect sociodemographic information about other household members and general characteristics of the household. Gathering time use information about multiple individuals per household allows researchers to analyze how tasks and roles are divided within the household.
Another difference across time use surveys is the number of diaries completed by each respondent. Respondents may complete only one 24-hour diary, two diaries (usually one on a weekday and one on a weekend day), or a diary for each day of the week. Having more than one diary per respondent allows researchers to compare time allocation under different circumstances, for example, on days when respondents work and days when they do not. Although the information collected from each respondent generally covers only a single day, or in some cases up to a week, the number of respondents is usually large enough to ensure representativeness of both weekdays and weekends and of different demographic groups (Durán & Rogero, 2009). Moreover, data collection is usually spread over a full year, capturing the periodicity of certain activities and allowing researchers to investigate seasonal and, potentially, weather effects on time allocation.
The information in the diary may be reported in a respondent’s own words or chosen from a predefined list of activities. Open-ended reports are more valuable, but they require subsequent coding, which is not an issue when activities are predefined. Furthermore, when there is no predefined list of activities, the information is less biased because the research objectives are not revealed to respondents (Durán & Rogero, 2009). Own-words reports are simpler to code now than they were previously, thanks to the development of harmonized activity lists and coding schemes that classify activities at different levels of detail. Szalai’s early studies (1966) have been the basis on which activity codes have been modified and adapted. The International Classification of Activities for Time Use Statistics (ICATUS) is another commonly used activity list. It is hierarchical: the most detailed activities aggregate into a smaller number of second-level activities, which in turn aggregate into a still smaller number of first-level activities. For example, at the first level, Providing unpaid domestic services for own final use within household is classified in category 06. This category is divided into the following divisions: Working time in providing unpaid domestic services for own final use (061), Travel related to provision of unpaid domestic services (062), and Unpaid domestic services not specified (069). Category 061 is then divided into the following groups: Unpaid domestic services (0611) and Shopping (0612). Unpaid domestic services is further split into Food management (06111), Cleaning and upkeep of dwelling and surroundings (06112), Do-it-yourself decoration, maintenance and small repairs (06113), Care of textiles and footwear (06114), Household management (06115), and Pet care (06116).
A nested classification makes it easy to add context-specific activities that are common in particular settings. For example, a new activity with code 0613 could be created at the third level if it is relevant to the data collection. A detailed list of the ICATUS can be found on the website of the United Nations Statistics Division (http://unstats.un.org/UNSD/cr/registry/regcst.asp?Cl=231&Lg=1).
Another source of variation in data collection is whether the information is self-reported or collected by an interviewer. When an interviewer collects the data via direct observation, respondents do not have the opportunity to define the activities they are doing; on the other hand, the observer may affect the respondent’s behavior (Durán & Rogero, 2009; Robinson, 1999). Time diaries completed by respondents themselves may be filled in throughout the day or completed retrospectively. With retrospective diaries, respondents may have trouble remembering some activities, while diaries filled in throughout the day are more burdensome for respondents, which may produce a higher rate of non-response (Marini & Shelton, 1993).
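Because each code in the ICATUS example above extends its parent’s code, rolling detailed codes up to a coarser level amounts to truncating them. The following is a minimal sketch that assumes level 1 corresponds to two-digit codes, level 2 to three digits, and so on, as in the category 06 example. The function name and the minutes in the toy day are hypothetical.

```python
from collections import defaultdict

# Assumed mapping from hierarchy level to code length, taken from the
# example above: 06 -> 061 -> 0611 -> 06111.
CODE_LENGTH = {1: 2, 2: 3, 3: 4, 4: 5}


def aggregate(minutes_by_code, level):
    """Roll detailed activity codes up to a coarser level of a nested scheme."""
    length = CODE_LENGTH[level]
    totals = defaultdict(int)
    for code, minutes in minutes_by_code.items():
        if len(code) < length:
            raise ValueError(f"code {code} is coarser than level {level}")
        # In a nested scheme the parent code is simply a prefix of the child.
        totals[code[:length]] += minutes
    return dict(totals)


day = {
    "06111": 45,  # food management
    "06112": 30,  # cleaning and upkeep of dwelling and surroundings
    "06116": 15,  # pet care
    "0612": 20,   # shopping
    "062": 10,    # travel related to unpaid domestic services
}
print(aggregate(day, level=1))  # {'06': 120}
print(aggregate(day, level=2))  # {'061': 110, '062': 10}
```

A context-specific code such as 0613 would aggregate correctly under this approach as long as it follows the same prefix convention.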
Time Diary Data Challenges
The main problem with time diary data collection is its relatively high cost, which increases as more detail about time use is collected (Marini & Shelton, 1993). Other major problems include the burden on the respondent and high rates of non-response. Bonke (2005) estimates a response rate of 49% for surveys collecting information by means of diaries, compared with 66% when information is obtained through stylized questions. Non-response raises concerns about the data because people who are invited to participate but do not respond tend to be busier (Gershuny, 2000), although other evidence suggests that weak integration into one’s community matters more for non-response than busyness (Abraham, Maitland, & Bianchi, 2006). This research implies that some activities, such as volunteering, may be overestimated because people who volunteer are more likely to participate in surveys (Abraham, Maitland, & Bianchi, 2006). Nevertheless, technological advances and the development of apps for data collection have reduced the cost and burden associated with collecting time diary data.
Collecting information about a single day reveals the daily dynamics of households, but it is more difficult to learn about weekly, monthly, and yearly dynamics (Durán & Rogero, 2009), and there is also a risk that the day is atypical (Robinson, 1999; Marini & Shelton, 1993). This concern is largely addressed by a good sample design that yields representativeness for every day of the week, especially in terms of working days, weekends, and holidays (Marini & Shelton, 1993). At the population level, typical and atypical days together should provide a reliable picture of daily time use. Furthermore, in some surveys respondents indicate whether the day of the interview was typical or atypical (e.g., a holiday or illness), allowing researchers to decide how to handle different types of days.
Open-ended responses in time diary data mean that coding activities requires careful consideration. For example, should “read the Bible” be coded as a reading activity or a religious activity (Chenu & Lesnard, 2006)? The open-endedness of time diary data also means that the most sensitive activities (e.g., sexual and other biological activities) and illegal activities (e.g., drug use) tend to be missing from the diaries because individuals do not report them (Robinson, 1999). There may also be significant variation in the amount of detail in respondents’ diaries, with some respondents recording many activities and others fewer (Robinson, 1999). Time use questionnaires are long and require a higher level of commitment from the respondent to complete the 24-hour diary than other surveys do, and respondents may complete the main parts of the questionnaire but skip parts they consider less relevant. For similar reasons, secondary activities, and therefore multitasking, are thought to be underreported, because respondent fatigue or lack of awareness makes respondents less apt to record them (Durán, 2010).
Existence of Time Use Data Globally
The large number of countries that conduct time use surveys signifies their importance for scientific research and policy making. Most of the surveys have been fielded by country-specific National Statistical Institutes due to the complexity and cost of fielding the surveys; there are, however, some surveys conducted by other institutions and organizations. Since 1960, 107 countries have conducted time use studies at a national level or for a large geographical area. In eighty-two of those countries, data are collected based on a diary of activities.
The number and periodicity of surveys for each country varies greatly. Figure 10.1 shows which countries have time use surveys and how many. In general, time use studies are more frequent in developed countries; few developing countries have engaged in time use studies, and most of these are not based on diaries of activities. For countries with a long tradition of time use studies, like the United States, the United Kingdom, the Netherlands, and South Korea, there is a long series of periodically collected time diary data. In other countries, time use studies are more recent and the availability of data is limited, but the expectation for most is that the surveys will be repeated in the near future.
In the United States, although an official national survey was not conducted until 2003, when the American Time Use Survey (ATUS) was initiated, time diary data were collected from the 1960s onward by the Institute for Social Research at the University of Michigan and the Survey Research Center at the University of Maryland, yielding time diary data spanning more than fifty years. The situation is similar in Canada, where time use studies go back to the 1960s. In Canada, a subset of the individuals interviewed in the 1970s were interviewed again ten years later, providing longitudinal information about time use, which is rare in time use studies.
Several European countries also have long data series that date back to the 1960s. The most important round of surveys, however, took place around 2000, when fifteen countries carried out a time use survey following the guidelines of the Statistical Agency of the European Commission (EUROSTAT). A second round of surveys took place around 2010, and more surveys are planned, although no fixed periodicity has been established. Among the countries with the longest series of time diary data are the United Kingdom, where time use surveys have been conducted every decade since the mid-1960s, and the Netherlands, where time diary surveys have been carried out every five years since 1975. French data are also available since the 1970s, but diary-based surveys are more limited in other Mediterranean countries such as Spain, Greece, and Portugal. Among the Nordic countries, Denmark has collected time use diary data since 1964, Norway since 1971, and Finland since 1979. In Sweden, the fielding of time use surveys began in 1982.
Eastern European countries have a long tradition of collecting time use data. Data from this region are rich and span decades in some countries, notably Bulgaria, Hungary, and Poland. Hungarian data were first collected in 1965 as part of Szalai’s project, and other surveys have been fielded periodically ever since. In Russia, the collection of time use information goes back to the 1920s; various surveys have gathered information since then, but access to them is complicated. For other Eastern European countries, data are mainly available for more recent decades.
The data series is also growing for Japan, where time use studies have been conducted every five years since 1996, although older surveys date back to 1940. In South Korea, both the main national broadcaster (KBS) and KOSTAT conduct time use surveys every five years, with the KBS and KOSTAT surveys fielded two to three years apart. KBS started collecting time use data in the 1980s; the first KOSTAT survey took place in 1999. China, Indonesia, Mongolia, Thailand, and other East and Southeast Asian countries have conducted one-off large-scale national time use surveys, with the possibility of future surveys. Further south, in Australia, time use studies have taken place periodically since the 1970s.
In Latin America, time use surveys based on diaries of activities are uncommon, and information on time use is mainly collected by means of stylized questions, sometimes as part of more general surveys. However, there are some diary-based studies for specific areas, such as Buenos Aires in Argentina and Mexico City in Mexico, as well as a study by Neuma Aguiar in Brazil. The exception, a national-level diary survey, is Venezuela, which carried out a time use survey based on diaries in 2011.
Time use studies are also limited in Africa. Few countries have conducted time use surveys, and those that have done so have generally conducted them infrequently. South Africa and Tanzania are the only African countries with a time series of time diary data: South Africa conducted surveys in 2000 and 2010, and Tanzania in 2006 and 2014. Ghana (2009), Morocco and Algeria (2012), and Ethiopia (2013) are among the countries that have recently conducted time use studies based on diaries.
Challenges Comparing Time Use Data Worldwide
Despite their prevalence, access to time-use data, comparability across time and place, and complexity of the data are important barriers to conducting time-use research. Some countries, such as Spain and the United States, provide free access to all datasets. This is not the case everywhere, however, and in many cases it is necessary to apply in order to use the data. Moreover, in some countries the access is limited only to researchers in the country, necessitating multi-national collaborations to explore any cross-cultural research questions.
Further, although core methodological similarities exist across time and countries, there are some key differences with significant implications for how the data can be used. For example, variable coding differs among surveys, especially for activities, but also for background and contextual information such as location or who is present during the activity. To address this issue, several efforts have been made to “harmonize” original surveys so that the data are comparable across time and place. Harmonized data use consistent codes across time and countries. US data from 1965 onward have been harmonized and disseminated by the Centre for Time Use Research (CTUR) as the American Heritage Time Use Study (AHTUS). Similarly, researchers at CTUR have harmonized data from different countries to create the Multinational Time Use Study (MTUS), which includes more than 60 datasets from 25 countries since the 1960s. The most basic information from the European surveys conducted in the 2000 and 2010 rounds is also available in the Harmonized European Time Use Survey (HETUS) in the form of predefined tables.
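At its core, harmonization maps each survey’s own activity codes onto a shared category list. The sketch below illustrates that idea with two invented coding schemes. The codes, category names, and `CROSSWALK` table are hypothetical and do not reproduce the actual AHTUS, MTUS, or HETUS schemes, which involve many more categories as well as decisions about contextual variables.

```python
# Hypothetical source codes for two surveys; real harmonized files such as
# the MTUS define their own category lists and crosswalks.
CROSSWALK = {
    "survey_a": {"A101": "housework", "A102": "housework", "A201": "shopping"},
    "survey_b": {"310": "housework", "360": "shopping"},
}


def harmonize(survey, minutes_by_code):
    """Recode survey-specific activity codes into shared categories."""
    mapping = CROSSWALK[survey]
    out = {}
    for code, minutes in minutes_by_code.items():
        # Codes without an agreed equivalent are kept visible, not dropped.
        target = mapping.get(code, "unclassified")
        out[target] = out.get(target, 0) + minutes
    return out


a = harmonize("survey_a", {"A101": 30, "A102": 20, "A201": 15})
b = harmonize("survey_b", {"310": 50, "360": 15, "999": 5})
print(a)  # {'housework': 50, 'shopping': 15}
print(b)  # {'housework': 50, 'shopping': 15, 'unclassified': 5}
```

Mapping unmatched codes to an explicit unclassified category, rather than silently dropping them, keeps the daily total intact and makes gaps in the crosswalk visible.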
The complexity of the data also presents a barrier to their use. To facilitate access to and research using time use data, the IPUMS Time Use project (www.ipums.org/timeuse.html), a collaboration of the Minnesota Population Center at the University of Minnesota, the Maryland Population Research Center at the University of Maryland, and the Centre for Time Use Research at the University of Oxford, provides free individual-level time use data for research purposes. Currently, this project provides access to data from the American Time Use Survey (www.atusdata.org), the AHTUS (www.ahtusdata.org), and the MTUS (www.mtusdata.org). Data are delivered by a web-based data extraction system that makes it easy to create datasets containing diaries of activities for each respondent, aggregated measures of time use from the diary, and personal and household characteristics. Data are free and provided in an easy-to-access format ready for use with the main statistical packages, making them more accessible to a broader audience.
Case Study: Paid Work and Family Time in the United States and Spain
We now turn to a research example utilizing time-use data to explore patterns in paid work and family time in the United States and Spain. Differences in social, cultural, and policy contexts that may contribute to cross-national differences in work and family time will be discussed, in addition to methodological variations in survey design. We also address some data comparability issues between surveys that need to be taken into account.
Data
The American Time Use Survey (ATUS) is the only publicly available annual time diary survey. It began in 2003 and is conducted by the United States Census Bureau and funded by the United States Bureau of Labor Statistics. The sample consists of a subset of households that previously participated in the Current Population Survey (CPS), a monthly household survey of the civilian, non-institutionalized population. One member per household aged fifteen years or older is selected to complete the time diary by telephone. Respondents report all their activities for one day from 4:00 a.m. to 4:00 a.m., including when each activity starts and ends, whether they are providing childcare as a secondary activity, where the activity takes place, and the co-presence of household members and non-members during the activity. For household members present during the activity, additional information, such as age and relationship to the respondent, is collected as part of a household roster. For non-household members, the individual’s relationship to the respondent is collected as part of the time diary. In addition to the main questionnaire, modules have been added periodically to obtain information on topics such as well-being, employee leave, eating, and health. The study discussed here uses the 2010 data, which consist of 13,260 diaries.
The Spanish Time Use Survey (STUS) is conducted by the National Statistical Institute (INE) following EUROSTAT’s time diary guidelines. According to these guidelines, all household members aged ten and older must complete a time diary reporting all their activities over twenty-four hours. The survey does not have a defined periodicity; so far the INE has carried out two surveys, in 2002–2003 and 2009–2010. The information is collected via a self-report diary in which each person reports their activities in 10-minute intervals from 6:00 a.m. to 6:00 a.m. The diary collects information about the main activity, when it starts and ends, whether respondents are doing a secondary activity, and whether any of the following are present during the activity: a partner, father or mother, household children under ten years old, other household members, and known non-household members. What we report here is based on the 2009–2010 survey, which contains 19,295 diaries from 9,541 households.
Although both surveys cover a twenty-four-hour period, their starting and ending points differ. The populations from which the samples are drawn also differ slightly: information for children aged ten to fourteen is available only for Spain. Moreover, in Spain all household members aged ten and older complete a diary, whereas in the United States only one member per household does. Additional relevant differences in methodology are described where applicable in the examples below.
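One practical consequence of the different diary start times (4:00 a.m. in the ATUS, 6:00 a.m. in the STUS) is that slot positions must be mapped to clock time before the two surveys can be compared. The sketch below assumes each diary has already been converted into 144 ten-minute slots (the ATUS records episodes, so this is an extra step). `to_clock_order` is a hypothetical helper, not part of either survey's documentation.

```python
DAY_MINUTES = 24 * 60


def to_clock_order(slots, diary_start_hour):
    """Reorder one diary's fixed slots so that index 0 is midnight.

    `slots` holds one entry per equal-length interval, starting at
    `diary_start_hour` (4 for the ATUS, 6 for the STUS) and covering 24 hours.
    """
    n = len(slots)
    if n == 0 or DAY_MINUTES % n != 0:
        raise ValueError("slots must split the day into equal intervals")
    slot_minutes = DAY_MINUTES // n
    start_minute = diary_start_hour * 60
    if start_minute % slot_minutes != 0:
        raise ValueError("diary start must fall on a slot boundary")
    shift = start_minute // slot_minutes  # slots from midnight to diary start
    clock = [None] * n
    for i, activity in enumerate(slots):
        # Diary slot i covers clock slot (shift + i), wrapping past midnight.
        clock[(shift + i) % n] = activity
    return clock


atus_like = list(range(144))  # slot i = i-th ten-minute slot after 4:00 a.m.
stus_like = list(range(144))  # slot i = i-th ten-minute slot after 6:00 a.m.
atus_clock = to_clock_order(atus_like, 4)
stus_clock = to_clock_order(stus_like, 6)
print(atus_clock[24], stus_clock[36])  # 0 0  (4:00 a.m. and 6:00 a.m.)
print(stus_clock[0])                   # 108: midnight is 18 hours into the diary
```

After reordering, each clock-ordered day is stitched from two calendar days: the slots between midnight and the diary start (4:00 a.m. for the ATUS, 6:00 a.m. for the STUS) come from the end of the diary, that is, the following night.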
Paid Work in the United States and Spain
For our analysis of paid work in the United States and Spain, we limit our sample to respondents between fifteen and sixty-four years old whose diary represents a weekday. Figure 10.2 is a tempogram showing the proportion of respondents doing paid work throughout the diary day. The solid lines correspond to all respondents (8,063 respondents for the United States [gray] and 11,540 for Spain [black]), while the dotted lines represent the respondents who reported at least one minute of paid work in the diary (4,132 in the United States [gray] and 5,228 respondents in Spain [black]).
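A tempogram such as Figure 10.2 plots, at each point in the day, the share of respondents reporting a given activity. The minimal sketch below assumes clock-aligned slot diaries and optional survey weights; weights are an assumption here, since the text does not say whether the figure is weighted. The `workers_only` filter approximates the "at least one minute of paid work" restriction by keeping diaries with at least one paid-work slot; with episode data, filtering on any minute of paid work would match the definition exactly.

```python
def tempogram(diaries, activity, weights=None):
    """Share of respondents doing `activity` in each time slot.

    `diaries` is a list of equal-length slot sequences aligned to the same
    clock; `weights` are optional survey weights (unweighted if omitted).
    """
    if weights is None:
        weights = [1.0] * len(diaries)
    total = sum(weights)
    n_slots = len(diaries[0])
    return [
        sum(w for d, w in zip(diaries, weights) if d[t] == activity) / total
        for t in range(n_slots)
    ]


def workers_only(diaries, activity="paid work"):
    """Keep respondents with at least one slot of the activity."""
    return [d for d in diaries if activity in d]


toy = [
    ["sleep", "paid work", "paid work", "home"],
    ["sleep", "sleep", "paid work", "home"],
    ["sleep", "sleep", "home", "home"],
]
print(tempogram(toy, "paid work"))                # [0.0, 0.3333333333333333, 0.6666666666666666, 0.0]
print(tempogram(workers_only(toy), "paid work"))  # [0.0, 0.5, 1.0, 0.0]
```

The two calls mirror the two sets of lines in the figure: the first uses all respondents, the second only those who worked on the diary day.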
The figure shows that the work day begins and finishes earlier in the United States than in Spain. Among those who worked on the diary day, 50% were working at 8:00 a.m. in the United States. Before noon, the percentage of people working is always 10 points lower in Spain than in the United States, and in Spain the 50% mark is not reached until forty minutes after it is reached in the United States. In the evening, only 20% of workers in the United States are working at 6:00 p.m., while more than twice that share of Spanish workers are still working. The figure also shows differences in the timing and duration of the main breaks in the work schedule. In Spain, the proportion of people doing paid work drops around 2:00 p.m. This break is longer and more common in Spain than in the United States, and the proportion remains relatively constant until 4:00 p.m., when it increases again, though not to pre-break levels.
The lines for the overall populations (continuous lines) also reflect the different unemployment rates in the two countries. The unemployment rate is much higher in Spain than in the United States, and, as a consequence, the proportion of people who do not work at all during the twenty-four-hour diary is also higher. As a result, the proportion of the population doing paid work is higher in the United States during almost the entire day, except for a short period at noon and between 5:00 p.m. and 9:00 p.m., when the proportion is slightly larger in Spain. This second period reflects the Spanish labor market, which is characterized by work schedules that often involve very long breaks and late finishes (Gracia & Kalmijn, 2016).
Family Time in the United States and Spain
Utilizing these same datasets, García Román, Flood, and Genadek (2017) examine time with a spouse in the United States, Spain, and France. Here, we expand the analysis to family time, defined as time spent with a partner and/or children, using the information on the co-presence of others recorded in the time diary. Because this information is generally not collected for personal care, sleeping, or paid work, we exclude these activities from our family time measure. Co-presence refers to the presence of others during an activity and does not necessarily imply interaction between the respondent and the individuals they report being with. For example, respondents and others could be in the same room but doing different activities. Co-presence is also reported in different ways in the two surveys. In the ATUS, respondents list the people who are with them during activities, while in the STUS respondents check which relatives or non-relatives are present in their paper diary. The definition of co-presence also differs. In the ATUS, the presence of others means that another person is in the same room as the respondent. In the STUS, the condition of being in the same room is not explicit, so reports of co-presence may not be directly comparable between the two samples. To correct for this difference between the ATUS and other surveys in measuring time with children, Mullan and Craig (2009) propose that researchers using the ATUS count children as co-present either when a child is reported present or when the respondent has a child under their responsibility during the activity (secondary childcare).
More importantly, in the STUS the presence of children ten and older is captured in the “other household members” category, which means that we cannot distinguish children ten and older from other household members such as siblings or other relatives. For that reason, our analysis is limited to families of adult couples between the ages of twenty and sixty-five with children under ten years old. The final sample contains 1,614 cases for the United States and 1,840 cases for Spain.
Based on these cases, we compute time spent in the following activities: personal care and sleeping, paid work, family time (activities with spouse and children), time with children (activities with children but not partner), time with partner (activities with partner but not children), and time without family (activities neither with partner nor children). Figure 10.3 shows the average minutes per day spent in each type of activity described. The total time for each country sums to 1,440 minutes (or twenty-four hours).
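The assignment of each diary activity to one of these categories can be sketched as follows. This is an illustration under stated assumptions, not the authors’ code: the `Episode` fields, including the `secondary_childcare` flag standing in for the ATUS secondary childcare variable, are hypothetical, and the child rule follows the Mullan and Craig (2009) suggestion described above. The final check reflects the fact that each diary day sums to 1,440 minutes.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Episode:
    activity: str
    minutes: int
    partner_present: bool = False
    child_present: bool = False
    secondary_childcare: bool = False  # hypothetical field, ATUS-style flag


def category(ep):
    # Co-presence is not collected for these activities, so they keep their own category.
    if ep.activity in ("personal care", "sleep"):
        return "personal care and sleeping"
    if ep.activity == "paid work":
        return "paid work"
    # Mullan and Craig-style rule: a child counts as co-present if present
    # or under the respondent's care during the activity.
    child = ep.child_present or ep.secondary_childcare
    if ep.partner_present and child:
        return "family time"
    if child:
        return "time with children"
    if ep.partner_present:
        return "time with partner"
    return "time without family"


def daily_totals(episodes):
    totals = Counter()
    for ep in episodes:
        totals[category(ep)] += ep.minutes
    if sum(totals.values()) != 1440:
        raise ValueError("a complete diary day must sum to 1,440 minutes")
    return totals


day = [
    Episode("sleep", 540),
    Episode("paid work", 480),
    Episode("meal", 60, partner_present=True, child_present=True),
    Episode("watching TV", 90, partner_present=True),
    Episode("cooking", 30, secondary_childcare=True),
    Episode("commuting", 60),
    Episode("other", 180),
]
totals = daily_totals(day)
print(totals["family time"], totals["time with children"])  # 60 30
```

Averaging such per-respondent totals across the sample gives the kind of minutes-per-day summary shown in Figure 10.3.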
Figure 10.3 Average minutes per day spent in personal care and sleeping, paid work, and activities with spouse and children.
Time spent in personal care (such as sleeping and grooming) and paid work is similar in both countries. The average time spent in personal care is around nine hours, while time spent in paid work is around four hours and ten minutes. Larger differences appear in time with a partner and children, and the main differences are in time spent with children. In Spain, family time (time with a partner and children together) is more common, and parents spend more than four hours together as a family. In the United States, parents spend 37 fewer minutes per day in family time. On the other hand, parents in the United States spend more time alone with children than parents in Spain. US parents spend 24 more minutes per day alone with children than in family time, while in Spain the reverse is true: family time exceeds time alone with children by 98 minutes. Summing the two measures of time with children (family time and time with only children) yields a gap of 48 minutes, with parents in the United States spending 48 more minutes per day with their children.
Differences between the United States and Spain in time with a partner and not with children are also significant. Spanish parents spend 37 more minutes alone together than US parents. Combining time with a partner only and time as a family, we find that Spanish couples spend almost one hour and 15 minutes more together than parents in the United States.
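The 48-minute and roughly 75-minute gaps follow directly from the differences reported above. Using notation introduced here for illustration, with F for family time, C for time with children only, and P for time with a partner only (in minutes per day, with subscripts for country), the reported figures give:

```latex
\begin{aligned}
F_{S} - F_{US} &= 37, \qquad C_{US} - F_{US} = 24, \qquad F_{S} - C_{S} = 98, \qquad P_{S} - P_{US} = 37 \\[4pt]
(F_{US} + C_{US}) - (F_{S} + C_{S}) &= (2F_{US} + 24) - (2F_{S} - 98) = 2(F_{US} - F_{S}) + 122 = -74 + 122 = 48 \\[4pt]
(F_{S} + P_{S}) - (F_{US} + P_{US}) &= (F_{S} - F_{US}) + (P_{S} - P_{US}) = 37 + 37 = 74
\end{aligned}
```

The second gap is 74 minutes, or one hour and 14 minutes, which matches the "almost one hour and 15 minutes" of additional shared couple time in Spain.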
Figure 10.4 illustrates the timing of the different types of activities analyzed. Percentages show the share of respondents engaged in each activity at each time of day and, for activities with co-presence information, with whom they are doing it.
Figure 10.4 Percentage of time in personal care and sleeping, paid work and with family during a day.
Similar to Figure 10.3, time with a partner and children over the course of the day reflects the daily schedule in each country. In the United States, the percentages are quite constant between 8:00 a.m. and 5:00 p.m. with a little bit more than 10% of respondents reporting being with a partner and children and 20% alone with children. After 6:00 p.m. there is a peak in family time, which largely corresponds with dinner time. Time with a partner and children hits a maximum of 45% around 7:00 p.m. The highest proportion of time alone with children occurs slightly earlier, between 5:00 p.m. and 5:30 p.m. The highest proportion of time alone with a partner is observed at 9:30 p.m., when almost 23% of parents report being with their partner without children present.
In the case of Spain, the graph shows two family time peaks, which correspond to the main meal times: one at lunch and another at dinner. Meals are more frequently shared with family in Spain, and Spaniards spend more time on them. Compared with Spain, lunchtime is less evident in the United States, where the graph shows only a slight dip in paid work. In Spain, one third of parents are with a partner and children between 2:00 p.m. and 2:30 p.m. The dinner peak also occurs later in Spain than in the United States, at 9:00 p.m., when 50% of respondents report being with a spouse and children. The highest proportion of time alone with a partner is likewise observed later in Spain, around 10:30 p.m.
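Time-of-day curves like those in Figure 10.4 can be derived from episode-level diary records. The Python sketch below shows one possible way to do this; it is not the procedure used for the figure. The episode layout, category labels, and 30-minute slots are hypothetical, each slot is represented by its starting minute, and the survey weights that a real analysis would normally apply are omitted.

```python
from collections import defaultdict

# Hypothetical episode layout: (respondent_id, start_minute, end_minute, with_whom),
# with minutes counted from midnight and end_minute exclusive.
Episode = tuple[str, int, int, str]

DAY_MINUTES = 24 * 60


def share_by_slot(episodes: list[Episode], category: str, slot: int = 30) -> dict[int, float]:
    """Unweighted share of respondents in `category` at the start of each time slot."""
    respondents = {rid for rid, _, _, _ in episodes}
    present: dict[int, set[str]] = defaultdict(set)  # slot start -> respondent ids
    for rid, start, end, who in episodes:
        if who != category:
            continue
        for t in range(0, DAY_MINUTES, slot):
            if start <= t < end:  # the episode covers this slot's starting minute
                present[t].add(rid)
    return {t: len(present[t]) / len(respondents) for t in range(0, DAY_MINUTES, slot)}


# Two hypothetical respondents' evening episodes.
diary: list[Episode] = [
    ("r1", 1140, 1200, "partner_and_children"),  # 7:00-8:00 p.m.
    ("r1", 1200, 1320, "partner_only"),          # 8:00-10:00 p.m.
    ("r2", 1110, 1170, "children_only"),         # 6:30-7:30 p.m.
    ("r2", 1170, 1230, "partner_and_children"),  # 7:30-8:30 p.m.
]
shares = share_by_slot(diary, "partner_and_children")
print(shares[1140], shares[1170], shares[1200])  # 0.5 1.0 0.5
```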
Our results show that time with a partner is much higher in Spain than in the United States. Spanish couples spend more time alone together than couples in the United States, and Spanish couples spend more time with a partner and children than couples in the United States. Paid work constraints explain a small part of the differences in couples’ shared time that we observe between countries. Other differences appear to be rooted more in norms about family life and the cultural rhythms of daily life, such as the importance of meals and how non-work time is allocated.
Conclusion
In this chapter, we described time use surveys as a data source for studying global work and family issues. We discussed their main advantages and disadvantages, arguing that time use surveys based on diaries are the most reliable way to collect time use data. We also showed that the existence and accessibility of time use surveys are not uniform across the world, although their incidence is increasing and more countries are gathering diary data. Finally, we compared paid work and couples’ shared time in the United States and Spain to illustrate how the data might be used to study work and family.
In our case study, we highlight some comparability issues that need to be addressed for analyses across countries. Time use studies are not limited to the activities carried out; they also provide access to additional information such as the co-presence of others or the place where the activity is done. The collection of information for all household members provides additional value because it allows for investigation of relative time use, for example, between the respondent and a partner, children, or other people in the household. Similarly, when multiple days are collected per respondent, researchers have opportunities to compare time use across days to better understand weekend/weekday differences in time use or differences across days when people work or not.
Time diary data provide many opportunities for research at the intersection of work and family, especially cross-culturally. Given space constraints, we describe only a few of the numerous examples of research that could leverage the rich time diary data available to the scholarly community. Countries like Spain, France, Austria, and the UK have datasets that contain multiple respondents per household, which can facilitate estimating the household-level division of labor and improve our understanding of the ways in which couples combine paid work, unpaid housework, and childcare. Data from the Netherlands and the UK, which include multiple time diaries per person, can help illuminate variability in caregiving across days of the week, a key issue in determining burden for caregivers, and how caregivers combine care work and paid work, which is especially relevant given population aging. The availability of data from Eastern Europe, Asia, South Africa, and Latin America allows researchers to conduct comparative studies of the gender revolution in time use.
Time diary data also have many important policy applications. Time diary data allow for a more complete understanding of economic activity through the inclusion of nonmarket work, such as housework, unpaid care work, and volunteer work (National Research Council, 2000). The availability of such data allows for cross-time and cross-country comparisons of productive activities (e.g., Gershuny & Harms, 2016; Bridgman et al., 2012). Relatedly, there is considerable interest in how different tax and employment policies (e.g., tax credits, family leave policies, unemployment benefits) influence individual and household decision making regarding paid work and unpaid care work (National Research Council, 2000). Time diary data are also extremely useful for examining the relationship between time use and the receipt of Supplemental Nutrition Assistance Program (SNAP) benefits in the United States (e.g., Rose, 2007; Davis & You, 2011), as well as for informing knowledge about the use of and need for public infrastructure for leisure and transportation (National Research Council, 2000).
Data comparability and complexity of the surveys are barriers to working with time use surveys across countries. However, initiatives at the Centre for Time Use Research at the University of Oxford in collaboration with the University of Minnesota and the University of Maryland have partially addressed these issues by providing easier access to the datasets and making them easier to manage.
Studies carried out in English-speaking countries have long dominated the literature on the work–family (WF) interface (Powell, Francesco, & Ling, 2009). Recently, however, due to forces like increasing globalization and immigration (Schmitt & Kuljanin, 2008), there has been an explosion of WF research from different parts of the world, including many cross-national comparisons. Accordingly, researchers must now pay more attention to the role of language and culture in influencing how people conceptualize and communicate concepts and ideas (Cheung, Leung, & Au, 2006).
The Sapir–Whorf hypothesis postulates that the structure of a language shapes the thought processes of those who speak it. In this way, language can influence how individuals view their work and family roles and how they experience the WF interface. Similarly, cultural values have been shown to affect the legal, economic, and social structures that pertain to work and family (e.g., employment and family legislation, government policies regarding WF reconciliation) in different national contexts (Bagger & Love, 2010; Ollier-Malaterre & Foucreault, 2016). Cross-national research has indicated that there is much diversity in work and family dynamics in different parts of the world (Korabik, Aycan, & Ayman, 2017). It is highly likely, therefore, that those from diverse cultural, ethnic, and linguistic backgrounds will have differing perspectives that might affect their understanding and interpretation of the WF interface and of the survey questions pertaining to it (Davidov, Meuleman, Cieciuch, Schmidt, & Billiet, 2014; Trimble, 2007).
These different perspectives necessitate careful attention to issues related to scale translation and measurement equivalence/invariance (ME/I) in order to assure the validity of results when making cross-cultural comparisons from survey data. Proper scale translation is necessary to assure that versions of a survey in different languages are linguistically and semantically equivalent. However, because it does not provide any statistical evidence about whether the different versions have comparable psychometric properties, scale translation alone is not sufficient to establish cross-cultural validity (Epstein, Santo, & Guillemin, 2015). ME/I analysis is needed to assure that results obtained in different national contexts are due to actual cultural differences rather than to measurement and scaling artifacts. The purpose of this chapter is to discuss best practices in scale translation and ME/I. Throughout the chapter we draw upon examples from Phase Three of Project 3535, a multinational research project on the WF interface, to illustrate our points.
Scale Translation
Assuring that a measure constructed in one culture can be validly used in another culture is a very complicated process. The first issue that must be addressed is whether translation is even necessary and, if so, into how many different languages and/or dialects, given that many countries are multicultural and multilingual. Moreover, the degree of fluency with a language may vary widely from person to person within a country. Some individuals may not speak the principal language at all, whereas others may be fluent in more than one language (Davidov et al., 2014). It may be tempting to forgo translation when respondents understand the language in which the survey was originally constructed in addition to their native language. However, the gains made in terms of equivalent wording may be outweighed by the fact that respondents may be less familiar with, and less able to express themselves articulately in, their non-native language (Schaffer & Riordan, 2003).
Forward Translation
Early work in the field of scale translation concentrated on establishing linguistic equivalence (Trimble, 2007). This involved forward translation, that is, the translation of a measure from a source language into a target language, usually by a single translator. This method has been shown to produce unreliable results because the results depend heavily on the individual who does the translation and frequently cannot be replicated by other translators (Hambleton & Patsula, 1998).
Additionally, a focus on simple linguistic accuracy can be problematic. It ignores the fact that languages have many idioms and colloquialisms (e.g., glass ceiling) that are nonsensical if translated verbatim. Words or items can also mean very different things or have no equivalent meaning in different cultural contexts (Epstein et al., 2015). For example, the word family may invoke different meanings for those living in nuclear versus extended family arrangements. Further, certain constructs (e.g., “guanxi” in China) may not even be meaningful in another culture. Therefore, the cultural relevance of all concepts should be assessed prior to undertaking a study or constructing a survey (Hambleton & Patsula, 1998). For these reasons, many scholars recommend moving beyond an emphasis on linguistic equivalence to a more comprehensive process, scale adaptation, that also includes assuring semantic, conceptual, content, and measurement equivalence (Epstein et al., 2015; Hambleton & Patsula, 1998). We now turn to the extra steps necessary to achieve these goals.
Back Translation
Due to the problems inherent in the use of forward translation alone, it has been recommended that it be supplemented by back translation. In the typical forward–back translation procedure, a bilingual person translates the survey questions from the source language to the target language and a second bilingual person independently translates them back to the original language. The differences between the original source language version and the back-translated version are then reconciled (Davidov et al., 2014). Most often, those responsible for the translation and back translation are also those responsible for the reconciliation.
In its most comprehensive form, reconciliation would include an examination of (1) semantic equivalence (is there correspondence in the meaning of the words?), (2) idiomatic equivalence (have idioms and colloquialisms been dealt with properly?), (3) experiential equivalence (is the situation depicted similarly?), and (4) conceptual equivalence (is the overall meaning the same?) (Hambleton & Patsula, 1998). Based on the results, discrepancies would be identified and improvements (e.g., rewording or eliminating problematic items) made. This may result in an iterative process that consists of additional rounds of translation, back translation, and reconciliation. Epstein et al.’s (2015) review of the methodological strategies used for cross-cultural adaptation of scales indicated that, although back translation was the most commonly used procedure in the literature, there was a lack of standardization in how it was applied. For example, there was considerable variation in the number of translators and back translators used, their characteristics and training, the number of rounds of translation and back translation undertaken, who was involved in the reconciliation process, the procedures used to identify and resolve discrepancies, and the extent to which all of the necessary steps were followed (Epstein et al., 2015).
In terms of best practices, however, there is general agreement that translators should be bicultural (i.e., have an in-depth familiarity with both cultures) as well as fluently bilingual (Epstein et al., 2015; Hambleton & Patsula, 1998). This is because someone can be proficient in a language without ever having spent time in a culture where it is spoken. In addition, translators should be knowledgeable about the topic and content area of the survey (Epstein et al., 2015), have some background in test construction and item writing (Hambleton & Patsula, 1998), and be familiar with the research setting and context (Schaffer & Riordan, 2003). Since it may be difficult to find all of these qualities in two individuals, it is recommended that the translation, back translation, and reconciliation process be carried out by a team of multinational experts (Davidov et al., 2014; Hambleton & Patsula, 1998). This also has the advantage of allowing discussions among those with diverse perspectives to take place.
Pilot Testing
An accurate back translation is insufficient in and of itself: in most cases, it does not assure that all semantic and conceptual inconsistencies have been dealt with, nor does it guarantee the validity of a survey (Hambleton & Patsula, 1998). Because of this, as a best practice a survey also needs to be pilot tested (Davidov et al., 2014; Epstein et al., 2015; Hambleton & Patsula, 1998; Schaffer & Riordan, 2003). Pilot testing is important because it can identify issues that might affect the reliability and validity of the translated survey, such as problems with item clarity, ambiguity, phrasing, and completeness of concept coverage (Schaffer & Riordan, 2003; van de Vijver & Leung, 1997). Pilot studies are also useful for making sure that culturally unique (i.e., emic) items or constructs are not overlooked (Schaffer & Riordan, 2003).
Ideally, at least two rounds of pilot testing should take place. In the first, the content of the post-reconciliation prototype survey is reviewed by a few individuals similar to those in the target population using either focus groups or cognitive debriefing (Epstein et al., 2015). In contrast to focus groups, which are a group interviewing technique, cognitive debriefing consists of in-depth, one-on-one, semi-structured interviews. Based on the results of the first pilot test, the prototype survey may need to undergo another round of revision. Following this, a second pilot test would take place in which a small sample from the target population completes the revised version of the translated prototype survey. Preliminary psychometric tests, which might include examining internal consistency reliability, test–retest reliability (stability), homogeneity of variance, construct validity (convergent and/or discriminant), and criterion-related validity (concurrent and/or predictive), would then be conducted on the results.
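Internal consistency reliability in a pilot sample is commonly summarized with Cronbach's alpha, alpha = k/(k - 1) * (1 - sum of item variances / variance of the total score), where k is the number of items. The minimal Python sketch below uses only the standard library. The pilot responses are invented for illustration, and a real pilot analysis would also need to handle missing data and reverse-keyed items.

```python
from statistics import variance


def cronbach_alpha(responses: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items matrix (one row per respondent)."""
    k = len(responses[0])  # number of items
    if k < 2:
        raise ValueError("alpha requires at least two items")
    item_columns = list(zip(*responses))  # transpose rows into item columns
    sum_item_variances = sum(variance(col) for col in item_columns)
    total_score_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum_item_variances / total_score_variance)


# Hypothetical pilot data: five respondents answering a three-item, 5-point scale.
pilot = [
    [4, 5, 4],
    [2, 2, 3],
    [3, 3, 3],
    [5, 4, 5],
    [1, 2, 1],
]
print(round(cronbach_alpha(pilot), 3))  # 0.945
```

The same variances enter both the numerator and the denominator, so using sample (n - 1) variances throughout gives the same alpha as population variances would.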
Scale Translation in Project 3535
Project 3535 is a multinational study of the WF interface (Korabik et al., 2017). It consisted of three phases, involving the collection of social policy, focus group, and survey data, respectively. The participants for the Phase Three survey were 2,830 organizationally employed, married/cohabiting parents from ten countries: Australia, Canada, China, India, Indonesia, Israel, Spain, Taiwan, Turkey, and the United States. The average number of participants per country was 283 (range 150–561). Participants were drawn from each country’s dominant subculture, with the exceptions of Israel, where both Jewish and Arab participants were sampled, and Canada, where both Anglophones and Francophones were sampled. A standardized sampling framework designed to recruit a large, heterogeneous sample with representation of both men and women, and of managers and non-managers, was used. This resulted in a sample composed of 45% men and 40% managers.
The Project 3535 survey was developed in English, and the English version was used in Australia, Canada (Anglophone subsample), and the United States. The English version was translated into seven other languages (i.e., Chinese, French Canadian, Hebrew, Hindi, Bahasa Indonesian, Spanish, and Turkish) for use in the remaining countries. The survey included a wide range of measures, including those assessing core WF interface constructs (i.e., WF conflict, positive spillover, and guilt) and their antecedents and consequences in the work and family domains. Measures of cultural values and of moderating and mediating variables were also included.
Project 3535 embodied several of the best practices recommended for translation and adaptation of surveys across cultures (Korabik et al., 2017). These included (1) the use of expert bilingual and bicultural indigenous translators, (2) forward and back translation, (3) multiple rounds of review and reconciliation by a multinational team, (4) multiple rounds of pilot testing, and (5) extensive documentation of the procedures followed (Erkut et al., 1999; Hambleton & Patsula, 1998; Schaffer & Riordan, 2003). Because most of the measures included in the survey had originally been developed and validated in English in the United States, members of our team of fourteen indigenous researchers reviewed whether they could be meaningfully used in each culture, as well as whether any items needed to be added to them for content validity. For example, we chose to use items from three different existing measures of gender-role attitudes to assure full coverage of the range of attitudes that would characterize societies with very traditional as well as very egalitarian views about women’s roles. In addition, focus groups were conducted to identify emic issues, and additional items and constructs (e.g., work–family guilt) were incorporated based on the results. The first pilot test was conducted using cognitive debriefing, after which the prototype survey was revised and pilot tested again with a larger sample.
What makes Project 3535 unique, however, is that it also exemplified best practices in terms of the team processes that were adopted (Aycan, 2017; see also Chapter 9 of this handbook, Spector & Sanchez). Many cross-cultural researchers have noted shortcomings in the typical translation/scale adaptation process described above (see Erkut et al., 1999). They point out that multinational research teams are usually controlled by a few individuals from developed (usually Western) countries, with those from other cultures acting primarily as data collectors (Aycan, 2017). This leads to a predominance of Western ideas, methods, and measures, which can undermine validity by producing translations that are culturally irrelevant and insensitive (Erkut et al., 1999). Disciplinary bias is another problem that may arise when forming a team. This is a particular issue in the WF field, where researchers trained in industrial-organizational psychology or business management often have limited understanding of family science (Jaskiewicz, Combs, Shanine, & Kacmar, 2017), and vice versa. Project 3535 instituted several processes to guard against these problems. Our team was composed of researchers deliberately chosen to represent a wide variety of disciplines (e.g., cross-cultural and industrial-organizational psychology, management, family science, health). During all stages of the project, the team members functioned as full and equal partners.
The team collaboratively developed a culturally sensitive research plan (Powell et al., 2009), and everyone was involved in all aspects of the research process, including deciding on the research questions and constructs, developing the survey items, and delineating implementation procedures (Aycan, 2017). At each stage, the team members discussed the results and offered suggestions for improvements. This, along with consensus-based decision making, guarded against any one cultural or disciplinary view predominating (Aycan, 2017).
In summary, it is worth expending the time and effort necessary to assure linguistic, semantic, and conceptual equivalence during the survey translation/adaptation phase. It is also essential to formulate a culturally sensitive research plan (Powell et al., 2009) and to ensure team dynamics that will facilitate its implementation (Aycan, 2017). This emphasis on early prevention decreases the likelihood of discovering, after data have been collected, that one’s measures are not comparable across cultures and that few options remain for doing anything about it (Davidov et al., 2014). In addition, before finalizing the items or measures to be included on a survey, it is advisable to check the statistical requirements for carrying out ME/I analyses. For example, an ME/I analysis using a confirmatory factor analysis (CFA) approach usually requires a minimum of three items per scale/subscale, with items measured at the ordinal, interval, or ratio level. After the data have been collected using a cross-cultural survey, it becomes necessary to assess ME/I before making cross-cultural comparisons. We turn now to describing this process.
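The three-item guideline can be motivated by model identification. For a single-factor CFA fitted to covariances alone, with one loading fixed to 1 as the marker (one common scaling choice, assumed here), there are p(p + 1)/2 observed variances and covariances and 2p free parameters, leaving p(p - 3)/2 degrees of freedom. A three-item scale is therefore just-identified on its own (df = 0), and a two-item scale is under-identified unless it borrows information from correlations with other factors. The Python sketch below flags short scales in a survey plan; the scale and item names are hypothetical.

```python
def one_factor_df(n_items: int) -> int:
    """Degrees of freedom of a standalone single-factor CFA fitted to covariances.

    Observed moments: p(p + 1)/2 variances and covariances.
    Free parameters (marker-variable scaling, one loading fixed to 1):
    (p - 1) loadings + p error variances + 1 factor variance = 2p.
    """
    p = n_items
    return p * (p + 1) // 2 - 2 * p


def short_scales(scales: dict[str, list[str]], min_items: int = 3) -> list[str]:
    """Names of scales with fewer than `min_items` items."""
    return [name for name, items in scales.items() if len(items) < min_items]


# Hypothetical survey plan; scale and item names are illustrative only.
plan = {
    "wf_conflict": ["wfc1", "wfc2", "wfc3", "wfc4"],
    "wf_guilt": ["guilt1", "guilt2"],
    "positive_spillover": ["ps1", "ps2", "ps3"],
}
print(short_scales(plan))                           # ['wf_guilt']
print({p: one_factor_df(p) for p in (2, 3, 4, 6)})  # {2: -1, 3: 0, 4: 2, 6: 9}
```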
Measurement Equivalence/Invariance
Measurement equivalence/invariance (ME/I) refers to the comparability of measures across different groups or time periods (Davidov et al., 2014; Vandenberg & Morelli, 2016). According to Vandenberg and Lance (2000), “if one set of measures means one thing to one group and something different to another group, a group mean comparison may be tantamount to comparing apples and spark plugs” (p. 9). Thus, ME/I aims to assess whether those who differ in some way (e.g., demographics, personality, or culture) are using the same conceptual framework and ascribing the same meaning when responding to survey items (Cheung & Rensvold, 2002; Milfont & Fischer, 2010; Vandenberg, 2002). A measure is said to be invariant if the relationship between a latent variable and its observed indicators is equivalent across different groups. For example, ME/I for gender would exist if men and women with the same standing on the latent construct being assessed had the same expected observed scores at the item and/or subscale level (Raju, Laffitte, & Byrne, 2002; Schmitt & Kuljanin, 2008).
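The definition can be written formally using notation common in the multigroup CFA literature (the symbols are assumed here, since the chapter does not define them): x_ig is the observed score on item i in group g, ξ_g is the latent variable, τ_ig the item intercept, λ_ig the factor loading, and δ_ig the measurement error. Each level of invariance adds equality constraints across groups:

```latex
\begin{aligned}
x_{ig} &= \tau_{ig} + \lambda_{ig}\,\xi_{g} + \delta_{ig}, \qquad g = 1, \dots, G \\
\text{metric (weak):}\quad & \lambda_{ig} = \lambda_{i} \quad \text{for all } g \\
\text{scalar (strong):}\quad & \lambda_{ig} = \lambda_{i},\ \tau_{ig} = \tau_{i} \quad \text{for all } g \\
\text{residual (strict):}\quad & \lambda_{ig} = \lambda_{i},\ \tau_{ig} = \tau_{i},\ \operatorname{Var}(\delta_{ig}) = \theta_{i} \quad \text{for all } g
\end{aligned}
```

Configural invariance requires only that the same loadings be fixed to zero or freely estimated in every group. When loadings and intercepts are both invariant, the expected item score given the latent variable, τ_i + λ_i ξ_g, is the same function in every group, so people with the same latent standing have the same expected observed score, as in the gender example above.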
ME/I can be applied to the investigation of any between-group differences that characterize one’s sample. For example, ME/I analyses conducted on different modes of survey administration have found invariance in multinational samples, permitting data from paper-and-pencil and web-based surveys to be aggregated for purposes of analysis (Cole, Bedeian, & Feild, 2006; De Beuckelaer & Lievens, 2009). Groups may differ on a wide variety of characteristics (e.g., gender, age, race/ethnicity), however, so one should examine only those deemed most relevant to one’s research purpose, as looking at too many becomes unwieldy. In this chapter we confine ourselves to discussing ME/I for culture and gender, as these are the most relevant to cross-cultural WF research.
ME/I is particularly important for cross-cultural research because, if a measure is not operating similarly across groups, comparisons between cultures may be meaningless. In the absence of ME/I, the interpretation of between-group differences is fraught with ambiguity because one cannot definitively tell whether those differences are due to actual differences on the construct of interest or, rather, to scaling artifacts or psychometric differences in how people respond to survey items (Byrne & Stewart, 2006; G. W. Cheung & Rensvold, 2002; Davidov, Dulmer, Schluter, Schmidt, & Meuleman, 2012; Schmitt & Kuljanin, 2008). For example, those in one culture may be more predisposed than those in another to respond to survey items using certain response sets (e.g., leniency) or response styles (e.g., social desirability) (Byrne & Watkins, 2003). Because of this, ME/I is a prerequisite for carrying out cross-national comparisons (Milfont & Fischer, 2010; Oreg et al., 2008); yet, despite this recognition, few cross-cultural studies have incorporated ME/I analyses (Davidov et al., 2014).
Methods for Establishing Measurement Equivalence/Invariance
The two approaches most often used to establish ME/I are confirmatory factor analysis (CFA) and item response theory (IRT) (Meade & Lautenschlager, 2004; Raju et al., 2002; Schaffer & Riordan, 2003; Tay, Meade, & Cao, 2015). The CFA approach is based on multigroup structural equation modeling (SEM), whereas the IRT approach is based on differential item functioning (DIF). Each of these methods provides different ME/I information, and each gives only an incomplete picture (Meade & Lautenschlager, 2004). CFA assumes a linear relationship between the latent variable and the item responses, whereas IRT models this relationship nonlinearly, which allows IRT to be used with nominal and dichotomously scored variables (Meade & Lautenschlager, 2004; Raju et al., 2002; Tay et al., 2015). CFA consists of a series of sequential tests that allow one to specify the type of invariance that exists. By contrast, IRT estimates item parameters simultaneously, focuses solely on item equivalence, and does not allow one to test the invariance of uniqueness terms and factor covariances (Vandenberg & Morelli, 2016). Because CFA provides information about the relationships among latent factors, it is preferable when one’s purpose is to examine the equivalence of a multifactorial structure. By contrast, IRT may be a better choice for those focused on whether the responses to items on a specific scale are equivalent (Meade & Lautenschlager, 2004). Meade and Lautenschlager (2004) suggest that, whenever possible, it is best to conduct multiple ME/I tests using both IRT and CFA methods.
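The linear/nonlinear contrast, and what DIF means, can be seen by comparing a linear CFA measurement equation with a two-parameter logistic (2PL) IRT item response function. In the Python sketch below, all parameter values are hypothetical. The two groups share the discrimination parameter but differ in item difficulty, so respondents at the same latent level have different probabilities of endorsing the item, which is one form of DIF.

```python
import math


def cfa_expected_score(intercept: float, loading: float, latent: float) -> float:
    """Linear CFA measurement model: expected item score = tau + lambda * xi."""
    return intercept + loading * latent


def irt_2pl_probability(discrimination: float, difficulty: float, theta: float) -> float:
    """Nonlinear 2PL IRT model: P(endorse) = 1 / (1 + exp(-a * (theta - b)))."""
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))


# Hypothetical item parameters for two groups: same discrimination (a),
# different difficulty (b), i.e., uniform DIF on this item.
group_a = {"discrimination": 1.5, "difficulty": 0.0}
group_b = {"discrimination": 1.5, "difficulty": 0.8}

for theta in (-1.0, 0.0, 1.0):
    p_a = irt_2pl_probability(theta=theta, **group_a)
    p_b = irt_2pl_probability(theta=theta, **group_b)
    # Same latent standing, different endorsement probability: DIF.
    print(f"theta={theta:+.1f}  P_A={p_a:.2f}  P_B={p_b:.2f}")

# In the linear CFA model, each unit step in xi changes the expected score by the loading.
print([cfa_expected_score(2.0, 0.7, xi) for xi in (-1.0, 0.0, 1.0)])
```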
In the cross-cultural literature, CFA analyses are much more commonly used than those based on IRT. This may be because researchers are generally more familiar with CFA methodology, which is less cumbersome and more user-friendly (Tay et al., 2015). In addition, IRT analyses generally require larger sample sizes and more items per scale and can be unmanageable when comparing several groups (Meade & Lautenschlager, 2004; Tay et al., 2015). By contrast, CFA can easily handle multiple latent constructs and multiple groups (Raju et al., 2002). The remainder of this chapter focuses on CFA and uses Project 3535 as a working example to illustrate the application of CFA for ME/I purposes. We refer the reader to Tay et al. (2015) for details about the IRT approach.
The CFA Approach to Establishing ME/I
Multiple-group CFA using SEM is the most frequently used method for testing ME/I across culturally diverse groups (Davidov et al., 2014). It consists of a series of sequential, hierarchically ordered multigroup model tests, beginning with a baseline model and adding increasingly restrictive nested constraints (Kline, 2011; Steenkamp & Baumgartner, 1998). Prior to testing for ME/I, the measurement model should be tested for each group (e.g., culture) separately using CFA procedures (Byrne & Stewart, 2006) to ensure adequate fit based on typical goodness-of-fit indices (see discussion below). If the model fit is not adequate for each group, exploratory work may be required before moving on to testing for ME/I. This exploratory work might involve dropping items that are poor indicators of a latent factor or freeing error covariances for items that are closely related to one another (based on a strong theoretical rationale) (see Byrne, 2001, pp. 91–93, for a discussion of exploratory CFA work). The resulting group-specific baseline model should be determined based on considerations of parsimony and conceptual meaning (Byrne, 2012). Once established, this baseline model is used for multigroup ME/I testing.
Many different types of ME/I tests can be carried out. Vandenberg and Morelli (2016) state that the minimal requirements are a test for configural invariance (equivalence of factor structure) followed by a test for metric or weak invariance (equivalence of factor loadings). The number and type of other tests that are performed will depend upon their relevance to the purpose at hand (Schmitt & Kuljanin, 2008; Vandenberg & Morelli, 2016). We discuss three tests here, as these are sufficient for addressing most substantive research questions (Byrne & Stewart, 2006). In addition to configural invariance and metric or weak invariance, we also include an example pertaining to a test of scalar or strong invariance (equivalence of intercepts) (Byrne, 2012; Kline, 2011; Vandenberg & Lance, 2000). Stricter forms of ME/I can also be established, including residual invariance (equivalence of measurement error), but these are often difficult to achieve and rarely necessary (Schmitt & Kuljanin, 2008). At each level of ME/I testing, not only must the model being tested fit the data well, but its fit must not be significantly worse than that of the preceding model.
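Whether a more constrained model fits significantly worse than the preceding one is often checked with a chi-square difference test: the difference in chi-square values between the constrained and less constrained models is evaluated against a chi-square distribution whose degrees of freedom equal the difference in model degrees of freedom. The standard-library Python sketch below assumes maximum likelihood estimation with approximately normal data; robust estimators require scaled difference tests instead, and with large samples the test becomes very sensitive to trivial misfit. The fit statistics are hypothetical.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: normal tail plus a finite correction series.
    total = math.erfc(math.sqrt(x / 2))
    density = math.exp(-x / 2) / math.sqrt(2 * math.pi)  # standard normal pdf at sqrt(x)
    term = math.sqrt(x)
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)  # builds x^((2r-1)/2) / (1 * 3 * ... * (2r-1))
        total += 2 * density * term
    return total


def chi2_difference_test(chi2_free, df_free, chi2_constrained, df_constrained, alpha=0.05):
    """Return (delta_chi2, delta_df, p, constrained_model_not_significantly_worse)."""
    delta_chi2 = chi2_constrained - chi2_free
    delta_df = df_constrained - df_free
    p = chi2_sf(delta_chi2, delta_df)
    return delta_chi2, delta_df, p, p >= alpha


# Hypothetical fit statistics for a configural (free) and a metric (constrained) model.
print(chi2_difference_test(chi2_free=412.6, df_free=240, chi2_constrained=431.9, df_constrained=258))
```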
Configural invariance. Establishing configural invariance means that those from different groups conceptualize a construct in the same way and use the same conceptual framework when responding to a measure (Milfont & Fischer, 2010). For configural invariance to be supported, the same items must load on the same latent variables across groups (Davidov et al., 2014). The data from each group (e.g., country) must have the same factor structure, with the same number of factors, the same pattern of factor loadings, and the same items associated with each factor (Milfont & Fischer, 2010; Schmitt & Kuljanin, 2008).
In testing the configural model, all model parameters (apart from those fixed to identify the model) are freely estimated in each group. In other words, the configural test fits an unconstrained factor structure to determine whether the same factor pattern is valid in all groups. In this way, the configural model is a multigroup representation of the baseline model that allows invariance tests to be conducted concurrently across several groups, and it provides fit values against which subsequent invariance models can be compared (Byrne & Stewart, 2006).
Byrne and Watkins (2003) point out that, despite what many researchers think, finding configural invariance alone does not allow one to conclude that cross-cultural ME/I exists. For example, configural invariance does not guarantee that respondents from different cultures are using the measurement scale in the same way (Steenkamp & Baumgartner, 1998). However, establishing configural invariance is necessary before one can proceed with more stringent tests of ME/I, because these models are nested within the configural model.
Metric invariance. Once configural invariance is established, the next step is to constrain the factor loadings to be equal across groups to test for metric invariance. Metric invariance indicates that the magnitudes of all factor loadings (or weights) are equivalent across groups, meaning that the items are perceived and interpreted in the same way across groups (Milfont & Fischer, 2010). Metric invariance holds if goodness-of-fit is adequate and the fit of the constrained metric model is not significantly worse than that of the configural model. Establishing metric equivalence allows the comparison of difference scores (i.e., mean-corrected statistics such as unstandardized regression coefficients and covariances) across cultures (Davidov et al., 2014). It is therefore necessary to establish metric equivalence before the data from different countries can be combined for analyses using correlation-based statistics (e.g., in SEM models).
Scalar invariance. The next step is to establish scalar invariance by additionally constraining the item intercepts to be equal across groups (Byrne & Stewart, 2006). Examining this level of invariance is necessary only for those interested in comparing group means (Schmitt & Kuljanin, 2008; Vandenberg & Lance, 2000). In other words, before conclusions about cross-cultural differences in observed or latent group means can be drawn, cross-cultural equivalence of intercepts (scalar invariance) must first be established (Steenkamp & Baumgartner, 1998). Despite this, scalar invariance is seldom achieved (Davidov et al., 2012).
Scalar or intercept invariance implies that measurement scales have the same operational definition across groups (i.e., the same intervals or units of measurement and the same origins or zero points) (Byrne & Stewart, 2006; Cheung & Rensvold, 2002). Failure to find scalar invariance may indicate that those from different groups are using a scale in different ways. For example, it may reflect systematic response biases (e.g., leniency) or differences in response thresholds between groups.
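Why intercept differences matter for mean comparisons can be seen with a few lines of arithmetic. Under the usual linear factor model, an item's expected score is its intercept plus its loading times the latent mean. The sketch below uses hypothetical parameter values (not Project 3535 estimates) for two groups with identical latent means and loadings but different intercepts, as a group-level leniency bias might produce; the observed item means still differ by the full intercept gap.

```python
def expected_item_mean(intercept: float, loading: float, latent_mean: float) -> float:
    """Expected observed item score under the linear factor model: tau + lambda * kappa."""
    return intercept + loading * latent_mean


# Hypothetical parameters: both groups have the SAME latent mean (0.0) and the
# same loading (metric invariance holds), but group B answers about 0.4 scale
# points higher at every level of the construct (e.g., leniency), so its
# intercept is larger.
loading = 0.8
latent_mean = 0.0
group_a = expected_item_mean(intercept=3.0, loading=loading, latent_mean=latent_mean)
group_b = expected_item_mean(intercept=3.4, loading=loading, latent_mean=latent_mean)

# The observed gap equals the intercept gap even though the latent gap is zero.
print(round(group_b - group_a, 2))  # 0.4
```

A naive comparison of observed means would read this 0.4-point gap as a substantive group difference, which is exactly what a scalar invariance test is designed to rule out.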
Goodness-of-fit and comparative fit. To determine whether invariance exists, one must examine the goodness-of-fit statistics for the relevant model (i.e., configural, metric, or scalar). A number of fit statistics, each indicating how well a proposed model fits the data, are usually reviewed, including the chi-square, comparative fit index (CFI), goodness-of-fit index (GFI), Tucker-Lewis Index (TLI), and root mean square error of approximation (RMSEA). The chi-square should ideally be nonsignificant, but it is very often significant with large samples such as those found in most cross-cultural research, so it is often not the best indicator of model fit. Although there are no hard and fast rules, it is suggested that the CFI, GFI, and TLI should be > .90 (with values closer to .99 being better). The RMSEA should ideally be < .05; however, values up to .08 are acceptable (Kline, 2011).
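The rules of thumb above can be encoded as a small helper for screening many models at once. This is a minimal sketch: the function name and labels are our own, and the cutoffs are exactly the heuristics stated in the text, not hard thresholds.

```python
from typing import Dict, Optional


def describe_fit(cfi: Optional[float] = None, tli: Optional[float] = None,
                 gfi: Optional[float] = None, rmsea: Optional[float] = None) -> Dict[str, str]:
    """Label each supplied index against the rules of thumb given in the text.

    CFI, TLI and GFI should exceed .90; RMSEA is ideally below .05 and
    acceptable up to .08. Indices left as None are skipped.
    """
    labels: Dict[str, str] = {}
    for name, value in (("CFI", cfi), ("TLI", tli), ("GFI", gfi)):
        if value is not None:
            labels[name] = "acceptable" if value > 0.90 else "poor"
    if rmsea is not None:
        if rmsea < 0.05:
            labels["RMSEA"] = "good"
        elif rmsea <= 0.08:
            labels["RMSEA"] = "acceptable"
        else:
            labels["RMSEA"] = "poor"
    return labels


# Configural model for WIF conflict across the ten countries (Table 11.1).
print(describe_fit(cfi=0.96, rmsea=0.03))  # {'CFI': 'acceptable', 'RMSEA': 'good'}
```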
In addition, when moving from configural to metric or from metric to scalar invariance, the fit of the more constrained model must not be significantly worse than that of the preceding model. Traditionally, this has been assessed with the likelihood ratio test, that is, the difference in χ² between the two nested models (Vandenberg & Lance, 2000). Thus, one would calculate the difference in chi-square (Δχ²) between the configural and metric models, evaluated against the difference in their degrees of freedom; if Δχ² is not statistically significant, there is evidence of metric invariance. Similarly, a nonsignificant Δχ² between the metric and scalar models would provide evidence of scalar invariance.
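The likelihood ratio test can be sketched in a few lines of standard-library Python. The chi-square tail probability is computed here from the series for the regularized incomplete gamma function, which is adequate for the moderate statistics met in invariance work (roughly below 1,000); a statistics package would normally be used instead. The example plugs in the Table 11.1 WIF conflict values only to illustrate the sample-size problem discussed next: those values come from MLR estimation, for which a plain difference is not strictly valid.

```python
import math


def chi2_sf(x: float, df: float) -> float:
    """P(X >= x) for a chi-square variate with df degrees of freedom.

    Uses the power series for the regularized lower incomplete gamma function;
    intended for moderate x (terms can overflow for x above roughly 1,000).
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > total * 1e-15:
        term *= z / (a + n)
        total += term
        n += 1
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)


def chi2_difference_test(chi2_constrained: float, df_constrained: int,
                         chi2_free: float, df_free: int):
    """Likelihood ratio test for two nested models fitted with ordinary ML."""
    delta_chi2 = chi2_constrained - chi2_free
    delta_df = df_constrained - df_free
    return delta_chi2, delta_df, chi2_sf(delta_chi2, delta_df)


# Illustration only: metric vs. configural WIF conflict models (Table 11.1).
d_chi2, d_df, p = chi2_difference_test(424.8, 116, 335.8, 80)
print(f"delta chi2 = {d_chi2:.1f}, delta df = {d_df}, p = {p:.1e}")
```

With samples this large the difference is highly significant even though the ΔCFI for the same comparison (.007) falls well within the conventional cutoff.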
Researchers have demonstrated, however, that Δχ² depends on sample size in the same way that the chi-square fit statistic does (Cheung & Rensvold, 2002). As noted above, cross-cultural research typically involves very large samples, so Δχ² is nearly always significant in cross-cultural studies, making it an unrealistic criterion for establishing ME/I (Byrne & Stewart, 2006). Because of this challenge, alternatives have been proposed. One of the most widely accepted is ΔCFI: a drop in CFI of ≤ .01 indicates that invariance exists, a drop of ≥ .02 indicates a lack of invariance, and values between .01 and .02 indicate some differences (Cheung & Rensvold, 2002; Oreg et al., 2008). Cheung and Rensvold (2002) also suggest other criteria, including cutoffs of −.001 for ΔGamma hat and −.02 for ΔMcDonald’s Noncentrality Index.
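The ΔCFI decision rule is simple enough to state as code. In this sketch the drop is computed as the CFI of the less constrained model minus the CFI of the more constrained one, and rounded to three decimals to avoid floating-point noise at the boundaries. Because Table 11.1 prints CFIs to two decimals, the WIF comparison computes as a drop of 0 here rather than the .007 reported from the unrounded values.

```python
def delta_cfi_verdict(cfi_less_constrained: float, cfi_more_constrained: float) -> str:
    """Classify the drop in CFI between nested models using the cutoffs in the text."""
    drop = round(cfi_less_constrained - cfi_more_constrained, 3)  # strip float noise
    if drop <= 0.01:
        return "invariance supported"
    if drop >= 0.02:
        return "invariance not supported"
    return "some differences"


# Configural vs. metric CFIs from Table 11.1 (two-decimal values as printed).
table_11_1 = {
    "WIF conflict": (0.96, 0.96),
    "FIW conflict": (0.97, 0.96),
    "Positive spillover": (0.98, 0.97),
    "WF guilt": (0.96, 0.95),
}
for measure, (configural, metric) in table_11_1.items():
    print(measure, delta_cfi_verdict(configural, metric))
```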
Partial invariance. As noted above, ME/I needs to be established at each stage (i.e., configural, metric, and scalar) before one can move to the next stage. What happens when ME/I cannot be demonstrated for a particular model? For example, when testing for configural invariance, the baseline models may not be completely identical across groups: the best-fitting model for one group may include an error covariance or a cross-loading not present in another group (Byrne & Stewart, 2006). In such cases one can investigate whether partial ME/I exists by relaxing some constraints (Steenkamp & Baumgartner, 1998); to decide which constraints to relax, the focus should be on identifying the particular indicator(s) responsible for the noninvariance (Cheung & Rensvold, 2002). Vandenberg and Lance (2000) propose that this be done only when strong theoretical justification exists and when it is confined to a minority of the indicators, so as to limit its impact. Thus, depending on the level at which noninvariance was found, one can allow group-specific error covariances (configural level) or release the equality constraints on selected factor loadings (metric level) or intercepts (scalar level) in an attempt to attain partial invariance.
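The "minority of indicators" guideline can be checked mechanically once the freed parameters are chosen. The sketch below is our own illustration: it takes the WIF factor structure described later in this chapter (items 1, 4, and 10 time-based; items 3, 5, and 11 strain-based) and a set of freed items. The chapter does not give the number of the item freed in the gender example, so item 1 is used purely as a placeholder; treating that item as time-based is also our assumption, based on its wording.

```python
from typing import Dict, List, Set


def freed_share_ok(items_per_factor: Dict[str, List[int]], freed: Set[int]) -> Dict[str, bool]:
    """For each factor, report whether the freed (noninvariant) items are a minority of its indicators."""
    return {
        factor: len(freed & set(items)) < len(items) / 2
        for factor, items in items_per_factor.items()
    }


# WIF conflict structure (Figure 11.1); item 1 is a HYPOTHETICAL stand-in for the freed item.
wif_structure = {"time_based": [1, 4, 10], "strain_based": [3, 5, 11]}
print(freed_share_ok(wif_structure, freed={1}))  # {'time_based': True, 'strain_based': True}
```

Freeing a second time-based item would leave only one of three indicators constrained on that factor, and the check would then return False for it.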
Strategies for dealing with nonequivalence. A number of strategies have been suggested for situations in which even partial ME/I cannot be obtained. One is to drop noninvariant items; however, one must make sure that the meaning of the concept has not changed as a result (Davidov et al., 2014). Another, more stringent approach is to compare only the subset (or subsets) of countries, identified through cluster analysis, in which ME/I holds. This may, however, severely reduce the number of groups included in the study (Davidov et al., 2014), a problem for researchers interested in comparing an already small number of cultures or countries. A third strategy is to identify items that function differently because of their relationships with certain individual difference variables (e.g., age) and to control for this variance by regressing these items on those individual difference variables (Davidov et al., 2014).
Programs to Run Measurement Invariance Tests
While there are many statistical programs that can be used for ME/I testing (any program that can run multigroup CFA), AMOS and Mplus are two commonly used ones. AMOS (Analysis of Moment Structures; Arbuckle, 2014) is a user-friendly SPSS add-on program that allows users to draw graphical models and run SEM analyses on them. Because of its graphical nature, AMOS can be less flexible, as the user has limited control over the analyses (which are tied to the graphical configuration). Moreover, when a data set has missing data, modification indices cannot be generated in AMOS. Mplus (Muthén & Muthén, 1998–2011) is a syntax-based program with greater flexibility than AMOS: it can run mixed models (using both categorical and continuous data), it handles missing data well and will still generate modification indices, and it allows for maximum likelihood robust (MLR) estimation, an estimation method that deals well with data that are not normally distributed.
Examples of ME/I Testing from Project 3535
We present two examples of ME/I tests using data from Project 3535, one relating to cultural invariance and the other to gender invariance. First, we present configural and metric ME/I analyses across all ten countries (Australia, Canada, China, India, Indonesia, Israel, Spain, Taiwan, Turkey, and the United States) for three WF interface constructs (i.e., WF conflict, positive spillover, and guilt). Then, we provide an example of scalar and partial invariance for gender using the work interference with family subscale of the WF conflict measure, with data only from Canada and the United States.
Measures. WF conflict was assessed with Carlson, Kacmar, and Williams’ (2000) measure (see Korabik et al., 2017, for detailed descriptions of all measures). Four subscales were used from this measure: time- and strain-based work interference with family conflict (WIF) and time- and strain-based family interference with work conflict (FIW). The measure was modeled with the WIF and FIW subfactors each having time- and strain-based subcomponents (see Figures 11.1 and 11.2, respectively). WF positive spillover was assessed with Grzywacz and Marks’ (2000) scale, which was modeled as having two three-item subfactors: work-to-family and family-to-work spillover (see Figure 11.3). WF guilt was assessed with a seven-item WF guilt scale and was modeled as having two subfactors: work interference with family guilt and family interference with work guilt (see Figure 11.4). Preliminary model testing confirmed that the hypothesized factor structures portrayed in Figures 11.1–11.4 fit the data from each country well.
Figure 11.1 Factor structure of work interference with family conflict.
Figure 11.2 Factor structure of family interference with work conflict.
Figure 11.3 Factor structure of positive spillover.
Figure 11.4 Factor structure of work–family guilt.
Analytic strategy. Each of the measurement models was tested for invariance across the respective groups (ten countries; two genders). The multigroup CFAs were conducted using MLR in Mplus (Version 6.11; Muthén & Muthén, 1998–2011). To assess the fit of the individual models, multiple goodness-of-fit indices were reviewed, including the χ², RMSEA, and CFI. Because the ME/I analysis for culture was more complex, given the large sample size and the inclusion of data from ten countries, the ΔCFI was used to compare the fit of the different models. By contrast, for the ME/I analysis for gender, differences between models were assessed using the Satorra–Bentler scaled chi-square difference test (ΔSBχ²).
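Because MLR chi-squares are scaled, nested models cannot be compared by simply subtracting them; the scaled difference test uses each model's scaling correction factor, which Mplus reports alongside the MLR chi-square. The sketch below follows the scaled difference formula that the Mplus developers document for this purpose. The chi-squares and df are the configural and metric values from Table 11.2, but the correction factors (1.10 and 1.12) are hypothetical because they are not reported here, so the result will not reproduce the table's ΔSBχ².

```python
def sb_scaled_difference(t0: float, c0: float, d0: int,
                         t1: float, c1: float, d1: int):
    """Satorra-Bentler scaled chi-square difference for MLR-type output.

    Model 0 is the more constrained (nested) model and model 1 the less
    constrained one. t0 and t1 are the scaled chi-squares as reported,
    c0 and c1 their scaling correction factors, d0 and d1 their df.
    Returns (scaled difference statistic, df difference).
    """
    cd = (d0 * c0 - d1 * c1) / (d0 - d1)  # correction factor for the difference test
    if cd <= 0:
        # Can happen in small samples; the scaled difference is then undefined.
        raise ValueError("non-positive difference-test correction factor")
    trd = (t0 * c0 - t1 * c1) / cd
    return trd, d0 - d1


# Metric (model 0) vs. configural (model 1) from Table 11.2; c0 and c1 are HYPOTHETICAL.
trd, ddf = sb_scaled_difference(t0=31.19, c0=1.10, d0=18, t1=25.88, c1=1.12, d1=14)
print(f"scaled difference = {trd:.2f} on {ddf} df")
```

The resulting statistic is referred to a chi-square distribution with the df difference, just like an ordinary Δχ².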
Results for cultural invariance of work–family interface measures. As can be seen from Table 11.1, the CFI values were ≥ .96 and the RMSEAs were ≤ .03 for all configural models. This indicated that the factor structures illustrated in Figures 11.1–11.4 were equivalent across all ten countries, in that the number of factors and the items associated with each factor were the same. From this we can conclude that respondents in each of our ten countries were using the same conceptual framework when responding to the three measures and that the constructs measured have equivalent meaning across cultures.
Table 11.1 Measurement invariance tests for culture of WF interface measures
| Models | χ² (df) | RMSEA | CFI | Model Comparison | ΔCFI |
|---|---|---|---|---|---|
| WIF Conflict | |||||
| 1. Configural | 335.8 (80) | .03 | .96 | ||
| 2. Metric | 424.8 (116) | .03 | .96 | 2 vs. 1 | .007 |
| FIW Conflict | |||||
| 1. Configural | 296.3 (80) | .03 | .97 | ||
| 2. Metric | 414.3 (116) | .03 | .96 | 2 vs. 1 | .01 |
| Positive Spillover | |||||
| 1. Configural | 142.8 (70) | .02 | .98 | ||
| 2. Metric | 211.6 (106) | .02 | .97 | 2 vs. 1 | .01 |
| WF Guilt | |||||
| 1. Configural | 474.3 (130) | .03 | .96 | ||
| 2. Metric | 623.1 (175) | .03 | .95 | 2 vs. 1 | .01 |
Moreover, not only were CFI values ≥ .95 and RMSEAs ≤ .03 for all metric models, but all ΔCFIs were ≤ .01. This indicated that the factor loadings for all the paths shown in Figures 11.1–11.4 were equivalent across all ten countries and that the scale items were perceived and interpreted in the same way in each country. Because cross-cultural ME/I was established at the metric level for all three WF interface measures, the combined data from the ten countries could be used for correlation-based analyses (e.g., to test SEM models) with these measures. Because the primary purpose of Project 3535 was to test cross-cultural hypotheses using SEM and hierarchical linear modeling, configural and metric invariance were sufficient for our purposes, and scalar invariance tests were not carried out on most constructs. However, researchers who are interested in determining whether between-group mean differences exist will also need to examine scalar invariance. We now present an example of this, which also illustrates the issue of partial invariance.
Results for gender invariance of WIF conflict. This analysis was done only for the WIF subscale of the WF conflict measure. The configural CFI and RMSEA values were excellent, and the χ² for the model was not significant, demonstrating good model fit (see Table 11.2). This confirmed that the factor structure was the same for both genders, with items 1, 4, and 10 loading on time-based WIF conflict and items 3, 5, and 11 loading on strain-based WIF conflict (see Figure 11.1). From this we can conclude that the meaning of WIF conflict is the same for men and women (i.e., that men and women were using the same cognitive frame of reference; Vandenberg & Morelli, 2016). When constraints were added to the factor loadings to test for metric invariance, the CFI and RMSEA fit indices remained excellent and the ΔSBχ² was not significant. Therefore, as Table 11.2 illustrates, metric-level ME/I for gender was established for WIF conflict. This means that the factor loadings were the same for both genders and that men and women interpreted each of the subscale items in the same way.
Table 11.2 Measurement invariance tests for gender for work interference with family conflict
| Models | χ² (df) | RMSEA | CFI | Model Comparison | ΔSBχ² (df) |
|---|---|---|---|---|---|
| 1. Configural | 25.88 (14) | .06 | .99 | ||
| 2. Metric | 31.19 (18) | .06 | .99 | 2 vs. 1 | −4.72 (4) |
| 3. Scalar | 46.33 (22) | .07 | .98 | 3 vs. 2 | −16.40 (4)* |
| 4. Partial Scalar | 34.09 (21) | .05 | .99 | 4 vs. 2 | −2.59 (3) |
* p < .05, ** p < .01
Next, constraints were added to the item intercepts to test for scalar invariance; however, scalar invariance was not demonstrated for the measure, as indicated by the significant ΔSBχ². Because the modification indices indicated that the difference was related to a single item intercept, that intercept was freed, which allowed partial scalar invariance to be demonstrated. This provided good evidence for the equality of the factor loadings and of the majority of item intercepts across gender (Byrne, 2012). The item that was not invariant was: “I have to miss family activities due to the amount of time I must spend on work responsibilities.” The item remained in the analysis, but the equality constraint on its intercept was released so that the intercept could be freely estimated in each group. Once this constraint was released, the change in model fit from the metric model to the partial scalar model was no longer significant, and the RMSEA and CFI values remained excellent.
Based on the above analysis, partial scalar invariance of the WIF subscale was established for gender. This means that, apart from the one item whose intercept was freed, men and women responded to the WIF subscale in an equivalent way and that the units of measurement, scale intervals, and origins or zero points were the same for both genders. This result allows one to conclude that men and women had similar response styles (e.g., response biases or thresholds). It demonstrates that “individuals who have the same score on the latent construct would obtain the same score on the observed variable regardless of their group membership” (Milfont & Fischer, 2010, p. 115), and, therefore, that latent means could be compared for men and women on this construct.
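As a consistency check on Tables 11.1 and 11.2, the degrees of freedom gained at each step can be derived from the model structures described under Measures. This sketch assumes simple structure and the usual identification (one loading per factor fixed, latent means fixed to zero until intercepts are equated); under those assumptions, equating loadings adds (items − factors) df per additional group, and equating intercepts adds the same amount because the latent means become estimable in every group but the reference group. Each intercept freed for partial invariance gives back one df per additional group. The function names are our own.

```python
def metric_df_increase(n_items: int, n_factors: int, n_groups: int) -> int:
    """df gained when loadings are set equal across groups (one marker loading per factor)."""
    return (n_items - n_factors) * (n_groups - 1)


def scalar_df_increase(n_items: int, n_factors: int, n_groups: int,
                       n_freed_intercepts: int = 0) -> int:
    """df gained when intercepts are also set equal, net of freed latent means and intercepts."""
    return (n_items - n_factors - n_freed_intercepts) * (n_groups - 1)


# Item/factor counts from the Measures section; df differences from Tables 11.1 and 11.2.
print(metric_df_increase(6, 2, 10), 116 - 80)   # WIF, FIW, spillover: 36 36
print(metric_df_increase(7, 2, 10), 175 - 130)  # WF guilt: 45 45
print(metric_df_increase(6, 2, 2), 18 - 14)     # WIF by gender: 4 4
print(scalar_df_increase(6, 2, 2), 22 - 18)     # scalar by gender: 4 4
print(scalar_df_increase(6, 2, 2, n_freed_intercepts=1), 21 - 18)  # partial scalar: 3 3
```

Every predicted increment matches the change in df reported in the tables, which is a quick way to confirm that the intended constraints were actually imposed.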
Conclusion
When translating WF measures for use across cultures, there should be an emphasis on conceptual, in addition to linguistic and semantic, equivalence. Thus, best practices in this area involve the use of a consensus-based, concept-driven approach by a team of indigenous researchers who together formulate and implement a culturally sensitive research plan. Further, forward and back translation should be carried out by fluently bilingual and bicultural translators who are knowledgeable about the subject matter and research context and have experience with survey construction. In addition, a committee composed of multinational content experts should oversee the reconciliation process. Following this there should be multiple rounds of pilot testing that involve both focus groups and cognitive debriefing. Throughout, there should be a focus on whether concepts are meaningful in different cultural contexts and whether any emic issues have been overlooked. Finally, the requirements necessary for conducting ME/I analyses should be understood before choosing items or measures to be included on a survey.
In terms of best practices for ME/I, it is essential to first understand that configural invariance is a necessary but not a sufficient condition for establishing ME/I across cultures. Second, which of the many available ME/I tests to carry out should be determined by one’s purpose. If one wishes to examine interrelationships among constructs cross-nationally, configural and metric equivalence will be sufficient. However, if one wishes to compare group means cross-nationally, configural, metric, and scalar invariance are necessary. As a best practice, one should examine multiple goodness-of-fit indicators (including the χ², CFI, GFI, TLI, and RMSEA) to assess model fit at each level. We suggest that, for cross-cultural WF research, the ΔCFI be used as the criterion when moving from one level of ME/I to the next (e.g., from configural to metric or from metric to scalar), particularly when sample sizes are large. If full invariance is not found, researchers should, when appropriate, attempt to establish partial invariance using the methods described above.
Establishing ME/I for cross-cultural work is neither quick nor easy. It is complicated and takes skill and effort. Nevertheless, establishing ME/I is a necessary foundation to support the validity of any cross-cultural comparisons.
A by-product of the globalization of the world economy has been a rapid increase in research aimed at understanding the role of cultural and national factors not only in work but also in the work-nonwork interface. Conducting research across national boundaries, however, brings a unique set of challenges, from initial design to the interpretation of data and publication of results. There is a rich literature on methodologies for conducting global workplace research that covers the technical intricacies of isolating the effects of culture and nationality while controlling for third-variable confounds (for an overview, see Spector, Liu, & Sanchez, 2015; van de Vijver & Leung, 1997). This chapter will focus not on such methodological and statistical concerns but on the often overlooked strategies for organizing and managing research teams engaged in global studies. This “best practice” guide covers issues that can arise when research across national boundaries involves teams of researchers whose efforts need to be coordinated and standardized.
In this chapter we will focus on nine issues that must be carefully considered in the design, implementation, and overall management of a global study. These include:
1. Scope of the study
2. Assembling the team
3. Agreements over data use and publication
4. Study design and management approach
5. Base language for the study
6. Coordinating data collection
7. Data handling
8. Political and societal issues
9. Publication
As a unifying example, we will discuss our experiences with the two-phase Collaborative International Study of Managerial Stress (CISMS). The CISMS project was chosen not only because we are part of the core team but also because it focused on work-nonwork issues and illustrates all of the issues we will discuss. It should be kept in mind, however, that CISMS is but one model of a successful global collaboration (successful in the sense of producing multiple papers in rigorously peer-reviewed journals), and there certainly are others that have been just as successful, if not more so.
The CISMS collaboration began in the early 1990s as a study of stress in managers, including work-nonwork issues. Each of the two phases involved approximately 7,000 managers from more than twenty-four countries. Across the two phases, forty-six researchers were involved in CISMS, not counting those who came on board after data were collected to assist (or take the lead) in publication (e.g., Yang et al., 2012). Phase I focused on stress in general (including the work-nonwork interface), whereas Phase II was a follow-up that focused primarily, but not exclusively, on the work-nonwork interface, attempting to shed further light on and explain the Phase I results. In addition, a number of small-scale spin-off projects derived from CISMS, each involving a small number of CISMS researchers.
Scope of the Study
The first issue to be confronted in a global study is the purpose and scope of the project. As with any investigation, one must specify the research goals (i.e., the underlying questions to be addressed) and hypotheses (if applicable). However, particularly with large-scale projects involving multiple countries, making adjustments in methodology and instrumentation as a project progresses can be far more complicated than in the usual single-country study. For instance, the possibility of gaining access to a sample at more than one point in time to collect a new wave of data decreases when samples come from multiple countries. Similarly, adding new instruments and regaining access to the same or a new sample might alter the time span of the study differently from country to country, as it might delay data completion in certain countries. Researchers might therefore end up with incomplete datasets in which some scales were administered in some countries but not in others, or time lags might vary across countries, introducing potential confounds. Thus, it is important to plan a study carefully so that changes are minimized once it begins.
Another issue to confront is the scope of the project, that is, how many national samples will be involved. This is a vital question when the purpose of a study is to explore differences on a cultural variable, such as values. It is difficult to draw conclusions about why country differences might have occurred when one has only two (or a small number of) countries to compare. Studies that address cultural differences at the country or sample level are best considered from a multilevel perspective. Isolating the cultural variable from a host of national differences potentially confounded with culture requires multiple country samples that vary reliably on that cultural variable. That is, having sufficient power to detect differences and avoiding confounding of country with the cultural variable requires not only a sufficient number of samples but also a sampling strategy that includes multiple samples spanning low, intermediate, and high values on the cultural variable of interest. In CISMS Phase II, for example, we wanted to isolate the cultural value of individualism-collectivism to determine its impact on work–family conflict (Spector et al., 2007). We chose a country selection strategy that allowed a comparison of individualistic with collectivistic samples by choosing five countries from each of four regions that prior research had found to differ on this variable: Anglo, Asia, East Europe, and Latin America (for a discussion of using external data on values, see Taras, Kirkman, & Steel, 2010).
Assembling the Team
Collecting data from multiple countries is best done through the coordinated efforts of an international team of research partners. Although it might be possible to collect data from a variety of countries remotely through electronic media, such methods provide only limited control over sampling and methods. In addition, countries vary in idiosyncratic factors that make their citizens differentially inclined to participate in research. In collectivistic countries, for instance, potential respondents might be reluctant to participate in a study that does not have the endorsement of a trustworthy source close to them or their ingroup. Our approach in CISMS mirrored that of other multinational studies (e.g., Peterson, Smith, Akande, Ayestaran, et al., 1995) in assembling an international team of research partners who were willing and able to contribute to the study. In both phases there was a central team who designed and planned the study and recruited the partners, who were responsible for survey translation and back-translation, data collection, and providing a clean dataset to the central team for analysis. Partners varied in their involvement in disseminating results, from contributing to write-ups to writing papers themselves. Some partners were seasoned researchers with established research records, whereas others had limited research experience prior to their involvement in CISMS. These variations in research expertise turned into developmental opportunities for the more junior researchers. For instance, drafts of manuscripts were circulated among all participants, who were given a chance to offer comments and edits. They also had a chance to participate in, and witness, the crafting of responses to reviewers’ concerns in journal submissions.
Even for researchers who did not participate much in the writing of manuscripts, witnessing the process from beginning to end (i.e., several rounds of revise-and-resubmit) was undoubtedly a learning experience.
Our experience is that there are many researchers around the world eager to collaborate on multinational projects. They can be recruited through personal contacts and networking, either by targeting specific individuals or by putting out open calls for research partners on appropriate listservs or via social media. Before recruitment of collaborators begins, the roles of team members should be carefully specified so that individuals know from the outset the rules for their participation. For instance, are they being offered payment for collecting data that the central team will own, or are they being offered a larger role in the research team? Do members “own” the data they collect, so that they can proceed to write a separate article, or do they need to give other team members a chance to participate in the writing? There also need to be clear policies and procedures for governing the study, which brings us to the issue of data agreements.
Agreements over Data Use and Publication
There are perhaps no aspects of a multinational study with more potential for conflict than the ownership and use of data. Although the central team might have designed the study and study protocols, the country partner who collected a sample of data might have ownership rights to those data. Agreements over how those data will be used and disseminated at conferences and in print should be developed at the outset of the study. In this way, expectations are clarified early, and the rules governing participation in the study are known to all team members, so that each is treated fairly.
There are different models that can inform the kinds of agreements that might be reached. At one extreme, individuals are paid to collect data, with the agreement that they will have no ownership rights. In such cases the individual country partners would make no substantive contribution beyond enacting the data collection protocols provided. At the other extreme is a true partnership, in which partners become full members of the research team, able to influence the design and execution of the project as well as the dissemination of results.
Perhaps the most controversial aspect of data agreements concerns authorship of publications resulting from the study. For example, how does one earn the right to serve as lead author, who is to be a co-author, and in what order? In CISMS, each partner owned the dataset he or she collected and thus deserved co-authorship on each article in which that dataset was used. For each paper, one member of the team would take the lead as first author to pursue the specific research question(s) that he or she championed. The remaining co-authors were ordered first by their contribution to that paper, then by their contribution to the main project design (generally the central team), and lastly by smaller contributions related to instrumentation (e.g., producing measurement-equivalent back-translated scales) and data collection. When people were approximately equal in contribution, they were listed alphabetically. Different team members took the lead on individual papers that involved most or all countries and that focused on the lead author’s own research idea. A few country partners published papers using solely their own country dataset or a small number of country datasets. Adherence to just a few basic principles regulating the order of authorship helped CISMS avoid conflicts over data ownership and publication.
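Because the CISMS authorship rule is essentially a sort order, a team could record contributions and generate author lists consistently. The sketch below is our own illustration, not a CISMS tool: the contribution scores and partner names are hypothetical, and exact score ties stand in for "approximately equal" contributions, which in practice is a judgment call.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Contributor:
    name: str
    paper: int = 0    # contribution to this particular paper
    design: int = 0   # contribution to the overall project design
    support: int = 0  # instrumentation (e.g., translation) and data collection


def author_order(lead: str, others: List[Contributor]) -> List[str]:
    """Lead author first; then paper contributors, design contributors, and
    instrumentation/data-collection contributors, each tier ordered by
    contribution, with ties broken alphabetically."""
    def key(c: Contributor) -> Tuple[int, int, str]:
        if c.paper > 0:
            return (0, -c.paper, c.name)
        if c.design > 0:
            return (1, -c.design, c.name)
        return (2, -c.support, c.name)
    return [lead] + [c.name for c in sorted(others, key=key)]


# Hypothetical team for one paper.
team = [
    Contributor("Partner F", support=2),
    Contributor("Partner C", design=3),
    Contributor("Partner B", paper=2),
    Contributor("Partner A", paper=2),
    Contributor("Partner E", design=1),
    Contributor("Partner D", support=2),
]
print(author_order("Lead author", team))
```

Writing the rule down this explicitly, even informally, is one way to make the authorship agreement verifiable by every partner before papers are drafted.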
Study Design and Management Approach
International studies vary in their underlying management model, particularly in their degree of centralization. CISMS is an example of a centralized model, with a central team that designed the study and recruited country partners, or what Sanchez and Spector (2012) termed the “NATO model.” An alternative involves recruiting an international team that designs and carries out the study, more like a cooperative. Sanchez and Spector termed this approach the “UN model,” suggesting a relative lack of hierarchy. Whereas the UN model can work well with small numbers of countries and partners, as the project grows in size and scope the advantages of having a centralized group to design the study and plan how it will be conducted become apparent.
The UN approach has the advantage of maximizing the diversity of inputs, and it can best incorporate country-specific perspectives. In a sense it has the potential to maximize the emic-ness of a study, that is, the extent to which investigation of country-specific or culture-specific characteristics is built into the study, because each country partner is free to contribute input and to design the study in a way that best reflects local issues. This sort of project lends itself well to a mixed-methods approach combining qualitative and quantitative components. An initial qualitative component can be helpful in isolating the most relevant variables, with a subsequent quantitative component to explore relationships and test hypotheses (Niblo & Jackson, 2004). Although this could be done with a NATO approach, the UN approach maximizes the country-specific aspect of the study, as all partners can have equal input into the questions asked in the qualitative portion and the variables chosen for the quantitative portion. For example, Yang et al. (2012) found that the relationship between the number of work hours and perceived work overload was stronger for individuals in individualistic than in collectivistic countries. They speculated that the coping mechanisms employed in individualistic cultures might be less effective for dealing with long work hours than those employed in collectivistic cultures. It follows that coping with work–family conflict might take different forms in different cultures, as traditional gender roles might dictate who takes on certain household chores and family obligations. Armed with these insights, plus additional ones gained through qualitative studies, researchers might develop emic measures of work–family coping mechanisms suited to the realities of collectivistic cultures.
A UN approach that allows no centralized control can result in a study in which country datasets are not comparable, perhaps because they contain different questions or items that reflect local usages and customs and are difficult to translate, or because procedures for sampling or data collection differ somewhat. The NATO approach is perhaps preferred when the purpose is to investigate general principles, which requires controlling third variables, that is, biases and confounds that might arise if insufficient research controls are in place. Such threats to the validity of the design are more likely in a loosely organized UN study in which no central person or team is “in charge.” A central coordinator can help coordinate data collection efforts and serve as a resource to all partners concerning how and when data should be collected. Of course, excessive centralization and standardization can backfire when standardized measures and methods do not work equivalently in all samples. For example, the core CISMS team realized early on that the measure of values we chose (Hofstede, 1994) did not have acceptable internal consistency reliability or a comparable factor structure in some countries.
For many projects, a hybrid approach might be preferred in which a central team plans the study but solicits input from all research partners. An iterative procedure could be used in which the central team produces an initial draft document that states the purpose of the study, outlines a proposed design, and lists potential measures. Feedback from all research partners would be solicited and incorporated, with each draft circulated to everyone for input. The iteration would continue until all issues had been resolved through discussion and no new issues were raised. Once finalized, this document serves as the study's “constitution,” which can be used to explain the research goals and, if needed, to recruit additional collaborators across nations. All collaborators are treated as research partners and are free to add measures as they see fit; however, they are also expected to administer the scales adopted by the core central team. Of course, one must be careful that additions do not make the questionnaire so long and unwieldy that data quality suffers.
Base Language for the Study
A multi-country study is most effectively conducted if the project is managed in a common language. For CISMS the base language for communication was English, with all documents and emails written in English. Because English is the dominant language for publishing organizational and work-nonwork research, it was not difficult to recruit research partners who were sufficiently fluent to communicate with the central team in English.
For CISMS, English was also the base language for developing the questionnaire, which contained a mixture of existing scales and new items. The English version was provided to country partners, who handled translation and back-translation (for a discussion of back-translation procedures, see Spector et al., 2015). Of course, it is possible to assemble a questionnaire from scales developed in a variety of languages, for example, English and Spanish. In such cases it is most manageable to first produce a single-language version of all scales (e.g., translate the English scales into Spanish) so that translations of the completed questionnaire involve only one source language. Large-scale cross-national studies are still an emerging area, and further research is needed on whether variations in the base language produce conflicting results. The Sapir–Whorf or Whorfian hypothesis (Werner & Campbell, 1970) posits a strong bond between language and cognition; therefore, the choice of a base language should be carefully considered. For example, Sanchez, Alonso, and Spector (2000) found signs of measurement non-equivalence among back-translated measures administered twice to the same bilingual participants, as well as evidence that altering the order of languages across the two administrations changed the results. Together, these results hint that the meaning of certain constructs might be intrinsically bound to the choice of language and that even back-translation might not always produce measurement equivalence.
Coordinating Data Collection
Whether the project is designed centrally via a NATO approach or by the broader group of partners via a UN approach, someone needs to coordinate and manage data collection across countries. In both phases of CISMS, a member of the central team was designated as the point of contact with the research partners. This person kept in contact with all partners, answering their questions, noting problems that occurred (including any need to deviate from the study protocol), and collecting datasets. The contact person remained in touch beyond data collection, seeking answers to questions as they arose during the data analysis and write-up stages of the project. Having a single point of contact helped avoid confusion, as the partners had one person with whom to communicate and conflicting messages did not circulate among partners. The partners were certainly free to communicate with one another, and some did, but centralizing the main messages ensured that the central team was kept apprised of study progress over the several years it took to complete data collection for each phase. We make no claim that our approach is either the only or the best one. A potentially fruitful alternative may be to outsource data collection to a third party, such as a consulting company with an international presence. However, ceding control of data collection to a third party, or any other approach that delegates or cascades data collection down to unaccountable parties, raises the risk of ending up with poor-quality data furnished simply for short-term financial gain.
Data Handling
Managing and analyzing data from a multi-country study can be challenging, especially when there are more than a few samples. Data management begins with the design of an investigation, and procedures for how data are to be collected and handled should be incorporated into the initial study plan. This is the case whether data are qualitative or quantitative, even though the analysis process might be quite different. To conduct cross-country or cross-cultural comparative analyses, data must be compiled into a uniform format. Variation in the mode of survey administration (paper-and-pencil versus web-based) across countries creates a potentially critical confound that should be avoided if possible. However, differences in economic development and technology infrastructure may make web-based surveys difficult in some locations. Yet following the same procedure (e.g., a web-based survey) in every country could create an even more complicated confound, because it might yield a biased sample in which participants from emerging economies represent not average workers but privileged individuals with ready access to the internet. Requiring web-based surveys as the sole method of data collection might also produce other sampling biases, such as an overrepresentation of white-collar employees with direct access to computers at work.
Planning data collection and handling. Whether the study adopts a NATO, UN, or hybrid approach, standardization will be required so that data can be directly compared. With qualitative studies, this means using comparable questions across countries; with quantitative studies, it means employing measurement-equivalent instruments. A limitation of research across countries where people speak different languages is that measurement-equivalent translations are often challenging to produce. This can introduce noise and potential confounds into an investigation, as the exact meaning of items and questions can vary. Nevertheless, equivalence should be the goal to the greatest extent possible, so carefully developed back-translations (Brislin, 1986) are in order. In back-translation, one party first translates the scale into the target language, and an independent party then translates it back into the source language. The two source-language versions are then compared to detect linguistic problems in the translation. Although back-translation does not guarantee measurement equivalence, it is an essential step toward it.
For quantitative studies, statistical methods have been developed to test whether measurement equivalence/invariance (ME/I) has been achieved (Spector et al., 2015; Vandenberg & Lance, 2000). Most of these methods are based on either item response theory (IRT) or structural equation modeling (SEM). IRT methods examine whether items function similarly (i.e., have equivalent item parameters) across country samples, whereas SEM methods examine the equivalence of the inter-item factor structure. Testing ME/I with one or both methods has become standard practice in studies that draw samples from different countries. See Chapter 8 of this handbook (Korabik & Rhijn) for a detailed discussion of translation and ME/I issues.
Another aspect of data planning concerns avoiding confounding country with sample characteristics by choosing samples that are as similar as possible. One would not, for example, sample physicians in one country and plumbers in another, as it would then be impossible to disentangle differences due to country from those due to occupation. In addition to occupation, one should control for industry, characteristics of the employing organization (e.g., sector and size), and demographic variables.
Collecting and cleaning datasets. To facilitate data processing, all country partners should send their datasets to a central data custodian who is responsible for cleaning and combining them in preparation for analysis. In CISMS one person was the point of contact and another was the data custodian, who cleaned the data and handled preliminary analyses. Having a single person handle all datasets ensures homogeneous treatment of, for instance, missing values and random response patterns. It is perhaps easiest if each country partner provides a spreadsheet with the item names in the first row, as well as a copy of the survey as used in that country. Whoever processes the data at this stage should inspect each dataset for uniformity. Our experience with CISMS is that country partners often made small modifications to the original survey. In some cases scales were omitted, whereas in others questions or scales were added. Thus the order of items, and even of scales, was not always the same, and this could readily be seen in the spreadsheet. Another variation from the original was in the coding of some items, particularly demographics. Some partners coded gender, for example, as 1 = male and 2 = female, but others did the reverse, and some used 0 and 1 rather than 1 and 2. Questions often arose that required back-and-forth discussion with partners to clarify what might have been changed. Still another issue is whether researchers in each country followed the same procedure for treating missing values and for deciding that an incomplete survey lacks sufficient answers and should be excluded from the study. Some partners left missing items blank, whereas others filled every digit of a missing value with 9, that is, 9 for a one-digit variable and 99 for a two-digit variable.
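Part of the uniformity check described above can be automated once each partner's spreadsheet is exported with item names in the first row. The sketch below is written in Python for illustration (CISMS itself used SAS); the item names, the CSV export, and the helper functions `read_header` and `compare_headers` are hypothetical. It reports items or scales a partner omitted or added and whether the shared items appear in a different order.

```python
import csv


def read_header(lines):
    """Return the item names from the first row of a country's CSV export."""
    return next(csv.reader(lines))


def compare_headers(master_items, country_items):
    """Summarize how one country's item list departs from the master questionnaire."""
    master_set, country_set = set(master_items), set(country_items)
    omitted = [item for item in master_items if item not in country_set]
    added = [item for item in country_items if item not in master_set]
    # Compare the relative order of only the items both versions share.
    shared_in_master_order = [item for item in master_items if item in country_set]
    shared_in_country_order = [item for item in country_items if item in master_set]
    return {
        "omitted": omitted,
        "added": added,
        "reordered": shared_in_master_order != shared_in_country_order,
    }


master = ["gender", "age", "wfc1", "wfc2", "wfc3"]      # hypothetical item names
export = ["age,gender,wfc1,wfc2,local1", "34,1,4,4,2"]  # hypothetical country file
print(compare_headers(master, read_header(export)))
# {'omitted': ['wfc3'], 'added': ['local1'], 'reordered': True}
```

A report like this only flags differences; deciding what a partner actually changed still requires the back-and-forth with that partner described above.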
All deviations from standard procedures should be documented so that they can be incorporated into the data analysis stage. Deviations can be handled by changing values in the dataset. For example, if the coding of gender is to be standardized to 1 = male and 2 = female, samples in which the coding was reversed could be changed manually. If only a few items need changing, this might be a reasonable approach. However, if there are many deviations, an easier and less error-prone approach is to automate recoding by programming rules into the data analysis software. For CISMS we used SAS, which includes a full programming language that easily allowed for such recoding.
Data analysis. Analysis of the data should proceed only after cleaning and testing for data integrity (e.g., checking that values for all variables are within the possible range) have been done. With multi-country studies, the programming required to combine and score the data can be extensive. For example, the CISMS scoring and analysis program is more than 1,000 lines of SAS code, because each country dataset needed recoding for standardization, the individual datasets had to be merged into a single master dataset, and the instruments had to be scored. At this stage each case in the dataset represented a person, identified by country. Analyses were then conducted on the master dataset, both within and between countries. Output datasets were created as needed when analyses were conducted using software other than SAS. One member of the central team conducted the initial analyses with SAS and provided text-format datasets to other partners who wished to conduct further analyses for papers they took the lead in writing. An added advantage of a centralized approach to data analysis is that a complete copy of the raw data is always available from the central team's data custodian. This raw dataset allows researchers to return to the data originally gathered in each country, rather than relying on data that have been transformed and recoded in ways that can make it difficult to reconstruct how they were processed.
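Programmed recoding rules of the kind just described might look like the following Python sketch (the CISMS program was written in SAS, and nothing here reproduces it). The country codes, the `RULES` table, the `standardize` function, and the assumption that the 0/1 partner used 0 for male are all hypothetical; in practice each rule would come from the documented deviations for that sample.

```python
# Documented deviations for three hypothetical country samples. Each rule maps
# that country's gender codes onto the standard 1 = male, 2 = female, and lists
# the all-9s filler values that partner used for missing data.
RULES = {
    "AA": {"gender": {1: 1, 2: 2}, "missing": {}},                        # standard; blanks
    "BB": {"gender": {1: 2, 2: 1}, "missing": {"gender": 9, "age": 99}},  # reversed
    "CC": {"gender": {0: 1, 1: 2}, "missing": {"gender": 9, "age": 99}},  # 0/1 (0 assumed male)
}


def standardize(country, record):
    """Return a copy of one respondent's record recoded to the standard scheme."""
    rules = RULES[country]
    out = dict(record)
    # Step 1: turn this partner's missing-value fillers into None.
    for field, filler in rules["missing"].items():
        if out.get(field) == filler:
            out[field] = None
    # Step 2: map gender onto the standard codes. An undocumented code raises
    # KeyError, which flags a deviation to query with the partner.
    if out.get("gender") is not None:
        out["gender"] = rules["gender"][out["gender"]]
    return out


print(standardize("BB", {"gender": 1, "age": 99}))  # {'gender': 2, 'age': None}
print(standardize("CC", {"gender": 0, "age": 34}))  # {'gender': 1, 'age': 34}
```

Keeping every deviation in one rule table means the recoding is documented and repeatable, and the raw country files never need to be edited by hand.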
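The sequence described above (integrity checks, then merging into a person-level master dataset tagged by country, then scoring) can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the CISMS SAS program: the codebook ranges, the three-item `wfc` scale, the function names, and the rule that at least two of three items must be answered to compute a scale score are all hypothetical choices.

```python
from statistics import mean

# Hypothetical codebook: permissible ranges and a three-item scale scored
# as the mean of answered items (at least two of three required).
VALID_RANGES = {"wfc1": (1, 5), "wfc2": (1, 5), "wfc3": (1, 5), "age": (16, 80)}
SCALE_ITEMS = ["wfc1", "wfc2", "wfc3"]
MIN_ANSWERED = 2


def integrity_report(datasets):
    """List (country, row, field, value) for every non-missing out-of-range value."""
    problems = []
    for country, records in datasets.items():
        for row, record in enumerate(records):
            for field, (low, high) in VALID_RANGES.items():
                value = record.get(field)
                if value is not None and not low <= value <= high:
                    problems.append((country, row, field, value))
    return problems


def build_master(datasets):
    """Merge per-country records into one person-level dataset and score the scale."""
    problems = integrity_report(datasets)
    if problems:
        raise ValueError(f"resolve integrity problems before analysis: {problems}")
    master = []
    for country, records in datasets.items():
        for record in records:
            case = dict(record, country=country)  # each case is a person, tagged by country
            answered = [case[item] for item in SCALE_ITEMS if case.get(item) is not None]
            case["wfc_score"] = mean(answered) if len(answered) >= MIN_ANSWERED else None
            master.append(case)
    return master


data = {
    "AA": [{"wfc1": 4, "wfc2": 5, "wfc3": 3, "age": 41}],
    "BB": [{"wfc1": 2, "wfc2": None, "wfc3": 2, "age": 29}],
}
print([(c["country"], c["wfc_score"]) for c in build_master(data)])
# [('AA', 4), ('BB', 2)]
```

Refusing to build the master dataset while any integrity problem remains enforces the order the chapter recommends: clean first, analyze second.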
Political and Societal Issues
When conducting research across countries, political and societal issues may arise. Societal issues can result when potentially sensitive areas of inquiry are addressed; for example, a work–family study might include same-sex couples. Asking about such arrangements might not be considered acceptable in some traditional and non-secular societies, and partners might therefore raise concerns about the appropriateness of certain areas of inquiry. As another example, conventions regarding gender roles might limit access to women in certain cultures, or they may result in a preponderance of participants from certain age ranges. Furthermore, offering rewards for study participation may create conflicts of interest for participants when the rewards are out of proportion to the effort the study demands in a particular location.
Political issues can also be problematic and are particularly likely when research partners are in countries that are engaged in political disputes. The CISMS project encountered two such issues. The first had to do with terminology and how we identified countries. In an initial draft of the first CISMS paper, China and Taiwan were referred to as separate countries. The Chinese partner objected, suggesting that China and Taiwan were both part of the same country and should be noted as such. The compromise that solved the issue was to use the term “nations/provinces” (Spector, Cooper, Sparks, et al., 2001) or “geopolitical entities” (Spector et al., 2002).
The second political issue was more complicated, as it involved an underlying conflict between two of the countries in which the partners lived. In Phase 1 of CISMS, two of the country partners were from countries in the midst of a political conflict. When one of the partners realized that another partner was from a “forbidden” country, she withdrew from the study. The researcher did not object to collaborating with a colleague from the other country. Rather, she was concerned that she would get into political trouble in her own country if officials found out she was working with someone from that country, even though the contact was indirect and went through the central team. Such concerns are not farfetched: during the McCarthy era of the 1950s Cold War in the United States, even a hint of collaboration with communist countries could result in being blacklisted (losing one's job) or worse.
Dealing with political issues and cultural differences can require sensitivity and a willingness to compromise. Such issues are best handled through open communication, with everyone kept informed about the project, including who is involved, what is being studied, and how data are to be collected. Even with the best advance planning, unforeseen issues can arise both in conducting the study and in the broader societies where partners live.
Publication
One of the main reasons international researchers participate in multi-country studies is the possibility of attaining one or more refereed publications. Given the importance of publishing in well-cited outlets, there is significant potential for conflict. Issues can arise about who takes the lead and writes papers, author order, the content of papers, the framing and interpretation of results, and the outlets for publication. For this reason, it is vitally important that publication agreements be put in writing at the beginning of a project to minimize misunderstandings at the publication phase. Earlier we gave an overview of the CISMS agreement about data use and publication.
Both CISMS phases were large-scale studies that collected more data than could reasonably be included in a single research report, so each phase resulted in a series of conference papers and journal articles. Some were preplanned and tested hypotheses that each phase was designed to address, but others evolved in a more exploratory way. For example, Phase 1 was intended to test the impact of cultural values on stressors and strains, but during early data analysis it became apparent that the values scale chosen had psychometric weaknesses, so it was decided to write a paper about that issue (Spector, Cooper, Sparks, et al., 2001). Some papers involved all countries (Yang et al., 2012), whereas others involved only a subset (Allen et al., 2014). A few of the country partners wrote papers from only their own dataset, and some partners pooled their datasets to write papers about a single region or a smaller set of countries. Table 12.1 contains a sampling of peer-reviewed CISMS papers, showing for each the countries involved and the purpose. We firmly believe that giving researchers the freedom to toss ideas around and to test them, using part or all of the data, in the manner they deemed most appropriate was key to the success of the project, as everyone was challenged to be creative and to test his or her own ideas while being part of a large research team.
| Reference | Journal | Phase | Countries/Regions | Purpose |
|---|---|---|---|---|
| Miller et al. (2000) | Stress Medicine | I | South Africa, Taiwan, UK, US | Examined gender and culture differences in work stress. |
| Spector, Cooper, Sparks, et al. (2001) | Applied Psychology: An International Review | I | Belgium, Brazil, Bulgaria, Canada, China (PR), Estonia, France, Germany, Hong Kong, India, Israel, Japan, New Zealand, Poland, Romania, Slovenia, South Africa, Spain, Sweden, Taiwan, UK, Ukraine, US | Explored psychometric properties of Hofstede’s VSM 94. |
| Spector, Cooper, Sanchez, et al. (2001) | Journal of Organizational Behavior | I | Australia, Belgium, Brazil, Bulgaria, Canada, China (PR), Estonia, France, Germany, Hong Kong, India, Israel, Japan, New Zealand, Poland, Romania, Slovenia, South Africa, Spain, Sweden, Taiwan, UK, Ukraine, US | Country-level study of link between individualism-collectivism, locus of control, and well-being at work. |
| Spector et al. (2002) | Academy of Management Journal | I | Australia, Belgium, Brazil, Bulgaria, Canada, China (PR), Estonia, France, Germany, Hong Kong, India, Israel, Japan, New Zealand, Poland, Romania, Slovenia, South Africa, Spain, Sweden, Taiwan, UK, Ukraine, US | Individual-level study of link between locus of control and well-being at work. |
| Bernin et al. (2003) | International Journal of Stress Management | I | Bulgaria, India, Sweden, UK, US | Studied gender differences in stress coping strategies. |
| Spector, Cooper, et al. (2004) | Personnel Psychology | I | Regions: Anglo (Australia, Canada, New Zealand, UK, US), China (Hong Kong, PR, Taiwan), Latin (Argentina, Brazil, Colombia, Ecuador, Mexico, Peru, Uruguay) | Tested hypotheses about regional differences in responses to working hours and work–family stressors. |
| Spector et al. (2007) | Personnel Psychology | II | Regions: Anglo (Australia, Canada, New Zealand, UK, US), Asia (China (PR), Hong Kong, Japan, Korea, Taiwan), East Europe (Bulgaria, Poland, Romania, Slovenia, Ukraine), Latin America (Argentina, Bolivia, Chile, Peru, Puerto Rico) | Tested hypotheses about cultural differences in responses to work–family conflict. |
| Lapierre et al. (2008) | Journal of Vocational Behavior | II | Australia, Canada, Finland, New Zealand, US | Tested a model linking perceptions of family supportive work environment to work–family conflict, family, job, and life satisfaction. |
| Lu et al. (2009) | International Journal of Stress Management | II | Taiwan, UK | Two-country comparison of work resources, work–family conflict, and well-being. |
| Masuda et al. (2012) | Applied Psychology: An International Review | II | Regions: Anglo (Australia, Canada, New Zealand, UK, US), Asia (China (PR), Hong Kong, Japan, Korea, Taiwan), Latin America (Argentina, Bolivia, Chile, Peru, Puerto Rico) | Investigated regional differences in the relationship of flexible work arrangements to work–family conflict and well-being. |
| Yang et al. (2012) | Journal of International Business Studies | II | Argentina, Australia, Bolivia, Bulgaria, Canada, Chile, China (PR), Estonia, Finland, Germany, Greece, Hong Kong, Japan, Korea, Netherlands, New Zealand, Peru, Poland, Puerto Rico, Romania, Spain, Taiwan, Turkey, UK, Ukraine, US | Multi-level study of individualism-collectivism as a moderator of the work demand-strain relationship. |
| Allen et al. (2014) | Applied Psychology: An International Review | II | Australia, Canada, Finland, Greece, Japan, Korea, Netherlands, New Zealand, Slovenia, Spain, UK, US | Investigated national paid leave policies and work–family conflict in working parents. |
Related to the previous point, our approach made it possible to build strong bonds of trust among participating researchers, which resulted in long-term research relationships. After the success of Phase I, many of the same individuals continued in Phase II and also participated in smaller spin-off projects (e.g., Spector, Sanchez, Siu, Salgado, & Ma, 2004).
Most of the CISMS researchers had a background in organizational psychology or organizational behavior, but several had other backgrounds (e.g., medicine). Although this did not become an issue for CISMS, researchers from different disciplines might have difficulty agreeing on publication outlets, because academic rewards are often tied to within-discipline recognition, which can result in different preferences for where papers are published. Potential publication outlets should therefore be part of the early negotiation as multi-country teams are assembled, so that later conflicts and misunderstandings about outlets can be avoided.
Conclusions
Conducting research studies across multiple countries is a challenging but rewarding endeavor. Because many geographically dispersed individuals are involved, there are many challenges, including communication, coordination, and cooperation. The almost universal availability of the internet is perhaps the most important tool for carrying out such collaborations, as communication is quick and efficient. Email is the major tool for sharing documents and information, but face-to-face electronic communication via video chat is possible in many places. Storing data in a central cloud-based repository is certainly feasible, but one should be cautious about unrestricted access and potential data theft. Under the CISMS rules of participation, for instance, country partners were promised authorship on any manuscript that included their country's data; therefore, data leaks would threaten fulfillment of that promise.
Table 12.2 provides a summary of what might be considered best practices in carrying out a multi-country study. It is organized around the nine issues covered in the chapter. A core theme running through these best practices concerns clear communication among all members of the research team, and the need to put all agreements and procedures in writing. Doing so will reduce the potential for conflicts and disagreements.
| Issue | Best Practices |
|---|---|
| Scope of the study | Specify the purpose including hypotheses and research questions |
| | Choose countries based on purpose |
| Assembling the team | Use personal contacts to identify team members |
| | Be sure team has expertise in all areas needed |
| Agreements over data use and publication | Have written agreements that include data ownership/use and publication |
| | Each team member’s responsibility should be clearly stated |
| Study design and management approach | Choose management model that best suits purpose. NATO when strict protocol needs following; UN when country-specific input is vital |
| Base language for the study | Choose a base language for team communication, likely English |
| | Team members chosen must be able to communicate in base language |
| | Choose base language for instrument development |
| Coordinating data collection | One person should serve as liaison with country partners |
| | Individuals should have responsibility for coordinating/managing various functions such as gathering data sets and data processing |
| Data handling | One person collects data sets from partners |
| | Check for variation in instruments, questionnaires, and research materials |
| | Document rules for standardizing data |
| | Run checks for data errors |
| | Pick statistical package/language for main data processing and scoring |
| Political and societal issues | Be alert for cultural misunderstandings and political issues that might arise |
| | Be flexible in dealing with potential conflicts and issues |
| Publication | Publication products and outlets should be part of the initial written plan for the project |
| | Have written agreements that cover what papers will be written, who will take the lead, and author order |
Multi-country studies require significant effort over a considerable length of time, which can stretch into several years. It is important to plan such studies carefully, to assemble a good team able and willing to carry out the project, and to process the complex data that are collected methodically. Given the effort and time required, it makes sense to use these projects to collect an extensive dataset that can serve as an archival resource for answering new questions as they arise for years to come. Large-scale studies involving more than a few countries are not very common, so they can make important contributions to the literature. We hope that this chapter assists researchers who are considering conducting such studies in the work-nonwork and other domains.