Bureaucratic Representation and Gender Mainstreaming in International Organizations: Evidence from the World Bank

How does the representation of women in international organizations affect the implementation of gender mainstreaming policies? Many international organizations have adopted policies to prevent gender discrimination in their operations, but their implementation is often lackluster. We argue that these shortcomings appear due to a combination of institutional incentives and an underrepresentation of women in their staff. We test the argument in the case of the World Bank, drawing on highly disaggregated staffing data, an instrumental variable strategy, and an elite survey experiment. Our results show that most staff incorporate at least shallow gender mainstreaming in their projects. Deeper implementation of gender mainstreaming is more likely when women staff supervise projects, hold positions of authority, and are more represented as coworkers. These results contribute to understanding the disconnects between talk and action on mainstreaming policies and inform debates on representation in global governance.


Nearly 2.4 billion women do not have equal economic rights, and women still face barriers to equal participation in social, political, and economic life in 178 countries (World Bank 2022b). In response, international organizations (IOs) have almost universally adopted gender mainstreaming policies to promote gender equality in their operations (Hafner-Burton and Pollack 2002; Meyer and Prügl 1999; True and Mintrom 2001). While the discourse and practice on mainstreaming policies vary substantially among IOs, the common objective of these policies is to integrate "gender issues into the entire spectrum of activities that are funded and/or executed by an organization (…) making it a routine concern of all bureaucratic units and its staff members" (Razavi and Miller 1995, II). Yet, history has shown that gender mainstreaming strategies and policies do not easily translate into organizational practice, even when there are clear mandates, codified operational policy, and buy-in from high-level leaders (Berik 2017; Dietrich et al. 2023; Hafner-Burton and Pollack 2002; Kenny and O'Donnell 2016; Moser and Moser 2005; Mukhopadhyay 2016; Parpart 2014; True and Parisi 2013; Walby 2005; Weaver 2010; Winters et al. 2018). The results are often persistent disconnects between talk and action that leave IOs open to charges of ineffectiveness, "box-ticking," and even hypocrisy (Caglar 2013; Tiessen 2005; Weaver 2008).
To explain differences in implementation, we highlight the importance of staff as agents within IOs. Organizational culture, principal preferences, and client interests create substantial pressures for staff that can conflict with mainstreaming mandates (Barnett and Finnemore 2004; Hawkins et al. 2006; Weaver 2008). Nevertheless, implementation staff often have significant autonomy from member state principals and organizational management (Clark and Zucker 2023; Heinzel 2022). Compliance with formal policy guidance thus depends on staff using their discretion to embrace gender mainstreaming in the operations they oversee. Hence, staff identities, values, and goals crucially determine their day-to-day implementation decisions on gender mainstreaming.
We argue that the gender composition of IO staff can explain internal variation in responses to formal policy guidance on gender mainstreaming. In doing so, we do not posit an essentialist notion that all women will implement gender mainstreaming because they are women, and men will ignore it because they are men. Instead, we argue that all staff members are incentivized to implement at least a minimal level of compliance with gender mainstreaming guidelines to ensure project approval. However, organizational incentives to engage in box-ticking and skepticism of recipient counterparts mean that the depth of implementation depends on the commitment of individual staff members.
We draw on two insights from gender representation literature to understand variation in such commitment to gender mainstreaming. First, while not all women will be automatically interested in fighting gender inequality, women are on average more committed to policies focusing on gender equality than men (Betz, Fortunato, and O'Brien 2021; Celis 2007; 2008; Celis et al. 2008; Park and Liang 2021; Poushter, Fetterolf, and Tamir 2019; Sapiro 1981; Wike et al. 2010). Second, women's representation within organizations can create what bureaucratic representation theorists call a "contagion" effect (An, Song, and Meier 2022), defined as a socialization mechanism wherein the presence of more women staff in a professional group will affect the behavior of men in that group.
Our empirical analysis focuses on the World Bank (hereafter Bank), one of the most important IOs in global economic governance. We rely on a mixed-methods approach that combines observational analyses at the project level and a survey of Bank staff. Our analyses link novel individual-level data on the most relevant Bank project staff with the Bank's own Gender Mainstreaming Index (GMI, discussed in the following section), which tracks shallow versus deep implementation of the Bank's gender mainstreaming policy guidelines (World Bank 2018a). We implement an instrumental variable strategy and conduct an elite survey experiment with Bank staff members to help address endogeneity concerns. To interpret the findings of our quantitative analyses, we also draw on insights from 33 key informant interviews.1
Three findings stand out from our mixed-methods analysis. First, formal policy guidance on gender mainstreaming appears to incentivize some level of shallow gender mainstreaming, irrespective of the gender of staff members directly tasked with implementing these projects. Second, women's representation makes a difference once we distinguish between shallow and deep mainstreaming, as women staff implement gender mainstreaming more deeply. In line with these observational results, our preregistered conjoint survey experiment reveals that all staff perceive gender mainstreaming as important to get projects approved, but women staff view projects with gender mainstreaming as more relevant to achieving overall development objectives than men. Third, all staff show deeper commitments to gender mainstreaming when their bureaucratic subunits include more women staff and when their superiors in the internal hierarchy are women. Together, our findings strongly imply that the decisions of individual Bank staff members affect variation in the implementation of gender mainstreaming at the Bank.

PUTTING IDEALS INTO PRACTICE: GENDER MAINSTREAMING IN INTERNATIONAL ORGANIZATIONS
Most IOs have adopted gender mainstreaming policies to overcome systematic gender discrimination in their operations. These policies have attracted considerable attention in the scholarly community (True and Mintrom 2001). Some have focused on the gender-based advocacy that shaped the adoption and diffusion of gender mainstreaming policies (Hafner-Burton and Pollack 2002; Hardt and von Hlatky 2020; Kardam 1993; Pollack and Hafner-Burton 2010; Rothermel and Shepherd 2022; Stratigaki 2005; Tiessen 2004; 2005; True and Mintrom 2001; Weaver 2010). Others have critically assessed differences in the content of, shown severe blind spots in, and criticized lackluster implementation of gender mainstreaming strategies (Hardt and von Hlatky 2020; Meyer and Prügl 1999; Moser and Moser 2005; Pollack and Hafner-Burton 2010; Roberts and Soederberg 2012; Tiessen 2005; True and Parisi 2013).
One key takeaway from these studies is that many IOs implement mainstreaming policies unevenly across their organizations. They experience "serious problems in translating the commitment into action" (Caglar 2013, 336), and "the result has been a highly variable and generally disappointing pattern of implementation" (Pollack and Hafner-Burton 2010, 308). The explanations of this disconnect between talk and action highlight the institutional context in which IO staff are embedded. For example, Hardt and von Hlatky (2020) demonstrate that NATO's civilian bodies implemented gender mainstreaming less diligently than its military bodies and explain these differences based on available gender training and the organizations' hierarchical structure. Moreover, Hafner-Burton and Pollack (2002) and Pollack and Hafner-Burton (2010) compare the implementation of gender mainstreaming across IOs, showing that strategic framing of gender mainstreaming at UNDP led to more commitment to and broader implementation of gender mainstreaming than at the World Bank.
We complement this scholarship by unpacking within-IO variation in the implementation of gender mainstreaming policy through a study of the World Bank. The Bank is an important case because of its material and ideational power in global development. In the fiscal year 2022, it disbursed over $50 billion in development finance, making it the world's second-largest multilateral development aid institution, after the European Union (OECD 2024), and its annual World Development Report and World Development Indicators are among the most highly utilized sources of development ideas and data (Stone 2003). Furthermore, we believe that studies of the Bank can yield broader insights for studies of the implementation of mainstreaming policies in other IOs. Many service IOs, especially in the field of international development, have mimicked the Bank's institutional design. Other aid organizations also frequently hire former World Bank staff and vice versa. We believe that these facts increase the representativeness of the Bank as a case of a large service IO (da Conceição-Heldt and Schmidtke 2019). As Briggs (2019) has shown, results generated from World Bank project implementation studies tend to generalize fairly well to other international development organizations. Nevertheless, we discuss the limits to the generalizability of our findings in more detail in the conclusion.
1 The first round of 23 interviews took place as part of a previous project on gender mainstreaming at the Bank, conducted by Catherine Weaver in January 2007. The second set of interviews was conducted via Zoom in May-June 2022 with nine task team leaders and one senior researcher in the Bank's Development Research Group. All interviews were conducted under a consent agreement that offered confidentiality. More information on the interview respondents and ethical considerations can be found in the Supplementary Material.
The World Bank has long lagged behind other IOs, such as the United Nations, in gender mainstreaming efforts. Despite peer institutions' more proactive embrace of international norms on gender equality in the 1970s and 1980s, Bank leadership allocated minimal staff resources to such work. In 1994, in response to pressures from gender advocates inside and outside the Bank, the Bank adopted formal policy guidance (official internal rules governing operational practices) that required staff to include gender poverty assessments, public expenditure reviews, and sector work (Weaver 2010).2 Subsequently, the 1995 Beijing conference led to visible support from the Bank's top leadership for the first time.3 Internal advocates succeeded in pushing for two gender mainstreaming action plans in 2002 and 2006. Gender mainstreaming was then further strengthened through amendments to internal operational policies passed by the Bank executive board (O.P. 4.20). In 2012, the Bank published its first World Development Report on Gender and Development, 17 years after the UN's 1995 Human Development Report on Gender and Human Development. The focus on gender mainstreaming was further reinforced by newly adopted gender mainstreaming strategies in 2015 and 2023.
The Bank's approach to gender mainstreaming has attracted negative attention. To gain traction, internal gender advocates felt compelled to strategically frame gender mainstreaming as "smart economics" to fit with the Bank's economistic culture and gain the support of important principals (Weaver 2010).4 Studies show that the Bank's "gender mainstreaming as smart economics" adheres to the Bank's apolitical, technocratic, and economistic culture, and reifies, rather than challenges, neoliberal discourse on development (Bazbauers 2023; Bergeron 2003; Griffin 2009; Moser and Moser 2005; Prügl 2017; True 2003). As a result, both internal and external advocates have argued that the Bank's approach deviates from feminist approaches to gender mainstreaming and hinders a meaningful transformation in values and behavior (Caglar, Prügl, and Zwingel 2013; Lombardo and Meier 2006; Prügl 2011). Therefore, some observers worry that mainstreaming in the Bank may not be particularly helpful in fostering the aspirations of many gender advocates.
We recognize these important debates but focus on a complementary challenge to Bank gender mainstreaming in this article: a discrepancy between formal policy guidance and its implementation. This divergence between talk and action is important since many gender mainstreaming strategies, irrespective of their content, face problems in implementation (Caglar 2013; Hafner-Burton and Pollack 2002; Hardt and von Hlatky 2020; Pollack and Hafner-Burton 2010). At the Bank, the implementation of gender mainstreaming strategies has proven difficult for more than fifty years. In the 1970s and 1980s, "only about 11 percent of the Bank's lending portfolio in the early 1980s contained projects with gender-related action-mostly in rural development, education, and health projects" (Weaver 2010, 75). The proportion of projects that included some consideration of gender issues in their design almost doubled between 1995 and 2001, to nearly 40% (Winters et al. 2018). Nevertheless, internal evaluations continued to warn of intraorganizational variation in implementation. For example, the 2015 Gender Strategy notes "instances of poor design and missed opportunities, owing to substantial variations in the approaches taken to similar projects in different countries (…) take-up is uneven, and the challenge is to ensure that all Bank Group staff sign on to a new good practices baseline" (World Bank 2015, 70).
Such uneven take-up is captured by the Bank's own GMI (World Bank 2018b), which codes the implementation of gender mainstreaming in projects approved between 2009 and 2017. The GMI measures the extent to which gender considerations are incorporated into (1) analysis, (2) actions, and (3) monitoring and evaluation of Bank projects. To fulfill the analysis criterion (1), project documents should examine gender issues, discuss gender diagnostics, undertake a gender assessment, or include results of consultations with gender-focused NGOs and beneficiaries. For example, the Uzbekistan Health System Improvement Project explicitly discussed that certain diseases are more prevalent for women than men in Uzbekistan and that the country has exceptionally high maternal mortality rates compared to other countries in Central Asia (World Bank 2020a). The actions criterion (2) requires that projects include targets addressing gendered needs, have gender-specific environmental and social safeguards, or discuss explicitly how project targets address gender disparities. For instance, the Zhejiang Qiantang River Basin Small Town Environment Project financed specific training programs for women and ensured that women and men share equally in compensation contracts (World Bank 2017). The monitoring and evaluation dimension (3) measures whether projects incorporate gender-disaggregated targets in their result frameworks or gender issues in their evaluation strategies. An example is the Mauritania Mining Sector Capacity Building Project, which set targets of 30% women in vocational training programs and 70% women as microgrant beneficiaries (World Bank 2005).
The resulting GMI is an additive index that ranges from 0 (no gender mainstreaming) to 3 (deep gender mainstreaming). Descriptive data on the GMI demonstrate the continued variation in the implementation of gender mainstreaming policy in Bank projects (Figure 1).5 Despite the Bank's ambition to include deep gender mainstreaming in all operations, nearly half of all projects fall short of a GMI score of 3. The following section explains this variation in the depth of gender mainstreaming across Bank projects.
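The additive construction of the index can be illustrated with a minimal sketch. It assumes that each of the three criteria is coded as a simple yes/no flag per project; the record type and field names are hypothetical and are not taken from the Bank's own coding files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProjectGenderFlags:
    """Hypothetical record of the three GMI criteria for one project."""
    analysis: bool    # (1) gender issues examined in the project documents
    actions: bool     # (2) targets or safeguards that address gendered needs
    monitoring: bool  # (3) gender considered in monitoring and evaluation


def gmi_score(flags: ProjectGenderFlags) -> int:
    """Additive index: one point per fulfilled criterion, from 0 to 3."""
    return int(flags.analysis) + int(flags.actions) + int(flags.monitoring)


# Illustrative project that fulfills the analysis and actions criteria only.
example = ProjectGenderFlags(analysis=True, actions=True, monitoring=False)
print(gmi_score(example))  # prints 2
```

Under this reading, a score of 0 corresponds to no gender mainstreaming and a score of 3 to deep gender mainstreaming.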

EXPLAINING VARIATION IN THE IMPLEMENTATION OF GENDER MAINSTREAMING
We argue that variation in the implementation of mainstreaming goals at IOs is based on the interplay of four factors: organizational culture, principal control, client preferences, and the views of IO staff. As the literature on IO bureaucracies has shown, staff are often endowed with considerable authority and autonomy in policy implementation (Barnett and Finnemore 2004; Bauer and Ege 2016; Bayerlein, Knill, and Steinebach 2020; Fleischer and Reiners 2021; Hawkins et al. 2006; Hooghe and Marks 2015; Johnson 2013; Liese et al. 2021). At the Bank, for example, operational staff possess discretion in project design and implementation (Clark and Dolan 2021; Heinzel 2022; Honig 2020; Weaver 2008).6 Nevertheless, staff are constrained by organizational and environmental factors outside their immediate control. Our argument on the implementation of mainstreaming policies at IOs thus focuses on how individuals respond to such constraints differently, depending on their views on specific policy issues. We first discuss the three constraints and then explain why we believe the views of individual staff members are critical for understanding variation in the implementation of gender mainstreaming.
Organizational culture shapes IO staff incentives, beliefs, and interests (Barnett and Finnemore 2004; Bayerlein, Knill, and Steinebach 2020; Christian 2022; Weaver 2008; Weaver and Nelson 2016). This culture creates important context conditions for the implementation of all formal policy guidance, including gender mainstreaming policies, in IOs. At the Bank, it is well documented that an "approval culture" is deeply embedded in the organization (Weaver 2008). This organizational culture incentivizes staff to get large and complex operations approved by the executive board to progress in their career. Indeed, a recent survey of Bank staff revealed that more than 90% of staff members see project approval as very or extremely important for their career success (Briggs 2021).7
This strong focus on approval increases principal state influence and the need to consider the views of Bank management as well as the executive board when designing projects. Staff have to ensure that their organizational principals are pleased with their proposed operations (Clark and Dolan 2021). To get buy-in from management and the Executive Board, they need to check all the boxes of social and environmental safeguards, anticorruption action plans, and gender mainstreaming (Heinzel 2022; Weller and Yi-Chong 2010). Operational staff are well aware of these "corporate expectations" around gender mainstreaming.8 As one staff member put it: "you have to think about the Board's reaction (…) you can't just ignore issues like gender."9 However, being resigned to accepting or rejecting project proposals brought by staff, the executive board does not typically review the depth of implementation of these policies (Weaver 2008). Since what matters internally is "getting your project gender tagged,"10 the approval culture promotes box-ticking behavior.11
These incentives for shallow implementation are often exacerbated by the need to design projects that align with client interests on gender mainstreaming. Borrower governments do not always enthusiastically endorse gender mainstreaming, which creates conflicting incentives for operational staff. For example, one Bank staff member relayed being told repeatedly by a client government to "stay in their lane" when stressing gender mainstreaming.12 In these cases, "we have to push them/force them to do things like girls' education, which they don't have any interest in."13 To resolve these conflicting incentives, staff need to exercise their discretion in project design and make difficult decisions about which principals to please. As one staff member stressed: "we have to sneak in gender to get Board approval."14 Hence, staff are often constrained in how much they can focus on gender mainstreaming in project design and implementation.
Figure 1. Gender Mainstreaming Index. Note: Percentage of projects that are scored as 0-3 on the World Bank's gender mainstreaming index (World Bank 2018a).
The context conditions highlighted so far (organizational culture, principal control, and client interests) pull in different directions. When faced with conflicting incentives during project implementation, staff must put in the extra effort to ensure that gender mainstreaming policies are fully implemented, sometimes against recipients' preferences. As one staff member highlights: "it takes extra work to meet the specific criteria [as targeted in the GMI]."15 Whether they are willing to do so crucially depends on their personal views on mainstreaming. Studies on gender mainstreaming show that the content and implementation of gender mainstreaming is shaped by critical actors' views on gender issues and their gender expertise (Altan-Olcay 2020; Caglar, Prügl, and Zwingel 2013; Gerard 2023). We believe that similar dynamics affect project implementation. Staff committed to gender mainstreaming will be more inclined to fully integrate gender mainstreaming policy guidance into project design and less likely to engage in perfunctory and shallow implementation.
Our argument highlights gender as an important factor shaping such commitment to gender mainstreaming. We do not invoke an essentialist argument about differences between men and women in the context of gender mainstreaming. Not all women are feminists, and men or people with other gender identities can and do advocate for gender mainstreaming. Nevertheless, concerns about gender equality tend to be more common among women than men. Survey evidence from a large number of countries has demonstrated that women, on average, hold more egalitarian views on gender roles and perceive gender inequality as a bigger problem than men (Bolzendahl and Myers 2004; Burns and Gallagher 2010; Poushter, Fetterolf, and Tamir 2019; Wike et al. 2010). Such differences in views can materialize in political and bureaucratic decision-making. On average, women legislators are more likely to prioritize policies that affect women, including reproductive rights, equality of opportunity, family policies, and social welfare (Atkins and Wilkins 2013; Betz, Fortunato, and O'Brien 2021; Celis 2007; 2008; Phillips 1995). Similarly, women public sector workers, again on average, emphasize different policy issues in their work than men (Bishu and Kennedy 2020; Park 2013; Wilkins 2007). For example, Keiser et al. (2002) and Riegle-Crumb and Humphries (2012) show that increasing the number of women teachers in US schools increases girls' grades. Wilkins and Keiser (2006) find a link between women's representation in child and family services agencies and enforcement of child support laws. Similarly, Park and Liang (2021) uncover that the bureaucratic representation of women is associated with increases in women's educational attainment in non-OECD countries. This leads us to expect that gender mainstreaming policies will, on average, resonate more with women than with men.16
In sum, we argue the Bank's organizational culture and the demands of principals compel staff, at a minimum, to implement shallow gender mainstreaming. However, as argued above, deeper mainstreaming requires extra effort that may not always be welcomed by client governments and does not help get projects approved. Such deeper gender mainstreaming thus often depends on the individual commitment of staff members to gender mainstreaming goals, which prior literature has linked to gender representation. Therefore, we expect that gender differences between staff shape the depth of gender mainstreaming components but not the inclusion of any gender mainstreaming in the projects they oversee. We hypothesize:
H1: (any mainstreaming). There is no difference between women and men staff in the inclusion of gender mainstreaming into the projects they implement.
H2: (deep mainstreaming).If women staff oversee project implementation, these projects will include deeper gender mainstreaming.
Gender representation may also shape the commitment to gender mainstreaming policies among staff peer groups. IOs are influenced by the shared beliefs of subcultures within these organizations (Weaver and Nelson 2016). We thus hypothesize that bureaucratic cultures become less hostile to issues of gender equity as the proportion of women in staff increases (Keiser et al. 2002; Wilkins and Keiser 2006). As the literature demonstrates, a greater number of women can (under certain conditions) change the dynamics of deliberations on gender issues (Dietrich, Hayes, and O'Brien 2019; Karim and Beardsley 2016; Mendelberg, Karpowitz, and Goedert 2014). Specifically, the increased representation of women can shift the salience of certain issue areas (Meier and McCrea 2022), alter "masculine" subcultures (Kennedy, Bishu, and Heckler 2020), and encourage discussion on hitherto neglected policy issues (Chattopadhyay and Duflo 2004; Dietrich, Hayes, and O'Brien 2019).
Therefore, we expect that the Bank's bureaucratic subunits (especially the so-called Global Practices) more closely align with gender mainstreaming goals as the number of women working within them increases. Crucially, because of presumed socialization and learning effects due to the increased proportion of women in these subunits, the embrace of gender mainstreaming policy guidance may happen irrespective of whether women are leading projects directly. Therefore, we hypothesize:
H3: (any mainstreaming). Bureaucratic units with a higher proportion of women do not differ in the inclusion of gender mainstreaming components.
H4: (deep mainstreaming).If a higher proportion of positions in a bureaucratic unit are held by women, projects will include deeper gender mainstreaming.

A DATASET OF WORLD BANK STAFF
To test these arguments, we constructed a novel individual-level dataset on thousands of Bank staff members at different levels of the organizational hierarchy. Studies show that staff need to be in positions of authority to push for change (Keiser et al. 2002). Therefore, we identified the key individuals with decision-making power over individual Bank projects and collected data on their gender. Specifically, we collected data on country directors (CDs), practice managers (PMs), and task team leaders (TTLs). These three groups represent the three primary staff involved in Bank projects. Figure 2 displays the answers to a question asking staff to indicate the key staff that input into project design decisions based on our 2022 survey with TTLs (discussed in more detail below). It shows that 79% of respondents highlight the importance of TTLs, 43% indicate that CDs play an important role, and 22% that PMs are involved in project-level decision-making. We briefly discuss the role of these staff in Bank projects and how we collected data on their gender.
First, TTLs are the main staff members in charge of individual projects and are generally considered the most critical staff in project-level decision-making at the Bank (Briggs 2021). Second, CDs oversee the overall project portfolio for a given country (Honig 2020) and are mandated to ensure that gender issues are considered within this portfolio (Kenny and O'Donnell 2016). Third, PMs are appointed by sectoral global practices to oversee project staff in a particular sector and subregion. In other words, TTLs are relevant because they make design and implementation decisions, CDs matter because they are supposed to ensure that gender mainstreaming is included in the projects under their supervision, and PMs are important because they appoint and manage TTLs. Project design decisions are mainly owned by TTLs (in negotiations with client governments) and coordinated with PMs and CDs. However, during the board approval process, projects are discussed in broader committees that can include higher-level staff like regional vice presidents or the managing director, depending on the importance of the recipient countries for the Bank.17
We collected the names of CDs, PMs, and TTLs from publicly available information in the Bank API and project documents from its documents and reports archive (World Bank 2022a).18 Specifically, we extracted the names of 4,949 TTLs overseeing 86% (8,506) of Bank projects between 2000 and 2020. Our database also includes 196 CDs in charge of more than 90% of the Bank's project portfolio during the same time and 280 PMs overseeing the 2,525 projects where data on our dependent variable (GMI score) are available.
We combine automated methods and hand-coding to ascertain whether staff likely identify as women. In line with recent data collection of individual-level data (Nyrup and Bramwell 2020), we predicted whether Bank staff likely identify as women using genderize.io, which classifies the gender of individuals based on millions of self-reported names and gender from social media profiles. To validate the data, we hand-coded the gender of 981 Bank managers named in Bank Annual Reports based on their use of gendered pronouns in online biographies on the official Bank and personal websites. The algorithm correctly classified 92.6% of these staff members. To increase the accuracy of our measurement of gender further, we hand-coded every individual where genderize.io reported a distribution of less than 75% of social media profiles listed as one particular gender (e.g., Andrea or Jira) or failed to classify the name. By doing so, we increased the accuracy of our variable to approximately 98%. Figure 3 displays the percentage of projects in our GMI database supervised by women staff. Most women TTLs work on finance, ICT, and education projects, while fewer women work on agriculture, energy, and industry projects.
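The rule for deciding which names go to hand-coding can be sketched as follows. This is a minimal illustration and not the authors' actual pipeline: it assumes that the classifier output has already been reduced to one agreement share per name (or to a missing value when the name could not be classified). The function name, the placeholder names, and the example values are hypothetical.

```python
from typing import Dict, List, Optional

# Threshold stated in the text: names below 75% agreement are hand-coded.
CONFIDENCE_THRESHOLD = 0.75


def needs_hand_coding(probability: Optional[float]) -> bool:
    """Return True when a name should be checked by hand.

    `probability` is the share of profiles agreeing on the predicted gender,
    or None when the classifier failed to classify the name.
    """
    return probability is None or probability < CONFIDENCE_THRESHOLD


# Made-up values for illustration only; these are not real classifier output.
predictions: Dict[str, Optional[float]] = {
    "name_a": 0.99,
    "name_b": 0.60,
    "name_c": None,
}
to_review: List[str] = [n for n, p in predictions.items() if needs_hand_coding(p)]
print(to_review)  # prints ['name_b', 'name_c']
```

A name with exactly 75% agreement is treated as confidently classified here, since the text speaks of a share of less than 75%.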

OBSERVATIONAL ANALYSIS OF GENDER MAINSTREAMING IN WORLD BANK PROJECTS
We estimate the impact of gender representation on the inclusion of gender mainstreaming in the 2,076 projects with available GMI data that the Bank has implemented since 2009. We employ ordinary least squares regression, entropy balancing, as well as instrumental variable regressions. Our main independent variables are dummy variables indicating whether any CDs, PMs, and TTLs in charge of individual Bank projects were women (H1 and H2) and the share of women appointed in the previous sector-year (H3 and H4).
We display more information on the descriptive statistics for our key variables and their sources (Supplementary Tables A3 and A4) and give some more information on the GMI in Supplementary Figures A1-A5 and Supplementary Table A5.
The main threat to inference is endogeneity, which would occur in two scenarios. First, management could predetermine that a project should have a greater gender focus and select women to run this project based on gender stereotypes. Second, women staff might be more likely to apply internally for projects with a greater gender focus and, hence, self-select into these projects. Selection based on predetermined project objectives is not a problem for CDs or PMs, who oversee large, often multicountry, project portfolios and do not select into countries based on specific projects that have not even started when they begin their posts. However, reverse causality is a considerable concern for TTLs. Therefore, we employ several control variables and an instrumental variable strategy to minimize this possibility.
First, we aim to hold differences in the Bank's approach to projects constant. Gender mainstreaming seeks to incorporate a gender lens into all projects, not just those focusing explicitly on gender issues.19 Therefore, we control for the percentage of a project that explicitly focused on gender issues. As budgetary data on gender are unavailable, we rely on Bank data on the share of a project's goals (called themes). Additionally, to ensure that results are driven by the gender of Bank staff and not by their gender expertise, we control for the number of gender-focused projects a given staff member has run before the project of interest. While we cannot account for staff's experience before joining the Bank, this measure accounts for their experience on gender issues as a TTL at the Bank. We also employ sector-year fixed effects because of the differences in staff gender and gender mainstreaming focus across sectors.20 Additionally, we include a measure for the involvement of IDA, the (log) project amount, and conflict-affected countries because attention by Bank management tends to be greater for such projects.
Second, we account for important differences between recipient countries. Country fixed effects control for time-invariant differences, while measures of the economic rights of women (Cingranelli and Richards 2010), the share of women in the national government (Nyrup and Bramwell 2020), women's infant mortality (World Bank 2020b), and the share of women in vulnerable employment (World Bank 2020b) capture time-varying differences in women's empowerment. Furthermore, we include recipient countries' GDP per capita and (logged) population as control variables (World Bank 2020b).
Third, we control for the average gender-focused lending of the five most important Bank DAC shareholders (USA, UK, Germany, France, and Japan) to adjust for the interests of the Bank's political principals (Clark and Dolan 2021). Data on the gender focus of principals' bilateral aid are taken from the OECD (2024). The indicator is calculated by taking the average share of activities with a gender marker of the five donors approved each year for a recipient.
We present the results from six models in Table 1. The models labeled "any mainstreaming (MS)" use a binary dependent variable that indicates whether a given project has at least one gender mainstreaming component. The models labeled "Deep MS" focus on the overall level of the gender mainstreaming index (ranging from 0 to 3). Models 1 and 2 are OLS regressions, including the discussed control variables. In models 3 and 4, we use entropy balancing to reduce the covariate imbalance between treatment and control groups. These models proceed under the assumption of exogeneity conditional on covariates. Given the extensive control variables used in these models, it is very unlikely that omitted variable bias operates on the sector-year or country-year level. However, we might face endogeneity at the project level. We use an instrumental variable approach in models 5 and 6 to address such endogeneity.
Our instrumental variable is valid if it predicts whether at least one of the TTLs in a project is a woman (relevance criterion) and does not affect gender mainstreaming ratings through any other channel (exclusion restriction). We use the number of TTLs listed on a project in our database as an instrument. The instrument is relevant because the chances that at least one woman works in a project increase with every additional TTL. Women comprise between 25% and 45% of staff in our period of interest. Thus, the more hiring decisions are made, the more likely the selection of women among TTLs is because of the supply of candidates. Indeed, the results from the first stage regression (reported in Supplementary Table A6) show that one additional TTL on a project increases the likelihood of a woman TTL on the project by approximately 15% (p < 0.001), and the F-statistic from the first stage is around 111, far exceeding conventional critical values.
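The relevance logic can be illustrated with a short calculation. This is a minimal sketch that assumes each TTL slot is filled independently from a pool with a fixed share of women, which is a simplification introduced here and not the estimation in Supplementary Table A6. The function name `prob_at_least_one_woman` is hypothetical.

```python
def prob_at_least_one_woman(share_women: float, n_ttls: int) -> float:
    """Probability that at least one of n_ttls independent draws is a woman.

    Assumes independent draws with a constant share of women (illustrative only).
    """
    return 1.0 - (1.0 - share_women) ** n_ttls


# The staff shares reported in the text bound the illustration (25% and 45%).
for share in (0.25, 0.45):
    for n_ttls in (1, 2, 3):
        p = prob_at_least_one_woman(share, n_ttls)
        print(f"share={share:.2f}, TTLs={n_ttls}: P(at least one woman)={p:.3f}")
```

Under these assumptions, the probability rises with every additional TTL at both ends of the reported range, which is the mechanical relationship the first stage exploits.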
The instrument is also plausibly excludable, conditional on covariates, because theoretical arguments do not imply that projects with larger numbers of TTLs lead to a deeper mainstreaming, except through the greater likelihood that at least one woman is working on the project. One could question the excludability of the instrument from three main angles. First, larger projects may have more TTLs, and the increased scrutiny in these projects could lead Bank management to try to ensure that gender mainstreaming is incorporated. We account for this argument by controlling for the (log) project amount. Second, some sectoral global practices may be more cooperative and, thus, more inclined to appoint more co-TTLs. If these practices had more women TTLs, the validity of the instrument would be threatened. We control for such factors through sector-year fixed effects. Finally, our gender expertise control variable accounts for gender experts being simply added to projects that have been predetermined to include greater gender mainstreaming. Hence, we argue that the number of TTLs on a project is a valid instrument.
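The two-stage logic described above can be sketched on simulated data. Everything in this example is hypothetical: the data-generating process, the variable names, and the effect size of 0.3 are assumptions chosen for illustration, and the sketch omits the fixed effects, remaining controls, and standard error corrections a full analysis would need.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

# Simulated, hypothetical data (not the World Bank data).
log_amount = rng.normal(size=n)            # control: (log) project amount
n_ttls = 1 + rng.poisson(0.8, size=n)      # instrument: number of TTLs
confounder = rng.normal(size=n)            # unobserved project-level factor

# Treatment: at least one woman TTL. It depends on the instrument
# and on the unobserved confounder, which creates endogeneity.
p_woman = 1 - (1 - 0.35) ** n_ttls
woman_ttl = (rng.uniform(size=n) < np.clip(p_woman + 0.1 * confounder, 0, 1)).astype(float)

# Outcome: mainstreaming depth. The instrument enters only through woman_ttl.
true_effect = 0.3
gmi = 1.0 + true_effect * woman_ttl + 0.2 * log_amount + 0.5 * confounder + rng.normal(size=n)


def ols(X, y):
    """Least-squares coefficients for y regressed on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


const = np.ones(n)

# First stage: regress the treatment on the instrument and the control.
Z = np.column_stack([const, n_ttls, log_amount])
first_stage = ols(Z, woman_ttl)
fitted = Z @ first_stage

# First-stage F-statistic for the single instrument (squared t-statistic).
resid = woman_ttl - fitted
sigma2 = resid @ resid / (n - Z.shape[1])
cov = sigma2 * np.linalg.inv(Z.T @ Z)
first_stage_F = first_stage[1] ** 2 / cov[1, 1]

# Second stage: replace the treatment with its first-stage fitted values.
beta_2sls = ols(np.column_stack([const, fitted, log_amount]), gmi)

# Naive OLS for comparison; it is biased by the confounder.
beta_ols = ols(np.column_stack([const, woman_ttl, log_amount]), gmi)

print("first-stage F:", round(float(first_stage_F), 1))
print("OLS estimate:", round(float(beta_ols[1]), 3))
print("2SLS estimate:", round(float(beta_2sls[1]), 3))
```

In this simulation the naive OLS estimate overstates the effect because the confounder raises both the treatment probability and the outcome, while the two-stage estimate should land close to the assumed value. The point estimates from the manual second stage are valid, but its default standard errors would not be, so a dedicated IV routine is preferable in practice.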
Consistent with our hypotheses, the gender of TTLs is not associated with any mainstreaming (measured on the 0-1 scale). The coefficients for women TTLs are very small and fail to attain statistical significance at conventional thresholds in models 1, 3, and 5. However, substantial gender differences appear when we probe shallow versus deep mainstreaming (0-3 scale). We find strong evidence that women TTLs run projects with deeper mainstreaming. The coefficients are positive and statistically significant in models 2, 4, and 6. For example, the depth of mainstreaming in the average project would increase by around 19% if the TTL in charge were a woman, based on model 6. The findings on the two other staff groups align with their mandates. CDs are tasked with ensuring that projects run in their country incorporate relevant policy guidance, including gender mainstreaming (Winters et al. 2018). We show that projects are more likely to incorporate gender mainstreaming components and the depth of gender mainstreaming increases when CDs are women. However, we do not attain similar results for PMs, who are not directly tasked with ensuring that gender mainstreaming is incorporated into projects. In other words, women staff appear to make a difference when given the mandate and discretion to do so.
We now turn to Hypotheses 3 and 4, which posit that an increased proportion of women in bureaucratic subunits, like the Bank's sectoral Global Practices, changes the culture in these units. We expect that when more women work in sectoral subunits, such as education or agriculture, they impact decisions by others through the discussed contagion effect. To test this argument, we use an additional independent variable that calculates the share of women appointed in each sector in all other projects approved within three years. The specifications presented in Table 2 mimic the models discussed above with two substantial modifications. First, we include sector instead of sector-year fixed effects because the primary variable of interest varies only at the sector-year level. Second, we also control for the average GMI in the sector and country within three years, as well as their interaction, to mitigate concerns that the variable picks up organizational changes toward gender equality in the subunit more generally.
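The construction of the sector-level variable can be sketched as a leave-one-out share. This is a minimal sketch under stated assumptions: the `Project` fields and the function `peer_share_women` are hypothetical, and "within three years" is interpreted here as a symmetric window of three years around the approval year, which the text does not specify.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Project:
    project_id: str
    sector: str
    year: int          # approval year
    woman_ttl: bool    # at least one woman TTL on the project


def peer_share_women(projects: List[Project], target: Project, window: int = 3) -> Optional[float]:
    """Share of women-led projects among all OTHER projects in the same sector
    approved within `window` years of the target (assumed symmetric window)."""
    peers = [
        p for p in projects
        if p.sector == target.sector
        and p.project_id != target.project_id
        and abs(p.year - target.year) <= window
    ]
    if not peers:
        return None  # no comparable projects in the window
    return sum(p.woman_ttl for p in peers) / len(peers)


# Hypothetical example data.
projects = [
    Project("A", "education", 2010, True),
    Project("B", "education", 2011, False),
    Project("C", "education", 2012, True),
    Project("D", "education", 2016, True),
    Project("E", "agriculture", 2010, False),
]
for p in projects:
    print(p.project_id, peer_share_women(projects, p))
```

Excluding the target project from its own peer group keeps the measure from mechanically reflecting the gender of the project's own TTL.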
The regression models support our expectation that a greater proportion of women in a sector shapes the depth of gender mainstreaming. The coefficient for women appointed in the sector within three years of the project of interest is positive and statistically significant (p < 0.05) in model 8. The effect of a one-standard-deviation (0.09) change in the share of women in a sector is comparable to that of having at least one woman TTL overseeing the project of interest. If all TTLs in a given sector were women, the models estimate that gender mainstreaming scores would increase on average by around 1.31 (on a four-point scale). Overall, the results support the contagion hypothesis, implying that hiring more women staff seems to alter the behavior of staff working in the sector more generally, a result that aligns with the views shared by current TTL staff.21

Our results remain robust when using a substantial number of alternative specifications. The mainstreaming depth measure may be more accurately modeled as count data. Hence, we re-estimate models using Poisson pseudo-maximum-likelihood (Supplementary Table A7). We also test the robustness of our findings using logit models for the "any mainstreaming" and ordinal logit models for the "mainstreaming depth" measure (Supplementary Table A8). Furthermore, we re-estimate models using the conditional mixed process estimator, a special variant of seemingly unrelated regression, to account for correlated errors between the two models, and control for country-year fixed effects to account for all unobserved heterogeneity at the country-year level (Supplementary Table A9). In addition, we test the robustness to alternative clustering at the country-year level (Supplementary Table A10) and re-estimate the models for each staff category separately (Supplementary Table A11). Another concern would be that we only controlled for the main five principal member states of the Bank (France, Germany, Japan, United Kingdom, United States), but some principals can wield outsized influence when they fund projects directly through cofinancing or earmarked funding (Heinzel, Cormier, and Reinsberg 2023). Therefore, we scraped data on the financing of individual Bank projects from the World Bank project websites and controlled for all identified third-party funders that appear more than once in the data by including financier dummies (Supplementary Table A12).
We also try to understand whether gender stereotypes of evaluators bias the gender mainstreaming index. We scraped data on project objectives from the Bank website. We then coded whether a project includes gender-disaggregated targets (a key gender mainstreaming indicator) and estimated whether gender mainstreaming indices on monitoring and evaluation are inflated in women-run projects. We do so by controlling for our novel gender-disaggregated targets indicator, and our results show that gendered rating bias does not appear to be a major concern (Supplementary Table A13). Additional analyses utilizing performance ratings by the Bank's Independent Evaluation Group show that our results are not driven by an overall better performance of women staff compared to men (Anzia and Berry 2011; Park 2013) but are specific to gender mainstreaming (Supplementary Table A14). We also re-estimate models excluding the gender theme control variable as well as all projects without a gender theme to ensure that our results do not depend on these specification choices (Supplementary Table A15).
Finally, we conduct exploratory tests of interactions between some of our key variables (Supplementary Table A16). Our results show that men staff with more experience with gender-focused projects (gender expertise) implement somewhat deeper gender mainstreaming than women without such gender expertise, although women with gender expertise run projects that adhere most closely to the gender mainstreaming policy guidance. Similarly, men staff are more likely to implement deep gender mainstreaming when the person overseeing this implementation, the Country Director, is a woman. These results add nuance to our findings and caution against an essentialist interpretation of our analyses.

EXPERIMENTAL ANALYSIS OF GENDER MAINSTREAMING IN WORLD BANK PROJECTS
Our observational analysis strongly supports our expectation that women staff members show more commitment to gender mainstreaming goals and, therefore, incorporate deeper gender mainstreaming into the operations they oversee. Two key assumptions of our theoretical argument remain untested so far. First, women and men differ in the degree to which they believe that implementing gender mainstreaming will help achieve the Bank's mandate. Second, there are no substantial gender differences in the belief that including gender mainstreaming will ease project approval.
To test these assumptions, we implemented a preregistered22 elite survey experiment in the spring of 2022 with Bank TTLs. Drawing on our TTL database, we identified 4,949 email addresses of TTLs that oversaw at least one project from 2000 to 2020 and emailed TTLs to invite them to share "how differences between TTLs affect their opinions on Bank projects." Of the sent emails, 2,328 (47%) reached their intended addressee and 178 TTLs answered the survey, resulting in a response rate of 7.6%. While low, this response rate exceeds the response rates attained by comparable surveys with Bank staff (4-5%) (Heinzel, Weaver, and Briggs 2024). We weight responses according to the sectoral and country distribution of Bank projects between 2000 and 2020 to minimize concerns around self-selection bias, which are particularly pressing for surveys with low response rates (Briggs 2021).
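The reported rates and the idea behind the weighting can be illustrated as follows. This is a minimal sketch of simple cell-based post-stratification, which is an assumption made here for illustration and not necessarily the exact weighting procedure used in the study. The sector cells and population shares are hypothetical, as is the function `poststratification_weights`.

```python
from collections import Counter
from typing import Dict, List

# Delivery and response rates implied by the counts reported in the text.
delivery_rate = 2328 / 4949   # emails that reached their addressee
response_rate = 178 / 2328    # completed surveys among delivered emails
print(f"delivery rate: {delivery_rate:.1%}, response rate: {response_rate:.1%}")


def poststratification_weights(sample_cells: List[str],
                               population_shares: Dict[str, float]) -> List[float]:
    """One weight per respondent: population share of the respondent's cell
    divided by that cell's share in the sample."""
    n = len(sample_cells)
    counts = Counter(sample_cells)
    return [population_shares[cell] / (counts[cell] / n) for cell in sample_cells]


# Hypothetical project distribution and a sample that over-represents one sector.
population_shares = {"education": 0.5, "agriculture": 0.3, "energy": 0.2}
sample_cells = ["education"] * 6 + ["agriculture"] * 3 + ["energy"] * 1
weights = poststratification_weights(sample_cells, population_shares)
print(weights)
```

Respondents from cells that are under-represented relative to the project distribution receive weights above one, and the weights sum to the sample size.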
We implemented a conjoint survey experiment where respondents were asked to rate two profiles of Bank projects that randomly differed on eight project features. For each of the project profiles, these features randomly took on one of two different levels (see Supplementary Material for full discussion). The conjoint experiment allows us to assess the impact of these different levels (Briggs 2021; Hainmueller, Hopkins, and Yamamoto 2014). For example, one of the eight features was gender-disaggregated targets and this feature randomly took on the levels "Yes" or "No." We focus on the gender-disaggregated targets feature in this article. As discussed, gender-disaggregated targets are an important dimension of gender mainstreaming and are included explicitly in the GMI. Respondents were shown two profiles that randomly either included or excluded these targets. They were then asked to indicate on a scale from 1 to 10 whether they thought each project would be likely to (1) attain approval by the Bank executive board and (2) lead to a greater development impact.23 Each respondent rated seven pairs of profiles, and our dataset includes 2,478 observations. We display marginal means to understand the effect of gender mainstreaming on the perceived likelihood of approval and impact (Leeper, Hobolt, and Tilley 2020). The results presented in Figure 4 clearly show that staff members (regardless of their gender) perceive that gender mainstreaming components help get projects approved. The marginal mean for gender mainstreaming ("Yes") is substantially higher than the marginal mean for gender mainstreaming ("No"). The difference is statistically significant at conventional thresholds (p < 0.05). Similarly, staff perceive that gender mainstreaming also increases the development impact of projects. The difference is also statistically significant at the 95% confidence level.
In a final step, we estimate the difference in subgroup marginal means between women and men for projects including gender-disaggregated targets and projects that do not include them (Table 3). It is important to note that we cannot experimentally manipulate the gender of respondents and rely on subgroup differences in causal effects to test our hypotheses.
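The marginal means and the subgroup differences described above can be sketched with a small calculation. The ratings below are invented for illustration, and the helper `marginal_mean` is hypothetical. The sketch omits the survey weights and respondent-clustered standard errors used in the actual analysis.

```python
from statistics import mean

# Hypothetical ratings: (respondent_id, respondent_is_woman, gender_targets, impact_rating)
ratings = [
    (1, True, "Yes", 8), (1, True, "No", 5),
    (2, True, "Yes", 7), (2, True, "No", 6),
    (3, False, "Yes", 6), (3, False, "No", 5),
    (4, False, "Yes", 7), (4, False, "No", 6),
]


def marginal_mean(rows, level, woman=None):
    """Average rating of all profiles showing `level` for the gender-disaggregated
    targets feature, optionally restricted to women (True) or men (False)."""
    selected = [r[3] for r in rows if r[2] == level and (woman is None or r[1] == woman)]
    return mean(selected)


# Marginal means for the full sample.
mm_yes = marginal_mean(ratings, "Yes")
mm_no = marginal_mean(ratings, "No")

# Differences in subgroup marginal means (women minus men) at each level.
diff_yes = marginal_mean(ratings, "Yes", woman=True) - marginal_mean(ratings, "Yes", woman=False)
diff_no = marginal_mean(ratings, "No", woman=True) - marginal_mean(ratings, "No", woman=False)

print("marginal means:", mm_yes, mm_no)
print("women minus men:", diff_yes, diff_no)
```

Because respondent gender is observed rather than randomized, the women-minus-men differences are descriptive comparisons of subgroup means, in line with the caveat in the text.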
In line with our theoretical expectation, we find no statistically significant differences in the degree to which women and men believe that gender mainstreaming components will help them get projects approved. However, women's perceptions of the development impact of projects with gender mainstreaming are, on average, 0.608 points higher than men's. The difference is statistically significant at conventional thresholds (p < 0.05). The results of our experimental analyses indicate that gender representation shapes the degree to which the depth of gender mainstreaming is perceived as a valuable tool for development.
We report several (exploratory) robustness checks to increase confidence in our results on the subgroup analysis. First, we control for the sector respondents primarily worked on to ensure that results cannot be explained by the differential representation of women in different sectors (Supplementary Table A17). The results are substantively similar. Second, we verify that the greater self-reported gender expertise of women respondents does not drive results. To this end, we asked respondents to indicate their perceived expertise on gender issues on a scale from 1 (very low) to 5 (very high). Supplementary Table A18 demonstrates that women self-assess greater levels of gender expertise, even controlling for the sector they work in, the region they work on, the focus of their educational background, and the country they are from. In Supplementary Table A19, we include dummies for four of the five levels of self-reported expertise on gender issues on the right-hand side of the equation. Our primary coefficient of interest (the difference between women and men for projects with gender mainstreaming) is only marginally significant in this specification. We subsequently use self-perceived expertise as an alternative subgroup indicator (Supplementary Table A20). Again, the main coefficient is marginally significant. These findings imply that our main results are partly, but not wholly, driven by greater gender expertise among women staff. Finally, in the preregistered main analysis, we weighted by the sectors and regions respondents work in to generalize to the universe of Bank projects. We use alternative weighting approaches to generalize to the population of staff members in Supplementary Table A21 (by gender), Supplementary Table A22 (by gender and educational background), and Supplementary Table A23 (by gender and nationality). These tests consistently show that women believe more than men that including gender mainstreaming increases projects' development impact.

[Figure panel: Likelihood of project impact]
Note: Marginal means of TTLs' perceived likelihood of project approval and project impact for profiles including and not including gender-disaggregated targets. The full model is displayed in Supplementary Table A17.

CONCLUSION
Pervasive gender gaps in economic opportunity undermine the global effort to eradicate poverty and achieve the sustainable development goals. Many IOs try to overcome these gaps by implementing gender mainstreaming policies. Yet, IO staff often implement these policies only in a shallow manner (Hafner-Burton and Pollack 2002; Hardt and von Hlatky 2020). In this article, we explain this variation in the case of the World Bank and show that the underrepresentation of women in IO staff is an essential piece of this puzzle.
Our empirical analyses provide substantial evidence for our explanation highlighting a combination of institutional constraints and gender representation. First, we show that the formal policy guidance on gender mainstreaming at the Bank incentivizes all staff, regardless of gender, to implement gender mainstreaming in a shallow manner to attain project approval. Second, the likelihood that women implement deeper gender mainstreaming is higher than for men. Women, on average, also perceive gender mainstreaming as more essential to the development impact of Bank projects in our survey with Bank staff. Third, when women have positions of authority and a higher proportion of women are coworkers within an organizational subunit, the depth of gender mainstreaming in projects increases, irrespective of the gender of the staff implementing the specific project. These results imply that women's increased representation on Bank staff incites contagion effects that are important to understanding variation in the depth of gender mainstreaming.
Before discussing the broader implications of our study, we want to highlight some limitations. As discussed, we relied on a binary classification of gender because of data collection challenges. These data do not accurately account for the gender of people who do not identify as men or women, and future research should seek to redress this empirical shortcoming.
Moreover, the gender representation literature also shows that intersectional identities shape people's views and decisions (Acker 2012; Kantola and Nousiainen 2009; Karim and Beardsley 2017, 52; Palaguta 2020). For example, the highly skewed representation of nationalities among the staff of the Bank, dominated by staff from high-income countries, likely also means that the voices of women from countries where most of the Bank's projects are implemented remain marginalized in the organization. More research is needed to understand how intersectional identities (including nationality, race, gender, education, and professional backgrounds) mediate substantive gender representation in global governance.
Finally, we only focused on the implementation of gender mainstreaming policy guidance, which is a necessary but not a sufficient condition for the impact of gender mainstreaming. While recent research indicates that it can make a positive difference on gender equity and empowerment in developing countries (Donno, Fox, and Kaasik 2022; Minasyan and Montinola 2023), whether these hopes materialize also depends on how projects are designed and whether context conditions undermine mainstreaming goals.
We believe that our study may generalize to other IOs where staff are faced with similar context conditions. The unprecedented depth of the novel staffing data we collected over three years meant we had to limit our investigation to one organization. Yet, gender mainstreaming has diffused to many IOs, and we expect our findings to have broader lessons for other development aid organizations, such as regional development banks, which work in similar contexts. Specifically, there are three context conditions or cultural attributes that we believe are shared by these multilateral development IOs. First is the bureaucratic incentive for staff to move money out the door, which is common in aid agencies and IOs (Easterly 2002). Second, many of these organizations also have similar weighted voting procedures and principal approval processes (Blake and Payton 2015), which amplify principals' influence over the approval process. And third, all organizations face reticence from some client governments to engage with gender mainstreaming due to fundamental differences in social, political, and economic norms. While we are confident that these conditions are similar among the group of international development organizations, we believe that our findings could also apply to other IOs active in different policy areas with strong operational components, like climate change adaptation, refugee support, or food security. In these areas, at least some of the discussed context conditions, particularly approval cultures and client governments' views on gender mainstreaming, are also common.
We nonetheless acknowledge that our theory is not readily generalizable to all IOs. For example, we deem it less likely that our findings apply to high politics areas like security and defense. Staff discretion is key to our argument, and the hierarchical structure of security organizations (at least the military bodies) means that lower-level staff have less discretion to act on their policy views (Hardt and von Hlatky 2020).24 Our findings may also not generalize to less service-oriented IOs, such as the UN Security Council or World Trade Organization, as their staff is quite limited and it is the principal member states that control the policy output of these organizations (Abbott and Snidal 1998; Zimmermann, Kortendiek, and Young 2023). More comparative research is needed to better understand the broader applicability of our findings on the role of individuals in the implementation of mainstreaming policies in IOs.

24 Scholars working on security policy have provided important and insightful discussions of the implementation of gender mainstreaming in NATO (Hardt and von Hlatky 2020; Prescott 2013) and the role of gender mainstreaming in UN peacekeeping (Karim 2019; Karim and Beardsley 2016; Prescott 2013; Pruitt 2016).
Despite these limitations, our findings have three important implications. First, we add to the literature on bureaucratic influence and bureaucratic representation in IOs (Badache 2020; Barnett and Finnemore 2004; Chow and Han 2022; Dijkstra 2017; Haack, Karns, and Murray 2020; Johnson 2013; Parizek 2017). Specifically, we highlight a link between the gender of individual staff members and important IO implementation decisions. Hence, our study aligns with a growing literature highlighting the importance of individuals in IOs (Abels and Mushaben 2020; Arias and Hulvey 2023; Chwieroth 2010; Clark and Zucker 2023; Forster 2024; Oksamytna, Bove, and Lundgren 2020).
Second, IOs have increasingly mainstreamed crosscutting policy areas, like gender, human rights, environmental protection, and climate change, into their policy portfolios (Clark and Zucker 2023; Dörfler and Heinzel 2023; Hafner-Burton and Pollack 2002; Pollack and Hafner-Burton 2010; Tallberg et al. 2020). These policy areas tackle some of the most important challenges of the twenty-first century, but many studies also show that the implementation of such policies can vary. Our explanation based on the interplay of individual preferences and organizational incentive structures may also help to explain the varying implementation of formal policy guidance in these other crucial areas.
Finally, our study could inform IO policymakers who wish to align internal diversity policies with gender mainstreaming agendas while avoiding tokenism. Simply hiring more women should not be treated as a silver bullet. Too often, women are asked to shoulder additional burdens of the work on gender equality (Karim and Beardsley 2016). As previously discussed, hiring more women does not guarantee that the organization will gain more feminist or gender expertise (Caglar, Prügl, and Zwingel 2013; Gerard 2023), and the creation of "gender expert" positions is not guaranteed to fundamentally alter organizational discourse and practice to align with more feminist sentiments (Altan-Olcay 2020). Nevertheless, our findings may provide guidance on reforms that could aid the implementation of mainstreaming. These include hiring more women and men staff members who possess specific gender expertise,25 promoting women into positions of authority, and increasing the overall number of gender advocates in the organization to change views engrained in bureaucratic subcultures. Through these means, organizations like the World Bank could minimize the variable application of their gender mainstreaming policies and enhance the chances that the benefits from their programs are not withheld from women in recipient countries.

FIGURE 2. Most Important Staff Members for Project Design According to Survey with TTLs (n = 178)

TABLE 2. OLS Models Regressing Gender Mainstreaming on Average Women Appointed in Sector

21 Interviews 2022B, 2022G.

TABLE 3. Difference in Marginal Means between Women and Men
Note: Respondent-clustered standard errors in parentheses; * p < 0.05 (preregistered confidence level).