Which Active Labor Market Policies Work for Male Refugees? Evidence from Germany

Abstract In this paper, we estimate the causal effects of a set of active labor market programs for male unemployed refugees on welfare who entered Germany between 2013 and September 2016. Using rich administrative data, we employ covariate balancing propensity scores combined with inverse probability weighting to estimate effects up to 33 months after the start of treatment. Our results show that relatively short-term training in the form of Schemes by Providers and In-Firm Training, as well as longer-term Further Vocational Training programs, have a positive impact on both the employment chances and the labor market earnings of refugees in the medium run. So-called “One Euro Jobs”, a public employment program, do not yield positive effects on employment or earnings. Sensitivity analyses confirm that our results are unlikely to be driven by unobserved confounding.


Introduction
According to official statistics from the German Federal Office for Migration and Refugees (), about . million persons applied for asylum between  and . Around  percent of these applications were submitted between  and  alone, mostly by asylum seekers from Syria, Afghanistan and Iraq. This recent refugee wave was caused by intense instability as a result of war and political and/or socioeconomic volatility, factors that are unlikely to be resolved in the foreseeable future. Therefore, the integration of refugees into the German labor market is a major policy concern.
In addition to the importance of employment for economic purposes, studies have shown that it is vital for a person's social integration and well-being (Clark, ; Paul and Moser, ; Pohlan, ). For refugees, this effect is likely stronger given that they faced displacement and need to integrate into a foreign environment. Moreover, successfully integrating refugees into the German labor market could help to reduce labor supply shortages (Klinger and Fuchs, ). Recent evidence by Kosyakova et al. () shows that almost  percent of refugees who migrated to Germany between  and  are in employment five years after arrival. Nevertheless, a large number of refugees remain unemployed, most of whom receive welfare benefits (German Federal Employment Agency, ). Moreover, Brücker et al. () show that wages of employed refugees are often low, highlighting the poor quality of such employment.
Participation in active labor market programs has significant potential to facilitate the employment integration of refugees. There is ample evidence that active labor market programs (ALMPs) played an important role in achieving this goal for more general population groups (Card et al., ). However, it is unclear how informative this literature is for more recent refugees, as they tend to be less educated and more culturally distant from the native population compared to earlier refugee waves (Dustmann et al., ).
To date, only a limited number of empirical studies in Europe have analyzed the effectiveness of active labor market policies for the recent refugee wave, mainly from Scandinavia (Arendt and Bolvig, ; Arendt, ; Dahlberg et al., ), in addition to a study from Austria (Ortlieb et al., ). Overall, the results highlight that on-the-job training and further vocational training provide significant employment and income effects, while short-term programs focusing on skill assessment and career counseling also show positive (albeit more modest) effects. The results also show that early language training increases the employability of refugees. We add to this literature by examining the effectiveness of the four quantitatively most important German ALMPs for welfare recipients (Wapler et al., ) on the labor market success of refugees in Germany. Our analysis is based on rich and high-quality administrative data, allowing us to evaluate the effectiveness of the programs using inverse probability weighting based on the selection-on-observables assumption (Heckman and Robb, ).
Our sample is based on the full count of refugees with a residence permit who were unemployed and received welfare benefits on September  th . On the one hand, choosing this sampling date allows a sufficiently long observation period after the start of treatment such that causal treatment effects up to  months after treatment start can be estimated. On the other hand, this limits the number of available observations, as only about . individuals had a residence permit, which grants them access to welfare in Germany, towards the end of  (German Federal Office for Statistics, ).
Moreover, only about . refugees were officially registered as unemployed and received welfare in September  (German Federal Employment Agency, ). To obtain a more homogeneous sample, we restrict our attention to individuals who entered Germany between  and , further limiting sample size. Moreover, most refugees were male such that observation numbers for women in ALMPs are too low to estimate program effects separately. Hence, we focus on men only in this paper. These and a few other necessary sample restrictions leave us with a sample of about , to , male individuals, depending on the ALMP considered. Our analysis shows that relatively short-term training, either by a service provider or in-firm, and longer-term further vocational training increase employment and earnings, while a public employment program is ineffective. Additional analyses show that there is non-negligible effect heterogeneity in terms of individuals' duration of stay in Germany for participants of training schemes by providers. Moreover, sensitivity analyses show that our results are robust to unobserved confounding.
Next, Section  presents the German institutional setting and provides a short overview of the evaluated programs. Section  lays out our empirical strategy, provides details on the data and presents selected descriptive statistics. Section  presents the empirical analysis, while Section  concludes.

Institutional background

Institutional setting
In Germany, two types of unemployment benefits are available. Unemployment benefits I (UB I) cover unemployed workers who contributed to the unemployment insurance system for a minimum of one year within the last two years of employment. UB I-beneficiaries receive up to % (%) of their last net wage as parents (childless persons), usually for a maximum duration of one year (Riphahn et al., ).
Unemployment benefits II (UB II), or simply welfare benefits, are tax-financed, means-tested flat-rate benefits covering persons who are capable of working for a minimum of three hours per day and whose household income falls short of the legal social minimum. So even persons who are working or receive UB I can be eligible for UB II if their household income is sufficiently low (Riphahn et al., ; Wapler et al., ). Most refugees who arrived between  and  and who have a legal residence permit in Germany are also eligible for welfare benefits as their household income is likely below the social minimum. Moreover, most of them have not worked long enough to be eligible for UB I benefits.
Welfare beneficiaries are obliged to make efforts to increase their employability to reduce their welfare dependency. Public employment offices, referred to as job centers, support welfare beneficiaries for this purpose through job search assistance and targeted placement in ALMPs (Wapler et al., ). To combat moral hazard, welfare beneficiaries who refuse to participate in an ALMP may be subject to sanctions that can result in a reduction of welfare benefits for a period of up to three months (van den Berg et al., ). The same sanctions apply to refugees assigned to participate in an ALMP.

Which programs do we evaluate?
The first two programs examined are part of the so-called Schemes for Activation and Integration (SAI). In Germany, SAIs are performed either by private training providers (Schemes by Providers or SP) or by employers in the form of In-Firm Training (IFT), also referred to as on-the-job training (Harrer et al., ).
SP are the most frequently used ALMPs for welfare recipients in Germany. They provide training whose goals include guiding into apprenticeships and work, identifying and reducing employment impediments, providing placement services into contributory employment, preparing for self-employment and stabilizing an employment take-up (Wapler et al., ). For example, Guiding into Apprenticeships and into Work is a type of SP that provides guidance on finding suitable job offers and writing resumes and application letters. Another type of SP focuses on detecting individual employment impediments, improving skills and providing information on suitable employment opportunities (Harrer et al., ). SP may also be combined with In-Firm Training for a period of up to  weeks. Moreover, SP focusing on teaching a professional occupation may not exceed a period of  weeks (Harrer et al., ). The median duration of SP in our sample is about  months.
While SP occur mainly in the form of classroom training, IFTs occur in establishments and are shorter in duration. At the median, IFTs last one month in our sample. IFT measures focus on gaining practical skills and applying them directly on the job. In this manner, they are comparable to unpaid internships (Harrer et al., ). A number of refugee-specific programs were additionally developed within the context of a SAI. However, the numbers of treated individuals in these programs are typically too low to be analyzed by themselves or cannot be properly identified in the data due to initial limitations after the introduction of these programs.
The third program considered is Further Vocational Training (FVT), whose objective is to enable employability through human capital accumulation. FVT includes short-term programs providing professional and practical skills along with long-term programs (up to two years) providing a certified vocational training degree (Bernhard and Kruppe, ; Wapler et al., ). Refugees can also receive accreditation of their foreign vocational qualifications in the context of FVT (German Federal Employment Agency, ).
As FVT programs usually require longer and more intensive investments in human capital, participants have less time and fewer resources to invest in job search compared to non-participants, creating a so-called "lock-in effect". The lock-in effect reflects a period of lower employment rates for participants compared to non-participants due to being "locked in" in the training. This lock-in effect is likely to occur only during program participation, which, on average, lasts about five months in our sample. Soon after training completion, participants' employment rates should catch up with and even exceed those of non-participants due to higher returns from human capital investments (Arendt, ; Card et al., ).
Lastly, we examine a public employment program referred to as One-Euro-Jobs (OEJs) targeting unemployed welfare recipients strongly detached from the labor market. OEJs are temporary (part-time) public sector jobs, typically lasting between three and twelve months, in which participants receive one to two Euros per hour that is not deducted from their welfare benefit (Harrer and Stockinger, ). The median duration of OEJs in our sample is three months. OEJs provide basic work activities for participants to increase long-term employability. Jobseekers should only be placed in a One-Euro-Job if none of the above-mentioned ALMPs is (yet) suitable (Hohmeyer and Wolff, ). All the above-mentioned programs, apart from the refugee-specific programs, were therefore pre-existing and extended to refugees. Access considerations according to the Federal Employment Agency are hence identical for refugees and non-refugees and include the necessary language skills.
Comparing treatment probabilities across programs, Tübbicke and Kasrin () show that unemployed refugees have a higher probability of participating in SP and IFT but a lower participation probability in FVT and OEJ compared to non-refugees. These differences may be due to different employment impediments among refugees on the one hand and caseworker preferences or perceptions about program effectiveness on the other. Case workers assign participants to programs after weighing potential program benefits against costs.
Higher participation in SP and IFT relative to FVT could be due to their shorter duration and to the lower human capital required for successful participation. FVT programs are also more expensive than SP and IFT. On the other hand, OEJs are programs of 'last resort' aimed at the long-term unemployed who are extremely distant from the labor market. Previous results for more general populations also show that participants tend to differ across these programs. For example, Harrer et al. () show that IFT participants have more favorable background characteristics, leading to better labor market prospects than participants of SP in the absence of treatment. Hohmeyer and Wolff () show that OEJ participants are expected to perform relatively poorly in the labor market without an intervention. These differences must be addressed in order to obtain unbiased estimates of treatment effects (see Section .).

Identification and estimation strategy
We aim to estimate the causal effects of a series of active labor market programs on participating refugees in Germany. In the notation of the potential outcomes model by Roy () and Rubin (), we want to estimate the average treatment effect on the treated (ATT), defined as
$$ATT_g = E[Y^1_g \mid D_g = 1] - E[Y^0_g \mid D_g = 1],$$
where $D_g$ is a treatment indicator, taking on the value of one if the person received treatment $g \in \{SP, IFT, FVT, OEJ\}$, and zero if not. While $Y^1_g$ refers to the outcome that is realized when an individual receives treatment $g$, $Y^0_g$ denotes the outcome that occurs when the same individual is not treated with $g$ (but potentially with other programs). Hence, the ATT defined above measures the additional effect of program $g$ relative to all other programs available. Due to the fundamental evaluation problem, $Y^0_g$ is unobservable for participants and hence, the second expectation in the equation above has to be imputed from data on non-participants.
To do so, we use inverse probability weighting (IPW) based on a selection-on-observables approach (Heckman and Robb, ). IPW re-weights non-participants based on their propensity score $P_g(X) = \Pr(D_g = 1 \mid X)$, i.e. the conditional treatment probability, in order to undo the selection into treatment. Under the selection-on-observables assumption, i.e. the assumption that the set of observed characteristics $X$ includes all relevant factors that simultaneously govern the selection process and the outcome of interest, and the so-called common support assumption $P_g(X) < 1 \; \forall X$, the missing counterfactual can be recovered from data on non-participants by re-weighting their outcomes by the odds of treatment $P_g(X) / (1 - P_g(X))$. While the selection-on-observables assumption is strong, we are equipped with administrative data that allow us to control for a large number of observed characteristics measured prior to the start of treatment. To specify a model for $P_g(X)$, we largely follow other studies evaluating the programs of interest for more general populations and pick covariates that have been shown to be predictive of at least one of the treatments under consideration (Bernhard and Kruppe, ; Harrer et al., ; Harrer and Stockinger, ).
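The role of the odds of treatment can be made explicit with a short derivation. The following is a sketch of the standard argument under the two assumptions stated above (selection on observables in its mean-independence form, and $P_g(X) < 1$); it is written for illustration and uses only the notation introduced in the text, with $Y$ denoting the observed outcome.

```latex
\begin{align*}
E\!\left[\frac{P_g(X)}{1-P_g(X)}\,(1-D_g)\,Y\right]
  &= E\!\left[\frac{P_g(X)}{1-P_g(X)}\,\bigl(1-P_g(X)\bigr)\,E[Y \mid X, D_g=0]\right]
     && \text{(iterated expectations)} \\
  &= E\!\left[P_g(X)\,E[Y^0_g \mid X, D_g=0]\right]
     && \text{($Y = Y^0_g$ if $D_g = 0$)} \\
  &= E\!\left[P_g(X)\,E[Y^0_g \mid X, D_g=1]\right]
     && \text{(selection on observables)} \\
  &= \Pr(D_g=1)\,E[Y^0_g \mid D_g=1].
\end{align*}
% Setting Y = 1 in the first line gives
% E[ P_g(X)/(1-P_g(X)) (1-D_g) ] = E[P_g(X)] = Pr(D_g = 1),
% so the ratio of the two weighted expectations equals E[Y^0_g | D_g = 1].
```

Dividing by the weighted expectation of one, rather than by the sample share of participants, is what leads to a normalized (ratio-type) estimator in finite samples.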
We adjust for socio-demographics (age, time in Germany, nationality, schooling and occupational qualification, presence of children by age category), household characteristics (household type, total income, income without welfare benefits), the type of last job (regular or minor employment) along with minor employment at sampling date, individuals' labor market history (cumulated durations in regular employment, vocational training, ALMP participation, unemployment benefit I and welfare receipt, job search, information on regular employment, and real labor earnings at specific dates one year prior to the sampling date) and regional labor market information (unemployment rate, long-term unemployment rate, vacancy-to-unemployment ratio, inflow rates into sanctions and ALMPs). Kosyakova et al. () show that age, time in Germany, schooling, the presence of children in the household and regional labor market conditions are important predictors of refugees' labor market success. To minimize the impact of any parametric assumptions on our analysis, we discretized all covariates into sensible categories.
If the set of covariates $X$ does not include all factors that influence the treatment decision and the outcome, then the selection-on-observables assumption will fail and estimates will be biased. For example, this could be the case if physical or mental health (characteristics we do not observe) are decisive for selection into treatment and the outcome-generating process. Similarly, bias may arise if unobserved language skills are important predictors of treatment and outcomes. Fortunately, Caliendo et al. () have shown that the inclusion of typically unobserved covariates tends not to significantly alter effect estimates of ALMPs upon conditioning on a large set of observed covariates. This is likely because covariates and especially pre-treatment outcomes have already been affected by those unobserved factors before the start of treatment.

Moreover, our list of control variables includes several strong predictors that can proxy language skills, including age, time since entry, education and country of origin (Fennelly and Palasz, ; van Tubergen, ). Nonetheless, we inspect how sensitive our results are to unobserved confounding in Section .. Results show that our estimates are highly robust. Hence, we believe that the selection-on-observables assumption is a reasonable approximation for our application. Because we have fairly low treatment-to-control ratios and differences in covariate distributions are modest, overlap between groups is strong and we do not need to trim the sample.
In order to estimate the propensity score, we use so-called covariate balancing propensity scores (CBPS; Imai and Ratkovic, ). The CBPS does not maximize the likelihood; instead, it estimates logit coefficients by algorithmically optimizing the resulting covariate balance, leading to (near) perfect finite-sample balance. In addition to obviating the need for iterative balance checking and re-estimation of the propensity score, it was shown to perform well in a recent empirical Monte Carlo simulation by Frölich et al. () based on data similar to ours.
Denoting person $i$'s estimated propensity score by $\hat{P}_{gi}(X)$, their treatment indicator by $D_{gi}$ and their observed labor market outcome by $Y_i$, estimates of the ATT are obtained using the normalized IPW estimator as
$$\widehat{ATT}_g = \frac{\sum_i D_{gi} Y_i}{\sum_i D_{gi}} - \frac{\sum_i (1 - D_{gi}) \, \frac{\hat{P}_{gi}(X)}{1 - \hat{P}_{gi}(X)} \, Y_i}{\sum_i (1 - D_{gi}) \, \frac{\hat{P}_{gi}(X)}{1 - \hat{P}_{gi}(X)}}.$$
Intuitively, IPW re-weights non-participants by increasing (decreasing) the weight of individuals with a large (small) propensity score, thereby eliminating bias due to differences in covariate distributions between treated and untreated units. For statistical inference, we rely on the weight-based asymptotic approximation of standard errors by Lechner () as it was shown to be somewhat conservative and to perform rather well in large samples (see Bodory et al., , for details).
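The normalized IPW estimator of the ATT described above can be sketched in a few lines. This is a minimal illustration in plain Python, not the code used for the analysis: the function name `att_normalized_ipw` and the toy inputs are hypothetical, and the propensity scores are taken as given (in the paper they come from the CBPS step, which is not reproduced here).

```python
def att_normalized_ipw(y, d, pscore):
    """Normalized IPW estimate of the ATT.

    y      -- observed outcomes
    d      -- 0/1 treatment indicators
    pscore -- estimated propensity scores (assumed to be given)
    """
    treated_outcomes = [yi for yi, di in zip(y, d) if di == 1]
    if not treated_outcomes:
        raise ValueError("no treated units")

    weights, control_outcomes = [], []
    for yi, di, pi in zip(y, d, pscore):
        if di == 0:
            # Common support: the odds p / (1 - p) are only defined for p < 1.
            if not 0.0 <= pi < 1.0:
                raise ValueError("propensity score outside [0, 1)")
            weights.append(pi / (1.0 - pi))
            control_outcomes.append(yi)

    total_weight = sum(weights)
    if total_weight == 0.0:
        raise ValueError("all non-participants have zero weight")

    mean_treated = sum(treated_outcomes) / len(treated_outcomes)
    # Dividing by the sum of weights is the normalization step.
    counterfactual = sum(w * yi for w, yi in zip(weights, control_outcomes)) / total_weight
    return mean_treated - counterfactual


# Toy example with made-up numbers (two participants, four non-participants).
example = att_normalized_ipw(
    y=[1, 0, 1, 0, 0, 1],
    d=[1, 1, 0, 0, 0, 0],
    pscore=[0.5, 0.5, 0.5, 0.2, 0.2, 0.5],
)
```

Because the weights are normalized to sum to one among non-participants, the estimate is unchanged if all odds are multiplied by a common constant.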

Data and descriptive statistics
As a data source, we mainly use the Integrated Employment Biographies (IEB), an administrative dataset from the statistics department of the Federal Employment Agency in Germany. The data include the universe of employees liable to social security, the registered unemployed, registered jobseekers, benefit recipients, and ALMP program participants. Using the IEB has a number of advantages. First, due to its extensive nature, the IEB allows us to condition on a large number of observed characteristics. Second, as the IEB includes the universe of these groups, it allows observing the full count of individuals, eliminating the need to rely on a random sample as is the case for most survey data. Lastly, the IEB does not suffer from panel attrition, which is likely to be more problematic among refugees than among the general population. All of these factors should greatly improve inference in our application.
We use the full count of refugees with a residence permit according to § § - Residence Law ("Aufenthaltsgesetz") who were unemployed and on welfare as of September , . As the great majority of refugees arriving in Germany between  and  were male and their treatment probabilities are higher than those of female refugees, we restrict our analysis to men. To get a more homogeneous sample, we restrict our attention to individuals who first entered Germany in  or later. Moreover, we exclude individuals who were younger than  or older than  at the sampling date to avoid including people who are still in high school or who reach the legal retirement age before the end of our analysis time frame. This is necessary as neither being in school nor being in retirement can be observed in the data.
Based on this sample, we classify individuals as treated if they entered SP, IFT, FVT or OEJs during the entry window between October  and March . For each treatment group we define a separate control group (or simply non-participants hereafter), i.e. individuals who did not participate in the respective program during the entry window. However, non-participants may participate in other programs during the entry window or participate later in the same program. Using a six-month entry window yields sufficient numbers of observations among the treated, while avoiding selectivity issues, as a longer entry window may lead to a negative selection of non-participants (Sianesi, ). For each treatment, non-participants are randomly assigned a hypothetical entry month from the distribution of entry months among the respective treatment group. Individuals who are either gainfully employed or participating in one of the other programs at the hypothetical entry date are dropped from the respective sample. This leads to slightly different groups of non-participants across treatments. Table  shows selected descriptive statistics for the four estimation samples. SP is clearly the most common ALMP assigned to refugees with , participants. Moreover,  individuals participated in IFT, followed by  OEJ and  FVT participants. The groups of non-participants range in size from around , to , individuals.
Panel A shows that the majority of both participants and non-participants are Syrian. Refugees from Iraq, Eritrea or Afghanistan make up between one and six percent, depending on the sample. The mean age ranges from just over  years among participants of SP to around  years among individuals in FVT, while non-participants are about  years old. ALMP participants also differ by their time of arrival in Germany: on the one hand, OEJ and SP participants have been in Germany for . and . years, respectively. Both are significantly less than the . years that their non-participants spent in Germany. On the other hand, with an average duration of . and . years for IFT and FVT participants, respectively, both groups have spent more time in Germany than their non-participants.
Moreover, FVT and IFT participants have the highest percentage of persons with at least some vocational training or university education, with  percent for FVT and  percent for IFT participants. SP and OEJ participants have a lower share of persons with some professional qualification, at  percent for SP participants and  percent for OEJ participants. Among non-participants,  percent have at least some vocational training or university education. Similarly, IFT and FVT participants have the highest high school completion rates with  and  percent across all groups. Among OEJ participants,  percent completed high school, which is lower than the completion rate of non-participants with  percent. This indicates that IFT and FVT participants might have more favorable labor market prospects than SP participants, while OEJ participants seem to have the lowest employment prospects.
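The assignment of hypothetical entry months and the subsequent sample restriction described above can be sketched as follows. This is an illustrative sketch only: the function names, the encoding of months as integers 1 to 6 within the entry window, and the `busy_months` field (months in which a person is employed or in another program) are assumptions made for the example, not features of the IEB data.

```python
import random


def assign_hypothetical_entry(treated_entry_months, n_nonparticipants, seed=0):
    """Draw one hypothetical entry month per non-participant.

    Sampling with replacement from the observed entry months of the treated
    reproduces their empirical distribution among non-participants.
    """
    if not treated_entry_months:
        raise ValueError("need at least one treated entry month")
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible
    return [rng.choice(treated_entry_months) for _ in range(n_nonparticipants)]


def keep_eligible(nonparticipants, entry_months):
    """Drop non-participants who are employed or in another program
    in their hypothetical entry month."""
    kept = []
    for person, month in zip(nonparticipants, entry_months):
        if month not in person["busy_months"]:
            kept.append((person["id"], month))
    return kept


# Toy example: months 1..6 index the six-month entry window (hypothetical encoding).
treated_months = [1, 1, 2, 6]
drawn = assign_hypothetical_entry(treated_months, n_nonparticipants=100, seed=1)

people = [
    {"id": "a", "busy_months": {1}},    # employed in month 1 -> dropped
    {"id": "b", "busy_months": set()},  # eligible in every month -> kept
]
eligible = keep_eligible(people, [1, 1])
```

Since the draw is repeated separately for each treatment, the resulting groups of non-participants differ slightly across programs, as noted in the text.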
Our outcomes of interest are monthly regular employment, i.e. an indicator for whether an individual is in unsubsidized dependent employment subject to social security contributions, and real monthly labor earnings. For our causal analysis, we make use of outcomes from  months prior to the start of (hypothetical) treatment up to  months after. After  months, IFT and FVT participants perform best in comparison to the non-participants in both dimensions, as shown in panel B of Table . However, due to the non-random selection into treatment, such a simple mean comparison is plagued by selection bias.

Empirical analysis
In this section, we present estimates of causal effects for the ALMPs considered and inspect heterogeneity along several standard background characteristics. Lastly, a sensitivity analysis examines the robustness of estimates to unobserved confounding.
Note: Statistically significant differences at the //% level between participants and non-participants are indicated by */**/*** on the respective value among the non-participants.

Checking balance
Before turning to the causal analysis, we present some evidence on the balancing quality before and after re-weighting the respective group of non-participants using our IPW approach. Table  displays several common balancing measures given by the mean absolute standardized bias (MSB, Rosenbaum and Rubin, ), the pseudo-$R^2$ from a probit regression, along with its corresponding p-value of overall significance (Sianesi, ), and Rubin's B (Rubin, ). If the CBPS methodology works, MSB, pseudo-$R^2$ and Rubin's B should be very close to zero after re-weighting. Indeed, all three measures only show negligible deviations from zero after weighting, indicating (near) perfect covariate balance. Similarly, the p-value rises from essentially zero to one in all samples after weighting, so we cannot reject the null hypothesis that covariates are unrelated to treatment. Hence, our balancing approach delivers excellent balance, successfully purging the outcome gap between participants and non-participants of its association with background characteristics.
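To illustrate the first of these balancing measures, the sketch below computes the standardized bias of a single covariate and the mean absolute standardized bias across covariates, before and after weighting. It is a simplified illustration: the function names are hypothetical, and it assumes that weighting only changes the mean of the non-participants while the denominator uses the unweighted sample variances of both groups, which is one common convention and not necessarily the one used for the reported table.

```python
import math
from statistics import mean, variance


def standardized_bias(x_treated, x_control, w_control=None):
    """Standardized bias (in percent) of one covariate.

    Assumption: weights affect only the control mean; the denominator uses
    the unweighted sample variances of both groups.
    """
    if w_control is None:
        w_control = [1.0] * len(x_control)
    mean_control = sum(w * x for w, x in zip(w_control, x_control)) / sum(w_control)
    pooled_sd = math.sqrt((variance(x_treated) + variance(x_control)) / 2.0)
    if pooled_sd == 0.0:
        raise ValueError("covariate is constant in both groups")
    return 100.0 * (mean(x_treated) - mean_control) / pooled_sd


def mean_abs_standardized_bias(covs_treated, covs_control, w_control=None):
    """Average of the absolute standardized biases over all covariates."""
    biases = [
        abs(standardized_bias(xt, xc, w_control))
        for xt, xc in zip(covs_treated, covs_control)
    ]
    return sum(biases) / len(biases)


# Toy example with one binary covariate (made-up data).
x_t = [1, 1, 0, 1]          # share of ones among participants: 0.75
x_c = [0, 0, 1, 1]          # share of ones among non-participants: 0.50
w_c = [1.0, 1.0, 3.0, 3.0]  # weights that lift the control share to 0.75

msb_before = mean_abs_standardized_bias([x_t], [x_c])
msb_after = mean_abs_standardized_bias([x_t], [x_c], w_c)
```

In the toy data the weights are chosen so that the weighted share among non-participants matches that of participants exactly, which mimics the (near) perfect balance reported after CBPS weighting.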

Main results
Figure  presents effects on regular employment, with effects on real monthly labor earnings displayed in Figure . Estimated effects range from twelve months prior to the start of treatment until  months after. Estimates prior to treatment are close to zero and statistically insignificant, showing that participants and nonparticipants follow the same path in terms of employment and earnings, providing additional evidence that our weighting strategy was successful. For OEJs, there is a rather long-term and statistically significant negative effect on labor earnings and employment. After about  months, point estimates become positive but insignificant. Actually, this result is not surprising: OEJs are a last resort measure for individuals who are particularly distant from the labor market and so are likely to only increase employment prospects in the long run, possibly also in combination with other ALMP measures (Kiesel and Wolff, ). Indeed, the majority of OEJ participants in our sample received some additional treatment after the OEJ program. This partly explains the long-lasting negative effects of OEJ participation despite its median duration of around three months. Similar results are found by Harrer and Stockinger () who analyze the effectiveness of OEJs for a more general population of unemployed welfare recipients.
SP show a lock-in effect for the first three months after the start of the program, which coincides with their median duration in our sample. After five months, employment effects become statistically significant and after seven months, earnings are positively affected by the treatment. After  months, employment effects are roughly . percentage points and effects on earnings are about  Euro. While in absolute value, these effects may seem rather small, they are considerable when compared to mean outcomes of participants (see Table ). Relative to the mean, these effects are equivalent to an . percent increase in employment and a . percent increase in monthly labor earnings.
For IFT, we find positive effects on participants' employment probabilities almost immediately. This highlights that participating in an IFT, which has a median duration of one month, speeds up labor market integration considerably. Over time, employment effects increase, peaking at around  percentage points after  months. Afterwards, estimates of employment effects subside somewhat, reaching  percentage points at the end of the observation period. Effects on earnings follow a similar path, with an impact on monthly labor earnings of  Euro after  months. The effect sizes for IFT and SP participants are comparable to those found for a more general population of unemployed welfare recipients by Harrer et al. (), who examined treatment effects of program entry into SP and IFT in .
The results for FVT show a significant lock-in effect for three months, which is not surprising given its relatively long median duration of  months. After around  months, positive impacts on employment rates and average earnings materialize. While employment effects quickly stabilize at around  percentage points, effects on earnings rise to around  Euro after  months. Again, the effect sizes are similar to the ones found by Bernhard and Kruppe () for German and non-German welfare recipients who entered FVT in .
Thus, our findings indicate that all programs except OEJ are effective in improving refugees' labor market success. As groups of program participants differ in their composition, it should be noted that effect sizes across programs cannot be directly compared.

Effect heterogeneity
For our heterogeneity analysis, we split our samples according to median values of certain background characteristics, given by age, time in Germany and regional unemployment. The analysis then repeats the estimation steps for all sub-samples. As before, the CBPS methodology yields excellent balancing quality after weighting (see Table A. in the appendix for details).
First, we split the sample according to individuals' age, i.e. under  or at least  years old. This distinction is important because younger refugees might learn German and adapt to their new environment more quickly than older refugees, which may influence how much they benefit from training programs. Second, we split the sample by duration of stay in Germany since time of arrival; here we analyze effects for individuals with a short and a long duration, i.e. below and above the median of roughly . years. Along this dimension, those with a longer duration are expected to be better integrated socially and culturally and thus may benefit more from programs targeting short-term labor market integration such as SP and IFT.
We also analyze effects for individuals in different local labor market environments, splitting the sample by the local unemployment rate, with  percent serving as the cut-off. Refugees living in regions with relatively high unemployment may profit the most from programs focusing on human capital acquisition such as FVT, due to higher skill mismatch in regions with higher unemployment (Wapler et al., ). Table  shows estimates of employment effects for all programs and sub-groups to keep the presentation concise. While all subgroup effect estimates are statistically significant except for OEJs, only SP shows statistically significant effect heterogeneity for duration of stay in Germany (indicated by the italics in Table ). Specifically, the employment probability of persons with a longer duration increases by . percentage points through participation, more than twice the impact on persons with a shorter duration of stay. Although effect heterogeneity is statistically insignificant for all other estimates, gaps are sizable among IFT, FVT and OEJ participants along several dimensions. As these are relatively small treatment groups, a lack of statistical power makes the detection of meaningful effect heterogeneity difficult. Particularly interesting are the results found for IFT and OEJs. Similar to SP, IFT participants with longer durations of stay appear to benefit more compared to participants with shorter durations. Regarding OEJs, point estimates suggest that this type of program may be effective for individuals with a short duration of stay in Germany and for individuals in regions with low unemployment rates. The latter may be due to an easier transition from an OEJ to a regular job in low unemployment environments.
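One simple way to assess whether two subgroup estimates differ is a z-test on their difference. The sketch below is illustrative and rests on assumptions that are not taken from the text: the two estimates are treated as independent (plausible when the sub-samples are disjoint, as with a median split), the normal approximation is assumed to hold, and the function name and example numbers are hypothetical. It is not necessarily the test underlying the reported results.

```python
import math


def heterogeneity_z_test(est_a, se_a, est_b, se_b):
    """Two-sided z-test for the difference between two independent estimates.

    Returns the difference, the z-statistic and the two-sided p-value.
    """
    diff = est_a - est_b
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)  # independence assumption
    if se_diff == 0.0:
        raise ValueError("standard errors must not both be zero")
    z = diff / se_diff
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return diff, z, p_value


# Made-up subgroup estimates (in probability units) with their standard errors.
diff, z, p_value = heterogeneity_z_test(0.05, 0.01, 0.02, 0.01)
```

With small treatment groups the standard errors of both subgroup estimates are large, so even sizable gaps in point estimates can yield an insignificant test, which is the power problem noted above.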

Sensitivity analysis
The results presented so far are based on the selection-on-observables assumption. Hence, it is crucial to assess the risk of those estimates being plagued by "hidden bias" through unobserved confounding (Rosenbaum, ). As we cannot rule out this possibility, we follow much of the literature on sensitivity analyses and explore how point estimates and possibly inference may change under certain assumptions regarding a relevant but omitted binary confounder. Vanderweele and Arah () show that, under some simplifying assumptions, the bias $B_g$ due to an unmeasured binary confounder $U$ can be written as:

$$B_g = \delta_y \left[ \Pr(U = 1 \mid D_g = 1) - \Pr(U = 1 \mid D_g = 0) \right],$$

where $\delta_y$ is the population average effect of switching the unobserved confounder $U$ from zero to one. Similar to Ichino et al. (), we use the distribution of observed covariates to inform our analysis about plausible potential unobserved confounder distributions. To provide the most credible results possible, the entire list of available covariates is used. The bias components in Formula () are estimated as follows: estimates of $\delta_y$ are obtained through linear regression of the employment indicator after  months on all available covariates among non-participants. The sample shares of covariates among participants and non-participants serve as estimates of their probabilities of possessing the respective characteristic. Based on this information, the true ATT is re-estimated under the assumption that there exists an unmeasured confounder, which follows the same distribution as the observed covariate. Table  shows our baseline estimates as presented in Section . along with information on the distribution of results for the sensitivity analyses. As expected, there is some variation in effect estimates, albeit rather limited considering the strong predictiveness of covariates such as age, unemployment and job search history along with regional unemployment on individuals' employment probabilities.
Moreover, statistical inference remains unchanged, independent of the type of unobserved confounder assumed. This lack of variation in the bias-adjusted estimates supports our claim that the selection-on-observables assumption seems reasonable. If there is some bias remaining after conditioning on observed characteristics, it appears unlikely to overturn our inference.
Note: Statistically significant estimates at the //% level are marked with */**/***. This table shows estimation results using the sensitivity analysis suggested by Vanderweele and Arah (). The analysis assumes that there is some unobserved covariate that is distributed the same as an observed covariate and re-estimates bias-adjusted causal effects.
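The bias adjustment described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the estimation code used for the paper: all covariates are taken to be binary indicators, the bias is subtracted from the baseline estimate to obtain the adjusted ATT, and the function name, variable names and data are hypothetical.

```python
import numpy as np


def bias_adjusted_att(att_hat, y_control, x_control, x_treated):
    """Return bias-adjusted ATTs and biases, one per observed binary covariate.

    att_hat   : baseline ATT estimate (scalar)
    y_control : outcomes of non-participants, shape (n0,)
    x_control : binary covariates of non-participants, shape (n0, k)
    x_treated : binary covariates of participants, shape (n1, k)
    """
    y_control = np.asarray(y_control, dtype=float)
    x_control = np.asarray(x_control, dtype=float)
    x_treated = np.asarray(x_treated, dtype=float)

    # delta_y: OLS coefficients from regressing the outcome on all
    # covariates (plus an intercept) among non-participants only.
    design = np.column_stack([np.ones(len(y_control)), x_control])
    coef, *_ = np.linalg.lstsq(design, y_control, rcond=None)
    delta_y = coef[1:]

    # Pr(U=1 | D=1) - Pr(U=1 | D=0), proxied by the sample shares of
    # each observed covariate among participants and non-participants.
    prevalence_gap = x_treated.mean(axis=0) - x_control.mean(axis=0)

    # B = delta_y * prevalence gap; assumed convention: adjusted = baseline - B.
    bias = delta_y * prevalence_gap
    return att_hat - bias, bias


# Hypothetical toy data: two binary covariates, outcome = 0.5*x1 + 0.2*x2.
x0 = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y0 = np.array([0.0, 0.2, 0.5, 0.7])
x1 = np.array([[1, 0], [1, 1], [1, 1], [0, 1]])
adjusted, bias = bias_adjusted_att(0.10, y0, x0, x1)
print(adjusted, bias)
```

Each entry of `adjusted` corresponds to one hypothetical unobserved confounder that mimics one observed covariate, so the spread of these values plays the role of the distribution of results reported for the sensitivity analysis.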

Conclusion
This paper provides first evidence on the effectiveness of four prominent ALMPs on the labor market success of recent refugees in Germany based on a rich administrative dataset. Using the universe of male refugees on welfare who were unemployed on September ,  and entered treatment between October  and March , we examine treatment effects for a period of around three years after program entry. Our empirical strategy relies on the selection-on-observables assumption, which is justifiable in light of the high-quality administrative data and supported by our sensitivity analysis. We find that all programs, except for the public employment scheme, have a positive impact on participants' regular employment and monthly earnings in the medium run. Thus, our analysis confirms that ALMPs are a valuable tool for successfully integrating male refugees into the German labor market. Our results on the positive effects of In-Firm Training and Further Vocational Training on regular employment are in line with the results found for Sweden by Dahlberg et al. (). Moreover, our findings on the positive employment effects of In-Firm Training are consistent with the results found for Denmark by Arendt () and Arendt and Bolvig ().
In contrast to Arendt and Bolvig (), who find that In-Firm Training only increases employment in the short term, our results show steady employment effects around  years after program entry. Additionally, we find a significantly positive effect of In-Firm Training on wages in both the short and the medium term, while the results by Arendt () show no significant short-term effect. This implies that ALMPs are an important addition to other existing European integration programs such as language courses that were also found to be effective.
Another interesting result is that the absolute effect sizes found are similar to those shown for other more general populations of unemployed welfare recipients in Germany (see Harrer et al., ; Bernhard and Kruppe, ; Harrer and Stockinger, ). Yet, refugees typically face stronger employment impediments, meaning that effects of this size are even more economically significant for them. This is because migrants in general, and refugees in particular, tend to display lower labor market success than natives, even after controlling for important factors including age and education (Algan et al., ; Bedaso, ). A possible reason for the larger relative effect for refugees is that successful ALMP participation acts as a stronger positive signal to potential employers due to their larger distance from the labor market compared to other groups (Liechti et al., ).
Regarding policy recommendations, the results clearly imply that public employment schemes should be used less intensively for refugees. A viable alternative may be to intensify the use of other ALMP programs for refugees, such as start-up and wage subsidies, which were shown to be effective for the general population as well as migrants (Butschek and Walter, ; Wolff et al., ). This may be a more effective option for individuals more distant from the labor market than public employment schemes, as start-up and wage subsidies typically lead to more regular types of employment, generating more valuable labor market experience. Lastly, one may think of combining several types of ALMPs more frequently.
However, both issues, i.e. start-up and wage subsidies as well as the combination of different ALMPs, need to be examined for refugees in future research before definite recommendations can be made. Analyses should also be performed for women when the number of observations allows it, to provide a comprehensive picture of the effectiveness of ALMPs for the successful integration of refugees into the German labor market.
implemented stricter common support measures and results do not change in any meaningful way compared to our baseline results.
Balancing is only perfect up to the tolerance given by the optimization procedure, meaning some negligible imbalance remains even after re-weighting.
While results by Bodory et al. () show that bootstrapping techniques may outperform the Lechner () approximation, the large number of replications necessary, our relatively large dataset and the computational intensity of the CBPS method make the bootstrap infeasible in our application.
On September th , about . registered refugees were unemployed and received welfare, about , of whom were men. Dropping individuals who entered Germany before  and those whose date of entry is unknown reduces the sample to about . male individuals. As the date of entry into Germany is unknown in job centers run solely by municipalities, only individuals served by job centers jointly run by the municipality and the Federal Employment Agency are kept in the sample. In addition to the age restriction, individuals who left welfare, took up employment or joined another ALMP between sampling and (hypothetical) program start are dropped to implement the estimation procedure. Final numbers of observations are shown in Table . Due to lower observation numbers and lower treatment probabilities for women, an analysis including women cannot compare effects of different ALMPs. However, estimated effects of SP participation for women are presented in Kasrin et al. ().
Another reason is that this makes our sample as comparable as possible to the sample from the IAB-BAMF-SOEP survey (which also includes information on refugees entering between  and ), allowing us to use studies based on the survey to understand which individual and household characteristics influence labor market success and employment quality for a population similar to our population of interest. Restricting our analysis to sample members who arrived in Germany in  or later (% of the original sample) barely changes point estimates but inflates standard errors noticeably.
Treated individuals may receive another treatment after the ones we analyze. Indeed, this happens quite frequently: between  percent (SP) and  percent (OEJ) receive another treatment after the start of the treatments analyzed. However, due to relatively small sample sizes, it is impossible to analyze such interaction effects. Nonetheless, our sensitivity analyses show that multiple treatments do not drive our main results.
Since treatments are evaluated independently of one another, an individual in the group of non-participants for one treatment may be part of the treatment group for another treatment. Moreover, individuals in the group of non-participants may join a different or the same program after the entry window. Overall, between  percent (IFT) and  percent (SP) of non-participants ever receive some treatment in the period of observation. However, our sensitivity analyses show that such a contamination of the control groups does not drive our results.
As % of the sample are Syrian and due to the limited number of participants in most programs, we were unable to perform subgroup analyses by nationality. We did, however, estimate effects for Syrians only and found similar results to those found for the total sample.
An analysis of the resulting weight distribution implied by the CBPS method shows that no trimming is necessary in order to avoid overly large weights as proposed by Imbens ().
This drop is hardly statistically significant at the % level. Additional inspection shows that the drop is driven by participants' employment rates stagnating after  months while individuals from the control group catch up to some degree.
Patterns of effect heterogeneity for real monthly labor earnings are similar and found in Table A. in the Appendix.
Certain categories of these covariates have regression coefficients of  percentage points or above.