Role of latent tuberculosis infection on elevated risk of cardiovascular disease: a population-based cohort study of immigrants in British Columbia, Canada, 1985–2019

We investigated cardiovascular disease (CVD) risk associated with latent tuberculosis infection (LTBI) (Aim-1) and LTBI therapy (Aim-2) in British Columbia, a low-tuberculosis-incidence setting. 49,197 participants had valid LTBI test results. Cox proportional hazards model was fitted, adjusting for potential confounders. Compared with the participants who tested LTBI negative, LTBI positive was associated with an 8% higher CVD risk in complete case data (adjusted hazard ratio (HR): 1.08, 95% CI: 0.99-1.18), a statistically significant 11% higher risk when missing confounder values were imputed using multiple imputation (HR: 1.11, 95% CI: 1.02-1.20), and 10% higher risk when additional proxy variables supplementing known unmeasured confounders were incorporated in the highdimensional disease risk score technique to reduce residual confounding (HR: 1.10, 95% CI: 1.01-1.20). Also, compared with participants who tested negative, CVD risk was 27% higher among people who were LTBI positive but incomplete LTBI therapy (HR: 1.27, 95% CI: 1.04-1.55), whereas the risk was similar in people who completed LTBI therapy (HR: 1.04, 95% CI: 0.87-1.24). Findings were consistent in different sensitivity analyses. We concluded that LTBI is associated with an increased CVD risk in low-tuberculosis-incidence settings, with a higher risk associated with incomplete LTBI therapy and attenuated risk when therapy is completed.


Outcome definition
The outcome variable was the time from cohort entry date to the first occurrence of CVD (composite of ischemic heart disease or stroke) or censoring (end of provincial health insurance coverage as a proxy for emigration, death due to other than CVD, or study end). The CVD events were identified from hospital separations, outpatient physician claims, and vital statistics deaths databases proposed by Tonelli et al. [1]. The case definition of ischemic heart disease includes 1 hospitalization or underlying cause of death with ICD-9 codes 410.x-414.x or ICD-10 codes I20.x-I25.x; the case definition of stroke includes 1 hospitalization or 1 visit to a health professional or underlying cause of death with ICD-9 codes 362.34, 430.x-438.x or ICD-10 codes G45.x, G46.x, H34.0, I60.x-I69.x [1,2].

Variable Definition
Age at immigration Continuous Sex Binary (female, male)

Income
The categorical neighbourhood income quintile was defined as the lowest 20%, second lowest 20%, middle 20%, second highest 20% and highest 20%.

Education
Categorical (none or no education, secondary or less, trade or diploma, and university degree)

World Health Organization (WHO) region of birth
The categorical WHO birth region was defined as Africa, Americans, Eastern Mediterranean, Europe, Southeast Asia, and Western Pacific [3,4].

Immigration class
The categorical immigration class was defined as economic class, family class, refugee, and others.

Description of sensitivity analyses for Aim 1 Dealing with missing values in covariates
In our primary analysis using complete case dataset, we excluded 3,396 participants (~6.5%) due to missing data in covariates. Particularly, we have 3.7% of missing values for WHO region of birth, followed by income (2.1%), education (0.6%), and immigration class (0.1%) (Appendix Figure 2; pp 8). We used multiple imputation to impute those missing values by considering the missing at random assumption. We also added the 'tobacco use' variable from the TB registry and imputed the missing values for that variable. Since we were dealing with time-to-event outcomes, predictors used to build the imputation model included all covariates used in the main analysis, tobacco use, LTBI exposure status, CVD outcome event, and the Nelson-Aalen estimator of CVD event [5]. We imputed 10 datasets with five iterations. The Cox proportional hazards model was fitted on each imputed dataset, adjusting for the covariates used in the main analysis. Finally, we pooled the estimates using Rubin's rule [6].

Dealing with unmeasured confounding by 'smoking'
We used the high dimensional disease risk score to minimize bias due to unmeasured confounding [7]. There were seven steps of high dimensional disease risk score: • Step 1 -identify the source of empirical/proxy variables: All empirical covariates were identified in a one-year covariate assessment window prior to the cohort entry date. The following data sources were used: o Physician claims database for ICD-9 diagnostic codes o Hospital abstracts database for ICD-9 and ICD-10 diagnosis codes, procedure codes, and intervention codes o Pharmacy dispensations database for the drug identification number, generic names, American hospital formulary codes, Pharmacare therapeutic class o Census database for income band.
• Step 2 -empirical variable identification: Based on the prevalence, the 200 most prevalent codes in each data dimension were considered. • Step 3 -assessing recurrence of codes: We generated three binary recurrence covariates for each of the candidate empirical covariates: (i) once, (ii) frequent, and (iii) sporadic. • Step 4 -prioritizing covariates: We used the Bross formula to prioritize the covariates.
• Step 5 -variable selection: We selected the top 200 variables based on the log of bias calculated in step 4. • Step 6 -predicting disease risk scores: In this step, we fitted the outcome model with investigator-specified variables (all confounders used in the main analysis) and empirical variables from step 5 on the cohort with only LTBI negative. Then we fitted the LASSO model (binary CVD as the outcome) to deal with overfitting of the model [7]. Hyperparameters of the model were chosen using 5-fold cross-validation. The disease risk scores are the predicted probabilities from the LASSO model. • Step 7 -outcome modelling: The outcome model was the Cox proportional hazards model, adjusting for the deciles of disease risk scores. We used a robust sandwich-type variance estimator to estimate the 95% CI.

Dealing with missing values in covariates
The same as Aim 1.

Dealing with unmeasured confounding by 'smoking'
We used the high dimensional disease risk score to minimize bias due to unmeasured confounding in Aim 2 [7]. There were seven steps of high dimensional disease risk score, with steps 1-3 are identical to the steps defined for Aim 1. Steps 4 to 7 are as follows: • Step 4 -prioritizing covariates: We used the hybrid LASSO method to prioritize the covariates [8]. The 5-fold cross-validation was used to choose the hyperparameters of the model. We estimated the disease risk scores by fitting LASSO regression. We used 5-fold crossvalidation to choose the hyperparameters of the model. • Step 7 -outcome modelling: The outcome model was the Cox proportional hazards model, with categorical LTBI therapy as the exposure and deciles of disease risk scores as a covariate. Again, we used a robust sandwich-type variance estimator to estimate the 95% CI.

Dealing with potential immortal time bias
As a sensitivity analysis for the potential risk of immortal time due to defining LTBI therapy exposure at the cohort entry date, we conducted a sensitivity analysis with a time-varying LTBI therapy exposure definition. The unexposed time for those with LTBI therapy information was the time from cohort entry date to the starting date of LTBI therapy. The exposed time began at the LTBI therapy starting date and continued until an event or censoring date was reached. On the other hand, the exposure time for those without LTBI was the time from the cohort entry date to the date of an event or censoring. We fitted the time-dependent Cox regression [9], adjusting for the same set of confounders used in the main analysis.

Description of complementary analyses for Aim 2
First, in the search for reducing healthy user bias in the association between LTBI therapy and CVD, we used propensity score weighting analysis on a subset of the sample who had information on LTBI therapy. Although subjects were self-controlled, the subjects who developed CVD before the test were omitted from the post-test calculation. Thus, adjusted rates were deemed more appropriate than crude estimates. The propensity score weighting approach was used to adjust for measured confounding among those who completed the LTBI therapy versus did not complete the therapy (adjusting for the same confounders used in the main analysis). Logistic regression was used to estimate the propensity scores. The mean stabilized weight was 1, with a minimum of 0.79 and a maximum of 1.54. The CVD rate ratio was calculated by re-weighting each participant's contribution by the stabilized inverse probability weights.
Second, we compared the medication adherence rate for two comorbidities (e.g., metformin, insulin, sulfonylurea for diabetes, aspirin for anti-inflammation). We considered SMD less than 0.2 as a good balance of adherence rate among those who completed versus did not complete the LTBI therapy.
Third, for the main analysis, we assumed LTBI status to be positive or negative from the cohort entry date. However, 53 participants had LTBI status changed (from negative to positive) due to close contact to people with TB disease. To account for changing the exposure status due to close contact, we conducted our third complementary analysis and used a time-varying LTBI exposure definition. The unexposed time for those who had close contact was the time from cohort entry date to the date of contact. Since the exact date of contact was unknown, we considered the unexposed time as the time from the index date to the date of the LTBI test. The exposed time began at the date of the LTBI test and continued until an event or censoring date was reached. On the other hand, the exposure time for those without close contact was the time from the cohort entry date date to the date of an event or censoring. The time-dependent Cox model was fitted, adjusting for the same confounders used in the main analysis.
Fourth, the proportional hazards assumption was violated for age, birth region, immigration class, substance use, and chronic kidney disease. To deal with that problem, the modified Poisson regression with binary CVD outcome variable and an offset by the natural logarithm of follow-up time was fitted, adjusting for the same confounders used in the main analysis. The 95% confidence interval was calculated using the robust sandwich method. Appendix Figure 3: The rate ratio (RR) of cardiovascular disease (CVD) after the LTBI therapy completion (among those who completed therapy) or discontinuation (among those who did not complete therapy) compared to the rates from the same subjects before the latent tuberculosis infection (LTBI) test. The CVD rate ratio among those who completed the therapy was 0.99 (95% CI: 0.71-1.36) after the completion of LTBI therapy than before the LTBI test. On the other hand, the CVD rate ratio in people who did not complete LTBI therapy was 1.18 (95% CI: 0.83-1.69) after discontinuation of LTBI therapy than before the LTBI test. Here, 'before' means before the LTBI test; 'after' means after the LTBI therapy completion (among those who completed therapy) or discontinuation (among those who did not complete therapy); #CVD is the number of CVD events, PT is person-time in years, RR is the rate ratio; CVD is cardiovascular disease; LTBI is latent tuberculosis infection.

Appendix tables
Appendix Interpretation 20 Generalisability 21 Other Information Funding 22 Accessibility of protocol, raw data, and programming code RECORD 22.1: Authors should provide information on how to access any supplemental information such as the study protocol, raw data, or programming code.