Evolutionary and genomic analysis of four SARS-CoV-2 isolates circulating in March 2020 in Sri Lanka; Additional evidence on multiple introduction and further transmission

The molecular epidemiology of the virus and mapping helps understand the epidemics' evolution and apply quick control measures. This study provides genomic evidence of multiple severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) introductions into Sri Lanka and virus evolution during circulation. Whole-genome sequences of four SARS-CoV-2 strains obtained from coronavirus disease 2019 (COVID-19) positive patients reported in Sri Lanka during March 2020 were compared with sequences from Europe, Asia, Africa, Australia and North America. The phylogenetic analysis revealed that the sequence of the sample of the first local patient collected on 10 March, who contacted tourists from Italy, was clustered with SARS-CoV-2 strains collected from Italy, Germany, France and Mexico. Subsequently, the sequence of the isolate obtained on 19 March also clustered in the same group with the samples collected in March and April from Belgium, France, India and South Africa. The other two strains of SARS-CoV-2 were segregated from the main cluster, and the sample collected from 16 March clustered with England and the sample collected on 30 March showed the highest genetic divergence to the isolate of Wuhan, China. Here we report the first molecular epidemiological study conducted on circulating SARS-CoV-2 in Sri Lanka. The finding provides the robustness of molecular epidemiological tools and their application in tracing possible exposure in disease transmission during the pandemic.


Introduction
The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) emerged in the late 2019s, causing coronavirus disease 2019 (COVID-19). The SARS-CoV-2 virus spread rapidly throughout the world with an unsettling effect on human livelihood and economy. Globally, since 31 December 2019 and as of week 2021-5, 106 472 660 cases of COVID-19 have been reported, including 2 323 103 deaths. Since 31 December 2019 and as of the fifth week, 2021, >106 472 660 confirmed cases, including 2 323 103 deaths, have been reported worldwide, affecting >343 countries as indicated by the European Centre for Disease Prevention and Control [1]. The World Health Organization (WHO) has declared a global health emergency at the end of January 2020 [2]. In Sri Lanka, the first local case of COVID-19 was recorded on 11 March 2020 in a 58-year-old male. As of 23 February 2021, there were 80 517 confirmed cases of COVID-19, including 450 deaths reported to the WHO [3]. A robust surveillance system, understanding the molecular epidemiology of the virus, and mapping can help to understand the evolution of the epidemics and apply quick control measures [4,5]. Currently, based on the genomic epidemiology mapping of the COVID-19 virus, it has been demonstrated that the virus is undergoing mutations [6]. Therefore, in addition to confirmation of the presence of the virus, the WHO recommends regular sequencing of a percentage of specimens from clinical cases to monitor viral genome mutations that might affect the medical countermeasures, including diagnostic tests [5]. The whole genome of four SARS-CoV-2 virus strains obtained from COVID-19 positive local patients was sequenced and deposited in the Global Initiative on Sharing All Influenza Data (GISAID) EpiCoV™ database. This study was conducted to investigate the evolution and genetic relatedness of SARS-CoV-2 strains in Sri Lanka, with other reported SARS-CoV-2 strains.

Sri Lankan SARS-CoV-2 sequences
The whole-genome sequence of four SARS-CoV-2 virus strains obtained from COVID-19 positive patients, including the first local positive case reported on 11 March 2020, was deposited in the GISAID EpiCoV™ database 5 were used for this study ( Table 1).

Selection of SARS-CoV-2 isolates
For further understanding of the molecular epidemiology of the COVID-19 outbreak in Sri Lanka, 46 isolates were selected from GenBank, National Center for Biotechnology Information (NCBI) using Basic Local Alignment Search Tool nucleotide (BLASTn) tool based on the highest identity and lowest expected value (E-Value) with Sri Lankan isolates. The sequence datasets of 46 selected SARS-Cov-2 complete genomes from different countries in Asia, Africa, Australia, Europe, North America and four Sri Lankan isolates retrieved from GISAID 5 by 28 April 2020 were used for this analysis. The strain isolated from Wuhan in December 2019 with the NCBI accession number NC_045512.2 was used as the reference genome.

Whole-genome sequence alignment and phylogenetic analysis
Sequence alignment was performed using Multiple Sequence Comparison by Log-Expectation (MUSCLE) software [7]. Following alignment, single nucleotide polymorphisms (SNPs) and amino acid variations analysis were conducted using Molecular Evolutionary Genetics Analysis version ten (MEGA X) [8], taking the first SARS-CoV-2 reference sequence (GenBank Accession number NC_045512) deposited December 2019 in GenBank from Wuhan, China. The evolutionary history was inferred using the neighbour-joining method with the maximum composite likelihood method and the Hasegawa-Kishino-Yano model (HKY) as the best fitting model [9] after 1000 bootstrap replication using MEGA X [8].

Phylogenetic tree analysis
The maximum likelihood phylogenetic tree in Figure 1 shows that two of the SARS-CoV-2 isolates from Sri Lanka (GISAID accession IDs: EPI_ISL_428671 and EPI_ISL_428672) collected on 10 March 2020 and 19 March 2020, respectively, are clustered in the group with the isolates from Italy, Germany, France and Mexico that were collected before 10 March 2020. The EPI_ISL_428673 Sri Lankan isolates collected on 31 March 2020 was clustered with isolates obtained on 9 February 2020 from England while EPI-ISL_428670 Sri Lankan isolates collected on 16 March 2020 showed the highest evolutionary distance to the SARS-CoV-2 sequence originated in Wuhan, China (GenBank Accession number NC_405512).

SNPs analysis
Fifteen SARS-CoV-2 genome sequences that are mainly clustered with the four Sri Lankan strains were compared with the Wuhan reference to observe the viral genome mutations and amino acid variations. The SNPs presented along with the whole genome are indicated in Table 2 (positions referred with respect to the reference sequence; GenBank accession number: NC_045512). The genome sequence of EPI_ISL_428671 from the first local patient       Table 3 indicates the respective changes in the amino acid positions of the derived proteins (positions referred with respect to the reference sequence; GenBank accession number: NC_045512). SNPs occurred only in the open reading frame (ORF) 1ab gene, S gene, ORF 3a gene, M gene and N gene of four Sri Lankan whole-genome strains have resulted in amino acid changes at the corresponding positions of the translated proteins, while the rest of SNPs in the genes did not result in any changes in amino acid sequence (Table 3). Except for the first Sri Lankan isolate collected on 10 March (EPI_ISL_428671), the other three Sri Lankan isolates presented a total of six mutations in the ORF 1ab protein with respect to the reference (Table 3). Mutations can be observed in the S protein at the same position AA614 (bps23403) in both Sri Lankan strains EPI_ISL_428671 and EPI_ISL_428672. A single mutation was observed in ORF 3a protein in strain EPI_ISL_428673 at the positions AA251 and bps26144. In the EPI_ISL_428670 strain, the amino acid sequence of N protein shows one mutation at the position AA398 at bps29465, while EPI_ISL_428671 strain had mutations at the positions AA203 (bps28882) and AA204 (bps28883) compared to the reference strain.

Discussion
In this study, the virus strain of the first local patient (EPI_ISL_428671) collected on 10 March 2020, who was a tour guide and had direct contact with Italian tourists [4], is clustered together with isolates from Italy, Germany and Mexico. This evolutionary evidence revealed the first sequence of SARS-CoV-2 showed the highest genomic similarity to Italy, and European isolates confirming the history of exposure of the first patient who has been exposed to tourists who came from Italy. Even though the history and origin of the infection of the remaining isolates were not reported, the clustering of other isolates from Sri Lanka with isolates from the database has provided a clue about the possible source of the infection. Furthermore, genomic relatedness of the SARS-CoV-2 virus genome sequences of Sri Lankan isolates further confirmed the exposure history of the patients presented in the Epidemiology Unit, Ministry of Health, Sri Lanka [7]. More importantly, this study has indicated the importance of tracking the history of the infection to trace the contacts of the infected person, particularly for the asymptomatic patients in Sri Lanka. The mutations found in the virus identified in Sri Lanka, compared with the reference Wuhan strain and the recognise amino acid changes, should be further monitored to understand whether those changes affect the virulence of the virus or clinical manifestations of the disease. Though this study had limitations mainly due to the lack of epidemiological information on the available genome in the database and a limited number of sequence genome available in Sri Lanka at the time of analysis, the information obtained from this study might assist in understanding the evolutionary dynamics and local transmission of circulating SARS-CoV-2 in Sri Lanka.

Conclusion
In this section, the results of this study indicated that the SARS-CoV-2 sequences from Sri Lanka have the highest genomic similarity to isolates from Italy, Germany and England. This study was conducted as a preliminary study in Sri Lanka; further studies are necessary to be performed to increase our knowledge regarding SARS-CoV-2 isolates. Since the mutational variants can alter the presentation of COVID-19 infection, the robust molecular epidemiological tools indicated in this study can be used to trace possible exposure, epidemiological analysis and develop an effective treatment, including vaccines.
Supplementary material. The supplementary material for this article can be found at https://doi.org/10.1017/S0950268821000583.