This report was automatically generated on October 3, 2022.
Katherine Eaton
| National Microbiology Laboratory, PHAC
| katherine.eaton@phac-aspc.gc.ca
The ncov-recombinant update from v0.4.2 to v0.5.0 has two major changes. The first is increased flexibility in creating and defining sc2rf modes, which allows sc2rf to run with different parameter sets for breakpoint detection. The second change is a Nextclade upgrade to the sars-cov-2 2022-09-27 dataset, along with validation of all designated recombinants in this dataset (XA to XBC).
Between v0.4.2 and v0.5.0, 47.5% of sequences in the controls-gisaid dataset had different detection results. 16.2% of sequences were newly classified (NA → X*) and represent lineages not present in the v0.4.2 model. 31.3% of sequences had lineage assignment changes as a result of the Nextclade dataset upgrade and manual curation of previously published breakpoints. 0% of positive controls were dropped (X* → NA) between versions, indicating no observed loss in sensitivity.
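The categories above can be illustrated with a minimal sketch, assuming each sequence has one lineage call per pipeline version and that NA is represented as `None`. The function names, category labels, and toy strains are hypothetical and are not taken from ncov-recombinant; the real comparison is performed by `compare_positives.py`.

```python
from collections import Counter


def classify_change(old, new):
    """Categorise one sequence by its lineage call in two pipeline versions.

    `None` stands in for NA (no recombinant lineage assigned).
    """
    if old == new:
        return "unchanged"
    if old is None:
        return "new detection (NA -> X*)"
    if new is None:
        return "dropped (X* -> NA)"
    return "lineage change"


def summarise(calls):
    """Return the percentage of sequences in each change category."""
    counts = Counter(classify_change(old, new) for old, new in calls.values())
    total = len(calls)
    return {category: round(100 * n / total, 1) for category, n in counts.items()}


# Toy, made-up calls: strain -> (call in older version, call in newer version).
calls = {
    "strain_a": ("XM", "XAL"),   # lineage change
    "strain_b": (None, "XBC"),   # new detection
    "strain_c": ("XE", "XE"),    # unchanged
    "strain_d": (None, None),    # negative control, unchanged
}

summary = summarise(calls)
print(summary)
```

In this toy example no sequence falls into the dropped category, so that key is absent from the summary, mirroring the 0% reported above.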
ncov-recombinant v0.5.0 is a strongly recommended upgrade for monitoring existing recombinants and performing routine surveillance for emerging lineages, given the high proportion of sequences (47.5%) with lineage assignment changes.
For a comprehensive summary of the methodological changes, please see the release notes for v0.5.0.
Verify that the update of the ncov-recombinant pipeline from version 0.4.2 to 0.5.0:
The controls-gisaid dataset includes SARS-CoV-2 genomes from GISAID that reflect the known diversity of recombinant sequences to date. These include 431 positive controls (recombinants), representing lineages XA to XBC, and 186 negative controls (non-recombinants) selected from the Nextstrain Reference Phylogeny.
In total, 617 control sequences were used as input and a strain list is available here.
The snakemake pipelines for v0.4.2 and v0.5.0 were run independently on the same dataset (controls-gisaid). Please see the Procedure section in the Supplementary for detailed command-line instructions.
Note: Lineage assignments in v0.5.0 are identical to those in pango-designation and are the expected values.
New detections (NA → X*) result from the following changes in v0.5.0:
Lineage changes result from the following updates in v0.5.0:
Curation of published breakpoints.
Nextclade dataset updates.
* Why were sequences of XAL assigned to XM rather than XM-like in v0.4.2?
XAL is almost identical to XM, with the same hotspot breakpoint (17411:19954) and the same high-confidence parental lineages (BA.1.1*, BA.2*; confidence: 0.994, 0.996). Before XAL was designated, ncov-recombinant had no way to detect that sequences of XAL belonged to a distinct cluster from XM. Furthermore, XAL differs from XM by only two mutations (A2865G, G21586T), which is insufficient evidence for ncov-recombinant to call this XM-like (by default, a minimum of three mutations is required). Finally, it is unclear whether XAL emerged from a unique recombination event or is a sublineage within XM. For more information, please see pango-designation issue XAL #757.
† Why were sequences of XAR assigned to XN rather than XN-like in v0.4.2?
XAR and XN are handled as special cases by ncov-recombinant, because their breakpoints lie at the extreme 5’ end of the genome (2834:4183) with few diagnostic alleles from a second parent (BA.1). Breakpoints and parents often cannot be detected by sc2rf; therefore, before XAR was designated, ncov-recombinant could not differentiate the two lineages. For more information, please see ncov-recombinant issues XN #137, XAR #106, #74, and #90.
‡ Why were sequences of XAP assigned to XZ rather than XZ-like in v0.4.2?
XAP is almost identical to XZ, with the same hotspot breakpoint (26061:26529) and the same parental lineages (BA.2*, BA.1.1*; confidence: 0.999, 0.544). See the discussion of XAL (*) above for more information.
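The three-mutation default mentioned in the XAL discussion can be sketched as a simple threshold rule. This is only an illustration of the reasoning, assuming the decision depends solely on the number of mutations that distinguish a sequence from a designated lineage; the constant, the function name, and the placeholder third mutation are hypothetical, and the actual ncov-recombinant logic may differ.

```python
# Default threshold stated in the text; the constant name is hypothetical.
MIN_DIFFERING_MUTATIONS = 3


def label_against_known(lineage, differing_mutations,
                        min_mutations=MIN_DIFFERING_MUTATIONS):
    """Return '<lineage>-like' only when enough mutations separate the
    sequence from the designated lineage; otherwise return the lineage."""
    if len(differing_mutations) >= min_mutations:
        return f"{lineage}-like"
    return lineage


# XAL differs from XM by two mutations, which is below the default threshold.
xal_call = label_against_known("XM", ["A2865G", "G21586T"])

# A third (placeholder, not real) mutation would reach the threshold.
three_call = label_against_known("XM", ["A2865G", "G21586T", "placeholder_mutation"])

print(xal_call, three_call)
```

Under this rule, the two XAL mutations yield a plain XM call, which is consistent with the v0.4.2 behaviour described above.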
Note: Download the GISAID sequences and metadata listed in the strain list.
Note: A commit hash (37f40480) is used instead of the tag (v0.4.2), to include an important bugfix that was introduced between v0.4.2 and v0.4.3.
Download the pipeline.
git clone --recursive https://github.com/ktmeaton/ncov-recombinant.git 0.4.2
cd 0.4.2
git checkout 37f40480
Version control submodules.
cd sc2rf
git checkout 2852f05a
cd ..
Create a version-controlled conda environment.
mamba env create -f workflow/envs/environment.yaml -n ncov-recombinant-0.4.2
Create profile for controls-gisaid.
scripts/create_profile.sh --data data/controls-gisaid --hpc
Manually change MIN_LINEAGE_SIZE in scripts/linelist.py to 5.
v0.5.0.
Run the pipeline.
scripts/slurm.sh --conda-env ncov-recombinant-0.4.2 --profile my_profiles/controls-gisaid-hpc
Download the pipeline.
git clone https://github.com/ktmeaton/ncov-recombinant.git 0.5.0
cd 0.5.0
git checkout v0.5.0
Create a version-controlled conda environment.
mamba env create -f workflow/envs/environment.yaml -n ncov-recombinant-0.5.0
Run the pipeline.
scripts/slurm.sh --conda-env ncov-recombinant-0.5.0 --profile my_profiles/controls-gisaid-hpc
After the pipelines are complete for each version, run the following to compare lineage assignments.
python3 0.5.0/scripts/compare_positives.py \
--positives-1 0.4.2/results/controls-gisaid/linelists/positives.tsv \
--positives-2 0.5.0/results/controls-gisaid/linelists/positives.tsv \
--ver-1 "v0.4.2" \
--ver-2 "v0.5.0" \
--outdir compare/controls-gisaid \
--node-order alphabetical \
--min-link-size 1
csvtk cut -t -f "strain" 0.4.2/results/controls-gisaid/linelists/positives.tsv \
| tail -n+2 \
| csvtk grep -t -f "strain" -P - -v 0.5.0/results/controls-gisaid/linelists/positives.tsv \
| csvtk cut -t -f "strain" \
| tail -n+2 \
| csvtk grep -t -f "strain" -P - 0.4.2/results/controls-gisaid/linelists/linelist.tsv \
| csvtk pretty -t \
| less -S
csvtk cut -t -f "strain" 0.5.0/results/controls-gisaid/linelists/positives.tsv \
| tail -n+2 \
| csvtk grep -t -f "strain" -P - -v 0.4.2/results/controls-gisaid/linelists/positives.tsv \
| csvtk cut -t -f "strain" \
| tail -n+2 \
| csvtk grep -t -f "strain" -P - 0.5.0/results/controls-gisaid/linelists/linelist.tsv \
| csvtk pretty -t \
| less -S
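The two csvtk pipelines above amount to set differences on the `strain` column of the two `positives.tsv` files. A rough Python equivalent is sketched below, assuming tab-separated input with a `strain` column (as used in the commands); the `lineage` column, the helper name, and the in-memory toy tables are illustrative assumptions, not real pipeline output.

```python
import csv
import io


def read_strains(tsv_text):
    """Collect the values of the 'strain' column from tab-separated text."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row["strain"] for row in reader}


# Toy stand-ins for the v0.4.2 and v0.5.0 positives.tsv files (made-up rows).
positives_old = "strain\tlineage\ns1\tXM\ns2\tXE\n"
positives_new = "strain\tlineage\ns1\tXAL\ns3\tXBC\n"

old_strains = read_strains(positives_old)
new_strains = read_strains(positives_new)

# Positives present only in the newer version (candidate new detections).
only_new = new_strains - old_strains
# Positives present only in the older version (candidate dropped detections).
only_old = old_strains - new_strains

print(sorted(only_new), sorted(only_old))
```

With real data, the toy strings would be replaced by the contents of the two files, and the resulting strain sets could then be looked up in the corresponding `linelist.tsv`, as the csvtk commands do.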