This report was automatically generated on February 28, 2023.
Katherine Eaton
| National Microbiology Laboratory, PHAC
| katherine.eaton@phac-aspc.gc.ca
The ncov-recombinant update from v0.6.1 to v0.7.0 has 3 major changes.
The first change is a Nextclade dataset upgrade from 2022-10-27 to 2023-02-01, which adds nomenclature for the newly designated recombinants XBH to XBP.
The second change is detection of the recursive recombinants XBL and XBN, which arose from two separate recombination events between BA.2.75* and XBB*. Currently, recursive recombination is only detected between XBB and VOCs circulating in late 2022 and early 2023.
The third major change is that all documentation has been migrated to Read the Docs. This includes a detailed Developer’s Guide for those looking to contribute to the project.
Between v0.6.1 and v0.7.0, 15.2% of sequences in the controls-gisaid dataset had different detection results. 5.1% of sequences were newly classified (NA → X) and represent lineages not present in the v0.6.1 model. 6.6% of sequences had lineage assignment changes and 3.5% of sequences had sublineage assignment changes as a result of the Nextclade dataset upgrade. 0% of positive controls were dropped (X → NA), indicating no observed loss in sensitivity.
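As a quick consistency check on the figures above, the three change categories should add up to the overall 15.2%. A minimal sketch using awk; the function name `sum_percentages` is illustrative and is not part of the pipeline:

```bash
#!/usr/bin/env bash
# Illustrative helper (not part of ncov-recombinant): sum percentage values
# given as arguments and print the total to one decimal place.
sum_percentages() {
  printf '%s\n' "$@" | awk '{ total += $1 } END { printf "%.1f\n", total }'
}

# Newly classified (NA -> X), lineage changes, sublineage changes.
sum_percentages 5.1 6.6 3.5
```

The call prints 15.2, matching the overall proportion of sequences with different detection results.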
ncov-recombinant v0.7.0 is a recommended upgrade for recombinant surveillance, both to accurately classify the latest recombinant lineages (up to XBP) and to detect recursive recombination (e.g. XBL is a recombinant with XBB* as a parent).
For a comprehensive summary of the methodological changes, please see the release notes for v0.7.0.
Verify that the update of the ncov-recombinant pipeline from version 0.6.1 to 0.7.0:
This dataset includes SARS-CoV-2 genomes from GISAID that reflect the known diversity of recombinant sequences to date. These include 572 positive controls (recombinants) representing lineages XA to XBP, and 186 negative controls (non-recombinants) selected from the Nextstrain Reference Phylogeny.
In total, 758 control sequences were used as input and a strain list is available here.
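Before running the pipeline, it can be useful to confirm that the downloaded FASTA actually contains the expected number of records. The following is a minimal sketch under the assumption that each sequence has exactly one `>` header line; `check_fasta_count` is a hypothetical helper name, and the path in the usage comment is the controls-gisaid location used in the Procedure.

```bash
#!/usr/bin/env bash
# 572 positive controls + 186 negative controls, as stated in this report.
expected_total=$((572 + 186))

# Hypothetical helper: compare the number of FASTA headers with an expected count.
check_fasta_count() {
  local fasta="$1" expected="$2" observed
  # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
  observed="$(grep -c '^>' "$fasta" || true)"
  if [[ "$observed" -eq "$expected" ]]; then
    echo "OK: ${observed} sequences in ${fasta}"
  else
    echo "MISMATCH: expected ${expected}, found ${observed} in ${fasta}" >&2
    return 1
  fi
}

# Usage (assumes the file has already been downloaded):
#   check_fasta_count data/controls-gisaid/sequences.fasta "$expected_total"
```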
This dataset includes publicly available SARS-CoV-2 genomes from the Canadian VirusSeq Data Portal. Sequences were downloaded on 2023-01-23 and include 441,234 genomes in total.
The snakemake pipelines for v0.6.1 and v0.7.0 were run independently on the controls-gisaid and virusseq datasets. Please see the Procedure section of the Supplementary for detailed command-line instructions.
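As a rough overview of the four HPC runs (two versions by two datasets), the loop below only prints the commands rather than submitting them. It is a sketch that assumes each version was cloned into a directory named after its version number, as in the Procedure; `print_hpc_commands` is a hypothetical helper, while the script path, profile names and conda environment names are taken from the Procedure.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print (do not execute) the HPC command for every
# version/dataset combination described in this report.
print_hpc_commands() {
  local versions=("0.6.1" "0.7.0")
  local datasets=("controls-gisaid" "virusseq")
  local ver dataset
  for ver in "${versions[@]}"; do
    for dataset in "${datasets[@]}"; do
      echo "(cd ${ver} && scripts/slurm.sh --profile profiles/${dataset}-hpc --conda-env ncov-recombinant-${ver})"
    done
  done
}

print_hpc_commands
```

Printing the commands first makes it easy to review them before pasting each one into a shell on the cluster.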
Lineage changes (e.g. XBB → XBN).
Sublineage changes (e.g. XBB.1 → XBB.1.5 or XAY.1 → XAY.2).
Dropped positives (X → NA).
Note: Lineage assignments in v0.7.0 are identical to those in pango-designation and are the expected values.
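The change categories used throughout this report can be expressed as a small decision rule. The following is a minimal sketch and not the logic of compare_positives.py; the function names `classify_change` and `top_level` are hypothetical, and it assumes that a sublineage change is one where both assignments share the same top-level lineage once a trailing `-like` suffix is removed.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reduce an assignment to its top-level lineage,
# e.g. "XBB.1.5" -> "XBB" and "XBB.1-like" -> "XBB".
top_level() {
  local lineage="${1%-like}"        # drop a trailing "-like" suffix, if any
  printf '%s\n' "${lineage%%.*}"    # keep the text before the first "."
}

# Hypothetical helper: classify the change from an old to a new assignment.
classify_change() {
  local old="$1" new="$2"
  if [[ "$old" == "$new" ]]; then
    echo "unchanged"
  elif [[ "$old" == "NA" ]]; then
    echo "new"            # NA -> X
  elif [[ "$new" == "NA" ]]; then
    echo "dropped"        # X -> NA
  elif [[ "$(top_level "$old")" == "$(top_level "$new")" ]]; then
    echo "sublineage"     # e.g. XBB.1 -> XBB.1.5
  else
    echo "lineage"        # e.g. XBB -> XBN
  fi
}

classify_change "XBB" "XBN"
classify_change "XBB.1" "XBB.1.5"
```

Under these assumptions the two calls print `lineage` and `sublineage`, in line with the examples given in this report.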
New detections (NA → X*) result from the following changes in v0.7.0:
Nextclade dataset upgrades to include newly designated lineages: XBG, XBK, XBM.
| Lineage (v0.7.0) | Lineage (v0.6.1) | Parents |
|---|---|---|
| XBG | NA | BA.2.76*, BA.5.2* |
| XBK | NA | BA.5.2*, CJ.1* |
| XBM | NA | BA.2.76*, BF.3* |
Lineage changes result from the following updates in v0.7.0:
Nextclade dataset upgrades to include newly designated lineages: XBH, XBJ, XBL, XBN, XBP.
| Lineage (v0.7.0) | Lineage (v0.6.1) | Parents |
|---|---|---|
| XBH | BY.1 | BA.2.3*, BA.2.75* |
| XBJ | BA.2.3.20 | BA.2.3*, BA.5.2* |
| XBL | XBB.1-like | BA.2.75*, XBB* |
| XBN | XBB-like | BA.2.75*, XBB* |
| XBP | XBD-like | BA.2.75*, BA.5* |
Sublineage changes result from the following updates in v0.7.0:
Nextclade dataset upgrades to include new sublineages for: XAY and XBB.
| Lineage (v0.7.0) | Lineage (v0.6.1) | Parents |
|---|---|---|
| XAY.2 | XAY, XAY-like | BA.2*, Delta (21J) |
| XBB.1 | XBB.1.1 | BA.2.10*, BA.2.75* |
| XBB.1.5 | XBB.1, XBB-like | BA.2.10*, BA.2.75* |
Dropped positives are only observed in the virusseq dataset and consist of the unpublished cluster_id hCoV-19/Canada/ON-PHL-22-53186/2022 (N=19, 2022-12-09 to 2023-01-02). In v0.6.1, this cluster was classified as a BA.5.2/BA.5.3 recombinant with breakpoints extremely close to the 5’ terminus (Figure 3). The most likely reason it is dropped in v0.7.0 is that the 3 mutations attributed to BA.5.2 are no longer considered diagnostic based on the latest global mutation frequencies.
The results here are, in whole or in part, based upon data hosted at the Canadian VirusSeq Data Portal: https://virusseq-dataportal.ca/. We wish to acknowledge the Canadian Public Health Laboratory Network (CPHLN), Genome Canada and the CanCOGeN VirusSeq Consortium for their contribution to the Portal.
Download the sequences and metadata for the strains in the strain list from GISAID to data/controls-gisaid/.
Download the VirusSeq sequences and metadata.
wget -O virusseq.tar.gz https://singularity.virusseq-dataportal.ca/download/archive/2d9ace2c-0808-475f-bc93-6ad5808581a4
tar -xvf virusseq.tar.gz
mkdir -p data/virusseq
# Prep metadata
csvtk cut -t -f "fasta header name,sample collection date,geo_loc_name (country),geo_loc_name (state/province/territory)" *files-archive*.tsv \
| csvtk rename -t -f "fasta header name" -n "strain" \
| csvtk rename -t -f "sample collection date" -n "date" \
| csvtk rename -t -f "geo_loc_name (country)" -n "country" \
| csvtk rename -t -f "geo_loc_name (state/province/territory)" -n "division" \
> data/virusseq/metadata.tsv
# Prep sequences
mv *files-archive*.fasta data/virusseq/sequences.fasta
# Cleanup
rm *files-archive*.tsv
rm virusseq.tar.gz
Download the pipeline.
git clone https://github.com/ktmeaton/ncov-recombinant.git 0.7.0
cd 0.7.0
git checkout v0.7.0
Create a version-controlled conda environment.
# Local
mamba env create -f workflow/envs/environment.yaml -n ncov-recombinant-0.7.0
# HPC
sbatch -J conda-ncov-recombinant-0.7.0 --wrap="mamba env create -f workflow/envs/environment.yaml -n ncov-recombinant-0.7.0"
Symlink the controls-gisaid data.
ln -s ../../../data/controls-gisaid/metadata.tsv data/controls-gisaid/metadata.tsv
ln -s ../../../data/controls-gisaid/sequences.fasta data/controls-gisaid/sequences.fasta
Symlink the virusseq data.
ln -s ../../data/virusseq data/virusseq
Run the pipeline for controls-gisaid.
# Local
conda activate ncov-recombinant-0.7.0
snakemake --profile profiles/controls-gisaid
# HPC
scripts/slurm.sh --profile profiles/controls-gisaid-hpc --conda-env ncov-recombinant-0.7.0
Run the pipeline for virusseq (must be run on HPC).
scripts/slurm.sh --profile profiles/virusseq-hpc --conda-env ncov-recombinant-0.7.0
Download the pipeline.
git clone https://github.com/ktmeaton/ncov-recombinant.git 0.6.1
cd 0.6.1
git checkout v0.6.1-hotfix.1
Create a version-controlled conda environment.
# Local
mamba env create -f workflow/envs/environment.yaml -n ncov-recombinant-0.6.1
# HPC
sbatch -J conda-ncov-recombinant-0.6.1 --wrap="mamba env create -f workflow/envs/environment.yaml -n ncov-recombinant-0.6.1"
Symlink the controls-gisaid data.
ln -s ../../../data/controls-gisaid/metadata.tsv data/controls-gisaid/metadata.tsv
ln -s ../../../data/controls-gisaid/sequences.fasta data/controls-gisaid/sequences.fasta
Symlink the virusseq data.
ln -s ../../data/virusseq data/virusseq
Run the pipeline for controls-gisaid.
# Local
conda activate ncov-recombinant-0.6.1
snakemake --profile profiles/controls-gisaid
# HPC
scripts/slurm.sh --profile profiles/controls-gisaid-hpc --conda-env ncov-recombinant-0.6.1
Run the pipeline for virusseq (must be run on HPC).
scripts/slurm.sh --profile profiles/virusseq-hpc --conda-env ncov-recombinant-0.6.1
After the pipelines are complete for each version, run the following to compare lineage assignments.
old_ver="0.6.1"
new_ver="0.7.0"

# Compare controls-gisaid positives between versions at several minimum link sizes
conda activate ncov-recombinant-0.7.0
link_sizes=("1" "3" "5" "10")
for size in "${link_sizes[@]}"; do
  python3 0.7.0/scripts/compare_positives.py \
    --positives-1 "${old_ver}/results/controls-gisaid/linelists/positives.tsv" \
    --positives-2 "${new_ver}/results/controls-gisaid/linelists/positives.tsv" \
    --ver-1 "v${old_ver}" \
    --ver-2 "v${new_ver}" \
    --outdir "compare/controls-gisaid-${size}" \
    --node-order alphabetical \
    --min-link-size "${size}"
done

# Compare virusseq positives between versions at several minimum link sizes
conda activate ncov-recombinant-0.7.0
link_sizes=("1" "3" "5" "10")
for size in "${link_sizes[@]}"; do
  python3 0.7.0/scripts/compare_positives.py \
    --positives-1 "${old_ver}/results/virusseq/linelists/positives.tsv" \
    --positives-2 "${new_ver}/results/virusseq/linelists/positives.tsv" \
    --ver-1 "v${old_ver}" \
    --ver-2 "v${new_ver}" \
    --outdir "compare/virusseq-${size}" \
    --node-order alphabetical \
    --min-link-size "${size}"
done

old_ver="0.6.1"
new_ver="0.7.0"
csvtk cut -t -f "strain" ${old_ver}/results/controls-gisaid/linelists/positives.tsv \
| tail -n+2 \
| csvtk grep -t -f "strain" -P - -v ${new_ver}/results/controls-gisaid/linelists/positives.tsv \
| csvtk cut -t -f "strain" \
| tail -n+2 \
| csvtk grep -t -f "strain" -P - ${old_ver}/results/controls-gisaid/linelists/linelist.tsv \
| csvtk pretty -t \
| less -S

csvtk cut -t -f "strain" ${new_ver}/results/controls-gisaid/linelists/positives.tsv \
| tail -n+2 \
| csvtk grep -t -f "strain" -P - -v ${old_ver}/results/controls-gisaid/linelists/positives.tsv \
| csvtk cut -t -f "strain" \
| tail -n+2 \
| csvtk grep -t -f "strain" -P - ${new_ver}/results/controls-gisaid/linelists/linelist.tsv \
| csvtk pretty -t \
| less -S
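The core of the two comparisons above is a set difference on the `strain` column. For environments without csvtk, the same idea can be sketched with awk alone. This is an illustrative alternative rather than the method used for this report; `strains_only_in` is a hypothetical helper, and it assumes both files are tab-separated with a header row containing a `strain` column.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print strains listed in FILE_A but absent from FILE_B.
strains_only_in() {
  local file_a="$1" file_b="$2"
  awk -F'\t' '
    FNR == 1 {                           # header row of each file
      file_index++
      col = 0
      for (i = 1; i <= NF; i++) if ($i == "strain") col = i
      next
    }
    file_index == 1 { seen[$col] = 1; next }   # FILE_B is read first: record its strains
    !($col in seen) { print $col }             # FILE_A: print strains not seen in FILE_B
  ' "$file_b" "$file_a"
}

# Usage (paths as in the commands above):
#   strains_only_in "${old_ver}/results/controls-gisaid/linelists/positives.tsv" \
#                   "${new_ver}/results/controls-gisaid/linelists/positives.tsv"
```

Swapping the two arguments lists the strains that are positives only in the newer version.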