Big Data, Small Microbes

Katherine Eaton (B.A. Hons)

The Plague is a disease that has profoundly impacted human history and is responsible for some of the most fatal pandemics ever recorded. It may surprise many to know that this disease is not a bygone of a past era, but in fact is still present in many regions of the world. Although researchers have been studying plague for hundreds of years, there are many aspects of its epidemiology that are enigmatic. In this thesis, I focus on how DNA from the plague bacterium can be used to estimate where and when this disease appeared in the past. To do so, I reconstruct the evolutionary relationships between modern and ancient strains of plague, using publicly available data and new DNA sequences retrieved from the skeletal remains of plague victims in Denmark. This work offers a new methodological framework for large-scale genetic analysis, provides a critique on what questions DNA evidence can and cannot answer, and expands our knowledge of the global diversity of plague.

Abstract

Pandemics of plague have reemerged multiple times throughout human history with tremendous mortality and extensive geographic spread. The First Pandemic (6^th - 8^th century) devastated the Mediterranean world, the Second Pandemic (14^th - 19^th century) swept across much of Afro-Eurasia, and the Third Pandemic (19^th - 20^th century) reached every continent except for Antarctica, and continues to persist in various endemic foci around the world. Despite centuries of historical research, the epidemiology of these pandemics remains enigmatic. However, recent technological advancements have yielded a novel form of evidence: ancient DNA of the plague bacterium Yersinia pestis. In this thesis, I explore how genomic data can be used to unravel the mysteries of when and where this disease appeared in the past. In particular, I focus on phylogenetic approaches to studying this ‘small microbe’ with ‘big data’ (ie. 100s - 1000s of genomes). I begin by describing novel software I developed that supports the acquisition and curation of large amounts of DNA sequences in public databases. I then use this tool to create an updated global phylogeny of Y. pestis, which includes ~600 genomes with standardized metadata. I devise and validate a new approach for temporal modeling (ie. molecular clock) that produces robust divergence dates in pandemic lineages of Y. pestis. In addition, I critically examine the questions that genomic evidence can and cannot address in isolation, such as whether the timing and spread of short-term epidemics can be confidently reconstructed. Finally, I apply this theoretical and methodological insight to a case study in which I reconstruct the appearance, persistence, and disappearance of plague in Denmark during the Second Pandemic. The three papers enclosed in this sandwich-thesis contribute to a larger body of work on the anthropology of plague, which seeks to understand how disease exposure and experience change over time and between human populations. Furthermore, this dissertation more broadly impacts both prospective studies of infectious disease, such as environmental surveillance and outbreak monitoring, and retrospective studies, which seek to date the emergence and spread of past pandemics.

Acknowledgments

I’d like to thank my parents, Michelle and Michael Eaton. When I was little, I thought you knew everything. And now that I’m writing my doctoral dissertation… I realize you do know everything! I hope when I grow up, I turn out to be just like you <3 Thank you for your love, support, and encouragement over all these years.

To Miriam: Thank you for being my partner, my best friend, my everything. I hope that one day I have 1/10 of your intellect, kindness, and patience. (Maybe we won’t hold our breath for that last one.)

To Hendrik Poinar: Thank you for your unending support and enthusiasm. Your mentorship and passion for research has been my rock during the hard times. Thank you for taking leaps of faith and trusting me when I proposed ridiculous project ideas. At least a few of those wound up in this thesis! And most importantly, thank you for traveling to Edmonton in 2013 to give a talk at the University of Alberta!

Thank you to members of my doctoral supervisory committee: Brian Golding, Tracy Prowse, and Nükhet Varlık. I am indebted to you for your generous support, careful guidance, and prompt feedback. Our affectionate motto of ‘Keep It Simple Stupid’ has played on loop in my head as I prepared this dissertation.

To John Silva, Marcia Furtado, and Delia Hutchinson: Thank you for guiding me through the labyrinth that is McMaster’s administration. Your smiling faces up on the 5th floor were always so reassuring. I knew that if I ever had a problem, you would be there to investigate and advocate on my behalf.

To the Plague Team: Jennifer Klunk, what would I do without you? I think everything I’ve ever known and ever will know comes back to you. Thank you for being a dedicated mentor, a brilliant scientist, and the best companion for dancing in the lab. Madeline Tapson, I dearly miss sharing a desk with you. Your warm and friendly spirit was always comforting, and you opened my eyes to so many new avenues of plague research. Ravneet Sidhu, I learned so much from training you and I loved that you questioned everything. I’m so excited to be collaborating with you on current and future projects. Michael Klowak and Julianna Stangroom, thank you for your HARD work in screening a dizzying number of plague samples and making the lab such a fun and exciting place to be.

To Emil Karpinski: I always looked forward to our bus rides into campus together. You played such an important role in creating a welcoming atmosphere when I first arrived and throughout my whole degree. Also, you are lab notebooks goals (wow).

To Ana Duggan: You have been a role model for me in so many avenues of my academic, personal, and professional life! You were the first woman I met that was also passionate about computational analysis, and have made me feel more comfortable and confident in my own skin.

To Nathalie Mouttham: Thank you for being such a stellar trainer and friend! I have vivid (but positive) memories of long hours in a laboratory basement doing Phenol Chloroform extractions together and playing 20 questions.

Thank you to all past and present members of the McMaster Ancient DNA Centre. In particular: Melanie Kuch, Matthew Emery, Jess Hider, Samantha Price, Marie-Hélène B.-Hardy, and Dirk Hackenberger. You created such a unique sense of community, and left pretty big shoes to fill!

Thank you to all collaborators who have generously shared their time, energy, and resources with me. In particular, thank you to Rebecca Redfern for working with me on the Roman Londoners project!

To my colleagues at Red Lobster: I worked in many kitchens to fund my education, but working at Red Lobster was by far my favorite. Also to the managers, thank you for letting me have so many free Cheddar Bay Biscuits, they were a crucial component of my student diet (no joke).

Finally, I would like to acknowledge all individuals who financially supported me throughout my doctoral research. I thank Hendrik Poinar, the Department of Anthropology, the MacDATA Institute, the Sherman Centre for Digital Scholarship, McMaster University, the Province of Alberta, and the Social Sciences Research and Humanities Research Council.

List of Abbreviations and Symbols

Declaration of Academic Achievement

I, Katherine Eaton, declare that this thesis titled, ‘Big Data, Small Microbes: Genomic analysis of the plague bacterium Yersinia pestis’ and the work presented in it are my own. I confirm that:

1 Introduction

In 2011, I learned about a researcher named Dr. Hendrik Poinar. His team had just published a seminal paper, in which they identified the causative agent of the infamous Black Death [1]. I discovered that this morbid term describes a pandemic that devastated the world in the 14^th century, with unprecedented mortality and spread. In less than 10 years (1346-1353) the Black Death swept across Afro-Eurasia, killing 50% of the population [2]. Outbreaks of this new and mysterious disease, often referred to as The Plague, reoccurred every 10 years on average [3]. This epidemic cycling continued for 500 long years in Europe, but in Western Asia, the disease never truly disappeared [4]. The 10-year window of the Black Death alone has an estimated global mortality of 200 million people, making it the most fatal pandemic in human history [5], and also one of the most mysterious.

The cryptic nature of this medieval disease led to significant debate among contemporaries. The dominant theory of contagion at the time was miasma, in which diseases were spread through noxious air [6]. However, Ibn al-Khatib, a notable Islamic scholar, took issue with this theory. After studying outbreaks of The Plague in the 14^th century, he proposed an alternative hypothesis in which minute bodies were transmissible between humans [7]. Like most controversial theories, this idea was not readily embraced. Some 400 years later, the British botanist Richard Bradley wrote a radical treatise on The Plague [8] where he similarly proposed that infectious diseases were caused by living, microscopic agents. Again, this theory was rejected. It was not until the 19^th century that this “new” perspective would receive widespread acceptance [9]. It is quite remarkable that our modern conceptions of epidemiology and bacteriology can be traced back to diverse “founders” throughout history, who all happened to be grappling with the perplexing nature of The Plague.

After it was established that a living organism caused the Black Death, the intuitive next step was to precisely identify the organism. The symptoms described in historical texts seemed to incriminate bubonic plague [2], a bacterial pathogen that passes from rodents to humans, and leads to grotesquely swollen lymph nodes (buboes). On the other hand, the rapid spread of the Black Death suggests this was a contagion primarily driven by human to human transmission, which more closely fit the profile of an Ebola-like virus [10]. In the 1990s and 2000s, geneticists began contributing novel evidence to the debate, by retrieving pathogenic DNA from skeletal remains [11]. The plague bacterium, Yersinia pestis, played a central role in these molecular investigations, as researchers sought to either establish or refute its presence in medieval victims [12]. The competitive nature of this discourse fueled significant technological progress, and over the next decade, the study of ancient DNA became a well-established discipline. However, the origins of the Black Death remained unresolved, due to numerous controversies surrounding DNA contamination and scientific rigor [13].

As an undergraduate student of forensic anthropology, I was fascinated by the rapid pace at which the field of ancient DNA was developing. I attribute my developing academic obsession to two early-career experiences. First, was reading the highly entertaining back-and-forth commentaries in academic journals [14], where plague researchers would occasionally exchange personal insults [15]. It was clear that these researchers cared deeply about their work. Despite the occasional toxicity, I found these displays of passion to be engaging and refreshing, compared to the otherwise emotionally-sterile field of scientific publishing.

The second defining experience, was the perplexing and often frustrating task of diagnosing infectious diseases from skeletal remains. I was intrigued by the idea of reconstructing an individual’s life story from their skeleton, and using this information to solve the mysteries of the dead. However, while some forms of trauma leave diagnostic marks on bone (ex. sharp force), acute infectious diseases rarely manifest in the skeleton [16,17] and thus are ‘invisible’ to an anthropologist. Because of this, I found the new field of ancient DNA to be extremely appealing, as it offered a novel solution to this problem. Anthropologists could now retrieve the precise pathogen that had infected an individual, and contribute new insight regarding disease exposure and experience throughout human history. These experiences suggested to me that studying the ancient DNA of pathogens would be an exciting, dynamic, and productive line of research for a graduate degree. I’m happy to say that 10 years later, I still agree with this statement, and by writing this dissertation I hope to convince you, the reader, as well.

Which brings us back to Dr. Hendrik Poinar and his team’s seminal work on the mysterious Black Death. The study, led by first author Kirsten Bos, had found DNA evidence of the plague bacterium Y. pestis in several Black Death victims buried in a mass grave in London [1]. However, they did not just retrieve a few strands of DNA, they captured millions of molecules (10.5 million to be precise) which allowed them to reconstruct the entire Y. pestis genome, comprising four million DNA bases. The amount of molecular evidence was staggering, and offered irrefutable proof that the plague bacterium was present during the time of Black Death. However, with a sample size of N=1, the genetic link between Y. pestis and this ancient pandemic was tentative at best.

Armed with the proposal of finding more evidence of Y. pestis in the archaeological record, I applied to work for Dr. Hendrik Poinar at the McMaster Ancient DNA Centre. In 2014, I had the delight and privilege of being accepted into the graduate program at McMaster University. Alongside other members of the “McMaster Plague Team”, I set about the daunting task of screening the skeletal remains of more than 1000 individuals for molecular evidence of Y. pestis. This material was generously provided by archaeological collaborators, who were similarly invested in the idea that ancient DNA techniques could identify infectious diseases in their sites. These archaeological remains reflected nearly a millennium of human history, with sampling ages ranging from the 9^th to the 19^th century CE. The geographic diversity was also immense, with individuals sampled across Europe, Africa, and Asia.

Of the 1000+ individuals screened, approximately 30% originated in Denmark. Due to this large sample size, we, the “Plague Team”, had the greatest success in identifying ancient Y. pestis in this region. Over a period of 5 years, we retrieved Y. pestis DNA from 13 Danish individuals dated to the medieval and early modern periods. To contextualize these plague isolates, we reconstructed their evolutionary relationships using a large comparative dataset of global Y. pestis. In Chapter 4, I present the results of this collaborative study, which marks the first longitudinal analysis of an ancient pathogen in a single region. I explore whether the genetic evidence of Y. pestis aligns with the historical narrative of the Black Death, and whether or not subsequent epidemics can be attributed to long-distance reintroductions. However, while this high-throughput study was the first one I embarked on, as the chapter numbering indicates, it would be the last project I completed due to several unforeseen complications.

While the McMaster Plague Team was busy screening for Y. pestis, so too were other ancient DNA centres throughout the world. Between 2011 and 2021, more than 100 ancient Y. pestis genomes were published, making plague the most intensively sequenced historical disease. The sequencing of modern isolates accelerated in tandem, with over 1500 genomes produced from culture collections of 20^th and 21^st century plague outbreaks [18]. Because of this influx of evidence, the research questions changed accordingly. Geneticists were no longer interested in just establishing the presence of Y. pestis during the short time frame of the Black Death (1346-1353), they wanted to know how it behaved and spread throughout the long 500 years of this pandemic. The longitudinal study design of Chapter 4 was therefore well-positioned to address these nuanced epidemiological questions. However, this novel genetic evidence also introduced new complexities.

It quickly became clear that isolates of Y. pestis sampled during epidemic periods were highly similar in terms of genetic content, if not indistinguishable clones [19]. This called into question the resolution of genomic evidence, and whether the geographic origins and spread of the Black Death could be accurately inferred using ancient DNA studies. This was further confounded by the finding that the rate of evolutionary change in Y. pestis could vary tremendously [20] which led to the discovery that previously published temporal models were erroneous [21]. It became increasingly uncertain whether genetic evidence could be used to produce informative estimates of the timing of plague’s frequent reemergences [22]. As I read these critical studies, I began developing an idea to address the substantial gaps in our evolutionary theory and methodology concerning the plague bacterium Y. pestis. This idea culminated in Chapter 3, where I curated and contextualized the largest global data set of plague genomes. I critiqued the existing spatiotemporal models of plague’s evolutionary history, and with the assistance of my co-authors, devised a new methodological approach. This method would then be repurposed for Chapter 4, so that I could infer the emergence and disappearance of Y. pestis in Denmark with greater accuracy. However, as the chapter numbering once again reflects, there was one final obstacle.

Synthesizing the largest genomic data set was a lofty ambition, especially considering that there were few software tools available to perform such a task. New plague genomes of Y. pestis were being published monthly, and at times even weekly, with such volume that manual tracking became impossible. My excel spreadsheet of genetic metadata became riddled with errors and fields with missing data. The era of “Big Data” had arrived, and I was woefully unequipped to effectively manage this deluge of information. In response, I ventured into the tumultuous waters of software development. In Chapter 2, I describe my original software that automates the acquisition and organization of genetic metadata. Academic publishing in the field of software was a unique experience, as I had to both produce a scholarly manuscript and demonstrate expertise as a service-provider. This database tool has continually proven to be indispensable, and is the backbone upon which the studies in Chapter 3 and Chapter 4 would be rebuilt upon.

At this point, I re-introduce the dissertation as a collection of three hierarchical, but independently published, studies. I first describe an original piece of software in Chapter 2, which automates the retrieval and organization of publicly available sequence data. In Chapter 3, I outline how this tool was used to generate an updated and curated phylogeny of Y. pestis, which yielded novel insight regarding the timing and origins of past pandemics. In this chapter, I also conduct a critical examination of the historical questions that genomic evidence can, or cannot, address. In Chapter 4, I use these theories and methods to reconstruct the emergence and continuity of plague in Denmark over a period of 400 years. I conclude in Chapter 5 with a discussion of the contributions of each study, with a particular focus on their significance within the broader field of anthropology.

2 NCBImeta: Efficient and comprehensive metadata retrieval from NCBI databases

Published 03 February 2020 in
The Journal of Open Source Software, 5(46), 1990.
https://doi.org/10.21105/joss.01990
Licensed under a Creative Commons Attribution 4.0 International License.

¹ McMaster Ancient DNA Centre, McMaster University
² Department of Anthropology, McMaster University

2.1 Summary

NCBImeta is a command-line application that downloads and organizes biological metadata from the National Centre for Biotechnology Information (NCBI). While the NCBI web portal provides an interface for searching and filtering molecular data, the output offers limited options for record retrieval and comparison on a much larger and broader scale. NCBImeta tackles this problem by creating a reformatted local database of NCBI metadata based on user search queries and customizable fields. The output of NCBImeta, optionally a SQLite database or text file(s), can then be used by computational biologists for applications such as record filtering, project discovery, sample interpretation, and meta-analyses of published work.

2.2 Background

Recent technological advances in DNA sequencing have propelled biological research into the realm of big data. Due to the tremendous output of Next Generation Sequencing (NGS) platforms, numerous fields have transformed to explore this novel high-throughput data. Projects that quickly adapted to incorporate these innovative techniques included monitoring the emergence of antibiotic resistance genes [23], epidemic source tracking in human rights cases [24], and global surveillance of uncharacterized organisms [25]. However, the startling rate at which sequence data are being deposited online have presented significant hurdles to the efficient reuse of published data. In response, there is growing recognition within the computational community that effective data mining techniques are a dire necessity [26,27].

An essential step in the data mining process is the efficient retrieval of comprehensive metadata. These metadata fields are diverse in nature, but often include the characteristics of the biological source material, the composition of the raw data, the objectives of the research initiative, and the structure of the post-processed data. Several software applications have been developed to facilitate bulk metadata retrieval from online repositories. Of the available tools, SRAdb [28], the Pathogen Metadata Platform [29], MetaSRA [30], and pysradb [31] are among the most widely utilised and actively maintained. While these software extensions offer substantial improvements over the NCBI web browser experience, there remain several outstanding issues.

In response, NCBImeta aims to provide a more user-inclusive experience to metadata retrieval, that emphasizes real-time access and provides generalized frameworks for a wide variety of NCBI’s databases.

2.3 NCBImeta

NBCImeta is a command-line application that executes user queries and metadata retrieval from the NCBI suite of databases. The software is written in Python 3, using the BioPython module [32] to connect to, search, and download XML records with NCBI’s E-Utilities [33]. The lxml package is utilised to perform XPath queries to retrieve nodes containing biological metadata of interest. SQLite is employed as the database management system for storing fetched records, as implemented with the sqlite3 python module. Accessory scripts are provided to supply external annotation files, to join tables within the local database so as to re-create the relational database structure, and finally to export the database as tabular text for downstream analyses. NCBImeta currently interfaces with the molecular and literature databases [34] described in Table 2.1.

The typical workflow of NCBImeta follows four major steps as outlined in Figure 2.1. Users first configure the program with their desired search terms. NCBImeta is then executed on the command-line to fetch relevant records and organize them into a local database. Next, the user optionally edits the database to, for example, add their own custom metadata. Finally, the resulting database, kept in SQLite format or exported to text, delivers 100+ biologically-relevant metadata fields to researcher’s fingertips. This process not only saves significant time compared to manual record retrieval through the NCBI web portal, but additionally unlocks attributes for comparison that were not easily accessible via the web-browser interface.

NCBImeta’s implementation offers a novel approach to metadata management and presentation that improves upon the prevously described limitations of existing software in a number of ways. First, NCBImeta is run on the command-line, and the final database can be exported to a text file, thus no knowledge of an external programming language is required to generate or explore the output. Second, a general parsing framework for tables and metadata fields was developed which can be extended to work with diverse database types contained within NCBI’s infrastructure. Finally, a query system was implemented for record retrieval that allows users to access records in real-time, as opposed to working with intermittent or out-dated database snapshots.

2.4 Use Case

The following section demonstrates how NCBImeta can be used to obtain current and comprehensive metadata for a pathogenic bacteria, Pseudomonas aeruginosa, from various sequencing projects across the globe. P. aeruginosa is an opportunistic pathogen associated with the disease cystic fibrosis (CF) and is highly adaptable to diverse ecological niches [35]. As such, it is a target of great interest for comparative genomics and there are currently over 15,000 genomic sequence records available which are spread across two or more databases. In cases such as this, it is critical to leverage the tremendous power of these existing datasets while being conscious of the labor typically required to retrieve and contextualize this information. NCBImeta renders the problem of acquiring and sifting through this metadata trivial and facilitates the integration of information from multiple sources.

To identify publicly available P. aeruginosa genomes, NCBImeta is configured to search through the tables Assembly (assembled genomes) and SRA (raw data). For additional context, NCBImeta is used to retrieve metadata from the Nucleotide table for descriptive statistics of the genomic content, from the BioProject table to examine the research methodology of the initiative, from Pubmed to identify existing publications, and finally from the Biosample table to explore characteristics of the biological material. A small subset of the 100+ retrieved columns is shown in Figure 2.2, to provide a visual example of the output format and the metadata that is retrieved.

Subsequently, the output of NCBImeta can be used for exploratory data visualization and analysis. The text file export function of NCBImeta ensures downstream compatibility with both user-friendly online tools (ex. Google Sheets Charts) as well as more advanced visualization packages [36]. In Figure 2.3, the geospatial distribution of P. aeruginosa isolates are plotted alongside key aspects of genomic composition (ex. number of genes).

Finally, NCBImeta can be used to streamline the process of primary data acquisition following careful filtration. FTP links are provided as metadata fields for databases attached to an FTP server (ex. Assembly, SRA) which can be used to download biological data for downstream analysis.

2.5 Future Work

The development of NCBImeta has primarily focused on a target audience of researchers whose analytical focus is prokaryotic genomics and the samples of interest are the organisms themselves. Chief among those include individuals pursuing questions concerning epidemiology, phylogeography, and comparative genomics. Future releases of NCBImeta will seek to broaden database representation to include gene-centric and transcriptomics research (ex. NCBI’s Gene and GEO databases).

2.6 Availability

NCBImeta is a command-line application written in Python 3 that is supported on Linux and macOS systems. It is distributed for use under the OSD-compliant MIT license (https://opensource.org/licenses/MIT). Source code, documentation, and example files are available on the GitHub repository (https://github.com/ktmeaton/NCBImeta).

2.7 Acknowledgments

I would like to thank Dr. Hendrik Poinar and Dr. Brian Golding for their guidance and support on this project, as well as for insightful conversations regarding biological metadata, the architecture of NCBI, and software deployment. Thank you to Dr. Andrea Zeffiro, Dr. John Fink, Dr. Matthew Davis, and Dr. Amanda Montague for valuable discussions regarding APIs, digital project management, and software publishing. Thank you to all past and present members of the McMaster Ancient DNA Centre and the Golding Lab. This work was supported by the MacDATA Institute (McMaster University, Canada) and the Social Sciences and Humanities Research Council of Canada (#20008499).

3 Plagued by a cryptic clock: Insight and issues from the global phylogeny of Yersinia pestis

Katherine Eaton^1,2, Leo Featherstone³, Sebastian Duchene³, Ann G. Carmichael⁴, Nükhet Varlık⁵, G. Brian Golding⁶, Edward C. Holmes⁷, Hendrik N. Poinar^1,2,8,9,10

¹McMaster Ancient DNA Centre, McMaster University, Hamilton, Canada.
²Department of Anthropology, McMaster University, Hamilton, Canada.
³The Peter Doherty Institute for Infection and Immunity, University of Melbourne, Melbourne, Australia.
⁴Department of History, Indiana University Bloomington, Bloomington, USA.
⁵Department of History, Rutgers University-Newark, Newark, USA.
⁶Department of Biology, McMaster University, Hamilton, Canada.
⁷Sydney Institute for Infectious Diseases, School of Life & Environmental Sciences and School of Medical Sciences, University of Sydney, Sydney, Australia.
⁸Department of Biochemistry, McMaster University, Hamilton, Canada.
⁹Michael G. DeGroote Institute of Infectious Disease Research, McMaster University, Hamilton, Canada.
¹⁰Canadian Institute for Advanced Research, Toronto, Canada.

3.1 Abstract

Plague has an enigmatic history as a zoonotic pathogen. This potentially devastating infectious disease will unexpectedly appear in human populations and disappear just as suddenly. As a result, a long-standing line of inquiry has been to estimate when and where plague appeared in the past. However, there have been significant disparities between phylogenetic studies of the causative bacterium, Yersinia pestis, regarding the timing and geographic origins of its reemergence. Here, we curate and contextualize an updated phylogeny of Y. pestis using 601 genome sequences sampled globally. We perform a detailed Bayesian evaluation of temporal signal in subsets of these data and demonstrate that a Y. pestis-wide molecular clock model is unstable. To resolve this, we devised a new approach in which each Y. pestis population was assessed independently. This enabled us to recover significant temporal signal in five populations, including the ancient pandemic lineages which we now estimate may have emerged decades, or even centuries, before a pandemic was historically documented from European sources. Despite this, we only obtain robust divergence dates from populations sampled over a period of at least 90 years, indicating that genetic evidence alone is insufficient for accurately reconstructing the timing and spread of short-term plague epidemics. Finally, we identify key historical data sets that can be used in future research, which will complement the strengths and mitigate the weaknesses of genomic data.

3.2 Introduction

Plague has an impressively long and expansive history as a zoonosis of rodents. The earliest “fossil” evidence of the plague bacterium, Yersinia pestis, stems from ancient DNA studies which date its first emergence in humans to the Late Neolithic Bronze Age (LNBA) approximately 5000 years ago [37]. During this time, Y. pestis has dispersed globally on multiple occasions due to an ability to infect a variety of mammalian hosts [38] and ever-expanding trade networks [39]. Few regions of the ancient and modern world remain untouched by this disease, as plague has demonstrated a persistent presence on every continent except Australia and Antarctica [40]. There are three historically documented pandemics of plague: the First Pandemic (6^th to 8^th century CE) [21], the Second Pandemic (14^th to 19^th century CE) [41], and the Third Pandemic (19^th to 20^th century CE) [42]. The advent of each has been marked by a series of outbreaks, such as the medieval Black Death (1346 - 1353 CE), which is estimated to have killed more than half of Europe’s population [43].

One long-standing line of inquiry in plague’s evolutionary history has been estimating the timing, origins, and spread of these past pandemics. The most intensively researched events have been: (1) the first appearance of Y. pestis in human populations [44], (2) the onset and progression of the three pandemics [1,20,21], and (3) the inter-pandemic or “quiescent” periods where Y. pestis recedes into wild rodent reservoirs and disappears from the historical record [45,46]. Our knowledge of these events has deepened considerably in recent years, owing in part to technological advancements in the retrieval and sequencing of ancient DNA alongside new molecular clock dating methods.

Despite intensive interest and methodological advancement, the rate and time scale of evolution in Y. pestis remains notoriously difficult to estimate. This is largely attributed to the substantial variation in evolutionary rates that has been documented across the phylogeny [19,20]. As a result, considerable debate has emerged over whether Y. pestis has no temporal signal [21], thereby preventing meaningful rate estimates, or if some Y. pestis populations have such distinct rates that a species-wide signal is obscured [22]. This uncertainty has resulted in radically different rate and date estimates between studies, with node dates shifting by several millennia [20,44].

The geographic origins and dispersal of past pandemics are similarly contentious, particularly concerning mechanisms of spread and their underlying ecology. This contention concerns competing hypotheses about the relative importance of localized persistence versus long-distance reintroduction [47,48,49]. Among both sides of this issue, there is an expectation that genomic evidence will play a significant role [50], if not resolve the debate [47]. However, no study to date has statistically evaluated whether Y. pestis genomes have sufficient geographic signal to confidently infer ancestral locations and spread.

To address these debates and obstacles, we: (1) curated and contextualized the most recent Y. pestis genomic evidence, (2) reviewed and critiqued our current understanding of plague’s population structure, (3) devised a new approach for recovering temporal signal in Y. pestis, and (4) critically assessed the reliability of phylogeographic analysis. We ground our results and their interpretation using informative historical case studies to demonstrate the methodological and interpretive consequences. We anticipate these results will impact both prospective studies of plague, such as environmental surveillance and outbreak monitoring, and retrospective studies, which seek to date the emergence and spread of past pandemics.

3.3 Results and Discussion

3.3.1 Population Structure

To determine the population structure of Y. pestis, we first estimated a maximum likelihood phylogeny using 601 global isolates including 540 modern (89.9%) and 61 ancient (10.1%) strains (Methods). We rooted the tree using two genomes of the outgroup taxa Yersinia pseudotuberculosis. The alignment consisted of 10,249 variant positions exclusive to Y. pestis, with 3,844 sites shared by at least two strains. Following phylogenetic estimation, we pruned the outgroup taxa from the tree to more closely examine the genetic diversity of Y. pestis. In Figure 3.1 A, we contextualize the global phylogeny using three nomenclature systems: the metabolic biovars, major branches, and historical time periods. In the following section, we compare and critique each system, identify any incongruent divisions and uncertainty, and explore an integrative approach for molecular clock analysis.

3.3.1.1 Biovars

The oldest classification system of Y. pestis is the biovar nomenclature that uses metabolic differences to define population structure. Accordingly, Y. pestis can be categorized into four classical biovars: Antiqua (ANT), Medievalis (MED), Orientalis (ORI), and microtus/pestoides (PE) [51,52]. Non-classical biovars have also been introduced, such as the Intermedium biovar (IN), which reflects a transitional state from Antiqua to Orientalis [53]. The biovar system is simple in application, as it largely focuses on two traits: the ability to ferment glycerol and reduce nitrate [52]. However, this simplicity is offset by the growing recognition of regional inconsistencies in metabolic profiles [54]. This is further exacerbated by the sequencing of non-viable, “extinct” Y. pestis for which metabolic sub-typing is challenging [1]. Researchers have responded to this uncertainty in a variety of ways, by extrapolating existing biovars [21] and creating new pseudo-biovars (PRE) [44]. Others have foregone the biovar nomenclature altogether in favor of locally-developed taxonomies [54]. Despite extensive research, it remains unclear which metabolic traits, if any, can be used to classify Y. pestis into distinct populations on a global scale.

3.3.1.2 Major Branches

In contrast to the biovar nomenclature which emphasizes phenotype, the major branch nomenclature focuses on genotype. This system divides the global phylogeny of Y. pestis into populations according to their relative position to a multifurcation called the “Big Bang” polytomy [20]. All lineages that diverged prior to this polytomy are grouped into Branch 0 and those diverging after form Branches 1 through 4. Because this multifurcation plays such a central role in this system, there is great interest in estimating its timing and geographic origins [46,55].

3.3.1.3 Time Period

Ancient Y. pestis genomes now represent a substantive portion of the known genetic diversity yet cannot be easily classified via direct metabolic testing. An alternative strategy has been employed that incorporates contextual evidence such as the sampling age, historical time period, and potential pandemic associations. In ancient DNA studies, the genetic diversity of Y. pestis is commonly divided into four time periods: the Late Neolithic Bronze Age [44], the First Pandemic [21], the Second Pandemic [19], and the Third Pandemic [20] (Figure 3.1 B).

The key strengths of the time period nomenclature are two-fold. First, it provides a foundation for interdisciplinary discourse, in which the genetic diversity can be contextualized and explained using relevant historical records. Second, this system effectively categorizes the historical outbreaks of plague recorded in Europe. This can be seen in Figure 3.1, where the Bronze Age strains (0.PRE) in Europe are replaced by those of the First Pandemic (0.ANT4), which in turn are replaced by strains of the Second Pandemic (1.PRE). However, this “strength” comes at a cost, as this system is far less effective in describing plague populations outside of Europe and incurs two significant risks.

The first risk is artificially grouping unrelated populations. Contemporaneous strains can have distinct evolutionary histories [56] even when originating from the same plague foci. The Pestoides (0.PE) and Medievalis (2.MED) biovars are informative examples, as these populations have co-existed in the Caucasus mountains since at least the 20th century (Figure 3.1 C). The second risk is artificially separating related populations. The Second and Third Pandemics were previously seen as mutually exclusive events dated to the 14^th to 18^th century, and the late 19^th to mid-20^th century respectively [57]. Recent historical scholarship has contested this claim and demonstrated that these dating constraints are a product of a Eurocentric view of plague [41]. The Second Pandemic is now recognized to have extended into at least the 19^th century [4,58] and the Third Pandemic is hypothesized to have began as early as the 18^th century [42,59]. Phylogenetic analysis reveals genetic continuity between these two events, as the Third Pandemic (1.ORI) is a direct descendant of the Second Pandemic (1.PRE) [60]. What remains unknown is the extent of temporal overlap, and as such, it is unclear how to distinguish these pandemics using genetic evidence.

A final limitation is that several populations are curiously excluded from the pandemic nomenclature altogether. For example, Branch 2 populations emerged at the same time as, but separate from, the Second Pandemic and have been associated with high mortality epidemics [61]. In particular, the Medievalis population (2.MED) has dispersed throughout Asia (Figure 3.1) with the fastest spread velocity of any Y. pestis lineage [42]. Despite its epidemiological significance, Branch 2 populations across Asia continue to be overlooked in the pandemic taxonomy of Y. pestis. As ancient DNA sampling strategies expand in geographic scope, and as more non-European historical sources are brought to bear, it will be important to consider how best to refashion the historical period nomenclature to encompass this diversity.

3.3.1.4 Integrative Approach

There exists no current classification system which comprehensively represents the global population structure of Y. pestis. Instead, integrative approaches have been previously used in large comparative studies of Y. pestis [20,62]. We therefore take the intersection of the three taxonomic systems discussed previously and describe 12 populations for further statistical analysis (Figure 3.1, Table S1, Table ??). In the following sections, we highlight the novel insight and issues that arise when this population structure is explicitly incorporated into molecular clock models and phylogeographic reconstructions.

3.3.2 Estimating Rates of Evolutionary Change

The extent of rate variation present in our updated genomic data set is notably larger than those depicted in previous studies [19,63] . A root-to-tip regression on sampling age reproduces the finding that substitution rates in Y. pestis are poorly represented by a simple linear model or “strict clock” (Figure 3.2 A). We found a very low coefficient of determination (R²=0.09) that indicates a large degree of unaccounted variation. This finding suggests that evolutionary change in Y. pestis may be more appropriately estimated using a “relaxed clock”, where rate variation is explicitly modeled. To test this hypothesis, we performed a Bayesian Evaluation of Temporal Signal (BETS) [64]. In brief, this method tested four model configurations including: (1) a strict clock, (2) a relaxed clock, (3) the true sampling ages, and (4) no sampling ages. Configurations with no sampling ages explicitly test for the presence of temporal signal. A comparison of the model likelihoods, or Bayes factors, was then used to assess the degree of temporal signal.

BETS was inconclusive when attempting to fit a single clock to the updated global diversity of Y. pestis. The Markov chain Monte Carlo (MCMC) inference exhibited poor sampling of parameter space (effective sample size, ESS < 200) across all model configurations, even when we reduced sources of variation by removing tip date uncertainty, fixing the tree topology, and removing up to 70% of the genomes. The poor performance of a single clock model is consistent with several other studies, in which low ESS values were observed [44] and divergence dates could not be estimated [21]. A single clock model is not a viable approach for Y. pestis, as there is excessive rate variation across the global phylogeny, which likely explains node-dating disparities between previous studies [20,44,62].

In contrast to the single clock approach, we observed substantial improvements when each population was assessed independently. All model parameters in our Bayesian analysis demonstrated MCMC convergence with ESS values well above 200 and we detected temporal signal in 9 out of 12 Y. pestis populations (Table S2). Several of these appeared more clock-like than others, which was observed through the root-to-tip regression and Bayesian rate estimation. For example, we found rate variation to be low in the Bronze Age (R²=0.92), moderate in the Second Pandemic (R²=0.76) and high in Medievalis (R²>=0.02). Our results indicate that population specific models are a more effective approach for estimating substitution rates across the global phylogeny.

To demonstrate the application of our molecular clock method and the interpretive consequences, we explored three outcomes as case studies. First, as a control, we examined Y. pestis populations that had (i) no temporal signal. These “negative” cases inform us about the minimum sampling time, or phylodynamic threshold, required to obtain robust temporal estimates in Y. pestis. Second, we examined populations with (ii) irreproducible estimates between studies, such as the time to most recent common ancestor (tMRCA). We discuss how sampling bias drives this outcome, and how it can be identified and corrected with ancient DNA calibrations where available. Finally, we identify the populations with the most (iii) informative rates and dates. We discuss how these molecular dates change our understanding of pandemic “origins” and complement recent historical scholarship.

3.3.2.1 No Temporal Signal

We found several Y. pestis populations with no detectable temporal signal that include the Intermedium (1.IN) and Antiqua biovars (2.ANT, 3.ANT). Despite being sampled over a period as long as 84 years (2.ANT), these populations have not accumulated sufficient evolutionary change to yield informative divergence dates. This limited diversity is identifiable in the maximum likelihood phylogeny as populations with the highest density of nodes sitting close to their roots (Figure S6). Out of caution, we also consider the rates and dates associated with the Antiqua population 4.ANT to be non-informative, as it has a similar node distribution, with a smaller number of samples (N=12) collected over an even shorter time frame (38 years).

Our results show that for robust temporal estimates to be obtained, Y. pestis must be sampled over multiple decades at minimum. This time frame is largely consistent with the finding that Y. pestis has one of the slowest substitution rates observed among bacterial pathogens [22]. Here we found that all populations had a median rate of less than 1 substitution per year (Figure 3.2 C, Table S4), with the lowest rate in Antiqua (0.ANT) at 1 substitution every 14.1 years (1.7 x 10^-8 subs/site/year) and the highest rate in Pestoides (0.PE) at 1 substitution every 1.1 years (2.1 x 10^-7 subs/site/year). In application, this means that Y. pestis lineages often cannot be differentiated until at least several decades have passed, a concept referred to in the literature as the phylodynamic threshold [65].

The phylodynamic threshold has been rigorously explored in other pathogens, such as SARS-CoV-2 [66], but not explicitly in Y. pestis. The challenges in reconstructing intra-epidemic plague diversity have been noted previously. For example, several isolates from the Second Pandemic dated to the medieval Black Death (1346-1353) are indistinguishable clones [60], extinguishing any hope of reconstructing its spread from genetic evidence alone. Our median rate estimation for the Second Pandemic (1.PRE) of 1 substitution every 9.5 years (2.5 x 10^-8 subs/site/year) is congruent with this finding. The clonal nature of the Black Death is not an exceptional event, but rather the norm based on the sampling time frame. Our results highlight a significance limitation and cautionary note for plague research, as genetic evidence alone is not suitable for reconstructing the timing of short-term, episodic epidemics.

3.3.2.2 Irreproducible Estimates

We observed two populations with detectable temporal signal associated with substantial node-dating conflicts: the Pestoides (0.PE) and Antiqua (0.ANT) biovars: both of which are paraphyletic. Conflicts were identified by comparing their estimated time to the most recent common ancestor (tMRCA) to that of their descendant populations. For example, the First Pandemic (0.ANT4) is a descendant clade of the larger Antiqua (0.ANT) population based on the maximum likelihood phylogeny (Figure 3.3). We would expect the tMRCA of the ancestral 0.ANT to pre-date the First Pandemic, for which ancient DNA calibrations are available. However, the tMRCA of 0.ANT is far too young (95% HPD: 1357 - 1797 CE), and incorrectly post-dates the tMRCA of 0.ANT4 (95% HPD: 39 - 234 CE) by more than a millennium. This outcome is somewhat paradoxical, as these populations have robust temporal signal and yet a critical examination of their divergence dates reveals they are unreliable.

This conflicting pattern has been previously described and attributed to sampling bias [67,68], specifically, insufficient sampling of basal branches and the presence of extensive rate variation. The two affected populations, Antiqua (0.ANT) and Pestoides (0.PE), have a low density of nodes at their roots in the maximum likelihood phylogeny (Figure S6). This pattern is also observed in another Antiqua population (1.ANT) which has a small sample size (N=4) and has previously been linked to rate acceleration events [20]. The dates associated with these three populations (0.PE, 0.ANT, 1.ANT) should be considered non-informative.

These node-dating issues reveal a clear limitation in our approach of estimating divergence times from population-specific models. Defining Y. pestis population by time periods has adverse effects, as ancient plague genomes can serve as crucial calibration points for rate changes that are otherwise unsampled in extant populations. In populations with poorly sampled basal branches (0.PE, 0.ANT, 1.ANT) we expect an optimization approach to be more ideal, in which a few closely related populations are merged or select ancient DNA calibrations are introduced [56]. Otherwise, divergence dates in these populations tend to be overly young, sometimes by more than a 1000 years [63], and are difficult to replicate between studies (Table ??).

The inability to infer divergence dates due to sampling bias also has several historical implications. Perhaps the most significant concerns the emergence of plague in Africa which makes up 90% of all modern plague cases [69], yet for which there remains not a single ancient sequence. Little progress has been made in sampling extant African plague diversity, with this region represented by only 1.5% (9/601) of all genomes. Furthermore, the oldest genetic evidence of African plague comes from the 1.ANT population, which has only four representative strains. Despite this sparse sampling, researchers have repeatedly attempted to use genomic evidence to date the first appearance of Y. pestis in Africa [20,62,63]. The result is a complete lack of congruent dates for this event, as the majority of tMRCA estimates for 1.ANT do not overlap (Table ??). These divergence dates are of limited value for historical interpretation [63,70,71] and should be treated with great skepticism.

3.3.2.3 Informative Rates and Dates

Excluding populations with no detectable signal, we identified five populations with potentially informative rates and dates. These include the Bronze Age (0.PRE), Medievalis (2.MED), the First Pandemic (0.ANT4), the Second Pandemic (1.PRE), and the Third Pandemic (1.ORI). The Bronze Age marks the first identified appearance of Y. pestis in humans, and the three pandemics, along with Medievalis, are historically associated with high mortality and rapid spread [42]. Due to this epidemiological significance, these five populations have been sampled over the longest time frames, ranging from 92 years for the Third Pandemic (1.ORI) to 1250 years for the Bronze Age (1.PRE). This affirms the importance of long-term heterochronous sampling for Y. pestis, made possible through the retrieval of ancient DNA [1] and recent sequencing of early 20^th century culture collections [61]. By curating and contextualizing this new heterochronous data, we were able to detect temporal signal in extant Y. pestis populations without the use of ancient DNA calibrations for the first time.

Our estimates of the tMRCA for the First and Second Pandemics share a common theme, in that the genetic origins potentially pre-date the appearance of plague in traditional (ie. European) historical narratives. For example, the earliest textual evidence of the Second Pandemic (1.PRE) in Europe comes from the Black Death (1346) [43]. However, we estimate the mean tMRCA of this population to be earlier, between 1214 and 1315 CE. Similarly, the first recorded outbreaks of plague during the First Pandemic (0.ANT4) come from the Plague of Justinian (541 CE) [72]. Instead, we estimate that the strains of Y. pestis associated with this pandemic shared a common ancestor between 272 and 465 CE.

One explanation for these disparate timelines is sampling bias, as western European sources dominate both the genetic and historical record. Recent historical scholarship has contested Eurocentric timelines [4,73] by demonstrating the presence of plague in western Asia far earlier than previously thought. Arabic historical chronicles suggest that the Second Pandemic may have begun as early as the 13^th century [74]. Genetic dating appears to support these historical critiques, by expanding the timelines of past pandemics to make space for more diverse historical narratives. An alternative explanation for our earlier dates is tip date uncertainty. The radiocarbon estimates for the majority of ancient Y. pestis samples have confidence intervals of ±50 years or more. As we only used the mean sampling age for molecular clock models, it’s possible that the true tMRCA intervals are larger and do overlap with historical estimates. How much uncertainty can be included in molecular clock models for Y. pestis, while still achieving convergence of parameter estimates, remains to be tested.

In contrast to the ancient pandemics, our temporal estimates of the Third Pandemic were more closely aligned to the historical evidence. We estimated that isolates from the Third Pandemic (1.ORI) shared a common ancestor between 1806 and 1901 CE, which aligns well with the timeline as reconstructed from epidemiological reports. Highly localized plague cases began appearing in southern China and (1772-1880) and later spread globally out of Hong Kong (1894-1901) [42,75,76]. Our estimate also overlaps with the majority of previous studies, although it is the youngest tMRCA to date (Table ??). This comparison demonstrates the reproducibility of our estimate, but also reveals how the “origin” story of the Third Pandemic continues to change. The phylogenetic root once estimated to be as old as 326 BCE [62] is now resolved to be much younger (19^th century CE). This younger date is particularly intriguing, as a major epidemiological transition occurred in the 19^th century with the reemergence of several other notable pathogens [78]. Reconstructing the evolutionary history of the Third Plague Pandemic may not only inform us about the epidemiology of plague, but contribute to a broader understanding of the factors that led to reemerging diseases in the modern era [79].

Even less is known about the Medievalis population whose strains were hypothesized to be responsible for plague outbreaks in the Caspian Sea region which reoccurred throughout the 19^th and 20^th centuries [61]. We estimated the tMRCA of Medievalis (2.MED) to be between 1560 and 1845 CE, which overlaps with all previously published estimates (Table ??). While this population was once thought to have emerged as early as 583 BCE, there is now growing consensus that the earliest possible emergence was in the 16^th century CE. Interestingly, the Caspian Sea region appears to be a nexus of plague as the only known area where the distributions of both European and Asian Y. pestis strains overlap (Figure 3.4). This raises the interesting possibility that distinct populations of Y. pestis were co-circulating during the Second Pandemic, a hypothesis that ancient DNA from Medievalis could help elucidate. In the absence of direct genetic sampling, an alternative approach is to infer the ancestral locations and spread of plague using phylogeographic analysis.

3.3.3 Estimating Ancestral Locations and Spread

Phylogeographic inference relies heavily on the degree of geographic signal in the data. To assess this in Y. pestis, we tested whether phylogenetic relationships correlate with sampling locations. We identified the closest genetic relative of every genome in our data set, using the maximum likelihood phylogeny. We then recorded whether these genomes were collected from the same location at three levels of resolution: (1) continent, (2) country, and (3) province. As a statistical measure of geographic structure, we reported the percent of genomes that had a closest relative sampled from the same location.

The majority of Y. pestis populations (6/12) were localized to a single continent (Figure S2). Of those distributed across multiple continents, geographic structure ranged from 76% to 99%. At the country level, the degree of geographic structure dropped drastically in some populations (Bronze Age 0.PRE: 38%) while remaining stable in others (3.ANT: 100%). The inverse of this pattern appeared at the province level, where Antiqua (3.ANT) dropped to 45% while the Bronze Age (0.PRE) was unchanged. As expected, geographic structure decreases at finer resolutions but the extent to which varies by population.

The factors which appeared to govern these patterns are wide-ranging, but primarily concern mobility. One striking aspect is the difference in host composition between populations driving this signal. We observed a correlation (R²=0.43) between the degree of geographic structure and the percentage of samples collected from a non-human host (Figure 3.5). Populations primarily sampled from rodents and arthropod vectors had far more geographic structure than those found exclusively in humans. This is epidemiologically consistent with the greater mobility of human populations, which disrupt geographic clustering via long-distance spread.

Another factor is the difference in substitution rates relative to migration rates. In populations that are spread faster (locations/year) than they evolve (substitutions/year), geographic structure decreases as identical isolates (clones) are found in different locations. Lineages of the Second Pandemic (1.PRE) exemplify this, as clonal isolates have been sampled across multiple countries [19], leading to uncertainty regarding the routes of spread. Along similar lines is the disparity in the sizes of locations across the globe used in the analyses. China, for example, is approximately the same size (0.94) as the European continent. Thus, at the country level, plague populations in Asia are sampled over fewer locations and have stronger geographic structure. An informative comparison is Antiqua (3.ANT) with 100% structure across China and Mongolia, and the Second Pandemic (1.PRE) with 55% structure across 11 European countries. Plague populations that are distributed over multiple, smaller locations have less geographic structure, leading to greater uncertainty when inferring past migrations.

These observations suggests that phylogeographic inference is best suited to populations that are slow-spreading and/or rapidly-evolving, a significant problem for Y. pestis, as it is both a rapidly-spreading and slow-evolving pathogen [22]. To explore how this feature impacts our ability to infer ancestral locations for Y. pestis in the past, we independently fit three discrete migration models [80] to the maximum likelihood phylogeny using the sampling locations by: (1) continent, (2) country, and (3) province. For each internal node, we extracted the ancestral location with the highest likelihood given the data. To explore whether genomic evidence can provide meaningful geographic estimates, we compared two case studies: the Third Pandemic, which serves as a “control” for our phylogeographic analysis, and the Second Pandemic, where the origins and spread remain contentious due to limited non-European historical evidence.

3.3.3.1 The Third Pandemic (19^th - 20^th Century CE)

The re-emergence of plague during the Third Pandemic was closely monitored by contemporary researchers [81]. As a result, the geographic origins are well-documented and firmly established. Highly localized plague cases first appeared in Yunnan, China (1772-1800), later spreading throughout the province (1800-1880) [42,76]. Plague then traveled eastwards to the coast (1880-1900), where it dispersed globally out of Hong Kong (1894-1901) [82].

We estimated that the Third Pandemic (1.ORI) diverged from an ancestral Intermedium (1.IN) population located in Yunnan, China (probability: 1.00) (Figure 3.6, Figure S3). Plague then rapidly diversified (reflected by a polytomy), after which new lineages appeared in North America (probability: 0.99), South America (probability: 1.00), and Africa (probability 1.00). Due to the unresolved branch structure, we could not confidently estimate the routes of this dispersal. The migrations that could be reconstructed all occurred post this radiation, and included endemic cycling in Southeast Asia (China, Myanmar) as well as North America (USA), which led to a re-introduction of plague into South America (Peru).

The strength and specificity of our estimated origin is striking, given that we could not confidently locate the ancestral divergence for any other population (Figure S3). This may be because the Third Pandemic (1.ORI) is a direct descendant of the Intermedium (1.IN) population, which has strong geographic structure at the province level (87%). In addition, isolates from Yunnan fall both basal to, and within, the known diversity of the Third Pandemic (1.ORI). This combination provides strong evidence of the geographic origin, which is congruent with the historical narrative. This level of precision was only possible due to the extensive sampling of non-human hosts. Yunnan is solely represented by rodent and arthropod samples (N=18) and therefore this reservoir would be entirely invisible if only human isolates were used. Like others [50], we caution that the presence and location of rodent reservoirs should not be inferred from phylogenetic evidence alone. Instead, new modeling approaches have been developed [83] that could leverage multi-disciplinary sources [76] to correct for sampling biases in the genomic data.

3.3.3.2 The Second Pandemic (14^th - 19^th Century CE)

In comparison to the Third Pandemic, there is far less surviving historical evidence from the Second Pandemic. Historians have identified early accounts of plague appearing in 1346 in the Golden Horde, which encompass Central Asia and Eastern Europe [2]. The disease then appears to have spread southward through the Caucasus to reach Western Asia, and westward to the Crimea, from which it dispersed throughout Europe, the Middle East, and North Africa. Plague reoccurred for several centuries in these territories, with successive waves varying in scale from localized epidemics to continent-wide outbreaks [43]. In Western Europe, plague receded after 1720 and would not re-emerge again until 1899 [84], while in Western Asia the disease never disappeared [4].

We estimated that the Second Pandemic (1.PRE) diverged from an ancestral population located in China (probability: 0.93) as part of the “Big Bang” polytomy. The ancestral province was poorly resolved, with the most likely location being near Xinjiang (probability): 0.64) which includes the Tien Shan mountains. The location of the Second Pandemic MRCA was also uncertain and estimated to be in Russia (probability: 0.63), specifically in Tatarstan (probability: 0.37) which was part of the Golden Horde. However, these low likelihoods indicate that our estimated origins are poorly supported by the data. With regards to spread, only four migrations could be confidently inferred (likelihood > 0.95) across the full sampling time frame (500 years). The available genetic data therefore provides little definitive evidence as to the spread of plague during the Second Pandemic.

This then begs the question of whether more ancient DNA samples will improve these geographic estimates? As it currently stands, the relationships between all countries could not be resolved during the 14^th century, nor among the Baltic states sampled in the 15^th century, or between England and Russia in the 17^th century. Furthermore, the historical evidence indicates that plague often spread to multiple countries, if not continents, in the span of a decade [85]. This migration rate is far higher than the substitution rate of the Second Pandemic (1.PRE), which accumulates 1 mutation every 9.5 years. The genomic data alone does not have sufficient resolving power to reconstruct the spread of short-term, episodic waves of plague. Instead, this evidence is best used in conjunction with higher-resolution evidence, such as historical case records [67,86].

3.4 Conclusions

We sought to contribute to five lines of debate concerning the evolutionary history of Yersinia pestis. The first, was whether Y. pestis has sufficient temporal signal (ie. molecular clock) to accurately estimate rates and dates. Accordingly, we found that a species-wide clock model was methodologically unstable and did not lead to reproducible estimates. However, we observed significant improvements when each Y. pestis population was assessed independently. We therefore recommend this approach for future studies, as the full global diversity of Y. pestis can be utilized without down-sampling.

Second, we explored the minimum sampling time frame for Y. pestis that yields informative rates and dates. The lowest substitution rate was observed in Antiqua (0.ANT) with a median rate of 1 substitution every 14.1 years, meaning that some Y. pestis lineages cannot be differentiated until several decades have passed. In addition, we found no temporal signal in several populations (1.IN, 2.ANT, 3.ANT) which have been sampled over a period as long as 84 years. Genetic evidence alone may not be suitable for reconstructing the timing of short-term, epidemic events of plague.

In the third instance we tackled node dating disparities between studies. We explored how phylogenetic sampling bias drives this outcome, and how it can be detected and remedied with ancient DNA calibrations. In particular, we focused on the non-overlapping tMRCA estimates of the first appearance of Y. pestis in Africa (1.ANT). Until sampling strategies diversify, we caution that the published divergence dates for this population, and several others (Antiqua 0.ANT, Pestoides 0.PE), are of limited value for historical interpretation.

The fourth point revolved around the timing of past pandemics. We observed a common theme in which the genetic dates (tMRCAs) of pandemic Y. pestis potentially pre-date the historical dates by several decades or centuries. For example, we estimated the tMRCA of the Second Pandemic to be between 1214 and 1315 CE which pre-dates the Black Death (1346 - 1353 CE). Similarly, we estimated the tMRCA of the First Pandemic to be between 272 and 465 CE, also pre-dating the Plague of Justinian (531 CE). We discussed this disparity in light of methodological concerns such as radiocarbon dating uncertainty and geographic sampling biases that have historically favored European sources. We anticipate that additional samples from non-European locations will add greater clarity to how the timelines of past pandemics can be expanded to include more diverse historical narratives.

Finally, we asked whether Y. pestis has sufficient geographic signal to accurately infer ancestral locations and spread. As expected, geographic structure diminished at finer resolutions but also varied by population. We found that the geographic origins of the Third Pandemic (1.ORI) were unambiguously inferred to be in Yunnan province, China (likelihood=1.00) and attributed this to comprehensive sampling of rodent reservoirs. In contrast, we demonstrated how the origins and spread of the Second Pandemic (1.PRE) cannot be resolved from genetic evidence alone, as this population is exclusively sampled from human remains which have high mobility. In isolation, Y. pestis genomic evidence may be unsuitable for inferring point migrations and the directionality of spread. Alternatively, new methods which incorporate non-genetic evidence, such as outbreak case-occurrence records [67], into phylogeographic analysis presents an exciting avenue for interdisciplinary collaboration, as explicitly integrative models will complement the strengths of genetic and historical evidence, while mitigating their respective weaknesses.

3.5 Methods

A visual overview of the computational methods is provided in Figure S7 and is publicly available as a snakemake pipeline (https://github.com/ktmeaton/plague-phylogeography/).

3.5.1 Data Collection

Y. pestis genome sequencing projects were retrieved from the National Centre for Biotechnology Information (NCBI) databases using NCBImeta v0.7.0 [87]. 1657 projects were identified and comprised three genomic types. 1473 projects came from isolates sampled during the 20^th and 21^st centuries, which we label as “modern”. Of these, (i) 586 projects were available as assembled genomic contigs (FASTA), and (ii) 887 were only available as unassembled sequences (FASTQ). An additional (iii) 184 projects came from skeletal remains with sampling ages older than the 19^th century, which we label as “ancient”. The 887 modern unassembled genomes were excluded from this project, as the wide variety of laboratory methods and sequencing strategies precluded a standardized workflow. In contrast, the 184 ancient unassembled genomes were retained given the relatively standardized, albeit specialized, laboratory procedures required to process ancient tissues.

Collection location, date, and host metadata were curated by cross-referencing the original publications. Locations were transformed to latitude and longitude coordinates using GeoPy v2.0.0 and the Nominatim API (https://github.com/osm-search/Nominatim) for OpenStreetMap. Coordinates were standardized at the level of country and province/state, using the centroid of each. Collection dates were standardized according to their year and recording uncertainty arising from missing data and radiocarbon estimates. Genomes were removed if no associated date or location information could be identified in the literature, or if there was documented evidence of laboratory manipulation.

Two additional data sets were required for downstream analyses. First, Y. pestis strain CO92 (GCA_000009065.1) was used as the reference genome for sequence alignment and annotation. Second, Yersinia pseudotuberculosis strains NCTC10275 (GCA_900637475.1) and IP32953 (GCA_000834295.1) served as an outgroup to root the maximum likelihood phylogeny.

3.5.2 Sequence Alignment

Modern assembled genomes were aligned to the reference genome using snippy v4.6.0 (https://github.com/tseemann/snippy), a pipeline for core genome alignments. Default parameters were used, along with the following minimum thresholds: depth of 10X, base quality of 20, mapping quality of 30, major allele frequency of 0.9. Modern genomes were excluded if the number of sites covered at a minimum depth of 10X was less than 70% of the reference genome.

Ancient unassembled genomes were downloaded from the SRA database in FASTQ format using the SRA Toolkit. Pre-processing and alignment to the reference genome was performed using the nf-core/eager pipeline v2.2.1, a reproducible workflow for ancient genome reconstruction [88]. Default parameters were used, along with the following minimum filters: read length of 35 bp, an edit distance of 0.01, and a 16 bp seed length. Only merged reads were retained from paired end-sequencing projects. Ancient genomes were removed if the number of sites covered at a minimum depth of 3X was less than 70% of the reference genome.

A multiple sequence alignment was constructed using the snippy core module of the snippy v4.6.0 pipeline. The output alignment was filtered to only include chromosomal sites that were present in at least 95% of samples (ie. a missing data threshold of 5%). The filtered alignment included 10,249 variant positions exclusive to Y. pestis, with 3,844 sites shared by at least two strains

3.5.3 Maximum Likelihood Phylogenetic Analysis

Model selection was performed on the full data set (N=601) using Modelfinder [89] which identified the K3Pu+F+I model as the optimal choice based on the Bayesian Information Criterion (BIC). The K3P model, also known as K81, estimates substitution rates using three categories, in this case: (1) A<->C equals G<->T, (2) A<->G equals C <->T, and (3) A<->T equals C<->G). The “u+F”suffix indicates that base frequencies will be empirically evaluated and thus are not assumed to be equal. The “+I” suffix indicates that a proportion of the alignment includes invariable sites (ie. non-SNPS),

A maximum likelihood phylogeny was estimated for this data across 10 independent runs of IQTREE2 [90]. Branch support was evaluated using 1000 iterations of the ultrafast bootstrap approximation [91], with a threshold of 95% required for strong support.

3.5.4 Data Partitions

The full multiple sequence alignment was alternatively split into 12 populations, referred to as the population data sets. These populations were defined by the intersection of the following nomenclature systems: the major branches, metabolic biovars, and historical time period (Table S1). One sample was excluded, a Pestoides isolate from the Bronze Age (Strain RT5, BioSample Accession SAMEA104488961), as this population would be of size N=1.

In an attempt to improve the performance and convergence of molecular clock analyses, a subsampled data set was also constructed. Populations that contained multiple samples drawn from the same geographic location and the same time period were reduced to one representative sample. The sample with the shortest terminal branch length was prioritized, to diminish the influence of uniquely derived mutations on the estimated substitution rate. An interval of 25 years was identified as striking an optimal balance, resulting in 191 samples, which is a 68% reduction from the original data set.

3.5.5 Estimating Rates of Evolutionary Change

To explore the degree of temporal signal present in the data, two categories of tests were performed. The first was a root-to-tip (RTT) regression on the mean sampling age using the statsmodels python package. Given the relative simplicity of a regression model, the full data set of 601 genomes was used.

For the second test of temporal signal, a Bayesian Evaluation of Temporal Signal (BETS) was conducted. This consisted of running four model configurations: either with or without sampling dates, and under a strict or uncorrelated lognormal relaxed clock models (strict and UCLN, respectively). We calculated the log marginal likelihood under each model configuration using stepping-stone sampling as implemented in BEAST v1.10 [92]. To this end, we ran 200 path steps, each with a Markov chain Monte Carlo (MCMC) of length 10⁶ steps. In addition to the clock model we used a constant-size coalescent tree prior, a GTR+gamma nucleotide substitution model.

Importantly, the models involved priors that were proper for all parameters, which is essential for marginal likelihood calculations [93]. In particular, the molecular clock rate (ie. the mean of the UCLN clock model or the global rate of the strict clock) had a continuous time Markov chain reference prior [94], the population size of the constant-size coalescent an exponential prior distribution with mean 10, and the standard deviation of the UCLN had an exponential prior with mean 0.33. Marginal likelihood estimation with stepping- stone sampling does not require from the posterior distribution. To obtain the posterior distribution we used an MCMC of 10⁹ steps, sampling every 10³ steps. For situations where the effective sample size (ESS) of any parameters was below 200 we increased the chain length by 50% and reduced sampling frequency accordingly.

3.5.6 Estimating Ancestral Locations and Spread

To explore underlying phylogeography, we performed ancestral state reconstruction using the maximum likelihood method implemented in TreeTime [80]. We independently fit three discrete mugration models to the maximum likelihood phylogeny using the sampling locations by: (1) continent, (2) country, and (3) province. The mapping of countries to continents was defined according to the open-source resource GeoJSONRegions (https://geojson-maps.ash.ms/). For each internal node, we extracted the ancestral location with the highest likelihood given the data.

We also conducted a discrete trait analysis in BEAST [92,95]. Country of sample origin was chosen as the discrete trait of interest. A coalescent constant population size tree prior was chosen with an exponential prior placed on the effective population size with mean 100000. We modeled evolutionary rate with an uncorrelated relaxed lognormal clock, with a CTMC scale prior on the mean and exponential prior with mean 1/3 on the standard deviation of the underlying lognormal distribution [96]. A GTR+gamma nucleotide substitution model with estimated base frequencies for 1.ORI, 1.PRE, 0.ANT4, and 0.PRE. The same settings were used for 2.MED with the exception of swapping the GTR+gamma model to an HKY+gamma model. MCMC chains were run for 10⁷ steps with sampling every 10³ steps. We used logCombiner to combine between 3-5 replicate runs, with 10% burnin, for each clade to achieve ESS above 200 for each parameter and Maximum Clade Credibility (MCC) trees [97].

3.5.7 Visualization

Data visualization was performed using the python package seaborn [98] and Auspice [99], a component of the Nextstrain visualization suite.

3.6 Acknowledgments

This work was supported by the Social Sciences and Humanities Research Council of Canada (#20008499), CIFAR humans and the microbiome program (HP), and the MacDATA Institute (McMaster University, Canada). This research was enabled in part by support provided by Compute Ontario (https://www.computeontario.ca/) and Compute Canada (https://www.computecanada.ca). We would like to thank Madeline Tapson, Dr. Dan Salkeld, and Dr. Jennifer Klunk for their expertise in contextualizing the ecology and evolutionary history of plague. We also thank Jessica Hider and Marie-Hélène B.-Hardy for discussions on the interpretation of genomic data from zoonotic pathogens. We are indebted to Dr. Ana Duggan and Dr. Emil Karpinski for their insight regarding Bayesian methods for phylogenetic analysis. We thank members of the Sherman Centre for Digital Scholarship, including Dr. Andrea Zeffiro, Dr. John Fink, Dr. Matthew Davis, and Dr. Amanda Montague, for their assistance in developing the genomic database. Finally, we would like to thank all past and present members of the McMaster Ancient DNA Centre and the Golding Lab at McMaster University.

3.7 Author Contributions

K.E, G.B.G, and H.N.P designed the study. K.E, L.F., and S.D performed computational analysis. A.G.C and N.V. provided historical sources and interpretation. E.C.H critiqued and revised the computational methods and discussion. G.B.G provided access to computational resources and data storage. H.N.P and G.B.G supervised the study. K.E wrote the manuscript with contributions from all co-authors.

3.8 Competing Interests Statement

3.9 Supplementary Information

3.9.1 Tables

3.9.2 Figures

4 Plague in Denmark (1000-1800): A longitudinal study of Yersinia pestis

Prepared 08 December 2021 for submission to
The Proceedings of the National Academy of Sciences

Katherine Eaton*^1,2, Ravneet Sidhu*^1,3, Jennifer Klunk^1,4, Julia Gamble⁵, Jesper Boldsen⁶, Ann G. Carmichael⁷, Nükhet Varlık⁸, Sebastian Duchene⁹, Leo Featherstone⁹, Vaughan Grimes¹⁰, G. Brian Golding³, Sharon DeWitte¹¹, Hendrik N. Poinar^1,2,12,13,14

¹McMaster Ancient DNA Centre, McMaster University, Hamilton, Canada.
²Department of Anthropology, McMaster University, Hamilton, Canada.
³Department of Biology, McMaster University, Hamilton, Canada.
⁴Daicel Arbor Biosciences, Ann Arbor, USA.
⁵Department of Anthropology, University of Manitoba, Winnipeg, Canada.
⁶Department of Forensic Medicine, Unit of Anthropology (ADBOU), University of Southern Denmark, Odense, Denmark.
⁷Department of History, Indiana University Bloomington, Bloomington, USA.
⁸Department of History, Rutgers University-Newark, Newark, USA.
⁹The Peter Doherty Institute for Infection and Immunity, University of Melbourne, Melbourne, Australia.
¹⁰Department of Archaeology, Memorial University of Newfoundland, St. Johns, Canada.
¹¹Department of Anthropology, University of South Carolina, Columbia, USA.
¹²Department of Biochemistry, McMaster University, Hamilton, Canada.
¹³Michael G. DeGroote Institute of Infectious Disease Research, McMaster University, Hamilton, Canada.
¹⁴Canadian Institute for Advanced Research, Toronto, Canada.

4.1 Abstract

The epidemiology of plague in the past is highly controversial, owing to the scarcity and ambiguity of historical evidence. A frequent source of debate is the re-emergence and continuity of plague in Europe during the 14^th to 18^th centuries CE. Scandinavia is particularly underrepresented in the historical archives, despite having a uniquely long history of plague (5000 years) as revealed through ancient DNA analysis. To better understand the historical epidemiology of plague in this region, we performed in-depth (N=298), longitudinal screening (800 years) for the plague bacterium, Yersinia pestis, across 13 archaeological sites in Denmark. We captured the emergence and continuity of Y. pestis in this region over a period of 400 years (14^th - 17^th century CE), for which the plague-positivity rate was 8.3% (3.3% - 14.3% by site). These results deepen the epidemiological link between the plague bacterium, Y. pestis, and the unknown pestilence that afflicted medieval and early modern Europe. Furthermore, this study paves the way for the next generation of historical disease research, in which hypotheses concerning mortality can be tested using population-scale, genomic evidence from ancient pathogens.

4.2 Introduction

Europe endured a 500-year long pandemic from the 14^th to 18^th centuries CE [19]. During this period, mysterious outbreaks reoccurred every 10 years with mortality estimates as high as 50% during the infamous Black Death (1346-1353) [3]. Paleogeneticists have increasingly identified the plague bacterium Yersinia pestis as the most likely agent, although the epidemiology of this pandemic remains controversial [100]. The major source of debate concerns two aspects: mortality and spread. The ecology of Y. pestis is highly complex, and involves both zoonotic “spillover” from rodent populations as well as inter-human transmission [38]. As a result, both disease exposure and spread are known to vary between regions and over time [100]. These differences are challenging to reconcile, and have led to significant controversy concerning the location of plague reservoirs in the past [50].

Recent studies have explored this question by synthesizing genetic evidence [50] and historical records [39,47] across Europe. These sources have significant geographic gaps, such as the complete lack of evidence from Scandinavia in digitized databases [86]. This gap has been attributed to the sparseness of historical sources and ambiguity with regards to disease terminology during the medieval period [3]. However, recent ancient DNA research [101] has revealed that the history of plague in Scandinavia is among the oldest in the world, and established the presence of Y. pestis in Sweden 5000 years ago. This raises the possibility of long-term persistence of plague in Scandinavia, with Y. pestis re-emerging as a local, endemic disease.

To evaluate the possibility of undocumented plague persistence, we screened for the presence of Y. pestis in the Anthropological DataBase Odense University (ADBOU) collection. This extraordinary collection includes preserved and curated skeletal remains from over 16,000 Danish individuals, dated from the Viking Age to the Early Modern period. To ensure a wide variety of locations were represented, we sampled 298 individuals across 13 archaeological sites from the mainland (Jutland), as well as two islands (Funen and Lolland). Based on the skeletal dates, these individuals represent 800 years of population history (1000-1800 CE) which includes both the known pandemic period in Denmark (1350-1657) and the quiescent periods (1000-1350 CE, 1658-1800) for which no outbreaks of plague are historically documented [100].

4.3 Results and Discussion

We detected Y. pestis in 7 archaeological sites using PCR assays and targeted sequencing (Figure 4.1 A). Across the 7 sites, 8.3% of individuals (13/157) tested positive for Y. pestis, ranging from 3.3% at Ribe Lindegärden to 14.3% at Hågerup. This positivity rate could be considered an underestimate of the ‘true’ prevalence of Y. pestis in Danish populations, due to variable DNA preservation. On the other hand, it may be an overestimate due to the osteological paradox [102], in which mortality is selective and the deceased are not representative of the living population. While the exact extrapolation is unclear, our Y. pestis positivity rate (3.3 - 14.3%) does align with mortality estimates (5 - 20%) during the later epidemics of the medieval and early modern period [85,103].

Of the 13 plague-positive individuals, 9 had sufficient sequencing depth (>3X) of the Y. pestis chromosome for phylogenetic analysis (Figure 4.2 C). To estimate a time-scaled phylogeny and dates for these 9 samples, we fit a relaxed molecular clock to an alignment of Y. pestis genomes which included 40 other isolates (Figure 4.1 B). We observed that all Danish strains clustered strongly (posterior: 1.0) within the known diversity of medieval and early modern Y. pestis in Europe (Figure 4.3). We found no evidence to suggest that Neolithic lineages of Y. pestis in Scandinavia (5000 YBP) [101] left descendants in medieval Danish populations. If long-term persistence of Y. pestis did occur in this region, it fell outside the geographic and temporal scope of this study.

We found no evidence of Y. pestis in Denmark between 1000 and 1300 CE. The factors influencing the preservation of ancient DNA are wide-ranging and complex, thus the absence of evidence cannot prove evidence of absence. That being said, we sampled a minimum of 85 individuals and a maximum of 165 individuals that pre-date the 14^th century (Figure 4.2 A). Taking the mean positivity rate observed in this study (8.3%), we would expect to detect Y. pestis in 7 to 13 individuals from this time frame if it were present. We therefore interpret our negative results from this period as tentative evidence that Y. pestis was a relatively new pathogen in medieval Denmark, that did not become abundant and/or widespread until at least the 14^th century.

The earliest evidence of Y. pestis in Denmark was found in the town of Ribe. Two individuals were associated with Y. pestis from the first half of the 14^th century, dated to 1333 (1301-1366) and 1350 (1319-1383). These estimates are highly congruent with the historical record, as the first documented appearance of plague in Denmark was at Ribe in 1349 [104]. Furthermore, these strains fell within an unresolved cluster (posterior: 0.15) of samples from Northern and Western Europe (Figure 4.3) which has previously been linked to clonal spread of the Black Death (1343-1356) [19]. Our molecular dates support this historical association, albeit only weakly, as the precise epidemic period cannot be resolved due to the large confidence intervals of our estimates (>50 years).

The next period in which we identified Y. pestis was in the latter half of the 14^th century. A third isolate from Ribe was dated to 1370 (1336-1408) and strongly clustered (posterior: 0.99) with post-Black Death samples from The Netherlands and Russia. These samples have previously been attributed to the pestis secunda (1357-1366) [105], although we find the pestis tertia (1364-76) [85] to be an equally likely candidate. This clade also has broader epidemiological significance, as it is directly ancestral to the Third Pandemic of plague (19^th-20^th century) [19]. Our results therefore reveal new global connections, as the same lineage that afflicted medieval Danish populations would later re-emerge to cause modern epidemics of plague, including the recent outbreaks in Madagascar [106].

We observed a gap in the continuity of plague at Ribe, as no Y. pestis was detected there between 1408 and 1484. This was surprising, as 86% of individuals (43/50) from this site were archaeologically dated to between 1400 and 1536. Instead, the distribution of Y. pestis appeared to shift during this period from the eastern coast of Jutland to the western coast. We recovered 3 distinct, and possibly contemporaneous, isolates of Y. pestis from 3 sites near Horsens dated to 1429 (1392-1467), 1433 (1403-1464) and 1457 (1427-1487). These genomes were most closely related to individuals sampled in Germany, Lithuania, Poland, and England (Figure 4.3). This geographic association parallels the historical record, in which outbreaks in Denmark coincided with those in the Baltic region [85]. However, recent studies have demonstrated that the directionality and spread of zoonotic diseases cannot be robustly inferred from genomic data alone [83,107]. Instead, our results establish an epidemiological link between Y. pestis and historical case records in Denmark, which could be jointly modeled with greater resolving power [67] in future work.

In the 16^th century, we once again observed Y. pestis at Ribe. We dated two Y. pestis isolates from this region to 1513 (1484-1546) and 1525 (1494-1560). Furthermore, we also found evidence of Y. pestis in the northern site of Faldborg dated to 1594 (1550-1649). As an estimate of plague’s disappearance (1649), this is congruent with the historical record which documents the last recorded outbreak of plague in Jutland to last from 1654-1657 [100]. We found no evidence of Y. pestis in Denmark after this point, specifically between 1649 and 1800 CE. However, no individuals definitively post-date 1649 CE, although this period could include a maximum of 70 individuals (Figure 4.2 A). We would therefore expect to detect Y. pestis in 0 to 2 individuals (3.3%) from this time frame if it were present. Our results do not differ from this expectation, and are therefore not informative with regards to the disappearance of Y. pestis in Denmark. To address this question, additional samples would be required from the 17^th and 18^th centuries.

4.4 Conclusion

This study marks the first population-level analysis of ancient Y. pestis, where we performed in-depth (N=298), longitudinal sampling (800 years) within a single country (Denmark). We describe the earliest known appearance of Y. pestis in Denmark (14^th century), and document the continuity of this pathogen in Scandinavia over a period of 400 years (17^th century). Furthermore, we provide the first positivity rates of historical plague from molecular evidence, as we detected Y. pestis in 8.3% of Danish individuals. Our phylogenetic analysis was highly congruent with the sparse textual evidence of pestilence in Denmark, with regards to the timing of outbreaks and geographic ties to the Baltic region. We also provide novel evidence of plague exposure among Danish populations, such as the site of Tirup, where there is no surviving historical evidence. These results are of importance for both researchers of plague and other infectious diseases, as they (1) illuminate undocumented pathogens in the historical record, (2) reveal new connections between our past and present experience of plague, (3) broaden our understanding of the epidemiology of re-emerging diseases.

4.5 Materials and Methods

We sampled 298 individuals across 13 archaeological sites in Denmark (Figure 4.2 A, Dataset S1). Site occupation dates spanned from the 11^th to the 19^th century CE. We estimated individual date ranges based on burial position, which was categorized according to cultural shifts that occurred in Denmark throughout the medieval and early modern period [108]. When the original stratigraphic context was preserved, we refined these individual estimates further. For individuals with ambiguous or conflicting archaeological estimates, we performed radiocarbon dating when additional destructive sampling was permitted.

DNA was extracted from teeth and dental pulp according to a specialized protocol for ancient DNA [109]. Reagent blanks were introduced as negative controls to monitor DNA contamination in subsequent steps. We screened for plague using a PCR assay that targets the pla virulence gene in Yersinia pestis [21]. Extracts showing amplification in at least 4/6 replicates were converted into paired-end sequencing libraries [110]. Targeted capture of the Y. pestis genome was performed using previously designed probes [21] and sequenced on an Illumina platform.

Sequenced molecules were aligned to a reference genome using the nf-core/eager pipeline [88]. To phylogenetically place these new samples, we downloaded a comparative dataset of 40 high-coverage Y. pestis genomes (>3X) dated between the 14^th and 18^th centuries. We then constructed a multiple alignment with the snippy pipeline (https://github.com/tseemann/snippy), which included 356 variation positions and 4,289,810 constant sites.

To tip-date each genome, we performed a Bayesian Evaluation of Temporal Signal (BETS) [64] with BEAST2 [111]. We assumed a constant population size and compared the use of a strict clock and an uncorrelated lognormal (UCLN) relaxed clock. Diffuse normal priors were constructed for all tip-dates, using the mean radiocarbon/mortuary date and half the uncertainty as the standard deviation. All Danish samples were assigned equivalent priors with a mean date of 1330 CE and a standard deviation of 115 years. Bayes factors were calculated by comparing the marginal likelihoods of each candidate model, as estimated with a generalized stepping stone (GSS) computation. The model with the highest marginal likelihood was then run for 100,000,000 generations to ensure the effective sample size (ESS) of all relevant parameters was greater than 200.

Data visualization was performed using the python package seaborn and auspice, a component of the Nextstrain visualization suite [99].

4.6 Data Availability

Raw sequence reads have been deposited in NCBI BioProject PRJNAXXXXX. Archaeological metadata is provided in the supplementary information (Dataset SI).

4.7 Acknowledgments

This work was supported by the Social Sciences and Humanities Research Council of Canada (#20008499) and the MacDATA Institute (McMaster University, Canada). This research was enabled in part by support provided by Compute Ontario (https://www.computeontario.ca/) and Compute Canada (https://www.computecanada.ca). We would like to thank Julianna Stangroom, Michael Klowak, Dr. Emil Karpinski, Dr. Matthew Emery, and Dr. Stephanie Marciniak for their assistance in laboratory procedures. We also thank Dr. Ana Duggan for her insight regarding Bayesian methods for phylogenetic analysis. We thank members of the Sherman Centre for Digital Scholarship, including Dr. Andrea Zeffiro, Dr. John Fink, Dr. Matthew Davis, and Dr. Amanda Montague, for their assistance in developing the underlying genomic database. Finally, we would like to thank all past and present members of the McMaster Ancient DNA Centre and the Golding Lab at McMaster University.

4.8 Author Contributions

K.E, R.S, J.K, and H.N.P designed the study. J.G, J.B, and S.D provided access to archaeological sites and materials. V.G performed radiocarbon dating. K.E, R.S, and J.K performed laboratory analysis. A.G.C and N.V. provided historical sources and interpretation. S.D and L.F critiqued and revised the computational methods and discussion. G.B.G provided access to computational resources and data storage. H.N.P and G.B.G supervised the study. K.E wrote the manuscript with contributions from all co-authors.

4.9 Competing Interests Statement

5 Conclusion

5.1 Main Findings and Contributions

In this dissertation, I developed computational methods for genomics research and used them to reconstruct past and present pandemics of plague. In Chapter 2, I resented a novel software called NCBImeta that facilitates the acquisition of sequence data and metadata from the NCBI repository. This specialized tool supports genomics research in the era of big data, where manual processing of abundant sequence records (10,000+) is impossible. As a paper on software development, its contributions and significance to the field of anthropology are understandably unclear. I targeted this article exclusively towards computational biologists because, at the time, few anthropologists had expressed interested in the issue of collecting and curating sequencing data. Reflecting this, NCBImeta has mainly been cited across biological fields including studies of the human microbiome [112], plant-associated bacteria in agriculture [113], and emerging infectious diseases in public health (Matthew Gopez & Philip Mabon, personal communication, https://github.com/ktmeaton/NCBImeta/pull/9).

In 2021, I took a more active approach in my discipline and used this software to support several bodies of anthropological research. NCBImeta was recently used in an environmental reconstruction of Beringia [114], the former land-bridge that facilitated early human migrations into North America from northeast Asia. The study by Murchie et al. furthers our understanding of the peopling of the Americas, and the possible interactions between early human populations and large animals (ie. megafauna) before the Last Glacial Period (~12,000 years ago). NCBImeta was also recently used to curate sequence data in a case study of the zoonotic disease brucellosis in the 14^th century [115]. The pioneering work by Hider et al. demonstrates how pathogen DNA preserves differently throughout the body, ranging from being the dominant microorganism in several tissues while being completely absent in others. It raises an important cautionary note for ancient DNA analysis and the anthropology of disease, by empirically demonstrating how sampling strategies can bias our understanding of what diseases were present in past populations.

In Chapter 3, I explored the challenges in estimating where and when plague appeared in the past, and why these estimates are often not reproducible between studies. I used the software tool from Chapter 2 to collect all publicly available Y. pestis genomes, and carefully curated their collection dates, locations, and hosts. My co-authors and I then used this data set for phylodynamic analysis, and devised a new approach for modeling the rates of evolutionary change (ie. molecular clock). We used these results to explain why divergence dates varied between studies, and outlined a critical framework for identifying which divergence dates should be considered non-informative. In addition, we found that past pandemics of plague may have emerged decades, or even centuries, before they were historically documented in European sources. These early dates are in agreement with recent historical work that examines more diverse (ie. non-European) sources. Through this finding, we demonstrated how genomic dating plays an important role in expanding the timelines of past pandemics to make space for more diverse narratives.

In contrast to our claims of the ‘power’ of genomic evidence, a prominent takeway from Chapter 3 was our discussion of the limitations of DNA. In particular, we found that Y. pestis genomes in isolation are not suitable for reconstructing evolutionary relationships during short-term epidemics. This is because the evolutionary rate of past pandemic lineages is approximately 1 substitution every 10 years. Isolates collected within this time frame (<10 years) are often identical, which means that the directionality of spread cannot be confidently inferred. To mitigate this weakness, complementary evidence is needed that has a higher temporal resolution. Historical case records are an excellent candidate, where plague cases are recorded annually if not weekly [86]. Based on initial comments from readers of the preprint, this conclusion was particularly exciting as it provided guidance on how to avoid over-interpreting ancient DNA evidence, and suggested a new avenue for inter-disciplinary collaboration (Boris Schmidt, personal communication).

In Chapter 4, I applied this updated genomic dataset and molecular clock method to a new problem. While in Chapter 3 we were concerned with estimating the first emergence of pandemic lineages, in Chapter 4 we reconstructed the persistence or continuity of ancient pandemics. We designed a unique longitudinal study, where we sampled skeletal remains spanning 800 years (1000 - 1800 CE) dated to before, during, and after the Second Pandemic (14^thth - 18^th century). Our sampling strategy focused on Scandinavia, particularly Denmark, as this region is underrepresented in the historical narrative and because the Anthropological DataBase Odense University collection (ADBOU, University of Southern Denmark) has exquisitely curated over 17,000 skeletal remains dated from the Viking Age (10^th century) to the Early Modern Period (18^th century). Using ancient DNA techniques, we recovered evidence of Y. pestis throughout the 14^th to 17^th centuries, which perfectly aligns with the historical narrative, limited as it is. Furthermore, our positivity rate for Y. pestis (3.3% - 14.3%) overlaps with mortality estimates from several historical outbreaks during the Second Pandemic. Our results strengthen the argument that Y. pestis was the causative agent of the Second Pandemic, and suggests that plague was a relatively new disease in medieval Denmark. These findings are being expanded on in two upcoming studies. The first, is an examination of how Danish populations responded to this new disease with regards to changes in the human immune system [116]. The second, is a reconstruction of how and when virulence in Y. pestis became attenuated during the Second Pandemic. Taken together, we anticipate these studies will deepen our understanding of disease exposure and experience in Denmark and across Europe.

5.2 Future Directions

5.2.1 Same ‘Plague’, New Problems

A reoccurring problem in plague research is how best to integrate multidisciplinary sources, as there is great interest in combining genetic, historical, and environmental records to better understand past pandemics of plague [47,117]. An approach that is commonly used in ancient DNA studies of Y. pestis involves two steps: (1) reconstructing the relationships between epidemics using genetic evidence, and then (2) interpreting those relationships using historical records [19,49,105]. However, a major limitation of this method is that multidisciplinary sources are only integrated during the final interpretation phase. This runs the risk that errors and uncertainty associated with the genetic analysis will propagate, leading to high levels of ambiguity when attempting to provide historical context for this genetic ‘noise’.

An alternative method, is to leverage the strengths and mitigate the weaknesses of interdisciplinary sources in a joint phylogenetic analysis. This novel approach treats historical records (ex. location and date of an outbreak) as special ‘sequence-free’ samples. These records are then combined with DNA evidence to jointly infer a phylogeny, which can then be used to estimate the timing and location of historical events. Recent studies have demonstrated how critical this approach is, as case occurrence records can effectively correct for sampling biases in sparse genomic datasets [67,83]. However, incorporating sequence-free datasets is still a relatively recent method, and to date has only been applied to the study of viruses. Furthermore, it has only been tested on outbreaks occurring over a relatively small geographic area and time range. It remains unknown whether this approach is feasible for bacterial genomics, let alone ancient DNA, where genomes are larger and sampled across greater temporal and geographic scopes. This presents a key line of inquiry for future research, for which the plague bacterium Y. pestis would be an excellent case study.

5.2.2 New ‘Plague’, Same Problems

During the course of this dissertation, my interest in global pandemics turned from an academic curiosity to a lived experience. In 2019, the novel coronavirus SARS-CoV-2 emerged to cause a global pandemic, with over 270 million cases recorded worldwide. While there are many unique aspects of this pandemic, one that has captured my attention is that it is the first pandemic to be monitored with real-time genomic surveillance [118]. Over two million genomic sequences have been deposited in public repositories, which can be used to inform public health responses [119]. However, this avalanche of data has also caused numerous problems, as researchers are struggling to manage this information and utilize it effectively [120]. As a result, database tools such as NCBImeta presented in Chapter 2, are playing an important role in information management.

One field of ongoing research involves improving the scalability of these tools. For example, NCBImeta was developed for a data set of ‘only’ 15,000 records, and in its current implementation, cannot process the 1+ million SARS-CoV-2 records on NCBI. A second critical avenue is integrating information from multiple repositories, as surveillance data is inconsistently being deposited in national and international databases [121,122,123]. Progress towards these two objectives will result in more diverse genomic data being analyzed (geographically and temporally), which may improve of our understanding of transmission and spread between and within countries.

Another parallel between this dissertation and the ongoing pandemic involves spatiotemporal modeling. In Chapter 3, we discovered that in our expanded genomic data set, Y. pestis’ rate of spread tends to outpace its rate of evolutionary change. This leads to identical Y. pestis isolates found across multiple countries, such as the case throughout the Black Death (1346-1353). However, we sporadically observed the opposite trend, in which Y. pestis strains collected in a short time frame (<10 years) were extremely different. This tremendous diversity in evolutionary rates meant that we were unable to estimate a single molecular clock for Y. pestis. These issues, clonal spread and rate variation, were also recently documented in SARS-CoV-2 [124]. Ferreira et al. describe this as a paradox in which we “become increasingly uncertain about the relationships among specific lineages as we collect greater amounts of data”. This runs counterintuitive to the general expectation in scientific studies that the more data we collect, the closer we get to the ‘truth’. Overall, this presents a complex theoretical problem that is becoming increasingly prevalent across various disciplines moving into the era of ‘big data’.

6 References

A draft genome of Yersinia pestis from victims of the Black Death

Kirsten I Bos, Verena J Schuenemann, GBrian Golding, Hernán A Burbano, Nicholas Waglechner, Brian K Coombes, Joseph B McPhee, Sharon N DeWitte, Matthias Meyer, Sarah Schmedes, … Johannes Krause

Nature (2011) http://www.nature.com/articles/nature10549

DOI: 10.1038/nature10549

The Black Death, 1346-1353: The Complete History

OJ Benedictow

Boydell Press (2004)

ISBN: 0-85115-943-5

“In these perilous times”: Plague and plague policies in early modern Denmark

Peter Christensen

Medical History (2003-10) https://www.cambridge.org/core/journals/medical-history/article/in-these-perilous-times-plague-and-plague-policies-in-early-modern-denmark/EFF71835DE9EBFB610E35451FD7A0A86

DOI: 10.1017/s0025727300057331

The plague that never left: restoring the Second Pandemic to Ottoman and Turkish history in the time of COVID-19

Nükhet Varlık

New Perspectives on Turkey (2020-11) https://www.cambridge.org/core/journals/new-perspectives-on-turkey/article/plague-that-never-left-restoring-the-second-pandemic-to-ottoman-and-turkish-history-in-the-time-of-covid19/AE81F48AD6FEB71C42AAA2115A0307A9

DOI: 10.1017/npt.2020.27

Pandemics Throughout the History

Shrikanth Sampath, Anwar Khedr, Shahraz Qamar, Aysun Tekin, Romil Singh, Ronya Green, Rahul Kashyap

Cureus (2021) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8525686/

DOI: 10.7759/cureus.18136 · PMID: 34692344 · PMCID: PMC8525686

The plague at Granada, 1348-1349: Ibn Al-Khatib and ideas of contagion.

WB Ober, N Aloush

Bulletin of the New York Academy of Medicine (1982-05) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1808550/

PMID: 7052179 · PMCID: PMC1808550

Islamic medicine: 1000 years ahead of its times

Ibrahim Syed

Journal of the International Society for the History of Islamic Medicine (1981) https://jima.imana.org/article/view/11925

The Plague at Marseilles: Consider'd with Remarks Upon the Plague in General

Richard Bradley

W. Mears (1721) https://books.google.ca/books?id=qQYAmMH1nS4C

Richard Bradley: A Unified, Living Agent Theory of the Cause of Infectious Diseases of Plants, Animals, and Humans in the First Decades of the 18th Century

Melvin Santer

Perspectives in Biology and Medicine (2009) http://muse.jhu.edu/article/362917

DOI: 10.1353/pbm.0.0124

10.

Biology of Plagues: Evidence from Historical Populations

Susan Scott, Christopher J Duncan

Cambridge University Press (2001) https://www.cambridge.org/core/books/biology-of-plagues/9AE9838FB1BBE059455F24C75DDB9DE2

DOI: 10.1017/cbo9780511542527 · ISBN: 978-0-521-80150-8

11.

Detection of 400-year-old Yersinia pestis DNA in human dental pulp: An approach to the diagnosis of ancient septicemia

Michel Drancourt, Gérard Aboudharam, Michel Signoli, Olivier Dutour, Didier Raoult

Proceedings of the National Academy of Sciences (1998-10-13) https://www.pnas.org/content/95/21/12637

DOI: 10.1073/pnas.95.21.12637 · PMID: 9770538

12.

Absence of Yersinia pestis-specific DNA in human teeth from five European excavations of putative plague victims

MThomas P Gilbert, Jon Cuccui, William White, Niels Lynnerup, Richard W Titball, Alan Cooper, Michael BYR 2004 Prentice

Microbiology (2004) https://www.microbiologyresearch.org/content/journal/micro/10.1099/mic.0.26594-0

DOI: 10.1099/mic.0.26594-0

13.

Ancient DNA: do it right or not at all

A Cooper, HN Poinar

Science (New York, N.Y.) (2000-08-18)

DOI: 10.1126/science.289.5482.1139b · PMID: 10970224

14.

Response to Drancourt and Raoult

MThomas P Gilbert, Jon Cuccui, William White, Niels Lynnerup, Richard W Titball, Alan Cooper, Michael B Prentice

Microbiology (2004-02-01) https://www.microbiologyresearch.org/content/journal/micro/10.1099/mic.0.26959-0

DOI: 10.1099/mic.0.26959-0

15.

Was the Black Death yersinial plague?

Didier Raoult

The Lancet Infectious Diseases (2003-06) https://linkinghub.elsevier.com/retrieve/pii/S1473309903006522

DOI: 10.1016/s1473-3099(03)00652-2

16.

Differential Diagnosis of Skeletal Lesions in Infectious Disease

Donald J Ortner

Advances in Human Palaeopathology (2007) https://onlinelibrary.wiley.com/doi/abs/10.1002/9780470724187.ch10

DOI: 10.1002/9780470724187.ch10 · ISBN: 978-0-470-72418-7

17.

The Anthropology of Infectious Disease: International Health Perspectives

Peter J Brown, Marcia C Inhorn

Routledge (2013-11-05)

ISBN: 978-1-134-38642-0

18.

The EnteroBase user's guide, with case studies on Salmonella transmissions, Yersinia pestis phylogeny, and Escherichia core genomic diversity

Zhemin Zhou, Nabil-Fareed Alikhan, Khaled Mohamed, Yulei Fan, Mark Achtman

Genome Research (2020-01) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6961584/

DOI: 10.1101/gr.251678.119 · PMID: 31809257 · PMCID: PMC6961584

19.

Phylogeography of the second plague pandemic revealed through analysis of historical Yersinia pestis genomes

Maria A Spyrou, Marcel Keller, Rezeda I Tukhbatova, Christiana L Scheib, Elizabeth A Nelson, Aida Andrades Valtueña, Gunnar U Neumann, Don Walker, Amelie Alterauge, Niamh Carty, … Johannes Krause

Nature Communications (2019-10-02) https://www.nature.com/articles/s41467-019-12154-0

DOI: 10.1038/s41467-019-12154-0

20.

Historical variations in mutation rate in an epidemic pathogen, Yersinia pestis

Y Cui, C Yu, Y Yan, D Li, Y Li, T Jombart, LA Weinert, Z Wang, Z Guo, L Xu, … R Yang

Proceedings of the National Academy of Sciences (2013-01-08) http://www.pnas.org/cgi/doi/10.1073/pnas.1205750110

DOI: 10.1073/pnas.1205750110

21.

Yersinia pestis and the Plague of Justinian 541–543 AD: a genomic analysis

David M Wagner, Jennifer Klunk, Michaela Harbeck, Alison Devault, Nicholas Waglechner, Jason W Sahl, Jacob Enk, Dawn N Birdsell, Melanie Kuch, Candice Lumibao, … Hendrik Poinar

The Lancet Infectious Diseases (2014-04) https://linkinghub.elsevier.com/retrieve/pii/S1473309913703232

DOI: 10.1016/s1473-3099(13)70323-2

22.

Genome-scale rates of evolutionary change in bacteria

Sebastian Duchêne, Kathryn E Holt, François-Xavier Weill, Simon Le Hello, Jane Hawkey, David J Edwards, Mathieu Fourment, Edward C Holmes

Microbial Genomics (2016-11-30) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5320706/

DOI: 10.1099/mgen.0.000094 · PMID: 28348834 · PMCID: PMC5320706

23.

Identification of acquired antimicrobial resistance genes

E Zankari, H Hasman, S Cosentino, M Vestergaard, S Rasmussen, O Lund, FM Aarestrup, MV Larsen

Journal of Antimicrobial Chemotherapy (2012) https://academic.oup.com/jac/article-lookup/doi/10.1093/jac/dks261

DOI: 10.1093/jac/dks261

24.

Genomic epidemiology of the Haitian cholera outbreak: A single introduction followed by rapid, extensive, and continued spread characterized the onset of the epidemic

Mark Eppinger, Talima Pearson, Sara SK Koenig, Ofori Pearson, Nathan Hicks, Sonia Agrawal, Fatemeh Sanjar, Kevin Galens, Sean Daugherty, Jonathan Crabtree, … Paul S Keim

mBio (2014) https://mbio.asm.org/lookup/doi/10.1128/mBio.01721-14

DOI: 10.1128/mbio.01721-14

25.

Species-wide whole genome sequencing reveals historical global spread and recent local persistence in Shigella flexneri

Thomas R Connor, Clare R Barker, Kate S Baker, François-Xavier Weill, Kaisar Ali Talukder, Anthony M Smith, Stephen Baker, Malika Gouali, Duy Pham Thanh, Ishrat Jahan Azmi, … Nicholas R Thomson

eLife (2015-08-04) https://elifesciences.org/articles/07335

DOI: 10.7554/elife.07335

26.

Experimental design-based functional mining and characterization of high-throughput sequencing data in the Sequence Read Archive

Takeru Nakazato, Tazro Ohta, Hidemasa Bono

PLoS ONE (2013-10-22) https://dx.plos.org/10.1371/journal.pone.0077910

DOI: 10.1371/journal.pone.0077910

27.

Post-archival genomics and the bulk logistics of DNA sequences

Adrian Mackenzie, Ruth McNally, Richard Mills, Stuart Sharples

BioSocieties (2016-03) http://link.springer.com/10.1057/biosoc.2015.22

DOI: 10.1057/biosoc.2015.22

28.

SRAdb: query and use public next-generation sequencing data from within R

Yuelin Zhu, Robert M Stephens, Paul S Meltzer, Sean R Davis

BMC Bioinformatics (2013) http://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-14-19

DOI: 10.1186/1471-2105-14-19

29.

Pathogen metadata platform: software for accessing and analyzing pathogen strain information

Wenling E Chang, Matthew W Peterson, Christopher D Garay, Tonia Korves

BMC Bioinformatics (2016-12) http://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-1231-2

DOI: 10.1186/s12859-016-1231-2

30.

MetaSRA: normalized human sample-specific metadata for the Sequence Read Archive

Matthew N Bernstein, AnHai Doan, Colin N Dewey

Bioinformatics (2017-09-15) https://academic.oup.com/bioinformatics/article/33/18/2914/3848915

DOI: 10.1093/bioinformatics/btx334

31.

pysradb: A Python package to query next-generation sequencing metadata and data from NCBI Sequence Read Archive

Saket Choudhary

F1000Research (2019) https://doi.org/10.12688/f1000research.18676.1

DOI: 10.12688/f1000research.18676.1

32.

Biopython: freely available Python tools for computational molecular biology and bioinformatics

PJA Cock, T Antao, JT Chang, BA Chapman, CJ Cox, A Dalke, I Friedberg, T Hamelryck, F Kauff, B Wilczynski, MJL de Hoon

Bioinformatics (2009-06-01) https://academic.oup.com/bioinformatics/article-lookup/doi/10.1093/bioinformatics/btp163

DOI: 10.1093/bioinformatics/btp163

33.

Entrez Direct: E-utilities on the UNIX Command Line

J Kans

Entrez Programming Utilities Help (2019) https://www.ncbi.nlm.nih.gov/books/NBK179288/

34.

Entrez Help

National Center for Biotechnology Information

(2016) https://www.ncbi.nlm.nih.gov/books/NBK3837/

35.

Draft genomes of 12 host-adapted and environmental isolates of Pseudomonas aeruginosa and their positions in the core genome phylogeny

Lewis Stewart, Amy Ford, Vartul Sangal, Julie Jeukens, Brian Boyle, Iréna Kukavica-Ibrulj, Shabhonam Caim, Lisa Crossman, Paul A Hoskisson, Roger Levesque, Nicholas P Tucker

Pathogens and Disease (2014-06) https://academic.oup.com/femspd/article-lookup/doi/10.1111/2049-632X.12107

DOI: 10.1111/2049-632x.12107

36.

ggplot2: Elegant Graphics for Data Analysis

H Wickham

Springer-Verlag (2016) http://ggplot2.org

ISBN: 987-3-319-24277-4

37.

The Stone Age Plague and its persistence in Eurasia

Aida Andrades Valtueña, Alissa Mittnik, Felix M Key, Wolfgang Haak, Raili Allmäe, Andrej Belinskij, Mantas Daubaras, Michal Feldman, Rimantas Jankauskas, Ivor Janković, … Johannes Krause

Current Biology (2017-12-04)

DOI: 10.1016/j.cub.2017.10.025 · PMID: 29174893

38.

Yersinia pestis - etiologic agent of plague

RD Perry, JD Fetherston

Clinical Microbiology Reviews (1997-01)

PMID: 8993858 · PMCID: PMC172914

39.

Trade routes and plague transmission in pre-industrial Europe

Ricci PH Yue, Harry F Lee, Connor YH Wu

Scientific Reports (2017-10-11) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5636801/

DOI: 10.1038/s41598-017-13481-2 · PMID: 29021541 · PMCID: PMC5636801

40.

Plague

World Health Organization

(2017-10-31) https://www.who.int/news-room/fact-sheets/detail/plague

41.

New science and old sources: why the Ottoman experience of plague matters

Nükhet Varlık

The Medieval Globe (2014) https://scholarworks.wmich.edu/tmg/vol1/iss1/9/

42.

Wet climate and transportation routes accelerate spread of human plague

Lei Xu, Leif Chr Stige, Kyrre Linné Kausrud, Tamara Ben Ari, Shuchun Wang, Xiye Fang, Boris V Schmid, Qiyong Liu, Nils Chr Stenseth, Zhibin Zhang

Proceedings of the Royal Society B: Biological Sciences (2014-04-07) https://royalsocietypublishing.org/doi/10.1098/rspb.2013.3159

DOI: 10.1098/rspb.2013.3159

43.

The Complete History of the Black Death

OJ Benedictow

Boydell Press (2021) https://boydellandbrewer.com/9781783275168/the-complete-history-of-the-black-death/

44.

Early divergent strains of Yersinia pestis in Eurasia 5,000 years ago

Simon Rasmussen, Morten Erik Allentoft, Kasper Nielsen, Ludovic Orlando, Martin Sikora, Karl-Göran Sjögren, Anders Gorm Pedersen, Mikkel Schubert, Alex Van Dam, Christian Moliin Outzen Kapel, … Eske Willerslev

Cell (2015-10-22) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4644222/

DOI: 10.1016/j.cell.2015.10.009 · PMID: 26496604 · PMCID: PMC4644222

45.

Ongoing quiescence in the Borborema Plateau Plague focus (Paraiba, Brazil)

Caio G Zeppelini, Alzira MP DE Almeida, Pedro Cordeiro-Estrela

Anais Da Academia Brasileira De Ciencias (2018)

DOI: 10.1590/0001-3765201820170977 · PMID: 30304231

46.

How a microbe becomes a pandemic: a new story of the Black Death

Monica H Green

The Lancet Microbe (2020-12-01) https://www.thelancet.com/journals/lanmic/article/PIIS2666-5247(20)30176-2/abstract

DOI: 10.1016/s2666-5247(20)30176-2

47.

Climate-driven introduction of the Black Death and successive plague reintroductions into Europe

Boris V Schmid, Ulf Büntgen, WRyan Easterday, Christian Ginzler, Lars Walløe, Barbara Bramanti, Nils Chr Stenseth

Proceedings of the National Academy of Sciences (2015-03-10) http://www.pnas.org/content/112/10/3020

DOI: 10.1073/pnas.1412887112 · PMID: 25713390

48.

Plague persistence in Western Europe: a hypothesis

Ann G Carmichael

The Medieval Globe (2015) https://muse.jhu.edu/article/758488

49.

A genomic and historical synthesis of plague in 18th century Eurasia

Meriam Guellil, Oliver Kersten, Amine Namouchi, Stefania Luciani, Isolina Marota, Caroline A Arcini, Elisabeth Iregren, Robert A Lindemann, Gunnar Warfvinge, Lela Bakanidze, … Barbara Bramanti

Proceedings of the National Academy of Sciences (2020-11-10) https://www.pnas.org/content/117/45/28328

DOI: 10.1073/pnas.2009677117 · PMID: 33106412

50.

Assessing the origins of the European Plagues following the Black Death: A synthesis of genomic, historical, and ecological information

Barbara Bramanti, Yarong Wu, Ruifu Yang, Yujun Cui, Nils Chr Stenseth

Proceedings of the National Academy of Sciences (2021-09-07) https://www.pnas.org/content/118/36/e2101940118

DOI: 10.1073/pnas.2101940118 · PMID: 34465619

51.

Variétés de l'espèce Pasteurella pestis

R Devignat

Bulletin of the World Health Organization (1951) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2554099/

PMID: 14859080 · PMCID: PMC2554099

52.

Comparative and evolutionary genomics of Yersinia pestis

Dongsheng Zhou, Yanping Han, Yajun Song, Peitang Huang, Ruifu Yang

Microbes and Infection (2004-11-01) http://www.sciencedirect.com/science/article/pii/S1286457904002357

DOI: 10.1016/j.micinf.2004.08.002

53.

Genotyping and phylogenetic analysis of Yersinia pestis by MLVA: insights into the worldwide expansion of Central Asia plague foci

Yanjun Li, Yujun Cui, Yolande Hauck, Mikhail E Platonov, Erhei Dai, Yajun Song, Zhaobiao Guo, Christine Pourcel, Svetlana V Dentovskaya, Andrey P Anisimov, … Gilles Vergnaud

PLOS ONE (2009-06-22) https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0006000

DOI: 10.1371/journal.pone.0006000

54.

Phylogeny and classification of Yersinia pestis through the lens of strains from the plague foci of Commonwealth of Independent States

Vladimir V Kutyrev, Galina A Eroshenko, Vladimir L Motin, Nikita Y Nosov, Jaroslav M Krasnov, Lyubov M Kukleva, Konstantin A Nikiforov, Zhanna V Al’khova, Eugene G Oglodin, Natalia P Guseva

Frontiers in Microbiology (2018-05-25) https://www.frontiersin.org/article/10.3389/fmicb.2018.01106/full

DOI: 10.3389/fmicb.2018.01106

55.

The four Black Deaths

Monica H Green

The American Historical Review (2020-12-29) https://doi.org/10.1093/ahr/rhaa511

DOI: 10.1093/ahr/rhaa511

56.

Analysis of 3800-year-old Yersinia pestis genomes suggests Bronze Age origin for bubonic plague

Maria A Spyrou, Rezeda I Tukhbatova, Chuan-Chao Wang, Aida Andrades Valtueña, Aditya K Lankapalli, Vitaly V Kondrashin, Victor A Tsybin, Aleksandr Khokhlov, Denise Kühnert, Alexander Herbig, … Johannes Krause

Nature Communications (2018-06-08) https://www.nature.com/articles/s41467-018-04550-9

DOI: 10.1038/s41467-018-04550-9

57.

Natural history of plague: perspectives from more than a century of research

Kenneth L Gage, Michael Y Kosoy

Annual Review of Entomology (2005)

DOI: 10.1146/annurev.ento.50.071803.130337 · PMID: 15471529

58.

The Ottomans during the global crises of cholera and plague: the View from Iraq and the Gulf.

Isacar A Bolaños

International Journal of Middle East Studies (2019-11) http://www.cambridge.org/core/journals/international-journal-of-middle-east-studies/article/ottomans-during-the-global-crises-of-cholera-and-plague-the-view-from-iraq-and-the-gulf/82A6A0E7CF8BD2138FA230AE4189057E

DOI: 10.1017/s0020743819000667

59.

Towards the atlas of plague and its environment in the People's Republic of China: idea, principle and methodology of design and research results

Jian'an Tan, Yunpeng Liu, Erli Shen, Wenyu Zhu, Wuyi Wang, Ribang Li, Linsheng Yang

Huan Jing Ke Xue (2002)

PMID: 12145922

60.

Historical Y. pestis genomes reveal the European Black Death as the source of ancient and modern plague pandemics

Maria A Spyrou, Rezeda I Tukhbatova, Michal Feldman, Joanna Drath, Sacha Kacki, Julia Beltrán de Heredia, Susanne Arnold, Airat G Sitdikov, Dominique Castex, Joachim Wahl, … Johannes Krause

Cell Host & Microbe (2016-06) http://linkinghub.elsevier.com/retrieve/pii/S1931312816302086

DOI: 10.1016/j.chom.2016.05.012

61.

Evolution and circulation of Yersinia pestis in the Northern Caspian and Northern Aral Sea regions in the 20th-21st centuries

Galina A Eroshenko, Nikolay V Popov, Zhanna V Al’khova, Lyubov M Kukleva, Alina N Balykova, Nadezhda S Chervyakova, Ekaterina A Naryshkina, Vladimir V Kutyrev

PLOS ONE (2021-02-11) https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0244615

DOI: 10.1371/journal.pone.0244615

62.

Yersinia pestis genome sequencing identifies patterns of global phylogenetic diversity

Giovanna Morelli, Yajun Song, Camila J Mazzoni, Mark Eppinger, Philippe Roumagnac, David M Wagner, Mirjam Feldkamp, Barica Kusecek, Amy J Vogler, Yanjun Li, … Mark Achtman

Nature Genetics (2010-12)

DOI: 10.1038/ng.705 · PMID: 21037571 · PMCID: PMC2999892

63.

Yersinia pestis strains isolated in natural plague foci of Caucasus and Transcaucasia in the context of the global evolution of species

Sergey V Pisarenko, Anna Yu Evchenko, Dmitry A Kovalev, Yuri М Evchenko, Olga V Bobrysheva, Nikolay A Shapakov, Anna S Volynkina, Alexander N Kulichenko

Genomics (2021-07-01) https://www.sciencedirect.com/science/article/pii/S0888754321001488

DOI: 10.1016/j.ygeno.2021.04.021

64.

Bayesian evaluation of temporal signal in measurably evolving populations

Sebastian Duchene, Philippe Lemey, Tanja Stadler, Simon YW Ho, David A Duchene, Vijaykrishna Dhanasekaran, Guy Baele

Molecular Biology and Evolution (2020-11-01) https://doi.org/10.1093/molbev/msaa163

DOI: 10.1093/molbev/msaa163

65.

The impacts of low diversity sequence data on phylodynamic inference during an emerging epidemic

Anthony Lam, Sebastian Duchene

Viruses (2021-01) https://www.mdpi.com/1999-4915/13/1/79

DOI: 10.3390/v13010079

66.

Temporal signal and the phylodynamic threshold of SARS-CoV-2

Sebastian Duchene, Leo Featherstone, Melina Haritopoulou-Sinanidou, Andrew Rambaut, Philippe Lemey, Guy Baele

Virus Evolution (2020-07-01) https://doi.org/10.1093/ve/veaa061

DOI: 10.1093/ve/veaa061

67.

Infectious disease phylodynamics with occurrence data

Leo A Featherstone, Francesca Di Giallonardo, Edward C Holmes, Timothy G Vaughan, Sebastián Duchêne

Methods in Ecology and Evolution (2021) https://onlinelibrary.wiley.com/doi/abs/10.1111/2041-210X.13620

DOI: 10.1111/2041-210x.13620

68.

Dating the emergence of human pathogens

Simon YW Ho, Sebastián Duchêne

Science (2020-06-19) https://www.science.org/doi/10.1126/science.abc5746

DOI: 10.1126/science.abc5746

69.

Plague in Zimbabwe from 1974 to 2018: a review article

Amon Munyenyiwa, Moses Zimba, Tamuka Nhiwatiwa, Maxwell Barson

PLOS Neglected Tropical Diseases (2019-11-21) https://dx.plos.org/10.1371/journal.pntd.0007761

DOI: 10.1371/journal.pntd.0007761

70.

Putting Africa on the Black Death map: narratives from genetics and history

Monica H Green

Afriques (2018) https://doi.org/10.4000/afriques.2125

DOI: 10.4000/afriques.2125

71.

Molecular epidemiological investigations of plague in Eastern Province of Zambia

Stanley S Nyirenda, Bernard M Hang′ombe, Edgar Simulundu, Evans Mulenga, Ladslav Moonga, Robert S Machang′u, Gerald Misinzo, Bukheti S Kilonzo

BMC Microbiology (2018-01-04) https://doi.org/10.1186/s12866-017-1146-8

DOI: 10.1186/s12866-017-1146-8

72.

Plague and the End of Antiquity: The Pandemic of 541-750

Lester K Little

Cambridge University Press (2007) https://doi.org/10.1017/CBO9780511812934

ISBN: 978-0-521-84639-4

73.

Plague in Iran: its history and current status

Abdolrazagh Hashemi Shahraki, Elizabeth Carniel, Ehsan Mostafavi

Epidemiology and Health (2016-07-24) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5037359/

DOI: 10.4178/epih.e2016033 · PMID: 27457063 · PMCID: PMC5037359

74.

Plague and the Fall of Baghdad (1258)

Nahyan Fancy, Monica Green

Medical History (2021) https://scholarship.depauw.edu/hist_facpubs/14

75.

Bubonic plague in nineteenth-century China

Carol Benedict

Modern China (1988) http://www.jstor.org/stable/189268

76.

Historical and genomic data reveal the influencing factors on global transmission velocity of plague during the Third Pandemic

Lei Xu, Leif C Stige, Herwig Leirs, Simon Neerinckx, Kenneth L Gage, Ruifu Yang, Qiyong Liu, Barbara Bramanti, Katharine R Dean, Hui Tang, … Zhibin Zhang

Proceedings of the National Academy of Sciences (2019-06-11) https://www.pnas.org/content/116/24/11833

DOI: 10.1073/pnas.1901366116 · PMID: 31138696

77.

The cholera pandemic, still with us after half a century: time to rethink

Edward T Ryan

PLOS Neglected Tropical Diseases (2011-01-25) https://journals.plos.org/plosntds/article?id=10.1371/journal.pntd.0001003

DOI: 10.1371/journal.pntd.0001003

78.

What we can learn from the dynamics of the 1889 ‘Russian flu’ pandemic for the future trajectory of COVID-19

Harald Brüssow

Microbial Biotechnology (2021) https://onlinelibrary.wiley.com/doi/abs/10.1111/1751-7915.13916

DOI: 10.1111/1751-7915.13916

79.

Pandemics throughout history

Jocelyne Piret, Guy Boivin

Frontiers in Microbiology (2021-01-15) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7874133/pdf/fmicb-11-631736.pdf

DOI: 10.3389/fmicb.2020.631736 · PMID: 33584597 · PMCID: PMC7874133

80.

TreeTime: maximum-likelihood phylodynamic analysis

Pavel Sagulenko, Vadim Puller, Richard A Neher

Virus Evolution (2018-01-08) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5758920/

DOI: 10.1093/ve/vex042 · PMID: 29340210 · PMCID: PMC5758920

81.

The spread of plague

James Cantlie

Transactions. Epidemiological Society of London (1897) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5540063/

PMID: 29419142 · PMCID: PMC5540063

82.

Pestis redux: the initial years of the third bubonic plague pandemic, 1894-1901

Myron Echenberg

Journal of World History (2002) https://www.jstor.org/stable/20078978

83.

Sampling bias and model choice in continuous phylogeography: getting lost on a random walk

Antanas Kalkauskas, Umberto Perron, Yuxuan Sun, Nick Goldman, Guy Baele, Stephane Guindon, Nicola De Maio

PLOS Computational Biology (2021-01-06) https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008561

DOI: 10.1371/journal.pcbi.1008561

84.

The plague in Oporto

A Shadwell

The Nineteenth century: a monthly review (1899) https://www.proquest.com/openview/5dd86300bfa45d65/1?pq-origsite=gscholar&cbl=1017

85.

Out of the West: formation of a permanent plague reservoir in south-central Germany (1349–1356) and its implications

Philip Slavin

Past & Present (2021) https://doi.org/10.1093/pastj/gtaa028

DOI: 10.1093/pastj/gtaa028

86.

Dangers of noncritical use of historical plague data

Joris Roosen, Daniel R Curtis

Emerging Infectious Diseases (2018-01) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5749453/

DOI: 10.3201/eid2401.170477 · PMID: null · PMCID: PMC5749453

87.

NCBImeta: efficient and comprehensive metadata retrieval from NCBI databases

Katherine Eaton

Journal of Open Source Software (2020) https://www.theoj.org/joss-papers/joss.01990/10.21105.joss.01990.pdf

DOI: 10.21105/joss.01990

88.

Reproducible, portable, and efficient ancient genome reconstruction with nf-core/eager

James AFellows Yates, Thiseas C Lamnidis, Maxime Borry, Aida Andrades Valtueña, Zandra Fagernäs, Stephen Clayton, Maxime U Garcia, Judith Neukamm, Alexander Peltzer

PeerJ (2021-03-16) https://peerj.com/articles/10947

DOI: 10.7717/peerj.10947

89.

ModelFinder: fast model selection for accurate phylogenetic estimates

Subha Kalyaanamoorthy, Bui Quang Minh, Thomas KF Wong, Arndt von Haeseler, Lars S Jermiin

Nature Methods (2017-06) http://www.nature.com/articles/nmeth.4285

DOI: 10.1038/nmeth.4285

90.

IQ-TREE 2: New models and efficient methods for phylogenetic inference in the genomic era

Bui Quang Minh, Heiko A Schmidt, Olga Chernomor, Dominik Schrempf, Michael D Woodhams, Arndt von Haeseler, Robert Lanfear

Molecular Biology and Evolution (2020-05-01) https://academic.oup.com/mbe/article/37/5/1530/5721363

DOI: 10.1093/molbev/msaa015

91.

UFBoot2: improving the ultrafast bootstrap approximation

Diep Thi Hoang, Olga Chernomor, Arndt von Haeseler, Bui Quang Minh, Le Sy Vinh

Molecular Biology and Evolution (2018-02-01) https://academic.oup.com/mbe/article/35/2/518/4565479

DOI: 10.1093/molbev/msx281

92.

Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10

Marc A Suchard, Philippe Lemey, Guy Baele, Daniel L Ayres, Alexei J Drummond, Andrew Rambaut

Virus Evolution (2018-06-08) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6007674/

DOI: 10.1093/ve/vey016 · PMID: 29942656 · PMCID: PMC6007674

93.

Accurate model selection of relaxed molecular clocks in Bayesian phylogenetics

Guy Baele, Wai Lok Sibon Li, Alexei J Drummond, Marc A Suchard, Philippe Lemey

Molecular Biology and Evolution (2013-02) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3548314/

DOI: 10.1093/molbev/mss243 · PMID: 23090976 · PMCID: PMC3548314

94.

Bayesian analysis of elapsed times in continuous-time Markov chains

Marco AR Ferreira, Marc A Suchard

Canadian Journal of Statistics (2008) https://onlinelibrary.wiley.com/doi/abs/10.1002/cjs.5550360302

DOI: 10.1002/cjs.5550360302

95.

Bayesian phylogeography finds its roots

Philippe Lemey, Andrew Rambaut, Alexei J Drummond, Marc A Suchard

PLOS Computational Biology (2009-09-25) https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1000520

DOI: 10.1371/journal.pcbi.1000520

96.

Relaxed phylogenetics and dating with confidence

Alexei J Drummond, Simon YW Ho, Matthew J Phillips, Andrew Rambaut

PLOS Biology (2006-03-14) https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.0040088

DOI: 10.1371/journal.pbio.0040088

97.

BEAST: Bayesian evolutionary analysis by sampling trees

Alexei J Drummond, Andrew Rambaut

BMC Evolutionary Biology (2007-11-08) https://doi.org/10.1186/1471-2148-7-214

DOI: 10.1186/1471-2148-7-214

98.

seaborn: statistical data visualization

Michael L Waskom

Journal of Open Source Software (2021-04-06) https://joss.theoj.org/papers/10.21105/joss.03021

DOI: 10.21105/joss.03021

99.

Nextstrain: real-time tracking of pathogen evolution

James Hadfield, Colin Megill, Sidney M Bell, John Huddleston, Barney Potter, Charlton Callender, Pavel Sagulenko, Trevor Bedford, Richard A Neher

Bioinformatics (2018-12-01) https://academic.oup.com/bioinformatics/article/34/23/4121/5001388

DOI: 10.1093/bioinformatics/bty407

100.

The Black Death and Later Plague Epidemics in the Scandinavian Countries: Perspectives and Controversies

Ole Jørgen Benedictow

De Gruyter Open Poland (2016-12-19) https://www.degruyter.com/document/doi/10.1515/9788376560472/html

ISBN: 978-83-7656-047-2

101.

Emergence and spread of basal lineages of Yersinia pestis during the Neolithic Decline

Nicolás Rascovan, Karl-Göran Sjögren, Kristian Kristiansen, Rasmus Nielsen, Eske Willerslev, Christelle Desnues, Simon Rasmussen

Cell (2019-01-10) https://www.cell.com/cell/abstract/S0092-8674(18)31464-8

DOI: 10.1016/j.cell.2018.11.005 · PMID: 30528431

102.

The Osteological Paradox: Problems of Inferring Prehistoric Health from Skeletal Samples

James W Wood, George R Milner, Henry C Harpending, Kenneth M Weiss, Mark N Cohen, Leslie E Eisenberg, Dale L Hutchinson, Rimantas Jankauskas, Gintautas Cesnys, Gintautas Česnys, … Richard G Wilkinson

Current Anthropology (1992) https://www.jstor.org/stable/2743861

103.

Black Death Bodies

Sharon N DeWitte, Maryanne Kowaleski

Fragments: Interdisciplinary Approaches to the Study of Ancient and Medieval Pasts (2017) http://hdl.handle.net/2027/spo.9772151.0006.001

104.

The Black Death

Kristina Lenz, Nils Hybel

Scandinavian Journal of History (2016) https://journals.scholarsportal.info/details/03468755/v41i0001/54_tbd.xml

DOI: 10.1080/03468755.2015.1110533

105.

Integrative approach using Yersinia pestis genomes to revisit the historical landscape of plague during the Medieval Period

Amine Namouchi, Meriam Guellil, Oliver Kersten, Stephanie Hänsch, Claudio Ottoni, Boris V Schmid, Elsa Pacciani, Luisa Quaglia, Marco Vermunt, Egil L Bauer, … Barbara Bramanti

Proceedings of the National Academy of Sciences (2018-12-11) http://www.pnas.org/lookup/doi/10.1073/pnas.1812865115

DOI: 10.1073/pnas.1812865115

106.

The 2017 plague outbreak in Madagascar: Data descriptions and epidemic modelling

Van Kinh Nguyen, César Parra-Rojas, Esteban A Hernandez-Vargas

Epidemics (2018-12)

DOI: 10.1016/j.epidem.2018.05.001 · PMID: 29866421

107.

Plagued by a cryptic clock: Insight and issues from the global phylogeny of Yersinia pestis

Katherine Eaton, Leo Featherstone, Sebastian Duchene, Ann G Carmichael, Nükhet Varlık, Edward C Holmes, GBrian Golding, Hendrik N Poinar

Nature Communications (Submitted, 2021) https://www.researchsquare.com/article/rs-1146895

108.

Leprosy in Medieval Denmark — Osteological and epidemiological analyses

Jesper L Boldsen

Anthropologischer Anzeiger (2009) http://www.jstor.org/stable/29543069

109.

Complete mitochondrial genome sequence of a Middle Pleistocene cave bear reconstructed from ultrashort DNA fragments

J Dabney, M Knapp, I Glocke, M-T Gansauge, A Weihmann, B Nickel, C Valdiosera, N Garcia, S Paabo, J-L Arsuaga, M Meyer

Proceedings of the National Academy of Sciences (2013-09-24) http://www.pnas.org/cgi/doi/10.1073/pnas.1314445110

DOI: 10.1073/pnas.1314445110

110.

Double indexing overcomes inaccuracies in multiplex sequencing on the Illumina platform

Martin Kircher, Susanna Sawyer, Matthias Meyer

Nucleic Acids Research (2012-01-01) https://academic.oup.com/nar/article/40/1/e3/1287690

DOI: 10.1093/nar/gkr771

111.

BEAST 2.5: An advanced software platform for Bayesian evolutionary analysis

Remco Bouckaert, Timothy G Vaughan, Joëlle Barido-Sottani, Sebastián Duchêne, Mathieu Fourment, Alexandra Gavryushkina, Joseph Heled, Graham Jones, Denise Kühnert, Nicola De Maio, … Alexei J Drummond

PLOS Computational Biology (2019-04-08) https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1006650

DOI: 10.1371/journal.pcbi.1006650

112.

SKIOME Project: a curated collection of skin microbiome datasets enriched with study-related metadata

G Agostinetto, D Bozzi, D Porro, M Casiraghi, M Labra, A Bruno

Cold Spring Harbor Laboratory (2021-08-18) https://www.biorxiv.org/content/10.1101/2021.08.17.456635v1

DOI: 10.1101/2021.08.17.456635

113.

Comparative Genomics and In Vitro Plant Growth Promotion and Biocontrol Traits of Lactic Acid Bacteria from the Wheat Rhizosphere

Sabrina Strafella, David J Simpson, Mohammad Yaghoubi Khanghahi, Maria De Angelis, Michael Gänzle, Fabio Minervini, Carmine Crecchio

Microorganisms (2021-01) https://www.mdpi.com/2076-2607/9/1/78

DOI: 10.3390/microorganisms9010078

114.

Pleistocene mitogenomes reconstructed from the environmental DNA of permafrost

Tyler Murchie, Emil Karpinski, Katherine Eaton, Ana T Duggan, Sina Baleka, Grant Zazula, Ross DE MacPhee, Duane Froese, Hendrik N Poinar

Current Biology (Accepted, 2021)

115.

Examining pathogen DNA recovery across the remains of a 14th century Italian monk (St. Brancorsini) infected with Brucella melitensis

Jessica Hider, Ana T Duggan, Jennifer Klunk, Katherine Eaton, Long, George Sam, Emil Karpinski, GBrian Golding, Tracy L Prowse, Hendrik N Poinar, Gino Fornaciari

(In Prep)

116.

Black Death shaped the evolution of immune genes

Jennifer Klunk, Tauras Vilgalys, Christian Demeure, Mari Cobb, Derek Elli, Rebecca Redfern, Sharon N DeWitte, Julia Gamble, Jesper L Boldsen, Ann G Carmichael, … Luis Barreiro

(In Prep, 2021)

117.

Human ectoparasites and the spread of plague in Europe during the Second Pandemic

Katharine R Dean, Fabienne Krauer, Lars Walløe, Ole Christian Lingjærde, Barbara Bramanti, Nils Chr Stenseth, Boris V Schmid

Proceedings of the National Academy of Sciences of the United States of America (2018-02-06) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5819418/

DOI: 10.1073/pnas.1715640115 · PMID: 29339508 · PMCID: PMC5819418

118.

The next phase of SARS-CoV-2 surveillance: real-time molecular epidemiology

Bas B Oude Munnink, Nathalie Worp, David F Nieuwenhuijse, Reina S Sikkema, Bart Haagmans, Ron AM Fouchier, Marion Koopmans

Nature Medicine (2021-09) https://www.nature.com/articles/s41591-021-01472-w

DOI: 10.1038/s41591-021-01472-w

119.

SARS-CoV-2 Whole Genome Sequencing in Ontario, December 14, 2021

Public Health Ontario

(2021-12-14) https://www.publichealthontario.ca/-/media/documents/ncov/epi/covid-19-sars-cov2-whole-genome-sequencing-epi-summary.pdf

120.

Phylogenetic Analysis of SARS-CoV-2 Data Is Difficult

Benoit Morel, Pierre Barbera, Lucas Czech, Ben Bettisworth, Lukas Hübner, Sarah Lutteropp, Dora Serdari, Evangelia-Georgia Kostaki, Ioannis Mamais, Alexey M Kozlov, … Alexandros Stamatakis

Molecular Biology and Evolution (2021-05-01) https://doi.org/10.1093/molbev/msaa314

DOI: 10.1093/molbev/msaa314

121.

National Center for Biotechnology Information

NCBI

https://www.ncbi.nlm.nih.gov/

122.

GISAID - Initiative

GISAID

https://www.gisaid.org/

123.

VirusSeq Portal

CanCOGeN

https://virusseq-dataportal.ca/

124.

CoVizu: Rapid analysis and visualization of the global diversity of SARS-CoV-2 genomes

Roux-Cil Ferreira, Emmanuel Wong, Gopi Gugan, Kaitlyn Wade, Molly Liu, Laura Muñoz Baena, Connor Chato, Bonnie Lu, Abayomi S Olabode, Art FY Poon

Virus Evolution (2021-12-01) https://doi.org/10.1093/ve/veab092

DOI: 10.1093/ve/veab092

Database	Description
Assembly	Descriptions of the names and structure of genomic assemblies, statistical reports, and sequence data links.
BioSample	Characteristics of the biological source materials used in experiments.
BioProject	Goals and progress of the experimental initiatives, originating from an individual organization or a consortium.
Nucleotide	Sequences collected from a variety of sources, including GenBank, RefSeq, TPA and PDB.
PubMed	Bibliographic information and citations for biomedical literature from MEDLINE, life science journals, and other online publications.
SRA	Composition of raw sequencing data and post-processed alignments generated via high-throughput sequencing platforms.

Category	Population	Morelli et al. 2010	Cui et al. 2013	Pisarenko et al. 2021	This Study
Informative Dates	1.ORI	-326, 1793	1735, 1863	1744, 1842	1806, 1901
No Temporal Signal	1.IN	-2388, 1606	1500, 1750*	1791, 1897	1651, 1913
Sampling Bias	1.ANT	-4909, 1322	1377, 1650	1483, 1704	1655, 1835
Informative Dates	1.PRE	–	1312, 1353	–	1214, 1315
Informative Dates	2.MED	-583, 1770	1550, 1800*	1413, 1653	1560, 1845
No Temporal Signal	2.ANT	-3994, 1460	1550, 1800*	1373, 1628	1509, 1852
No Temporal Signal	4.ANT	–	1200, 1700*	1611, 1816	1848, 1968
No Temporal Signal	3.ANT	–	1450, 1850*	1531, 1742	1769, 1947
Sampling Bias	0.ANT	-6857, 1199	100, 1100*	1033, 1435	1357, 1797
Informative Dates	0.ANT4	–	–	–	39, 234
Sampling Bias	0.PE	-26641, -598	-4394, 510	-377, 499	1573, 1876
Informative Dates	0.PRE	–	–	–	-3098, -2786

aDNA	Ancient DNA
DNA	Deoxyribonucleic acid
MRCA	Most Recent Common Ancestor
NCBI	National Center for Biotechnology Information
SRA	Sequence Read Archive
tMRCA	Time to the most recent common ancestor

Author

Supervisor

Lay Abstract

Abstract

Acknowledgments

List of Abbreviations and Symbols

Declaration of Academic Achievement

1 Introduction

2 NCBImeta: Efficient and comprehensive metadata retrieval from NCBI databases

2.1 Summary

2.2 Background

2.3 NCBImeta

2.4 Use Case

2.5 Future Work

2.6 Availability

2.7 Acknowledgments

3 Plagued by a cryptic clock: Insight and issues from the global phylogeny of Yersinia pestis

3.1 Abstract

3.2 Introduction

3.3 Results and Discussion

3.3.1 Population Structure

3.3.1.1 Biovars

3.3.1.2 Major Branches

3.3.1.3 Time Period

3.3.1.4 Integrative Approach

3.3.2 Estimating Rates of Evolutionary Change

3.3.2.1 No Temporal Signal

3.3.2.2 Irreproducible Estimates

3.3.2.3 Informative Rates and Dates

3.3.3 Estimating Ancestral Locations and Spread

3.3.3.1 The Third Pandemic (19th - 20th Century CE)

3.3.3.2 The Second Pandemic (14th - 19th Century CE)

3.4 Conclusions

3.5 Methods

3.5.1 Data Collection

3.5.2 Sequence Alignment

3.5.3 Maximum Likelihood Phylogenetic Analysis

3.5.4 Data Partitions

3.5.5 Estimating Rates of Evolutionary Change

3.5.6 Estimating Ancestral Locations and Spread

3.5.7 Visualization

3.6 Acknowledgments

3.7 Author Contributions

3.8 Competing Interests Statement

3.9 Supplementary Information

3.9.1 Tables

3.9.2 Figures

4 Plague in Denmark (1000-1800): A longitudinal study of Yersinia pestis

4.1 Abstract

4.2 Introduction

4.3 Results and Discussion

4.4 Conclusion

4.5 Materials and Methods

4.6 Data Availability

4.7 Acknowledgments

4.8 Author Contributions

4.9 Competing Interests Statement

5 Conclusion

5.1 Main Findings and Contributions

5.2 Future Directions

5.2.1 Same ‘Plague’, New Problems

5.2.2 New ‘Plague’, Same Problems

6 References

3.3.3.1 The Third Pandemic (19^th - 20^th Century CE)

3.3.3.2 The Second Pandemic (14^th - 19^th Century CE)