Authors: Tim Sandle (Bio Products Laboratory, Elstree, UK)
Data-driven approaches to disease tracking and assessment are often used in predicting and preventing the spread of pathogens, as well as assessing whether a disease outbreak will occur and what the probable impact will be on a given population. Such approaches have been made easier through the digital capture of data and facilitated through forms of artificial intelligence such as machine learning. However, while the quality of data remains a key factor, any attempt at infectious disease prediction ultimately depends on the robustness of the model. An emerging area that appears to offer strong predictive power is pathogen biogeography.
Pathogen biogeography (sometimes contracted to ‘pathogeography’) refers to the study of the distribution of a disease (or associated vector), together with the associated ecosystems, within a defined geographic space and subject to an assessment across a selected geological time (biogeography coupled with a focus on a specific pathogen). Tools associated with pathogen biogeography can be used retrospectively (to make sense of past disease outbreaks), to examine current epidemiological concerns, or to make predictions as to the types and extents of future disease outbreaks.
The origins of biogeography date back to the Victorian era of exploration. However, it has only been in recent years that biogeography has become integrated into human medicine and datasets leveraged for global health management. Microbial biogeography, for example, has been an area of study for over 30 years, although much of the focus has been on understanding the role of microorganisms as signifiers of ecological change.
To draw inferences from data sets, the study of pathogen biogeography requires the collection of data from multiple sources. Such sources include the spatial distribution of pathogens around the world, biogeographical information, community ecology and epidemiology. To begin any such analysis, the extent to which a given organism is pathogenic must be known. This entails an understanding of host–microbiota interactions in order to differentiate between beneficial and harmful organisms and their contribution to pathogenesis. Broadening out from this, the other factors that could constitute wide disease spread then need to be considered.
Despite a more globalized world, the geographic distribution of the majority of human infectious diseases is constrained through a combination of ecological barriers that place limits on the extent of dispersal, and through the inability of reservoir hosts or vectors to migrate from their ecological niche. There are, of course, exceptions to this and the 2020 outbreak of SARS-CoV-2, which causes COVID-19, provides a serious exemplar.
Predicting disease spread
The strength of pathogen biogeography, and the area in which continued development could make an important contribution to medical science, lies in the analysis of biogeographic patterns to predict the potential distributions of new or emerging infectious disease risks. The diversity of the biogeographic patterns (‘pathodiversity’) can act as a strong signal for subsequent pathogen spread. This remains a surprisingly unexplored area. For example, within GIDEON, a global database of clinically relevant human infectious diseases, the 350-plus significant global pathogens listed are comprehensively indexed in terms of virulence and treatment options, but there is a paucity of information concerning the ecological distribution of the agents.
To address this and provide a starting point for pathogen biogeography to become more accurate, researchers at Louisiana State University (USA), Georgetown University (Washington, DC, USA), and the University of Montreal (Canada) have developed a new model. This model draws upon data relating to pathogens of concern in order to forecast outbreaks. The research team contends that their model, through considering a diversity of pathogens, introduces a level of accuracy missing from earlier models, which have tended to focus on one pathogen only. The reason for looking at multiple pathogens at one time, in addition to a given pathogen of concern, is based on the premise that the distribution of different pathogens combined with data from other regions can help to predict the outbreak potential of the pathogen of interest. This is because related pathogens often have a similar regional and global distribution.
Countries with similar climates and infrastructure also tend to have similar pathogen communities. Things are not static, however, and the strength of the model is that it responds to change as new inputs are added. This is important given the impact of multiple factors on different parts of the globe, including increasing contact between humans and animals (especially in the context of zoonotic pathogens making up the majority of emerging diseases), climate change, alterations to land use, food security and war leading to displaced populations.
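The premise that countries with similar pathogen communities share outbreak risk can be illustrated with a small sketch. This is not the researchers' published model: the country labels, the occurrence data, the use of Jaccard similarity and the `outbreak_score` function are all hypothetical choices made here purely for illustration.

```python
from typing import Dict, Set

# Hypothetical toy data: pathogens recorded in each (made-up) country.
OCCURRENCE: Dict[str, Set[str]] = {
    "A": {"dengue", "zika", "chikungunya"},
    "B": {"dengue", "zika", "malaria"},
    "C": {"influenza", "measles"},
    "D": {"dengue", "chikungunya"},
}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Shared pathogens divided by all pathogens seen in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def outbreak_score(target: str, pathogen: str,
                   occurrence: Dict[str, Set[str]]) -> float:
    """Similarity-weighted share of other countries reporting the pathogen.

    Illustrative assumption: a country is more informative about the
    target the more its pathogen community resembles the target's.
    """
    # Leave the pathogen of interest out when measuring similarity,
    # so the score is based only on the *other* pathogens.
    target_set = occurrence[target] - {pathogen}
    weighted_hits = 0.0
    total_weight = 0.0
    for country, pathogens in occurrence.items():
        if country == target:
            continue
        weight = jaccard(target_set, pathogens - {pathogen})
        total_weight += weight
        if pathogen in pathogens:
            weighted_hits += weight
    return weighted_hits / total_weight if total_weight else 0.0


if __name__ == "__main__":
    for candidate in ("zika", "malaria", "measles"):
        print(candidate, round(outbreak_score("D", candidate, OCCURRENCE), 3))
```

In this toy setting, a pathogen already present in the countries most similar to the target scores higher than one found only in dissimilar countries, which is the intuition behind considering many pathogens at once rather than one in isolation.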
Applying a new model
In order to process such vast quantities of complex data, the researchers used a machine learning algorithm. This type of algorithm improves with each new data input and with its own assessment of previous predictive outputs. By testing out the wide dataset, the researchers demonstrated that they can forecast the spread of a single pathogen within a given country with far greater predictive accuracy than when using a single-pathogen model. This could allow them to provide an earlier warning to public health officials and governments. For example, the algorithm could be used to determine whether instigating a travel ban will delay the arrival of an infectious disease in a country by days or weeks, or whether such a measure will be ineffective at eliminating the risk of the disease crossing borders in the long term and that health protection resources would be best invested elsewhere.
One conclusion drawn from the model to date is the suggestion that the significant changes impacting the planet, mentioned above, are creating conditions for more rapid and possibly more regular patterns of disease spread. For instance, the model could help to unpick the reasons why diseases are more likely to spill over from wildlife to humans in deforested habitats, or the extent to which habitat loss or global heating patterns are associated with the global emergence of infectious diseases. Climate variations also change the nature and diversity of pathogens, with areas with little or no precipitation during the driest part of the year generally having a different variety of pathogens to areas subject to cooler climates or greater rainfall.
While no single model can provide all of the answers in relation to emerging diseases, and though this specific model requires further development and fine-tuning, through continuing to collect data and make forecasts, a novel benchmark is being established. This provides global epidemiologists with an additional mechanism to assess the ebb and flow of potential national, regional and global epidemics and pandemics.
Reference: Dallas TA, Carlson CJ, Poisot T. Testing predictability of disease outbreaks with a simple model of pathogen biogeography. R. Soc. Open Sci. 6 (11):190883 (2019)