Delivering transformational data analysis for infectious disease drug discovery

Artificial intelligence and the insights it can provide offer a route for infectious diseases to stay in the spotlight and draw interest from R&D investors worldwide. Liam Tremble at Poolbeg Pharma explores how this technology can be applied to data analysis.
Pharmaceutical R&D is becoming increasingly expensive, and returns on investment (ROI) have been falling over the last several decades, with static levels of new drug approvals since the 1980s. Developing a single product is estimated to cost over $2 billion, and over 90% of products for which an investigational new drug (IND) application has been filed go on to fail. ROI in the pharma sector fell from 10% in 2010 to just 2% in 2018 but has recently started to recover.1-2
No sector has been as severely impacted as the infectious disease market. Before COVID-19, almost two decades had passed without substantial improvement in treatment options for a range of viral illnesses, such as influenza.
Part of the dilemma of diminished returns is thought to be that each new drug has consistently improved on the last, raising the bar for its successors. Consequently, making incremental improvements in new drug candidates has become more difficult, especially when coupled with the probability that the majority of intuitive drugs, those considered likely to succeed, have already been discovered. Now, more than ever, there is a need to look beyond conventional drug development approaches to discover the next generation of therapeutics.
In recent years, researchers have been battling against reduced returns by using artificial intelligence (AI) to help guide drug discovery and development within the pharma sector. AI is designed to handle big datasets from a variety of sources and formats, making it ideally equipped to work with the output of modern cutting-edge analysis techniques. These range from single-cell transcriptomic sequencing and proteomic analysis, which produce Big Data, to the diverse data types that together describe a disease phenotype.
AI is a high-power computing technique, which uses iterative ‘learning’ algorithms to interpret, learn, and discover from underlying patterns in ‘Big’ Data. In addition to traditional Big Data, modern AI providers incorporate machine reading comprehension (MRC) to allow algorithms to learn from unstructured text-based publications, creating formidable ‘intelligent’ machines, which are up to date with the latest literature.
The US is the undoubted leader in pharma AI, with over 184 companies and over $12 billion of capital invested in providing AI-based tools to aid drug discovery, development, and medical treatment. In silico AI tools exist to identify disease-specific targets and interventions against those targets; coupled with the absorption, distribution, metabolism, and excretion (ADME) and toxicology profiles of those interventions, they can estimate the probability of success in the clinic. Pharma companies are using AI to expand their pipelines and to prioritise existing assets, with many major pharma companies refraining from commercial decisions on their pipelines without the input of AI-based predictive outcomes.
The role of AI in infectious disease is particularly pertinent due to the diverse range of factors that can impact the trajectory of infection and immunity. Although conserved responses have been identified in infectious disease, such as the decoupled interferon responses that distinguish severe and mild viral infections, a multitude of factors, such as age, human leukocyte antigen (HLA) type, immunological history, immunological status, volume of pathogen initially experienced, and host comorbidities, can all result in varied responses to a single pathogen.
‘Original antigenic sin,’ the process by which immunological history can impede antigen specific responses by ‘antigen trapping’ early in infection, has been identified by vaccine developers as a major obstacle in the design of universal vaccines. Particularly in elderly or immunocompromised individuals, it can be difficult to stimulate a durable vaccine response in a consistent manner due to the multitude of underlying factors. It is likely that elements of personalisation will be needed to drive the effector mechanisms for lasting immunity.
The non-biased ability of AI to integrate multi-omic data makes it an ideal platform to deal with these wide-ranging factors that affect and predict immunity, host response, and recovery in the face of a plethora of infectious diseases. It also makes it the ideal partner in helping to identify the next generation of pharmaceutical products to prevent and treat disease.
One of the significant challenges for AI-based discovery is the quality of the data input and the availability of high-powered comprehensive datasets that can be used to validate its findings. Commercial grade platforms often overcome this issue by integrating gargantuan datasets taken from publicly available information. However, these datasets are often incomplete due to limited publishing of original datasets and data protection legislation, which regulates clinical data.
Despite the large-scale support for AI-driven improvement of healthcare delivery, progressive improvements in AI have largely outpaced the development of the regulatory framework surrounding it. Data protection legislation, designed over the past decade to counter the exploitation of personal data by corporate interests, often finds itself at odds with the principles of AI-based learning.
AI is beginning to revolutionise the delivery of healthcare. Its ability to integrate and infer from diverse data types (such as imaging data, clinical notes, demographics, and lab results) in real time, and so produce tools for diagnosis and prognosis, can aid clinical decision-making. The ethical implications of rapidly advancing AI-based contributions to medical treatments have stirred global bodies to develop guidelines and principles for the integration of AI in medical settings.
Integration of AI in the drug discovery and drug development process is impacted less by these ethical concerns. However, issues still exist, such as the propagation of ethnic bias in medical treatments due to underlying bias in datasets, which may widen differences in healthcare outcomes for underserved minorities and developing nations.
Global initiatives, such as the Human Vaccines Project, have been engaged in sustained efforts to characterise the immune system in exquisite detail, with the underlying belief that integration of diverse high depth datasets will produce prognostic and interventional products to improve health outcomes. Single-cell next-generation sequencing, full length protein microarrays, phage display, and cell phenotyping arrays produce Big Data, which can be layered onto host biology.
The power of AI will advance exponentially over the coming years. However, in order to accelerate insights, there is the opportunity to progress beyond the gradual accumulation of data snippets and to input bespoke high depth data from infectious disease. High depth analysis of clinical data is cost prohibitive, particularly when the insights of AI may be only the first step in the development of a new product.
In light of the COVID-19 pandemic, it is vital that national bodies recognise the potential of AI to prepare for and respond to future challenges. There has been a unique opportunity to provide bespoke data for AI-driven infectious disease research through human challenge trials, in which volunteers are inoculated with an infectious agent in carefully controlled conditions and monitored through health, sickness, and recovery. During this time, daily biological samples can be obtained, revealing local and systemic responses to the challenge on a real-time basis, which can then be coupled to an intervention.
Academic institutions such as Imperial College London and Oxford University have championed the technique in recent years, including challenge of healthy volunteers with SARS-CoV-2. A beneficial impact of the high depth data analysis of modern immunological techniques has been the reduction in the number of subjects required for powered and meaningful interpretations.
Recent evidence in COVID-19 has shown that germline mutations and HLA types can profoundly influence susceptibility to severe disease. The data highlight the importance of matched genetic, transcriptome, and immunology datasets for training AI algorithms. In the absence of matched HLA data, immunological trends can be misattributed to other mechanisms, or genuine signals may not be detectable behind the background variability that these factors create. The noise of biology will always be present, but it is the responsibility of those using AI for research to produce clean data, which minimise confounding factors and facilitate the next generation of insights.
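The point about background variability can be illustrated with a minimal sketch in Python. Everything in it is an assumption made for illustration: the marker levels are hand-made numbers rather than real data, the labels `group_A` and `group_B` stand in for two hypothetical HLA groups with different baselines, and the function names are invented here. It compares the standardised difference between mild and severe cases when the groups are pooled with the same difference calculated within each group.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, variance

# Hypothetical, hand-made values (not real data):
# (hla_group, outcome, marker_level). Severe cases sit one unit
# above mild cases in both groups, but the groups differ in baseline.
RECORDS = [
    ("group_A", "mild", 9.8), ("group_A", "mild", 10.0), ("group_A", "mild", 10.2),
    ("group_A", "severe", 10.8), ("group_A", "severe", 11.0), ("group_A", "severe", 11.2),
    ("group_B", "mild", 1.8), ("group_B", "mild", 2.0), ("group_B", "mild", 2.2),
    ("group_B", "severe", 2.8), ("group_B", "severe", 3.0), ("group_B", "severe", 3.2),
]


def standardised_difference(mild, severe):
    """Difference in means divided by the pooled standard deviation.

    Assumes both lists have the same length, so the pooled variance
    is the simple average of the two sample variances.
    """
    pooled_sd = sqrt((variance(mild) + variance(severe)) / 2)
    return (mean(severe) - mean(mild)) / pooled_sd


def split_by_outcome(records):
    """Return marker levels for mild and severe cases as two lists."""
    mild = [level for _, outcome, level in records if outcome == "mild"]
    severe = [level for _, outcome, level in records if outcome == "severe"]
    return mild, severe


def pooled_effect(records):
    """Effect size when the HLA group is unknown and all records are pooled."""
    return standardised_difference(*split_by_outcome(records))


def stratified_effects(records):
    """Effect size calculated separately within each HLA group."""
    by_group = defaultdict(list)
    for record in records:
        by_group[record[0]].append(record)
    return {
        group: standardised_difference(*split_by_outcome(rows))
        for group, rows in by_group.items()
    }


if __name__ == "__main__":
    print("pooled:", round(pooled_effect(RECORDS), 2))
    for group, effect in stratified_effects(RECORDS).items():
        print(group, round(effect, 2))
```

With these made-up numbers the pooled effect is small (about 0.2 standard deviations) because the baseline gap between the groups dominates the spread, while within each group the same one-unit shift is large (about 5 standard deviations). Real datasets are far messier, but the sketch shows why matched HLA data matter for separating signal from background.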
As a scientific community, we have great confidence in the ability of AI to lead the next generation of interventions in the war on infectious disease. AI will not replace the requirement for basic research, but our ability to integrate cutting-edge AI analyses with clinical datasets will speed up the development of novel interventions that can improve patient outcomes.
The potential for using AI analysis of biological data to quickly and cost-effectively identify many more interesting and efficacious drug candidates for infectious diseases with serious unmet needs is both very real and very exciting.

Liam Tremble PhD, Project Manager, R&D Operations at Poolbeg Pharma plc (London AIM: POLB) – a clinical stage infectious disease pharmaceutical company, which aims to develop multiple products faster and more cost effectively than the conventional biotech model. Liam, an immunologist, has worked at hVIVO, part of Open Orphan PLC, the provider of human challenge trials with a focus on strategic engagement to enhance recruitment for clinical trials. Prior to that, he was a researcher at the Cork Cancer Centre, Republic of Ireland. He completed his doctoral degree at University College Cork on the role of tumour associated macrophages in melanoma.