With billions of molecules available for use in drug discovery and development, how can researchers determine which ones will be useful in combating particular diseases? That question was among those considered in a freewheeling discussion at BioPharm America™ 2019 in Boston, where panelists from companies ranging from startups to big pharma outlined their successes and challenges in using artificial intelligence (AI) and machine learning.
The panel was moderated by Yizhen Dong, principal at 11.2 Capital, a San Francisco seed stage venture firm investing in founders “who are solving the biggest challenges facing our world with emerging technologies, including application of AI in healthcare and life sciences.”
The panelists were: Ron Alfa, senior VP of translational discovery and chief evangelist at Salt Lake City-based Recursion Pharmaceuticals; Martin Akerman, chief technology officer and co-founder of Envisagenics, Inc., of New York City; Heather Arnett, VP of research at NuMedii, Inc., in San Mateo, California; Abraham Heifetz, CEO of Atomwise; Joseph Szustakowski, executive director of translational bioinformatics at the global pharmaceutical company Bristol-Myers Squibb; and Shanrong Zhao, director of computational biology at Pfizer, Inc., another global pharmaceutical company.
AI in target discovery
Moderator Yizhen Dong opened the discussion by asking panelists from NuMedii, Recursion and Envisagenics how they are using artificial intelligence and machine learning to target which molecules in the body are associated with a particular disease process.
NuMedii is a small company devoted to accelerating the discovery of precision therapies to address high unmet needs, especially in cancer and rare diseases. NuMedii’s AIDD (Artificial Intelligence for Drug Discovery) harnesses Big Data and AI to rapidly discover connections between drugs and diseases at a systems level. The company extracts information from a vast array of disparate data stores to create a structured, proprietary data resource spanning hundreds of diseases and thousands of compounds. Its proprietary AI and machine learning algorithms allow the company to extend well beyond conventional ‘target-centric’ drug discovery approaches by facilitating the exploration of favorable ‘poly-pharmacology’ profiles that can potentially improve therapeutic efficacy by modulating effects on multiple disease pathways.
In the view of Heather Arnett of NuMedii, AI is not a “panacea to cure all of big pharma and big biotech ills in making sure we get full productivity for what happens in the clinic.” But genomic, molecular and other biomedical data “has gotten so big and so complex that an individual postdoc or scientist is not going to be able to have all that information in their brain in a way that’s going to yield new targets [molecules in the body intrinsically associated with a particular disease process]. So we use AI and machine learning almost as a crutch for the integration and intellectual pulling together of all that data in order to identify targets and biomarkers (bodily substances indicative of disease, infection, or environmental exposure) and to stratify patients into subgroups likely to respond to particular therapeutics.”
Rather than have drug companies get caught up on what she termed the “me too’s” and the “me betters,” or the “I’m researching this pathway and I’m going to hit this piece of it, and this other company is going to go further upstream or downstream,” she said, “AI lets us push target identification in a way that lets you think outside the box and come up with novel targets that you wouldn’t otherwise have thought of.
“Our work focuses heavily on single cell sequencing data, which we use to predict the best targets for intervention for particular diseases for the right patients,” Arnett explained. “AI has particular strengths and brings things to the table which we can use to augment our targets. We particularly favor clinically rich data sets. So, where you have rich medical clinical metadata, we can use AI to help ensure that we’re matching ourselves to the right patients—those most likely to respond to a treatment.”
Recursion applies deep learning to millions of high-resolution cellular images, aiming to make biological discoveries of unprecedented breadth and medical impact.
Recursion’s Ron Alfa pointed out that direct discovery using AI can take a long time. “It takes a year or so to have a really sufficient body of data to prove a pattern,” he said. To help solve that problem, “Recursion has built the world’s largest data set of biological images. Using the latest in artificial intelligence methods, we’re training our algorithms on this data to predict important properties of new medicines with the goal of discovering treatments for hundreds of diseases.
“There’s a ton of data to generate across a lot of different types of experiments. But we think of AI as a tool, essentially, to be able to probe large systems data sets to answer certain questions. For example, when we do hit identification [that is, seek a compound with confirmed activity against a biological target] we can apply machine learning to images in order to understand biology that you otherwise wouldn’t be able to characterize without some specific understanding of that biological content. That enables us to discover drugs that work via lots of different mechanisms, not just common pathways.
“From there, how do we find a target?” he asked. “The way we approach that is to build very large, genome wide CRISPR [gene editing] experiments and then examine biology that’s happening in cells to understand how a drug may or may not be working.…Just looking at the effect that a compound produces and comparing it with others allows us to understand whether a target we’ve identified is very similar to others or whether it’s different. There are a lot of different approaches you can use to understand what drugs are lacking, and what the targets should be.”
Envisagenics focuses on the discovery of RNA therapeutics to combat the more than 370 genetic diseases or cancers shown to be caused by mutations affecting RNA splicing. The company’s mission is to reduce the complexity of biomedical data to accelerate the development of innovative therapeutic solutions through RNA splicing analytics and artificial intelligence.
Envisagenics’ breakthrough technology, SpliceCore, is a cloud-based platform that is experimentally validated to predict drug targets and biomarkers through splicing discovery from RNA-sequencing (RNA-seq) data, using artificial intelligence. The platform replaces expensive drug-target selection and lead design with efficient computer simulations, thus decreasing time, cost, and failure risk of drug development programs.
In the view of Martin Akerman of Envisagenics, using RNA-seq data and artificial intelligence are “both a necessity and an opportunity to develop therapeutics that target splicing errors.” Integrating data from hundreds of patients, “we use artificial intelligence and machine learning to find therapeutic points of intervention and ‘the right’ patient populations.” The company also designs RNA therapeutics to treat diseases caused by RNA splicing errors. “We are using our basic data to find antibodies and new antigens; we also formed a lab to do RNA splicing and to regenerate a variety of those peptides which are invisible to the genome.” But, “validating targets is our special angle.”
Moving the discussion from target discovery to the next step in drug development, Dong asked Abraham Heifetz, the Atomwise CEO, how AI and machine learning are used to find “hits”—small molecules that bind optimally to a biological target and modify its function.
Atomwise technology, based on convolutional neural networks (the same AI technology that recognizes faces in a crowd and enables self-driving cars), uses a statistical approach that extracts insights from millions of experimental affinity measurements and thousands of protein structures to predict the binding of small molecules to proteins. This fundamental tool makes it possible for chemists to pursue hit discovery, lead optimization and toxicity predictions with unparalleled precision and accuracy.
Heifetz pointed out that advances in synthetic chemistry have led to the development of billions of molecules—some of which are possible “ligands” or molecules that might bind to and modify biological targets in the cells.
Fifteen years ago, he said, it was possible to purchase a million molecules…and pharma owned five million molecules, which were screened by robots. But with new chemical synthesis techniques, by the end of 2018, some 3.8 billion molecules were available for purchase, and in 2019, 11 billion. “I personally believe that, next year, there will be 100 billion molecules that we can purchase for $100 apiece,” he said. “We can’t buy 100 billion molecules, we can’t store 100 billion, and we can’t test 100 billion molecules. And so, to explore that space, it must be done computationally.
“First, you have to run the physical experiment to determine whether it makes sense to make a particular molecule. But, in the world of searching, the challenge is, if you have a 99% accurate model, with 1 percent false positive, your correct answer is lost in a sea of a billion molecules. It’s not a question of making new molecules. It’s a question of filtering, searching, matching. And those are fundamentally AI questions.” The synthetic chemists are “incredibly powerful,” but, he added, “they need help and guidance in figuring out which molecules to produce. AI can do that very quickly and rapidly identify hits.”
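Heifetz’s point about false positives can be made concrete with a back-of-the-envelope calculation. In the sketch below, the library size, hit count, and error rates are illustrative assumptions, not Atomwise figures:

```python
# Back-of-envelope: why a "99% accurate" model still drowns the true hits.
# All numbers are illustrative assumptions, not Atomwise figures.
library = 1_000_000_000   # candidate molecules screened
true_binders = 1_000      # hypothetical real hits in the library
fp_rate = 0.01            # 1% false-positive rate
tp_rate = 0.99            # assume the model catches 99% of real hits

false_positives = fp_rate * (library - true_binders)   # ~10 million
true_positives = tp_rate * true_binders                # ~990
precision = true_positives / (true_positives + false_positives)
print(f"flagged: {false_positives + true_positives:,.0f}")
print(f"precision: {precision:.4%}")   # roughly 0.01%: hits lost in the noise
```

Even with 99% accuracy, roughly ten million false positives swamp the thousand real binders, which is why filtering, searching, and matching at this scale are AI problems rather than chemistry problems.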
To that end, he said, Atomwise “invented the use of convolutional neural networks for drug discovery, the use of deep neural networks for structural biologics construction and has been running the biggest application of machine learning to drug discovery in history—comprising more than 250 separate projects in 36 countries with hundreds of researchers.”
According to the Atomwise website, the company’s machine learning capabilities have improved hit rates by up to 10,000 times and can deliver accuracy comparable to wet lab experiments. The technology screens for potency, selectivity, and polypharmacology, and guards against off-target toxicity. It screens more than 100 million compounds each day and delivers results 100 times faster than ultra-high throughput screening.
Optimizing hits for clinical trials
After finding hits, one of the biggest challenges facing discovery, especially at the early stages, is optimizing the right molecules to combat specific disease states in patients likeliest to respond to particular treatments, Dong said. “How are pharmaceutical companies using artificial intelligence to tackle that big problem?” he asked.
Bristol-Myers Squibb (BMS)
Joseph Szustakowski of BMS replied: “With the advent of next generation sequencing technologies and other modern proteomics technology platforms, we are able to very rigorously characterize the molecular and cellular state of patients. For example, in an oncology setting, we routinely collect tumor biopsies and perform omics and RNA sequencing. We collect digital pathology images, and the question is, how do we sift through all those variables to identify the signals that will help us identify which applications are most likely to be of benefit?”
To that end, Szustakowski explained, his team uses artificial intelligence and machine learning in two main ways.
(1) Regarding high dimensional molecular biological data sets: “If we’re measuring the expression or the activity of 20,000 genes, how do we identify the handful that are the most relevant in terms of identifying important characteristics of the tumor micro-environment for which patients are more likely to respond to therapy? For example, BMS is working on heart failure—and there’s enormous medical need there. But it’s difficult to identify which patients are most likely to benefit from the therapy. Patients want to enroll in clinical trials—but trials are notoriously large, expensive and difficult to run. So we’ve been working with fairly large cohorts of heart failure subjects and have identified a small panel of proteins that will help us in a prognostic sense, identifying those patients who are at great risk for some sort of cardiac failure.”
(2) “The second ‘bucket’ of activities where we’re getting a lot of traction is in image analysis specifically applied to digital pathology,” he said. “In oncology, images are already used for diagnosis in a real-world setting; they are also diagnostic in a clinical trial setting. We have an abundance of pathology slides that we’ve gone through and digitized. We have way more slides, and way more questions we would like to ask of those slides, than our human pathologists can review. So, we’ve been working internally and with external partners to develop deep learning approaches that we can use to automate that analysis.
“We can do things like segment the image to determine which regions are tumor, which regions are normal, and where the boundary regions are. We can also identify specific cell types, distinguishing normal cells, tumor cells, and different immune cells, in order to digitally quantify the expression of specific biomarkers that are relevant. And then, because we’re able to automatically extract all those features, we can start to superimpose them on top of each other and ask questions like how many CD8-positive T cells are within a certain distance of the tumor cells. We can then look to see whether that is in some way protective or associated with another outcome.”
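The spatial question Szustakowski describes, counting immune cells within a given distance of tumor cells on a segmented slide, can be sketched roughly as follows. The coordinates, cell counts, and 50-micron threshold are all hypothetical:

```python
import numpy as np

# Hypothetical sketch: given cell centroids and types extracted from a
# segmented pathology slide, count T cells near any tumor cell.
rng = np.random.default_rng(1)
tumor_xy = rng.uniform(0, 1000, size=(500, 2))   # tumor-cell centroids (microns)
tcell_xy = rng.uniform(0, 1000, size=(300, 2))   # T-cell centroids (microns)
radius = 50.0                                    # proximity threshold (microns)

# Pairwise distances: (300, 500) matrix of T cell -> tumor cell distances
d = np.linalg.norm(tcell_xy[:, None, :] - tumor_xy[None, :, :], axis=-1)
near = d.min(axis=1) <= radius                   # T cells near any tumor cell
print(f"{near.sum()} of {len(tcell_xy)} T cells within {radius:.0f} microns of tumor")
```

The per-patient count (or fraction) of tumor-proximal immune cells then becomes a feature that can be tested for association with response or survival.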
His group at BMS also performs “downstream” and predictive analysis, he said.
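The gene-selection problem Szustakowski raises, narrowing roughly 20,000 measured genes to a relevant handful, might look something like this minimal sketch. It uses a simple univariate correlation screen on simulated data; real pipelines use far richer machine learning, and the gene indices and effect sizes here are invented for illustration:

```python
import numpy as np

# Minimal sketch, assuming a univariate screen on simulated expression data.
rng = np.random.default_rng(0)
n_patients, n_genes = 200, 20_000
X = rng.normal(size=(n_patients, n_genes))       # expression matrix
response = rng.integers(0, 2, size=n_patients)   # 1 = responder, 0 = non-responder
informative = [3, 42, 777, 1234, 9999]           # genes we make predictive
X[:, informative] += 2.0 * response[:, None]     # inject signal for responders

# Rank genes by absolute Pearson correlation with response
xc = X - X.mean(axis=0)
yc = response - response.mean()
r = (xc * yc[:, None]).sum(axis=0)
r /= np.sqrt((xc**2).sum(axis=0)) * np.sqrt((yc**2).sum())
top10 = np.argsort(-np.abs(r))[:10]
print(sorted(top10.tolist()))  # the injected genes should rank near the top
```

A screen like this reduces 20,000 candidates to a short list that downstream models, and human experts, can actually interrogate.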
Shanrong Zhao, director of computational biology at Pfizer, said that another challenge in clinical trials is to decide when to increase dosage of a particular drug. For example, “it is impossible to take patient measurements every day in a trial.” But once the team has a baseline for the patient and the disease stage, artificial intelligence and machine learning can be used to predict patients’ situations at any time point over many weeks—and to determine appropriate dosages. AI and machine learning can also be used to synthetically generate larger patient samples, to simulate results, and to protect the personal identities of trial participants in situations where data is shared, he said.
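As an illustrative sketch (not Pfizer’s actual method), predicting a patient’s state at an unmeasured time point from sparse trial visits can be as simple as fitting a trend to the available measurements; the visit weeks and biomarker values below are hypothetical:

```python
import numpy as np

# Illustrative sketch: interpolate a patient's state at an unmeasured week
# by fitting a linear trend to sparse trial-visit measurements.
weeks = np.array([0, 4, 8, 12])                  # visits with measurements
biomarker = np.array([10.0, 8.5, 7.2, 6.1])      # hypothetical readings
slope, intercept = np.polyfit(weeks, biomarker, 1)  # least-squares line
week6 = slope * 6 + intercept                    # predicted value at week 6
print(f"predicted week-6 value: {week6:.2f}")
```

In practice the models are far more sophisticated, but the principle is the same: a baseline plus a learned trajectory stands in for measurements that cannot be taken every day.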
Startups, emerging companies and big pharma: A multidisciplinary, collaborative future
In his final question, Dong asked whether big pharma is likelier to build artificial intelligence capabilities internally or to acquire technologies or companies. Szustakowski, of BMS, responded that “for successful projects, you need large, high quality labeled data sets and fairly specific questions that you’re trying to address. We can’t just take a whole bunch of data and a question and throw it at some computer scientist and expect to get back solid, actionable information. It’s very important to have the right collection of talent, expertise, and experience.” For that reason, he said, “We’re agnostic as to exactly where each of those ingredients comes from. In some cases, we internally have the data sets necessary to try to answer these questions. In other cases, we will work with external partners. We have to focus in on specific problems and take the opportunity to solve them in a way that current approaches allow. It’s very much a multidisciplinary problem.”
Anita Harris is a writer and founder of the Harris Communications Group, an integrated public relations and digital marketing firm based in Cambridge, MA.
Join us for more compelling panel discussions during Digital Medicine & Medtech Showcase at Biotech Showcase, coming up January 13–15, 2020 in San Francisco.