2023 ARCHIVES

Cambridge Healthtech Institute’s 2nd Annual

Machine Learning Approaches for Protein Engineering

Balancing Theory with Practice

May 18 - 19, 2023 ALL TIMES EDT

The arrival of machine learning and AI tools promises to have a tremendous impact on the field of protein engineering. Drug discovery and development processes are fraught with inefficiencies due to the lack of predictive tools. For machine learning and AI to truly change the way drugs get discovered, designed and optimized in the future, there is much that needs to be learned about how to adapt them for use in antibody discovery, training set development, prediction, screening, simulation and optimization. Join the esteemed faculty of the 2nd Annual Machine Learning Approaches for Protein Engineering track at PEGS Boston to learn how to transform the process of antibody development and ultimately improve success rates.

Scientific Advisory Board
M. Frank Erasmus, PhD, Head, Bioinformatics, Specifica, Inc.
Victor Greiff, PhD, Associate Professor, Oslo University Hospital
Maria Wendt, PhD, Head, Biologics Research US, Sanofi

Sunday, May 14

Main Conference Registration 1:00 - 5:00 pm

Recommended Pre-Conference Short Course 2:00 pm

SC3: In silico and Machine Learning Tools for Antibody Design and Developability Predictions

*Separate registration required. See short courses page for details.

Thursday, May 18

Registration and Morning Coffee 7:30 am

8:25 am

Chairperson's Remarks

Maria Wendt, PhD, Global Head and Vice President, Digital and Biologics Strategy and Innovation, Sanofi

8:30 am KEYNOTE PRESENTATION:

Recent Advances in Protein Engineering

Regina Barzilay, PhD, Delta Electronics Professor, Electrical Engineering & Computer Science, Massachusetts Institute of Technology

9:00 am

Surface ID: A Deep Learning-Based Molecular Descriptor and a Useful Tool for Drug Discovery

Yu Qiu, PhD, Senior Principal Scientist, Sanofi Genzyme R&D Center

“Surface ID” is a geometric deep learning system for high-throughput surface comparison based on geometric and chemical features. Surface ID offers a novel grouping and alignment algorithm useful for clustering proteins by function, visualization, and in silico screening of potential binding partners to a target molecule.

9:30 am

Discovering Antibodies from Patient Serum after Vaccination and Infection with SARS-CoV-2

Natalie Castellana, CEO, Abterra Biosciences

Serum antibodies from three individuals who had been fully vaccinated against SARS-CoV-2 and subsequently infected with the virus were analyzed by Alicanto. Serum antibodies were fractionated into those binding to the receptor-binding domain (RBD) and those binding to non-RBD sites on the spike protein. Memory B cells reactive to spike protein were enriched and sequenced via next-generation sequencing. A subset of the B cell sequences was identified among the serum antibodies.

Coffee Break in the Exhibit Hall with Poster Viewing 10:00 am

10:40 am

Accelerating Therapeutics Discovery with Disruptive Digital Innovation

Peter Clark, PhD, Head of Computational Science & Engineering, Therapeutics Discovery, Janssen R&D

Significant advances in computational methods and hardware-accelerated scientific computing have enabled the dawn of a new era of medicine in which lifesaving therapeutic molecules can be designed and optimized with greater speed and precision than ever before. At Johnson & Johnson, we are leveraging data from across the pharmaceutical value chain, manifested in a knowledge graph, to inform novel computational and deep learning models that drive innovation and disrupt the therapeutic research and development lifecycle. We are building and leveraging our collective institutional knowledge across therapeutic programs and indications to inform novel AI/ML models that accelerate the development of lifesaving therapies for patients across the globe.

11:10 am

Addressing Real-World Challenges in AI-Guided Design and Optimization of Biologics

Christopher J. Langmead, PhD, Director of Digital Biologics Discovery, Amgen

This presentation will provide an overview of the key challenges faced when using AI/ML to guide the design and optimization of biologics, including multi-specifics. We will then discuss some of the techniques used within Amgen to address these issues. Finally, we will argue that certain challenges are best solved through collaborative mechanisms, such as federated learning.

11:40 am

Designing Highly Stable Protein Libraries by Interpreting Deep Learning Models Trained on Flow Cytometry-Based Assays

Andrew Chang, PhD, CEO, DeepSeq.AI

The deep-learning model is no longer a black box. This talk will cover how we design a high-throughput assay to generate a protein-stability dataset for training the language model. In addition to predicting novel stable sequences using the trained model, such a model can educate us on what patterns or motifs make a particular sequence stable. Therefore, such protein stability knowledge extracted directly from the trained model becomes valuable for scientists redesigning a more stable protein library.

Luncheon in the Exhibit Hall and Last Chance for Poster Viewing 12:10 pm

1:15 pm

Chairperson's Remarks

Victor Greiff, PhD, Associate Professor, Immunology, University of Oslo

1:20 pm

Success and Challenges in AI-Driven Antibody Discovery: From Humanoid Antibodies to de novo Design

Joshua Smith, PhD, Molecular Design, Principal Scientist, Just - Evotec Biologics

Machine learning has become an integral part of antibody discovery and development. I will describe how we designed the J.HAL antibody discovery library with a generative machine learning method and share experimental results from recent discovery campaigns. I will also outline our computational approach to the problem of antigen-specific antibody design and our plans for experimental validation.

1:50 pm

Applications of Geometric Deep Learning Model with a Novel Coarse-Grained Protein Structure Representation

Jae Hyeon Lee, PhD, Machine Learning Scientist, Prescient Design

I will discuss a new protein structure prediction model based on a novel coarse-grained protein structure representation that achieves atomic accuracy on antibody structure prediction and is orders of magnitude faster than other state-of-the-art models. In addition, I'll describe its application in various antibody property prediction and design tasks.

2:20 pm

A future AI & robotics drug discovery that predicts antibody/peptide properties to discover drug candidates

Satoshi Tamaki, PhD, Chief Scientific Officer, MOLCURE Inc.

The development of antibodies and peptides can be very challenging because of the need to optimize and balance trade-offs in affinity, specificity, and other physicochemical properties. In response to this challenge, MOLCURE has built a platform that integrates AI, robotics and molecular biology experiments. The platform has generated >1 billion data points to train AI models that can identify novel, high affinity antibody and peptide drug candidates with optimized physicochemical properties.

2:35 pm

Deep Learning Enables Exploration of Antibody Space on Unprecedented Scale

Yi Li, Vice President of Strategic Development, Head of Antibody Discovery, XtalPi, Inc.

The theoretical antibody sequence space is immense and beyond interrogation by ordinary wet-lab means. Deep learning has established its superiority in fields where high-dimensional big data is involved. We demonstrate the potential of deep learning to explore the whole antibody sequence space and find therapeutic candidates with superior efficacy and developability.

Networking Refreshment Break 2:50 pm

3:20 pm

Predicting Disposition: Progress towards Relevant Preclinical Models for the Pharmacokinetics of Biologics

Vanita D. Sood, PhD, Senior Vice President, Head of Drug Discovery Research, Stealth Versant Ventures NewCo

Disposition characteristics of biologics (including clearance and immunogenicity) are key properties that influence efficacy (no exposure, no effect); tolerability/safety (neutralizing or clearing anti-drug antibodies); and commercial success (route of administration, patient convenience). Compared to small molecules, there is a dearth of predictive preclinical models of clinical pharmacokinetics. I will discuss recent progress and challenges in predicting human PK.

3:50 pm

Antibody Profiling at Scale

H. Benjamin Larman, PhD, Associate Professor, Pathology, Johns Hopkins University

The Larman laboratory creates technologies for unbiased characterization of serum antibodies at cohort scale. This seminar will provide an overview of our current antibody profiling capabilities, recent findings, and ongoing developmental efforts that seek to overcome existing limitations of high-throughput antibody analyses.

Close of Day 4:20 pm

Friday, May 19

Registration Open 7:00 am

7:30 am Interactive Discussions with Continental Breakfast

Interactive Discussions are informal, moderated discussions, allowing participants to exchange ideas and experiences and develop future collaborations around a focused topic. Each discussion will be led by a facilitator who keeps the discussion on track and the group engaged. To get the most out of this format, please come prepared to share examples from your work, be a part of a collective, problem-solving session, and participate in active idea sharing. Please visit the Interactive Discussions page on the conference website for a complete listing of topics and descriptions.

TABLE 1: Meaningful Representation of Biologics for Machine Learning - IN-PERSON ONLY

Yu Qiu, PhD, Senior Principal Scientist, Sanofi Genzyme R&D Center

ML models do not understand proteins directly; a digital representation (numerical features) is needed as input
A meaningful representation (features) is key for ML models
A protein can be represented as a 1D sequence (one-hot or embedding), a 3D structure (point cloud of Cartesian coordinates, or graphs with nodes and edges), or surface patches
Surface ID is a deep learning-derived representation, encoding geometric and chemical properties, that can be used for surface patch comparison
Applications of Surface ID include paratope clustering, PPI classification, database mining, etc.
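
As a rough illustration of the sequence and graph representations listed above, the following Python sketch one-hot encodes an amino acid sequence and builds a simple contact graph from 3D coordinates. It is not taken from the session and does not reflect how Surface ID works; the function names, the 20-letter alphabet ordering, and the 8.0 Å distance cutoff are assumptions chosen for the example.

```python
import math

# Assumed alphabet: the 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence):
    """Return one 20-element row per residue, with a single 1 per row.

    Raises KeyError for characters outside the assumed alphabet.
    """
    rows = []
    for aa in sequence.upper():
        row = [0] * len(AMINO_ACIDS)
        row[AA_INDEX[aa]] = 1
        rows.append(row)
    return rows


def contact_edges(coords, cutoff=8.0):
    """Return graph edges (i, j), i < j, for points within `cutoff` of each other.

    `coords` is a list of (x, y, z) tuples, e.g. one point per residue.
    The cutoff value is an illustrative assumption.
    """
    edges = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                edges.append((i, j))
    return edges


if __name__ == "__main__":
    encoded = one_hot_encode("ACD")
    print(len(encoded), len(encoded[0]))  # rows x alphabet size
    print(contact_edges([(0, 0, 0), (3, 0, 0), (20, 0, 0)]))
```

The one-hot matrix carries no information about residue similarity, which is one reason learned embeddings or structure-aware features are often preferred as model inputs.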

TABLE 2: Implementation of Disruptive Digital Innovation & Deep Learning Models to Accelerate Therapeutics Discovery of Protein Therapeutics: Challenges & Opportunities - IN-PERSON ONLY

Peter Clark, PhD, Head of Computational Science & Engineering, Therapeutics Discovery, Janssen R&D

Explore common challenges for end-to-end integration and enterprise deployment of AI/ML models across the R&D product lifecycle
How are organizations leveraging the growing suite of predictive models to inform and accelerate generative design and optimization of protein therapeutics?
How can we foster collaboration between different departments, including research, development, and CMC, to establish AI as a core organizational discipline?
What are the opportunities & best practices for incorporating AI/ML models and integrated lab automation platforms from discovery to development?
How are advancements in computational hardware and infrastructure driving innovation in our digital platforms and business processes?

8:25 am

Chairperson's Remarks

M. Frank Erasmus, PhD, Head, Bioinformatics, Specifica, Inc.

8:30 am

Assessing the Quality of Antibody-Antigen Models Using AlphaFold

Francis Gaudreault, PhD, Research Officer, Human Health Therapeutics, National Research Council Canada

AlphaFold has revolutionized the structure prediction of proteins alone or in complex. The need for co-evolutionary sequence constraints for structure prediction limits its use against antibody-antigen complexes. We predicted the structure of antibody-antigen complexes using traditional physics-based protein-protein docking tools. We evaluated the ability of AlphaFold in the quality assessment of models. Our results highlight that AlphaFold can rescue poorly ranked models and better discriminate good-quality models from decoys.

9:00 am

Machine Learning Prediction of Methionine and Tryptophan Photooxidation Susceptibility

Jared Delmar, PhD, Associate Director, Biopharmaceutical Development, AstraZeneca

Photooxidation of methionine (Met) and tryptophan (Trp) residues is common and includes major degradation pathways that often pose a serious threat to the success of therapeutic proteins. We applied the random forest machine learning algorithm to in-house liquid chromatography-tandem mass spectrometry (LC-MS/MS) datasets (Met, n = 421; Trp, n = 342) of tryptic therapeutic protein peptides to create computational models for Met and Trp photooxidation. We show that our machine learning models predict Met and Trp photooxidation likelihood with accuracy and further identify important physical, chemical, and formulation parameters that influence photooxidation.

9:30 am

Developability Profiling of Natural Antibody Repertoires

Victor Greiff, PhD, Associate Professor, Immunology, University of Oslo

Developability, the set of physicochemical properties of an antibody relevant for manufacturing and success in clinical trials, is one of the key determinants for success during clinical testing, and many developability parameters can be computed from the antibody sequence and structure. Although the distribution of developability parameters of natural antibody repertoires may provide guidance on the potential suitability of therapeutic antibody candidates, the sequence and structural distributional landscape of the natural antibody repertoire has not yet been described. We quantify the redundancy, sensitivity, and predictability of developability parameters in natural and clinical-stage antibodies. Exploiting the vast amount of available antibody high-throughput data will facilitate the derivation of the rules underlying developability profiles to guide antibody therapeutic discovery.

10:00 am

Predictive Modeling of Concentration-Dependent Viscosity Behavior of Monoclonal Antibody Solutions

Christoph Grapentin, PhD, Principal Scientist / Group Leader, Drug Product Services, Lonza

Solutions of monoclonal antibodies (mAbs) can show increased viscosity at high concentration, which can be a disadvantage during protein purification, filling, and administration. We present a modeling approach employing artificial neural networks (ANNs) using experimental factors combined with simulation-derived parameters plus viscosity data from 27 highly concentrated (180 mg/mL) mAbs. These ANNs can be used to predict if mAbs exhibit problematic viscosity at distinct concentrations or to model viscosity-concentration curves.

10:15 am

Sequencing to Synthesis: How Machine Learning Maximizes Process Efficiency in Antibody Discovery

Ilaria DeVito, Senior Strategic Intelligence Analyst, Gene Synthesis, Azenta Life Sciences

Azenta has an innovative end-to-end Ab discovery solution combining the strengths of in vitro and in silico technology, resulting in Ab candidates that can be readily synthesized and making the discovery and development of Ab therapies quicker and more efficient. Azenta's in silico antibody discovery module (ADM), developed by Specifica and powered by OpenEye, uses machine learning to generate a diverse list of Ab candidates for recombinant production.

Networking Coffee Break 10:30 am

11:00 am

Predicting scFv Thermostability Using Machine Learning on Sequence and Structure Features

Kathy Y. Wei, PhD, Scientific Co-Founder, 310.ai

Multi-specific biologics are of interest due to the advantage of engaging distinct targets. One important component is the scFv, but their relatively poor thermostability often hampers development. As experimental methods are laborious and expensive, computational methods are an attractive alternative. Here, we show two machine learning approaches – one with pre-trained language models (PTLM), and second, a supervised convolutional neural network (CNN) trained with Rosetta energetics – to better classify thermostable scFv variants from sequence. On out-of-distribution sequences, we show that a simple CNN model outperforms a general PTLM trained on diverse protein sequences (Spearman ρ=0.4 vs 0.15).
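
For readers unfamiliar with the metric quoted in this abstract, the sketch below shows one common way to compute Spearman's rank correlation: rank both lists (averaging ranks for ties), then take the Pearson correlation of the ranks. It is a generic illustration, not the speaker's evaluation code; the function names are hypothetical and the input values in the usage lines are made up purely to exercise the functions.

```python
def average_ranks(values):
    """Return 1-based ranks, assigning tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the window over all entries tied with the current value.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length lists with non-zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    norm_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (norm_x * norm_y)


def spearman_rho(predicted, measured):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(predicted), average_ranks(measured))


if __name__ == "__main__":
    # Made-up numbers, used only to show the call pattern.
    predicted = [0.2, 0.9, 0.4, 0.7]
    measured = [55.0, 71.0, 60.0, 58.0]
    print(spearman_rho(predicted, measured))
```

Because only the rank order matters, the metric rewards a model that sorts variants correctly even when its predicted values are on a different scale from the measured ones.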

11:30 am

Development of Machine Learning Models for Prediction of Antibody Non-Specificity

Laila Sakhnini, PhD, Senior Research Scientist, Biophysics & Injectable Formulation, Novo Nordisk AS

Over the years, there has been an increased focus on decreasing non-specific binding during early-stage drug development. It has been recognized as a root cause for failure in many drug programs due to unexpected pharmacokinetics and elevated toxicity. From a computational design perspective, prediction has remained a challenge. The proposed work describes the development of sequence-based machine learning models for prediction of this property with accuracy of up to 74%, enabling flagging and deselection of non-specificity at an early stage.

Close of Machine Learning Approaches for Protein Engineering Conference 12:00 pm