2025 ARCHIVES

Cambridge Healthtech Institute’s 4th Annual

Machine Learning Approaches for Protein Engineering

Putting Theory into Practice and Streamlining Biologic Development

May 15 - 16, 2025 (All Times EDT)

Drug discovery and development processes need to become more efficient: failure rates are very high, and bringing a drug to market is staggeringly expensive. Machine learning and AI have the capacity to revamp the way protein structures and biologics are predicted, discovered, designed, and optimized. We are at an important juncture where these methods are being incorporated at various points in the biologics pipeline, and the subsequent retraining is critical. Join the unparalleled faculty of the 4th Annual Machine Learning Approaches for Protein Engineering track at PEGS Boston to understand how the culture and processes in antibody and protein drug development are evolving and how computational methods are being implemented.

Scientific Advisory Board
M. Frank Erasmus, PhD, Head, Bioinformatics, Specifica, an IQVIA business
Victor Greiff, PhD, Associate Professor, University of Oslo and Director of Computational Immunology, IMPRINT
Maria Wendt, PhD, Global Head and Vice President, Digital and Biologics Strategy and Innovation, Sanofi

Sunday, May 11

1:00 pm Main Conference Registration

2:00 pm Recommended Pre-Conference Short Course

SC1: In silico and Machine Learning Tools for Antibody Design and Developability Predictions

*Separate registration required. See short course page for details.

Thursday, May 15

7:45 am Registration and Morning Coffee

8:25 am

Chairperson's Remarks

Maria Wendt, PhD, Global Head, Preclinical Computational Innovation Strategy Research Platforms, R&D, Sanofi

8:29 am

CANCELLED: KEYNOTE PRESENTATION: Generative AI to Accelerate Prediction and Design in Biomedicine and Sustainability

Debora S. Marks, PhD, Associate Professor, Systems Biology, Harvard Medical School

There’s now an amazing opportunity to accelerate discovery across important 21st century challenges by using computation tightly coupled to biological experiments and clinical medicine. I will describe some recent approaches from my lab for these challenges where we have developed new machine learning methods that can exploit the enormous natural sequence diversity and our ability to sequence DNA at scale. To demonstrate the power of these new approaches I will present recent work predicting the effects of human genetic variation on disease and drug response, anticipation of viral escape from the host immune system for vaccine design, protein design for enzyme optimization, antibodies and sustainable biomaterials.

8:30 am

Generalisable Models and Rapid Validation in Antibody Design

Pietro Sormanni, PhD, Group Leader, Royal Society University Research Fellow, Chemistry of Health, Yusuf Hamied Department of Chemistry, University of Cambridge

Advances in machine learning are transforming antibody engineering, yet key challenges remain: sparse training data, unreliable in silico performance metrics, and slow experimental feedback. In this talk, I will present recent work addressing these limitations. First, I will show how generalisable models can be built from limited and heterogeneous data to predict antibody biophysical traits, and how we applied this framework to nanobody thermostability with the development of NanoMelt. Second, I will examine common pitfalls in evaluating models for epitope-specific antibody design and propose alternative metrics that better capture practical performance. Finally, I will introduce SpyBLI, a rapid kinetic screening platform that delivers quantitative binding data in under 24 hours, without requiring purification. Together, these advances demonstrate how predictive algorithms and streamlined experimental pipelines can be combined to accelerate the design–make–test cycle in antibody discovery.

9:00 am

Large Language Models for mRNA Design

Sven Jager, PhD, Lead, Computational Science, Sanofi Germany GmbH

mRNA-based vaccines and therapeutics are increasingly popular and are used for a variety of conditions. A key challenge in designing these mRNAs is sequence optimization. Even small proteins or peptides can be encoded by a vast number of mRNA sequences, each affecting properties such as expression, stability, and immunogenicity. To facilitate the selection of optimal sequences, we developed CodonBERT, a large language model (LLM) built specifically for mRNAs.

9:30 am

CANCELLED: Progress Report on AlphaFold and OpenFold-Driven Biomolecular Modeling

Nazim Bouatta, PhD, Senior Research Fellow, Lab of Systems Pharmacology and Systems Biology, Harvard Medical School

AlphaFold2 has transformed structural biology with groundbreaking advances in protein structure prediction. However, despite these advances, many challenges remain. In this talk, I will present a progress update on AlphaFold2 and share insights gained from OpenFold, an optimized and trainable variant of AlphaFold2. I’ll also explore potential paths to address the limitations of AlphaFold-like systems.

10:00 am

In silico-Driven Strategies to Unlock the Therapeutic Potential of Rabbit-Derived Antibodies

Shuji Sato, Senior Director, Client Relations, ImmunoPrecise Antibodies

This session will explore effective strategies for accelerating lead selection from a diverse panel of antibodies. Key techniques presented include proprietary methods for leveraging the unique immune system of rabbits, early epitope landscape profiling, and the use of IPA's in silico-driven diversification and optimization workflows, resulting in the rapid delivery of optimized antibodies ready for clinical development.

10:15 am

GenAI in Protein Engineering: What Is Working, What Is Not, Where May We Go Next?

Stef Van Grieken, Cofounder & CEO, Cradle Bio

GenAI is making its way into industrial protein engineering for both therapeutic and non-therapeutic applications. Significant usage among early adopters is emerging across the industry, but large-scale adoption of these methods is still in its early days. In this talk we explore, from the perspective of a software company for AI-guided discovery & optimization now serving >25 enterprises across pharma, industrial bio, agriculture and food, where we see teams succeed, what challenges they face, and which pitfalls they could have avoided. We will also outline a maturity model for how industry can reach broad and reliable GenAI adoption in the coming years.

10:30 am Coffee Break in the Exhibit Hall with Poster Viewing

11:15 am Transition to Plenary Fireside Chat

11:25 am

Riding the Next Biotech Wave—Trends in Biotech Investments, Partnering, and M&As

PANEL MODERATOR:

Jakob Dupont, MD, Executive Partner, Sofinnova Investments

Emerging Biotherapeutic Modalities, Technologies and Innovations: ADCs, radiopharmaceuticals, GLP-1, AI, machine learning, and other exciting trends to watch
Introduction to different strategies for investments, M&As, partnering, licensing, etc.
Investing in platforms versus assets
Advice on funding options for start-ups, early to late stage clinical programs, etc.

PANELISTS:

Hong Xin, PhD, Senior Director, External Innovation Search & Evaluation, Johnson & Johnson

Shyam Masrani, Partner, Medicxi

Uciane Scarlett, PhD, Former Principal, MPM BioImpact

Anthony B. Barry, PhD, Executive Director, ES&I Lead, Biotherapeutics, Technologies, and Digital, Pfizer Inc.

12:25 pm Luncheon in the Exhibit Hall and Last Chance for Poster Viewing

1:55 pm

Chairperson's Remarks

M. Frank Erasmus, PhD, Head, Bioinformatics, Specifica, an IQVIA business

2:00 pm

Biophysical Cartography of the Native and Human-Engineered Antibody Landscapes Quantifies the Plasticity of Antibody Developability

Victor Greiff, PhD, Associate Professor, University of Oslo and Director of Computational Immunology, IMPRINT

Developing effective monoclonal antibody (mAb) therapies requires optimizing multiple properties, known as 'developability,' to ensure they can progress through the development pipeline. However, there is limited understanding of how the characteristics (redundancy, predictability, sensitivity) of developability parameters (DPs) in human-engineered antibodies compare to those in natural antibodies. We analyzed 86 DPs across two million antibody sequences, finding key differences in the predictability and sensitivity of sequence- and structure-based DPs. Our findings reveal that human-engineered antibodies occupy a narrower space within the natural antibody landscape, offering a foundation for more precise mAb design.

2:30 pm

In silico Methods for Antibody Drug Conjugate (ADC) Biophysical Assessment and Design Optimization

Nandhini Rajagopal, PhD, Principal Scientist, Genentech Inc

An Antibody Drug Conjugate (ADC) is a targeted cancer therapy that combines a cancer-specific antibody with a cytotoxic drug to deliver treatment directly to tumors, minimizing damage to healthy cells. However, ADCs pose several developability challenges due to their hybrid modality involving both small and large molecules. In this talk, I'll explore computational techniques that transform ADC evaluation and enhancement by predicting critical biophysical properties such as stability, solubility, and viscosity, thereby streamlining the design process. I'll highlight key methodologies and real-world applications, demonstrating the benefits of advanced in silico models in ADC development.

3:00 pm

Insights from the AIntibody Benchmarking Competition

Andrew R.M. Bradbury, MD, PhD, CSO, Specifica, an IQVIA business

M. Frank Erasmus, PhD, Head, Bioinformatics, Specifica, an IQVIA business

This talk highlights the integration of strategic data collection and intelligent experimental design to advance AI-powered antibody discovery and optimization. We will share updates from the AIntibody competition, a benchmarking initiative engaging the biotech, pharma, academia, and AI communities. These efforts, alongside tailored discovery campaigns for individual collaborators, aim to accelerate innovation and drive progress in early-stage therapeutic discovery.

3:30 pm

AbGPT: De novo Antibody Design via Generative Language Modeling

Amir Barati Farimani, PhD, Associate Professor, Machine Learning, Carnegie Mellon University

The adaptive immune response relies on B-cell receptors (BCRs) for pathogen neutralization, yet designing BCRs de novo remains challenging due to structural complexity. Here, we introduce Antibody Generative Pretrained Transformer (AbGPT), a model fine-tuned from a foundational protein language model. Using a tailored generation and filtering pipeline, AbGPT generated 15,000 high-quality BCR sequences, effectively capturing the intrinsic variability and conserved regions critical to antibody design.

4:00 pm Networking Refreshment Break

4:30 pm

Scaling Foundation Models for Protein Generation

Ali Madani, PhD, Founder and CEO, Profluent Bio

Language models learn powerful representations of protein biology. We introduce a new foundation model suite that directly investigates scaling effects for protein generation. We then apply it to antibody and gene editor design.

5:00 pm

AI-Driven Modeling of the Immune Receptors

Maria Rodriguez Martinez, PhD, Associate Professor, Biomedical Informatics & Data Science, Yale University

I will present recent work on interpretable deep learning models for predicting immune receptor binding specificity, including TITAN and DECODE, which jointly model T cell receptor (TCR)–epitope interactions and extract binding-associated motifs. I will also show how machine learning can help identify disease-associated TCRs in rare autoimmune conditions such as giant cell arteritis. Finally, I will discuss our ongoing efforts to integrate deep generative structural models to capture the flexibility of T and B cell receptors, enabling structure-informed therapeutic design.

5:30 pm Close of Day

Friday, May 16

7:15 am Registration Open

7:30 am Interactive Discussions

Interactive Discussions are informal, moderated discussions that allow participants to exchange ideas and experiences and develop future collaborations around a focused topic. Each discussion will be led by a facilitator who keeps the discussion on track and the group engaged. To get the most out of this format, please come prepared to share examples from your work, take part in a collective problem-solving session, and participate in active idea sharing. Please visit the Interactive Discussions page on the conference website for a complete listing of topics and descriptions.

TABLE 1:

Delivering on the AI Antibody Promise: The AIntibody Benchmarking Competition

Andrew R.M. Bradbury, MD, PhD, CSO, Specifica, an IQVIA business

M. Frank Erasmus, PhD, Head, Bioinformatics, Specifica, an IQVIA business

AI promises in antibody discovery and optimization: will they really revolutionize the field, or are they just another way of addressing solved problems?
What can AI do now? And where are we seeing the greatest value relative to existing technologies?
The AIntibody benchmarking competition: Did AI deliver on the AIntibody challenges?
Ideas for future benchmarking competitions

8:25 am

Chairperson's Remarks

Victor Greiff, PhD, Associate Professor, University of Oslo and Director of Computational Immunology, IMPRINT

8:30 am

Closed-Form Test Functions for Biophysical Sequence Optimization Algorithms

Samuel Stanton, PhD, Machine Learning Scientist, Prescient Design, Computational Sciences, Genentech

Many researchers are trying to replicate the success of machine learning (ML) in computer vision and natural language processing in modeling biophysical systems. As a discipline, ML heavily relies on low-cost empirical benchmarks to guide algorithm development, but available benchmarks for biophysical applications have major shortcomings. Drawing inspiration from mutational landscape models, we propose Ehrlich functions, a new class of test functions for biophysical sequence optimization algorithms.

9:00 am

Avoiding Pitfalls in ML Model Validation for Protein Design: The Importance of Data Splits

Norbert Furtmann, PhD, Head, Computational and High-Throughput Protein Engineering, Large Molecule Research, Sanofi

This presentation will explore the development of machine learning models for protein property prediction. Focusing on customized protein language models for the NANOBODY modality, it will demonstrate the implementation of a downstream thermostability predictor. Using this example, the talk will emphasize the critical importance of meaningful data splits in model training and validation.

9:30 am

Closing the Loop: Ultra-Fast Wet Lab Validation for AI-Guided Protein Design

Julian Englert, MS, Co-Founder and CEO, Adaptyv Biosystems

Adaptyv accelerates data generation for training and validating AI models with a high-throughput lab that companies can access via our software interface and API. We empower protein-design teams to validate their AI models many times faster than before, without the need to run in-house wet labs. We're partnering with dozens of companies, from techbio startups to major pharma, and have generated lab data for thousands of novel proteins.

10:00 am

Antibody Design Using Generative Artificial Intelligence

Ido Calman, MS, AI Research, Absci Corp

Douglas Ganini da Silva, Director, Purification & Analytics, Absci Corp

The traditional antibody discovery process is complex and costly, inspiring investigation into AI-based approaches for improving and scaling our ability to design therapeutic antibodies. Here, we discuss Absci’s recent progress in developing and experimentally validating IgDesign2, an AI Platform for antibody inverse folding. Then, we share updates on our most advanced asset, ABS-101, a potential therapeutic antibody for inflammatory bowel disease. Together, these results inspire confidence in the potential for AI to improve biologics drug discovery.

10:30 am Networking Coffee Break

11:00 am

Harnessing Language Models for Antibody Prioritization

André A. R. Teixeira, PhD, Senior Director, Antibody Platform, Institution for Protein Innovation

We are leveraging protein language models to prioritize antibody candidates with superior biophysical properties, streamlining the path from discovery to functional reagents. By applying these models to high-throughput data from our integrated antibody platform—including antigen design, yeast display, sequencing, and characterization—we can rapidly select leads with enhanced developability. This talk will highlight examples and comparisons between different models and languages.

11:30 am

The IMMREP TCR-Epitope Prediction Challenge: Lessons Learned and Future Directions

Justin Barton, PhD, AI Scientist, Xaira Therapeutics

The IMMREP23 competition evaluated TCR-pMHC interaction prediction methods from 53 participating teams submitting 398 sets of predictions. Results showed reasonable performance for "seen" pMHC targets but near-random performance for "unseen" peptides, highlighting an unsolved generalization challenge. Here we discuss what has been learned from detailed analysis of predictions and provide insights for improving future benchmarks by carefully addressing biases in dataset construction.

12:00 pm

Protein Binder Design and a Closed Loop Pipeline at the DTU Arena for Life Science Automation

Timothy Patrick Jenkins, PhD, Assistant Professor and Head, Data Science, DTU Bioengineering

Building on DTU’s in-house expertise in generative protein binder design, this new project at the DTU Arena for Life Science Automation (DALSA) aims to establish a fully automated closed-loop pipeline in collaboration with Novo Nordisk. By integrating AI-driven design with automated build, test, and learn stages, the pipeline will significantly accelerate the discovery and optimization of therapeutic candidates. The project showcases DALSA’s role as an open-access hub for innovation, demonstrating how cutting-edge automation can transform life science R&D and bridge academic and industrial efforts in drug development.

Closing the loop in therapeutic discovery: How automation and AI can transform binder design from months to days, enabling rapid iteration and optimisation.
Industry-academic co-development: Lessons from the DALSA–Novo Nordisk collaboration on aligning priorities, timelines, and infrastructure for mutual benefit.
From proof-of-concept to scalable platform: The challenges of moving from isolated automation steps to a fully integrated closed-loop system.
Rethinking validation bottlenecks: Why 'build' and 'test' stages—not just design—are now the limiting factors, and how automation addresses this.

12:30 pm Close of Summit