Preprint | Enhancing Enzyme Activity with Mutation Combinations Guided by Few-shot Learning and Causal Inference

ABSTRACT: Designing enzyme sequences to enhance product yield is a fundamental challenge in metabolic engineering. Here, we establish a workflow that integrates computational predictions with efficient experimental iteration to obtain outsized gains in product yield. Based on causal inference and an examination of published datasets from previous yield-boosting engineering efforts, we inferred, and then confirmed experimentally, that in vivo unit yield (yield/expression) can serve as an attractive surrogate for aqueous kcat/KM when optimizing for in vivo enzyme activity. In our workflow, we first predict activity-enhancing single mutants by calculating the binding affinities of reactive intermediates, and then measure their unit yield experimentally. We next predict activity-enhancing mutation combinations using a few-shot learning model we developed called Physics-Inspired Feature Selection of Protein Language Models (PIFS-PLM). The model requires only 60–100 experimentally examined mutation combinations as input and uses the “local activity landscape” to identify enzyme regions where further mutations are likely to yield additional gains. In a case study of a bicyclogermacrene (BCG) synthase, we achieve a 72-fold increase in BCG yield based on combinations of 12 individual mutations, and provide extensive crystallographic and biochemical evidence for the impacts of specific mutations. Thus, optimizing for unit yield is a highly efficient alternative to optimizing for thermostability, and our study provides a powerful workflow for the efficient engineering of high-yield enzyme variants.
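
To make the unit-yield metric concrete, here is a minimal Python sketch of the ratio the abstract defines (yield/expression) and of ranking variants by their fold change over wild type. The `Variant` class, the function names, the units, and all numbers are hypothetical placeholders for illustration; they are not taken from the preprint and do not represent the authors' code or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """One enzyme variant with two measured quantities (hypothetical units)."""
    name: str
    product_yield: float  # e.g. product titer; the unit is an assumption
    expression: float     # e.g. relative enzyme expression level; an assumption


def unit_yield(variant: Variant) -> float:
    """Unit yield as defined in the abstract: yield divided by expression."""
    if variant.expression <= 0:
        raise ValueError(f"expression must be positive for {variant.name}")
    return variant.product_yield / variant.expression


def rank_by_unit_yield(variants, wild_type):
    """Return (name, fold change in unit yield vs. wild type), best first."""
    baseline = unit_yield(wild_type)
    scored = [(v.name, unit_yield(v) / baseline) for v in variants]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Placeholder values chosen only to exercise the functions.
    wt = Variant("WT", product_yield=10.0, expression=1.0)
    candidates = [
        Variant("mutant_A", product_yield=30.0, expression=2.0),
        Variant("mutant_B", product_yield=20.0, expression=0.5),
    ]
    for name, fold in rank_by_unit_yield(candidates, wt):
        print(f"{name}: {fold:.2f}x unit yield vs. WT")
```

In this toy input, the variant with the higher raw yield ranks lower once its yield is normalized by expression, which is the distinction the unit-yield metric is meant to capture.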

For details: https://www.researchsquare.com/article/rs-5354708/v1