[
    {
        "id": "authors:zy2xp-md064",
        "collection": "authors",
        "collection_id": "zy2xp-md064",
        "cite_using_url": "https://authors.library.caltech.edu/records/zy2xp-md064",
        "type": "article",
        "title": "mGem: Facilitated fermentation\u2014an underappreciated mode of energy conservation",
        "author": [
            {
                "family_name": "Ciemniecki",
                "given_name": "John A.",
                "orcid": "0000-0003-2789-6700"
            },
            {
                "family_name": "Glasser",
                "given_name": "Nathaniel R.",
                "orcid": "0000-0002-2833-5166",
                "clpid": "Glasser-Nathaniel-Robert"
            },
            {
                "family_name": "Gralnick",
                "given_name": "Jeffrey A.",
                "orcid": "0000-0001-9250-7770"
            },
            {
                "family_name": "Newman",
                "given_name": "Dianne K.",
                "orcid": "0000-0003-1647-1918",
                "clpid": "Newman-D-K"
            }
        ],
        "abstract": "<p>Here, we introduce a new name into the bacterial energy conservation lexicon: facilitated fermentation. This name is necessary because the more familiar terms \"respiration\" and \"fermentation\" do not adequately describe how electron balancing is coupled to energy conservation for organisms that engage in this metabolism. Facilitated fermentation is when ATP is predominantly made via a substrate-level pathway that is redox-coupled to a terminal electron acceptor reduced outside of the cell. The coupling is often facilitated by an extracellular electron shuttle or outer membrane protein that shuttles electrons from the electron transport chain to the extracellular acceptor. Naming facilitated fermentation is timely because it has recently been demonstrated to support both growth and non-growth states in bacteria that are important in nature and disease. We hope that the introduction of this term will inspire future research to evaluate the extent of facilitated fermentation's prevalence and impact in the microbial world and beyond.</p>",
        "doi": "10.1128/mbio.02494-25",
        "pmcid": "PMC13059778",
        "issn": "2150-7511",
        "publisher": "American Society for Microbiology",
        "publication": "mBio",
        "publication_date": "2026-04-08",
        "series_number": "4",
        "volume": "17",
        "issue": "4",
        "pages": "e02494-25"
    },
    {
        "id": "authors:3h7zz-szy25",
        "collection": "authors",
        "collection_id": "3h7zz-szy25",
        "cite_using_url": "https://authors.library.caltech.edu/records/3h7zz-szy25",
        "type": "monograph",
        "title": "Discovery of a phenazine\u2013thiol conjugase from sparse data using genome-informed machine learning",
        "author": [
            {
                "family_name": "Shan",
                "given_name": "Xiaoyu",
                "orcid": "0000-0001-9631-3244",
                "clpid": "Shan-Xiaoyu"
            },
            {
                "family_name": "Trindade",
                "given_name": "In\u00eas B.",
                "orcid": "0000-0002-6746-8455",
                "clpid": "Trindade-Ines-B"
            },
            {
                "family_name": "Glasser",
                "given_name": "Nathaniel R.",
                "orcid": "0000-0002-2833-5166",
                "clpid": "Glasser-Nathaniel-Robert"
            },
            {
                "family_name": "Thalhammer",
                "given_name": "Korbinian O.",
                "orcid": "0000-0001-6882-8611",
                "clpid": "Thalhammer-Korbinian-O"
            },
            {
                "family_name": "Scurria",
                "given_name": "Matthew",
                "orcid": "0009-0001-0598-2133",
                "clpid": "Scurria-Matthew"
            },
            {
                "family_name": "Mora",
                "given_name": "Ariane",
                "orcid": "0000-0003-1331-8192"
            },
            {
                "family_name": "Conway",
                "given_name": "Stuart J.",
                "orcid": "0000-0002-5148-117X"
            },
            {
                "family_name": "Newman",
                "given_name": "Dianne K.",
                "orcid": "0000-0003-1647-1918",
                "clpid": "Newman-D-K"
            }
        ],
        "abstract": "<p>Machine learning has enabled powerful biological discoveries using models trained on large datasets. However, for many important biological questions, such as identifying enzymes that transform understudied substrates, sparsity of training data is often a major bottleneck. Here, using phenazine natural products as a case study, we show that integrating genome-informed data augmentation with contrastive learning in protein language space enables identification of phenazine-interacting proteins starting from only 14 known phenazine modifying sequences. Applying this framework led to the discovery of PTC (Phenazine-Thiol Conjugase), the first enzyme known to catalyze phenazine thioconjugation, a phenazine modification reaction long observed but previously presumed to occur only through non-enzymatic chemistry. In silico simulation and experimental measurements demonstrate that PTC binds to both phenazine and glutathione as substrates. Recombinant expression and biochemical characterization reveal that PTC promotes glutathione-dependent modification of phenazines, yielding distinct reaction outcomes that depend on substrate identity. Although thiol-conjugated phenazine products exhibit reduced toxicity to bacterial cells, deletion of the gene encoding PTC does not confer a strong fitness disadvantage, illustrating how direct learning of sequences can uncover relevant enzymes that might evade phenotype-based genetic screens. Together, these results demonstrate that coupling comparative genomics with protein machine learning can convert &ldquo;small data&rdquo; typically outside the scope of machine learning into actionable predictive power, thereby facilitating enzyme discovery.</p>\n<div class=\"subsection\">\n<p><strong>Significance</strong> Machine learning excels when large, well-labeled datasets are available, yet many biologically important problems lack sufficient experimental data to support such approaches to discovery. This limitation is particularly acute for identifying enzymes acting on rare or understudied substrates. Here, we show that genomic organization can be leveraged as an additional source of biological information to address data sparsity. Starting with only 14 enzymes experimentally shown to modify phenazines, we developed a model identifying phenazine-interacting enzymes by integrating genome-informed data augmentation with protein machine learning. Guided by the model, we discovered the first enzyme known to catalyze thioconjugation modifications of phenazines, demonstrating a simple yet powerful strategy for extracting predictive insight from sparse biological knowledge.</p>\n</div>",
        "doi": "10.64898/2026.03.05.709892",
        "publisher": "bioRxiv",
        "publication_date": "2026-03-06"
    }
]