Explain ProCALM’s successful generation for EC 7.1.1.2 and 7.1.1.9
Ascertain why ProCALM, a ProGen2-based protein language model fine-tuned with conditional adapters for joint conditioning on enzyme commission (EC) number and taxonomy, successfully generated sequences for EC 7.1.1.2 and EC 7.1.1.9 even though bacterial sequences for these EC classes were held out during training. Identify the model- and data-related factors that enable this outcome.
References
7.1.1.2 and 7.1.1.9 are transmembrane enzymes part of large complexes, but it is not clear why these particular functions were successfully generated.
— Function-Guided Conditional Generation Using Protein Language Models with Adapters
(2410.03634 - Yang et al., 2024) in Discussion