General solution for capturing context-specific signals in multitask seq2func models

Develop a general training and optimization approach that reliably captures context-specific regulatory signals in multitask sequence-to-function models trained across multiple assays and cell types, mitigating bias toward broadly shared features and improving modeling of differential regulation among closely related cell types.

Background

Multitask seq2func models share supervision across experiments and cell types. This sharing helps the model learn broadly predictive features, but it can bias optimization toward common regulatory programs (e.g., housekeeping) and leave context-specific features underrepresented. Targeted fine-tuning, upsampling, and focal-style losses can partially recover these signals, but such fixes are case-specific and do not constitute a general solution.
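To make the idea of a focal-style loss concrete, here is a minimal sketch of one possible variant for a (sequences x tasks) regression target. It is an illustration only, not the formulation used in the cited paper: the function name `focal_style_mse`, the `gamma` parameter, and the choice of `|error|**gamma` as the weighting are all assumptions. The toy data imagines one sequence with shared activity across three cell types and one sequence active in only a single cell type.

```python
from typing import List


def focal_style_mse(
    pred: List[List[float]],
    target: List[List[float]],
    gamma: float = 1.0,
) -> float:
    """Hypothetical focal-style squared error over a (sequences x tasks) matrix.

    Each entry's squared error is scaled by |error| ** gamma, so entries the
    model already fits well contribute less to the total. gamma = 0 recovers
    plain mean squared error.
    """
    total = 0.0
    count = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            err = abs(p - t)
            total += (err ** gamma) * err ** 2
            count += 1
    return total / count


# Toy targets: rows are sequences, columns are cell types (made-up values).
# Row 0 is shared activity; row 1 is active only in the third cell type.
target = [[1.0, 1.0, 1.0],
          [0.0, 0.0, 3.0]]

# A model that has only learned the shared program predicts 1.0 everywhere.
pred = [[1.0, 1.0, 1.0],
        [1.0, 1.0, 1.0]]

plain = focal_style_mse(pred, target, gamma=0.0)    # ordinary MSE
focused = focal_style_mse(pred, target, gamma=2.0)  # emphasizes large errors
print(plain, focused)
```

In this toy case the cell-type-specific entry accounts for 4 of 6 units of summed squared error under plain MSE and 16 of 18 under `gamma=2.0`, so the reweighting shifts the gradient toward the context-specific signal. The right `gamma` depends on the dataset, which is one reason such reweighting is case-specific rather than a general solution.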

References

Although targeted fine-tuning, data upsampling, and focal-style losses can partially recover context-specific signals, a general solution remains an open challenge.

Toward Interpretable and Generalizable AI in Regulatory Genomics (2602.01230 - Nagai et al., 1 Feb 2026) in Section “Data and Task Design Shape Seq2func Models”