Quantify the number of true ac4C sites represented in the positive-sample sets

Determine how many true N4-acetylcytidine (ac4C) modification sites are present in the positive-sample sets derived from acRIP-seq peak and sub-peak data for mRNA. These sets are known to contain true ac4C sites, but their total number is currently unknown.

Background

The study constructs positive-sample sets from acRIP-seq peak and sub-peak regions, which are known to include true ac4C modifications but lack base-level resolution. As a result, the dataset includes true signals without a clear count of how many distinct ac4C sites are represented.

This uncertainty complicates the evaluation, validation, and interpretation of computational models and motif analyses, which motivates quantifying the exact number of true ac4C sites covered by the positive sets.
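To make the resolution gap concrete, the sketch below counts the cytidines that fall inside a set of peak regions. Because every ac4C site must be a cytidine, this count is only an upper bound on the number of true sites, not the number itself. The function names, the toy sequences, and the 0-based half-open coordinate convention are all assumptions made for illustration; none of them comes from the paper.

```python
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Extend the previous interval instead of double-counting bases.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def candidate_cytidine_count(sequences, regions):
    """Count cytidines inside the union of peak regions.

    sequences: dict mapping a transcript ID to its sequence (hypothetical input).
    regions:   iterable of (transcript_id, start, end), 0-based half-open
               (an assumed convention, not taken from the source).
    """
    by_transcript = defaultdict(list)
    for transcript_id, start, end in regions:
        by_transcript[transcript_id].append((start, end))

    total = 0
    for transcript_id, intervals in by_transcript.items():
        seq = sequences[transcript_id].upper()
        for start, end in merge_intervals(intervals):
            total += seq[start:end].count("C")
    return total


# Toy data, invented purely for illustration.
sequences = {"tx1": "ACGCCUACGC", "tx2": "CCCC"}
regions = [("tx1", 0, 5), ("tx1", 3, 8), ("tx2", 1, 3)]

upper_bound = candidate_cytidine_count(sequences, regions)
print(upper_bound)
```

Overlapping peaks and sub-peaks are merged first so that a cytidine shared by two regions is counted once. The result says only how many positions could be modified; which of them actually are remains the open question.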

References

However, we can be certain that these positive samples contain true ac4C modification sites, even though their exact quantity remains unknown.

Language-Inspired Modeling Reveals Redundant Encoding of N4-acetylcytidine(ac4C) Modifications in mRNA (2503.23497 - Yang et al., 30 Mar 2025) in Methods, Motif Distribution