Register Dispersion: Reducing the Footprint of the Vector Register File in Vector Engines of Low-Cost RISC-V CPUs

Published 21 Mar 2025 in cs.AR | (2503.17333v2)

Abstract: The deployment of Machine Learning (ML) applications at the edge on resource-constrained devices has accentuated the need for efficient ML processing on low-cost processors. While traditional CPUs provide programming flexibility, their general-purpose architecture often lacks the throughput required for complex ML models. The augmentation of a RISC-V processor with a vector unit can provide substantial data-level parallelism. However, increasing the data-level parallelism supported by vector processing would make the Vector Register File (VRF) a major area consumer in ultra low-cost processors, since 32 vector registers are required for RISC-V Vector ISA compliance. This work leverages the insight that many ML vectorized kernels require a small number of active vector registers, and proposes the use of a physically smaller VRF that dynamically caches only the vector registers currently accessed by the application. This approach, called Register Dispersion, maps the architectural vector registers to a smaller set of physical registers. The proposed ISA-compliant VRF is significantly smaller than a full-size VRF and operates like a conventional cache, i.e., it only stores the most recently accessed vector registers. Essential registers remain readily accessible within the compact VRF, while the others are offloaded to the cache/memory sub-system. The compact VRF design is demonstrated to yield substantial area and power savings, as compared to using a full VRF, with no or minimal impact on performance. This effective trade-off renders the inclusion of vector units in low-cost processors feasible and practical.