Hardware specialization vs. cross-platform backbone trade-off
Determine whether, for on-device deployment of the Liquid Foundation Models (LFM2), it is preferable to specialize model variants and quantization schemes to specific hardware targets (e.g., particular CPUs, NPUs, or GPUs) or to maintain a single cross-platform backbone. Characterize the impact of each approach on time-to-first-token, decode latency, prefill throughput, peak memory, and downstream quality under edge constraints.
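Answering this question requires measuring the same metrics uniformly across every (variant, hardware target) pair. A minimal sketch of such a harness is below; `generate_step` is a hypothetical stand-in for one forward pass of a deployed LFM2 variant (LFM2 exposes no such API in the source), and peak memory is passed in externally since it is platform-specific to collect.

```python
import time
from dataclasses import dataclass


@dataclass
class EdgeMetrics:
    """Per-(variant, device) measurements named in the problem statement."""
    ttft_s: float             # time to first token, seconds
    decode_tok_per_s: float   # steady-state decode throughput
    prefill_tok_per_s: float  # prompt-processing throughput
    peak_mem_mb: float        # supplied by a platform-specific probe


def measure(generate_step, prompt_tokens, n_decode=32, peak_mem_mb=0.0):
    """Time one prefill-plus-first-token call, then n_decode decode steps.

    `generate_step` is a hypothetical callable: it receives a token list
    and runs one forward pass of whatever variant is under test.
    """
    t0 = time.perf_counter()
    generate_step(prompt_tokens)        # prefill + first token
    t1 = time.perf_counter()
    for _ in range(n_decode):           # autoregressive decode steps
        generate_step([0])
    t2 = time.perf_counter()
    ttft = t1 - t0
    return EdgeMetrics(
        ttft_s=ttft,
        decode_tok_per_s=n_decode / (t2 - t1),
        prefill_tok_per_s=len(prompt_tokens) / ttft,
        peak_mem_mb=peak_mem_mb,
    )
```

Running `measure` once per pair, with identical prompts and decode lengths, yields a table in which the cost of the cross-platform backbone (one row per device, same weights) can be compared directly against the specialized variants (one row per device, device-tuned weights and quantization), alongside a downstream-quality score collected separately.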
References
LFM2 also runs competitively on modern NPUs and GPUs, but these accelerators were not central to the hardware-in-the-loop search, and we do not claim the resulting architectures or quantization schemes are optimal for large-batch server settings or any particular accelerator family. The trade-off between specializing model variants for specific hardware targets versus maintaining a single cross-platform backbone remains open.