The Immutable Tensor Architecture: A Pure Dataflow Approach for Secure, Energy-Efficient AI Inference
Abstract: The deployment of LLMs on consumer edge devices is throttled by the "Memory Wall" -- the prohibitive bandwidth and energy cost of fetching gigabytes of model weights from DRAM for every token generated. Current architectures (GPUs, NPUs) treat model weights as mutable software data, incurring massive energy penalties to maintain general-purpose programmability. We propose The Immutable Tensor Architecture (ITA), a paradigm shift that treats model weights not as data, but as physical circuit topology. By encoding parameters directly into the metal interconnects and logic of mature-node ASICs (28nm/40nm), ITA eliminates the memory hierarchy entirely. We present a "Split-Brain" system design where a host CPU manages dynamic KV-cache operations while the ITA ASIC acts as a stateless, ROM-embedded dataflow engine.
Paper Prompts
Sign up for free to create and run prompts on this paper using GPT-5.
Top Community Prompts
Collections
Sign up for free to add this paper to one or more collections.