MobileRAG: Mobile Retrieval-Augmented Generation
- MobileRAG is a retrieval-augmented generation system designed for resource-constrained mobile devices, integrating on-device vector retrieval and GUI-centric knowledge bases for efficient automation.
- Key innovations include the EcoVector algorithm and Selective Content Reduction, which optimize memory usage and CPU performance while ensuring offline operation and user privacy.
- Empirical benchmarks show that MobileRAG significantly outperforms traditional methods in task success rates and adaptability across single-agent and hierarchical multi-agent mobile pipelines.
MobileRAG is a class of Retrieval-Augmented Generation (RAG) systems specifically tailored for mobile and resource-constrained environments, addressing limitations of conventional RAG methods regarding memory, energy, and real-time interaction with mobile graphical interfaces. MobileRAG encompasses a range of system architectures—on-device vector retrievers, hierarchical dual-level augmentation, and GUI-centric knowledge bases—deployed in single-agent and multi-agent mobile automation pipelines. Its unifying goal is to empower mobile agents to robustly execute complex tasks by leveraging augmented, contextually relevant knowledge and memory, while maintaining stringent requirements for efficiency and privacy.
1. Core Components and On-Device RAG Advancements
MobileRAG was first formulated to address inefficiencies in adapting RAG to mobile devices, where server-grade memory and compute are unavailable. Key technical innovations are:
EcoVector Algorithm: A memory- and CPU-efficient vector search method, EcoVector partitions and partially loads compressed index data, drastically reducing memory footprint and CPU usage on-device. This solution builds on the Product Quantization (PQ) framework of Jégou et al., splitting vectors into subspaces, quantizing with local codebooks, and storing only compact indices. In MobileRAG’s memory formula:
Memory ≈ 4·n_c·d + (N·M·b)/8 bytes,
where n_c: #centroids, d: dimension, N: #vectors, M: PQ groups, b: PQ coding bits; the first term holds the float32 centroid table and the second the bit-packed PQ codes (Park et al., 1 Jul 2025).
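The effect of this layout can be sanity-checked with simple arithmetic. The sketch below assumes float32 centroids and bit-packed codes; the example sizes (100k vectors, 768 dimensions, 96 PQ groups) are illustrative, not taken from the paper:

```python
def ecovector_memory_bytes(n_centroids, dim, n_vectors, pq_groups, pq_bits):
    """Approximate index memory under the formula above.

    Centroid table: n_centroids * dim float32 values (4 bytes each).
    PQ codes: n_vectors vectors, each stored as pq_groups codes of pq_bits bits.
    """
    centroid_table = 4 * n_centroids * dim
    pq_codes = n_vectors * pq_groups * pq_bits // 8
    return centroid_table + pq_codes

flat = 4 * 100_000 * 768          # uncompressed float32 baseline
pq = ecovector_memory_bytes(n_centroids=1024, dim=768,
                            n_vectors=100_000, pq_groups=96, pq_bits=8)
print(f"flat: {flat / 2**20:.1f} MiB, PQ-compressed: {pq / 2**20:.1f} MiB")
```

Even at these modest settings the compressed index is more than an order of magnitude smaller than the flat float32 store, which is the headroom EcoVector exploits on-device.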
Selective Content Reduction (SCR): SCR filters out text irrelevant to the user’s query or the model’s reasoning needs, ensuring that the LLM input remains within token and memory constraints without impacting accuracy.
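A minimal sketch of the SCR idea: rank passages by relevance to the query and keep only what fits a token budget. The naive token-overlap score and crude whitespace token count below are illustrative stand-ins for whatever relevance model and tokenizer the actual system uses:

```python
def selective_content_reduction(passages, query, budget_tokens, score_fn=None):
    """Keep only the passages most relevant to `query`, within a token budget.

    `score_fn` stands in for the real relevance model; the default here is
    naive query-token overlap, purely for illustration.
    """
    if score_fn is None:
        q_terms = set(query.lower().split())
        score_fn = lambda p: len(q_terms & set(p.lower().split()))
    ranked = sorted(passages, key=score_fn, reverse=True)
    kept, used = [], 0
    for p in ranked:
        n = len(p.split())          # crude whitespace token count
        if used + n <= budget_tokens:
            kept.append(p)
            used += n
    return kept

docs = ["battery settings control screen timeout",
        "the weather tomorrow is sunny",
        "tap battery saver to extend battery life"]
print(selective_content_reduction(docs, "how to save battery", budget_tokens=12))
```

The off-topic weather passage is dropped and the two battery-related passages survive, keeping the LLM input inside the budget.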
Offline Operation and Privacy: By supporting all retrieval and generation steps on-device, MobileRAG ensures user data remains local, with privacy and offline viability prioritized above cloud-based speedups (Park et al., 1 Jul 2025).
2. Retrieval-Augmented Generation in Mobile Agents
MobileRAG serves as the RAG engine for a variety of mobile agent architectures:
Single-Agent RAG-Enhanced Pipelines
MobileRAG acts as a modular subsystem in frameworks such as AppAgent v2:
- Structured KB Construction: As mobile agents explore GUIs (using AndroidController or similar), they generate function-descriptions and element-level records, which are embedded and indexed as a vector KB.
- Query-Driven Context Retrieval: On each step, the agent forms a natural language query representing the current goal and context, computes its embedding, and retrieves the top-K element descriptions by cosine similarity.
- Prompt Integration: The retrieved contexts are injected into an LLM prompt, using chain-of-thought (CoT) templates engineered to elicit explicit, interpretable UI action sequences (TapButton, Swipe, etc.).
- Action Execution and Feedback Loop: Actions are executed on the device, and new discoveries or state changes update the KB for future retrievals.
This architecture enables continual refinement and dynamic adaptation to changing UIs and workflows (Li et al., 2024).
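The query-driven retrieval step above reduces to cosine-similarity top-K search over element descriptions. The sketch below uses toy 3-d vectors as stand-ins for real text embeddings (e.g. Ada-002 outputs), and the element records are hypothetical:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_top_k(query_emb, kb, k=2):
    """kb: list of (element_description, embedding) pairs, as in the GUI KB."""
    scored = sorted(kb, key=lambda e: cosine(query_emb, e[1]), reverse=True)
    return [desc for desc, _ in scored[:k]]

# Toy 3-d embeddings stand in for real text-embedding vectors.
kb = [("TapButton 'Send' submits the message", [0.9, 0.1, 0.0]),
      ("Swipe left archives the conversation", [0.1, 0.8, 0.1]),
      ("TapButton 'Attach' opens the file picker", [0.7, 0.2, 0.1])]
query = [1.0, 0.0, 0.0]            # e.g. embedding of "send the typed message"
print(retrieve_top_k(query, kb, k=2))
```

The retrieved descriptions would then be injected into the CoT prompt as step 3 of the pipeline describes; a production system would use a vector index such as FAISS rather than a linear scan.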
Hierarchical Multi-Agent and Dual-Level Augmentation
Mobile-Agent-RAG introduces a hierarchical dual-level retrieval system for long-horizon, cross-app automation:
- Manager-RAG: For high-level planning, Manager-RAG retrieves full-plan exemplars from a curated knowledge base indexed by open-domain task instructions. This reduces strategic hallucination.
- Operator-RAG: For subtask execution, Operator-RAG retrieves fine-grained, app-specific action exemplars (UI states, screenshots, ground-truth atomic actions) using subtask embeddings.
- Workflow: At each timestep, the Perceptor extracts screenshot features, Manager-RAG plans the next subgoal, Operator-RAG provides contextual execution grounding, and reflective modules update memory. No LLM fine-tuning is required; effectiveness is realized through KB construction and prompt engineering (Zhou et al., 15 Nov 2025).
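The dual-level flow can be sketched as two retrieval calls feeding each other: the Manager grounds the plan, the Operator grounds each atomic action. The KBs and exact-match retriever below are hypothetical placeholders for the embedding-indexed exemplar stores described above:

```python
def manager_plan(task, plan_kb, retrieve):
    """Manager-RAG: retrieve a full-plan exemplar for the open-domain task."""
    exemplar = retrieve(task, plan_kb)
    return exemplar["plan_steps"]

def operator_act(subtask, action_kb, retrieve):
    """Operator-RAG: retrieve an app-specific atomic-action exemplar."""
    return retrieve(subtask, action_kb)["action"]

# Hypothetical KBs; exact-match lookup stands in for embedding search.
plan_kb = {"share a photo": {"plan_steps": ["open Gallery", "select photo",
                                            "tap Share"]}}
action_kb = {"open Gallery": {"action": "TapIcon('Gallery')"},
             "select photo": {"action": "TapThumbnail(0)"},
             "tap Share": {"action": "TapButton('Share')"}}
retrieve = lambda key, kb: kb[key]

steps = manager_plan("share a photo", plan_kb, retrieve)
actions = [operator_act(s, action_kb, retrieve) for s in steps]
print(actions)
```

In the real system each retrieve call is an embedding-similarity lookup and the Operator also conditions on the Perceptor's screenshot features; the sketch only shows the two-level control structure.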
3. Memory, Context, and Knowledge Base Strategies
MobileRAG frameworks universally rely on dense, retrieval-oriented KBs:
| KB Type | Typical Content | Indexing/Embedding Approach |
|---|---|---|
| Structured GUI | {element_id, label, parser_info, coords, visual_feats, function_desc} | Text embedding (e.g., Ada-002); FAISS/Annoy vector index (Li et al., 2024) |
| Plan Exemplar | {(task instruction, plan steps)} | Contriever-MSMARCO or BGE-small (Zhou et al., 15 Nov 2025, Loo et al., 4 Sep 2025) |
| Past Memory | {user query, action sequence} | Cosine similarity in MemRAG store (Loo et al., 4 Sep 2025) |
- Dynamic Update: KBs support runtime additions and corrections—new GUI elements, new apps, and successful routines—ensuring adaptation to evolving device states and UI versions.
- Partitioned Indexing: Per-app or per-version partitions prevent contamination from unrelated UIs, critical on platforms where app appearances and feature layouts change frequently.
- Memory Management: To prevent memory bloat, older or seldom-used entries are pruned, and hierarchical or summarization strategies are under investigation (Loo et al., 4 Sep 2025).
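A minimal sketch combining the partitioned-indexing and pruning ideas above, assuming a simple per-app LRU eviction policy (the actual systems' eviction strategies may differ):

```python
from collections import OrderedDict

class PartitionedKB:
    """Per-app partitions with LRU-style pruning: a sketch of the
    partitioned-indexing and memory-management strategies above."""

    def __init__(self, max_entries_per_app=2):
        self.partitions = {}
        self.max_entries = max_entries_per_app

    def add(self, app, key, record):
        part = self.partitions.setdefault(app, OrderedDict())
        part[key] = record
        part.move_to_end(key)                    # mark as most recent
        while len(part) > self.max_entries:
            part.popitem(last=False)             # prune the oldest entry

    def lookup(self, app, key):
        part = self.partitions.get(app, {})
        return part.get(key)                     # no cross-app contamination

kb = PartitionedKB(max_entries_per_app=2)
kb.add("mail", "send_btn", "TapButton('Send')")
kb.add("mail", "attach_btn", "TapButton('Attach')")
kb.add("mail", "reply_btn", "TapButton('Reply')")   # evicts send_btn
print(kb.lookup("mail", "send_btn"), kb.lookup("mail", "reply_btn"))
```

Keying the outer dict by app (or app version) gives the isolation that prevents stale UI records from one app polluting retrievals for another.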
4. Retrieval, Generation, and Execution Pipeline
The typical MobileRAG query resolution pipeline consists of:
- Memory Retrieval (MemRAG): For repeated or near-duplicate queries, a memory store directly reuses historical action sequences whenever cosine similarity to a stored query exceeds a preset threshold.
- External and Local Retrieval: If memory fails, agents leverage InterRAG (web knowledge) and LocalRAG (on-device apps) in parallel. Each retrieval computes the cosine similarity score(q, c) = (e_q · e_c) / (‖e_q‖ ‖e_c‖) between the query embedding e_q and each candidate embedding e_c to score candidates.
- Prompt Assembly and LLM Planning: A prompt is constructed integrating the original query, recovered plans/snippets/context, and instructions for UI interaction, then passed to the LLM.
- Action Execution and Iterative Feedback: Actions are executed stepwise; success propagates to memory storage for future shortcutting (Loo et al., 4 Sep 2025).
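The memory-first cascade above can be sketched as follows; the 0.9 threshold, toy 2-d embeddings, and fallback stub are illustrative assumptions, not values from the paper:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def resolve(query_emb, memory, fallback_retrieve, threshold=0.9):
    """Memory-first resolution: reuse a stored action sequence when the most
    similar past query clears `threshold`, else fall back to fresh retrieval."""
    if memory:
        past_emb, actions = max(memory, key=lambda m: cosine(query_emb, m[0]))
        if cosine(query_emb, past_emb) >= threshold:
            return "memory", actions
    return "retrieval", fallback_retrieve(query_emb)

memory = [([1.0, 0.0], ["OpenApp('Clock')", "TapButton('Start')"])]
fallback = lambda q: ["<plan via InterRAG/LocalRAG>"]
print(resolve([0.99, 0.1], memory, fallback))   # near-duplicate query
print(resolve([0.0, 1.0], memory, fallback))    # novel query
```

Successful novel resolutions would then be written back into `memory`, which is the shortcutting loop the pipeline's last step describes.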
The dual-level retrieval (Manager-RAG and Operator-RAG) ensures both global and local grounding. Operators receive immediate action exemplars tightly coupled with perceived UI states; the manager shapes high-level strategy using human-verified plans (Zhou et al., 15 Nov 2025).
5. Benchmarks, Experimental Results, and Comparative Analysis
MobileRAG systems have undergone extensive benchmarking across standard and new evaluation protocols:
- AppAgent Benchmarks: On DroidTask (GPT-4 backbone), MobileRAG achieves a 77.8% completion rate, outperforming LLM-only (31.6%), AutoDroid w/o memory (63.5%), and AutoDroid w/ memory (71.3%). On AppAgent's internal tasks, MobileRAG reaches 93.3% SR under manual exploration (Li et al., 2024).
- MobileRAG-Eval: On complex real-world mobile and multi-app workflows (Pixel 9, GPT-4o), MobileRAG delivers 80.0% task success, a 10.3% improvement over the previous state of the art (Mobile-Agent-E). Ablations confirm that removing LocalRAG or InterRAG sharply degrades performance (Loo et al., 4 Sep 2025).
- KG-Android-Bench and Cross-Platform Transfer: KG-RAG (a MobileRAG instantiation for KG-driven GUIs) boosts success rates by 8.9% over AutoDroid on DroidTask, and shows substantial cross-platform transfer utility (e.g., +40% SR on Weibo-web) (Guan et al., 30 Aug 2025).
- Mobile-Eval-RAG: Mobile-Agent-RAG achieves a completion rate of 75.7% and operator accuracy of 90.1% on multi-app automation with Gemini-1.5-Pro, beating Mobile-Agent-E by 11.0 and 16.0 percentage points respectively. Efficiency (CR/Steps) doubles relative to baseline (Zhou et al., 15 Nov 2025).
6. Practical Deployment, Limitations, and Prospects
Deployment Guidance: Empirical findings demonstrate that initial manual KB seeding accelerates adaptation, especially for unfamiliar apps. Multi-modal parsing (fusing structured fields, OCR, visual cues) increases recall on non-standard UIs.
Tradeoffs: The most significant practical bottleneck is the initial UTG/KB extraction time, but >90% of SR improvements are reached within 4 hours of app exploration, and index prefetching helps meet stringent latency requirements (Guan et al., 30 Aug 2025).
Limitations:
- Dependency on external APIs (cloud search, app stores) introduces latency and possible availability constraints.
- Static similarity thresholds for retrieval may require end-user tuning.
- Flat memory stores may not scale indefinitely; ongoing work seeks hierarchical or summary-based structures.
Future Directions: Areas of active research include on-device approximate nearest-neighbor retrieval, multi-modal memory constructions (text, screenshots, actions), and continual on-device fine-tuning for personalized adaptation (Loo et al., 4 Sep 2025).
MobileRAG now defines the state of the art in embedding-based, real-time retrieval-augmented generation for mobile agents, integrating advances in vector search, retrieval-centric knowledge management, and prompt-based reasoning. Its broad adoption across single and hierarchical multi-agent frameworks signals a robust paradigm shift for on-device automation, with high efficiency, resilience to UI change, and near-human-level operation success on multi-step, multi-app tasks (Park et al., 1 Jul 2025, Li et al., 2024, Guan et al., 30 Aug 2025, Loo et al., 4 Sep 2025, Zhou et al., 15 Nov 2025).