An Empirical Study of Multi-Agent RAG for Real-World University Admissions Counseling

Published 15 Jul 2025 in cs.SE and cs.IR | (2507.11272v1)

Abstract: This paper presents MARAUS (Multi-Agent and Retrieval-Augmented University Admission System), a real-world deployment of a conversational AI platform for higher education admissions counseling in Vietnam. While LLMs offer potential for automating advisory tasks, most existing solutions remain limited to prototypes or synthetic benchmarks. MARAUS addresses this gap by combining hybrid retrieval, multi-agent orchestration, and LLM-based generation into a system tailored for real-world university admissions. In collaboration with the University of Transport Technology (UTT) in Hanoi, we conducted a two-phase study involving technical development and real-world evaluation. MARAUS processed over 6,000 actual user interactions, spanning six categories of queries. Results show substantial improvements over LLM-only baselines: on average 92 percent accuracy, hallucination rates reduced from 15 percent to 1.45 percent, and average response times below 4 seconds. The system operated cost-effectively, with a two-week deployment cost of 11.58 USD using GPT-4o mini. This work provides actionable insights for the deployment of agentic RAG systems in low-resource educational settings.

Summary

The paper introduces MARAUS, a multi-agent RAG system that integrates hybrid retrieval and LLM re-ranking to enhance admissions counseling.
It achieved 98.5% precision and reduced hallucinations to 1.45% while processing over 6,000 user queries at a Vietnamese university.
MARAUS offers a scalable, cost-effective solution that minimizes repetitive query handling compared to traditional LLM-only models.

Multi-Agent RAG for University Admissions Counseling

This paper introduces MARAUS (Multi-Agent and Retrieval-Augmented University Admission System), a conversational AI platform designed for higher education admissions counseling in Vietnam. The study addresses the gap between the potential of LLMs for automating advisory tasks and the limitations of existing prototype solutions by deploying a system that integrates hybrid retrieval, multi-agent orchestration, and LLM-based generation. The system was evaluated in a real-world setting at the University of Transport Technology (UTT) in Hanoi, processing over 6,000 user interactions across six query categories.

The paper reviews the use of LLMs in educational Q&A systems, highlighting the application of RAG-based systems in managing institutional queries and enhancing university application support. Prior works, such as a system combining GPT-3.5 with LlamaIndex [chen_facilitating_2024] and HICON AI [singla_hicon_2024], which tailors college admissions advice using applicant profiles and resume analysis, are discussed. A knowledge graph approach in the Vietnamese admission context [bui_cross-data_2024] is also mentioned. The authors position MARAUS as an advancement over existing systems by adopting novel strategies based on current GPT models.

Figure 1: An overview of the MARAUS system.

The paper also details various retrieval strategies for RAG systems, categorizing them into keyword-based, semantic vector-based, and hybrid methods. Keyword-based retrieval, like BM25 [robertson_probabilistic_2009], is noted for its efficiency but struggles with synonyms and informal language. Semantic retrieval methods, using models like SBERT [reimers_sentence-bert_2019] and MPNet [song_mpnet_2020], address these limitations but can retrieve contextually irrelevant passages. The hybrid retrieval strategy employed in MARAUS combines FAISS and BM25 with a GPT cross-encoder for re-ranking, balancing precision and recall.
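The synonym weakness of keyword retrieval can be seen in a small Okapi BM25 scorer. This is a minimal sketch, not the paper's implementation: the function name `bm25_scores`, the parameter defaults `k1=1.5` and `b=0.75`, the whitespace tokenization, and the sample documents are all illustrative assumptions.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document against a tokenized query with Okapi BM25.

    k1 and b are common defaults, assumed here; the paper does not report them.
    """
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue  # terms absent from the document contribute nothing
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Hypothetical mini-corpus: the second document answers the same question
# as the first but shares no query term, so BM25 gives it a score of zero.
docs = [
    "tuition fee for the logistics program".split(),
    "how much does the logistics program cost".split(),
    "dormitory registration deadline".split(),
]
scores = bm25_scores("tuition fee".split(), docs)
```

The second document is a paraphrase that a dense retriever could plausibly match, which is the motivation for combining both signals in a hybrid pipeline.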

Research Design and Methodology

The research employs an in-depth single-case study design [runeson_guidelines_2008] to understand the extent to which RAG grounded in LLMs can support real-world admission activities. The study was conducted in two phases: (1) technological experimentation and (2) process experimentation, involving real users interacting with an AI conversational system.

The case study focuses on the University of Transport Technology (UTT) in Hanoi, Vietnam, which processes over 10,000 applicant inquiries annually. The existing inquiry-handling process at UTT is decentralized and manually intensive, suffering from high-volume repetition of FAQ-type queries, lack of integrated knowledge management, and inability to personalize responses. The authors address potential threats to validity, such as the operationalization of chatbot performance and the generalizability of findings. They mitigate these threats through standard experimental metrics, detailed documentation of configuration parameters, and public release of the codebase and test dataset.

MARAUS System Architecture and Implementation

The core of MARAUS is a multi-agent coordinator that classifies incoming queries into distinct processing pipelines: information search agent, score calculation agent, recommendation agent, and a general query agent.
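The routing idea can be sketched as a dispatcher that maps each query to one of the four agents. The summary does not say how the coordinator classifies queries, so the keyword rules below are a stand-in assumption, and every name (`ROUTING_KEYWORDS`, `classify`, `route`, the agent functions) is hypothetical.

```python
from typing import Callable, Dict

# Placeholder agents: each would call its own pipeline in a real system.
def information_search_agent(query: str) -> str:
    return f"[information_search] {query}"

def score_calculation_agent(query: str) -> str:
    return f"[score_calculation] {query}"

def recommendation_agent(query: str) -> str:
    return f"[recommendation] {query}"

def general_query_agent(query: str) -> str:
    return f"[general] {query}"

AGENTS: Dict[str, Callable[[str], str]] = {
    "information_search": information_search_agent,
    "score_calculation": score_calculation_agent,
    "recommendation": recommendation_agent,
    "general": general_query_agent,
}

# Assumed keyword rules, checked in insertion order; a deployed coordinator
# might use an LLM or a trained classifier instead.
ROUTING_KEYWORDS = {
    "score_calculation": ("calculate", "score", "points"),
    "recommendation": ("recommend", "suggest", "which major"),
    "information_search": ("deadline", "tuition", "requirement", "document"),
}

def classify(query: str) -> str:
    """Return the label of the first rule that matches, else 'general'."""
    text = query.lower()
    for label, keywords in ROUTING_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return label
    return "general"

def route(query: str) -> str:
    """Dispatch the query to the agent chosen by the classifier."""
    return AGENTS[classify(query)](query)
```

Keeping classification separate from dispatch means the rule-based `classify` could be swapped for a model-based one without touching the agents.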

All textual data undergoes rigorous preprocessing, including removing boilerplate content, HTML artifacts, and near-duplicates. Text normalization includes lowercasing, diacritic normalization, and tokenization via VnCoreNLP. PII is redacted using regular expression patterns, and entropy-based filters detect nonsensical input. Document corpora are segmented into overlapping context windows, embedded with the XAI-encode-all-mpnet-base-v2 model, and stored in a FAISS IndexFlatIP.
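Three of these preprocessing steps can be illustrated with the standard library alone: regex-based PII redaction, an entropy check, and overlapping windows. This is a rough sketch under stated assumptions. The regex patterns, the entropy threshold of 1.5 bits, and the window size and overlap are illustrative values that the summary does not report, and plain token lists stand in for VnCoreNLP output.

```python
import math
import re
from collections import Counter

# Assumed patterns: a simple email shape and 10-digit Vietnamese-style
# phone numbers (leading 0 or +84). Real deployments need broader rules.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"(?<!\d)(?:\+84|0)\d{9}(?!\d)")

def redact_pii(text: str) -> str:
    """Replace emails and phone numbers with placeholder tags."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)

def char_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def looks_nonsensical(text: str, min_bits: float = 1.5) -> bool:
    """Flag highly repetitive input; the threshold is an assumed value."""
    return char_entropy(text) < min_bits

def sliding_windows(tokens, size=4, overlap=2):
    """Split a token list into windows that share `overlap` tokens."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    if not tokens:
        return []
    step = size - overlap
    stop = max(len(tokens) - overlap, 1)
    return [tokens[i:i + size] for i in range(0, stop, step)]
```

A low-entropy filter of this kind only catches repetitive input such as a string of identical characters; random keyboard mashing has high character entropy and would need a different signal.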

Figure 2: Systems' accuracy.

The hybrid RAG pipeline combines dense vector retrieval and sparse keyword search. Semantic retrieval uses Xenova/all-mpnet-base-v2 to embed user queries, and FAISS retrieves the top-$k$ similar FAQ entries and document chunks. An ElasticSearch 8.11 BM25 index runs in parallel for keyword retrieval. The union of BM25 and FAISS outputs is passed to a hybrid re-ranking stage, where GPT-4o mini acts as a zero-shot cross-encoder, assigning relevance scores to candidate passages. A custom post-processor enforces citation integrity, regenerating answers lacking passage citations with penalized decoding parameters to mitigate hallucinations.
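The merge, re-rank, and citation-check flow might look roughly like the following. It is a sketch only. The FAISS and ElasticSearch calls are replaced by plain lists of passage ids, the GPT-4o mini re-ranker is replaced by a pluggable `score_fn` (here a token-overlap toy), and the bracketed citation format `[p1]` is an assumption. All function names are hypothetical.

```python
import re
from typing import Callable, Dict, List

def merge_candidates(dense_hits: List[str], sparse_hits: List[str]) -> List[str]:
    """Union of both result lists, deduplicated, in first-seen order."""
    seen = set()
    merged = []
    for pid in list(dense_hits) + list(sparse_hits):
        if pid not in seen:
            seen.add(pid)
            merged.append(pid)
    return merged

def rerank(query: str, candidate_ids: List[str], passages: Dict[str, str],
           score_fn: Callable[[str, str], float], top_n: int = 3) -> List[str]:
    """Order candidates by score_fn(query, passage), highest first."""
    scored = [(score_fn(query, passages[pid]), pid) for pid in candidate_ids]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable for ties
    return [pid for _, pid in scored[:top_n]]

CITATION_RE = re.compile(r"\[(\w+)\]")  # assumed citation syntax, e.g. [p1]

def has_valid_citation(answer: str, allowed_ids: List[str]) -> bool:
    """True if the answer cites at least one passage and only allowed ones."""
    cited = set(CITATION_RE.findall(answer))
    return bool(cited) and cited <= set(allowed_ids)

def token_overlap(query: str, passage: str) -> float:
    """Toy stand-in for an LLM relevance score: fraction of query tokens found."""
    q = set(query.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / len(q) if q else 0.0

# Hypothetical passages and retrieval results.
passages = {
    "p1": "Tuition fee for the logistics program is listed per semester",
    "p2": "Dormitory registration opens in August",
    "p3": "Admission score thresholds by major",
}
merged = merge_candidates(["p1", "p3"], ["p1", "p2"])
ranked = rerank("tuition fee logistics", merged, passages, token_overlap, top_n=2)
```

An answer that fails `has_valid_citation` would be the trigger for regeneration. The penalized decoding step itself is not shown, since the summary does not give its parameters.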

Experimental Results and Findings

The MARAUS system was evaluated across multiple configurations: LLM-only, RAG+Re-rank, and Hybrid RAG. The Hybrid RAG pipeline outperformed the baselines across all metrics, achieving near-perfect precision (98.5%) and reducing the hallucination rate to 1.45%, while maintaining a response time of 3.75 seconds. Data collected from a two-week deployment of MARAUS showed that running the bot cost 11.58 USD with GPT-4o mini, and that accuracy consistently ranged from 87% to 94%. User satisfaction averaged 4.5/5, with officers noting a reduction in repetitive question handling.
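For a sense of scale, the reported cost can be converted to a per-interaction figure. This back-of-the-envelope sketch assumes that the 6,000 interactions mentioned in the abstract all occurred within the two-week deployment that cost 11.58 USD, which the summary does not state explicitly. Because the abstract says "over 6,000", the result is an upper bound under that assumption.

```python
def cost_per_interaction(total_cost_usd: float, interactions: int) -> float:
    """Average cost of one interaction, in USD."""
    if interactions <= 0:
        raise ValueError("interactions must be positive")
    return total_cost_usd / interactions

# Assumption: all 6,000 interactions fall inside the two-week cost window.
per_query = cost_per_interaction(11.58, 6000)
print(f"{per_query:.5f} USD per interaction")  # prints "0.00193 USD per interaction"
```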

Discussion and Comparison to Existing Systems

The paper compares MARAUS to existing systems such as HICON AI [singla_hicon_2024] and a ChatGPT-3.5-based system [chen_facilitating_2024], noting that MARAUS offers a cost-effective and scalable solution by leveraging multi-agent strategies and off-the-shelf LLM components. In contrast to a knowledge graph-based approach [bui_cross-data_2024], MARAUS enables real-world adoption without the overhead of maintaining complex knowledge graphs. It was also tested with a larger and more diverse dataset.

Conclusion

The study concludes that MARAUS, an operational, multi-agent RAG-based chatbot, outperforms traditional LLM-only approaches and prior RAG-based systems in university admissions counseling. The integration of hybrid retrieval, LLM-based re-ranking, and multi-agent task specialization contributed significantly to these outcomes. Future work will extend MARAUS to other universities and explore broader domains such as financial aid advising or career counseling.