ComRAG: Retrieval-Augmented Generation with Dynamic Vector Stores for Real-time Community Question Answering in Industry

Published 26 Jun 2025 in cs.CL and cs.AI | (2506.21098v2)

Abstract: Community Question Answering (CQA) platforms can be regarded as important community knowledge bases, but effectively leveraging historical interactions and domain knowledge in real time remains a challenge. Existing methods often underutilize external knowledge, fail to incorporate dynamic historical QA context, or lack memory mechanisms suited for industrial deployment. We propose ComRAG, a retrieval-augmented generation framework for real-time industrial CQA that integrates static knowledge with dynamic historical QA pairs via a centroid-based memory mechanism designed for retrieval, generation, and efficient storage. Evaluated on three industrial CQA datasets, ComRAG consistently outperforms all baselines, achieving up to 25.9% improvement in vector similarity, reducing latency by 8.7% to 23.3%, and lowering chunk growth from 20.23% to 2.06% over iterations.


Summary

The paper proposes a novel retrieval-augmented generation framework that integrates static and dynamic vector stores to enhance real-time industrial community question answering.
It employs a centroid-based memory mechanism and adaptive temperature tuning to handle high- and low-quality QA pairs efficiently, reducing latency by up to 23.3%.
Experiments on the MSQA, ProCQA, and PolarDBQA datasets show improvements over the baselines in metrics such as BLEU, ROUGE-L, and cosine similarity, including a gain of up to 25.9% in vector similarity.

ComRAG: Retrieval-Augmented Generation for Real-time Community Question Answering

The paper "ComRAG: Retrieval-Augmented Generation with Dynamic Vector Stores for Real-time Community Question Answering in Industry" presents a framework that enhances real-time industrial community question answering (CQA) by integrating static domain knowledge with dynamic community interaction history. It addresses two shortcomings of existing methods: they do not incorporate dynamic historical QA context effectively, and they lack memory mechanisms suited to industrial deployment. ComRAG applies retrieval-augmented generation over both static and dynamic knowledge stores so that answers can draw on domain documents and on previously answered questions.

Architecture and Methodology

ComRAG Architecture

The architecture of ComRAG includes a static knowledge vector store and two dynamic CQA vector stores, one for high-quality and one for low-quality QA pairs. The dynamic stores are maintained with a centroid-based memory mechanism that limits their growth. This design supports scalable real-time CQA despite a continuous influx of questions and responses of variable quality.

Figure 1: ComRAG architecture for real-time CQA. The system integrates a static knowledge vector store and two dynamic CQA vector stores (high- and low-quality), with the latter managed via a centroid-based memory mechanism.

Retrieval and Generation Strategies

ComRAG answers each incoming question through one of three paths:

Directly reusing the answer from a matching high-quality QA pair.
Generating a response while referencing high-quality QA pairs.
Generating a response from static knowledge while avoiding content found in low-quality QA pairs.
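
The three paths can be read as a threshold-based router. The following is a minimal sketch under stated assumptions, not the authors' implementation: the `QAPair` type, the `route` function, the path names, and the default values of `delta` (reuse threshold) and `tau` (similarity threshold) are all hypothetical, and the exact conditions ComRAG uses to pick a path are not given in this summary.

```python
from dataclasses import dataclass
from math import sqrt


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class QAPair:  # hypothetical record type for a stored QA pair
    embedding: list
    question: str
    answer: str
    score: float


def best_match(query_emb, store):
    """Return (similarity, pair) for the closest stored question, or (0.0, None)."""
    if not store:
        return 0.0, None
    pair = max(store, key=lambda p: cosine(query_emb, p.embedding))
    return cosine(query_emb, pair.embedding), pair


def route(query_emb, high_store, low_store, delta=0.9, tau=0.75):
    """Pick one of the three answering paths (threshold values are assumptions)."""
    sim, pair = best_match(query_emb, high_store)
    if pair is not None and sim >= delta:
        return "reuse", pair  # path 1: return the stored answer directly
    if pair is not None and sim >= tau:
        return "generate_with_high_quality_reference", pair  # path 2
    # Path 3: generate from static knowledge; the closest low-quality pair,
    # if any, is passed along as an example of what to avoid.
    _, low = best_match(query_emb, low_store)
    return "generate_from_static_knowledge", low
```

In this sketch a near-duplicate question skips generation entirely, which is one plausible source of the latency reduction the paper reports.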

An adaptive temperature tuning mechanism regulates the diversity and consistency of the generated responses. The temperature is set from the variance of historical answer quality scores, so that generation balances exploration against reliability.
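
As a rough illustration of variance-driven temperature, the sketch below maps the variance of past quality scores to a sampling temperature. The function name, the linear mapping, the bounds `t_min` and `t_max`, the `scale` factor, and in particular the direction (higher variance gives lower temperature) are assumptions made for this example; the summary does not specify the formula ComRAG uses.

```python
from statistics import pvariance


def adaptive_temperature(scores, t_min=0.2, t_max=1.0, scale=4.0):
    """Map the variance of historical quality scores to a temperature.

    Assumes scores lie in [0, 1], so their population variance is at most
    0.25 and scale=4.0 normalises it to [0, 1]. Consistent history (low
    variance) yields a temperature near t_max; inconsistent history yields
    a temperature near t_min. This direction is an assumption.
    """
    if len(scores) < 2:
        return (t_min + t_max) / 2  # no usable history: fall back to midpoint
    spread = min(1.0, scale * pvariance(scores))
    return t_max - (t_max - t_min) * spread
```

For example, `adaptive_temperature([0.8, 0.8, 0.8])` evaluates to `t_max`, because identical scores have zero variance.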

Dynamic Vector Store Design

The dynamic CQA vector stores handle the continuous growth of historical QA pairs through a centroid-based clustering approach. This mechanism involves clustering similar questions and maintaining only the centroids to prevent memory overflow, ensuring efficient retrieval. The system updates these stores by evaluating new QA pairs and deciding their inclusion based on quality thresholds.
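
A minimal sketch of such a store follows, assuming a running-mean centroid per cluster and a single best-scoring representative answer per cluster. The `CentroidStore` class, the `update_stores` helper, and the default values of `tau` and `gamma` are hypothetical; the paper's actual update and replacement rules may differ.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class CentroidStore:
    """Keeps one centroid and one representative QA pair per question cluster."""

    def __init__(self, tau=0.8):
        self.tau = tau      # similarity threshold for joining a cluster (assumed value)
        self.clusters = []  # dicts with keys: centroid, count, best

    def add(self, embedding, question, answer, score):
        nearest = max(
            self.clusters,
            key=lambda c: cosine(embedding, c["centroid"]),
            default=None,
        )
        if nearest is None or cosine(embedding, nearest["centroid"]) < self.tau:
            # No sufficiently similar cluster: start a new one.
            self.clusters.append(
                {"centroid": list(embedding), "count": 1,
                 "best": (score, question, answer)}
            )
            return
        # Fold the new question into the running-mean centroid.
        n = nearest["count"]
        nearest["centroid"] = [
            (c * n + x) / (n + 1) for c, x in zip(nearest["centroid"], embedding)
        ]
        nearest["count"] = n + 1
        # Keep only the highest-scoring pair as the cluster's representative.
        if score > nearest["best"][0]:
            nearest["best"] = (score, question, answer)


def update_stores(high, low, embedding, question, answer, score, gamma=0.6):
    """Send a scored QA pair to the high- or low-quality store (gamma is assumed)."""
    (high if score >= gamma else low).add(embedding, question, answer, score)
```

Because a rephrased question merges into an existing cluster instead of adding a new entry, the number of stored items grows with the number of distinct topics and not with the number of questions asked.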

Experimental Setup and Results

Dataset and Evaluation

The experiments are conducted on three datasets, MSQA, ProCQA, and PolarDBQA, which cover domains ranging from Microsoft technologies to database systems such as PolarDB. Each dataset is paired with an external knowledge base for retrieval, and questions arrive sequentially to simulate a real-time QA setting.

Performance Metrics

ComRAG's performance is evaluated with a combination of lexical and semantic metrics: BLEU, ROUGE-L, BERTScore, and cosine similarity (SIM). Avg Time measures processing efficiency. ComRAG outperforms the baseline methods on all three datasets, with gains of up to 25.9% in SIM and latency reductions of 8.7% to 23.3%.
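
To make the reported quantities concrete, the sketch below shows how a cosine-similarity score and percentage changes of this kind could be computed. It assumes that the percentages are relative changes against a baseline, which this summary does not state explicitly; the function names and the numbers in the usage note are hypothetical and are not results from the paper.

```python
from math import sqrt


def sim(generated_emb, reference_emb):
    """Cosine similarity between a generated-answer and a reference-answer embedding."""
    dot = sum(x * y for x, y in zip(generated_emb, reference_emb))
    ng = sqrt(sum(x * x for x in generated_emb))
    nr = sqrt(sum(y * y for y in reference_emb))
    return dot / (ng * nr) if ng and nr else 0.0


def relative_improvement(baseline, system):
    """Percentage gain of a higher-is-better metric over the baseline."""
    return (system - baseline) / baseline * 100.0


def relative_reduction(baseline, system):
    """Percentage drop of a lower-is-better quantity such as average time."""
    return (baseline - system) / baseline * 100.0
```

With made-up values, a baseline average time of 10.0 seconds and a system time of 8.0 seconds give `relative_reduction(10.0, 8.0)`, a reduction of about 20%.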

Figure 2: Ablation study on PolarDBQA under a 10-round iterative evaluation setting.

Impact of Hyperparameters

Ablation studies show that performance is sensitive to several hyperparameters, including the similarity threshold $\tau$, the reuse threshold $\delta$, and the quality threshold $\gamma$. Varying these parameters shifts the balance among efficiency, memory usage, and answer quality.

Real-world Implications and Future Directions

Scalability and Adaptability

ComRAG's integration of static and dynamic knowledge components makes it particularly well-suited for industrial applications where CQA systems must rapidly adapt to new information and user interactions. Its modular design allows for scalable deployment, capable of accommodating varying computational environments and domain-specific requirements.

Limitations and Future Work

Although ComRAG balances efficiency and quality effectively, it relies on static thresholds for clustering and does not account for topic relevance or usage frequency, either of which could improve memory management. In addition, its query routing is rule-based; replacing or supplementing these rules with learned routing could make path selection more robust.

Conclusion

ComRAG offers a framework for real-time community question answering that uses retrieval-augmented generation to improve response quality and system adaptability. Its architecture, built around dynamic vector stores, addresses the storage growth and variable answer quality that arise in industrial-scale CQA, and provides a foundation for future work on interactive AI systems.
