- The paper proposes a novel retrieval-augmented generation framework that integrates static and dynamic vector stores to enhance real-time industrial community question answering.
- It employs centroid-based memory mechanisms and adaptive temperature tuning to handle high- and low-quality QA pairs efficiently while reducing latency by up to 23.3%.
- Experiments on MSQA, ProCQA, and PolarDBQA show significant improvements over baseline methods in BLEU, ROUGE-L, and cosine similarity.
The paper "ComRAG: Retrieval-Augmented Generation with Dynamic Vector Stores for Real-time Community Question Answering in Industry" presents a framework that enhances real-time industrial community question answering (CQA) by integrating static domain knowledge with dynamic community interaction history. It addresses two shortcomings of existing methods: they do not integrate dynamic QA context effectively, and they lack memory mechanisms suited to industrial deployment. ComRAG applies retrieval-augmented generation over both static and dynamic knowledge stores to answer questions efficiently and accurately.
Architecture and Methodology
ComRAG Architecture
The architecture of ComRAG includes a static knowledge vector store and two dynamic CQA vector stores—one for high-quality and another for low-quality QA pairs. These stores are maintained using a centroid-based memory mechanism to manage memory efficiently. This design allows for scalable real-time CQA, addressing the challenges posed by the continuous influx of questions and the variable quality of responses.
Figure 1: ComRAG architecture for real-time CQA. The system integrates a static knowledge vector store and two dynamic CQA vector stores (high- and low-quality), with the latter managed via a centroid-based memory mechanism.
Retrieval and Generation Strategies
ComRAG employs three strategic paths for question answering:
- Directly reusing answers from high-quality QA pairs.
- Generating responses while referencing high-quality content.
- Generating responses from static knowledge while avoiding low-quality content.
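The three paths above can be read as a routing decision over retrieval similarity. The following is a minimal sketch, not the paper's implementation: the `Hit` type, the `choose_path` function, the path labels, and the default threshold values are all hypothetical. It assumes that the reuse threshold δ and the similarity threshold τ examined in the ablation study gate the paths in this order.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Hit:
    """A retrieved historical QA pair (hypothetical structure)."""
    question: str
    answer: str
    similarity: float  # cosine similarity to the incoming question
    score: float       # quality score assigned to the stored answer


def choose_path(
    high_hits: List[Hit],
    low_hits: List[Hit],
    delta: float = 0.9,  # reuse threshold (assumed value)
    tau: float = 0.75,   # similarity threshold (assumed value)
) -> Tuple[str, List[Hit]]:
    """Pick one of the three answering paths and the evidence it uses."""
    best = max(high_hits, key=lambda h: h.similarity, default=None)

    # Path 1: a near-identical high-quality question exists, reuse its answer.
    if best is not None and best.similarity >= delta:
        return "reuse", [best]

    # Path 2: related high-quality pairs exist, generate with them as references.
    if best is not None and best.similarity >= tau:
        refs = [h for h in high_hits if h.similarity >= tau]
        return "generate_with_high_quality", refs

    # Path 3: fall back to static knowledge; similar low-quality pairs are
    # passed along as examples of what the generator should avoid.
    avoid = [h for h in low_hits if h.similarity >= tau]
    return "generate_avoiding_low_quality", avoid
```

In this sketch the caller would dispatch on the returned label: return the stored answer directly for `"reuse"`, or build a prompt from the returned evidence (plus static knowledge chunks) for the two generation paths.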
An adaptive temperature tuning mechanism is employed to regulate the diversity and consistency of the generated responses. This tuning is based on the variance of historical answer quality scores, ensuring balance between exploration and reliability.
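One way to picture variance-driven temperature tuning is a clamped linear mapping from score variance to decoding temperature. This sketch is an assumption throughout: the function name `adaptive_temperature`, the bounds `t_min` and `t_max`, the normalizer `max_variance`, and the direction of the mapping (higher variance lowers the temperature to favor reliability) are illustrative choices, not details taken from the paper.

```python
from statistics import pvariance
from typing import Sequence


def adaptive_temperature(
    scores: Sequence[float],
    t_min: float = 0.2,          # most conservative temperature (assumed)
    t_max: float = 1.0,          # most exploratory temperature (assumed)
    max_variance: float = 0.25,  # largest variance for scores in [0, 1]
) -> float:
    """Map the variance of historical quality scores to a temperature.

    Assumed direction: when past answers vary widely in quality, decode
    conservatively (low temperature); when they are consistent, allow more
    diversity (high temperature).
    """
    if not scores:
        # No history to reason about: default to the exploratory end.
        return t_max
    # Normalize variance to [0, 1] and clamp.
    spread = min(pvariance(scores) / max_variance, 1.0)
    return t_max - spread * (t_max - t_min)
```

With identical scores the variance is zero and the function returns `t_max`; with scores split evenly between 0 and 1 the variance reaches `max_variance` and the result is approximately `t_min`.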
Dynamic Vector Store Design
The dynamic CQA vector stores handle the continuous growth of historical QA pairs through a centroid-based clustering approach. This mechanism involves clustering similar questions and maintaining only the centroids to prevent memory overflow, ensuring efficient retrieval. The system updates these stores by evaluating new QA pairs and deciding their inclusion based on quality thresholds.
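The clustering and update steps described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's algorithm: the `Cluster` and `CentroidStore` classes, the `route_by_quality` helper, the running-mean centroid update, and the choice to keep one representative QA pair per cluster are all hypothetical. The parameters `tau` and `gamma` stand in for the similarity and quality thresholds, with assumed default values.

```python
import math
from typing import List, Sequence, Tuple

QA = Tuple[str, str]  # (question, answer)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class Cluster:
    def __init__(self, vec: Sequence[float], qa: QA, score: float) -> None:
        self.centroid: List[float] = list(vec)
        self.count = 1
        self.qa = qa        # representative pair kept for this cluster
        self.score = score  # quality score of the representative


class CentroidStore:
    """Bounded-growth store: similar questions share a single cluster."""

    def __init__(self, tau: float = 0.8) -> None:
        self.tau = tau  # similarity threshold (assumed value)
        self.clusters: List[Cluster] = []

    def add(self, vec: Sequence[float], qa: QA, score: float) -> None:
        best = max(self.clusters, key=lambda c: cosine(c.centroid, vec), default=None)
        if best is None or cosine(best.centroid, vec) < self.tau:
            # Nothing similar enough: open a new cluster.
            self.clusters.append(Cluster(vec, qa, score))
            return
        # Merge: update the centroid as a running mean of member vectors.
        best.count += 1
        best.centroid = [c + (v - c) / best.count for c, v in zip(best.centroid, vec)]
        # Keep only the better-scoring pair as the representative.
        if score > best.score:
            best.qa, best.score = qa, score


def route_by_quality(high: CentroidStore, low: CentroidStore,
                     vec: Sequence[float], qa: QA, score: float,
                     gamma: float = 0.6) -> None:
    """Send a new QA pair to the high- or low-quality store (gamma assumed)."""
    (high if score >= gamma else low).add(vec, qa, score)
```

Because a new vector either merges into an existing cluster or opens a new one, the store grows with the number of distinct question topics rather than with the total number of questions seen.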
Experimental Setup and Results
Dataset and Evaluation
The experiments are conducted on datasets like MSQA, ProCQA, and PolarDBQA, covering diverse domains from Microsoft's technologies to database systems like PolarDB. Each dataset utilizes an external knowledge base for retrieval, simulating real-time QA scenarios where questions arrive sequentially.
ComRAG is evaluated with a combination of lexical and semantic metrics: BLEU, ROUGE-L, BERTScore, and cosine similarity (SIM). Avg Time measures processing efficiency. The results show that ComRAG outperforms baseline methods across all datasets, with SIM gains of up to 25.9% and latency reductions of up to 23.3%.
Figure 2: Ablation study on PolarDBQA under a 10-round iterative evaluation setting.
Impact of Hyperparameters
Ablation studies reveal the sensitivity of performance to various hyperparameters such as similarity threshold τ, reuse threshold δ, and quality threshold γ. Adjusting these parameters shows their critical role in balancing efficiency, memory usage, and answer quality.
Real-world Implications and Future Directions
Scalability and Adaptability
ComRAG's integration of static and dynamic knowledge components makes it particularly well-suited for industrial applications where CQA systems must rapidly adapt to new information and user interactions. Its modular design allows for scalable deployment, capable of accommodating varying computational environments and domain-specific requirements.
Limitations and Future Work
Although ComRAG effectively balances efficiency and quality, it relies on static thresholds for clustering and does not factor in topic relevance or usage frequency, either of which could further optimize memory management. The system also relies on rule-based query strategies; incorporating machine learning techniques could make routing more robust.
Conclusion
ComRAG offers a robust framework for real-time community question answering, using retrieval-augmented generation to improve response quality and system adaptability. Its architecture, built around dynamic vector stores, handles the continuous question influx and variable answer quality of industrial-scale CQA efficiently, and it provides a foundation for future improvements in interactive AI systems.