An Effective Approach to Embedding Source Code by Combining Large Language and Sentence Embedding Models

Published 23 Sep 2024 in cs.SE and cs.AI (arXiv:2409.14644v3)

Abstract: The advent of large language models (LLMs) has significantly advanced AI in software engineering (SE), with source code embeddings playing a crucial role in tasks such as source code clone detection and source code clustering. However, existing methods for source code embedding, including those based on LLMs, often rely on costly supervised training or fine-tuning for domain adaptation. This paper proposes a novel approach to embedding source code that combines LLMs with sentence embedding models. The approach aims to eliminate the need for task-specific training or fine-tuning and to address the erroneous information commonly found in LLM-generated outputs. To evaluate the proposed approach, we conducted a series of experiments on three datasets covering different programming languages, using various LLMs and sentence embedding models. The experimental results demonstrate that our approach is effective and outperforms state-of-the-art unsupervised approaches such as SourcererCC, Code2vec, InferCode, TransformCode, and LLM2Vec. Our findings highlight the potential of our approach to advance the field of SE by providing robust and efficient solutions for source code embedding tasks.
