Representation Learning for Stack Overflow Posts: How Far are We?
Abstract: The tremendous success of Stack Overflow has accumulated an extensive corpus of software engineering knowledge, motivating researchers to propose various solutions for analyzing its content. The performance of such solutions hinges significantly on the choice of representation model for Stack Overflow posts. As the volume of literature on Stack Overflow continues to burgeon, the need for a powerful post representation model becomes apparent, driving researchers to develop specialized representation models that can adeptly capture the intricacies of Stack Overflow posts. The state-of-the-art (SOTA) Stack Overflow post representation models are Post2Vec and BERTOverflow, which are built upon popular neural architectures such as convolutional neural networks (CNNs) and the Transformer (e.g., BERT). Despite their promising results, these representation methods have never been evaluated in the same experimental setting. To fill this research gap, we first empirically compare the performance of the representation models designed specifically for Stack Overflow posts (Post2Vec and BERTOverflow) on a wide range of related tasks, i.e., tag recommendation, relatedness prediction, and API recommendation. To find more suitable representation models for the posts, we further explore a diverse set of BERT-based models, including (1) general-domain language models (RoBERTa and Longformer) and (2) language models pre-trained on software engineering-related textual artifacts (CodeBERT, GraphCodeBERT, and seBERT). Our evaluation, however, also illustrates the ``No Silver Bullet'' concept, as none of the models consistently outperforms all the others. Inspired by these findings, we propose SOBERT, which employs a simple-yet-effective strategy to improve the best-performing model by continuing the pre-training phase on textual artifacts from Stack Overflow.
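To make the evaluation pipeline concrete, below is a minimal sketch of how a BERT-family encoder can turn a Stack Overflow post into a fixed-length vector for a downstream task such as tag recommendation or relatedness prediction. The checkpoint name (`roberta-base`), the mean-pooling choice, and the Hugging Face `transformers` API are illustrative assumptions, not the paper's exact setup.

```python
# Hedged sketch: obtaining a fixed-length post representation from a
# BERT-family encoder. Checkpoint and pooling strategy are assumptions.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
encoder = AutoModel.from_pretrained("roberta-base")

post = "How do I convert a list of strings to ints in Python?"
inputs = tokenizer(post, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    outputs = encoder(**inputs)

# Mean-pool the final hidden states into one vector (one common choice;
# taking the [CLS]/<s> vector is another).
embedding = outputs.last_hidden_state.mean(dim=1)  # shape: (1, 768)
```

A downstream classifier (e.g., a linear layer over the embedding) would then be trained per task, which is the usual way such representations are compared across tasks.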
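The strategy behind SOBERT, continuing the pre-training phase on Stack Overflow text, can be sketched as follows. This assumes the Hugging Face `transformers` and `datasets` libraries, a CodeBERT starting checkpoint, and a plain-text dump of posts in `so_posts.txt`; these are illustrative stand-ins, since the abstract does not prescribe an exact toolchain.

```python
# Minimal sketch of continued pre-training with the masked language modeling
# (MLM) objective on in-domain Stack Overflow text. All file names and
# hyperparameters here are assumptions for illustration.
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Start from an already pre-trained checkpoint rather than training from scratch.
checkpoint = "microsoft/codebert-base"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForMaskedLM.from_pretrained(checkpoint)

# Load and tokenize the Stack Overflow corpus (one post per line, assumed).
dataset = load_dataset("text", data_files={"train": "so_posts.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

# The collator randomly masks 15% of tokens, the standard MLM objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="sobert-sketch",
        per_device_train_batch_size=8,
        num_train_epochs=1,
    ),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()  # continue pre-training on the in-domain text
```

The resulting checkpoint would then replace the original one when extracting post representations for the downstream tasks.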
- Classifying stack overflow posts on API issues. In 2018 IEEE 25th international conference on software analysis, evolution and reengineering (SANER). IEEE, 244–254.
- Unified Pre-training for Program Understanding and Generation. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2021, Online, June 6-11, 2021, Kristina Toutanova, Anna Rumshisky, Luke Zettlemoyer, Dilek Hakkani-Tür, Iz Beltagy, Steven Bethard, Ryan Cotterell, Tanmoy Chakraborty, and Yichao Zhou (Eds.). Association for Computational Linguistics, 2655–2668. https://doi.org/10.18653/v1/2021.naacl-main.211
- Longformer: The Long-Document Transformer. CoRR abs/2004.05150 (2020). arXiv:2004.05150 https://arxiv.org/abs/2004.05150
- Automatically classifying posts into question categories on stack overflow. In 2018 IEEE/ACM 26th International Conference on Program Comprehension (ICPC). IEEE, 211–221.
- BIKER: a tool for Bi-information source based API method recommendation. In Proceedings of the ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/SIGSOFT FSE 2019, Tallinn, Estonia, August 26-30, 2019, Marlon Dumas, Dietmar Pfahl, Sven Apel, and Alessandra Russo (Eds.). ACM, 1075–1079. https://doi.org/10.1145/3338906.3341174
- Tree-to-tree Neural Networks for Program Translation. In Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, December 3-8, 2018, Montréal, Canada, Samy Bengio, Hanna M. Wallach, Hugo Larochelle, Kristen Grauman, Nicolò Cesa-Bianchi, and Roman Garnett (Eds.). 2552–2562. https://proceedings.neurips.cc/paper/2018/hash/d759175de8ea5b1d9a2660e45554894f-Abstract.html
- An Empirical Study on the Usage of Transformer Models for Code Completion. IEEE Trans. Software Eng. 48, 12 (2022), 4818–4837. https://doi.org/10.1109/TSE.2021.3128234
- ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. OpenReview.net. https://openreview.net/forum?id=r1xMH1BtvB
- Norman Cliff. 2014. Ordinal methods for behavioral data analysis. Psychology Press.
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), Jill Burstein, Christy Doran, and Thamar Solorio (Eds.). Association for Computational Linguistics, 4171–4186. https://doi.org/10.18653/v1/n19-1423
- CodeBERT: A Pre-Trained Model for Programming and Natural Languages. In Findings of the Association for Computational Linguistics: EMNLP 2020, Online Event, 16-20 November 2020 (Findings of ACL, Vol. EMNLP 2020), Trevor Cohn, Yulan He, and Yang Liu (Eds.). Association for Computational Linguistics, 1536–1547. https://doi.org/10.18653/v1/2020.findings-emnlp.139
- Deep API learning. Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering (2016).
- GraphCodeBERT: Pre-training Code Representations with Data Flow. In 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net. https://openreview.net/forum?id=jLoC4ez43PZ
- Don’t stop pretraining: adapt language models to domains and tasks. arXiv preprint arXiv:2004.10964 (2020).
- PTM4Tag: Sharpening Tag Recommendation of Stack Overflow Posts with Pre-trained Models. CoRR abs/2203.10965 (2022). https://doi.org/10.48550/arXiv.2203.10965 arXiv:2203.10965
- Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural computation 9, 8 (1997), 1735–1780.
- API method recommendation without worrying about the task-API knowledge gap. In Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, ASE 2018, Montpellier, France, September 3-7, 2018, Marianne Huchard, Christian Kästner, and Gordon Fraser (Eds.). ACM, 293–304. https://doi.org/10.1145/3238147.3238191
- CodeSearchNet Challenge: Evaluating the State of Semantic Code Search. CoRR abs/1909.09436 (2019). arXiv:1909.09436 http://arxiv.org/abs/1909.09436
- Contrastive Code Representation Learning. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021, Virtual Event / Punta Cana, Dominican Republic, 7-11 November, 2021, Marie-Francine Moens, Xuanjing Huang, Lucia Specia, and Scott Wen-tau Yih (Eds.). Association for Computational Linguistics, 5954–5971. https://doi.org/10.18653/v1/2021.emnlp-main.482
- ALBERT: A Lite BERT for Self-supervised Learning of Language Representations. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. OpenReview.net. https://openreview.net/forum?id=H1eA7AEtvS
- Deep learning. Nature 521, 7553 (2015), 436–444.
- BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, July 5-10, 2020, Dan Jurafsky, Joyce Chai, Natalie Schluter, and Joel R. Tetreault (Eds.). Association for Computational Linguistics, 7871–7880. https://doi.org/10.18653/v1/2020.acl-main.703
- TagDC: A tag recommendation method for software information sites with a combination of deep learning and collaborative filtering. J. Syst. Softw. 170 (2020), 110783. https://doi.org/10.1016/j.jss.2020.110783
- Traceability Transformed: Generating more Accurate Links with Pre-Trained BERT Models. In 43rd IEEE/ACM International Conference on Software Engineering, ICSE 2021, Madrid, Spain, 22-30 May 2021. IEEE, 324–335. https://doi.org/10.1109/ICSE43902.2021.00040
- RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR abs/1907.11692 (2019). arXiv:1907.11692 http://arxiv.org/abs/1907.11692
- CodeXGLUE: A machine learning benchmark dataset for code understanding and generation. arXiv preprint arXiv:2102.04664 (2021).
- Using Deep Learning to Generate Complete Log Statements. In 44th IEEE/ACM 44th International Conference on Software Engineering, ICSE 2022, Pittsburgh, PA, USA, May 25-27, 2022. ACM, 2279–2290. https://doi.org/10.1145/3510003.3511561
- Studying the Usage of Text-To-Text Transfer Transformer to Support Code-Related Tasks. In 43rd IEEE/ACM International Conference on Software Engineering, ICSE 2021, Madrid, Spain, 22-30 May 2021. IEEE, 336–347. https://doi.org/10.1109/ICSE43902.2021.00041
- CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis. ICLR (2023).
- Attention-based model for predicting question relatedness on Stack Overflow. 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR) (2021), 97–107.
- Deep Contextualized Word Representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2018, New Orleans, Louisiana, USA, June 1-6, 2018, Volume 1 (Long Papers), Marilyn A. Walker, Heng Ji, and Amanda Stent (Eds.). Association for Computational Linguistics, 2227–2237. https://doi.org/10.18653/v1/n18-1202
- Improving Low Quality Stack Overflow Post Detection. In 30th IEEE International Conference on Software Maintenance and Evolution, Victoria, BC, Canada, September 29 - October 3, 2014. IEEE Computer Society, 541–544. https://doi.org/10.1109/ICSME.2014.90
- BERT with History Answer Embedding for Conversational Question Answering. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2019, Paris, France, July 21-25, 2019, Benjamin Piwowarski, Max Chevalier, Éric Gaussier, Yoelle Maarek, Jian-Yun Nie, and Falk Scholer (Eds.). ACM, 1133–1136. https://doi.org/10.1145/3331184.3331341
- Improving language understanding by generative pre-training. (2018).
- Language models are unsupervised multitask learners. OpenAI blog 1, 8 (2019), 9.
- Exploring the limits of transfer learning with a unified text-to-text transformer. The Journal of Machine Learning Research 21, 1 (2020), 5485–5551.
- Discovering, explaining and summarizing controversial discussions in community q&a sites. In 2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 151–162.
- PostFinder: Mining Stack Overflow posts to support software developers. Inf. Softw. Technol. 127 (2020), 106367. https://doi.org/10.1016/j.infsof.2020.106367
- DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. CoRR abs/1910.01108 (2019). arXiv:1910.01108 http://arxiv.org/abs/1910.01108
- Mike Schuster and Kuldip K Paliwal. 1997. Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing 45, 11 (1997), 2673–2681.
- Question Relatedness on Stack Overflow: The Task, Dataset, and Corpus-inspired Models. CoRR abs/1905.01966 (2019).
- Utilizing BERT for Aspect-Based Sentiment Analysis via Constructing Auxiliary Sentence. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), Jill Burstein, Christy Doran, and Thamar Solorio (Eds.). Association for Computational Linguistics, 380–385. https://doi.org/10.18653/v1/n19-1035
- Code and Named Entity Recognition in StackOverflow. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Online, 4913–4926. https://doi.org/10.18653/v1/2020.acl-main.443
- Small and Practical BERT Models for Sequence Labeling. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019, Hong Kong, China, November 3-7, 2019, Kentaro Inui, Jing Jiang, Vincent Ng, and Xiaojun Wan (Eds.). Association for Computational Linguistics, 3630–3634. https://doi.org/10.18653/v1/D19-1374
- An Empirical Study on Learning Bug-Fixing Patches in the Wild via Neural Machine Translation. ACM Trans. Softw. Eng. Methodol. 28, 4 (2019), 19:1–19:29. https://doi.org/10.1145/3340544
- Using Pre-Trained Models to Boost Code Review Automation. In 44th IEEE/ACM 44th International Conference on Software Engineering, ICSE 2022, Pittsburgh, PA, USA, May 25-27, 2022. ACM, 2291–2302. https://doi.org/10.1145/3510003.3510621
- Attention is All you Need. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA, Isabelle Guyon, Ulrike von Luxburg, Samy Bengio, Hanna M. Wallach, Rob Fergus, S. V. N. Vishwanathan, and Roman Garnett (Eds.). 5998–6008. https://proceedings.neurips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html
- On the validity of pre-trained transformers for natural language processing in the software engineering domain. IEEE Transactions on Software Engineering (2022), 1–1. https://doi.org/10.1109/TSE.2022.3178469
- CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021, Virtual Event / Punta Cana, Dominican Republic, 7-11 November, 2021, Marie-Francine Moens, Xuanjing Huang, Lucia Specia, and Scott Wen-tau Yih (Eds.). Association for Computational Linguistics, 8696–8708. https://doi.org/10.18653/v1/2021.emnlp-main.685
- Huihui Wei and Ming Li. 2017. Supervised Deep Features for Software Functional Clone Detection by Exploiting Lexical and Syntactical Information in Source Code. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI 2017, Melbourne, Australia, August 19-25, 2017, Carles Sierra (Ed.). ijcai.org, 3034–3040. https://doi.org/10.24963/ijcai.2017/423
- CLEAR: Contrastive Learning for API Recommendation. 44th International Conference on Software Engineering (ICSE) (2022).
- What do developers search for on the web? Empirical Software Engineering 22 (2017), 3149–3185.
- Post2Vec: Learning Distributed Representations of Stack Overflow Posts. IEEE Transactions on Software Engineering (2021), 1–1. https://doi.org/10.1109/TSE.2021.3093761
- Prediction of relatedness in stack overflow: deep learning vs. SVM: a reproducibility study. In Proceedings of the 12th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM 2018, Oulu, Finland, October 11-12, 2018, Markku Oivo, Daniel Méndez Fernández, and Audris Mockus (Eds.). ACM, 21:1–21:10. https://doi.org/10.1145/3239235.3240503
- AnswerBot: Automated generation of answer summary to developers’ technical questions. In 2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 706–716.
- Aspect-Based API Review Classification: How Far Can Pre-Trained Transformer Model Go?. In 2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER). IEEE Computer Society.
- Sentiment analysis for software engineering: How far can pre-trained transformer models go?. In 2020 IEEE International Conference on Software Maintenance and Evolution (ICSME). IEEE, 70–80.
- Is deep learning better than traditional approaches in tag recommendation for software information sites? Inf. Softw. Technol. 109 (2019), 1–13. https://doi.org/10.1016/j.infsof.2019.01.002