- The paper demonstrates that LLMs can accurately extract predefined CTI variables from cybercrime forums, achieving an average accuracy of 97.96%.
- The study employs a robust methodology that includes systematic data collection, specific model prompting, and thorough evaluation against human-coded analyses.
- The findings imply that LLMs can streamline CTI workflows by efficiently identifying critical threat indicators, though enhanced context interpretation remains necessary.
Evaluating LLMs for Cyber Threat Intelligence in Cybercrime Forums
The paper under review investigates the effectiveness of LLMs for extracting and summarizing Cyber Threat Intelligence (CTI) from discussions on cybercrime forums. It addresses the question of whether LLMs can accurately process the unstructured text prevalent in such forums and extract useful intelligence from it. The authors employed the GPT-3.5-turbo-16k-0613 model to analyze 500 conversations collected from three prominent cybercrime platforms: XSS, Exploit.in, and RAMP.
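As a rough illustration of how a single conversation might be submitted to that model, the sketch below assumes the legacy (pre-1.0) `openai` Python SDK and an invented system prompt; the paper's actual prompt wording, schema, and pipeline are not reproduced here.

```python
import json
import openai  # legacy (pre-1.0) SDK interface assumed

openai.api_key = "sk-..."  # placeholder, not a real key

def extract_cti(conversation_text: str) -> dict:
    """Ask the model to summarize and code one forum conversation.

    The prompt wording below is illustrative only, not the paper's prompt.
    """
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo-16k-0613",
        temperature=0,
        messages=[
            {"role": "system",
             "content": "You are a CTI analyst. Summarize the conversation and "
                        "code the predefined CTI variables. Answer strictly in JSON."},
            {"role": "user", "content": conversation_text},
        ],
    )
    return json.loads(response["choices"][0]["message"]["content"])
```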
Methodological Approach
The researchers structured their approach around data collection, model prompting, and evaluation. Conversations were systematically harvested from the selected forums, and the model was prompted to summarize each conversation and extract ten predefined CTI variables, covering indicators such as ongoing sales, intended targets, and references to specific technologies or vulnerabilities. A core component of the methodology was comparing the LLM-derived summaries and extractions against codes produced independently by human analysts, which highlighted variances and pinpointed areas for model improvement.
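The exact ten variable definitions are not listed in this review, so the coding scheme below is hypothetical, loosely assembled from the indicators mentioned above (sales, targets, technologies, vulnerabilities, industries, critical infrastructure, large organizations); it shows how such a scheme could be turned into an extraction prompt.

```python
# Hypothetical coding scheme: variable names approximate those mentioned in the
# review; the paper's actual ten definitions may differ.
CTI_VARIABLES = {
    "ongoing_sale":            "Is something actively being sold in the thread? (yes/no)",
    "intended_target":         "Named victim, organization, or sector being targeted, if any",
    "technology_mentioned":    "Specific software, hardware, or service referenced",
    "vulnerability_mentioned": "CVE identifier or described weakness, if any",
    "industries":              "Industry sectors discussed in the conversation",
    "critical_infrastructure": "Does the thread involve critical infrastructure? (yes/no)",
    "large_organization":      "Does the thread target a large organization? (yes/no)",
}

def build_prompt(conversation_text: str) -> str:
    """Assemble an extraction prompt from the coding scheme above (illustrative only)."""
    variable_lines = "\n".join(f"- {name}: {definition}"
                               for name, definition in CTI_VARIABLES.items())
    return (
        "Summarize the following forum conversation, then code these variables "
        f"and return the result as JSON:\n{variable_lines}\n\n"
        f"Conversation:\n{conversation_text}"
    )
```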
Empirical Findings
The evaluation showed that the LLM coded the intended CTI variables with an average accuracy of 97.96%, ranging from 95% to 100% across variables. Notably, the model achieved perfect accuracy in identifying the 'industries' mentioned in conversations, with slightly lower results for critical infrastructure targeting (95%) and large organization targeting (96.2%). These outcomes underscore the model's ability to process complex threads and isolate the data points pertinent to CTI tasks.
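The per-variable figures above are agreement rates against the human-coded ground truth. A minimal sketch of how such an evaluation could be computed, with hypothetical field names and data structures, is:

```python
from typing import Dict, List

def per_variable_accuracy(llm_codes: List[Dict], human_codes: List[Dict]) -> Dict[str, float]:
    """Share of conversations where the LLM's code matches the human analyst's, per variable."""
    variables = human_codes[0].keys()
    accuracy = {}
    for var in variables:
        matches = sum(1 for llm, human in zip(llm_codes, human_codes)
                      if llm.get(var) == human[var])
        accuracy[var] = matches / len(human_codes)
    return accuracy

def average_accuracy(accuracy: Dict[str, float]) -> float:
    """Mean agreement across variables; the paper reports 97.96% over its ten variables."""
    return sum(accuracy.values()) / len(accuracy)
```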
Implications and Future Directions
This research shows that LLMs are viable tools for augmenting CTI analysts by automating the initial stages of data processing and intelligence extraction from cybercrime platforms. The demonstrated accuracy suggests these models can meaningfully streamline the categorization and prioritization of security threats, sharpening analyst focus and improving efficiency.
However, the study also acknowledges inherent model limitations, including difficulties with contextual and temporal interpretation and misinterpretations arising from vague or undefined concepts. The model's inability to consistently distinguish current activity from historical accounts points to a key area for ongoing refinement. In addition, preprocessing strategies such as data chunking and context management require optimization to fully leverage the model's potential.
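One such preprocessing strategy is splitting long threads into chunks that fit within the model's 16k-token context window. The sketch below uses a crude character-based token estimate and a greedy grouping of posts; it is an assumption about what such chunking might look like, not the paper's actual pipeline.

```python
def chunk_conversation(posts: list[str], max_tokens: int = 14_000,
                       chars_per_token: int = 4) -> list[list[str]]:
    """Greedily group forum posts into chunks under a rough token budget,
    leaving headroom for the prompt and the model's response."""
    budget = max_tokens * chars_per_token  # approximate character budget
    chunks, current, current_len = [], [], 0
    for post in posts:
        if current and current_len + len(post) > budget:
            chunks.append(current)
            current, current_len = [], 0
        current.append(post)
        current_len += len(post)
    if current:
        chunks.append(current)
    return chunks
```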
Conclusion and Prospective Enhancements
In conclusion, applying LLMs to CTI, particularly in the fast-moving environment of cybercrime forums, offers substantial promise. Despite notable successes, the research identifies clear opportunities for refinement, chiefly in handling nuanced contextual challenges and in evaluating newer models such as Claude 3.5 or GPT-4. Future work in these areas could improve the robustness and accuracy of LLMs for cybersecurity intelligence. The promising results of this study lay the groundwork for further exploration of integrating advanced LLMs into existing cyber threat intelligence frameworks.