A text analysis for Operational Risk loss descriptions

Published 2 Dec 2022 in stat.AP | (2212.01285v3)

Abstract: Financial institutions manage operational risk (OpRisk) by carrying out activities required by regulation, such as collecting loss data, calculating capital requirements, and reporting. To this end, loss amounts, dates, organizational units involved, event types, and descriptions are recorded in OpRisk databases for each OpRisk event. In recent years, operational risk functions have been required to go beyond their regulatory tasks and proactively manage operational risk by preventing or mitigating its impact. Because OpRisk databases also contain event descriptions, extracting information from these texts is an area of opportunity. The present work introduces, for the first time, a structured workflow for applying text analysis techniques (one of the main Natural Language Processing tasks) to OpRisk event descriptions in order to identify managerial clusters (more granular than regulatory categories) that represent the root causes of the underlying risks. This complements and enriches the established framework of statistical methods based on quantitative data. Specifically, after delicate tasks such as data cleaning, text vectorization, and semantic adjustment, we apply dimensionality reduction methods and several clustering models and algorithms, comparing their performance and weaknesses. Our results improve retrospective knowledge of loss events and help mitigate future risks.
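To make the workflow described in the abstract concrete, the following is a minimal sketch of three of its stages (cleaning, vectorization, clustering) using only the Python standard library. It is not the authors' implementation. The loss descriptions, the stopword list, the choice of smoothed TF-IDF weighting, the cosine-based k-means variant, and the seed indices are all assumptions made for illustration. The semantic adjustment and dimensionality reduction steps mentioned in the abstract are omitted here.

```python
import math
import re
from collections import Counter

# Hypothetical loss descriptions, invented for illustration only.
DESCRIPTIONS = [
    "Customer card cloned at ATM, fraudulent withdrawal refunded.",
    "Fraudulent withdrawal after card cloned at an ATM.",
    "Cloned card used for fraudulent ATM withdrawal.",
    "Server outage halted payment system for two hours.",
    "Payment system outage after server failure.",
    "Server failure caused outage of the payment system.",
]

# Assumed, deliberately tiny stopword list.
STOPWORDS = {"a", "an", "the", "at", "for", "of", "after", "two", "used"}


def clean(text):
    """Data cleaning: lowercase, keep alphabetic tokens, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def tfidf(docs):
    """Text vectorization: L2-normalised TF-IDF vectors as {term: weight}."""
    n = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        # Smoothed idf: ln((1 + n) / (1 + df)) + 1 (one common variant).
        vec = {t: c * (math.log((1 + n) / (1 + doc_freq[t])) + 1)
               for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two unit-length sparse vectors."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def kmeans(vectors, seeds, iterations=10):
    """Clustering: k-means on cosine similarity, seeded by document indices."""
    centroids = [dict(vectors[i]) for i in seeds]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each document to its most similar centroid.
        labels = [max(range(len(centroids)), key=lambda k: cosine(v, centroids[k]))
                  for v in vectors]
        # Recompute each centroid as the normalised sum of its members.
        for k in range(len(centroids)):
            total = {}
            for v, label in zip(vectors, labels):
                if label == k:
                    for t, w in v.items():
                        total[t] = total.get(t, 0.0) + w
            if total:
                norm = math.sqrt(sum(w * w for w in total.values()))
                centroids[k] = {t: w / norm for t, w in total.items()}
    return labels


tokens = [clean(d) for d in DESCRIPTIONS]
labels = kmeans(tfidf(tokens), seeds=(0, 3))
print(labels)
```

In this toy data the two groups share no vocabulary after cleaning, so the card-fraud descriptions and the system-outage descriptions fall into separate clusters. Real loss descriptions are far noisier, which is where the semantic adjustment, dimensionality reduction, and model comparison described in the abstract would come in.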