Towards De-identification of Legal Texts

Published 9 Oct 2019 in cs.CL | (1910.03739v1)

Abstract: In many countries, the personal information that can be published or shared between organizations is regulated; documents must therefore undergo de-identification to remove or obfuscate confidential data. Our work focuses on the de-identification of legal texts, where the goal is to hide the names of the actors involved in a lawsuit without losing the sense of the story. We present a first evaluation of NLP tools on our corpus for tasks such as segmentation, tokenization, and named entity recognition, and we analyze several evaluation measures for our de-identification task. Results are poor: 84% of the documents contain at least one name not detected by the NER tools, which could lead to re-identification of the people involved. We conclude that tools must be substantially adapted to process texts from this domain.
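
The document-level figure quoted in the abstract (the share of documents with at least one undetected name) can be contrasted with an ordinary name-level recall. The following is a minimal sketch under stated assumptions, not the paper's own evaluation code: the function names `document_leak_rate` and `name_recall`, the exact-match comparison of name strings, and the toy data are all hypothetical and chosen only for illustration.

```python
from typing import List, Set


def document_leak_rate(gold: List[Set[str]], predicted: List[Set[str]]) -> float:
    """Fraction of documents in which at least one gold name was not detected.

    Assumption: names are compared by exact string match, one set per document.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same documents")
    if not gold:
        return 0.0
    # A document "leaks" if any gold name is missing from the predictions.
    leaked = sum(1 for g, p in zip(gold, predicted) if g - p)
    return leaked / len(gold)


def name_recall(gold: List[Set[str]], predicted: List[Set[str]]) -> float:
    """Fraction of all gold names (pooled over documents) that were detected."""
    total = sum(len(g) for g in gold)
    if total == 0:
        return 1.0
    found = sum(len(g & p) for g, p in zip(gold, predicted))
    return found / total


# Toy data (hypothetical): three documents, four gold names in total.
gold_names = [
    {"Ana Pérez", "Luis Gómez"},
    {"María Silva"},
    {"Juan Díaz"},
]
detected_names = [
    {"Ana Pérez"},                # misses "Luis Gómez"
    {"María Silva"},
    {"Juan Díaz", "Montevideo"},  # an extra detection does not cause a leak
]

leak_rate = document_leak_rate(gold_names, detected_names)
recall = name_recall(gold_names, detected_names)
print(f"document leak rate: {leak_rate:.2f}, name recall: {recall:.2f}")
```

In this toy case three of the four names are found, yet one document in three still exposes a name. This illustrates why a document-level measure can look much worse than a name-level one for a de-identification task.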
