Watermark Text Pattern Spotting in Document Images

Published 10 Jan 2024 in cs.CV (arXiv:2401.05167v2)

Abstract: Watermark text spotting in document images can offer access to an often unexplored source of information, providing crucial evidence about a record's scope, audience, and sometimes even authenticity. Stemming from the problem of text spotting, detecting and understanding watermarks in documents inherits the same hardships: in the wild, writing can come in various fonts, sizes, and forms, making generic recognition a very difficult problem. To address the lack of resources in this field and to propel further research, we propose a novel benchmark (K-Watermark) containing 65,447 data samples generated using Wrender, a watermark text pattern rendering procedure. A validity study using human raters yields an authenticity score of 0.51 against pre-generated watermarked documents. To prove the usefulness of the dataset and rendering technique, we developed an end-to-end solution (Wextract) that detects bounding-box instances of watermark text while predicting the depicted text. To deal with this specific task, we introduce a variance minimization loss and a hierarchical self-attention mechanism. To the best of our knowledge, we are the first to propose an evaluation benchmark and a complete solution for retrieving watermarks from documents, surpassing baselines by 5 AP points in detection and 4 points in character accuracy.
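The abstract names a variance minimization loss but does not define it, so the following is only a hypothetical sketch of what such a term could look like: a penalty on the spread of per-instance watermark features around their mean, which would encourage the network to embed all watermark text instances in a document consistently. The function name and the choice of what the variance is computed over are assumptions, not the paper's actual formulation.

```python
import numpy as np

def variance_minimization_loss(embeddings: np.ndarray) -> float:
    """Hypothetical variance penalty over per-instance features.

    embeddings: (N, D) array, one D-dimensional feature vector per
    detected watermark text instance. Returns the mean squared
    deviation of the features from their per-dimension mean, so the
    loss is 0 when all instances share an identical embedding.
    """
    mean = embeddings.mean(axis=0, keepdims=True)  # (1, D) centroid
    return float(((embeddings - mean) ** 2).mean())
```

In a training loop, a term like this would typically be added to the detection and recognition losses with a weighting coefficient; again, how (or whether) the paper combines its losses this way is not stated in the abstract.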
