- The paper demonstrates that FAWA achieves a 100% attack success rate while significantly reducing perturbation visibility compared to traditional OCR attacks.
- It employs both gradient-based and optimization methods to generate natural watermark perturbations that preserve text readability.
- Experimental results reveal that FAWA requires 60% fewer perturbations and 78% fewer iterations while outperforming baseline attacks in MSE, PSNR, and SSIM metrics.
Fast Adversarial Watermark Attack on Optical Character Recognition Systems: A Summary of FAWA
The paper "FAWA: Fast Adversarial Watermark Attack on Optical Character Recognition (OCR) Systems" presents a novel adversarial attack method specifically designed to manipulate OCR systems. This essay provides a detailed exploration of the paper's contributions, methodological framework, and experimental results.
Introduction to FAWA
The Fast Adversarial Watermark Attack (FAWA) addresses limitations in existing adversarial attacks on OCR systems. Conventional methods often lead to visually noticeable and unnatural perturbations, especially in text images with clear backgrounds. FAWA introduces a watermark-style perturbation that appears natural, significantly reducing the perturbation visibility to human observers while maintaining a 100% attack success rate. The technique effectively disguises adversarial changes as watermarks, leveraging both gradient-based and optimization-based perturbation generation.
Technical Challenges and Innovations
Traditional adversarial attacks alter image features using small Lp-norm perturbations or restrict modifications to small areas. However, text images have distinctive attributes, such as white backgrounds and dense character arrangements, that make such perturbations easy to notice. FAWA circumvents these challenges by:
- Utilizing Watermarks: The watermark-style perturbations, confined to specific areas, overlay text without impairing readability, unlike patch-based methods that disrupt clean backgrounds.
- Employing a White-Box Attack Methodology: FAWA assumes access to model parameters for targeted attacks, ensuring precise manipulation of OCR outputs.
Methodology
The FAWA framework comprises three key steps:
- Watermark Positioning: Automatically determining optimal watermark placement to concentrate perturbations. This involves creating baseline adversarial images, identifying perturbed regions, and applying erosion/dilation operations to pinpoint positioning.
Figure 1: Find the position of watermarks.
- Adversarial Perturbation Generation: Using either gradient-based methods such as MI-FGSM or optimization-based methods to generate perturbations within watermarks. The gradient-based attacks use the CTC loss function to enhance efficiency, and perturbations are confined to watermark boundaries to prevent background pollution.
- Color Watermark Application: Optionally converting grayscale watermarks into full-color ones to enhance text readability, especially for colored prints.
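The positioning and perturbation steps above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the threshold, the 3x3 structuring element, and the choice to subtract the signed momentum (descending a targeted loss) are assumptions made for illustration. The gradient is passed in as a plain array rather than computed from a CTC loss.

```python
import numpy as np


def _windows(mask, k):
    """Return the k*k shifted views of a boolean mask padded with False."""
    r = k // 2  # k is assumed to be odd
    padded = np.pad(mask, r, mode="constant", constant_values=False)
    h, w = mask.shape
    return [padded[dy:dy + h, dx:dx + w] for dy in range(k) for dx in range(k)]


def erode(mask, k=3):
    """Binary erosion: keep a pixel only if its whole k x k neighbourhood is set."""
    return np.logical_and.reduce(_windows(mask, k))


def dilate(mask, k=3):
    """Binary dilation: set a pixel if any pixel in its k x k neighbourhood is set."""
    return np.logical_or.reduce(_windows(mask, k))


def watermark_mask(original, baseline_adv, threshold=0.1, k=3):
    """Hypothetical helper: locate the region a baseline attack perturbed most.

    Pixels that changed by more than `threshold` are kept, then erosion
    followed by dilation removes isolated pixels while preserving larger blobs.
    """
    changed = np.abs(baseline_adv - original) > threshold
    return dilate(erode(changed, k), k)


def masked_mifgsm_step(x, grad, momentum, mask, alpha=0.1, mu=1.0):
    """One momentum-iterative sign-gradient step restricted to `mask`.

    `grad` stands in for the gradient of a targeted loss with respect to x
    (assumption: the loss is minimised, hence the minus sign).
    """
    # Accumulate the L1-normalised gradient into the momentum term.
    momentum = mu * momentum + grad / (np.abs(grad).sum() + 1e-12)
    # Update only pixels inside the watermark region; the background is untouched.
    x_new = x - alpha * np.sign(momentum) * mask
    return np.clip(x_new, 0.0, 1.0), momentum
```

In a full attack, the step would be repeated, with the gradient recomputed each time, until the OCR output matches the target text. Multiplying the update by the mask is what keeps the clean background unchanged.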
Experimental Results
FAWA's efficacy is validated through comprehensive experiments on Calamari-OCR, an open-source OCR framework. Key findings include:
- FAWA achieves 100% attack success with 60% fewer perturbations and approximately 78% fewer iterations than non-watermarked adversarial attacks.
- The perturbation level analysis reveals a considerable reduction in saliency, both positive and negative, with watermarks.


Figure 2: Attack efficiency in word images with Arial font.
- Watermark attacks consistently outperform baseline attacks in terms of MSE, PSNR, and structural similarity (SSIM), demonstrating superior visual quality.
- The tradeoff between attack efficiency and perturbation level is adjustable via hyper-parameters (e.g., step size α, tradeoff parameter c), enabling fine-tuning based on attack constraints.
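For reference, two of the image-quality metrics named above can be computed as in the following sketch. It assumes 8-bit images with a peak value of 255; the function names are illustrative and the paper's exact evaluation code is not shown in this summary. SSIM is omitted because it requires a windowed comparison that is usually taken from an image-processing library.

```python
import numpy as np


def mse(image_a, image_b):
    """Mean squared error between two images of identical shape."""
    a = np.asarray(image_a, dtype=np.float64)
    b = np.asarray(image_b, dtype=np.float64)
    return float(np.mean((a - b) ** 2))


def psnr_from_mse(mse_value, peak=255.0):
    """Peak signal-to-noise ratio in dB for a given MSE (assumed peak of 255)."""
    if mse_value == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(peak ** 2 / mse_value))


def psnr(image_a, image_b, peak=255.0):
    """PSNR computed directly from two images."""
    return psnr_from_mse(mse(image_a, image_b), peak)
```

Lower MSE and higher PSNR both indicate that the adversarial image is closer to the original. Under the 255-peak assumption, the MSE of 735.34 quoted in the Figure 3 caption corresponds to a PSNR of roughly 19.47 dB, which agrees with the reported 19.46 up to rounding.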
Further Extensions and Applications
FAWA's robustness extends beyond English text, successfully attacking sequence-based OCR systems for Chinese characters.
Figure 3: A Chinese paragraph example (MSE/PSNR/SSIM: 735.34/19.46/0.697).
Additionally, FAWA can be used to enhance OCR accuracy by embedding protective perturbations that reinforce recognition of watermark-enhanced text. This dual capability to both attack and protect demonstrates FAWA's versatility.
Conclusion
FAWA significantly advances the field of adversarial attacks on OCR systems by integrating efficient and visually coherent watermark perturbations. The implications of such a system are multifaceted, ranging from potential exploitation in malicious settings to constructive applications in OCR accuracy enhancement. Future research may explore expanding FAWA’s applications across varied languages and document types, ensuring text manipulation remains both effective and discreet.