Undocumented PDF processing in Gemini models

Determine what internal PDF processing Google Gemini 1.5 Pro and Google Gemini 1.5 Flash perform when PDFs are uploaded via the Google Gemini API, in order to clarify what happens between the raw PDF and the text presented to the language model during data extraction tasks.

Background

The study compared LLMs for systematic review data extraction across 112 PDFs, sending full PDF files directly to Gemini 1.5 Pro and Gemini 1.5 Flash, while Mistral Large 2 received text extracted from PDFs via PyPDF2. The authors noted that this difference in input modality may introduce inconsistencies in performance comparisons.
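
To make the text-extraction side of this comparison concrete, the following is a minimal sketch of how page text might be pulled from a PDF with PyPDF2 before being placed in a prompt. It is not the authors' script: the function names `join_pages` and `extract_pdf_text` are hypothetical, and the choice to drop empty pages and join the rest with blank lines is an assumption made for illustration. The Gemini side has no counterpart here because its processing is exactly what is unknown.

```python
from typing import Iterable, List, Optional


def join_pages(page_texts: Iterable[Optional[str]]) -> str:
    """Join per-page text into one string, skipping pages with no text.

    Hypothetical helper: the joining rule is an assumption, not the
    procedure used in the study.
    """
    cleaned: List[str] = []
    for text in page_texts:
        # A page may yield an empty string (e.g. a scanned, image-only page).
        stripped = (text or "").strip()
        if stripped:
            cleaned.append(stripped)
    return "\n\n".join(cleaned)


def extract_pdf_text(path: str) -> str:
    """Extract the text layer of a PDF with PyPDF2.

    PyPDF2 is imported lazily so the helper above can be used without it.
    Requires a PyPDF2 version that provides PdfReader.
    """
    from PyPDF2 import PdfReader

    reader = PdfReader(path)
    return join_pages(page.extract_text() for page in reader.pages)
```

With this kind of pipeline, whatever the text layer omits (figures, scanned pages, some table layouts) never reaches the model, which is one reason the two input modalities may not be directly comparable.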

Critically, the authors explicitly state that they do not know what processing Gemini applies to PDFs uploaded via its API, leaving an unresolved question about how Gemini’s ingestion pipeline transforms PDF content prior to model analysis.

References

For example, with Gemini models we were able to send the PDF files directly to the model. As such, we do not know what types of processing may have occurred.

Large Language Models with Human-In-The-Loop Validation for Systematic Review Data Extraction (2501.11840 - Schroeder et al., 21 Jan 2025) in Part II – Discussion