Undocumented PDF processing in Gemini models
Determine what internal PDF processing Google Gemini 1.5 Pro and Google Gemini 1.5 Flash perform when PDFs are uploaded via the Google Gemini API, in order to clarify what happens between the raw PDF and the text presented to the language model during data extraction tasks.
References
For example, with Gemini models we were able to send the PDF files directly to the model. As such, we do not know what types of processing may have occurred.
— Large Language Models with Human-In-The-Loop Validation for Systematic Review Data Extraction
(2501.11840 - Schroeder et al., 21 Jan 2025) in Part II – Discussion