Extraction of Layout Entities and Sub-layout Query-based Retrieval of Document Images
Abstract: Layouts and sub-layouts constitute an important clue while searching a document on the basis of its structure, or when textual content is unknown/irrelevant. A sub-layout specifies the arrangement of document entities within a smaller portion of the document. We propose an efficient graph-based matching algorithm, integrated with hash-based indexing, to prune a possibly large search space. A user can specify a combination of sub-layouts of interest using sketch-based queries. The system supports partial matching for unspecified layout entities. We handle cases of segmentation pre-processing errors (for text/non-text blocks) with a symmetry maximization-based strategy, and accounting for multiple domain-specific plausible segmentation hypotheses. We show promising results of our system on a database of unstructured entities, containing 4776 newspaper images.
Paper Prompts
Sign up for free to create and run prompts on this paper using GPT-5.
Top Community Prompts
Collections
Sign up for free to add this paper to one or more collections.