
ChartFormer: A Large Vision Language Model for Converting Chart Images into Tactile Accessible SVGs

Published 29 May 2024 in cs.CV | (2405.19117v1)

Abstract: Visualizations, such as charts, are crucial for interpreting complex data. However, they are often provided as raster images, which are not compatible with assistive technologies for people with blindness and visual impairments, such as embossed papers or tactile displays. At the same time, creating accessible vector graphics requires a skilled sighted person and is time-intensive. In this work, we leverage advancements in the field of chart analysis to generate tactile charts in an end-to-end manner. Our three key contributions are as follows: (1) introducing the ChartFormer model trained to convert raster chart images into tactile-accessible SVGs, (2) training this model on the Chart2Tactile dataset, a synthetic chart dataset we created following accessibility standards, and (3) evaluating the effectiveness of our SVGs through a pilot user study with a refreshable two-dimensional tactile display. Our work is publicly available at https://github.com/nsothman/ChartFormer .

Summary

  • The paper introduces ChartFormer, a transformer-based model that converts raster charts into tactile-accessible SVGs.
  • It develops the Chart2Tactile dataset with 10,000 synthetic chart images to meet accessibility standards.
  • A pilot user study validates the model’s utility, while identifying challenges in rendering complex charts.

Introduction

The paper "ChartFormer: A Large Vision Language Model for Converting Chart Images into Tactile Accessible SVGs" addresses the significant challenge of rendering complex data visualizations into formats accessible to people with blindness and visual impairments. Recognizing the limitations of raster images in accessibility contexts, this work introduces a vision-language model that converts raster chart images into Scalable Vector Graphics (SVGs), a format that supports tactile interaction.

Contributions

The authors highlight three primary contributions. First, they introduce the ChartFormer model, a transformer-based approach for converting raster charts into tactile-accessible SVGs. Second, they develop the Chart2Tactile dataset, a comprehensive synthetic chart dataset designed following accessibility standards. Third, they present evaluation results from a pilot user study involving a two-dimensional tactile display, demonstrating the utility of the generated SVGs.

Dataset and Methodology

The creation of the Chart2Tactile dataset, consisting of 10,000 tactile chart images across four categories, underpins the model's training. The dataset was synthesized by drawing upon existing datasets, such as VisText and ChartX, for their comprehensive metadata and chart images (Figure 1).

Figure 1: A scatter plot sample: (a) the original synthesized raster image; (b) the tactile version following accessibility guidelines.

SVGs were rendered following accessibility guidelines, ensuring distinct tactile elements and appropriate use of textures and symbol types. The ChartFormer model, built on the LLaVA-1.5 architecture, extracts essential metadata and styles from raster x-y charts to populate SVG templates, facilitating conversion into tactile-accessible formats (Figure 2).

Figure 2: The ChartFormer takes a raster x-y plot as an input. The essential metadata and styles are extracted, which are then used to populate the svgwrite templates. For better viewing resolution, please visit our project page.
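
The paper populates svgwrite templates with the extracted metadata. As an illustrative stand-in using only the standard library (the sizes, dash patterns, and stroke widths below are invented examples, not the paper's actual template values), the tactile styling step might look like this:

```python
import xml.etree.ElementTree as ET

def tactile_line_svg(points, dash="8,4", stroke_width="2"):
    """Build a minimal SVG with one tactile-styled polyline.

    Tactile guidelines favour thick strokes and distinct dash
    patterns so that lines remain distinguishable by touch.
    """
    svg = ET.Element("svg", attrib={
        "xmlns": "http://www.w3.org/2000/svg",
        "viewBox": "0 0 160 120",
    })
    ET.SubElement(svg, "polyline", attrib={
        "points": " ".join(f"{x},{y}" for x, y in points),
        "fill": "none",
        "stroke": "black",
        "stroke-width": stroke_width,   # thick stroke: easier to feel
        "stroke-dasharray": dash,       # distinct pattern per series
    })
    return ET.tostring(svg, encoding="unicode")

svg_text = tactile_line_svg([(10, 100), (80, 60), (150, 20)])
```

Each data series would get its own dash pattern and stroke width, mirroring the style assignment the model performs per chart element.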

User Study and Outcomes

A pilot user study with four participants emphasized the model's capacity to generate meaningful tactile representations. While participants successfully navigated simpler charts, complex charts posed challenges, highlighting areas for further refinement. Feedback from the study suggested improvements in SVG rendering, particularly to avoid the staircasing effect in tactile output and enhance line smoothing (Figure 3).

Figure 3: SVG-formatted line charts used in the user study, showcasing varying complexities: (A) a single line; (B) two lines; (C) six lines. For better viewing resolution, please visit our project page.
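
One common remedy for the reported staircasing (a general vector-graphics technique, not something the paper describes implementing) is to emit each series as cubic Bézier segments rather than a raw polyline, for example with Catmull-Rom control points:

```python
def smooth_path(points):
    """Convert a polyline into an SVG path of cubic Bézier segments
    using Catmull-Rom control points, reducing the 'stair-step' feel."""
    if len(points) < 3:
        return "M " + " L ".join(f"{x},{y}" for x, y in points)
    # Pad the endpoints so every segment has four control points.
    pts = [points[0]] + list(points) + [points[-1]]
    d = [f"M {points[0][0]},{points[0][1]}"]
    for i in range(1, len(pts) - 2):
        p0, p1, p2, p3 = pts[i - 1], pts[i], pts[i + 1], pts[i + 2]
        c1 = (p1[0] + (p2[0] - p0[0]) / 6, p1[1] + (p2[1] - p0[1]) / 6)
        c2 = (p2[0] - (p3[0] - p1[0]) / 6, p2[1] - (p3[1] - p1[1]) / 6)
        d.append(f"C {c1[0]:.1f},{c1[1]:.1f} "
                 f"{c2[0]:.1f},{c2[1]:.1f} {p2[0]},{p2[1]}")
    return " ".join(d)
```

The resulting `d` attribute drops into an SVG `<path>` element in place of a `<polyline>`, which tactile displays can then render with fewer jagged transitions.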

Discussion

ChartFormer demonstrates a significant step towards automated generation of accessible tactile graphics, yet limitations remain. The current focus on x-y plots suggests an opportunity to expand the system to accommodate more complex visualization types. Integrating an interface for sighted users could further improve the customization and accuracy of outputs.

Conclusion

This research contributes a powerful vision-language model designed to enhance accessibility via tactile graphics. It sets a precedent for further exploration into AI-driven accessibility tools, encouraging the refinement of models to better serve visually impaired communities through few- or zero-shot tuning techniques. By offering a novel dataset and model, the authors lay the groundwork for continued advancements in accessible data visualization.

In summary, the ChartFormer model exemplifies the potential of AI in bridging accessibility gaps, paving the way for broader applications in educational and professional contexts. As AI capabilities evolve, this research will be instrumental in shaping inclusive solutions that consider the diverse needs of all users.

Explain it Like I'm 14

What is this paper about?

This paper introduces ChartFormer, an AI system that turns pictures of charts (like line graphs and bar charts) into special digital drawings called SVGs that people who are blind or have low vision can touch and explore using tactile devices. The goal is to make charts easier to understand without needing a sighted expert to redraw them.

What questions does the paper try to answer?

To make charts more accessible, the authors focus on three main goals:

  • Can an AI look at a chart image and automatically create a touch-friendly version in SVG?
  • Can we train this AI using a dataset built with accessibility rules in mind?
  • Do people who are blind find these AI-made SVG charts useful on tactile displays?

How did the researchers do it?

They built a system called ChartFormer based on a “vision-language model.” Think of it like a smart assistant that can both look at pictures and describe or recreate them in words or code.

Here’s the approach, explained simply:

  • Chart images: The input is a regular chart image (a “raster,” which is a picture made of tiny dots).
  • SVG output: The system creates an SVG (Scalable Vector Graphics), which is like a digital drawing made of lines and shapes. SVGs can be resized without getting blurry and can include text descriptions and styles—perfect for tactile displays and screen readers.
  • Training data: They created a new dataset called Chart2Tactile with 10,000 charts (line, bar, scatter, and error-bar). These charts were built to follow accessibility rules, such as using thicker lines, clear patterns (like dashed or dotted), and Braille labels.
  • Accessibility rules: For tactile charts, they followed guidelines like:
    • Make lines and shapes easy to feel by using different thicknesses or patterns.
    • Avoid thin lines that are hard to detect by touch.
    • Use Braille for text and keep it horizontal.
    • For crowded charts (like scatter plots), only include important points to avoid clutter.
    • Put text in boxes to make it easier to find by touch.
  • AI training: They fine-tuned an existing model (LLaVA-1.5, a popular vision-language model) so it could:
    • Identify the chart type (line, bar, scatter, etc.).
    • Read titles and labels.
    • Figure out axis ranges and label types (like numbers, dates, or text).
    • Extract the data needed to draw the chart in an accessible way.
  • Rendering: Using code templates, the AI fills in the necessary info and styles to produce the tactile-ready SVG.
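
To make the last step concrete, “filling in the template” largely amounts to scaling data values into SVG viewport coordinates. Here is a hedged toy sketch (the field names are invented for illustration, not the paper's actual metadata schema):

```python
def to_svg_coords(meta, width=160.0, height=120.0):
    """Scale one data series from its axis ranges into SVG viewport
    coordinates. SVG's y axis points down, so it is flipped."""
    (x0, x1), (y0, y1) = meta["x_range"], meta["y_range"]
    pts = []
    for x, y in meta["series"]:
        sx = (x - x0) / (x1 - x0) * width
        sy = height - (y - y0) / (y1 - y0) * height
        pts.append((round(sx, 1), round(sy, 1)))
    return pts

meta = {  # hypothetical output of the metadata-extraction step
    "chart_type": "line",
    "x_range": (0, 10), "y_range": (0, 100),
    "series": [(0, 0), (5, 50), (10, 100)],
}
print(to_svg_coords(meta))  # → [(0.0, 120.0), (80.0, 60.0), (160.0, 0.0)]
```

The scaled points are what end up inside the SVG template's line or marker elements, with the tactile styles (thick strokes, dash patterns, Braille labels) applied on top.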

What did they find, and why does it matter?

The authors ran a small user study with four blind participants using a 2D tactile display. Here’s what happened:

  • For simpler charts (one or two lines), participants could identify lines, intersections, and trends (increasing, decreasing, or flat) without trouble.
  • For a very busy chart (six lines), it was harder to count all intersections, mostly because there was a lot happening in the same space.
  • Zoom and audio descriptions helped: Some participants used zoom on the tactile device to feel fine details, and they liked having audio descriptions for text labels and titles.
  • Suggested improvement: Participants asked for smoother lines to avoid a “stair-step” feel (jagged edges on the tactile display).

This matters because it shows AI can help make charts accessible—quickly and consistently—without needing a trained designer for every chart. It could make school materials, reports, and scientific papers more usable for people who are blind or have low vision.

What does this mean for the future?

This work is a first step toward automatic, touch-friendly charts:

  • It could save time for teachers, students, and professionals who need accessible visuals.
  • The dataset and code are public, so others can build on this.
  • There’s room to grow: The system currently focuses on x–y charts and tested line charts in the user study. Future work could support more chart types, add a simple editing interface for fine-tuning results, and include larger user studies.
  • Big picture: AI models that can “see” and “talk” (like ChartFormer) could be adapted to help many different groups, making digital information more inclusive across education and workplaces.

Glossary

  • Accessibility guidelines: Standards and best practices that ensure visualizations are usable by people with visual impairments. "We have also introduced the first dataset for tactile visualizations that complies with accessibility guidelines."
  • Accessibility standards: Formal criteria that define how content should be made accessible. "the Chart2Tactile dataset, a synthetic chart dataset we created following accessibility standards"
  • Assistive technologies: Tools that help people with disabilities access content. "assistive technologies for people with blindness and visual impairments, such as embossed papers or tactile displays."
  • Audio descriptions: Spoken or textual narrations that describe visual elements for non-visual access. "They also appreciated the audio descriptions, which facilitated access to the chart's textual elements."
  • Baseline weights: Pretrained parameters used to initialize a model before further training. "We used the baseline weights from ChartLLama"
  • Braille: A tactile writing system used by people who are blind or visually impaired. "Text in tactile illustrations should be in Braille, oriented horizontally."
  • Bounding box: A rectangular region that encloses content to aid separation and exploration. "Enclose text content with a bounding box for better exploration and distinguishing separate texts more effectively."
  • Chart analysis: Automated understanding and processing of chart images and structures. "we leverage advancements in the field of chart analysis to generate tactile charts in an end-to-end manner."
  • Chart summarization: Generating concise textual descriptions of charts’ key information. "have focused on making visualizations accessible through chart summarization tasks"
  • CSV: A plain-text format for tabular data where values are separated by commas. "with each chart including four modalities: image, CSV, Python code, and text description."
  • Embossed papers: Raised-print materials that convey tactile information for non-visual reading. "such as embossed papers or tactile displays."
  • End-to-end: A system that performs the entire task from input to final output without manual intermediate steps. "generate tactile charts in an end-to-end manner."
  • Error-bar charts: Charts that include error bars to represent variability or uncertainty in data. "spanning 4 categories (line, bar, scatter and error-bar charts)"
  • Fine-tuned: Further trained a pretrained model on a specific dataset or task. "and then fine-tuned the model for 10 epochs using our dataset."
  • GUI: A graphical user interface that enables interaction through visual elements. "a GUI that comprises six steps to convert bitmap images into printable SVG format."
  • Hyperparameters: Training settings controlling model behavior (e.g., learning rate, epochs). "adopted the same hyperparameters for training"
  • Matplotlib: A Python library for creating static, animated, and interactive visualizations. "converting charts into Matplotlib Python codes."
  • Metadata extraction: Parsing and retrieving structured information from chart images. "beginning with metadata extraction and followed by conversion into tactile format."
  • Modality: A mode or form of data representation or interaction (e.g., visual, tactile, textual). "but none have considered the tactile modality."
  • Raster images: Pixel-based images that can blur when scaled. "they are often provided as raster images"
  • Refreshable two-dimensional tactile display: A device that dynamically renders tactile graphics for non-visual exploration. "a pilot user study with a refreshable two-dimensional tactile display."
  • Scalable Vector Graphics (SVG): A resolution-independent vector image format based on XML. "saved in the Scalable Vector Graphics (SVG) format."
  • Screen readers: Software that converts text and interface elements into speech or Braille output. "enhances interactivity when used with screen readers or tactile displays."
  • Staircasing effect: A jagged appearance in tactile or low-resolution renderings of lines. "staircasing effect in the tactile output."
  • svgwrite: A Python library for programmatically generating SVG files. "we used the svgwrite Python package"
  • Time-series data: Sequential data points indexed by time. "each accompanied by time series data and a raster version."
  • Transformer-based model: A neural architecture leveraging attention mechanisms for sequence processing. "We introduce a transformer-based model that extracts key information and assigns styles for the SVG file."
  • Vega-Lite: A high-level grammar of interactive graphics for declarative visualization. "rendered a raster image using Vega-Lite"
  • Vision encoder: The component of a multimodal model that processes visual inputs. "comprising a vision encoder for image input and a LLM for text output decoding."
  • Vision-language model (VLM): A multimodal model that jointly processes visual and textual data. "trained a large Vision Language Model (VLM) on synthetically generated images across 10 chart types."
  • x-y plot: A chart plotting data points with values along horizontal (x) and vertical (y) axes. "The ChartFormer takes a raster x-y plot as an input."
  • XML-based: Structured using the Extensible Markup Language format. "SVGs are XML-based files that store geometrical shapes using mathematical formulas in a hierarchical structure."
