ChartQA-X: Generating Explanations for Visual Chart Reasoning

Arizona State University

The ChartQA-X dataset enables training VLMs that generate both answers and explanations in response to user questions about charts.

Abstract

The ability to explain complex information from chart images is vital for effective data-driven decision-making. In this work, we address the challenge of generating detailed explanations alongside answers to questions about charts. We present ChartQA-X, a comprehensive dataset comprising 30,799 chart samples across four chart types, each paired with contextually relevant questions, answers, and explanations. Explanations are generated and selected based on metrics such as faithfulness, informativeness, coherence, and perplexity. Our human evaluation with 245 participants shows that model-generated explanations in ChartQA-X surpass human-written explanations in accuracy and logic and are comparable in clarity and overall quality. Moreover, models fine-tuned on ChartQA-X show substantial improvements across various metrics, including absolute gains of up to 24.57 points in explanation quality, 18.96 percentage points in question-answering accuracy, and 14.75 percentage points on unseen benchmarks for the same task. By integrating explanatory narratives with answers, our approach enables agents to convey complex visual information more effectively, improving comprehension and fostering greater trust in the generated responses.

ChartQA-X Dataset Creation Pipeline

The ChartQA-X pipeline generates and selects high-quality explanations using six VLMs and ROSCOE metrics, followed by correctness checks and evaluations across human studies, benchmarks, and generalization tests.
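The selection step described above can be sketched as a simple candidate-ranking procedure: each VLM-generated explanation is scored with ROSCOE-style metrics, and the candidate with the best aggregate score is kept. The metric names, weights, and aggregation rule below are illustrative assumptions, not the paper's exact recipe.

```python
# Hypothetical sketch of explanation selection: score each candidate
# explanation with ROSCOE-style metrics and keep the highest-scoring one.
# Metric names and the aggregation rule are illustrative assumptions.

def aggregate_score(metrics: dict) -> float:
    """Average the metric scores, inverting perplexity so higher is better."""
    total = 0.0
    for name, value in metrics.items():
        if name.startswith("perplexity"):
            value = 1.0 - value  # lower perplexity is better
        total += value
    return total / len(metrics)

def select_best_explanation(candidates: list) -> str:
    """candidates: list of (explanation_text, metrics_dict) pairs."""
    best_text, _ = max(candidates, key=lambda c: aggregate_score(c[1]))
    return best_text

# Toy example with two candidate explanations (values are made up).
candidates = [
    ("The 2020 bar is the tallest, so 2020 has the maximum value.",
     {"faithfulness": 0.9, "informativeness": 0.8, "perplexity": 0.2}),
    ("The chart shows some values over several years.",
     {"faithfulness": 0.6, "informativeness": 0.5, "perplexity": 0.4}),
]
print(select_best_explanation(candidates))  # prints the first explanation
```

In the actual pipeline, the per-metric scores would come from the ROSCOE evaluators and a correctness check would filter candidates before ranking; this sketch only captures the rank-and-select shape of that step.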

Quantitative Results

Question-answering accuracy (%), computed without the data table in the input, for different chart and question types (test set only). Best scores are in bold, and second-best scores are underlined.


ROSCOE scores on the ChartQA-X test set without the data table in the input. Best scores are in bold, and second-best scores are underlined. FS: Faithfulness Step, FT: Faithfulness Token, IS: Informativeness Step, IC: Informativeness Chain, SRC: Source-Consistency, SFC: Self-Consistency, PS: Perplexity Step, PC: Perplexity Chain, GS: Grammar Step, and AS: Aggregate Score.


Generalization results on other chart datasets. Models trained on ChartQA-X transfer more effectively to DVQA, PlotQA, and FigureQA, showing stronger cross-dataset robustness. Best scores are in bold, and second-best scores are underlined.

Qualitative Results

Examples from our human-study evaluation comparing ChartQA-X explanations with human-written explanations. The top row shows cases where ChartQA-X provides clearer or more accurate reasoning, and the bottom row shows cases where human explanations outperform our model. Likert-scale ratings reflect perceived accuracy, clarity, logic, and overall quality, with highlights marking errors or inconsistencies.

Acknowledgement

This research was supported by the National Eye Institute (NEI) of the National Institutes of Health (NIH) under award number R01EY034562. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH. The authors thank Research Computing (RC) at Arizona State University (ASU) for providing the computing resources used in this work.

BibTeX


    @article{hegde2025chartqa,
      title   = {ChartQA-X: Generating Explanations for Visual Chart Reasoning},
      author  = {Hegde, Shamanthak and Fazli, Pooyan and Seifi, Hasti},
      journal = {arXiv preprint arXiv:2504.13275},
      year    = {2025}
    }