Outperformed larger commercial models

MIT researchers teach AI models to interpret charts

MIT researchers have built a resource that could democratize artificial intelligence, starting with something as ordinary as a chart. The project, called ChartNet, is a dataset of more than a million varied charts designed to teach AI models to read and interpret the graphs, bar charts, and other visualizations that fill financial reports and business analyses around the world.

The problem ChartNet addresses sounds simple but has real consequences. Even the most advanced vision-language models (AI systems trained to understand both images and text) struggle to extract reliable information from charts. This matters because businesses, from finance firms to pharmaceutical companies, rely on charts to make critical decisions, and an AI system that misreads a chart can mislead the people who act on its output. Jovana Kondic, an MIT electrical engineering and computer science graduate student who led the research, explains the gap: "A vision-language model, unlike our brains, may need to see thousands of examples during training to reliably recognize something as a line chart."

The core innovation is how the researchers built ChartNet itself. Rather than scraping limited chart images from the internet—which creates a data bottleneck—the MIT team and their colleagues at the MIT-IBM Computing Research Lab developed a synthetic data generation pipeline. They took existing charts and translated them into code, then used an automated system to create hundreds of variations of each one by changing the chart type, data values, colors, topics, and other visual elements. Each of the million-plus charts in the dataset comes bundled with the code that generated it, a textual description, the underlying numerical data in table form, and question-and-answer pairs to teach models how to reason about the information correctly.
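To make the idea concrete, here is a minimal Python sketch of how one chart, once expressed as code, might be expanded into several annotated variants. It is an illustration under stated assumptions, not the researchers' pipeline: the names `ChartRecord`, `make_variation`, and `generate_variations`, the two chart types, the color list, and the 20 percent data perturbation are all hypothetical. The sketch only produces matplotlib plotting code as text, along with a description, a data table, and question-and-answer pairs; it does not render any images.

```python
import random
from dataclasses import dataclass


@dataclass
class ChartRecord:
    """One synthetic example: plotting code plus the annotations bundled with it."""
    code: str          # source text that would render the chart
    description: str   # plain-language summary of the chart
    table: list        # underlying data as a list of row dicts
    qa_pairs: list     # (question, answer) tuples


# Hypothetical variation axes; the article does not list the real options.
CHART_TYPES = {
    "bar": "ax.bar(labels, values, color={color!r})",
    "line": "ax.plot(labels, values, color={color!r}, marker='o')",
}
COLORS = ["tab:blue", "tab:orange", "tab:green"]


def make_variation(title, labels, base_values, rng):
    """Build one variant by changing the chart type, color, and data values."""
    chart_type = rng.choice(sorted(CHART_TYPES))
    color = rng.choice(COLORS)
    # Assumed perturbation: scale each value by a random factor in [0.8, 1.2].
    values = [round(v * rng.uniform(0.8, 1.2), 1) for v in base_values]

    # The chart is stored as code, so it can be re-rendered or edited later.
    code = "\n".join([
        "import matplotlib.pyplot as plt",
        f"labels = {labels!r}",
        f"values = {values!r}",
        "fig, ax = plt.subplots()",
        CHART_TYPES[chart_type].format(color=color),
        f"ax.set_title({title!r})",
        "fig.savefig('chart.png')",
    ])

    # Annotations are derived from the same data, so they agree with the chart.
    table = [{"label": label, "value": value} for label, value in zip(labels, values)]
    top = max(table, key=lambda row: row["value"])
    description = (
        f"A {chart_type} chart titled '{title}' with {len(labels)} data points; "
        f"the highest value is {top['value']} for {top['label']}."
    )
    qa_pairs = [
        ("Which label has the highest value?", top["label"]),
        (f"What is the value for {labels[0]}?", str(values[0])),
    ]
    return ChartRecord(code, description, table, qa_pairs)


def generate_variations(title, labels, base_values, n, seed=0):
    """Create n variants of one base chart, reproducibly for a given seed."""
    rng = random.Random(seed)
    return [make_variation(title, labels, base_values, rng) for _ in range(n)]


if __name__ == "__main__":
    records = generate_variations(
        "Quarterly revenue", ["Q1", "Q2", "Q3", "Q4"], [10.0, 12.0, 9.0, 15.0], n=3
    )
    for record in records:
        print(record.description)
        print(record.qa_pairs)
```

Because the description, table, and question-and-answer pairs in this sketch are computed from the same values that appear in the plotting code, they cannot drift out of sync with the chart, which is one plausible reason to generate charts from code rather than annotate scraped images by hand.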

The results challenge a prevailing assumption about AI: that bigger is always better. The researchers trained a series of open-source vision-language models using ChartNet, and many of these smaller models significantly outperformed commercial models orders of magnitude larger on tasks like extracting specific data points from charts and summarizing them in plain language. That could be significant for small firms with tight budgets. Dhiraj Joshi, a senior scientist at IBM Research on the project, notes that chart understanding is especially critical in finance: "The finance industry thrives on charts. If vision-language models can extract information out of charts, like descriptions of trends, that facilitates a lot of workflows that happen downstream."

The team, which includes researchers from MIT, the MIT-IBM Computing Research Lab, and IBM Research, presented the work at the IEEE Computer Vision and Pattern Recognition Conference. By releasing ChartNet as an open-source resource, they've created what Kondic calls "a one-stop shop for chart understanding, covering basically anything that an AI model and a practitioner who is training that model might need." Researchers and practitioners can now use the dataset to improve AI capabilities for business trend analysis, scientific figure interpretation, and other applications where the key information is presented in charts.

The most encouraging aspect of the work is its accessibility. A small startup without billions to spend on computing power can now train a high-performing chart-reading model on the same dataset as a major corporation. That is more than a technical achievement: it is a shift in who gets to benefit from artificial intelligence's promise.
