AI Projects 15 Jun 2025

Dolphin Document Parsing

I don't frequently get truly excited about a model - but I just tested out the Dolphin model from ByteDance, and I am excited.

- Chris

Document Image Parsing via Heterogeneous Anchor Prompting

Let's face it - there are a lot of models out there. They do everything up to and potentially including slicing bread. Competition to be a great model is stiff, but in my mind only a few things matter. Dolphin checks a significant number of boxes for me.

First, is it fast? On their publicly available test site, I ran several images and PDFs and saw correct results in seconds. This isn't just fast for AI; it is fast enough to be practical for real document processing.

Second, is it useful? This comes in two categories. The first is licensing, which is a major concern. If I need to pay thousands of dollars for a yearly renewal on something that promises to save $100 a year, it's not useful. Dolphin is MIT licensed, which allows it to be used in a much more permissive manner. Their GitHub page, complete with a demo, is readily available: https://github.com/bytedance/Dolphin?tab=readme-ov-file

The second category, obviously, is whether it does anything worthwhile. Almost anyone, and small and medium businesses in particular, has a hidden trove of PDFs or images that contain transactions from the dawn of time - or at least the dawn of the entity in question. So a model that can read and parse these images and PDFs has real value. There is a mountain of unstructured data hiding history and insight, and any step we can take to better understand that data is without question beneficial.

So yes, to me this repository is, if not pure gold, certainly close. Check them out on Hugging Face: https://huggingface.co/ByteDance/Dolphin