GLM OCR: extract text, tables, and formulas from document images

01What it does

GLM-OCR is a small multimodal OCR model built for document understanding. In practical terms, it gives you separate prompts for common extraction jobs: text recognition, formula recognition, and table recognition.

That makes it more focused than a general vision chatbot. If you have a screenshot, scanned page, receipt, slide, table, or technical page, you can ask the demo to extract the part that matters instead of asking a broad image question and cleaning up the answer yourself.

The reader-facing workflow is the public GLM OCR Demo Space. It wraps the open `zai-org/GLM-OCR` weights in a simple Gradio app, so you can test an image in the browser first. If the result is useful, the model card also gives local paths through Transformers, vLLM, SGLang, and Ollama.

02How to try it

Start with the Hugging Face Space and upload one document image that has structure. A receipt with totals, a screenshot with small UI text, a table from a report, or a math-heavy page will tell you more than a clean one-line sample.

Run the same image through the available recognition types. Use `Text` first to check basic extraction, then try `Table` or `Formula` when the page actually contains those elements. On the first pass, look for three things: whether the reading order makes sense, whether small text survives, and whether structured content stays usable instead of turning into a flat paragraph.

For local testing, use the backing model page. Ollama is the shortest path for a quick local check, while Transformers, vLLM, and SGLang are better fits if you want to build a repeatable OCR service around it.

03Caveat

The Space is the fastest way to test, but the full local story still depends on model-serving tooling and a suitable GPU path for serious use. Also, the demo works on uploaded images, so complex multi-page PDFs may need preprocessing before you judge the model fairly.

04What you can do with it

Extract text from screenshots, receipts, scanned pages, and document photos.
Test whether tables survive OCR well enough for cleanup or downstream parsing.
Pull formulas from technical pages before moving them into notes or search.
Compare a small OCR-specific model against a general vision-language model on your own documents.
Use the browser demo as a quick filter before setting up a local OCR workflow.

Try the demo

View model page

Neural Expedition · Useful open-source AI, curated without hype.

Field notes

01What it does

02How to try it

03Caveat

04What you can do with it