Facebook iconHow Good Is LightOnOCR-2-1B for Document OCR and Parsing?
F22 logo
Blogs/AI

How Good Is LightOnOCR-2-1B for Document OCR and Parsing?

Written by Seerin
Mar 6, 2026
36 Min Read
How Good Is LightOnOCR-2-1B for Document OCR and Parsing? Hero

Building document processing pipelines is rarely simple. Most OCR systems rely on multiple stages: detection, text extraction, layout parsing, and table reconstruction. When documents become complex, these pipelines often break, making them costly and difficult to maintain.

I wanted to understand whether a lightweight end-to-end model could simplify this process without sacrificing document structure.

LightOnOCR-2-1B, released by LightOn, takes a different approach. Instead of relying on fragmented OCR components, it processes the entire document as a single vision-language task and converts page layouts directly into structured Markdown.

This raises a practical question for developers working with document AI:

Can a 1B-parameter open-source model realistically compete with paid OCR APIs like Llama Parse v2 or Gemini?

To answer that, this article compares LightOnOCR-2-1B against these systems using real documents such as tax forms, medical records, and receipts.

Open-Source Agility vs Proprietary Power

Before looking at the results, it helps to understand the systems being compared. This is not just an accuracy comparison; the models represent different approaches to document parsing. Some rely on large API-driven systems, while others focus on lightweight, specialized architectures.

LightOnOCR-2-1B

LightOnOCR-2-1B takes a different approach from traditional OCR pipelines. According to the documentation on Hugging Face, the model avoids multi-stage detection and recognition systems that often fail when document layouts become complex.

Instead, it uses a native-resolution Vision Transformer (ViT) combined with a Qwen2.5-based decoder to process the entire document as a single vision-language task.

Key characteristics

  • Open-source model
  • Lightweight enough to run on consumer GPUs
  • Processes the entire page layout directly
  • No per-page API cost
  • Full control over data and privacy

Llama Parse v2 (Agentic Plus)

Llama Parse v2 from LlamaIndex is widely used for RAG-ready document extraction. The Agentic Plus tier relies on an agentic workflow that analyzes layout structure and attempts to reconstruct the document intelligently.

Key characteristics

  • API-based document parsing service
  • Designed for complex document layouts
  • Produces structured outputs for downstream RAG pipelines

Limitations

  • Credit-based pricing model
  • Approximately 45 credits per page, which can become expensive when processing large document volumes.

Gemini 3

Google’s Gemini multimodal models can also perform document OCR and layout understanding. Because Gemini is a large multimodal model, it is capable of interpreting both text and context within the document.

Key characteristics

  • Multimodal model capable of OCR and reasoning
  • Strong contextual understanding of documents

Limitations

  • API-only usage
  • High compute cost for large document pipelines

Model Variants Released by LightOn

LightOn also released multiple variants of the model designed for different use cases.

LightOnOCR-2-1B

The flagship model used in this evaluation. It is refined using RLVR (Reinforcement Learning from Visual Rewards) to improve document structure extraction.

LightOnOCR-2-1B-bbox

A specialized variant that predicts bounding boxes for images and document elements in addition to extracting text.

OCR-Soup / Bbox-Soup

Merged variants that combine multiple training checkpoints to improve robustness across different document layouts.

Base Variants

Minimal versions designed for fine-tuning on domain-specific datasets, such as legal documents or medical records.

Deploying LightOnOCR-2-1B: Transformers vs vLLM

LightOnOCR-2-1B can be deployed in multiple ways depending on the use case. For experimentation or local development, the model integrates directly with the Hugging Face Transformers ecosystem. For production workloads, it can also be served efficiently using vLLM.

1. Using Hugging Face Transformers

The simplest way to run the model is through the Transformers library, which now includes native support for LightOnOCR-2-1B models. This approach is useful for experimentation, prototyping, and local testing without complex infrastructure.

Key points

  • Native integration with Hugging Face Transformers
  • Works well for local experimentation
  • Can be loaded using standard classes such as AutoModel or LightOnOcrForConditionalGeneration

Temperature configuration

While temperature 0 is commonly used for deterministic OCR tasks, it may cause generation loops in some cases. A temperature value around 0.2 helps maintain stability while preserving document structure during generation.

2. Serving with vLLM for Production

For production environments, vLLM provides a more efficient way to serve the model. Official support for LightOnOCR-2-1B begins from vLLM v0.11.1, enabling higher throughput and better GPU utilization.

Advantages

  • High-throughput inference
  • Efficient VRAM usage through PagedAttention
  • Ability to process multiple document pages simultaneously

vLLM also allows the model to be deployed as an OpenAI-compatible API, making it easier to integrate into existing pipelines. In many cases, this allows developers to replace external OCR APIs such as Llama Parse or Gemini with a locally hosted endpoint.

Example serving command

vllm serve lightonai/LightOnOCR-2-1B

Innovations in AI
Exploring the future of artificial intelligence
Murtuza Kutub
Murtuza Kutub
Co-Founder, F22 Labs

Walk away with actionable insights on AI adoption.

Limited seats available!

Calendar
Saturday, 7 Mar 2026
10PM IST (60 mins)

With this setup, the model can function as a local document parsing service capable of handling large OCR workloads.

Real-World Structural Integrity: Document Layout Testing

Beyond text extraction, document parsing also depends on how well a model preserves the visual structure of the page. To evaluate this, I ran side-by-side comparisons using LightOnOCR-2-1B and Llama Parse v2 across several structured documents.

In the examples below, LightOnOCR-2-1B outputs are shown on the left, while Llama Parse v2 outputs appear on the right.

Example 1: Tax Form Layout Preservation

Tax forms contain tightly aligned fields, nested labels, and strict horizontal relationships between elements. These layouts often expose weaknesses in OCR pipelines.

In this example using a section of a standard tax return form, both systems successfully extracted the text. However, differences appear when reconstructing the table structure.

Llama Parse v2 attempted to rebuild the layout using complex rowspan-based table structures. While accurate, this sometimes produced Markdown that is harder to read and process for downstream RAG pipelines.

LightOnOCR-2-1B preserved a simpler row-based structure that maintained the visual relationship between fields such as “Firm’s name” and “Firm’s address.” The output remains structurally consistent while staying easier to interpret in Markdown format.

Example2: 

Gemini output:

Form 1040EZ (2010)

Income Tax Return for Single and Joint Filers With No Dependents

Name, Address, and SSNYour first name and initialLast nameYour social security number

See separate instructions.




If a joint return, spouse's first name and initial

Last name

Spouse's social security number






Home address (number and street). If you have a P.O. box, see instructions.

Apt. no.

Make sure the SSN(s) above are correct.


City, town or post office, state, and ZIP code. If you have a foreign address, see instructions.




See separate instructions.

Your first name and initial

Last name

Your social security number

1 of 5

Presidential Election Campaign | Check here if you, or your spouse if a joint return, want $3 to go to this fund . . [ ] You [ ] Spouse

Income1Wages, salaries, and tips. This should be shown in box 1 of your Form(s) W-2. Attach your Form(s) W-2.1

Attach Form(s) W-2 here.

2

Taxable interest. If the total is over $1,500, you cannot use Form 1040EZ.

2


Enclose, but do not attach, any payment.

3

Unemployment compensation and Alaska Permanent Fund dividends (see page 11).

3


4

Add lines 1, 2, and 3. This is your adjusted gross income.

4



5

If someone can claim you (or your spouse if a joint return) as a dependent, check the applicable box(es) below and enter the amount from the worksheet on back. [ ] You [ ] Spouse If no one can claim you (or your spouse if a joint return), enter $9,350 if single; $18,700 if married filing jointly. See back for explanation.

5



6

Subtract line 5 from line 4. If line 5 is larger than line 4, enter -0-. This is your taxable income.

6



Attach Form(s) W-2 here.

1

2

Wages, salaries, and tips. This should be shown in box 1 of your Form(s) W-2. Attach your Form(s) W-2.

Taxable interest. If the total is over $1,500, you cannot use Form 1040EZ.

1

2


1 of 5

Payments, Credits, and Tax7Federal income tax withheld from Form(s) W-2 and 1099.7

8

Making work pay credit (see worksheet on back).

8



9a

Earned income credit (EIC) (see page 13).

9a



b

Nontaxable combat pay election. 9b




10

Add lines 7, 8, and 9a. These are your total payments and credits.

10



11

Tax. Use the amount on line 6 above to find your tax in the tax table on pages 27 through 35 of the instructions. Then, enter the tax from the table on this line.

11



8

7

Making work pay credit (see worksheet on back).

Federal income tax withheld from Form(s) W-2 and 1099.

8

7


1 of 5

Refund12aIf line 10 is larger than line 11, subtract line 11 from line 10. This is your refund. If Form 8888 is attached, check here [ ]12a

b

Routing number [ ][ ][ ][ ][ ][ ][ ][ ][ ]

c Type: [ ] Checking [ ] Savings



d

Account number [ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ][ ]




b

12a

Routing number [ ][ ][ ][ ][ ][ ][ ][ ][ ]

If line 10 is larger than line 11, subtract line 11 from line 10. This is your refund. If Form 8888 is attached, check here [ ]

c Type: [ ] Checking [ ] Savings

12a


1 of 2

Amount You Owe

13

If line 11 is larger than line 10, subtract line 10 from line 11. This is the amount you owe. For details on how to pay, see page 19.

13



Third Party DesigneeDo you want to allow another person to discuss this return with the IRS (see page 20)? [ ] Yes. Complete the following. [ ] No

Designee's name: ________________ Phone no.: ________________ Personal identification number (PIN): _____


Designee's name: ________________ Phone no.: ________________ Personal identification number (PIN): _____

Do you want to allow another person to discuss this return with the IRS (see page 20)? [ ] Yes. Complete the following. [ ] No

1 of 1

Sign HereUnder penalties of perjury, I declare that I have examined this return, and to the best of my knowledge and belief, it is true, correct, and accurately lists all amounts and sources of income I received during the tax year. Declaration of preparer (other than the taxpayer) is based on all information of which the preparer has any knowledge.

Your signature

Date



Spouse's signature. If a joint return, both must sign.

Date



Your signature

Under penalties of perjury, I declare that I have examined this return, and to the best of my knowledge and belief, it is true, correct, and accurately lists all amounts and sources of income I received during the tax year. Declaration of preparer (other than the taxpayer) is based on all information of which the preparer has any knowledge.

Date

1 of 4

Paid Preparer Use OnlyPrint/Type preparer's namePreparer's signatureDateCheck [ ] if self-employedPTIN

Firm's name >



Firm's EIN >



Firm's address >



Phone no.



Firm's name >

Print/Type preparer's name

Preparer's signature

Date

Firm's EIN >

Check [ ] if self-employed

PTIN

1 of 2

For Disclosure, Privacy Act, and Paperwork Reduction Act Notice, see page 36. | Cat. No. 11329W | Form 1040EZ (2010)

Gemini Output Observation

The Gemini OCR result missed several sentences present in the document header and did not properly follow the original structural layout. Additionally, some content sections were not captured completely, leading to partial information loss.

LightOnOCR:

Markdown:

# Department of the Treasury--Internal Revenue Service

## Form 1040EZ

### Income Tax Return for Single and Joint Filers With No Dependents (99) 2010

OMB No. 1545-0074

---

**Name, Address, and SSN**

*See separate instructions.*

**Presidential Election Campaign** 
*(see page 9)*

**PRINT CLEARLY**

<table>
  <thead>
    <tr>
      <th>Your first name and initial</th>
      <th>Last name</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>If a joint return, spouse's first name and initial</td>
      <td>Last name</td>
    </tr>
    <tr>
      <td>Home address (number and street). If you have a P.O. box, see instructions.</td>
      <td>Apt. no.</td>
    </tr>
    <tr>
      <td>City, town or post office, state, and ZIP code. If you have a foreign address, see instructions.</td>
      <td></td>
    </tr>
  </tbody>
</table>


**Your social security number**

**Spouse's social security number**

> ▲ Make sure the SSN(s) above are correct. ▲

> Checking a box below will not change your tax or refund.

---

Check here if you, or your spouse if a joint return, want $3 to go to this fund . . . ▶ ☐ You ☐ Spouse

---

## Income

*Attach Form(s) W-2 here.*

*Enclose, but do not attach, any payment.*

You may be entitled to a larger deduction if you file Form 1040A or 1040. See Before You Begin on page 4.

1. Wages, salaries, and tips. This should be shown in box 1 of your Form(s) W-2. Attach your Form(s) W-2. 1

2. Taxable interest. If the total is over $1,500, you cannot use Form 1040EZ. 2

3. Unemployment compensation and Alaska Permanent Fund dividends (see page 11). 3

4. Add lines 1, 2, and 3. This is your adjusted gross income. 4

5. If someone can claim you (or your spouse if a joint return) as a dependent, check the applicable box(es) below and enter the amount from the worksheet on back. 
☐ You ☐ Spouse 
If no one can claim you (or your spouse if a joint return), enter $9,350 if single; $18,700 if married filing jointly. See back for explanation. 5

6. Subtract line 5 from line 4. If line 5 is larger than line 4, enter -0-. This is your taxable income. ▶ 6

---

## Payments, Credits, and Tax

7. Federal income tax withheld from Form(s) W-2 and 1099. 7

8. Making work pay credit (see worksheet on back). 8

9a. Earned income credit (EIC) (see page 13). 9a

b. Nontaxable combat pay election. 9b

10. Add lines 7, 8, and 9a. These are your total payments and credits. ▶ 10

11. Tax. Use the amount on line 6 above to find your tax in the tax table on pages 27 through 35 of the instructions. Then, enter the tax from the table on this line. 11

---

## Refund

*Have it directly deposited! See page 18 and fill in 12b, 12c, and 12d or Form 8888.*

12a. If line 10 is larger than line 11, subtract line 11 from line 10. This is your refund. 
If Form 8888 is attached, check here ▶ ☐ 12a

▶ b Routing number ▶ c Type: ☐ Checking ☐ Savings

▶ d Account number

---

## Amount You Owe

13. If line 11 is larger than line 10, subtract line 10 from line 11. This is the amount you owe. For details on how to pay, see page 19. ▶ 13

---

## Third Party Designee

Do you want to allow another person to discuss this return with the IRS (see page 20)? ☐ Yes. Complete the following. ☐ No

Designee's name ▶ 
Phone no. ▶ 
Personal identification number (PIN) ▶

---

## Sign Here

*Joint return? See page 6.*

*Keep a copy for your records.*

Under penalties of perjury, I declare that I have examined this return, and to the best of my knowledge and belief, it is true, correct, and accurately lists all amounts and sources of income I received during the tax year. Declaration of preparer (other than the taxpayer) is based on all information of which the preparer has any knowledge.

Your signature 
Date 
Your occupation 
Daytime phone number

Spouse's signature. If a joint return, both must sign. 
Date 
Spouse's occupation

---

## Paid Preparer Use Only

<table>
  <thead>
    <tr>
      <th>Print/Type preparer's name</th>
      <th>Preparer's signature</th>
      <th>Date</th>
      <th>Check ☐ if self-employed</th>
      <th>PTIN</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Firm's name ▶</td>
      <td></td>
      <td></td>
      <td>Firm's EIN ▶</td>
      <td></td>
    </tr>
    <tr>
      <td>Firm's address ▶</td>
      <td></td>
      <td></td>
      <td>Phone no.</td>
      <td></td>
    </tr>
  </tbody>
</table>


---

For Disclosure, Privacy Act, and Paperwork Reduction Act Notice, see page 36.

Cat. No. 11329W 
Form 1040EZ (2010)

LightOnOCR-2-1B Observation

LightOnOCR-2-1B successfully extracted all textual content while preserving the overall document structure. The formatting is largely comparable to Llama Parse v2, with only minor structural differences in the generated Markdown. Despite being a lightweight model, the output demonstrates strong text extraction accuracy and reliable layout preservation.

Llama Parse V2:

Markdown

Department of the Treasury--Internal Revenue Service
**Form 1040EZ** **Income Tax Return for Single and Joint Filers With No Dependents** (99) **2010** OMB No. 1545-0074


<table>
  <tbody>
    <tr>
        <td rowspan="4">**Name, Address, and SSN**<br/><br/>See separate instructions.</td>
        <td rowspan="4">P<br/>R<br/>I<br/>N<br/>T<br/><br/>C<br/>L<br/>E<br/>A<br/>R<br/>L<br/>Y</td>
        <td>Your first name and initial</td>
        <td>Last name</td>
        <td>Your social security number</td>
    </tr>
    <tr>
        <td>If a joint return, spouse's first name and initial</td>
        <td>Last name</td>
        <td>Spouse's social security number</td>
    </tr>
    <tr>
        <td>Home address (number and street). If you have a P.O. box, see instructions.</td>
        <td>Apt. no.</td>
        <td rowspan="2">▲ Make sure the SSN(s) above are correct. ▲<br/><br/>Checking a box below will not change your tax or refund.</td>
    </tr>
    <tr>
        <td colspan="2">City, town or post office, state, and ZIP code. If you have a foreign address, see instructions.</td>
    </tr>
  </tbody>
</table>

**Presidential Election Campaign** (see page 9)
Check here if you, or your spouse if a joint return, want $3 to go to this fund . . . ▶ [ ] **You** [ ] **Spouse**

---

**Income**
**Attach Form(s) W-2 here.**
Enclose, but do not attach, any payment.

You may be entitled to a larger deduction if you file Form 1040A or 1040. See *Before You Begin* on page 4.

<table>
  <tbody>
    <tr>
        <td>**1**</td>
        <td>Wages, salaries, and tips. This should be shown in box 1 of your Form(s) W-2. Attach your Form(s) W-2.</td>
        <td>1</td>
        <td></td>
    </tr>
    <tr>
        <td>**2**</td>
        <td>Taxable interest. If the total is over $1,500, you cannot use Form 1040EZ.</td>
        <td>2</td>
        <td></td>
    </tr>
    <tr>
        <td>**3**</td>
        <td>Unemployment compensation and Alaska Permanent Fund dividends (see page 11).</td>
        <td>3</td>
        <td></td>
    </tr>
    <tr>
        <td>**4**</td>
        <td>Add lines 1, 2, and 3. This is your adjusted gross income.</td>
        <td>4</td>
        <td></td>
    </tr>
    <tr>
        <td>**5**</td>
        <td>If someone can claim you (or your spouse if a joint return) as a dependent, check the applicable box(es) below and enter the amount from the worksheet on back.<br/>[ ] **You** [ ] **Spouse**<br/>If no one can claim you (or your spouse if a joint return), enter $9,350 if single; $18,700 if married filing jointly. See back for explanation.</td>
        <td>5</td>
        <td></td>
    </tr>
    <tr>
        <td>**6**</td>
        <td>Subtract line 5 from line 4. If line 5 is larger than line 4, enter -0-.<br/>**This is your taxable income.**</td>
        <td>▶</td>
        <td>6</td>
    </tr>
  </tbody>
</table>

---

**Payments, Credits, and Tax**

<table>
  <tbody>
    <tr>
        <td>**7**</td>
        <td>Federal income tax withheld from Form(s) W-2 and 1099.</td>
        <td>7</td>
        <td></td>
    </tr>
    <tr>
        <td>**8**</td>
        <td>Making work pay credit (see worksheet on back).</td>
        <td>8</td>
        <td></td>
    </tr>
    <tr>
        <td>**9a**</td>
        <td>Earned income credit (EIC) (see page 13).</td>
        <td>9a</td>
        <td></td>
    </tr>
    <tr>
        <td>**b**</td>
        <td>Nontaxable combat pay election.</td>
        <td>9b</td>
        <td></td>
        <td></td>
    </tr>
    <tr>
        <td>**10**</td>
        <td>Add lines 7, 8, and 9a. These are your total payments and credits.</td>
        <td>▶</td>
        <td>10</td>
    </tr>
    <tr>
        <td>**11**</td>
        <td>Tax. Use the amount on line 6 above to find your tax in the tax table on pages 27 through 35 of the instructions. Then, enter the tax from the table on this line.</td>
        <td>11</td>
        <td></td>
    </tr>
  </tbody>
</table>

---

**Refund**
Have it directly deposited! See page 18 and fill in 12b, 12c, and 12d or Form 8888.

<table>
  <tbody>
    <tr>
        <td>**12a**</td>
        <td>If line 10 is larger than line 11, subtract line 11 from line 10. This is your refund.<br/>If Form 8888 is attached, check here ▶ [ ]</td>
        <td>12a</td>
        <td></td>
    </tr>
    <tr>
        <td>**b**</td>
        <td>▶ Routing number</td>
        <td>[ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ]</td>
        <td>▶ **c** Type: [ ] Checking [ ] Savings</td>
    </tr>
    <tr>
        <td>**d**</td>
        <td>▶ Account number</td>
        <td colspan="2">[ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ]</td>
    </tr>
  </tbody>
</table>

---

**Amount You Owe**

<table>
  <tbody>
    <tr>
        <td>**13**</td>
        <td>If line 11 is larger than line 10, subtract line 10 from line 11. This is the amount you owe. For details on how to pay, see page 19.</td>
        <td>▶</td>
        <td>13</td>
    </tr>
  </tbody>
</table>

---

**Third Party Designee**
Do you want to allow another person to discuss this return with the IRS (see page 20)? [ ] **Yes.** Complete the following. [ ] **No**

<table>
  <tbody>
    <tr>
        <td>Designee's name</td>
        <td>▶</td>
        <td>Phone no.</td>
        <td>▶</td>
        <td>Personal identification number (PIN)</td>
        <td>▶</td>
    </tr>
  </tbody>
</table>

---

**Sign Here**
Joint return? See page 6. Keep a copy for your records.

Under penalties of perjury, I declare that I have examined this return, and to the best of my knowledge and belief, it is true, correct, and accurately lists all amounts and sources of income I received during the tax year. Declaration of preparer (other than the taxpayer) is based on all information of which the preparer has any knowledge.

<table>
  <tbody>
    <tr>
        <td> Your signature</td>
        <td> Date</td>
        <td> Your occupation</td>
        <td> Daytime phone number</td>
    </tr>
    <tr>
        <td>[rowspan=2]</td>
        <td></td>
        <td></td>
        <td></td>
    </tr>
    <tr>
        <td colspan="3"></td>
    </tr>
    <tr>
        <td> Spouse's signature. If a joint return, both must sign.</td>
        <td> Date</td>
        <td> Spouse's occupation</td>
        <td></td>
    </tr>
    <tr>
        <td>[rowspan=2]</td>
        <td></td>
        <td></td>
        <td></td>
    </tr>
    <tr>
        <td colspan="3"></td>
    </tr>
  </tbody>
</table>

---

**Paid Preparer Use Only**

<table>
  <tbody>
    <tr>
        <td> Print/Type preparer's name</td>
        <td> Preparer's signature</td>
        <td> Date</td>
        <td> Check [ ] if self-employed</td>
        <td> PTIN</td>
    </tr>
    <tr>
        <td></td>
        <td></td>
        <td></td>
        <td>Firm's EIN ▶</td>
        <td></td>
    </tr>
    <tr>
        <td>Firm's name ▶</td>
        <td colspan="2"></td>
        <td>Phone no.</td>
        <td></td>
    </tr>
    <tr>
        <td>Firm's address ▶</td>
        <td colspan="4"></td>
    </tr>
  </tbody>
</table>


For Disclosure, Privacy Act, and Paperwork Reduction Act Notice, see page 36. Cat. No. 11329W **Form 1040EZ** (2010)

Llama Parse v2 Observation

Llama Parse v2 extracted all content without omissions and reconstructed the document structure accurately. The generated Markdown closely follows the original layout, preserving headings, sections, and text flow. This makes the output reliable for downstream tasks such as document indexing, structured extraction, and RAG pipelines.

LightOnOCR vs Llama Parse v2 vs Gemini: Model Comparison

FeatureGeminiLlama Parse v2LightOnOCR

Content Extraction

Some content missed

No content missed

All text extracted

Header Recognition

Incomplete

Accurate

Accurate

Structure Preservation

Poor

Well structured

Nearly well-structured

Output Reliability

Medium

High

High

Model Type

API-based

API-based

Open-source

Cost

Paid API usage

Paid API usage

Free / Open-source

Content Extraction

Gemini

Some content missed

Llama Parse v2

No content missed

LightOnOCR

All text extracted

1 of 6

Medical Authorization Records: Handling Complex Layouts

Medical authorization forms often contain dense layouts with multiple columns and sections where information on the left and right sides of the page are unrelated. These formats can be challenging for OCR systems that rely on rigid table reconstruction.

In the “Authorization for the Release of Medical Records” example, both models extracted the text, but differences appeared in how the layout was reconstructed.

Split column handling

LightOnOCR-2-1B followed the natural visual flow of the document, keeping address and contact information separated from the main body content.

Checkbox lists

Checkbox sections are difficult for many OCR pipelines. LightOnOCR-2-1B represented these as a clear bullet list, while Llama Parse attempted to convert them into table structures, which disrupted the logical flow of the "check all that apply" section.

Row integrity

For signature fields and dates, LightOnOCR-2-1B maintained proper row alignment without merging text across rows, avoiding the line-shift issues sometimes seen in reconstructed layouts.

LightOnOCR: (Medical Record)

Markdown:

# Acupuncture Clinic for Pain Relief & Sports Medicine

## Authorization for the Release of Medical Records

This authorization must be written, dated and signed by the patient or by a person authorized by law to give authorization. It is valid until revoked in writing. Records are requested for continuity of care. This clinic does not offer reimbursement for records received.

Patient: ___________________________ Social Security #: ___ - ___ - ___ DOB: ___ / ___ / ___

Please obtain information from the following:

Name of Physician ___________________________

Name of Clinic/Hospital ___________________________

Street Address ___________________________

City, State, Zip Code ___________________________

Please send my medical information to:

Name of Person to Receive Information @

Robert Fueston 
3166 Custer Dr., Suite 201 
Lexington, KY 40517 
Phone: 859-273-1011 / Fax: 859-273-1041 
Website: www.acupuncturelev.com

---

By checking the spaces below, I authorize the above physician/clinic/hospital to release written records pertaining to the following information going back one year. I also authorize the above physician/clinic/hospital to provide the following information via telephone consultation:

- [ ] Medical records needed for continuity of care 
- [ ] Diagnostic imaging reports 
- [ ] Pathology reports 
- [ ] Laboratory reports 
- [ ] Other: ___________________________

Date ___________________________ Patient Signature ___________________________

Signature of Parent/Guardian if Applicable ___________________________

---

I understand that certain information in these records cannot be released without specific authorization because of federal or state laws. By signing the spaces below, I specifically authorize the release of the following confidential information for us by above said physician/clinic/hospital. I also authorize the above physician/clinic/hospital to provide the following information via telephone consultation:

Patient Signature ___________________________

HIV/AIDS test results and related information, including high risk behavior documentation. This information may not be further disclosed without The specific written authorization of the tested individual

Patient Signature ___________________________

Drug/Alcohol diagnosis, treatment, or referral information. Federal Regulation, 42 CFR Part 2, requires a description of how much and what kind Of information is to be disclosed. Please provide a description of this information:

___________________________________________________________

Mental Health treatment information

Patient Signature ___________________________

---

Office use only: Date sent: ___________________________ Initials: ___________________________

Llama parse v2:(Medical Record)

Markdown:

# Acupuncture Clinic for Pain Relief & Sports Medicine
## Authorization for the Release of Medical Records

This authorization must be written, dated and signed by the patient or by a person authorized by law to give authorization. It is valid until revoked in writing. Records are requested for continuity of care. This clinic does not offer reimbursement for records received.

<table>
  <tbody>
    <tr>
        <td>Patient: ________________________________________</td>
        <td>Social Security #: ____ - ____ - ____</td>
        <td>DOB: ____ / ____ / ____</td>
    </tr>
  </tbody>
</table>
<table>
  <thead>
    <tr>
        <th>Please **obtain** information **from** the following:</th>
        <th>Please **send** my medical information **to**:</th>
    </tr>
  </thead>
  <tbody>
    <tr>
        <td>__________________________________________________<br/>Name of Physician</td>
        <td>__________________________________________________ @<br/>Name of Person to Receive Information</td>
    </tr>
    <tr>
        <td>__________________________________________________<br/>Name of Clinic/Hospital</td>
        <td>**Robert Fueston**<br/>**3166 Custer Dr., Suite 201**<br/>**Lexington, KY 40517**<br/>**Phone: 859-273-1011 / Fax: 859-273-1041**</td>
    </tr>
    <tr>
        <td>__________________________________________________<br/>Street Address</td>
        <td></td>
    </tr>
    <tr>
        <td>__________________________________________________<br/>City, State, Zip Code</td>
        <td>**Website: www.acupunctureky.com**</td>
    </tr>
  </tbody>
</table>

By **checking** the spaces below, I authorize the above physician/clinic/hospital to release written records pertaining to the following information **going back one year**. I also authorize the above physician/clinic/hospital to provide the following information via telephone consultation:

<table>
  <tbody>
    <tr>
        <td>[ ] Medical records needed for continuity of care</td>
        <td>[ ] Diagnostic imaging reports</td>
        <td>[ ] Pathology reports</td>
    </tr>
    <tr>
        <td>[ ] Laboratory reports</td>
        <td colspan="2">[ ] Other: ____________________________________________________________________________________________________________________</td>
    </tr>
  </tbody>
</table>
<table>
  <tbody>
    <tr>
        <td>________________________________</td>
        <td>________________________________________________________________________________<br/>Date</td>
        <td>Patient Signature</td>
    </tr>
    <tr>
        <td></td>
        <td>________________________________________________________________________________<br/>Signature of Parent/Guardian if Applicable</td>
        <td></td>
    </tr>
  </tbody>
</table>

I understand that certain information in these records cannot be released without specific authorization because of federal or state laws. By **signing** the spaces below, I specifically authorize the release of the following confidential information for us by above said physician/clinic/hospital. I also authorize the above physician/clinic/hospital to provide the following information via telephone consultation:

<table>
  <tbody>
    <tr>
        <td>__________________________________________________<br/>Patient Signature</td>
        <td>HIV/AIDS test results and related information, including high risk behavior documentation. **This information may not be further disclosed without the specific written authorization of the tested individual**</td>
    </tr>
    <tr>
        <td>__________________________________________________<br/>Patient Signature</td>
        <td>Drug/Alcohol diagnosis, treatment, or referral information. Federal Regulation, 42 CFR Part 2, requires a description of how much and what kind of information is to be disclosed. Please provide a description of this information:<br/>________________________________________________________________________________<br/>________________________________________________________________________________</td>
    </tr>
    <tr>
        <td>__________________________________________________<br/>Patient Signature</td>
        <td>Mental Health treatment information</td>
    </tr>
  </tbody>
</table>
<table>
  <tbody>
    <tr>
        <td>**Office use only:**</td>
        <td>Date sent: ____________________</td>
        <td>Initials: ____________________</td>
    </tr>
  </tbody>
</table>

Comments:

This example shows that model size alone does not determine document parsing quality. By avoiding complex agentic processing, LightOnOCR’s 1B architecture produces a direct and consistent structural representation of the document, which can be beneficial for developers who require clean and predictable Markdown output.

Innovations in AI
Exploring the future of artificial intelligence
Murtuza Kutub
Murtuza Kutub
Co-Founder, F22 Labs

Walk away with actionable insights on AI adoption.

Limited seats available!

Calendar
Saturday, 7 Mar 2026
10PM IST (60 mins)

Example 3:

LightOnOCR:

Markdown:

<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th></th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>1</td>
      <td>IKAN GURAME MED</td>
      <td>158,000</td>
    </tr>
    <tr>
      <td></td>
      <td>SOP IKAN</td>
      <td></td>
    </tr>
    <tr>
      <td>1</td>
      <td>CUMI GR JUNJAN</td>
      <td>129,000</td>
    </tr>
    <tr>
      <td></td>
      <td>=*MEDIUM*=</td>
      <td></td>
    </tr>
    <tr>
      <td>1</td>
      <td>CUMI GR TEPUNG</td>
      <td>129,000</td>
    </tr>
    <tr>
      <td></td>
      <td>=*MEDIUM*=</td>
      <td></td>
    </tr>
    <tr>
      <td>1</td>
      <td>ABSIO TH PC JMR</td>
      <td>147,000</td>
    </tr>
    <tr>
      <td></td>
      <td>=*MEDIUM*=</td>
      <td></td>
    </tr>
    <tr>
      <td>1</td>
      <td>POCAI BWG PUTIH</td>
      <td>90,000</td>
    </tr>
    <tr>
      <td></td>
      <td>=*MEDIUM*=</td>
      <td></td>
    </tr>
    <tr>
      <td>1</td>
      <td>LUMPIA UDG PREM</td>
      <td>144,000</td>
    </tr>
    <tr>
      <td></td>
      <td>=*MEDIUM*=</td>
      <td></td>
    </tr>
    <tr>
      <td>6</td>
      <td>NASI PUTIH</td>
      <td>10,000 60,000</td>
    </tr>
    <tr>
      <td>3</td>
      <td>HOT TEA</td>
      <td>12,000 36,000</td>
    </tr>
    <tr>
      <td>1</td>
      <td>AQUA</td>
      <td>11,000</td>
    </tr>
    <tr>
      <td>2</td>
      <td>ICED TEA</td>
      <td>12,000 24,000</td>
    </tr>
    <tr>
      <td>1</td>
      <td>ICED TEA</td>
      <td>12,000</td>
    </tr>
    <tr>
      <td></td>
      <td>FOOD</td>
      <td>797,000</td>
    </tr>
    <tr>
      <td></td>
      <td>BEVERAGES</td>
      <td>83,000</td>
    </tr>
    <tr>
      <td></td>
      <td>OTHERS</td>
      <td>60,000</td>
    </tr>
    <tr>
      <td></td>
      <td>SUBTOTAL</td>
      <td>940,000</td>
    </tr>
    <tr>
      <td></td>
      <td>SERVICE CHARGE</td>
      <td>56,400</td>
    </tr>
    <tr>
      <td></td>
      <td>Tax 10%</td>
      <td>99,640</td>
    </tr>
    <tr>
      <td></td>
      <td>DU</td>
      <td>1,096,040</td>
    </tr>
  </tbody>
</table>

LightOnOCR-2-1B Observation

LightOnOCR-2-1B extracted all text accurately without content loss. In some cases, certain table values appeared on the following row rather than the same row, but the overall structure remained clear and usable for interpretation and downstream processing.

Llamaparse v2:

Markdown:

<table>
  <tbody>
    <tr>
        <td>1</td>
        <td>IKAN GURAME MED<br/>SOP IKAN</td>
        <td>158,000</td>
        <td></td>
    </tr>
    <tr>
        <td>1</td>
        <td>CUMI GR JUNJAN<br/>=*MEDIUM*=</td>
        <td>129,000</td>
        <td></td>
    </tr>
    <tr>
        <td>1</td>
        <td>CUMI GR TEPUNG<br/>=*MEDIUM*=</td>
        <td>129,000</td>
        <td></td>
    </tr>
    <tr>
        <td>1</td>
        <td>AGSIO TH PC JMR<br/>=*MEDIUM*=</td>
        <td>147,000</td>
        <td></td>
    </tr>
    <tr>
        <td>1</td>
        <td>POCAI BWG PUTIH<br/>=*MEDIUM*=</td>
        <td>90,000</td>
        <td></td>
    </tr>
    <tr>
        <td>1</td>
        <td>LUMPIA UDG PREM<br/>=*MEDIUM*=</td>
        <td>144,000</td>
        <td></td>
    </tr>
    <tr>
        <td>6</td>
        <td>NASI PUTIH</td>
        <td>10,000</td>
        <td>60,000</td>
    </tr>
    <tr>
        <td>3</td>
        <td>HOT TEA</td>
        <td>12,000</td>
        <td>36,000</td>
    </tr>
    <tr>
        <td>1</td>
        <td>AQUA</td>
        <td></td>
        <td>11,000</td>
    </tr>
    <tr>
        <td>2</td>
        <td>ICED TEA</td>
        <td>12,000</td>
        <td>24,000</td>
    </tr>
    <tr>
        <td>1</td>
        <td>ICED TEA</td>
        <td></td>
        <td>12,000</td>
    </tr>
    <tr>
        <td></td>
        <td>FOOD</td>
        <td></td>
        <td>797,000</td>
    </tr>
    <tr>
        <td></td>
        <td>BEVERAGES</td>
        <td></td>
        <td>83,000</td>
    </tr>
    <tr>
        <td></td>
        <td>OTHERS</td>
        <td></td>
        <td>60,000</td>
    </tr>
    <tr>
        <td></td>
        <td>SUBTOTAL</td>
        <td></td>
        <td>940,000</td>
    </tr>
    <tr>
        <td></td>
        <td>SERVICE CHARGE</td>
        <td></td>
        <td>56,400</td>
    </tr>
    <tr>
        <td></td>
        <td>Tax 10%</td>
        <td></td>
        <td>99,640</td>
    </tr>
    <tr>
        <td></td>
        <td>DU 1,096,040</td>
        <td colspan="2"></td>
    </tr>
  </tbody>
</table>

Llama Parse v2 Observation

Llama Parse v2 extracted all text accurately and preserved the table structure effectively. Related values such as 12,000 and 24,000 were correctly aligned within the same row, closely reflecting the layout of the original bill.

Multilingual Testing: Hindi Document Support

Many lightweight OCR models perform well on Latin scripts but struggle with complex writing systems such as Devanagari. To evaluate multilingual capability, I tested LightOnOCR-2-1B using Hindi documents.

LightOn mentions improved multilingual support in this version, and the results reflect that. The model extracted Hindi text accurately while preserving the document layout. Despite the presence of ligatures and vertical markers common in Devanagari scripts, the formatting remained stable and readable.

The model handled structural elements in Hindi documents similarly to English layouts, maintaining consistent spacing and alignment. For developers working with multilingual datasets or processing documents in regions such as India, this capability is particularly valuable.

Final Verdict: Evaluating LightOnOCR-2-1B

Based on the document tests conducted in this comparison, LightOnOCR-2-1B demonstrates that smaller models can still deliver reliable document parsing when designed as an end-to-end vision-language system.

1B Architecture

Despite its relatively small parameter size, the model processes documents in a single pipeline. This reduces the structural errors that often appear in multi-stage OCR systems where detection, recognition, and layout parsing are handled separately.

Cost vs Performance

Llama Parse v2 Agentic Plus remains a strong option for complex document parsing. However, the credit-based pricing model (around 45 credits per page) can become expensive for large-scale workloads. LightOnOCR-2-1B provides an open-source alternative that can produce clean Markdown structures without per-page API costs.

Temperature Configuration

When self-hosting the model, generation temperature can affect stability. Using a temperature of 0 may occasionally lead to generation loops, while a value around 0.2 tends to produce more stable outputs without affecting document structure.

Conclusion

LightOnOCR-2-1B may not replace every document parsing solution. However, the tests show that a lightweight end-to-end OCR model can still deliver reliable text extraction and strong layout preservation.

Compared with API-based systems like Llama Parse v2 and Gemini, LightOnOCR-2-1B provides a practical alternative for teams that want predictable Markdown outputs without ongoing API costs.

For developers building document processing pipelines, especially those handling structured forms, receipts, or multilingual datasets, LightOnOCR-2-1B offers a lightweight open-source option worth considering.

Frequently Asked Questions

What is LightOnOCR-2-1B?

LightOnOCR-2-1B is an open-source vision-language OCR model designed for document parsing. It converts document images directly into structured Markdown, allowing developers to extract text, tables, and layouts without building complex multi-stage OCR pipelines.

How does LightOnOCR-2-1B differ from traditional OCR systems?

Traditional OCR pipelines usually rely on separate steps such as text detection, recognition, and layout reconstruction. LightOnOCR-2-1B processes the entire document as a single vision-language task, reducing structural errors and simplifying document parsing workflows.

Can LightOnOCR-2-1B replace paid OCR APIs like Llama Parse or Gemini?

LightOnOCR-2-1B can serve as a practical alternative for many document parsing tasks, especially when teams want to avoid per-page API costs. However, enterprise APIs like Llama Parse or Gemini may still provide advantages in certain complex document scenarios.

Does LightOnOCR-2-1B support multilingual documents?

Yes. LightOnOCR-2-1B supports multiple languages, including scripts such as Devanagari used in Hindi documents. In testing, the model preserved both text accuracy and layout structure in multilingual documents.

Can LightOnOCR-2-1B be deployed locally?

Yes. The model can be deployed locally using frameworks like Hugging Face Transformers for experimentation or vLLM for high-throughput production serving.

While temperature 0 is typically used for deterministic OCR tasks, it may occasionally cause generation loops. A temperature value around 0.2 generally provides more stable results while maintaining document structure.

Author-Seerin
Seerin

I am an AIML intern and AI enthusiast passionate about solving real-world problems using artificial intelligence and building practical, impactful solutions.

Share this article

Phone

Next for you

How To Build a Voice AI Agent (Using LiveKit)? Cover

AI

Mar 6, 20269 min read

How To Build a Voice AI Agent (Using LiveKit)?

Voice AI agents are becoming increasingly common in applications such as customer support automation, AI call centers, and real-time conversational assistants. Modern voice systems can process speech in real time, understand conversational context, handle interruptions, and respond with natural-sounding speech while maintaining low latency. I wanted to understand what it actually takes to build a production-ready voice AI agent using modern tools. In this guide, I explain how to build a voice

vLLM vs vLLM-Omni: Which One Should You Use? Cover

AI

Mar 6, 20268 min read

vLLM vs vLLM-Omni: Which One Should You Use?

Serving large language models efficiently is a major challenge when building AI applications. As usage scales, systems must handle multiple requests simultaneously while maintaining low latency and high GPU utilization. This is where inference engines like vLLM and vLLM-Omni become important. vLLM is designed to maximize performance for text-based LLM workloads, while vLLM-Omni extends the same architecture to support multimodal inputs such as images, audio, and video. In this guide, we compar

DSPy vs Normal Prompting: A Practical Comparison Cover

AI

Feb 23, 202618 min read

DSPy vs Normal Prompting: A Practical Comparison

When you build an AI agent that books flights, calls tools, or handles multi-step workflows, one question comes up quickly: how should you control the model? Most developers use prompt engineering. You write detailed instructions, add examples, adjust wording, and test until it works. Sometimes it works well. Sometimes changing a single sentence breaks the entire workflow. DSPy offers a different approach. Instead of manually crafting prompts, you define what the system should do, and the fram