[Public] Enhancing ReadDocumentAdvanced to Detect Dotted Border Tables in IronOCR

Summary

IronOCR's ReadDocumentAdvanced method is designed to extract data from tables in documents. However, a known limitation is that it does not detect tables with non-solid borders, particularly dotted or dashed lines (e.g., border: 1px dotted black). The text may still be extracted, but accessing the data in each cell requires IronOCR to detect the table object first.

This article outlines an enhancement to address this limitation and a possible workaround for extracting data from a table with dotted borders.

Background

Modern PDFs and scanned documents often use varied border styles to format tables, from solid to dotted or dashed lines. IronOCR performs well at detecting solid-bordered tables, but it throws an exception when the code tries to access table data from a simple dotted-border input such as the sample below:

Unhandled exception. System.InvalidOperationException: Sequence contains no elements

Sample input

This happens because the method does not detect the dotted-line table, so no table object is returned and calling First() on the empty result throws.
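The exception message matches what LINQ's Enumerable.First() throws on an empty sequence, which suggests res.Tables came back empty. The minimal C# sketch below reproduces this without IronOCR, using a hypothetical empty list `tables` as a stand-in for an empty res.Tables, and shows Any() and FirstOrDefault() as non-throwing ways to check for a result.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for an empty res.Tables: no table objects were detected
var tables = new List<string>();

try
{
    // Same pattern as res.Tables.First(): throws on an empty sequence
    Console.WriteLine(tables.First());
}
catch (InvalidOperationException ex)
{
    Console.WriteLine(ex.Message); // prints: Sequence contains no elements
}

// Non-throwing alternatives for checking whether anything was detected
Console.WriteLine(tables.Any());                        // prints: False
Console.WriteLine(tables.FirstOrDefault() ?? "<none>"); // prints: <none>
```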

Workaround

The workaround below:

makes the dotted or dashed borders of a table solid internally
allows Table objects to be detected in the structured output

One effective technique is to apply the Dilate() filter before processing. This pre-processing step closes the gaps between the dots in the border, so the OCR engine treats dotted borders as continuous lines, effectively converting dotted lines into solid contours.
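To illustrate why dilation closes the gaps, here is a toy C# sketch of binary morphological dilation on a small character grid. It demonstrates the general technique under assumed values (a 3x3 square structuring element, i.e. radius 1, and 1-pixel gaps between dots); it is not IronOCR's internal implementation of Dilate(), and the `Dilate` helper and the grid are hypothetical.

```csharp
using System;

// Toy binary image: '#' = ink, '.' = background.
// Row 2 is a dotted horizontal table border with 1-pixel gaps.
string[] image =
{
    ".........",
    ".........",
    "#.#.#.#.#",
    ".........",
    ".........",
};

string[] dilated = Dilate(image, radius: 1);

foreach (var row in dilated) Console.WriteLine(row);

// Binary dilation with a (2r+1)x(2r+1) square structuring element:
// an output pixel is ink if ANY pixel within `radius` of it is ink.
static string[] Dilate(string[] img, int radius)
{
    int h = img.Length, w = img[0].Length;
    var result = new string[h];
    for (int y = 0; y < h; y++)
    {
        var row = new char[w];
        for (int x = 0; x < w; x++)
        {
            bool ink = false;
            for (int dy = -radius; dy <= radius && !ink; dy++)
            {
                for (int dx = -radius; dx <= radius && !ink; dx++)
                {
                    int ny = y + dy, nx = x + dx;
                    // Ignore neighbours that fall outside the image
                    if (ny >= 0 && ny < h && nx >= 0 && nx < w && img[ny][nx] == '#')
                        ink = true;
                }
            }
            row[x] = ink ? '#' : '.';
        }
        result[y] = new string(row);
    }
    return result;
}
```

Running it prints two empty rows of dots around three solid rows of `#`: the dotted border has become a continuous line three pixels thick. With radius r, gaps up to 2r pixels wide are bridged and the line grows about 2r pixels thicker, so a larger radius handles sparser dots at the cost of thickening, and potentially merging, nearby strokes.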

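Example code:

using System;
using System.Linq;
using IronOcr;
using IronSoftware.Drawing; // AnyBitmap

var ocr = new IronTesseract();
ocr.Configuration.ReadDataTables = true; // return table structures in the result

var input = new OcrInput();
input.Load("image-20250408-144240.png");
input.Dilate(); // grow the border dots so the gaps between them close
input.SaveAsImages("export.png", AnyBitmap.ImageFormat.Png); // optional: inspect the filtered image

var res = ocr.ReadDocumentAdvanced(input);

// Guard against "Sequence contains no elements" when no table is detected
if (res.Tables.Any())
    Console.WriteLine(res.Tables.First().CellInfos.First().CellText);
else
    Console.WriteLine("No table detected.");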

The image below shows the input after the Dilate() filter is applied, as saved by SaveAsImages().

export.png_0

With this small adjustment, IronOCR can detect and extract complete tables even when they use unconventional border styles such as dotted or dashed lines.