Skip to content
English
  • There are no suggestions because the search field is empty.

[Public] Handling Unexpected White Spaces in IronOCR

Extra spaces caused by Arial font and workarounds

When using OCR (Optical Character Recognition) to extract text from documents, you may occasionally encounter unexpected white spaces between characters. This typically results from variations in character width and spacing, and unfortunately, applying filters to the input doesn't always resolve the issue.

Based on investigation, the following approaches have proven effective in avoiding extra spaces.

  1. Use the Roboto Font
    Roboto is a clean, modern typeface that is OCR-friendly and often yields better character recognition results.

  2. Use ReadDocumentAdvanced()
    This method offers more control over the OCR process and can enhance the handling of complex text layouts.

  3. Use OcrLanguage.EnglishBest
    This advanced language model provides better accuracy than the standard English option.

  4. Train a Custom Font
    If the input PDF uses a unique or non-standard font, default OCR settings may struggle. Custom font training allows the OCR engine to recognize specific fonts more accurately, significantly improving performance.
    Learn how to train a custom font here: OCR Custom Font Training Guide

Before: (Between 7 and 1)

After: