IronOCR on .NET: Large TIFF Files Fail to Load (Files Over 2 GB)

Overview

IronOCR loads image input into a single in-memory buffer through its internal AnyBitmap type. That buffer is indexed by a 32-bit integer, so it caps at roughly 2 GB no matter how much system memory is available. A TIFF file larger than 2 GB exceeds this limit and fails to load. The workaround is to split the TIFF into chunks that each stay under 2 GB and pass each chunk to IronOCR as a byte array.
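Because the limit depends on a byte count, a file can be checked before loading and oversized TIFFs can be routed to the chunking path. The sketch below is plain C# with no IronOCR calls. ExceedsSingleBufferLimit is a hypothetical helper, and it assumes, as this article does, that the file size is the quantity that must fit in the buffer. If the decoded image is larger than the file on disk (for example, a compressed TIFF), the practical threshold may be lower.

```csharp
using System;
using System.IO;

// A buffer indexed by a 32-bit int can address at most int.MaxValue (2^31 - 1) bytes.
const long MaxSingleBufferBytes = int.MaxValue;

// Hypothetical helper: true when a file of this size cannot fit in one such buffer.
bool ExceedsSingleBufferLimit(long fileLengthBytes) => fileLengthBytes > MaxSingleBufferBytes;

// Check the file named on the command line, or a placeholder path.
string path = args.Length > 0 ? args[0] : "input.tiff";
if (File.Exists(path))
{
    long length = new FileInfo(path).Length;
    Console.WriteLine(ExceedsSingleBufferLimit(length)
        ? $"{path}: {length:N0} bytes, split into chunks before loading."
        : $"{path}: {length:N0} bytes, within the single-buffer limit.");
}
else
{
    Console.WriteLine($"{path} not found.");
}
```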

Version Metadata

- Version Found: 2026.5.2
- Version Resolved: Unknown

Environment

Language/Runtime: .NET (cross-platform; the limit comes from 32-bit buffer indexing, so it applies on every operating system)

Cause

OcrInput.LoadImage(filePath) creates an AnyBitmap internally and materializes the whole image into one buffer. Because that buffer is indexed by an int, it is limited to about 2 GB regardless of available system memory, so any TIFF larger than that cannot be held and the load fails.

In this case LoadImage(filePath) currently does not throw a clear exception; it results in zero loaded pages. This silent failure is a known issue, and a clearer exception is planned. The single-buffer design itself is an architectural limit: native per-page TIFF streaming, which would avoid buffering the whole file, is not yet available.
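Until the clearer exception ships, a caller can turn the zero-page result into an explicit error instead of mistaking it for an empty document. This is a minimal sketch: EnsurePagesLoaded is a hypothetical helper, and the page count it checks is assumed to come from ocrInput.GetPages() after loading by path, as in the chunking example in this article.

```csharp
using System;

// Hypothetical helper: converts the silent "zero pages loaded" result into an exception.
void EnsurePagesLoaded(int loadedPageCount, string source)
{
    if (loadedPageCount > 0)
    {
        return; // At least one page was loaded; nothing to report.
    }

    throw new InvalidOperationException(
        $"No pages were loaded from '{source}'. If the file is larger than " +
        $"{int.MaxValue:N0} bytes (about 2 GB), split it before loading.");
}

// In real code the count would come from IronOCR after loading by path, e.g.
//   ocrInput.LoadImage(path);
//   EnsurePagesLoaded(ocrInput.GetPages().Count(), path);
// Literal counts stand in for those results here.
EnsurePagesLoaded(12, "small.tiff");

try
{
    EnsurePagesLoaded(0, "input.tiff");
}
catch (InvalidOperationException ex)
{
    Console.WriteLine(ex.Message);
}
```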

Solution

Recommended: add Magick.NET so you can split the TIFF before handing it to IronOCR. Pick the package variant that matches your project: Q8 or Q16 (8 or 16 bits per color channel; Q8 uses less memory per pixel) and AnyCPU or x64.

dotnet add package Magick.NET-Q16-AnyCPU

Open the TIFF as a MagickImageCollection, split it into chunks of a fixed number of pages so that each chunk stays under 2 GB, encode each chunk as a multi-page TIFF byte array, and load that array with OcrInput.LoadImage(byte[]) rather than with the file path.

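   using System;
   using System.Linq;
   using ImageMagick;
   using IronOcr;

   var inputPath = "input.tiff";
   var pagesPerChunk = 100; // Tune so each chunk stays well under 2 GB.

   // Magick.NET (not IronOCR) reads every page of the TIFF.
   using var allPages = new MagickImageCollection(inputPath);

   // Create the OCR engine once and reuse it for every chunk.
   var ocr = new IronTesseract();
   int chunkNumber = 1;

   for (int i = 0; i < allPages.Count; i += pagesPerChunk)
   {
       using var chunk = new MagickImageCollection();
       int end = Math.Min(i + pagesPerChunk, allPages.Count);

       // Copy this range of pages into a separate collection.
       for (int j = i; j < end; j++)
       {
           chunk.Add(allPages[j].Clone());
       }

       // Compress each page losslessly to keep the chunk's byte array small.
       foreach (var image in chunk)
       {
           image.SetCompression(CompressionMethod.LZW);
       }

       // Encode the chunk explicitly as one multi-page TIFF in memory.
       var chunkBytes = chunk.ToByteArray(MagickFormat.Tiff);
       Console.WriteLine($"Chunk {chunkNumber}: pages {i + 1}-{end}, {chunkBytes.Length:N0} bytes");

       using (var ocrInput = new OcrInput())
       {
           // Load from the byte array, not from the original file path.
           ocrInput.LoadImage(chunkBytes);

           var pages = ocrInput.GetPages().ToList();
           Console.WriteLine($"Loaded pages: {pages.Count}");

           var result = ocr.Read(ocrInput);
           Console.WriteLine("OCR Text Length: " + (result.Text?.Length ?? 0));
       }

       Console.WriteLine($"Chunk {chunkNumber} processed");
       chunkNumber++;
   }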

- Tune pagesPerChunk for your data. Lower it if a chunk approaches 2 GB or if memory is tight; raise it when pages are small, to reduce per-chunk overhead.
- Account for Magick.NET in your deployment. It bundles native ImageMagick binaries, which increase package size and add a native dependency to your build.
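As an alternative to a fixed pagesPerChunk, chunks can be sized by bytes. This sketch is a swapped-in technique, not part of IronOCR or Magick.NET: the hypothetical PlanChunks helper groups consecutive pages so that each group's estimated size stays under a budget. How the per-page sizes are estimated is left open; one assumption is to encode each page with Magick.NET's ToByteArray(MagickFormat.Tiff) and take the array length, at the cost of encoding every page twice. The 1.5 GB budget and the page sizes below are made-up values chosen to leave headroom below the 2 GB limit.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical planner: groups consecutive pages into (Start, Count) ranges whose
// summed estimated sizes stay at or below budgetBytes.
List<(int Start, int Count)> PlanChunks(IReadOnlyList<long> pageSizes, long budgetBytes)
{
    var chunks = new List<(int Start, int Count)>();
    int start = 0;
    long running = 0;

    for (int i = 0; i < pageSizes.Count; i++)
    {
        // Close the current chunk if adding this page would exceed the budget.
        // The i > start check prevents empty chunks when one page alone is over budget.
        if (i > start && running + pageSizes[i] > budgetBytes)
        {
            chunks.Add((start, i - start));
            start = i;
            running = 0;
        }
        running += pageSizes[i];
    }

    // Emit the final, partially filled chunk, if any pages remain.
    if (pageSizes.Count > start)
    {
        chunks.Add((start, pageSizes.Count - start));
    }
    return chunks;
}

// Example with made-up per-page sizes in bytes and a 1.5 GB budget.
long[] sizes = { 600_000_000, 700_000_000, 400_000_000, 900_000_000, 300_000_000 };
foreach (var (first, count) in PlanChunks(sizes, 1_500_000_000))
{
    Console.WriteLine($"Pages {first}..{first + count - 1}");
}
```

Each (Start, Count) pair corresponds to allPages[Start] through allPages[Start + Count - 1] in the chunking loop. A single page larger than the budget still gets a chunk of its own, and chunking alone cannot bring such a page under the limit.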