
IronOCR on .NET: Large TIFF Files Fail to Load (Files Over 2 GB)

Overview 

IronOCR loads image input into a single in-memory buffer through its internal AnyBitmap type. That buffer is indexed by a 32-bit integer, so it caps at roughly 2 GB no matter how much system memory is available. A TIFF file larger than 2 GB exceeds this limit and fails to load. The fix is to split the TIFF into sub-2 GB chunks and pass each chunk to IronOCR as a byte array.
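The limit comes from the buffer's index type, so you can flag oversized files before calling IronOCR. The snippet below is a minimal C# sketch that uses only .NET's FileInfo. The class name TiffSizeCheck is hypothetical, and the snippet assumes the limit is int.MaxValue bytes (2,147,483,647 bytes, about 2 GiB). Treat it as a heuristic only: the decoded image can be larger than the compressed file, so a file under the limit may still fail to load.

```csharp
using System;
using System.IO;

// Hypothetical helper: a rough pre-check against a single int-indexed buffer.
// Assumption: the limit is int.MaxValue bytes. Passing this check does NOT
// guarantee the load succeeds, because the decoded image can exceed the file size.
public static class TiffSizeCheck
{
    // Largest length an int-indexed buffer can address: 2,147,483,647 bytes (about 2 GiB).
    public const long Int32BufferLimit = int.MaxValue;

    // True when a size in bytes cannot fit in one int-indexed buffer.
    public static bool ExceedsInt32Buffer(long sizeInBytes) => sizeInBytes > Int32BufferLimit;

    // Convenience overload that reads the on-disk size of a file.
    public static bool ExceedsInt32Buffer(string path) => ExceedsInt32Buffer(new FileInfo(path).Length);
}
```

If TiffSizeCheck.ExceedsInt32Buffer("input.tiff") returns true, split the file before handing it to IronOCR rather than calling LoadImage(filePath) directly.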

Environment
  • Language/Runtime: .NET (cross-platform; the limit is a runtime constraint, not OS-specific)

Cause

OcrInput.LoadImage(filePath) creates an AnyBitmap internally and materializes the whole image into one buffer. Because that buffer is indexed by int, it is effectively limited to about 2 GB regardless of available system memory. A TIFF even slightly over that limit cannot be held, so the load fails. Note that LoadImage(filePath) currently does not throw in this case; the OcrInput simply ends up with zero loaded pages. This silent failure is a known issue, and a clearer exception is planned. The single-buffer design is an architectural limit, and native per-page TIFF streaming, which would avoid buffering the whole file, is not yet available.
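Until a clearer exception ships, you can make the silent failure explicit by checking the page count right after loading. This is a minimal sketch: OcrLoadGuard and EnsurePagesLoaded are hypothetical names, and the helper takes the page count as a plain int so that it has no IronOCR dependency of its own.

```csharp
using System;

// Hypothetical helper: converts the silent "zero pages loaded" outcome into an
// explicit exception so an oversized input cannot pass through unnoticed.
public static class OcrLoadGuard
{
    // Returns the page count unchanged when at least one page loaded; throws otherwise.
    public static int EnsurePagesLoaded(int loadedPageCount, string source)
    {
        if (loadedPageCount <= 0)
        {
            throw new InvalidOperationException(
                $"No pages were loaded from '{source}'. The input may exceed the ~2 GB " +
                "single-buffer limit; split it into smaller chunks before loading.");
        }
        return loadedPageCount;
    }
}
```

In an IronOCR project you would call it immediately after loading, for example OcrLoadGuard.EnsurePagesLoaded(ocrInput.GetPages().Count(), filePath), using GetPages() as in the sample under Solution and Count() from System.Linq.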

Solution
  1. Recommended: Add Magick.NET so you can split the TIFF before handing it to IronOCR. Pick the variant that matches your project (Q8/Q16, AnyCPU/x64).

    dotnet add package Magick.NET-Q16-AnyCPU


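  2. Open the TIFF as a MagickImageCollection, split it into chunks of pagesPerChunk pages that each stay under 2 GB, convert each chunk to a byte array, and load it with OcrInput.LoadImage(byte[]) rather than the file path.
    using System;
    using System.Linq;
    using ImageMagick;
    using IronOcr;

    var inputPath = "input.tiff";
    // Pages per chunk; tune so each chunk stays well under the ~2 GB limit (see step 3).
    var pagesPerChunk = 100;

    // Magick.NET loads every TIFF page (frame) as a separate image in the collection.
    using var allPages = new MagickImageCollection(inputPath);

    // A single IronTesseract instance can be reused for every chunk.
    var ocr = new IronTesseract();
    int chunkNumber = 1;

    for (int i = 0; i < allPages.Count; i += pagesPerChunk)
    {
        // Copy this chunk's pages into their own collection.
        using var chunk = new MagickImageCollection();

        for (int j = i; j < Math.Min(i + pagesPerChunk, allPages.Count); j++)
        {
            chunk.Add(allPages[j].Clone());
        }

        // LZW-compress each page so the serialized chunk stays small.
        foreach (var image in chunk)
        {
            image.SetCompression(CompressionMethod.LZW);
        }

        // Serialize the chunk in memory as a multi-page image in the source format (TIFF).
        var chunkBytes = chunk.ToByteArray();

        using (var ocrInput = new OcrInput())
        {
            // Load from the byte array, not the original file path.
            ocrInput.LoadImage(chunkBytes);

            var pages = ocrInput.GetPages().ToList();
            Console.WriteLine($"Loaded pages: {pages.Count}");

            var result = ocr.Read(ocrInput);
            Console.WriteLine("OCR Text Length: " + (result.Text?.Length ?? 0));
        }

        Console.WriteLine($"Chunk {chunkNumber} processed");
        chunkNumber++;
    }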

     

  3. Tune pagesPerChunk for your data. Lower it if a chunk approaches 2 GB or if memory is tight; raise it for smaller pages to reduce overhead.
  4. Account for Magick.NET in your deployment. It ships ImageMagick native binaries, which add to package size and bring a native dependency footprint.
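For a starting value of pagesPerChunk in step 3, you can divide a target chunk size by the average number of bytes per page. The snippet below is a hypothetical helper (ChunkSizing.EstimatePagesPerChunk is not part of IronOCR or Magick.NET). It assumes pages are roughly uniform in size, so that file size divided by page count is representative. Because each chunk is re-encoded with LZW, its real size can differ from this estimate, so choose a target comfortably below 2 GB.

```csharp
using System;

// Hypothetical helper for a first guess at pagesPerChunk.
// Assumption: pages are roughly uniform in size; highly variable pages need a smaller target.
public static class ChunkSizing
{
    public static int EstimatePagesPerChunk(long fileSizeBytes, int pageCount, long targetChunkBytes)
    {
        if (fileSizeBytes <= 0) throw new ArgumentOutOfRangeException(nameof(fileSizeBytes));
        if (pageCount <= 0) throw new ArgumentOutOfRangeException(nameof(pageCount));
        if (targetChunkBytes <= 0) throw new ArgumentOutOfRangeException(nameof(targetChunkBytes));

        // Average bytes per page, rounded up so the estimate errs toward smaller chunks.
        long bytesPerPage = (fileSizeBytes + pageCount - 1) / pageCount;

        // Number of average pages that fit in the target, never fewer than one.
        long pages = Math.Max(1L, targetChunkBytes / bytesPerPage);

        // A chunk never needs more pages than the file contains.
        return (int)Math.Min(pages, pageCount);
    }
}
```

For example, a 5,000,000,000-byte TIFF with 1,000 pages averages 5,000,000 bytes per page, so a 1,000,000,000-byte target gives 200 pages per chunk. In the step 2 sample you could pass new FileInfo(inputPath).Length (from System.IO) and allPages.Count, then still lower the result if a chunk comes out close to the limit.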