IronOCR on .NET: Large TIFF Files Fail to Load (Files Over 2 GB)
IronOCR loads image input into a single in-memory buffer through its internal AnyBitmap type. That buffer is indexed by a 32-bit integer, so it caps at roughly 2 GB no matter how much system memory is available. A TIFF file larger than 2 GB exceeds this limit and fails to load. The fix is to split the TIFF into sub-2 GB chunks and pass each chunk to IronOCR as a byte array.
- Language/Runtime: .NET (cross-platform — the limit is a runtime constraint, not OS-specific)
OcrInput.LoadImage(filePath) creates an AnyBitmap internally and materializes the whole image into one buffer. Because that buffer is indexed by int, it is effectively limited to about 2 GB regardless of available system memory. A TIFF slightly over that limit cannot be held, so the load fails. Note that LoadImage(filePath) currently returns zero loaded pages instead of throwing a clear exception — this silent failure is a known issue and a clearer exception is planned. The underlying single-buffer design is an architectural limit; native per-page TIFF streaming, which would avoid buffering the whole file, is not yet available.
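Because the load currently fails silently, it can help to check the file size up front and raise a descriptive error yourself. The following is a minimal sketch, not part of IronOCR: the `TiffSizeGuard` class and its members are hypothetical names, and using `int.MaxValue` bytes as the threshold is an assumption based on the "roughly 2 GB" int-indexed buffer described above. On-disk size is only a proxy for what the library buffers, so treat a pass as "not obviously too large" rather than a guarantee.

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration; not an IronOCR API.
static class TiffSizeGuard
{
    // Assumption: an int-indexed buffer tops out at int.MaxValue bytes
    // (2,147,483,647), which matches the article's "roughly 2 GB".
    public const long AssumedBufferLimitBytes = int.MaxValue;

    // True when the given size already exceeds the assumed limit.
    public static bool ExceedsBufferLimit(long fileSizeBytes) =>
        fileSizeBytes > AssumedBufferLimitBytes;

    // Throws a descriptive exception instead of letting the load fail silently.
    public static void EnsureLoadable(string path)
    {
        long size = new FileInfo(path).Length;
        if (ExceedsBufferLimit(size))
        {
            throw new InvalidOperationException(
                $"'{path}' is {size:N0} bytes, above the assumed " +
                $"{AssumedBufferLimitBytes:N0}-byte limit. Split it into chunks first.");
        }
    }
}
```

Call `TiffSizeGuard.EnsureLoadable(path)` before `LoadImage(filePath)`. As a second line of defense, treat a page count of zero after loading as a failure, since that is how the problem currently surfaces.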
- Recommended: Add Magick.NET so you can split the TIFF before handing it to IronOCR. Pick the variant that matches your project (Q8/Q16, AnyCPU/x64).
    dotnet add package Magick.NET-Q16-AnyCPU
- Open the TIFF as a MagickImageCollection, split it into page-count chunks that each stay under 2 GB, convert each chunk to a byte array, and load it with OcrInput.LoadImage(byte[]), not the file path.
    using System;
    using System.Linq;
    using ImageMagick;
    using IronOcr;

    var inputPath = "input.tiff";
    var pagesPerChunk = 100; // tune so each encoded chunk stays well under 2 GB

    // Open the multi-page TIFF with Magick.NET rather than IronOCR.
    using var allPages = new MagickImageCollection(inputPath);
    int chunkNumber = 1;
    for (int i = 0; i < allPages.Count; i += pagesPerChunk)
    {
        // Copy the next batch of pages into its own collection.
        using var chunk = new MagickImageCollection();
        for (int j = i; j < Math.Min(i + pagesPerChunk, allPages.Count); j++)
        {
            chunk.Add(allPages[j].Clone());
        }

        // Request LZW compression for every page in the chunk.
        foreach (var image in chunk)
        {
            image.SetCompression(CompressionMethod.LZW);
        }

        // Encode the chunk in memory and load the bytes, not a file path.
        var chunkBytes = chunk.ToByteArray();
        using (var ocrInput = new OcrInput())
        {
            ocrInput.LoadImage(chunkBytes);
            var pages = ocrInput.GetPages().ToList();
            Console.WriteLine($"Loaded pages: {pages.Count}");
            var result = new IronTesseract().Read(ocrInput);
            Console.WriteLine("OCR Text Length: " + (result.Text?.Length ?? 0));
        }
        Console.WriteLine($"Chunk {chunkNumber} processed");
        chunkNumber++;
    }
- Tune pagesPerChunk for your data. Lower it if a chunk approaches 2 GB or if memory is tight; raise it for smaller pages to reduce overhead.
- Account for Magick.NET in your deployment. It ships ImageMagick native binaries, which add to package size and bring a native dependency footprint.
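One way to pick a starting value for pagesPerChunk is to divide a byte budget by the average page size. The sketch below is an illustration only: `ChunkPlanner` and `EstimatePagesPerChunk` are hypothetical names, and the budget you pass in (for example 1 GiB, to leave headroom under the limit) is an assumption to adjust. The average is taken from the on-disk file, so it is a rough proxy; the re-encoded chunk can be larger or smaller, and checking `chunkBytes.LongLength` at runtime remains worthwhile.

```csharp
using System;

// Hypothetical helper for illustration; not part of IronOCR or Magick.NET.
static class ChunkPlanner
{
    // Estimates how many pages fit into one chunk under a byte budget,
    // using the average on-disk bytes per page as a rough proxy.
    public static int EstimatePagesPerChunk(long totalBytes, int pageCount, long budgetBytes)
    {
        if (totalBytes <= 0) throw new ArgumentOutOfRangeException(nameof(totalBytes));
        if (pageCount <= 0) throw new ArgumentOutOfRangeException(nameof(pageCount));
        if (budgetBytes <= 0) throw new ArgumentOutOfRangeException(nameof(budgetBytes));

        // Ceiling division so the average is never underestimated.
        long averagePageBytes = (totalBytes + pageCount - 1) / pageCount;
        long pages = budgetBytes / averagePageBytes;

        // At least one page per chunk, never more than the document has.
        return (int)Math.Clamp(pages, 1L, (long)pageCount);
    }
}
```

For a 3 GiB file with 600 pages and a 1 GiB budget, `ChunkPlanner.EstimatePagesPerChunk(3L * 1024 * 1024 * 1024, 600, 1L * 1024 * 1024 * 1024)` returns 199. Pages in a real TIFF rarely share one size, so start below the estimate if a few pages are much larger than the rest.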