Extract Data From Files

Use our OCR API to extract text from files in the following supported formats:

  • Images: JPEG, PNG, GIF, HEIC, SVG, WEBP, TIFF

  • Microsoft Office: DOC, DOCX, XLS, XLSX, PPT, PPTX

  • Open Office: ODS, ODT, ODP

  • PDF: Both digital and image-only files are supported. PDFs may be single or multi-page and may contain multiple document types (e.g., 3 ID pages plus 1 invoice).

  • ZIP: Archives may contain only the supported file formats

  • MSG: Outlook message files and their contents (e.g., an email's PDF attachments)

  • Audio: MP3, OGG, FLAC, WAV

  • Video: MOV, MP4, AVI, WMV, M4V

  • Text: CSV

You may send the document as a data URI that carries its MIME type and its binary content in Base64 encoding:

  • {"document":"data:image/jpeg;base64,/9j/4AAQSkZJR..."} for a JPEG or

  • {"document":"data:application/pdf;base64,JVBERi0x..."} for a PDF

or simply provide the URL of the document:

  • {"url":"https://base64.ai/static/content/features/data-extraction/models/1.png"}

  • {"url":"https://base64.ai/static/content/features/data-extraction/models/health/sbc/1.pdf"}

The Content-Type: application/json header is always required. Password-protected files are not supported.
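
As a rough illustration of the two request bodies described above, the following Python sketch builds the JSON for a Base64 data URI and for a document URL. The helper names (guess_mime_type, build_document_payload, build_url_payload) are hypothetical and not part of the API. The endpoint and any authentication headers are not covered in this section, so the sketch stops at producing the body and the required Content-Type header.

```python
import base64
import json
import mimetypes

# The only header this section states is required; authentication, if any,
# is documented elsewhere and is deliberately left out here.
HEADERS = {"Content-Type": "application/json"}


def guess_mime_type(filename: str) -> str:
    """Guess a MIME type from the file extension (hypothetical helper)."""
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None:
        raise ValueError(f"cannot determine MIME type for {filename!r}")
    return mime_type


def build_document_payload(data: bytes, mime_type: str) -> str:
    """Return a JSON body that carries the file as a Base64 data URI."""
    encoded = base64.b64encode(data).decode("ascii")
    return json.dumps({"document": f"data:{mime_type};base64,{encoded}"})


def build_url_payload(url: str) -> str:
    """Return a JSON body that points the API at a document URL."""
    return json.dumps({"url": url})


# Usage: the first four bytes of a JPEG file stand in for real file content.
jpeg_body = build_document_payload(b"\xff\xd8\xff\xe0", "image/jpeg")
url_body = build_url_payload(
    "https://base64.ai/static/content/features/data-extraction/models/1.png"
)
```

Either string can then be sent as the request body together with HEADERS by whichever HTTP client you prefer. When reading from disk, pass the file's bytes and the result of guess_mime_type(path) to build_document_payload.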