Supported File Types
This document lists all file types supported by the crawler.dev API for text extraction. In general, the API aims to extract text from any file type. If you can open a file on your computer and see text, the API can extract that text.
Note: Audio file formats are excluded as the API focuses on visual text conversion, not audio transcription.
Document Formats
- application/pdf
Microsoft Office Documents
Word Documents
- application/msword (Word 97-2003)
- application/vnd.ms-word.document.macroenabled.12 (Word 2007+ with macros)
- application/vnd.ms-word.template.macroenabled.12 (Word template with macros)
- application/vnd.openxmlformats-officedocument.wordprocessingml.document (Word 2007+)
- application/vnd.openxmlformats-officedocument.wordprocessingml.template (Word template)
- application/vnd.ms-word2006ml (Word 2006 XML)
- application/vnd.ms-wordml (Word XML)
Excel Spreadsheets
- application/vnd.ms-excel (Excel 97-2003)
- application/vnd.ms-excel.sheet.2 (Excel 2.0)
- application/vnd.ms-excel.sheet.3 (Excel 3.0)
- application/vnd.ms-excel.sheet.4 (Excel 4.0)
- application/vnd.ms-excel.sheet.binary.macroenabled.12 (Excel binary with macros)
- application/vnd.ms-excel.sheet.macroenabled.12 (Excel 2007+ with macros)
- application/vnd.ms-excel.template.macroenabled.12 (Excel template with macros)
- application/vnd.ms-excel.addin.macroenabled.12 (Excel add-in with macros)
- application/vnd.ms-excel.workspace.3 (Excel workspace 3.0)
- application/vnd.ms-excel.workspace.4 (Excel workspace 4.0)
- application/vnd.openxmlformats-officedocument.spreadsheetml.sheet (Excel 2007+)
- application/vnd.openxmlformats-officedocument.spreadsheetml.template (Excel template)
- application/vnd.ms-spreadsheetml (Excel XML)
PowerPoint Presentations
- application/vnd.ms-powerpoint (PowerPoint 97-2003)
- application/vnd.ms-powerpoint.presentation.macroenabled.12 (PowerPoint 2007+ with macros)
- application/vnd.ms-powerpoint.slide.macroenabled.12 (PowerPoint slide with macros)
- application/vnd.ms-powerpoint.slideshow.macroenabled.12 (PowerPoint slideshow with macros)
- application/vnd.ms-powerpoint.template.macroenabled.12 (PowerPoint template with macros)
- application/vnd.ms-powerpoint.addin.macroenabled.12 (PowerPoint add-in with macros)
- application/vnd.openxmlformats-officedocument.presentationml.presentation (PowerPoint 2007+)
- application/vnd.openxmlformats-officedocument.presentationml.template (PowerPoint template)
- application/vnd.openxmlformats-officedocument.presentationml.slide (PowerPoint slide)
- application/vnd.openxmlformats-officedocument.presentationml.slideshow (PowerPoint slideshow)
Other Microsoft Formats
- application/rtf (Rich Text Format)
- application/vnd.ms-outlook (Outlook message)
- application/vnd.ms-outlook-pst (Outlook PST file)
- application/vnd.ms-project (Microsoft Project)
- application/vnd.ms-visio.drawing (Visio drawing)
- application/vnd.ms-visio.drawing.macroenabled.12 (Visio drawing with macros)
- application/vnd.ms-visio.template (Visio template)
- application/vnd.ms-visio.template.macroenabled.12 (Visio template with macros)
- application/vnd.ms-visio.stencil (Visio stencil)
- application/vnd.ms-visio.stencil.macroenabled.12 (Visio stencil with macros)
- application/vnd.ms-xpsdocument (XPS document)
- application/x-msaccess (Microsoft Access)
- application/x-mspublisher (Microsoft Publisher)
- application/sldworks (SolidWorks)
- application/vnd.ms-htmlhelp (CHM help file)
- application/x-chm (CHM help file)
- application/chm (CHM help file)
- application/onenote; format=one (OneNote)
- application/vnd.ms-tnef (TNEF)
- application/x-tnef (TNEF)
- application/ms-tnef (TNEF)
- application/x-ms-owner (Microsoft Owner file)
OpenDocument Formats (LibreOffice, OpenOffice)
OpenDocument Text
- application/vnd.oasis.opendocument.text (ODT)
- application/vnd.oasis.opendocument.text-template (ODT template)
- application/vnd.oasis.opendocument.text-master (ODT master)
- application/vnd.oasis.opendocument.text-web (ODT web)
- application/x-vnd.oasis.opendocument.text (ODT)
- application/x-vnd.oasis.opendocument.text-template (ODT template)
- application/x-vnd.oasis.opendocument.text-master (ODT master)
- application/x-vnd.oasis.opendocument.text-web (ODT web)
OpenDocument Spreadsheet
- application/vnd.oasis.opendocument.spreadsheet (ODS)
- application/vnd.oasis.opendocument.spreadsheet-template (ODS template)
- application/x-vnd.oasis.opendocument.spreadsheet (ODS)
- application/x-vnd.oasis.opendocument.spreadsheet-template (ODS template)
OpenDocument Presentation
- application/vnd.oasis.opendocument.presentation (ODP)
- application/vnd.oasis.opendocument.presentation-template (ODP template)
- application/x-vnd.oasis.opendocument.presentation (ODP)
- application/x-vnd.oasis.opendocument.presentation-template (ODP template)
OpenDocument Graphics
- application/vnd.oasis.opendocument.graphics (ODG)
- application/vnd.oasis.opendocument.graphics-template (ODG template)
- application/x-vnd.oasis.opendocument.graphics (ODG)
- application/x-vnd.oasis.opendocument.graphics-template (ODG template)
OpenDocument Other
- application/vnd.oasis.opendocument.chart (ODC)
- application/vnd.oasis.opendocument.chart-template (ODC template)
- application/x-vnd.oasis.opendocument.chart (ODC)
- application/x-vnd.oasis.opendocument.chart-template (ODC template)
- application/vnd.oasis.opendocument.formula (ODF)
- application/vnd.oasis.opendocument.formula-template (ODF template)
- application/x-vnd.oasis.opendocument.formula (ODF)
- application/x-vnd.oasis.opendocument.formula-template (ODF template)
- application/vnd.oasis.opendocument.image (ODI)
- application/vnd.oasis.opendocument.image-template (ODI template)
- application/x-vnd.oasis.opendocument.image (ODI)
- application/x-vnd.oasis.opendocument.image-template (ODI template)
Flat OpenDocument
- application/vnd.oasis.opendocument.flat.text (Flat ODT)
- application/vnd.oasis.opendocument.flat.spreadsheet (Flat ODS)
- application/vnd.oasis.opendocument.flat.presentation (Flat ODP)
Apple iWork Formats
- application/vnd.apple.keynote (Keynote)
- application/vnd.apple.keynote.13 (Keynote 13)
- application/vnd.apple.keynote.18 (Keynote 18)
- application/vnd.apple.pages (Pages)
- application/vnd.apple.pages.13 (Pages 13)
- application/vnd.apple.pages.18 (Pages 18)
- application/vnd.apple.numbers (Numbers)
- application/vnd.apple.numbers.13 (Numbers 13)
- application/vnd.apple.numbers.18 (Numbers 18)
- application/vnd.apple.iwork (Generic iWork)
Other Document Formats
- application/epub+zip (EPUB)
- application/x-ibooks+zip (iBooks)
- application/vnd.wordperfect; version=5.0 (WordPerfect 5.0)
- application/vnd.wordperfect; version=5.1 (WordPerfect 5.1)
- application/vnd.wordperfect; version=6.x (WordPerfect 6.x)
- application/x-quattro-pro; version=9 (Quattro Pro 9)
- application/x-hwp-v5 (Hangul Word Processor v5)
- application/x-mif (FrameMaker MIF)
- application/vnd.mif (FrameMaker MIF)
- application/x-maker (FrameMaker)
- application/x-fictionbook+xml (FictionBook)
- application/x-prt (PRT)
Text Formats
Plain Text
- text/plain
- text/csv (CSV)
- text/tsv (TSV)
Markup Languages
- text/html (HTML)
- application/xhtml+xml (XHTML)
- application/vnd.wap.xhtml+xml (Mobile XHTML)
- application/x-asp (ASP)
- text/xml (XML)
- application/xml (XML)
- image/svg+xml (SVG)
Code Formats
- text/x-c++src (C++ source)
- text/x-groovy (Groovy source)
- text/x-java-source (Java source)
Other Text Formats
- text/vnd.iptc.anpa (IPTC ANPA)
- application/x-xliff+xml (XLIFF 1.2)
- application/x-xliff+zip (XLZ)
Image Formats
Raster Images
- image/png (PNG)
- image/jpeg (JPEG)
- image/jpg (JPEG)
- image/gif (GIF)
- image/bmp (BMP)
- image/x-ms-bmp (BMP)
- image/tiff (TIFF)
- image/webp (WebP)
- image/x-icon (ICO)
- image/vnd.wap.wbmp (WBMP)
- image/x-jbig2 (JBIG2)
- image/x-xcf (XCF)
- image/jp2 (JPEG 2000)
- image/jpx (JPEG 2000)
- image/bpg (BPG)
- image/x-bpg (BPG)
- image/heic (HEIC)
- image/heif (HEIF)
- image/heic-sequence (HEIC sequence)
- image/heif-sequence (HEIF sequence)
- image/icns (macOS icon)
- image/x-portable-pixmap (PPM)
Vector Images
- image/svg+xml (SVG)
- image/vnd.dwg (AutoCAD DWG)
- image/emf (Enhanced Metafile)
- image/wmf (Windows Metafile)
- image/vnd.adobe.photoshop (PSD)
OCR Formats
- image/ocr-jpeg (OCR JPEG)
- image/ocr-png (OCR PNG)
- image/ocr-tiff (OCR TIFF)
- image/ocr-gif (OCR GIF)
- image/ocr-bmp (OCR BMP)
- image/ocr-jp2 (OCR JPEG 2000)
- image/ocr-jpx (OCR JPEG 2000)
- image/ocr-x-portable-pixmap (OCR PPM)
Archive and Compression Formats
Archive Formats
- application/zip (ZIP)
- application/x-tar (TAR)
- application/x-7z-compressed (7-Zip)
- application/x-rar-compressed (RAR)
- application/x-cpio (CPIO)
- application/x-arj (ARJ)
- application/x-archive (Archive)
- application/java-archive (JAR)
Compression Formats
- application/gzip (GZIP)
- application/x-gzip (GZIP)
- application/x-bzip2 (BZIP2)
- application/x-bzip (BZIP)
- application/x-compress (Compress)
- application/x-lzma (LZMA)
- application/x-xz (XZ)
- application/x-lz4 (LZ4)
- application/x-snappy (Snappy)
- application/x-brotli (Brotli)
- application/zlib (Zlib)
- application/deflate64 (Deflate64)
- application/x-java-pack200 (Pack200)
Email Formats
- message/rfc822 (Email message)
- application/mbox (Mailbox)
Database Formats
- application/x-dbf (dBASE)
- application/x-sas-data (SAS data)
- application/x-matlab-data (MATLAB)
Feed Formats
- application/atom+xml (Atom)
- application/rss+xml (RSS)
Executable and Binary Formats
- application/x-msdownload (Windows executable)
- application/x-sharedlib (Shared library)
- application/x-elf (ELF)
- application/x-object (Object file)
- application/x-executable (Executable)
- application/x-coredump (Core dump)
- application/java-vm (Java class)
Other Formats
- application/x-plist (Property list)
- application/x-bplist (Binary property list)
- application/x-bplist-itunes (iTunes binary property list)
- application/x-bplist-memgraph (Memory graph)
- application/x-bplist-webarchive (Web archive)
- application/applefile (AppleSingle)
- application/dif+xml (Data Interchange Format)
- application/pkcs7-signature (PKCS7 signature)
- application/pkcs7-mime (PKCS7 MIME)
- application/timestamped-data (Timestamped data)
- application/kate (Kate subtitle)
- application/ogg (OGG container)
File Type Detection
The API automatically detects file types using:
- Content-Type header (if provided)
- File extension (if available)
- Magic bytes (file signature detection)
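The three signals above can be combined in a simple fallback chain. The following is a minimal client-side sketch in Python, not the API's actual implementation: the function names (`guess_mime`, `sniff_magic`), the small signature table, and the order of precedence (header first, then extension, then magic bytes) are all assumptions made for illustration.

```python
import mimetypes
from typing import Optional

# A few well-known file signatures. This is an illustrative subset,
# not the table the API uses.
MAGIC_SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"PK\x03\x04", "application/zip"),
    (b"\x1f\x8b", "application/gzip"),
]

# Header values that carry no real type information.
GENERIC_TYPES = {"", "application/octet-stream"}


def sniff_magic(data: bytes) -> Optional[str]:
    """Return a MIME type if the data starts with a known signature."""
    for signature, mime in MAGIC_SIGNATURES:
        if data.startswith(signature):
            return mime
    return None


def guess_mime(content_type: Optional[str] = None,
               filename: Optional[str] = None,
               data: bytes = b"") -> str:
    """Guess a MIME type from header, extension, then magic bytes.

    The precedence used here is an assumption for this sketch.
    """
    # 1. Content-Type header, with parameters such as charset stripped.
    if content_type:
        mime = content_type.split(";", 1)[0].strip().lower()
        if mime not in GENERIC_TYPES:
            return mime
    # 2. File extension, using the standard library's mapping.
    if filename:
        mime, _encoding = mimetypes.guess_type(filename)
        if mime:
            return mime
    # 3. Magic bytes, falling back to a generic binary type.
    return sniff_magic(data) or "application/octet-stream"
```

For example, `guess_mime("application/octet-stream", None, b"%PDF-1.7")` skips the uninformative header and falls through to the signature check. Note that this sketch strips header parameters, so parameterized types in the list above (such as `application/vnd.wordperfect; version=5.1`) would need separate handling.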
Notes
- Audio formats are excluded: The API focuses on visual text extraction and does not support audio file transcription.
- OCR support: The API supports OCR (Optical Character Recognition) for images, allowing text extraction from scanned documents and images containing text.
- Archive handling: The API can extract text from files within archives (ZIP, TAR, etc.) without requiring manual extraction.
- Format variations: Many formats have multiple MIME types or version-specific types. The API handles these variations automatically.
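Because the API handles MIME type variations itself, no client-side normalization is required. If you want to group files by format in your own code, though, a small alias table can help. The sketch below is hypothetical: the alias pairs are taken from entries in the list above that share the same label, while the choice of which variant counts as canonical, and the name `canonical_mime`, are assumptions.

```python
# Alias pairs drawn from entries in this document that share a label.
# Which side is "canonical" is an assumption made for this sketch.
MIME_ALIASES = {
    "image/jpg": "image/jpeg",
    "image/x-ms-bmp": "image/bmp",
    "image/x-bpg": "image/bpg",
    "application/x-gzip": "application/gzip",
    "application/x-chm": "application/vnd.ms-htmlhelp",
    "application/chm": "application/vnd.ms-htmlhelp",
    "application/x-tnef": "application/vnd.ms-tnef",
    "application/ms-tnef": "application/vnd.ms-tnef",
    "text/xml": "application/xml",
}

# OpenDocument types are listed with and without the "x-" prefix.
ODF_X_PREFIX = "application/x-vnd.oasis.opendocument."


def canonical_mime(mime: str) -> str:
    """Map a MIME type variant to a single canonical spelling."""
    base = mime.strip().lower()
    if base.startswith(ODF_X_PREFIX):
        # Drop the "x-" so both OpenDocument spellings compare equal.
        base = "application/" + base[len("application/x-"):]
    return MIME_ALIASES.get(base, base)
```

Types that have no known alias pass through unchanged, so the function is safe to apply to every value before comparing or counting.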
Format Support Updates
File type support is updated continuously as new formats are added to the API. The list above reflects the currently supported formats.
