Learn how to extract text from any kind of file or URL with the crawler.dev JavaScript SDK.
Prerequisites
To get the most out of this guide, you'll need to:
Create a free crawler.dev account
Create an API key
Installation
Install the crawler.dev JavaScript SDK using npm:
npm install crawler.dev
Quick Start
Here's how to get started with text extraction using JavaScript:
import fs from 'fs';
import CrawlerDev from 'crawler.dev';

const client = new CrawlerDev({
  apiKey: process.env['API_CRAWLER_DEV_SDKS_API_KEY'], // This is the default and can be omitted
});

// Extract text from a file
const fileExtraction = await client.extract.fromFile({
  file: fs.createReadStream('path/to/file'),
});

// Extract text from a URL
const urlExtraction = await client.extract.fromUrl({
  url: 'https://example.com',
});

console.log('File content type:', fileExtraction.contentType);
console.log('URL text:', urlExtraction.text);
Features
Full TypeScript support with comprehensive type definitions
Works in Node.js and browser environments
Automatic retries and error handling
Built-in request/response validation
Async/await support
Examples
// Example: extract text from a PDF file
import fs from 'fs';
import CrawlerDev from 'crawler.dev';

const client = new CrawlerDev({
  apiKey: process.env['API_CRAWLER_DEV_SDKS_API_KEY'],
});

// Stream the PDF from disk and send it for extraction
const pdfFile = fs.createReadStream('document.pdf');
const result = await client.extract.fromFile({ file: pdfFile });

console.log(result.text);
console.log('Content type:', result.contentType);
// Example: extract text from several URLs in parallel
import CrawlerDev from 'crawler.dev';

const client = new CrawlerDev({
  apiKey: process.env['API_CRAWLER_DEV_SDKS_API_KEY'],
});

const urls = [
  'https://example.com/page1',
  'https://example.com/page2',
  'https://example.com/page3',
];

// Start all requests at once and wait for every one to finish
const results = await Promise.all(
  urls.map((url) => client.extract.fromUrl({ url }))
);

results.forEach((result, index) => {
  console.log(`Text from ${urls[index]}:`, result.text);
});
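Note that `Promise.all` rejects as soon as any single request fails, which discards the results of the URLs that succeeded. A minimal sketch of a more forgiving batch, built on the standard `Promise.allSettled`, might look like the following. The `extractAll` helper is hypothetical and not part of the SDK; it assumes only that `client.extract.fromUrl({ url })` resolves to an object with a `text` property, as in the example above.

```javascript
// Hypothetical helper (not part of the SDK): extract text from many URLs
// without letting one failure reject the whole batch.
async function extractAll(client, urls) {
  // allSettled waits for every request, whether it succeeds or fails.
  const settled = await Promise.allSettled(
    urls.map((url) => client.extract.fromUrl({ url }))
  );

  // Pair each outcome with the URL it belongs to.
  return settled.map((outcome, index) =>
    outcome.status === 'fulfilled'
      ? { url: urls[index], ok: true, text: outcome.value.text }
      : { url: urls[index], ok: false, error: outcome.reason }
  );
}
```

Called as `const results = await extractAll(client, urls)`, each entry carries either `text` or `error`, so failed URLs can be logged or retried separately.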
Error Handling
When a request fails, the SDK throws an error. API errors are instances of CrawlerDev.APIError and carry the HTTP status and an error name:
import CrawlerDev from 'crawler.dev';

const client = new CrawlerDev({
  apiKey: process.env['API_CRAWLER_DEV_SDKS_API_KEY'],
});

try {
  const result = await client.extract.fromUrl({
    url: 'https://example.com',
  });
  console.log(result.text);
} catch (error) {
  if (error instanceof CrawlerDev.APIError) {
    console.log(error.status); // 400, 401, 429, etc.
    console.log(error.name); // BadRequestError, AuthenticationError, RateLimitError, etc.

    if (error.status === 401) {
      console.error('Invalid API key');
    } else if (error.status === 429) {
      console.error('Rate limit exceeded');
    } else {
      console.error('API error:', error.message);
    }
  } else {
    console.error('An error occurred:', error.message);
  }
}
Error codes are as follows:
Status Code    Error Type
400            BadRequestError
401            AuthenticationError
403            PermissionDeniedError
404            NotFoundError
422            UnprocessableEntityError
429            RateLimitError
>=500          InternalServerError
N/A            APIConnectionError
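The table above can be expressed as a small lookup, which may be handy for logging or for deciding whether a failed request is worth trying again. This is a sketch only: `errorTypeForStatus` and `isLikelyTransient` are hypothetical helpers, not SDK functions. Falling back to the generic `APIError` name for unlisted codes, and treating 429, 5xx, and connection failures as transient, are assumptions based on common HTTP practice rather than documented SDK behavior.

```javascript
// Status codes and error type names copied from the table above.
const ERROR_TYPES = {
  400: 'BadRequestError',
  401: 'AuthenticationError',
  403: 'PermissionDeniedError',
  404: 'NotFoundError',
  422: 'UnprocessableEntityError',
  429: 'RateLimitError',
};

// Hypothetical helper: map an HTTP status (or a missing one) to an error type name.
function errorTypeForStatus(status) {
  // No status at all corresponds to the "N/A" row: the request never got a response.
  if (status === undefined || status === null) return 'APIConnectionError';
  if (status >= 500) return 'InternalServerError';
  // Assumption: codes not listed in the table fall back to the generic name.
  return ERROR_TYPES[status] ?? 'APIError';
}

// Hypothetical helper: guess whether trying again later could succeed.
// Assumption: rate limits, server errors, and connection failures are transient.
function isLikelyTransient(status) {
  const type = errorTypeForStatus(status);
  return (
    type === 'RateLimitError' ||
    type === 'InternalServerError' ||
    type === 'APIConnectionError'
  );
}
```

Inside a `catch` block, `errorTypeForStatus(error.status)` gives a label for logs, and `isLikelyTransient(error.status)` separates failures such as an invalid API key (401) from ones that may clear up on their own.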