npm library
The contextractor package is both a command-line tool and a TypeScript library you can import into your own Node.js application. It uses the Rust port of Trafilatura for content extraction and Crawlee, a TypeScript crawler that drives Playwright, for crawling.
Install
npm install contextractor
npx playwright install chromium
Requires Node 22+. Playwright Chromium is needed for browser-based crawling.
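If you embed the library in a larger application, a preflight check can fail fast on an unsupported runtime. The sketch below is only an illustration: `meetsNodeRequirement` is a hypothetical helper, not something contextractor exports, and it assumes a plain `major.minor.patch` version string.

```typescript
// Hypothetical helper, not part of contextractor.
// Returns true when a Node.js version string such as "22.3.0" or "v22.3.0"
// has a major version of at least `minMajor`.
function meetsNodeRequirement(version: string, minMajor = 22): boolean {
  // Strip an optional leading "v" and keep only the major component.
  const majorText = version.replace(/^v/, "").split(".")[0] ?? "";
  const major = Number.parseInt(majorText, 10);
  // parseInt yields NaN for non-numeric input; isInteger rejects it.
  return Number.isInteger(major) && major >= minMajor;
}
```

In an application you would typically pass `process.versions.node` and throw a descriptive error when the function returns `false`.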
Exports
| Export | Description |
|---|---|
| `buildProgram()` | Build the Commander program the CLI uses; run any subcommand with `parseAsync` |
| `runExportAction(opts)` | Export stored content to an output directory; returns an `ExportResult` |
| `configureStorage(dir)` | Point Crawlee storage at a directory before running |
| `resolveStorageDir()` | Resolve the storage directory using the same order as the CLI |
| `Dataset`, `KeyValueStore`, `Configuration` | Crawlee storage classes, re-exported for reading results |
Run extraction programmatically
Drive the CLI program from code, then read the results back from the dataset:
import {
  buildProgram,
  configureStorage,
  Dataset,
  resolveStorageDir,
} from "contextractor";

// Resolve the storage directory the same way the CLI does, then point Crawlee at it.
const storageDir = resolveStorageDir();
configureStorage(storageDir);

// Run the `extract` subcommand exactly as the CLI would.
const program = buildProgram();
await program.parseAsync([
  "node",
  "contextractor",
  "extract",
  "https://example.com/",
  "--save-destination",
  "dataset",
]);

// Read the extracted records back from the default dataset.
const ds = await Dataset.open("default");
const page = await ds.getData({ limit: 100 });
console.log(`Extracted ${page.count} item(s)`);
Routing output to the dataset destination inlines the extracted content in each record, so Dataset.open(...) can read it back directly. The default destination is the key-value store, where each record references its content by key.
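The difference between the two destinations can be sketched as a small resolver that returns inline content when a record carries it and otherwise looks the content up by key. This is a sketch under stated assumptions, not contextractor's actual schema: the field names `content` and `contentKey`, the `ContentStore` interface, and the in-memory store are all hypothetical. In real code the store role would be played by a `KeyValueStore` instance.

```typescript
// Hypothetical record shape: `content` and `contentKey` are assumed field
// names for illustration, not contextractor's documented schema.
interface ExtractionRecord {
  url: string;
  content?: string; // inline content (dataset destination)
  contentKey?: string; // reference into the key-value store
}

// Minimal stand-in for the read side of a key-value store.
interface ContentStore {
  getValue(key: string): Promise<string | null>;
}

// Prefer inline content; otherwise fetch by key; otherwise report nothing.
async function resolveContent(
  record: ExtractionRecord,
  store: ContentStore,
): Promise<string | null> {
  if (record.content !== undefined) return record.content;
  if (record.contentKey !== undefined) return store.getValue(record.contentKey);
  return null;
}

// In-memory store used only to demonstrate the lookup path.
function memoryStore(entries: Record<string, string>): ContentStore {
  const map = new Map(Object.entries(entries));
  return { getValue: async (key) => map.get(key) ?? null };
}
```

Keeping the lookup behind one function means calling code does not need to know which destination was used when the records were saved.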
Export stored content
runExportAction reads the dataset record index and, for every success record, writes one file per saved format to the output directory. The content comes either from the record itself (inline) or from the key-value-store blob fetched by key. File names are derived from the record title, falling back to its URL and then to page, and a manifest.json listing every record is written alongside the files.
import { runExportAction } from "contextractor";

const result = await runExportAction({
  outputDir: "./contextractor-output",
  dataset: "default",
  keyValueStore: "default",
});
console.log(`Wrote ${result.filesWritten.length} file(s) to ${result.outputDir}`);
ExportOpts accepts outputDir, dataset, keyValueStore, and storageDir. ExportResult reports outputDir, filesWritten, recordsTotal, and manifestPath.
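The file-naming fallback described above (title, then URL, then page) might look roughly like the following. This is an illustrative sketch only: `slugify` and `deriveFileName` are hypothetical names, and the slug rules (lowercasing, hyphen separators, stripping the URL scheme) are assumptions rather than the library's actual behavior.

```typescript
// Hypothetical slug rule: lowercase, collapse non-alphanumeric runs into
// single hyphens, and trim hyphens from both ends.
function slugify(text: string): string {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Try the title first, then the URL without its scheme, then fall back to "page".
function deriveFileName(
  record: { title?: string; url?: string },
  extension: string,
): string {
  const candidates = [record.title, record.url?.replace(/^[a-z]+:\/\//i, "")];
  for (const candidate of candidates) {
    const slug = slugify(candidate ?? "");
    if (slug) return `${slug}.${extension}`;
  }
  return `page.${extension}`;
}
```

This sketch does not handle two records that produce the same name; a real exporter would need some way to keep such files from overwriting each other.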
Read Crawlee storage directly
import { Dataset, type DatasetContent, KeyValueStore } from "contextractor";
const ds = await Dataset.open("my-dataset");
await ds.forEach((item: DatasetContent) => console.log(item));
const kvs = await KeyValueStore.open("default");
const value = await kvs.getValue("my-key");
Where to go next
- npm CLI — the command-line tool, flag reference, and JSON config.
- Apify Actor — run extraction at scale on the Apify platform.
- Playground — paste HTML and preview extraction results in your browser.
Updated: June 3, 2026