Export extracted content to files

Contextractor playground

Preview extraction results, adjust extraction settings, and generate ready-to-run commands.

NPM package CLI

Export stored extraction results to files using the contextractor CLI. Requires Node.js 22 or later.
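
Because the CLI needs Node 22 or newer, it can help to verify the runtime before running an export. The following is a minimal sketch of such a check, not part of Contextractor itself; the function name `meetsNodeRequirement` is hypothetical and only the version threshold comes from the text above.

```typescript
// Hypothetical helper (not part of the contextractor CLI): returns true when
// a Node.js version string such as "22.3.0" or "v22.3.0" has a major version
// greater than or equal to minMajor.
function meetsNodeRequirement(version: string, minMajor = 22): boolean {
  const majorPart = version.replace(/^v/, "").split(".")[0] ?? "";
  const major = Number.parseInt(majorPart, 10);
  return Number.isInteger(major) && major >= minMajor;
}

// Read the running Node.js version without depending on Node type definitions.
const runtime = globalThis as unknown as {
  process?: { versions?: { node?: string } };
};
const current = runtime.process?.versions?.node ?? "";

if (meetsNodeRequirement(current)) {
  console.log(`Node ${current} satisfies the Node 22+ requirement.`);
} else {
  console.log(`Node 22+ is required; found "${current || "unknown"}".`);
}
```

Keeping the comparison in a pure function makes it easy to reuse in a pre-export script or a CI step.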

What is Contextractor?

Contextractor extracts clean, readable content from any web page — stripping away navigation, ads, and boilerplate to leave just the text you need.

Extraction is handled by the Rust port of Trafilatura, and crawling by Crawlee, a TypeScript crawler that drives Playwright. Contextractor is well suited to building LLM training datasets, RAG pipelines, and research applications.

Run Contextractor at scale on Apify, or use the Playground to enter a URL and preview extraction results in your browser. The source code is available on GitHub.

What is Trafilatura?

Trafilatura is an open-source library that extracts the main content from web pages — article text, headings, and metadata — while stripping navigation, ads, sidebars, and footers. It uses a heuristic pipeline with fallback algorithms and consistently scores highest in independent extraction benchmarks. Contextractor runs the Rust port of Trafilatura through a napi-rs binding, paired with Crawlee and Playwright for crawling — same heuristics, no Python runtime required.
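
To illustrate the general idea of a pipeline with fallbacks, here is a toy sketch. It is not Trafilatura's algorithm or API: the names `Extractor`, `fromArticle`, `fromWholePage`, and `extractWithFallback` are hypothetical, and the regex-based extractors are deliberately naive stand-ins for real heuristics.

```typescript
// Hypothetical types and helpers for illustration only; not Trafilatura's API.
type Extractor = (html: string) => string | null;

// Remove tags and collapse whitespace.
const stripTags = (html: string): string =>
  html.replace(/<[^>]+>/g, " ").replace(/\s+/g, " ").trim();

// First attempt: text inside an <article> element, if one exists.
const fromArticle: Extractor = (html) => {
  const match = /<article[^>]*>([\s\S]*?)<\/article>/i.exec(html);
  return match ? stripTags(match[1] ?? "") : null;
};

// Fallback: all text in the document, minus <nav> and <footer> blocks.
const fromWholePage: Extractor = (html) =>
  stripTags(html.replace(/<(nav|footer)[^>]*>[\s\S]*?<\/\1>/gi, " "));

// Try each extractor in order and return the first result that is at least
// minLength characters long; return null if none qualifies.
function extractWithFallback(
  html: string,
  extractors: Extractor[],
  minLength = 20,
): string | null {
  for (const extract of extractors) {
    const text = extract(html);
    if (text !== null && text.length >= minLength) return text;
  }
  return null;
}

const page =
  "<nav>Home</nav><div>Body text without an article element.</div><footer>Footer</footer>";
console.log(extractWithFallback(page, [fromArticle, fromWholePage]));
```

The ordering expresses the design choice: the more precise extractor runs first, and the broader one is used only when the first returns nothing usable.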