Single page extraction - extract one URL

Contextractor playground

Preview extraction results, adjust extraction settings, and generate ready-to-run commands.

URL to extract

Extraction & output

Extraction mode

Content

Output

Page & browser

Applies to generated commands only; not used by the "Crawl URL" preview.

Browser behavior

Storage

Applies to generated commands only; not used by the "Crawl URL" preview.

Save


What is Contextractor?

Contextractor extracts clean, readable content from any web page, stripping away navigation, ads, and boilerplate to leave just the text you need.

It is built on the Rust port of Trafilatura for extraction, with Crawlee (a TypeScript crawler driving Playwright) handling the crawling. It is well suited to building LLM training datasets, RAG pipelines, and research applications.

Run Contextractor at scale on Apify, or use the Playground to enter a URL and preview extraction results in your browser. The source code is available on GitHub.

What is Trafilatura?

Trafilatura is an open-source library that extracts the main content from web pages (article text, headings, and metadata) while stripping navigation, ads, sidebars, and footers. It uses a heuristic pipeline with fallback algorithms and consistently scores highest in independent extraction benchmarks. Contextractor runs the Rust port of Trafilatura through a napi-rs binding, paired with Crawlee and Playwright for crawling: same heuristics, no Python runtime required.
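
To make the idea of a heuristic pipeline with a fallback more concrete, here is a minimal toy sketch using only the Python standard library. It is not Trafilatura's algorithm and not Contextractor's code: the function name `extract_main_text`, the `min_chars` threshold, and the list of boilerplate tags are all assumptions chosen for illustration. It first tries a strict heuristic (paragraphs inside `<article>` or `<main>`) and falls back to a looser one (all paragraphs outside boilerplate tags) when the strict pass finds too little text.

```python
from html.parser import HTMLParser

# Hypothetical boilerplate list for this toy example only.
BOILERPLATE_TAGS = {"nav", "header", "footer", "aside", "script", "style"}


class _TextCollector(HTMLParser):
    """Collects <p> text and notes whether it sits inside <article>/<main>."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # > 0 while inside a boilerplate tag
        self.main_depth = 0   # > 0 while inside <article> or <main>
        self.in_p = False
        self.buffer = []
        self.main_paragraphs = []
        self.all_paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag in BOILERPLATE_TAGS:
            self.skip_depth += 1
        elif tag in ("article", "main"):
            self.main_depth += 1
        elif tag == "p" and self.skip_depth == 0:
            self.in_p = True
            self.buffer = []

    def handle_endtag(self, tag):
        if tag in BOILERPLATE_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in ("article", "main"):
            self.main_depth = max(0, self.main_depth - 1)
        elif tag == "p" and self.in_p:
            # Normalize whitespace before storing the paragraph.
            text = " ".join("".join(self.buffer).split())
            if text:
                self.all_paragraphs.append(text)
                if self.main_depth > 0:
                    self.main_paragraphs.append(text)
            self.in_p = False

    def handle_data(self, data):
        if self.in_p and self.skip_depth == 0:
            self.buffer.append(data)


def extract_main_text(html, min_chars=40):
    """Try a strict heuristic first, then fall back to a looser one."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    # Primary heuristic: paragraphs inside <article> or <main>.
    primary = "\n".join(collector.main_paragraphs)
    if len(primary) >= min_chars:
        return primary
    # Fallback: every paragraph that is not inside a boilerplate tag.
    return "\n".join(collector.all_paragraphs)
```

Calling `extract_main_text(page_html)` on a page with semantic markup returns only the article paragraphs; on a page without `<article>` or `<main>`, the fallback still drops text inside navigation and footer elements. A real extractor weighs many more signals, such as link density and text length.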