Web content extraction tool
Contextractor
extracts clean, readable content from any webpage – powered by Trafilatura
Copy & paste
Upload
Upload an HTML file to extract content
Trafilatura config (JSON):
Reset to defaults
{
  "fast": false,
  "favorPrecision": false,
  "favorRecall": false,
  "includeComments": true,
  "includeTables": true,
  "includeImages": false,
  "includeFormatting": true,
  "includeLinks": true,
  "deduplicate": false,
  "withMetadata": true,
  "onlyWithMetadata": false,
  "teiValidation": false
}
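The config keys above are camelCase, while Trafilatura's Python `extract()` function takes snake_case keyword arguments such as `favor_precision` and `include_tables`. The sketch below shows one way the JSON could be translated. It assumes each key maps one-to-one onto an `extract()` argument, which this page does not confirm. The helper names `camel_to_snake` and `to_extract_kwargs` are hypothetical and not part of Contextractor or Trafilatura.

```python
import json
import re

# Default config as shown on the page (camelCase keys).
CONFIG_JSON = """
{ "fast": false, "favorPrecision": false, "favorRecall": false,
  "includeComments": true, "includeTables": true, "includeImages": false,
  "includeFormatting": true, "includeLinks": true, "deduplicate": false,
  "withMetadata": true, "onlyWithMetadata": false, "teiValidation": false }
"""


def camel_to_snake(name: str) -> str:
    """Convert a camelCase key to snake_case, e.g. favorPrecision -> favor_precision."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def to_extract_kwargs(config: dict) -> dict:
    """Hypothetical helper: rename every config key to its snake_case form."""
    return {camel_to_snake(key): value for key, value in config.items()}


kwargs = to_extract_kwargs(json.loads(CONFIG_JSON))
print(kwargs)

# Assumption: with Trafilatura installed, the result could be passed on as
#   trafilatura.extract(html, **kwargs)
# provided the installed version accepts every one of these argument names.
```

Keeping the conversion generic means a key added to the JSON later needs no code change, at the cost of not validating that the target argument exists.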
Output format:
Plain text
JSON with metadata
Markdown
XML
XML-TEI (scholarly)
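Trafilatura's `extract()` selects the output type through its `output_format` argument, which accepts values including `"txt"`, `"json"`, `"markdown"`, `"xml"`, and `"xmltei"`. The mapping below from the labels listed above to those values is a guess at how the options line up, not something this page states. `OUTPUT_FORMATS` and `output_format_for` are hypothetical names.

```python
# Assumed mapping from the UI labels to Trafilatura output_format values.
OUTPUT_FORMATS = {
    "Plain text": "txt",
    "JSON with metadata": "json",
    "Markdown": "markdown",
    "XML": "xml",
    "XML-TEI (scholarly)": "xmltei",
}


def output_format_for(label: str) -> str:
    """Hypothetical helper: look up the output_format value for a UI label."""
    try:
        return OUTPUT_FORMATS[label]
    except KeyError:
        raise ValueError(f"Unknown output format label: {label!r}") from None


print(output_format_for("Markdown"))
```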