**Defuddle: Turn Messy Web Pages into Clean, Usable Content**

Have you ever copied an article from the web, pasted it into your notes… and ended up with a chaotic mess of ads, menus, random buttons, and broken formatting? I have. More times than I’d like to admit. You just want the *actual content*, not the digital noise wrapped around it.

That’s exactly where Defuddle comes in.

You can check it out here:
https://github.com/kepano/defuddle

At its core, Defuddle extracts the main content from web pages. It strips away clutter like sidebars, comments, headers, footers, and other distractions, leaving you with clean, consistent HTML. Think of it like skimming the froth off a cup of coffee so you can actually taste what’s underneath.
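To make the idea concrete, here is a deliberately tiny sketch of "strip the clutter, keep the content". It is not how Defuddle works internally; the `stripClutter` function, the list of tags I treat as clutter, and the sample page are all my own assumptions for illustration, and a regex pass like this ignores nesting and everything else a real extractor has to handle.

```typescript
// Toy illustration only: NOT Defuddle's algorithm.
// The tag list below is an assumption made for this sketch.
const CLUTTER_TAGS: readonly string[] = ["nav", "aside", "header", "footer", "script", "style"];

function stripClutter(html: string): string {
  let cleaned = html;
  for (const tag of CLUTTER_TAGS) {
    // Remove the element together with its contents (non-nested case only).
    const pattern = new RegExp(`<${tag}\\b[^>]*>[\\s\\S]*?</${tag}>`, "gi");
    cleaned = cleaned.replace(pattern, "");
  }
  // Drop the blank lines left behind at the start and end.
  return cleaned.trim();
}

// Hypothetical sample page: one article surrounded by typical page chrome.
const page = `
<header><a href="/">Site</a></header>
<nav><ul><li>Home</li></ul></nav>
<article><h1>Title</h1><p>Body text.</p></article>
<aside>Related links</aside>
<footer>Copyright</footer>`;

console.log(stripClutter(page));
// prints: <article><h1>Title</h1><p>Body text.</p></article>
```

Even this toy version shows why a dedicated tool is worth having: real pages nest their clutter, reuse tags like `header` inside the article itself, and hide content in places a fixed tag list would never find.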

The project is still labeled as a work in progress, which I appreciate. There’s something honest about that. It’s being actively shaped, refined, improved.

Originally built for the Obsidian Web Clipper, Defuddle focuses on producing reliable HTML that works smoothly with HTML-to-Markdown converters such as Turndown. If you’ve ever wrestled with messy Markdown conversions, you’ll understand how valuable that consistency is.

It can even act as a replacement for Mozilla Readability, with some thoughtful differences. Code blocks are standardized: line numbers and heavy syntax styling are removed, but the programming language is preserved. Math elements, including MathJax and KaTeX, are converted into standard MathML. Footnotes and inline references are normalized. It’s all about structure and clarity.
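Here is a rough sketch of what "standardizing a code block" could look like: drop the line numbers, keep the language. Again, this is my own toy version, not Defuddle's implementation. The `RawCodeBlock` shape, the function names, and the exact output markup are assumptions I made up for the example.

```typescript
// Toy sketch, not Defuddle's implementation. The RawCodeBlock shape and the
// output markup are assumptions made for this example.
interface RawCodeBlock {
  language: string; // e.g. "ts", as detected on the original element
  lines: string[];  // text lines as rendered, possibly with line numbers
}

// Escape the characters that would otherwise be read as HTML.
function escapeHtml(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Strip prefixes only when every line starts with its own 1-based number,
// so code that merely begins with a digit is left alone.
function stripLineNumbers(lines: string[]): string[] {
  const numbered = lines.every((line, index) => {
    const match = /^\s*(\d+)(?:\s|$)/.exec(line);
    return match !== null && Number(match[1]) === index + 1;
  });
  return numbered ? lines.map((line) => line.replace(/^\s*\d+\s?/, "")) : lines;
}

// Emit one plain <pre><code> element that keeps only the language label.
function normalizeCodeBlock(block: RawCodeBlock): string {
  const code = stripLineNumbers(block.lines).map(escapeHtml).join("\n");
  return `<pre><code class="language-${block.language}">${code}</code></pre>`;
}

// Hypothetical input: a snippet copied from a page that renders line numbers.
const raw: RawCodeBlock = {
  language: "ts",
  lines: ["1 const x = 1;", "2 if (x < 2) {", "3   console.log(x);", "4 }"],
};

console.log(normalizeCodeBlock(raw));
```

The output is a single `<pre><code class="language-ts">` element with the numbers gone and the indentation intact, which is the kind of predictable shape a Markdown converter can turn into a fenced code block without special cases.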

There’s also a command-line interface, Node.js support, optional JSDOM integration, and different bundles depending on how much math handling you need. And if you’re working with modern client-side-rendered sites, it can attempt async parsing with API fallbacks.

What I like most is the intention behind it. Defuddle isn’t just cleaning pages; it’s creating a stable foundation for whatever you want to build next. Notes. Archives. Research workflows. Even publishing pipelines.

As the web gets more complex, tools like this feel less like luxuries and more like essentials. And honestly, I’m excited to see where Defuddle goes from here.
