HyperFrames Open Source: Why HeyGen Lets AI Agents Create Video with HTML
Agentic Video Is HTML: Meet HyperFrames
If you’ve ever tried editing a video, you know the feeling. Timelines everywhere. Layers stacked like pancakes. Keyframes hiding in places you didn’t even know existed. It’s powerful, yes. But it’s also complicated.
Now imagine asking an AI agent to do all that.
In a recent post on X, the team behind HeyGen introduced HyperFrames, an open source framework that lets AI agents create and edit videos using something they already understand deeply: HTML, CSS, and JavaScript. You can see the original announcement here:
https://x.com/liu8in/status/2044827628700684463?s=52
The idea is surprisingly simple. Large language models were trained on the web. Billions of HTML pages. Endless CSS animations. JavaScript snippets everywhere. So instead of forcing agents to “learn” tools like After Effects, HyperFrames lets them work in their native language. They write structured HTML with a few added data attributes to define timing, layering, and transitions. That HTML then becomes a fully rendered video in MP4, MOV, or WebM format.
It sounds technical. But it’s actually elegant.
HyperFrames adds lightweight controls like data-start and data-duration to turn web elements into scenes on a timeline. Libraries like GSAP handle the animation. And because it’s all browser-based, anything that works on the web can work in a video: fonts, SVG, Three.js, even interactive-style visuals.
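To make the timeline idea concrete, here is a minimal sketch of how attributes like data-start and data-duration could be read off a page and turned into clips. This is not HyperFrames' actual code or API. Only the two attribute names come from the announcement; the element ids, the sample values, the assumption that times are in seconds, and the helper names (collect_clips, visible_at, total_duration) are all hypothetical and exist purely for illustration.

```python
from html.parser import HTMLParser

# Hypothetical composition. Only data-start and data-duration come from the
# announcement; ids, values, and "seconds" as the unit are assumptions.
COMPOSITION = """
<div id="stage">
  <h1 id="title" data-start="0" data-duration="3">Hello</h1>
  <p id="subtitle" data-start="1.5" data-duration="4">Made with HTML</p>
  <img id="logo" src="logo.svg" data-start="5" data-duration="2">
</div>
"""


class ClipCollector(HTMLParser):
    """Collects every element that carries both timing attributes."""

    def __init__(self):
        super().__init__()
        self.clips = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if "data-start" in attr and "data-duration" in attr:
            start = float(attr["data-start"])
            end = start + float(attr["data-duration"])
            # Fall back to the tag name when the element has no id.
            self.clips.append({"id": attr.get("id", tag), "start": start, "end": end})


def collect_clips(html):
    """Parse an HTML string and return its clips in document order."""
    parser = ClipCollector()
    parser.feed(html)
    parser.close()
    return parser.clips


def visible_at(clips, t):
    """Ids of the clips on screen at time t (start inclusive, end exclusive)."""
    return [c["id"] for c in clips if c["start"] <= t < c["end"]]


def total_duration(clips):
    """Length of the whole composition: the latest clip end."""
    return max((c["end"] for c in clips), default=0.0)


if __name__ == "__main__":
    clips = collect_clips(COMPOSITION)
    print(visible_at(clips, 2.0))
    print(total_duration(clips))
```

A renderer built on this idea would step through time frame by frame, show only the elements returned by something like visible_at, and hand the captured frames to a video encoder. How HyperFrames does this internally is not described in the post, so treat the sketch as a mental model and nothing more.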
What’s interesting is the backstory. HeyGen began with AI avatars, but quickly realized avatars alone weren’t enough. Motion graphics and visual storytelling make the difference between a plain clip and something that holds attention. With newer models like Gemini 3 and Opus 4.5, their Video Agent suddenly became capable of producing high-quality, multi-scene videos through code generation alone.
And now it’s open source:
https://github.com/heygen-com/hyperframes
You can run it locally. No API keys required.
We’re moving toward a world where agents don’t just write text; they communicate visually. If HTML truly becomes the format of future video, this might be one of those early steps we look back on and think: that’s where it shifted.


