Model briefing
Model: HY-World 2.0
ID: huggingface.co/spaces

HY-World 2.0

HY-World 2.0 promises more than readers can verify today. The broad pitch is text-, image-, and video-to-world generation, but the part you can actually test is more concrete: using WorldMirror 2.0 to reconstruct a 3D scene from photos or video.

Published: April 20, 2026
Read time: 3 min
Tested by: Neural Expedition

Field notes

What it does

HY-World 2.0 is Tencent's open world-model framework for turning visual input into persistent 3D scene representations instead of disposable video clips. The most reproducible path today is WorldMirror 2.0, which takes multi-view images or a casual video and predicts camera poses, depth, surface normals, point clouds, and Gaussian splatting output in one workflow.
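To see why depth and camera parameters together give you a point cloud, here is a minimal sketch of the standard pinhole back-projection. It is not WorldMirror 2.0's code or API: the function name, the nested-list depth map, and the intrinsics fx, fy, cx, cy are all assumptions made for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def backproject_depth(
    depth: List[List[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point]:
    """Lift a depth map to camera-space 3D points with a pinhole model.

    depth[v][u] is the distance along the optical axis at pixel (u, v).
    fx, fy, cx, cy are hypothetical camera intrinsics in pixels.
    """
    points: List[Point] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                # Treat zero or negative depth as missing and skip it.
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


if __name__ == "__main__":
    tiny_depth = [[1.0, 2.0], [0.0, 4.0]]
    for p in backproject_depth(tiny_depth, fx=1.0, fy=1.0, cx=0.0, cy=0.0):
        print(p)
```

The points come out in each camera's own frame. Placing them in one shared scene would additionally need the predicted camera pose for every view, which this sketch leaves out.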

That makes the release more useful as a reconstruction tool than as a finished one-click world generator. If you have a phone walkthrough of a room, a small outdoor scene, or a set of overlapping photos, the practical question is whether the model can recover enough structure to inspect, export, or use as a starting point for a heavier 3D pipeline.

How to try it

Start with the public Hugging Face Space and use one coherent capture, not a random image dump. A slow walkthrough video or a set of overlapping views from the same scene will tell you more than isolated screenshots. On the first pass, check whether the camera path looks stable, whether the depth and normals match the scene, and whether the Gaussian splat preserves the basic layout instead of turning into visual noise.
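If you can get the predicted camera positions out as plain coordinates, the "is the camera path stable" check can be made a little less subjective. The sketch below flags frames where the camera moves far more than its typical step. It assumes positions arrive as a list of (x, y, z) tuples in frame order; the function name and the 5x threshold are illustrative choices, not anything the project specifies.

```python
import math
from statistics import median
from typing import List, Sequence


def find_pose_jumps(
    positions: Sequence[Sequence[float]],
    ratio: float = 5.0,
) -> List[int]:
    """Return frame indices whose step from the previous frame is suspiciously large.

    A step counts as a jump when it exceeds `ratio` times the median step length.
    """
    steps = [math.dist(a, b) for a, b in zip(positions, positions[1:])]
    if not steps:
        return []
    typical = median(steps)
    if typical == 0:
        # Camera is mostly static: any movement at all stands out.
        return [i + 1 for i, s in enumerate(steps) if s > 0]
    return [i + 1 for i, s in enumerate(steps) if s > ratio * typical]


if __name__ == "__main__":
    path = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (12, 0, 0), (13, 0, 0)]
    print(find_pose_jumps(path))
```

A slow walkthrough should produce few or no flagged frames. A cluster of flags usually means it is worth looking at the footage around those frames before trusting the depth or the splat.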

If the browser result looks promising, move to the Tencent model page and local code path for WorldMirror 2.0. The local setup is a real GPU workflow, with CUDA-oriented dependencies and optional multi-GPU execution, so treat it as a reconstruction stack rather than a lightweight web utility.
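Before cloning anything, it can save time to confirm the machine exposes an NVIDIA GPU at all. This is a rough pre-flight sketch that shells out to nvidia-smi using only the Python standard library. The helper name is made up for this example, and a listed GPU does not guarantee that the driver, CUDA version, or memory meet the project's actual requirements, which the source does not spell out.

```python
import shutil
import subprocess
from typing import List


def list_nvidia_gpus() -> List[str]:
    """Return GPU names reported by nvidia-smi, or an empty list if none are visible."""
    if shutil.which("nvidia-smi") is None:
        # No NVIDIA driver tools on PATH, so assume no usable GPU.
        return []
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            timeout=10,
            check=True,
        )
    except (subprocess.SubprocessError, OSError):
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    gpus = list_nvidia_gpus()
    if gpus:
        print(f"Found {len(gpus)} GPU(s): {', '.join(gpus)}")
    else:
        print("No NVIDIA GPU detected; stay with the hosted demo.")
```

More than one name in the list is the precondition for trying the optional multi-GPU path mentioned above.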

Caveat

Do not read this as the entire HY-World 2.0 promise being fully open today. WorldMirror 2.0 reconstruction is the reproducible path right now; some of the text-to-world and single-image world-generation modules are still listed as coming soon, and local use needs a CUDA-capable setup.

What you can do with it

  • Turn a short phone capture into a rough 3D scene you can inspect.
  • Recover camera poses, depth, normals, and point clouds from multi-view footage.
  • Test whether a room, street corner, or site walkthrough is clean enough for a bigger reconstruction pipeline.
  • Compare an open 3D reconstruction workflow against video-only world model demos.


Neural Expedition · Useful open-source AI, curated without hype.