
60 FPS in the Browser With 4,000 3D Models

Published April 30, 2026 · How VXLVERSE scales

A user opens our editor. They drop in a forest scene with 200 trees, a dozen NPCs, scattered rocks, and building props. The browser tab still hits 60 FPS. No crash. No GPU stall. That's the bar.

Getting there from "naive Babylon.js scene" took three iterations. This is the architecture that finally stuck.

The naive version (and why it dies)

You load a GLB. Babylon parses it, returns a mesh tree. You position it, add it to the scene. Fine. Now load the same GLB 200 times because the user wants 200 trees.

Each load goes through the network → parser → GPU buffer upload. Even with the network cached, you end up with 200 separate vertex buffers, 200 materials, and 200 draw calls per frame. GPU memory balloons because every copy keeps its own data, and the render thread has to issue every draw individually. Frame rate falls off a cliff.

Naive doesn't scale. We need three layers: a shared asset cache, instanced rendering, and refcounted disposal.

Layer 1: AssetContainer cache

Babylon's AssetContainer is the killer primitive. Load a GLB once into a container, then call instantiateModelsToScene() N times to get N independent copies that share geometry and materials.

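import { AssetContainer, InstantiatedEntries, SceneLoader } from "@babylonjs/core";

// One container per unique asset path.
private containers = new Map<string, AssetContainer>();

async load(url: string): Promise<InstantiatedEntries> {
  let container = this.containers.get(url);
  if (!container) {
    // Signature is (rootUrl, sceneFilename, scene). Passing the full URL as
    // rootUrl with an empty sceneFilename lets Babylon split the path itself.
    container = await SceneLoader.LoadAssetContainerAsync(url, "", this.scene);
    this.containers.set(url, container);
  }
  // Returns InstantiatedEntries (rootNodes, skeletons, animationGroups),
  // not a single Mesh.
  return container.instantiateModelsToScene();
}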

Now all 200 trees share one GPU buffer. Network transfer happens once. GPU memory drops dramatically because there's no longer a per-instance copy of the geometry.
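
One thing a cache like this has to watch for is concurrent requests: if 200 trees ask for the same URL before the first load resolves, each caller can miss the cache and start its own load. A minimal sketch of one way to guard against that is to cache the in-flight promise instead of the finished result. PromiseCache and fetchAsset are illustrative names, and fetchAsset stands in for whatever actually produces the container.

```typescript
// Hypothetical generic cache. `fetchAsset` is a stand-in for the real loader
// (for example, a function that resolves to an AssetContainer).
class PromiseCache<T> {
  private readonly pending = new Map<string, Promise<T>>();
  private readonly fetchAsset: (url: string) => Promise<T>;

  constructor(fetchAsset: (url: string) => Promise<T>) {
    this.fetchAsset = fetchAsset;
  }

  // Deliberately not `async`: every caller for the same URL receives the
  // same promise object, so the loader runs at most once per URL.
  load(url: string): Promise<T> {
    const cached = this.pending.get(url);
    if (cached !== undefined) return cached;

    const fresh = this.fetchAsset(url);
    // Forget failed loads so a later call can retry instead of getting
    // the cached rejection forever.
    fresh.catch(() => {
      if (this.pending.get(url) === fresh) this.pending.delete(url);
    });
    this.pending.set(url, fresh);
    return fresh;
  }
}
```

Each caller can then await the shared promise and instantiate its own copy from the resolved container.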

Layer 2: InstancedMesh for static props

The container trick saves memory but each tree still gets its own draw call. 200 trees = 200 draws per frame. Once a scene mixes trees, rocks, buildings, NPCs, and props, the per-frame draw count climbs fast — and the browser's render thread starts paying for the overhead.

Enter InstancedMesh. Same geometry, same material, drawn in one call with N transforms. Babylon's instantiateHierarchy() picks the right primitive per node: an InstancedMesh for each mesh that carries geometry, a plain clone for transform nodes and other geometry-less nodes.

// Promote the first instance to an invisible "template",
// then instance every duplicate from it.
setHierarchyVisibility(template, false);
template.setEnabled(false);
template.freezeWorldMatrix();

for (const idx of indicesUsingThisAsset) {
  // instantiateHierarchy() can return null, so guard before touching it.
  const instance = template.instantiateHierarchy(null);
  if (!instance) continue;
  instance.position.copyFrom(spawn[idx].position);
  instance.scaling.copyFrom(spawn[idx].scale);
  setHierarchyVisibility(instance, true);
}

200 trees → one draw call per mesh in the tree model, however many copies are placed. Each instance still gets its own world matrix (so picking, gizmos, and transform animations work per-instance), but the GPU sees one geometry stream.
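
To make the arithmetic concrete, here is a back-of-the-envelope sketch. It assumes every mesh inside an asset costs one draw call per rendered copy when cloned and one draw call in total when instanced. It ignores culling, shadow passes, and post-processing, and estimateDrawCalls and AssetUsage are illustrative names only.

```typescript
interface AssetUsage {
  meshesPerCopy: number; // renderable meshes in one copy of the asset
  copies: number;        // how many times the asset is placed in the scene
}

// Rough per-frame draw-call estimate for the main pass only.
function estimateDrawCalls(assets: AssetUsage[]): { cloned: number; instanced: number } {
  let cloned = 0;
  let instanced = 0;
  for (const asset of assets) {
    if (asset.copies <= 0) continue;               // unused assets draw nothing
    cloned += asset.meshesPerCopy * asset.copies;  // one draw per mesh per copy
    instanced += asset.meshesPerCopy;              // one draw per mesh, any copy count
  }
  return { cloned, instanced };
}

// 200 trees made of two meshes each, plus 50 single-mesh rocks.
const estimate = estimateDrawCalls([
  { meshesPerCopy: 2, copies: 200 },
  { meshesPerCopy: 1, copies: 50 },
]);
console.log(estimate); // { cloned: 450, instanced: 3 }
```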

The catch: an InstancedMesh shares its source's skeleton, so skeletal animation state is shared too. Every NPC of the same model walks in lockstep. So we batch only entities with no animation groups (props, scenery). Characters get full meshes.
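
The split between batchable props and standalone characters can be sketched as a simple partition. This is a simplified illustration: EntityInfo, planBatches, and the animationGroupCount field are hypothetical names, and a real scene would read the animation-group count from the loaded asset.

```typescript
interface EntityInfo {
  id: number;
  assetPath: string;
  animationGroupCount: number; // 0 for static props and scenery
}

interface BatchPlan {
  batches: Map<string, number[]>; // assetPath -> entity ids to instance together
  standalone: number[];           // animated entities that need full meshes
}

function planBatches(entities: EntityInfo[]): BatchPlan {
  const batches = new Map<string, number[]>();
  const standalone: number[] = [];
  for (const entity of entities) {
    if (entity.animationGroupCount > 0) {
      // Animated: instancing would force every copy into lockstep.
      standalone.push(entity.id);
      continue;
    }
    const ids = batches.get(entity.assetPath);
    if (ids) ids.push(entity.id);
    else batches.set(entity.assetPath, [entity.id]);
  }
  return { batches, standalone };
}
```

Each entry in batches then maps to one template plus its instances, while every id in standalone gets its own full copy.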

Layer 3: Refcounted disposal

Now the user deletes 50 trees. We need to remove the instances. But we keep the template alive as long as ANY instance still references it. Once the last instance is gone, dispose the template too.

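interface Template {
  sourceRoot: AbstractMesh;
  refCount: number;
}

onEntityRemove(idx: number) {
  const assetPath = this.assetByEntity.get(idx);
  if (assetPath === undefined) return; // entity was never batched
  this.assetByEntity.delete(idx);

  const tmpl = this.templates.get(assetPath);
  if (!tmpl) return; // template already disposed

  tmpl.refCount--;
  if (tmpl.refCount <= 0) {
    // Last instance is gone: free the hidden template as well.
    tmpl.sourceRoot.dispose();
    this.templates.delete(assetPath);
  }
}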
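
The removal path only works if every instance creation bumps the count symmetrically. A minimal sketch of the acquire/release pairing, written generically so it runs without a scene, might look like this. RefCountedRegistry and its create and destroy callbacks are illustrative names, with destroy standing in for a call such as sourceRoot.dispose().

```typescript
class RefCountedRegistry<T> {
  private readonly entries = new Map<string, { value: T; refCount: number }>();
  private readonly create: (key: string) => T;
  private readonly destroy: (value: T) => void;

  constructor(create: (key: string) => T, destroy: (value: T) => void) {
    this.create = create;
    this.destroy = destroy;
  }

  // Call once per instance that starts using the shared value.
  acquire(key: string): T {
    let entry = this.entries.get(key);
    if (!entry) {
      entry = { value: this.create(key), refCount: 0 };
      this.entries.set(key, entry);
    }
    entry.refCount++;
    return entry.value;
  }

  // Call once per instance that stops using it. Unknown keys are ignored.
  release(key: string): void {
    const entry = this.entries.get(key);
    if (!entry) return;
    entry.refCount--;
    if (entry.refCount <= 0) {
      this.entries.delete(key);
      this.destroy(entry.value);
    }
  }

  has(key: string): boolean {
    return this.entries.has(key);
  }
}
```

Keeping acquire and release in one place makes it harder for a code path to create an instance without counting it.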

Combined: load once, instance many, dispose ref-counted. Memory stays bounded, and draw calls scale with the number of unique assets rather than the number of placed copies.

The performance gates we still need

Even with batching, two things will tank a browser 3D editor:

1. DOF + SSAO at retina DPR. Both are full-screen post-processes. At 1.5× retina, DOF (a multi-pass blur) and SSAO2 (a many-tap sample loop) both consume meaningful GPU per frame. Stacked, they can push frame time high enough that the render loop backs up — the FPS counter still shows 60 because the engine throttles, but the editor's fly cam starts to feel sluggish on input.

Solution: gate them off in the editor by default. Players see them in /games/[id]; authors see bloom + grain + vignette only. The visual mood survives.
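
As an illustration of that gate, the decision can live in one small function keyed on the render mode. This is a sketch under assumptions: RenderMode, PostFxFlags, and postFxFor are made-up names, and the overrides parameter is one guess at how "by default" could be relaxed for an author who wants to preview the full stack.

```typescript
type RenderMode = "editor" | "play";

interface PostFxFlags {
  bloom: boolean;
  grain: boolean;
  vignette: boolean;
  depthOfField: boolean;
  ssao: boolean;
}

function postFxFor(mode: RenderMode, overrides: Partial<PostFxFlags> = {}): PostFxFlags {
  // The two expensive full-screen passes only run in play mode by default.
  const heavy = mode === "play";
  return {
    bloom: true,
    grain: true,
    vignette: true,
    depthOfField: heavy,
    ssao: heavy,
    ...overrides, // explicit opt-in or opt-out wins over the mode default
  };
}
```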

2. Cascaded Shadow Maps with hundreds of caster meshes. Every caster gets rasterized into the shadow atlas every frame. A forest scene means hundreds of additional draws into the shadow map alone, before main rendering starts.

Solution: env meshes are receive-only by default. They look shadowed (the player + NPCs cast onto them) but don't burn GPU rendering themselves. Editor mode additionally caps casters to the nearest few dozen entities.
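
The editor-mode cap can be sketched as a nearest-N selection. This version assumes "nearest" means nearest to the camera, which is not spelled out above, and Vec3, CasterCandidate, and nearestCasters are illustrative names.

```typescript
interface Vec3 {
  x: number;
  y: number;
  z: number;
}

interface CasterCandidate {
  id: number;
  position: Vec3;
}

// Squared distance is enough for ordering and avoids a sqrt per entity.
function distanceSquared(a: Vec3, b: Vec3): number {
  const dx = a.x - b.x;
  const dy = a.y - b.y;
  const dz = a.z - b.z;
  return dx * dx + dy * dy + dz * dz;
}

// Returns the ids of the `maxCasters` candidates closest to the camera,
// ordered from nearest to farthest.
function nearestCasters(candidates: CasterCandidate[], camera: Vec3, maxCasters: number): number[] {
  return candidates
    .map((c) => ({ id: c.id, d: distanceSquared(c.position, camera) }))
    .sort((a, b) => a.d - b.d)
    .slice(0, Math.max(0, maxCasters))
    .map((c) => c.id);
}
```

Only the returned ids would be registered as shadow casters; everything else stays receive-only.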

The win, qualitatively

The exact frame-rate delta depends on hardware, scene complexity, and a dozen other factors — I'm not going to invent numbers. What I can say from the codebase: with the asset cache + InstancedMesh batcher in place, a scene with hundreds of duplicated props compiles down to single-digit draw calls per shared asset, and GPU memory stays bounded because the geometry exists once.

A handful of draw calls per repeated asset, however many copies are placed. That's the win.

What this enables

VXLVERSE ships with 4,000+ free GLB models. Users compose dense scenes — buildings, trees, scattered props, NPCs, items — and the editor stays smooth on the hardware we test on. The same batcher runs in both editor and play modes, so scenes that compose well in the editor also publish cleanly to /games/[id].

Without batching, "browser 3D" feels like demoware. With batching, dense scenes become a normal thing instead of a hero feature.

Want to see it in action? Try VXLVERSE — drop a bunch of duplicated props in your scene, watch the draws-per-frame counter in the debug HUD, then move on with your life.
