Unlocking Performance: WASM Threading with SharedArrayBuffer
How we use Web Workers and SharedArrayBuffer to distribute emulation, audio, and networking across separate threads — with measured performance gains at each step.
For many years, the single-threaded nature of classic browser JavaScript was the most significant architectural constraint on complex browser applications. A long-running computation on the main thread blocks rendering, input handling, and user interaction all at once. That behavior is merely annoying in enterprise applications but disastrous for real-time gaming: at 60Hz a frame has a budget of roughly 16.7ms, so a 16ms main-thread stall is enough to drop a frame, and a 40ms stall causes perceptible audio glitching. Web Workers changed the basic architecture in 2010, but sharing memory between workers was impossible until SharedArrayBuffer arrived in 2017. Even then, browsers disabled SharedArrayBuffer in early 2018 in response to the Spectre vulnerability, and when it returned it was gated behind Cross-Origin Isolation headers, which many deployments could not easily adopt. In 2026, with those deployment constraints better understood and widely resolved, SharedArrayBuffer-based threading has become a core part of RetroCloud's performance architecture.
The Threading Architecture Overview
RetroCloud's emulation layer runs across three primary threads: the main thread handles rendering and user input; the emulation thread runs the WASM emulation core, advancing the simulation one video frame at a time; the audio thread runs the audio processing and resampling pipeline, pulling audio data from the emulation output and pushing it to the Web Audio API's AudioWorkletProcessor at sample rate. An optional fourth network thread handles WebRTC signaling and peer data channel management for multiplayer sessions.
The SharedArrayBuffer is the communication backbone between these threads. The emulation thread writes its output — frame buffer data, audio buffer data, and state flags — into a shared memory region. The main thread reads frame buffer data from shared memory for rendering without any serialization overhead. The audio thread polls the audio ring buffer in shared memory at sample rate, pulling audio samples directly without message passing. This architecture eliminates the serialization and deserialization overhead that would otherwise dominate cross-thread communication costs for the high-bandwidth data types involved.
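As a rough illustration of the shared region described above, the following sketch partitions one SharedArrayBuffer into state flags, a frame buffer, and an audio ring. The sizes, offsets, and names (FLAG_COUNT, FRAME_WIDTH, mapViews, and so on) are assumptions made for the example; the article does not specify RetroCloud's actual layout.

```typescript
// Hypothetical layout constants; real sizes depend on the emulated system.
const FLAG_COUNT = 4;                                  // Int32 state flags
const FRAME_WIDTH = 256;
const FRAME_HEIGHT = 240;
const FRAME_BYTES = FRAME_WIDTH * FRAME_HEIGHT * 4;    // RGBA, one byte per channel
const AUDIO_RING_SAMPLES = 4096;                       // Float32 samples

// Byte offsets of each section inside the shared region.
const FLAGS_OFFSET = 0;
const FRAME_OFFSET = FLAGS_OFFSET + FLAG_COUNT * Int32Array.BYTES_PER_ELEMENT;
const AUDIO_OFFSET = FRAME_OFFSET + FRAME_BYTES;       // stays 4-byte aligned
const TOTAL_BYTES =
  AUDIO_OFFSET + AUDIO_RING_SAMPLES * Float32Array.BYTES_PER_ELEMENT;

interface SharedViews {
  flags: Int32Array;
  frame: Uint8ClampedArray;
  audio: Float32Array;
}

// Allocate the region once, on whichever thread sets up the session.
function createSharedRegion(): SharedArrayBuffer {
  return new SharedArrayBuffer(TOTAL_BYTES);
}

// Each thread builds its own typed-array views over the same memory.
// No data is copied: the views alias the shared bytes.
function mapViews(buffer: SharedArrayBuffer): SharedViews {
  return {
    flags: new Int32Array(buffer, FLAGS_OFFSET, FLAG_COUNT),
    frame: new Uint8ClampedArray(buffer, FRAME_OFFSET, FRAME_BYTES),
    audio: new Float32Array(buffer, AUDIO_OFFSET, AUDIO_RING_SAMPLES),
  };
}
```

In a browser, the buffer would be passed to each worker with postMessage, which shares the underlying memory instead of cloning it, and every thread would call mapViews on the buffer it receives.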
Cross-Origin Isolation: The Deployment Prerequisite
SharedArrayBuffer requires the page to be served with Cross-Origin-Opener-Policy: same-origin and Cross-Origin-Embedder-Policy: require-corp headers. These headers restrict the page from loading cross-origin resources that have not explicitly opted in to cross-origin embedding. For RetroCloud, this required auditing every external resource loaded by the emulation page — Unsplash images, Google Fonts, CDN-served Tailwind — and either hosting them ourselves, using CORS-enabled alternatives, or replacing them with inline resources.
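A minimal sketch of the two pieces involved: attaching the headers named above to a response, and checking at runtime whether isolation actually took effect. The helper names withIsolationHeaders and isIsolated are hypothetical; crossOriginIsolated is the standard boolean exposed on the browser global scope.

```typescript
// The two header values are the ones named in the text above.
const ISOLATION_HEADERS: Readonly<Record<string, string>> = {
  "Cross-Origin-Opener-Policy": "same-origin",
  "Cross-Origin-Embedder-Policy": "require-corp",
};

// Merge the isolation headers into whatever headers a response already has.
// (Hypothetical helper; how headers are set depends on the server or CDN.)
function withIsolationHeaders(
  existing: Record<string, string>
): Record<string, string> {
  return { ...existing, ...ISOLATION_HEADERS };
}

// Runtime check. In a browser, pass globalThis; the scope is a parameter
// so the function can also be exercised outside a browser.
function isIsolated(scope: { crossOriginIsolated?: boolean }): boolean {
  return scope.crossOriginIsolated === true;
}
```

Checking crossOriginIsolated at startup is a cheap way to confirm that no proxy or CDN layer has dropped the headers on the way to the client.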
The migration to Cross-Origin Isolation took approximately two weeks of engineering time and a careful review of every external dependency. As a secondary benefit, the result is a cleaner resource dependency graph: we discovered and eliminated several legacy third-party script includes that had accumulated over the platform's development history. Once the headers were in place, SharedArrayBuffer became available automatically in all supporting browsers, with no further changes required.
Measured Performance Gains by Thread
The performance gains from WASM threading are measurable and significant at each layer. Moving the emulation core to a dedicated Web Worker eliminated main thread jank entirely for the CPU-intensive portions of emulation execution. In production measurement, median frame render time on the main thread dropped from 12.3ms to 4.1ms — giving the renderer substantial headroom on 60Hz displays and making 120Hz rendering practically viable on compatible hardware.
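The headroom claim can be checked with simple arithmetic. The sketch below compares the two medians quoted above against the per-frame budget at each refresh rate. It assumes, as a simplification, that the median main-thread render time is the only cost competing for the frame; the function names are illustrative.

```typescript
// Time available per frame at a given refresh rate, in milliseconds.
function frameBudgetMs(refreshHz: number): number {
  return 1000 / refreshHz;
}

// Positive result: time left over in the frame. Negative: budget exceeded.
function headroomMs(renderMs: number, refreshHz: number): number {
  return frameBudgetMs(refreshHz) - renderMs;
}

const BEFORE_MS = 12.3; // median main-thread render time before the move
const AFTER_MS = 4.1;   // median main-thread render time after the move

for (const hz of [60, 120]) {
  console.log(
    `${hz}Hz budget ${frameBudgetMs(hz).toFixed(2)}ms:`,
    `before ${headroomMs(BEFORE_MS, hz).toFixed(2)}ms,`,
    `after ${headroomMs(AFTER_MS, hz).toFixed(2)}ms headroom`
  );
}
```

At 120Hz the budget is about 8.3ms, so the earlier 12.3ms median could not fit in a single frame, while 4.1ms leaves roughly 4.2ms to spare.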
Moving audio processing to the AudioWorkletProcessor eliminated the most common cause of audio glitching in our earlier architecture. Previously, audio was generated synchronously on the main thread and pushed to the Web Audio API via ScriptProcessorNode, a deprecated API with documented timing instability. The AudioWorkletProcessor runs on the browser's dedicated audio rendering thread, so its callbacks are no longer delayed by main-thread stalls. Since migrating, audio glitch reports from users have dropped by 87% as measured through our error telemetry.
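To make the audio path concrete, here is a sketch of a single-producer, single-consumer ring buffer over shared memory, with the emulation side pushing samples and the audio side pulling one render quantum at a time. The class name SharedAudioRing, its memory layout, and the zero-fill behavior on underrun are assumptions for illustration, not RetroCloud's actual implementation.

```typescript
const RENDER_QUANTUM = 128; // frames per Web Audio process() call

class SharedAudioRing {
  private readonly indices: Int32Array;  // [0] = write index, [1] = read index
  private readonly samples: Float32Array;
  private readonly capacity: number;

  constructor(buffer: SharedArrayBuffer, capacity: number) {
    this.capacity = capacity;
    this.indices = new Int32Array(buffer, 0, 2);
    this.samples = new Float32Array(buffer, 8, capacity);
  }

  static allocate(capacity: number): SharedArrayBuffer {
    return new SharedArrayBuffer(8 + capacity * Float32Array.BYTES_PER_ELEMENT);
  }

  // Producer (emulation thread): returns how many samples were accepted.
  push(input: Float32Array): number {
    const write = Atomics.load(this.indices, 0);
    const read = Atomics.load(this.indices, 1);
    const used = (write - read + this.capacity) % this.capacity;
    const count = Math.min(this.capacity - 1 - used, input.length);
    for (let i = 0; i < count; i++) {
      this.samples[(write + i) % this.capacity] = input[i];
    }
    // Publish the samples only after they are fully written.
    Atomics.store(this.indices, 0, (write + count) % this.capacity);
    return count;
  }

  // Consumer (audio thread): fills `output`, zero-padding on underrun.
  pull(output: Float32Array): number {
    const write = Atomics.load(this.indices, 0);
    const read = Atomics.load(this.indices, 1);
    const available = (write - read + this.capacity) % this.capacity;
    const count = Math.min(available, output.length);
    for (let i = 0; i < count; i++) {
      output[i] = this.samples[(read + i) % this.capacity];
    }
    output.fill(0, count); // silence instead of stale data
    Atomics.store(this.indices, 1, (read + count) % this.capacity);
    return count;
  }
}
```

In a browser, pull would be called from inside an AudioWorkletProcessor's process(inputs, outputs, parameters) method with outputs[0][0] as the destination. Because neither side takes a lock or allocates, the audio callback does a bounded amount of work per quantum.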
Atomics: Synchronization Without Lock Contention
Coordinating access to shared memory between threads requires synchronization primitives. JavaScript's Atomics API provides the building blocks: Atomics.load and Atomics.store for atomic reads and writes, Atomics.compareExchange for compare-and-swap operations, and Atomics.wait and Atomics.notify for blocking coordination. Atomics.wait can only be called from worker threads, because the main thread is not allowed to block; Atomics.notify can be called from any thread. RetroCloud's inter-thread protocol uses a double-buffered frame design: the emulation thread writes frame data to one buffer while the render thread reads from the other, with an atomic swap of buffer roles triggered by Atomics.compareExchange after each frame advance.
This design guarantees that the render thread always has a complete, consistent frame available to read without any frame tearing or partial-frame artifacts, while the emulation thread never needs to wait for the render thread to complete before starting the next frame advance. The practical result is decoupled execution: emulation runs at its natural speed, rendering runs at the display refresh rate, and each thread serves its purpose without blocking the other.
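The role swap can be sketched as follows. This is a minimal illustration under stated assumptions, not RetroCloud's protocol: the class name DoubleBufferedFrames, the single control word, and the memory layout are all hypothetical, and the sketch assumes the reader finishes with a published frame before the writer completes the following one, which the code itself does not enforce.

```typescript
const FRONT_SLOT = 0; // control word: index (0 or 1) of the readable buffer

class DoubleBufferedFrames {
  private readonly control: Int32Array;
  private readonly buffers: Uint8Array[];

  constructor(shared: SharedArrayBuffer, frameBytes: number) {
    this.control = new Int32Array(shared, 0, 1);
    this.buffers = [
      new Uint8Array(shared, 4, frameBytes),
      new Uint8Array(shared, 4 + frameBytes, frameBytes),
    ];
  }

  static allocate(frameBytes: number): SharedArrayBuffer {
    return new SharedArrayBuffer(4 + 2 * frameBytes);
  }

  // Emulation side: the buffer that is NOT currently published.
  backBuffer(): Uint8Array {
    return this.buffers[1 - Atomics.load(this.control, FRONT_SLOT)];
  }

  // Emulation side: call after a complete frame has been written.
  // Returns false if the control word changed underneath us.
  publish(): boolean {
    const front = Atomics.load(this.control, FRONT_SLOT);
    const previous = Atomics.compareExchange(
      this.control, FRONT_SLOT, front, 1 - front
    );
    return previous === front;
  }

  // Render side: the most recently published complete frame.
  frontBuffer(): Uint8Array {
    return this.buffers[Atomics.load(this.control, FRONT_SLOT)];
  }
}
```

The key property is that a frame becomes visible to the reader in a single atomic step, after the writer has finished it, so the reader never observes a half-written buffer at the moment of the swap.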
Limitations and Browser Compatibility
WASM threading and SharedArrayBuffer are available in Chrome, Edge, Firefox, and Safari as of 2024. The remaining compatibility gap is older mobile browsers, primarily Android WebView versions below 88 and iOS Safari versions below 15.2. RetroCloud detects SharedArrayBuffer availability at session start and falls back to a single-threaded architecture automatically, with a notification to the user that multithreaded mode is unavailable. In single-threaded mode, performance is roughly equivalent to our pre-threading architecture — the fallback preserves functionality without degradation beyond what users on those browsers previously experienced.
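The detection step might look like the sketch below. The names ThreadingMode, RuntimeScope, and selectThreadingMode are hypothetical, and the check shown (constructor present, Atomics present, page cross-origin isolated) is one reasonable set of conditions, not necessarily the exact test RetroCloud performs.

```typescript
type ThreadingMode = "multi-threaded" | "single-threaded";

// The subset of the global scope the check needs. Taking it as a parameter
// keeps the function usable outside a browser.
interface RuntimeScope {
  SharedArrayBuffer?: unknown;
  Atomics?: unknown;
  crossOriginIsolated?: boolean;
}

function selectThreadingMode(scope: RuntimeScope): ThreadingMode {
  const hasSharedMemory =
    typeof scope.SharedArrayBuffer === "function" &&
    typeof scope.Atomics === "object" &&
    scope.Atomics !== null;
  // Both the API and cross-origin isolation must be present.
  return hasSharedMemory && scope.crossOriginIsolated === true
    ? "multi-threaded"
    : "single-threaded";
}

// In a browser, the session-start check would be:
//   const mode = selectThreadingMode(globalThis);
// followed by a user notification when mode is "single-threaded".
```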
Priya Nair
CTO, RetroCloud
Priya leads RetroCloud's engineering organization with deep expertise in WebAssembly runtimes, distributed systems, and browser performance optimization. She has spoken at WebAssembly Summit and GOTO Chicago.