How We Added System Audio to Screen Recording
We already had screen recording with webcam overlay, drawing tools, and pause/resume. But when someone recorded a product walkthrough with background music or a tutorial with UI sounds, the recording was silent. The browser captured the screen but not the audio coming from it.
The fix was surprisingly small — two constraints on one API call. But understanding why those constraints exist and how they interact with different browsers took more work than writing the code.
The problem
The getDisplayMedia() API lets web apps capture a user’s screen. You can request audio alongside video:
const stream = await navigator.mediaDevices.getDisplayMedia({
video: true,
audio: true,
});
This works — sort of. The audio: true flag tells the browser you’d like audio, but the browser decides what that means. In Chrome, the share picker shows an “Also share tab audio” checkbox. If the user shares a tab, they get tab audio. If they share an entire screen or window, they might get nothing.
The real issue is that Chrome doesn’t offer system-level audio capture by default. Without an explicit signal, it limits you to tab audio at best.
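Because the browser decides what to grant, the stream that comes back may have no audio track at all even when you asked for one. A minimal check, sketched with a structural parameter type (our own, not a DOM type) so it works on any MediaStream-like object:

```typescript
// Returns true if the captured stream actually carries audio.
// The parameter is typed structurally so this runs outside a browser too;
// a real MediaStream satisfies it via its getAudioTracks() method.
function hasAudioTrack(stream: { getAudioTracks(): unknown[] }): boolean {
  return stream.getAudioTracks().length > 0;
}
```

In the recorder you would call this on the stream returned by `getDisplayMedia()` and, for instance, warn the user that the recording will be silent.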
The constraints
Chrome 105+ supports two constraints that change the behavior:
const stream = await navigator.mediaDevices.getDisplayMedia({
video: true,
audio: true,
systemAudio: "include",
suppressLocalAudioPlayback: true,
});
systemAudio: "include" tells Chrome to offer system audio capture in the share picker, not just tab audio. When a user shares their entire screen, Chrome can now capture audio from any application — not just browser tabs.
suppressLocalAudioPlayback: true solves the doubling problem. Without it, the captured source keeps playing through the user's speakers while it is being recorded, and if the user is also narrating through a microphone, the mic picks up that speaker output. On playback you'd hear the audio twice: once from the capture itself and once as echo on the mic track. Suppressing local playback mutes the captured source's output on the user's machine, so the audio exists only in the recording.
Implementation
The full change was three parts: a state variable, modified constraints, and a toggle button.
The state defaults to true because most people recording their screen want audio:
const [systemAudioEnabled, setSystemAudioEnabled] = useState(true);
The getDisplayMedia call builds its options based on that state:
const displayMediaOptions: DisplayMediaStreamOptions
& Record<string, unknown> = {
video: true,
audio: systemAudioEnabled,
};
if (systemAudioEnabled) {
displayMediaOptions.systemAudio = "include";
displayMediaOptions.suppressLocalAudioPlayback = true;
}
const screenStream = await navigator.mediaDevices
.getDisplayMedia(displayMediaOptions);
The & Record<string, unknown> type intersection is a pragmatic choice. TypeScript’s built-in DisplayMediaStreamOptions doesn’t include systemAudio or suppressLocalAudioPlayback — they’re Chrome-specific extensions that haven’t landed in the standard DOM types yet. Instead of creating a custom type declaration file for two properties, we use a type intersection that allows setting arbitrary keys.
When audio is toggled off, we pass audio: false and skip the extra constraints entirely. The browser won’t request any audio, and the recording is video-only.
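The branching above can be pulled into a pure helper. This is a sketch, assuming the same systemAudioEnabled flag; extracting it makes the two code paths easy to unit-test without touching navigator:

```typescript
// Same shape as the options object in the snippet above: the standard
// fields plus a Record<string, unknown> escape hatch for the
// Chrome-specific keys that aren't in the DOM types yet.
type DisplayOptions = { video: boolean; audio: boolean } & Record<string, unknown>;

function buildDisplayMediaOptions(systemAudioEnabled: boolean): DisplayOptions {
  const options: DisplayOptions = { video: true, audio: systemAudioEnabled };
  if (systemAudioEnabled) {
    // Chrome-specific hints; other browsers silently ignore them.
    options.systemAudio = "include";
    options.suppressLocalAudioPlayback = true;
  }
  return options;
}
```

The result is passed straight to `navigator.mediaDevices.getDisplayMedia(...)`.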
Browser compatibility
Here’s what happens on each browser:
| Browser | systemAudio | suppressLocalAudioPlayback | Result |
|---|---|---|---|
| Chrome 105+ | Supported | Supported | System + tab audio captured, local playback suppressed |
| Edge 105+ | Supported | Supported | Same as Chrome (Chromium-based) |
| Firefox | Ignored | Ignored | Tab audio may work on some configs, no system audio |
| Safari | Ignored | Ignored | No audio capture from getDisplayMedia |
The key insight: browsers silently ignore unknown constraints in getDisplayMedia(). They don’t throw errors or reject the promise. Firefox and Safari simply skip systemAudio and suppressLocalAudioPlayback and proceed as if they weren’t there. This means we can safely pass these constraints on all browsers without feature detection.
No try/catch, no if (typeof systemAudio !== 'undefined'), no user-agent sniffing. The API was designed this way — unknown constraints are a no-op.
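If you do want capability information for UI messaging (say, to warn Safari users up front that recordings will be silent), Chromium browsers advertise suppressLocalAudioPlayback in navigator.mediaDevices.getSupportedConstraints(). That's an optional nicety, not needed for correctness. A sketch that takes the supported-constraints record as a parameter so it can run anywhere:

```typescript
// Purely informational check; capture works the same either way because
// unknown constraints are ignored. Pass in the result of
// navigator.mediaDevices.getSupportedConstraints() in a browser.
function supportsLocalPlaybackSuppression(
  supported: Record<string, boolean | undefined>
): boolean {
  return supported.suppressLocalAudioPlayback === true;
}
```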
No backend changes
Our server-side video pipeline already handled audio correctly. The ffmpeg command for compositing screen + webcam recordings uses:
ffmpeg -i screen.mp4 -i webcam.webm \
-map "0:a?" \
...
The ? in 0:a? makes the audio mapping optional: include the audio track from input 0 if it exists, otherwise skip it. When audio is present, it gets included. When it's not, ffmpeg doesn't fail; it just produces a video-only output. This pattern was already in place from the webcam compositing work, so system audio flows through the pipeline with zero changes.
The transcoding step similarly handles audio transparently. MP4 outputs get AAC audio. WebM outputs copy the original codec. If there’s no audio track, both paths produce video-only files.
The toggle
We added an “Audio On/Off” button to the recorder’s idle UI, sitting between the Camera toggle and the Start Recording button:
[Camera Off] [Audio On] [Start Recording]
It follows the exact same styling pattern as the Camera toggle — accent background when on, transparent with border when off. The toggle only appears in the idle state and disappears during recording, countdown, and paused states, which is the natural behavior since the idle UI section is already conditionally rendered.
When audio is on, the button reads “Audio On” with the accent background. When off, “Audio Off” with a transparent background and border. The aria-label switches between “Disable system audio” and “Enable system audio” for accessibility.
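The label and aria-label logic is small enough to capture in a pure function. A sketch (the helper name is ours, not from the codebase):

```typescript
// Derives the toggle button's visible label and aria-label from state,
// matching the behavior described above.
function audioToggleProps(systemAudioEnabled: boolean) {
  return {
    label: systemAudioEnabled ? "Audio On" : "Audio Off",
    ariaLabel: systemAudioEnabled ? "Disable system audio" : "Enable system audio",
  };
}
```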
What about microphone audio?
System audio from getDisplayMedia and microphone audio from getUserMedia are separate streams. We deliberately don’t mix them. If someone wants narration over their screen recording, they use the webcam overlay — it records separately with its own audio track and gets composited server-side.
Mixing two audio streams in the browser would require a Web Audio API AudioContext with a MediaStreamAudioDestinationNode to combine sources. That’s significant complexity for a marginal benefit, especially when server-side compositing already handles multiple tracks cleanly.
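For the record, the wiring such mixing would need looks roughly like this. The interfaces are minimal structural stubs of the Web Audio types so the sketch is self-contained; in a browser you would pass the real AudioContext and the actual MediaStreams:

```typescript
// Stubs mirroring just the Web Audio surface this sketch touches.
interface AudioNodeLike { connect(dest: AudioNodeLike): void }
interface DestNodeLike extends AudioNodeLike { stream: unknown }
interface AudioContextLike {
  createMediaStreamSource(s: unknown): AudioNodeLike;
  createMediaStreamDestination(): DestNodeLike;
}

// Routes every input stream into one destination node, whose .stream
// would be the combined MediaStream handed to MediaRecorder.
function mixStreams(ctx: AudioContextLike, streams: unknown[]): unknown {
  const dest = ctx.createMediaStreamDestination();
  for (const s of streams) {
    ctx.createMediaStreamSource(s).connect(dest);
  }
  return dest.stream;
}
```

Even this simplified version hints at the extra state to manage (an AudioContext lifecycle, node cleanup on stop), which is the complexity the server-side approach avoids.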
macOS caveat
On macOS, Chrome can capture tab audio but not system-wide audio from other applications without additional OS-level permissions. The systemAudio: "include" constraint is a hint to the browser — it tells the share picker to offer audio options. If the OS doesn’t support system audio capture, the browser simply doesn’t offer it. The recording still succeeds; it just won’t have audio from non-browser sources.
On Windows and Linux, system audio capture generally works without additional setup.
Try it
SendRec is open source (AGPL-3.0) and self-hostable. Record a screen with system audio at app.sendrec.eu. Pull the image from Docker Hub, check the Self-Hosting Guide, or browse the source code.