How We Added Real-Time Drawing Annotations to Screen Recordings
Screen recordings get better when you can point at things. Circle a button, underline text, draw an arrow to the error message. Without annotations, you end up saying “see the thing in the top right” and hoping the viewer finds it.
We wanted to add freehand drawing to SendRec — draw on your screen during recording, and have the annotations baked into the final video. No post-production editing. No separate annotation layer that only works in our player. Just a WebM file with your drawings burned in.
The challenge: a screen recording is a MediaStream. You can’t draw on a MediaStream. You draw on a <canvas>. So how do you merge the two?
Two canvases, one video
The architecture uses two canvases — one visible, one hidden:
- Drawing canvas — visible, overlaid on the video preview. This is where the user draws. It captures pointer events and renders strokes in real time.
- Compositing canvas — hidden. On every animation frame, it draws the screen capture first, then the drawing canvas on top. The MediaRecorder records from this canvas.
The visible <video> element shows the raw screen capture as a preview. The drawing canvas sits on top of it with position: absolute and pointer-events: none (until draw mode is activated). The user sees their screen with drawings overlaid. The compositing canvas — which nobody sees — produces the final merged stream.
Screen capture (MediaStream)
        ↓
<video> preview        ←── what the user sees
        ↓
Compositing canvas     ←── draws video frame + drawing canvas
        ↓
captureStream()        ←── what MediaRecorder records

Drawing canvas         ←── where the user draws (overlaid on the preview);
                           composited into every frame
The drawing hook
Drawing is handled by a dedicated React hook — useDrawingCanvas. It manages draw mode, color, line thickness, and the pointer event handlers.
The key detail is coordinate scaling. The drawing canvas is displayed at whatever size fits the preview, but its internal resolution matches the screen capture (typically 1920×1080). Pointer events arrive in CSS pixels, which need to be scaled to canvas coordinates:
const scalePointerToCanvas = useCallback((e: PointerEvent) => {
  const canvas = canvasRef.current;
  if (!canvas) return { x: 0, y: 0 };
  const rect = canvas.getBoundingClientRect();
  const scaleX = captureWidth / rect.width;
  const scaleY = captureHeight / rect.height;
  return {
    x: (e.clientX - rect.left) * scaleX,
    y: (e.clientY - rect.top) * scaleY,
  };
}, [canvasRef, captureWidth, captureHeight]);
Without this scaling, drawings would appear in the wrong position. A stroke near the bottom-right of a preview shown at 800×450 needs to map to coordinates near (1920, 1080) on the canvas.
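To make that concrete, here is the same scaling math as a standalone function (the name `scalePointer` and the explicit `rect` parameter are ours, for illustration), with the bottom-right-corner example worked through:

```typescript
// Standalone version of the hook's scaling math: maps a pointer position
// in CSS pixels to internal canvas coordinates.
function scalePointer(
  clientX: number,
  clientY: number,
  rect: { left: number; top: number; width: number; height: number },
  captureWidth: number,
  captureHeight: number,
): { x: number; y: number } {
  const scaleX = captureWidth / rect.width;   // e.g. 1920 / 800 = 2.4
  const scaleY = captureHeight / rect.height; // e.g. 1080 / 450 = 2.4
  return {
    x: (clientX - rect.left) * scaleX,
    y: (clientY - rect.top) * scaleY,
  };
}

// A pointer at the bottom-right corner of an 800×450 preview maps to the
// bottom-right of the 1920×1080 canvas:
const p = scalePointer(800, 450, { left: 0, top: 0, width: 800, height: 450 }, 1920, 1080);
// p.x === 1920, p.y === 1080
```

Note that `rect.left` and `rect.top` matter too: the preview rarely sits at the viewport origin, so the pointer position is translated before it is scaled.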
The actual drawing uses the standard Canvas 2D API — beginPath, moveTo, lineTo, stroke. Nothing exotic. We use pointer events rather than mouse events to support touch and stylus input. The canvas is rendered with touch-action: none to prevent the browser from interpreting drawing gestures as scrolls or zooms.
When draw mode is off, the canvas has pointer-events: none in CSS, letting clicks pass through to the video below.
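The overlay's styling can be sketched as a small helper that builds the inline-style object — the helper name and the inline-style shape are our own; SendRec's actual component may apply these via a stylesheet instead:

```typescript
// Inline styles for the drawing canvas overlay. `drawMode` decides whether
// the canvas receives pointer input or lets clicks fall through to the video.
function overlayStyle(drawMode: boolean): Record<string, string> {
  return {
    position: "absolute", // sit on top of the <video> preview
    inset: "0",           // cover the preview exactly
    width: "100%",
    height: "100%",
    // Only intercept pointer input while draw mode is active.
    pointerEvents: drawMode ? "auto" : "none",
    // Stop the browser from treating drawing gestures as scroll/zoom.
    touchAction: "none",
  };
}
```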
The compositing loop
The second hook — useCanvasCompositing — runs the compositing loop. It’s surprisingly short:
const compositeFrame = useCallback(() => {
  if (!isRunning.current) return;
  const canvas = compositingCanvasRef.current;
  const video = screenVideoRef.current;
  const drawing = drawingCanvasRef.current;
  if (!canvas || !video) return;
  const ctx = canvas.getContext("2d");
  if (!ctx) return;
  ctx.drawImage(video, 0, 0, captureWidth, captureHeight);
  if (drawing) {
    ctx.drawImage(drawing, 0, 0, captureWidth, captureHeight);
  }
  animFrameRef.current = requestAnimationFrame(compositeFrame);
}, [compositingCanvasRef, screenVideoRef, drawingCanvasRef,
   captureWidth, captureHeight]);
Each frame: draw the video, draw the annotations on top, schedule the next frame. The drawImage call accepts both <video> and <canvas> elements, so the same API composites both layers.
The compositing canvas then exposes a MediaStream via captureStream(). We call it without arguments, which tells the browser to capture a new frame whenever the canvas content changes — naturally matching the animation loop’s frame rate without hardcoding a value.
const stream = canvas.captureStream();
audioTracks.forEach((track) => stream.addTrack(track));
Audio tracks from the original screen capture are added to the composited stream. The MediaRecorder records from this merged stream — video from the compositing canvas, audio from the screen capture. The result is a single WebM file with annotations baked in.
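Roughly, the recorder wiring looks like this — a sketch under assumptions, not SendRec's exact code: the `pickMimeType` fallback helper and the chunk handling are our own illustration, and the support check is injectable so the helper can be exercised outside a browser:

```typescript
// Pick the first container/codec string the environment can record.
// In the browser, pass MediaRecorder.isTypeSupported as `isSupported`.
function pickMimeType(
  candidates: string[],
  isSupported: (type: string) => boolean,
): string | undefined {
  return candidates.find(isSupported);
}

// Browser-only wiring (never called here). Assumes `canvas` is already
// running the compositing loop and `audioTracks` came from the screen capture.
function startRecording(canvas: HTMLCanvasElement, audioTracks: MediaStreamTrack[]) {
  const stream = canvas.captureStream();
  audioTracks.forEach((track) => stream.addTrack(track));

  const mimeType = pickMimeType(
    ["video/webm;codecs=vp9,opus", "video/webm;codecs=vp8,opus", "video/webm"],
    MediaRecorder.isTypeSupported,
  );
  const recorder = new MediaRecorder(stream, mimeType ? { mimeType } : undefined);

  const chunks: Blob[] = [];
  recorder.ondataavailable = (e) => { if (e.data.size > 0) chunks.push(e.data); };
  recorder.onstop = () => {
    // One WebM file with the annotations baked in.
    const blob = new Blob(chunks, { type: recorder.mimeType });
    void blob; // ...upload or download it from here
  };
  recorder.start();
  return recorder;
}
```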
Why not use a canvas for the screen capture too?
An earlier version of SendRec’s screen recorder used a canvas as an intermediary for the screen capture itself — drawing each frame from getDisplayMedia onto a canvas, then recording the canvas. We wrote about why we removed that.
The short version: drawing a <video> onto a canvas forces the browser to decode each frame on the main thread, copy the pixels, and re-encode them. This tanks performance on large displays and drops frames. The direct getDisplayMedia → MediaRecorder path avoids all of that by keeping the video pipeline in hardware.
With drawing annotations, we’re back to using a canvas — but only when annotations are being composited. The difference is that we’re intentionally trading some performance for the ability to merge two visual layers. The canvas is unavoidable here; there’s no browser API to merge a drawing overlay with a MediaStream without one.
In practice, the performance is fine. The compositing loop draws two images per frame (the video and the drawing layer). On modern hardware, this runs comfortably at 60fps even at 1920×1080. The screen capture does the heavy lifting; we’re just copying its decoded frames and adding a transparent overlay.
Canvas dimensions matter
One subtle bug we hit: the canvas dimensions must be set to match the screen capture’s actual resolution, not the CSS display size. If you create a canvas at the default 300×150 and display it at 960×540, your drawings will be 300×150 and look blocky in the recorded video.
We read the actual capture dimensions from the screen stream’s video track settings immediately after getDisplayMedia resolves:
const videoTrack = screenStream.getVideoTracks()[0];
const settings = videoTrack.getSettings();
const width = settings.width ?? 1920;
const height = settings.height ?? 1080;
Both canvases — drawing and compositing — are set to these dimensions. The canvas CSS size is set to 100% of the preview container, so it visually overlays the video. But internally, all drawing and compositing happens at the capture resolution.
There’s a timing nuance here too. React state updates are batched, so calling setCaptureWidth(width) won’t update the canvas dimensions until the next render. But we need the canvases sized correctly before starting the compositing loop in the same function. The fix is setting dimensions imperatively via refs:
if (compositingCanvasRef.current) {
  compositingCanvasRef.current.width = width;
  compositingCanvasRef.current.height = height;
}
This runs synchronously, ensuring the canvases are correctly sized before the first composited frame.
The UI
The drawing controls appear during recording: a Draw toggle, color picker, line thickness selector, and Clear button. When the preview is expanded to full viewport width, the controls move above the video (using CSS order: -1) to keep them accessible without overlapping the content.
To center the expanded, 100vw-wide preview within its narrower container, we use align-items: center on a flex parent: the element overflows equally on both sides, centering it relative to the viewport without any calc() or transform hacks. We add overflow-x: hidden to the document root while expanded to prevent a horizontal scrollbar on platforms where the scrollbar consumes layout space.
What we didn’t build
We deliberately kept the annotation tool simple:
- Freehand drawing only — no shapes, arrows, text, or stamps. These would require a shape selection UI, resize handles, and undo logic. Freehand covers 90% of the “let me point at this” use case.
- No undo — just a Clear button that wipes the canvas. Undo would require tracking every stroke as a separate operation and replaying them on each undo. Not worth the complexity for a recording tool where you can just keep talking.
- No per-frame annotation timeline — drawings persist for the rest of the recording. This is a feature, not a limitation. In a live recording, you draw while talking. The annotation appears when you draw it and stays visible until you clear it or stop recording.
Try it
SendRec is open source (AGPL-3.0) and self-hostable. Drawing annotations are live at app.sendrec.eu. The drawing hook is in useDrawingCanvas.ts and the compositing hook is in useCanvasCompositing.ts.