Capture Desktop Perception In Real Time

Stream screen, mic, camera, and system audio with native SDKs. Get live context events and transcripts for assistance. Recording and replay are opt-in.

Inputs: Screen, Mic, Camera, System Audio, Multiple displays

Capture Client: Native SDK

Context: Event, Streams

No FFmpeg. No GStreamer.

Native Mac, Windows, Linux.

Install & Authenticate

Ship capture as a separate component, authenticate with session tokens

Python

Node

pip install "videodb[capture]"

Separate install so you can bundle it into your customer app

Session token auth, no API keys on clients

Designed for user-facing apps with explicit consent
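To make the session-token point concrete, here is a minimal sketch of the general pattern: the backend keeps a long-lived secret and hands the client a short-lived token bound to one session. This is a generic HMAC illustration built on the Python standard library only. It is not VideoDB's token format or API, and `SERVER_SECRET`, `mint_session_token`, `verify_session_token`, and the claim names are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

# Hypothetical server-side secret; it stays on the backend and never ships in the client.
SERVER_SECRET = b"keep-this-on-the-backend"


def mint_session_token(session_id: str, ttl_seconds: int, now: Optional[float] = None) -> str:
    """Return a signed token valid for one session and a limited time."""
    issued = time.time() if now is None else now
    payload = json.dumps({"sid": session_id, "exp": issued + ttl_seconds}).encode()
    signature = hmac.new(SERVER_SECRET, payload, hashlib.sha256).digest()
    return (
        base64.urlsafe_b64encode(payload).decode()
        + "."
        + base64.urlsafe_b64encode(signature).decode()
    )


def verify_session_token(token: str, session_id: str, now: Optional[float] = None) -> bool:
    """Check the signature, the session binding, and the expiry time."""
    try:
        payload_b64, signature_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        signature = base64.urlsafe_b64decode(signature_b64)
    except ValueError:
        # Wrong number of parts or invalid base64.
        return False
    expected = hmac.new(SERVER_SECRET, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        return False
    claims = json.loads(payload)
    current = time.time() if now is None else now
    return claims["sid"] == session_id and current < claims["exp"]
```

A leaked token of this kind is only useful for one session and only until it expires, whereas a leaked API key would expose the whole account. That is the usual motivation for keeping keys on the server.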

Permissions & Channel Discovery

Discover channels safely, then choose what to stream

Request permissions

python

await client.request_permission("microphone")
await client.request_permission("screen_capture")

Discover channels

python

channels = await client.channels()

PERMISSION        ENABLES
microphone        Access to mic audio sources
screen_capture    Access to display video sources and system audio

CHANNELS

Mics: Default mic

Displays: Primary display, Display 2

System audio: Default output
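One way to go from discovered channels to a selection is sketched below. The `Channel` dataclass and `choose_channels` helper are hypothetical stand-ins written for this example. The real objects returned by `client.channels()` may be shaped differently, so treat this as an illustration of the selection logic only.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Channel:
    # Hypothetical shape for illustration; the SDK's channel objects may differ.
    name: str
    kind: str  # "mic", "display", or "system_audio"
    is_default: bool = False


def choose_channels(
    channels: List[Channel], want_system_audio: bool = True
) -> Tuple[List[Channel], Optional[Channel]]:
    """Pick the default mic, the primary display, and optionally system audio."""

    def first_default(kind: str) -> Optional[Channel]:
        candidates = [c for c in channels if c.kind == kind]
        defaults = [c for c in candidates if c.is_default]
        # Prefer a default channel, fall back to any channel of that kind.
        return (defaults or candidates or [None])[0]

    mic = first_default("mic")
    display = first_default("display")
    audio = first_default("system_audio") if want_system_audio else None
    selected = [c for c in (mic, display, audio) if c is not None]
    return selected, display


discovered = [
    Channel("Default mic", "mic", is_default=True),
    Channel("Primary display", "display", is_default=True),
    Channel("Display 2", "display"),
    Channel("Default output", "system_audio", is_default=True),
]
selected, display = choose_channels(discovered)
print([c.name for c in selected])  # ['Default mic', 'Primary display', 'Default output']
```

Returning the chosen display alongside the list keeps it easy to pass one value as the primary video channel and the other as the set of channels to stream.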

Quickstart: Capture & Live Events

Python

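import asyncio
from videodb.capture import CaptureClient

async def main():
    # Both values are issued by your backend; the API key never reaches the client.
    client_token = "<CLIENT_TOKEN_FROM_BACKEND>"
    capture_session_id = "<CAPTURE_SESSION_ID_FROM_BACKEND>"
    client = CaptureClient(client_token=client_token)

    # Discover channels, then choose what to stream. Streaming every
    # discovered channel with no primary display is a placeholder assumption;
    # replace these two assignments with your own selection.
    channels = await client.channels()
    selected = channels
    display = None

    await client.start_capture_session(
        capture_session_id=capture_session_id,
        channels=selected,
        primary_video_channel_id=display.name if display else None,
    )

    # Print each live event until the session finishes or fails.
    async for ev in client.events():
        event_type = ev["event"]
        payload = ev["data"]

        print(f"{event_type}: {payload}")

        if event_type in ("recording-complete", "error"):
            break

    await client.stop_capture()
    await client.shutdown()

asyncio.run(main())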

Event Stream

transcript

visual_state

status

recording-complete

error
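As the number of event types grows, a small router can replace a chain of `if` checks. The sketch below assumes only what the quickstart shows, namely that each event is a mapping with `"event"` and `"data"` keys. The `EventRouter` class and the `"text"` and `"state"` payload fields are hypothetical and exist only for this example.

```python
from typing import Any, Callable, Dict, List

Handler = Callable[[Any], None]


class EventRouter:
    """Route events shaped like {"event": <type>, "data": <payload>} to handlers."""

    # Event types that end the stream, matching the quickstart loop.
    TERMINAL = {"recording-complete", "error"}

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = {}

    def on(self, event_type: str, handler: Handler) -> None:
        self._handlers.setdefault(event_type, []).append(handler)

    def dispatch(self, ev: Dict[str, Any]) -> bool:
        """Run handlers for one event; return True if the stream should end."""
        event_type = ev["event"]
        for handler in self._handlers.get(event_type, []):
            handler(ev.get("data"))
        return event_type in self.TERMINAL


router = EventRouter()
transcripts: List[str] = []
# The payload field "text" is an assumption made for this sketch.
router.on("transcript", lambda data: transcripts.append(data["text"]))

sample_events = [
    {"event": "status", "data": {"state": "streaming"}},
    {"event": "transcript", "data": {"text": "hello"}},
    {"event": "recording-complete", "data": {}},
    {"event": "transcript", "data": {"text": "never reached"}},
]
for ev in sample_events:
    if router.dispatch(ev):
        break

print(transcripts)  # ['hello']
```

In a live session, the body of the `async for` loop would call `router.dispatch(ev)` and break when it returns `True`.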

High-Level Architecture

Client captures media. Backend owns indexing policy. VideoDB Cloud runs real time pipelines and returns context, events, and optional memory.

BACKEND: Your server. Holds API key, creates sessions. Passes token and session ID to the desktop client.

DESKTOP CLIENT: Capture SDK. Streams media with token.

Media Streams: screen • mic • camera • audio

VideoDB CLOUD: Real-time pipelines. Processes streams, emits events.

DATA PLANE: Real-time Data Stream (transcript, visual_state, index_updates, alerts, session_status)

Control, Cost & Opt-In Memory

Adaptive streaming

Tune compute by adjusting sampling and stream policy while keeping the agent loop responsive.
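To show what adjusting sampling can look like, here is a minimal sketch of one possible policy: sample busy screens more often and idle screens less often, within fixed bounds. The function name, thresholds, and the idea of a `change_ratio` input are assumptions made for illustration. They do not describe how VideoDB implements adaptive streaming.

```python
def next_sample_interval(
    current: float,
    change_ratio: float,
    min_interval: float = 0.5,
    max_interval: float = 8.0,
) -> float:
    """Return the next sampling interval in seconds.

    change_ratio is the fraction of the screen that changed since the last
    sample (0.0 to 1.0). Busy screens are sampled faster, idle ones slower.
    """
    if not 0.0 <= change_ratio <= 1.0:
        raise ValueError("change_ratio must be between 0.0 and 1.0")
    if change_ratio > 0.20:
        proposed = current / 2  # lots of change: sample twice as often
    elif change_ratio < 0.02:
        proposed = current * 2  # nearly static: back off
    else:
        proposed = current  # moderate change: hold steady
    # Clamp so the interval never leaves the configured bounds.
    return max(min_interval, min(max_interval, proposed))


interval = 2.0
for ratio in (0.5, 0.5, 0.5, 0.0, 0.0):
    interval = next_sample_interval(interval, ratio)
    print(interval)  # prints 1.0, 0.5, 0.5, 1.0, 2.0 on successive lines
```

The lower bound caps compute cost during bursts of activity, and the upper bound keeps the agent from going blind on a quiet screen.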

Opt-in recording for replay

Store only when the user explicitly enables it. At the end of a session, the recording can become a native video object.

Semantic search when stored

Index and search any moment later, with playable windows and metadata.

Pause and resume per channel

Stop streaming mic or a display without affecting other channels.
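The per-channel behavior described above can be modeled as independent on/off state for each channel. The sketch below is a hypothetical bookkeeping class written for this example, not the SDK's pause and resume API, and the channel names are placeholders.

```python
from typing import Dict, Iterable, List


class ChannelStates:
    """Track which channels are streaming so one can pause without touching the rest."""

    def __init__(self, names: Iterable[str]) -> None:
        # Every channel starts in the streaming state.
        self._streaming: Dict[str, bool] = {name: True for name in names}

    def pause(self, name: str) -> None:
        self._set(name, False)

    def resume(self, name: str) -> None:
        self._set(name, True)

    def active(self) -> List[str]:
        return [name for name, on in self._streaming.items() if on]

    def _set(self, name: str, value: bool) -> None:
        if name not in self._streaming:
            raise KeyError(f"unknown channel: {name}")
        self._streaming[name] = value


states = ChannelStates(["mic", "display_1", "system_audio"])
states.pause("mic")
print(states.active())  # ['display_1', 'system_audio']
states.resume("mic")
print(states.active())  # ['mic', 'display_1', 'system_audio']
```

Keeping the state per channel means a user can mute the mic during a private moment while screen context keeps flowing to the agent.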

Build With Capture SDK

Desktop Screen and Mic Aware Agents:

Build agents that understand what users see on their screens

Monitor OpenClaw Agents:

Monitor your OpenClaw agents in real time and set alerts on them

Live Sales Copilot

An agent that records sales calls with real-time transcription, visual context, and AI-powered insights

Productivity Tracker

AI-powered desktop app that records your screen, understands what you're doing, and gives you actionable productivity insights

Need private VPC deployment?

Enterprise-grade security for your organization

FAQs

What is the Capture SDK?

Capture SDK is a native client library that captures desktop screen, mic, camera, and system audio and streams it to VideoDB in real time. The point is to turn a user session into live context, events, and optional replay, not just a recording file.

How is this different from a screen recorder?

What is the recommended architecture?

Why does the client use a token instead of an API key?

The Perception Layer for AI

Apt 2111 Lansing Street San Francisco, CA 94105 USA

HD-239, WeWork Prestige Atlanta, 80 Feet Main Road, Koramangala I Block, Bengaluru, Karnataka, 560034

sales@videodb.com
