Capture Desktop Perception In Real Time
Stream screen, mic, camera, and system audio with native SDKs. Get live context events and transcripts for assistance. Recording and replay are opt-in.
Inputs: Screen, Mic, Camera, System Audio, Multiple displays
Capture Client: Native SDK
Context Event Streams
No FFmpeg. No GStreamer.
Native Mac, Windows, Linux.
Install & Authenticate
Ship capture as a separate component, authenticate with session tokens
Python
Node
pip install "videodb[capture]"
Separate install so you can bundle it into your customer app
Session token auth, no API keys on clients
Designed for user-facing apps with explicit consent
Permissions & Channel Discovery
Discover channels safely, then choose what to stream
Request permissions
python
await client.request_permission("microphone")
await client.request_permission("screen_capture")
Discover channels
python
channels = await client.channels()
PERMISSION        ENABLES
microphone        Access to mic audio sources
screen_capture    Access to display video sources and system audio
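The permission table above can be turned into a small lookup, so an app only asks for the permissions its chosen sources need. This is a minimal sketch, not part of the SDK: the channel-kind labels ("mic", "display", "system_audio") and both helper names are hypothetical. Only `request_permission` and the two permission strings come from the snippet above.

```python
# Mapping derived from the permission table. The keys are illustrative labels,
# not identifiers defined by the SDK.
PERMISSION_FOR_KIND = {
    "mic": "microphone",
    "display": "screen_capture",
    "system_audio": "screen_capture",
}


def required_permissions(kinds):
    """Return the permissions needed for the given kinds, de-duplicated, in first-seen order."""
    needed = []
    for kind in kinds:
        permission = PERMISSION_FOR_KIND[kind]
        if permission not in needed:
            needed.append(permission)
    return needed


async def request_all(client, kinds):
    """Request each needed permission once; `client` is assumed to expose request_permission()."""
    for permission in required_permissions(kinds):
        await client.request_permission(permission)
```

Because display video and system audio share `screen_capture`, asking for both produces a single request. The table does not list a camera permission, so the sketch leaves it out and raises `KeyError` for any kind it does not know.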
CHANNELS
Mics: Default mic
Displays: Primary display, Display 2
System audio: Default output
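Once channels are discovered, the app decides which ones to stream. The sketch below shows one way to make that choice, assuming each channel can be reduced to a kind and a name. The `Channel` class, its `kind` field, and both helpers are hypothetical stand-ins. The real value returned by `client.channels()` may be shaped differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Channel:
    """Hypothetical stand-in for a discovered channel; not the SDK's own type."""
    kind: str  # "mic", "display", or "system_audio" (illustrative labels)
    name: str


def pick_channels(channels, want_kinds):
    """Return the first channel of each wanted kind, in the order requested."""
    selected = []
    for kind in want_kinds:
        match = next((c for c in channels if c.kind == kind), None)
        if match is not None:
            selected.append(match)
    return selected


def primary_display(selected):
    """Return the first selected display channel, or None if no display was chosen."""
    return next((c for c in selected if c.kind == "display"), None)


# Sample data mirroring the channel list shown above.
discovered = [
    Channel("mic", "Default mic"),
    Channel("display", "Primary display"),
    Channel("display", "Display 2"),
    Channel("system_audio", "Default output"),
]
selected = pick_channels(discovered, ["mic", "display"])
display = primary_display(selected)
```

Kinds with no matching channel are skipped rather than treated as errors, so the same call works on machines with fewer devices.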
Quickstart Capture & Live Events
Python
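import asyncio

from videodb.capture import CaptureClient


async def main():
    # Both values are issued by your backend; the client never holds an API key.
    client_token = "<CLIENT_TOKEN_FROM_BACKEND>"
    capture_session_id = "<CAPTURE_SESSION_ID_FROM_BACKEND>"

    client = CaptureClient(client_token=client_token)

    # `selected` (the channels to stream) and `display` (the chosen display
    # channel, or None) are not defined in this snippet; they come from the
    # permission and channel discovery step.
    await client.start_capture_session(
        capture_session_id=capture_session_id,
        channels=selected,
        primary_video_channel_id=display.name if display else None,
    )

    # Print live events until the session completes or fails.
    async for ev in client.events():
        event_type = ev["event"]
        payload = ev["data"]
        print(f"{event_type}: {payload}")
        if event_type in ("recording-complete", "error"):
            break

    await client.stop_capture()
    await client.shutdown()


asyncio.run(main())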
Event Stream
transcript
visual_state
status
recording-complete
error
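As the app grows, printing every event gives way to routing each event type to its own handler. This is a minimal sketch of that pattern, assuming events are dicts with `event` and `data` keys as in the quickstart. The `dispatch_events` helper, the `fake_events` generator, and the payload fields (`text`, `state`) are hypothetical and used here only for illustration.

```python
import asyncio

TERMINAL_EVENTS = ("recording-complete", "error")


async def dispatch_events(events, handlers):
    """Call the handler registered for each event type.

    Stops at the first terminal event and returns its type,
    or returns None if the stream ends without one.
    """
    async for ev in events:
        event_type = ev["event"]
        handler = handlers.get(event_type)
        if handler is not None:
            handler(ev["data"])
        if event_type in TERMINAL_EVENTS:
            return event_type
    return None


async def fake_events():
    """Hypothetical sample stream standing in for client.events()."""
    yield {"event": "transcript", "data": {"text": "hello"}}
    yield {"event": "status", "data": {"state": "streaming"}}
    yield {"event": "recording-complete", "data": {}}
    yield {"event": "transcript", "data": {"text": "never delivered"}}


transcripts = []
last_event = asyncio.run(
    dispatch_events(
        fake_events(),
        {"transcript": lambda data: transcripts.append(data["text"])},
    )
)
```

Event types without a registered handler are ignored, so `status` and `visual_state` handlers can be added later without touching the loop.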
High-Level Architecture
Client captures media. Backend owns indexing policy. VideoDB Cloud runs real-time pipelines and returns context, events, and optional memory.
BACKEND: Your server. Holds the API key and creates sessions. Sends the token and session ID to the desktop client.
DESKTOP CLIENT: Capture SDK. Streams media with the token.
Media Streams: screen • mic • camera • audio
VideoDB CLOUD: Real-time pipelines. Processes streams and emits events.
DATA PLANE: Real-time Data Stream
transcript
visual_state
index_updates
alerts
session_status
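The split between backend and desktop client comes down to which values cross the boundary: the client receives a token and a session ID, and the API key stays on the server. The sketch below illustrates that hand-off under stated assumptions. `BackendSession`, `client_payload`, and the JSON shape are hypothetical. Only the names `client_token` and `capture_session_id` come from the quickstart, and how the backend actually creates a session is not shown on this page.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class BackendSession:
    """Hypothetical server-side record; not an SDK type."""
    api_key: str             # stays on the backend
    client_token: str        # safe to hand to the desktop client
    capture_session_id: str  # identifies the capture session


def client_payload(session):
    """Serialize only the fields the desktop client needs, leaving the API key out."""
    return json.dumps(
        {
            "client_token": session.client_token,
            "capture_session_id": session.capture_session_id,
        },
        sort_keys=True,
    )


# Placeholder values for illustration only.
session = BackendSession(
    api_key="sk-secret",
    client_token="tok-123",
    capture_session_id="cap-456",
)
payload = client_payload(session)
```

Building the payload from an explicit list of fields, rather than serializing the whole record, means a field added to the backend record later is not exposed to clients by accident.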
Control, Cost & Opt-In Memory
Adaptive streaming
Tune compute by adjusting sampling and stream policy while keeping the agent loop responsive.
Opt-in recording for replay
Store only when the user explicitly enables it. At the end of a session, the recording can become a native video object.
Semantic search when stored
Index and search any moment later, with playable windows and metadata.
Pause and resume per channel
Stop streaming mic or a display without affecting other channels.
Build With Capture SDK
Desktop Screen and Mic Aware Agents
Build agents that understand what users see on their screens
Monitor OpenClaw Agents
Monitor your OpenClaw agents in real time and set alerts on them
Live Sales Copilot
An agent that records sales calls with real-time transcription, visual context, and AI-powered insights
Productivity Tracker
AI-powered desktop app that records your screen, understands what you're doing, and gives you actionable productivity insights
Need private VPC deployment?
Enterprise-grade security for your organization
FAQs
What is the Capture SDK?
Capture SDK is a native client library that captures desktop screen, mic, camera, and system audio and streams it to VideoDB in real time. The point is to turn a user session into live context, events, and optional replay, not just a recording file.
How is this different from a screen recorder?
What is the recommended architecture?
Why does the client use a token instead of an API key?
Apt 2111 Lansing Street San Francisco, CA 94105 USA
HD-239, WeWork Prestige Atlanta, 80 Feet Main Road, Koramangala I Block, Bengaluru, Karnataka, 560034
sales@videodb.com