Agentic Perception

Build AI agents with visual perception.

One SDK gives your agent eyes, ears, and memory across screen, mic, video, and live sessions. Native runtimes for Mac, Windows, Linux, and the web.

A new surface for AI

AI is moving out of the chatbox.
Your agents need eyes and ears.

Agents are creating content, running marketing, recording meetings, taking calls, and using the computer.

The world they operate in is live, continuous, and perceived through vision and voice. Not turns of text.

VideoDB gives your agents realtime real-world context and memory. One SDK across screen, mic, files, and live streams, so your agent can see what just happened, recall what it watched, and act on what it heard.

Screen · Mic · Camera · Files · Live streams
Capabilities

One SDK.
All of media.

Files, live streams, screen captures. All enter the same system.

One command.

Run npx skills add video-db/skills to bootstrap every primitive into your agent runtime.

Files, RTSP, screen, mic.

One API across every source.

Compose understanding.

Build custom indexes the way you compose endpoints.

Search returns a playable clip.

Not metadata. Not timestamps. A clip the agent can play.

Stream in, stream out.

Sub-second alert, act, respond.

Claude Code · OpenAI · Cursor · n8n · Zapier.

Drop into any agent that speaks tools.

Two modes for agents

Realtime by default.
Memory when you ask.

Stream in, context out. Nothing is stored unless you say so. Flip one flag when a moment is worth keeping.

Mode 1 · Ephemeral

Realtime. No storage.

Frames flow in, structured events flow out. Nothing touches disk. Best for live copilots, alerting, and anything sub-second.

Default
Mode 2 · Memory

Remember and search.

Flip one flag and the moment becomes a searchable clip. Memory and search are opt-in: on for the moments you care about, off everywhere else.

Optional
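
The two modes can be pictured with a small sketch. This is an illustration of the pattern only, not the VideoDB SDK: the names PerceptionSession, Event, remember, process, and search are all hypothetical, and the "frames" are plain dictionaries standing in for real media. With the flag off, events stream out and nothing is kept. With it on, the same events are also retained and become searchable.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List


@dataclass
class Event:
    """A structured event derived from one frame (hypothetical shape)."""
    timestamp: float
    label: str


@dataclass
class PerceptionSession:
    """Hypothetical stand-in for a perception session; not the VideoDB API."""
    remember: bool = False  # the "one flag": off means ephemeral mode
    memory: List[Event] = field(default_factory=list)

    def process(self, frames: Iterable[dict]) -> Iterator[Event]:
        for frame in frames:
            event = Event(timestamp=frame["t"], label=frame["label"])
            if self.remember:
                # Memory mode: keep the event so it can be searched later.
                self.memory.append(event)
            # Both modes: hand the event to the caller as soon as it exists.
            yield event

    def search(self, query: str) -> List[Event]:
        """Naive substring search over remembered events only."""
        return [e for e in self.memory if query in e.label]


# Stand-in frames; a real pipeline would read these from a live source.
frames = [
    {"t": 0.0, "label": "login screen"},
    {"t": 1.5, "label": "error dialog"},
]

ephemeral = PerceptionSession()  # default: nothing is stored
ephemeral_events = list(ephemeral.process(frames))

remembering = PerceptionSession(remember=True)  # opt in to memory
remembering_events = list(remembering.process(frames))
```

In this sketch both sessions emit the same events in realtime. Only the session created with remember=True can answer a later search, which mirrors the opt-in behavior described above.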
Perception box

Build a perception box.
Realtime, private, predictable.

A dedicated perception runtime for teams that need realtime throughput, predictable cost and load, and zero outbound calls to a model API.

Sized to your fleet. Every frame, every inference, every retrieval runs inside the box. Use the bundled models, or bring your own open-weight model.

Realtime processing · Zero outbound · One flat number · Bring your own model

Realtime, sub-second pipeline

Ingest, perception, and event-out sized to your throughput from day one.

Built-in network monitor

Verify isolation in one glance. A live view of every connection the runtime makes.

Bundled perception models

Vision, speech, and embedding models pre-loaded and ready to use.

One capacity envelope

Flat monthly cost. No per-token surprises, no traffic-driven spikes.

Give your agents
eyes and ears.

npx skills add video-db/skills