Runs on your own GPU · or in the cloud

Videos you can search.

Drop in a video. Seconds later it's searchable, clickable and readable — every moment, every object, every word.

One workspace.
Three superpowers.

Type a word.
Jump to the moment.

Every frame is visually indexed, every spoken word transcribed. One search box covers both — type “car” and land on every car that appears or is mentioned.

  • Visual search over every frame
  • Transcript search, word-timed
  • Results in ~40 ms per query
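How one search box can cover both picture and speech is easy to sketch. The toy example below is only an illustration under stated assumptions, not v1d's actual implementation: the `Hit` type, the `search` function, the two-dimensional "embeddings" and the 0.8 score threshold are all hypothetical. It scores each frame against the query by cosine similarity, matches the query against word-timed transcript entries, and merges both into one timeline.

```python
import math
from dataclasses import dataclass


@dataclass
class Hit:
    time_s: float   # where in the video the hit occurs
    source: str     # "visual" or "transcript"
    score: float


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query, query_vec, frames, words, min_score=0.8):
    """Merge visual and transcript hits into one time-ordered list.

    frames: list of (time_s, embedding) pairs, one per indexed frame.
    words:  list of (time_s, word) pairs from a word-timed transcript.
    min_score is an assumed cut-off for visual matches.
    """
    hits = []
    for time_s, vec in frames:
        score = cosine(query_vec, vec)
        if score >= min_score:
            hits.append(Hit(time_s, "visual", score))
    needle = query.lower()
    for time_s, word in words:
        # Strip surrounding punctuation so "car." still matches "car".
        if word.lower().strip(".,!?\"'") == needle:
            hits.append(Hit(time_s, "transcript", 1.0))
    return sorted(hits, key=lambda h: h.time_s)


# Toy data: made-up 2-D vectors stand in for real frame embeddings.
frames = [(0.0, [1.0, 0.0]), (2.0, [0.0, 1.0]), (5.0, [0.9, 0.1])]
words = [(3.2, "The"), (3.5, "car"), (3.9, "stopped.")]
hits = search("car", [1.0, 0.0], frames, words)
for hit in hits:
    print(f"{hit.time_s:>5.1f}s  {hit.source}")
```

In this toy run the query lands on two frames and one spoken word, returned in playback order so each result can become a jump target.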

Click it.
It knows what it is.

Click anything in the frame — or draw a box. You get a pixel-perfect mask, an automatic name, and tracking through the clip.

  • Segmentation on a single click
  • Auto-labeling names the selection
  • Track it across the whole video

The video writes its own story.

Scene by scene, in plain language — with corrected dialogue rendered as quotes. Read any scene on its own, or export the whole thing as SRT.

  • Scene-by-scene narrative
  • Dialogue corrected, never invented
  • One-click SRT export
v1d.io — story · real output
0:00

A young man with intense, wide eyes presses a hand to his forehead in a close-up, appearing distressed.

— “Only there can it be unmade.” (off-screen voice)

0:02

A single gleaming gold ring rests on a textured, earthy surface, its polished surface reflecting the light.

0:05

A close-up shows a stern figure with pointed ears and a metallic headband, brows furrowed against a dim, earthy background.

— “It must be taken deep into Mordor and cast back into the fiery chasm from whence it came.” (the stern figure)

0:13

The same gold ring again — polished interior catching the light, the outer edge showing subtle wear.

2 cues · 2 corrected
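For a sense of what an SRT export contains, here is a minimal sketch that writes the two dialogue cues from the sample above in SubRip format: a running index, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range, then the text. The function names are hypothetical, and the cue end times are an assumption (each cue is simply stretched to the next scene boundary shown above); this is not the product's exporter.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues):
    """Render (start_s, end_s, text) cues as SRT text.

    Each cue becomes an index line, a time-range line and the text,
    with a blank line between consecutive cues.
    """
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        time_range = f"{srt_timestamp(start)} --> {srt_timestamp(end)}"
        blocks.append(f"{index}\n{time_range}\n{text}\n")
    return "\n".join(blocks)


# Dialogue from the sample; end times are assumed, not taken from real output.
cues = [
    (0.0, 2.0, "Only there can it be unmade."),
    (5.0, 13.0, "It must be taken deep into Mordor and cast back "
                "into the fiery chasm from whence it came."),
]
srt_text = to_srt(cues)
print(srt_text)
```

The comma before the milliseconds is part of the SRT convention, which is why the timestamp is built by hand rather than with a generic time formatter.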
01

Upload → searchable in seconds

Scenes, keyframes, embeddings and transcript — indexed automatically on drop.

02

One click → mask + name + track

Interactive segmentation with automatic labeling, ready to follow through the clip.

03

SRT out, dialogue corrected

Mis-heard words fixed against what's on screen — never paraphrased, never invented.

UNDER THE HOOD

Visual embeddings · Click-to-segment · Word-timed transcription · Depth & 3D poses · Scene detection · Local GPU or cloud · Scale-to-zero

Let your videos speak.

Try the demo →