# video-product-snapshot
Detect ecommerce products in video frames with a vision model (Claude Vision or any OpenAI-compatible endpoint), extract the best product snapshot, and optionally search for matching products via an image-search API.
## How it works
- Extracts frames from the video at a configurable interval using `ffmpeg` (see the sketch after this list)
- Sends each frame to a vision model to detect whether a product is visible and to rate its confidence
- Picks the highest-confidence frame as the best snapshot
- Optionally calls an image-search API with the snapshot to find matching products
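For illustration, here is a minimal sketch of the frame-extraction step, assuming Bun's `Bun.spawn` API and ffmpeg's `fps` filter. The function name and signature are hypothetical, not the tool's actual internals:

```ts
import { mkdir } from "node:fs/promises";

// Sketch: sample one frame every `interval` seconds, capped at `maxFrames`.
async function extractFrames(video: string, outDir: string, interval: number, maxFrames: number) {
  await mkdir(outDir, { recursive: true });
  const proc = Bun.spawn([
    "ffmpeg",
    "-i", video,
    "-vf", `fps=1/${interval}`,      // one frame per `interval` seconds
    "-frames:v", String(maxFrames),  // stop after maxFrames frames
    "-q:v", "2",                     // high JPEG quality
    `${outDir}/frame_%04d.jpg`,
  ]);
  if ((await proc.exited) !== 0) throw new Error("ffmpeg failed");
}
```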
## Install

```bash
bun install
bun run build  # outputs dist/run.js
```
## Usage

```bash
bun dist/run.js <command> [options]
```
## Commands

| Command | Description |
| --- | --- |
| `detect <video>` | Extract frames and detect product snapshots |
| `search <image>` | Search products by image via API |
| `detect-and-search <video>` | Full pipeline: detect best snapshot, then search |
| `session` | Print current auth session token |
## Options (detect / detect-and-search)

| Flag | Default | Description |
| --- | --- | --- |
| `--interval=<sec>` | `1` | Seconds between sampled frames |
| `--max-frames=<n>` | `60` | Max frames to analyze |
| `--output-dir=<dir>` | next to video | Directory to save extracted frames |
| `--min-confidence=<0-1>` | `0.7` | Minimum confidence to include a frame |
| `--dry-run` | — | Parse args and print config without running |
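All flags use the `--name=value` form shown above. A hypothetical parsing sketch (the CLI's actual internals may differ):

```ts
// Hypothetical sketch of parsing the --name=value flags from the table above.
interface DetectOptions {
  interval: number;
  maxFrames: number;
  outputDir?: string;
  minConfidence: number;
  dryRun: boolean;
}

function parseOptions(argv: string[]): DetectOptions {
  // Defaults match the documented values.
  const opts: DetectOptions = { interval: 1, maxFrames: 60, minConfidence: 0.7, dryRun: false };
  for (const arg of argv) {
    const [flag, value] = arg.split("=", 2);
    switch (flag) {
      case "--interval": opts.interval = Number(value); break;
      case "--max-frames": opts.maxFrames = Number(value); break;
      case "--output-dir": opts.outputDir = value; break;
      case "--min-confidence": opts.minConfidence = Number(value); break;
      case "--dry-run": opts.dryRun = true; break;
    }
  }
  return opts;
}
```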
## Examples

```bash
# Detect products, sample every 3 seconds
bun dist/run.js detect ./demo.mp4 --interval=3

# Full pipeline with higher confidence threshold
bun dist/run.js detect-and-search ./demo.mp4 --interval=5 --min-confidence=0.85

# Search using an existing snapshot image
bun dist/run.js search ./snapshot.jpg
```
## Output

All commands return JSON to stdout.

```json
{
  "bestSnapshot": {
    "frameIndex": 4,
    "timestampSeconds": 9,
    "imagePath": "/path/to/frame_0004.jpg",
    "confidence": 0.92,
    "description": "White sneaker with blue logo, left side view",
    "boundingHint": "centered"
  },
  "productFrames": [...],
  "searchBody": { ... }
}
```
- `productFrames` — all detected frames, sorted by confidence (highest first)
- `bestSnapshot` — the top-ranked frame
- `searchBody` — image-search API response (only for `search` / `detect-and-search`)
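If you consume this output from TypeScript, types along these lines can help. They are inferred from the example above, not an official schema:

```ts
// Shape of the JSON written to stdout, inferred from the example output.
interface ProductFrame {
  frameIndex: number;
  timestampSeconds: number;
  imagePath: string;
  confidence: number;
  description: string;
  boundingHint: string;
}

interface SnapshotResult {
  bestSnapshot: ProductFrame;
  productFrames: ProductFrame[]; // sorted by confidence, highest first
  searchBody?: unknown;          // present only for search / detect-and-search
}
```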
## Environment variables

Copy `.env.example` to `.env` and fill in the values.
### Vision (required for `detect`)

| Variable | Required | Description |
| --- | --- | --- |
| `VISION_API_KEY` | Yes | API key for the vision model |
| `VISION_API_BASE` | No | OpenAI-compatible base URL (default: OpenAI) |
| `VISION_MODEL` | No | Model name (default: `gpt-4o-mini`) |
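A sketch of what the per-frame vision call could look like against an OpenAI-compatible endpoint using these variables. The prompt and expected JSON reply are illustrative assumptions, not the tool's actual ones:

```ts
import { Buffer } from "node:buffer";

// Sketch of one vision call (assumption: standard OpenAI-compatible
// chat-completions API; prompt and reply schema are illustrative).
async function detectProduct(imagePath: string): Promise<{ confidence: number; description: string }> {
  const base = process.env.VISION_API_BASE ?? "https://api.openai.com/v1";
  const model = process.env.VISION_MODEL ?? "gpt-4o-mini";
  // Encode the frame as a base64 data URL.
  const image = Buffer.from(await Bun.file(imagePath).arrayBuffer()).toString("base64");

  const res = await fetch(`${base}/chat/completions`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.VISION_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model,
      messages: [{
        role: "user",
        content: [
          {
            type: "text",
            text: 'Is an ecommerce product clearly visible? Reply with JSON: {"confidence": <0-1>, "description": "<short description>"}',
          },
          { type: "image_url", image_url: { url: `data:image/jpeg;base64,${image}` } },
        ],
      }],
    }),
  });
  const data = await res.json();
  return JSON.parse(data.choices[0].message.content);
}
```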
### Image search (required for `search` / `detect-and-search`)

| Variable | Required | Description |
| --- | --- | --- |
| `ONEBOUND_UPLOAD_ENDPOINT` | Yes | Endpoint to upload a local image and get a public URL |
| `ONEBOUND_SEARCH_ENDPOINT` | Yes | Reverse image search endpoint |
| `ONEBOUND_KEYWORD_SEARCH_ENDPOINT` | No | Keyword search endpoint for re-ranking results |
These endpoints proxy through a local `woo-data-scrawler` instance, so no Onebound API key is needed directly.
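A sketch of the upload-then-search flow these two endpoints imply. The request and response shapes (`file` form field, `url` reply key, `imgUrl` query parameter) are assumptions; check your `woo-data-scrawler` instance for the real contract:

```ts
// Sketch of the upload-then-search flow (request/response shapes assumed).
async function searchByImage(imagePath: string, token: string) {
  // 1. Upload the local snapshot to get a public URL.
  //    Assumption: multipart upload with a "file" field, JSON reply with "url".
  const form = new FormData();
  form.append("file", Bun.file(imagePath));
  const upload = await fetch(process.env.ONEBOUND_UPLOAD_ENDPOINT!, {
    method: "POST",
    headers: { Authorization: `Bearer ${token}` }, // token e.g. from auth-rt (assumption)
    body: form,
  });
  const { url } = await upload.json();

  // 2. Reverse image search with the public URL.
  //    Assumption: the endpoint accepts an "imgUrl" query parameter.
  const search = await fetch(
    `${process.env.ONEBOUND_SEARCH_ENDPOINT}?imgUrl=${encodeURIComponent(url)}`,
    { headers: { Authorization: `Bearer ${token}` } },
  );
  return search.json();
}
```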
### Other

| Variable | Required | Description |
| --- | --- | --- |
| `AUTH_RT_BIN` | No | Override path to the `auth-rt` binary |
| `TELEMETRY_ENDPOINT` | No | POST skill execution results to a Loki-compatible endpoint |
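A sketch of what a Loki-compatible telemetry POST could look like, assuming `TELEMETRY_ENDPOINT` points at a `/loki/api/v1/push`-style URL; the labels and payload fields are illustrative:

```ts
// Sketch: push one result to a Loki-compatible endpoint (labels illustrative).
async function postTelemetry(result: unknown) {
  const endpoint = process.env.TELEMETRY_ENDPOINT;
  if (!endpoint) return; // telemetry is optional
  await fetch(endpoint, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      streams: [{
        stream: { job: "video-product-snapshot" },
        // Loki expects [nanosecond timestamp, log line] pairs.
        values: [[`${Date.now()}000000`, JSON.stringify(result)]],
      }],
    }),
  });
}
```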
## Prerequisites

- Bun runtime
- `ffmpeg` and `ffprobe` in `PATH`
- `auth-rt` CLI in `PATH` (required for `search` / `detect-and-search`)
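A quick sanity check for the binary prerequisites (hypothetical helper, not part of the tool):

```ts
// Sketch: verify required binaries are reachable before running.
async function checkBinary(name: string): Promise<boolean> {
  try {
    const proc = Bun.spawn([name, "-version"], { stdout: "ignore", stderr: "ignore" });
    return (await proc.exited) === 0;
  } catch {
    return false; // Bun.spawn throws if the binary is not in PATH
  }
}

for (const bin of ["ffmpeg", "ffprobe"]) {
  if (!(await checkBinary(bin))) console.error(`${bin} not found in PATH`);
}
```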