video-product-snapshot
Detect ecommerce products in video frames using a vision model such as Claude Vision, extract the best product snapshot, and optionally search for matching products via an image-search API.
How it works
- Extracts frames from the video at a configurable interval using ffmpeg
- Sends each frame to a vision model to detect whether a product is visible and rate confidence
- Picks the highest-confidence frame as the best snapshot
- Optionally calls an image-search API with the snapshot to find matching products
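The per-frame detection step could be sketched as follows, assuming the vision endpoint accepts OpenAI-style chat completion requests with an inline base64 image. The function name `buildVisionRequest`, the prompt wording, and the requested reply fields are illustrative assumptions, not taken from this project's source.

```typescript
// Hypothetical request builder for an OpenAI-compatible chat completions endpoint.
// The prompt text and the JSON reply shape it asks for are assumptions.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

interface VisionRequest {
  model: string;
  messages: Array<{ role: "user"; content: ContentPart[] }>;
}

function buildVisionRequest(model: string, jpegBase64: string): VisionRequest {
  return {
    model,
    messages: [
      {
        role: "user",
        content: [
          {
            type: "text",
            text:
              "Is an ecommerce product visible in this frame? " +
              'Reply as JSON: {"hasProduct": boolean, "confidence": number, "description": string}',
          },
          {
            // The frame is sent inline as a data URL rather than uploaded separately.
            type: "image_url",
            image_url: { url: `data:image/jpeg;base64,${jpegBase64}` },
          },
        ],
      },
    ],
  };
}
```

The resulting object would be sent as the JSON body of a POST to the endpoint's chat completions route; one request per sampled frame keeps each confidence rating independent.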
Install
bun install
bun run build # outputs dist/run.js
Usage
bun dist/run.js <command> [options]
Commands
| Command | Description |
|---|---|
| `detect <video>` | Extract frames and detect product snapshots |
| `search <image>` | Search products by image via API |
| `detect-and-search <video>` | Full pipeline: detect best snapshot then search |
| `session` | Print current auth session token |
Options (detect / detect-and-search)
| Flag | Default | Description |
|---|---|---|
| `--interval=<sec>` | 1 | Seconds between sampled frames |
| `--max-frames=<n>` | 60 | Max frames to analyze |
| `--output-dir=<dir>` | next to video | Directory to save extracted frames |
| `--min-confidence=<0-1>` | 0.7 | Minimum confidence to include a frame |
| `--dry-run` | - | Parse args and print config without running |
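As a rough illustration of how the sampling options might translate into an ffmpeg invocation, the sketch below builds an argument list from `--interval`, `--max-frames`, and `--output-dir`. The helper `buildFfmpegArgs` and the `frame_%04d.jpg` naming pattern are assumptions (the pattern is inferred from the sample output path); the tool's actual command line may differ.

```typescript
// Hypothetical helper: maps the CLI options onto ffmpeg arguments.
interface SampleOptions {
  interval: number;  // seconds between sampled frames (--interval)
  maxFrames: number; // cap on the number of frames written (--max-frames)
  outputDir: string; // directory for extracted frames (--output-dir)
}

function buildFfmpegArgs(video: string, opts: SampleOptions): string[] {
  if (opts.interval <= 0) throw new Error("interval must be positive");
  return [
    "-i", video,
    // fps=1/N emits one frame every N seconds.
    "-vf", `fps=1/${opts.interval}`,
    // -frames:v stops after the given number of output frames.
    "-frames:v", String(opts.maxFrames),
    `${opts.outputDir}/frame_%04d.jpg`,
  ];
}
```

With `--interval=3` and the default `--max-frames=60`, this would cover at most the first three minutes of the video.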
Examples
# Detect products, sample every 3 seconds
bun dist/run.js detect ./demo.mp4 --interval=3
# Full pipeline with higher confidence threshold
bun dist/run.js detect-and-search ./demo.mp4 --interval=5 --min-confidence=0.85
# Search using an existing snapshot image
bun dist/run.js search ./snapshot.jpg
Output
All commands return JSON to stdout.
{
"bestSnapshot": {
"frameIndex": 4,
"timestampSeconds": 9,
"imagePath": "/path/to/frame_0004.jpg",
"confidence": 0.92,
"description": "White sneaker with blue logo, left side view",
"boundingHint": "centered"
},
"productFrames": [...],
"searchBody": { ... }
}
- `productFrames`: all detected frames, sorted by confidence (highest first)
- `bestSnapshot`: the top-ranked frame
- `searchBody`: image-search API response (only for `search` / `detect-and-search`)
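For consumers of this JSON, a minimal typing and ranking sketch might look like the following. The `ProductFrame` interface mirrors the fields shown in the sample `bestSnapshot`; that `productFrames` entries share this shape, that the threshold is inclusive, and that `bestSnapshot` is `null` when nothing qualifies are all assumptions.

```typescript
// Field names follow the sample output; the shape of productFrames entries is assumed.
interface ProductFrame {
  frameIndex: number;
  timestampSeconds: number;
  imagePath: string;
  confidence: number; // 0 to 1
  description: string;
  boundingHint: string;
}

interface RankedFrames {
  bestSnapshot: ProductFrame | null;
  productFrames: ProductFrame[];
}

// Hypothetical helper: keep frames at or above the threshold, highest confidence first.
function rankFrames(frames: ProductFrame[], minConfidence: number): RankedFrames {
  const productFrames = frames
    .filter((f) => f.confidence >= minConfidence)
    .sort((a, b) => b.confidence - a.confidence);
  return { bestSnapshot: productFrames[0] ?? null, productFrames };
}
```

Because `filter` returns a new array, the input list is left unsorted and unmodified.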
Environment variables
| Variable | Required | Description |
|---|---|---|
| `VISION_API_KEY` | Yes | API key for the vision model endpoint |
| `VISION_API_BASE` | No | OpenAI-compatible base URL (default: OpenAI) |
| `VISION_MODEL` | No | Model name (default: gpt-4o-mini) |
# Use a local or custom provider
VISION_API_BASE=https://your-llm-endpoint/v1
VISION_MODEL=claude-haiku-4-5-20251001
VISION_API_KEY=sk-...
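A configuration loader consistent with the variables above could be sketched like this. The function `loadVisionConfig` is hypothetical, and `https://api.openai.com/v1` is assumed to be what "default: OpenAI" refers to; the project may resolve its default differently.

```typescript
interface VisionConfig {
  apiKey: string;
  apiBase: string;
  model: string;
}

// Hypothetical loader: takes the environment as a plain object so it is easy to test.
function loadVisionConfig(env: Record<string, string | undefined>): VisionConfig {
  const apiKey = env["VISION_API_KEY"];
  if (!apiKey) throw new Error("VISION_API_KEY is required");
  return {
    apiKey,
    // Assumed default base URL for "OpenAI".
    apiBase: env["VISION_API_BASE"] ?? "https://api.openai.com/v1",
    // Default model name as documented in the table above.
    model: env["VISION_MODEL"] ?? "gpt-4o-mini",
  };
}
```

In a Bun or Node script this would typically be called as `loadVisionConfig(process.env)`.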
Prerequisites
- Bun runtime
- `ffmpeg` and `ffprobe` in PATH
- `auth-rt` CLI in PATH (required for `search` / `detect-and-search`)