| name | description |
|---|---|
| video-product-snapshot | Detect ecommerce products in video frames using a vision model (any OpenAI-compatible endpoint), extract the best product snapshot, and optionally search via an image-search API. Use when the user provides a video and wants to find or identify the products shown in it. |
Video Product Snapshot
Extract ecommerce product snapshots from a video using a vision model (any OpenAI-compatible endpoint), then optionally search for matching products via an image-search API.
Run
bun dist/run.js <command> [args] [--dry-run]
Commands
| Command | Description |
|---|---|
| `detect <video-path> [options]` | Extract frames, detect product snapshots |
| `search <image-path>` | Search products by image via API |
| `detect-and-search <video-path> [options]` | Detect best snapshot, then run image search |
| `session` | Get auth session token |
Options for detect / detect-and-search
| Flag | Default | Description |
|---|---|---|
| `--interval=<sec>` | 1 | Seconds between sampled frames |
| `--max-frames=<n>` | 60 | Max frames to analyze |
| `--output-dir=<dir>` | next to video | Directory to save snapshot images |
| `--min-confidence=<0-1>` | 0.7 | Minimum detection confidence threshold |
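How `--interval` and `--max-frames` bound the number of analyzed frames can be sketched as below. This is only an illustration under stated assumptions: `sampleTimestamps` is a hypothetical helper, not part of the CLI, and it assumes sampling starts at 0 seconds and stops at whichever limit is reached first. The tool's real start offset and frame indexing may differ.

```typescript
// Hypothetical helper (not part of the CLI): lists the timestamps that would be
// sampled if frames are taken every `intervalSec` seconds starting at 0,
// capped at `maxFrames`. Defaults mirror the documented flag defaults.
function sampleTimestamps(
  durationSec: number,
  intervalSec: number = 1,
  maxFrames: number = 60,
): number[] {
  if (durationSec < 0 || intervalSec <= 0 || maxFrames <= 0) {
    throw new RangeError("duration must be >= 0; interval and maxFrames must be > 0");
  }
  const timestamps: number[] = [];
  for (let i = 0; i < maxFrames; i++) {
    // Multiply instead of accumulating to avoid floating-point drift.
    const t = i * intervalSec;
    if (t >= durationSec) break; // ran past the end of the video
    timestamps.push(t);
  }
  return timestamps;
}

// A 20-second clip with --interval=3 --max-frames=20 is limited by its length:
console.log(sampleTimestamps(20, 3, 20)); // [0, 3, 6, 9, 12, 15, 18]

// A 10-minute clip with the defaults is limited by --max-frames instead:
console.log(sampleTimestamps(600).length); // 60
```

Under these assumptions, a long video sampled with the defaults only covers its first minute, so raising `--interval` is the way to spread the same frame budget across the whole clip.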
Examples
# Detect product frames in a video
bun dist/run.js detect ./product-demo.mp4
# Sample every 5 seconds, higher confidence threshold
bun dist/run.js detect ./product-demo.mp4 --interval=5 --min-confidence=0.85
# Search for products using an existing image
bun dist/run.js search ./snapshot.jpg
# Full pipeline: detect best product frame then search
bun dist/run.js detect-and-search ./product-demo.mp4 --interval=3 --max-frames=20
Output
Returns JSON with:
- `productFrames[]`: all detected product frames, sorted by confidence (highest first)
- `bestSnapshot`: the highest-confidence product frame
- `searchBody`: image search API response (for `detect-and-search` and `search`)
Each ProductFrame contains:
{
"frameIndex": 4,
"timestampSeconds": 9,
"imagePath": "/path/to/snapshot/frame_0004.jpg",
"confidence": 0.92,
"description": "White sneaker with blue logo, left side view",
"boundingHint": "centered"
}
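For consumers of this JSON, the frame shape and the ordering described above can be sketched in TypeScript. The `ProductFrame` fields are taken from the example; `rankFrames` is a hypothetical helper, and treating `--min-confidence` as an inclusive (`>=`) cutoff is an assumption, not documented behavior.

```typescript
// Field names and types follow the example JSON above.
interface ProductFrame {
  frameIndex: number;
  timestampSeconds: number;
  imagePath: string;
  confidence: number; // 0..1
  description: string;
  boundingHint: string;
}

interface DetectionResult {
  productFrames: ProductFrame[];
  bestSnapshot: ProductFrame | null;
}

// Hypothetical helper: keeps frames at or above the threshold (assumed inclusive),
// sorts them highest-confidence first, and picks the top one as the best snapshot.
function rankFrames(frames: ProductFrame[], minConfidence: number = 0.7): DetectionResult {
  const productFrames = frames
    .filter((frame) => frame.confidence >= minConfidence) // filter() copies, so the input is not mutated
    .sort((a, b) => b.confidence - a.confidence);
  return { productFrames, bestSnapshot: productFrames[0] ?? null };
}

// Illustrative data only; paths and descriptions are made up.
const sampleFrames: ProductFrame[] = [
  { frameIndex: 2, timestampSeconds: 4, imagePath: "frame_0002.jpg", confidence: 0.65, description: "Blurry shoe", boundingHint: "left" },
  { frameIndex: 4, timestampSeconds: 9, imagePath: "frame_0004.jpg", confidence: 0.92, description: "White sneaker", boundingHint: "centered" },
  { frameIndex: 7, timestampSeconds: 15, imagePath: "frame_0007.jpg", confidence: 0.81, description: "Sneaker sole", boundingHint: "right" },
];

const ranked = rankFrames(sampleFrames);
console.log(ranked.productFrames.map((frame) => frame.confidence)); // [0.92, 0.81]
console.log(ranked.bestSnapshot?.frameIndex); // 4
```

Returning `null` for `bestSnapshot` when nothing clears the threshold is also a choice made for this sketch; the CLI's behavior for an empty result is not specified here.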
Prerequisites
- `ffmpeg` and `ffprobe` in PATH
- `VISION_API_KEY`: API key for the vision endpoint
- `VISION_API_BASE`: (optional) OpenAI-compatible base URL; omit to use the OpenAI default
- `VISION_MODEL`: (optional) model name, default `gpt-4o-mini`
- `auth-rt` in PATH (for `search` / `detect-and-search` API calls)
Example provider configs
# OpenAI (default)
VISION_API_KEY=sk-...
# Any OpenAI-compatible endpoint (local Ollama, Together, Groq, etc.)
VISION_API_KEY=...
VISION_API_BASE=http://localhost:11434/v1
VISION_MODEL=llava:13b
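The way these variables and their defaults combine can be sketched as follows. `resolveVisionConfig` and `VisionConfig` are hypothetical names, and using `https://api.openai.com/v1` as the fallback base URL is an assumption about what "OpenAI default" resolves to; the tool's actual resolution logic may differ.

```typescript
interface VisionConfig {
  apiKey: string;
  baseUrl: string;
  model: string;
}

// Hypothetical helper: resolves the vision settings from an environment-like map.
// Blank values are treated the same as unset ones.
function resolveVisionConfig(env: Record<string, string | undefined>): VisionConfig {
  const apiKey = env["VISION_API_KEY"];
  if (!apiKey) {
    throw new Error("VISION_API_KEY is required");
  }
  return {
    apiKey,
    // Assumed fallback: OpenAI's public API base URL.
    baseUrl: env["VISION_API_BASE"] || "https://api.openai.com/v1",
    // Documented default model.
    model: env["VISION_MODEL"] || "gpt-4o-mini",
  };
}

// OpenAI defaults: only the key is set.
console.log(resolveVisionConfig({ VISION_API_KEY: "sk-example" }).model); // gpt-4o-mini

// Local OpenAI-compatible endpoint, matching the second config above.
const localConfig = resolveVisionConfig({
  VISION_API_KEY: "unused-local-key",
  VISION_API_BASE: "http://localhost:11434/v1",
  VISION_MODEL: "llava:13b",
});
console.log(localConfig.baseUrl); // http://localhost:11434/v1
```

In a Bun or Node script, the same function could be called with `process.env` in place of the literal objects.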
Rules — MUST follow
- Execute only, do not reason about internals. Run the CLI and return the output.
- No fallback strategies. Report errors as-is; do NOT try alternative approaches.
- No retry loops. If detection or search fails, report the failure.
- Trust the tool's output. The CLI handles session management and error formatting internally.