| name | description |
|---|---|
| video-product-snapshot | Detect ecommerce products in video frames using Claude Vision, extract the best product snapshot, and optionally search via image-search API. Use when the user provides a video and wants to find/identify products shown in it. |
# Video Product Snapshot
Extract ecommerce product snapshots from video using Claude Vision, then optionally search for matching products via image-search API.
## Run

```shell
bun dist/run.js <command> [args] [--dry-run]
```
## Commands

| Command | Description |
|---|---|
| `detect <video-path> [options]` | Extract frames, detect product snapshots |
| `search <image-path>` | Search products by image via API |
| `detect-and-search <video-path> [options]` | Detect best snapshot, then run image search |
| `session` | Get auth session token |
## Options for `detect` / `detect-and-search`

| Flag | Default | Description |
|---|---|---|
| `--interval=<sec>` | `1` | Seconds between sampled frames |
| `--max-frames=<n>` | `60` | Max frames to analyze |
| `--output-dir=<dir>` | next to video | Directory to save snapshot images |
| `--min-confidence=<0-1>` | `0.7` | Minimum detection confidence threshold |
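The two sampling flags interact: frames are sampled every `--interval` seconds until the video ends or `--max-frames` is reached, whichever comes first. A minimal sketch of that schedule, assuming even sampling from t=0 (the function name and exact alignment are illustrative, not the CLI's actual implementation):

```typescript
// Hypothetical sketch of the sampling schedule implied by --interval and --max-frames.
// The real CLI may align frames differently; this only illustrates the flag interaction.
function sampleTimestamps(
  durationSeconds: number,
  intervalSeconds = 1, // --interval default
  maxFrames = 60,      // --max-frames default
): number[] {
  const timestamps: number[] = [];
  for (let t = 0; t < durationSeconds && timestamps.length < maxFrames; t += intervalSeconds) {
    timestamps.push(t);
  }
  return timestamps;
}
```

Under this sketch, `--interval=3 --max-frames=20` covers at most the first 60 seconds of a longer video.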
## Examples

```shell
# Detect product frames in a video
bun dist/run.js detect ./product-demo.mp4

# Sample every 5 seconds, higher confidence threshold
bun dist/run.js detect ./product-demo.mp4 --interval=5 --min-confidence=0.85

# Search for products using an existing image
bun dist/run.js search ./snapshot.jpg

# Full pipeline: detect best product frame then search
bun dist/run.js detect-and-search ./product-demo.mp4 --interval=3 --max-frames=20
```
## Output

Returns JSON with:

- `productFrames[]`: all detected product frames sorted by confidence (highest first)
- `bestSnapshot`: the highest-confidence product frame
- `searchBody`: image search API response (for `detect-and-search` and `search`)
Each `ProductFrame` contains:

```json
{
  "frameIndex": 4,
  "timestampSeconds": 9,
  "imagePath": "/path/to/snapshot/frame_0004.jpg",
  "confidence": 0.92,
  "description": "White sneaker with blue logo, left side view",
  "boundingHint": "centered"
}
```
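The shape above can be captured as a type, with a small helper showing how `bestSnapshot` relates to `productFrames[]`. The helper is illustrative only; the CLI computes this internally:

```typescript
// Type mirroring the documented ProductFrame shape.
interface ProductFrame {
  frameIndex: number;
  timestampSeconds: number;
  imagePath: string;
  confidence: number;
  description: string;
  boundingHint: string;
}

// Illustrative: bestSnapshot is the highest-confidence frame above the threshold.
function pickBestSnapshot(frames: ProductFrame[], minConfidence = 0.7): ProductFrame | undefined {
  return frames
    .filter((f) => f.confidence >= minConfidence)
    .sort((a, b) => b.confidence - a.confidence)[0];
}
```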
## Prerequisites

- `ffmpeg` and `ffprobe` in `PATH`
- `VISION_API_KEY` — API key for the vision endpoint
- `VISION_API_BASE` — (optional) OpenAI-compatible base URL; omit to use the OpenAI default
- `VISION_MODEL` — (optional) model name, default `gpt-4o-mini`
- `auth-rt` in `PATH` (for `search` / `detect-and-search` API calls)
## Example provider configs

```shell
# OpenAI (default)
VISION_API_KEY=sk-...

# Any OpenAI-compatible endpoint (local Ollama, Together, Groq, etc.)
VISION_API_KEY=...
VISION_API_BASE=http://localhost:11434/v1
VISION_MODEL=llava:13b
```
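A sketch of how these variables might resolve, using the defaults stated above. The OpenAI base URL shown is an assumption, and the CLI's real resolution logic may differ:

```typescript
// Illustrative only: resolves the documented env vars with their stated defaults.
// In real use, pass process.env.
function loadVisionConfig(env: Record<string, string | undefined>) {
  return {
    apiKey: env.VISION_API_KEY ?? "",                            // required
    baseURL: env.VISION_API_BASE ?? "https://api.openai.com/v1", // assumed OpenAI default
    model: env.VISION_MODEL ?? "gpt-4o-mini",                    // documented default
  };
}
```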
## Result formatting

After the CLI completes, format `rerank.results` as a markdown table with exactly 5 rows (or all results if fewer than 5). Do NOT split results into "最佳匹配" (best match) / "其他热门选项" (other popular options) sections — show everything in one flat table.
| # | 商品名称 | 价格 | 销量 | 链接 |
|---|---|---|---|---|
| 1 | {title} | ¥{promotion_price || price} | {sales ?? —}件 | 查看 |
- Use `promotion_price` when present, otherwise `price`
- If `sales` is missing or zero, show `—`
- Always render as a markdown table, never as bullet points
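The rules above can be sketched as a formatter. The `RerankItem` field names are inferred from the template row and are assumptions:

```typescript
// Illustrative formatter for rerank.results following the rules above.
// RerankItem field names are assumptions inferred from the table template.
interface RerankItem {
  title: string;
  price: number;
  promotion_price?: number;
  sales?: number;
  url: string;
}

function formatResults(items: RerankItem[]): string {
  const rows = items.slice(0, 5).map((it, i) => {
    const price = it.promotion_price ?? it.price;  // prefer promotion_price
    const sales = it.sales ? `${it.sales}件` : "—"; // missing or zero shows the "—" placeholder
    return `| ${i + 1} | ${it.title} | ¥${price} | ${sales} | [查看](${it.url}) |`;
  });
  return ["| # | 商品名称 | 价格 | 销量 | 链接 |", "|---|---|---|---|---|", ...rows].join("\n");
}
```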
## Execution rules

### For `detect` and `detect-and-search` (slow — use sub-agent)

Spawn a sub-agent via `sessions_spawn`. Do not run the command directly.
```
sessions_spawn(
  task: "Run this command and return the raw JSON output:\n\nbun dist/run.js <full command here>\n\nCopy the entire JSON output as your reply.",
  label: "video-product-snapshot",
  runTimeoutSeconds: 300,
)
```
- Announce immediately that processing has started and share the `runId`.
- Wait for the sub-agent announcement, then parse and format the result for the user.
### For `search` and `session` (fast — run directly)
Run the CLI command inline, no sub-agent needed.
## General rules
- No fallback strategies. Report errors as-is; do NOT try alternative approaches.
- No retry loops. If detection or search fails, report the failure.
- Trust the tool's output. The CLI handles session management and error formatting internally.