--- name: video-product-snapshot description: "Detect ecommerce products in video frames using Claude Vision, extract the best product snapshot, and optionally search via image-search API. Use when the user provides a video and wants to find/identify products shown in it." --- # Video Product Snapshot Extract ecommerce product snapshots from video using Claude Vision, then optionally search for matching products via image-search API. ## Run ```bash bun dist/run.js [args] [--dry-run] ``` ## Commands | Command | Description | |---------|-------------| | `detect [options]` | Extract frames, detect product snapshots | | `search ` | Search products by image via API | | `detect-and-search [options]` | Detect best snapshot then run image search | | `session` | Get auth session token | ## Options for `detect` / `detect-and-search` | Flag | Default | Description | |------|---------|-------------| | `--interval=` | `1` | Seconds between sampled frames | | `--max-frames=` | `60` | Max frames to analyze | | `--output-dir=` | next to video | Directory to save snapshot images | | `--min-confidence=<0-1>` | `0.7` | Minimum detection confidence threshold | ## Examples ```bash # Detect product frames in a video bun dist/run.js detect ./product-demo.mp4 # Sample every 5 seconds, higher confidence threshold bun dist/run.js detect ./product-demo.mp4 --interval=5 --min-confidence=0.85 # Search for products using an existing image bun dist/run.js search ./snapshot.jpg # Full pipeline: detect best product frame then search bun dist/run.js detect-and-search ./product-demo.mp4 --interval=3 --max-frames=20 ``` ## Output Returns JSON with: - `productFrames[]`: all detected product frames sorted by confidence (highest first) - `bestSnapshot`: the highest-confidence product frame - `searchBody`: image search API response (for `detect-and-search` and `search`) Each `ProductFrame` contains: ```json { "frameIndex": 4, "timestampSeconds": 9, "imagePath": "/path/to/snapshot/frame_0004.jpg", "confidence": 0.92, "description": "White sneaker with blue logo, left side view", "boundingHint": "centered" } ``` ## Prerequisites - `ffmpeg` and `ffprobe` in PATH - `VISION_API_KEY` — API key for the vision endpoint - `VISION_API_BASE` — (optional) OpenAI-compatible base URL; omit to use OpenAI default - `VISION_MODEL` — (optional) model name, default `gpt-4o-mini` - `auth-rt` in PATH (for `search` / `detect-and-search` API calls) ### Example provider configs ```bash # OpenAI (default) VISION_API_KEY=sk-... # Any OpenAI-compatible endpoint (local Ollama, Together, Groq, etc.) VISION_API_KEY=... VISION_API_BASE=http://localhost:11434/v1 VISION_MODEL=llava:13b ``` ## Result formatting After the CLI completes, format `rerank.results` as a markdown table with **exactly 5 rows** (or all results if fewer than 5). Do NOT split into "最佳匹配" / "其他热门选项" — show everything in one flat table. | # | 商品名称 | 价格 | 销量 | 链接 | |---|----------|------|------|------| | 1 | {title} | ¥{promotion_price \|\| price} | {sales ?? —}件 | [查看](https://detail.1688.com/offer/{num_iid}.html) | - Use `promotion_price` when present, otherwise `price` - If `sales` is missing or zero, show `—` - Always render as a markdown table, never as bullet points ## Execution rules ### For `detect` and `detect-and-search` (slow — use sub-agent) Spawn a sub-agent via `sessions_spawn`. Do **not** run the command directly. ``` sessions_spawn( task: "Run this command and return the raw JSON output:\n\nbun dist/run.js \n\nCopy the entire JSON output as your reply.", label: "video-product-snapshot", runTimeoutSeconds: 300, ) ``` - Announce immediately that processing has started and share the `runId`. - Wait for the sub-agent announcement, then parse and format the result for the user. ### For `search` and `session` (fast — run directly) Run the CLI command inline, no sub-agent needed. ### General rules 1. **No fallback strategies.** Report errors as-is; do NOT try alternative approaches. 2. **No retry loops.** If detection or search fails, report the failure. 3. **Trust the tool's output.** The CLI handles session management and error formatting internally.