---
name: video-product-snapshot
description: "Detect ecommerce products in video frames using Claude Vision, extract the best product snapshot, and optionally search via image-search API. Use when the user provides a video and wants to find/identify products shown in it."
---

# Video Product Snapshot

Extract ecommerce product snapshots from video using Claude Vision, then optionally search for matching products via image-search API.

## Run

```bash
bun dist/run.js <command> [args] [--dry-run]
```

## Commands

| Command | Description |
|---------|-------------|
| `detect <video-path> [options]` | Extract frames, detect product snapshots |
| `search <image-path>` | Search products by image via API |
| `detect-and-search <video-path> [options]` | Detect best snapshot then run image search |
| `session` | Get auth session token |

## Options for `detect` / `detect-and-search`

| Flag | Default | Description |
|------|---------|-------------|
| `--interval=<sec>` | `1` | Seconds between sampled frames |
| `--max-frames=<n>` | `60` | Max frames to analyze |
| `--output-dir=<dir>` | next to video | Directory to save snapshot images |
| `--min-confidence=<0-1>` | `0.7` | Minimum detection confidence threshold |

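
To make the interaction between `--interval` and `--max-frames` concrete, here is a minimal sketch of how sampled timestamps could be derived. It assumes sampling starts at 0 seconds and stops at whichever comes first: the end of the video or the frame cap. The function name `sampleTimestamps` is hypothetical, and the CLI's actual sampling logic may differ.

```typescript
// Hypothetical helper, not part of the CLI. Assumes frames are sampled at
// 0, interval, 2 * interval, ... until the video ends or maxFrames is reached.
function sampleTimestamps(
  durationSeconds: number,
  intervalSeconds = 1, // mirrors the documented --interval default
  maxFrames = 60, // mirrors the documented --max-frames default
): number[] {
  if (intervalSeconds <= 0) {
    throw new Error("interval must be greater than 0");
  }
  const timestamps: number[] = [];
  for (
    let t = 0;
    t < durationSeconds && timestamps.length < maxFrames;
    t += intervalSeconds
  ) {
    timestamps.push(t);
  }
  return timestamps;
}

// A 10-second clip sampled every 3 seconds yields timestamps 0, 3, 6, 9.
console.log(sampleTimestamps(10, 3, 20));
```

Under this assumption, the defaults applied to a 5-minute video would analyze only the first 60 seconds, so raising `--interval` is the way to cover a longer clip with the same frame budget.
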
## Examples

```bash
# Detect product frames in a video
bun dist/run.js detect ./product-demo.mp4

# Sample every 5 seconds, higher confidence threshold
bun dist/run.js detect ./product-demo.mp4 --interval=5 --min-confidence=0.85

# Search for products using an existing image
bun dist/run.js search ./snapshot.jpg

# Full pipeline: detect best product frame then search
bun dist/run.js detect-and-search ./product-demo.mp4 --interval=3 --max-frames=20
```

## Output

Returns JSON with:
- `productFrames[]`: all detected product frames sorted by confidence (highest first)
- `bestSnapshot`: the highest-confidence product frame
- `searchBody`: image search API response (for `detect-and-search` and `search`)

Each `ProductFrame` contains:
```json
{
  "frameIndex": 4,
  "timestampSeconds": 9,
  "imagePath": "/path/to/snapshot/frame_0004.jpg",
  "confidence": 0.92,
  "description": "White sneaker with blue logo, left side view",
  "boundingHint": "centered"
}
```

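
For callers that consume this JSON, the following sketch types a `ProductFrame` and shows how the documented ordering and `bestSnapshot` relate. The field names come from the example above. The helper `rankFrames` is hypothetical, and treating the confidence threshold as inclusive is an assumption, not confirmed CLI behavior.

```typescript
// Shape taken from the ProductFrame JSON example in this document.
interface ProductFrame {
  frameIndex: number;
  timestampSeconds: number;
  imagePath: string;
  confidence: number;
  description: string;
  boundingHint: string;
}

// Hypothetical helper: keep frames at or above the threshold, sort them by
// confidence (highest first), and treat the first one as the best snapshot.
function rankFrames(frames: ProductFrame[], minConfidence = 0.7) {
  const productFrames = frames
    .filter((f) => f.confidence >= minConfidence)
    .sort((a, b) => b.confidence - a.confidence);
  const bestSnapshot: ProductFrame | null = productFrames[0] ?? null;
  return { productFrames, bestSnapshot };
}

// Illustrative data only; these frames are made up for the example.
const exampleFrames: ProductFrame[] = [
  { frameIndex: 1, timestampSeconds: 1, imagePath: "frame_0001.jpg", confidence: 0.4, description: "Blurry background", boundingHint: "left" },
  { frameIndex: 4, timestampSeconds: 9, imagePath: "frame_0004.jpg", confidence: 0.92, description: "White sneaker", boundingHint: "centered" },
  { frameIndex: 6, timestampSeconds: 12, imagePath: "frame_0006.jpg", confidence: 0.75, description: "Sneaker sole", boundingHint: "right" },
];

const ranked = rankFrames(exampleFrames);
console.log(ranked.bestSnapshot?.imagePath);
```
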
## Prerequisites

- `ffmpeg` and `ffprobe` in PATH
- `VISION_API_KEY`: API key for the vision endpoint
- `VISION_API_BASE`: (optional) OpenAI-compatible base URL; omit to use the OpenAI default
- `VISION_MODEL`: (optional) model name; defaults to `gpt-4o-mini`
- `auth-rt` in PATH (for `search` / `detect-and-search` API calls)

### Example provider configs

```bash
# OpenAI (default)
VISION_API_KEY=sk-...

# Any OpenAI-compatible endpoint (local Ollama, Together, Groq, etc.)
VISION_API_KEY=...
VISION_API_BASE=http://localhost:11434/v1
VISION_MODEL=llava:13b
```

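
The way these variables combine can be sketched as a small resolver. The variable names and the `gpt-4o-mini` default come from the prerequisites list. The function `resolveVisionConfig` is hypothetical, and the fallback base URL `https://api.openai.com/v1` is an assumption about what "OpenAI default" means here, so the CLI's real resolution logic may differ.

```typescript
interface VisionConfig {
  apiKey: string;
  baseUrl: string;
  model: string;
}

// Hypothetical resolver, not the CLI's implementation. The env var names and
// the model default are documented; the fallback base URL is an assumption.
function resolveVisionConfig(
  env: Record<string, string | undefined>,
): VisionConfig {
  const apiKey = env["VISION_API_KEY"];
  if (!apiKey) {
    throw new Error("VISION_API_KEY is required");
  }
  return {
    apiKey,
    baseUrl: env["VISION_API_BASE"] ?? "https://api.openai.com/v1",
    model: env["VISION_MODEL"] ?? "gpt-4o-mini",
  };
}

// Mirrors the local Ollama example; the key value is a placeholder.
const localConfig = resolveVisionConfig({
  VISION_API_KEY: "placeholder-key",
  VISION_API_BASE: "http://localhost:11434/v1",
  VISION_MODEL: "llava:13b",
});
console.log(localConfig.model);
```

In a real script the argument would typically be `process.env`; passing a plain object keeps the sketch free of runtime-specific types.
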
## Execution rules

### For `detect` and `detect-and-search` (slow: use a sub-agent)

Spawn a sub-agent via `sessions_spawn`. Do **not** run the command directly.

```
sessions_spawn(
  task: "Run this command and return the raw JSON output:\n\nbun dist/run.js <full command here>\n\nCopy the entire JSON output as your reply.",
  label: "video-product-snapshot",
  runTimeoutSeconds: 300,
)
```

- Announce immediately that processing has started and share the `runId`.
- Wait for the sub-agent announcement, then parse and format the result for the user.

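
As an illustration of filling in `<full command here>`, the sketch below assembles the three `sessions_spawn` arguments from a command name and its flags. The helper `buildSpawnRequest` and the `SpawnRequest` type are hypothetical; only the task wording, label, and timeout are taken from the template above.

```typescript
interface SpawnRequest {
  task: string;
  label: string;
  runTimeoutSeconds: number;
}

// Hypothetical helper that fills in the sessions_spawn template.
// Arguments are joined with plain spaces, so a path containing spaces
// would need to be quoted before it is passed in.
function buildSpawnRequest(
  command: "detect" | "detect-and-search",
  args: string[],
): SpawnRequest {
  const cli = ["bun", "dist/run.js", command, ...args].join(" ");
  return {
    task:
      "Run this command and return the raw JSON output:\n\n" +
      cli +
      "\n\nCopy the entire JSON output as your reply.",
    label: "video-product-snapshot",
    runTimeoutSeconds: 300,
  };
}

const spawnRequest = buildSpawnRequest("detect-and-search", [
  "./product-demo.mp4",
  "--interval=3",
  "--max-frames=20",
]);
console.log(spawnRequest.task);
```
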
### For `search` and `session` (fast: run directly)

Run the CLI command inline; no sub-agent is needed.

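
The split between slow and fast commands amounts to a lookup table, sketched here. The four command names come from the Commands table; the type names and `executionModeFor` are hypothetical and exist only to illustrate the routing rule.

```typescript
type Command = "detect" | "search" | "detect-and-search" | "session";
type ExecutionMode = "sub-agent" | "inline";

// Slow commands go through a sub-agent; fast ones run inline.
const EXECUTION_MODE: Record<Command, ExecutionMode> = {
  detect: "sub-agent",
  "detect-and-search": "sub-agent",
  search: "inline",
  session: "inline",
};

// Hypothetical lookup used to decide how a command should be launched.
function executionModeFor(command: Command): ExecutionMode {
  return EXECUTION_MODE[command];
}

console.log(executionModeFor("detect"));
```

Typing the table as `Record<Command, ExecutionMode>` means the compiler flags any command that is added to the union without a routing decision.
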
### General rules

1. **No fallback strategies.** Report errors as-is; do NOT try alternative approaches.
2. **No retry loops.** If detection or search fails, report the failure.
3. **Trust the tool's output.** The CLI handles session management and error formatting internally.
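
One way to picture these rules is a single-pass interpreter for a finished CLI run: it either returns the parsed JSON or passes the error text through unchanged, with no second attempt. The `CliResult` type and `interpretOutput` function are hypothetical, and the assumption that a non-zero exit code signals failure is not stated in this document.

```typescript
type CliResult =
  | { ok: true; data: unknown }
  | { ok: false; error: string };

// Hypothetical wrapper: one pass over the result, no retries, no fallbacks.
// Assumes a non-zero exit code means the CLI reported a failure.
function interpretOutput(
  exitCode: number,
  stdout: string,
  stderr: string,
): CliResult {
  if (exitCode !== 0) {
    // Pass the tool's own message through instead of rewording it.
    return { ok: false, error: stderr.trim() || stdout.trim() };
  }
  try {
    return { ok: true, data: JSON.parse(stdout) };
  } catch {
    return { ok: false, error: "CLI output was not valid JSON: " + stdout.trim() };
  }
}

const failed = interpretOutput(1, "", "ffprobe not found\n");
console.log(failed);
```
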