| name | description |
|---|---|
| video-product-snapshot | Detect ecommerce products in video frames using Claude Vision, extract the best product snapshot, and optionally search via image-search API. Use when the user provides a video and wants to find/identify products shown in it. |
# Video Product Snapshot
Extract ecommerce product snapshots from video using Claude Vision, then optionally search for matching products via image-search API.
## Run

```shell
bun dist/run.js <command> [args] [--dry-run]
```
## Commands

| Command | Description |
|---|---|
| `detect <video-path> [options]` | Extract frames, detect product snapshots |
| `search <image-path>` | Search products by image via API |
| `detect-and-search <video-path> [options]` | Detect best snapshot, then run image search |
| `session` | Get auth session token |
## Options for `detect` / `detect-and-search`

| Flag | Default | Description |
|---|---|---|
| `--interval=<sec>` | `1` | Seconds between sampled frames |
| `--max-frames=<n>` | `60` | Max frames to analyze |
| `--output-dir=<dir>` | next to video | Directory to save snapshot images |
| `--min-confidence=<0-1>` | `0.7` | Minimum detection confidence threshold |
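The two sampling flags interact: frames are sampled every `--interval` seconds until the video ends or `--max-frames` is reached, whichever comes first. A minimal sketch of that schedule, assuming even sampling from t=0 (the function name and exact alignment are illustrative, not the CLI's actual implementation):

```typescript
// Hypothetical sketch of the sampling schedule implied by --interval and --max-frames.
// The real CLI may align frames differently; this only illustrates the flag interaction.
function sampleTimestamps(
  durationSeconds: number,
  intervalSeconds = 1, // --interval default
  maxFrames = 60,      // --max-frames default
): number[] {
  const timestamps: number[] = [];
  for (let t = 0; t < durationSeconds && timestamps.length < maxFrames; t += intervalSeconds) {
    timestamps.push(t);
  }
  return timestamps;
}
```

Under this sketch, `--interval=3 --max-frames=20` covers at most the first 60 seconds of a longer video.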
## Examples

```shell
# Detect product frames in a video
bun dist/run.js detect ./product-demo.mp4

# Sample every 5 seconds, higher confidence threshold
bun dist/run.js detect ./product-demo.mp4 --interval=5 --min-confidence=0.85

# Search for products using an existing image
bun dist/run.js search ./snapshot.jpg

# Full pipeline: detect best product frame then search
bun dist/run.js detect-and-search ./product-demo.mp4 --interval=3 --max-frames=20
```
## Output

Returns JSON with:

- `productFrames[]`: all detected product frames sorted by confidence (highest first)
- `bestSnapshot`: the highest-confidence product frame
- `searchBody`: image search API response (for `detect-and-search` and `search`)
Each `ProductFrame` contains:

```json
{
  "frameIndex": 4,
  "timestampSeconds": 9,
  "imagePath": "/path/to/snapshot/frame_0004.jpg",
  "confidence": 0.92,
  "description": "White sneaker with blue logo, left side view",
  "boundingHint": "centered"
}
```
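The shape above can be captured as a type, with a small helper showing how `bestSnapshot` relates to `productFrames[]`. The helper is illustrative only; the CLI computes this internally:

```typescript
// Type mirroring the documented ProductFrame shape.
interface ProductFrame {
  frameIndex: number;
  timestampSeconds: number;
  imagePath: string;
  confidence: number;
  description: string;
  boundingHint: string;
}

// Illustrative: bestSnapshot is the highest-confidence frame above the threshold.
function pickBestSnapshot(frames: ProductFrame[], minConfidence = 0.7): ProductFrame | undefined {
  return frames
    .filter((f) => f.confidence >= minConfidence)
    .sort((a, b) => b.confidence - a.confidence)[0];
}
```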
## Prerequisites

- `ffmpeg` and `ffprobe` in `PATH`
- `VISION_API_KEY` — API key for the vision endpoint
- `VISION_API_BASE` — (optional) OpenAI-compatible base URL; omit to use the OpenAI default
- `VISION_MODEL` — (optional) model name, default `gpt-4o-mini`
- `auth-rt` in `PATH` (for `search` / `detect-and-search` API calls)
## Example provider configs

```shell
# OpenAI (default)
VISION_API_KEY=sk-...

# Any OpenAI-compatible endpoint (local Ollama, Together, Groq, etc.)
VISION_API_KEY=...
VISION_API_BASE=http://localhost:11434/v1
VISION_MODEL=llava:13b
```
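A sketch of how these variables might resolve, using the defaults stated above. The OpenAI base URL shown is an assumption, and the CLI's real resolution logic may differ:

```typescript
// Illustrative only: resolves the documented env vars with their stated defaults.
// In real use, pass process.env.
function loadVisionConfig(env: Record<string, string | undefined>) {
  return {
    apiKey: env.VISION_API_KEY ?? "",                            // required
    baseURL: env.VISION_API_BASE ?? "https://api.openai.com/v1", // assumed OpenAI default
    model: env.VISION_MODEL ?? "gpt-4o-mini",                    // documented default
  };
}
```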
## Result formatting

After the CLI completes, format `rerank.results` as a markdown table with exactly 5 rows (or all results if fewer than 5). Do NOT split results into "最佳匹配" (best match) / "其他热门选项" (other popular options) sections — show everything in one flat table.
| # | 商品名称 | 价格 | 销量 | 链接 |
|---|---|---|---|---|
| 1 | {title} | ¥{promotion_price || price} | {sales ?? —}件 | 查看 |
- Use `promotion_price` when present, otherwise `price`
- If `sales` is missing or zero, show `—`
- Always render as a markdown table, never as bullet points
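The rules above can be sketched as a formatter. The `RerankItem` field names are inferred from the template row and are assumptions:

```typescript
// Illustrative formatter for rerank.results following the rules above.
// RerankItem field names are assumptions inferred from the table template.
interface RerankItem {
  title: string;
  price: number;
  promotion_price?: number;
  sales?: number;
  url: string;
}

function formatResults(items: RerankItem[]): string {
  const rows = items.slice(0, 5).map((it, i) => {
    const price = it.promotion_price ?? it.price;  // prefer promotion_price
    const sales = it.sales ? `${it.sales}件` : "—"; // missing or zero shows the "—" placeholder
    return `| ${i + 1} | ${it.title} | ¥${price} | ${sales} | [查看](${it.url}) |`;
  });
  return ["| # | 商品名称 | 价格 | 销量 | 链接 |", "|---|---|---|---|---|", ...rows].join("\n");
}
```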
## Execution rules

### For `detect` and `detect-and-search` (slow — use sub-agent)

Spawn a sub-agent via `sessions_spawn`. Do not run the command directly.
```
sessions_spawn(
  task: "Run this command and return the raw JSON output:\n\nbun dist/run.js <full command here>\n\nCopy the entire JSON output as your reply.",
  label: "video-product-snapshot",
  runTimeoutSeconds: 300,
)
```
- Announce immediately that processing has started and share the `runId`.
- Wait for the sub-agent announcement, then parse and format the result for the user.
### For `search` and `session` (fast — run directly)
Run the CLI command inline, no sub-agent needed.
## General rules
- No fallback strategies. Report errors as-is; do NOT try alternative approaches.
- No retry loops. If detection or search fails, report the failure.
- Trust the tool's output. The CLI handles session management and error formatting internally.