video-product-finder/SKILL.md

name: video-product-snapshot
description: Detect ecommerce products in video frames using a vision model (any OpenAI-compatible endpoint), extract the best product snapshot, and optionally search via the image-search API. Use when the user provides a video and wants to find or identify products shown in it.

Video Product Snapshot

Extract ecommerce product snapshots from a video using a vision model served over an OpenAI-compatible API, then optionally search for matching products via the image-search API.

Run

bun dist/run.js <command> [args] [--dry-run]

Commands

| Command | Description |
| --- | --- |
| detect <video-path> [options] | Extract frames, detect product snapshots |
| search <image-path> | Search products by image via API |
| detect-and-search <video-path> [options] | Detect best snapshot, then run image search |
| session | Get auth session token |

| Flag | Default | Description |
| --- | --- | --- |
| --interval=<sec> | 1 | Seconds between sampled frames |
| --max-frames=<n> | 60 | Max frames to analyze |
| --output-dir=<dir> | next to video | Directory to save snapshot images |
| --min-confidence=<0-1> | 0.7 | Minimum detection confidence threshold |
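
To see how --interval and --max-frames interact, here is a minimal sketch of the sampling arithmetic. It is not the CLI's actual code: the helper name sampleTimestamps is hypothetical, and it assumes sampling starts at t = 0 and stops at whichever comes first, the end of the video or the frame cap.

```typescript
// Hypothetical helper, not part of the CLI.
// Assumption: frames are taken at 0, interval, 2*interval, ... seconds,
// stopping at the end of the video or once maxFrames is reached.
function sampleTimestamps(
  durationSec: number,
  intervalSec: number = 1, // mirrors the --interval default
  maxFrames: number = 60,  // mirrors the --max-frames default
): number[] {
  if (intervalSec <= 0 || maxFrames <= 0) return [];
  const timestamps: number[] = [];
  for (let t = 0; t < durationSec && timestamps.length < maxFrames; t += intervalSec) {
    timestamps.push(t);
  }
  return timestamps;
}

// A 10-second clip sampled every 3 seconds yields four frames.
console.log(sampleTimestamps(10, 3, 20)); // [0, 3, 6, 9]
```

Under this assumption, the defaults would cover only the first 60 seconds of a longer video, so a larger --interval may be the better choice for long footage.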

Examples

# Detect product frames in a video
bun dist/run.js detect ./product-demo.mp4

# Sample every 5 seconds, higher confidence threshold
bun dist/run.js detect ./product-demo.mp4 --interval=5 --min-confidence=0.85

# Search for products using an existing image
bun dist/run.js search ./snapshot.jpg

# Full pipeline: detect best product frame then search
bun dist/run.js detect-and-search ./product-demo.mp4 --interval=3 --max-frames=20

Output

Returns JSON with:

  • productFrames[]: all detected product frames sorted by confidence (highest first)
  • bestSnapshot: the highest-confidence product frame
  • searchBody: image search API response (for detect-and-search and search)

Each ProductFrame contains:

{
  "frameIndex": 4,
  "timestampSeconds": 9,
  "imagePath": "/path/to/snapshot/frame_0004.jpg",
  "confidence": 0.92,
  "description": "White sneaker with blue logo, left side view",
  "boundingHint": "centered"
}
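
For callers that consume this JSON, the shape above can be sketched as TypeScript types. The field names come from the example; the type names ProductFrame and DetectOutput, the optionality of bestSnapshot and searchBody, and the helpers parseDetectOutput and pickBest are assumptions for illustration, not part of the tool.

```typescript
// Field names follow the example ProductFrame above.
interface ProductFrame {
  frameIndex: number;
  timestampSeconds: number;
  imagePath: string;
  confidence: number;
  description: string;
  boundingHint: string;
}

// Assumption: bestSnapshot may be missing when nothing is detected, and
// searchBody is only present for search and detect-and-search.
interface DetectOutput {
  productFrames: ProductFrame[];
  bestSnapshot?: ProductFrame | null;
  searchBody?: unknown;
}

// Hypothetical helper: parse the CLI's raw JSON and check the one field we rely on.
function parseDetectOutput(raw: string): DetectOutput {
  const parsed = JSON.parse(raw) as DetectOutput;
  if (!Array.isArray(parsed.productFrames)) {
    throw new Error("productFrames is missing or not an array");
  }
  return parsed;
}

// Hypothetical helper: highest-confidence frame, without mutating the input.
function pickBest(frames: ProductFrame[]): ProductFrame | undefined {
  return [...frames].sort((a, b) => b.confidence - a.confidence)[0];
}
```

Since the tool already returns bestSnapshot, pickBest is only useful as a cross-check or when re-ranking a filtered subset of productFrames.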

Prerequisites

  • ffmpeg and ffprobe in PATH
  • VISION_API_KEY — API key for the vision endpoint
  • VISION_API_BASE — (optional) OpenAI-compatible base URL; omit to use OpenAI default
  • VISION_MODEL — (optional) model name, default gpt-4o-mini
  • auth-rt in PATH (for search / detect-and-search API calls)

Example provider configs

# OpenAI (default)
VISION_API_KEY=sk-...

# Any OpenAI-compatible endpoint (local Ollama, Together, Groq, etc.)
VISION_API_KEY=...
VISION_API_BASE=http://localhost:11434/v1
VISION_MODEL=llava:13b
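
The way these variables combine can be sketched as follows. This is an illustration under stated assumptions, not the tool's actual code: the function name resolveVisionConfig is hypothetical, and "OpenAI default" is taken to mean the standard https://api.openai.com/v1 base URL.

```typescript
interface VisionConfig {
  apiKey: string;
  apiBase: string;
  model: string;
}

// Hypothetical helper: resolve the vision settings from environment variables.
// Assumption: an empty string is treated the same as an unset variable.
function resolveVisionConfig(env: Record<string, string | undefined>): VisionConfig {
  const apiKey = env["VISION_API_KEY"];
  if (!apiKey) {
    throw new Error("VISION_API_KEY is required");
  }
  return {
    apiKey,
    // Assumed meaning of "OpenAI default" when VISION_API_BASE is omitted.
    apiBase: env["VISION_API_BASE"] || "https://api.openai.com/v1",
    // Default model named in the prerequisites.
    model: env["VISION_MODEL"] || "gpt-4o-mini",
  };
}

// Example: the local Ollama configuration shown above.
const local = resolveVisionConfig({
  VISION_API_KEY: "placeholder",
  VISION_API_BASE: "http://localhost:11434/v1",
  VISION_MODEL: "llava:13b",
});
console.log(local.model); // llava:13b
```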

Execution rules

For detect and detect-and-search (slow — use sub-agent)

Spawn a sub-agent via sessions_spawn. Do not run the command directly.

sessions_spawn(
  task: "Run this command and return the raw JSON output:\n\nbun dist/run.js <full command here>\n\nCopy the entire JSON output as your reply.",
  label: "video-product-snapshot",
  runTimeoutSeconds: 300,
)
  • Announce immediately that processing has started and share the runId.
  • Wait for the sub-agent announcement, then parse and format the result for the user.

For search and session (fast — run directly)

Run the CLI command inline, no sub-agent needed.
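
The routing between the two modes can be summarized in a short sketch. The command names and the task wording come from this document; the helper names needsSubAgent and buildSubAgentTask are hypothetical, and how sessions_spawn is actually invoked depends on the host agent.

```typescript
type Command = "detect" | "search" | "detect-and-search" | "session";

// The two commands this document marks as slow.
const SLOW_COMMANDS = new Set<Command>(["detect", "detect-and-search"]);

// Hypothetical helper: true when the command should go through sessions_spawn.
function needsSubAgent(command: Command): boolean {
  return SLOW_COMMANDS.has(command);
}

// Hypothetical helper: build the task text from the template above.
// `args` is everything after `bun dist/run.js`, e.g. "detect ./product-demo.mp4".
function buildSubAgentTask(args: string): string {
  return (
    "Run this command and return the raw JSON output:\n\n" +
    `bun dist/run.js ${args}\n\n` +
    "Copy the entire JSON output as your reply."
  );
}

console.log(needsSubAgent("detect")); // true
console.log(needsSubAgent("search")); // false
```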

General rules

  1. No fallback strategies. Report errors as-is; do NOT try alternative approaches.
  2. No retry loops. If detection or search fails, report the failure.
  3. Trust the tool's output. The CLI handles session management and error formatting internally.