
---
name: video-product-snapshot
description: Detect ecommerce products in video frames using Claude Vision, extract the best product snapshot, and optionally search via an image-search API. Use when the user provides a video and wants to find or identify products shown in it.
---

# Video Product Snapshot

Extract ecommerce product snapshots from video using Claude Vision, then optionally search for matching products via an image-search API.

## Run

```sh
bun dist/run.js <command> [args] [--dry-run]
```

## Commands

| Command | Description |
| --- | --- |
| `detect <video-path> [options]` | Extract frames and detect product snapshots |
| `search <image-path>` | Search products by image via the API |
| `detect-and-search <video-path> [options]` | Detect the best snapshot, then run an image search |
| `session` | Get an auth session token |

### Options

| Flag | Default | Description |
| --- | --- | --- |
| `--interval=<sec>` | `1` | Seconds between sampled frames |
| `--max-frames=<n>` | `60` | Maximum number of frames to analyze |
| `--output-dir=<dir>` | next to video | Directory to save snapshot images |
| `--min-confidence=<0-1>` | `0.7` | Minimum detection confidence threshold |
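For scripting the CLI from another tool, the flag syntax above can be wrapped in a small typed helper. This is a minimal sketch, not part of the CLI itself: the `DetectOptions` interface and `buildDetectCommand` function are hypothetical names introduced here for illustration.

```typescript
// Hypothetical helper (not shipped with the CLI) that builds a `detect`
// invocation from typed options, mirroring the flag table above.
interface DetectOptions {
  interval?: number;      // --interval=<sec>, default 1
  maxFrames?: number;     // --max-frames=<n>, default 60
  outputDir?: string;     // --output-dir=<dir>, default: next to the video
  minConfidence?: number; // --min-confidence=<0-1>, default 0.7
  dryRun?: boolean;       // --dry-run
}

function buildDetectCommand(videoPath: string, opts: DetectOptions = {}): string[] {
  const args = ["bun", "dist/run.js", "detect", videoPath];
  // Only emit flags the caller set, so CLI defaults apply otherwise.
  if (opts.interval !== undefined) args.push(`--interval=${opts.interval}`);
  if (opts.maxFrames !== undefined) args.push(`--max-frames=${opts.maxFrames}`);
  if (opts.outputDir !== undefined) args.push(`--output-dir=${opts.outputDir}`);
  if (opts.minConfidence !== undefined) args.push(`--min-confidence=${opts.minConfidence}`);
  if (opts.dryRun) args.push("--dry-run");
  return args;
}

console.log(
  buildDetectCommand("./product-demo.mp4", { interval: 5, minConfidence: 0.85 }).join(" ")
);
// → bun dist/run.js detect ./product-demo.mp4 --interval=5 --min-confidence=0.85
```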

## Examples

```sh
# Detect product frames in a video
bun dist/run.js detect ./product-demo.mp4

# Sample every 5 seconds, higher confidence threshold
bun dist/run.js detect ./product-demo.mp4 --interval=5 --min-confidence=0.85

# Search for products using an existing image
bun dist/run.js search ./snapshot.jpg

# Full pipeline: detect the best product frame, then search
bun dist/run.js detect-and-search ./product-demo.mp4 --interval=3 --max-frames=20
```

## Output

Returns JSON with:

- `productFrames[]`: all detected product frames, sorted by confidence (highest first)
- `bestSnapshot`: the highest-confidence product frame
- `searchBody`: the image-search API response (`search` and `detect-and-search` only)

Each `ProductFrame` contains:

```json
{
  "frameIndex": 4,
  "timestampSeconds": 9,
  "imagePath": "/path/to/snapshot/frame_0004.jpg",
  "confidence": 0.92,
  "description": "White sneaker with blue logo, left side view",
  "boundingHint": "centered"
}
```
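A wrapper that consumes this JSON can type the shape as below. This is a minimal sketch: the `ProductFrame` field names come from the example above, but the `DetectOutput` interface name and the `pickBest` helper are illustrative assumptions, not part of the CLI.

```typescript
// Typed view of the CLI's JSON output (field names taken from the example above).
interface ProductFrame {
  frameIndex: number;
  timestampSeconds: number;
  imagePath: string;
  confidence: number;
  description: string;
  boundingHint: string;
}

interface DetectOutput {
  productFrames: ProductFrame[];
  bestSnapshot: ProductFrame | null;
}

// Hypothetical helper: re-filter frames with a stricter threshold than the run used.
// productFrames is sorted by confidence (highest first), so the first match wins.
function pickBest(out: DetectOutput, minConfidence = 0.7): ProductFrame | null {
  return out.productFrames.find((f) => f.confidence >= minConfidence) ?? null;
}

const sample: DetectOutput = {
  productFrames: [
    {
      frameIndex: 4,
      timestampSeconds: 9,
      imagePath: "/path/to/snapshot/frame_0004.jpg",
      confidence: 0.92,
      description: "White sneaker with blue logo, left side view",
      boundingHint: "centered",
    },
  ],
  bestSnapshot: null,
};

console.log(pickBest(sample)?.imagePath);
// → /path/to/snapshot/frame_0004.jpg
```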

## Prerequisites

- `ffmpeg` and `ffprobe` in `PATH`
- `VISION_API_KEY` — API key for the vision endpoint
- `VISION_API_BASE` — (optional) OpenAI-compatible base URL; omit to use the OpenAI default
- `VISION_MODEL` — (optional) model name; defaults to `gpt-4o-mini`
- `auth-rt` in `PATH` (required for `search` / `detect-and-search` API calls)

### Example provider configs

```sh
# OpenAI (default)
VISION_API_KEY=sk-...

# Any OpenAI-compatible endpoint (local Ollama, Together, Groq, etc.)
VISION_API_KEY=...
VISION_API_BASE=http://localhost:11434/v1
VISION_MODEL=llava:13b
```

## Rules — MUST follow

1. Execute only; do not reason about internals. Run the CLI and return its output.
2. No fallback strategies. Report errors as-is; do NOT try alternative approaches.
3. No retry loops. If detection or search fails, report the failure.
4. Trust the tool's output. The CLI handles session management and error formatting internally.