# video-product-snapshot

Detect ecommerce products in video frames using Claude Vision, extract the best product snapshot, and optionally search for matching products via an image-search API.
## How it works

- Extracts frames from the video at a configurable interval using ffmpeg
- Sends each frame to a vision model to detect whether a product is visible and rate confidence
- Picks the highest-confidence frame as the best snapshot
- Optionally calls an image-search API with the snapshot to find matching products
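The filter-and-rank step above can be sketched in TypeScript. The `FrameDetection` shape and `rankFrames` helper are illustrative, not the skill's actual internals:

```typescript
// Illustrative shape of one frame's detection result (not the skill's real API).
interface FrameDetection {
  frameIndex: number;
  timestampSeconds: number;
  confidence: number; // 0..1 score returned by the vision model
}

// Keep frames above the threshold, sort by confidence (highest first),
// and take the top frame as the best snapshot.
function rankFrames(frames: FrameDetection[], minConfidence = 0.7) {
  const productFrames = frames
    .filter((f) => f.confidence >= minConfidence)
    .sort((a, b) => b.confidence - a.confidence);
  return { productFrames, bestSnapshot: productFrames[0] ?? null };
}
```

Ties in confidence keep their original order, since `Array.prototype.sort` is stable in modern JavaScript engines.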
## Install

```sh
bun install
bun run build   # outputs dist/run.js
```
## Usage

```sh
bun dist/run.js <command> [options]
```
## Commands

| Command | Description |
|---|---|
| `detect <video>` | Extract frames and detect product snapshots |
| `search <image>` | Search products by image via API |
| `detect-and-search <video>` | Full pipeline: detect the best snapshot, then search |
| `session` | Print the current auth session token |
## Options (detect / detect-and-search)

| Flag | Default | Description |
|---|---|---|
| `--interval=<sec>` | `1` | Seconds between sampled frames |
| `--max-frames=<n>` | `60` | Max frames to analyze |
| `--output-dir=<dir>` | next to video | Directory to save extracted frames |
| `--min-confidence=<0-1>` | `0.7` | Minimum confidence to include a frame |
| `--dry-run` | — | Parse args and print config without running |
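How `--interval` and `--max-frames` interact can be sketched as follows; `sampleTimestamps` is a hypothetical helper, not the skill's real code:

```typescript
// Sample one timestamp every `intervalSeconds`, stopping at the video's
// end or at `maxFrames` samples, whichever comes first.
function sampleTimestamps(
  durationSeconds: number,
  intervalSeconds = 1,
  maxFrames = 60,
): number[] {
  const timestamps: number[] = [];
  for (
    let t = 0;
    t <= durationSeconds && timestamps.length < maxFrames;
    t += intervalSeconds
  ) {
    timestamps.push(t);
  }
  return timestamps;
}
```

So a 10-second video sampled with `--interval=3` yields frames at 0, 3, 6, and 9 seconds.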
## Examples

```sh
# Detect products, sample every 3 seconds
bun dist/run.js detect ./demo.mp4 --interval=3

# Full pipeline with a higher confidence threshold
bun dist/run.js detect-and-search ./demo.mp4 --interval=5 --min-confidence=0.85

# Search using an existing snapshot image
bun dist/run.js search ./snapshot.jpg
```
## Output

All commands print JSON to stdout.

```json
{
  "bestSnapshot": {
    "frameIndex": 4,
    "timestampSeconds": 9,
    "imagePath": "/path/to/frame_0004.jpg",
    "confidence": 0.92,
    "description": "White sneaker with blue logo, left side view",
    "boundingHint": "centered"
  },
  "productFrames": [...],
  "searchBody": { ... }
}
```
- `productFrames` — all detected frames sorted by confidence (highest first)
- `bestSnapshot` — the top-ranked frame
- `searchBody` — image-search API response (only for `search` / `detect-and-search`)
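A downstream script might consume this JSON like so. The `summarize` helper and its wrapper type are hypothetical; the field names match the output shown above:

```typescript
// Minimal shape of the fields we read from the command's stdout.
interface SnapshotResult {
  bestSnapshot: { imagePath: string; confidence: number } | null;
  productFrames: unknown[];
}

// Parse the captured stdout and report the best snapshot's path and
// confidence as a one-line summary.
function summarize(stdout: string): string {
  const result = JSON.parse(stdout) as SnapshotResult;
  if (!result.bestSnapshot) return "no product detected";
  const pct = Math.round(result.bestSnapshot.confidence * 100);
  return `${result.bestSnapshot.imagePath} (${pct}% confidence)`;
}
```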
## Environment variables

The only required configuration is `CLIENT_KEY` in `~/.openclaw/.env`:

```sh
CLIENT_KEY=sk_xxxxxxxx.xxxxxxxxxxxxxxxxxxxxxxxx
```

Everything else (the vision API key, image-search endpoints) is fetched automatically from the client config via `auth-rt`. No per-skill env vars are needed.
### Optional overrides

| Variable | Description |
|---|---|
| `VISION_MODEL` | Override the model name (default: `aliyun-cp-multimodal`) |
| `VISION_API_BASE` | Override the vision API base URL |
| `VISION_API_KEY` | Override the vision API key |
| `AUTH_RT_BIN` | Override the path to the `auth-rt` binary |
| `TELEMETRY_ENDPOINT` | POST execution results to a telemetry endpoint |
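Override resolution presumably falls back to the documented default when the variable is unset; a minimal sketch (the `resolveModel` helper is hypothetical):

```typescript
// Prefer the VISION_MODEL env var when set; otherwise fall back to the
// documented default model name.
function resolveModel(env: Record<string, string | undefined>): string {
  return env.VISION_MODEL ?? "aliyun-cp-multimodal";
}
```

At startup this would be called as `resolveModel(process.env)`.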
## Prerequisites

- Bun runtime
- `ffmpeg` and `ffprobe` in PATH
- `auth-rt` CLI in PATH (required for `search` / `detect-and-search`)