# video-product-snapshot
Detect e-commerce products in video frames using a vision model (any OpenAI-compatible API), extract the best product snapshot, and optionally search for matching products via an image-search API.
## How it works
1. Extracts frames from the video at a configurable interval using `ffmpeg`
2. Sends each frame to a vision model to detect whether a product is visible and rate confidence
3. Picks the highest-confidence frame as the best snapshot
4. Optionally calls an image-search API with the snapshot to find matching products
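
The sampling and selection logic in steps 1 to 3 can be sketched as follows. This is a minimal illustration, not the tool's actual code: `FrameDetection`, `sampleTimestamps`, and `rankFrames` are hypothetical names, and the sketch assumes sampling starts at 0 seconds and that a frame exactly at the threshold is kept.

```typescript
// Hypothetical shape of one per-frame result; field names follow the README's
// output example, but the real internal types may differ.
interface FrameDetection {
  frameIndex: number;
  timestampSeconds: number;
  confidence: number; // 0..1, as rated by the vision model
}

// Timestamps (in seconds) to sample: one every `intervalSec`, capped at `maxFrames`.
// Assumption: sampling starts at t = 0.
function sampleTimestamps(durationSec: number, intervalSec: number, maxFrames: number): number[] {
  if (intervalSec <= 0) throw new Error("intervalSec must be positive");
  const timestamps: number[] = [];
  for (let t = 0; t < durationSec && timestamps.length < maxFrames; t += intervalSec) {
    timestamps.push(t);
  }
  return timestamps;
}

// Keep frames at or above the threshold, highest confidence first.
function rankFrames(frames: FrameDetection[], minConfidence: number): FrameDetection[] {
  return frames
    .filter((f) => f.confidence >= minConfidence)
    .sort((a, b) => b.confidence - a.confidence);
}

// Example: three analyzed frames, default threshold of 0.7.
const detections: FrameDetection[] = [
  { frameIndex: 1, timestampSeconds: 0, confidence: 0.4 },
  { frameIndex: 2, timestampSeconds: 3, confidence: 0.92 },
  { frameIndex: 3, timestampSeconds: 6, confidence: 0.75 },
];
const productFrames = rankFrames(detections, 0.7);
const bestSnapshot = productFrames[0] ?? null; // null when no frame passes
console.log(bestSnapshot);
```

Sorting once and taking the first element keeps `productFrames` and `bestSnapshot` consistent with each other by construction.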
## Install
```bash
bun install
bun run build # outputs dist/run.js
```
## Usage
```bash
bun dist/run.js <command> [options]
```
### Commands
| Command | Description |
|---------|-------------|
| `detect <video>` | Extract frames and detect product snapshots |
| `search <image>` | Search products by image via API |
| `detect-and-search <video>` | Full pipeline: detect best snapshot then search |
| `session` | Print current auth session token |
### Options (`detect` / `detect-and-search`)
| Flag | Default | Description |
|------|---------|-------------|
| `--interval=<sec>` | `1` | Seconds between sampled frames |
| `--max-frames=<n>` | `60` | Max frames to analyze |
| `--output-dir=<dir>` | next to video | Directory to save extracted frames |
| `--min-confidence=<0-1>` | `0.7` | Minimum confidence to include a frame |
| `--dry-run` | — | Parse args and print config without running |
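
As an illustration of how the `--flag=value` options and defaults in the table fit together, here is a rough parsing sketch. The names `DetectOptions`, `parseDetectOptions`, and `toNumber` are hypothetical, and the validation rules (positive interval, integer frame count, confidence within 0 to 1) are assumptions rather than documented behavior.

```typescript
interface DetectOptions {
  interval: number;
  maxFrames: number;
  outputDir: string | null; // null stands for "next to the video"
  minConfidence: number;
  dryRun: boolean;
}

// Defaults taken from the options table.
const DEFAULTS: DetectOptions = {
  interval: 1,
  maxFrames: 60,
  outputDir: null,
  minConfidence: 0.7,
  dryRun: false,
};

// Convert a raw flag value to a number and validate it.
function toNumber(name: string, raw: string, isValid: (n: number) => boolean): number {
  const n = Number(raw);
  if (raw.trim() === "" || !Number.isFinite(n) || !isValid(n)) {
    throw new Error(`Invalid value for --${name}: "${raw}"`);
  }
  return n;
}

function parseDetectOptions(argv: string[]): DetectOptions {
  const opts: DetectOptions = { ...DEFAULTS };
  for (const arg of argv) {
    if (arg === "--dry-run") {
      opts.dryRun = true;
      continue;
    }
    const match = /^--([a-z-]+)=(.*)$/.exec(arg);
    if (!match) continue; // positional arguments such as the video path
    const name = match[1] ?? "";
    const value = match[2] ?? "";
    switch (name) {
      case "interval":
        opts.interval = toNumber(name, value, (n) => n > 0);
        break;
      case "max-frames":
        opts.maxFrames = toNumber(name, value, (n) => Number.isInteger(n) && n > 0);
        break;
      case "output-dir":
        opts.outputDir = value;
        break;
      case "min-confidence":
        opts.minConfidence = toNumber(name, value, (n) => n >= 0 && n <= 1);
        break;
      default:
        throw new Error(`Unknown option: --${name}`);
    }
  }
  return opts;
}

console.log(parseDetectOptions(["./demo.mp4", "--interval=5", "--min-confidence=0.85"]));
```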
### Examples
```bash
# Detect products, sample every 3 seconds
bun dist/run.js detect ./demo.mp4 --interval=3
# Full pipeline with higher confidence threshold
bun dist/run.js detect-and-search ./demo.mp4 --interval=5 --min-confidence=0.85
# Search using an existing snapshot image
bun dist/run.js search ./snapshot.jpg
```
## Output
All commands return JSON to stdout.
```json
{
  "bestSnapshot": {
    "frameIndex": 4,
    "timestampSeconds": 9,
    "imagePath": "/path/to/frame_0004.jpg",
    "confidence": 0.92,
    "description": "White sneaker with blue logo, left side view",
    "boundingHint": "centered"
  },
  "productFrames": [...],
  "searchBody": { ... }
}
```
- `productFrames` — all detected frames sorted by confidence (highest first)
- `bestSnapshot` — the top-ranked frame
- `searchBody` — image-search API response (only for `search` / `detect-and-search`)
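
For callers that consume this output from another script, the fields above could be typed roughly as follows. This is a sketch for the `detect` / `detect-and-search` output only: the type names and `parseDetectResult` are hypothetical, the shape of `searchBody` is left as `unknown` because the README does not specify it, and treating a missing `bestSnapshot` as `null` is an assumption.

```typescript
// Field names mirror the JSON example above.
interface ProductFrame {
  frameIndex: number;
  timestampSeconds: number;
  imagePath: string;
  confidence: number;
  description: string;
  boundingHint: string;
}

interface DetectResult {
  bestSnapshot: ProductFrame | null; // assumption: null when no frame qualifies
  productFrames: ProductFrame[];
  searchBody?: unknown; // image-search response; structure not documented here
}

// Parse the CLI's stdout and do a light structural check before trusting it.
function parseDetectResult(stdout: string): DetectResult {
  const data: unknown = JSON.parse(stdout);
  if (typeof data !== "object" || data === null) {
    throw new Error("Expected a JSON object");
  }
  const record = data as Record<string, unknown>;
  const frames = record["productFrames"];
  if (!Array.isArray(frames)) {
    throw new Error("Missing productFrames array");
  }
  return {
    bestSnapshot: (record["bestSnapshot"] as ProductFrame | undefined) ?? null,
    productFrames: frames as ProductFrame[],
    searchBody: record["searchBody"],
  };
}

const sampleStdout = JSON.stringify({
  bestSnapshot: {
    frameIndex: 4,
    timestampSeconds: 9,
    imagePath: "/path/to/frame_0004.jpg",
    confidence: 0.92,
    description: "White sneaker with blue logo, left side view",
    boundingHint: "centered",
  },
  productFrames: [],
});
console.log(parseDetectResult(sampleStdout).bestSnapshot?.confidence);
```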
## Environment variables
Copy `.env.example` to `.env` and fill in the values.
### Vision (required for `detect` / `detect-and-search`)
| Variable | Required | Description |
|----------|----------|-------------|
| `VISION_API_KEY` | Yes | API key for the vision model |
| `VISION_API_BASE` | No | OpenAI-compatible base URL (default: OpenAI) |
| `VISION_MODEL` | No | Model name (default: `gpt-4o-mini`) |
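
To show how these three variables might combine into a request, here is a sketch that builds (but does not send) a call to an OpenAI-compatible chat completions endpoint with one frame attached as a base64 data URL. The helper names and the prompt text are hypothetical, and the fallback base URL `https://api.openai.com/v1` is an assumption about what "default: OpenAI" resolves to; the tool's real request may differ.

```typescript
interface VisionConfig {
  apiKey: string;
  apiBase: string;
  model: string;
}

// Read the vision settings from an env-like object, applying the documented defaults.
function loadVisionConfig(env: Record<string, string | undefined>): VisionConfig {
  const apiKey = env["VISION_API_KEY"];
  if (!apiKey) throw new Error("VISION_API_KEY is required");
  return {
    apiKey,
    apiBase: env["VISION_API_BASE"] || "https://api.openai.com/v1", // assumed default
    model: env["VISION_MODEL"] || "gpt-4o-mini",
  };
}

// Build the URL and fetch() init for one frame. Nothing is sent here.
function buildVisionRequest(config: VisionConfig, jpegBytes: Uint8Array, prompt: string) {
  const dataUrl = `data:image/jpeg;base64,${Buffer.from(jpegBytes).toString("base64")}`;
  return {
    url: `${config.apiBase.replace(/\/+$/, "")}/chat/completions`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${config.apiKey}`,
      },
      body: JSON.stringify({
        model: config.model,
        messages: [
          {
            role: "user",
            content: [
              { type: "text", text: prompt },
              { type: "image_url", image_url: { url: dataUrl } },
            ],
          },
        ],
      }),
    },
  };
}

const visionConfig = loadVisionConfig({ VISION_API_KEY: "sk-example" });
const visionRequest = buildVisionRequest(
  visionConfig,
  new Uint8Array([1, 2, 3]), // placeholder bytes; a real call would read a frame file
  "Is a product visible in this frame? Rate your confidence from 0 to 1.", // hypothetical prompt
);
console.log(visionRequest.url);
```

The pair could then be passed to `fetch(visionRequest.url, visionRequest.init)`.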
### Image search (required for `search` / `detect-and-search`)
| Variable | Required | Description |
|----------|----------|-------------|
| `ONEBOUND_UPLOAD_ENDPOINT` | Yes | Endpoint to upload a local image and get a public URL |
| `ONEBOUND_SEARCH_ENDPOINT` | Yes | Reverse image search endpoint |
| `ONEBOUND_KEYWORD_SEARCH_ENDPOINT` | No | Keyword search endpoint for re-ranking results |

These requests are proxied through a local `woo-data-scrawler` instance, so no Onebound API key is needed directly.
### Other
| Variable | Required | Description |
|----------|----------|-------------|
| `AUTH_RT_BIN` | No | Override path to the `auth-rt` binary |
| `TELEMETRY_ENDPOINT` | No | POST skill execution results to a Loki-compatible endpoint |
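
The per-command requirements in the tables above can be checked up front before any work starts. The sketch below is one possible way to do that; `REQUIRED_ENV` and `missingEnv` are hypothetical names, and it assumes `detect-and-search` needs both the vision and the image-search variables because it runs both stages.

```typescript
type Command = "detect" | "search" | "detect-and-search" | "session";

const VISION_VARS = ["VISION_API_KEY"];
const SEARCH_VARS = ["ONEBOUND_UPLOAD_ENDPOINT", "ONEBOUND_SEARCH_ENDPOINT"];

// Variables marked "Yes" in the tables, grouped by the command that needs them.
const REQUIRED_ENV: Record<Command, string[]> = {
  detect: VISION_VARS,
  search: SEARCH_VARS,
  "detect-and-search": [...VISION_VARS, ...SEARCH_VARS], // assumption: both stages run
  session: [],
};

// Return the names of required variables that are unset or empty.
function missingEnv(command: Command, env: Record<string, string | undefined>): string[] {
  return REQUIRED_ENV[command].filter((name) => !env[name]);
}

const missing = missingEnv("detect-and-search", { VISION_API_KEY: "sk-example" });
if (missing.length > 0) {
  console.error(`Missing environment variables: ${missing.join(", ")}`);
}
```

In a real entry point, `process.env` would be passed as the second argument.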
## Prerequisites
- [Bun](https://bun.sh) runtime
- `ffmpeg` and `ffprobe` in PATH
- `auth-rt` CLI in PATH (required for `search` / `detect-and-search`)