# video-product-snapshot
Detect ecommerce products in video frames using Claude Vision, extract the best product snapshot, and optionally search for matching products via an image-search API.
## How it works
1. Extracts frames from the video at a configurable interval using `ffmpeg`
2. Sends each frame to a vision model to detect whether a product is visible and rate confidence
3. Picks the highest-confidence frame as the best snapshot
4. Optionally calls an image-search API with the snapshot to find matching products
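Steps 2–3 above boil down to ranking per-frame vision results. A minimal sketch of that ranking, assuming a hypothetical `FrameScore` shape (field names are illustrative, not taken from this repo's source):

```typescript
// Hypothetical per-frame result from the vision model; names are illustrative.
interface FrameScore {
  frameIndex: number;
  timestampSeconds: number;
  confidence: number; // 0..1 product-visibility confidence
}

// Keep frames at or above the threshold, sort by confidence descending,
// and take the top frame as the best snapshot.
function rankFrames(frames: FrameScore[], minConfidence = 0.7) {
  const productFrames = frames
    .filter((f) => f.confidence >= minConfidence)
    .sort((a, b) => b.confidence - a.confidence);
  return { productFrames, bestSnapshot: productFrames[0] ?? null };
}
```

The defaults mirror the `--min-confidence=0.7` option documented below.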
## Install
```bash
bun install
bun run build # outputs dist/run.js
```
## Usage
```bash
bun dist/run.js <command> [options]
```
### Commands
| Command | Description |
|---------|-------------|
| `detect <video>` | Extract frames and detect product snapshots |
| `search <image>` | Search products by image via API |
| `detect-and-search <video>` | Full pipeline: detect best snapshot then search |
| `session` | Print the current auth session token |
### Options (`detect` / `detect-and-search`)
| Flag | Default | Description |
|------|---------|-------------|
| `--interval=<sec>` | `1` | Seconds between sampled frames |
| `--max-frames=<n>` | `60` | Max frames to analyze |
| `--output-dir=<dir>` | next to video | Directory to save extracted frames |
| `--min-confidence=<0-1>` | `0.7` | Minimum confidence to include a frame |
| `--dry-run` | — | Parse args and print config without running |
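All flags use the `--key=value` form. A sketch of how they might be parsed into a config object (defaults mirror the table above; this is illustrative, not the repo's actual parser):

```typescript
// Parse --key=value flags into a config object. Defaults match the
// option table in this README; unknown flags are ignored.
function parseOptions(argv: string[]) {
  const config = {
    interval: 1,
    maxFrames: 60,
    outputDir: undefined as string | undefined,
    minConfidence: 0.7,
    dryRun: false,
  };
  for (const arg of argv) {
    const eq = arg.indexOf("=");
    const flag = eq === -1 ? arg : arg.slice(0, eq);
    const value = eq === -1 ? undefined : arg.slice(eq + 1);
    if (flag === "--interval") config.interval = Number(value);
    else if (flag === "--max-frames") config.maxFrames = Number(value);
    else if (flag === "--output-dir") config.outputDir = value;
    else if (flag === "--min-confidence") config.minConfidence = Number(value);
    else if (flag === "--dry-run") config.dryRun = true;
  }
  return config;
}
```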
### Examples
```bash
# Detect products, sample every 3 seconds
bun dist/run.js detect ./demo.mp4 --interval=3
# Full pipeline with higher confidence threshold
bun dist/run.js detect-and-search ./demo.mp4 --interval=5 --min-confidence=0.85
# Search using an existing snapshot image
bun dist/run.js search ./snapshot.jpg
```
## Output
All commands return JSON to stdout.
```json
{
  "bestSnapshot": {
    "frameIndex": 4,
    "timestampSeconds": 9,
    "imagePath": "/path/to/frame_0004.jpg",
    "confidence": 0.92,
    "description": "White sneaker with blue logo, left side view",
    "boundingHint": "centered"
  },
  "productFrames": [...],
  "searchBody": { ... }
}
```
- `productFrames` — all detected frames sorted by confidence (highest first)
- `bestSnapshot` — the top-ranked frame
- `searchBody` — image-search API response (only for `search` / `detect-and-search`)
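Because the output is plain JSON on stdout, downstream scripts can type and parse it. A sketch of such a consumer, with types derived from the sample above (the interface names are mine, not the repo's):

```typescript
// Types mirroring the JSON sample above; names are illustrative.
interface Snapshot {
  frameIndex: number;
  timestampSeconds: number;
  imagePath: string;
  confidence: number;
  description: string;
  boundingHint: string;
}

interface DetectResult {
  bestSnapshot: Snapshot | null;
  productFrames: Snapshot[];
  searchBody?: unknown; // only present for search / detect-and-search
}

// Parse the CLI's stdout into a typed result.
function parseDetectOutput(stdout: string): DetectResult {
  return JSON.parse(stdout) as DetectResult;
}
```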
## Environment variables
The only required configuration is `CLIENT_KEY` in `~/.openclaw/.env`:
```
CLIENT_KEY=sk_xxxxxxxx.xxxxxxxxxxxxxxxxxxxxxxxx
```
Everything else (vision API key, image-search endpoints) is fetched automatically from the client config via `auth-rt`; no per-skill env vars are needed.
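If the file follows the usual `KEY=VALUE` dotenv convention, a loader for it could look like this (a sketch under that assumption; the repo's actual loading mechanism may differ):

```typescript
// Parse simple KEY=VALUE lines from a .env file's contents.
// Blank lines and #-comments are skipped; only the first "=" splits
// key from value, so values may themselves contain "=".
function parseDotEnv(text: string): Record<string, string> {
  const env: Record<string, string> = {};
  for (const line of text.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed || trimmed.startsWith("#")) continue;
    const eq = trimmed.indexOf("=");
    if (eq === -1) continue;
    env[trimmed.slice(0, eq)] = trimmed.slice(eq + 1);
  }
  return env;
}
```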
### Optional overrides
| Variable | Description |
|----------|-------------|
| `VISION_MODEL` | Override model name (default: `aliyun-cp-multimodal`) |
| `VISION_API_BASE` | Override vision API base URL |
| `VISION_API_KEY` | Override vision API key |
| `AUTH_RT_BIN` | Override path to the `auth-rt` binary |
| `TELEMETRY_ENDPOINT` | POST execution results to a telemetry endpoint |
## Prerequisites
- [Bun](https://bun.sh) runtime
- `ffmpeg` and `ffprobe` in PATH
- `auth-rt` CLI in PATH (required for `search` / `detect-and-search`)