docs: add README

2026-04-20 12:06:20 +08:00 · 67abe94938
parent c3523d002e
commit 67abe94938
1 changed files with 99 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,99 @@
+# video-product-snapshot
+
+Detect ecommerce products in video frames with a vision model (any OpenAI-compatible endpoint), extract the best product snapshot, and optionally search for matching products via an image-search API.
+
+## How it works
+
+1. Extracts frames from the video at a configurable interval using `ffmpeg`
+2. Sends each frame to a vision model, which reports whether a product is visible and how confident it is
+3. Picks the highest-confidence frame as the best snapshot
+4. Optionally calls an image-search API with the snapshot to find matching products
+
+## Install
+
+```bash
+bun install
+bun run build        # outputs dist/run.js
+```
+
+## Usage
+
+```bash
+bun dist/run.js <command> [options]
+```
+
+### Commands
+
+| Command | Description |
+|---------|-------------|
+| `detect <video>` | Extract frames and detect product snapshots |
+| `search <image>` | Search products by image via API |
+| `detect-and-search <video>` | Full pipeline: detect best snapshot then search |
+| `session` | Print current auth session token |
+
+### Options (`detect` / `detect-and-search`)
+
+| Flag | Default | Description |
+|------|---------|-------------|
+| `--interval=<sec>` | `1` | Seconds between sampled frames |
+| `--max-frames=<n>` | `60` | Max frames to analyze |
+| `--output-dir=<dir>` | next to video | Directory to save extracted frames |
+| `--min-confidence=<0-1>` | `0.7` | Minimum confidence to include a frame |
+| `--dry-run` | — | Parse args and print config without running |
+
+### Examples
+
+```bash
+# Detect products, sample every 3 seconds
+bun dist/run.js detect ./demo.mp4 --interval=3
+
+# Full pipeline with higher confidence threshold
+bun dist/run.js detect-and-search ./demo.mp4 --interval=5 --min-confidence=0.85
+
+# Search using an existing snapshot image
+bun dist/run.js search ./snapshot.jpg
+```
+
+## Output
+
+All commands print JSON to stdout.
+
+```json
+{
+  "bestSnapshot": {
+    "frameIndex": 4,
+    "timestampSeconds": 9,
+    "imagePath": "/path/to/frame_0004.jpg",
+    "confidence": 0.92,
+    "description": "White sneaker with blue logo, left side view",
+    "boundingHint": "centered"
+  },
+  "productFrames": [...],
+  "searchBody": { ... }
+}
+```
+
+- `productFrames` — all detected frames sorted by confidence (highest first)
+- `bestSnapshot` — the top-ranked frame
+- `searchBody` — image-search API response (only for `search` / `detect-and-search`)
+
+## Environment variables
+
+| Variable | Required | Description |
+|----------|----------|-------------|
+| `VISION_API_KEY` | Yes | API key for the vision model endpoint |
+| `VISION_API_BASE` | No | OpenAI-compatible base URL (default: OpenAI) |
+| `VISION_MODEL` | No | Model name (default: `gpt-4o-mini`) |
+
+```bash
+# Use a local or custom provider (export so the bun process sees the variables)
+export VISION_API_BASE=https://your-llm-endpoint/v1
+export VISION_MODEL=claude-haiku-4-5-20251001
+export VISION_API_KEY=sk-...
+```
+
+## Prerequisites
+
+- [Bun](https://bun.sh) runtime
+- `ffmpeg` and `ffprobe` in PATH
+- `auth-rt` CLI in PATH (required for `search` / `detect-and-search`)
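
The selection step described under "How it works" (steps 2 and 3), together with the `--min-confidence` flag, can be sketched in TypeScript as follows. This is a minimal illustration and not the project's actual code: `FrameResult`, `DetectionSummary`, and `rankFrames` are hypothetical names, the field names are borrowed from the README's sample output, and treating the threshold as inclusive (`>=`) is an assumption.

```typescript
// Hypothetical shape of one analyzed frame; fields mirror the README's sample output.
interface FrameResult {
  frameIndex: number;
  timestampSeconds: number;
  imagePath: string;
  confidence: number; // 0..1, as reported by the vision model
  description: string;
}

// Hypothetical summary type holding the two fields the README documents.
interface DetectionSummary {
  bestSnapshot: FrameResult | null;
  productFrames: FrameResult[];
}

// Keep frames at or above the threshold (assumed inclusive), sort them by
// confidence with the highest first, and take the first entry as the best
// snapshot. Ties fall back to the earlier frame. The input array is not mutated.
function rankFrames(frames: FrameResult[], minConfidence = 0.7): DetectionSummary {
  const productFrames = frames
    .filter((f) => f.confidence >= minConfidence)
    .sort((a, b) => b.confidence - a.confidence || a.frameIndex - b.frameIndex);
  return { bestSnapshot: productFrames[0] ?? null, productFrames };
}

// Usage with made-up detection results.
const sampleFrames: FrameResult[] = [
  { frameIndex: 2, timestampSeconds: 3, imagePath: "frame_0002.jpg", confidence: 0.55, description: "hand partially covering a shoe" },
  { frameIndex: 4, timestampSeconds: 9, imagePath: "frame_0004.jpg", confidence: 0.92, description: "white sneaker, side view" },
  { frameIndex: 7, timestampSeconds: 18, imagePath: "frame_0007.jpg", confidence: 0.81, description: "white sneaker, top view" },
];

const summary = rankFrames(sampleFrames, 0.7);
console.log(summary.bestSnapshot?.imagePath); // frame_0004.jpg
console.log(summary.productFrames.map((f) => f.frameIndex)); // [ 4, 7 ]
```

Returning `null` for `bestSnapshot` when no frame clears the threshold keeps the "no product found" case explicit. How the real tool reports that case is not stated in the README.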