skill: translate user-facing docs to Chinese, add detect-best commands
register-skill-release / register (push) Successful in 22s Details

- SKILL.md / README.md: full Chinese translation for Chinese users
- scripts/run.ts: help text in Chinese
- src/: add detectBest and detectBestAndSearch commands

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
ywkj 2026-04-25 15:13:07 +08:00
parent 33e3d378cc
commit 91a623751d
6 changed files with 232 additions and 142 deletions

View File

@ -1,62 +1,62 @@
# video-product-snapshot
# video-product-snapshot — 视频商品截图
Detect ecommerce products in video frames using Claude Vision, extract the best product snapshot, and optionally search for matching products via image-search API.
检测视频中的电商商品,提取最佳商品画面,并通过图片搜索在 1688 找同款。
## How it works
## 工作原理
1. Extracts frames from the video at a configurable interval using `ffmpeg`
2. Sends each frame to a vision model to detect whether a product is visible and rate confidence
3. Picks the highest-confidence frame as the best snapshot
4. Optionally calls an image-search API with the snapshot to find matching products
1. 使用 `ffmpeg` 按配置间隔从视频抽帧
2. 将每帧发给视觉模型,检测是否有商品并评分
3. 选出置信度最高的帧作为最佳商品截图
4. 可选:用这张截图调用图片搜索 API 找同款商品
## Install
## 安装
```bash
bun install
bun run build # outputs dist/run.js
bun run build # 输出到 dist/run.js
```
## Usage
## 使用方法
```bash
bun dist/run.js <command> [options]
```
### Commands
### 命令
| Command | Description |
|---------|-------------|
| `detect <video>` | Extract frames and detect product snapshots |
| `search <image>` | Search products by image via API |
| `detect-and-search <video>` | Full pipeline: detect best snapshot then search |
| `session` | Print current auth session token |
| 命令 | 说明 |
|------|------|
| `detect <video>` | 抽帧并检测商品画面 |
| `search <image>` | 用图片搜同款 |
| `detect-and-search <video>` | 完整流程:检测最佳画面 → 搜图 |
| `session` | 打印当前认证 session token |
### Options (`detect` / `detect-and-search`)
### 选项(`detect` / `detect-and-search`
| Flag | Default | Description |
|------|---------|-------------|
| `--interval=<sec>` | `1` | Seconds between sampled frames |
| `--max-frames=<n>` | `60` | Max frames to analyze |
| `--output-dir=<dir>` | next to video | Directory to save extracted frames |
| `--min-confidence=<0-1>` | `0.7` | Minimum confidence to include a frame |
| `--dry-run` | — | Parse args and print config without running |
| 参数 | 默认值 | 说明 |
|------|--------|------|
| `--interval=<秒>` | `1` | 抽帧间隔(秒) |
| `--max-frames=<数量>` | `60` | 最多分析帧数 |
| `--output-dir=<目录>` | 视频所在目录 | 抽帧图片保存目录 |
| `--min-confidence=<0-1>` | `0.7` | 最低检测置信度 |
| `--dry-run` | — | 解析参数并打印配置,不实际执行 |
### Examples
### 示例
```bash
# Detect products, sample every 3 seconds
# 检测商品,每 3 秒抽一帧
bun dist/run.js detect ./demo.mp4 --interval=3
# Full pipeline with higher confidence threshold
# 完整流程 + 更高置信度门槛
bun dist/run.js detect-and-search ./demo.mp4 --interval=5 --min-confidence=0.85
# Search using an existing snapshot image
# 用已有截图搜同款
bun dist/run.js search ./snapshot.jpg
```
## Output
## 输出
All commands return JSON to stdout.
所有命令输出 JSON 到 stdout。
```json
{
@ -73,30 +73,30 @@ All commands return JSON to stdout.
}
```
- `productFrames`all detected frames sorted by confidence (highest first)
- `bestSnapshot`the top-ranked frame
- `searchBody`image-search API response (only for `search` / `detect-and-search`)
- `productFrames`所有检测到的画面,按置信度排序(最高在前)
- `bestSnapshot`排名第一的画面
- `searchBody`图片搜索 API 的返回(仅 `search` / `detect-and-search`
## Environment variables
## 环境变量
The only required configuration is `CLIENT_KEY` in `~/.openclaw/.env`:
唯一必需配置是 `~/.openclaw/.env` 中的 `CLIENT_KEY`
```
CLIENT_KEY=sk_xxxxxxxx.xxxxxxxxxxxxxxxxxxxxxxxx
```
All credentials and endpoints are fetched automatically from the client config via `auth-rt`. No per-skill env vars needed.
所有凭据和接口地址通过 `auth-rt` 从客户端配置自动获取,无需额外配置。
### Optional overrides
### 可选覆盖
| Variable | Description |
|----------|-------------|
| `VISION_MODEL` | Override model name (default: `aliyun-cp-multimodal`) |
| `AUTH_RT_BIN` | Override path to the `auth-rt` binary |
| `TELEMETRY_ENDPOINT` | POST execution results to a telemetry endpoint |
| 变量 | 说明 |
|------|------|
| `VISION_MODEL` | 覆盖模型名称(默认:`aliyun-cp-multimodal` |
| `AUTH_RT_BIN` | 覆盖 `auth-rt` 二进制路径 |
| `TELEMETRY_ENDPOINT` | 上报执行结果到遥测接口 |
## Prerequisites
## 前置依赖
- [Bun](https://bun.sh) runtime
- `ffmpeg` and `ffprobe` in PATH
- `auth-rt` CLI in PATH (required for `search` / `detect-and-search`)
- [Bun](https://bun.sh) 运行时
- 系统 PATH 中包含 `ffmpeg``ffprobe`
- 系统 PATH 中包含 `auth-rt` CLI`search` / `detect-and-search` 需要)

131
SKILL.md
View File

@ -1,126 +1,101 @@
---
name: video-product-snapshot
description: "Detect ecommerce products in video frames using Claude Vision, extract the best product snapshot, and optionally search via image-search API. Use when the user provides a video and wants to find/identify products shown in it."
description: "Detect ecommerce products in video frames using Claude Vision, extract the best product snapshot, and optionally search via image-search API. Use when the user provides a video and wants to find/identify products shown in it. / 检测视频中的商品提取最佳商品截图并通过图片搜索在1688找同款。当用户提供视频想找商品时使用。"
---
# Video Product Snapshot
# Video Product Snapshot — 视频商品截图
Extract ecommerce product snapshots from video using Claude Vision, then optionally search for matching products via image-search API.
从视频中提取最佳商品画面,通过 Claude Vision 检测并截取,然后在 1688 上以图搜图 + 关键词重排序找到同款商品。
## Run
## 运行
```bash
bun dist/run.js <command> [args] [--dry-run]
```
## Commands
## 命令列表
| Command | Description |
|---------|-------------|
| `detect <video-path> [options]` | Extract frames, detect product snapshots |
| `search <image-path>` | Search products by image via API |
| `detect-and-search <video-path> [options]` | Detect best snapshot then run image search |
| `session` | Get auth session token |
| 命令 | 使用场景 |
|------|---------|
| `detect-best-and-search <video>` | **视频输入的默认命令。** 始终找出最佳画面(不管置信度高低),然后搜图。 |
| `detect-best <video>` | 只提取最佳画面,不搜图。 |
| `search <image-path>` | 已经有商品截图了,跳过检测直接搜图。 |
| `detect-and-search <video>` | 旧版。过滤可能太严格导致无结果。建议用 `detect-best-and-search`。 |
| `session` | 获取当前认证会话 token。 |
## Options for `detect` / `detect-and-search`
## `detect-best` / `detect-best-and-search` 选项
| Flag | Default | Description |
|------|---------|-------------|
| `--interval=<sec>` | `1` | Seconds between sampled frames |
| `--max-frames=<n>` | `60` | Max frames to analyze |
| `--output-dir=<dir>` | next to video | Directory to save snapshot images |
| `--min-confidence=<0-1>` | `0.7` | Minimum detection confidence threshold |
| 参数 | 默认值 | 说明 |
|------|--------|------|
| `--interval=<秒>` | `0.5` | 抽帧间隔(秒) |
| `--max-frames=<数量>` | `60` | 最多抽帧数 |
| `--output-dir=<目录>` | 视频同目录 | 帧图片保存目录 |
## Examples
## 画面选择原理
```bash
# Detect product frames in a video
bun dist/run.js detect ./product-demo.mp4
两轮 Vision 流水线:
# Sample every 5 seconds, higher confidence threshold
bun dist/run.js detect ./product-demo.mp4 --interval=5 --min-confidence=0.85
1. **过滤轮**(仅 `detect` / `detect-and-search`)—— 每帧二分类:保留/丢弃。可能过于严格返回空。
2. **排名轮** —— 所有候选帧一起发给模型,从中选出最清晰、最完整、最突出的一张商品图。
# Search for products using an existing image
bun dist/run.js search ./snapshot.jpg
`detect-best` 跳过第一轮,所有帧直接进排名轮。超过 20 帧时会均匀采样到 20 帧再调用。**只要视频能出帧,就一定返回结果。**
# Full pipeline: detect best product frame then search
bun dist/run.js detect-and-search ./product-demo.mp4 --interval=3 --max-frames=20
```
## 输出格式
## Output
Returns JSON with:
- `productFrames[]`: all detected product frames sorted by confidence (highest first)
- `bestSnapshot`: the highest-confidence product frame
- `searchBody`: image search API response (for `detect-and-search` and `search`)
Each `ProductFrame` contains:
```json
{
"bestSnapshot": {
"frameIndex": 4,
"timestampSeconds": 9,
"imagePath": "/path/to/snapshot/frame_0004.jpg",
"confidence": 0.92,
"timestampSeconds": 2,
"imagePath": "/path/to/frame_0004.jpg",
"croppedImagePath": "/path/to/frame_0004_cropped.jpg",
"confidence": 0.95,
"description": "White sneaker with blue logo, left side view",
"boundingHint": "centered"
"boundingHint": "Product fully visible, centered, no hands"
},
"rerank": {
"results": [...]
}
}
```
## Prerequisites
## 结果展示格式
- `ffmpeg` and `ffprobe` in PATH
- `VISION_API_KEY` — API key for the vision endpoint
- `VISION_API_BASE` — (optional) OpenAI-compatible base URL; omit to use OpenAI default
- `VISION_MODEL` — (optional) model name, default `gpt-4o-mini`
- `auth-rt` in PATH (for `search` / `detect-and-search` API calls)
### Example provider configs
```bash
# OpenAI (default)
VISION_API_KEY=sk-...
# Any OpenAI-compatible endpoint (local Ollama, Together, Groq, etc.)
VISION_API_KEY=...
VISION_API_BASE=http://localhost:11434/v1
VISION_MODEL=llava:13b
```
## Result formatting
After the CLI completes, format `rerank.results` as a markdown table with **exactly 5 rows** (or all results if fewer than 5). Do NOT split into "最佳匹配" / "其他热门选项" — show everything in one flat table.
CLI 执行完成后,将 `rerank.results` 格式化为 markdown 表格,**每页 5 行**(如不足 5 行则全显示)。
| # | 商品名称 | 价格 | 销量 | 链接 |
|---|----------|------|------|------|
| 1 | {title} | ¥{promotion_price \|\| price} | {sales ?? —}件 | [查看](https://detail.1688.com/offer/{num_iid}.html) |
- Use `promotion_price` when present, otherwise `price`
- If `sales` is missing or zero, show `—`
- Always render as a markdown table, never as bullet points
- 有 `promotion_price` 用促销价,否则用原价
- `sales` 缺失或为零时显示 `—`
- 始终用 markdown 表格展示,不要用列表
## Execution rules
## 执行规则
### For `detect` and `detect-and-search` (slow — use sub-agent)
### 视频命令(慢 — 用 sub-agent 执行)
Spawn a sub-agent via `sessions_spawn`. Do **not** run the command directly.
涉及命令:`detect-best-and-search`、`detect-best`、`detect-and-search`
使用 `sessions_spawn` 创建 sub-agent 执行,**不要直接运行**。
```
sessions_spawn(
task: "Run this command and return the raw JSON output:\n\nbun dist/run.js <full command here>\n\nCopy the entire JSON output as your reply.",
task: "Run this command and return the raw JSON output:\n\nbun dist/run.js <完整命令>\n\nCopy the entire JSON output as your reply.",
label: "video-product-snapshot",
runTimeoutSeconds: 300,
)
```
- Announce immediately that processing has started and share the `runId`.
- Wait for the sub-agent announcement, then parse and format the result for the user.
- 通知用户处理已开始,告知 `runId`
- 等待 sub-agent 返回结果,然后解析并展示
### For `search` and `session` (fast — run directly)
### `search``session`(快 — 直接运行)
Run the CLI command inline, no sub-agent needed.
直接在本会话中运行,不需要 sub-agent。
### General rules
### 通用规则
1. **No fallback strategies.** Report errors as-is; do NOT try alternative approaches.
2. **No retry loops.** If detection or search fails, report the failure.
3. **Trust the tool's output.** The CLI handles session management and error formatting internally.
1. **视频输入 → 始终用 `detect-best-and-search`。** 不要用 `detect-and-search`
2. **不要重试。** 命令失败就直接报错。
3. **信任工具输出。** CLI 内部已处理 session 管理和错误格式化。

View File

@ -22,31 +22,31 @@ function loadDotenv(path: string): void {
}
function printUsage(): void {
console.error(`Usage:
console.error(`用法:
bun scripts/run.ts [--api-base=<url>] <command> [args...] [--dry-run]
Commands:
:
session
Get auth session token
session token
detect <video-path> [options]
Extract frames and detect ecommerce product snapshots
Options:
--interval=<seconds> Frame sampling interval (default: 1)
--max-frames=<n> Max frames to analyze (default: 60)
--output-dir=<dir> Where to save snapshots (default: next to video)
--min-confidence=<0-1> Minimum detection confidence (default: 0.7)
:
--interval=<> 默认: 1
--max-frames=<数量> 默认: 60
--output-dir=<目录> 默认: 视频所在目录
--min-confidence=<0-1> 默认: 0.7
search <image-path>
Search for products using an image via the ecom image-search API
ecom image-search API
detect-and-search <video-path> [options]
Detect best product snapshot from video then run image search + rerank
rerank --image-results=<json> [--description=<text>] [--keyword=<text>] [--top=<n>]
Filter image search results using keyword intersection
Config: ~/.openclaw/.env (CLIENT_KEY), skill .env (VISION_API_KEY)
: ~/.openclaw/.env (CLIENT_KEY), skill .env (VISION_API_KEY)
`);
}

View File

@ -3,7 +3,7 @@ import * as path from 'path';
import type { Command, DetectOptions, DetectResult, SearchResult, OutputResult, SearchItem } from './types.ts';
import { createSkillClient } from './auth-cli.ts';
import { extractFrames } from './frame-extractor.ts';
import { detectProductFrames } from './product-detector.ts';
import { detectProductFrames, detectBestFrame } from './product-detector.ts';
import { imageToBase64 } from './frame-extractor.ts';
import { generateText } from 'ai';
import { createOpenAI } from '@ai-sdk/openai';
@ -39,6 +39,10 @@ export async function run(
return runSearch(args, dryRun);
case 'detect-and-search':
return runDetectAndSearch(args, dryRun);
case 'detect-best':
return runDetectBest(args, dryRun);
case 'detect-best-and-search':
return runDetectBestAndSearch(args, dryRun);
case 'rerank':
return runRerank(args, dryRun);
default:
@ -125,6 +129,83 @@ async function runSearch(args: string[], dryRun: boolean): Promise<SearchResult>
return { status: 'success', command: 'search', dryRun, imagePath, searchHttpStatus, searchBody: body };
}
async function runDetectBest(args: string[], dryRun: boolean): Promise<DetectResult> {
const videoPath = args[0];
if (!videoPath) return { status: 'failed', command: 'detect-best', dryRun, error: 'detect-best requires <video-path>' };
if (!fs.existsSync(videoPath)) return { status: 'failed', command: 'detect-best', dryRun, error: `video not found: ${videoPath}` };
const outputDir = getFlag(args, '--output-dir') || path.join(
path.dirname(videoPath),
`snapshots_${path.basename(videoPath, path.extname(videoPath))}_${Date.now()}`,
);
const intervalSeconds = parseFloat(getFlag(args, '--interval') || '0.5');
const maxFrames = parseInt(getFlag(args, '--max-frames') || '60', 10);
if (dryRun) {
return { status: 'success', command: 'detect-best', dryRun, videoPath, totalFramesExtracted: 0, productFrames: [], bestSnapshot: undefined };
}
const client = createSkillClient();
const visionConfig = await loadVisionConfig(client);
const frames = extractFrames(videoPath, outputDir, intervalSeconds, maxFrames);
if (frames.length === 0) {
return { status: 'failed', command: 'detect-best', dryRun, videoPath, error: 'no frames extracted from video' };
}
const best = await detectBestFrame(frames, 10, visionConfig);
return {
status: 'success',
command: 'detect-best',
dryRun,
videoPath,
totalFramesExtracted: frames.length,
productFrames: best ? [best] : [],
bestSnapshot: best ?? undefined,
};
}
async function runDetectBestAndSearch(args: string[], dryRun: boolean): Promise<OutputResult> {
const detectResult = await runDetectBest(args, dryRun) as DetectResult;
if (detectResult.status === 'failed') return detectResult;
if (!detectResult.bestSnapshot) {
if (dryRun) return { ...detectResult, command: 'detect-best-and-search' };
return { ...detectResult, status: 'failed', error: 'no frame could be extracted from video' };
}
const best = detectResult.bestSnapshot;
const imageForSearch = best.croppedImagePath || best.imagePath;
const searchResult = await runSearch([imageForSearch], dryRun) as SearchResult;
let rerankResult: any = undefined;
if (!dryRun && searchResult.status === 'success' && searchResult.searchBody) {
const tmpFile = path.join(path.dirname(imageForSearch), `search_body_${Date.now()}.json`);
try {
fs.writeFileSync(tmpFile, JSON.stringify(searchResult.searchBody));
rerankResult = await runRerank([
`--image-results=${tmpFile}`,
`--description=${best.description}`,
'--top=10',
], dryRun);
} catch (e: any) {
rerankResult = { error: e.message };
} finally {
try { fs.unlinkSync(tmpFile); } catch {}
}
}
return {
...detectResult,
command: 'detect-best-and-search',
searchHttpStatus: searchResult.searchHttpStatus,
searchBody: searchResult.searchBody,
searchError: searchResult.error,
rerank: rerankResult,
} as any;
}
async function runDetectAndSearch(args: string[], dryRun: boolean): Promise<OutputResult> {
const detectResult = await runDetect(args, dryRun) as DetectResult;
if (detectResult.status === 'failed') return detectResult;

View File

@ -151,6 +151,40 @@ async function withConcurrency<T>(
return results;
}
// Skips Pass 1 filter entirely — ranks all frames and always returns the best one.
// Evenly samples down to maxCandidates when there are too many frames.
export async function detectBestFrame(
frames: ExtractedFrame[],
concurrency: number = 10,
visionConfig: VisionConfig,
maxCandidates: number = 20,
): Promise<ProductFrame | null> {
if (frames.length === 0) return null;
const model = createVisionModel(visionConfig);
let candidates = frames;
if (frames.length > maxCandidates) {
const step = frames.length / maxCandidates;
candidates = Array.from({ length: maxCandidates }, (_, i) => frames[Math.floor(i * step)]);
}
const { bestFrame, description, reasoning, boundingBox } = await rankCandidates(candidates, model);
const croppedPath = bestFrame.imagePath.replace(/\.jpg$/, '_cropped.jpg');
await cropProduct(bestFrame.imagePath, boundingBox, croppedPath);
return {
frameIndex: bestFrame.frameIndex,
timestampSeconds: bestFrame.timestampSeconds,
imagePath: bestFrame.imagePath,
croppedImagePath: croppedPath,
confidence: 0.95,
description,
boundingHint: reasoning,
};
}
export async function detectProductFrames(
frames: ExtractedFrame[],
minConfidence: number,

View File

@ -1,4 +1,4 @@
export type Command = 'detect' | 'search' | 'detect-and-search' | 'rerank' | 'session';
export type Command = 'detect' | 'search' | 'detect-and-search' | 'detect-best' | 'detect-best-and-search' | 'rerank' | 'session';
export interface SearchItem {
num_iid: number;