skill: translate user-facing docs to Chinese, add detect-best commands

- SKILL.md / README.md: full Chinese translation for Chinese users - scripts/run.ts: help text in Chinese - src/: add detectBest and detectBestAndSearch commands Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 15:13:07 +08:00 · 2026-04-25 15:13:07 +08:00 · 91a623751d
parent 33e3d378cc
commit 91a623751d
6 changed files with 232 additions and 142 deletions
--- a/README.md
+++ b/README.md
@ -1,62 +1,62 @@
-# video-product-snapshot
+# video-product-snapshot — 视频商品截图

-Detect ecommerce products in video frames using Claude Vision, extract the best product snapshot, and optionally search for matching products via image-search API.
+检测视频中的电商商品，提取最佳商品画面，并通过图片搜索在 1688 找同款。

-## How it works
+## 工作原理

-1. Extracts frames from the video at a configurable interval using `ffmpeg`
-2. Sends each frame to a vision model to detect whether a product is visible and rate confidence
-3. Picks the highest-confidence frame as the best snapshot
-4. Optionally calls an image-search API with the snapshot to find matching products
+1. 使用 `ffmpeg` 按配置间隔从视频抽帧
+2. 将每帧发给视觉模型，检测是否有商品并评分
+3. 选出置信度最高的帧作为最佳商品截图
+4. 可选：用这张截图调用图片搜索 API 找同款商品

-## Install
+## 安装

 ```bash
 bun install
-bun run build        # outputs dist/run.js
+bun run build        # 输出到 dist/run.js
 ```

-## Usage
+## 使用方法

 ```bash
 bun dist/run.js <command> [options]
 ```

-### Commands
+### 命令

-| Command | Description |
-|---------|-------------|
-| `detect <video>` | Extract frames and detect product snapshots |
-| `search <image>` | Search products by image via API |
-| `detect-and-search <video>` | Full pipeline: detect best snapshot then search |
-| `session` | Print current auth session token |
+| 命令 | 说明 |
+|------|------|
+| `detect <video>` | 抽帧并检测商品画面 |
+| `search <image>` | 用图片搜同款 |
+| `detect-and-search <video>` | 完整流程：检测最佳画面 → 搜图 |
+| `session` | 打印当前认证 session token |

-### Options (`detect` / `detect-and-search`)
+### 选项（`detect` / `detect-and-search`）

-| Flag | Default | Description |
-|------|---------|-------------|
-| `--interval=<sec>` | `1` | Seconds between sampled frames |
-| `--max-frames=<n>` | `60` | Max frames to analyze |
-| `--output-dir=<dir>` | next to video | Directory to save extracted frames |
-| `--min-confidence=<0-1>` | `0.7` | Minimum confidence to include a frame |
-| `--dry-run` | — | Parse args and print config without running |
+| 参数 | 默认值 | 说明 |
+|------|--------|------|
+| `--interval=<秒>` | `1` | 抽帧间隔（秒） |
+| `--max-frames=<数量>` | `60` | 最多分析帧数 |
+| `--output-dir=<目录>` | 视频所在目录 | 抽帧图片保存目录 |
+| `--min-confidence=<0-1>` | `0.7` | 最低检测置信度 |
+| `--dry-run` | — | 解析参数并打印配置，不实际执行 |

-### Examples
+### 示例

 ```bash
-# Detect products, sample every 3 seconds
+# 检测商品，每 3 秒抽一帧
 bun dist/run.js detect ./demo.mp4 --interval=3

-# Full pipeline with higher confidence threshold
+# 完整流程 + 更高置信度门槛
 bun dist/run.js detect-and-search ./demo.mp4 --interval=5 --min-confidence=0.85

-# Search using an existing snapshot image
+# 用已有截图搜同款
 bun dist/run.js search ./snapshot.jpg
 ```

-## Output
+## 输出

-All commands return JSON to stdout.
+所有命令输出 JSON 到 stdout。

 ```json
 {
@ -73,30 +73,30 @@ All commands return JSON to stdout.
 }
 ```

- `productFrames` — all detected frames sorted by confidence (highest first)
- `bestSnapshot` — the top-ranked frame
- `searchBody` — image-search API response (only for `search` / `detect-and-search`)
+- `productFrames` — 所有检测到的画面，按置信度排序（最高在前）
+- `bestSnapshot` — 排名第一的画面
+- `searchBody` — 图片搜索 API 的返回（仅 `search` / `detect-and-search`）

-## Environment variables
+## 环境变量

-The only required configuration is `CLIENT_KEY` in `~/.openclaw/.env`:
+唯一必需配置是 `~/.openclaw/.env` 中的 `CLIENT_KEY`：

 ```
 CLIENT_KEY=sk_xxxxxxxx.xxxxxxxxxxxxxxxxxxxxxxxx
 ```

-All credentials and endpoints are fetched automatically from the client config via `auth-rt`. No per-skill env vars needed.
+所有凭据和接口地址通过 `auth-rt` 从客户端配置自动获取，无需额外配置。

-### Optional overrides
+### 可选覆盖

-| Variable | Description |
-|----------|-------------|
-| `VISION_MODEL` | Override model name (default: `aliyun-cp-multimodal`) |
-| `AUTH_RT_BIN` | Override path to the `auth-rt` binary |
-| `TELEMETRY_ENDPOINT` | POST execution results to a telemetry endpoint |
+| 变量 | 说明 |
+|------|------|
+| `VISION_MODEL` | 覆盖模型名称（默认：`aliyun-cp-multimodal`） |
+| `AUTH_RT_BIN` | 覆盖 `auth-rt` 二进制路径 |
+| `TELEMETRY_ENDPOINT` | 上报执行结果到遥测接口 |

-## Prerequisites
+## 前置依赖

- [Bun](https://bun.sh) runtime
- `ffmpeg` and `ffprobe` in PATH
- `auth-rt` CLI in PATH (required for `search` / `detect-and-search`)
+- [Bun](https://bun.sh) 运行时
+- 系统 PATH 中包含 `ffmpeg` 和 `ffprobe`
+- 系统 PATH 中包含 `auth-rt` CLI（`search` / `detect-and-search` 需要）
--- a/SKILL.md
+++ b/SKILL.md
@ -1,126 +1,101 @@
 ---
 name: video-product-snapshot
-description: "Detect ecommerce products in video frames using Claude Vision, extract the best product snapshot, and optionally search via image-search API. Use when the user provides a video and wants to find/identify products shown in it."
+description: "Detect ecommerce products in video frames using Claude Vision, extract the best product snapshot, and optionally search via image-search API. Use when the user provides a video and wants to find/identify products shown in it. / 检测视频中的商品，提取最佳商品截图，并通过图片搜索在1688找同款。当用户提供视频想找商品时使用。"
 ---

-# Video Product Snapshot
+# Video Product Snapshot — 视频商品截图

-Extract ecommerce product snapshots from video using Claude Vision, then optionally search for matching products via image-search API.
+从视频中提取最佳商品画面，通过 Claude Vision 检测并截取，然后在 1688 上以图搜图 + 关键词重排序找到同款商品。

-## Run
+## 运行

 ```bash
 bun dist/run.js <command> [args] [--dry-run]
 ```

-## Commands
+## 命令列表

-| Command | Description |
-|---------|-------------|
-| `detect <video-path> [options]` | Extract frames, detect product snapshots |
-| `search <image-path>` | Search products by image via API |
-| `detect-and-search <video-path> [options]` | Detect best snapshot then run image search |
-| `session` | Get auth session token |
+| 命令 | 使用场景 |
+|------|---------|
+| `detect-best-and-search <video>` | **视频输入的默认命令。** 始终找出最佳画面（不管置信度高低），然后搜图。 |
+| `detect-best <video>` | 只提取最佳画面，不搜图。 |
+| `search <image-path>` | 已经有商品截图了，跳过检测直接搜图。 |
+| `detect-and-search <video>` | 旧版。过滤可能太严格导致无结果。建议用 `detect-best-and-search`。 |
+| `session` | 获取当前认证会话 token。 |

-## Options for `detect` / `detect-and-search`
+## `detect-best` / `detect-best-and-search` 选项

-| Flag | Default | Description |
-|------|---------|-------------|
-| `--interval=<sec>` | `1` | Seconds between sampled frames |
-| `--max-frames=<n>` | `60` | Max frames to analyze |
-| `--output-dir=<dir>` | next to video | Directory to save snapshot images |
-| `--min-confidence=<0-1>` | `0.7` | Minimum detection confidence threshold |
+| 参数 | 默认值 | 说明 |
+|------|--------|------|
+| `--interval=<秒>` | `0.5` | 抽帧间隔（秒） |
+| `--max-frames=<数量>` | `60` | 最多抽帧数 |
+| `--output-dir=<目录>` | 视频同目录 | 帧图片保存目录 |

-## Examples
+## 画面选择原理

-```bash
-# Detect product frames in a video
-bun dist/run.js detect ./product-demo.mp4
+两轮 Vision 流水线：

-# Sample every 5 seconds, higher confidence threshold
-bun dist/run.js detect ./product-demo.mp4 --interval=5 --min-confidence=0.85
+1. **过滤轮**（仅 `detect` / `detect-and-search`）—— 每帧二分类：保留/丢弃。可能过于严格返回空。
+2. **排名轮** —— 所有候选帧一起发给模型，从中选出最清晰、最完整、最突出的一张商品图。

-# Search for products using an existing image
-bun dist/run.js search ./snapshot.jpg
+`detect-best` 跳过第一轮，所有帧直接进排名轮。超过 20 帧时会均匀采样到 20 帧再调用。**只要视频能出帧，就一定返回结果。**

-# Full pipeline: detect best product frame then search
-bun dist/run.js detect-and-search ./product-demo.mp4 --interval=3 --max-frames=20
-```
+## 输出格式

-## Output
-
-Returns JSON with:
- `productFrames[]`: all detected product frames sorted by confidence (highest first)
- `bestSnapshot`: the highest-confidence product frame
- `searchBody`: image search API response (for `detect-and-search` and `search`)
-
-Each `ProductFrame` contains:
 ```json
 {
+  "bestSnapshot": {
    "frameIndex": 4,
-  "timestampSeconds": 9,
-  "imagePath": "/path/to/snapshot/frame_0004.jpg",
-  "confidence": 0.92,
+    "timestampSeconds": 2,
+    "imagePath": "/path/to/frame_0004.jpg",
+    "croppedImagePath": "/path/to/frame_0004_cropped.jpg",
+    "confidence": 0.95,
    "description": "White sneaker with blue logo, left side view",
-  "boundingHint": "centered"
+    "boundingHint": "Product fully visible, centered, no hands"
+  },
+  "rerank": {
+    "results": [...]
+  }
 }
 ```

-## Prerequisites
+## 结果展示格式

- `ffmpeg` and `ffprobe` in PATH
- `VISION_API_KEY` — API key for the vision endpoint
- `VISION_API_BASE` — (optional) OpenAI-compatible base URL; omit to use OpenAI default
- `VISION_MODEL` — (optional) model name, default `gpt-4o-mini`
- `auth-rt` in PATH (for `search` / `detect-and-search` API calls)
-
-### Example provider configs
-
-```bash
-# OpenAI (default)
-VISION_API_KEY=sk-...
-
-# Any OpenAI-compatible endpoint (local Ollama, Together, Groq, etc.)
-VISION_API_KEY=...
-VISION_API_BASE=http://localhost:11434/v1
-VISION_MODEL=llava:13b
-```
-
-## Result formatting
-
-After the CLI completes, format `rerank.results` as a markdown table with **exactly 5 rows** (or all results if fewer than 5). Do NOT split into "最佳匹配" / "其他热门选项" — show everything in one flat table.
+CLI 执行完成后，将 `rerank.results` 格式化为 markdown 表格，**每页 5 行**（如不足 5 行则全显示）。

 | # | 商品名称 | 价格 | 销量 | 链接 |
 |---|----------|------|------|------|
 | 1 | {title}  | ¥{promotion_price \|\| price} | {sales ?? —}件 | [查看](https://detail.1688.com/offer/{num_iid}.html) |

- Use `promotion_price` when present, otherwise `price`
- If `sales` is missing or zero, show `—`
- Always render as a markdown table, never as bullet points
+- 有 `promotion_price` 用促销价，否则用原价
+- `sales` 缺失或为零时显示 `—`
+- 始终用 markdown 表格展示，不要用列表

-## Execution rules
+## 执行规则

-### For `detect` and `detect-and-search` (slow — use sub-agent)
+### 视频命令（慢 — 用 sub-agent 执行）

-Spawn a sub-agent via `sessions_spawn`. Do **not** run the command directly.
+涉及命令：`detect-best-and-search`、`detect-best`、`detect-and-search`
+
+使用 `sessions_spawn` 创建 sub-agent 执行，**不要直接运行**。

 ```
 sessions_spawn(
-  task: "Run this command and return the raw JSON output:\n\nbun dist/run.js <full command here>\n\nCopy the entire JSON output as your reply.",
+  task: "Run this command and return the raw JSON output:\n\nbun dist/run.js <完整命令>\n\nCopy the entire JSON output as your reply.",
  label: "video-product-snapshot",
  runTimeoutSeconds: 300,
 )
 ```

- Announce immediately that processing has started and share the `runId`.
- Wait for the sub-agent announcement, then parse and format the result for the user.
+- 通知用户处理已开始，告知 `runId`
+- 等待 sub-agent 返回结果，然后解析并展示

-### For `search` and `session` (fast — run directly)
+### `search` 和 `session`（快 — 直接运行）

-Run the CLI command inline, no sub-agent needed.
+直接在本会话中运行，不需要 sub-agent。

-### General rules
+### 通用规则

-1. **No fallback strategies.** Report errors as-is; do NOT try alternative approaches.
-2. **No retry loops.** If detection or search fails, report the failure.
-3. **Trust the tool's output.** The CLI handles session management and error formatting internally.
+1. **视频输入 → 始终用 `detect-best-and-search`。** 不要用 `detect-and-search`。
+2. **不要重试。** 命令失败就直接报错。
+3. **信任工具输出。** CLI 内部已处理 session 管理和错误格式化。
--- a/scripts/run.ts
+++ b/scripts/run.ts
@ -22,31 +22,31 @@ function loadDotenv(path: string): void {
 }

 function printUsage(): void {
-  console.error(`Usage:
+  console.error(`用法:
  bun scripts/run.ts [--api-base=<url>] <command> [args...] [--dry-run]

-Commands:
+命令:
  session
-      Get auth session token
+      获取认证 session token

  detect <video-path> [options]
-      Extract frames and detect ecommerce product snapshots
-      Options:
-        --interval=<seconds>       Frame sampling interval (default: 1)
-        --max-frames=<n>           Max frames to analyze (default: 60)
-        --output-dir=<dir>         Where to save snapshots (default: next to video)
-        --min-confidence=<0-1>     Minimum detection confidence (default: 0.7)
+      从视频抽帧并检测商品画面
+      选项:
+        --interval=<秒>            抽帧间隔（默认: 1）
+        --max-frames=<数量>        最多分析帧数（默认: 60）
+        --output-dir=<目录>        截图保存目录（默认: 视频所在目录）
+        --min-confidence=<0-1>     最低检测置信度（默认: 0.7）

  search <image-path>
-      Search for products using an image via the ecom image-search API
+      用图片搜索商品（调用 ecom image-search API）

  detect-and-search <video-path> [options]
-      Detect best product snapshot from video then run image search + rerank
+      检测最佳商品画面 → 图片搜索 → 关键词重排序

  rerank --image-results=<json> [--description=<text>] [--keyword=<text>] [--top=<n>]
-      Filter image search results using keyword intersection
+      通过关键词交并集过滤搜索结果

-Config: ~/.openclaw/.env (CLIENT_KEY), skill .env (VISION_API_KEY)
+配置文件: ~/.openclaw/.env (CLIENT_KEY), skill 目录 .env (VISION_API_KEY)
 `);
 }

--- a/src/index.ts
+++ b/src/index.ts
@ -3,7 +3,7 @@ import * as path from 'path';
 import type { Command, DetectOptions, DetectResult, SearchResult, OutputResult, SearchItem } from './types.ts';
 import { createSkillClient } from './auth-cli.ts';
 import { extractFrames } from './frame-extractor.ts';
-import { detectProductFrames } from './product-detector.ts';
+import { detectProductFrames, detectBestFrame } from './product-detector.ts';
 import { imageToBase64 } from './frame-extractor.ts';
 import { generateText } from 'ai';
 import { createOpenAI } from '@ai-sdk/openai';
@ -39,6 +39,10 @@ export async function run(
      return runSearch(args, dryRun);
    case 'detect-and-search':
      return runDetectAndSearch(args, dryRun);
+    case 'detect-best':
+      return runDetectBest(args, dryRun);
+    case 'detect-best-and-search':
+      return runDetectBestAndSearch(args, dryRun);
    case 'rerank':
      return runRerank(args, dryRun);
    default:
@ -125,6 +129,83 @@ async function runSearch(args: string[], dryRun: boolean): Promise<SearchResult>
  return { status: 'success', command: 'search', dryRun, imagePath, searchHttpStatus, searchBody: body };
 }

+async function runDetectBest(args: string[], dryRun: boolean): Promise<DetectResult> {
+  const videoPath = args[0];
+  if (!videoPath) return { status: 'failed', command: 'detect-best', dryRun, error: 'detect-best requires <video-path>' };
+  if (!fs.existsSync(videoPath)) return { status: 'failed', command: 'detect-best', dryRun, error: `video not found: ${videoPath}` };
+
+  const outputDir = getFlag(args, '--output-dir') || path.join(
+    path.dirname(videoPath),
+    `snapshots_${path.basename(videoPath, path.extname(videoPath))}_${Date.now()}`,
+  );
+  const intervalSeconds = parseFloat(getFlag(args, '--interval') || '0.5');
+  const maxFrames = parseInt(getFlag(args, '--max-frames') || '60', 10);
+
+  if (dryRun) {
+    return { status: 'success', command: 'detect-best', dryRun, videoPath, totalFramesExtracted: 0, productFrames: [], bestSnapshot: undefined };
+  }
+
+  const client = createSkillClient();
+  const visionConfig = await loadVisionConfig(client);
+
+  const frames = extractFrames(videoPath, outputDir, intervalSeconds, maxFrames);
+  if (frames.length === 0) {
+    return { status: 'failed', command: 'detect-best', dryRun, videoPath, error: 'no frames extracted from video' };
+  }
+
+  const best = await detectBestFrame(frames, 10, visionConfig);
+
+  return {
+    status: 'success',
+    command: 'detect-best',
+    dryRun,
+    videoPath,
+    totalFramesExtracted: frames.length,
+    productFrames: best ? [best] : [],
+    bestSnapshot: best ?? undefined,
+  };
+}
+
+async function runDetectBestAndSearch(args: string[], dryRun: boolean): Promise<OutputResult> {
+  const detectResult = await runDetectBest(args, dryRun) as DetectResult;
+  if (detectResult.status === 'failed') return detectResult;
+
+  if (!detectResult.bestSnapshot) {
+    if (dryRun) return { ...detectResult, command: 'detect-best-and-search' };
+    return { ...detectResult, status: 'failed', error: 'no frame could be extracted from video' };
+  }
+
+  const best = detectResult.bestSnapshot;
+  const imageForSearch = best.croppedImagePath || best.imagePath;
+  const searchResult = await runSearch([imageForSearch], dryRun) as SearchResult;
+
+  let rerankResult: any = undefined;
+  if (!dryRun && searchResult.status === 'success' && searchResult.searchBody) {
+    const tmpFile = path.join(path.dirname(imageForSearch), `search_body_${Date.now()}.json`);
+    try {
+      fs.writeFileSync(tmpFile, JSON.stringify(searchResult.searchBody));
+      rerankResult = await runRerank([
+        `--image-results=${tmpFile}`,
+        `--description=${best.description}`,
+        '--top=10',
+      ], dryRun);
+    } catch (e: any) {
+      rerankResult = { error: e.message };
+    } finally {
+      try { fs.unlinkSync(tmpFile); } catch {}
+    }
+  }
+
+  return {
+    ...detectResult,
+    command: 'detect-best-and-search',
+    searchHttpStatus: searchResult.searchHttpStatus,
+    searchBody: searchResult.searchBody,
+    searchError: searchResult.error,
+    rerank: rerankResult,
+  } as any;
+}
+
 async function runDetectAndSearch(args: string[], dryRun: boolean): Promise<OutputResult> {
  const detectResult = await runDetect(args, dryRun) as DetectResult;
  if (detectResult.status === 'failed') return detectResult;
--- a/src/product-detector.ts
+++ b/src/product-detector.ts
@ -151,6 +151,40 @@ async function withConcurrency<T>(
  return results;
 }

+// Skips Pass 1 filter entirely — ranks all frames and always returns the best one.
+// Evenly samples down to maxCandidates when there are too many frames.
+export async function detectBestFrame(
+  frames: ExtractedFrame[],
+  concurrency: number = 10,
+  visionConfig: VisionConfig,
+  maxCandidates: number = 20,
+): Promise<ProductFrame | null> {
+  if (frames.length === 0) return null;
+
+  const model = createVisionModel(visionConfig);
+
+  let candidates = frames;
+  if (frames.length > maxCandidates) {
+    const step = frames.length / maxCandidates;
+    candidates = Array.from({ length: maxCandidates }, (_, i) => frames[Math.floor(i * step)]);
+  }
+
+  const { bestFrame, description, reasoning, boundingBox } = await rankCandidates(candidates, model);
+
+  const croppedPath = bestFrame.imagePath.replace(/\.jpg$/, '_cropped.jpg');
+  await cropProduct(bestFrame.imagePath, boundingBox, croppedPath);
+
+  return {
+    frameIndex: bestFrame.frameIndex,
+    timestampSeconds: bestFrame.timestampSeconds,
+    imagePath: bestFrame.imagePath,
+    croppedImagePath: croppedPath,
+    confidence: 0.95,
+    description,
+    boundingHint: reasoning,
+  };
+}
+
 export async function detectProductFrames(
  frames: ExtractedFrame[],
  minConfidence: number,
--- a/src/types.ts
+++ b/src/types.ts
@ -1,4 +1,4 @@
-export type Command = 'detect' | 'search' | 'detect-and-search' | 'rerank' | 'session';
+export type Command = 'detect' | 'search' | 'detect-and-search' | 'detect-best' | 'detect-best-and-search' | 'rerank' | 'session';

 export interface SearchItem {
  num_iid: number;