Compare commits


6 Commits
v0.1.4 ... main

Author SHA1 Message Date
ywkj c5e1d0c88c fix: rerank top-N 10→5 to match the Feishu list display
register-skill-release / register (push) Successful in 17s
2026-04-26 20:34:57 +08:00
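The commit above reduces the rerank output from 10 items to 5; as the diffs further down show, the selection itself is just a sales-descending sort plus a slice. A minimal standalone sketch (the `SearchItem` shape here is trimmed to the fields that matter; treat it as illustrative, not the repo's full type):

```typescript
// Sketch: keep the n best-selling items, matching the Feishu list display.
interface SearchItem {
  title: string;
  sales?: number; // may be absent; treated as 0 when sorting
}

function topBySales(items: SearchItem[], n = 5): SearchItem[] {
  // Copy before sorting so the caller's array is left untouched.
  return [...items].sort((a, b) => (b.sales ?? 0) - (a.sales ?? 0)).slice(0, n);
}
```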
ywkj eb8e7a7daf fix: sync auth-cli.ts and add the clientConfig() method
register-skill-release / register (push) Successful in 19s
2026-04-26 20:15:08 +08:00
ywkj b6dc7af9bf chore: tweak README
register-skill-release / register (push) Successful in 18s
2026-04-26 20:08:10 +08:00
ywkj 7a43eb391b docs: update README to reflect the current architecture
register-skill-release / register (push) Successful in 18s
- Document the detect-best-and-search, detect-best, and rerank commands
- Update the auth architecture notes (unified auth via auth-rt)
- Document sessionId and Langfuse tracing
- Update the environment variable table

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-26 19:57:03 +08:00
ywkj d497e92626 fix: inject metadata.session_id via a fetch wrapper instead of an HTTP header
register-skill-release / register (push) Successful in 24s
LiteLLM does not process the x-langfuse-session-id header. Use a fetch interceptor to inject session_id into the request-body metadata instead; LiteLLM passes it straight through to Langfuse, which creates the session.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-26 19:45:08 +08:00
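The fix described in this commit can be sketched as a small standalone wrapper (a minimal illustration only — the `withSessionId` name and the `FetchLike` alias are mine, not the repo's):

```typescript
// Minimal sketch: wrap fetch so JSON request bodies gain metadata.session_id
// before reaching the gateway (LiteLLM forwards body metadata to Langfuse).
type FetchLike = (input: RequestInfo | URL, init?: RequestInit) => Promise<Response>;

function withSessionId(inner: FetchLike, sessionId: string): FetchLike {
  return async (input, init) => {
    if (init?.body && typeof init.body === 'string') {
      try {
        const body = JSON.parse(init.body);
        body.metadata = body.metadata ?? {};
        body.metadata.session_id = body.metadata.session_id ?? sessionId;
        init = { ...init, body: JSON.stringify(body) };
      } catch {
        // Not JSON — leave the body untouched.
      }
    }
    return inner(input, init);
  };
}
```

Passing such a wrapper as the provider's custom `fetch` keeps the injection transparent to the calling code.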
ywkj a6b4f99a83 refactor: generate the session ID solely in auth-cli.ts
register-skill-release / register (push) Successful in 21s
- scripts/run.ts: remove the duplicated session-ID auto-generation logic
- src/auth-cli.ts: synced from auth-runtime (module-level SKILL_SESSION_ID)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-26 19:01:10 +08:00
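The structured ID format this refactor standardizes on (`skill-YYYYMMDD-HHMMSS-xxxx`) can be sketched as a standalone helper (the `makeSessionId` name is mine; the repo inlines the same logic at module load):

```typescript
// Sketch of the structured Langfuse session-ID format: skill-YYYYMMDD-HHMMSS-xxxx
function makeSessionId(now: Date = new Date()): string {
  const pad = (n: number) => String(n).padStart(2, '0');
  const ts = `${now.getFullYear()}${pad(now.getMonth() + 1)}${pad(now.getDate())}` +
    `-${pad(now.getHours())}${pad(now.getMinutes())}${pad(now.getSeconds())}`;
  const rand = Math.random().toString(36).slice(2, 6); // short random base-36 suffix
  return `skill-${ts}-${rand}`;
}
```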
7 changed files with 127 additions and 91 deletions

README.md
View File

@@ -1,19 +1,22 @@
-# video-product-snapshot — video product
+# video-product-snapshot — video product image search

-Detect e-commerce products in a video, extract the best product frame, and find matching items on 1688 via image search.
+Extract the best product frame from a video and find matching items on 1688 by searching with the image.

 ## How it works

-1. Use `ffmpeg` to extract frames from the video at a configured interval
-2. Send each frame to a vision model to detect products and score them
-3. Pick the highest-confidence frame as the best product snapshot
-4. Optionally, call the image-search API with that snapshot to find matching items
+1. `ffmpeg` extracts frames at 0.5 s intervals (up to 60 frames)
+2. Visual-quality pre-filter (brightness/variance drops blurry frames)
+3. Container/rack product detection → auto-select an unloaded frame
+4. Vision model compares and ranks frames, picks the best product frame
+5. Crop the product region → upload → 1688 image search
+6. Post-filter (vision model checks whether each result matches) → rerank

 ## Installation

 ```bash
+./install.sh   # installs auth-rt + dependencies
 bun install
 bun run build  # outputs to dist/run.js
 ```
## 使用方法 ## 使用方法
@@ -26,77 +29,74 @@ bun dist/run.js <command> [options]

 | Command | Description |
 |------|------|
-| `detect <video>` | Extract frames and detect product frames |
-| `search <image>` | Find matching items with an image |
-| `detect-and-search <video>` | Full pipeline: detect the best frame → image search |
-| `session` | Print the current auth session token |
+| `detect-best-and-search <video>` | **Recommended.** Best frame → image search → rerank |
+| `detect-best <video>` | Extract the best product frame only, no search |
+| `detect-and-search <video>` | Image search after two-stage filtering (slower) |
+| `detect <video>` | Extract frames and detect products frame by frame |
+| `search <image>` | Find matching items with an existing image |
+| `rerank` | Cross-filter image-search results by keyword |
+| `session` | Get the current auth session token |

-### Options (`detect` / `detect-and-search`)
+### Options (`detect-best` / `detect-best-and-search`)

 | Option | Default | Description |
 |------|--------|------|
-| `--interval=<seconds>` | `1` | Frame extraction interval (seconds) |
-| `--max-frames=<count>` | `60` | Maximum number of frames analyzed |
-| `--output-dir=<dir>` | video's directory | Directory for extracted frames |
-| `--min-confidence=<0-1>` | `0.7` | Minimum detection confidence |
-| `--dry-run` | — | Parse arguments and print the config without executing |
+| `--interval=<seconds>` | `0.5` | Frame sampling interval |
+| `--max-frames=<n>` | `60` | Maximum number of frames analyzed |
+| `--output-dir=<dir>` | same directory as the video | Snapshot output directory |
+| `--session-id=<id>` | auto-generated | Langfuse session ID |
+| `--dry-run` | — | Parse arguments without executing |

-### Examples
-
-```bash
-# Detect products, one frame every 3 seconds
-bun dist/run.js detect ./demo.mp4 --interval=3
-# Full pipeline + a higher confidence threshold
-bun dist/run.js detect-and-search ./demo.mp4 --interval=5 --min-confidence=0.85
-# Search with an existing snapshot
-bun dist/run.js search ./snapshot.jpg
-```
-
 ## Output

-All commands print JSON to stdout.
+All commands print JSON to stdout, including a `sessionId` field for Langfuse tracing.

 ```json
 {
+  "sessionId": "skill-20260426-184345-lb06",
+  "status": "success",
+  "command": "detect-best-and-search",
   "bestSnapshot": {
-    "frameIndex": 4,
-    "timestampSeconds": 9,
-    "imagePath": "/path/to/frame_0004.jpg",
-    "confidence": 0.92,
-    "description": "White sneaker with blue logo, left side view",
-    "boundingHint": "centered"
+    "frameIndex": 7,
+    "timestampSeconds": 3,
+    "imagePath": "/path/to/frame_0007.jpg",
+    "croppedImagePath": "/path/to/frame_0007_cropped.jpg",
+    "description": "黑色金属床底鞋架 可折叠移动"
   },
-  "productFrames": [...],
-  "searchBody": { ... }
+  "rerank": {
+    "keyword": "床底鞋架",
+    "results": [
+      { "num_iid": 123, "title": "...", "price": "44.00", "sales": 87, "detail_url": "..." }
+    ]
+  }
 }
 ```

-- `productFrames` — all detected frames, sorted by confidence (highest first)
-- `bestSnapshot` — the top-ranked frame
-- `searchBody` — the image-search API response (`search` / `detect-and-search` only)
+## Auth architecture
+
+```
+~/.openclaw/.env
+CLIENT_KEY ──→ auth-rt ──→ business backend
+               ├── /session → access_token
+               └── /client-config → provider.api_key
+                                    provider.base_url
+                                    provider.model
+```
+
+Only `CLIENT_KEY` needs to be configured; LLM credentials and endpoints are issued by the business backend.

 ## Environment variables

-The only required setting is `CLIENT_KEY` in `~/.openclaw/.env`:
-
-```
-CLIENT_KEY=sk_xxxxxxxx.xxxxxxxxxxxxxxxxxxxxxxxx
-```
-
-All credentials and endpoints are fetched automatically via `auth-rt` from the client config; no extra configuration is needed.
-
-### Optional overrides
-
 | Variable | Description |
 |------|------|
-| `VISION_MODEL` | Override the model name (default: `aliyun-cp-multimodal`) |
+| `CLIENT_KEY` | **Required.** Configured in `~/.openclaw/.env` |
+| `VISION_MODEL` | Override the model name (default comes from the client config) |
+| `SKILL_SESSION_ID` | Langfuse session ID (auto-generated, format `skill-YYYYMMDD-HHMMSS-xxxx`) |
 | `AUTH_RT_BIN` | Override the `auth-rt` binary path |
-| `TELEMETRY_ENDPOINT` | Report execution results to a telemetry endpoint |
+| `TELEMETRY_ENDPOINT` | Telemetry reporting endpoint |

 ## Prerequisites

 - [Bun](https://bun.sh) runtime
-- `ffmpeg` and `ffprobe` on the system PATH
-- `auth-rt` CLI on the system PATH (required by `search` / `detect-and-search`)
+- `ffmpeg` / `ffprobe` on the system PATH (frame extraction)
+- `auth-rt` CLI (auth/API calls; installed automatically by `install.sh`)

View File

@@ -67,7 +67,7 @@ bun dist/run.js <command> [args] [--dry-run]

 ## Result display format

-Format `rerank.results` (preferred) or `searchBody.data.items.item` as a markdown table, **5 rows per page**:
+Format `rerank.results` (preferred) or `searchBody.data.items.item` as a markdown table, **at most 5 rows**:

 | # | Title | Price | Sales | Link |
 |---|----------|------|------|------|

View File

@@ -93,17 +93,7 @@ async function main(): Promise<void> {
   if (positionals.length < 1) { printUsage(); process.exit(1); }
   const command = positionals[0] as Command;

-  // Resolve session ID: explicit CLI arg > env > auto-generate structured ID
-  const sessionId = process.env.SKILL_SESSION_ID || (() => {
-    const ts = new Date();
-    const pad = (n: number) => String(n).padStart(2, '0');
-    const tsPart = `${ts.getFullYear()}${pad(ts.getMonth()+1)}${pad(ts.getDate())}-${pad(ts.getHours())}${pad(ts.getMinutes())}${pad(ts.getSeconds())}`;
-    const rand = Math.random().toString(36).slice(2, 6);
-    return `vps-${tsPart}-${rand}`;
-  })();
-  process.env.SKILL_SESSION_ID = sessionId;
+  const sessionId = process.env.SKILL_SESSION_ID!; // set by auth-cli.ts at module load

   const startMs = Date.now();
   let result: Awaited<ReturnType<typeof run>>;

View File

@@ -20,6 +20,18 @@ import * as path from 'path';
 import * as os from 'os';

 const home = process.env.HOME || os.homedir();

+// ── session ID (Langfuse tracing) ──
+// Priority: SKILL_SESSION_ID env > auto-generate
+const SESSION_ID = process.env.SKILL_SESSION_ID || (() => {
+  const ts = new Date();
+  const pad = (n: number) => String(n).padStart(2, '0');
+  const tsPart = `${ts.getFullYear()}${pad(ts.getMonth()+1)}${pad(ts.getDate())}-${pad(ts.getHours())}${pad(ts.getMinutes())}${pad(ts.getSeconds())}`;
+  const rand = Math.random().toString(36).slice(2, 6);
+  return `skill-${tsPart}-${rand}`;
+})();
+process.env.SKILL_SESSION_ID = SESSION_ID;
+
 const AUTH_RT_BIN = process.env.AUTH_RT_BIN
   || (() => {
     // Check if auth-rt is in PATH

View File

@@ -212,7 +212,7 @@ async function runDetectBestAndSearch(args: string[], dryRun: boolean): Promise<
   // Otherwise fall back to the keyword-intersection rerank.
   if (!dryRun && postFilter && !postFilter.error && postFilter.keptCount > 0) {
     const items: SearchItem[] = (searchResult.searchBody as any)?.data?.items?.item ?? [];
-    const sorted = [...items].sort((a, b) => (b.sales ?? 0) - (a.sales ?? 0)).slice(0, 10);
+    const sorted = [...items].sort((a, b) => (b.sales ?? 0) - (a.sales ?? 0)).slice(0, 5);
     rerankResult = {
       source: 'post-filter',
       results: sorted,
@@ -225,7 +225,7 @@ async function runDetectBestAndSearch(args: string[], dryRun: boolean): Promise<
       rerankResult = await runRerank([
         `--image-results=${tmpFile}`,
         `--description=${best.description}`,
-        '--top=10',
+        '--top=5',
       ], dryRun);
     } catch (e: any) {
       rerankResult = { error: e.message };
@@ -360,7 +360,7 @@ async function runDetectAndSearch(args: string[], dryRun: boolean): Promise<Outp
   // Otherwise fall back to the keyword-intersection rerank.
   if (!dryRun && postFilter && !postFilter.error && postFilter.keptCount > 0) {
     const items: SearchItem[] = (searchResult.searchBody as any)?.data?.items?.item ?? [];
-    const sorted = [...items].sort((a, b) => (b.sales ?? 0) - (a.sales ?? 0)).slice(0, 10);
+    const sorted = [...items].sort((a, b) => (b.sales ?? 0) - (a.sales ?? 0)).slice(0, 5);
     rerankResult = {
       source: 'post-filter',
       results: sorted,
@@ -373,7 +373,7 @@ async function runDetectAndSearch(args: string[], dryRun: boolean): Promise<Outp
       rerankResult = await runRerank([
         `--image-results=${tmpFile}`,
         `--description=${best.description}`,
-        '--top=10',
+        '--top=5',
       ], dryRun);
     } catch (e: any) {
       rerankResult = { error: e.message };
@@ -417,13 +417,25 @@ function getFlag(args: string[], flag: string): string | undefined {
 }

 function createVisionModel(config: VisionConfig) {
-  const headers: Record<string, string> = {
-    'x-langfuse-tags': 'skill:video-product-snapshot',
-  };
-  if (config.sessionId) {
-    headers['x-langfuse-session-id'] = config.sessionId;
-  }
-  const openai = createOpenAI({ apiKey: config.apiKey, baseURL: config.baseURL, headers });
+  const sessionId = config.sessionId || '';
+  const originFetch = globalThis.fetch;
+  // Inject metadata.session_id into the request body so LiteLLM → Langfuse creates sessions
+  const wrapped = async (input: RequestInfo | URL, init?: RequestInit) => {
+    if (init?.body && typeof init.body === 'string') {
+      try {
+        const body = JSON.parse(init.body);
+        if (!body.metadata) body.metadata = {};
+        if (!body.metadata.session_id) body.metadata.session_id = sessionId;
+        body.metadata.tags = ['skill:video-product-snapshot'];
+        init = { ...init, body: JSON.stringify(body) };
+      } catch {}
+    }
+    return originFetch(input, init);
+  };
+  const openai = createOpenAI({
+    apiKey: config.apiKey, baseURL: config.baseURL,
+    fetch: wrapped as typeof globalThis.fetch,
+  });
   return openai(config.model);
 }
@@ -481,7 +493,7 @@ async function runRerank(args: string[], dryRun: boolean): Promise<OutputResult>
   const positionals = args.filter((a) => !a.startsWith('--'));
   const imageResultsArg = getFlag(args, '--image-results') || positionals[0];
   const keywordArg = getFlag(args, '--keyword') || positionals[1];
-  const topN = parseInt(getFlag(args, '--top') || '10', 10);
+  const topN = parseInt(getFlag(args, '--top') || '5', 10);
   const description = getFlag(args, '--description') || '';

View File

@@ -34,13 +34,24 @@ const FILTER_PROMPT = (count: number, description?: string) => {
 };

 function createModel(config: VisionConfig) {
-  const headers: Record<string, string> = {
-    'x-langfuse-tags': 'skill:video-product-snapshot',
-  };
-  if (config.sessionId) {
-    headers['x-langfuse-session-id'] = config.sessionId;
-  }
-  const provider = createOpenAI({ apiKey: config.apiKey, baseURL: config.baseURL, headers });
+  const sessionId = config.sessionId || '';
+  const originFetch = globalThis.fetch;
+  const wrapped = async (input: RequestInfo | URL, init?: RequestInit) => {
+    if (init?.body && typeof init.body === 'string') {
+      try {
+        const body = JSON.parse(init.body);
+        if (!body.metadata) body.metadata = {};
+        if (!body.metadata.session_id) body.metadata.session_id = sessionId;
+        body.metadata.tags = ['skill:video-product-snapshot'];
+        init = { ...init, body: JSON.stringify(body) };
+      } catch {}
+    }
+    return originFetch(input, init);
+  };
+  const provider = createOpenAI({
+    apiKey: config.apiKey, baseURL: config.baseURL,
+    fetch: wrapped as typeof globalThis.fetch,
+  });
   return provider(config.model);
 }

View File

@@ -78,13 +78,24 @@ Return:
 - boundingBox: tight box of the PRODUCT ONLY as [x1, y1, x2, y2] normalized 0.0–1.0, top-left origin. Exclude hands, background, and unrelated objects. The product is near the center of the frame.`;

 function createVisionModel(config: VisionConfig) {
-  const headers: Record<string, string> = {
-    'x-langfuse-tags': 'skill:video-product-snapshot',
-  };
-  if (config.sessionId) {
-    headers['x-langfuse-session-id'] = config.sessionId;
-  }
-  const provider = createOpenAI({ apiKey: config.apiKey, baseURL: config.baseURL, headers });
+  const sessionId = config.sessionId || '';
+  const originFetch = globalThis.fetch;
+  const wrapped = async (input: RequestInfo | URL, init?: RequestInit) => {
+    if (init?.body && typeof init.body === 'string') {
+      try {
+        const body = JSON.parse(init.body);
+        if (!body.metadata) body.metadata = {};
+        if (!body.metadata.session_id) body.metadata.session_id = sessionId;
+        body.metadata.tags = ['skill:video-product-snapshot'];
+        init = { ...init, body: JSON.stringify(body) };
+      } catch {}
+    }
+    return originFetch(input, init);
+  };
+  const provider = createOpenAI({
+    apiKey: config.apiKey, baseURL: config.baseURL,
+    fetch: wrapped as typeof globalThis.fetch,
+  });
   return provider(config.model);
 }