fix: rerank top-N 10→5 to match the Feishu list display

fix: sync auth-cli.ts and add the clientConfig() method
chore: tweak README
2026-04-26 20:34:57 +08:00 · 2026-04-26 20:15:08 +08:00 · 2026-04-26 20:08:10 +08:00 · 2026-04-26 19:57:03 +08:00 · 2026-04-26 19:45:08 +08:00 · 2026-04-26 19:01:10 +08:00
8 changed files with 727 additions and 168 deletions
--- a/README.md
+++ b/README.md
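@@ -1,17 +1,20 @@
-# video-product-snapshot: video product snapshot
+# video-product-snapshot: video product image search

-Detects e-commerce products in a video, extracts the best product frame, and finds matching items on 1688 through image search.
+Extracts the best product frame from a video and finds matching items on 1688 through image search.

 ## How it works

-1. Use `ffmpeg` to extract frames from the video at the configured interval
-2. Send each frame to the vision model to detect products and score them
-3. Pick the highest-confidence frame as the best product snapshot
-4. Optional: call the image-search API with this snapshot to find matching products
+1. `ffmpeg` extracts frames at a 0.5 s interval (at most 60 frames)
+2. Visual-quality pre-filter (brightness/variance checks drop blurry frames)
+3. Container/rack product detection → automatically pick an unloaded (empty) frame
+4. The vision model compares and ranks multiple frames to pick the best product frame
+5. Crop the product region → upload → 1688 image search
+6. Post-filter (the vision model judges whether each result is the same product) → rerank

 ## Installation

 ```bash
+./install.sh          # install auth-rt + dependencies
 bun install
 bun run build         # outputs to dist/run.js
 ```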
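@@ -26,77 +29,74 @@ bun dist/run.js <command> [options]

 | Command | Description |
 |------|------|
-| `detect <video>` | Extract frames and detect product frames |
-| `search <image>` | Find matching items from an image |
-| `detect-and-search <video>` | Full pipeline: detect the best frame → image search |
-| `session` | Print the current auth session token |
+| `detect-best-and-search <video>` | **Recommended.** Best frame → image search → rerank |
+| `detect-best <video>` | Extract the best product frame only, no search |
+| `detect-and-search <video>` | Image search after two-stage filtering (slower) |
+| `detect <video>` | Extract frames and detect products frame by frame |
+| `search <image>` | Find matching items from an existing image |
+| `rerank` | Cross-filter image-search results by keyword |
+| `session` | Get the current auth session token |

-### Options (`detect` / `detect-and-search`)
+### Options (`detect-best` / `detect-best-and-search`)

 | Flag | Default | Description |
 |------|--------|------|
-| `--interval=<seconds>` | `1` | Frame extraction interval (seconds) |
-| `--max-frames=<count>` | `60` | Maximum number of frames to analyze |
-| `--output-dir=<dir>` | Video's directory | Directory for extracted frame images |
-| `--min-confidence=<0-1>` | `0.7` | Minimum detection confidence |
-| `--dry-run` | — | Parse arguments and print the config without executing |
-
-### Examples
-
-```bash
-# Detect products, one frame every 3 seconds
-bun dist/run.js detect ./demo.mp4 --interval=3
-
-# Full pipeline with a higher confidence threshold
-bun dist/run.js detect-and-search ./demo.mp4 --interval=5 --min-confidence=0.85
-
-# Search with an existing snapshot
-bun dist/run.js search ./snapshot.jpg
-```
+| `--interval=<seconds>` | `0.5` | Frame sampling interval |
+| `--max-frames=<n>` | `60` | Maximum number of frames to analyze |
+| `--output-dir=<dir>` | Video's directory | Directory for saved snapshots |
+| `--session-id=<id>` | Auto-generated | Langfuse session ID |
+| `--dry-run` | — | Parse arguments without executing |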

 ## Output

-All commands print JSON to stdout.
+All commands print JSON to stdout, including a `sessionId` field for Langfuse tracing.

 ```json
 {
+  "sessionId": "skill-20260426-184345-lb06",
+  "status": "success",
+  "command": "detect-best-and-search",
  "bestSnapshot": {
-    "frameIndex": 4,
-    "timestampSeconds": 9,
-    "imagePath": "/path/to/frame_0004.jpg",
-    "confidence": 0.92,
-    "description": "White sneaker with blue logo, left side view",
-    "boundingHint": "centered"
+    "frameIndex": 7,
+    "timestampSeconds": 3,
+    "imagePath": "/path/to/frame_0007.jpg",
+    "croppedImagePath": "/path/to/frame_0007_cropped.jpg",
+    "description": "黑色金属床底鞋架 可折叠移动"
  },
-  "productFrames": [...],
-  "searchBody": { ... }
+  "rerank": {
+    "keyword": "床底鞋架",
+    "results": [
+      { "num_iid": 123, "title": "...", "price": "44.00", "sales": 87, "detail_url": "..." }
+    ]
+  }
 }
 ```

- `productFrames`: all detected frames, sorted by confidence (highest first)
- `bestSnapshot`: the top-ranked frame
- `searchBody`: the image-search API response (`search` / `detect-and-search` only)
+## Authentication architecture
+
+```
+~/.openclaw/.env
+  CLIENT_KEY ──→ auth-rt ──→ business system
+                              ├── /session          → access_token
+                              └── /client-config    → provider.api_key
+                                                       provider.base_url
+                                                       provider.model
+```
+
+Only `CLIENT_KEY` needs to be configured; LLM credentials and endpoints are all issued by the business system.

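 ## Environment variables

-The only required setting is `CLIENT_KEY` in `~/.openclaw/.env`:
-
-```
-CLIENT_KEY=sk_xxxxxxxx.xxxxxxxxxxxxxxxxxxxxxxxx
-```
-
-All credentials and endpoint URLs are fetched automatically from the client config through `auth-rt`; no further configuration is needed.
-
-### Optional overrides
-
 | Variable | Description |
 |------|------|
-| `VISION_MODEL` | Override the model name (default: `aliyun-cp-multimodal`) |
+| `CLIENT_KEY` | **Required.** Set in `~/.openclaw/.env` |
+| `VISION_MODEL` | Override the model name (defaults to the client config value) |
+| `SKILL_SESSION_ID` | Langfuse session ID (auto-generated, format `skill-YYYYMMDD-HHMMSS-xxxx`) |
 | `AUTH_RT_BIN` | Override the `auth-rt` binary path |
-| `TELEMETRY_ENDPOINT` | Report execution results to a telemetry endpoint |
+| `TELEMETRY_ENDPOINT` | Telemetry reporting endpoint |

 ## Prerequisites

- [Bun](https://bun.sh) runtime
- `ffmpeg` and `ffprobe` on the system PATH
- `auth-rt` CLI on the system PATH (required by `search` / `detect-and-search`)
+- `ffmpeg` / `ffprobe` on the system PATH (frame extraction)
+- `auth-rt` CLI (authentication/API calls, installed automatically by `install.sh`)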
--- a/SKILL.md
+++ b/SKILL.md
@@ -1,11 +1,11 @@
 ---
 name: video-product-snapshot
-description: "Detect ecommerce products in video frames using Claude Vision, extract the best product snapshot, and optionally search via image-search API. Use when the user provides a video and wants to find/identify products shown in it. / 检测视频中的商品，提取最佳商品截图，并通过图片搜索在1688找同款。当用户提供视频想找商品时使用。"
+description: "Extract product snapshot from video and search 1688 by image. / 从视频中提取最佳商品帧，以图搜图在1688找同款。当用户提供视频想找商品时使用。"
 ---

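-# Video Product Snapshot: video product snapshot
+# Video Product Snapshot: video product image search

-Extracts the best product frame from a video (detected and cropped with Claude Vision), then finds the same product on 1688 using image search plus keyword reranking.
+Captures the sharpest product frame from a video (for container-type products, an unloaded frame is chosen automatically), uploads the image, and finds matching items on 1688 through image search.

 ## Run

@@ -17,55 +17,61 @@ bun dist/run.js <command> [args] [--dry-run]

 | Command | When to use |
 |------|---------|
-| `detect-best-and-search <video>` | **Default command for video input.** Always finds the best frame (regardless of confidence), then runs image search. |
-| `detect-best <video>` | Extract the best frame only, no search. |
-| `search <image-path>` | You already have a product snapshot; skip detection and search directly. |
-| `detect-and-search <video>` | Legacy. Filtering can be too strict and return no results. Prefer `detect-best-and-search`. |
+| `detect-best-and-search <video>` | **Recommended.** Extract the best product frame → image search → return reranked results. |
+| `detect-best <video>` | Extract the best product frame only, no search. |
+| `detect-and-search <video>` | Image search after two-stage filtering (slower than detect-best). |
+| `search <image-path>` | You already have a product image; search directly. |
+| `rerank` | Cross-filter image-search results by keyword. |
 | `session` | Get the current auth session token. |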

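-## `detect-best` / `detect-best-and-search` options
+## Main command: `detect-best-and-search`

-| Flag | Default | Description |
-|------|--------|------|
-| `--interval=<seconds>` | `0.5` | Frame extraction interval (seconds) |
-| `--max-frames=<count>` | `60` | Maximum number of frames to extract |
-| `--output-dir=<dir>` | Video's directory | Directory for frame images |
+Pipeline:
+1. ffmpeg extracts frames at a 0.5 s interval (at most 60 frames)
+2. The vision model checks whether the product is a container/rack
+3. Container: pick the best frame from only the first 40% of frames (the unloaded phase)
+4. Non-container: pick the sharpest frame from all frames
+5. Crop the product region
+6. Upload the cropped image → 1688 image search
+7. rerank: cross-filter image-search results against keyword-search results

-## How frames are selected
+## Options for `detect-best` / `detect-best-and-search`

-Two-pass Vision pipeline:
-
-1. **Filter pass** (`detect` / `detect-and-search` only): a binary keep/discard decision per frame. Can be too strict and return nothing.
-2. **Ranking pass**: all candidate frames are sent to the model together, which picks the single clearest, most complete, most prominent product image.
-
-`detect-best` skips the first pass and sends every frame straight to the ranking pass. Above 20 frames, it samples uniformly down to 20 before the call. **As long as the video yields frames, a result is always returned.**
+| Flag | Default | Description |
+|------|---------|-------------|
+| `--interval=<sec>` | `0.5` | Frame sampling interval (seconds) |
+| `--max-frames=<n>` | `60` | Maximum number of frames to analyze |
+| `--output-dir=<dir>` | Video's directory | Directory for saved snapshots |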

 ## Output format

+### `detect-best-and-search`
+
 ```json
 {
  "bestSnapshot": {
-    "frameIndex": 4,
-    "timestampSeconds": 2,
-    "imagePath": "/path/to/frame_0004.jpg",
-    "croppedImagePath": "/path/to/frame_0004_cropped.jpg",
-    "confidence": 0.95,
-    "description": "White sneaker with blue logo, left side view",
-    "boundingHint": "Product fully visible, centered, no hands"
+    "frameIndex": 7,
+    "timestampSeconds": 3,
+    "imagePath": "/path/to/frame_0007.jpg",
+    "croppedImagePath": "/path/to/frame_0007_cropped.jpg",
+    "description": "黑色金属床底鞋架 可折叠移动"
  },
  "rerank": {
-    "results": [...]
+    "keyword": "床底鞋架",
+    "results": [
+      { "num_iid": 123, "title": "...", "price": "44.00", "sales": 87, "detail_url": "..." }
+    ]
  }
 }
 ```

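 ## Result display format

-After the CLI finishes, format `rerank.results` as a markdown table, **5 rows per page** (show all rows if there are fewer than 5).
+Format `rerank.results` (preferred) or `searchBody.data.items.item` as a markdown table, **at most 5 rows**: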

 | # | 商品名称 | 价格 | 销量 | 链接 |
 |---|----------|------|------|------|
-| 1 | {title}  | ¥{promotion_price \|\| price} | {sales ?? —}件 | [查看](https://detail.1688.com/offer/{num_iid}.html) |
+| 1 | {title}  | ¥{promotion_price \|\| price} | {sales ?? —}件 | [查看]({detail_url}) |

- Use `promotion_price` when present, otherwise the original price
- Show `—` when `sales` is missing or zero
@@ -73,29 +79,16 @@ CLI 执行完成后，将 `rerank.results` 格式化为 markdown 表格，**每

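 ## Execution rules

-### Video commands (slow: run in a sub-agent)
+1. **Do not retry.** If a command fails, report the error directly.
+2. **Trust the tool output.** The CLI already handles session management and error formatting internally.
+3. **Run slow commands in a sub-agent** (detect-best-and-search / detect-and-search); running them directly will time out.

-Commands: `detect-best-and-search`, `detect-best`, `detect-and-search`
-
-Use `sessions_spawn` to create a sub-agent for execution; **do not run them directly**.
+### Sub-agent execution template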

 ```
 sessions_spawn(
-  task: "Run this command and return the raw JSON output:\n\nbun dist/run.js <full command>\n\nCopy the entire JSON output as your reply.",
+  task: "cd /path/to/skill && Run this command and return the raw JSON output:\n\nbun dist/run.js detect-best-and-search <video-path>\n\nCopy the entire JSON output as your reply.",
  label: "video-product-snapshot",
  runTimeoutSeconds: 300,
 )
 ```
-
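- Tell the user that processing has started and share the `runId`
- Wait for the sub-agent to return, then parse and present the result
-
-### `search` and `session` (fast: run directly)
-
-Run these directly in the current session; no sub-agent is needed.
-
-### General rules
-
-1. **Video input → always use `detect-best-and-search`.** Do not use `detect-and-search`.
-2. **Do not retry.** If a command fails, report the error directly.
-3. **Trust the tool output.** The CLI already handles session management and error formatting internally.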
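
Review note on the SKILL.md display rules: the table format above (at most 5 rows, promotional price preferred, a dash placeholder for missing or zero sales) can be sketched as a small formatter. This is only an illustration, not code from the commit. `ResultItem` and `renderResultsTable` are hypothetical names, and the field list is assumed from the sample JSON in this diff.

```typescript
// Hypothetical item shape, assumed from the sample JSON in SKILL.md.
interface ResultItem {
  num_iid: number;
  title: string;
  price: string;
  promotion_price?: string;
  sales?: number;
  detail_url: string;
}

// Placeholder character the display rules ask for when sales is missing or zero.
const DASH = '\u2014';

function renderResultsTable(items: ResultItem[], maxRows = 5): string {
  const header = [
    '| # | 商品名称 | 价格 | 销量 | 链接 |',
    '|---|----------|------|------|------|',
  ];
  const rows = items.slice(0, maxRows).map((item, i) => {
    // Promotional price wins when present, otherwise fall back to the list price.
    const price = item.promotion_price || item.price;
    // Undefined and 0 both collapse to the dash placeholder.
    const sales = item.sales ? String(item.sales) : DASH;
    // Escape pipes so a title cannot break the table layout.
    const title = item.title.replace(/\|/g, '\\|');
    return `| ${i + 1} | ${title} | ¥${price} | ${sales}件 | [查看](${item.detail_url}) |`;
  });
  return [...header, ...rows].join('\n');
}
```

The column headers stay in Chinese because they mirror the template in the diff. Escaping `|` in titles is an addition of this sketch; the source does not say how titles are sanitized.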
--- a/scripts/run.ts
+++ b/scripts/run.ts
@@ -43,6 +43,18 @@ function printUsage(): void {
  detect-and-search <video-path> [options]
      检测最佳商品画面 → 图片搜索 → 关键词重排序

+  detect-best <video-path> [options]
+      从视频抽帧并选择最佳商品画面（更快更稳定）
+
+  detect-best-and-search <video-path> [options]
+      最佳画面 → 图片搜索 → 关键词重排序
+
+  detect-video <video-path>
+      识别商品描述和搜索关键词（当前实现：从视频抽帧选最佳帧）
+
+  detect-video-and-search <video-path>
+      识别商品 → 图片搜索 → 1688 关键词重排序（当前实现：从视频抽帧选最佳帧）
+
  rerank --image-results=<json> [--description=<text>] [--keyword=<text>] [--top=<n>]
      通过关键词交并集过滤搜索结果

@@ -69,6 +81,8 @@ async function main(): Promise<void> {
      dryRun = true;
    } else if (arg.startsWith('--api-base=')) {
      process.env.API_BASE = arg.slice('--api-base='.length).trim();
+    } else if (arg.startsWith('--session-id=')) {
+      process.env.SKILL_SESSION_ID = arg.slice('--session-id='.length).trim();
    } else if (arg === '-h' || arg === '--help') {
      printUsage(); process.exit(0);
    } else {
@@ -79,6 +93,7 @@ async function main(): Promise<void> {
  if (positionals.length < 1) { printUsage(); process.exit(1); }

  const command = positionals[0] as Command;
+  const sessionId = process.env.SKILL_SESSION_ID!; // set by auth-cli.ts at module load
  const startMs = Date.now();
  let result: Awaited<ReturnType<typeof run>>;

@@ -86,13 +101,14 @@ async function main(): Promise<void> {
    result = await run(command, positionals.slice(1), dryRun);
  } catch (err) {
    const error = err instanceof Error ? err.message : String(err);
-    console.log(JSON.stringify({ status: 'failed', command, dryRun, error }, null, 2));
-    if (!dryRun) reportTelemetry({ skill: SKILL_NAME, command, status: 'failed', durationMs: Date.now() - startMs, error });
+    console.log(JSON.stringify({ status: 'failed', command, dryRun, sessionId, error }, null, 2));
+    if (!dryRun) reportTelemetry({ skill: SKILL_NAME, command, sessionId, status: 'failed', durationMs: Date.now() - startMs, error });
    process.exit(1);
  }

-  console.log(JSON.stringify(result, null, 2));
-  if (!dryRun) reportTelemetry({ skill: SKILL_NAME, command, status: result.status, durationMs: Date.now() - startMs, error: (result as any).error });
+  const output = { ...result, sessionId } as Record<string, unknown>;
+  console.log(JSON.stringify(output, null, 2));
+  if (!dryRun) reportTelemetry({ skill: SKILL_NAME, command, sessionId, status: result.status, durationMs: Date.now() - startMs, error: (result as any).error });
 }

 main().catch((err) => {
--- a/src/auth-cli.ts
+++ b/src/auth-cli.ts
@@ -20,6 +20,18 @@ import * as path from 'path';
 import * as os from 'os';

 const home = process.env.HOME || os.homedir();
+
+// ── session ID (Langfuse tracing) ──
+// Priority: SKILL_SESSION_ID env > auto-generate
+const SESSION_ID = process.env.SKILL_SESSION_ID || (() => {
+  const ts = new Date();
+  const pad = (n: number) => String(n).padStart(2, '0');
+  const tsPart = `${ts.getFullYear()}${pad(ts.getMonth()+1)}${pad(ts.getDate())}-${pad(ts.getHours())}${pad(ts.getMinutes())}${pad(ts.getSeconds())}`;
+  const rand = Math.random().toString(36).slice(2, 6);
+  return `skill-${tsPart}-${rand}`;
+})();
+process.env.SKILL_SESSION_ID = SESSION_ID;
+
 const AUTH_RT_BIN = process.env.AUTH_RT_BIN
  || (() => {
    // Check if auth-rt is in PATH
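
Review note on the session ID: `auth-cli.ts` generates `skill-YYYYMMDD-HHMMSS-xxxx`, while the fallback added in `src/index.ts` builds `skill_<epoch-ms>_<rand>`, so two formats can exist in principle. A shared helper is one way to keep them aligned. The sketch below is a suggestion under assumptions, not code from the commit: `makeSessionId` and `resolveSessionId` are hypothetical names, and padding the random suffix to four characters is an addition, since `Math.random().toString(36).slice(2, 6)` can yield fewer than four.

```typescript
// Builds an ID in the documented format: skill-YYYYMMDD-HHMMSS-xxxx (local time).
// `now` and `rand` are injectable so the output can be tested deterministically.
function makeSessionId(now: Date = new Date(), rand: () => number = Math.random): string {
  const pad = (n: number) => String(n).padStart(2, '0');
  const date = `${now.getFullYear()}${pad(now.getMonth() + 1)}${pad(now.getDate())}`;
  const time = `${pad(now.getHours())}${pad(now.getMinutes())}${pad(now.getSeconds())}`;
  // Base-36 fraction digits; pad so the suffix is always exactly 4 characters.
  const suffix = rand().toString(36).slice(2, 6).padEnd(4, '0');
  return `skill-${date}-${time}-${suffix}`;
}

// Same priority as the diff: an explicit SKILL_SESSION_ID wins, else auto-generate.
function resolveSessionId(env: Record<string, string | undefined>): string {
  return env.SKILL_SESSION_ID || makeSessionId();
}
```

With a helper like this, both the module-load code in `auth-cli.ts` and the fallback in `loadVisionConfig` could call `resolveSessionId(process.env)` and produce the same format.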
--- a/src/index.ts
+++ b/src/index.ts
@@ -1,10 +1,10 @@
 import * as fs from 'fs';
 import * as path from 'path';
-import type { Command, DetectOptions, DetectResult, SearchResult, OutputResult, SearchItem } from './types.ts';
+import type { Command, DetectOptions, DetectResult, SearchResult, OutputResult, SearchItem, DetectVideoResult, DetectVideoAndSearchResult } from './types.ts';
 import { createSkillClient } from './auth-cli.ts';
 import { extractFrames } from './frame-extractor.ts';
 import { detectProductFrames, detectBestFrame } from './product-detector.ts';
-import { imageToBase64 } from './frame-extractor.ts';
+import { postFilterByImage } from './post-filter.ts';
 import { generateText } from 'ai';
 import { createOpenAI } from '@ai-sdk/openai';

@@ -12,6 +12,7 @@ export interface VisionConfig {
  apiKey: string;
  baseURL?: string;
  model: string;
+  sessionId?: string;
 }

 async function loadVisionConfig(client: ReturnType<typeof createSkillClient>): Promise<VisionConfig> {
@@ -22,6 +23,7 @@ async function loadVisionConfig(client: ReturnType<typeof createSkillClient>): P
    apiKey,
    baseURL: cfg.metadata?.provider?.base_url,
    model: process.env.VISION_MODEL ?? cfg.metadata?.provider?.model ?? 'aliyun-cp-multimodal',
+    sessionId: process.env.SKILL_SESSION_ID || `skill_${Date.now()}_${Math.random().toString(36).slice(2, 8)}`,
  };
 }

@@ -43,6 +45,10 @@ export async function run(
      return runDetectBest(args, dryRun);
    case 'detect-best-and-search':
      return runDetectBestAndSearch(args, dryRun);
+    case 'detect-video':
+      return runDetectVideo(args, dryRun);
+    case 'detect-video-and-search':
+      return runDetectVideoAndSearch(args, dryRun);
    case 'rerank':
      return runRerank(args, dryRun);
    default:
@@ -153,7 +159,7 @@ async function runDetectBest(args: string[], dryRun: boolean): Promise<DetectRes
    return { status: 'failed', command: 'detect-best', dryRun, videoPath, error: 'no frames extracted from video' };
  }

-  const best = await detectBestFrame(frames, 10, visionConfig);
+  const best = await detectBestFrame(frames, visionConfig, 20);

  return {
    status: 'success',
@@ -179,15 +185,47 @@ async function runDetectBestAndSearch(args: string[], dryRun: boolean): Promise<
  const imageForSearch = best.croppedImagePath || best.imagePath;
  const searchResult = await runSearch([imageForSearch], dryRun) as SearchResult;

-  let rerankResult: any = undefined;
+  // Post-filter: drop results whose pic_url isn't the same product type as our snapshot
+  let postFilter: any = undefined;
  if (!dryRun && searchResult.status === 'success' && searchResult.searchBody) {
+    const items: SearchItem[] = (searchResult.searchBody as any)?.data?.items?.item ?? [];
+    if (items.length > 0) {
+      try {
+        const client = createSkillClient();
+        const visionConfig = await loadVisionConfig(client);
+        const result = await postFilterByImage(imageForSearch, items, visionConfig, { description: best.description });
+        (searchResult.searchBody as any).data.items.item = result.kept;
+        postFilter = {
+          totalChecked: result.totalChecked,
+          keptCount: result.kept.length,
+          rejectedCount: result.rejected.length,
+          failed: result.failed,
+        };
+      } catch (e: any) {
+        postFilter = { error: e.message };
+      }
+    }
+  }
+
+  let rerankResult: any = undefined;
+  // If post-filter produced focused results, sort them directly by sales — they're already the best matches.
+  // Otherwise fall back to the keyword-intersection rerank.
+  if (!dryRun && postFilter && !postFilter.error && postFilter.keptCount > 0) {
+    const items: SearchItem[] = (searchResult.searchBody as any)?.data?.items?.item ?? [];
+    const sorted = [...items].sort((a, b) => (b.sales ?? 0) - (a.sales ?? 0)).slice(0, 5);
+    rerankResult = {
+      source: 'post-filter',
+      results: sorted,
+      count: sorted.length,
+    };
+  } else if (!dryRun && searchResult.status === 'success' && searchResult.searchBody) {
    const tmpFile = path.join(path.dirname(imageForSearch), `search_body_${Date.now()}.json`);
    try {
      fs.writeFileSync(tmpFile, JSON.stringify(searchResult.searchBody));
      rerankResult = await runRerank([
        `--image-results=${tmpFile}`,
        `--description=${best.description}`,
-        '--top=10',
+        '--top=5',
      ], dryRun);
    } catch (e: any) {
      rerankResult = { error: e.message };
@@ -202,10 +240,87 @@ async function runDetectBestAndSearch(args: string[], dryRun: boolean): Promise<
    searchHttpStatus: searchResult.searchHttpStatus,
    searchBody: searchResult.searchBody,
    searchError: searchResult.error,
+    postFilter,
    rerank: rerankResult,
  } as any;
 }

+async function runDetectVideo(args: string[], dryRun: boolean): Promise<DetectVideoResult> {
+  const videoPath = args[0];
+  if (!videoPath) return { status: 'failed', command: 'detect-video', dryRun, error: 'detect-video requires <video-path>' };
+  if (!fs.existsSync(videoPath)) return { status: 'failed', command: 'detect-video', dryRun, error: `video not found: ${videoPath}` };
+
+  const detectResult = await runDetectBest(args, dryRun) as DetectResult;
+  if (detectResult.status === 'failed') {
+    return { status: 'failed', command: 'detect-video', dryRun, videoPath, error: detectResult.error || 'failed to detect best frame' };
+  }
+  const description = detectResult.bestSnapshot?.description?.trim();
+  const snapshotImagePath = detectResult.bestSnapshot?.croppedImagePath || detectResult.bestSnapshot?.imagePath;
+  if (!description) {
+    return { status: 'failed', command: 'detect-video', dryRun, videoPath, error: 'no product description detected from video' };
+  }
+
+  if (dryRun) {
+    return { status: 'success', command: 'detect-video', dryRun, videoPath, videoUrl: null, description, keyword: '<dry-run-keyword>', snapshotImagePath };
+  }
+
+  const client = createSkillClient();
+  const visionConfig = await loadVisionConfig(client);
+  const keyword = await generateChineseKeyword(description, visionConfig);
+
+  return { status: 'success', command: 'detect-video', dryRun, videoPath, videoUrl: null, description, keyword, snapshotImagePath };
+}
+
+async function runDetectVideoAndSearch(args: string[], dryRun: boolean): Promise<DetectVideoAndSearchResult> {
+  const videoPath = args[0];
+  if (!videoPath) return { status: 'failed', command: 'detect-video-and-search', dryRun, error: 'detect-video-and-search requires <video-path>' };
+  if (!fs.existsSync(videoPath)) return { status: 'failed', command: 'detect-video-and-search', dryRun, error: `video not found: ${videoPath}` };
+
+  if (dryRun) {
+    return { status: 'success', command: 'detect-video-and-search', dryRun, videoPath, videoUrl: null, description: '<dry-run>', keyword: '<dry-run>', searchResults: [] };
+  }
+
+  // Reuse existing pipeline: best snapshot → image search → keyword rerank
+  const detectAndSearch = await runDetectBestAndSearch(args, dryRun) as any;
+  if (detectAndSearch.status === 'failed') {
+    return { status: 'failed', command: 'detect-video-and-search', dryRun, videoPath, error: detectAndSearch.error || 'detect-best-and-search failed' };
+  }
+
+  const description = String(detectAndSearch.bestSnapshot?.description || '').trim();
+  const rerank = detectAndSearch.rerank;
+  const keyword = String(rerank?.keyword || '').trim();
+  const searchResults = (rerank?.results || []) as SearchItem[];
+
+  // Fallback: if rerank didn't produce anything, do keyword search directly.
+  if (!searchResults.length) {
+    const client = createSkillClient();
+    const visionConfig = await loadVisionConfig(client);
+    const fallbackKeyword = keyword || (description ? await generateChineseKeyword(description, visionConfig) : '');
+    const items = fallbackKeyword ? await keywordSearch(client, fallbackKeyword, 1) : [];
+    return {
+      status: 'success',
+      command: 'detect-video-and-search',
+      dryRun,
+      videoPath,
+      videoUrl: null,
+      description,
+      keyword: fallbackKeyword,
+      searchResults: items,
+    };
+  }
+
+  return {
+    status: 'success',
+    command: 'detect-video-and-search',
+    dryRun,
+    videoPath,
+    videoUrl: null,
+    description,
+    keyword,
+    searchResults,
+  };
+}
+
 async function runDetectAndSearch(args: string[], dryRun: boolean): Promise<OutputResult> {
  const detectResult = await runDetect(args, dryRun) as DetectResult;
  if (detectResult.status === 'failed') return detectResult;
@@ -218,15 +333,47 @@ async function runDetectAndSearch(args: string[], dryRun: boolean): Promise<Outp
  const imageForSearch = best.croppedImagePath || best.imagePath;
  const searchResult = await runSearch([imageForSearch], dryRun) as SearchResult;

-  let rerankResult: any = undefined;
+  // Post-filter: drop results whose pic_url isn't the same product type as our snapshot
+  let postFilter: any = undefined;
  if (!dryRun && searchResult.status === 'success' && searchResult.searchBody) {
+    const items: SearchItem[] = (searchResult.searchBody as any)?.data?.items?.item ?? [];
+    if (items.length > 0) {
+      try {
+        const client = createSkillClient();
+        const visionConfig = await loadVisionConfig(client);
+        const result = await postFilterByImage(imageForSearch, items, visionConfig, { description: best.description });
+        (searchResult.searchBody as any).data.items.item = result.kept;
+        postFilter = {
+          totalChecked: result.totalChecked,
+          keptCount: result.kept.length,
+          rejectedCount: result.rejected.length,
+          failed: result.failed,
+        };
+      } catch (e: any) {
+        postFilter = { error: e.message };
+      }
+    }
+  }
+
+  let rerankResult: any = undefined;
+  // If post-filter produced focused results, sort them directly by sales — they're already the best matches.
+  // Otherwise fall back to the keyword-intersection rerank.
+  if (!dryRun && postFilter && !postFilter.error && postFilter.keptCount > 0) {
+    const items: SearchItem[] = (searchResult.searchBody as any)?.data?.items?.item ?? [];
+    const sorted = [...items].sort((a, b) => (b.sales ?? 0) - (a.sales ?? 0)).slice(0, 5);
+    rerankResult = {
+      source: 'post-filter',
+      results: sorted,
+      count: sorted.length,
+    };
+  } else if (!dryRun && searchResult.status === 'success' && searchResult.searchBody) {
    const tmpFile = path.join(path.dirname(imageForSearch), `search_body_${Date.now()}.json`);
    try {
      fs.writeFileSync(tmpFile, JSON.stringify(searchResult.searchBody));
      rerankResult = await runRerank([
        `--image-results=${tmpFile}`,
        `--description=${best.description}`,
-        '--top=10',
+        '--top=5',
      ], dryRun);
    } catch (e: any) {
      rerankResult = { error: e.message };
@@ -241,6 +388,7 @@ async function runDetectAndSearch(args: string[], dryRun: boolean): Promise<Outp
    searchHttpStatus: searchResult.searchHttpStatus,
    searchBody: searchResult.searchBody,
    searchError: searchResult.error,
+    postFilter,
    rerank: rerankResult,
  } as any;
 }
@@ -269,7 +417,25 @@ function getFlag(args: string[], flag: string): string | undefined {
 }

 function createVisionModel(config: VisionConfig) {
-  const openai = createOpenAI({ apiKey: config.apiKey, baseURL: config.baseURL });
+  const sessionId = config.sessionId || '';
+  const originFetch = globalThis.fetch;
+  // Inject metadata.session_id into request body so LiteLLM → Langfuse creates sessions
+  const wrapped = async (input: RequestInfo | URL, init?: RequestInit) => {
+    if (init?.body && typeof init.body === 'string') {
+      try {
+        const body = JSON.parse(init.body);
+        if (!body.metadata) body.metadata = {};
+        if (!body.metadata.session_id) body.metadata.session_id = sessionId;
+        body.metadata.tags = ['skill:video-product-snapshot'];
+        init = { ...init, body: JSON.stringify(body) };
+      } catch {}
+    }
+    return originFetch(input, init);
+  };
+  const openai = createOpenAI({
+    apiKey: config.apiKey, baseURL: config.baseURL,
+    fetch: wrapped as typeof globalThis.fetch,
+  });
  return openai(config.model);
 }

@@ -324,9 +490,10 @@ function extractKeywordsFromTitles(items: SearchItem[], topN = 5): string {

 async function runRerank(args: string[], dryRun: boolean): Promise<OutputResult> {
  // --image-results=<path> --keyword=<text> --top=<n>
-  const imageResultsArg = getFlag(args, '--image-results') || args[0];
-  const keywordArg = getFlag(args, '--keyword') || args[1];
-  const topN = parseInt(getFlag(args, '--top') || '10', 10);
+  const positionals = args.filter((a) => !a.startsWith('--'));
+  const imageResultsArg = getFlag(args, '--image-results') || positionals[0];
+  const keywordArg = getFlag(args, '--keyword') || positionals[1];
+  const topN = parseInt(getFlag(args, '--top') || '5', 10);

  const description = getFlag(args, '--description') || '';

@@ -401,7 +568,3 @@ async function runRerank(args: string[], dryRun: boolean): Promise<OutputResult>
    results: sorted,
  } as any;
 }
-
-function parseJsonSafe(text: string): unknown {
-  try { return JSON.parse(text); } catch { return text; }
-}
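
Review note on the fetch wrapper: the same wrapper that injects `metadata.session_id` and `metadata.tags` appears in both `src/index.ts` (`createVisionModel`) and `src/post-filter.ts` (`createModel`). One possible refactor, sketched here under assumptions, is to split the body rewrite into a pure function that can be unit-tested without network access. `injectSessionMetadata` and `withSessionMetadata` are hypothetical names, not part of the commit.

```typescript
// Pure body rewrite: returns the original string untouched when it is not a JSON object.
function injectSessionMetadata(rawBody: string, sessionId: string, tag: string): string {
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawBody);
  } catch {
    return rawBody; // non-JSON bodies pass through unchanged, as in the diff
  }
  if (parsed === null || typeof parsed !== 'object' || Array.isArray(parsed)) return rawBody;

  const record = parsed as Record<string, unknown>;
  const existing = record.metadata;
  const metadata: Record<string, unknown> =
    existing !== null && typeof existing === 'object' && !Array.isArray(existing)
      ? { ...(existing as Record<string, unknown>) }
      : {};
  // Keep a caller-supplied session_id; only fill it in when absent.
  if (!metadata.session_id) metadata.session_id = sessionId;
  // Tags are overwritten, matching the behavior in the diff.
  metadata.tags = [tag];
  return JSON.stringify({ ...record, metadata });
}

// Wraps any fetch implementation so string bodies get the metadata injected.
function withSessionMetadata(baseFetch: typeof fetch, sessionId: string, tag: string): typeof fetch {
  const wrapped = async (...args: Parameters<typeof fetch>) => {
    const [input, init] = args;
    if (init && typeof init.body === 'string') {
      return baseFetch(input, { ...init, body: injectSessionMetadata(init.body, sessionId, tag) });
    }
    return baseFetch(input, init);
  };
  return wrapped as typeof fetch;
}
```

Both model factories could then pass `withSessionMetadata(globalThis.fetch, config.sessionId ?? '', 'skill:video-product-snapshot')` as the `fetch` option of `createOpenAI`, which is the option the diff already uses.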
--- a/src/post-filter.ts
+++ b/src/post-filter.ts
@@ -0,0 +1,123 @@
+import { generateText } from 'ai';
+import { createOpenAI } from '@ai-sdk/openai';
+import type { SearchItem } from './types.ts';
+import type { VisionConfig } from './index.ts';
+import { imageToBase64 } from './frame-extractor.ts';
+
+export interface PostFilterResult {
+  kept: SearchItem[];
+  rejected: SearchItem[];
+  totalChecked: number;
+  failed: boolean;
+}
+
+const FILTER_PROMPT = (count: number, description?: string) => {
+  const productLine = description
+    ? `查询商品是：${description}`
+    : '第1张图是查询商品。';
+  return `${productLine}
+后面的 ${count} 张图是搜索结果。
+
+任务：判断每张候选图是否与查询商品是**完全相同的具体产品类型**。
+- 必须是同一个具体产品（例如：查询是"鞋架"，候选必须也是鞋架；不是其他类型的架子如纸巾架、首饰架、收纳盒）
+- 颜色、材质、款式、尺寸不同但同一具体类型 → 算同类
+- 用途不同就不算同类（例如：查询是鞋架 vs 候选是纸巾架 → 不算；查询是鞋架 vs 候选是床下收纳箱 → 不算，除非明确是鞋类收纳）
+- 关键判断：候选商品的主要用途是否与查询商品一致
+
+按候选图顺序输出每一张的判断，每行一个，格式严格遵守：
+1: YES
+2: NO
+3: YES
+...
+
+只输出 ${count} 行结果，不要解释，不要前后空行。`;
+};
+
+function createModel(config: VisionConfig) {
+  const sessionId = config.sessionId || '';
+  const originFetch = globalThis.fetch;
+  const wrapped = async (input: RequestInfo | URL, init?: RequestInit) => {
+    if (init?.body && typeof init.body === 'string') {
+      try {
+        const body = JSON.parse(init.body);
+        if (!body.metadata) body.metadata = {};
+        if (!body.metadata.session_id) body.metadata.session_id = sessionId;
+        body.metadata.tags = ['skill:video-product-snapshot'];
+        init = { ...init, body: JSON.stringify(body) };
+      } catch {}
+    }
+    return originFetch(input, init);
+  };
+  const provider = createOpenAI({
+    apiKey: config.apiKey, baseURL: config.baseURL,
+    fetch: wrapped as typeof globalThis.fetch,
+  });
+  return provider(config.model);
+}
+
+async function classifyBatch(
+  model: ReturnType<ReturnType<typeof createOpenAI>>,
+  queryImageDataUrl: string,
+  batch: SearchItem[],
+  description?: string,
+): Promise<boolean[]> {
+  const content: any[] = [{ type: 'image', image: queryImageDataUrl }];
+  for (const item of batch) {
+    content.push({ type: 'image', image: item.pic_url });
+  }
+  content.push({ type: 'text', text: FILTER_PROMPT(batch.length, description) });
+
+  const { text } = await generateText({
+    model,
+    messages: [{ role: 'user', content }],
+    maxTokens: 200,
+  });
+
+  const flags = batch.map(() => false);
+  for (const line of text.split('\n')) {
+    const m = line.match(/^\s*(\d+)\s*[:：]\s*(YES|NO|是|否)/i);
+    if (!m) continue;
+    const idx = parseInt(m[1], 10) - 1;
+    const yes = /YES|是/i.test(m[2]);
+    if (idx >= 0 && idx < flags.length) flags[idx] = yes;
+  }
+  return flags;
+}
+
+export async function postFilterByImage(
+  queryImagePath: string,
+  items: SearchItem[],
+  visionConfig: VisionConfig,
+  options: { description?: string; batchSize?: number } = {},
+): Promise<PostFilterResult> {
+  if (items.length === 0) {
+    return { kept: [], rejected: [], totalChecked: 0, failed: false };
+  }
+
+  const batchSize = options.batchSize ?? 10;
+  const description = options.description;
+
+  const model = createModel(visionConfig);
+  const queryDataUrl = `data:image/jpeg;base64,${imageToBase64(queryImagePath)}`;
+
+  const kept: SearchItem[] = [];
+  const rejected: SearchItem[] = [];
+  let anyFailed = false;
+
+  for (let i = 0; i < items.length; i += batchSize) {
+    const batch = items.slice(i, i + batchSize);
+    try {
+      const flags = await classifyBatch(model, queryDataUrl, batch, description);
+      batch.forEach((item, idx) => {
+        if (flags[idx]) kept.push(item);
+        else rejected.push(item);
+      });
+    } catch {
+      // On batch failure, keep items (don't lose them) but flag the run as partial
+      anyFailed = true;
+      kept.push(...batch);
+    }
+  }
+
+  return { kept, rejected, totalChecked: items.length, failed: anyFailed };
+}
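
Review note on the verdict parsing: `classifyBatch` defaults every flag to `false`, so a candidate the model skips or formats badly is rejected, whereas a thrown batch error keeps the whole batch. Extracting the parsing into a pure function makes that asymmetry easy to test. The sketch below mirrors the regex from the diff; `parseVerdicts` is a hypothetical name, and the stricter anchored test for the YES/是 token is a small deviation from the original `/YES|是/i`.

```typescript
// Parses lines such as "1: YES" or "3：否" into a boolean per candidate.
// Anything missing, malformed, or out of range stays false (rejected).
function parseVerdicts(text: string, count: number): boolean[] {
  const flags: boolean[] = new Array(count).fill(false);
  for (const line of text.split('\n')) {
    // Accept ASCII and full-width colons, and English or Chinese answers.
    const m = line.match(/^\s*(\d+)\s*[:：]\s*(YES|NO|是|否)/i);
    if (!m) continue;
    const idx = parseInt(m[1], 10) - 1; // the model numbers candidates from 1
    if (idx >= 0 && idx < count) flags[idx] = /^(YES|是)$/i.test(m[2]);
  }
  return flags;
}
```

For example, `parseVerdicts('1: YES\n2: NO\n 3：是\n9: YES', 4)` marks candidates 1 and 3 as kept, ignores the out-of-range line 9, and leaves candidate 4 rejected because the model never mentioned it.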
--- a/src/product-detector.ts
+++ b/src/product-detector.ts
@@ -1,4 +1,4 @@
-import { generateObject } from 'ai';
+import { generateObject, generateText } from 'ai';
 import { createOpenAI } from '@ai-sdk/openai';
 import { z } from 'zod';
 import type { ExtractedFrame } from './frame-extractor.ts';
@@ -28,24 +28,74 @@ Discard (keep=false) if: only hands/texture/contents visible, motion blur, black

 reason options: product_visible | content_only | hands_only | blur | transition | background_only`;

-const RANKING_PROMPT = (count: number) => `You are selecting the single best product image from ${count} video frames for ecommerce image search.
+const CONTAINER_CHECK_PROMPT = `Is the main product in this image a CONTAINER, RACK, or HOLDER (something designed to store/hold other items)?
+Examples YES: shoe rack, shelf, storage box, organizer, basket, drawer, wardrobe, trolley, bin, tray, cabinet.
+Examples NO: shoes, clothing, electronics, food, toys, cosmetics, tools.
+Reply with only one word: YES or NO.`;

-The frames are numbered 0 to ${count - 1} in the order shown.
+const RANKING_PROMPT_CONTAINER = (count: number) => `You are selecting ONE frame from ${count} video frames to use as the query image for an ecommerce reverse-image search.

-Pick the ONE frame where the HERO PRODUCT is:
-1. Cleanest — fewest distractions, no hands blocking it, no clutter in foreground
-2. Most complete — full product silhouette visible, no edges cropped
-3. Most isolated — product stands out from background clearly
-4. Empty/minimal load preferred — a product without contents (e.g. an empty rack) beats one stuffed with items if both show the full structure equally
+The hero product is a CONTAINER / RACK / HOLDER / ORGANIZER.
+
+CRITICAL CONSTRAINT — read this first:
+Image search engines identify objects by visual appearance. If the container holds items (shoes, clothes, etc.), the search engine will match those ITEMS, not the container — returning completely wrong products.
+
+YOUR ONLY JOB: find the frame where the container structure itself is most visible with the FEWEST or NO items inside.
+
+ABSOLUTE PRIORITY ORDER (do not deviate):
+1. Frame with container completely EMPTY — highest priority regardless of angle or assembly state
+2. Frame with container partially assembled or partially visible but EMPTY — still better than any loaded frame
+3. Frame with fewest items inside (1-2 items, mostly empty)
+4. Frame with moderate load — only if no emptier option exists
+5. Frame fully loaded — last resort only if no other frames exist
+
+A frame showing the rack mid-assembly with zero items is ALWAYS better than a perfectly-lit fully-assembled rack filled with shoes.
+
+Frames are numbered 0 to ${count - 1} in order shown. You MUST pick ONE.

 Return:
- bestFrameIndex: 0-based index of chosen frame
- description: concise search query under 12 words (product type + material + color + key feature)
- reasoning: one sentence explaining why this frame was chosen
- boundingBox: tight bounding box of the HERO PRODUCT ONLY in the chosen frame as [x1, y1, x2, y2] normalized 0.0–1.0 (top-left origin). Exclude hands, background, and unrelated objects. The product is assumed to be near the center.`;
+- bestFrameIndex: 0-based index of the emptiest container frame
+- description: concise Chinese search query ≤12 words (container type + material + color + key feature)
+- reasoning: describe how many items are visible inside the chosen frame and why it's the emptiest option
+- boundingBox: tight box of the PRODUCT STRUCTURE ONLY as [x1, y1, x2, y2] normalized 0.0–1.0. Exclude any items stored inside.`;
+
+const RANKING_PROMPT_GENERAL = (count: number) => `You are selecting the single best product frame from ${count} video frames for ecommerce search.
+
+Frames are numbered 0 to ${count - 1} in order shown.
+
+IMPORTANT: You MUST pick ONE frame — even if product visibility is imperfect or no frame looks ideal. Always make your best guess.
+
+Pick the frame where the MAIN SELLING PRODUCT is:
+1. Most recognizable — clearest view of the item being sold
+2. Most complete — full product silhouette visible, not cropped at edges
+3. Cleanest — minimal obstruction (hands, clutter, motion blur, labels)
+4. Best lit and in focus
+
+Return:
+- bestFrameIndex: 0-based index
+- description: concise search query under 12 words (product type + material + color + key features), in Chinese
+- reasoning: one sentence explaining choice
+- boundingBox: tight box of the PRODUCT ONLY as [x1, y1, x2, y2] normalized 0.0–1.0, top-left origin. Exclude hands, background, and unrelated objects. The product is near the center of the frame.`;

 function createVisionModel(config: VisionConfig) {
-  const provider = createOpenAI({ apiKey: config.apiKey, baseURL: config.baseURL });
+  const sessionId = config.sessionId || '';
+  const originFetch = globalThis.fetch;
+  const wrapped = async (input: RequestInfo | URL, init?: RequestInit) => {
+    if (init?.body && typeof init.body === 'string') {
+      try {
+        const body = JSON.parse(init.body);
+        if (!body.metadata) body.metadata = {};
+        if (!body.metadata.session_id) body.metadata.session_id = sessionId;
+        body.metadata.tags = ['skill:video-product-snapshot'];
+        init = { ...init, body: JSON.stringify(body) };
+      } catch {}
+    }
+    return originFetch(input, init);
+  };
+  const provider = createOpenAI({
+    apiKey: config.apiKey, baseURL: config.baseURL,
+    fetch: wrapped as typeof globalThis.fetch,
+  });
  return provider(config.model);
 }
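
The body rewriting done by the wrapped `fetch` above can be checked in isolation. The following is a minimal sketch, not the repository's code: `injectSessionMetadata`, `withSessionMetadata`, and the narrowed `FetchLike` type are hypothetical names introduced for illustration. Only the `metadata.session_id` / `metadata.tags` shape and the "leave non-JSON bodies untouched" behavior are taken from the diff.

```typescript
// Hypothetical names; only the metadata shape comes from the diff above.
type FetchLike = (url: string, init?: { body?: unknown }) => Promise<unknown>;

const SKILL_TAG = 'skill:video-product-snapshot';

// Returns the JSON body with metadata.session_id (if absent) and metadata.tags set.
// Anything that is not a JSON object is returned unchanged.
function injectSessionMetadata(rawBody: string, sessionId: string): string {
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawBody);
  } catch {
    return rawBody; // not JSON: pass through untouched
  }
  if (typeof parsed !== 'object' || parsed === null || Array.isArray(parsed)) {
    return rawBody;
  }
  const record = parsed as Record<string, unknown>;
  const existing = record.metadata;
  const metadata: Record<string, unknown> =
    typeof existing === 'object' && existing !== null
      ? { ...(existing as Record<string, unknown>) }
      : {};
  // Keep a caller-provided session_id; fill it in only when missing.
  if (!metadata.session_id) metadata.session_id = sessionId;
  // Tags are overwritten, matching the diff.
  metadata.tags = [SKILL_TAG];
  return JSON.stringify({ ...record, metadata });
}

// Wraps any fetch-like function so that string bodies get the metadata injected.
function withSessionMetadata(baseFetch: FetchLike, sessionId: string): FetchLike {
  return (url, init) => {
    if (init && typeof init.body === 'string') {
      return baseFetch(url, { ...init, body: injectSessionMetadata(init.body, sessionId) });
    }
    return baseFetch(url, init);
  };
}
```

Keeping the rewrite as a pure string-to-string function makes it testable without a network call; the wrapper only decides whether a body is eligible.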

@ -70,15 +120,52 @@ async function filterFrame(
  return object.keep;
 }

+
+async function isContainerProduct(
+  firstFrame: ExtractedFrame,
+  model: ReturnType<ReturnType<typeof createOpenAI>>,
+): Promise<boolean> {
+  try {
+    const { text } = await generateText({
+      model,
+      messages: [{
+        role: 'user',
+        content: [
+          { type: 'image', image: `data:image/jpeg;base64,${imageToBase64(firstFrame.imagePath)}` },
+          { type: 'text', text: CONTAINER_CHECK_PROMPT },
+        ],
+      }],
+      maxTokens: 5,
+    });
+    return text.trim().toUpperCase().startsWith('Y');
+  } catch {
+    return false;
+  }
+}
+
+
+function takeEarliestFrames(candidates: ExtractedFrame[], fraction: number = 0.4): ExtractedFrame[] {
+  // Heuristic: ecommerce videos usually show the container empty (unboxing/assembly) early, then loaded.
+  // Keeping the earliest `fraction` of frames (40% by default) biases ranking toward empty states.
+  const sorted = [...candidates].sort((a, b) => a.frameIndex - b.frameIndex);
+  const cutoff = Math.max(1, Math.ceil(sorted.length * fraction));
+  return sorted.slice(0, cutoff);
+}
+
 async function rankCandidates(
  candidates: ExtractedFrame[],
  model: ReturnType<ReturnType<typeof createOpenAI>>,
+  isContainer: boolean,
 ): Promise<{ bestFrame: ExtractedFrame; description: string; reasoning: string; boundingBox: [number, number, number, number] }> {
  const imageContent = candidates.map((f) => ({
    type: 'image' as const,
    image: `data:image/jpeg;base64,${imageToBase64(f.imagePath)}`,
  }));

+  const prompt = isContainer
+    ? RANKING_PROMPT_CONTAINER(candidates.length)
+    : RANKING_PROMPT_GENERAL(candidates.length);
+
  const { object } = await generateObject({
    model,
    schema: RankingSchema,
@ -87,7 +174,7 @@ async function rankCandidates(
      role: 'user',
      content: [
        ...imageContent,
-        { type: 'text', text: RANKING_PROMPT(candidates.length) },
+        { type: 'text', text: prompt },
      ],
    }],
  });
@ -114,7 +201,17 @@ export async function cropProduct(

  let [x1, y1, x2, y2] = boundingBox;

-  // add padding
+  // Normalize coords: ensure x1<x2 and y1<y2
+  if (x1 > x2) [x1, x2] = [x2, x1];
+  if (y1 > y2) [y1, y2] = [y2, y1];
+
+  // Clamp to [0, 1]
+  x1 = Math.max(0, Math.min(1, x1));
+  y1 = Math.max(0, Math.min(1, y1));
+  x2 = Math.max(0, Math.min(1, x2));
+  y2 = Math.max(0, Math.min(1, y2));
+
+  // Add padding
  const pw = (x2 - x1) * paddingFactor;
  const ph = (y2 - y1) * paddingFactor;
  x1 = Math.max(0, x1 - pw);
@ -122,6 +219,11 @@ export async function cropProduct(
  x2 = Math.min(1, x2 + pw);
  y2 = Math.min(1, y2 + ph);

+  // Validate minimum width and height
+  if (x2 - x1 < 0.005 || y2 - y1 < 0.005) {
+    throw new Error('bounding box too small after normalization');
+  }
+
  const left = Math.round(x1 * W);
  const top = Math.round(y1 * H);
  const width = Math.round((x2 - x1) * W);
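
The normalization order in this hunk (swap, clamp, pad, validate) can be expressed as a pure function. This is a sketch under stated assumptions: `normalizeBox` and `toPixelRect` are hypothetical helpers, the `paddingFactor` default of 0.05 is an assumption (the real default is not visible in the diff), and only the 0.005 minimum side comes from the source.

```typescript
type Box = [number, number, number, number];

const clamp01 = (v: number): number => Math.max(0, Math.min(1, v));

// paddingFactor default is an assumption; minSide = 0.005 is taken from the diff.
function normalizeBox(box: Box, paddingFactor = 0.05, minSide = 0.005): Box {
  let [x1, y1, x2, y2] = box;

  // 1. Order the corners so that x1 <= x2 and y1 <= y2.
  if (x1 > x2) [x1, x2] = [x2, x1];
  if (y1 > y2) [y1, y2] = [y2, y1];

  // 2. Clamp to the unit square.
  x1 = clamp01(x1);
  y1 = clamp01(y1);
  x2 = clamp01(x2);
  y2 = clamp01(y2);

  // 3. Pad proportionally to the box size, staying inside [0, 1].
  const pw = (x2 - x1) * paddingFactor;
  const ph = (y2 - y1) * paddingFactor;
  x1 = Math.max(0, x1 - pw);
  y1 = Math.max(0, y1 - ph);
  x2 = Math.min(1, x2 + pw);
  y2 = Math.min(1, y2 + ph);

  // 4. Reject degenerate boxes (per-side check, as in the diff).
  if (x2 - x1 < minSide || y2 - y1 < minSide) {
    throw new Error('bounding box too small after normalization');
  }
  return [x1, y1, x2, y2];
}

// Converts a normalized box to a pixel rectangle for a W x H image.
// Width and height are derived from rounded edges so the rectangle cannot exceed the image.
function toPixelRect(box: Box, W: number, H: number) {
  const [x1, y1, x2, y2] = box;
  const left = Math.round(x1 * W);
  const top = Math.round(y1 * H);
  const right = Math.round(x2 * W);
  const bottom = Math.round(y2 * H);
  return { left, top, width: right - left, height: bottom - top };
}
```

Deriving `width` from the rounded right edge, rather than rounding `(x2 - x1) * W` separately, is a design choice of this sketch: it keeps `left + width` within `W` even when both roundings go up.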
@ -151,38 +253,140 @@ async function withConcurrency<T>(
  return results;
 }

+// ── Frame quality pre-filtering ──────────────────────────────────────
+
+interface FrameQuality {
+  valid: boolean;
+  meanBrightness: number;
+  variance: number;
+}
+
+async function assessFrameQuality(imagePath: string): Promise<FrameQuality> {
+  const sharp = (await import('sharp')).default;
+  const { data, info } = await sharp(imagePath)
+    .grayscale()
+    .raw()
+    .toBuffer({ resolveWithObject: true });
+
+  const pixels = new Uint8Array(data);
+  let sum = 0;
+  let sumSq = 0;
+  for (let i = 0; i < pixels.length; i++) {
+    sum += pixels[i];
+    sumSq += pixels[i] * pixels[i];
+  }
+  const mean = sum / pixels.length;
+  const variance = sumSq / pixels.length - mean * mean;
+
+  // Skip near-black, near-white, or very low variance (blurry/blank/transition)
+  const valid = mean > 15 && mean < 240 && variance > 50;
+  return { valid, meanBrightness: mean, variance };
+}
+
+async function filterQualityFrames(frames: ExtractedFrame[]): Promise<ExtractedFrame[]> {
+  const results = await Promise.all(
+    frames.map(async (frame) => {
+      try {
+        const q = await assessFrameQuality(frame.imagePath);
+        return { frame, valid: q.valid };
+      } catch {
+        return { frame, valid: true };
+      }
+    }),
+  );
+  const valid = results.filter(r => r.valid).map(r => r.frame);
+  return valid.length > 0 ? valid : frames;
+}
+
+function isValidBoundingBox(bbox: [number, number, number, number]): boolean {
+  const [x1, y1, x2, y2] = bbox;
+  return (
+    x1 >= 0 && x1 <= 1 &&
+    y1 >= 0 && y1 <= 1 &&
+    x2 >= 0 && x2 <= 1 &&
+    y2 >= 0 && y2 <= 1 &&
+    x1 < x2 &&
+    y1 < y2 &&
+    (x2 - x1) * (y2 - y1) > 0.005
+  );
+}
+
 // Skips Pass 1 filter entirely — ranks all frames and always returns the best one.
 // Evenly samples down to maxCandidates when there are too many frames.
 export async function detectBestFrame(
  frames: ExtractedFrame[],
-  concurrency: number = 10,
  visionConfig: VisionConfig,
  maxCandidates: number = 20,
 ): Promise<ProductFrame | null> {
  if (frames.length === 0) return null;

-  const model = createVisionModel(visionConfig);
+  // 1. Filter out obviously bad frames (black, white, blurry)
+  let candidates = await filterQualityFrames(frames);

-  let candidates = frames;
-  if (frames.length > maxCandidates) {
-    const step = frames.length / maxCandidates;
-    candidates = Array.from({ length: maxCandidates }, (_, i) => frames[Math.floor(i * step)]);
+  // 2. Sample if too many
+  if (candidates.length > maxCandidates) {
+    const step = candidates.length / maxCandidates;
+    candidates = Array.from({ length: maxCandidates }, (_, i) => candidates[Math.floor(i * step)]);
  }

-  const { bestFrame, description, reasoning, boundingBox } = await rankCandidates(candidates, model);
+  const model = createVisionModel(visionConfig);

+  // 3. Check if product is a container/rack type (use first candidate frame)
+  const container = await isContainerProduct(candidates[0], model);
+
+  // 4. For containers: restrict ranking to earliest frames (empty/unboxing phase)
+  if (container) {
+    const early = takeEarliestFrames(candidates);
+    if (early.length > 0) candidates = early;
+  }
+
+  // 5. Try Vision ranking with error isolation
+  try {
+    const { bestFrame, description, reasoning, boundingBox } = await rankCandidates(candidates, model, container);
+
+    if (isValidBoundingBox(boundingBox)) {
      const croppedPath = bestFrame.imagePath.replace(/\.jpg$/, '_cropped.jpg');
+      try {
        await cropProduct(bestFrame.imagePath, boundingBox, croppedPath);
-
+      } catch {
+        // cropping is optional — keep original frame
+      }
      return {
        frameIndex: bestFrame.frameIndex,
        timestampSeconds: bestFrame.timestampSeconds,
        imagePath: bestFrame.imagePath,
-    croppedImagePath: croppedPath,
+        ...(croppedPath ? { croppedImagePath: croppedPath } : {}),
        confidence: 0.95,
        description,
        boundingHint: reasoning,
      };
+    }
+  } catch {
+    // Vision ranking failed — fall through to fallback
+  }
+
+  // 6. Fallback: rank by frame quality (variance) and return the sharpest
+  const withQuality = await Promise.all(
+    candidates.map(async (f) => {
+      try {
+        const q = await assessFrameQuality(f.imagePath);
+        return { frame: f, score: q.variance };
+      } catch {
+        return { frame: f, score: 0 };
+      }
+    }),
+  );
+  withQuality.sort((a, b) => b.score - a.score);
+  const best = withQuality[0].frame;
+
+  return {
+    frameIndex: best.frameIndex,
+    timestampSeconds: best.timestampSeconds,
+    imagePath: best.imagePath,
+    confidence: 0.5,
+    description: 'product frame (auto-selected)',
+    boundingHint: 'picked by frame quality analysis (Vision ranking failed)',
+  };
 }

 export async function detectProductFrames(
@ -203,18 +407,33 @@ export async function detectProductFrames(
  if (candidates.length === 0) return [];

  // Pass 2: single comparative call — model sees all candidates at once
-  const { bestFrame, description, reasoning, boundingBox } = await rankCandidates(candidates, model);
+  const container = await isContainerProduct(candidates[0], model);
+  let bestSnapshot: ProductFrame | undefined;
+  try {
+    const { bestFrame, description, reasoning, boundingBox } = await rankCandidates(candidates, model, container);

+    if (isValidBoundingBox(boundingBox)) {
      const croppedPath = bestFrame.imagePath.replace(/\.jpg$/, '_cropped.jpg');
+      try {
        await cropProduct(bestFrame.imagePath, boundingBox, croppedPath);
-
-  return [{
+      } catch {}
+      bestSnapshot = {
        frameIndex: bestFrame.frameIndex,
        timestampSeconds: bestFrame.timestampSeconds,
        imagePath: bestFrame.imagePath,
-    croppedImagePath: croppedPath,
+        ...(croppedPath ? { croppedImagePath: croppedPath } : {}),
        confidence: 0.95,
        description,
        boundingHint: reasoning,
-  }];
+      };
+    }
+  } catch {
+    // ranking failed
+  }
+
+  if (!bestSnapshot) {
+    return [];
+  }
+
+  return [bestSnapshot];
 }
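
The quality pre-filter in `assessFrameQuality` reduces to mean and variance over grayscale pixels. The sketch below isolates that arithmetic so it can run without `sharp`; `grayStats` and `pickHighestScore` are hypothetical names, while the thresholds (15, 240, 50) are copied from the diff and should be read as heuristics.

```typescript
interface GrayStats {
  mean: number;
  variance: number;
  valid: boolean;
}

// Thresholds copied from the diff; they are heuristics, not calibrated values.
const MIN_MEAN = 15;
const MAX_MEAN = 240;
const MIN_VARIANCE = 50;

// Computes mean and population variance (E[x^2] - E[x]^2) of 8-bit grayscale pixels.
function grayStats(pixels: Uint8Array): GrayStats {
  if (pixels.length === 0) return { mean: 0, variance: 0, valid: false };
  let sum = 0;
  let sumSq = 0;
  for (let i = 0; i < pixels.length; i++) {
    sum += pixels[i];
    sumSq += pixels[i] * pixels[i];
  }
  const mean = sum / pixels.length;
  // Guard against tiny negative values from floating-point rounding.
  const variance = Math.max(0, sumSq / pixels.length - mean * mean);
  const valid = mean > MIN_MEAN && mean < MAX_MEAN && variance > MIN_VARIANCE;
  return { mean, variance, valid };
}

// Fallback selection: returns the item with the highest score, or undefined for an empty list.
function pickHighestScore<T>(items: T[], score: (item: T) => number): T | undefined {
  let best: T | undefined;
  let bestScore = -Infinity;
  for (const item of items) {
    const s = score(item);
    if (s > bestScore) {
      bestScore = s;
      best = item;
    }
  }
  return best;
}
```

Global variance measures overall contrast, so it is a coarse proxy for sharpness: a blank or uniformly dark frame scores low, but a blurry frame with strong contrast can still pass.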
--- a/src/types.ts
+++ b/src/types.ts
@ -1,4 +1,13 @@
-export type Command = 'detect' | 'search' | 'detect-and-search' | 'detect-best' | 'detect-best-and-search' | 'rerank' | 'session';
+export type Command =
+  | 'detect'
+  | 'search'
+  | 'detect-and-search'
+  | 'detect-best'
+  | 'detect-best-and-search'
+  | 'detect-video'
+  | 'detect-video-and-search'
+  | 'rerank'
+  | 'session';

 export interface SearchItem {
  num_iid: number;
@ -11,6 +20,30 @@ export interface SearchItem {
  detail_url: string;
 }

+export interface DetectVideoResult {
+  status: 'success' | 'failed';
+  command: 'detect-video';
+  dryRun: boolean;
+  videoPath?: string;
+  videoUrl?: string | null;
+  description?: string;
+  keyword?: string;
+  snapshotImagePath?: string;
+  error?: string;
+}
+
+export interface DetectVideoAndSearchResult {
+  status: 'success' | 'failed';
+  command: 'detect-video-and-search';
+  dryRun: boolean;
+  videoPath?: string;
+  videoUrl?: string | null;
+  description?: string;
+  keyword?: string;
+  searchResults?: SearchItem[];
+  error?: string;
+}
+
 export interface DetectOptions {
  videoPath: string;
  intervalSeconds: number;
@ -51,4 +84,4 @@ export interface SearchResult {
  error?: string;
 }

-export type OutputResult = DetectResult | SearchResult;
+export type OutputResult = DetectResult | SearchResult | DetectVideoResult | DetectVideoAndSearchResult;
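
Because each new result interface carries a literal `command` field, callers can narrow the union with a `switch`. The sketch below redeclares only the two interfaces added in this diff, with `SearchItem` trimmed to the fields visible above; `VideoResult` and `summarize` are hypothetical names, and the `command` values of `DetectResult` and `SearchResult` are not shown in the diff, so they are left out.

```typescript
// Trimmed to fields visible in the diff; the real interface has more.
interface SearchItem {
  num_iid: number;
  detail_url: string;
}

interface DetectVideoResult {
  status: 'success' | 'failed';
  command: 'detect-video';
  dryRun: boolean;
  snapshotImagePath?: string;
  error?: string;
}

interface DetectVideoAndSearchResult {
  status: 'success' | 'failed';
  command: 'detect-video-and-search';
  dryRun: boolean;
  searchResults?: SearchItem[];
  error?: string;
}

// Hypothetical sub-union used only for this example.
type VideoResult = DetectVideoResult | DetectVideoAndSearchResult;

// Produces a one-line summary; the `command` literal selects the branch.
function summarize(result: VideoResult): string {
  if (result.status === 'failed') {
    return `${result.command} failed: ${result.error ?? 'unknown error'}`;
  }
  switch (result.command) {
    case 'detect-video':
      return `snapshot: ${result.snapshotImagePath ?? '(none)'}`;
    case 'detect-video-and-search':
      return `${result.searchResults?.length ?? 0} search results`;
    default: {
      // Compile-time exhaustiveness check: fails to type-check if a command is unhandled.
      const unreachable: never = result;
      return unreachable;
    }
  }
}
```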
Author	SHA1	Message	Date
ywkj	c5e1d0c88c	fix: rerank top-N 10→5 to match the Feishu list display	2026-04-26 20:34:57 +08:00
ywkj	eb8e7a7daf	fix: sync auth-cli.ts and add the clientConfig() method	2026-04-26 20:15:08 +08:00
ywkj	b6dc7af9bf	chore: tweak README	2026-04-26 20:08:10 +08:00
ywkj	7a43eb391b	docs: update README to reflect the current architecture - document the detect-best-and-search, detect-best, and rerank commands - update the auth architecture notes (unified auth via auth-rt) - document sessionId and Langfuse tracing - update the environment variable table Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-26 19:57:03 +08:00
ywkj	d497e92626	fix: inject metadata.session_id via a fetch wrapper instead of an HTTP header. LiteLLM does not process the x-langfuse-session-id header, so a fetch interceptor now injects session_id into the request body's metadata; LiteLLM passes it straight through to Langfuse, which creates the session. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-26 19:45:08 +08:00
ywkj	a6b4f99a83	refactor: session ID is now generated only by auth-cli.ts - scripts/run.ts: remove the duplicate session ID auto-generation logic - src/auth-cli.ts: synced from auth-runtime (module-level SKILL_SESSION_ID) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-26 19:01:10 +08:00
ywkj	a95f9045e5	feat: structured session ID + pass-through in output - format: vps-YYYYMMDD-HHMMSS-xxxx (vps = video-product-snapshot) - priority: --session-id CLI > SKILL_SESSION_ID env > auto-generated - sessionId is written to the stdout JSON and reported to telemetry as well Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-26 18:44:01 +08:00
ywkj	c7e14ca396	feat: unified auth cleanup + Langfuse session tracing - src/index.ts: add sessionId to VisionConfig; createVisionModel injects the x-langfuse-session-id / x-langfuse-tags headers - src/product-detector.ts: createVisionModel injects the same session headers - src/post-filter.ts: createModel injects the same session headers - scripts/run.ts: support the --session-id CLI flag, falling back to auto-generation - remove dead VISION_API_KEY / VISION_API_BASE / ONEBOUND_* code (now supplied centrally by auth-rt client-config) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-26 18:35:55 +08:00
ywkj	6bc4e1d3b4	feat: image-only pipeline with LLM post-filter for category accuracy - Drop video-understanding flow (detect-video, video-analyzer.ts); image search is the only path now since text/video keywords return broad results. - Add container-aware frame selection: detect rack/holder products, restrict ranking to the earliest 40% of frames so empty/unboxing shots win over loaded ones (image search was matching shoes-on-rack instead of the rack). - Switch container check from generateObject (silently fails on this model) to generateText with a YES/NO answer. - Add post-filter step: send the snapshot + each result's pic_url to the vision model in batches, drop results whose category doesn't match the detected product description. Cuts 50 raw hits to ~10 same-type matches. - When post-filter succeeds, sort by sales directly instead of running the keyword-intersection rerank, which was overriding good filtered results with broad keyword fallbacks. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-26 15:01:42 +08:00
ywkj	e9e1f01728	docs: remove frame-extraction workflow from SKILL.md, keep video-direct approach only Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-25 16:53:50 +08:00
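
Commit a95f9045e5 specifies the session ID format `vps-YYYYMMDD-HHMMSS-xxxx` and a three-level precedence (CLI flag, then `SKILL_SESSION_ID`, then auto-generation). A minimal sketch of that scheme follows. The function names are hypothetical, and two details are assumptions because the commit message does not state them: the timestamp is rendered in UTC, and `xxxx` is taken to be four lowercase hex characters.

```typescript
const pad2 = (n: number): string => String(n).padStart(2, '0');

// Formats vps-YYYYMMDD-HHMMSS-xxxx. UTC is an assumption; the source does not name a time zone.
function formatSessionId(now: Date, suffix: string): string {
  const date = `${now.getUTCFullYear()}${pad2(now.getUTCMonth() + 1)}${pad2(now.getUTCDate())}`;
  const time = `${pad2(now.getUTCHours())}${pad2(now.getUTCMinutes())}${pad2(now.getUTCSeconds())}`;
  return `vps-${date}-${time}-${suffix}`;
}

// Four lowercase hex characters (assumed shape of "xxxx"); not cryptographically random.
function randomSuffix(): string {
  return Math.random().toString(16).slice(2, 6).padEnd(4, '0');
}

// Precedence from the commit message: --session-id CLI > SKILL_SESSION_ID env > auto-generated.
// Values are passed in as parameters so the function stays free of process-level state.
function resolveSessionId(
  cliValue: string | undefined,
  envValue: string | undefined,
  now: Date = new Date(),
  suffix: string = randomSuffix(),
): string {
  return cliValue || envValue || formatSessionId(now, suffix);
}
```

A caller would pass the parsed `--session-id` value and the `SKILL_SESSION_ID` environment variable; empty strings fall through to the next level.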
ywkj	db4735e54e	feat: add detect-video command using direct video upload + API analysis register-skill-release / register (push) Successful in 16s Details - New detect-video / detect-video-and-search commands: upload video to get public URL, analyze via LiteLLM (video_url), generate keyword, search 1688 - New src/video-analyzer.ts: upload via direct HTTP (bypasses auth-rt CLI arg length limit), analyze via Chat Completions with video_url content - Frame-based pipeline robustness: quality pre-filtering (skip black/blurry frames), bounding box normalization/validation, crop failure tolerance, Vision ranking fallback to sharpness-based selection - Improve ranking prompt: force pick one frame, Chinese description - Update docs to recommend detect-video-and-search as primary command Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-25 16:30:01 +08:00