Compare commits


No commits in common. "main" and "v0.0.1" have entirely different histories.
main ... v0.0.1

9 changed files with 242 additions and 1040 deletions


@ -1,18 +1,40 @@
# =============================================================================
# video-product-snapshot environment variables
# =============================================================================
#
# Only CLIENT_KEY needs to be configured in ~/.openclaw/.env:
# CLIENT_KEY=sk_xxxxxxxx.xxxxxxxxxxxxxxxxxxxxxxxx
#
# Everything else (vision API key, endpoint URLs, etc.) is fetched automatically via `auth-rt client-config`.
# Copy to .env and fill in real values: cp .env.example .env
# =============================================================================
# Override the Vision model (default comes from client config; falls back to aliyun-cp-multimodal)
# VISION_MODEL=aliyun-cp-multimodal
# -----------------------------------------------------------------------------
# Vision API configuration (used for product-frame detection)
# Works with any OpenAI-compatible endpoint: OpenAI / Groq / Together / local Ollama, etc.
# -----------------------------------------------------------------------------
# Override the auth-rt binary path
# AUTH_RT_BIN=/custom/path/to/auth-rt
# API key (required)
VISION_API_KEY=your-api-key-here
# Telemetry reporting (optional)
# TELEMETRY_ENDPOINT=https://api-gw-test.yuanwei-lnc.com/ecom/tasks/telemetry
# API base URL (optional; leave empty to use the official OpenAI endpoint)
# VISION_API_BASE=https://api.groq.com/openai/v1
# VISION_API_BASE=http://localhost:11434/v1
# Model name (optional, default: gpt-4o-mini)
# VISION_MODEL=gpt-4o-mini
# VISION_MODEL=meta-llama/llama-4-scout-17b-16e-instruct
# VISION_MODEL=llava:13b
# -----------------------------------------------------------------------------
# 1688 image-search configuration (via the local woo-data-scrawler service, port 3202)
# All Onebound calls go through the local proxy; no API key is needed here.
# -----------------------------------------------------------------------------
# Image upload endpoint (uploads a local image to public storage, returns an accessible URL)
ONEBOUND_UPLOAD_ENDPOINT=http://localhost:3202/api/v1/tasks/upload-image
# Search-by-image endpoint
ONEBOUND_SEARCH_ENDPOINT=http://localhost:3202/api/v1/tasks/search-by-image
# Keyword-search endpoint (used for rerank cross-filtering)
ONEBOUND_KEYWORD_SEARCH_ENDPOINT=http://localhost:3202/api/v1/tasks/keyword-search
# -----------------------------------------------------------------------------
# Auth: handled automatically by auth-rt; configured in ~/.openclaw/.env.
# Only CLIENT_KEY=sk_xxx needs to be set there.
# -----------------------------------------------------------------------------

README.md

@ -1,102 +0,0 @@
# video-product-snapshot — Video Product Image Search
Extract the best product frame from a video, then search 1688 by image to find matching listings.
## How It Works
1. `ffmpeg` extracts frames at 0.5s intervals (up to 60 frames)
2. Visual-quality pre-filter (brightness/variance checks drop blurry frames)
3. Container/rack product detection → automatically pick an unloaded (empty) frame
4. The vision model compares and ranks frames to pick the best product frame
5. Crop the product region → upload → 1688 image search
6. Post-filter (vision model checks whether each result is the same product) → rerank
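The sampling math in step 1 can be sketched as follows; `sampleTimestamps` is a hypothetical helper for illustration, not a function from this repo:

```typescript
// Sketch of step 1's sampling: one frame every `interval` seconds,
// capped at `maxFrames` regardless of video length.
function sampleTimestamps(durationSec: number, interval = 0.5, maxFrames = 60): number[] {
  const timestamps: number[] = [];
  for (let t = 0; t < durationSec && timestamps.length < maxFrames; t += interval) {
    timestamps.push(Number(t.toFixed(3)));
  }
  return timestamps;
}
```

A 10-second clip yields 20 candidate frames at the default interval; anything longer than 30 seconds hits the 60-frame cap.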
## Installation
```bash
./install.sh # installs auth-rt + dependencies
bun install
bun run build # outputs dist/run.js
```
## Usage
```bash
bun dist/run.js <command> [options]
```
### Commands
| Command | Description |
|------|------|
| `detect-best-and-search <video>` | **Recommended.** Best frame → image search → rerank |
| `detect-best <video>` | Extract the best product frame only, no search |
| `detect-and-search <video>` | Image search after two-stage filtering (slower) |
| `detect <video>` | Extract frames and detect products frame by frame |
| `search <image>` | Search for matches with an existing image |
| `rerank` | Cross-filter image-search results against keyword search |
| `session` | Get the current auth session token |
### Options (`detect-best` / `detect-best-and-search`)
| Flag | Default | Description |
|------|--------|------|
| `--interval=<sec>` | `0.5` | Frame sampling interval |
| `--max-frames=<n>` | `60` | Max frames to analyze |
| `--output-dir=<dir>` | next to video | Directory to save snapshots |
| `--session-id=<id>` | auto-generated | Langfuse session ID |
| `--dry-run` | — | Parse arguments without executing |
## Output
All commands print JSON to stdout, including a `sessionId` field for Langfuse tracing.
```json
{
"sessionId": "skill-20260426-184345-lb06",
"status": "success",
"command": "detect-best-and-search",
"bestSnapshot": {
"frameIndex": 7,
"timestampSeconds": 3,
"imagePath": "/path/to/frame_0007.jpg",
"croppedImagePath": "/path/to/frame_0007_cropped.jpg",
"description": "黑色金属床底鞋架 可折叠移动"
},
"rerank": {
"keyword": "床底鞋架",
"results": [
{ "num_iid": 123, "title": "...", "price": "44.00", "sales": 87, "detail_url": "..." }
]
}
}
```
## Auth Architecture
```
~/.openclaw/.env
CLIENT_KEY ──→ auth-rt ──→ backend
              ├── /session → access_token
              └── /client-config → provider.api_key
                                   provider.base_url
                                   provider.model
```
Only `CLIENT_KEY` needs to be configured; LLM credentials and endpoints are provisioned by the backend.
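The override order implied by this flow can be sketched as a small resolver; `resolveVisionModel` and `clientConfigModel` are hypothetical names standing in for the env var and the `provider.model` value delivered by `auth-rt client-config`:

```typescript
// Sketch of the precedence: VISION_MODEL env var wins, then the model
// pushed via client config, then the hardcoded fallback.
function resolveVisionModel(env: { VISION_MODEL?: string }, clientConfigModel?: string): string {
  return env.VISION_MODEL ?? clientConfigModel ?? "aliyun-cp-multimodal";
}
```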
## Environment Variables
| Variable | Description |
|------|------|
| `CLIENT_KEY` | **Required.** Set in `~/.openclaw/.env` |
| `VISION_MODEL` | Override the model name (default comes from client config) |
| `SKILL_SESSION_ID` | Langfuse session ID (auto-generated, format `skill-YYYYMMDD-HHMMSS-xxxx`) |
| `AUTH_RT_BIN` | Override the `auth-rt` binary path |
| `TELEMETRY_ENDPOINT` | Telemetry reporting endpoint |
## Prerequisites
- [Bun](https://bun.sh) runtime
- `ffmpeg` / `ffprobe` in the system PATH (frame extraction)
- `auth-rt` CLI (auth/API calls; installed automatically by `install.sh`)

SKILL.md

@ -1,94 +1,94 @@
---
name: video-product-snapshot
description: "Extract product snapshot from video and search 1688 by image. / Extract the best product frame from the video and search 1688 by image for matching items; use when the user provides a video and wants to find the product."
description: "Detect ecommerce products in video frames using Claude Vision, extract the best product snapshot, and optionally search via image-search API. Use when the user provides a video and wants to find/identify products shown in it."
---
# Video Product Snapshot — Video Product Image Search
# Video Product Snapshot
Capture the clearest product frame from a video (container-type products automatically use an unloaded frame), then upload it and search 1688 by image for matching products.
Extract ecommerce product snapshots from video using Claude Vision, then optionally search for matching products via image-search API.
## Run
```bash
bun dist/run.js <command> [args] [--dry-run]
```
## Commands
| Command | When to use |
|------|---------|
| `detect-best-and-search <video>` | **Recommended.** Extract best product frame → image search → rerank and return results. |
| `detect-best <video>` | Extract the best product frame only, no search. |
| `detect-and-search <video>` | Image search after two-stage filtering (slower than detect-best). |
| `search <image-path>` | Search directly with an existing product image. |
| `rerank` | Cross-filter image-search results with a keyword. |
| `session` | Get the current auth session token. |
| Command | Description |
|---------|-------------|
| `detect <video-path> [options]` | Extract frames, detect product snapshots |
| `search <image-path>` | Search products by image via API |
| `detect-and-search <video-path> [options]` | Detect best snapshot then run image search |
| `session` | Get auth session token |
## Main command: `detect-best-and-search`
Pipeline:
1. ffmpeg extracts frames at 0.5s intervals (up to 60 frames)
2. The vision model checks whether the product is a container/rack type
3. Container type: pick the best frame from the first 40% of frames (the unloaded stage) only
4. Non-container: pick the clearest frame across all frames
5. Crop the product region
6. Upload the cropped image → 1688 image search
7. Rerank: cross-filter image-search results against keyword-search results
## Options for `detect-best` / `detect-best-and-search`
## Options for `detect` / `detect-and-search`
| Flag | Default | Description |
|------|---------|-------------|
| `--interval=<sec>` | `0.5` | Frame sampling interval (seconds) |
| `--max-frames=<n>` | `60` | Max frames to analyze |
| `--output-dir=<dir>` | next to video | Directory to save snapshots |
| `--interval=<sec>` | `1` | Seconds between sampled frames |
| `--max-frames=<n>` | `60` | Max frames to analyze |
| `--output-dir=<dir>` | next to video | Directory to save snapshot images |
| `--min-confidence=<0-1>` | `0.7` | Minimum detection confidence threshold |
## Output format
## Examples
### `detect-best-and-search`
```bash
# Detect product frames in a video
bun dist/run.js detect ./product-demo.mp4
# Sample every 5 seconds, higher confidence threshold
bun dist/run.js detect ./product-demo.mp4 --interval=5 --min-confidence=0.85
# Search for products using an existing image
bun dist/run.js search ./snapshot.jpg
# Full pipeline: detect best product frame then search
bun dist/run.js detect-and-search ./product-demo.mp4 --interval=3 --max-frames=20
```
## Output
Returns JSON with:
- `productFrames[]`: all detected product frames sorted by confidence (highest first)
- `bestSnapshot`: the highest-confidence product frame
- `searchBody`: image search API response (for `detect-and-search` and `search`)
Each `ProductFrame` contains:
```json
{
"bestSnapshot": {
"frameIndex": 7,
"timestampSeconds": 3,
"imagePath": "/path/to/frame_0007.jpg",
"croppedImagePath": "/path/to/frame_0007_cropped.jpg",
"description": "黑色金属床底鞋架 可折叠移动"
},
"rerank": {
"keyword": "床底鞋架",
"results": [
{ "num_iid": 123, "title": "...", "price": "44.00", "sales": 87, "detail_url": "..." }
]
}
"frameIndex": 4,
"timestampSeconds": 9,
"imagePath": "/path/to/snapshot/frame_0004.jpg",
"confidence": 0.92,
"description": "White sneaker with blue logo, left side view",
"boundingHint": "centered"
}
```
## Result display format
## Prerequisites
Format `rerank.results` (preferred) or `searchBody.data.items.item` as a markdown table, **at most 5 rows**:
- `ffmpeg` and `ffprobe` in PATH
- `VISION_API_KEY` — API key for the vision endpoint
- `VISION_API_BASE` — (optional) OpenAI-compatible base URL; omit to use OpenAI default
- `VISION_MODEL` — (optional) model name, default `gpt-4o-mini`
- `auth-rt` in PATH (for `search` / `detect-and-search` API calls)
| # | Title | Price | Sales | Link |
|---|----------|------|------|------|
| 1 | {title} | ¥{promotion_price \|\| price} | {sales ?? —} pcs | [View]({detail_url}) |
### Example provider configs
- Use `promotion_price` when present, otherwise the regular price
- Show `—` when `sales` is missing or zero
- Always present results as a markdown table, never a bullet list
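The price and sales fallbacks above can be sketched as a row formatter; `formatRow` and the trimmed `Item` type are hypothetical illustrations, not code from this repo:

```typescript
interface Item {
  title: string;
  price: string;
  promotion_price?: string;
  sales?: number;
  detail_url: string;
}

// One markdown table row per the display rules: the promotional price wins
// over the list price, and missing/zero sales render as "—".
function formatRow(i: number, item: Item): string {
  const price = item.promotion_price || item.price;
  const sales = item.sales ? `${item.sales} pcs` : "—";
  return `| ${i} | ${item.title} | ¥${price} | ${sales} | [View](${item.detail_url}) |`;
}
```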
## Execution rules
1. **Do not retry.** If a command fails, report the error immediately.
2. **Trust the tool's output.** The CLI already handles session management and error formatting internally.
3. **Run slow commands in a sub-agent** (detect-best-and-search / detect-and-search); running them directly will time out.
### Sub-agent execution template
```bash
# OpenAI (default)
VISION_API_KEY=sk-...
# Any OpenAI-compatible endpoint (local Ollama, Together, Groq, etc.)
VISION_API_KEY=...
VISION_API_BASE=http://localhost:11434/v1
VISION_MODEL=llava:13b
```
```
sessions_spawn(
  task: "cd /path/to/skill && Run this command and return the raw JSON output:\n\nbun dist/run.js detect-best-and-search <video-path>\n\nCopy the entire JSON output as your reply.",
  label: "video-product-snapshot",
  runTimeoutSeconds: 300,
)
```
## Rules — MUST follow
1. **Execute only, do not reason about internals.** Run the CLI and return the output.
2. **No fallback strategies.** Report errors as-is; do NOT try alternative approaches.
3. **No retry loops.** If detection or search fails, report the failure.
4. **Trust the tool's output.** The CLI handles session management and error formatting internally.


@ -2,7 +2,6 @@
import { resolve } from 'path';
import type { Command } from '../src/types.ts';
import { run } from '../src/index.ts';
const SKILL_NAME = 'video-product-snapshot';
// Load .env from skill root (does not override existing env vars)
loadDotenv(resolve(import.meta.dir, '../.env'));
@ -22,56 +21,39 @@ function loadDotenv(path: string): void {
}
function printUsage(): void {
console.error(`Usage:
bun scripts/run.ts [--api-base=<url>] <command> [args...] [--dry-run]
Commands:
session
Get auth session token
detect <video-path> [options]
Options:
--interval=<seconds> default: 1
--max-frames=<n> default: 60
--output-dir=<dir> default: video's directory
--min-confidence=<0-1> default: 0.7
Extract frames and detect ecommerce product snapshots
Options:
--interval=<seconds> Frame sampling interval (default: 3)
--max-frames=<n> Max frames to analyze (default: 30)
--output-dir=<dir> Where to save snapshots (default: next to video)
--min-confidence=<0-1> Minimum detection confidence (default: 0.7)
--concurrency=<n> Parallel Vision API calls per chunk (default: 5)
search <image-path>
Search for products using an image via the ecom image-search API
detect-and-search <video-path> [options]
Detect best product snapshot from video then run image search
detect-best <video-path> [options]
Examples:
bun scripts/run.ts detect ./demo.mp4
bun scripts/run.ts detect ./demo.mp4 --interval=5 --max-frames=20
bun scripts/run.ts search ./snapshot.jpg
bun scripts/run.ts detect-and-search ./demo.mp4 --min-confidence=0.8
detect-best-and-search <video-path> [options]
detect-video <video-path>
detect-video-and-search <video-path>
Detect the product from video and search 1688
rerank --image-results=<json> [--description=<text>] [--keyword=<text>] [--top=<n>]
Config: ~/.openclaw/.env (CLIENT_KEY), skill .env (VISION_API_KEY)
Config: ANTHROPIC_API_KEY env var required for detection.
auth-rt in PATH required for search commands.
`);
}
function reportTelemetry(payload: object): void {
const endpoint = process.env.TELEMETRY_ENDPOINT;
if (!endpoint) return;
fetch(endpoint, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify(payload),
}).catch(() => {});
}
async function main(): Promise<void> {
const positionals: string[] = [];
let dryRun = false;
@ -81,8 +63,6 @@ async function main(): Promise<void> {
dryRun = true;
} else if (arg.startsWith('--api-base=')) {
process.env.API_BASE = arg.slice('--api-base='.length).trim();
} else if (arg.startsWith('--session-id=')) {
process.env.SKILL_SESSION_ID = arg.slice('--session-id='.length).trim();
} else if (arg === '-h' || arg === '--help') {
printUsage(); process.exit(0);
} else {
@ -92,23 +72,8 @@ async function main(): Promise<void> {
if (positionals.length < 1) { printUsage(); process.exit(1); }
const command = positionals[0] as Command;
const sessionId = process.env.SKILL_SESSION_ID!; // set by auth-cli.ts at module load
const startMs = Date.now();
let result: Awaited<ReturnType<typeof run>>;
try {
result = await run(command, positionals.slice(1), dryRun);
} catch (err) {
const error = err instanceof Error ? err.message : String(err);
console.log(JSON.stringify({ status: 'failed', command, dryRun, sessionId, error }, null, 2));
if (!dryRun) reportTelemetry({ skill: SKILL_NAME, command, sessionId, status: 'failed', durationMs: Date.now() - startMs, error });
process.exit(1);
}
const output = { ...result, sessionId } as Record<string, unknown>;
console.log(JSON.stringify(output, null, 2));
if (!dryRun) reportTelemetry({ skill: SKILL_NAME, command, sessionId, status: result.status, durationMs: Date.now() - startMs, error: (result as any).error });
const result = await run(positionals[0] as Command, positionals.slice(1), dryRun);
console.log(JSON.stringify(result, null, 2));
}
main().catch((err) => {


@ -20,18 +20,6 @@ import * as path from 'path';
import * as os from 'os';
const home = process.env.HOME || os.homedir();
// ── session ID (Langfuse tracing) ──
// Priority: SKILL_SESSION_ID env > auto-generate
const SESSION_ID = process.env.SKILL_SESSION_ID || (() => {
const ts = new Date();
const pad = (n: number) => String(n).padStart(2, '0');
const tsPart = `${ts.getFullYear()}${pad(ts.getMonth()+1)}${pad(ts.getDate())}-${pad(ts.getHours())}${pad(ts.getMinutes())}${pad(ts.getSeconds())}`;
const rand = Math.random().toString(36).slice(2, 6);
return `skill-${tsPart}-${rand}`;
})();
process.env.SKILL_SESSION_ID = SESSION_ID;
const AUTH_RT_BIN = process.env.AUTH_RT_BIN
|| (() => {
// Check if auth-rt is in PATH
@ -55,20 +43,6 @@ export interface SessionResponse {
hookToken?: string;
}
export interface ClientConfig {
clientId: string;
name: string;
status: string;
metadata: {
provider?: {
api_key?: string;
base_url?: string;
model?: string;
};
[key: string]: unknown;
};
}
export interface SkillClientOptions {
apiBase?: string;
dryRun?: boolean;
@ -105,13 +79,6 @@ export class SkillClient {
return JSON.parse(runCli('session'));
}
async clientConfig(): Promise<ClientConfig> {
if (this.dryRun) {
return { clientId: '<dry-run>', name: '<dry-run>', status: 'active', metadata: {} };
}
return JSON.parse(runCli('client-config'));
}
async get(urlPath: string): Promise<ApiResponse> {
return this.request('GET', urlPath);
}


@ -1,32 +1,13 @@
import * as fs from 'fs';
import * as path from 'path';
import type { Command, DetectOptions, DetectResult, SearchResult, OutputResult, SearchItem, DetectVideoResult, DetectVideoAndSearchResult } from './types.ts';
import type { Command, DetectOptions, DetectResult, SearchResult, OutputResult, SearchItem } from './types.ts';
import { createSkillClient } from './auth-cli.ts';
import { extractFrames } from './frame-extractor.ts';
import { detectProductFrames, detectBestFrame } from './product-detector.ts';
import { postFilterByImage } from './post-filter.ts';
import { detectProductFrames } from './product-detector.ts';
import { imageToBase64 } from './frame-extractor.ts';
import { generateText } from 'ai';
import { createOpenAI } from '@ai-sdk/openai';
export interface VisionConfig {
apiKey: string;
baseURL?: string;
model: string;
sessionId?: string;
}
async function loadVisionConfig(client: ReturnType<typeof createSkillClient>): Promise<VisionConfig> {
const cfg = await client.clientConfig();
const apiKey = cfg.metadata?.provider?.api_key;
if (!apiKey) throw new Error('Vision API key not found in client config (metadata.provider.api_key)');
return {
apiKey,
baseURL: cfg.metadata?.provider?.base_url,
model: process.env.VISION_MODEL ?? cfg.metadata?.provider?.model ?? 'aliyun-cp-multimodal',
sessionId: process.env.SKILL_SESSION_ID || `skill_${Date.now()}_${Math.random().toString(36).slice(2, 8)}`,
};
}
export async function run(
command: Command,
args: string[],
@ -41,14 +22,6 @@ export async function run(
return runSearch(args, dryRun);
case 'detect-and-search':
return runDetectAndSearch(args, dryRun);
case 'detect-best':
return runDetectBest(args, dryRun);
case 'detect-best-and-search':
return runDetectBestAndSearch(args, dryRun);
case 'detect-video':
return runDetectVideo(args, dryRun);
case 'detect-video-and-search':
return runDetectVideoAndSearch(args, dryRun);
case 'rerank':
return runRerank(args, dryRun);
default:
@ -77,11 +50,8 @@ async function runDetect(args: string[], dryRun: boolean): Promise<DetectResult>
};
}
const client = createSkillClient();
const visionConfig = await loadVisionConfig(client);
const frames = extractFrames(videoPath, opts.outputDir, opts.intervalSeconds, opts.maxFrames);
const productFrames = await detectProductFrames(frames, opts.minConfidence, opts.concurrency, visionConfig);
const productFrames = await detectProductFrames(frames, opts.minConfidence, opts.concurrency);
return {
status: 'success',
@ -94,16 +64,28 @@ async function runDetect(args: string[], dryRun: boolean): Promise<DetectResult>
};
}
async function uploadImage(client: ReturnType<typeof createSkillClient>, imagePath: string): Promise<string> {
async function uploadImage(imagePath: string): Promise<string> {
const searchEndpoint = process.env.ONEBOUND_SEARCH_ENDPOINT;
if (!searchEndpoint) throw new Error('ONEBOUND_SEARCH_ENDPOINT not set');
const uploadEndpoint = process.env.ONEBOUND_UPLOAD_ENDPOINT;
if (!uploadEndpoint) throw new Error('ONEBOUND_UPLOAD_ENDPOINT not set');
const imageBuffer = fs.readFileSync(imagePath);
const filename = `video-snapshot-${Date.now()}.jpg`;
const res = await client.post('/ecom/tasks/upload-image', {
data: imageBuffer.toString('base64'),
filename,
contentType: 'image/jpeg',
const response = await fetch(uploadEndpoint, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
data: imageBuffer.toString('base64'),
filename,
contentType: 'image/jpeg',
}),
});
if (res.status >= 400) throw new Error(`Upload failed: HTTP ${res.status}`);
const json = JSON.parse(res.body) as { url?: string };
if (!response.ok) throw new Error(`Upload failed: HTTP ${response.status}`);
const json = await response.json() as { url?: string };
if (!json.url) throw new Error('Upload response missing url');
return json.url;
}
@ -113,214 +95,35 @@ async function runSearch(args: string[], dryRun: boolean): Promise<SearchResult>
if (!imagePath) return { status: 'failed', command: 'search', dryRun, error: 'search requires <image-path>' };
if (!fs.existsSync(imagePath)) return { status: 'failed', command: 'search', dryRun, error: `image not found: ${imagePath}` };
const searchEndpoint = process.env.ONEBOUND_SEARCH_ENDPOINT;
if (!searchEndpoint) return { status: 'failed', command: 'search', dryRun, error: 'ONEBOUND_SEARCH_ENDPOINT not set' };
if (dryRun) {
return { status: 'success', command: 'search', dryRun, imagePath, searchHttpStatus: 0, searchBody: null };
}
const client = createSkillClient();
// If given a local file, upload it first to get a public URL
let imgid = imagePath;
if (!imagePath.startsWith('http')) {
imgid = await uploadImage(client, imagePath);
imgid = await uploadImage(imagePath);
}
const res = await client.post('/ecom/tasks/search-by-image', { imgid, page: 1 });
const searchHttpStatus = res.status;
const body = JSON.parse(res.body);
const response = await fetch(searchEndpoint, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ imgid, page: 1 }),
});
if (res.status >= 400) {
const searchHttpStatus = response.status;
const body = await response.json();
if (!response.ok) {
return { status: 'failed', command: 'search', dryRun, imagePath, searchHttpStatus, error: JSON.stringify(body) };
}
return { status: 'success', command: 'search', dryRun, imagePath, searchHttpStatus, searchBody: body };
}
async function runDetectBest(args: string[], dryRun: boolean): Promise<DetectResult> {
const videoPath = args[0];
if (!videoPath) return { status: 'failed', command: 'detect-best', dryRun, error: 'detect-best requires <video-path>' };
if (!fs.existsSync(videoPath)) return { status: 'failed', command: 'detect-best', dryRun, error: `video not found: ${videoPath}` };
const outputDir = getFlag(args, '--output-dir') || path.join(
path.dirname(videoPath),
`snapshots_${path.basename(videoPath, path.extname(videoPath))}_${Date.now()}`,
);
const intervalSeconds = parseFloat(getFlag(args, '--interval') || '0.5');
const maxFrames = parseInt(getFlag(args, '--max-frames') || '60', 10);
if (dryRun) {
return { status: 'success', command: 'detect-best', dryRun, videoPath, totalFramesExtracted: 0, productFrames: [], bestSnapshot: undefined };
}
const client = createSkillClient();
const visionConfig = await loadVisionConfig(client);
const frames = extractFrames(videoPath, outputDir, intervalSeconds, maxFrames);
if (frames.length === 0) {
return { status: 'failed', command: 'detect-best', dryRun, videoPath, error: 'no frames extracted from video' };
}
const best = await detectBestFrame(frames, visionConfig, 20);
return {
status: 'success',
command: 'detect-best',
dryRun,
videoPath,
totalFramesExtracted: frames.length,
productFrames: best ? [best] : [],
bestSnapshot: best ?? undefined,
};
}
async function runDetectBestAndSearch(args: string[], dryRun: boolean): Promise<OutputResult> {
const detectResult = await runDetectBest(args, dryRun) as DetectResult;
if (detectResult.status === 'failed') return detectResult;
if (!detectResult.bestSnapshot) {
if (dryRun) return { ...detectResult, command: 'detect-best-and-search' };
return { ...detectResult, status: 'failed', error: 'no frame could be extracted from video' };
}
const best = detectResult.bestSnapshot;
const imageForSearch = best.croppedImagePath || best.imagePath;
const searchResult = await runSearch([imageForSearch], dryRun) as SearchResult;
// Post-filter: drop results whose pic_url isn't the same product type as our snapshot
let postFilter: any = undefined;
if (!dryRun && searchResult.status === 'success' && searchResult.searchBody) {
const items: SearchItem[] = (searchResult.searchBody as any)?.data?.items?.item ?? [];
if (items.length > 0) {
try {
const client = createSkillClient();
const visionConfig = await loadVisionConfig(client);
const result = await postFilterByImage(imageForSearch, items, visionConfig, { description: best.description });
(searchResult.searchBody as any).data.items.item = result.kept;
postFilter = {
totalChecked: result.totalChecked,
keptCount: result.kept.length,
rejectedCount: result.rejected.length,
failed: result.failed,
};
} catch (e: any) {
postFilter = { error: e.message };
}
}
}
let rerankResult: any = undefined;
// If post-filter produced focused results, sort them directly by sales — they're already the best matches.
// Otherwise fall back to the keyword-intersection rerank.
if (!dryRun && postFilter && !postFilter.error && postFilter.keptCount > 0) {
const items: SearchItem[] = (searchResult.searchBody as any)?.data?.items?.item ?? [];
const sorted = [...items].sort((a, b) => (b.sales ?? 0) - (a.sales ?? 0)).slice(0, 5);
rerankResult = {
source: 'post-filter',
results: sorted,
count: sorted.length,
};
} else if (!dryRun && searchResult.status === 'success' && searchResult.searchBody) {
const tmpFile = path.join(path.dirname(imageForSearch), `search_body_${Date.now()}.json`);
try {
fs.writeFileSync(tmpFile, JSON.stringify(searchResult.searchBody));
rerankResult = await runRerank([
`--image-results=${tmpFile}`,
`--description=${best.description}`,
'--top=5',
], dryRun);
} catch (e: any) {
rerankResult = { error: e.message };
} finally {
try { fs.unlinkSync(tmpFile); } catch {}
}
}
return {
...detectResult,
command: 'detect-best-and-search',
searchHttpStatus: searchResult.searchHttpStatus,
searchBody: searchResult.searchBody,
searchError: searchResult.error,
postFilter,
rerank: rerankResult,
} as any;
}
async function runDetectVideo(args: string[], dryRun: boolean): Promise<DetectVideoResult> {
const videoPath = args[0];
if (!videoPath) return { status: 'failed', command: 'detect-video', dryRun, error: 'detect-video requires <video-path>' };
if (!fs.existsSync(videoPath)) return { status: 'failed', command: 'detect-video', dryRun, error: `video not found: ${videoPath}` };
const detectResult = await runDetectBest(args, dryRun) as DetectResult;
if (detectResult.status === 'failed') {
return { status: 'failed', command: 'detect-video', dryRun, videoPath, error: detectResult.error || 'failed to detect best frame' };
}
const description = detectResult.bestSnapshot?.description?.trim();
const snapshotImagePath = detectResult.bestSnapshot?.croppedImagePath || detectResult.bestSnapshot?.imagePath;
if (!description) {
return { status: 'failed', command: 'detect-video', dryRun, videoPath, error: 'no product description detected from video' };
}
if (dryRun) {
return { status: 'success', command: 'detect-video', dryRun, videoPath, videoUrl: null, description, keyword: '<dry-run-keyword>', snapshotImagePath };
}
const client = createSkillClient();
const visionConfig = await loadVisionConfig(client);
const keyword = await generateChineseKeyword(description, visionConfig);
return { status: 'success', command: 'detect-video', dryRun, videoPath, videoUrl: null, description, keyword, snapshotImagePath };
}
async function runDetectVideoAndSearch(args: string[], dryRun: boolean): Promise<DetectVideoAndSearchResult> {
const videoPath = args[0];
if (!videoPath) return { status: 'failed', command: 'detect-video-and-search', dryRun, error: 'detect-video-and-search requires <video-path>' };
if (!fs.existsSync(videoPath)) return { status: 'failed', command: 'detect-video-and-search', dryRun, error: `video not found: ${videoPath}` };
if (dryRun) {
return { status: 'success', command: 'detect-video-and-search', dryRun, videoPath, videoUrl: null, description: '<dry-run>', keyword: '<dry-run>', searchResults: [] };
}
// Reuse existing pipeline: best snapshot → image search → keyword rerank
const detectAndSearch = await runDetectBestAndSearch(args, dryRun) as any;
if (detectAndSearch.status === 'failed') {
return { status: 'failed', command: 'detect-video-and-search', dryRun, videoPath, error: detectAndSearch.error || 'detect-best-and-search failed' };
}
const description = String(detectAndSearch.bestSnapshot?.description || '').trim();
const rerank = detectAndSearch.rerank;
const keyword = String(rerank?.keyword || '').trim();
const searchResults = (rerank?.results || []) as SearchItem[];
// Fallback: if rerank didn't produce anything, do keyword search directly.
if (!searchResults.length) {
const client = createSkillClient();
const visionConfig = await loadVisionConfig(client);
const fallbackKeyword = keyword || (description ? await generateChineseKeyword(description, visionConfig) : '');
const items = fallbackKeyword ? await keywordSearch(client, fallbackKeyword, 1) : [];
return {
status: 'success',
command: 'detect-video-and-search',
dryRun,
videoPath,
videoUrl: null,
description,
keyword: fallbackKeyword,
searchResults: items,
};
}
return {
status: 'success',
command: 'detect-video-and-search',
dryRun,
videoPath,
videoUrl: null,
description,
keyword,
searchResults,
};
}
async function runDetectAndSearch(args: string[], dryRun: boolean): Promise<OutputResult> {
const detectResult = await runDetect(args, dryRun) as DetectResult;
if (detectResult.status === 'failed') return detectResult;
@ -330,51 +133,23 @@ async function runDetectAndSearch(args: string[], dryRun: boolean): Promise<Outp
}
const best = detectResult.bestSnapshot;
// Use cropped image if available, otherwise full frame
const imageForSearch = best.croppedImagePath || best.imagePath;
const searchResult = await runSearch([imageForSearch], dryRun) as SearchResult;
// Post-filter: drop results whose pic_url isn't the same product type as our snapshot
let postFilter: any = undefined;
if (!dryRun && searchResult.status === 'success' && searchResult.searchBody) {
const items: SearchItem[] = (searchResult.searchBody as any)?.data?.items?.item ?? [];
if (items.length > 0) {
try {
const client = createSkillClient();
const visionConfig = await loadVisionConfig(client);
const result = await postFilterByImage(imageForSearch, items, visionConfig, { description: best.description });
(searchResult.searchBody as any).data.items.item = result.kept;
postFilter = {
totalChecked: result.totalChecked,
keptCount: result.kept.length,
rejectedCount: result.rejected.length,
failed: result.failed,
};
} catch (e: any) {
postFilter = { error: e.message };
}
}
}
// Auto-rerank using product description to generate Chinese keyword
let rerankResult: any = undefined;
// If post-filter produced focused results, sort them directly by sales — they're already the best matches.
// Otherwise fall back to the keyword-intersection rerank.
if (!dryRun && postFilter && !postFilter.error && postFilter.keptCount > 0) {
const items: SearchItem[] = (searchResult.searchBody as any)?.data?.items?.item ?? [];
const sorted = [...items].sort((a, b) => (b.sales ?? 0) - (a.sales ?? 0)).slice(0, 5);
rerankResult = {
source: 'post-filter',
results: sorted,
count: sorted.length,
};
} else if (!dryRun && searchResult.status === 'success' && searchResult.searchBody) {
if (!dryRun && searchResult.status === 'success' && searchResult.searchBody) {
// Save search body to temp file for rerank
const tmpFile = path.join(path.dirname(imageForSearch), `search_body_${Date.now()}.json`);
try {
fs.writeFileSync(tmpFile, JSON.stringify(searchResult.searchBody));
rerankResult = await runRerank([
const rerankArgs = [
`--image-results=${tmpFile}`,
`--description=${best.description}`,
'--top=5',
], dryRun);
'--top=10',
];
rerankResult = await runRerank(rerankArgs, dryRun);
} catch (e: any) {
rerankResult = { error: e.message };
} finally {
@ -388,7 +163,6 @@ async function runDetectAndSearch(args: string[], dryRun: boolean): Promise<Outp
searchHttpStatus: searchResult.searchHttpStatus,
searchBody: searchResult.searchBody,
searchError: searchResult.error,
postFilter,
rerank: rerankResult,
} as any;
}
@ -416,41 +190,25 @@ function getFlag(args: string[], flag: string): string | undefined {
return undefined;
}
function createVisionModel(config: VisionConfig) {
const sessionId = config.sessionId || '';
const originFetch = globalThis.fetch;
// Inject metadata.session_id into request body so LiteLLM → Langfuse creates sessions
const wrapped = async (input: RequestInfo | URL, init?: RequestInit) => {
if (init?.body && typeof init.body === 'string') {
try {
const body = JSON.parse(init.body);
if (!body.metadata) body.metadata = {};
if (!body.metadata.session_id) body.metadata.session_id = sessionId;
body.metadata.tags = ['skill:video-product-snapshot'];
init = { ...init, body: JSON.stringify(body) };
} catch {}
}
return originFetch(input, init);
};
const openai = createOpenAI({
apiKey: config.apiKey, baseURL: config.baseURL,
fetch: wrapped as typeof globalThis.fetch,
});
return openai(config.model);
function createVisionModel() {
const apiKey = process.env.VISION_API_KEY;
if (!apiKey) throw new Error('VISION_API_KEY not set');
const baseURL = process.env.VISION_API_BASE || undefined;
const modelName = process.env.VISION_MODEL || 'gpt-4o-mini';
const openai = createOpenAI({ apiKey, baseURL });
return openai(modelName);
}
async function generateChineseKeyword(description: string, visionConfig: VisionConfig): Promise<string> {
const model = createVisionModel(visionConfig);
async function generateChineseKeyword(description: string): Promise<string> {
const model = createVisionModel();
const { text } = await generateText({
model,
prompt: `You are generating a 1688.com (Chinese B2B wholesale) product search keyword.
Rules:
- Output ONLY 2-4 Chinese words: the product OBJECT TYPE + 1-2 key material/feature words
- CRITICAL: If the product is a container, organizer, rack, shelf, bag, box, or holder, the keyword MUST name THAT object, NOT the items it holds.
Examples: shoe rack → "金属鞋架", cable organizer → "理线器", storage shelf → "收纳架", toolbox → "工具箱"
- Output ONLY 2-4 Chinese words: the product category + 1-2 key material/feature words
- Use common Chinese commerce terms, NOT a literal translation
- No English, no punctuation, no explanation
- Short broad terms work better than long specific phrases
- Short broad terms work better than long specific phrases (e.g. "金属鞋架" not "黑色Z型金属网格鞋架")
Product description: ${description}
@@ -459,9 +217,16 @@ Output only the search query:`,
return text.trim().replace(/[^\u4e00-\u9fff\u3400-\u4dbf]/g, '').trim();
}
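The regex post-processing above can be isolated into a small sketch. `stripNonCJK` is an illustrative name (not part of the module); it shows how everything outside the CJK Unified Ideographs ranges (U+4E00–U+9FFF and Extension A, U+3400–U+4DBF) is stripped, so stray English words, punctuation, or whitespace cannot leak into the 1688 query:

```typescript
// Same trim/replace chain as generateChineseKeyword's return statement.
// `stripNonCJK` is a hypothetical helper name for illustration only.
function stripNonCJK(text: string): string {
  return text.trim().replace(/[^\u4e00-\u9fff\u3400-\u4dbf]/g, '').trim();
}

console.log(stripNonCJK('关键词: 金属鞋架 (metal shoe rack)')); // → 关键词金属鞋架
```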
async function keywordSearch(client: ReturnType<typeof createSkillClient>, keyword: string, page = 1): Promise<SearchItem[]> {
const res = await client.post('/ecom/tasks/keyword-search', { keyword, page });
const json = JSON.parse(res.body) as any;
async function keywordSearch(keyword: string, page = 1): Promise<SearchItem[]> {
const endpoint = process.env.ONEBOUND_KEYWORD_SEARCH_ENDPOINT;
if (!endpoint) throw new Error('ONEBOUND_KEYWORD_SEARCH_ENDPOINT not set');
const res = await fetch(endpoint, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ keyword, page }),
});
const json = await res.json() as any;
return (json?.data?.items?.item ?? []) as SearchItem[];
}
@@ -490,10 +255,9 @@ function extractKeywordsFromTitles(items: SearchItem[], topN = 5): string {
async function runRerank(args: string[], dryRun: boolean): Promise<OutputResult> {
// --image-results=<path> --keyword=<text> --top=<n>
const positionals = args.filter((a) => !a.startsWith('--'));
const imageResultsArg = getFlag(args, '--image-results') || positionals[0];
const keywordArg = getFlag(args, '--keyword') || positionals[1];
const topN = parseInt(getFlag(args, '--top') || '5', 10);
const imageResultsArg = getFlag(args, '--image-results') || args[0];
const keywordArg = getFlag(args, '--keyword') || args[1];
const topN = parseInt(getFlag(args, '--top') || '10', 10);
const description = getFlag(args, '--description') || '';
@@ -501,9 +265,7 @@ async function runRerank(args: string[], dryRun: boolean): Promise<OutputResult>
if (dryRun) return { status: 'success', command: 'rerank', dryRun } as any;
const client = createSkillClient();
const visionConfig = await loadVisionConfig(client);
// Load image search results
let imageItems: SearchItem[];
try {
const raw = fs.existsSync(imageResultsArg)
@@ -527,7 +289,7 @@ async function runRerank(args: string[], dryRun: boolean): Promise<OutputResult>
// Prefer product description for accurate translation; fall back to image titles
const sourceText = description || keyword || extractKeywordsFromTitles(imageItems);
try {
autoGeneratedKeyword = await generateChineseKeyword(sourceText, visionConfig);
autoGeneratedKeyword = await generateChineseKeyword(sourceText);
} catch {
autoGeneratedKeyword = extractKeywordsFromTitles(imageItems);
}
@@ -537,7 +299,7 @@ async function runRerank(args: string[], dryRun: boolean): Promise<OutputResult>
// Keyword search on 1688
let keywordItems: SearchItem[] = [];
try {
keywordItems = await keywordSearch(client, keyword);
keywordItems = await keywordSearch(keyword);
} catch (e: any) {
return { status: 'failed', command: 'rerank', dryRun, error: `keyword search failed: ${e.message}` };
}
@@ -568,3 +330,7 @@ async function runRerank(args: string[], dryRun: boolean): Promise<OutputResult>
results: sorted,
} as any;
}
function parseJsonSafe(text: string): unknown {
try { return JSON.parse(text); } catch { return text; }
}
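A quick demonstration of the fallback behavior of `parseJsonSafe` (self-contained copy for illustration): on malformed input it hands back the raw string instead of throwing, so callers can log whatever the upstream service actually sent (e.g. an HTML error page).

```typescript
// Copy of parseJsonSafe above: JSON when possible, raw string otherwise.
function parseJsonSafe(text: string): unknown {
  try { return JSON.parse(text); } catch { return text; }
}

console.log(parseJsonSafe('{"ok":true}'));      // → { ok: true }
console.log(parseJsonSafe('<html>502</html>')); // → <html>502</html> (returned unchanged)
```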


@@ -1,123 +0,0 @@
import { generateText } from 'ai';
import { createOpenAI } from '@ai-sdk/openai';
import type { SearchItem } from './types.ts';
import type { VisionConfig } from './index.ts';
import { imageToBase64 } from './frame-extractor.ts';
export interface PostFilterResult {
kept: SearchItem[];
rejected: SearchItem[];
totalChecked: number;
failed: boolean;
}
const FILTER_PROMPT = (count: number, description?: string) => {
const productLine = description
? `查询商品是:${description}`
: '第1张图是查询商品。';
return `${productLine}
${count}
****
- "鞋架"
-
- vs vs
-
1: YES
2: NO
3: YES
...
${count} `;
};
function createModel(config: VisionConfig) {
const sessionId = config.sessionId || '';
const originFetch = globalThis.fetch;
const wrapped = async (input: RequestInfo | URL, init?: RequestInit) => {
if (init?.body && typeof init.body === 'string') {
try {
const body = JSON.parse(init.body);
if (!body.metadata) body.metadata = {};
if (!body.metadata.session_id) body.metadata.session_id = sessionId;
body.metadata.tags = ['skill:video-product-snapshot'];
init = { ...init, body: JSON.stringify(body) };
} catch {}
}
return originFetch(input, init);
};
const provider = createOpenAI({
apiKey: config.apiKey, baseURL: config.baseURL,
fetch: wrapped as typeof globalThis.fetch,
});
return provider(config.model);
}
async function classifyBatch(
model: ReturnType<ReturnType<typeof createOpenAI>>,
queryImageDataUrl: string,
batch: SearchItem[],
description?: string,
): Promise<boolean[]> {
const content: any[] = [{ type: 'image', image: queryImageDataUrl }];
for (const item of batch) {
content.push({ type: 'image', image: item.pic_url });
}
content.push({ type: 'text', text: FILTER_PROMPT(batch.length, description) });
const { text } = await generateText({
model,
messages: [{ role: 'user', content }],
maxTokens: 200,
});
const flags = batch.map(() => false);
for (const line of text.split('\n')) {
const m = line.match(/^\s*(\d+)\s*[:]\s*(YES|NO|是|否)/i);
if (!m) continue;
const idx = parseInt(m[1], 10) - 1;
const yes = /YES|是/i.test(m[2]);
if (idx >= 0 && idx < flags.length) flags[idx] = yes;
}
return flags;
}
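The verdict-parsing loop in `classifyBatch` can be sketched on its own: the model answers one "N: YES/NO" (or 是/否) line per candidate image, and any line that fails to parse leaves the corresponding flag at its default `false`, i.e. the item is rejected. `parseVerdicts` is an illustrative name, not part of the original module.

```typescript
// Parse "N: YES/NO/是/否" verdict lines into an ordered boolean array.
function parseVerdicts(text: string, count: number): boolean[] {
  const flags = new Array<boolean>(count).fill(false);
  for (const line of text.split('\n')) {
    const m = line.match(/^\s*(\d+)\s*[:]\s*(YES|NO|是|否)/i);
    if (!m) continue;
    const idx = parseInt(m[1], 10) - 1; // the model numbers images from 1
    if (idx >= 0 && idx < count) flags[idx] = /YES|是/i.test(m[2]);
  }
  return flags;
}

console.log(parseVerdicts('1: YES\n2: NO\ngarbage line\n3: 是', 3)); // → [ true, false, true ]
```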
export async function postFilterByImage(
queryImagePath: string,
items: SearchItem[],
visionConfig: VisionConfig,
options: { description?: string; batchSize?: number } = {},
): Promise<PostFilterResult> {
if (items.length === 0) {
return { kept: [], rejected: [], totalChecked: 0, failed: false };
}
const batchSize = options.batchSize ?? 10;
const description = options.description;
const model = createModel(visionConfig);
const queryDataUrl = `data:image/jpeg;base64,${imageToBase64(queryImagePath)}`;
const kept: SearchItem[] = [];
const rejected: SearchItem[] = [];
let anyFailed = false;
for (let i = 0; i < items.length; i += batchSize) {
const batch = items.slice(i, i + batchSize);
try {
const flags = await classifyBatch(model, queryDataUrl, batch, description);
batch.forEach((item, idx) => {
if (flags[idx]) kept.push(item);
else rejected.push(item);
});
} catch {
// On batch failure, keep items (don't lose them) but flag the run as partial
anyFailed = true;
kept.push(...batch);
}
}
return { kept, rejected, totalChecked: items.length, failed: anyFailed };
}


@@ -1,10 +1,9 @@
import { generateObject, generateText } from 'ai';
import { generateObject } from 'ai';
import { createOpenAI } from '@ai-sdk/openai';
import { z } from 'zod';
import type { ExtractedFrame } from './frame-extractor.ts';
import type { ProductFrame } from './types.ts';
import { imageToBase64 } from './frame-extractor.ts';
import type { VisionConfig } from './index.ts';
// Pass 1: quick filter — discard frames that clearly have no product
const FilterSchema = z.object({
@@ -28,75 +27,32 @@ Discard (keep=false) if: only hands/texture/contents visible, motion blur, black
reason options: product_visible | content_only | hands_only | blur | transition | background_only`;
const CONTAINER_CHECK_PROMPT = `Is the main product in this image a CONTAINER, RACK, or HOLDER (something designed to store/hold other items)?
Examples YES: shoe rack, shelf, storage box, organizer, basket, drawer, wardrobe, trolley, bin, tray, cabinet.
Examples NO: shoes, clothing, electronics, food, toys, cosmetics, tools.
Reply with only one word: YES or NO.`;
const RANKING_PROMPT = (count: number) => `You are selecting the single best product image from ${count} video frames for ecommerce image search.
const RANKING_PROMPT_CONTAINER = (count: number) => `You are selecting ONE frame from ${count} video frames to use as the query image for an ecommerce reverse-image search.
The frames are numbered 0 to ${count - 1} in the order shown.
The hero product is a CONTAINER / RACK / HOLDER / ORGANIZER.
CRITICAL CONSTRAINT - read this first:
Image search engines identify objects by visual appearance. If the container holds items (shoes, clothes, etc.), the search engine will match those ITEMS, not the container, returning completely wrong products.
YOUR ONLY JOB: find the frame where the container structure itself is most visible with the FEWEST or NO items inside.
ABSOLUTE PRIORITY ORDER (do not deviate):
1. Frame with container completely EMPTY: highest priority regardless of angle or assembly state
2. Frame with container partially assembled or partially visible but EMPTY: still better than any loaded frame
3. Frame with fewest items inside (1-2 items, mostly empty)
4. Frame with moderate load: only if no emptier option exists
5. Frame fully loaded: last resort, only if no other frames exist
A frame showing the rack mid-assembly with zero items is ALWAYS better than a perfectly-lit fully-assembled rack filled with shoes.
Frames are numbered 0 to ${count - 1} in order shown. You MUST pick ONE.
Pick the ONE frame where the HERO PRODUCT is:
1. Cleanest: fewest distractions, no hands blocking it, no clutter in foreground
2. Most complete: full product silhouette visible, no edges cropped
3. Most isolated: product stands out from background clearly
4. Empty/minimal load preferred: a product without contents (e.g. an empty rack) beats one stuffed with items if both show the full structure equally
Return:
- bestFrameIndex: 0-based index of the emptiest container frame
- description: concise Chinese search query under 12 words (container type + material + color + key feature)
- reasoning: describe how many items are visible inside the chosen frame and why it's the emptiest option
- boundingBox: tight box of the PRODUCT STRUCTURE ONLY as [x1, y1, x2, y2] normalized 0.0–1.0. Exclude any items stored inside.`;
- bestFrameIndex: 0-based index of chosen frame
- description: concise search query under 12 words (product type + material + color + key feature)
- reasoning: one sentence explaining why this frame was chosen
- boundingBox: tight bounding box of the HERO PRODUCT ONLY in the chosen frame as [x1, y1, x2, y2] normalized 0.0–1.0 (top-left origin). Exclude hands, background, and unrelated objects. The product is assumed to be near the center.`;
const RANKING_PROMPT_GENERAL = (count: number) => `You are selecting the single best product frame from ${count} video frames for ecommerce search.
function createVisionModel() {
const apiKey = process.env.VISION_API_KEY;
if (!apiKey) throw new Error('VISION_API_KEY not set');
Frames are numbered 0 to ${count - 1} in order shown.
IMPORTANT: You MUST pick ONE frame even if product visibility is imperfect or no frame looks ideal. Always make your best guess.
Pick the frame where the MAIN SELLING PRODUCT is:
1. Most recognizable: clearest view of the item being sold
2. Most complete: full product silhouette visible, not cropped at edges
3. Cleanest: minimal obstruction (hands, clutter, motion blur, labels)
4. Best lit and in focus
Return:
- bestFrameIndex: 0-based index
- description: concise search query under 12 words (product type + material + color + key features), in Chinese
- reasoning: one sentence explaining choice
- boundingBox: tight box of the PRODUCT ONLY as [x1, y1, x2, y2] normalized 0.0–1.0, top-left origin. Exclude hands, background, and unrelated objects. The product is near the center of the frame.`;
function createVisionModel(config: VisionConfig) {
const sessionId = config.sessionId || '';
const originFetch = globalThis.fetch;
const wrapped = async (input: RequestInfo | URL, init?: RequestInit) => {
if (init?.body && typeof init.body === 'string') {
try {
const body = JSON.parse(init.body);
if (!body.metadata) body.metadata = {};
if (!body.metadata.session_id) body.metadata.session_id = sessionId;
body.metadata.tags = ['skill:video-product-snapshot'];
init = { ...init, body: JSON.stringify(body) };
} catch {}
}
return originFetch(input, init);
};
const provider = createOpenAI({
apiKey: config.apiKey, baseURL: config.baseURL,
fetch: wrapped as typeof globalThis.fetch,
apiKey,
baseURL: process.env.VISION_API_BASE,
});
return provider(config.model);
return provider(process.env.VISION_MODEL ?? 'gpt-4o-mini');
}
async function filterFrame(
@@ -120,52 +76,15 @@ async function filterFrame(
return object.keep;
}
async function isContainerProduct(
firstFrame: ExtractedFrame,
model: ReturnType<ReturnType<typeof createOpenAI>>,
): Promise<boolean> {
try {
const { text } = await generateText({
model,
messages: [{
role: 'user',
content: [
{ type: 'image', image: `data:image/jpeg;base64,${imageToBase64(firstFrame.imagePath)}` },
{ type: 'text', text: CONTAINER_CHECK_PROMPT },
],
}],
maxTokens: 5,
});
return text.trim().toUpperCase().startsWith('Y');
} catch {
return false;
}
}
function takeEarliestFrames(candidates: ExtractedFrame[], fraction: number = 0.4): ExtractedFrame[] {
// Ecommerce videos show the container empty/unboxing early, then full.
// Taking the first 40% of frames reliably captures empty states.
const sorted = [...candidates].sort((a, b) => a.frameIndex - b.frameIndex);
const cutoff = Math.max(1, Math.ceil(sorted.length * fraction));
return sorted.slice(0, cutoff);
}
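The "earliest frames" heuristic above can be exercised in isolation (self-contained copy with a minimal frame type, purely for illustration): frames are sorted by `frameIndex`, then the first `ceil(n * fraction)` are kept, with a floor of one frame.

```typescript
// Minimal stand-in for ExtractedFrame; only frameIndex matters here.
interface Frame { frameIndex: number }

// Keep the earliest `fraction` of frames (default 40%), at least one.
function takeEarliestFrames(candidates: Frame[], fraction = 0.4): Frame[] {
  const sorted = [...candidates].sort((a, b) => a.frameIndex - b.frameIndex);
  const cutoff = Math.max(1, Math.ceil(sorted.length * fraction));
  return sorted.slice(0, cutoff);
}

const frames = [5, 1, 4, 2, 3].map((frameIndex) => ({ frameIndex }));
console.log(takeEarliestFrames(frames).map((f) => f.frameIndex)); // → [ 1, 2 ]
```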
async function rankCandidates(
candidates: ExtractedFrame[],
model: ReturnType<ReturnType<typeof createOpenAI>>,
isContainer: boolean,
): Promise<{ bestFrame: ExtractedFrame; description: string; reasoning: string; boundingBox: [number, number, number, number] }> {
const imageContent = candidates.map((f) => ({
type: 'image' as const,
image: `data:image/jpeg;base64,${imageToBase64(f.imagePath)}`,
}));
const prompt = isContainer
? RANKING_PROMPT_CONTAINER(candidates.length)
: RANKING_PROMPT_GENERAL(candidates.length);
const { object } = await generateObject({
model,
schema: RankingSchema,
@@ -174,7 +93,7 @@ async function rankCandidates(
role: 'user',
content: [
...imageContent,
{ type: 'text', text: prompt },
{ type: 'text', text: RANKING_PROMPT(candidates.length) },
],
}],
});
@@ -201,17 +120,7 @@ export async function cropProduct(
let [x1, y1, x2, y2] = boundingBox;
// Normalize coords: ensure x1<x2 and y1<y2
if (x1 > x2) [x1, x2] = [x2, x1];
if (y1 > y2) [y1, y2] = [y2, y1];
// Clamp to [0, 1]
x1 = Math.max(0, Math.min(1, x1));
y1 = Math.max(0, Math.min(1, y1));
x2 = Math.max(0, Math.min(1, x2));
y2 = Math.max(0, Math.min(1, y2));
// Add padding
// add padding
const pw = (x2 - x1) * paddingFactor;
const ph = (y2 - y1) * paddingFactor;
x1 = Math.max(0, x1 - pw);
@@ -219,11 +128,6 @@ export async function cropProduct(
x2 = Math.min(1, x2 + pw);
y2 = Math.min(1, y2 + ph);
// Validate minimum area
if (x2 - x1 < 0.005 || y2 - y1 < 0.005) {
throw new Error('bounding box too small after normalization');
}
const left = Math.round(x1 * W);
const top = Math.round(y1 * H);
const width = Math.round((x2 - x1) * W);
@@ -237,203 +141,39 @@ export async function cropProduct(
return outputPath;
}
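The bounding-box arithmetic inside `cropProduct` can be traced with a standalone sketch: normalized `[x1, y1, x2, y2]` coordinates are re-ordered, clamped to `[0, 1]`, expanded by a relative padding, checked against the minimum-area guard, then mapped to integer pixel values. `toPixelRect` and its return shape are assumptions made for this example; the real function goes on to crop with sharp.

```typescript
// Sketch of cropProduct's coordinate math (no image I/O).
function toPixelRect(
  bbox: [number, number, number, number],
  W: number, H: number, paddingFactor = 0.05,
): { left: number; top: number; width: number; height: number } {
  let [x1, y1, x2, y2] = bbox;
  if (x1 > x2) [x1, x2] = [x2, x1]; // ensure x1 < x2
  if (y1 > y2) [y1, y2] = [y2, y1]; // ensure y1 < y2
  const clamp = (v: number) => Math.max(0, Math.min(1, v));
  [x1, y1, x2, y2] = [clamp(x1), clamp(y1), clamp(x2), clamp(y2)];
  const pw = (x2 - x1) * paddingFactor; // relative padding
  const ph = (y2 - y1) * paddingFactor;
  x1 = Math.max(0, x1 - pw); y1 = Math.max(0, y1 - ph);
  x2 = Math.min(1, x2 + pw); y2 = Math.min(1, y2 + ph);
  if (x2 - x1 < 0.005 || y2 - y1 < 0.005) {
    throw new Error('bounding box too small after normalization');
  }
  return {
    left: Math.round(x1 * W),
    top: Math.round(y1 * H),
    width: Math.round((x2 - x1) * W),
    height: Math.round((y2 - y1) * H),
  };
}

// A 0.2–0.8 box on a 1000×1000 frame with 5% padding:
console.log(toPixelRect([0.2, 0.2, 0.8, 0.8], 1000, 1000));
// → { left: 170, top: 170, width: 660, height: 660 }
```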
async function withConcurrency<T>(
tasks: (() => Promise<T>)[],
limit: number,
): Promise<T[]> {
const results: T[] = new Array(tasks.length);
let next = 0;
async function worker() {
while (next < tasks.length) {
const i = next++;
results[i] = await tasks[i]();
}
}
await Promise.all(Array.from({ length: Math.min(limit, tasks.length) }, worker));
return results;
}
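A usage sketch for the concurrency pool above (self-contained copy for illustration): tasks are lazy thunks, at most `limit` of them run at once, and results land at their original indices regardless of completion order. The delays here are hypothetical.

```typescript
// Bounded-concurrency runner: workers pull the next unstarted task index.
async function withConcurrency<T>(tasks: (() => Promise<T>)[], limit: number): Promise<T[]> {
  const results: T[] = new Array(tasks.length);
  let next = 0;
  async function worker() {
    while (next < tasks.length) {
      const i = next++; // synchronous claim — safe in single-threaded JS
      results[i] = await tasks[i]();
    }
  }
  await Promise.all(Array.from({ length: Math.min(limit, tasks.length) }, worker));
  return results;
}

const delays = [30, 10, 20];
withConcurrency(
  delays.map((ms, i) => () => new Promise<number>((res) => setTimeout(() => res(i), ms))),
  2,
).then((out) => console.log(out)); // → [ 0, 1, 2 ] (input order preserved)
```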
// ── Frame quality pre-filtering ──────────────────────────────────────
interface FrameQuality {
valid: boolean;
meanBrightness: number;
variance: number;
}
async function assessFrameQuality(imagePath: string): Promise<FrameQuality> {
const sharp = (await import('sharp')).default;
const { data, info } = await sharp(imagePath)
.grayscale()
.raw()
.toBuffer({ resolveWithObject: true });
const pixels = new Uint8Array(data);
let sum = 0;
let sumSq = 0;
for (let i = 0; i < pixels.length; i++) {
sum += pixels[i];
sumSq += pixels[i] * pixels[i];
}
const mean = sum / pixels.length;
const variance = sumSq / pixels.length - mean * mean;
// Skip near-black, near-white, or very low variance (blurry/blank/transition)
const valid = mean > 15 && mean < 240 && variance > 50;
return { valid, meanBrightness: mean, variance };
}
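The quality gate above reduces each frame to two statistics over its grayscale pixels: mean brightness and variance (computed as E[X²] − E[X]²), with the thresholds mean ∈ (15, 240) and variance > 50. This sketch applies the same formulas to a toy pixel buffer; `quickQuality` is an illustrative stand-in, not the real sharp-based implementation.

```typescript
// Same mean/variance computation as assessFrameQuality, minus the image decode.
function quickQuality(pixels: Uint8Array): { mean: number; variance: number; valid: boolean } {
  let sum = 0, sumSq = 0;
  for (let i = 0; i < pixels.length; i++) {
    sum += pixels[i];
    sumSq += pixels[i] * pixels[i];
  }
  const mean = sum / pixels.length;
  const variance = sumSq / pixels.length - mean * mean; // E[X^2] - E[X]^2
  return { mean, variance, valid: mean > 15 && mean < 240 && variance > 50 };
}

console.log(quickQuality(new Uint8Array([0, 0, 0, 0])).valid);       // → false (near-black)
console.log(quickQuality(new Uint8Array([40, 200, 90, 150])).valid); // → true (textured)
```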
async function filterQualityFrames(frames: ExtractedFrame[]): Promise<ExtractedFrame[]> {
const results = await Promise.all(
frames.map(async (frame) => {
try {
const q = await assessFrameQuality(frame.imagePath);
return { frame, valid: q.valid };
} catch {
return { frame, valid: true };
}
}),
);
const valid = results.filter(r => r.valid).map(r => r.frame);
return valid.length > 0 ? valid : frames;
}
function isValidBoundingBox(bbox: [number, number, number, number]): boolean {
const [x1, y1, x2, y2] = bbox;
return (
x1 >= 0 && x1 <= 1 &&
y1 >= 0 && y1 <= 1 &&
x2 >= 0 && x2 <= 1 &&
y2 >= 0 && y2 <= 1 &&
x1 < x2 &&
y1 < y2 &&
(x2 - x1) * (y2 - y1) > 0.005
);
}
// Skips Pass 1 filter entirely — ranks all frames and always returns the best one.
// Evenly samples down to maxCandidates when there are too many frames.
export async function detectBestFrame(
frames: ExtractedFrame[],
visionConfig: VisionConfig,
maxCandidates: number = 20,
): Promise<ProductFrame | null> {
if (frames.length === 0) return null;
// 1. Filter out obviously bad frames (black, white, blurry)
let candidates = await filterQualityFrames(frames);
// 2. Sample if too many
if (candidates.length > maxCandidates) {
const step = candidates.length / maxCandidates;
candidates = Array.from({ length: maxCandidates }, (_, i) => candidates[Math.floor(i * step)]);
}
const model = createVisionModel(visionConfig);
// 3. Check if product is a container/rack type (use first candidate frame)
const container = await isContainerProduct(candidates[0], model);
// 4. For containers: restrict ranking to earliest frames (empty/unboxing phase)
if (container) {
const early = takeEarliestFrames(candidates);
if (early.length > 0) candidates = early;
}
// 5. Try Vision ranking with error isolation
try {
const { bestFrame, description, reasoning, boundingBox } = await rankCandidates(candidates, model, container);
if (isValidBoundingBox(boundingBox)) {
const croppedPath = bestFrame.imagePath.replace(/\.jpg$/, '_cropped.jpg');
try {
await cropProduct(bestFrame.imagePath, boundingBox, croppedPath);
} catch {
// cropping is optional — keep original frame
}
return {
frameIndex: bestFrame.frameIndex,
timestampSeconds: bestFrame.timestampSeconds,
imagePath: bestFrame.imagePath,
...(croppedPath ? { croppedImagePath: croppedPath } : {}),
confidence: 0.95,
description,
boundingHint: reasoning,
};
}
} catch {
// Vision ranking failed — fall through to fallback
}
// 4. Fallback: rank by frame quality (variance) and return the sharpest
const withQuality = await Promise.all(
candidates.map(async (f) => {
try {
const q = await assessFrameQuality(f.imagePath);
return { frame: f, score: q.variance };
} catch {
return { frame: f, score: 0 };
}
}),
);
withQuality.sort((a, b) => b.score - a.score);
const best = withQuality[0].frame;
return {
frameIndex: best.frameIndex,
timestampSeconds: best.timestampSeconds,
imagePath: best.imagePath,
confidence: 0.5,
description: 'product frame (auto-selected)',
boundingHint: 'picked by frame quality analysis (Vision ranking failed)',
};
}
export async function detectProductFrames(
frames: ExtractedFrame[],
minConfidence: number,
concurrency: number = 10,
visionConfig: VisionConfig,
concurrency: number = 5,
): Promise<ProductFrame[]> {
const model = createVisionModel(visionConfig);
const model = createVisionModel();
// Pass 1: all frames in parallel, bounded by concurrency
const keepFlags = await withConcurrency(
frames.map((f) => () => filterFrame(f, model).catch(() => false)),
concurrency,
);
// Pass 1: parallel filter — discard junk frames
const keepFlags: boolean[] = [];
for (let i = 0; i < frames.length; i += concurrency) {
const chunk = frames.slice(i, i + concurrency);
const flags = await Promise.all(
chunk.map((f) => filterFrame(f, model).catch(() => false))
);
keepFlags.push(...flags);
}
const candidates = frames.filter((_, i) => keepFlags[i]);
if (candidates.length === 0) return [];
// Pass 2: single comparative call — model sees all candidates at once
const container = await isContainerProduct(candidates[0], model);
let bestSnapshot: ProductFrame | undefined;
try {
const { bestFrame, description, reasoning, boundingBox } = await rankCandidates(candidates, model, container);
const { bestFrame, description, reasoning, boundingBox } = await rankCandidates(candidates, model);
if (isValidBoundingBox(boundingBox)) {
const croppedPath = bestFrame.imagePath.replace(/\.jpg$/, '_cropped.jpg');
try {
await cropProduct(bestFrame.imagePath, boundingBox, croppedPath);
} catch {}
bestSnapshot = {
frameIndex: bestFrame.frameIndex,
timestampSeconds: bestFrame.timestampSeconds,
imagePath: bestFrame.imagePath,
...(croppedPath ? { croppedImagePath: croppedPath } : {}),
confidence: 0.95,
description,
boundingHint: reasoning,
};
}
} catch {
// ranking failed
}
const croppedPath = bestFrame.imagePath.replace(/\.jpg$/, '_cropped.jpg');
await cropProduct(bestFrame.imagePath, boundingBox, croppedPath);
if (!bestSnapshot) {
return [];
}
return [bestSnapshot];
return [{
frameIndex: bestFrame.frameIndex,
timestampSeconds: bestFrame.timestampSeconds,
imagePath: bestFrame.imagePath,
croppedImagePath: croppedPath,
confidence: 0.95,
description,
boundingHint: reasoning,
}];
}


@@ -1,13 +1,4 @@
export type Command =
| 'detect'
| 'search'
| 'detect-and-search'
| 'detect-best'
| 'detect-best-and-search'
| 'detect-video'
| 'detect-video-and-search'
| 'rerank'
| 'session';
export type Command = 'detect' | 'search' | 'detect-and-search' | 'rerank' | 'session';
export interface SearchItem {
num_iid: number;
@@ -20,30 +11,6 @@ export interface SearchItem {
detail_url: string;
}
export interface DetectVideoResult {
status: 'success' | 'failed';
command: 'detect-video';
dryRun: boolean;
videoPath?: string;
videoUrl?: string | null;
description?: string;
keyword?: string;
snapshotImagePath?: string;
error?: string;
}
export interface DetectVideoAndSearchResult {
status: 'success' | 'failed';
command: 'detect-video-and-search';
dryRun: boolean;
videoPath?: string;
videoUrl?: string | null;
description?: string;
keyword?: string;
searchResults?: SearchItem[];
error?: string;
}
export interface DetectOptions {
videoPath: string;
intervalSeconds: number;
@@ -84,4 +51,4 @@ export interface SearchResult {
error?: string;
}
export type OutputResult = DetectResult | SearchResult | DetectVideoResult | DetectVideoAndSearchResult;
export type OutputResult = DetectResult | SearchResult;