Compare commits


6 Commits
v0.1.4 ... main

Author SHA1 Message Date
ywkj c5e1d0c88c fix: rerank top-N 10→5 to match the Feishu list display
register-skill-release / register (push) Successful in 17s
2026-04-26 20:34:57 +08:00
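The commit above reduces the rerank output from 10 items to 5; as the diffs further down show, the selection itself is just a sales-descending sort plus a slice. A minimal standalone sketch (the `SearchItem` shape here is trimmed to the fields that matter; treat it as illustrative, not the repo's full type):

```typescript
// Sketch: keep the n best-selling items, matching the Feishu list display.
interface SearchItem {
  title: string;
  sales?: number; // may be absent; treated as 0 when sorting
}

function topBySales(items: SearchItem[], n = 5): SearchItem[] {
  // Copy before sorting so the caller's array is left untouched.
  return [...items].sort((a, b) => (b.sales ?? 0) - (a.sales ?? 0)).slice(0, n);
}
```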
ywkj eb8e7a7daf fix: sync auth-cli.ts and add the clientConfig() method
register-skill-release / register (push) Successful in 19s
2026-04-26 20:15:08 +08:00
ywkj b6dc7af9bf chore: tweak README
register-skill-release / register (push) Successful in 18s
2026-04-26 20:08:10 +08:00
ywkj 7a43eb391b docs: update README to reflect the current architecture
register-skill-release / register (push) Successful in 18s
- Document the detect-best-and-search, detect-best, and rerank commands
- Update the auth architecture notes (unified auth via auth-rt)
- Document sessionId and Langfuse tracing
- Update the environment variable table

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-26 19:57:03 +08:00
ywkj d497e92626 fix: inject metadata.session_id via a fetch wrapper instead of an HTTP header
register-skill-release / register (push) Successful in 24s
LiteLLM does not process the x-langfuse-session-id header. Use a fetch interceptor to inject session_id into the request-body metadata instead; LiteLLM passes it straight through to Langfuse, which creates the session.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-26 19:45:08 +08:00
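The fix described in this commit can be sketched as a small standalone wrapper (a minimal illustration only — the `withSessionId` name and the `FetchLike` alias are mine, not the repo's):

```typescript
// Minimal sketch: wrap fetch so JSON request bodies gain metadata.session_id
// before reaching the gateway (LiteLLM forwards body metadata to Langfuse).
type FetchLike = (input: RequestInfo | URL, init?: RequestInit) => Promise<Response>;

function withSessionId(inner: FetchLike, sessionId: string): FetchLike {
  return async (input, init) => {
    if (init?.body && typeof init.body === 'string') {
      try {
        const body = JSON.parse(init.body);
        body.metadata = body.metadata ?? {};
        body.metadata.session_id = body.metadata.session_id ?? sessionId;
        init = { ...init, body: JSON.stringify(body) };
      } catch {
        // Not JSON — leave the body untouched.
      }
    }
    return inner(input, init);
  };
}
```

Passing such a wrapper as the provider's custom `fetch` keeps the injection transparent to the calling code.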
ywkj a6b4f99a83 refactor: generate the session ID solely in auth-cli.ts
register-skill-release / register (push) Successful in 21s
- scripts/run.ts: remove the duplicated session-ID auto-generation logic
- src/auth-cli.ts: synced from auth-runtime (module-level SKILL_SESSION_ID)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-26 19:01:10 +08:00
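The structured ID format this refactor standardizes on (`skill-YYYYMMDD-HHMMSS-xxxx`) can be sketched as a standalone helper (the `makeSessionId` name is mine; the repo inlines the same logic at module load):

```typescript
// Sketch of the structured Langfuse session-ID format: skill-YYYYMMDD-HHMMSS-xxxx
function makeSessionId(now: Date = new Date()): string {
  const pad = (n: number) => String(n).padStart(2, '0');
  const ts = `${now.getFullYear()}${pad(now.getMonth() + 1)}${pad(now.getDate())}` +
    `-${pad(now.getHours())}${pad(now.getMinutes())}${pad(now.getSeconds())}`;
  const rand = Math.random().toString(36).slice(2, 6); // short random base-36 suffix
  return `skill-${ts}-${rand}`;
}
```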
7 changed files with 127 additions and 91 deletions

README.md
View File

@@ -1,19 +1,22 @@
-# video-product-snapshot — video product
+# video-product-snapshot — video product image search

-Detect e-commerce products in a video, extract the best product frame, and find matching items on 1688 via image search.
+Extract the best product frame from a video and find matching items on 1688 by searching with the image.

 ## How it works

-1. Use `ffmpeg` to extract frames from the video at a configured interval
-2. Send each frame to a vision model to detect products and score them
-3. Pick the highest-confidence frame as the best product snapshot
-4. Optionally, call the image-search API with that snapshot to find matching items
+1. `ffmpeg` extracts frames at 0.5 s intervals (up to 60 frames)
+2. Visual-quality pre-filter (brightness/variance drops blurry frames)
+3. Container/rack product detection → auto-select an unloaded frame
+4. Vision model compares and ranks frames, picks the best product frame
+5. Crop the product region → upload → 1688 image search
+6. Post-filter (vision model checks whether each result matches) → rerank

 ## Installation

 ```bash
+./install.sh   # installs auth-rt + dependencies
 bun install
 bun run build  # outputs to dist/run.js
 ```
## 使用方法 ## 使用方法
@@ -26,77 +29,74 @@ bun dist/run.js <command> [options]

 | Command | Description |
 |------|------|
-| `detect <video>` | Extract frames and detect product frames |
-| `search <image>` | Find matching items with an image |
-| `detect-and-search <video>` | Full pipeline: detect the best frame → image search |
-| `session` | Print the current auth session token |
+| `detect-best-and-search <video>` | **Recommended.** Best frame → image search → rerank |
+| `detect-best <video>` | Extract the best product frame only, no search |
+| `detect-and-search <video>` | Image search after two-stage filtering (slower) |
+| `detect <video>` | Extract frames and detect products frame by frame |
+| `search <image>` | Find matching items with an existing image |
+| `rerank` | Cross-filter image-search results by keyword |
+| `session` | Get the current auth session token |

-### Options (`detect` / `detect-and-search`)
+### Options (`detect-best` / `detect-best-and-search`)

 | Option | Default | Description |
 |------|--------|------|
-| `--interval=<seconds>` | `1` | Frame extraction interval (seconds) |
-| `--max-frames=<count>` | `60` | Maximum number of frames analyzed |
-| `--output-dir=<dir>` | video's directory | Directory for extracted frames |
-| `--min-confidence=<0-1>` | `0.7` | Minimum detection confidence |
-| `--dry-run` | — | Parse arguments and print the config without executing |
+| `--interval=<seconds>` | `0.5` | Frame sampling interval |
+| `--max-frames=<n>` | `60` | Maximum number of frames analyzed |
+| `--output-dir=<dir>` | same directory as the video | Snapshot output directory |
+| `--session-id=<id>` | auto-generated | Langfuse session ID |
+| `--dry-run` | — | Parse arguments without executing |

-### Examples
-
-```bash
-# Detect products, one frame every 3 seconds
-bun dist/run.js detect ./demo.mp4 --interval=3
-# Full pipeline + a higher confidence threshold
-bun dist/run.js detect-and-search ./demo.mp4 --interval=5 --min-confidence=0.85
-# Search with an existing snapshot
-bun dist/run.js search ./snapshot.jpg
-```
-
 ## Output

-All commands print JSON to stdout.
+All commands print JSON to stdout, including a `sessionId` field for Langfuse tracing.

 ```json
 {
+  "sessionId": "skill-20260426-184345-lb06",
+  "status": "success",
+  "command": "detect-best-and-search",
   "bestSnapshot": {
-    "frameIndex": 4,
-    "timestampSeconds": 9,
-    "imagePath": "/path/to/frame_0004.jpg",
-    "confidence": 0.92,
-    "description": "White sneaker with blue logo, left side view",
-    "boundingHint": "centered"
+    "frameIndex": 7,
+    "timestampSeconds": 3,
+    "imagePath": "/path/to/frame_0007.jpg",
+    "croppedImagePath": "/path/to/frame_0007_cropped.jpg",
+    "description": "黑色金属床底鞋架 可折叠移动"
   },
-  "productFrames": [...],
-  "searchBody": { ... }
+  "rerank": {
+    "keyword": "床底鞋架",
+    "results": [
+      { "num_iid": 123, "title": "...", "price": "44.00", "sales": 87, "detail_url": "..." }
+    ]
+  }
 }
 ```

-- `productFrames` — all detected frames, sorted by confidence (highest first)
-- `bestSnapshot` — the top-ranked frame
-- `searchBody` — the image-search API response (`search` / `detect-and-search` only)
+## Auth architecture
+
+```
+~/.openclaw/.env
+CLIENT_KEY ──→ auth-rt ──→ business backend
+               ├── /session → access_token
+               └── /client-config → provider.api_key
+                                    provider.base_url
+                                    provider.model
+```
+
+Only `CLIENT_KEY` needs to be configured; LLM credentials and endpoints are issued by the business backend.

 ## Environment variables

-The only required setting is `CLIENT_KEY` in `~/.openclaw/.env`:
-
-```
-CLIENT_KEY=sk_xxxxxxxx.xxxxxxxxxxxxxxxxxxxxxxxx
-```
-
-All credentials and endpoints are fetched automatically via `auth-rt` from the client config; no extra configuration is needed.
-
-### Optional overrides
-
 | Variable | Description |
 |------|------|
-| `VISION_MODEL` | Override the model name (default: `aliyun-cp-multimodal`) |
+| `CLIENT_KEY` | **Required.** Configured in `~/.openclaw/.env` |
+| `VISION_MODEL` | Override the model name (default comes from the client config) |
+| `SKILL_SESSION_ID` | Langfuse session ID (auto-generated, format `skill-YYYYMMDD-HHMMSS-xxxx`) |
 | `AUTH_RT_BIN` | Override the `auth-rt` binary path |
-| `TELEMETRY_ENDPOINT` | Report execution results to a telemetry endpoint |
+| `TELEMETRY_ENDPOINT` | Telemetry reporting endpoint |

 ## Prerequisites

 - [Bun](https://bun.sh) runtime
-- `ffmpeg` and `ffprobe` on the system PATH
-- `auth-rt` CLI on the system PATH (required by `search` / `detect-and-search`)
+- `ffmpeg` / `ffprobe` on the system PATH (frame extraction)
+- `auth-rt` CLI (auth/API calls; installed automatically by `install.sh`)

View File

@@ -67,7 +67,7 @@ bun dist/run.js <command> [args] [--dry-run]

 ## Result display format

-Format `rerank.results` (preferred) or `searchBody.data.items.item` as a markdown table, **5 rows per page**:
+Format `rerank.results` (preferred) or `searchBody.data.items.item` as a markdown table, **at most 5 rows**:

 | # | Title | Price | Sales | Link |
 |---|----------|------|------|------|

View File

@@ -93,17 +93,7 @@ async function main(): Promise<void> {
   if (positionals.length < 1) { printUsage(); process.exit(1); }
   const command = positionals[0] as Command;

-  // Resolve session ID: explicit CLI arg > env > auto-generate structured ID
-  const sessionId = process.env.SKILL_SESSION_ID || (() => {
-    const ts = new Date();
-    const pad = (n: number) => String(n).padStart(2, '0');
-    const tsPart = `${ts.getFullYear()}${pad(ts.getMonth()+1)}${pad(ts.getDate())}-${pad(ts.getHours())}${pad(ts.getMinutes())}${pad(ts.getSeconds())}`;
-    const rand = Math.random().toString(36).slice(2, 6);
-    return `vps-${tsPart}-${rand}`;
-  })();
-  process.env.SKILL_SESSION_ID = sessionId;
+  const sessionId = process.env.SKILL_SESSION_ID!; // set by auth-cli.ts at module load

   const startMs = Date.now();
   let result: Awaited<ReturnType<typeof run>>;

View File

@@ -20,6 +20,18 @@ import * as path from 'path';
 import * as os from 'os';

 const home = process.env.HOME || os.homedir();

+// ── session ID (Langfuse tracing) ──
+// Priority: SKILL_SESSION_ID env > auto-generate
+const SESSION_ID = process.env.SKILL_SESSION_ID || (() => {
+  const ts = new Date();
+  const pad = (n: number) => String(n).padStart(2, '0');
+  const tsPart = `${ts.getFullYear()}${pad(ts.getMonth()+1)}${pad(ts.getDate())}-${pad(ts.getHours())}${pad(ts.getMinutes())}${pad(ts.getSeconds())}`;
+  const rand = Math.random().toString(36).slice(2, 6);
+  return `skill-${tsPart}-${rand}`;
+})();
+process.env.SKILL_SESSION_ID = SESSION_ID;
+
 const AUTH_RT_BIN = process.env.AUTH_RT_BIN
   || (() => {
     // Check if auth-rt is in PATH

View File

@@ -212,7 +212,7 @@ async function runDetectBestAndSearch(args: string[], dryRun: boolean): Promise<
   // Otherwise fall back to the keyword-intersection rerank.
   if (!dryRun && postFilter && !postFilter.error && postFilter.keptCount > 0) {
     const items: SearchItem[] = (searchResult.searchBody as any)?.data?.items?.item ?? [];
-    const sorted = [...items].sort((a, b) => (b.sales ?? 0) - (a.sales ?? 0)).slice(0, 10);
+    const sorted = [...items].sort((a, b) => (b.sales ?? 0) - (a.sales ?? 0)).slice(0, 5);
     rerankResult = {
       source: 'post-filter',
       results: sorted,
@@ -225,7 +225,7 @@ async function runDetectBestAndSearch(args: string[], dryRun: boolean): Promise<
       rerankResult = await runRerank([
         `--image-results=${tmpFile}`,
         `--description=${best.description}`,
-        '--top=10',
+        '--top=5',
       ], dryRun);
     } catch (e: any) {
       rerankResult = { error: e.message };
@@ -360,7 +360,7 @@ async function runDetectAndSearch(args: string[], dryRun: boolean): Promise<Outp
   // Otherwise fall back to the keyword-intersection rerank.
   if (!dryRun && postFilter && !postFilter.error && postFilter.keptCount > 0) {
     const items: SearchItem[] = (searchResult.searchBody as any)?.data?.items?.item ?? [];
-    const sorted = [...items].sort((a, b) => (b.sales ?? 0) - (a.sales ?? 0)).slice(0, 10);
+    const sorted = [...items].sort((a, b) => (b.sales ?? 0) - (a.sales ?? 0)).slice(0, 5);
     rerankResult = {
       source: 'post-filter',
       results: sorted,
@@ -373,7 +373,7 @@ async function runDetectAndSearch(args: string[], dryRun: boolean): Promise<Outp
       rerankResult = await runRerank([
         `--image-results=${tmpFile}`,
         `--description=${best.description}`,
-        '--top=10',
+        '--top=5',
       ], dryRun);
     } catch (e: any) {
       rerankResult = { error: e.message };
@@ -417,13 +417,25 @@ function getFlag(args: string[], flag: string): string | undefined {
 }

 function createVisionModel(config: VisionConfig) {
-  const headers: Record<string, string> = {
-    'x-langfuse-tags': 'skill:video-product-snapshot',
-  };
-  if (config.sessionId) {
-    headers['x-langfuse-session-id'] = config.sessionId;
-  }
-  const openai = createOpenAI({ apiKey: config.apiKey, baseURL: config.baseURL, headers });
+  const sessionId = config.sessionId || '';
+  const originFetch = globalThis.fetch;
+  // Inject metadata.session_id into the request body so LiteLLM → Langfuse creates sessions
+  const wrapped = async (input: RequestInfo | URL, init?: RequestInit) => {
+    if (init?.body && typeof init.body === 'string') {
+      try {
+        const body = JSON.parse(init.body);
+        if (!body.metadata) body.metadata = {};
+        if (!body.metadata.session_id) body.metadata.session_id = sessionId;
+        body.metadata.tags = ['skill:video-product-snapshot'];
+        init = { ...init, body: JSON.stringify(body) };
+      } catch {}
+    }
+    return originFetch(input, init);
+  };
+  const openai = createOpenAI({
+    apiKey: config.apiKey, baseURL: config.baseURL,
+    fetch: wrapped as typeof globalThis.fetch,
+  });
   return openai(config.model);
 }
@@ -481,7 +493,7 @@ async function runRerank(args: string[], dryRun: boolean): Promise<OutputResult>
   const positionals = args.filter((a) => !a.startsWith('--'));
   const imageResultsArg = getFlag(args, '--image-results') || positionals[0];
   const keywordArg = getFlag(args, '--keyword') || positionals[1];
-  const topN = parseInt(getFlag(args, '--top') || '10', 10);
+  const topN = parseInt(getFlag(args, '--top') || '5', 10);
   const description = getFlag(args, '--description') || '';

View File

@@ -34,13 +34,24 @@ const FILTER_PROMPT = (count: number, description?: string) => {
 };

 function createModel(config: VisionConfig) {
-  const headers: Record<string, string> = {
-    'x-langfuse-tags': 'skill:video-product-snapshot',
-  };
-  if (config.sessionId) {
-    headers['x-langfuse-session-id'] = config.sessionId;
-  }
-  const provider = createOpenAI({ apiKey: config.apiKey, baseURL: config.baseURL, headers });
+  const sessionId = config.sessionId || '';
+  const originFetch = globalThis.fetch;
+  const wrapped = async (input: RequestInfo | URL, init?: RequestInit) => {
+    if (init?.body && typeof init.body === 'string') {
+      try {
+        const body = JSON.parse(init.body);
+        if (!body.metadata) body.metadata = {};
+        if (!body.metadata.session_id) body.metadata.session_id = sessionId;
+        body.metadata.tags = ['skill:video-product-snapshot'];
+        init = { ...init, body: JSON.stringify(body) };
+      } catch {}
+    }
+    return originFetch(input, init);
+  };
+  const provider = createOpenAI({
+    apiKey: config.apiKey, baseURL: config.baseURL,
+    fetch: wrapped as typeof globalThis.fetch,
+  });
   return provider(config.model);
 }

View File

@@ -78,13 +78,24 @@ Return:
 - boundingBox: tight box of the PRODUCT ONLY as [x1, y1, x2, y2] normalized 0.0–1.0, top-left origin. Exclude hands, background, and unrelated objects. The product is near the center of the frame.`;

 function createVisionModel(config: VisionConfig) {
-  const headers: Record<string, string> = {
-    'x-langfuse-tags': 'skill:video-product-snapshot',
-  };
-  if (config.sessionId) {
-    headers['x-langfuse-session-id'] = config.sessionId;
-  }
-  const provider = createOpenAI({ apiKey: config.apiKey, baseURL: config.baseURL, headers });
+  const sessionId = config.sessionId || '';
+  const originFetch = globalThis.fetch;
+  const wrapped = async (input: RequestInfo | URL, init?: RequestInit) => {
+    if (init?.body && typeof init.body === 'string') {
+      try {
+        const body = JSON.parse(init.body);
+        if (!body.metadata) body.metadata = {};
+        if (!body.metadata.session_id) body.metadata.session_id = sessionId;
+        body.metadata.tags = ['skill:video-product-snapshot'];
+        init = { ...init, body: JSON.stringify(body) };
+      } catch {}
+    }
+    return originFetch(input, init);
+  };
+  const provider = createOpenAI({
+    apiKey: config.apiKey, baseURL: config.baseURL,
+    fetch: wrapped as typeof globalThis.fetch,
+  });
   return provider(config.model);
 }