Compare commits


No commits in common. "main" and "v.0.1.1" have entirely different histories.

9 changed files with 299 additions and 561 deletions

README.md
View File

@@ -1,22 +1,19 @@
-# video-product-snapshot — Video product image search
-Extract the best product frame from a video and reverse-image search 1688 for the same item.
+# video-product-snapshot — Video product detection
+Detect ecommerce products in a video, extract the best product shot, and find the same item on 1688 via image search.
 ## How it works
-1. `ffmpeg` extracts frames at 0.5 s intervals (up to 60 frames)
-2. Visual-quality pre-filter (brightness/variance drops blurry frames)
-3. Container/rack product detection → automatically prefers an empty-load frame
-4. The vision model compares and ranks frames to pick the best product frame
-5. Crop the product region → upload → 1688 image search
-6. Post-filter (vision model checks whether each result is the same product) → rerank
+1. Use `ffmpeg` to extract frames from the video at a configured interval
+2. Send each frame to the vision model to detect products and score them
+3. Pick the highest-confidence frame as the best product snapshot
+4. Optional: call the image-search API with that snapshot to find the same item
 ## Installation
 ```bash
-./install.sh   # install auth-rt + dependencies
 bun install
 bun run build  # outputs to dist/run.js
 ```
 ## Usage
@@ -29,74 +26,77 @@ bun dist/run.js <command> [options]
 | Command | Description |
 |------|------|
-| `detect-best-and-search <video>` | **Recommended.** Best frame → image search → rerank |
-| `detect-best <video>` | Extract the best product frame only, no search |
-| `detect-and-search <video>` | Image search after two-stage filtering (slower) |
-| `detect <video>` | Extract frames and detect products frame by frame |
-| `search <image>` | Search for the same item with an existing image |
-| `rerank` | Cross-filter image-search results by keyword |
-| `session` | Get the current auth session token |
+| `detect <video>` | Extract frames and detect product shots |
+| `search <image>` | Search for the same item by image |
+| `detect-and-search <video>` | Full pipeline: detect the best shot → image search |
+| `session` | Print the current auth session token |
-### Options (`detect-best` / `detect-best-and-search`)
+### Options (`detect` / `detect-and-search`)
 | Flag | Default | Description |
 |------|--------|------|
-| `--interval=<sec>` | `0.5` | Frame sampling interval |
-| `--max-frames=<n>` | `60` | Max frames to analyze |
-| `--output-dir=<dir>` | same directory as the video | Directory for snapshots |
-| `--session-id=<id>` | auto-generated | Langfuse session ID |
-| `--dry-run` | — | Parse arguments without executing |
+| `--interval=<sec>` | `1` | Frame extraction interval (seconds) |
+| `--max-frames=<count>` | `60` | Max frames to analyze |
+| `--output-dir=<dir>` | directory of the video | Directory for extracted frames |
+| `--min-confidence=<0-1>` | `0.7` | Minimum detection confidence |
+| `--dry-run` | — | Parse arguments and print the config without executing |
+### Examples
+```bash
+# Detect products, one frame every 3 seconds
+bun dist/run.js detect ./demo.mp4 --interval=3
+# Full pipeline with a higher confidence threshold
+bun dist/run.js detect-and-search ./demo.mp4 --interval=5 --min-confidence=0.85
+# Search with an existing snapshot
+bun dist/run.js search ./snapshot.jpg
+```
 ## Output
-All commands print JSON to stdout, including a `sessionId` field for Langfuse tracing.
+All commands print JSON to stdout.
 ```json
 {
-  "sessionId": "skill-20260426-184345-lb06",
-  "status": "success",
-  "command": "detect-best-and-search",
   "bestSnapshot": {
-    "frameIndex": 7,
-    "timestampSeconds": 3,
-    "imagePath": "/path/to/frame_0007.jpg",
-    "croppedImagePath": "/path/to/frame_0007_cropped.jpg",
-    "description": "黑色金属床底鞋架 可折叠移动"
+    "frameIndex": 4,
+    "timestampSeconds": 9,
+    "imagePath": "/path/to/frame_0004.jpg",
+    "confidence": 0.92,
+    "description": "White sneaker with blue logo, left side view",
+    "boundingHint": "centered"
   },
-  "rerank": {
-    "keyword": "床底鞋架",
-    "results": [
-      { "num_iid": 123, "title": "...", "price": "44.00", "sales": 87, "detail_url": "..." }
-    ]
-  }
+  "productFrames": [...],
+  "searchBody": { ... }
 }
 ```
-## Auth architecture
-```
-~/.openclaw/.env
-CLIENT_KEY ──→ auth-rt ──→ backend
-              ├── /session → access_token
-              └── /client-config → provider.api_key
-                                   provider.base_url
-                                   provider.model
-```
-Only `CLIENT_KEY` needs to be configured; LLM credentials and endpoints are issued by the backend.
+- `productFrames`: all detected shots, sorted by confidence (highest first)
+- `bestSnapshot`: the top-ranked shot
+- `searchBody`: the image-search API response (`search` / `detect-and-search` only)
 ## Environment variables
-The only required setting is `CLIENT_KEY` in `~/.openclaw/.env`:
-```
-CLIENT_KEY=sk_xxxxxxxx.xxxxxxxxxxxxxxxxxxxxxxxx
-```
-All credentials and endpoints are fetched automatically via `auth-rt` from the client config; no other setup is needed.
-### Optional overrides
 | Variable | Description |
 |------|------|
-| `CLIENT_KEY` | **Required.** Set in `~/.openclaw/.env` |
-| `VISION_MODEL` | Override the model name (default comes from the client config) |
-| `SKILL_SESSION_ID` | Langfuse session ID; auto-generated as `skill-YYYYMMDD-HHMMSS-xxxx` |
+| `VISION_MODEL` | Override the model name (default: `aliyun-cp-multimodal`) |
 | `AUTH_RT_BIN` | Override the `auth-rt` binary path |
-| `TELEMETRY_ENDPOINT` | Telemetry reporting endpoint |
+| `TELEMETRY_ENDPOINT` | Report execution results to the telemetry endpoint |
 ## Prerequisites
 - [Bun](https://bun.sh) runtime
-- `ffmpeg` / `ffprobe` on the system PATH (frame extraction)
-- `auth-rt` CLI (auth/API calls; installed automatically by `install.sh`)
+- `ffmpeg` and `ffprobe` on the system PATH
+- `auth-rt` CLI on the system PATH (required by `search` / `detect-and-search`)
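For agents consuming this CLI, the new-format JSON output above can be handled with a small helper. A minimal sketch, not part of the repo: the field names follow the README example, while the `CliOutput` interface and helper name are assumptions.

```typescript
// Sketch: parse the CLI's stdout (new output format) and pull the best snapshot.
// The interfaces mirror the README's example output; they are illustrative only.
interface BestSnapshot {
  frameIndex: number;
  timestampSeconds: number;
  imagePath: string;
  confidence?: number;
  description?: string;
}

interface CliOutput {
  bestSnapshot?: BestSnapshot;
  productFrames?: unknown[];
}

// Returns the best snapshot, or null when detection found nothing.
function parseCliOutput(stdout: string): BestSnapshot | null {
  const parsed = JSON.parse(stdout) as CliOutput;
  return parsed.bestSnapshot ?? null;
}

const sample = JSON.stringify({
  bestSnapshot: { frameIndex: 4, timestampSeconds: 9, imagePath: "/path/to/frame_0004.jpg", confidence: 0.92 },
  productFrames: [],
});
console.log(parseCliOutput(sample)?.frameIndex); // → 4
```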

View File

@@ -1,11 +1,11 @@
 ---
 name: video-product-snapshot
-description: "Extract product snapshot from video and search 1688 by image. Use when the user provides a video and wants to find the product."
+description: "Upload video to API for product analysis and 1688 keyword search. Use when the user provides a video and wants to find the product."
 ---
-# Video Product Snapshot — Video product image search
-Capture the clearest product frame from the video (container products auto-select an empty frame) and reverse-image search 1688 for the same item.
+# Video Product Snapshot — Video product detection
+Upload the video to the API, have a multimodal model identify the product, generate a Chinese keyword, and search 1688 for the same item.
 ## Run
@@ -17,61 +17,49 @@ bun dist/run.js <command> [args] [--dry-run]
 | Command | When to use |
 |------|---------|
-| `detect-best-and-search <video>` | **Recommended.** Extract the best product frame → image search → rerank and return results. |
-| `detect-best <video>` | Extract the best product frame only, no search. |
-| `detect-and-search <video>` | Image search after two-stage filtering (slower than detect-best). |
-| `search <image-path>` | Already have a product image; search by image directly. |
-| `rerank` | Cross-filter image-search results by keyword. |
+| `detect-video-and-search <video>` | **Recommended.** Upload the video to the API to identify the product, then keyword-search 1688. |
+| `detect-video <video>` | Identify the product description and generate a keyword only, no search. |
+| `search <image-path>` | Already have a product snapshot; skip detection and search directly. |
 | `session` | Get the current auth session token. |
-## Main command: `detect-best-and-search`
+## `detect-video` / `detect-video-and-search`
+Upload the video to the API and identify the product subject directly.
 Pipeline:
-1. ffmpeg extracts frames at 0.5 s intervals (up to 60 frames)
-2. The vision model checks whether the product is a container/rack type
-3. Container type: pick the best frame from the first 40% of frames (the empty-load phase) only
-4. Non-container: pick the clearest frame among all frames
-5. Crop the product region
-6. Upload the cropped image → 1688 image search
-7. rerank: cross-filter image-search results against keyword-search results
+1. Upload the video → get a public URL (reuses the existing upload endpoint)
+2. Call LiteLLM (Chat Completions + `video_url`) to analyze the video content
+3. Identify product name, material, color, and function
+4. Generate a Chinese search keyword
+5. 1688 keyword search (`detect-video-and-search` only)
-## Options for `detect-best` / `detect-best-and-search`
-| Flag | Default | Description |
-|------|---------|-------------|
-| `--interval=<sec>` | `0.5` | Frame sampling interval (seconds) |
-| `--max-frames=<n>` | `60` | Max frames to analyze |
-| `--output-dir=<dir>` | same directory as the video | Directory for snapshots |
+Dependencies:
+- `auth-rt` client key (automatic, no extra configuration)
+- a LiteLLM proxy that supports the `video_url` content type
+- an upload endpoint that returns a public URL
 ## Output format
-### `detect-best-and-search`
+### `detect-video-and-search`
 ```json
 {
-  "bestSnapshot": {
-    "frameIndex": 7,
-    "timestampSeconds": 3,
-    "imagePath": "/path/to/frame_0007.jpg",
-    "croppedImagePath": "/path/to/frame_0007_cropped.jpg",
-    "description": "黑色金属床底鞋架 可折叠移动"
-  },
-  "rerank": {
-    "keyword": "床底鞋架",
-    "results": [
-      { "num_iid": 123, "title": "...", "price": "44.00", "sales": 87, "detail_url": "..." }
-    ]
-  }
+  "videoUrl": "https://...",
+  "description": "白色帆布收纳盒,带提手,可折叠",
+  "keyword": "帆布收纳盒",
+  "searchResults": [
+    { "num_iid": 123, "title": "...", "price": "15.00", "promotion_price": "12.00", "sales": 500, "detail_url": "..." }
+  ]
 }
 ```
 ## Result display format
-Format `rerank.results` (preferred) or `searchBody.data.items.item` as a markdown table, **at most 5 rows**:
+Format `searchResults` as a markdown table, **5 rows per page** (show all rows when fewer than 5):
 | # | 商品名称 | 价格 | 销量 | 链接 |
 |---|----------|------|------|------|
-| 1 | {title} | ¥{promotion_price \|\| price} | {sales ?? —}件 | [查看]({detail_url}) |
+| 1 | {title} | ¥{promotion_price \|\| price} | {sales ?? —}件 | [查看](https://detail.1688.com/offer/{num_iid}.html) |
 - Use `promotion_price` when present, otherwise the regular price
 - Show `—` when `sales` is missing or zero
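The display rules above (prefer the promotion price, render missing or zero sales as a dash, cap at 5 rows) can be sketched as a small formatter. Illustrative only: the `SearchItem` shape is taken from the JSON example, and the helper name is an assumption.

```typescript
// Sketch of the result-display rules: promotion_price wins over price,
// and missing/zero sales renders as "—". Not part of the CLI.
interface SearchItem {
  num_iid: number;
  title: string;
  price: string;
  promotion_price?: string;
  sales?: number;
  detail_url: string;
}

function formatResultsTable(items: SearchItem[], limit = 5): string {
  const rows = items.slice(0, limit).map((it, i) => {
    const price = it.promotion_price || it.price; // prefer the promo price when present
    const sales = it.sales ? `${it.sales}件` : "—"; // missing or zero → dash
    return `| ${i + 1} | ${it.title} | ¥${price} | ${sales} | [查看](${it.detail_url}) |`;
  });
  return [
    "| # | 商品名称 | 价格 | 销量 | 链接 |",
    "|---|----------|------|------|------|",
    ...rows,
  ].join("\n");
}
```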
@@ -79,16 +67,28 @@ bun dist/run.js <command> [args] [--dry-run]
 ## Execution rules
-1. **Do not retry.** If a command fails, report the error directly.
-2. **Trust tool output.** The CLI already handles session management and error formatting.
-3. **Run slow commands in a sub-agent** (detect-best-and-search / detect-and-search); running them directly will time out.
-### Sub-agent execution template
-Use `sessions_spawn` to run in a sub-agent; **do not run directly**.
+### Video commands (slow; run via a sub-agent)
+Applies to: `detect-video-and-search`, `detect-video`
 ```
 sessions_spawn(
-  task: "cd /path/to/skill && Run this command and return the raw JSON output:\n\nbun dist/run.js detect-best-and-search <video-path>\n\nCopy the entire JSON output as your reply.",
+  task: "Run this command and return the raw JSON output:\n\nbun dist/run.js <full command>\n\nCopy the entire JSON output as your reply.",
   label: "video-product-snapshot",
   runTimeoutSeconds: 300,
 )
 ```
+- Notify the user that processing has started and share the `runId`
+- Wait for the sub-agent result, then parse and present it
+### `search` and `session` (fast; run directly)
+Run these directly in the current session; no sub-agent needed.
+### General rules
+1. **Do not retry.** If a command fails, report the error directly.
+2. **Trust tool output.** The CLI already handles session management and error formatting.

View File

@@ -43,17 +43,11 @@ function printUsage(): void {
   detect-and-search <video-path> [options]
-  detect-best <video-path> [options]
-  detect-best-and-search <video-path> [options]
   detect-video <video-path>
 API
   detect-video-and-search <video-path>
 1688
   rerank --image-results=<json> [--description=<text>] [--keyword=<text>] [--top=<n>]
@@ -81,8 +75,6 @@ async function main(): Promise<void> {
       dryRun = true;
     } else if (arg.startsWith('--api-base=')) {
       process.env.API_BASE = arg.slice('--api-base='.length).trim();
-    } else if (arg.startsWith('--session-id=')) {
-      process.env.SKILL_SESSION_ID = arg.slice('--session-id='.length).trim();
     } else if (arg === '-h' || arg === '--help') {
       printUsage(); process.exit(0);
     } else {
@@ -93,7 +85,6 @@ async function main(): Promise<void> {
   if (positionals.length < 1) { printUsage(); process.exit(1); }
   const command = positionals[0] as Command;
-  const sessionId = process.env.SKILL_SESSION_ID!; // set by auth-cli.ts at module load
   const startMs = Date.now();
   let result: Awaited<ReturnType<typeof run>>;
@@ -101,14 +92,13 @@ async function main(): Promise<void> {
     result = await run(command, positionals.slice(1), dryRun);
   } catch (err) {
     const error = err instanceof Error ? err.message : String(err);
-    console.log(JSON.stringify({ status: 'failed', command, dryRun, sessionId, error }, null, 2));
-    if (!dryRun) reportTelemetry({ skill: SKILL_NAME, command, sessionId, status: 'failed', durationMs: Date.now() - startMs, error });
+    console.log(JSON.stringify({ status: 'failed', command, dryRun, error }, null, 2));
+    if (!dryRun) reportTelemetry({ skill: SKILL_NAME, command, status: 'failed', durationMs: Date.now() - startMs, error });
     process.exit(1);
   }
-  const output = { ...result, sessionId } as Record<string, unknown>;
-  console.log(JSON.stringify(output, null, 2));
-  if (!dryRun) reportTelemetry({ skill: SKILL_NAME, command, sessionId, status: result.status, durationMs: Date.now() - startMs, error: (result as any).error });
+  console.log(JSON.stringify(result, null, 2));
+  if (!dryRun) reportTelemetry({ skill: SKILL_NAME, command, status: result.status, durationMs: Date.now() - startMs, error: (result as any).error });
 }
 main().catch((err) => {

View File

@@ -20,18 +20,6 @@ import * as path from 'path';
 import * as os from 'os';
 const home = process.env.HOME || os.homedir();
-// ── session ID (Langfuse tracing) ──
-// Priority: SKILL_SESSION_ID env > auto-generate
-const SESSION_ID = process.env.SKILL_SESSION_ID || (() => {
-  const ts = new Date();
-  const pad = (n: number) => String(n).padStart(2, '0');
-  const tsPart = `${ts.getFullYear()}${pad(ts.getMonth()+1)}${pad(ts.getDate())}-${pad(ts.getHours())}${pad(ts.getMinutes())}${pad(ts.getSeconds())}`;
-  const rand = Math.random().toString(36).slice(2, 6);
-  return `skill-${tsPart}-${rand}`;
-})();
-process.env.SKILL_SESSION_ID = SESSION_ID;
 const AUTH_RT_BIN = process.env.AUTH_RT_BIN
   || (() => {
     // Check if auth-rt is in PATH

View File

@@ -1,10 +1,11 @@
 import * as fs from 'fs';
 import * as path from 'path';
-import type { Command, DetectOptions, DetectResult, SearchResult, OutputResult, SearchItem, DetectVideoResult, DetectVideoAndSearchResult } from './types.ts';
+import type { Command, DetectOptions, DetectResult, SearchResult, OutputResult, SearchItem, DetectVideoResult } from './types.ts';
 import { createSkillClient } from './auth-cli.ts';
 import { extractFrames } from './frame-extractor.ts';
 import { detectProductFrames, detectBestFrame } from './product-detector.ts';
-import { postFilterByImage } from './post-filter.ts';
+import { imageToBase64 } from './frame-extractor.ts';
+import { uploadVideo, analyzeVideo } from './video-analyzer.ts';
 import { generateText } from 'ai';
 import { createOpenAI } from '@ai-sdk/openai';
@@ -12,7 +13,6 @@ export interface VisionConfig {
   apiKey: string;
   baseURL?: string;
   model: string;
-  sessionId?: string;
 }
 async function loadVisionConfig(client: ReturnType<typeof createSkillClient>): Promise<VisionConfig> {
@@ -23,7 +23,6 @@ async function loadVisionConfig(client: ReturnType<typeof createSkillClient>): P
     apiKey,
     baseURL: cfg.metadata?.provider?.base_url,
     model: process.env.VISION_MODEL ?? cfg.metadata?.provider?.model ?? 'aliyun-cp-multimodal',
-    sessionId: process.env.SKILL_SESSION_ID || `skill_${Date.now()}_${Math.random().toString(36).slice(2, 8)}`,
   };
 }
@@ -185,47 +184,15 @@ async function runDetectBestAndSearch(args: string[], dryRun: boolean): Promise<
   const imageForSearch = best.croppedImagePath || best.imagePath;
   const searchResult = await runSearch([imageForSearch], dryRun) as SearchResult;
-  // Post-filter: drop results whose pic_url isn't the same product type as our snapshot
-  let postFilter: any = undefined;
-  if (!dryRun && searchResult.status === 'success' && searchResult.searchBody) {
-    const items: SearchItem[] = (searchResult.searchBody as any)?.data?.items?.item ?? [];
-    if (items.length > 0) {
-      try {
-        const client = createSkillClient();
-        const visionConfig = await loadVisionConfig(client);
-        const result = await postFilterByImage(imageForSearch, items, visionConfig, { description: best.description });
-        (searchResult.searchBody as any).data.items.item = result.kept;
-        postFilter = {
-          totalChecked: result.totalChecked,
-          keptCount: result.kept.length,
-          rejectedCount: result.rejected.length,
-          failed: result.failed,
-        };
-      } catch (e: any) {
-        postFilter = { error: e.message };
-      }
-    }
-  }
   let rerankResult: any = undefined;
-  // If post-filter produced focused results, sort them directly by sales — they're already the best matches.
-  // Otherwise fall back to the keyword-intersection rerank.
-  if (!dryRun && postFilter && !postFilter.error && postFilter.keptCount > 0) {
-    const items: SearchItem[] = (searchResult.searchBody as any)?.data?.items?.item ?? [];
-    const sorted = [...items].sort((a, b) => (b.sales ?? 0) - (a.sales ?? 0)).slice(0, 5);
-    rerankResult = {
-      source: 'post-filter',
-      results: sorted,
-      count: sorted.length,
-    };
-  } else if (!dryRun && searchResult.status === 'success' && searchResult.searchBody) {
+  if (!dryRun && searchResult.status === 'success' && searchResult.searchBody) {
     const tmpFile = path.join(path.dirname(imageForSearch), `search_body_${Date.now()}.json`);
     try {
       fs.writeFileSync(tmpFile, JSON.stringify(searchResult.searchBody));
       rerankResult = await runRerank([
         `--image-results=${tmpFile}`,
         `--description=${best.description}`,
-        '--top=5',
+        '--top=10',
       ], dryRun);
     } catch (e: any) {
       rerankResult = { error: e.message };
@@ -240,87 +207,10 @@ async function runDetectBestAndSearch(args: string[], dryRun: boolean): Promise<
     searchHttpStatus: searchResult.searchHttpStatus,
     searchBody: searchResult.searchBody,
     searchError: searchResult.error,
-    postFilter,
     rerank: rerankResult,
   } as any;
 }
-async function runDetectVideo(args: string[], dryRun: boolean): Promise<DetectVideoResult> {
-  const videoPath = args[0];
-  if (!videoPath) return { status: 'failed', command: 'detect-video', dryRun, error: 'detect-video requires <video-path>' };
-  if (!fs.existsSync(videoPath)) return { status: 'failed', command: 'detect-video', dryRun, error: `video not found: ${videoPath}` };
-  const detectResult = await runDetectBest(args, dryRun) as DetectResult;
-  if (detectResult.status === 'failed') {
-    return { status: 'failed', command: 'detect-video', dryRun, videoPath, error: detectResult.error || 'failed to detect best frame' };
-  }
-  const description = detectResult.bestSnapshot?.description?.trim();
-  const snapshotImagePath = detectResult.bestSnapshot?.croppedImagePath || detectResult.bestSnapshot?.imagePath;
-  if (!description) {
-    return { status: 'failed', command: 'detect-video', dryRun, videoPath, error: 'no product description detected from video' };
-  }
-  if (dryRun) {
-    return { status: 'success', command: 'detect-video', dryRun, videoPath, videoUrl: null, description, keyword: '<dry-run-keyword>', snapshotImagePath };
-  }
-  const client = createSkillClient();
-  const visionConfig = await loadVisionConfig(client);
-  const keyword = await generateChineseKeyword(description, visionConfig);
-  return { status: 'success', command: 'detect-video', dryRun, videoPath, videoUrl: null, description, keyword, snapshotImagePath };
-}
-async function runDetectVideoAndSearch(args: string[], dryRun: boolean): Promise<DetectVideoAndSearchResult> {
-  const videoPath = args[0];
-  if (!videoPath) return { status: 'failed', command: 'detect-video-and-search', dryRun, error: 'detect-video-and-search requires <video-path>' };
-  if (!fs.existsSync(videoPath)) return { status: 'failed', command: 'detect-video-and-search', dryRun, error: `video not found: ${videoPath}` };
-  if (dryRun) {
-    return { status: 'success', command: 'detect-video-and-search', dryRun, videoPath, videoUrl: null, description: '<dry-run>', keyword: '<dry-run>', searchResults: [] };
-  }
-  // Reuse existing pipeline: best snapshot → image search → keyword rerank
-  const detectAndSearch = await runDetectBestAndSearch(args, dryRun) as any;
-  if (detectAndSearch.status === 'failed') {
-    return { status: 'failed', command: 'detect-video-and-search', dryRun, videoPath, error: detectAndSearch.error || 'detect-best-and-search failed' };
-  }
-  const description = String(detectAndSearch.bestSnapshot?.description || '').trim();
-  const rerank = detectAndSearch.rerank;
-  const keyword = String(rerank?.keyword || '').trim();
-  const searchResults = (rerank?.results || []) as SearchItem[];
-  // Fallback: if rerank didn't produce anything, do keyword search directly.
-  if (!searchResults.length) {
-    const client = createSkillClient();
-    const visionConfig = await loadVisionConfig(client);
-    const fallbackKeyword = keyword || (description ? await generateChineseKeyword(description, visionConfig) : '');
-    const items = fallbackKeyword ? await keywordSearch(client, fallbackKeyword, 1) : [];
-    return {
-      status: 'success',
-      command: 'detect-video-and-search',
-      dryRun,
-      videoPath,
-      videoUrl: null,
-      description,
-      keyword: fallbackKeyword,
-      searchResults: items,
-    };
-  }
-  return {
-    status: 'success',
-    command: 'detect-video-and-search',
-    dryRun,
-    videoPath,
-    videoUrl: null,
-    description,
-    keyword,
-    searchResults,
-  };
-}
 async function runDetectAndSearch(args: string[], dryRun: boolean): Promise<OutputResult> {
   const detectResult = await runDetect(args, dryRun) as DetectResult;
   if (detectResult.status === 'failed') return detectResult;
@@ -333,47 +223,15 @@ async function runDetectAndSearch(args: string[], dryRun: boolean): Promise<Outp
   const imageForSearch = best.croppedImagePath || best.imagePath;
   const searchResult = await runSearch([imageForSearch], dryRun) as SearchResult;
-  // Post-filter: drop results whose pic_url isn't the same product type as our snapshot
-  let postFilter: any = undefined;
-  if (!dryRun && searchResult.status === 'success' && searchResult.searchBody) {
-    const items: SearchItem[] = (searchResult.searchBody as any)?.data?.items?.item ?? [];
-    if (items.length > 0) {
-      try {
-        const client = createSkillClient();
-        const visionConfig = await loadVisionConfig(client);
-        const result = await postFilterByImage(imageForSearch, items, visionConfig, { description: best.description });
-        (searchResult.searchBody as any).data.items.item = result.kept;
-        postFilter = {
-          totalChecked: result.totalChecked,
-          keptCount: result.kept.length,
-          rejectedCount: result.rejected.length,
-          failed: result.failed,
-        };
-      } catch (e: any) {
-        postFilter = { error: e.message };
-      }
-    }
-  }
   let rerankResult: any = undefined;
-  // If post-filter produced focused results, sort them directly by sales — they're already the best matches.
-  // Otherwise fall back to the keyword-intersection rerank.
-  if (!dryRun && postFilter && !postFilter.error && postFilter.keptCount > 0) {
-    const items: SearchItem[] = (searchResult.searchBody as any)?.data?.items?.item ?? [];
-    const sorted = [...items].sort((a, b) => (b.sales ?? 0) - (a.sales ?? 0)).slice(0, 5);
-    rerankResult = {
-      source: 'post-filter',
-      results: sorted,
-      count: sorted.length,
-    };
-  } else if (!dryRun && searchResult.status === 'success' && searchResult.searchBody) {
+  if (!dryRun && searchResult.status === 'success' && searchResult.searchBody) {
     const tmpFile = path.join(path.dirname(imageForSearch), `search_body_${Date.now()}.json`);
     try {
       fs.writeFileSync(tmpFile, JSON.stringify(searchResult.searchBody));
       rerankResult = await runRerank([
         `--image-results=${tmpFile}`,
         `--description=${best.description}`,
-        '--top=5',
+        '--top=10',
       ], dryRun);
     } catch (e: any) {
       rerankResult = { error: e.message };
@@ -388,11 +246,69 @@ async function runDetectAndSearch(args: string[], dryRun: boolean): Promise<Outp
     searchHttpStatus: searchResult.searchHttpStatus,
     searchBody: searchResult.searchBody,
     searchError: searchResult.error,
-    postFilter,
     rerank: rerankResult,
   } as any;
 }
+async function runDetectVideo(args: string[], dryRun: boolean): Promise<DetectVideoResult> {
+  const videoPath = args[0];
+  if (!videoPath) return { status: 'failed', command: 'detect-video', dryRun, error: 'detect-video requires <video-path>' };
+  if (!fs.existsSync(videoPath)) return { status: 'failed', command: 'detect-video', dryRun, error: `video not found: ${videoPath}` };
+  if (dryRun) {
+    return { status: 'success', command: 'detect-video', dryRun, videoPath };
+  }
+  const client = createSkillClient();
+  const visionConfig = await loadVisionConfig(client);
+  // 1. Upload video to get public URL
+  const videoUrl = await uploadVideo(videoPath);
+  // 2. Analyze video via LLM
+  const { description } = await analyzeVideo(videoUrl, visionConfig);
+  // 3. Generate Chinese search keyword
+  const keyword = await generateChineseKeyword(description, visionConfig);
+  return {
+    status: 'success',
+    command: 'detect-video',
+    dryRun,
+    videoPath,
+    videoUrl,
+    description,
+    keyword,
+  };
+}
+async function runDetectVideoAndSearch(args: string[], dryRun: boolean): Promise<DetectVideoResult> {
+  const result = await runDetectVideo(args, dryRun) as DetectVideoResult;
+  if (result.status === 'failed') return result;
+  if (dryRun) return { ...result, command: 'detect-video-and-search' };
+  const client = createSkillClient();
+  // Search 1688 with keyword directly (no rerank — image-based rerank doesn't apply to text search)
+  let searchResults: SearchItem[] = [];
+  if (result.keyword) {
+    try {
+      const items = await keywordSearch(client, result.keyword);
+      // Sort by sales descending
+      searchResults = items.sort((a, b) => (b.sales ?? 0) - (a.sales ?? 0));
+    } catch (e: any) {
+      return { ...result, command: 'detect-video-and-search', status: 'failed', error: `keyword search failed: ${e.message}` };
+    }
+  }
+  return {
+    ...result,
+    command: 'detect-video-and-search',
+    searchResults,
+  };
+}
 function parseDetectOptions(videoPath: string, args: string[]): DetectOptions {
   const outputDir = getFlag(args, '--output-dir') || path.join(
     path.dirname(videoPath),
@@ -417,25 +333,7 @@ function getFlag(args: string[], flag: string): string | undefined {
 }
 function createVisionModel(config: VisionConfig) {
-  const sessionId = config.sessionId || '';
-  const originFetch = globalThis.fetch;
-  // Inject metadata.session_id into request body so LiteLLM → Langfuse creates sessions
-  const wrapped = async (input: RequestInfo | URL, init?: RequestInit) => {
-    if (init?.body && typeof init.body === 'string') {
-      try {
-        const body = JSON.parse(init.body);
-        if (!body.metadata) body.metadata = {};
-        if (!body.metadata.session_id) body.metadata.session_id = sessionId;
-        body.metadata.tags = ['skill:video-product-snapshot'];
-        init = { ...init, body: JSON.stringify(body) };
-      } catch {}
-    }
-    return originFetch(input, init);
-  };
-  const openai = createOpenAI({
-    apiKey: config.apiKey, baseURL: config.baseURL,
-    fetch: wrapped as typeof globalThis.fetch,
-  });
+  const openai = createOpenAI({ apiKey: config.apiKey, baseURL: config.baseURL });
   return openai(config.model);
 }
@@ -490,10 +388,9 @@ function extractKeywordsFromTitles(items: SearchItem[], topN = 5): string {
 async function runRerank(args: string[], dryRun: boolean): Promise<OutputResult> {
   // --image-results=<path> --keyword=<text> --top=<n>
-  const positionals = args.filter((a) => !a.startsWith('--'));
-  const imageResultsArg = getFlag(args, '--image-results') || positionals[0];
-  const keywordArg = getFlag(args, '--keyword') || positionals[1];
-  const topN = parseInt(getFlag(args, '--top') || '5', 10);
+  const imageResultsArg = getFlag(args, '--image-results') || args[0];
+  const keywordArg = getFlag(args, '--keyword') || args[1];
+  const topN = parseInt(getFlag(args, '--top') || '10', 10);
   const description = getFlag(args, '--description') || '';
@@ -568,3 +465,7 @@ async function runRerank(args: string[], dryRun: boolean): Promise<OutputResult>
     results: sorted,
   } as any;
 }
+function parseJsonSafe(text: string): unknown {
+  try { return JSON.parse(text); } catch { return text; }
+}

View File

@@ -1,123 +0,0 @@
import { generateText } from 'ai';
import { createOpenAI } from '@ai-sdk/openai';
import type { SearchItem } from './types.ts';
import type { VisionConfig } from './index.ts';
import { imageToBase64 } from './frame-extractor.ts';
export interface PostFilterResult {
kept: SearchItem[];
rejected: SearchItem[];
totalChecked: number;
failed: boolean;
}
const FILTER_PROMPT = (count: number, description?: string) => {
const productLine = description
? `查询商品是:${description}`
: '第1张图是查询商品。';
return `${productLine}
${count}
****
- "鞋架"
-
- vs vs
-
1: YES
2: NO
3: YES
...
${count} `;
};
function createModel(config: VisionConfig) {
const sessionId = config.sessionId || '';
const originFetch = globalThis.fetch;
const wrapped = async (input: RequestInfo | URL, init?: RequestInit) => {
if (init?.body && typeof init.body === 'string') {
try {
const body = JSON.parse(init.body);
if (!body.metadata) body.metadata = {};
if (!body.metadata.session_id) body.metadata.session_id = sessionId;
body.metadata.tags = ['skill:video-product-snapshot'];
init = { ...init, body: JSON.stringify(body) };
} catch {}
}
return originFetch(input, init);
};
const provider = createOpenAI({
apiKey: config.apiKey, baseURL: config.baseURL,
fetch: wrapped as typeof globalThis.fetch,
});
return provider(config.model);
}
async function classifyBatch(
model: ReturnType<ReturnType<typeof createOpenAI>>,
queryImageDataUrl: string,
batch: SearchItem[],
description?: string,
): Promise<boolean[]> {
const content: any[] = [{ type: 'image', image: queryImageDataUrl }];
for (const item of batch) {
content.push({ type: 'image', image: item.pic_url });
}
content.push({ type: 'text', text: FILTER_PROMPT(batch.length, description) });
const { text } = await generateText({
model,
messages: [{ role: 'user', content }],
maxTokens: 200,
});
const flags = batch.map(() => false);
for (const line of text.split('\n')) {
const m = line.match(/^\s*(\d+)\s*[:]\s*(YES|NO|是|否)/i);
if (!m) continue;
const idx = parseInt(m[1], 10) - 1;
const yes = /YES|是/i.test(m[2]);
if (idx >= 0 && idx < flags.length) flags[idx] = yes;
}
return flags;
}
export async function postFilterByImage(
queryImagePath: string,
items: SearchItem[],
visionConfig: VisionConfig,
options: { description?: string; batchSize?: number } = {},
): Promise<PostFilterResult> {
if (items.length === 0) {
return { kept: [], rejected: [], totalChecked: 0, failed: false };
}
const batchSize = options.batchSize ?? 10;
const description = options.description;
const model = createModel(visionConfig);
const queryDataUrl = `data:image/jpeg;base64,${imageToBase64(queryImagePath)}`;
const kept: SearchItem[] = [];
const rejected: SearchItem[] = [];
let anyFailed = false;
for (let i = 0; i < items.length; i += batchSize) {
const batch = items.slice(i, i + batchSize);
try {
const flags = await classifyBatch(model, queryDataUrl, batch, description);
batch.forEach((item, idx) => {
if (flags[idx]) kept.push(item);
else rejected.push(item);
});
} catch {
// On batch failure, keep items (don't lose them) but flag the run as partial
anyFailed = true;
kept.push(...batch);
}
}
return { kept, rejected, totalChecked: items.length, failed: anyFailed };
}
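The removed `post-filter.ts` above parses the model's per-line YES/NO verdicts with a regex before keeping or rejecting each result. A standalone sketch of that parsing step (the fullwidth-colon alternative in the character class is an assumption about the original, which appears garbled in this diff):

```typescript
// Sketch: turn "1: YES\n2: NO\n..." verdict text into a boolean array,
// mirroring the flag-parsing loop in the removed classifyBatch().
function parseVerdicts(text: string, count: number): boolean[] {
  const flags = new Array<boolean>(count).fill(false);
  for (const line of text.split("\n")) {
    // Accept ASCII or fullwidth colons, and English or Chinese verdicts.
    const m = line.match(/^\s*(\d+)\s*[:：]\s*(YES|NO|是|否)/i);
    if (!m) continue;
    const idx = parseInt(m[1], 10) - 1; // verdict lines are 1-based
    if (idx >= 0 && idx < count) flags[idx] = /YES|是/i.test(m[2]);
  }
  return flags;
}

console.log(parseVerdicts("1: YES\n2: NO\n3: YES", 3)); // → [ true, false, true ]
```

Unparseable lines simply leave their slot `false`, matching the removed code's default of rejecting frames it could not score.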

View File

@@ -1,4 +1,4 @@
-import { generateObject, generateText } from 'ai';
+import { generateObject } from 'ai';
 import { createOpenAI } from '@ai-sdk/openai';
 import { z } from 'zod';
 import type { ExtractedFrame } from './frame-extractor.ts';
@@ -28,38 +28,7 @@ Discard (keep=false) if: only hands/texture/contents visible, motion blur, black
 reason options: product_visible | content_only | hands_only | blur | transition | background_only`;
 
-const CONTAINER_CHECK_PROMPT = `Is the main product in this image a CONTAINER, RACK, or HOLDER (something designed to store/hold other items)?
-Examples YES: shoe rack, shelf, storage box, organizer, basket, drawer, wardrobe, trolley, bin, tray, cabinet.
-Examples NO: shoes, clothing, electronics, food, toys, cosmetics, tools.
-Reply with only one word: YES or NO.`;
-
-const RANKING_PROMPT_CONTAINER = (count: number) => `You are selecting ONE frame from ${count} video frames to use as the query image for an ecommerce reverse-image search.
-The hero product is a CONTAINER / RACK / HOLDER / ORGANIZER.
-
-CRITICAL CONSTRAINT - read this first:
-Image search engines identify objects by visual appearance. If the container holds items (shoes, clothes, etc.), the search engine will match those ITEMS, not the container - returning completely wrong products.
-YOUR ONLY JOB: find the frame where the container structure itself is most visible with the FEWEST or NO items inside.
-
-ABSOLUTE PRIORITY ORDER (do not deviate):
-1. Frame with container completely EMPTY - highest priority regardless of angle or assembly state
-2. Frame with container partially assembled or partially visible but EMPTY - still better than any loaded frame
-3. Frame with fewest items inside (1-2 items, mostly empty)
-4. Frame with moderate load - only if no emptier option exists
-5. Frame fully loaded - last resort only if no other frames exist
-
-A frame showing the rack mid-assembly with zero items is ALWAYS better than a perfectly-lit fully-assembled rack filled with shoes.
-Frames are numbered 0 to ${count - 1} in order shown. You MUST pick ONE.
-
-Return:
-- bestFrameIndex: 0-based index of the emptiest container frame
-- description: concise Chinese search query, 1-2 words (container type + material + color + key feature)
-- reasoning: describe how many items are visible inside the chosen frame and why it's the emptiest option
-- boundingBox: tight box of the PRODUCT STRUCTURE ONLY as [x1, y1, x2, y2] normalized 0.0-1.0. Exclude any items stored inside.`;
-
-const RANKING_PROMPT_GENERAL = (count: number) => `You are selecting the single best product frame from ${count} video frames for ecommerce search.
+const RANKING_PROMPT = (count: number) => `You are selecting the single best product frame from ${count} video frames for ecommerce search.
 Frames are numbered 0 to ${count - 1} in order shown.
@@ -78,24 +47,7 @@ Return:
 - boundingBox: tight box of the PRODUCT ONLY as [x1, y1, x2, y2] normalized 0.0-1.0, top-left origin. Exclude hands, background, and unrelated objects. The product is near the center of the frame.`;
 
 function createVisionModel(config: VisionConfig) {
-  const sessionId = config.sessionId || '';
-  const originFetch = globalThis.fetch;
-  const wrapped = async (input: RequestInfo | URL, init?: RequestInit) => {
-    if (init?.body && typeof init.body === 'string') {
-      try {
-        const body = JSON.parse(init.body);
-        if (!body.metadata) body.metadata = {};
-        if (!body.metadata.session_id) body.metadata.session_id = sessionId;
-        body.metadata.tags = ['skill:video-product-snapshot'];
-        init = { ...init, body: JSON.stringify(body) };
-      } catch {}
-    }
-    return originFetch(input, init);
-  };
-  const provider = createOpenAI({
-    apiKey: config.apiKey, baseURL: config.baseURL,
-    fetch: wrapped as typeof globalThis.fetch,
-  });
+  const provider = createOpenAI({ apiKey: config.apiKey, baseURL: config.baseURL });
   return provider(config.model);
 }
@@ -120,52 +72,15 @@ async function filterFrame(
   return object.keep;
 }
 
-async function isContainerProduct(
-  firstFrame: ExtractedFrame,
-  model: ReturnType<ReturnType<typeof createOpenAI>>,
-): Promise<boolean> {
-  try {
-    const { text } = await generateText({
-      model,
-      messages: [{
-        role: 'user',
-        content: [
-          { type: 'image', image: `data:image/jpeg;base64,${imageToBase64(firstFrame.imagePath)}` },
-          { type: 'text', text: CONTAINER_CHECK_PROMPT },
-        ],
-      }],
-      maxTokens: 5,
-    });
-    return text.trim().toUpperCase().startsWith('Y');
-  } catch {
-    return false;
-  }
-}
-
-function takeEarliestFrames(candidates: ExtractedFrame[], fraction: number = 0.4): ExtractedFrame[] {
-  // Ecommerce videos show the container empty/unboxing early, then full.
-  // Taking the first 40% of frames reliably captures empty states.
-  const sorted = [...candidates].sort((a, b) => a.frameIndex - b.frameIndex);
-  const cutoff = Math.max(1, Math.ceil(sorted.length * fraction));
-  return sorted.slice(0, cutoff);
-}
-
 async function rankCandidates(
   candidates: ExtractedFrame[],
   model: ReturnType<ReturnType<typeof createOpenAI>>,
-  isContainer: boolean,
 ): Promise<{ bestFrame: ExtractedFrame; description: string; reasoning: string; boundingBox: [number, number, number, number] }> {
   const imageContent = candidates.map((f) => ({
     type: 'image' as const,
     image: `data:image/jpeg;base64,${imageToBase64(f.imagePath)}`,
   }));
-  const prompt = isContainer
-    ? RANKING_PROMPT_CONTAINER(candidates.length)
-    : RANKING_PROMPT_GENERAL(candidates.length);
   const { object } = await generateObject({
     model,
     schema: RankingSchema,
@@ -174,7 +89,7 @@ async function rankCandidates(
       role: 'user',
       content: [
         ...imageContent,
-        { type: 'text', text: prompt },
+        { type: 'text', text: RANKING_PROMPT(candidates.length) },
       ],
     }],
   });
@@ -331,18 +246,9 @@ export async function detectBestFrame(
   const model = createVisionModel(visionConfig);
 
-  // 3. Check if product is a container/rack type (use first candidate frame)
-  const container = await isContainerProduct(candidates[0], model);
-
-  // 4. For containers: restrict ranking to earliest frames (empty/unboxing phase)
-  if (container) {
-    const early = takeEarliestFrames(candidates);
-    if (early.length > 0) candidates = early;
-  }
-
-  // 5. Try Vision ranking with error isolation
+  // 3. Try Vision ranking with error isolation
   try {
-    const { bestFrame, description, reasoning, boundingBox } = await rankCandidates(candidates, model, container);
+    const { bestFrame, description, reasoning, boundingBox } = await rankCandidates(candidates, model);
 
     if (isValidBoundingBox(boundingBox)) {
       const croppedPath = bestFrame.imagePath.replace(/\.jpg$/, '_cropped.jpg');
@@ -407,10 +313,9 @@ export async function detectProductFrames(
   if (candidates.length === 0) return [];
 
   // Pass 2: single comparative call — model sees all candidates at once
-  const container = await isContainerProduct(candidates[0], model);
   let bestSnapshot: ProductFrame | undefined;
   try {
-    const { bestFrame, description, reasoning, boundingBox } = await rankCandidates(candidates, model, container);
+    const { bestFrame, description, reasoning, boundingBox } = await rankCandidates(candidates, model);
     if (isValidBoundingBox(boundingBox)) {
       const croppedPath = bestFrame.imagePath.replace(/\.jpg$/, '_cropped.jpg');
@@ -1,13 +1,4 @@
-export type Command =
-  | 'detect'
-  | 'search'
-  | 'detect-and-search'
-  | 'detect-best'
-  | 'detect-best-and-search'
-  | 'detect-video'
-  | 'detect-video-and-search'
-  | 'rerank'
-  | 'session';
+export type Command = 'detect' | 'search' | 'detect-and-search' | 'detect-best' | 'detect-best-and-search' | 'detect-video' | 'detect-video-and-search' | 'rerank' | 'session';
 
 export interface SearchItem {
   num_iid: number;
@@ -20,30 +11,6 @@ export interface SearchItem {
   detail_url: string;
 }
 
-export interface DetectVideoResult {
-  status: 'success' | 'failed';
-  command: 'detect-video';
-  dryRun: boolean;
-  videoPath?: string;
-  videoUrl?: string | null;
-  description?: string;
-  keyword?: string;
-  snapshotImagePath?: string;
-  error?: string;
-}
-
-export interface DetectVideoAndSearchResult {
-  status: 'success' | 'failed';
-  command: 'detect-video-and-search';
-  dryRun: boolean;
-  videoPath?: string;
-  videoUrl?: string | null;
-  description?: string;
-  keyword?: string;
-  searchResults?: SearchItem[];
-  error?: string;
-}
-
 export interface DetectOptions {
   videoPath: string;
   intervalSeconds: number;
@@ -84,4 +51,17 @@ export interface SearchResult {
   error?: string;
 }
 
-export type OutputResult = DetectResult | SearchResult | DetectVideoResult | DetectVideoAndSearchResult;
+export interface DetectVideoResult {
+  status: 'success' | 'failed';
+  command: Command;
+  dryRun: boolean;
+  videoPath?: string;
+  videoUrl?: string;
+  description?: string;
+  keyword?: string;
+  searchResults?: SearchItem[];
+  rerank?: unknown;
+  error?: string;
+}
+
+export type OutputResult = DetectResult | SearchResult | DetectVideoResult;

src/video-analyzer.ts (new file, 97 lines)

@ -0,0 +1,97 @@
import * as fs from 'fs';
import type { VisionConfig } from './index.ts';
import { createSkillClient } from './auth-cli.ts';
const UPLOAD_ENDPOINT =
process.env.ONEBOUND_UPLOAD_ENDPOINT ||
'http://localhost:3202/api/v1/tasks/upload-image';
/**
* Upload a video file to get a public URL.
*
* Uses direct HTTP fetch (not auth-rt CLI) to avoid E2BIG errors
* when the base64-encoded video exceeds the command-line argument limit.
*/
export async function uploadVideo(videoPath: string): Promise<string> {
const client = createSkillClient();
const { accessToken } = await client.session();
const videoBuffer = fs.readFileSync(videoPath);
const ext = videoPath.match(/\.(\w+)$/)?.[1] || 'mp4';
const filename = `video-${Date.now()}.${ext}`;
const contentType = ext === 'mov' ? 'video/quicktime' : `video/${ext}`;
const response = await fetch(UPLOAD_ENDPOINT, {
method: 'POST',
headers: {
Authorization: `Bearer ${accessToken}`,
'Content-Type': 'application/json',
},
body: JSON.stringify({
data: videoBuffer.toString('base64'),
filename,
contentType,
}),
});
if (!response.ok) {
const errBody = await response.text().catch(() => 'unknown');
throw new Error(`Video upload failed (${response.status}): ${errBody.slice(0, 300)}`);
}
const json = (await response.json()) as { url?: string };
if (!json.url) throw new Error('Upload response missing url');
return json.url;
}
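`uploadVideo` special-cases only `.mov`; every other extension is mapped naively to `video/<ext>`, which is wrong for formats like `.avi` (registered as `video/x-msvideo`) and `.mkv`. If broader input formats are expected, a fuller lookup could replace the ternary. The mapping below is an assumption for illustration, not part of the current code:

```typescript
// Hypothetical extension-to-MIME lookup; falls back to the naive
// `video/<ext>` used above for unknown extensions.
const VIDEO_MIME: Record<string, string> = {
  mp4: 'video/mp4',
  mov: 'video/quicktime',
  webm: 'video/webm',
  mkv: 'video/x-matroska',
  avi: 'video/x-msvideo',
};

function videoContentType(path: string): string {
  const ext = (path.match(/\.(\w+)$/)?.[1] || 'mp4').toLowerCase();
  return VIDEO_MIME[ext] ?? `video/${ext}`;
}
```

Lower-casing the extension also covers uppercase names such as `CLIP.MOV`, which the current ternary would miss.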
export interface VideoAnalysis {
description: string;
rawResponse?: string;
}
export async function analyzeVideo(
videoUrl: string,
config: VisionConfig,
): Promise<VideoAnalysis> {
const response = await fetch(`${config.baseURL}/v1/chat/completions`, {
method: 'POST',
headers: {
Authorization: `Bearer ${config.apiKey}`,
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: config.model,
messages: [
{
role: 'user',
content: [
{
type: 'video_url',
video_url: { url: videoUrl },
},
{
type: 'text',
text: '找出视频中的商品主体,用中文简要描述商品名称、材质、颜色、功能。',
},
],
},
],
max_tokens: 500,
}),
});
if (!response.ok) {
const errBody = await response.text().catch(() => 'unknown');
throw new Error(
`Video analysis API error (${response.status}): ${errBody.slice(0, 500)}`,
);
}
const json = (await response.json()) as any;
const content = json?.choices?.[0]?.message?.content;
if (!content) {
throw new Error('Video analysis returned empty response');
}
return { description: content.trim(), rawResponse: JSON.stringify(json) };
}
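The request above is an OpenAI-style chat completion with a provider-specific `video_url` content part alongside the text instruction. The payload can be factored into a pure builder, mirroring the body constructed in `analyzeVideo` (hypothetical helper, shown for illustration):

```typescript
interface ChatPayload {
  model: string;
  messages: Array<{
    role: 'user';
    content: Array<
      | { type: 'video_url'; video_url: { url: string } }
      | { type: 'text'; text: string }
    >;
  }>;
  max_tokens: number;
}

// Mirrors the request body built in analyzeVideo above:
// one user message carrying the video reference plus the text prompt.
function buildVideoPrompt(model: string, videoUrl: string, question: string): ChatPayload {
  return {
    model,
    messages: [
      {
        role: 'user',
        content: [
          { type: 'video_url', video_url: { url: videoUrl } },
          { type: 'text', text: question },
        ],
      },
    ],
    max_tokens: 500,
  };
}
```

Note that `video_url` parts are an extension accepted by some OpenAI-compatible vision endpoints, not part of the core OpenAI schema, so this only works against providers that support it.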