Compare commits
5 Commits
| Author | SHA1 | Date |
|---|---|---|
| | c5e1d0c88c | |
| | eb8e7a7daf | |
| | b6dc7af9bf | |
| | 7a43eb391b | |
| | d497e92626 | |
112
README.md
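@@ -1,19 +1,22 @@
-# video-product-snapshot: video product screenshots
+# video-product-snapshot: reverse image search for products in videos
 
-Detects e-commerce products in a video, extracts the best product frame, and finds matching items on 1688 via image search.
+Extracts the best product frame from a video and finds matching items on 1688 via reverse image search.
 
 ## How it works
 
-1. Use `ffmpeg` to extract frames from the video at the configured interval
-2. Send each frame to the vision model to detect whether a product is present and score it
-3. Select the highest-confidence frame as the best product snapshot
-4. Optional: call the image search API with this snapshot to find matching products
+1. `ffmpeg` extracts frames at 0.5 s intervals (at most 60 frames)
+2. Visual-quality pre-filter (brightness/variance checks discard blurry frames)
+3. Container/rack-type product detection → automatically select a frame where the product is empty
+4. The vision model compares and ranks multiple frames to pick the best product frame
+5. Crop the product region → upload → 1688 image search
+6. Post-filter (the vision model judges whether each result is the same product) → rerank
 
 ## Installation
 
 ```bash
+./install.sh    # install auth-rt + dependencies
 bun install
 bun run build   # outputs to dist/run.js
 ```
 
 ## Usage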
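Step 2 of the new pipeline (the brightness/variance pre-filter) is only described in the README; its implementation is not part of this diff. The sketch below shows one way such a check could work on a grayscale frame. The names `grayStats` and `passesQuality` and all thresholds are hypothetical, chosen only for illustration.

```typescript
// Hypothetical sketch: not the PR's actual pre-filter, whose code is not shown here.
interface FrameStats {
  mean: number;     // average brightness, 0 to 255
  variance: number; // spread of pixel intensities
}

/** Mean and variance of 8-bit grayscale pixel values. */
function grayStats(pixels: ArrayLike<number>): FrameStats {
  const n = pixels.length;
  if (n === 0) return { mean: 0, variance: 0 };
  let sum = 0;
  for (let i = 0; i < n; i++) sum += pixels[i];
  const mean = sum / n;
  let squared = 0;
  for (let i = 0; i < n; i++) squared += (pixels[i] - mean) * (pixels[i] - mean);
  return { mean, variance: squared / n };
}

/**
 * Keep a frame only if it is neither too dark nor too bright and has enough
 * intensity variance. The thresholds are assumed values, not taken from the PR.
 */
function passesQuality(
  pixels: ArrayLike<number>,
  minMean = 40,
  maxMean = 220,
  minVariance = 100,
): boolean {
  const { mean, variance } = grayStats(pixels);
  return mean >= minMean && mean <= maxMean && variance >= minVariance;
}
```

Running a cheap numeric check like this before the vision model would reduce the number of frames sent for ranking, which is presumably why the README lists it ahead of the model-based steps.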
@@ -26,77 +29,74 @@ bun dist/run.js <command> [options]
 
 | Command | Description |
 |------|------|
-| `detect <video>` | Extract frames and detect product frames |
-| `search <image>` | Find matching products from an image |
-| `detect-and-search <video>` | Full pipeline: detect the best frame → image search |
-| `session` | Print the current auth session token |
+| `detect-best-and-search <video>` | **Recommended.** Best frame → image search → rerank |
+| `detect-best <video>` | Extract the best product frame only, without searching |
+| `detect-and-search <video>` | Image search after two-stage filtering (slower) |
+| `detect <video>` | Extract frames and detect products frame by frame |
+| `search <image>` | Find matching products from an existing image |
+| `rerank` | Cross-filter image search results by keyword |
+| `session` | Get the current auth session token |
 
-### Options (`detect` / `detect-and-search`)
+### Options (`detect-best` / `detect-best-and-search`)
 
 | Parameter | Default | Description |
 |------|--------|------|
-| `--interval=<seconds>` | `1` | Frame extraction interval (seconds) |
-| `--max-frames=<count>` | `60` | Maximum number of frames to analyze |
-| `--output-dir=<dir>` | Directory of the video | Directory where extracted frames are saved |
-| `--min-confidence=<0-1>` | `0.7` | Minimum detection confidence |
-| `--dry-run` | none | Parse arguments and print the configuration without executing |
+| `--interval=<seconds>` | `0.5` | Frame sampling interval |
+| `--max-frames=<n>` | `60` | Maximum number of frames to analyze |
+| `--output-dir=<dir>` | Same directory as the video | Directory where snapshots are saved |
+| `--session-id=<id>` | Auto-generated | Langfuse session ID |
+| `--dry-run` | none | Parse arguments without executing |
 
-### Examples
-
-```bash
-# Detect products, extracting one frame every 3 seconds
-bun dist/run.js detect ./demo.mp4 --interval=3
-
-# Full pipeline with a higher confidence threshold
-bun dist/run.js detect-and-search ./demo.mp4 --interval=5 --min-confidence=0.85
-
-# Find matching products using an existing snapshot
-bun dist/run.js search ./snapshot.jpg
-```
-
 ## Output
 
-All commands print JSON to stdout.
+All commands print JSON to stdout, including a `sessionId` field used for Langfuse tracing.
 
 ```json
 {
+  "sessionId": "skill-20260426-184345-lb06",
+  "status": "success",
+  "command": "detect-best-and-search",
   "bestSnapshot": {
-    "frameIndex": 4,
-    "timestampSeconds": 9,
-    "imagePath": "/path/to/frame_0004.jpg",
-    "confidence": 0.92,
-    "description": "White sneaker with blue logo, left side view",
-    "boundingHint": "centered"
+    "frameIndex": 7,
+    "timestampSeconds": 3,
+    "imagePath": "/path/to/frame_0007.jpg",
+    "croppedImagePath": "/path/to/frame_0007_cropped.jpg",
+    "description": "黑色金属床底鞋架 可折叠移动"
   },
-  "productFrames": [...],
-  "searchBody": { ... }
+  "rerank": {
+    "keyword": "床底鞋架",
+    "results": [
+      { "num_iid": 123, "title": "...", "price": "44.00", "sales": 87, "detail_url": "..." }
+    ]
+  }
 }
 ```
 
-- `productFrames`: all detected frames, sorted by confidence (highest first)
-- `bestSnapshot`: the top-ranked frame
-- `searchBody`: the image search API response (`search` / `detect-and-search` only)
+## Authentication architecture
+
+```
+~/.openclaw/.env
+CLIENT_KEY ──→ auth-rt ──→ business system
+├── /session → access_token
+└── /client-config → provider.api_key
+provider.base_url
+provider.model
+```
+
+Only `CLIENT_KEY` needs to be configured; the LLM credentials and endpoint are issued by the business system.
+
 ## Environment variables
 
-The only required setting is `CLIENT_KEY` in `~/.openclaw/.env`:
-
-```
-CLIENT_KEY=sk_xxxxxxxx.xxxxxxxxxxxxxxxxxxxxxxxx
-```
-
-All credentials and endpoint URLs are fetched automatically from the client config via `auth-rt`; no further configuration is needed.
-
-### Optional overrides
-
 | Variable | Description |
 |------|------|
-| `VISION_MODEL` | Override the model name (default: `aliyun-cp-multimodal`) |
+| `CLIENT_KEY` | **Required.** Set in `~/.openclaw/.env` |
+| `VISION_MODEL` | Override the model name (the default comes from the client config) |
+| `SKILL_SESSION_ID` | Langfuse session ID (auto-generated, format `skill-YYYYMMDD-HHMMSS-xxxx`) |
 | `AUTH_RT_BIN` | Override the path to the `auth-rt` binary |
-| `TELEMETRY_ENDPOINT` | Report execution results to a telemetry endpoint |
+| `TELEMETRY_ENDPOINT` | Telemetry reporting endpoint |
 
 ## Prerequisites
 
 - [Bun](https://bun.sh) runtime
-- `ffmpeg` and `ffprobe` available on the system PATH
-- `auth-rt` CLI available on the system PATH (required by `search` / `detect-and-search`)
+- `ffmpeg` / `ffprobe` available on the system PATH (frame extraction)
+- `auth-rt` CLI (authentication and API calls; installed automatically by `install.sh`)
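The README documents the auto-generated session ID format as `skill-YYYYMMDD-HHMMSS-xxxx`, but the generator itself is not shown in this diff. A minimal sketch of a generator producing that shape follows; `makeSessionId` and `randomSuffix` are hypothetical names, and the use of UTC and of a lowercase alphanumeric suffix are assumptions inferred from the sample value `skill-20260426-184345-lb06`.

```typescript
// Hypothetical sketch: function names, UTC timestamps, and the suffix alphabet are assumptions.
const SUFFIX_ALPHABET = 'abcdefghijklmnopqrstuvwxyz0123456789';

/** Random lowercase alphanumeric suffix of the given length. */
function randomSuffix(length = 4): string {
  let out = '';
  for (let i = 0; i < length; i++) {
    out += SUFFIX_ALPHABET.charAt(Math.floor(Math.random() * SUFFIX_ALPHABET.length));
  }
  return out;
}

/** Builds an ID shaped like skill-YYYYMMDD-HHMMSS-xxxx. */
function makeSessionId(now: Date = new Date(), suffix: string = randomSuffix()): string {
  const two = (n: number): string => (n < 10 ? '0' : '') + n;
  const date = `${now.getUTCFullYear()}${two(now.getUTCMonth() + 1)}${two(now.getUTCDate())}`;
  const time = `${two(now.getUTCHours())}${two(now.getUTCMinutes())}${two(now.getUTCSeconds())}`;
  return `skill-${date}-${time}-${suffix}`;
}
```

Taking the clock and the suffix as parameters keeps the function deterministic under test while still defaulting to a fresh ID in normal use.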
2
SKILL.md
@@ -67,7 +67,7 @@ bun dist/run.js <command> [args] [--dry-run]
 
 ## Result display format
 
-Format `rerank.results` (preferred) or `searchBody.data.items.item` as a markdown table, **5 rows per page**:
+Format `rerank.results` (preferred) or `searchBody.data.items.item` as a markdown table, **at most 5 entries**:
 
 | # | Product name | Price | Sales | Link |
 |---|----------|------|------|------|
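The SKILL.md rule above (a markdown table capped at 5 entries) can be sketched as a small formatter. This is only an illustration: `formatResultsTable` is a hypothetical name, and the item fields are taken from the sample JSON in the README rather than from a confirmed type definition.

```typescript
// Hypothetical sketch: field names follow the README's sample output.
interface ResultItem {
  num_iid: number;
  title: string;
  price: string;
  sales?: number;
  detail_url: string;
}

/** Renders at most `limit` results as a markdown table. */
function formatResultsTable(items: ResultItem[], limit = 5): string {
  const lines: string[] = [
    '| # | Product name | Price | Sales | Link |',
    '|---|----------|------|------|------|',
  ];
  items.slice(0, limit).forEach((item, index) => {
    // Escape pipes so a title cannot break the table layout.
    const title = item.title.replace(/\|/g, '\\|');
    const sales = item.sales === undefined ? 0 : item.sales;
    lines.push(`| ${index + 1} | ${title} | ${item.price} | ${sales} | ${item.detail_url} |`);
  });
  return lines.join('\n');
}
```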
@@ -55,6 +55,20 @@ export interface SessionResponse {
   hookToken?: string;
 }
 
+export interface ClientConfig {
+  clientId: string;
+  name: string;
+  status: string;
+  metadata: {
+    provider?: {
+      api_key?: string;
+      base_url?: string;
+      model?: string;
+    };
+    [key: string]: unknown;
+  };
+}
+
 export interface SkillClientOptions {
   apiBase?: string;
   dryRun?: boolean;
@@ -91,6 +105,13 @@ export class SkillClient {
     return JSON.parse(runCli('session'));
   }
 
+  async clientConfig(): Promise<ClientConfig> {
+    if (this.dryRun) {
+      return { clientId: '<dry-run>', name: '<dry-run>', status: 'active', metadata: {} };
+    }
+    return JSON.parse(runCli('client-config'));
+  }
+
   async get(urlPath: string): Promise<ApiResponse> {
     return this.request('GET', urlPath);
   }
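The diff adds `ClientConfig` and `clientConfig()` but does not show how the returned `metadata.provider` is turned into model settings. The sketch below is one plausible mapping, assuming the `VISION_MODEL` environment variable overrides `provider.model` as the README states. `resolveVisionConfig` and `ProviderConfig` are hypothetical names, and `VisionConfig` is reduced here to the three fields that `createOpenAI` receives in this PR.

```typescript
// Hypothetical sketch: resolveVisionConfig and ProviderConfig are illustrative names.
interface ProviderConfig {
  api_key?: string;
  base_url?: string;
  model?: string;
}

interface ClientConfig {
  clientId: string;
  name: string;
  status: string;
  metadata: { provider?: ProviderConfig; [key: string]: unknown };
}

interface VisionConfig {
  apiKey: string;
  baseURL: string;
  model: string;
}

/** Maps a client config to model settings, letting VISION_MODEL override the model name. */
function resolveVisionConfig(
  config: ClientConfig,
  env: { [name: string]: string | undefined } = {},
): VisionConfig {
  const provider: ProviderConfig = config.metadata.provider || {};
  const model = env.VISION_MODEL || provider.model;
  if (!provider.api_key || !provider.base_url || !model) {
    // The dry-run config in the diff has empty metadata, so callers need to handle this case.
    throw new Error('client config lacks provider.api_key, provider.base_url, or a model name');
  }
  return { apiKey: provider.api_key, baseURL: provider.base_url, model };
}
```

Failing loudly on a missing provider block makes a misconfigured `CLIENT_KEY` visible at startup instead of surfacing later as an opaque API error.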
34
src/index.ts
@@ -212,7 +212,7 @@ async function runDetectBestAndSearch(args: string[], dryRun: boolean): Promise<
   // Otherwise fall back to the keyword-intersection rerank.
   if (!dryRun && postFilter && !postFilter.error && postFilter.keptCount > 0) {
     const items: SearchItem[] = (searchResult.searchBody as any)?.data?.items?.item ?? [];
-    const sorted = [...items].sort((a, b) => (b.sales ?? 0) - (a.sales ?? 0)).slice(0, 10);
+    const sorted = [...items].sort((a, b) => (b.sales ?? 0) - (a.sales ?? 0)).slice(0, 5);
     rerankResult = {
       source: 'post-filter',
       results: sorted,
@@ -225,7 +225,7 @@ async function runDetectBestAndSearch(args: string[], dryRun: boolean): Promise<
       rerankResult = await runRerank([
         `--image-results=${tmpFile}`,
         `--description=${best.description}`,
-        '--top=10',
+        '--top=5',
       ], dryRun);
     } catch (e: any) {
       rerankResult = { error: e.message };
@@ -360,7 +360,7 @@ async function runDetectAndSearch(args: string[], dryRun: boolean): Promise<Outp
   // Otherwise fall back to the keyword-intersection rerank.
   if (!dryRun && postFilter && !postFilter.error && postFilter.keptCount > 0) {
     const items: SearchItem[] = (searchResult.searchBody as any)?.data?.items?.item ?? [];
-    const sorted = [...items].sort((a, b) => (b.sales ?? 0) - (a.sales ?? 0)).slice(0, 10);
+    const sorted = [...items].sort((a, b) => (b.sales ?? 0) - (a.sales ?? 0)).slice(0, 5);
     rerankResult = {
       source: 'post-filter',
       results: sorted,
@@ -373,7 +373,7 @@ async function runDetectAndSearch(args: string[], dryRun: boolean): Promise<Outp
       rerankResult = await runRerank([
         `--image-results=${tmpFile}`,
         `--description=${best.description}`,
-        '--top=10',
+        '--top=5',
       ], dryRun);
     } catch (e: any) {
       rerankResult = { error: e.message };
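The same sort-and-slice expression is changed in both `runDetectBestAndSearch` and `runDetectAndSearch`. As an illustration of what it computes, the sketch below pulls it into a standalone helper; `topBySales` is a hypothetical name and is not part of the PR.

```typescript
// Hypothetical helper mirroring the expression in the hunks above.
function topBySales<T extends { sales?: number }>(items: T[], limit = 5): T[] {
  // Copy first so the caller's array keeps its original order;
  // items without a sales figure are treated as 0 and sink to the end.
  return items
    .slice()
    .sort((a, b) => (b.sales === undefined ? 0 : b.sales) - (a.sales === undefined ? 0 : a.sales))
    .slice(0, limit);
}
```

A shared helper with a default `limit` would also keep the cap in one place, where this PR has to update the literal `5` in several call sites.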
@@ -417,13 +417,25 @@ function getFlag(args: string[], flag: string): string | undefined {
 }
 
 function createVisionModel(config: VisionConfig) {
-  const headers: Record<string, string> = {
-    'x-langfuse-tags': 'skill:video-product-snapshot',
+  const sessionId = config.sessionId || '';
+  const originFetch = globalThis.fetch;
+  // Inject metadata.session_id into request body so LiteLLM → Langfuse creates sessions
+  const wrapped = async (input: RequestInfo | URL, init?: RequestInit) => {
+    if (init?.body && typeof init.body === 'string') {
+      try {
+        const body = JSON.parse(init.body);
+        if (!body.metadata) body.metadata = {};
+        if (!body.metadata.session_id) body.metadata.session_id = sessionId;
+        body.metadata.tags = ['skill:video-product-snapshot'];
+        init = { ...init, body: JSON.stringify(body) };
+      } catch {}
+    }
+    return originFetch(input, init);
   };
-  if (config.sessionId) {
-    headers['x-langfuse-session-id'] = config.sessionId;
-  }
-  const openai = createOpenAI({ apiKey: config.apiKey, baseURL: config.baseURL, headers });
+  const openai = createOpenAI({
+    apiKey: config.apiKey, baseURL: config.baseURL,
+    fetch: wrapped as typeof globalThis.fetch,
+  });
   return openai(config.model);
 }
 
@@ -481,7 +493,7 @@ async function runRerank(args: string[], dryRun: boolean): Promise<OutputResult>
   const positionals = args.filter((a) => !a.startsWith('--'));
   const imageResultsArg = getFlag(args, '--image-results') || positionals[0];
   const keywordArg = getFlag(args, '--keyword') || positionals[1];
-  const topN = parseInt(getFlag(args, '--top') || '10', 10);
+  const topN = parseInt(getFlag(args, '--top') || '5', 10);
 
   const description = getFlag(args, '--description') || '';
 
@@ -34,13 +34,24 @@ const FILTER_PROMPT = (count: number, description?: string) => {
 };
 
 function createModel(config: VisionConfig) {
-  const headers: Record<string, string> = {
-    'x-langfuse-tags': 'skill:video-product-snapshot',
+  const sessionId = config.sessionId || '';
+  const originFetch = globalThis.fetch;
+  const wrapped = async (input: RequestInfo | URL, init?: RequestInit) => {
+    if (init?.body && typeof init.body === 'string') {
+      try {
+        const body = JSON.parse(init.body);
+        if (!body.metadata) body.metadata = {};
+        if (!body.metadata.session_id) body.metadata.session_id = sessionId;
+        body.metadata.tags = ['skill:video-product-snapshot'];
+        init = { ...init, body: JSON.stringify(body) };
+      } catch {}
+    }
+    return originFetch(input, init);
   };
-  if (config.sessionId) {
-    headers['x-langfuse-session-id'] = config.sessionId;
-  }
-  const provider = createOpenAI({ apiKey: config.apiKey, baseURL: config.baseURL, headers });
+  const provider = createOpenAI({
+    apiKey: config.apiKey, baseURL: config.baseURL,
+    fetch: wrapped as typeof globalThis.fetch,
+  });
   return provider(config.model);
 }
 
@@ -78,13 +78,24 @@ Return:
 - boundingBox: tight box of the PRODUCT ONLY as [x1, y1, x2, y2] normalized 0.0–1.0, top-left origin. Exclude hands, background, and unrelated objects. The product is near the center of the frame.`;
 
 function createVisionModel(config: VisionConfig) {
-  const headers: Record<string, string> = {
-    'x-langfuse-tags': 'skill:video-product-snapshot',
+  const sessionId = config.sessionId || '';
+  const originFetch = globalThis.fetch;
+  const wrapped = async (input: RequestInfo | URL, init?: RequestInit) => {
+    if (init?.body && typeof init.body === 'string') {
+      try {
+        const body = JSON.parse(init.body);
+        if (!body.metadata) body.metadata = {};
+        if (!body.metadata.session_id) body.metadata.session_id = sessionId;
+        body.metadata.tags = ['skill:video-product-snapshot'];
+        init = { ...init, body: JSON.stringify(body) };
+      } catch {}
+    }
+    return originFetch(input, init);
   };
-  if (config.sessionId) {
-    headers['x-langfuse-session-id'] = config.sessionId;
-  }
-  const provider = createOpenAI({ apiKey: config.apiKey, baseURL: config.baseURL, headers });
+  const provider = createOpenAI({
+    apiKey: config.apiKey, baseURL: config.baseURL,
+    fetch: wrapped as typeof globalThis.fetch,
+  });
   return provider(config.model);
 }
 
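The fetch wrapper that injects `metadata.session_id` and `metadata.tags` is repeated verbatim in three files of this diff. One possible refactor, shown only as a sketch, is a shared helper that takes the underlying fetch as a parameter. `withSessionMetadata`, `FetchLike`, and `FetchInit` are hypothetical names; the minimal types stand in for the DOM `fetch` types so the example stays self-contained.

```typescript
// Hypothetical sketch of a shared helper; not part of the PR.
type FetchInit = { body?: unknown; [key: string]: unknown };
type FetchLike = (input: unknown, init?: FetchInit) => Promise<unknown>;

/** Wraps a fetch-like function so JSON request bodies carry session metadata. */
function withSessionMetadata(baseFetch: FetchLike, sessionId: string, tags: string[]): FetchLike {
  return async (input, init) => {
    if (init && typeof init.body === 'string') {
      try {
        const body = JSON.parse(init.body);
        if (!body.metadata) body.metadata = {};
        // Keep a session_id the caller already set; otherwise use ours.
        if (!body.metadata.session_id) body.metadata.session_id = sessionId;
        body.metadata.tags = tags;
        init = { ...init, body: JSON.stringify(body) };
      } catch {
        // Bodies that are not JSON objects are forwarded unchanged.
      }
    }
    return baseFetch(input, init);
  };
}
```

Injecting the base fetch instead of reading `globalThis.fetch` inside the helper also makes the behavior testable with a fake, without any network access.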