skill: translate user-facing docs to Chinese, add detect-best commands

register-skill-release / register (push): Successful in 22s

- SKILL.md / README.md: full Chinese translation for Chinese users
- scripts/run.ts: help text in Chinese
- src/: add detectBest and detectBestAndSearch commands

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

This commit is contained in:
parent 33e3d378cc
commit 91a623751d
README.md (94 lines changed)

@@ -1,62 +1,62 @@
-# video-product-snapshot
+# video-product-snapshot — 视频商品截图

-Detect ecommerce products in video frames using Claude Vision, extract the best product snapshot, and optionally search for matching products via image-search API.
+检测视频中的电商商品,提取最佳商品画面,并通过图片搜索在 1688 找同款。

-## How it works
+## 工作原理

-1. Extracts frames from the video at a configurable interval using `ffmpeg`
-2. Sends each frame to a vision model to detect whether a product is visible and rate confidence
-3. Picks the highest-confidence frame as the best snapshot
-4. Optionally calls an image-search API with the snapshot to find matching products
+1. 使用 `ffmpeg` 按配置间隔从视频抽帧
+2. 将每帧发给视觉模型,检测是否有商品并评分
+3. 选出置信度最高的帧作为最佳商品截图
+4. 可选:用这张截图调用图片搜索 API 找同款商品

-## Install
+## 安装

 ```bash
 bun install
-bun run build # outputs dist/run.js
+bun run build # 输出到 dist/run.js
 ```

-## Usage
+## 使用方法

 ```bash
 bun dist/run.js <command> [options]
 ```

-### Commands
+### 命令

-| Command | Description |
-|---------|-------------|
-| `detect <video>` | Extract frames and detect product snapshots |
-| `search <image>` | Search products by image via API |
-| `detect-and-search <video>` | Full pipeline: detect best snapshot then search |
-| `session` | Print current auth session token |
+| 命令 | 说明 |
+|------|------|
+| `detect <video>` | 抽帧并检测商品画面 |
+| `search <image>` | 用图片搜同款 |
+| `detect-and-search <video>` | 完整流程:检测最佳画面 → 搜图 |
+| `session` | 打印当前认证 session token |

-### Options (`detect` / `detect-and-search`)
+### 选项(`detect` / `detect-and-search`)

-| Flag | Default | Description |
-|------|---------|-------------|
-| `--interval=<sec>` | `1` | Seconds between sampled frames |
-| `--max-frames=<n>` | `60` | Max frames to analyze |
-| `--output-dir=<dir>` | next to video | Directory to save extracted frames |
-| `--min-confidence=<0-1>` | `0.7` | Minimum confidence to include a frame |
-| `--dry-run` | — | Parse args and print config without running |
+| 参数 | 默认值 | 说明 |
+|------|--------|------|
+| `--interval=<秒>` | `1` | 抽帧间隔(秒) |
+| `--max-frames=<数量>` | `60` | 最多分析帧数 |
+| `--output-dir=<目录>` | 视频所在目录 | 抽帧图片保存目录 |
+| `--min-confidence=<0-1>` | `0.7` | 最低检测置信度 |
+| `--dry-run` | — | 解析参数并打印配置,不实际执行 |

-### Examples
+### 示例

 ```bash
-# Detect products, sample every 3 seconds
+# 检测商品,每 3 秒抽一帧
 bun dist/run.js detect ./demo.mp4 --interval=3

-# Full pipeline with higher confidence threshold
+# 完整流程 + 更高置信度门槛
 bun dist/run.js detect-and-search ./demo.mp4 --interval=5 --min-confidence=0.85

-# Search using an existing snapshot image
+# 用已有截图搜同款
 bun dist/run.js search ./snapshot.jpg
 ```

-## Output
+## 输出

-All commands return JSON to stdout.
+所有命令输出 JSON 到 stdout。

 ```json
 {
@@ -73,30 +73,30 @@ All commands return JSON to stdout.
 }
 ```

-- `productFrames` — all detected frames sorted by confidence (highest first)
-- `bestSnapshot` — the top-ranked frame
-- `searchBody` — image-search API response (only for `search` / `detect-and-search`)
+- `productFrames` — 所有检测到的画面,按置信度排序(最高在前)
+- `bestSnapshot` — 排名第一的画面
+- `searchBody` — 图片搜索 API 的返回(仅 `search` / `detect-and-search`)

-## Environment variables
+## 环境变量

-The only required configuration is `CLIENT_KEY` in `~/.openclaw/.env`:
+唯一必需配置是 `~/.openclaw/.env` 中的 `CLIENT_KEY`:

 ```
 CLIENT_KEY=sk_xxxxxxxx.xxxxxxxxxxxxxxxxxxxxxxxx
 ```

-All credentials and endpoints are fetched automatically from the client config via `auth-rt`. No per-skill env vars needed.
+所有凭据和接口地址通过 `auth-rt` 从客户端配置自动获取,无需额外配置。

-### Optional overrides
+### 可选覆盖

-| Variable | Description |
-|----------|-------------|
-| `VISION_MODEL` | Override model name (default: `aliyun-cp-multimodal`) |
-| `AUTH_RT_BIN` | Override path to the `auth-rt` binary |
-| `TELEMETRY_ENDPOINT` | POST execution results to a telemetry endpoint |
+| 变量 | 说明 |
+|------|------|
+| `VISION_MODEL` | 覆盖模型名称(默认:`aliyun-cp-multimodal`) |
+| `AUTH_RT_BIN` | 覆盖 `auth-rt` 二进制路径 |
+| `TELEMETRY_ENDPOINT` | 上报执行结果到遥测接口 |

-## Prerequisites
+## 前置依赖

-- [Bun](https://bun.sh) runtime
-- `ffmpeg` and `ffprobe` in PATH
-- `auth-rt` CLI in PATH (required for `search` / `detect-and-search`)
+- [Bun](https://bun.sh) 运行时
+- 系统 PATH 中包含 `ffmpeg` 和 `ffprobe`
+- 系统 PATH 中包含 `auth-rt` CLI(`search` / `detect-and-search` 需要)
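Editor's note: the selection logic the README describes (filter by `--min-confidence`, sort by confidence, take the top frame) can be sketched in TypeScript. This is an illustrative sketch, not the skill's actual source; the `ProductFrame` shape here is reduced to the two fields the sketch needs.

```typescript
// Illustrative sketch of the README's snapshot selection: keep frames at or
// above minConfidence, sort descending by confidence, take the top one.
// Not the skill's actual implementation.
interface ProductFrame {
  frameIndex: number;
  confidence: number;
}

function pickBestSnapshot(
  frames: ProductFrame[],
  minConfidence = 0.7, // README default for --min-confidence
): { productFrames: ProductFrame[]; bestSnapshot?: ProductFrame } {
  const productFrames = frames
    .filter((f) => f.confidence >= minConfidence)
    .sort((a, b) => b.confidence - a.confidence);
  return { productFrames, bestSnapshot: productFrames[0] };
}

const { productFrames, bestSnapshot } = pickBestSnapshot([
  { frameIndex: 0, confidence: 0.4 },
  { frameIndex: 1, confidence: 0.92 },
  { frameIndex: 2, confidence: 0.75 },
]);
// frame 0 falls below the 0.7 threshold; frame 1 ranks first
```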
SKILL.md (131 lines changed)

@@ -1,126 +1,101 @@
 ---
 name: video-product-snapshot
-description: "Detect ecommerce products in video frames using Claude Vision, extract the best product snapshot, and optionally search via image-search API. Use when the user provides a video and wants to find/identify products shown in it."
+description: "Detect ecommerce products in video frames using Claude Vision, extract the best product snapshot, and optionally search via image-search API. Use when the user provides a video and wants to find/identify products shown in it. / 检测视频中的商品,提取最佳商品截图,并通过图片搜索在1688找同款。当用户提供视频想找商品时使用。"
 ---

-# Video Product Snapshot
+# Video Product Snapshot — 视频商品截图

-Extract ecommerce product snapshots from video using Claude Vision, then optionally search for matching products via image-search API.
+从视频中提取最佳商品画面,通过 Claude Vision 检测并截取,然后在 1688 上以图搜图 + 关键词重排序找到同款商品。

-## Run
+## 运行

 ```bash
 bun dist/run.js <command> [args] [--dry-run]
 ```

-## Commands
+## 命令列表

-| Command | Description |
-|---------|-------------|
-| `detect <video-path> [options]` | Extract frames, detect product snapshots |
-| `search <image-path>` | Search products by image via API |
-| `detect-and-search <video-path> [options]` | Detect best snapshot then run image search |
-| `session` | Get auth session token |
+| 命令 | 使用场景 |
+|------|---------|
+| `detect-best-and-search <video>` | **视频输入的默认命令。** 始终找出最佳画面(不管置信度高低),然后搜图。 |
+| `detect-best <video>` | 只提取最佳画面,不搜图。 |
+| `search <image-path>` | 已经有商品截图了,跳过检测直接搜图。 |
+| `detect-and-search <video>` | 旧版。过滤可能太严格导致无结果。建议用 `detect-best-and-search`。 |
+| `session` | 获取当前认证会话 token。 |

-## Options for `detect` / `detect-and-search`
+## `detect-best` / `detect-best-and-search` 选项

-| Flag | Default | Description |
-|------|---------|-------------|
-| `--interval=<sec>` | `1` | Seconds between sampled frames |
-| `--max-frames=<n>` | `60` | Max frames to analyze |
-| `--output-dir=<dir>` | next to video | Directory to save snapshot images |
-| `--min-confidence=<0-1>` | `0.7` | Minimum detection confidence threshold |
+| 参数 | 默认值 | 说明 |
+|------|--------|------|
+| `--interval=<秒>` | `0.5` | 抽帧间隔(秒) |
+| `--max-frames=<数量>` | `60` | 最多抽帧数 |
+| `--output-dir=<目录>` | 视频同目录 | 帧图片保存目录 |

-## Examples
+## 画面选择原理

-```bash
-# Detect product frames in a video
-bun dist/run.js detect ./product-demo.mp4
+两轮 Vision 流水线:

-# Sample every 5 seconds, higher confidence threshold
-bun dist/run.js detect ./product-demo.mp4 --interval=5 --min-confidence=0.85
+1. **过滤轮**(仅 `detect` / `detect-and-search`)—— 每帧二分类:保留/丢弃。可能过于严格返回空。
+2. **排名轮** —— 所有候选帧一起发给模型,从中选出最清晰、最完整、最突出的一张商品图。

-# Search for products using an existing image
-bun dist/run.js search ./snapshot.jpg
+`detect-best` 跳过第一轮,所有帧直接进排名轮。超过 20 帧时会均匀采样到 20 帧再调用。**只要视频能出帧,就一定返回结果。**

-# Full pipeline: detect best product frame then search
-bun dist/run.js detect-and-search ./product-demo.mp4 --interval=3 --max-frames=20
-```
+## 输出格式

-## Output
-
-Returns JSON with:
-- `productFrames[]`: all detected product frames sorted by confidence (highest first)
-- `bestSnapshot`: the highest-confidence product frame
-- `searchBody`: image search API response (for `detect-and-search` and `search`)
-
-Each `ProductFrame` contains:
 ```json
 {
+  "bestSnapshot": {
   "frameIndex": 4,
-  "timestampSeconds": 9,
-  "imagePath": "/path/to/snapshot/frame_0004.jpg",
-  "confidence": 0.92,
+  "timestampSeconds": 2,
+  "imagePath": "/path/to/frame_0004.jpg",
+  "croppedImagePath": "/path/to/frame_0004_cropped.jpg",
+  "confidence": 0.95,
   "description": "White sneaker with blue logo, left side view",
-  "boundingHint": "centered"
+  "boundingHint": "Product fully visible, centered, no hands"
+  },
+  "rerank": {
+    "results": [...]
+  }
 }
 ```

-## Prerequisites
+## 结果展示格式

-- `ffmpeg` and `ffprobe` in PATH
-- `VISION_API_KEY` — API key for the vision endpoint
-- `VISION_API_BASE` — (optional) OpenAI-compatible base URL; omit to use OpenAI default
-- `VISION_MODEL` — (optional) model name, default `gpt-4o-mini`
-- `auth-rt` in PATH (for `search` / `detect-and-search` API calls)
-
-### Example provider configs
-
-```bash
-# OpenAI (default)
-VISION_API_KEY=sk-...
-
-# Any OpenAI-compatible endpoint (local Ollama, Together, Groq, etc.)
-VISION_API_KEY=...
-VISION_API_BASE=http://localhost:11434/v1
-VISION_MODEL=llava:13b
-```
-
-## Result formatting
-
-After the CLI completes, format `rerank.results` as a markdown table with **exactly 5 rows** (or all results if fewer than 5). Do NOT split into "最佳匹配" / "其他热门选项" — show everything in one flat table.
+CLI 执行完成后,将 `rerank.results` 格式化为 markdown 表格,**每页 5 行**(如不足 5 行则全显示)。

 | # | 商品名称 | 价格 | 销量 | 链接 |
 |---|----------|------|------|------|
 | 1 | {title} | ¥{promotion_price \|\| price} | {sales ?? —}件 | [查看](https://detail.1688.com/offer/{num_iid}.html) |

-- Use `promotion_price` when present, otherwise `price`
-- If `sales` is missing or zero, show `—`
-- Always render as a markdown table, never as bullet points
+- 有 `promotion_price` 用促销价,否则用原价
+- `sales` 缺失或为零时显示 `—`
+- 始终用 markdown 表格展示,不要用列表

-## Execution rules
+## 执行规则

-### For `detect` and `detect-and-search` (slow — use sub-agent)
+### 视频命令(慢 — 用 sub-agent 执行)

-Spawn a sub-agent via `sessions_spawn`. Do **not** run the command directly.
+涉及命令:`detect-best-and-search`、`detect-best`、`detect-and-search`
+
+使用 `sessions_spawn` 创建 sub-agent 执行,**不要直接运行**。

 ```
 sessions_spawn(
-  task: "Run this command and return the raw JSON output:\n\nbun dist/run.js <full command here>\n\nCopy the entire JSON output as your reply.",
+  task: "Run this command and return the raw JSON output:\n\nbun dist/run.js <完整命令>\n\nCopy the entire JSON output as your reply.",
   label: "video-product-snapshot",
   runTimeoutSeconds: 300,
 )
 ```

-- Announce immediately that processing has started and share the `runId`.
-- Wait for the sub-agent announcement, then parse and format the result for the user.
+- 通知用户处理已开始,告知 `runId`
+- 等待 sub-agent 返回结果,然后解析并展示

-### For `search` and `session` (fast — run directly)
+### `search` 和 `session`(快 — 直接运行)

-Run the CLI command inline, no sub-agent needed.
+直接在本会话中运行,不需要 sub-agent。

-### General rules
+### 通用规则

-1. **No fallback strategies.** Report errors as-is; do NOT try alternative approaches.
-2. **No retry loops.** If detection or search fails, report the failure.
-3. **Trust the tool's output.** The CLI handles session management and error formatting internally.
+1. **视频输入 → 始终用 `detect-best-and-search`。** 不要用 `detect-and-search`。
+2. **不要重试。** 命令失败就直接报错。
+3. **信任工具输出。** CLI 内部已处理 session 管理和错误格式化。
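Editor's note: the result-formatting rules in SKILL.md (`promotion_price` fallback, `—` for missing or zero sales, at most five rows) are mechanical enough to sketch. The helper below is illustrative and not part of the skill; the field names follow the doc's row template.

```typescript
// Illustrative encoding of the SKILL.md formatting rules:
// promotion_price when present, else price; "—" for missing/zero sales;
// at most 5 rows; always a markdown table.
interface RerankItem {
  num_iid: number;
  title: string;
  price: string;
  promotion_price?: string;
  sales?: number;
}

function formatRow(i: number, item: RerankItem): string {
  const price = item.promotion_price || item.price; // promotion price wins
  const sales = item.sales ? `${item.sales}件` : '—'; // missing or 0 → "—"
  return `| ${i} | ${item.title} | ¥${price} | ${sales} | [查看](https://detail.1688.com/offer/${item.num_iid}.html) |`;
}

function formatTable(results: RerankItem[]): string {
  const header = '| # | 商品名称 | 价格 | 销量 | 链接 |\n|---|----------|------|------|------|';
  const rows = results.slice(0, 5).map((item, idx) => formatRow(idx + 1, item));
  return [header, ...rows].join('\n');
}

const row = formatRow(1, {
  num_iid: 42, title: '白色运动鞋', price: '9.9', promotion_price: '8.8', sales: 0,
});
```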
scripts/run.ts

@@ -22,31 +22,31 @@ function loadDotenv(path: string): void {
 }

 function printUsage(): void {
-  console.error(`Usage:
+  console.error(`用法:
   bun scripts/run.ts [--api-base=<url>] <command> [args...] [--dry-run]

-Commands:
+命令:
   session
-    Get auth session token
+    获取认证 session token

   detect <video-path> [options]
-    Extract frames and detect ecommerce product snapshots
-    Options:
-      --interval=<seconds>     Frame sampling interval (default: 1)
-      --max-frames=<n>         Max frames to analyze (default: 60)
-      --output-dir=<dir>       Where to save snapshots (default: next to video)
-      --min-confidence=<0-1>   Minimum detection confidence (default: 0.7)
+    从视频抽帧并检测商品画面
+    选项:
+      --interval=<秒>          抽帧间隔(默认: 1)
+      --max-frames=<数量>      最多分析帧数(默认: 60)
+      --output-dir=<目录>      截图保存目录(默认: 视频所在目录)
+      --min-confidence=<0-1>   最低检测置信度(默认: 0.7)

   search <image-path>
-    Search for products using an image via the ecom image-search API
+    用图片搜索商品(调用 ecom image-search API)

   detect-and-search <video-path> [options]
-    Detect best product snapshot from video then run image search + rerank
+    检测最佳商品画面 → 图片搜索 → 关键词重排序

   rerank --image-results=<json> [--description=<text>] [--keyword=<text>] [--top=<n>]
-    Filter image search results using keyword intersection
+    通过关键词交并集过滤搜索结果

-Config: ~/.openclaw/.env (CLIENT_KEY), skill .env (VISION_API_KEY)
+配置文件: ~/.openclaw/.env (CLIENT_KEY), skill 目录 .env (VISION_API_KEY)
 `);
 }
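Editor's note: the help text documents `--name=value` flags, and the new code in this commit calls a `getFlag(args, '--interval')` helper whose implementation is not shown in the diff. A minimal parser consistent with those call sites might look like this (an assumed sketch, not the actual helper):

```typescript
// Assumed sketch of a `--name=value` flag parser matching the
// getFlag(args, '--interval') call sites elsewhere in this commit.
// The real helper's implementation is not shown in the diff.
function getFlag(args: string[], name: string): string | undefined {
  const prefix = `${name}=`;
  const hit = args.find((a) => a.startsWith(prefix));
  return hit ? hit.slice(prefix.length) : undefined;
}

const args = ['detect', './demo.mp4', '--interval=3', '--dry-run'];
const interval = parseFloat(getFlag(args, '--interval') || '1'); // doc default: 1
```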
src/index.ts (83 lines changed)

@@ -3,7 +3,7 @@ import * as path from 'path';
 import type { Command, DetectOptions, DetectResult, SearchResult, OutputResult, SearchItem } from './types.ts';
 import { createSkillClient } from './auth-cli.ts';
 import { extractFrames } from './frame-extractor.ts';
-import { detectProductFrames } from './product-detector.ts';
+import { detectProductFrames, detectBestFrame } from './product-detector.ts';
 import { imageToBase64 } from './frame-extractor.ts';
 import { generateText } from 'ai';
 import { createOpenAI } from '@ai-sdk/openai';
@@ -39,6 +39,10 @@ export async function run(
       return runSearch(args, dryRun);
     case 'detect-and-search':
       return runDetectAndSearch(args, dryRun);
+    case 'detect-best':
+      return runDetectBest(args, dryRun);
+    case 'detect-best-and-search':
+      return runDetectBestAndSearch(args, dryRun);
     case 'rerank':
       return runRerank(args, dryRun);
     default:
@@ -125,6 +129,83 @@ async function runSearch(args: string[], dryRun: boolean): Promise<SearchResult>
   return { status: 'success', command: 'search', dryRun, imagePath, searchHttpStatus, searchBody: body };
 }

+async function runDetectBest(args: string[], dryRun: boolean): Promise<DetectResult> {
+  const videoPath = args[0];
+  if (!videoPath) return { status: 'failed', command: 'detect-best', dryRun, error: 'detect-best requires <video-path>' };
+  if (!fs.existsSync(videoPath)) return { status: 'failed', command: 'detect-best', dryRun, error: `video not found: ${videoPath}` };
+
+  const outputDir = getFlag(args, '--output-dir') || path.join(
+    path.dirname(videoPath),
+    `snapshots_${path.basename(videoPath, path.extname(videoPath))}_${Date.now()}`,
+  );
+  const intervalSeconds = parseFloat(getFlag(args, '--interval') || '0.5');
+  const maxFrames = parseInt(getFlag(args, '--max-frames') || '60', 10);
+
+  if (dryRun) {
+    return { status: 'success', command: 'detect-best', dryRun, videoPath, totalFramesExtracted: 0, productFrames: [], bestSnapshot: undefined };
+  }
+
+  const client = createSkillClient();
+  const visionConfig = await loadVisionConfig(client);
+
+  const frames = extractFrames(videoPath, outputDir, intervalSeconds, maxFrames);
+  if (frames.length === 0) {
+    return { status: 'failed', command: 'detect-best', dryRun, videoPath, error: 'no frames extracted from video' };
+  }
+
+  const best = await detectBestFrame(frames, 10, visionConfig);
+
+  return {
+    status: 'success',
+    command: 'detect-best',
+    dryRun,
+    videoPath,
+    totalFramesExtracted: frames.length,
+    productFrames: best ? [best] : [],
+    bestSnapshot: best ?? undefined,
+  };
+}
+
+async function runDetectBestAndSearch(args: string[], dryRun: boolean): Promise<OutputResult> {
+  const detectResult = await runDetectBest(args, dryRun) as DetectResult;
+  if (detectResult.status === 'failed') return detectResult;
+
+  if (!detectResult.bestSnapshot) {
+    if (dryRun) return { ...detectResult, command: 'detect-best-and-search' };
+    return { ...detectResult, status: 'failed', error: 'no frame could be extracted from video' };
+  }
+
+  const best = detectResult.bestSnapshot;
+  const imageForSearch = best.croppedImagePath || best.imagePath;
+  const searchResult = await runSearch([imageForSearch], dryRun) as SearchResult;
+
+  let rerankResult: any = undefined;
+  if (!dryRun && searchResult.status === 'success' && searchResult.searchBody) {
+    const tmpFile = path.join(path.dirname(imageForSearch), `search_body_${Date.now()}.json`);
+    try {
+      fs.writeFileSync(tmpFile, JSON.stringify(searchResult.searchBody));
+      rerankResult = await runRerank([
+        `--image-results=${tmpFile}`,
+        `--description=${best.description}`,
+        '--top=10',
+      ], dryRun);
+    } catch (e: any) {
+      rerankResult = { error: e.message };
+    } finally {
+      try { fs.unlinkSync(tmpFile); } catch {}
+    }
+  }
+
+  return {
+    ...detectResult,
+    command: 'detect-best-and-search',
+    searchHttpStatus: searchResult.searchHttpStatus,
+    searchBody: searchResult.searchBody,
+    searchError: searchResult.error,
+    rerank: rerankResult,
+  } as any;
+}
+
 async function runDetectAndSearch(args: string[], dryRun: boolean): Promise<OutputResult> {
   const detectResult = await runDetect(args, dryRun) as DetectResult;
   if (detectResult.status === 'failed') return detectResult;
src/product-detector.ts

@@ -151,6 +151,40 @@ async function withConcurrency<T>(
   return results;
 }

+// Skips Pass 1 filter entirely — ranks all frames and always returns the best one.
+// Evenly samples down to maxCandidates when there are too many frames.
+export async function detectBestFrame(
+  frames: ExtractedFrame[],
+  concurrency: number = 10,
+  visionConfig: VisionConfig,
+  maxCandidates: number = 20,
+): Promise<ProductFrame | null> {
+  if (frames.length === 0) return null;
+
+  const model = createVisionModel(visionConfig);
+
+  let candidates = frames;
+  if (frames.length > maxCandidates) {
+    const step = frames.length / maxCandidates;
+    candidates = Array.from({ length: maxCandidates }, (_, i) => frames[Math.floor(i * step)]);
+  }
+
+  const { bestFrame, description, reasoning, boundingBox } = await rankCandidates(candidates, model);
+
+  const croppedPath = bestFrame.imagePath.replace(/\.jpg$/, '_cropped.jpg');
+  await cropProduct(bestFrame.imagePath, boundingBox, croppedPath);
+
+  return {
+    frameIndex: bestFrame.frameIndex,
+    timestampSeconds: bestFrame.timestampSeconds,
+    imagePath: bestFrame.imagePath,
+    croppedImagePath: croppedPath,
+    confidence: 0.95,
+    description,
+    boundingHint: reasoning,
+  };
+}
+
 export async function detectProductFrames(
   frames: ExtractedFrame[],
   minConfidence: number,
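Editor's note: the even down-sampling inside `detectBestFrame` is worth checking in isolation. The sketch below extracts the same indexing, `frames[Math.floor(i * step)]` with `step = n / maxCandidates`, so its behavior is easy to verify: it always includes index 0 and never runs past the end.

```typescript
// Same even-sampling scheme as detectBestFrame above, extracted for testing:
// for n frames and k candidates it picks indices floor(i * n / k).
function sampleEvenly<T>(frames: T[], maxCandidates: number): T[] {
  if (frames.length <= maxCandidates) return frames;
  const step = frames.length / maxCandidates;
  return Array.from({ length: maxCandidates }, (_, i) => frames[Math.floor(i * step)]);
}

const picked = sampleEvenly(Array.from({ length: 60 }, (_, i) => i), 20);
// 60 frames down to 20 candidates: every third frame, 0, 3, 6, …, 57
```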
src/types.ts

@@ -1,4 +1,4 @@
-export type Command = 'detect' | 'search' | 'detect-and-search' | 'rerank' | 'session';
+export type Command = 'detect' | 'search' | 'detect-and-search' | 'detect-best' | 'detect-best-and-search' | 'rerank' | 'session';

 export interface SearchItem {
   num_iid: number;
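Editor's note: when widening a string-literal union like `Command`, an exhaustiveness check keeps every `switch` over it honest. The pattern below is illustrative and not part of this diff; assigning the unhandled value to `never` fails to compile if any variant is missed.

```typescript
// Illustrative exhaustiveness check for the widened Command union:
// adding a variant to Command without handling it here is a compile error,
// because cmd no longer narrows to never in the default branch.
type Command =
  | 'detect' | 'search' | 'detect-and-search'
  | 'detect-best' | 'detect-best-and-search'
  | 'rerank' | 'session';

function isVideoCommand(cmd: Command): boolean {
  switch (cmd) {
    case 'detect':
    case 'detect-and-search':
    case 'detect-best':
    case 'detect-best-and-search':
      return true;
    case 'search':
    case 'rerank':
    case 'session':
      return false;
    default: {
      const unreachable: never = cmd; // compile-time exhaustiveness guard
      return unreachable;
    }
  }
}
```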