diff --git a/README.md b/README.md
index d916856..29c6070 100644
--- a/README.md
+++ b/README.md
@@ -1,65 +1,32 @@
-# template-skill
+# 1688-logistics-scraper
 
-Base template for new skills.
+Extracts logistics data (weight, dimensions, volume) from 1688 product pages.
 
-## Authentication: auth-cli.ts
+Connects to an already-running Chrome browser over the Chrome DevTools Protocol (CDP), extracts logistics data from the product attributes and SKU variants, and downloads the detail images for further analysis.
 
-Every skill ships its own copy of `src/auth-cli.ts`, a thin wrapper that calls the `auth-rt` binary through a subprocess.
+## Prerequisites
 
-**No npm dependency**: when auth-runtime changes, only the binary needs to be recompiled; no skill has to be modified.
-
-### How it works
-
-```
-skill/src/index.ts
-  → import { createSkillClient } from './auth-cli.ts'
-  → auth-cli.ts calls the auth-rt binary via spawnSync
-  → auth-rt handles token/session/request
-```
-
-### Usage
-
-```typescript
-import { createSkillClient } from './auth-cli.ts';
-
-const client = createSkillClient({
-  apiBase: process.env.API_BASE, // optional
-  dryRun: false,                 // optional; dry-run mode returns mock data
-});
-
-// API call
-const res = await client.post('/ecom/your/endpoint', { param: 'value' });
-// res = { status: 200, body: '...' }
-
-// Get the session
-const session = await client.session();
-// session = { accessToken: '...', expiresIn: 900 }
-```
-
-### Prerequisites
-
-Every machine that runs a skill must have the `auth-rt` binary installed:
+Start Chrome with remote debugging enabled (macOS path shown):
 
 ```bash
-git clone http://192.168.0.108:3030/agent-skills/auth-runtime.git ~/clawd/skills/auth-runtime
-cd ~/clawd/skills/auth-runtime && ./install.sh
-# installs to ~/.openclaw/bin/auth-rt
+/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222
 ```
 
-Make sure `~/.openclaw/bin` is on your PATH, or point to the binary with the `AUTH_RT_BIN` environment variable.
+## Installation
 
-### Updating auth-runtime
-
-After auth-runtime code changes:
 ```bash
-cd ~/clawd/skills/auth-runtime && git pull && ./install.sh
+bash install.sh
 ```
-Recompiling is all it takes; **no skill code needs to change**.
 
-### New skill checklist
+## Usage
 
-1. Create the repository from this template
-2. Confirm `src/auth-cli.ts` is present (inherited directly from the template)
-3. In `src/index.ts`: `import { createSkillClient } from './auth-cli.ts'`
-4. Do **not** add an `@clawd/auth-runtime` dependency to `package.json`
-5. `install.sh` includes the auth-rt binary check
+```bash
+bun scripts/run.ts scrape 'https://detail.1688.com/offer/852504650877.html'
+```
+
+## Data sources
+
+1. Product attribute table (商品属性 / 商品参数)
+2. SKU / variant specs
+3. Logistics section
+4. Product detail images (downloaded to `/tmp/1688-logistics/`, one subdirectory per offer ID)
diff --git a/SKILL.md b/SKILL.md
index db0dee6..656975f 100644
--- a/SKILL.md
+++ b/SKILL.md
@@ -1,26 +1,75 @@
 ---
-name: my-skill
-description: "TODO: describe what this skill does and when to use it."
+name: 1688-logistics-scraper
+description: "Extract product weight/size/logistics data from 1688 product pages via a running Chrome browser (CDP) and output structured JSON. Use when the user provides a 1688 product URL and needs logistics specs."
 ---
 
-# my-skill
+# 1688 Logistics Scraper
 
-TODO: one-line description.
-
-> Auth is handled automatically via `auth-cli.ts` → `auth-runtime` CLI.
+Extract product weight, size, and logistics data from 1688 product pages.
 
 ## Run
 
 ```bash
-bun scripts/run.ts [args] [--dry-run]
+bun scripts/run.ts scrape <1688-url> [--dry-run]
 ```
 
-## Commands
+### Examples
 
-| Command | Description |
-|---------|-------------|
-| `run ` | TODO: describe |
+```bash
+bun scripts/run.ts scrape 'https://detail.1688.com/offer/852504650877.html'
+bun scripts/run.ts scrape 'https://detail.1688.com/offer/852504650877.html' --dry-run
+```
+
+## What It Does
+
+1. Navigates the first open Chrome tab to the 1688 product URL over CDP
+2. Extracts weight/size data from the product attribute table and the SKU/variant specs
+3. Downloads the detail images (商品详情图片) for analysis, since weight/size is often shown only in the images
+4. Outputs structured JSON
+
+## Where To Look For Data
+
+Weight/size data on 1688 pages can appear in several places. Check all of them before reporting `null`:
+
+1. **Product attributes** (商品属性 / 商品参数): key-value table, most reliable
+2. **Variant/SKU specs**: per-variant weight or size
+3. **Logistics section**: shipping weight, volume, freight info
+4. **Detail images**: downloaded to `/tmp/1688-logistics/` (one subdirectory per offer ID); read them to find weight/size text baked into the images
 
 ## Output
 
-Returns JSON: `{ "status": "success" | "failed", "data": ... }`
+```json
+{
+  "status": "success",
+  "url": "https://detail.1688.com/offer/...",
+  "product": {
+    "title": "产品标题",
+    "logistics": {
+      "weight": { "value": 0.5, "unit": "kg", "source": "attributes" },
+      "dimensions": { "length": 30, "width": 20, "height": 10, "unit": "cm", "source": "attributes" },
+      "grossWeight": null,
+      "netWeight": null,
+      "packageWeight": null,
+      "volume": null,
+      "shippingMethod": null,
+      "shippingCost": null,
+      "origin": null
+    },
+    "variants": [
+      { "name": "颜色: 红色", "weight": null, "dimensions": null }
+    ]
+  },
+  "detailImages": ["/tmp/1688-logistics/852504650877/img_001.jpg"],
+  "rawAttributes": { "重量": "0.5kg", "尺寸": "30*20*10cm" }
+}
+```
+
+`null` means the value was not found in the page text. Check `detailImages`: the data may appear only in the images.
+
+## Rules
+
+1. If the browser is not running, report the error. Do not try to launch it.
+2. Check all data sources before reporting `null`.
+3. Normalize units: weights to kg (克, g, 斤), lengths to cm (毫米, mm, 米, m). Keep raw values in `rawAttributes`.
+4. No retries. If it fails, report as-is.
+5. Trust page content. Do not guess values.
diff --git a/install.sh b/install.sh
index 9949f59..7beffc6 100755
--- a/install.sh
+++ b/install.sh
@@ -2,25 +2,8 @@
 set -euo pipefail
 cd "$(dirname "$0")"
 
-# Auto-install auth-rt if not found
-if ! command -v auth-rt &>/dev/null && [ ! -x "$HOME/.local/bin/auth-rt" ]; then
-  echo "auth-rt not found, installing..."
-  _FORGEJO="http://192.168.0.108:3030"
-  _OS="$(uname -s | tr '[:upper:]' '[:lower:]')"
-  _ARCH="$(uname -m)"; case "$_ARCH" in x86_64) _ARCH="amd64";; aarch64) _ARCH="arm64";; esac
-  _URL="$_FORGEJO/agent-skills/auth-runtime/releases/download/latest/auth-rt-${_OS}-${_ARCH}"
-  mkdir -p "$HOME/.local/bin"
-  if curl -fsSL "$_URL" -o "$HOME/.local/bin/auth-rt" 2>/dev/null; then
-    chmod +x "$HOME/.local/bin/auth-rt"
-    echo "auth-rt installed (downloaded)"
-  else
-    echo "Download failed, building from source..."
-    _SRC="$HOME/.local/share/auth-runtime"
-    if [ -d "$_SRC/.git" ]; then git -C "$_SRC" pull --ff-only
-    else git clone --depth 1 "$_FORGEJO/agent-skills/auth-runtime.git" "$_SRC"
-    fi
-    bash "$_SRC/install.sh"
-  fi
-fi
-
-npm install
+bun install
+echo "1688-logistics-scraper installed."
+echo ""
+echo "Prerequisites: Chrome must be running with remote debugging:"
+echo "  /Applications/Google\\ Chrome.app/Contents/MacOS/Google\\ Chrome --remote-debugging-port=9222"
diff --git a/package.json b/package.json
index d1deedc..91a115e 100644
--- a/package.json
+++ b/package.json
@@ -1,5 +1,5 @@
 {
-  "name": "my-skill",
+  "name": "1688-logistics-scraper",
   "version": "0.1.0",
   "type": "module",
   "scripts": {
diff --git a/scripts/run.ts b/scripts/run.ts
index 378f6b2..0756918 100644
--- a/scripts/run.ts
+++ b/scripts/run.ts
@@ -4,24 +4,31 @@ import { run } from '../src/index.ts';
 
 function printUsage(): void {
   console.error(`Usage:
-  bun scripts/run.ts [--api-base=] [args...] [--dry-run]
+  bun scripts/run.ts [--port=<port>] <command> [args...] [--dry-run]
 
 Commands:
-  run 
+  scrape <1688-url>   Scrape logistics data (weight/size) from a product page
 
-Config: ~/.openclaw/.env (API_BASE)
+Examples:
+  bun scripts/run.ts scrape 'https://detail.1688.com/offer/852504650877.html'
+  bun scripts/run.ts scrape 'https://detail.1688.com/offer/852504650877.html' --dry-run
+  bun scripts/run.ts --port=9223 scrape 'https://detail.1688.com/offer/852504650877.html'
+
+Prerequisites:
+  Chrome must be running with --remote-debugging-port=9222
 `);
 }
 
 async function main(): Promise<void> {
   const positionals: string[] = [];
   let dryRun = false;
+  let port = 9222;
 
   for (const arg of process.argv.slice(2)) {
     if (arg === '--dry-run') {
       dryRun = true;
-    } else if (arg.startsWith('--api-base=')) {
-      process.env.API_BASE = arg.slice('--api-base='.length).trim();
+    } else if (arg.startsWith('--port=')) {
+      port = parseInt(arg.slice('--port='.length), 10);
     } else if (arg === '-h' || arg === '--help') {
       printUsage(); process.exit(0);
     } else {
@@ -31,11 +38,14 @@ async function main(): Promise<void> {
 
   if (positionals.length < 1) { printUsage(); process.exit(1); }
 
-  const result = await run(positionals[0] as Command, positionals.slice(1), dryRun);
+  const result = await run(positionals[0] as Command, positionals.slice(1), dryRun, port);
   console.log(JSON.stringify(result, null, 2));
 }
 
 main().catch((err) => {
-  console.error(JSON.stringify({ status: 'failed', error: err instanceof Error ? err.message : String(err) }, null, 2));
+  console.error(JSON.stringify({
+    status: 'failed',
+    error: err instanceof Error ? err.message : String(err),
+  }, null, 2));
   process.exit(1);
 });
diff --git a/src/auth-cli.ts b/src/auth-cli.ts
deleted file mode 100644
index d072a88..0000000
--- a/src/auth-cli.ts
+++ /dev/null
@@ -1,119 +0,0 @@
-/**
- * Thin CLI wrapper for auth-runtime.
- *
- * Copy this file into your skill's src/ directory. It calls the
- * `auth-rt` binary (a standalone Go executable), so the skill has
- * zero npm/runtime dependency on auth-runtime.
- *
- * Prerequisites:
- *   `auth-rt` must be in PATH or at ~/.local/bin/auth-rt
- *   (install.sh handles this automatically)
- *
- * Usage:
- *   import { createSkillClient } from './auth-cli.ts';
- *   const client = createSkillClient();
- *   const res = await client.post('/ecom/tasks/scrape', { url: '...' });
- */
-
-import { spawnSync } from 'child_process';
-import * as path from 'path';
-import * as os from 'os';
-
-const home = process.env.HOME || os.homedir();
-const AUTH_RT_BIN = process.env.AUTH_RT_BIN
-  || (() => {
-    // Check if auth-rt is in PATH
-    const which = spawnSync('which', ['auth-rt'], { encoding: 'utf-8' });
-    if (which.status === 0 && which.stdout.trim()) {
-      return which.stdout.trim();
-    }
-    return path.join(home, '.local', 'bin', 'auth-rt');
-  })();
-
-export interface ApiResponse {
-  status: number;
-  body: string;
-}
-
-export interface SessionResponse {
-  accessToken: string;
-  expiresIn: number;
-  ownerSessionToken?: string;
-  hookUrl?: string;
-  hookToken?: string;
-}
-
-export interface SkillClientOptions {
-  apiBase?: string;
-  dryRun?: boolean;
-}
-
-function runCli(...args: string[]): string {
-  const result = spawnSync(AUTH_RT_BIN, args, {
-    encoding: 'utf-8',
-    timeout: 60_000,
-  });
-
-  if (result.error) {
-    throw new Error(`auth-rt spawn failed: ${result.error.message}`);
-  }
-  if (result.status !== 0) {
-    throw new Error(`auth-rt failed (exit ${result.status}): ${(result.stderr || '').trim()}`);
-  }
-  return (result.stdout || '').trim();
-}
-
-export class SkillClient {
-  private readonly apiBase?: string;
-  private readonly dryRun: boolean;
-
-  constructor(options: SkillClientOptions = {}) {
-    this.apiBase = options.apiBase;
-    this.dryRun = options.dryRun ?? false;
-  }
-
-  async session(): Promise<SessionResponse> {
-    if (this.dryRun) {
-      return { accessToken: '', expiresIn: 900 };
-    }
-    return JSON.parse(runCli('session'));
-  }
-
-  async get(urlPath: string): Promise<ApiResponse> {
-    return this.request('GET', urlPath);
-  }
-
-  async post(urlPath: string, body?: unknown): Promise<ApiResponse> {
-    return this.request('POST', urlPath, body);
-  }
-
-  async put(urlPath: string, body?: unknown): Promise<ApiResponse> {
-    return this.request('PUT', urlPath, body);
-  }
-
-  async patch(urlPath: string, body?: unknown): Promise<ApiResponse> {
-    return this.request('PATCH', urlPath, body);
-  }
-
-  async delete(urlPath: string, body?: unknown): Promise<ApiResponse> {
-    return this.request('DELETE', urlPath, body);
-  }
-
-  private async request(method: string, urlPath: string, body?: unknown): Promise<ApiResponse> {
-    if (this.dryRun) {
-      return { status: 200, body: JSON.stringify({ dryRun: true, method, path: urlPath }) };
-    }
-    const args = ['request', method, urlPath];
-    if (body != null) {
-      args.push('--body', JSON.stringify(body));
-    }
-    if (this.apiBase) {
-      args.push('--api-base', this.apiBase);
-    }
-    return JSON.parse(runCli(...args));
-  }
-}
-
-export function createSkillClient(options?: SkillClientOptions): SkillClient {
-  return new SkillClient(options);
-}
diff --git a/src/index.ts b/src/index.ts
index 56effac..d416c5e 100644
--- a/src/index.ts
+++ b/src/index.ts
@@ -1,34 +1,355 @@
-import { createSkillClient, type ApiResponse } from './auth-cli.ts';
+import * as fs from 'fs';
+import * as path from 'path';
 
-export type Command = 'run'; // TODO: add your commands
+export type Command = 'scrape';
 
-export interface RunResult {
+export interface LogisticsValue {
+  value: number | null;
+  unit: string | null;
+  source: string;
+}
+
+export interface Dimensions {
+  length: number | null;
+  width: number | null;
+  height: number | null;
+  unit: string | null;
+  source: string;
+}
+
+export interface LogisticsData {
+  weight: LogisticsValue | null;
+  dimensions: Dimensions | null;
+  grossWeight: LogisticsValue | null;
+  netWeight: LogisticsValue | null;
+  packageWeight: LogisticsValue | null;
+  volume: LogisticsValue | null;
+  shippingMethod: string | null;
+  shippingCost: string | null;
+  origin: string | null;
+}
+
+export interface VariantInfo {
+  name: string;
+  weight: LogisticsValue | null;
+  dimensions: Dimensions | null;
+}
+
+export interface ScrapeResult {
   status: 'success' | 'failed';
+  url: string;
   command: Command;
   dryRun: boolean;
-  data?: unknown;
+  product?: {
+    title: string;
+    logistics: LogisticsData;
+    variants: VariantInfo[];
+  };
+  detailImages?: string[];
+  rawAttributes?: Record<string, string>;
   error?: string;
 }
 
+// --- CDP helpers (raw WebSocket, no npm deps) ---
+
+interface CdpResult {
+  id: number;
+  result?: any;
+  error?: { message: string };
+}
+
+class CdpSession {
+  private ws!: WebSocket;
+  private msgId = 0;
+  private pending = new Map<number, { resolve: (v: any) => void; reject: (e: Error) => void }>();
+
+  static async connect(port: number): Promise<CdpSession> {
+    const resp = await fetch(`http://127.0.0.1:${port}/json`);
+    const targets = (await resp.json()) as Array<{ webSocketDebuggerUrl: string; type: string }>;
+    const page = targets.find(t => t.type === 'page');
+    if (!page) throw new Error('No Chrome page tab found. Open a tab first.');
+    const session = new CdpSession();
+    await session.open(page.webSocketDebuggerUrl);
+    return session;
+  }
+
+  private open(wsUrl: string): Promise<void> {
+    return new Promise((resolve, reject) => {
+      this.ws = new WebSocket(wsUrl);
+      this.ws.onopen = () => resolve();
+      this.ws.onerror = (e: any) => reject(new Error(`WebSocket error: ${e.message || e}`));
+      this.ws.onmessage = (ev: MessageEvent) => {
+        const msg: CdpResult = JSON.parse(typeof ev.data === 'string' ? ev.data : ev.data.toString());
+        if (msg.id != null && this.pending.has(msg.id)) {
+          const p = this.pending.get(msg.id)!;
+          this.pending.delete(msg.id);
+          if (msg.error) p.reject(new Error(msg.error.message));
+          else p.resolve(msg.result);
+        }
+      };
+    });
+  }
+
+  send(method: string, params: Record<string, unknown> = {}): Promise<any> {
+    const id = ++this.msgId;
+    return new Promise((resolve, reject) => {
+      this.pending.set(id, { resolve, reject });
+      this.ws.send(JSON.stringify({ id, method, params }));
+    });
+  }
+
+  async evaluate(expression: string): Promise<any> {
+    const res = await this.send('Runtime.evaluate', { expression, returnByValue: true });
+    return res?.result?.value;
+  }
+
+  close() {
+    try { this.ws.close(); } catch {}
+  }
+}
+
+// --- Parsers ---
+
+const WEIGHT_KEYS = ['重量', '毛重', '净重', '单件重量', '包装重量', '产品重量', '单品重量', 'weight'];
+const DIMENSION_KEYS = ['尺寸', '规格', '长宽高', '外箱尺寸', '包装尺寸', '产品尺寸', '大小', 'size', 'dimensions'];
+const VOLUME_KEYS = ['体积', '容积', 'volume'];
+
+function extractOfferId(url: string): string {
+  return url.match(/offer\/(\d+)/)?.[1] || 'unknown';
+}
+
+function parseWeight(raw: string): LogisticsValue | null {
+  const m = raw.match(/([\d.]+)\s*(kg|g|克|千克|公斤|斤)/i);
+  if (!m) return null;
+  let value = parseFloat(m[1]);
+  let unit = m[2].toLowerCase();
+  if (unit === 'g' || unit === '克') { value /= 1000; unit = 'kg'; }
+  if (unit === '千克' || unit === '公斤') unit = 'kg';
+  if (unit === '斤') { value *= 0.5; unit = 'kg'; }
+  return { value, unit, source: '' };
+}
+
+function parseDimensions(raw: string): Dimensions | null {
+  const m = raw.match(/([\d.]+)\s*[*xX×]\s*([\d.]+)\s*[*xX×]\s*([\d.]+)\s*(cm|mm|毫米|厘米|m|米)?/i);
+  if (!m) return null;
+  let [l, w, h] = [parseFloat(m[1]), parseFloat(m[2]), parseFloat(m[3])];
+  let unit = (m[4] || 'cm').toLowerCase();
+  if (unit === 'mm' || unit === '毫米') { l /= 10; w /= 10; h /= 10; unit = 'cm'; }
+  if (unit === '厘米') unit = 'cm';
+  if (unit === 'm' || unit === '米') { l *= 100; w *= 100; h *= 100; unit = 'cm'; }
+  return { length: l, width: w, height: h, unit, source: '' };
+}
+
+function parseVolume(raw: string): LogisticsValue | null {
+  const m = raw.match(/([\d.]+)\s*(m³|cm³|L|ml|升|毫升|立方米|立方厘米)/i);
+  if (!m) return null;
+  return { value: parseFloat(m[1]), unit: m[2], source: '' };
+}
+
+function matchKey(text: string, keys: string[]): boolean {
+  const lower = text.toLowerCase();
+  return keys.some(k => lower.includes(k.toLowerCase()));
+}
+
+// --- Page extraction ---
+
+const JS_EXTRACT_ATTRS = `
+(function() {
+  const attrs = {};
+  const sels = [
+    '.detail-attributes-list .attributes-item',
+    '.obj-leading .obj-content li',
+    '#mod-detail-attributes .attribute-item',
+    '.detail-info table tr',
+    '[class*="attribute"] li',
+    '[class*="param"] li',
+    '.offer-attr-list .offer-attr-item',
+  ];
+  for (const sel of sels) {
+    document.querySelectorAll(sel).forEach(el => {
+      // Split on ASCII or full-width colon ("重量:0.5kg" / "重量:0.5kg")
+      const parts = el.textContent.trim().split(/[::]/);
+      if (parts.length >= 2) attrs[parts[0].trim()] = parts.slice(1).join(':').trim();
+    });
+  }
+  document.querySelectorAll('table tr, .detail-attributes-list tr').forEach(tr => {
+    const cells = tr.querySelectorAll('td, th');
+    if (cells.length >= 2) attrs[cells[0].textContent.trim()] = cells[1].textContent.trim();
+  });
+  return JSON.stringify(attrs);
+})()`;
+
+const JS_EXTRACT_VARIANTS = `
+(function() {
+  const variants = [];
+  const sels = [
+    '.sku-item-wrapper .sku-item',
+    '[class*="sku"] [class*="item"]',
+    '.obj-sku .obj-content li',
+    '.unit-detail-spec-operator .spec-item',
+  ];
+  for (const sel of sels) {
+    document.querySelectorAll(sel).forEach(el => {
+      const name = el.textContent.trim().replace(/\\s+/g, ' ');
+      if (name && name.length < 200) variants.push({ name, text: el.textContent });
+    });
+  }
+  return JSON.stringify(variants);
+})()`;
+
+const JS_EXTRACT_TITLE = `
+(function() {
+  for (const sel of ['.title-text','.detail-title-text','h1[class*="title"]','.mod-detail-title h1','.d-title']) {
+    const el = document.querySelector(sel);
+    if (el && el.textContent.trim()) return el.textContent.trim();
+  }
+  return document.title || '';
+})()`;
+
+const JS_EXTRACT_IMAGES = `
+(function() {
+  const imgs = [], seen = new Set();
+  const sels = [
+    '#desc-lazyload-container img',
+    '.detail-desc-decorate-richtext img',
+    '[class*="detail-desc"] img',
+    '.mod-detail-description img',
+    '.offer-attr-item img',
+    '.desc-img-loaded img',
+  ];
+  for (const sel of sels) {
+    document.querySelectorAll(sel).forEach(img => {
+      const src = img.src || img.dataset.src || img.dataset.lazySrc || '';
+      if (src && !seen.has(src) && (src.startsWith('http') || src.startsWith('//'))) {
+        seen.add(src);
+        imgs.push(src.startsWith('//') ? 'https:' + src : src);
+      }
+    });
+  }
+  return JSON.stringify(imgs);
+})()`;
+
+async function downloadImages(urls: string[], outputDir: string): Promise<string[]> {
+  fs.mkdirSync(outputDir, { recursive: true });
+  const saved: string[] = [];
+  for (let i = 0; i < urls.length; i++) {
+    try {
+      const resp = await fetch(urls[i]);
+      if (!resp.ok) continue;
+      const buf = Buffer.from(await resp.arrayBuffer());
+      const ext = urls[i].match(/\.(jpg|jpeg|png|webp|gif)/i)?.[1] || 'jpg';
+      const p = path.join(outputDir, `img_${String(i + 1).padStart(3, '0')}.${ext}`);
+      fs.writeFileSync(p, buf);
+      saved.push(p);
+    } catch {}
+  }
+  return saved;
+}
+
+// --- Main ---
+
 export async function run(
   command: Command,
   args: string[],
   dryRun: boolean,
-): Promise<RunResult> {
-  const client = createSkillClient({
-    apiBase: process.env.API_BASE,
-    dryRun,
-  });
-
-  if (command === 'run') {
-    const response: ApiResponse = await client.post('/your/endpoint', { param: args[0] });
-
-    if (response.status < 200 || response.status >= 300) {
-      return { status: 'failed', command, dryRun, error: `HTTP ${response.status}: ${response.body}` };
-    }
-
-    return { status: 'success', command, dryRun, data: JSON.parse(response.body) };
+  cdpPort: number = 9222,
+): Promise<ScrapeResult> {
+  if (command !== 'scrape') {
+    return { status: 'failed', url: '', command, dryRun, error: `unknown command: ${command}` };
   }
 
-  return { status: 'failed', command, dryRun, error: `unknown command: ${command}` };
+  const url = args[0];
+  if (!url) {
+    return { status: 'failed', url: '', command, dryRun, error: 'scrape requires <1688-url>' };
+  }
+
+  if (dryRun) {
+    return {
+      status: 'success', url, command, dryRun,
+      product: {
+        title: '',
+        logistics: {
+          weight: null, dimensions: null, grossWeight: null, netWeight: null,
+          packageWeight: null, volume: null, shippingMethod: null, shippingCost: null, origin: null,
+        },
+        variants: [],
+      },
+      detailImages: [],
+      rawAttributes: {},
+    };
+  }
+
+  let cdp: CdpSession | null = null;
+  try {
+    cdp = await CdpSession.connect(cdpPort);
+
+    await cdp.send('Page.enable');
+    await cdp.send('Runtime.enable');
+    await cdp.send('Page.navigate', { url });
+
+    // Fixed 5 s delay for the page to render (does not wait for the load event)
+    await new Promise(r => setTimeout(r, 5000));
+
+    const title: string = await cdp.evaluate(JS_EXTRACT_TITLE) || '';
+    const rawAttributes: Record<string, string> = JSON.parse(await cdp.evaluate(JS_EXTRACT_ATTRS) || '{}');
+    const rawVariants: Array<{ name: string; text: string }> = JSON.parse(await cdp.evaluate(JS_EXTRACT_VARIANTS) || '[]');
+    const imgUrls: string[] = JSON.parse(await cdp.evaluate(JS_EXTRACT_IMAGES) || '[]');
+
+    const variants: VariantInfo[] = rawVariants.map(v => {
+      const weight = parseWeight(v.text);
+      const dimensions = parseDimensions(v.text);
+      if (weight) weight.source = 'variant';
+      if (dimensions) dimensions.source = 'variant';
+      return { name: v.name, weight, dimensions };
+    });
+
+    const logistics: LogisticsData = {
+      weight: null, dimensions: null, grossWeight: null, netWeight: null,
+      packageWeight: null, volume: null, shippingMethod: null, shippingCost: null, origin: null,
+    };
+
+    // A later, looser key (e.g. 规格) must not overwrite an already-parsed value with null.
+    for (const [key, val] of Object.entries(rawAttributes)) {
+      if (matchKey(key, ['毛重'])) {
+        logistics.grossWeight = parseWeight(val);
+        if (logistics.grossWeight) logistics.grossWeight.source = 'attributes';
+      } else if (matchKey(key, ['净重'])) {
+        logistics.netWeight = parseWeight(val);
+        if (logistics.netWeight) logistics.netWeight.source = 'attributes';
+      } else if (matchKey(key, ['包装重量'])) {
+        logistics.packageWeight = parseWeight(val);
+        if (logistics.packageWeight) logistics.packageWeight.source = 'attributes';
+      } else if (!logistics.weight && matchKey(key, WEIGHT_KEYS)) {
+        logistics.weight = parseWeight(val);
+        if (logistics.weight) logistics.weight.source = 'attributes';
+      }
+      if (!logistics.dimensions && matchKey(key, DIMENSION_KEYS)) {
+        logistics.dimensions = parseDimensions(val);
+        if (logistics.dimensions) logistics.dimensions.source = 'attributes';
+      }
+      if (!logistics.volume && matchKey(key, VOLUME_KEYS)) {
+        logistics.volume = parseVolume(val);
+        if (logistics.volume) logistics.volume.source = 'attributes';
+      }
+      if (matchKey(key, ['产地', '发货地', '所在地'])) {
+        logistics.origin = val;
+      }
+    }
+
+    const offerId = extractOfferId(url);
+    const imgDir = path.join('/tmp', '1688-logistics', offerId);
+    const detailImages = await downloadImages(imgUrls, imgDir);
+
+    return {
+      status: 'success', url, command, dryRun,
+      product: { title, logistics, variants },
+      detailImages,
+      rawAttributes,
+    };
+  } catch (error) {
+    return {
+      status: 'failed', url, command, dryRun,
+      error: error instanceof Error ? error.message : String(error),
+    };
+  } finally {
+    cdp?.close();
+  }
 }
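The unit handling in `parseWeight` and `parseDimensions` can be checked without a browser. The sketch below restates the same conversions (g/克 divided by 1000, 斤 times 0.5, mm/毫米 divided by 10, m/米 times 100, no unit treated as cm) as two standalone functions; `toKg` and `toCm` are hypothetical names for illustration, not exports of `src/index.ts`.

```typescript
// Hypothetical standalone helpers mirroring the conversions in src/index.ts.

// Weight: normalize the first "<number><unit>" match to kilograms.
function toKg(raw: string): number | null {
  const m = raw.match(/([\d.]+)\s*(kg|g|克|千克|公斤|斤)/i);
  if (!m) return null;
  const value = parseFloat(m[1]);
  switch (m[2].toLowerCase()) {
    case 'g':
    case '克':
      return value / 1000; // grams to kg
    case '斤':
      return value * 0.5; // 1 斤 = 0.5 kg
    default:
      return value; // kg, 千克 and 公斤 are already kilograms
  }
}

// Dimensions: parse "L*W*H<unit>" (separators *, x, X or ×) and normalize to cm.
function toCm(raw: string): [number, number, number] | null {
  const m = raw.match(/([\d.]+)\s*[*xX×]\s*([\d.]+)\s*[*xX×]\s*([\d.]+)\s*(cm|mm|毫米|厘米|m|米)?/i);
  if (!m) return null;
  const unit = (m[4] || 'cm').toLowerCase();
  // Divide for mm (rather than multiply by 0.1) so 300 mm becomes exactly 30 cm.
  const scale = (n: number): number =>
    unit === 'mm' || unit === '毫米' ? n / 10 : unit === 'm' || unit === '米' ? n * 100 : n;
  return [scale(parseFloat(m[1])), scale(parseFloat(m[2])), scale(parseFloat(m[3]))];
}

console.log(toKg('500g')); // 0.5
console.log(toKg('3斤')); // 1.5
console.log(toCm('300*200*100mm')); // [ 30, 20, 10 ]
```

Both helpers only look at the first match in the string, as the originals do, so a value such as "约500g/个" still yields 0.5 kg.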
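`CdpSession` pairs each CDP reply with its request by the numeric `id` it sends, and ignores id-less event frames. The following is a transport-free sketch of that bookkeeping, so it can be exercised by feeding in fake reply frames instead of WebSocket messages. The `Correlator` class, its `deliver` method and the `failAll` helper are hypothetical; `failAll` is not in the diff and only illustrates how outstanding requests could be rejected when the socket closes.

```typescript
// Hypothetical transport-agnostic version of CdpSession's id bookkeeping.
interface Reply {
  id?: number;
  result?: unknown;
  error?: { message: string };
}

type Pending = { resolve: (v: unknown) => void; reject: (e: Error) => void };

class Correlator {
  private nextId = 0;
  private pending = new Map<number, Pending>();
  private readonly sendFrame: (frame: string) => void;

  // `sendFrame` stands in for ws.send and receives the serialized request.
  constructor(sendFrame: (frame: string) => void) {
    this.sendFrame = sendFrame;
  }

  send(method: string, params: Record<string, unknown> = {}): Promise<unknown> {
    const id = ++this.nextId;
    return new Promise((resolve, reject) => {
      this.pending.set(id, { resolve, reject });
      this.sendFrame(JSON.stringify({ id, method, params }));
    });
  }

  // Call for every incoming frame; events (no id) and unknown ids are ignored.
  deliver(frame: string): void {
    const msg: Reply = JSON.parse(frame);
    if (msg.id == null) return;
    const p = this.pending.get(msg.id);
    if (!p) return;
    this.pending.delete(msg.id);
    if (msg.error) p.reject(new Error(msg.error.message));
    else p.resolve(msg.result);
  }

  // Reject everything still outstanding, e.g. from a WebSocket onclose handler.
  failAll(reason: string): void {
    this.pending.forEach((p) => p.reject(new Error(reason)));
    this.pending.clear();
  }
}
```

Because replies are matched by `id` rather than arrival order, responses that come back out of order still resolve the right promise.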
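When several attribute keys can describe the same field (for example `尺寸`, `包装尺寸` and the much looser `规格`), a safe precedence rule is "first successful parse wins": scan attributes in page order and keep the first value whose key matches and whose text actually parses. Otherwise a later key such as `规格: 大号` could replace a good value with `null`. A generic sketch of that rule, using the hypothetical helper name `firstParsed`:

```typescript
// Hypothetical helper: scan key/value pairs in page (insertion) order and return
// the first parsed value whose key contains one of `keys` (case-insensitive).
function firstParsed<T>(
  attrs: Record<string, string>,
  keys: string[],
  parse: (raw: string) => T | null,
): T | null {
  for (const [key, val] of Object.entries(attrs)) {
    const lower = key.toLowerCase();
    if (!keys.some((k) => lower.includes(k.toLowerCase()))) continue;
    const parsed = parse(val);
    if (parsed !== null) return parsed; // first successful parse wins
  }
  return null; // no matching key produced a parseable value
}
```

In `run`, this could look like `logistics.dimensions = firstParsed(rawAttributes, DIMENSION_KEYS, parseDimensions)`, with `source = 'attributes'` set on a non-null result. `WEIGHT_KEYS` also contains `毛重`, `净重` and `包装重量`, so the gross/net/package split would still need separate key lists.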
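`run` sleeps a fixed 5 s after `Page.navigate` before extracting. One alternative is to poll a readiness condition with a deadline. This is a hedged sketch of a different technique, not what the diff does: `waitFor` is a hypothetical helper, and the check function is injected so the polling logic can be tested without Chrome.

```typescript
// Hypothetical polling helper: resolves true as soon as `check` returns true,
// or false once `timeoutMs` has elapsed. `check` may be async (e.g. a CDP call).
async function waitFor(
  check: () => boolean | Promise<boolean>,
  timeoutMs: number,
  intervalMs = 250,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    if (await check()) return true;
    if (Date.now() >= deadline) return false;
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

In `run`, `check` could call `cdp.evaluate` on an expression such as `document.readyState === 'complete' && location.href.includes(offerId)`, with the offer ID interpolated into the string. The URL test guards against reading the previous page's `complete` state before the navigation commits. Whether `complete` is late enough for 1688's lazy-loaded description images has not been verified.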