Compare commits: v0.0.3 ... main (8 commits)

Author SHA1 Message Date
ywkj 20d3529068 fix: use Page.loadEventFired + networkIdle instead of fixed timeout
register-skill-release / register (push) Successful in 14s
Replace the 15s polling loop with CDP event-based page-load
detection: wait for Page.loadEventFired, then use a PerformanceObserver
to detect network idle (no new resource entries for 1s). More reliable
and faster than fixed timeouts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-31 06:51:51 +08:00
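
The idle rule described in this commit (a 1s timer that restarts whenever a new resource entry is observed) can be modelled as a pure function, which makes the timing easy to reason about without a browser. This is a minimal sketch: `networkIdleAt` and its parameters are hypothetical names, not part of the scraper. Separately, CDP's `Runtime.evaluate` only waits for a returned promise when `awaitPromise: true` is passed; the `evaluate` helper in the diff sends only `expression` and `returnByValue`, so it is worth checking that the in-page wait actually blocks.

```typescript
// Hypothetical model of the debounce: returns the time (ms) at which the
// page would be declared idle, given when observation starts and when
// resource entries arrive.
function networkIdleAt(
  startMs: number,
  resourceTimesMs: number[],
  quietMs: number = 1000,
): number {
  let deadline = startMs + quietMs;
  const sorted = [...resourceTimesMs].sort((a, b) => a - b);
  for (const t of sorted) {
    if (t < startMs) continue; // arrived before observation began
    if (t >= deadline) break;  // the quiet timer has already fired
    deadline = t + quietMs;    // a new entry restarts the quiet window
  }
  return deadline;
}
```

With entries at 200, 900 and 1500 ms the page counts as idle at 2500 ms; a later entry at 4000 ms has no effect because the timer has already fired.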
ywkj 935cac3c61 feat: extract window.context.result.data for structured logistics data
register-skill-release / register (push) Successful in 13s
Poll window.context.result.data (up to 15s) for productPackInfo,
productTitle, productAttributes, and skuSelection. This provides
structured weight/size/volume data per-variant directly from 1688's
JS context — more reliable than vision-only extraction.

Screenshots are still captured as a fallback for data that appears only in images.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 14:05:53 +08:00
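
The field selection from `window.context.result.data` can be sketched as a pure function over an untyped object. `pickPackData` and the `PackData` interface are hypothetical names for illustration; the field names come from the commit message, and the shape of each field is deliberately left as `unknown` because the diff does not specify it.

```typescript
interface PackData {
  productPackInfo: unknown;
  productTitle: unknown;
  productAttributes: unknown;
  skuSelection: unknown;
}

// Returns the selected fields, or null when productPackInfo is absent
// (for example, when the page's JS context has not been populated yet).
function pickPackData(ctx: unknown): PackData | null {
  const data = (ctx as any)?.result?.data;
  if (!data || !data.productPackInfo) return null;
  return {
    productPackInfo: data.productPackInfo,
    productTitle: data.productTitle ?? null,
    productAttributes: data.productAttributes ?? null,
    skuSelection: data.skuSelection ?? null,
  };
}
```

Returning `null` rather than throwing keeps the screenshot fallback path simple: the caller can proceed to capture whether or not structured data was found.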
ywkj 93517505d4 fix: set 1920x1080 @2x viewport before capture
Wide tables (商品件重尺, the per-piece weight/size table) were getting cut off at the right edge.
Now emulates a 1920x1080 PC viewport at 2x scale before navigating,
so that all columns fit in screenshots.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 13:48:00 +08:00
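
As a rough sketch of what the viewport override implies for image size: CSS dimensions multiplied by the device scale factor give the expected pixel size of a viewport screenshot. `DeviceMetrics`, `PC_VIEWPORT`, and `screenshotPixels` are hypothetical names; the values mirror the parameters passed to `Emulation.setDeviceMetricsOverride` in the diff, and the resulting size is an expectation, not a measured result.

```typescript
interface DeviceMetrics {
  width: number;             // CSS pixels
  height: number;            // CSS pixels
  deviceScaleFactor: number; // device pixels per CSS pixel
  mobile: boolean;
}

// Same values as the override in the diff.
const PC_VIEWPORT: DeviceMetrics = {
  width: 1920,
  height: 1080,
  deviceScaleFactor: 2,
  mobile: false,
};

// Expected device-pixel size of a viewport-only screenshot.
function screenshotPixels(m: DeviceMetrics): { width: number; height: number } {
  return {
    width: Math.round(m.width * m.deviceScaleFactor),
    height: Math.round(m.height * m.deviceScaleFactor),
  };
}
```

Under these assumptions each viewport capture should come out at roughly 3840x2160 pixels.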
ywkj 92a3e2eba3 fix: capture viewport-only screenshots for readable resolution
captureBeyondViewport was capturing the entire page in one giant image
(2804x20746), making text unreadable. Now captures one screenshot per viewport,
scrolling in steps of 80% of the viewport height (about 20% overlap), producing
~2400x1992 screenshots that vision can read.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 12:23:00 +08:00
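
The scroll positions used for per-viewport capture can be computed up front. This sketch follows the loop in the diff (step = 80% of the viewport height, continue while the offset is below the page height); `scrollOffsets` is a hypothetical helper name, and the `Math.max(1, ...)` guard is an added assumption to avoid a zero step.

```typescript
// Returns the scrollY value for each screenshot, top to bottom.
function scrollOffsets(
  pageHeight: number,
  viewportHeight: number,
  stepRatio: number = 0.8,
): number[] {
  // Guard against a zero step, which would loop forever.
  const step = Math.max(1, Math.floor(viewportHeight * stepRatio));
  const offsets: number[] = [];
  for (let y = 0; y < pageHeight; y += step) {
    offsets.push(y);
  }
  return offsets;
}
```

For a 2000px page and a 1000px viewport this yields offsets 0, 800 and 1600, so consecutive captures share a 200px band.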
ywkj a780041840 feat: define structured JSON output schema for API consumption
SKILL.md now specifies exact JSON structure the model must output
after reading screenshots. Weight in kg, dimensions in cm, omit nulls.
Ready for downstream API integration.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 12:18:15 +08:00
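
The normalization this schema asks for (weight in kg, dimensions in cm, nulls omitted) can be sketched with a few small helpers. The function names are hypothetical; the conversion factors follow the field rules in SKILL.md and the unit handling in the removed parsers (克 and g divide by 1000, 斤 multiplies by 0.5, mm divides by 10, m multiplies by 100).

```typescript
// Converts a weight to kilograms; returns null for an unknown unit.
function weightToKg(value: number, unit: string): number | null {
  switch (unit.trim().toLowerCase()) {
    case 'kg': case '千克': case '公斤': return value;
    case 'g': case '克': return value / 1000;
    case '斤': return value * 0.5;
    default: return null;
  }
}

// Converts a length to centimetres; returns null for an unknown unit.
function lengthToCm(value: number, unit: string): number | null {
  switch (unit.trim().toLowerCase()) {
    case 'cm': case '厘米': return value;
    case 'mm': case '毫米': return value / 10;
    case 'm': case '米': return value * 100;
    default: return null;
  }
}

// Drops keys whose value is null or undefined ("omit nulls").
function omitNulls(obj: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const key of Object.keys(obj)) {
    if (obj[key] !== null && obj[key] !== undefined) out[key] = obj[key];
  }
  return out;
}
```

Returning `null` for unrecognized units pairs with the "do not guess" rule: an unknown unit leads to an omitted field, not a silently wrong value.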
ywkj 87920a9503 refactor: replace DOM parsing with vision-based approach
Remove all CSS selectors, regex parsers, and structured extraction.
Instead, capture full-page screenshots (scrolling) and download detail
images. The model reads these directly with vision to extract logistics
data — no fragile DOM dependencies.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 12:11:24 +08:00
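
Before downloading detail images, the collected `src` values need filtering and de-duplication. The sketch below covers only the URL handling, under stated assumptions: `normalizeImageUrls` is a hypothetical name, and unlike the in-page script in the diff it de-duplicates after upgrading protocol-relative URLs, so `//host/x.jpg` and `https://host/x.jpg` collapse into one entry.

```typescript
// Keeps http(s) and protocol-relative URLs, upgrades "//host/..." to
// "https://host/...", and removes duplicates while preserving order.
function normalizeImageUrls(rawUrls: string[]): string[] {
  const seen: Record<string, boolean> = {};
  const out: string[] = [];
  for (const raw of rawUrls) {
    const src = raw.trim();
    const isProtocolRelative = src.indexOf('//') === 0;
    const isHttp = /^https?:\/\//i.test(src);
    if (!isProtocolRelative && !isHttp) continue; // skip data: URIs, empty strings, etc.
    const url = isProtocolRelative ? 'https:' + src : src;
    if (seen[url]) continue;
    seen[url] = true;
    out.push(url);
  }
  return out;
}
```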
ywkj bff990628b fix: use localhost for CDP (IPv6), prevent null overwrite on dimensions
- CDP discovery uses localhost instead of 127.0.0.1 (Chrome binds IPv6)
- Only overwrite logistics fields when parsing succeeds, preventing
  later unparseable keys from nullifying valid parsed values

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 12:08:41 +08:00
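
The null-overwrite fix comes down to one rule: a failed parse must not replace a value that an earlier key already produced. A minimal sketch, with a deliberately tiny hypothetical parser (`parseKg`) standing in for the real one and `applyParsed` as an illustrative helper name:

```typescript
interface Weight {
  value: number;
  unit: string;
}

// Hypothetical parser for illustration: understands only "<number>kg".
function parseKg(raw: string): Weight | null {
  const m = raw.match(/([\d.]+)\s*kg/i);
  return m ? { value: parseFloat(m[1]), unit: 'kg' } : null;
}

// Keep the current value unless the new parse succeeded.
function applyParsed<T>(current: T | null, parsed: T | null): T | null {
  return parsed !== null ? parsed : current;
}

// Folds attribute values in page order without losing earlier successes.
function weightFromAttributes(values: string[]): Weight | null {
  let weight: Weight | null = null;
  for (const raw of values) {
    weight = applyParsed(weight, parseKg(raw));
  }
  return weight;
}
```

With the inputs `'0.5kg'` followed by an unparseable string, the result stays at 0.5 kg; a plain assignment would have reset it to `null`.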
ywkj e48592690b feat: default CDP port 18800, add 包装信息 and 商品件重尺 extraction
register-skill-release / register (push) Successful in 14s
- Change default CDP port from 9222 to 18800
- Extract 包装信息 section (packaging type, box weight/dims, units per box)
- Extract 商品件重尺 table (per-piece weight/dimensions/volume)
- Backfill logistics from pieceWeightSize/packageInfo when attributes missing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 08:20:46 +08:00
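
The diff shows the new default port (18800) and the `--dry-run` branch of the argument loop, but not how `--port=` is parsed. The following is one possible sketch of that parsing, not the repository's implementation; `parseArgs`, `CliOptions`, and the port range check are assumptions.

```typescript
interface CliOptions {
  dryRun: boolean;
  port: number;
  positionals: string[];
}

// Parses flags in any position; everything else is a positional argument.
function parseArgs(argv: string[], defaultPort: number = 18800): CliOptions {
  const opts: CliOptions = { dryRun: false, port: defaultPort, positionals: [] };
  const prefix = '--port=';
  for (const arg of argv) {
    if (arg === '--dry-run') {
      opts.dryRun = true;
    } else if (arg.indexOf(prefix) === 0) {
      const n = Number(arg.slice(prefix.length));
      // Reject NaN, fractions, and out-of-range values.
      if (!(n >= 1 && n <= 65535) || Math.floor(n) !== n) {
        throw new Error(`invalid port: ${arg}`);
      }
      opts.port = n;
    } else {
      opts.positionals.push(arg);
    }
  }
  return opts;
}
```

Called with `['--port=18801', 'scrape', '<url>']`, this returns port 18801 and the two positionals; without a `--port=` flag the port falls back to 18800.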
3 changed files with 242 additions and 276 deletions

114
SKILL.md

@ -1,75 +1,97 @@
--- ---
name: 1688-logistics-scraper name: 1688-logistics-scraper
description: "Extract product weight/size/logistics data from 1688 product pages via Chrome browser, output structured JSON. Use when the user provides a 1688 product URL and needs logistics specs." description: "Scrape 1688 product pages via Chrome, capture full-page screenshots and detail images for vision-based extraction of weight/size data. Use when the user provides a 1688 product URL and needs logistics specs."
--- ---
# 1688 Logistics Scraper # 1688 Logistics Scraper
Extract product weight, size, and logistics data from 1688 product pages. Capture 1688 product pages and extract weight/size data via vision.
## Run ## Run
```bash ```bash
bun scripts/run.ts scrape <url> [--dry-run] bun scripts/run.ts scrape <url> [--dry-run] [--port=9222]
```
### Examples
```bash
bun scripts/run.ts scrape 'https://detail.1688.com/offer/852504650877.html'
bun scripts/run.ts scrape 'https://detail.1688.com/offer/852504650877.html' --dry-run
``` ```
## What It Does ## What It Does
1. Opens the 1688 product URL in the browser 1. Opens the 1688 product URL in the browser (default port 18800)
2. Extracts weight/size data from wherever it appears on the page — product attributes, variant specs, logistics section 2. Scrolls through the entire page, capturing full-page screenshots
3. Downloads detail images (商品详情图片) for analysis — weight/size is often only in images 3. Downloads all product detail images
4. Outputs structured JSON 4. Saves to `/tmp/1688-logistics/<offer-id>/`
## Where To Look For Data ## After Running — MUST follow
Weight/size data on 1688 pages hides in multiple places. Check all before giving up: Read ALL screenshots and detail images, then output the following JSON structure. This is the final output for API consumption.
1. **Product attributes** (商品属性 / 商品参数) — key-value table, most reliable
2. **Variant/SKU specs** — per-variant weight or size
3. **Logistics section** — shipping weight, volume, freight info
4. **Detail images** — downloaded to `/tmp/1688-logistics/<offer-id>/`, read them to find weight/size text baked into images
## Output
```json ```json
{ {
"status": "success", "offerId": "966107271425",
"url": "https://detail.1688.com/offer/...", "url": "https://detail.1688.com/offer/966107271425.html",
"product": { "title": "商品标题",
"title": "产品标题", "weight": {
"logistics": { "value": 0.15,
"weight": { "value": 0.5, "unit": "kg", "source": "attributes" }, "unit": "kg",
"dimensions": { "length": 30, "width": 20, "height": 10, "unit": "cm", "source": "attributes" }, "source": "商品属性"
"grossWeight": null,
"netWeight": null,
"packageWeight": null,
"volume": null,
"shippingMethod": null,
"shippingCost": null,
"origin": null
}, },
"grossWeight": {
"value": 0.2,
"unit": "kg",
"source": "商品件重尺"
},
"netWeight": {
"value": 0.15,
"unit": "kg",
"source": "商品属性"
},
"dimensions": {
"length": 10,
"width": 8,
"height": 1.8,
"unit": "cm",
"source": "商品属性"
},
"volume": {
"value": 0.000144,
"unit": "m³",
"source": "商品件重尺"
},
"packageWeight": {
"value": 5.0,
"unit": "kg",
"source": "包装信息"
},
"packageDimensions": {
"length": 40,
"width": 30,
"height": 20,
"unit": "cm",
"source": "包装信息"
},
"unitsPerPackage": 50,
"variants": [ "variants": [
{ "name": "颜色: 红色", "weight": null, "dimensions": null } {
"name": "12支装",
"weight": { "value": 0.12, "unit": "kg" },
"dimensions": { "length": 9.5, "width": 6, "height": 2.2, "unit": "cm" }
}
] ]
},
"detailImages": ["/tmp/1688-logistics/852504650877/img_001.jpg"],
"rawAttributes": { "重量": "0.5kg", "尺寸": "30*20*10cm" }
} }
``` ```
`null` = not found in text. Check `detailImages` — the data may be in the images. ### Field rules
- **All weight values normalized to kg** (克 (g) ÷ 1000, 斤 (jin) × 0.5)
- **All dimension values normalized to cm** (mm÷10)
- **`source`**: where on the page the data was found (商品属性 / 商品件重尺 / 包装信息 / 详情图片)
- **`variants`**: only include if weight/size differs per SKU. Omit if all variants share the same specs.
- **Omit fields that are `null`** — do not include fields where no data was found
- **Do not guess.** Only include values actually visible on the page or in images.
## Rules ## Rules
1. If the browser is not running, report the error. Do not try to launch it. 1. If the browser is not running, report the error. Do not try to launch it.
2. Check all data sources before reporting `null`. 2. No retries. If it fails, report as-is.
3. Normalize units: 克→kg, 毫米→cm. Keep raw values in `rawAttributes`. 3. Read ALL screenshots — logistics data can appear anywhere on the page.
4. No retries. If it fails, report as-is. 4. Read detail images too — weight/size is often baked into product photos.
5. Trust page content. Do not guess values. 5. Output ONLY the structured JSON above. No extra commentary.


@ -12,17 +12,14 @@ Commands:
Examples: Examples:
bun scripts/run.ts scrape 'https://detail.1688.com/offer/852504650877.html' bun scripts/run.ts scrape 'https://detail.1688.com/offer/852504650877.html'
bun scripts/run.ts scrape 'https://detail.1688.com/offer/852504650877.html' --dry-run bun scripts/run.ts scrape 'https://detail.1688.com/offer/852504650877.html' --dry-run
bun scripts/run.ts --port=9223 scrape 'https://detail.1688.com/offer/852504650877.html' bun scripts/run.ts --port=18801 scrape 'https://detail.1688.com/offer/852504650877.html'
Prerequisites:
Chrome must be running with --remote-debugging-port=9222
`); `);
} }
async function main(): Promise<void> { async function main(): Promise<void> {
const positionals: string[] = []; const positionals: string[] = [];
let dryRun = false; let dryRun = false;
let port = 9222; let port = 18800;
for (const arg of process.argv.slice(2)) { for (const arg of process.argv.slice(2)) {
if (arg === '--dry-run') { if (arg === '--dry-run') {


@ -3,50 +3,16 @@ import * as path from 'path';
export type Command = 'scrape'; export type Command = 'scrape';
export interface LogisticsValue {
value: number | null;
unit: string | null;
source: string;
}
export interface Dimensions {
length: number | null;
width: number | null;
height: number | null;
unit: string | null;
source: string;
}
export interface LogisticsData {
weight: LogisticsValue | null;
dimensions: Dimensions | null;
grossWeight: LogisticsValue | null;
netWeight: LogisticsValue | null;
packageWeight: LogisticsValue | null;
volume: LogisticsValue | null;
shippingMethod: string | null;
shippingCost: string | null;
origin: string | null;
}
export interface VariantInfo {
name: string;
weight: LogisticsValue | null;
dimensions: Dimensions | null;
}
export interface ScrapeResult { export interface ScrapeResult {
status: 'success' | 'failed'; status: 'success' | 'failed';
url: string; url: string;
command: Command; command: Command;
dryRun: boolean; dryRun: boolean;
product?: { offerId: string;
title: string; productPackInfo?: unknown;
logistics: LogisticsData; windowContext?: unknown;
variants: VariantInfo[]; screenshots?: string[];
};
detailImages?: string[]; detailImages?: string[];
rawAttributes?: Record<string, string>;
error?: string; error?: string;
} }
@ -62,9 +28,10 @@ class CdpSession {
private ws!: WebSocket; private ws!: WebSocket;
private msgId = 0; private msgId = 0;
private pending = new Map<number, { resolve: (v: any) => void; reject: (e: Error) => void }>(); private pending = new Map<number, { resolve: (v: any) => void; reject: (e: Error) => void }>();
private eventListeners = new Map<string, Array<(params: any) => void>>();
static async connect(port: number): Promise<CdpSession> { static async connect(port: number): Promise<CdpSession> {
const resp = await fetch(`http://127.0.0.1:${port}/json`); const resp = await fetch(`http://localhost:${port}/json`);
const targets = (await resp.json()) as Array<{ webSocketDebuggerUrl: string; type: string }>; const targets = (await resp.json()) as Array<{ webSocketDebuggerUrl: string; type: string }>;
const page = targets.find(t => t.type === 'page'); const page = targets.find(t => t.type === 'page');
if (!page) throw new Error('No Chrome page tab found. Open a tab first.'); if (!page) throw new Error('No Chrome page tab found. Open a tab first.');
@ -79,13 +46,20 @@ class CdpSession {
this.ws.onopen = () => resolve(); this.ws.onopen = () => resolve();
this.ws.onerror = (e: any) => reject(new Error(`WebSocket error: ${e.message || e}`)); this.ws.onerror = (e: any) => reject(new Error(`WebSocket error: ${e.message || e}`));
this.ws.onmessage = (ev: MessageEvent) => { this.ws.onmessage = (ev: MessageEvent) => {
const msg: CdpResult = JSON.parse(typeof ev.data === 'string' ? ev.data : ev.data.toString()); const msg = JSON.parse(typeof ev.data === 'string' ? ev.data : ev.data.toString());
// Handle command responses
if (msg.id != null && this.pending.has(msg.id)) { if (msg.id != null && this.pending.has(msg.id)) {
const p = this.pending.get(msg.id)!; const p = this.pending.get(msg.id)!;
this.pending.delete(msg.id); this.pending.delete(msg.id);
if (msg.error) p.reject(new Error(msg.error.message)); if (msg.error) p.reject(new Error(msg.error.message));
else p.resolve(msg.result); else p.resolve(msg.result);
} }
// Handle events
if (msg.method && this.eventListeners.has(msg.method)) {
for (const fn of this.eventListeners.get(msg.method)!) {
fn(msg.params);
}
}
}; };
}); });
} }
@ -98,145 +72,117 @@ class CdpSession {
}); });
} }
waitForEvent(event: string, timeoutMs: number = 30000): Promise<any> {
return new Promise((resolve, reject) => {
const timer = setTimeout(() => {
cleanup();
reject(new Error(`Timeout waiting for ${event}`));
}, timeoutMs);
const handler = (params: any) => {
cleanup();
resolve(params);
};
const cleanup = () => {
clearTimeout(timer);
const listeners = this.eventListeners.get(event);
if (listeners) {
const idx = listeners.indexOf(handler);
if (idx >= 0) listeners.splice(idx, 1);
}
};
if (!this.eventListeners.has(event)) this.eventListeners.set(event, []);
this.eventListeners.get(event)!.push(handler);
});
}
async evaluate(expression: string): Promise<any> { async evaluate(expression: string): Promise<any> {
const res = await this.send('Runtime.evaluate', { expression, returnByValue: true }); const res = await this.send('Runtime.evaluate', { expression, returnByValue: true });
return res?.result?.value; return res?.result?.value;
} }
async captureScreenshot(format: string = 'png'): Promise<Buffer> {
const res = await this.send('Page.captureScreenshot', {
format,
captureBeyondViewport: false,
});
return Buffer.from(res.data, 'base64');
}
close() { close() {
try { this.ws.close(); } catch {} try { this.ws.close(); } catch {}
} }
} }
// --- Parsers --- // --- Helpers ---
const WEIGHT_KEYS = ['重量', '毛重', '净重', '单件重量', '包装重量', '产品重量', '单品重量', 'weight'];
const DIMENSION_KEYS = ['尺寸', '规格', '长宽高', '外箱尺寸', '包装尺寸', '产品尺寸', '大小', 'size', 'dimensions'];
const VOLUME_KEYS = ['体积', '容积', 'volume'];
function extractOfferId(url: string): string { function extractOfferId(url: string): string {
return url.match(/offer\/(\d+)/)?.[1] || 'unknown'; return url.match(/offer\/(\d+)/)?.[1] || 'unknown';
} }
function parseWeight(raw: string): LogisticsValue | null { async function scrollAndCapture(
const m = raw.match(/([\d.]+)\s*(kg|g|克|千克|公斤|斤)/i); cdp: CdpSession,
if (!m) return null; outputDir: string,
let value = parseFloat(m[1]); ): Promise<string[]> {
let unit = m[2].toLowerCase(); fs.mkdirSync(outputDir, { recursive: true });
if (unit === 'g' || unit === '克') { value /= 1000; unit = 'kg'; } const saved: string[] = [];
if (unit === '千克' || unit === '公斤') unit = 'kg';
if (unit === '斤') { value *= 0.5; unit = 'kg'; }
return { value, unit, source: '' };
}
function parseDimensions(raw: string): Dimensions | null { // Get page height
const m = raw.match(/([\d.]+)\s*[*xX×]\s*([\d.]+)\s*[*xX×]\s*([\d.]+)\s*(cm|mm|毫米|厘米|m|米)?/i); const pageHeight: number = await cdp.evaluate(
if (!m) return null; 'Math.max(document.body.scrollHeight, document.documentElement.scrollHeight)'
let [l, w, h] = [parseFloat(m[1]), parseFloat(m[2]), parseFloat(m[3])]; ) || 0;
let unit = (m[4] || 'cm').toLowerCase(); const viewportHeight: number = await cdp.evaluate('window.innerHeight') || 900;
if (unit === 'mm' || unit === '毫米') { l /= 10; w /= 10; h /= 10; unit = 'cm'; }
if (unit === '厘米') unit = 'cm';
if (unit === 'm' || unit === '米') { l *= 100; w *= 100; h *= 100; unit = 'cm'; }
return { length: l, width: w, height: h, unit, source: '' };
}
function parseVolume(raw: string): LogisticsValue | null { // Scroll through the page and capture viewport-sized screenshots
const m = raw.match(/([\d.]+)\s*(m³|cm³|L|ml|升|毫升|立方米|立方厘米)/i); // Use 80% step to overlap slightly and avoid missing content at boundaries
if (!m) return null; const step = Math.floor(viewportHeight * 0.8);
return { value: parseFloat(m[1]), unit: m[2], source: '' }; let scrollY = 0;
} let idx = 1;
while (scrollY < pageHeight) {
await cdp.evaluate(`window.scrollTo(0, ${scrollY})`);
await new Promise(r => setTimeout(r, 800)); // wait for lazy-load render
function matchKey(text: string, keys: string[]): boolean { const buf = await cdp.captureScreenshot('png');
const lower = text.toLowerCase(); const filePath = path.join(outputDir, `page_${String(idx).padStart(3, '0')}.png`);
return keys.some(k => lower.includes(k.toLowerCase())); fs.writeFileSync(filePath, buf);
} saved.push(filePath);
// --- Page extraction --- scrollY += step;
idx++;
const JS_EXTRACT_ATTRS = `
(function() {
const attrs = {};
const sels = [
'.detail-attributes-list .attributes-item',
'.obj-leading .obj-content li',
'#mod-detail-attributes .attribute-item',
'.detail-info table tr',
'[class*="attribute"] li',
'[class*="param"] li',
'.offer-attr-list .offer-attr-item',
];
for (const sel of sels) {
document.querySelectorAll(sel).forEach(el => {
const parts = el.textContent.trim().split(/[:]/);
if (parts.length >= 2) attrs[parts[0].trim()] = parts.slice(1).join(':').trim();
});
} }
document.querySelectorAll('table tr, .detail-attributes-list tr').forEach(tr => {
const cells = tr.querySelectorAll('td, th');
if (cells.length >= 2) attrs[cells[0].textContent.trim()] = cells[1].textContent.trim();
});
return JSON.stringify(attrs);
})()`;
const JS_EXTRACT_VARIANTS = ` return saved;
(function() { }
const variants = [];
const sels = [
'.sku-item-wrapper .sku-item',
'[class*="sku"] [class*="item"]',
'.obj-sku .obj-content li',
'.unit-detail-spec-operator .spec-item',
];
for (const sel of sels) {
document.querySelectorAll(sel).forEach(el => {
const name = el.textContent.trim().replace(/\\s+/g, ' ');
if (name && name.length < 200) variants.push({ name, text: el.textContent });
});
}
return JSON.stringify(variants);
})()`;
const JS_EXTRACT_TITLE = ` async function downloadDetailImages(
(function() { cdp: CdpSession,
for (const sel of ['.title-text','.detail-title-text','h1[class*="title"]','.mod-detail-title h1','.d-title']) { outputDir: string,
const el = document.querySelector(sel); ): Promise<string[]> {
if (el && el.textContent.trim()) return el.textContent.trim(); // Get all detail image URLs from the page
} const imgUrls: string[] = JSON.parse(await cdp.evaluate(`
return document.title || ''; (function() {
})()`;
const JS_EXTRACT_IMAGES = `
(function() {
const imgs = [], seen = new Set(); const imgs = [], seen = new Set();
const sels = [ document.querySelectorAll('img').forEach(img => {
'#desc-lazyload-container img',
'.detail-desc-decorate-richtext img',
'[class*="detail-desc"] img',
'.mod-detail-description img',
'.offer-attr-item img',
'.desc-img-loaded img',
];
for (const sel of sels) {
document.querySelectorAll(sel).forEach(img => {
const src = img.src || img.dataset.src || img.dataset.lazySrc || ''; const src = img.src || img.dataset.src || img.dataset.lazySrc || '';
if (src && !seen.has(src) && (src.startsWith('http') || src.startsWith('//'))) { if (src && !seen.has(src) && (src.startsWith('http') || src.startsWith('//'))) {
// Filter for product detail images (skip tiny icons/avatars)
if (img.naturalWidth > 200 || img.width > 200 || !img.complete) {
seen.add(src); seen.add(src);
imgs.push(src.startsWith('//') ? 'https:' + src : src); imgs.push(src.startsWith('//') ? 'https:' + src : src);
} }
});
} }
});
return JSON.stringify(imgs); return JSON.stringify(imgs);
})()`; })()
`) || '[]');
async function downloadImages(urls: string[], outputDir: string): Promise<string[]> {
fs.mkdirSync(outputDir, { recursive: true }); fs.mkdirSync(outputDir, { recursive: true });
const saved: string[] = []; const saved: string[] = [];
for (let i = 0; i < urls.length; i++) { for (let i = 0; i < imgUrls.length; i++) {
try { try {
const resp = await fetch(urls[i]); const resp = await fetch(imgUrls[i]);
if (!resp.ok) continue; if (!resp.ok) continue;
const buf = Buffer.from(await resp.arrayBuffer()); const buf = Buffer.from(await resp.arrayBuffer());
const ext = urls[i].match(/\.(jpg|jpeg|png|webp|gif)/i)?.[1] || 'jpg'; const ext = imgUrls[i].match(/\.(jpg|jpeg|png|webp|gif)/i)?.[1] || 'jpg';
const p = path.join(outputDir, `img_${String(i + 1).padStart(3, '0')}.${ext}`); const p = path.join(outputDir, `img_${String(i + 1).padStart(3, '0')}.${ext}`);
fs.writeFileSync(p, buf); fs.writeFileSync(p, buf);
saved.push(p); saved.push(p);
@ -251,30 +197,24 @@ export async function run(
command: Command, command: Command,
args: string[], args: string[],
dryRun: boolean, dryRun: boolean,
cdpPort: number = 9222, cdpPort: number = 18800,
): Promise<ScrapeResult> { ): Promise<ScrapeResult> {
if (command !== 'scrape') { if (command !== 'scrape') {
return { status: 'failed', url: '', command, dryRun, error: `unknown command: ${command}` }; return { status: 'failed', url: '', command, dryRun, offerId: '', error: `unknown command: ${command}` };
} }
const url = args[0]; const url = args[0];
if (!url) { if (!url) {
return { status: 'failed', url: '', command, dryRun, error: 'scrape requires <url>' }; return { status: 'failed', url: '', command, dryRun, offerId: '', error: 'scrape requires <url>' };
} }
const offerId = extractOfferId(url);
if (dryRun) { if (dryRun) {
return { return {
status: 'success', url, command, dryRun, status: 'success', url, command, dryRun, offerId,
product: { screenshots: [],
title: '<dry-run>',
logistics: {
weight: null, dimensions: null, grossWeight: null, netWeight: null,
packageWeight: null, volume: null, shippingMethod: null, shippingCost: null, origin: null,
},
variants: [],
},
detailImages: [], detailImages: [],
rawAttributes: {},
}; };
} }
@ -284,69 +224,76 @@ export async function run(
await cdp.send('Page.enable'); await cdp.send('Page.enable');
await cdp.send('Runtime.enable'); await cdp.send('Runtime.enable');
await cdp.send('Page.navigate', { url });
// Wait for load // Set wide PC viewport to ensure tables fit without horizontal overflow
await new Promise(r => setTimeout(r, 5000)); await cdp.send('Emulation.setDeviceMetricsOverride', {
width: 1920,
const title: string = await cdp.evaluate(JS_EXTRACT_TITLE) || ''; height: 1080,
const rawAttributes: Record<string, string> = JSON.parse(await cdp.evaluate(JS_EXTRACT_ATTRS) || '{}'); deviceScaleFactor: 2,
const rawVariants: Array<{ name: string; text: string }> = JSON.parse(await cdp.evaluate(JS_EXTRACT_VARIANTS) || '[]'); mobile: false,
const imgUrls: string[] = JSON.parse(await cdp.evaluate(JS_EXTRACT_IMAGES) || '[]');
const variants: VariantInfo[] = rawVariants.map(v => {
const weight = parseWeight(v.text);
const dimensions = parseDimensions(v.text);
if (weight) weight.source = 'variant';
if (dimensions) dimensions.source = 'variant';
return { name: v.name, weight, dimensions };
}); });
const logistics: LogisticsData = { // Navigate and wait for page load event
weight: null, dimensions: null, grossWeight: null, netWeight: null, const loadPromise = cdp.waitForEvent('Page.loadEventFired', 30000);
packageWeight: null, volume: null, shippingMethod: null, shippingCost: null, origin: null, await cdp.send('Page.navigate', { url });
}; await loadPromise;
for (const [key, val] of Object.entries(rawAttributes)) { // Wait for network idle: resolve once no new resource entries are observed for 1s
if (matchKey(key, ['毛重'])) { await cdp.evaluate(`
logistics.grossWeight = parseWeight(val); new Promise(resolve => {
if (logistics.grossWeight) logistics.grossWeight.source = 'attributes'; let timer;
} else if (matchKey(key, ['净重'])) { const reset = () => { clearTimeout(timer); timer = setTimeout(resolve, 1000); };
logistics.netWeight = parseWeight(val); const observer = new PerformanceObserver(() => reset());
if (logistics.netWeight) logistics.netWeight.source = 'attributes'; observer.observe({ entryTypes: ['resource'] });
} else if (matchKey(key, ['包装重量'])) { reset();
logistics.packageWeight = parseWeight(val); })
if (logistics.packageWeight) logistics.packageWeight.source = 'attributes'; `);
} else if (matchKey(key, WEIGHT_KEYS)) {
logistics.weight = parseWeight(val); // Extract window.context.result.data
if (logistics.weight) logistics.weight.source = 'attributes'; let productPackInfo: unknown = null;
} let windowContext: unknown = null;
if (matchKey(key, DIMENSION_KEYS)) { const ctx = await cdp.evaluate(`
logistics.dimensions = parseDimensions(val); (function() {
if (logistics.dimensions) logistics.dimensions.source = 'attributes'; try {
} const d = window.context && window.context.result && window.context.result.data;
if (matchKey(key, VOLUME_KEYS)) { if (d && d.productPackInfo) {
logistics.volume = parseVolume(val); return JSON.stringify({
if (logistics.volume) logistics.volume.source = 'attributes'; productPackInfo: d.productPackInfo,
} productTitle: d.productTitle || null,
if (matchKey(key, ['产地', '发货地', '所在地'])) { productAttributes: d.productAttributes || null,
logistics.origin = val; skuSelection: d.skuSelection || null,
});
} }
} catch(e) {}
return null;
})()
`);
if (ctx) {
const parsed = JSON.parse(ctx);
productPackInfo = parsed.productPackInfo;
windowContext = parsed;
} }
const offerId = extractOfferId(url); const outputDir = path.join('/tmp', '1688-logistics', offerId);
const imgDir = path.join('/tmp', '1688-logistics', offerId);
const detailImages = await downloadImages(imgUrls, imgDir); // Capture full-page screenshots (scrolling)
const screenshotDir = path.join(outputDir, 'screenshots');
const screenshots = await scrollAndCapture(cdp, screenshotDir);
// Download detail images
const imgDir = path.join(outputDir, 'images');
const detailImages = await downloadDetailImages(cdp, imgDir);
return { return {
status: 'success', url, command, dryRun, status: 'success', url, command, dryRun, offerId,
product: { title, logistics, variants }, productPackInfo,
windowContext,
screenshots,
detailImages, detailImages,
rawAttributes,
}; };
} catch (error) { } catch (error) {
return { return {
status: 'failed', url, command, dryRun, status: 'failed', url, command, dryRun, offerId,
error: error instanceof Error ? error.message : String(error), error: error instanceof Error ? error.message : String(error),
}; };
} finally { } finally {