Compare commits

8 Commits
v0.0.3 ... main

Author SHA1 Message Date
ywkj 20d3529068 fix: use Page.loadEventFired + networkIdle instead of fixed timeout
register-skill-release / register (push) Successful in 14s
Replace 15s polling loop with proper CDP event-based page load
detection: wait for Page.loadEventFired, then PerformanceObserver
network idle (no new resource requests for 1s). More reliable and
faster than fixed timeouts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-31 06:51:51 +08:00
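
A note on the network-idle step in this commit: the in-page Promise is sent through the `evaluate` helper, which passes only `returnByValue: true` to `Runtime.evaluate`. As far as I can tell from the CDP documentation, `Runtime.evaluate` waits for a returned Promise only when `awaitPromise: true` is set, so the idle wait may currently resolve immediately. The sketch below shows one way to pass that flag and to cap the wait so that a page which keeps loading resources cannot hang the scraper. `CdpSend`, `buildIdleExpression` and `waitForNetworkIdle` are hypothetical names, and the 15 s cap is an assumption.

```typescript
// Hypothetical signature modelled on CdpSession.send(method, params) from the diff.
type CdpSend = (method: string, params?: Record<string, unknown>) => Promise<any>;

// In-page expression: resolves 'idle' once no new resource entries arrive for
// quietMs, or 'timeout' after maxMs, and disconnects the observer either way.
function buildIdleExpression(quietMs: number, maxMs: number): string {
  return `new Promise(resolve => {
    let quietTimer;
    let capTimer;
    let observer;
    const finish = reason => {
      clearTimeout(quietTimer);
      clearTimeout(capTimer);
      observer.disconnect();
      resolve(reason);
    };
    const reset = () => {
      clearTimeout(quietTimer);
      quietTimer = setTimeout(() => finish('idle'), ${quietMs});
    };
    observer = new PerformanceObserver(() => reset());
    observer.observe({ entryTypes: ['resource'] });
    capTimer = setTimeout(() => finish('timeout'), ${maxMs});
    reset();
  })`;
}

async function waitForNetworkIdle(
  send: CdpSend,
  quietMs = 1000,
  maxMs = 15000,
): Promise<string | undefined> {
  const res = await send('Runtime.evaluate', {
    expression: buildIdleExpression(quietMs, maxMs),
    awaitPromise: true, // ask CDP to wait until the in-page Promise settles
    returnByValue: true,
  });
  return res?.result?.value; // 'idle' or 'timeout'
}
```

In `run`, this would stand in for the `cdp.evaluate(...)` call that follows `await loadPromise`, for example `await waitForNetworkIdle((method, params) => cdp.send(method, params))`, assuming `send` accepts a params object as it does for `Page.navigate`.
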
ywkj 935cac3c61 feat: extract window.context.result.data for structured logistics data
register-skill-release / register (push) Successful in 13s
Poll window.context.result.data (up to 15s) for productPackInfo,
productTitle, productAttributes, and skuSelection. This provides
structured weight/size/volume data per-variant directly from 1688's
JS context — more reliable than vision-only extraction.

Screenshots are still captured as a fallback for data that appears only in images.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 14:05:53 +08:00
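
This commit describes polling `window.context.result.data` for up to 15 s, while the final diff below evaluates it once after the network-idle wait. If polling were reintroduced, a generic helper along these lines could wrap the extractor. `pollUntil`, `parseContext` and `ContextSubset` are hypothetical; the four field names come from the extractor in the diff, and the 500 ms interval is an assumption.

```typescript
// Subset of window.context.result.data read by the extractor in the diff.
interface ContextSubset {
  productPackInfo: unknown;
  productTitle: string | null;
  productAttributes: unknown;
  skuSelection: unknown;
}

// Calls probe() until it returns a non-null value or timeoutMs elapses.
async function pollUntil<T>(
  probe: () => Promise<T | null>,
  timeoutMs = 15000,
  intervalMs = 500,
): Promise<T | null> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const value = await probe();
    if (value !== null) return value;
    if (Date.now() >= deadline) return null;
    await new Promise(resolve => setTimeout(resolve, intervalMs));
  }
}

// Parses the extractor's JSON string; null means the data is not there yet.
function parseContext(raw: string | null | undefined): ContextSubset | null {
  if (!raw) return null;
  const d = JSON.parse(raw) as Partial<ContextSubset>;
  if (d.productPackInfo == null) return null;
  return {
    productPackInfo: d.productPackInfo,
    productTitle: d.productTitle ?? null,
    productAttributes: d.productAttributes ?? null,
    skuSelection: d.skuSelection ?? null,
  };
}
```

Here `probe` would call `cdp.evaluate(...)` with the extractor expression from the diff and hand the resulting string to `parseContext`, so a missing `productPackInfo` keeps the loop polling instead of failing.
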
ywkj 93517505d4 fix: set 1920x1080 @2x viewport before capture
Wide tables such as 商品件重尺 (per-piece weight and size) were getting cut off at the right edge.
Now emulates a 1920x1080 PC viewport at 2x scale before navigating,
ensuring all columns fit in screenshots.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 13:48:00 +08:00
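
The values this commit passes to `Emulation.setDeviceMetricsOverride` also fix the screenshot size. Assuming `Page.captureScreenshot` renders at the emulated `deviceScaleFactor`, each viewport capture should come out at about 3840x2160 pixels. `DeviceMetrics`, `PC_VIEWPORT` and `screenshotPixels` are hypothetical names for illustration.

```typescript
// Required parameters of CDP's Emulation.setDeviceMetricsOverride.
interface DeviceMetrics {
  width: number; // CSS pixels
  height: number; // CSS pixels
  deviceScaleFactor: number;
  mobile: boolean;
}

// The override this commit applies before navigating.
const PC_VIEWPORT: DeviceMetrics = { width: 1920, height: 1080, deviceScaleFactor: 2, mobile: false };

// Expected bitmap size of one viewport screenshot, assuming the capture is
// rendered at the emulated device scale factor.
function screenshotPixels(m: DeviceMetrics): { width: number; height: number } {
  return { width: m.width * m.deviceScaleFactor, height: m.height * m.deviceScaleFactor };
}

console.log(screenshotPixels(PC_VIEWPORT)); // { width: 3840, height: 2160 }
```

The trade-off is size: at 2x each screenshot carries four times the pixels of a 1x capture, which presumably helps vision read small table text but makes every image more expensive to process.
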
ywkj 92a3e2eba3 fix: capture viewport-only screenshots for readable resolution
captureBeyondViewport was capturing the entire page in one giant image
(2804x20746), making text unreadable. Now captures one viewport at a time,
scrolling by 80% of the viewport height (20% overlap between shots), and
produces ~2400x1992 screenshots that vision can read.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 12:23:00 +08:00
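
The per-viewport loop in `scrollAndCapture` advances by `Math.floor(viewportHeight * 0.8)`, so consecutive screenshots share about 20% of a viewport. The sketch reproduces that offset arithmetic and also shows where the browser would actually land, assuming `window.scrollTo` clamps at `scrollHeight - innerHeight`: near the bottom, requested offsets past that limit collapse onto the same position and can yield a near-duplicate last screenshot. `scrollOffsets` and `clampedOffsets` are hypothetical helpers.

```typescript
// Offsets requested by the loop in scrollAndCapture (CSS pixels).
function scrollOffsets(pageHeight: number, viewportHeight: number, stepRatio = 0.8): number[] {
  const step = Math.max(1, Math.floor(viewportHeight * stepRatio));
  const offsets: number[] = [];
  for (let y = 0; y < pageHeight; y += step) offsets.push(y);
  return offsets;
}

// Offsets the page would actually scroll to, assuming scrollTo clamps at
// pageHeight - viewportHeight; repeated positions are dropped.
function clampedOffsets(pageHeight: number, viewportHeight: number, stepRatio = 0.8): number[] {
  const maxScroll = Math.max(0, pageHeight - viewportHeight);
  const clamped = scrollOffsets(pageHeight, viewportHeight, stepRatio).map(y => Math.min(y, maxScroll));
  return Array.from(new Set(clamped));
}

console.log(scrollOffsets(3000, 1000)); // [ 0, 800, 1600, 2400 ]
console.log(clampedOffsets(3000, 1000)); // [ 0, 800, 1600, 2000 ]
```

Applying this in the real loop would mean reading the actual position (for example `window.scrollY`) after each `scrollTo`, since lazy-loaded content can change the page height while scrolling.
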
ywkj a780041840 feat: define structured JSON output schema for API consumption
SKILL.md now specifies the exact JSON structure the model must output
after reading screenshots: weight in kg, dimensions in cm, null fields
omitted. Ready for downstream API integration.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 12:18:15 +08:00
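
The schema this commit introduces is written as example JSON in SKILL.md. On the consuming side, a TypeScript view of it plus an "omit nulls" pass might look like the sketch below. The interface names and `omitNulls` are hypothetical; field names follow the JSON example in SKILL.md.

```typescript
// Weights in kg, lengths in cm, as the SKILL.md field rules require.
interface Measured { value: number; unit: string; source?: string }
interface Box { length: number; width: number; height: number; unit: 'cm'; source?: string }

interface LogisticsOutput {
  offerId: string;
  url: string;
  title: string;
  weight?: Measured;
  grossWeight?: Measured;
  netWeight?: Measured;
  dimensions?: Box;
  volume?: Measured;
  packageWeight?: Measured;
  packageDimensions?: Box;
  unitsPerPackage?: number;
  variants?: Array<{ name: string; weight?: Measured; dimensions?: Box }>;
}

// Recursively drops object keys whose value is null or undefined.
function omitNulls<T>(value: T): T {
  if (Array.isArray(value)) return value.map(item => omitNulls(item)) as unknown as T;
  if (value !== null && typeof value === 'object') {
    const out: Record<string, unknown> = {};
    for (const [key, v] of Object.entries(value as Record<string, unknown>)) {
      if (v !== null && v !== undefined) out[key] = omitNulls(v);
    }
    return out as unknown as T;
  }
  return value;
}
```

Making every measurement optional in the type mirrors the "omit fields that are `null`" rule: a consumer checks for presence rather than for `null`.
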
ywkj 87920a9503 refactor: replace DOM parsing with vision-based approach
Remove all CSS selectors, regex parsers, and structured extraction.
Instead, capture full-page screenshots (scrolling) and download detail
images. The model reads these directly with vision to extract logistics
data — no fragile DOM dependencies.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 12:11:24 +08:00
ywkj bff990628b fix: use localhost for CDP (IPv6), prevent null overwrite on dimensions
- CDP discovery uses localhost instead of 127.0.0.1 (Chrome binds IPv6)
- Only overwrite logistics fields when parsing succeeds, preventing
  later unparseable keys from nullifying valid parsed values

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 12:08:41 +08:00
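
The second fix targets a key-matching loop where several attribute keys map to one field. Because the removed `DIMENSION_KEYS` list contains both `尺寸` (size) and `规格` (specification), a later `规格` value such as a color name would parse to `null` and erase earlier dimensions. A minimal sketch of the guard, with hypothetical `mergeParsed` and `parseDims` helpers (the regex is a simplified form of the removed `parseDimensions`):

```typescript
type Fields = Record<string, unknown>;

interface Dims { length: number; width: number; height: number }

// Simplified from the removed parseDimensions: "30*20*10cm" -> 30, 20, 10.
function parseDims(raw: string): Dims | null {
  const m = raw.match(/([\d.]+)\s*[*xX×]\s*([\d.]+)\s*[*xX×]\s*([\d.]+)/);
  return m ? { length: Number(m[1]), width: Number(m[2]), height: Number(m[3]) } : null;
}

// Only write the field when parsing succeeded, so an unparseable later key
// cannot overwrite a value an earlier key produced.
function mergeParsed(current: Fields, field: string, parsed: unknown): Fields {
  if (parsed === null || parsed === undefined) return current;
  return { ...current, [field]: parsed };
}

// Both keys match the dimension key list; only the first one parses ('红色' = red).
const rawAttributes: Record<string, string> = { '尺寸': '30*20*10cm', '规格': '红色' };
let fields: Fields = {};
for (const value of Object.values(rawAttributes)) {
  fields = mergeParsed(fields, 'dimensions', parseDims(value));
}
console.log(fields.dimensions); // { length: 30, width: 20, height: 10 }
```
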
ywkj e48592690b feat: default CDP port 18800, add 包装信息 and 商品件重尺 extraction
register-skill-release / register (push) Successful in 14s
- Change default CDP port from 9222 to 18800
- Extract 包装信息 section (packaging type, box weight/dims, units per box)
- Extract 商品件重尺 table (per-piece weight/dimensions/volume)
- Backfill logistics from pieceWeightSize/packageInfo when attributes missing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-30 08:20:46 +08:00
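
The backfill bullet describes a fallback order: values from the attribute table win, and the per-piece table (商品件重尺) or the packaging section (包装信息) fill whatever is missing. A sketch of that precedence using `??`; `pieceWeightSize` and `packageInfo` are named in the commit, but the field shapes used here (`weight`, `boxWeight`) are assumptions, as are all type and function names.

```typescript
interface Weight { value: number; unit: 'kg'; source: string }

interface Logistics {
  weight: Weight | null; // per-piece weight
  packageWeight: Weight | null; // weight of one shipping box
}

// Assumed shapes; the compare view does not show these structures' fields.
interface PieceWeightSize { weight?: Weight }
interface PackageInfo { boxWeight?: Weight }

// Attribute-table values take precedence; the other sections only fill gaps.
function backfill(l: Logistics, piece?: PieceWeightSize, pkg?: PackageInfo): Logistics {
  return {
    weight: l.weight ?? piece?.weight ?? null,
    packageWeight: l.packageWeight ?? pkg?.boxWeight ?? null,
  };
}

const example = backfill(
  { weight: null, packageWeight: null },
  { weight: { value: 0.2, unit: 'kg', source: '商品件重尺' } },
  { boxWeight: { value: 5, unit: 'kg', source: '包装信息' } },
);
console.log(example.weight?.value, example.packageWeight?.value); // 0.2 5
```

Box weight is kept in `packageWeight` rather than copied into `weight`, since one box usually holds many units.
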
3 changed files with 242 additions and 276 deletions

SKILL.md (118 changes)

@@ -1,75 +1,97 @@
---
name: 1688-logistics-scraper
description: "Extract product weight/size/logistics data from 1688 product pages via Chrome browser, output structured JSON. Use when the user provides a 1688 product URL and needs logistics specs."
description: "Scrape 1688 product pages via Chrome, capture full-page screenshots and detail images for vision-based extraction of weight/size data. Use when the user provides a 1688 product URL and needs logistics specs."
---
# 1688 Logistics Scraper
Extract product weight, size, and logistics data from 1688 product pages.
Capture 1688 product pages and extract weight/size data via vision.
## Run
```bash
bun scripts/run.ts scrape <url> [--dry-run]
```
### Examples
```bash
bun scripts/run.ts scrape 'https://detail.1688.com/offer/852504650877.html'
bun scripts/run.ts scrape 'https://detail.1688.com/offer/852504650877.html' --dry-run
bun scripts/run.ts scrape <url> [--dry-run] [--port=9222]
```
## What It Does
1. Opens the 1688 product URL in the browser
2. Extracts weight/size data from wherever it appears on the page — product attributes, variant specs, logistics section
3. Downloads detail images (商品详情图片) for analysis — weight/size is often only in images
4. Outputs structured JSON
1. Opens the 1688 product URL in the browser (default port 18800)
2. Scrolls through the entire page, capturing full-page screenshots
3. Downloads all product detail images
4. Saves to `/tmp/1688-logistics/<offer-id>/`
## Where To Look For Data
## After Running — MUST follow
Weight/size data on 1688 pages hides in multiple places. Check all before giving up:
1. **Product attributes** (商品属性 / 商品参数) — key-value table, most reliable
2. **Variant/SKU specs** — per-variant weight or size
3. **Logistics section** — shipping weight, volume, freight info
4. **Detail images** — downloaded to `/tmp/1688-logistics/<offer-id>/`, read them to find weight/size text baked into images
## Output
Read ALL screenshots and detail images, then output the following JSON structure. This is the final output for API consumption.
```json
{
"status": "success",
"url": "https://detail.1688.com/offer/...",
"product": {
"title": "产品标题",
"logistics": {
"weight": { "value": 0.5, "unit": "kg", "source": "attributes" },
"dimensions": { "length": 30, "width": 20, "height": 10, "unit": "cm", "source": "attributes" },
"grossWeight": null,
"netWeight": null,
"packageWeight": null,
"volume": null,
"shippingMethod": null,
"shippingCost": null,
"origin": null
},
"variants": [
{ "name": "颜色: 红色", "weight": null, "dimensions": null }
]
"offerId": "966107271425",
"url": "https://detail.1688.com/offer/966107271425.html",
"title": "商品标题",
"weight": {
"value": 0.15,
"unit": "kg",
"source": "商品属性"
},
"detailImages": ["/tmp/1688-logistics/852504650877/img_001.jpg"],
"rawAttributes": { "重量": "0.5kg", "尺寸": "30*20*10cm" }
"grossWeight": {
"value": 0.2,
"unit": "kg",
"source": "商品件重尺"
},
"netWeight": {
"value": 0.15,
"unit": "kg",
"source": "商品属性"
},
"dimensions": {
"length": 10,
"width": 8,
"height": 1.8,
"unit": "cm",
"source": "商品属性"
},
"volume": {
"value": 0.000144,
"unit": "m³",
"source": "商品件重尺"
},
"packageWeight": {
"value": 5.0,
"unit": "kg",
"source": "包装信息"
},
"packageDimensions": {
"length": 40,
"width": 30,
"height": 20,
"unit": "cm",
"source": "包装信息"
},
"unitsPerPackage": 50,
"variants": [
{
"name": "12支装",
"weight": { "value": 0.12, "unit": "kg" },
"dimensions": { "length": 9.5, "width": 6, "height": 2.2, "unit": "cm" }
}
]
}
```
`null` = not found in text. Check `detailImages` — the data may be in the images.
### Field rules
- **All weight values normalized to kg** (克÷1000, 斤×0.5)
- **All dimension values normalized to cm** (mm÷10)
- **`source`**: where on the page the data was found (商品属性 / 商品件重尺 / 包装信息 / 详情图片)
- **`variants`**: only include if weight/size differs per SKU. Omit if all variants share the same specs.
- **Omit fields that are `null`** — do not include fields where no data was found
- **Do not guess.** Only include values actually visible on the page or in images.
## Rules
1. If the browser is not running, report the error. Do not try to launch it.
2. Check all data sources before reporting `null`.
3. Normalize units: 克→kg, 毫米→cm. Keep raw values in `rawAttributes`.
4. No retries. If it fails, report as-is.
5. Trust page content. Do not guess values.
2. No retries. If it fails, report as-is.
3. Read ALL screenshots — logistics data can appear anywhere on the page.
4. Read detail images too — weight/size is often baked into product photos.
5. Output ONLY the structured JSON above. No extra commentary.
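
The field rules in SKILL.md are instructions for the model, not code. If a post-processing step were added to double-check the model's numbers, the same conversions (克 ÷ 1000, 斤 × 0.5, mm ÷ 10) could be expressed as a lookup table. `toKg`, `toCm`, `convert` and the unit tables are hypothetical; the 千克/公斤 and 米 entries follow the `parseWeight`/`parseDimensions` helpers removed in this diff. Factors are applied as multiply-then-divide so common cases such as 150 克 give exactly 0.15.

```typescript
// Each unit converts as value * mul / div into the target unit.
type Factor = { mul: number; div: number };

// 克 = gram, 斤 = jin (0.5 kg), 千克/公斤 = kilogram.
const TO_KG: Record<string, Factor> = {
  kg: { mul: 1, div: 1 }, '千克': { mul: 1, div: 1 }, '公斤': { mul: 1, div: 1 },
  g: { mul: 1, div: 1000 }, '克': { mul: 1, div: 1000 },
  '斤': { mul: 1, div: 2 },
};

// 毫米 = millimeter, 厘米 = centimeter, 米 = meter.
const TO_CM: Record<string, Factor> = {
  cm: { mul: 1, div: 1 }, '厘米': { mul: 1, div: 1 },
  mm: { mul: 1, div: 10 }, '毫米': { mul: 1, div: 10 },
  m: { mul: 100, div: 1 }, '米': { mul: 100, div: 1 },
};

// Returns null for unknown units, in line with "Do not guess."
function convert(table: Record<string, Factor>, value: number, unit: string): number | null {
  const key = unit.trim().toLowerCase();
  if (!Object.prototype.hasOwnProperty.call(table, key)) return null;
  const f = table[key];
  return (value * f.mul) / f.div;
}

const toKg = (value: number, unit: string) => convert(TO_KG, value, unit);
const toCm = (value: number, unit: string) => convert(TO_CM, value, unit);
```
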

@@ -12,17 +12,14 @@ Commands:
Examples:
bun scripts/run.ts scrape 'https://detail.1688.com/offer/852504650877.html'
bun scripts/run.ts scrape 'https://detail.1688.com/offer/852504650877.html' --dry-run
bun scripts/run.ts --port=9223 scrape 'https://detail.1688.com/offer/852504650877.html'
Prerequisites:
Chrome must be running with --remote-debugging-port=9222
bun scripts/run.ts --port=18801 scrape 'https://detail.1688.com/offer/852504650877.html'
`);
}
async function main(): Promise<void> {
const positionals: string[] = [];
let dryRun = false;
let port = 9222;
let port = 18800;
for (const arg of process.argv.slice(2)) {
if (arg === '--dry-run') {
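
Only the start of the argument loop is visible in this hunk. Judging from the help text, the port is passed as `--port=<n>` and may appear before the `scrape` command, so the rest of the loop presumably resembles the sketch below. `parseArgs` and `CliOptions` are hypothetical, and the port range check is an addition not shown in the diff.

```typescript
interface CliOptions {
  positionals: string[];
  dryRun: boolean;
  port: number;
}

// Flags may appear anywhere; everything else is a positional (command, URL).
function parseArgs(argv: string[], defaultPort = 18800): CliOptions {
  const opts: CliOptions = { positionals: [], dryRun: false, port: defaultPort };
  for (const arg of argv) {
    if (arg === '--dry-run') {
      opts.dryRun = true;
    } else if (arg.startsWith('--port=')) {
      const n = Number(arg.slice('--port='.length));
      if (!Number.isInteger(n) || n <= 0 || n > 65535) throw new Error(`invalid port: ${arg}`);
      opts.port = n;
    } else {
      opts.positionals.push(arg);
    }
  }
  return opts;
}

// e.g. parseArgs(process.argv.slice(2))
```
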

@@ -3,50 +3,16 @@ import * as path from 'path';
export type Command = 'scrape';
export interface LogisticsValue {
value: number | null;
unit: string | null;
source: string;
}
export interface Dimensions {
length: number | null;
width: number | null;
height: number | null;
unit: string | null;
source: string;
}
export interface LogisticsData {
weight: LogisticsValue | null;
dimensions: Dimensions | null;
grossWeight: LogisticsValue | null;
netWeight: LogisticsValue | null;
packageWeight: LogisticsValue | null;
volume: LogisticsValue | null;
shippingMethod: string | null;
shippingCost: string | null;
origin: string | null;
}
export interface VariantInfo {
name: string;
weight: LogisticsValue | null;
dimensions: Dimensions | null;
}
export interface ScrapeResult {
status: 'success' | 'failed';
url: string;
command: Command;
dryRun: boolean;
product?: {
title: string;
logistics: LogisticsData;
variants: VariantInfo[];
};
offerId: string;
productPackInfo?: unknown;
windowContext?: unknown;
screenshots?: string[];
detailImages?: string[];
rawAttributes?: Record<string, string>;
error?: string;
}
@@ -62,9 +28,10 @@ class CdpSession {
private ws!: WebSocket;
private msgId = 0;
private pending = new Map<number, { resolve: (v: any) => void; reject: (e: Error) => void }>();
private eventListeners = new Map<string, Array<(params: any) => void>>();
static async connect(port: number): Promise<CdpSession> {
const resp = await fetch(`http://127.0.0.1:${port}/json`);
const resp = await fetch(`http://localhost:${port}/json`);
const targets = (await resp.json()) as Array<{ webSocketDebuggerUrl: string; type: string }>;
const page = targets.find(t => t.type === 'page');
if (!page) throw new Error('No Chrome page tab found. Open a tab first.');
@@ -79,13 +46,20 @@ class CdpSession {
this.ws.onopen = () => resolve();
this.ws.onerror = (e: any) => reject(new Error(`WebSocket error: ${e.message || e}`));
this.ws.onmessage = (ev: MessageEvent) => {
const msg: CdpResult = JSON.parse(typeof ev.data === 'string' ? ev.data : ev.data.toString());
const msg = JSON.parse(typeof ev.data === 'string' ? ev.data : ev.data.toString());
// Handle command responses
if (msg.id != null && this.pending.has(msg.id)) {
const p = this.pending.get(msg.id)!;
this.pending.delete(msg.id);
if (msg.error) p.reject(new Error(msg.error.message));
else p.resolve(msg.result);
}
// Handle events
if (msg.method && this.eventListeners.has(msg.method)) {
for (const fn of this.eventListeners.get(msg.method)!) {
fn(msg.params);
}
}
};
});
}
@@ -98,145 +72,117 @@ class CdpSession {
});
}
waitForEvent(event: string, timeoutMs: number = 30000): Promise<any> {
return new Promise((resolve, reject) => {
const timer = setTimeout(() => {
cleanup();
reject(new Error(`Timeout waiting for ${event}`));
}, timeoutMs);
const handler = (params: any) => {
cleanup();
resolve(params);
};
const cleanup = () => {
clearTimeout(timer);
const listeners = this.eventListeners.get(event);
if (listeners) {
const idx = listeners.indexOf(handler);
if (idx >= 0) listeners.splice(idx, 1);
}
};
if (!this.eventListeners.has(event)) this.eventListeners.set(event, []);
this.eventListeners.get(event)!.push(handler);
});
}
async evaluate(expression: string): Promise<any> {
const res = await this.send('Runtime.evaluate', { expression, returnByValue: true });
return res?.result?.value;
}
async captureScreenshot(format: string = 'png'): Promise<Buffer> {
const res = await this.send('Page.captureScreenshot', {
format,
captureBeyondViewport: false,
});
return Buffer.from(res.data, 'base64');
}
close() {
try { this.ws.close(); } catch {}
}
}
// --- Parsers ---
const WEIGHT_KEYS = ['重量', '毛重', '净重', '单件重量', '包装重量', '产品重量', '单品重量', 'weight'];
const DIMENSION_KEYS = ['尺寸', '规格', '长宽高', '外箱尺寸', '包装尺寸', '产品尺寸', '大小', 'size', 'dimensions'];
const VOLUME_KEYS = ['体积', '容积', 'volume'];
// --- Helpers ---
function extractOfferId(url: string): string {
return url.match(/offer\/(\d+)/)?.[1] || 'unknown';
}
function parseWeight(raw: string): LogisticsValue | null {
const m = raw.match(/([\d.]+)\s*(kg|g|克|千克|公斤|斤)/i);
if (!m) return null;
let value = parseFloat(m[1]);
let unit = m[2].toLowerCase();
if (unit === 'g' || unit === '克') { value /= 1000; unit = 'kg'; }
if (unit === '千克' || unit === '公斤') unit = 'kg';
if (unit === '斤') { value *= 0.5; unit = 'kg'; }
return { value, unit, source: '' };
}
function parseDimensions(raw: string): Dimensions | null {
const m = raw.match(/([\d.]+)\s*[*xX×]\s*([\d.]+)\s*[*xX×]\s*([\d.]+)\s*(cm|mm|毫米|厘米|m|米)?/i);
if (!m) return null;
let [l, w, h] = [parseFloat(m[1]), parseFloat(m[2]), parseFloat(m[3])];
let unit = (m[4] || 'cm').toLowerCase();
if (unit === 'mm' || unit === '毫米') { l /= 10; w /= 10; h /= 10; unit = 'cm'; }
if (unit === '厘米') unit = 'cm';
if (unit === 'm' || unit === '米') { l *= 100; w *= 100; h *= 100; unit = 'cm'; }
return { length: l, width: w, height: h, unit, source: '' };
}
function parseVolume(raw: string): LogisticsValue | null {
const m = raw.match(/([\d.]+)\s*(m³|cm³|L|ml|升|毫升|立方米|立方厘米)/i);
if (!m) return null;
return { value: parseFloat(m[1]), unit: m[2], source: '' };
}
function matchKey(text: string, keys: string[]): boolean {
const lower = text.toLowerCase();
return keys.some(k => lower.includes(k.toLowerCase()));
}
// --- Page extraction ---
const JS_EXTRACT_ATTRS = `
(function() {
const attrs = {};
const sels = [
'.detail-attributes-list .attributes-item',
'.obj-leading .obj-content li',
'#mod-detail-attributes .attribute-item',
'.detail-info table tr',
'[class*="attribute"] li',
'[class*="param"] li',
'.offer-attr-list .offer-attr-item',
];
for (const sel of sels) {
document.querySelectorAll(sel).forEach(el => {
const parts = el.textContent.trim().split(/[:]/);
if (parts.length >= 2) attrs[parts[0].trim()] = parts.slice(1).join(':').trim();
});
}
document.querySelectorAll('table tr, .detail-attributes-list tr').forEach(tr => {
const cells = tr.querySelectorAll('td, th');
if (cells.length >= 2) attrs[cells[0].textContent.trim()] = cells[1].textContent.trim();
});
return JSON.stringify(attrs);
})()`;
const JS_EXTRACT_VARIANTS = `
(function() {
const variants = [];
const sels = [
'.sku-item-wrapper .sku-item',
'[class*="sku"] [class*="item"]',
'.obj-sku .obj-content li',
'.unit-detail-spec-operator .spec-item',
];
for (const sel of sels) {
document.querySelectorAll(sel).forEach(el => {
const name = el.textContent.trim().replace(/\\s+/g, ' ');
if (name && name.length < 200) variants.push({ name, text: el.textContent });
});
}
return JSON.stringify(variants);
})()`;
const JS_EXTRACT_TITLE = `
(function() {
for (const sel of ['.title-text','.detail-title-text','h1[class*="title"]','.mod-detail-title h1','.d-title']) {
const el = document.querySelector(sel);
if (el && el.textContent.trim()) return el.textContent.trim();
}
return document.title || '';
})()`;
const JS_EXTRACT_IMAGES = `
(function() {
const imgs = [], seen = new Set();
const sels = [
'#desc-lazyload-container img',
'.detail-desc-decorate-richtext img',
'[class*="detail-desc"] img',
'.mod-detail-description img',
'.offer-attr-item img',
'.desc-img-loaded img',
];
for (const sel of sels) {
document.querySelectorAll(sel).forEach(img => {
const src = img.src || img.dataset.src || img.dataset.lazySrc || '';
if (src && !seen.has(src) && (src.startsWith('http') || src.startsWith('//'))) {
seen.add(src);
imgs.push(src.startsWith('//') ? 'https:' + src : src);
}
});
}
return JSON.stringify(imgs);
})()`;
async function downloadImages(urls: string[], outputDir: string): Promise<string[]> {
async function scrollAndCapture(
cdp: CdpSession,
outputDir: string,
): Promise<string[]> {
fs.mkdirSync(outputDir, { recursive: true });
const saved: string[] = [];
for (let i = 0; i < urls.length; i++) {
// Get page height
const pageHeight: number = await cdp.evaluate(
'Math.max(document.body.scrollHeight, document.documentElement.scrollHeight)'
) || 0;
const viewportHeight: number = await cdp.evaluate('window.innerHeight') || 900;
// Scroll through the page and capture viewport-sized screenshots
// Use 80% step to overlap slightly and avoid missing content at boundaries
const step = Math.floor(viewportHeight * 0.8);
let scrollY = 0;
let idx = 1;
while (scrollY < pageHeight) {
await cdp.evaluate(`window.scrollTo(0, ${scrollY})`);
await new Promise(r => setTimeout(r, 800)); // wait for lazy-load render
const buf = await cdp.captureScreenshot('png');
const filePath = path.join(outputDir, `page_${String(idx).padStart(3, '0')}.png`);
fs.writeFileSync(filePath, buf);
saved.push(filePath);
scrollY += step;
idx++;
}
return saved;
}
async function downloadDetailImages(
cdp: CdpSession,
outputDir: string,
): Promise<string[]> {
// Get all detail image URLs from the page
const imgUrls: string[] = JSON.parse(await cdp.evaluate(`
(function() {
const imgs = [], seen = new Set();
document.querySelectorAll('img').forEach(img => {
const src = img.src || img.dataset.src || img.dataset.lazySrc || '';
if (src && !seen.has(src) && (src.startsWith('http') || src.startsWith('//'))) {
// Filter for product detail images (skip tiny icons/avatars)
if (img.naturalWidth > 200 || img.width > 200 || !img.complete) {
seen.add(src);
imgs.push(src.startsWith('//') ? 'https:' + src : src);
}
}
});
return JSON.stringify(imgs);
})()
`) || '[]');
fs.mkdirSync(outputDir, { recursive: true });
const saved: string[] = [];
for (let i = 0; i < imgUrls.length; i++) {
try {
const resp = await fetch(urls[i]);
const resp = await fetch(imgUrls[i]);
if (!resp.ok) continue;
const buf = Buffer.from(await resp.arrayBuffer());
const ext = urls[i].match(/\.(jpg|jpeg|png|webp|gif)/i)?.[1] || 'jpg';
const ext = imgUrls[i].match(/\.(jpg|jpeg|png|webp|gif)/i)?.[1] || 'jpg';
const p = path.join(outputDir, `img_${String(i + 1).padStart(3, '0')}.${ext}`);
fs.writeFileSync(p, buf);
saved.push(p);
@@ -251,30 +197,24 @@ export async function run(
command: Command,
args: string[],
dryRun: boolean,
cdpPort: number = 9222,
cdpPort: number = 18800,
): Promise<ScrapeResult> {
if (command !== 'scrape') {
return { status: 'failed', url: '', command, dryRun, error: `unknown command: ${command}` };
return { status: 'failed', url: '', command, dryRun, offerId: '', error: `unknown command: ${command}` };
}
const url = args[0];
if (!url) {
return { status: 'failed', url: '', command, dryRun, error: 'scrape requires <url>' };
return { status: 'failed', url: '', command, dryRun, offerId: '', error: 'scrape requires <url>' };
}
const offerId = extractOfferId(url);
if (dryRun) {
return {
status: 'success', url, command, dryRun,
product: {
title: '<dry-run>',
logistics: {
weight: null, dimensions: null, grossWeight: null, netWeight: null,
packageWeight: null, volume: null, shippingMethod: null, shippingCost: null, origin: null,
},
variants: [],
},
status: 'success', url, command, dryRun, offerId,
screenshots: [],
detailImages: [],
rawAttributes: {},
};
}
@@ -284,69 +224,76 @@ export async function run(
await cdp.send('Page.enable');
await cdp.send('Runtime.enable');
await cdp.send('Page.navigate', { url });
// Wait for load
await new Promise(r => setTimeout(r, 5000));
const title: string = await cdp.evaluate(JS_EXTRACT_TITLE) || '';
const rawAttributes: Record<string, string> = JSON.parse(await cdp.evaluate(JS_EXTRACT_ATTRS) || '{}');
const rawVariants: Array<{ name: string; text: string }> = JSON.parse(await cdp.evaluate(JS_EXTRACT_VARIANTS) || '[]');
const imgUrls: string[] = JSON.parse(await cdp.evaluate(JS_EXTRACT_IMAGES) || '[]');
const variants: VariantInfo[] = rawVariants.map(v => {
const weight = parseWeight(v.text);
const dimensions = parseDimensions(v.text);
if (weight) weight.source = 'variant';
if (dimensions) dimensions.source = 'variant';
return { name: v.name, weight, dimensions };
// Set wide PC viewport to ensure tables fit without horizontal overflow
await cdp.send('Emulation.setDeviceMetricsOverride', {
width: 1920,
height: 1080,
deviceScaleFactor: 2,
mobile: false,
});
const logistics: LogisticsData = {
weight: null, dimensions: null, grossWeight: null, netWeight: null,
packageWeight: null, volume: null, shippingMethod: null, shippingCost: null, origin: null,
};
// Navigate and wait for page load event
const loadPromise = cdp.waitForEvent('Page.loadEventFired', 30000);
await cdp.send('Page.navigate', { url });
await loadPromise;
for (const [key, val] of Object.entries(rawAttributes)) {
if (matchKey(key, ['毛重'])) {
logistics.grossWeight = parseWeight(val);
if (logistics.grossWeight) logistics.grossWeight.source = 'attributes';
} else if (matchKey(key, ['净重'])) {
logistics.netWeight = parseWeight(val);
if (logistics.netWeight) logistics.netWeight.source = 'attributes';
} else if (matchKey(key, ['包装重量'])) {
logistics.packageWeight = parseWeight(val);
if (logistics.packageWeight) logistics.packageWeight.source = 'attributes';
} else if (matchKey(key, WEIGHT_KEYS)) {
logistics.weight = parseWeight(val);
if (logistics.weight) logistics.weight.source = 'attributes';
}
if (matchKey(key, DIMENSION_KEYS)) {
logistics.dimensions = parseDimensions(val);
if (logistics.dimensions) logistics.dimensions.source = 'attributes';
}
if (matchKey(key, VOLUME_KEYS)) {
logistics.volume = parseVolume(val);
if (logistics.volume) logistics.volume.source = 'attributes';
}
if (matchKey(key, ['产地', '发货地', '所在地'])) {
logistics.origin = val;
}
// Wait for networkIdle — poll until no pending requests for 1s
await cdp.evaluate(`
new Promise(resolve => {
let timer;
const reset = () => { clearTimeout(timer); timer = setTimeout(resolve, 1000); };
const observer = new PerformanceObserver(() => reset());
observer.observe({ entryTypes: ['resource'] });
reset();
})
`);
// Extract window.context.result.data
let productPackInfo: unknown = null;
let windowContext: unknown = null;
const ctx = await cdp.evaluate(`
(function() {
try {
const d = window.context && window.context.result && window.context.result.data;
if (d && d.productPackInfo) {
return JSON.stringify({
productPackInfo: d.productPackInfo,
productTitle: d.productTitle || null,
productAttributes: d.productAttributes || null,
skuSelection: d.skuSelection || null,
});
}
} catch(e) {}
return null;
})()
`);
if (ctx) {
const parsed = JSON.parse(ctx);
productPackInfo = parsed.productPackInfo;
windowContext = parsed;
}
const offerId = extractOfferId(url);
const imgDir = path.join('/tmp', '1688-logistics', offerId);
const detailImages = await downloadImages(imgUrls, imgDir);
const outputDir = path.join('/tmp', '1688-logistics', offerId);
// Capture full-page screenshots (scrolling)
const screenshotDir = path.join(outputDir, 'screenshots');
const screenshots = await scrollAndCapture(cdp, screenshotDir);
// Download detail images
const imgDir = path.join(outputDir, 'images');
const detailImages = await downloadDetailImages(cdp, imgDir);
return {
status: 'success', url, command, dryRun,
product: { title, logistics, variants },
status: 'success', url, command, dryRun, offerId,
productPackInfo,
windowContext,
screenshots,
detailImages,
rawAttributes,
};
} catch (error) {
return {
status: 'failed', url, command, dryRun,
status: 'failed', url, command, dryRun, offerId,
error: error instanceof Error ? error.message : String(error),
};
} finally {