---
name: 1688-logistics-scraper
description: "Scrape 1688 product pages via Chrome, capture full-page screenshots and detail images for vision-based extraction of weight/size/logistics data. Use when the user provides a 1688 product URL and needs logistics specs."
---

# 1688 Logistics Scraper

Capture 1688 product pages for vision-based extraction of weight, size, and logistics data.

## Run

```bash
bun scripts/run.ts scrape <url> [--dry-run] [--port=9222]
```

### Examples

```bash
bun scripts/run.ts scrape 'https://detail.1688.com/offer/852504650877.html'
bun scripts/run.ts scrape 'https://detail.1688.com/offer/852504650877.html' --dry-run
```

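The command line above can be read as one positional URL plus two optional flags. The sketch below shows one way such arguments could be parsed. `ScrapeArgs` and `parseScrapeArgs` are hypothetical names used for illustration; the actual parsing in `scripts/run.ts` is not shown in this document and may differ.

```typescript
// Hypothetical parser for: scrape <url> [--dry-run] [--port=N]
// This is an illustration only; scripts/run.ts may handle arguments differently.
interface ScrapeArgs {
  url: string;
  dryRun: boolean;
  port: number | undefined; // undefined means "use the script's default port"
}

function parseScrapeArgs(argv: string[]): ScrapeArgs {
  const [command, ...rest] = argv;
  if (command !== "scrape") {
    throw new Error(`unknown command: ${command ?? "(none)"}`);
  }

  let url: string | undefined;
  let dryRun = false;
  let port: number | undefined;

  for (const arg of rest) {
    if (arg === "--dry-run") {
      dryRun = true;
    } else if (arg.startsWith("--port=")) {
      const value = Number(arg.slice("--port=".length));
      if (!Number.isInteger(value) || value < 1 || value > 65535) {
        throw new Error(`invalid port: ${arg}`);
      }
      port = value;
    } else if (!arg.startsWith("--") && url === undefined) {
      url = arg; // first non-flag argument is the product URL
    } else {
      throw new Error(`unexpected argument: ${arg}`);
    }
  }

  if (url === undefined) {
    throw new Error("missing <url>");
  }
  return { url, dryRun, port };
}
```

Rejecting unknown flags up front keeps a mistyped option from being silently ignored.
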
## What It Does

1. Opens the 1688 product URL in the browser (default port 18800; override with `--port`)
2. Scrolls through the entire page, capturing **full-page screenshots** section by section
3. Downloads all **product detail images** (large images only, skips icons)
4. Saves everything to `/tmp/1688-logistics/<offer-id>/`

**No DOM parsing or regex.** The model reads the screenshots and images directly to extract logistics data.

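Step 4 implies that the offer ID is taken from the product URL and used as the output directory name. A minimal sketch of that mapping follows, assuming URLs of the form `https://detail.1688.com/offer/<digits>.html` as in the examples. `offerIdFromUrl` and `outputDirs` are hypothetical helpers, not functions confirmed to exist in the script.

```typescript
// Root directory taken from the document; the helper names below are hypothetical.
const OUTPUT_ROOT = "/tmp/1688-logistics";

// Extract the numeric offer ID from a URL shaped like
// https://detail.1688.com/offer/852504650877.html (assumed URL shape).
function offerIdFromUrl(rawUrl: string): string {
  const { pathname } = new URL(rawUrl);
  const segments = pathname.split("/").filter((s) => s.length > 0);
  const file = segments[1] ?? "";
  const id = file.endsWith(".html") ? file.slice(0, -".html".length) : "";
  const isDigits =
    id.length > 0 && id.split("").every((c) => c >= "0" && c <= "9");

  if (segments.length !== 2 || segments[0] !== "offer" || !isDigits) {
    throw new Error(`not a recognized 1688 offer URL: ${rawUrl}`);
  }
  return id;
}

// Build the per-offer directories that appear in the sample output paths.
function outputDirs(offerId: string): {
  root: string;
  screenshots: string;
  images: string;
} {
  const root = `${OUTPUT_ROOT}/${offerId}`;
  return {
    root,
    screenshots: `${root}/screenshots`,
    images: `${root}/images`,
  };
}
```

For the example URL, `offerIdFromUrl` yields `852504650877`, and `outputDirs` places screenshots under `/tmp/1688-logistics/852504650877/screenshots`.
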
## Output

```json
{
  "status": "success",
  "url": "https://detail.1688.com/offer/...",
  "offerId": "852504650877",
  "screenshots": [
    "/tmp/1688-logistics/852504650877/screenshots/page_001.png",
    "/tmp/1688-logistics/852504650877/screenshots/page_002.png",
    "/tmp/1688-logistics/852504650877/screenshots/page_003.png"
  ],
  "detailImages": [
    "/tmp/1688-logistics/852504650877/images/img_001.jpg",
    "/tmp/1688-logistics/852504650877/images/img_002.jpg"
  ]
}
```

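If the result is consumed programmatically, the sample above suggests a shape like the following. This is a sketch based only on the fields shown; `ScrapeResult`, `parseScrapeResult`, and `filesToRead` are hypothetical names, and any `status` value other than `"success"` is undocumented here.

```typescript
// Shape inferred from the sample output; not an authoritative schema.
interface ScrapeResult {
  status: string; // only "success" appears in the sample
  url: string;
  offerId: string;
  screenshots: string[];
  detailImages: string[];
}

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((x) => typeof x === "string");
}

// Parse the script's JSON output and check that the expected fields are present.
function parseScrapeResult(json: string): ScrapeResult {
  const data: unknown = JSON.parse(json);
  if (typeof data !== "object" || data === null) {
    throw new Error("result is not a JSON object");
  }
  const { status, url, offerId, screenshots, detailImages } =
    data as Record<string, unknown>;

  if (
    typeof status !== "string" ||
    typeof url !== "string" ||
    typeof offerId !== "string" ||
    !isStringArray(screenshots) ||
    !isStringArray(detailImages)
  ) {
    throw new Error("result is missing expected fields");
  }
  return { status, url, offerId, screenshots, detailImages };
}

// Every file the model should read: all screenshots, then all detail images.
function filesToRead(result: ScrapeResult): string[] {
  return [...result.screenshots, ...result.detailImages];
}
```
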
## After Running

Read the screenshots and images to extract:

- **Weight** (重量/毛重/净重/单件重量): normalize to kg
- **Dimensions** (尺寸/长宽高): normalize to cm
- **Volume** (体积/容积)
- **Package info** (包装信息): packaging type, box weight, box dimensions, units per box
- **Piece weight/size** (商品件重尺): per-piece logistics specs
- **Variant-specific** weight/size if shown per SKU
- **Shipping info**: method, cost, origin

Output the extracted data as structured JSON.

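The normalization to kg and cm mentioned in the list above can be sketched as a unit-table lookup. The unit tables below are an assumption about which units might appear on a listing (including 斤, the Chinese market unit equal to 0.5 kg); `toKg`, `toCm`, and `normalizeDimensions` are hypothetical helpers, not part of the script.

```typescript
// Conversion factors to kilograms. The set of units is an assumption.
const KG_PER_UNIT: Record<string, number> = {
  kg: 1,
  g: 0.001,
  "千克": 1,
  "公斤": 1,
  "克": 0.001,
  "斤": 0.5,
};

// Conversion factors to centimeters. The set of units is an assumption.
const CM_PER_UNIT: Record<string, number> = {
  cm: 1,
  mm: 0.1,
  m: 100,
  "厘米": 1,
  "毫米": 0.1,
  "米": 100,
};

// Round to 6 decimals so floating-point noise (e.g. 120 * 0.1) does not leak out.
function round6(x: number): number {
  return Math.round(x * 1e6) / 1e6;
}

function convert(value: number, unit: string, table: Record<string, number>): number {
  const factor = table[unit.trim().toLowerCase()];
  if (factor === undefined) {
    throw new Error(`unknown unit: ${unit}`);
  }
  return round6(value * factor);
}

function toKg(value: number, unit: string): number {
  return convert(value, unit, KG_PER_UNIT);
}

function toCm(value: number, unit: string): number {
  return convert(value, unit, CM_PER_UNIT);
}

// Normalize a length/width/height triple to centimeters.
function normalizeDimensions(
  length: number,
  width: number,
  height: number,
  unit: string,
): { lengthCm: number; widthCm: number; heightCm: number } {
  return {
    lengthCm: toCm(length, unit),
    widthCm: toCm(width, unit),
    heightCm: toCm(height, unit),
  };
}
```

Throwing on an unknown unit, rather than guessing, keeps a misread label from producing a silently wrong weight or size.
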
## Rules

1. If the browser is not running, report the error. Do not try to launch it.
2. Do not retry. If the run fails, report the failure as-is.
3. Read ALL screenshots: logistics data can appear anywhere on the page.
4. Read the detail images too: weight and size are often baked into product photos.