---
name: 1688-logistics-scraper
description: "Scrape 1688 product pages via Chrome, capture full-page screenshots and detail images for vision-based extraction of weight/size/logistics data. Use when the user provides a 1688 product URL and needs logistics specs."
---
# 1688 Logistics Scraper
Capture 1688 product pages for vision-based extraction of weight, size, and logistics data.
## Run
```bash
bun scripts/run.ts scrape <url> [--dry-run] [--port=9222]
```
### Examples
```bash
bun scripts/run.ts scrape 'https://detail.1688.com/offer/852504650877.html'
bun scripts/run.ts scrape 'https://detail.1688.com/offer/852504650877.html' --dry-run
```
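As a rough illustration of the command shape above, the arguments could be parsed along these lines. This is a minimal sketch, not the actual contents of `scripts/run.ts`: `ScrapeArgs`, `parseScrapeArgs`, and the validation rules are hypothetical, and the default port is taken from the "What It Does" section.

```typescript
// Hypothetical argument parser for: scrape <url> [--dry-run] [--port=N]
// Names and validation rules are assumptions, not the real scripts/run.ts.
interface ScrapeArgs {
  url: string;
  dryRun: boolean;
  port: number;
}

const DEFAULT_PORT = 18800; // default stated in "What It Does"

function parseScrapeArgs(argv: string[]): ScrapeArgs {
  const [command, ...rest] = argv;
  if (command !== "scrape") {
    throw new Error(`Unknown command: ${command ?? "(none)"}`);
  }

  let url: string | undefined;
  let dryRun = false;
  let port = DEFAULT_PORT;

  for (const arg of rest) {
    if (arg === "--dry-run") {
      dryRun = true;
    } else if (arg.startsWith("--port=")) {
      port = Number(arg.slice("--port=".length));
      if (!Number.isInteger(port) || port < 1 || port > 65535) {
        throw new Error(`Invalid port: ${arg}`);
      }
    } else if (arg.startsWith("--")) {
      throw new Error(`Unknown flag: ${arg}`);
    } else if (url === undefined) {
      url = arg; // first positional argument is the product URL
    } else {
      throw new Error(`Unexpected argument: ${arg}`);
    }
  }

  if (url === undefined) {
    throw new Error("Missing <url>");
  }
  return { url, dryRun, port };
}
```

Rejecting unknown flags outright, rather than ignoring them, keeps a mistyped `--dryrun` from silently triggering a real scrape.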
## What It Does
1. Opens the 1688 product URL in the already-running browser (port set by `--port`, default 18800)
2. Scrolls through the entire page, capturing **full-page screenshots** section by section
3. Downloads all **product detail images** (large images only, skips icons)
4. Saves everything to `/tmp/1688-logistics/<offer-id>/`

**No DOM parsing or regex.** The model reads the screenshots and images directly to extract logistics data.
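
The output location in step 4 depends on the offer ID embedded in the URL. The sketch below shows one way that ID and the per-file paths could be derived. All helper names are hypothetical. It assumes URLs of the form `https://detail.1688.com/offer/<digits>.html` (the only form shown in this document) and the `page_NNN.png` / `img_NNN.jpg` naming seen in the example output.

```typescript
// Hypothetical helpers for illustration; not taken from scripts/run.ts.
const OUTPUT_ROOT = "/tmp/1688-logistics";

// Extract the numeric offer ID from a URL such as
// https://detail.1688.com/offer/852504650877.html using plain string handling.
function parseOfferId(rawUrl: string): string {
  const url = new URL(rawUrl); // throws on malformed input
  const segments = url.pathname.split("/").filter((s) => s.length > 0);
  const file =
    segments.length === 2 && segments[0] === "offer" ? segments[1] : "";
  const id = file.endsWith(".html") ? file.slice(0, -".html".length) : "";
  const isNumeric =
    id.length > 0 && id.split("").every((c) => c >= "0" && c <= "9");
  // Assumption: only the detail.1688.com host is accepted.
  if (url.hostname !== "detail.1688.com" || !isNumeric) {
    throw new Error(`Not a 1688 offer URL: ${rawUrl}`);
  }
  return id;
}

// Build a zero-padded, 1-based file name like page_001.png.
function numbered(prefix: string, index: number, ext: string): string {
  return `${prefix}_${String(index).padStart(3, "0")}.${ext}`;
}

function screenshotPath(offerId: string, index: number): string {
  return `${OUTPUT_ROOT}/${offerId}/screenshots/${numbered("page", index, "png")}`;
}

function detailImagePath(offerId: string, index: number): string {
  return `${OUTPUT_ROOT}/${offerId}/images/${numbered("img", index, "jpg")}`;
}
```

For the example URL, `parseOfferId` returns `"852504650877"` and `screenshotPath("852504650877", 1)` returns `/tmp/1688-logistics/852504650877/screenshots/page_001.png`.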
## Output
```json
{
  "status": "success",
  "url": "https://detail.1688.com/offer/...",
  "offerId": "852504650877",
  "screenshots": [
    "/tmp/1688-logistics/852504650877/screenshots/page_001.png",
    "/tmp/1688-logistics/852504650877/screenshots/page_002.png",
    "/tmp/1688-logistics/852504650877/screenshots/page_003.png"
  ],
  "detailImages": [
    "/tmp/1688-logistics/852504650877/images/img_001.jpg",
    "/tmp/1688-logistics/852504650877/images/img_002.jpg"
  ]
}
```
```
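A consumer of this JSON may want to validate it before reading any files. The following is a minimal sketch of a typed parser for the structure shown above. The `ScrapeResult` interface and function names are hypothetical, and it assumes the JSON arrives as a single string (for example, captured from the script's output), which this document does not state explicitly.

```typescript
// Hypothetical type mirroring the documented example output.
interface ScrapeResult {
  status: string; // only "success" appears in the documented example
  url: string;
  offerId: string;
  screenshots: string[];
  detailImages: string[];
}

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((v) => typeof v === "string");
}

// Parse and validate the JSON text; throws if a documented field is missing
// or has the wrong type.
function parseScrapeResult(raw: string): ScrapeResult {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) {
    throw new Error("Scrape result is not a JSON object");
  }
  const { status, url, offerId, screenshots, detailImages } =
    data as Record<string, unknown>;
  if (
    typeof status !== "string" ||
    typeof url !== "string" ||
    typeof offerId !== "string" ||
    !isStringArray(screenshots) ||
    !isStringArray(detailImages)
  ) {
    throw new Error("Scrape result does not match the documented shape");
  }
  return { status, url, offerId, screenshots, detailImages };
}

// Every file the model should read: all screenshots, then all detail images.
function filesToRead(result: ScrapeResult): string[] {
  return [...result.screenshots, ...result.detailImages];
}
```

Returning screenshots and detail images as one list makes it harder to skip either group by accident when reading the results.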
## After Running
Read the screenshots and images to extract:
- **Weight** (重量/毛重/净重/单件重量) — normalize to kg
- **Dimensions** (尺寸/长宽高) — normalize to cm
- **Volume** (体积/容积)
- **Package info** (包装信息) — packaging type, box weight, box dimensions, units per box
- **Piece weight/size** (商品件重尺) — per-piece logistics specs
- **Variant-specific** weight/size if shown per SKU
- **Shipping info** — method, cost, origin

Output the extracted data as structured JSON.
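
The kg and cm normalization above can be made explicit with a small conversion table. This is a sketch under assumptions: the helper names and the list of supported units are illustrative and not part of the skill. The conversion factors themselves are standard (for instance, 1 斤 = 0.5 kg).

```typescript
// Hypothetical normalization helpers; the unit lists are illustrative, not exhaustive.
const KG_PER_UNIT = new Map<string, number>([
  ["kg", 1], ["千克", 1], ["公斤", 1],
  ["g", 0.001], ["克", 0.001],
  ["斤", 0.5], // Chinese market jin = 500 g
]);

const CM_PER_UNIT = new Map<string, number>([
  ["cm", 1], ["厘米", 1],
  ["mm", 0.1], ["毫米", 0.1],
  ["m", 100], ["米", 100],
]);

function convert(value: number, unit: string, table: Map<string, number>): number {
  const factor = table.get(unit.trim().toLowerCase());
  if (factor === undefined) {
    // Surface unknown units instead of guessing a factor.
    throw new Error(`Unsupported unit: ${unit}`);
  }
  // Round to 6 decimals to trim floating-point noise (e.g. 35 * 0.1).
  return Math.round(value * factor * 1e6) / 1e6;
}

function toKg(value: number, unit: string): number {
  return convert(value, unit, KG_PER_UNIT);
}

function toCm(value: number, unit: string): number {
  return convert(value, unit, CM_PER_UNIT);
}

console.log(toKg(250, "g")); // 0.25
console.log(toKg(3, "斤")); // 1.5
console.log(toCm(35, "mm")); // 3.5
```

Throwing on an unrecognized unit is deliberate: a value read from a screenshot with an unexpected unit is better reported than silently converted.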
## Rules
1. If the browser is not running, report the error. Do not try to launch it.
2. No retries. If the scrape fails, report the failure as-is.
3. Read ALL screenshots — logistics data can appear anywhere on the page.
4. Read detail images too — weight/size is often baked into product photos.