---
name: 1688-logistics-scraper
description: "Scrape 1688 product pages via Chrome, capture full-page screenshots and detail images for vision-based extraction of weight/size data. Use when the user provides a 1688 product URL and needs logistics specs."
---

# 1688 Logistics Scraper

Capture 1688 product pages and extract weight/size data via vision.

## Run

```bash
bun scripts/run.ts scrape <url> [--dry-run] [--port=9222]
```

## What It Does

1. Opens the 1688 product URL in an already-running Chrome instance (default port 18800; `--port` overrides it).
2. Scrolls through the entire page, capturing full-page screenshots.
3. Downloads all product detail images.
4. Saves everything to `/tmp/1688-logistics/<offer-id>/`.

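To make step 4 concrete, here is a minimal sketch of how the offer ID and output directory could be derived from a product URL. The helper names `offerIdFromUrl` and `outputDirFor` are hypothetical and not taken from `scripts/run.ts`. The URL pattern is assumed from the example URL used later in this document.

```typescript
// Hypothetical helpers for illustration; the real script may derive these differently.
const OUTPUT_ROOT = "/tmp/1688-logistics";

/** Extract the numeric offer ID from a URL shaped like .../offer/<digits>.html, or null. */
function offerIdFromUrl(url: string): string | null {
  const match = /\/offer\/(\d+)\.html/.exec(url);
  return match?.[1] ?? null;
}

/** Build the per-offer output directory described in step 4. */
function outputDirFor(offerId: string): string {
  return `${OUTPUT_ROOT}/${offerId}/`;
}
```

With the example URL `https://detail.1688.com/offer/966107271425.html`, `offerIdFromUrl` returns `"966107271425"`, and `outputDirFor` then yields `/tmp/1688-logistics/966107271425/`.
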
## After Running: MUST follow

Read ALL screenshots and detail images, then output the following JSON structure. This is the final output for API consumption.

```json
{
  "offerId": "966107271425",
  "url": "https://detail.1688.com/offer/966107271425.html",
  "title": "商品标题",
  "weight": {
    "value": 0.15,
    "unit": "kg",
    "source": "商品属性"
  },
  "grossWeight": {
    "value": 0.2,
    "unit": "kg",
    "source": "商品件重尺"
  },
  "netWeight": {
    "value": 0.15,
    "unit": "kg",
    "source": "商品属性"
  },
  "dimensions": {
    "length": 10,
    "width": 8,
    "height": 1.8,
    "unit": "cm",
    "source": "商品属性"
  },
  "volume": {
    "value": 0.000144,
    "unit": "m³",
    "source": "商品件重尺"
  },
  "packageWeight": {
    "value": 5.0,
    "unit": "kg",
    "source": "包装信息"
  },
  "packageDimensions": {
    "length": 40,
    "width": 30,
    "height": 20,
    "unit": "cm",
    "source": "包装信息"
  },
  "unitsPerPackage": 50,
  "variants": [
    {
      "name": "12支装",
      "weight": { "value": 0.12, "unit": "kg" },
      "dimensions": { "length": 9.5, "width": 6, "height": 2.2, "unit": "cm" }
    }
  ]
}
```

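For API consumers, the JSON structure above could be described with TypeScript types along the following lines. This is a sketch only: the type names are hypothetical, and marking every field except `offerId` and `url` as optional is an assumption based on the instruction to omit fields for which no data was found.

```typescript
// Hypothetical type names; field names and units mirror the JSON example above.
interface WeightSpec {
  value: number;
  unit: "kg";
  source?: string; // page section the value came from; variants in the example omit it
}

interface DimensionSpec {
  length: number;
  width: number;
  height: number;
  unit: "cm";
  source?: string;
}

interface VolumeSpec {
  value: number;
  unit: "m³";
  source?: string;
}

interface VariantSpec {
  name: string;
  weight?: WeightSpec;
  dimensions?: DimensionSpec;
}

interface LogisticsResult {
  offerId: string; // assumed always present
  url: string; // assumed always present
  title?: string;
  weight?: WeightSpec;
  grossWeight?: WeightSpec;
  netWeight?: WeightSpec;
  dimensions?: DimensionSpec;
  volume?: VolumeSpec;
  packageWeight?: WeightSpec;
  packageDimensions?: DimensionSpec;
  unitsPerPackage?: number;
  variants?: VariantSpec[];
}

// A sparse result still type-checks: only fields actually found are present.
const sparseResult: LogisticsResult = {
  offerId: "966107271425",
  url: "https://detail.1688.com/offer/966107271425.html",
  weight: { value: 0.15, unit: "kg", source: "商品属性" },
};
```

Using literal types for `unit` lets the compiler reject values that were not normalized before being written to the result.
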
### Field rules

- **All weight values are normalized to kg** (克, grams: ÷1000; 斤, jin: ×0.5)
- **All dimension values are normalized to cm** (mm: ÷10)
- **`source`**: where on the page the data was found: 商品属性 (product attributes) / 商品件重尺 (item weight and size) / 包装信息 (packaging info) / 详情图片 (detail images)
- **`variants`**: only include if weight/size differs per SKU. Omit if all variants share the same specs.
- **Omit fields that would be `null`.** Do not include fields where no data was found.
- **Do not guess.** Only include values actually visible on the page or in images.

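The unit conversions and the omit-empty rule above can be sketched in TypeScript as follows. The function names are hypothetical, and accepting ASCII `g` as an alias for 克 is an assumption. Unknown units return `undefined` rather than a guessed value, in keeping with the "do not guess" rule.

```typescript
/** Convert a weight to kg using the factors in the field rules; undefined if the unit is unknown. */
function toKg(value: number, unit: string): number | undefined {
  switch (unit) {
    case "kg":
      return value;
    case "g": // assumed alias for 克
    case "克":
      return value / 1000;
    case "斤":
      return value * 0.5;
    default:
      return undefined; // unknown unit: do not guess
  }
}

/** Convert a length to cm; undefined if the unit is unknown. */
function toCm(value: number, unit: string): number | undefined {
  switch (unit) {
    case "cm":
      return value;
    case "mm":
      return value / 10;
    default:
      return undefined;
  }
}

/** Return a copy of obj without keys whose value is null or undefined. */
function omitEmpty(obj: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const key of Object.keys(obj)) {
    const value = obj[key];
    if (value !== null && value !== undefined) {
      out[key] = value;
    }
  }
  return out;
}
```

For instance, `toKg(150, "克")` gives `0.15`, `toKg(2, "斤")` gives `1`, and `toCm(18, "mm")` gives `1.8`. Passing a partially filled result through `omitEmpty` drops the keys for which nothing was found.
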
## Rules

1. If the browser is not running, report the error. Do not try to launch it.
2. No retries. If it fails, report as-is.
3. Read ALL screenshots; logistics data can appear anywhere on the page.
4. Read detail images too; weight/size is often baked into product photos.
5. Output ONLY the structured JSON above. No extra commentary.