---
name: 1688-logistics-scraper
description: "Extract product weight/size/logistics data from 1688 product pages via Chrome browser, output structured JSON. Use when the user provides a 1688 product URL and needs logistics specs."
---

# 1688 Logistics Scraper

Extract product weight, size, and logistics data from 1688 product pages.

## Run

```bash
bun scripts/run.ts scrape <url> [--dry-run]
```

### Examples

```bash
bun scripts/run.ts scrape 'https://detail.1688.com/offer/852504650877.html'
bun scripts/run.ts scrape 'https://detail.1688.com/offer/852504650877.html' --dry-run
```

## What It Does

1. Opens the 1688 product URL in the browser (port 18800).
2. Extracts weight/size data from wherever it appears on the page: product attributes, variant specs, the 包装信息 (packaging info) section, the 商品件重尺 (item weight and dimensions) table, and the logistics section.
3. Downloads the detail images (商品详情图片) for analysis, because weight/size data is often present only in images.
4. Outputs structured JSON.

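
As an illustration of how a product URL relates to the per-offer image directory, the sketch below pulls the offer ID out of a URL shaped like the ones in the examples and builds the matching path. The helper names `extractOfferId` and `imageDirFor` are hypothetical and are not taken from `scripts/run.ts`; the URL pattern is assumed from the example URL only.

```typescript
// Hypothetical helpers for illustration; not the actual scripts/run.ts code.
const IMAGE_ROOT = "/tmp/1688-logistics";

// Assumed URL shape, based on the example: https://detail.1688.com/offer/<digits>.html
const OFFER_URL = /^https?:\/\/detail\.1688\.com\/offer\/(\d+)\.html(?:[?#].*)?$/;

/** Returns the numeric offer ID, or null when the URL does not match the assumed shape. */
function extractOfferId(url: string): string | null {
  const match = url.trim().match(OFFER_URL);
  return match?.[1] ?? null;
}

/** Builds the directory where detail images for an offer are stored. */
function imageDirFor(offerId: string): string {
  return `${IMAGE_ROOT}/${offerId}`;
}

const offerId = extractOfferId("https://detail.1688.com/offer/852504650877.html");
if (offerId !== null) {
  console.log(imageDirFor(offerId)); // /tmp/1688-logistics/852504650877
}
```

Returning `null` for an unrecognized URL keeps the failure explicit, so it can be reported rather than worked around.
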
## Where To Look For Data

Weight/size data on 1688 pages can appear in several places. Check all of them before giving up:

1. **Product attributes** (商品属性 / 商品参数): key-value table, most reliable
2. **商品件重尺 table** (item weight and dimensions): dedicated weight/dimensions/volume table for logistics
3. **包装信息 section** (packaging info): packaging type, box weight, box dimensions, units per box
4. **Variant/SKU specs**: per-variant weight or size
5. **Logistics section**: shipping weight, volume, freight info
6. **Detail images**: downloaded to `/tmp/1688-logistics/<offer-id>/`; read them to find weight/size text baked into the images

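
One way to apply "check all before giving up" is to walk the sources in a fixed order and keep the first value found, recording where it came from. This is a minimal sketch, not the scraper's actual logic: treating the list order above as a priority order is an assumption, and the labels `variants`, `logistics`, and `images` are made up here (only `attributes`, `pieceWeightSize`, and `packageInfo` appear as `source` values in the sample output).

```typescript
// Source labels: the first three match `source` values in the sample output;
// the last three are assumed names for illustration.
type Source =
  | "attributes"
  | "pieceWeightSize"
  | "packageInfo"
  | "variants"
  | "logistics"
  | "images";

// Assumed priority: same order as the list above.
const SOURCE_ORDER: Source[] = [
  "attributes",
  "pieceWeightSize",
  "packageInfo",
  "variants",
  "logistics",
  "images",
];

interface Found<T> {
  value: T;
  source: Source;
}

/** Returns the first non-null candidate in priority order, or null if every source is empty. */
function firstFound<T>(candidates: Partial<Record<Source, T | null>>): Found<T> | null {
  for (const source of SOURCE_ORDER) {
    const value = candidates[source];
    if (value !== null && value !== undefined) {
      return { value, source };
    }
  }
  return null; // nothing found anywhere: report null, do not guess
}

const weight = firstFound({ attributes: null, pieceWeightSize: "500g", packageInfo: "2kg" });
console.log(weight); // { value: "500g", source: "pieceWeightSize" }
```
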
## Output

```json
{
  "status": "success",
  "url": "https://detail.1688.com/offer/...",
  "product": {
    "title": "产品标题",
    "logistics": {
      "weight": { "value": 0.5, "unit": "kg", "source": "attributes" },
      "dimensions": { "length": 30, "width": 20, "height": 10, "unit": "cm", "source": "attributes" },
      "grossWeight": null,
      "netWeight": null,
      "packageWeight": { "value": 2.0, "unit": "kg", "source": "packageInfo" },
      "volume": null,
      "shippingMethod": null,
      "shippingCost": null,
      "origin": null
    },
    "variants": [
      { "name": "颜色: 红色", "weight": null, "dimensions": null }
    ],
    "packageInfo": {
      "packagingType": "纸箱",
      "packagingWeight": { "value": 2.0, "unit": "kg", "source": "packageInfo" },
      "packagingDimensions": { "length": 40, "width": 30, "height": 20, "unit": "cm", "source": "packageInfo" },
      "unitsPerPackage": 50,
      "raw": { "包装方式": "纸箱", "箱规": "40*30*20cm", "装箱数": "50" }
    },
    "pieceWeightSize": {
      "weight": { "value": 0.5, "unit": "kg", "source": "pieceWeightSize" },
      "dimensions": { "length": 30, "width": 20, "height": 10, "unit": "cm", "source": "pieceWeightSize" },
      "volume": null,
      "raw": { "重量": "500g", "尺寸": "30*20*10cm" }
    }
  },
  "detailImages": ["/tmp/1688-logistics/852504650877/img_001.jpg"],
  "rawAttributes": { "重量": "0.5kg", "尺寸": "30*20*10cm" }
}
```

`null` means the value was not found in the page text. Check `detailImages`: the data may appear only in the images.

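
To decide which fields still need to be looked up in the images, a consumer of this JSON could list the `logistics` keys whose value is `null`. The sketch below assumes the field names from the sample output; `missingLogisticsFields` is a hypothetical helper, not part of the scraper.

```typescript
/** Lists the keys of a logistics block whose value is null (not found in page text). */
function missingLogisticsFields(logistics: object): string[] {
  return Object.entries(logistics)
    .filter(([, value]) => value === null)
    .map(([key]) => key);
}

// Shape copied from the sample output above.
const sampleLogistics = {
  weight: { value: 0.5, unit: "kg", source: "attributes" },
  dimensions: { length: 30, width: 20, height: 10, unit: "cm", source: "attributes" },
  grossWeight: null,
  netWeight: null,
  packageWeight: { value: 2.0, unit: "kg", source: "packageInfo" },
  volume: null,
  shippingMethod: null,
  shippingCost: null,
  origin: null,
};

console.log(missingLogisticsFields(sampleLogistics));
// ["grossWeight", "netWeight", "volume", "shippingMethod", "shippingCost", "origin"]
```
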
## Rules

1. If the browser is not running, report the error. Do not try to launch it.
2. Check all data sources before reporting `null`.
3. Normalize units: 克 (grams) → kg, 毫米 (millimeters) → cm. Keep the raw values in the `rawAttributes` and `raw` fields.
4. No retries. If the scrape fails, report the failure as-is.
5. Trust page content. Do not guess values.

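
The unit normalization in rule 3 could look roughly like the sketch below. It is an illustration under assumptions, not the scraper's parser: the function names are hypothetical, the accepted unit spellings are a guess at what pages contain, and the dimension string is assumed to be ordered length × width × height. Anything that does not match returns `null`, in keeping with rule 5.

```typescript
interface Weight {
  value: number;
  unit: "kg";
}

interface Dimensions {
  length: number;
  width: number;
  height: number;
  unit: "cm";
}

const NUM = "(\\d+(?:\\.\\d+)?)";
const WEIGHT_RE = new RegExp(`^${NUM}\\s*(kg|g|千克|公斤|克)$`, "i");
const DIMS_RE = new RegExp(
  `^${NUM}\\s*[*x×]\\s*${NUM}\\s*[*x×]\\s*${NUM}\\s*(cm|mm|厘米|毫米)$`,
  "i",
);

/** Parses strings such as "500g" or "0.5kg" into kilograms; returns null if unrecognized. */
function normalizeWeight(raw: string): Weight | null {
  const match = raw.trim().match(WEIGHT_RE);
  if (!match) return null;
  const amount = Number(match[1]);
  const unit = (match[2] ?? "").toLowerCase();
  const isGrams = unit === "g" || unit === "克";
  return { value: isGrams ? amount / 1000 : amount, unit: "kg" };
}

/** Parses strings such as "30*20*10cm" into centimeters; assumes length*width*height order. */
function normalizeDimensions(raw: string): Dimensions | null {
  const match = raw.trim().match(DIMS_RE);
  if (!match) return null;
  const unit = (match[4] ?? "").toLowerCase();
  const divisor = unit === "mm" || unit === "毫米" ? 10 : 1;
  return {
    length: Number(match[1]) / divisor,
    width: Number(match[2]) / divisor,
    height: Number(match[3]) / divisor,
    unit: "cm",
  };
}

console.log(normalizeWeight("500g")); // { value: 0.5, unit: "kg" }
console.log(normalizeDimensions("300*200*100mm")); // { length: 30, width: 20, height: 10, unit: "cm" }
```

The original strings are not modified, so they can still be stored unchanged in `rawAttributes` and `raw`.
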