feat: define structured JSON output schema for API consumption

SKILL.md now specifies the exact JSON structure the model must output
after reading the screenshots. Weight in kg, dimensions in cm, omit
nulls. Ready for downstream API integration.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ywkj 2026-03-30 12:18:15 +08:00
parent 87920a9503
commit a780041840
1 changed file with 64 additions and 37 deletions

SKILL.md

````diff
@@ -1,11 +1,11 @@
 ---
 name: 1688-logistics-scraper
-description: "Scrape 1688 product pages via Chrome, capture full-page screenshots and detail images for vision-based extraction of weight/size/logistics data. Use when the user provides a 1688 product URL and needs logistics specs."
+description: "Scrape 1688 product pages via Chrome, capture full-page screenshots and detail images for vision-based extraction of weight/size data. Use when the user provides a 1688 product URL and needs logistics specs."
 ---
 
 # 1688 Logistics Scraper
 
-Capture 1688 product pages for vision-based extraction of weight, size, and logistics data.
+Capture 1688 product pages and extract weight/size data via vision.
 
 ## Run
 
````
````diff
@@ -13,54 +13,80 @@ Capture 1688 product pages for vision-based extraction of weight, size, and logi
 bun scripts/run.ts scrape <url> [--dry-run] [--port=9222]
 ```
 
-### Examples
-
-```bash
-bun scripts/run.ts scrape 'https://detail.1688.com/offer/852504650877.html'
-bun scripts/run.ts scrape 'https://detail.1688.com/offer/852504650877.html' --dry-run
-```
-
 ## What It Does
 
 1. Opens the 1688 product URL in the browser (default port 18800)
-2. Scrolls through the entire page, capturing **full-page screenshots** section by section
-3. Downloads all **product detail images** (large images only, skips icons)
-4. Saves everything to `/tmp/1688-logistics/<offer-id>/`
+2. Scrolls through the entire page, capturing full-page screenshots
+3. Downloads all product detail images
+4. Saves to `/tmp/1688-logistics/<offer-id>/`
 
-**No DOM parsing or regex.** The model reads the screenshots and images directly to extract logistics data.
+## After Running — MUST follow
 
-## Output
+Read ALL screenshots and detail images, then output the following JSON structure. This is the final output for API consumption.
 
 ```json
 {
-  "status": "success",
-  "url": "https://detail.1688.com/offer/...",
-  "offerId": "852504650877",
-  "screenshots": [
-    "/tmp/1688-logistics/852504650877/screenshots/page_001.png",
-    "/tmp/1688-logistics/852504650877/screenshots/page_002.png",
-    "/tmp/1688-logistics/852504650877/screenshots/page_003.png"
-  ],
-  "detailImages": [
-    "/tmp/1688-logistics/852504650877/images/img_001.jpg",
-    "/tmp/1688-logistics/852504650877/images/img_002.jpg"
+  "offerId": "966107271425",
+  "url": "https://detail.1688.com/offer/966107271425.html",
+  "title": "商品标题",
+  "weight": {
+    "value": 0.15,
+    "unit": "kg",
+    "source": "商品属性"
+  },
+  "grossWeight": {
+    "value": 0.2,
+    "unit": "kg",
+    "source": "商品件重尺"
+  },
+  "netWeight": {
+    "value": 0.15,
+    "unit": "kg",
+    "source": "商品属性"
+  },
+  "dimensions": {
+    "length": 10,
+    "width": 8,
+    "height": 1.8,
+    "unit": "cm",
+    "source": "商品属性"
+  },
+  "volume": {
+    "value": 0.000144,
+    "unit": "m³",
+    "source": "商品件重尺"
+  },
+  "packageWeight": {
+    "value": 5.0,
+    "unit": "kg",
+    "source": "包装信息"
+  },
+  "packageDimensions": {
+    "length": 40,
+    "width": 30,
+    "height": 20,
+    "unit": "cm",
+    "source": "包装信息"
+  },
+  "unitsPerPackage": 50,
+  "variants": [
+    {
+      "name": "12支装",
+      "weight": { "value": 0.12, "unit": "kg" },
+      "dimensions": { "length": 9.5, "width": 6, "height": 2.2, "unit": "cm" }
+    }
   ]
 }
 ```
 
-## After Running
+### Field rules
 
-Read the screenshots and images to extract:
-
-- **Weight** (重量/毛重/净重/单件重量) — normalize to kg
-- **Dimensions** (尺寸/长宽高) — normalize to cm
-- **Volume** (体积/容积)
-- **Package info** (包装信息) — packaging type, box weight, box dimensions, units per box
-- **Piece weight/size** (商品件重尺) — per-piece logistics specs
-- **Variant-specific** weight/size if shown per SKU
-- **Shipping info** — method, cost, origin
-
-Output the extracted data as structured JSON.
+- **All weight values normalized to kg** (克÷1000, 斤×0.5)
+- **All dimension values normalized to cm** (mm÷10)
+- **`source`**: where on the page the data was found (商品属性 / 商品件重尺 / 包装信息 / 详情图片)
+- **`variants`**: only include if weight/size differs per SKU. Omit if all variants share the same specs.
+- **Omit fields that are `null`** — do not include fields where no data was found
+- **Do not guess.** Only include values actually visible on the page or in images.
 
 ## Rules
 
````
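The new schema and field rules in this hunk can be summarized as TypeScript types plus small helpers for the kg/cm normalization and null-omission rules. This is a minimal sketch, assuming a TypeScript consumer (the skill's scripts run under bun); the names `LogisticsOutput`, `toKg`, `toCm` and `omitNulls`, and the choice of which fields are optional, are hypothetical and not taken from the repository.

```typescript
// Sketch of the output shape defined in SKILL.md, plus helpers for the
// "normalize to kg / cm" and "omit null fields" rules. All names here are
// hypothetical; they are not exported by the 1688-logistics-scraper repo.

// Allowed values of `source`, taken from the field rules.
type Source = "商品属性" | "商品件重尺" | "包装信息" | "详情图片";

interface Measure {
  value: number;
  unit: "kg" | "m³"; // weights in kg, volume in m³ (as in the example)
  source?: Source; // variant entries in the example carry no source
}

interface Dimensions {
  length: number;
  width: number;
  height: number;
  unit: "cm";
  source?: Source;
}

interface Variant {
  name: string;
  weight?: Measure;
  dimensions?: Dimensions;
}

// Assumption: only offerId and url are always present; every other field
// is omitted when the page shows no data for it.
interface LogisticsOutput {
  offerId: string;
  url: string;
  title?: string;
  weight?: Measure;
  grossWeight?: Measure;
  netWeight?: Measure;
  dimensions?: Dimensions;
  volume?: Measure;
  packageWeight?: Measure;
  packageDimensions?: Dimensions;
  unitsPerPackage?: number;
  variants?: Variant[]; // only when weight/size differs per SKU
}

type WeightUnit = "kg" | "g" | "克" | "斤";
type LengthUnit = "cm" | "mm";

// 克 (gram) ÷ 1000 and 斤 × 0.5, per the field rules; kg passes through.
function toKg(value: number, unit: WeightUnit): number {
  if (unit === "g" || unit === "克") return value / 1000;
  if (unit === "斤") return value * 0.5;
  return value;
}

// mm ÷ 10; cm passes through.
function toCm(value: number, unit: LengthUnit): number {
  return unit === "mm" ? value / 10 : value;
}

// Recursively drops keys whose value is null or undefined, so missing data
// is left out of the JSON instead of being emitted as null.
function omitNulls<T>(input: T): T {
  if (Array.isArray(input)) {
    return input.map((item) => omitNulls(item)) as unknown as T;
  }
  if (input !== null && typeof input === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, val] of Object.entries(input)) {
      if (val !== null && val !== undefined) out[key] = omitNulls(val);
    }
    return out as unknown as T;
  }
  return input;
}
```

Keeping every field except `offerId` and `url` optional lets a consumer test for presence rather than for `null`, which is what the omit-nulls rule implies. The example data is also internally consistent: 10 × 8 × 1.8 cm = 144 cm³, and since 1 m³ = 10⁶ cm³ that equals the 0.000144 m³ given under `volume`.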
````diff
@@ -68,3 +94,4 @@ Output the extracted data as structured JSON.
 2. No retries. If it fails, report as-is.
 3. Read ALL screenshots — logistics data can appear anywhere on the page.
 4. Read detail images too — weight/size is often baked into product photos.
+5. Output ONLY the structured JSON above. No extra commentary.
````
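The new rule 5 lets a downstream caller treat any reply that is not pure JSON as a failure. Below is a minimal consumer-side sketch, assuming the model's reply arrives as a plain string. `parseSkillOutput` and `checkUnit` are hypothetical names, and the specific checks (required `offerId` and `url`, kg and cm units, no explicit `null`) are a reading of the field rules, not a validator the repository ships.

```typescript
// Hypothetical consumer-side check for rule 5 and the field rules.
// Not part of the skill; names and error messages are illustrative.

interface ValidationResult {
  ok: boolean;
  errors: string[];
}

type UnitField = { unit?: unknown } | null | undefined;

const WEIGHT_KEYS = ["weight", "grossWeight", "netWeight", "packageWeight"];
const DIMENSION_KEYS = ["dimensions", "packageDimensions"];

// Flags a present field that is null or not in the expected unit.
// Absent (undefined) fields are fine: the skill omits fields with no data.
function checkUnit(obj: Record<string, unknown>, key: string, unit: string, errors: string[]): void {
  const field = obj[key] as UnitField;
  if (field === undefined) return;
  if (field === null) {
    errors.push(`${key} is null; omit it instead`);
  } else if (field.unit !== unit) {
    errors.push(`${key}.unit must be "${unit}"`);
  }
}

function parseSkillOutput(reply: string): ValidationResult {
  let data: unknown;
  try {
    // Any commentary before or after the JSON makes JSON.parse throw.
    data = JSON.parse(reply);
  } catch {
    return { ok: false, errors: ["reply is not pure JSON"] };
  }
  if (data === null || typeof data !== "object" || Array.isArray(data)) {
    return { ok: false, errors: ["top-level value must be a JSON object"] };
  }
  const obj = data as Record<string, unknown>;
  const errors: string[] = [];

  for (const key of ["offerId", "url"]) {
    if (typeof obj[key] !== "string") errors.push(`${key} must be a string`);
  }
  for (const key of WEIGHT_KEYS) checkUnit(obj, key, "kg", errors);
  for (const key of DIMENSION_KEYS) checkUnit(obj, key, "cm", errors);

  return { ok: errors.length === 0, errors };
}
```

Rejecting surrounding text outright keeps the check strict. A more lenient consumer could cut out the outermost `{...}` before parsing, at the cost of silently accepting replies that break rule 5.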