From a780041840b05e084d8b423d7cb7a522441d36f5 Mon Sep 17 00:00:00 2001
From: ywkj
Date: Mon, 30 Mar 2026 12:18:15 +0800
Subject: [PATCH] feat: define structured JSON output schema for API consumption

SKILL.md now specifies exact JSON structure the model must output after
reading screenshots. Weight in kg, dimensions in cm, omit nulls.
Ready for downstream API integration.

Co-Authored-By: Claude Opus 4.6
---
 SKILL.md | 101 +++++++++++++++++++++++++++++++++++--------------------
 1 file changed, 64 insertions(+), 37 deletions(-)

diff --git a/SKILL.md b/SKILL.md
index 02c3dcf..bd89c67 100644
--- a/SKILL.md
+++ b/SKILL.md
@@ -1,11 +1,11 @@
 ---
 name: 1688-logistics-scraper
-description: "Scrape 1688 product pages via Chrome, capture full-page screenshots and detail images for vision-based extraction of weight/size/logistics data. Use when the user provides a 1688 product URL and needs logistics specs."
+description: "Scrape 1688 product pages via Chrome, capture full-page screenshots and detail images for vision-based extraction of weight/size data. Use when the user provides a 1688 product URL and needs logistics specs."
 ---
 
 # 1688 Logistics Scraper
 
-Capture 1688 product pages for vision-based extraction of weight, size, and logistics data.
+Capture 1688 product pages and extract weight/size data via vision.
 
 ## Run
 
@@ -13,54 +13,80 @@ Capture 1688 product pages for vision-based extraction of weight, size, and logi
 bun scripts/run.ts scrape [--dry-run] [--port=9222]
 ```
 
-### Examples
-
-```bash
-bun scripts/run.ts scrape 'https://detail.1688.com/offer/852504650877.html'
-bun scripts/run.ts scrape 'https://detail.1688.com/offer/852504650877.html' --dry-run
-```
-
 ## What It Does
 
 1. Opens the 1688 product URL in the browser (default port 18800)
-2. Scrolls through the entire page, capturing **full-page screenshots** section by section
-3. Downloads all **product detail images** (large images only, skips icons)
-4. Saves everything to `/tmp/1688-logistics//`
+2. Scrolls through the entire page, capturing full-page screenshots
+3. Downloads all product detail images
+4. Saves to `/tmp/1688-logistics//`
 
-**No DOM parsing or regex.** The model reads the screenshots and images directly to extract logistics data.
+## After Running — MUST follow
 
-## Output
+Read ALL screenshots and detail images, then output the following JSON structure. This is the final output for API consumption.
 
 ```json
 {
-  "status": "success",
-  "url": "https://detail.1688.com/offer/...",
-  "offerId": "852504650877",
-  "screenshots": [
-    "/tmp/1688-logistics/852504650877/screenshots/page_001.png",
-    "/tmp/1688-logistics/852504650877/screenshots/page_002.png",
-    "/tmp/1688-logistics/852504650877/screenshots/page_003.png"
-  ],
-  "detailImages": [
-    "/tmp/1688-logistics/852504650877/images/img_001.jpg",
-    "/tmp/1688-logistics/852504650877/images/img_002.jpg"
+  "offerId": "966107271425",
+  "url": "https://detail.1688.com/offer/966107271425.html",
+  "title": "商品标题",
+  "weight": {
+    "value": 0.15,
+    "unit": "kg",
+    "source": "商品属性"
+  },
+  "grossWeight": {
+    "value": 0.2,
+    "unit": "kg",
+    "source": "商品件重尺"
+  },
+  "netWeight": {
+    "value": 0.15,
+    "unit": "kg",
+    "source": "商品属性"
+  },
+  "dimensions": {
+    "length": 10,
+    "width": 8,
+    "height": 1.8,
+    "unit": "cm",
+    "source": "商品属性"
+  },
+  "volume": {
+    "value": 0.000144,
+    "unit": "m³",
+    "source": "商品件重尺"
+  },
+  "packageWeight": {
+    "value": 5.0,
+    "unit": "kg",
+    "source": "包装信息"
+  },
+  "packageDimensions": {
+    "length": 40,
+    "width": 30,
+    "height": 20,
+    "unit": "cm",
+    "source": "包装信息"
+  },
+  "unitsPerPackage": 50,
+  "variants": [
+    {
+      "name": "12支装",
+      "weight": { "value": 0.12, "unit": "kg" },
+      "dimensions": { "length": 9.5, "width": 6, "height": 2.2, "unit": "cm" }
+    }
   ]
 }
 ```
 
-## After Running
+### Field rules
 
-Read the screenshots and images to extract:
-
-- **Weight** (重量/毛重/净重/单件重量) — normalize to kg
-- **Dimensions** (尺寸/长宽高) — normalize to cm
-- **Volume** (体积/容积)
-- **Package info** (包装信息) — packaging type, box weight, box dimensions, units per box
-- **Piece weight/size** (商品件重尺) — per-piece logistics specs
-- **Variant-specific** weight/size if shown per SKU
-- **Shipping info** — method, cost, origin
-
-Output the extracted data as structured JSON.
+- **All weight values normalized to kg** (克÷1000, 斤×0.5)
+- **All dimension values normalized to cm** (mm÷10)
+- **`source`**: where on the page the data was found (商品属性 / 商品件重尺 / 包装信息 / 详情图片)
+- **`variants`**: only include if weight/size differs per SKU. Omit if all variants share the same specs.
+- **Omit fields that are `null`** — do not include fields where no data was found
+- **Do not guess.** Only include values actually visible on the page or in images.
 
 ## Rules
 
@@ -68,3 +94,4 @@ Output the extracted data as structured JSON.
 2. No retries. If it fails, report as-is.
 3. Read ALL screenshots — logistics data can appear anywhere on the page.
 4. Read detail images too — weight/size is often baked into product photos.
+5. Output ONLY the structured JSON above. No extra commentary.
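
As an illustration only, the `### Field rules` this patch adds to SKILL.md (weights to kg, dimensions to cm, null fields omitted, `variants` only when specs differ) could also be enforced by a small post-processing step in TypeScript, the language of `scripts/run.ts`. This is a sketch under stated assumptions, not part of the patch: the type names (`LogisticsRecord`, `Weight`, `Dimensions`, `Variant`), the helpers `toKg`, `toCm`, `omitNulls` and `finalize`, and the extra unit spellings it accepts (`千克`/`公斤` for kilograms, `厘米`/`毫米` for centimetres/millimetres) are all hypothetical.

```typescript
// Hypothetical post-processing for the SKILL.md "Field rules".
// None of these names exist in scripts/run.ts; they are illustrative only.

type Source = "商品属性" | "商品件重尺" | "包装信息" | "详情图片";

interface Weight { value: number; unit: "kg"; source?: Source }
interface Dimensions {
  length: number; width: number; height: number; unit: "cm"; source?: Source;
}
interface Volume { value: number; unit: "m³"; source?: Source }
interface Variant { name: string; weight?: Weight; dimensions?: Dimensions }

// Mirrors the JSON example: everything except offerId/url is optional,
// because fields with no data are omitted rather than set to null.
interface LogisticsRecord {
  offerId: string;
  url: string;
  title?: string;
  weight?: Weight;
  grossWeight?: Weight;
  netWeight?: Weight;
  dimensions?: Dimensions;
  volume?: Volume;
  packageWeight?: Weight;
  packageDimensions?: Dimensions;
  unitsPerPackage?: number;
  variants?: Variant[];
}

// Weight rule: 克 (gram) ÷ 1000, 斤 (jin) × 0.5; kilograms pass through.
function toKg(value: number, unit: string): number {
  switch (unit) {
    case "kg": case "千克": case "公斤": return value;
    case "g": case "克": return value / 1000;
    case "斤": return value * 0.5;
    // "Do not guess": refuse units the rules do not cover.
    default: throw new Error(`unsupported weight unit: ${unit}`);
  }
}

// Dimension rule: mm ÷ 10; centimetres pass through.
function toCm(value: number, unit: string): number {
  switch (unit) {
    case "cm": case "厘米": return value;
    case "mm": case "毫米": return value / 10;
    default: throw new Error(`unsupported length unit: ${unit}`);
  }
}

// Recursively drop object keys whose value is null or undefined.
function omitNulls(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(omitNulls);
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, v] of Object.entries(value)) {
      if (v !== null && v !== undefined) out[key] = omitNulls(v);
    }
    return out;
  }
  return value;
}

// Strip nulls, then drop an empty `variants` list, treating it as
// "no per-SKU differences found".
function finalize(raw: Record<string, unknown>): LogisticsRecord {
  const cleaned = omitNulls(raw) as Record<string, unknown>;
  if (Array.isArray(cleaned.variants) && cleaned.variants.length === 0) {
    delete cleaned.variants;
  }
  return cleaned as unknown as LogisticsRecord;
}

// Usage: raw readings taken from a screenshot, before normalization.
const record = finalize({
  offerId: "966107271425",
  url: "https://detail.1688.com/offer/966107271425.html",
  title: null,
  weight: { value: toKg(150, "克"), unit: "kg", source: "商品属性" },
  dimensions: {
    length: toCm(100, "mm"), width: toCm(80, "mm"), height: toCm(18, "mm"),
    unit: "cm", source: "商品属性",
  },
  volume: null,
  variants: [],
});
// title, volume and the empty variants list are absent from the output.
console.log(JSON.stringify(record, null, 2));
```

Throwing on an unrecognized unit, instead of passing the raw number through, is one way to honor the "Do not guess" rule; a real pipeline might omit the field instead. The schema example is internally consistent (10 × 8 × 1.8 cm = 144 cm³ = 0.000144 m³), so a similar helper could cross-check `volume` against `dimensions`, although the patch does not ask for that.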