diff --git a/SKILL.md b/SKILL.md
index 02c3dcf..bd89c67 100644
--- a/SKILL.md
+++ b/SKILL.md
@@ -1,11 +1,11 @@
 ---
 name: 1688-logistics-scraper
-description: "Scrape 1688 product pages via Chrome, capture full-page screenshots and detail images for vision-based extraction of weight/size/logistics data. Use when the user provides a 1688 product URL and needs logistics specs."
+description: "Scrape 1688 product pages via Chrome, capture full-page screenshots and detail images for vision-based extraction of weight/size data. Use when the user provides a 1688 product URL and needs logistics specs."
 ---
 
 # 1688 Logistics Scraper
 
-Capture 1688 product pages for vision-based extraction of weight, size, and logistics data.
+Capture 1688 product pages and extract weight/size data via vision.
 
 ## Run
 
@@ -13,54 +13,80 @@ Capture 1688 product pages for vision-based extraction of weight, size, and logi
 ```bash
 bun scripts/run.ts scrape <url> [--dry-run] [--port=9222]
 ```
 
-### Examples
-
-```bash
-bun scripts/run.ts scrape 'https://detail.1688.com/offer/852504650877.html'
-bun scripts/run.ts scrape 'https://detail.1688.com/offer/852504650877.html' --dry-run
-```
-
 ## What It Does
 
 1. Opens the 1688 product URL in the browser (default port 18800)
-2. Scrolls through the entire page, capturing **full-page screenshots** section by section
-3. Downloads all **product detail images** (large images only, skips icons)
-4. Saves everything to `/tmp/1688-logistics/<offerId>/`
+2. Scrolls through the entire page, capturing full-page screenshots
+3. Downloads all product detail images
+4. Saves to `/tmp/1688-logistics/<offerId>/`
 
-**No DOM parsing or regex.** The model reads the screenshots and images directly to extract logistics data.
+## After Running — MUST follow
 
-## Output
+Read ALL screenshots and detail images, then output the following JSON structure. This is the final output for API consumption.
 
 ```json
 {
-  "status": "success",
-  "url": "https://detail.1688.com/offer/...",
-  "offerId": "852504650877",
-  "screenshots": [
-    "/tmp/1688-logistics/852504650877/screenshots/page_001.png",
-    "/tmp/1688-logistics/852504650877/screenshots/page_002.png",
-    "/tmp/1688-logistics/852504650877/screenshots/page_003.png"
-  ],
-  "detailImages": [
-    "/tmp/1688-logistics/852504650877/images/img_001.jpg",
-    "/tmp/1688-logistics/852504650877/images/img_002.jpg"
+  "offerId": "966107271425",
+  "url": "https://detail.1688.com/offer/966107271425.html",
+  "title": "商品标题",
+  "weight": {
+    "value": 0.15,
+    "unit": "kg",
+    "source": "商品属性"
+  },
+  "grossWeight": {
+    "value": 0.2,
+    "unit": "kg",
+    "source": "商品件重尺"
+  },
+  "netWeight": {
+    "value": 0.15,
+    "unit": "kg",
+    "source": "商品属性"
+  },
+  "dimensions": {
+    "length": 10,
+    "width": 8,
+    "height": 1.8,
+    "unit": "cm",
+    "source": "商品属性"
+  },
+  "volume": {
+    "value": 0.000144,
+    "unit": "m³",
+    "source": "商品件重尺"
+  },
+  "packageWeight": {
+    "value": 5.0,
+    "unit": "kg",
+    "source": "包装信息"
+  },
+  "packageDimensions": {
+    "length": 40,
+    "width": 30,
+    "height": 20,
+    "unit": "cm",
+    "source": "包装信息"
+  },
+  "unitsPerPackage": 50,
+  "variants": [
+    {
+      "name": "12支装",
+      "weight": { "value": 0.12, "unit": "kg" },
+      "dimensions": { "length": 9.5, "width": 6, "height": 2.2, "unit": "cm" }
+    }
   ]
 }
 ```
 
-## After Running
+### Field rules
 
-Read the screenshots and images to extract:
-
-- **Weight** (重量/毛重/净重/单件重量) — normalize to kg
-- **Dimensions** (尺寸/长宽高) — normalize to cm
-- **Volume** (体积/容积)
-- **Package info** (包装信息) — packaging type, box weight, box dimensions, units per box
-- **Piece weight/size** (商品件重尺) — per-piece logistics specs
-- **Variant-specific** weight/size if shown per SKU
-- **Shipping info** — method, cost, origin
-
-Output the extracted data as structured JSON.
+- **All weight values normalized to kg** (克÷1000, 斤×0.5)
+- **All dimension values normalized to cm** (mm÷10)
+- **`source`**: where on the page the data was found (商品属性 / 商品件重尺 / 包装信息 / 详情图片)
+- **`variants`**: only include if weight/size differs per SKU. Omit if all variants share the same specs.
+- **Omit fields that are `null`** — do not include fields where no data was found
- **Do not guess.** Only include values actually visible on the page or in images.
 
 ## Rules
 
@@ -68,3 +94,4 @@ Output the extracted data as structured JSON.
 2. No retries. If it fails, report as-is.
 3. Read ALL screenshots — logistics data can appear anywhere on the page.
 4. Read detail images too — weight/size is often baked into product photos.
+5. Output ONLY the structured JSON above. No extra commentary.
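The unit conversions the new `### Field rules` section prescribes (克÷1000, 斤×0.5, mm÷10) can be sketched as TypeScript helpers. `toKg` and `toCm` are hypothetical names for illustration only; the skill itself extracts and normalizes these values via vision, not code:

```typescript
// Hypothetical helpers mirroring the Field rules arithmetic.
function toKg(value: number, unit: "kg" | "g" | "克" | "斤"): number {
  switch (unit) {
    case "kg":
      return value;
    case "g":
    case "克":
      return value / 1000; // 克 ÷ 1000
    case "斤":
      return value * 0.5; // 斤 × 0.5 (1 斤 = 0.5 kg)
  }
}

function toCm(value: number, unit: "cm" | "mm"): number {
  return unit === "mm" ? value / 10 : value; // mm ÷ 10
}

console.log(toKg(150, "克")); // 0.15
console.log(toKg(0.3, "斤")); // 0.15
console.log(toCm(18, "mm")); // 1.8
```

The example JSON's `volume` field follows the same arithmetic: 10 × 8 × 1.8 cm = 144 cm³ = 0.000144 m³.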