1688-logistics-scraper/SKILL.md

name: 1688-logistics-scraper
description: Extract product weight/size/logistics data from 1688 product pages via Chrome browser, output structured JSON. Use when the user provides a 1688 product URL and needs logistics specs.

1688 Logistics Scraper

Extract product weight, size, and logistics data from 1688 product pages.

Run

bun scripts/run.ts scrape <url> [--dry-run]

Examples

bun scripts/run.ts scrape 'https://detail.1688.com/offer/852504650877.html'
bun scripts/run.ts scrape 'https://detail.1688.com/offer/852504650877.html' --dry-run
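
The numeric part of URLs like these is the offer ID, which also names the image directory under /tmp/1688-logistics/ described later in this document. As a minimal sketch, assuming only the detail.1688.com/offer/<id>.html URL shape shown in the examples (the helper names `offerIdFromUrl` and `imageDir` are hypothetical, and scripts/run.ts may derive the ID differently):

```typescript
// Hypothetical helpers for illustration; not taken from scripts/run.ts.

// Returns the numeric offer ID, or null when the URL does not have the
// https://detail.1688.com/offer/<id>.html shape used in the examples.
function offerIdFromUrl(url: string): string | null {
  let parsed: URL;
  try {
    parsed = new URL(url);
  } catch {
    return null; // not a valid absolute URL
  }
  // Assumption: only the desktop detail host is accepted.
  if (parsed.hostname !== "detail.1688.com") return null;
  const m = /^\/offer\/(\d+)\.html$/.exec(parsed.pathname);
  return m?.[1] ?? null;
}

// Directory where the detail images for one offer are stored.
function imageDir(offerId: string): string {
  return `/tmp/1688-logistics/${offerId}/`;
}

const id = offerIdFromUrl("https://detail.1688.com/offer/852504650877.html");
console.log(id, id === null ? null : imageDir(id));
```

Matching on the parsed pathname rather than the raw string means a query string appended to the URL does not affect the result.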

What It Does

  1. Opens the 1688 product URL in an already-running Chrome browser (port 18800)
  2. Extracts weight/size data from wherever it appears on the page: product attributes, variant specs, the 包装信息 (packaging info) section, the 商品件重尺 (piece weight and size) table, and the logistics section
  3. Downloads the detail images (商品详情图片) for analysis, because weight/size data is often present only in images
  4. Outputs structured JSON

Where To Look For Data

Weight/size data on 1688 pages can appear in several places. Check all of them before reporting a value as missing:

  1. Product attributes (商品属性 / 商品参数): key-value table, most reliable
  2. 商品件重尺 (piece weight and size) table: dedicated weight/dimensions/volume table for logistics
  3. 包装信息 (packaging info) section: packaging type, box weight, box dimensions, units per box
  4. Variant/SKU specs: per-variant weight or size
  5. Logistics section: shipping weight, volume, freight info
  6. Detail images: downloaded to /tmp/1688-logistics/<offer-id>/. Read them to find weight/size text embedded in the images
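
One way to combine the text sources above is to take the first one that yields a value and record where it came from. The sketch below is an assumption, not the script's confirmed logic: the document only calls product attributes "most reliable", so treating the list order as a priority order is a guess, and the labels `variants` and `logistics` as well as the name `firstFound` are hypothetical (only `attributes`, `pieceWeightSize`, and `packageInfo` appear as `source` values in the sample output).

```typescript
// Hypothetical sketch; the real lookup order in scripts/run.ts is not documented.
type Found<T> = { value: T; source: string };

// Text sources in the order they are listed above (assumed priority order).
const TEXT_SOURCES = [
  "attributes",
  "pieceWeightSize",
  "packageInfo",
  "variants",
  "logistics",
] as const;
type TextSource = (typeof TEXT_SOURCES)[number];

// Returns the first non-null candidate together with its source label,
// or null when no text source produced a value.
function firstFound<T>(
  candidates: Partial<Record<TextSource, T | null>>,
): Found<T> | null {
  for (const source of TEXT_SOURCES) {
    const value = candidates[source];
    if (value !== null && value !== undefined) return { value, source };
  }
  return null; // nothing in text; the detail images are the remaining place to look
}

console.log(firstFound({ attributes: null, packageInfo: "2kg", logistics: "3kg" }));
```

Keeping the source label next to the value matches the `source` field carried by each measurement in the output.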

Output

{
  "status": "success",
  "url": "https://detail.1688.com/offer/...",
  "product": {
    "title": "产品标题",
    "logistics": {
      "weight": { "value": 0.5, "unit": "kg", "source": "attributes" },
      "dimensions": { "length": 30, "width": 20, "height": 10, "unit": "cm", "source": "attributes" },
      "grossWeight": null,
      "netWeight": null,
      "packageWeight": { "value": 2.0, "unit": "kg", "source": "packageInfo" },
      "volume": null,
      "shippingMethod": null,
      "shippingCost": null,
      "origin": null
    },
    "variants": [
      { "name": "颜色: 红色", "weight": null, "dimensions": null }
    ],
    "packageInfo": {
      "packagingType": "纸箱",
      "packagingWeight": { "value": 2.0, "unit": "kg", "source": "packageInfo" },
      "packagingDimensions": { "length": 40, "width": 30, "height": 20, "unit": "cm", "source": "packageInfo" },
      "unitsPerPackage": 50,
      "raw": { "包装方式": "纸箱", "箱规": "40*30*20cm", "装箱数": "50" }
    },
    "pieceWeightSize": {
      "weight": { "value": 0.5, "unit": "kg", "source": "pieceWeightSize" },
      "dimensions": { "length": 30, "width": 20, "height": 10, "unit": "cm", "source": "pieceWeightSize" },
      "volume": null,
      "raw": { "重量": "500g", "尺寸": "30*20*10cm" }
    }
  },
  "detailImages": ["/tmp/1688-logistics/852504650877/img_001.jpg"],
  "rawAttributes": { "重量": "0.5kg", "尺寸": "30*20*10cm" }
}

A null value means the field was not found in the page text. Check detailImages, since the data may appear only in the images.
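
A consumer of this JSON may want to know which logistics fields are still null before turning to the images. The following is a small sketch with a hypothetical helper name, `missingFields`, that works on the `product.logistics` object from the sample above and assumes nothing about the script itself:

```typescript
// Hypothetical helper for consumers of the JSON; not part of scripts/run.ts.

// Lists the keys whose value is null, i.e. fields not found in the page text.
function missingFields(section: Record<string, unknown>): string[] {
  return Object.entries(section)
    .filter(([, value]) => value === null)
    .map(([key]) => key);
}

// The logistics object from the sample output above.
const logistics: Record<string, unknown> = {
  weight: { value: 0.5, unit: "kg", source: "attributes" },
  dimensions: { length: 30, width: 20, height: 10, unit: "cm", source: "attributes" },
  grossWeight: null,
  netWeight: null,
  packageWeight: { value: 2.0, unit: "kg", source: "packageInfo" },
  volume: null,
  shippingMethod: null,
  shippingCost: null,
  origin: null,
};

console.log(missingFields(logistics));
// prints the six null keys: grossWeight, netWeight, volume,
// shippingMethod, shippingCost, origin
```

A non-empty result is the cue to read the files listed in detailImages.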

Rules

  1. If the browser is not running, report the error. Do not try to launch it.
  2. Check all data sources before reporting null.
  3. Normalize units: grams (克) to kg, millimeters (毫米) to cm. Keep the original values in rawAttributes and the raw fields.
  4. Do not retry. If the scrape fails, report the failure as-is.
  5. Trust page content. Do not guess values.
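
The unit normalization in rule 3 can be sketched as follows. This is an illustration under stated assumptions, not the actual scripts/run.ts code: the function names `parseWeight` and `parseDimensions` are hypothetical, the accepted input shapes are limited to those in the sample output ("500g", "0.5kg", "30*20*10cm"), and unit spellings beyond 克 and 毫米 are assumed. Unrecognized input returns null rather than a guess, in line with rule 5.

```typescript
// Hypothetical helpers; not the actual scripts/run.ts implementation.
type Weight = { value: number; unit: "kg" };
type Dimensions = { length: number; width: number; height: number; unit: "cm" };

// Multipliers into the target units. Only 克→kg and 毫米→cm come from the
// rules above; the other spellings are assumed for illustration.
const TO_KG: Record<string, number> = { kg: 1, "千克": 1, g: 0.001, "克": 0.001 };
const TO_CM: Record<string, number> = { cm: 1, "厘米": 1, mm: 0.1, "毫米": 0.1 };

const WEIGHT_RE = /^(\d+(?:\.\d+)?)\s*([a-zA-Z\u4e00-\u9fff]+)$/;
const DIMS_RE =
  /^(\d+(?:\.\d+)?)\s*[*xX×]\s*(\d+(?:\.\d+)?)\s*[*xX×]\s*(\d+(?:\.\d+)?)\s*([a-zA-Z\u4e00-\u9fff]+)$/;

// Rounds away floating-point noise such as 300 * 0.1 = 30.000000000000004.
const round = (n: number): number => Number(n.toFixed(6));

// Parses strings like "500g" or "0.5kg" into kilograms.
function parseWeight(raw: string): Weight | null {
  const m = WEIGHT_RE.exec(raw.trim());
  if (!m) return null;
  const [, num = "", unit = ""] = m;
  const factor = TO_KG[unit.toLowerCase()];
  if (factor === undefined) return null; // unknown unit: do not guess
  return { value: round(Number(num) * factor), unit: "kg" };
}

// Parses strings like "30*20*10cm" or "300*200*100毫米" into centimeters.
function parseDimensions(raw: string): Dimensions | null {
  const m = DIMS_RE.exec(raw.trim());
  if (!m) return null;
  const [, l = "", w = "", h = "", unit = ""] = m;
  const factor = TO_CM[unit.toLowerCase()];
  if (factor === undefined) return null; // unknown unit: do not guess
  return {
    length: round(Number(l) * factor),
    width: round(Number(w) * factor),
    height: round(Number(h) * factor),
    unit: "cm",
  };
}

console.log(parseWeight("500g"), parseDimensions("300*200*100毫米"));
```

The caller would store the untouched input string alongside the normalized result, which is what the raw and rawAttributes fields in the output are for.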