name	korean-public-data-api
description	Extract API request/response schema from Korean Public Data Portal (data.go.kr) documentation pages and generate structured JSON representation

Korean Public Data Portal API Schema Extractor

Extract API data structure from Korean Public Data Portal (data.go.kr) documentation pages and generate JSON schema.

Purpose

Parse HTML from Korean Public Data Portal API documentation pages to extract field specifications (field name, data type, description) and generate a structured JSON representation.

Input

User provides a URL to a Korean Public Data Portal API documentation page.

Example: https://www.data.go.kr/data/15058782/openapi.do

Task

Fetch HTML Content
- Use WebFetch tool with the provided URL
- Prompt WebFetch to extract API field specifications from sections like:
  - "출력 메시지 명세" (Output Message Specification)
  - "응답 메시지" (Response Message)
  - "요청 메시지" (Request Message)
  - Field tables with columns such as 항목명 (field name), 항목설명 (field description), and 샘플데이터 (sample data)
Parse Field Information
- Extract for each field:
  - name: Field name (technical identifier)
  - type: Data type (string, number, integer, boolean, object, array)
  - description: Korean description
Infer Data Types
- Use field names, descriptions, and sample data to infer types:
  - String: Text, codes, names, dates in string format
  - Number/Integer: Numeric values, counts, IDs that are numeric
  - Boolean: true/false indicators
  - Object: Nested structures (e.g., header, body)
  - Array: Lists of items
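
The fetch and parse steps above rely on WebFetch, but the table-reading part can be sketched offline. The following is a minimal sketch using only Python's standard `html.parser`; the `TableRows` and `extract_fields` names and the sample markup are assumptions made for illustration, and real data.go.kr pages may structure their tables differently.

```python
from html.parser import HTMLParser

# Column headers named in the Task section, mapped to output keys.
COLUMNS = {"항목명": "name", "항목설명": "description", "샘플데이터": "sample"}


class TableRows(HTMLParser):
    """Collect every <tr> as a list of stripped cell texts."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside <tr>
        self._cell = None  # text chunks of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("th", "td") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_fields(html):
    """Return one dict per data row, keyed by the known column headers."""
    parser = TableRows()
    parser.feed(html)
    if not parser.rows:
        return []
    header, *body = parser.rows
    fields = []
    for row in body:
        record = {COLUMNS[h]: cell for h, cell in zip(header, row) if h in COLUMNS}
        if record.get("name"):
            fields.append(record)
    return fields


# Hypothetical markup: the layout and sample values are illustrative only.
SAMPLE_HTML = """
<table>
  <tr><th>항목명</th><th>항목설명</th><th>샘플데이터</th></tr>
  <tr><td>numOfRows</td><td>한 페이지 결과 수</td><td>10</td></tr>
  <tr><td>totalCount</td><td>전체 결과 수</td><td>3</td></tr>
</table>
"""
fields = extract_fields(SAMPLE_HTML)
```

Rows without a value in the 항목명 column are skipped, so an empty result can be treated as "no fields found".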

Handle Nested Structures

Common public data portal response structure:

response
  ├─ header (object)
  │   ├─ resultCode (string)
  │   └─ resultMsg (string)
  └─ body (object)
      ├─ items (array of objects)
      ├─ numOfRows (integer)
      ├─ pageNo (integer)
      └─ totalCount (integer)

For nested objects, create recursive field definitions.
For arrays, specify itemType and the nested fields of each item.
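
One way to picture a recursive field definition is a small dataclass whose children are instances of the same class. This is a sketch only; `FieldDef` and `to_dict` are hypothetical names, not part of the skill itself.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FieldDef:
    name: str
    type: str
    description: str = ""
    item_type: Optional[str] = None  # only emitted when type == "array"
    fields: List["FieldDef"] = field(default_factory=list)

    def to_dict(self):
        """Serialize this field and, recursively, all nested fields."""
        out = {"name": self.name, "type": self.type, "description": self.description}
        if self.type == "array" and self.item_type:
            out["itemType"] = self.item_type
        if self.fields:
            out["fields"] = [child.to_dict() for child in self.fields]
        return out


header = FieldDef("header", "object", "응답 헤더", fields=[
    FieldDef("resultCode", "string", "결과 코드"),
    FieldDef("resultMsg", "string", "결과 메시지"),
])
items = FieldDef("items", "array", "데이터 목록", item_type="object", fields=[
    FieldDef("fieldName", "string", "필드 설명"),
])
```

Leaf fields carry no `fields` key at all, so object and array nodes are the only places where the structure recurses.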

Generate JSON Schema

Output format:

{
  "apiName": "API 이름",
  "url": "원본 URL",
  "extractedAt": "ISO 8601 timestamp",
  "requestParams": [
    {
      "name": "param_name",
      "type": "string",
      "required": true,
      "description": "파라미터 설명"
    }
  ],
  "responseSchema": {
    "type": "object",
    "fields": [
      {
        "name": "header",
        "type": "object",
        "description": "응답 헤더",
        "fields": [
          {
            "name": "resultCode",
            "type": "string",
            "description": "결과 코드"
          },
          {
            "name": "resultMsg",
            "type": "string",
            "description": "결과 메시지"
          }
        ]
      },
      {
        "name": "body",
        "type": "object",
        "description": "응답 본문",
        "fields": [
          {
            "name": "items",
            "type": "array",
            "description": "데이터 목록",
            "itemType": "object",
            "fields": [
              {
                "name": "fieldName",
                "type": "string",
                "description": "필드 설명"
              }
            ]
          },
          {
            "name": "numOfRows",
            "type": "integer",
            "description": "한 페이지 결과 수"
          },
          {
            "name": "pageNo",
            "type": "integer",
            "description": "페이지 번호"
          },
          {
            "name": "totalCount",
            "type": "integer",
            "description": "전체 결과 수"
          }
        ]
      }
    ]
  }
}
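
The output format above could be assembled along these lines. This is a hedged sketch: `build_schema` and `leaf` are hypothetical helper names, and the header/body wrapper is the assumed default structure rather than something read from the page.

```python
import json
from datetime import datetime, timezone


def leaf(name, type_, description):
    """A field with no nested children."""
    return {"name": name, "type": type_, "description": description}


def build_schema(api_name, url, request_params, item_fields):
    """Wrap extracted fields in the assumed header/body response structure."""
    return {
        "apiName": api_name,
        "url": url,
        # ISO 8601 timestamp in UTC, e.g. 2024-01-15T09:30:00+00:00
        "extractedAt": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "requestParams": request_params,
        "responseSchema": {
            "type": "object",
            "fields": [
                {"name": "header", "type": "object", "description": "응답 헤더",
                 "fields": [leaf("resultCode", "string", "결과 코드"),
                            leaf("resultMsg", "string", "결과 메시지")]},
                {"name": "body", "type": "object", "description": "응답 본문",
                 "fields": [
                     {"name": "items", "type": "array", "description": "데이터 목록",
                      "itemType": "object", "fields": item_fields},
                     leaf("numOfRows", "integer", "한 페이지 결과 수"),
                     leaf("pageNo", "integer", "페이지 번호"),
                     leaf("totalCount", "integer", "전체 결과 수"),
                 ]},
            ],
        },
    }


schema = build_schema(
    "API 이름",
    "https://www.data.go.kr/data/15058782/openapi.do",
    request_params=[],
    item_fields=[leaf("hrName", "string", "필드 설명")],
)
# ensure_ascii=False keeps Korean descriptions readable instead of \uXXXX escapes.
text = json.dumps(schema, ensure_ascii=False, indent=2)
```

Passing `ensure_ascii=False` matters here because the descriptions are meant to be preserved exactly as found.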

Implementation Steps

Use WebFetch to retrieve HTML and extract field information
- Prompt should ask for field tables, request/response specifications
Process the extracted data
- Organize fields into logical groups (request params, response fields)
- Infer data types based on:
  - Field naming conventions (e.g., "Cnt" → integer, "Name" → string, "No" → string unless the values are plain numbers, as with pageNo)
  - Korean descriptions (e.g., "코드" (code) → string, "개수" (count) → integer, "여부" (whether or not) → boolean)
  - Sample data if available
Build nested structure
- Default assumption: Public data portal APIs use header/body structure
- Items are typically in body.items as array
- Pagination fields (numOfRows, pageNo, totalCount) in body
Format as JSON
- Use proper indentation
- Include metadata (API name, URL, extraction timestamp)
- Present the complete schema to the user

Error Handling

If WebFetch fails or no fields are found, return an error object:

{
  "success": false,
  "error": "Unable to extract field specifications",
  "url": "provided URL"
}
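
A minimal sketch of that rule follows. The `fetch_fields` argument is a hypothetical callable standing in for WebFetch plus parsing, and the `(fields, error)` return pair is an assumption chosen for illustration.

```python
def error_response(url):
    """Build the error object shown above for the given URL."""
    return {
        "success": False,
        "error": "Unable to extract field specifications",
        "url": url,
    }


def extract_or_error(url, fetch_fields):
    """Return (fields, None) on success or (None, error_object) on failure."""
    try:
        fields = fetch_fields(url)
    except Exception:
        # Any fetch or parse failure is reported the same way.
        return None, error_response(url)
    if not fields:
        return None, error_response(url)
    return fields, None


def failing_fetch(url):
    raise OSError("network unreachable")


URL = "https://www.data.go.kr/data/15058782/openapi.do"
no_fields, fetch_error = extract_or_error(URL, failing_fetch)
_, empty_error = extract_or_error(URL, lambda url: [])
ok_fields, ok_error = extract_or_error(URL, lambda url: [{"name": "hrName"}])
```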

Type Inference Rules

String: Default type; names, codes, dates (YYYYMMDD format), times
Integer: Counts (Cnt suffix), No-suffixed fields whose values are plain numbers (e.g., pageNo; identifiers such as hrNo stay strings), page numbers, totals
Number: Decimals, rates, percentages
Boolean: 여부 (yes/no indicators), flags
Object: header, body, nested structures
Array: items, lists (명단 "roster", 목록 "list")
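
The rules above can be approximated as an ordered chain of checks. This is a sketch under stated assumptions, not a definitive implementation: `infer_type` is a hypothetical name, the check order is a design choice, and treating a leading zero as a sign that a "No" field is an identifier is an added assumption.

```python
import re

PAGINATION = {"numOfRows", "pageNo", "totalCount"}


def infer_type(name, description="", sample=""):
    """Guess a schema type from the field name, Korean description, and sample."""
    if name in ("header", "body"):
        return "object"
    if name == "items" or any(word in description for word in ("명단", "목록")):
        return "array"
    if "여부" in description:
        return "boolean"
    if "코드" in description:
        return "string"  # codes stay strings even when they look numeric
    if name in PAGINATION or name.endswith("Cnt") or "개수" in description:
        return "integer"
    # Assumption: a leading zero marks an identifier, which must stay a string.
    if name.endswith("No") and sample.isdigit() and not sample.startswith("0"):
        return "integer"
    if re.fullmatch(r"-?\d+\.\d+", sample):
        return "number"
    return "string"  # default, which also covers YYYYMMDD date samples
```

The sketch deliberately does not promote a field to integer on a numeric sample alone, because compact dates such as 20240115 would otherwise be misclassified.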

Example Workflow

User: "Extract schema from https://www.data.go.kr/data/15058782/openapi.do"

Agent:

Fetches HTML with WebFetch
Extracts fields: hrName (horse name), hrNo (horse number), trDate (training date), etc.
Infers types: all three are strings, based on the field descriptions
Constructs JSON schema with:
- Request params section
- Response schema with assumed header/body structure
- Items array containing the extracted fields
Returns formatted JSON to user

Notes

Always include extraction timestamp
Preserve Korean descriptions exactly as found
If uncertain about nesting, default to a flat structure under body.items
Common patterns in public data APIs:
- Pagination: numOfRows, pageNo, totalCount
- Response codes: resultCode, resultMsg
- Date formats: YYYYMMDD, YYYYMMDDhhmmss
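
Because both date formats are all digits, a sample value can be checked against them before any numeric inference is considered. The helper below is a hypothetical sketch built on the standard `datetime.strptime`; the name `compact_date_format` is an assumption.

```python
from datetime import datetime

# (strptime pattern, expected length, label used in the notes above)
DATE_FORMATS = (
    ("%Y%m%d", 8, "YYYYMMDD"),
    ("%Y%m%d%H%M%S", 14, "YYYYMMDDhhmmss"),
)


def compact_date_format(sample):
    """Return the matching format label, or None if the sample is not a date."""
    for pattern, length, label in DATE_FORMATS:
        if len(sample) != length or not sample.isdigit():
            continue
        try:
            datetime.strptime(sample, pattern)
        except ValueError:
            continue  # right shape, but not a real calendar date or time
        return label
    return None
```

A field whose sample matches either label would keep the string type, consistent with the type inference rules.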
