| name | korean-public-data-api |
| description | Extract API request/response schema from Korean Public Data Portal (data.go.kr) documentation pages and generate structured JSON representation |
Korean Public Data Portal API Schema Extractor
Extract API data structure from Korean Public Data Portal (data.go.kr) documentation pages and generate JSON schema.
Purpose
Parse HTML from Korean Public Data Portal API documentation pages to extract field specifications (field name, data type, description) and generate a structured JSON representation.
Input
User provides a URL to a Korean Public Data Portal API documentation page.
Example: https://www.data.go.kr/data/15058782/openapi.do
Task
Fetch HTML Content
- Use WebFetch tool with the provided URL
- Prompt WebFetch to extract API field specifications from sections like:
  - "출력 메시지 명세" (Output Message Specification)
  - "응답 메시지" (Response Message)
  - "요청 메시지" (Request Message)
  - Field tables with columns such as 항목명 (field name), 항목설명 (field description), 샘플데이터 (sample data)
Parse Field Information
- Extract for each field:
  - name: Field name (technical identifier)
  - type: Data type (string, number, integer, boolean, object, array)
  - description: Korean description
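The parsing step can be sketched with Python's standard `html.parser`. This is a minimal illustration, not a description of the portal's actual markup: it assumes the field table has a header row containing 항목명 and reads the 항목설명 and 샘플데이터 columns when present. `FieldTableParser` and `extract_fields` are hypothetical names.

```python
from html.parser import HTMLParser


class FieldTableParser(HTMLParser):
    """Collects every table row as a list of stripped cell strings."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text chunks of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None


def extract_fields(html):
    """Return one dict per data row that follows a header containing 항목명."""
    parser = FieldTableParser()
    parser.feed(html)
    fields, header = [], None
    for row in parser.rows:
        if "항목명" in row:
            header = row  # assumed marker of a field-specification table
            continue
        if header and len(row) == len(header):
            cells = dict(zip(header, row))
            fields.append({
                "name": cells["항목명"],
                "description": cells.get("항목설명", ""),  # kept verbatim
                "sample": cells.get("샘플데이터", ""),
            })
    return fields
```

Pages that label their columns differently would need the header keys adjusted; the type is filled in later by the inference step.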
Infer Data Types
- Use field names, descriptions, and sample data to infer types:
  - String: Text, codes, names, dates in string format
  - Number/Integer: Numeric values, counts, IDs that are numeric
  - Boolean: true/false indicators
  - Object: Nested structures (e.g., header, body)
  - Array: Lists of items
Handle Nested Structures
- Common public data portal response structure:
  response
  ├─ header (object)
  │  ├─ resultCode (string)
  │  └─ resultMsg (string)
  └─ body (object)
     ├─ items (array of objects)
     ├─ numOfRows (integer)
     ├─ pageNo (integer)
     └─ totalCount (integer)
- For nested objects, create recursive field definitions
- For arrays, specify itemType and nested fields
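One way to model recursive field definitions is a small dataclass whose children are more instances of itself. This is a sketch under assumptions: the class name `Field` and the attribute `item_type` are illustrative, and only the emitted keys (`name`, `type`, `description`, `itemType`, `fields`) come from the output format used in this document.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Field:
    """One schema entry; `fields` holds children for objects and arrays."""
    name: str
    type: str
    description: str = ""
    item_type: Optional[str] = None  # element type, used only for arrays
    fields: List["Field"] = field(default_factory=list)

    def to_dict(self):
        out = {
            "name": self.name,
            "type": self.type,
            "description": self.description,
        }
        if self.type == "array" and self.item_type:
            out["itemType"] = self.item_type
        if self.fields:
            # Recurse so nesting of any depth serializes the same way.
            out["fields"] = [child.to_dict() for child in self.fields]
        return out


# Usage: an array of objects with one extracted item field.
items = Field("items", "array", "데이터 목록", item_type="object",
              fields=[Field("hrName", "string")])
items_schema = items.to_dict()
```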
Generate JSON Schema
Output format:
{
"apiName": "API 이름",
"url": "원본 URL",
"extractedAt": "ISO 8601 timestamp",
"requestParams": [
{
"name": "param_name",
"type": "string",
"required": true,
"description": "파라미터 설명"
}
],
"responseSchema": {
"type": "object",
"fields": [
{
"name": "header",
"type": "object",
"description": "응답 헤더",
"fields": [
{
"name": "resultCode",
"type": "string",
"description": "결과 코드"
},
{
"name": "resultMsg",
"type": "string",
"description": "결과 메시지"
}
]
},
{
"name": "body",
"type": "object",
"description": "응답 본문",
"fields": [
{
"name": "items",
"type": "array",
"description": "데이터 목록",
"itemType": "object",
"fields": [
{
"name": "fieldName",
"type": "string",
"description": "필드 설명"
}
]
},
{
"name": "numOfRows",
"type": "integer",
"description": "한 페이지 결과 수"
},
{
"name": "pageNo",
"type": "integer",
"description": "페이지 번호"
},
{
"name": "totalCount",
"type": "integer",
"description": "전체 결과 수"
}
]
}
]
}
}
Implementation Steps
Use WebFetch to retrieve HTML and extract field information
- Prompt should ask for field tables, request/response specifications
Process the extracted data
- Organize fields into logical groups (request params, response fields)
- Infer data types based on:
  - Field naming conventions (e.g., "Cnt" → integer, "Name" → string, "No" → string unless the values are clearly numeric, as in pageNo)
  - Korean descriptions (e.g., "코드" → string, "개수" → integer, "여부" → boolean)
  - Sample data if available
Build nested structure
- Default assumption: Public data portal APIs use header/body structure
- Items are typically in body.items as array
- Pagination fields (numOfRows, pageNo, totalCount) in body
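The default envelope can be produced by a single helper that wraps whatever item fields were extracted. A minimal sketch, assuming the header/body layout described above really applies to the API in question; `build_response_schema` is a hypothetical name and the Korean descriptions are the ones used in the output format.

```python
def build_response_schema(item_fields):
    """Wrap extracted item fields in the assumed header/body envelope."""

    def leaf(name, type_, description):
        return {"name": name, "type": type_, "description": description}

    return {
        "type": "object",
        "fields": [
            {**leaf("header", "object", "응답 헤더"), "fields": [
                leaf("resultCode", "string", "결과 코드"),
                leaf("resultMsg", "string", "결과 메시지"),
            ]},
            {**leaf("body", "object", "응답 본문"), "fields": [
                # Extracted fields live under body.items as an array of objects.
                {**leaf("items", "array", "데이터 목록"),
                 "itemType": "object", "fields": list(item_fields)},
                leaf("numOfRows", "integer", "한 페이지 결과 수"),
                leaf("pageNo", "integer", "페이지 번호"),
                leaf("totalCount", "integer", "전체 결과 수"),
            ]},
        ],
    }
```

If the documentation page shows a different envelope, the extracted structure should take precedence over this default.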
Format as JSON
- Use proper indentation
- Include metadata (API name, URL, extraction timestamp)
- Present the complete schema to the user
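The formatting step might look like the following sketch. It assumes the timestamp is recorded in UTC (the output format only asks for ISO 8601) and uses `ensure_ascii=False` so Korean descriptions stay readable instead of being escaped; `render_schema` is a hypothetical name.

```python
import json
from datetime import datetime, timezone


def render_schema(api_name, url, request_params, response_schema):
    """Assemble the top-level document and serialize it as indented JSON."""
    document = {
        "apiName": api_name,
        "url": url,
        # Assumption: UTC, second precision, e.g. 2024-01-31T09:00:00+00:00
        "extractedAt": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "requestParams": request_params,
        "responseSchema": response_schema,
    }
    # ensure_ascii=False preserves Korean text exactly as extracted.
    return json.dumps(document, ensure_ascii=False, indent=2)
```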
Error Handling
- If WebFetch fails or no fields are found, return this error:
{ "success": false, "error": "Unable to extract field specifications", "url": "provided URL" }
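As a small sketch of this guard: the error payload mirrors the object above, while `error_payload`, `schema_or_error`, and the `build` callback are hypothetical names introduced only for illustration.

```python
def error_payload(url, reason="Unable to extract field specifications"):
    """Build the error object returned when extraction yields nothing."""
    return {"success": False, "error": reason, "url": url}


def schema_or_error(url, fields, build):
    """Return build(fields), or the error payload when no fields were found."""
    if not fields:
        return error_payload(url)
    return build(fields)
```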
Type Inference Rules
- String: Default type, names, codes, dates (YYYYMMDD format), times
- Integer: Counts (Cnt suffix), numbers (No suffix when numeric), page numbers, totals
- Number: Decimals, rates, percentages
- Boolean: 여부 (yes/no indicators), flags
- Object: header, body, nested structures
- Array: items, lists (명단, 목록)
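These rules can be expressed as an ordered chain of checks. The sketch below is one possible ordering, not a fixed specification: it assumes description keywords outrank sample data, treats 8- or 14-digit samples as date strings, and only promotes a "No" field to integer when a sample is present and has no leading zero. `infer_type` is a hypothetical name.

```python
import re


def infer_type(name, description="", sample=""):
    """Guess a schema type from the field name, Korean description, and sample."""
    sample = sample.strip()
    if name in ("header", "body"):
        return "object"
    if name == "items" or any(k in description for k in ("명단", "목록")):
        return "array"
    if "여부" in description:
        return "boolean"
    if "코드" in description:
        return "string"
    if (name.endswith("Cnt") or "개수" in description
            or name in ("numOfRows", "pageNo", "totalCount")):
        return "integer"
    if re.fullmatch(r"\d{8}|\d{14}", sample):
        return "string"  # YYYYMMDD or YYYYMMDDhhmmss stays a string
    if re.fullmatch(r"-?\d+\.\d+", sample):
        return "number"
    if name.endswith("No") and re.fullmatch(r"[1-9]\d*", sample):
        return "integer"  # numeric "No" without leading zeros
    return "string"  # default type
```

With no sample data, a field such as hrNo falls through to the string default, which matches the example workflow.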
Example Workflow
User: "Extract schema from https://www.data.go.kr/data/15058782/openapi.do"
Agent:
- Fetches HTML with WebFetch
- Extracts fields: hrName (horse name), hrNo (horse number), trDate (training date), etc.
- Infers types: all are strings based on field descriptions
- Constructs JSON schema with:
  - Request params section
  - Response schema with assumed header/body structure
  - Items array containing the extracted fields
- Returns formatted JSON to user
Notes
- Always include extraction timestamp
- Preserve Korean descriptions exactly as found
- If uncertain about nesting, default to a flat structure under body.items
- Common patterns in public data APIs:
  - Pagination: numOfRows, pageNo, totalCount
  - Response codes: resultCode, resultMsg
  - Date formats: YYYYMMDD, YYYYMMDDhhmmss