| name | cobol-modernization |
| description | This skill provides guidance for translating COBOL programs to modern languages (Python, Java, etc.) while preserving exact behavior. It should be used when tasks involve COBOL-to-modern-language migration, legacy code translation, fixed-width file format handling, or ensuring byte-level compatibility between source and target implementations. |
COBOL Modernization
Overview
This skill guides the translation of COBOL programs to modern languages while ensuring functional equivalence. COBOL modernization requires meticulous attention to data formats, file handling semantics, and numeric precision to achieve byte-for-byte output compatibility.
Core Principles
Complete Source Understanding First
Before writing any translation code:
- Read the entire COBOL source - Never work from truncated views. If output is truncated, request the complete file in chunks or use multiple reads with offsets.
- Map all data structures - Identify every WORKING-STORAGE, FILE SECTION, and LINKAGE SECTION variable with exact PICTURE clauses.
- Document file formats - Create explicit specifications for every file's record layout including field positions, lengths, and data types.
- Identify all I/O operations - Note READ, WRITE, REWRITE, DELETE operations and their conditions.
Fixed-Width Record Handling
COBOL programs typically use fixed-width record formats. Critical considerations:
- Record length calculation - Sum all field lengths from the FD (File Description) or record definitions
- Padding behavior - COBOL pads strings with spaces and numbers with zeros
- Numeric formatting - PICTURE clauses like
9(10)mean 10-digit zero-padded numbers - Sign handling - Signed numbers may use trailing sign conventions
Verification Strategy
Implement a systematic comparison approach:
- Create baseline outputs - Run the original COBOL program to establish expected outputs
- Test incrementally - Verify each logical section before moving to the next
- Byte-level comparison - Use
difforcmpto verify exact output match, not just logical equivalence - Preserve test artifacts - Keep backup copies of all data files until verification is complete
Translation Workflow
Phase 1: Analysis
- Read and document all COBOL source files completely
- Map every data structure with exact field specifications:
Field Name | PICTURE | Length | Position | Type ACCOUNT-ID | X(5) | 5 | 1-5 | Alphanumeric BALANCE | 9(10) | 10 | 6-15 | Numeric - Document all file record layouts with byte positions
- Identify validation logic and business rules
- Note any COBOL-specific behaviors (REWRITE semantics, file status codes)
Phase 2: Implementation
- Implement data parsing functions that match exact COBOL field positions
- Use string slicing based on documented positions, not assumptions
- Implement numeric formatting to match PICTURE clauses exactly
- Handle file operations with equivalent semantics (especially REWRITE vs WRITE)
Phase 3: Verification
- Back up all original data files before any testing
- Run COBOL program and capture outputs
- Restore data files to original state
- Run translated program and capture outputs
- Compare outputs byte-by-byte
- Test multiple scenarios including:
- Happy path (valid transactions)
- Invalid inputs (missing records, bad references)
- Edge cases (zero values, maximum lengths)
- Boundary conditions (empty files, single records)
Common Pitfalls
Incomplete Source Reading
Problem: Working from truncated code views leads to missing logic. Prevention: Always verify the complete source is available. If truncated, use offset reading or request full file.
Record Length Mismatch
Problem: Input files don't match expected record lengths. Prevention: Calculate expected length from COBOL definitions. If files differ, investigate whether COBOL handles short records with padding.
Numeric Field Formatting
Problem: Python's default number-to-string conversion doesn't match COBOL's zero-padded format.
Prevention: Use explicit formatting: f"{value:010d}" for PIC 9(10).
REWRITE vs WRITE Semantics
Problem: COBOL's REWRITE updates a record in place; Python file operations differ. Prevention: For indexed files, read entire file, modify in memory, write complete file. Track record positions explicitly.
Premature Cleanup
Problem: Removing backup files before confirming task completion. Prevention: Keep all backups until explicit final verification succeeds.
Assuming Input Validity
Problem: Not testing with malformed or edge-case inputs. Prevention: Create explicit test cases for invalid inputs, boundary values, and empty files.
Verification Checklist
Before declaring translation complete:
- Complete COBOL source has been read (no truncation)
- All data structures documented with exact field positions
- All file formats documented with record layouts
- Translated code has been read back and verified complete
- COBOL baseline outputs captured for comparison
- Translated outputs match byte-for-byte
- Multiple test scenarios executed (valid, invalid, edge cases)
- Backup files preserved until final verification
Testing Script Pattern
Create a reusable comparison workflow:
# Backup original data
for f in *.DAT; do cp "$f" "${f}.orig"; done
# Run COBOL version
./run_cobol.sh
for f in *.DAT; do cp "$f" "${f%.DAT}_COBOL.DAT"; done
# Restore and run Python version
for f in *.DAT.orig; do cp "$f" "${f%.orig}"; done
python program.py
for f in *.DAT; do [[ ! "$f" =~ (COBOL|orig) ]] && cp "$f" "${f%.DAT}_PYTHON.DAT"; done
# Compare outputs
for f in *_COBOL.DAT; do
base="${f%_COBOL.DAT}"
diff -q "${base}_COBOL.DAT" "${base}_PYTHON.DAT" || echo "MISMATCH: $base"
done
Edge Cases to Test
- Insufficient balance - Buyer lacks funds for transaction
- Empty data files - How do both programs handle empty inputs?
- Malformed input - Non-numeric values, wrong field lengths
- Self-referential transactions - Buyer equals seller
- Multiple transactions - Batch processing if supported
- Maximum values - Largest values that fit in PICTURE clauses
- Missing references - Referenced records that don't exist