name

xlsx-editor

description

Specialized guidance for editing EXISTING Excel files with emphasis on preserving formulas, formatting, and structure across multiple tabs. Use this skill when modifying, updating, or adding data to existing .xlsx files where maintaining integrity is critical—particularly for multi-tab workbooks, complex formulas, formatted tables, and data that must maintain sort order and relationships across sheets.

Excel Editor - Editing Existing Files

Overview

This skill provides specialized workflows for editing existing Excel files while preserving their integrity. It addresses the unique challenges of modifying live workbooks: maintaining formula relationships across tabs, preserving formatting and structure, ensuring data insertion maintains sort order, and guaranteeing completeness of multi-tab updates.

When to Use This Skill

Use this skill when:

Modifying existing Excel files (not creating new ones)
Working with multi-tab workbooks where changes must cascade correctly
Adding data that needs to integrate with existing tables while maintaining sort order
Updating formulas that reference cells across multiple sheets
Preserving complex formatting, merged cells, or conditional formatting
Ensuring all related tables across tabs are updated completely

Critical Principles for Editing Existing Files

1. Always Analyze Before Editing

Never dive directly into making changes. First understand the file's structure, dependencies, and relationships.

2. Preserve, Don't Replace

The existing file has established patterns. Maintain its structure, formulas, and formatting conventions rather than imposing new ones.

3. Multi-Tab Completeness is Mandatory

If data appears in multiple tabs, ALL instances must be updated. Missing one tab is a critical failure.

4. Validate Exhaustively

After edits, verify formulas, check for blank rows, confirm sort order, and ensure multi-tab consistency.

Pre-Edit Analysis Workflow

MANDATORY: Always perform this analysis before making any edits.

from openpyxl import load_workbook
import pandas as pd

# Load file without modifying it
wb = load_workbook('existing.xlsx', data_only=False)

# 1. INVENTORY ALL SHEETS
print("Sheets in workbook:", wb.sheetnames)
for sheet_name in wb.sheetnames:
    sheet = wb[sheet_name]
    print(f"\n{sheet_name}:")
    print(f"  Dimensions: {sheet.dimensions}")
    print(f"  Max row: {sheet.max_row}, Max col: {sheet.max_column}")

# 2. IDENTIFY DATA STRUCTURES
# For each sheet, identify:
# - Table boundaries (first/last row and column)
# - Header rows
# - Sort keys (which columns determine order)
# - Merged cells
for sheet_name in wb.sheetnames:
    sheet = wb[sheet_name]
    print(f"\nMerged cells in {sheet_name}:", list(sheet.merged_cells))

# 3. MAP FORMULAS AND DEPENDENCIES
# Find all formulas and note cross-sheet references
for sheet_name in wb.sheetnames:
    sheet = wb[sheet_name]
    formulas = []
    for row in sheet.iter_rows():
        for cell in row:
            if cell.value and isinstance(cell.value, str) and cell.value.startswith('='):
                formulas.append({
                    'cell': cell.coordinate,
                    'formula': cell.value,
                    'cross_sheet': '!' in cell.value
                })
    if formulas:
        print(f"\n{sheet_name} has {len(formulas)} formulas")
        cross_sheet = [f for f in formulas if f['cross_sheet']]
        if cross_sheet:
            print(f"  {len(cross_sheet)} have cross-sheet references")
            for f in cross_sheet[:3]:  # Show first 3 examples
                print(f"    {f['cell']}: {f['formula']}")

# 4. UNDERSTAND SORT ORDER
# Read data to understand current ordering
for sheet_name in wb.sheetnames:
    df = pd.read_excel('existing.xlsx', sheet_name=sheet_name)
    if not df.empty:
        print(f"\n{sheet_name} current sort:")
        print(f"  First 5 rows of first column: {df.iloc[:5, 0].tolist()}")
        print(f"  Last 5 rows of first column: {df.iloc[-5:, 0].tolist()}")

Document findings before proceeding:

Which sheets contain related data?
What formulas reference other sheets?
What determines sort order?
Where are merged cells or complex formatting?
What is the established naming/structure pattern?

Core Editing Workflows

Workflow 1: Updating Cell Values While Preserving Formulas

Use case: Changing data values without breaking formula dependencies.

from openpyxl import load_workbook

# Load with formulas intact
wb = load_workbook('existing.xlsx', data_only=False)
sheet = wb['SheetName']

# CORRECT: Update value, preserve formula context
cell = sheet['A5']
if cell.value and isinstance(cell.value, str) and cell.value.startswith('='):
    print(f"WARNING: A5 contains formula: {cell.value}")
    print("Updating this will DELETE the formula. Proceed? (update dependent formulas instead)")
else:
    cell.value = new_value

# For cells with formulas, update the formula itself or its dependencies
sheet['B10'] = '=SUM(A1:A9)'  # Updates formula, preserving calculation logic

wb.save('existing.xlsx')

Critical checks:

Verify you're not overwriting formula cells with values
Check if other cells reference the cell you're updating
Test a few formulas to ensure they still work after the change

Workflow 2: Adding Rows While Maintaining Structure

Use case: Inserting new data into existing tables.

from openpyxl import load_workbook
import pandas as pd

# 1. UNDERSTAND CURRENT STRUCTURE
df = pd.read_excel('existing.xlsx', sheet_name='Data')
# Identify sort key(s)
# Example: sorted by date (column A), then by category (column B)

# 2. DETERMINE INSERTION POINT
# Find where new row belongs in sort order
new_data = {'Date': '2024-06-15', 'Category': 'Sales', 'Amount': 1500}

insertion_row = None
for idx, row in df.iterrows():
    if row['Date'] > new_data['Date']:
        insertion_row = idx + 2  # +2 because: +1 for header, +1 for 1-based indexing
        break
if insertion_row is None:
    insertion_row = len(df) + 2  # Append at end

# 3. INSERT ROW
wb = load_workbook('existing.xlsx')
sheet = wb['Data']
sheet.insert_rows(insertion_row)

# 4. POPULATE WITH CORRECT STRUCTURE
# CRITICAL: Copy formatting from adjacent row
source_row = insertion_row - 1 if insertion_row > 2 else insertion_row + 1
for col in range(1, sheet.max_column + 1):
    # Copy formatting
    source_cell = sheet.cell(source_row, col)
    target_cell = sheet.cell(insertion_row, col)
    if source_cell.has_style:
        target_cell.font = source_cell.font.copy()
        target_cell.border = source_cell.border.copy()
        target_cell.fill = source_cell.fill.copy()
        target_cell.number_format = source_cell.number_format
        target_cell.alignment = source_cell.alignment.copy()

# 5. SET VALUES
sheet.cell(insertion_row, 1, new_data['Date'])
sheet.cell(insertion_row, 2, new_data['Category'])
sheet.cell(insertion_row, 3, new_data['Amount'])

# 6. UPDATE FORMULAS THAT REFERENCE THIS RANGE
# If there are SUM formulas like =SUM(A2:A100), they may need adjustment
# Check all sheets for formulas referencing this sheet
for ws_name in wb.sheetnames:
    ws = wb[ws_name]
    for row in ws.iter_rows():
        for cell in row:
            if cell.value and isinstance(cell.value, str) and cell.value.startswith('='):
                if 'Data!' in cell.value:  # References the sheet we modified
                    print(f"REVIEW: {ws_name}!{cell.coordinate} formula may need adjustment: {cell.value}")

wb.save('existing.xlsx')

Critical checks:

New row is in correct sort position
Formatting matches surrounding rows
No blank rows introduced
Formulas referencing the range are still valid

Workflow 3: Adding Columns While Preserving Formulas

Use case: Adding new data columns to existing tables.

from openpyxl import load_workbook

wb = load_workbook('existing.xlsx')
sheet = wb['Data']

# 1. IDENTIFY INSERTION POINT
# Example: Insert after column C (before column D)
insert_col = 4  # Column D

# 2. INSERT COLUMN
sheet.insert_cols(insert_col)

# 3. COPY FORMATTING FROM ADJACENT COLUMN
source_col = insert_col - 1
for row_idx in range(1, sheet.max_row + 1):
    source_cell = sheet.cell(row_idx, source_col)
    target_cell = sheet.cell(row_idx, insert_col)
    if source_cell.has_style:
        target_cell.font = source_cell.font.copy()
        target_cell.border = source_cell.border.copy()
        target_cell.fill = source_cell.fill.copy()
        target_cell.number_format = source_cell.number_format
        target_cell.alignment = source_cell.alignment.copy()

# 4. SET HEADER
sheet.cell(1, insert_col, 'New Column Header')

# 5. ADD FORMULAS OR DATA
for row_idx in range(2, sheet.max_row + 1):
    # Example: Formula referencing other columns in same row
    sheet.cell(row_idx, insert_col, f'=B{row_idx}*C{row_idx}')

wb.save('existing.xlsx')

Critical checks:

Formulas in other columns still reference correct cells
Cross-sheet references are not broken
Column widths are appropriate

Workflow 4: Multi-Tab Data Updates (CRITICAL WORKFLOW)

Use case: Updating data that appears in multiple sheets (THE most common source of errors).

THE PROBLEM: When the same entity (customer, product, date, etc.) appears across multiple tabs, ALL instances must be updated. Missing even one tab creates data inconsistency.

THE SOLUTION: Systematic multi-tab update workflow.

from openpyxl import load_workbook
import pandas as pd

# STEP 1: IDENTIFY ALL OCCURRENCES
# Find which sheets contain the data to be updated
target_value = "Customer X"  # Example: updating customer name
occurrences = {}

wb = load_workbook('existing.xlsx', data_only=True)  # Read values
for sheet_name in wb.sheetnames:
    df = pd.read_excel('existing.xlsx', sheet_name=sheet_name)
    
    # Search for target value in all cells
    found_locations = []
    for col in df.columns:
        mask = df[col].astype(str).str.contains(target_value, case=False, na=False)
        if mask.any():
            locations = df[mask].index.tolist()
            found_locations.extend([(col, idx+2) for idx in locations])  # +2 for header and 1-based
    
    if found_locations:
        occurrences[sheet_name] = found_locations
        print(f"Found '{target_value}' in {sheet_name}: {len(found_locations)} locations")

if not occurrences:
    print(f"ERROR: '{target_value}' not found in any sheet!")
else:
    print(f"\nTotal sheets containing '{target_value}': {len(occurrences)}")

# STEP 2: UPDATE ALL OCCURRENCES
wb = load_workbook('existing.xlsx', data_only=False)  # Now load with formulas

new_value = "Customer X - Updated"
update_count = 0

for sheet_name, locations in occurrences.items():
    sheet = wb[sheet_name]
    for col_name, row_idx in locations:
        # Convert column name to index
        df = pd.read_excel('existing.xlsx', sheet_name=sheet_name, nrows=0)
        col_idx = list(df.columns).index(col_name) + 1
        
        cell = sheet.cell(row_idx, col_idx)
        
        # Check if it's a formula
        if cell.value and isinstance(cell.value, str) and cell.value.startswith('='):
            print(f"WARNING: {sheet_name}!{cell.coordinate} is a formula. Skipping.")
        else:
            cell.value = new_value
            update_count += 1
            print(f"Updated {sheet_name}!{cell.coordinate}")

print(f"\nTotal cells updated: {update_count}")

# STEP 3: VERIFY ALL SHEETS WERE UPDATED
print("\nVerifying updates...")
wb.save('existing_updated.xlsx')
wb_verify = load_workbook('existing_updated.xlsx', data_only=True)

for sheet_name in wb_verify.sheetnames:
    df = pd.read_excel('existing_updated.xlsx', sheet_name=sheet_name)
    old_count = df.astype(str).apply(lambda x: x.str.contains(target_value, case=False, na=False)).sum().sum()
    new_count = df.astype(str).apply(lambda x: x.str.contains(new_value, case=False, na=False)).sum().sum()
    
    if old_count > 0:
        print(f"❌ {sheet_name}: Still has {old_count} instances of old value!")
    else:
        print(f"✅ {sheet_name}: All instances updated (found {new_count} new values)")

Critical checks:

ALL sheets searched, not just obvious ones
Update count matches expected number of occurrences
Verification confirms zero instances of old value remain
No formulas were accidentally overwritten

Workflow 5: Cascading Formula Updates Across Tabs

Use case: Changing a formula that must be propagated to other sheets.

from openpyxl import load_workbook

wb = load_workbook('existing.xlsx')

# EXAMPLE: Update formula pattern across all sheets
# Old formula: =SUM(B2:B10)
# New formula: =SUMIF(B2:B10,">0")  # Only sum positive values

for sheet_name in wb.sheetnames:
    sheet = wb[sheet_name]
    
    changes_in_sheet = 0
    for row in sheet.iter_rows():
        for cell in row:
            if cell.value and isinstance(cell.value, str):
                if cell.value.startswith('=SUM(') and not 'SUMIF' in cell.value:
                    # Extract range from SUM formula
                    old_formula = cell.value
                    # Example: =SUM(B2:B10) -> =SUMIF(B2:B10,">0")
                    range_part = old_formula[5:-1]  # Extract "B2:B10"
                    new_formula = f'=SUMIF({range_part},">0")'
                    
                    cell.value = new_formula
                    changes_in_sheet += 1
                    print(f"{sheet_name}!{cell.coordinate}: {old_formula} -> {new_formula}")
    
    if changes_in_sheet > 0:
        print(f"  {sheet_name}: {changes_in_sheet} formulas updated")

wb.save('existing.xlsx')

Critical checks:

Formula pattern correctly identified across all sheets
New formula syntax is valid
All sheets were checked (not just the first few)
Edge cases handled (formulas with different ranges, nested functions)

Workflow 6: Handling Merged Cells

Use case: Working with tables that have merged cells (common in headers/formatting).

from openpyxl import load_workbook

wb = load_workbook('existing.xlsx')
sheet = wb['Data']

# 1. IDENTIFY MERGED CELLS
merged_ranges = list(sheet.merged_cells.ranges)
print(f"Found {len(merged_ranges)} merged cell ranges:")
for merged_range in merged_ranges:
    print(f"  {merged_range}")

# 2. WHEN INSERTING ROWS/COLUMNS NEAR MERGED CELLS
# Merged cells will NOT expand automatically. Must handle manually.

# Example: Inserting row in middle of merged region
insert_row = 5

# Check if insertion point is in a merged range
for merged_range in merged_ranges:
    if merged_range.min_row <= insert_row <= merged_range.max_row:
        print(f"WARNING: Row {insert_row} is within merged range {merged_range}")
        print("Unmerging, inserting, then re-merging with expanded range")
        
        # Unmerge
        sheet.unmerge_cells(str(merged_range))
        
        # Insert row
        sheet.insert_rows(insert_row)
        
        # Re-merge with expanded range
        new_range = f"{merged_range.min_col}{merged_range.min_row}:{merged_range.max_col}{merged_range.max_row+1}"
        sheet.merge_cells(new_range)

wb.save('existing.xlsx')

Critical checks:

Merged cells are preserved or appropriately expanded
Values in merged cells are maintained
Formatting of merged cells is preserved

Post-Edit Validation Checklist

MANDATORY: Complete this checklist after ANY edits.

1. Formula Integrity Check

# Run recalc.py to identify formula errors
import subprocess
import json

result = subprocess.run(['python', 'recalc.py', 'existing.xlsx'], 
                       capture_output=True, text=True)
recalc_result = json.loads(result.stdout)

if recalc_result['status'] == 'errors_found':
    print(f"❌ ERRORS FOUND: {recalc_result['total_errors']} formula errors")
    for error_type, details in recalc_result['error_summary'].items():
        print(f"\n{error_type}: {details['count']} occurrences")
        for location in details['locations'][:5]:  # Show first 5
            print(f"  - {location}")
else:
    print("✅ All formulas calculated successfully (zero errors)")

Zero formula errors (#REF!, #DIV/0!, #VALUE!, #N/A, #NAME?)
All formulas return expected values
Cross-sheet references still work

2. Structural Integrity Check

from openpyxl import load_workbook
import pandas as pd

wb = load_workbook('existing.xlsx')

# Check for blank rows in data tables
for sheet_name in wb.sheetnames:
    df = pd.read_excel('existing.xlsx', sheet_name=sheet_name)
    
    # Identify blank rows (all NaN)
    blank_rows = df[df.isnull().all(axis=1)]
    if not blank_rows.empty and len(blank_rows) > 0:
        print(f"⚠️  {sheet_name}: Found {len(blank_rows)} blank rows at indices: {blank_rows.index.tolist()}")
    
    # Check for unexpected blank cells in key columns (first column)
    if not df.empty and len(df.columns) > 0:
        blank_cells = df[df.iloc[:, 0].isnull()]
        if not blank_cells.empty and len(blank_cells) > 0:
            print(f"⚠️  {sheet_name}: First column has {len(blank_cells)} blank cells")

No unexpected blank rows in data tables
No blank cells in key identifier columns
Table boundaries are clean (no trailing empty rows/columns)

3. Sort Order Verification

# Verify data maintains expected sort order
df = pd.read_excel('existing.xlsx', sheet_name='Data')

# Example: Should be sorted by Date ascending, then Category ascending
expected_sort_cols = ['Date', 'Category']

# Check if sorted
is_sorted = df[expected_sort_cols].equals(df[expected_sort_cols].sort_values(expected_sort_cols))

if is_sorted:
    print("✅ Data is in correct sort order")
else:
    print("❌ Data is NOT sorted correctly")
    print("First 10 rows of sort columns:")
    print(df[expected_sort_cols].head(10))

Data maintains expected sort order
New rows inserted at correct position
Sort keys (if any) are consistent

4. Multi-Tab Consistency Check

# Verify related data is consistent across tabs

# Example: Check that customer names are consistent across Summary and Detail sheets
df_summary = pd.read_excel('existing.xlsx', sheet_name='Summary')
df_detail = pd.read_excel('existing.xlsx', sheet_name='Detail')

customers_summary = set(df_summary['Customer'].dropna().unique())
customers_detail = set(df_detail['Customer'].dropna().unique())

in_summary_not_detail = customers_summary - customers_detail
in_detail_not_summary = customers_detail - customers_summary

if in_summary_not_detail:
    print(f"⚠️  Customers in Summary but not Detail: {in_summary_not_detail}")
if in_detail_not_summary:
    print(f"⚠️  Customers in Detail but not Summary: {in_detail_not_summary}")

if not in_summary_not_detail and not in_detail_not_summary:
    print("✅ Customer lists are consistent across tabs")

Related data exists in ALL expected sheets
Counts/totals are consistent across sheets
No orphaned data (exists in one sheet but not related sheets)

5. Formatting Preservation Check

# Visual inspection is often needed for formatting
# But can programmatically check some basics

from openpyxl import load_workbook

wb = load_workbook('existing.xlsx')
sheet = wb['Data']

# Check that new rows have formatting
for row_idx in [5, 10, 15]:  # Example rows you inserted
    cell = sheet.cell(row_idx, 1)
    has_font = cell.font is not None and cell.font != sheet.cell(1, 1).font
    has_fill = cell.fill is not None and cell.fill.start_color.index != '00000000'
    
    print(f"Row {row_idx}: Font={has_font}, Fill={has_fill}")

New rows/columns match formatting of surrounding cells
Borders are continuous (no breaks in table borders)
Merged cells are intact
Conditional formatting still applies correctly

Common Pitfalls and How to Avoid Them

Pitfall 1: Incomplete Multi-Tab Updates

Problem: Updating data in Sheet1 but forgetting it also appears in Sheet2 and Sheet3.

Solution: Always search ALL sheets before updating. Use the Multi-Tab Data Updates workflow.

Prevention:

# Before ANY update, run this search
search_term = "value_to_update"
for sheet_name in wb.sheetnames:
    df = pd.read_excel('file.xlsx', sheet_name=sheet_name)
    found = df.astype(str).apply(lambda x: x.str.contains(search_term, case=False, na=False)).any().any()
    if found:
        print(f"'{search_term}' found in {sheet_name}")

Pitfall 2: Breaking Formulas When Inserting Rows

Problem: Inserting a row causes formulas like =SUM(A2:A10) to break or not include the new row.

Solution: After inserting rows, check if range-based formulas need adjustment.

Prevention:

Insert rows WITHIN existing ranges (not before or after)
Use entire column references when possible: =SUM(A:A) instead of =SUM(A2:A10)
After insertion, verify formula ranges programmatically

Pitfall 3: Inserting Data Out of Sort Order

Problem: New row appears at the end instead of in alphabetical/chronological order.

Solution: Always determine the correct insertion point based on sort keys BEFORE inserting.

Prevention:

# Determine insertion point based on sort logic
insertion_row = None
for idx, row in df.iterrows():
    if should_come_before(new_data, row):  # Define your comparison logic
        insertion_row = idx + 2
        break

Pitfall 4: Introducing Blank Rows

Problem: Editing operations create unexpected blank rows in tables.

Solution: Never use append() for mid-table insertions. Always use insert_rows() with calculated position.

Prevention: Always run blank row check in post-edit validation.

Pitfall 5: Overwriting Formulas with Values

Problem: Updating a cell that contains a formula, replacing it with a hardcoded value.

Solution: Always check if cell contains a formula before updating.

Prevention:

if cell.value and isinstance(cell.value, str) and cell.value.startswith('='):
    raise ValueError(f"{cell.coordinate} contains formula. Cannot overwrite with value.")

Pitfall 6: Not Recalculating After Formula Changes

Problem: Formulas are updated but values shown are stale (from before the edit).

Solution: ALWAYS run recalc.py after any formula modifications.

Prevention: Make recalc.py part of your standard workflow. Never skip it.

Pitfall 7: Losing Formatting on New Rows/Columns

Problem: New rows have default formatting instead of matching the table style.

Solution: Always copy formatting from adjacent rows/columns.

Prevention: Use the formatting copy code from Workflow 2 and 3 above.

Summary: Editing Workflow Template

For any editing task, follow this sequence:

Analyze (Pre-Edit Analysis Workflow)
- Map all sheets
- Identify dependencies
- Document structure
Plan
- Determine which sheets need changes
- Identify formula impacts
- Calculate insertion points
Execute
- Use appropriate workflow
- Update ALL relevant sheets
- Preserve formatting
Validate (Post-Edit Validation Checklist)
- Check formulas (zero errors)
- Verify structure (no blank rows)
- Confirm sort order
- Ensure multi-tab consistency
- Validate formatting
Recalculate
- Run recalc.py
- Fix any errors found
- Run recalc.py again until zero errors

Never skip steps. Each step catches different classes of errors.

xlsx-editor

Install Skill

SKILL.md

Excel Editor - Editing Existing Files

Overview

When to Use This Skill

Critical Principles for Editing Existing Files

1. Always Analyze Before Editing

2. Preserve, Don't Replace

3. Multi-Tab Completeness is Mandatory

4. Validate Exhaustively

Pre-Edit Analysis Workflow

Core Editing Workflows

Workflow 1: Updating Cell Values While Preserving Formulas

Workflow 2: Adding Rows While Maintaining Structure

Workflow 3: Adding Columns While Preserving Formulas

Workflow 4: Multi-Tab Data Updates (CRITICAL WORKFLOW)

Workflow 5: Cascading Formula Updates Across Tabs

Workflow 6: Handling Merged Cells

Post-Edit Validation Checklist

1. Formula Integrity Check

2. Structural Integrity Check

3. Sort Order Verification

4. Multi-Tab Consistency Check

5. Formatting Preservation Check

Common Pitfalls and How to Avoid Them

Pitfall 1: Incomplete Multi-Tab Updates

Pitfall 2: Breaking Formulas When Inserting Rows

Pitfall 3: Inserting Data Out of Sort Order

Pitfall 4: Introducing Blank Rows

Pitfall 5: Overwriting Formulas with Values

Pitfall 6: Not Recalculating After Formula Changes

Pitfall 7: Losing Formatting on New Rows/Columns

Summary: Editing Workflow Template