| name | docx |
| description | Comprehensive document creation, editing, and analysis with support for tracked changes, comments, formatting preservation, and text extraction. When Claude needs to work with professional documents (.docx files) for (1) Creating new documents, (2) Modifying or editing content, (3) Working with tracked changes, (4) Adding comments, or any other document tasks. |
DOCX Processing
Overview
Work with Microsoft Word documents (.docx files) for creation, editing, analysis, and conversion.
Reading/Analyzing Documents
Text Extraction
Use pandoc for simple text extraction:
pandoc document.docx -t plain -o output.txt
Raw XML Access
Unpack for direct access to comments, formatting, and metadata:
unzip document.docx -d document_unpacked/
Creating New Documents
Use JavaScript/TypeScript with the docx library:
import { Document, Paragraph, TextRun, Packer } from 'docx';
const doc = new Document({
sections: [{
properties: {},
children: [
new Paragraph({
children: [
new TextRun("Hello World"),
],
}),
],
}],
});
// Export
const buffer = await Packer.toBuffer(doc);
Editing Existing Documents
Workflow
- Unpack the DOCX file
- Modify XML content directly
- Repack the document
Python Approach
from docx import Document
doc = Document('input.docx')
for para in doc.paragraphs:
if 'old text' in para.text:
para.text = para.text.replace('old text', 'new text')
doc.save('output.docx')
Redlining Workflow (Tracked Changes)
- Convert to markdown first
- Identify changes in logical batches (3-10 per group)
- Unpack the document
- Implement changes using precise XML edits
- Only mark text that actually changes
- Verify comprehensively
Document Conversion
DOCX to PDF
libreoffice --headless --convert-to pdf document.docx
PDF to Images
pdftoppm -jpeg -r 150 document.pdf output
Key Principles
- Read referenced documentation files completely without range limits
- Maintain minimal, precise edits when working with tracked changes
- Preserve original formatting when possible