| name | LVMS Analyzer |
| description | Analyzes LVMS must-gather data to diagnose storage issues |
LVMS Analyzer Skill
This skill provides detailed guidance for analyzing LVMS (Logical Volume Manager Storage) must-gather data to identify and troubleshoot storage issues.
When to Use This Skill
Use this skill when:
- Analyzing LVMS must-gather data offline
- Diagnosing PVCs stuck in Pending state
- Investigating LVMCluster readiness issues
- Troubleshooting volume group creation failures
- Debugging TopoLVM CSI driver problems
- Checking operator health in LVMS namespace
This skill is automatically invoked by the /lvms:analyze command when working with must-gather data.
Prerequisites
Required:
- LVMS must-gather directory extracted and accessible
- Must-gather contains LVMS namespace directory:
namespaces/openshift-lvm-storage/(newer versions)- OR
namespaces/openshift-storage/(older versions)
- Python 3.6 or higher installed
- PyYAML library:
pip install pyyaml
Namespace Compatibility:
- LVMS namespace changed from
openshift-storagetoopenshift-lvm-storagein recent versions - The analysis script automatically detects which namespace is present
- Both namespaces are fully supported for backward compatibility
Must-Gather Structure:
must-gather/
└── registry-{image-registry}-lvms-must-gather-{version}-sha256-{hash}/
├── cluster-scoped-resources/
│ ├── core/
│ │ └── persistentvolumes/
│ │ └── pvc-*.yaml # Individual PV files
│ ├── storage.k8s.io/
│ │ └── storageclasses/
│ │ ├── lvms-vg1.yaml
│ │ └── lvms-vg1-immediate.yaml
│ └── security.openshift.io/
│ └── securitycontextconstraints/
│ └── lvms-vgmanager.yaml
├── namespaces/
│ └── openshift-lvm-storage/ # or openshift-storage for older versions
│ ├── oc_output/ # IMPORTANT: Primary location for LVMS resources
│ │ ├── lvmcluster.yaml # Full LVMCluster resource with status
│ │ ├── lvmcluster # Text output (oc describe)
│ │ ├── lvmvolumegroup # Text output
│ │ ├── lvmvolumegroupnodestatus # Text output
│ │ ├── logicalvolume # Text output
│ │ ├── pods # Text output (oc get pods)
│ │ └── events # Text output
│ ├── pods/
│ │ ├── lvms-operator-{hash}/
│ │ │ └── lvms-operator-{hash}.yaml
│ │ └── vg-manager-{hash}/
│ │ └── vg-manager-{hash}.yaml
│ └── apps/ # May contain deployments/daemonsets
└── ...
Key Note: LVMS resources are primarily in the oc_output/ directory, with lvmcluster.yaml being the most important file containing full cluster and node status.
Implementation Steps
Step 1: Validate Must-Gather Path
Before running analysis, verify the must-gather directory structure:
# Check if LVMS namespace directory exists (try both namespaces)
ls {must-gather-path}/namespaces/openshift-lvm-storage 2>/dev/null || \
ls {must-gather-path}/namespaces/openshift-storage
# Verify required resource directories
ls {must-gather-path}/cluster-scoped-resources/core/persistentvolumes
Namespace Detection: The analysis script automatically detects which namespace is present:
- Newer LVMS versions use
openshift-lvm-storage - Older LVMS versions use
openshift-storage - The script will inform you which namespace was detected
Common Issue: User provides parent directory instead of subdirectory
- Must-gather extracts to a directory like
must-gather.local.12345/ - Inside is a subdirectory like
registry-ci-openshift-org-origin-4-18.../ - Always use the subdirectory (the one with cluster-scoped-resources/ and namespaces/)
Handling:
# If user provides parent directory, try to find the correct subdirectory
if [ ! -d "{path}/namespaces/openshift-lvm-storage" ] && \
[ ! -d "{path}/namespaces/openshift-storage" ]; then
# Try to find either namespace
find {path} -type d \( -name "openshift-lvm-storage" -o -name "openshift-storage" \) -path "*/namespaces/*"
# Suggest the correct path to user
fi
Step 2: Run Analysis Script
Use the Python analysis script for structured analysis:
python3 plugins/lvms/skills/lvms-analyzer/scripts/analyze_lvms.py \
{must-gather-path}
Script Location:
- Always use:
plugins/lvms/skills/lvms-analyzer/scripts/analyze_lvms.py - Use relative path from repository root
- Script is part of the LVMS plugin
Component-Specific Analysis:
For focused analysis on specific components:
# Analyze only storage/PVC issues
python3 plugins/lvms/skills/lvms-analyzer/scripts/analyze_lvms.py \
{must-gather-path} --component storage
# Analyze only operator health
python3 plugins/lvms/skills/lvms-analyzer/scripts/analyze_lvms.py \
{must-gather-path} --component operator
# Analyze only volume groups
python3 plugins/lvms/skills/lvms-analyzer/scripts/analyze_lvms.py \
{must-gather-path} --component volumes
# Analyze only pod logs
python3 plugins/lvms/skills/lvms-analyzer/scripts/analyze_lvms.py \
{must-gather-path} --component logs
Step 3: Interpret Analysis Results
The script provides structured output across several sections:
1. LVMCluster Status
Key fields to check:
state: Should be "Ready"ready: Should be trueconditions: All should have status "True"- ResourcesAvailable: Resources deployed successfully
- VolumeGroupsReady: VGs created on all nodes
Example healthy output:
LVMCluster: lvmcluster-sample
✓ State: Ready
✓ Ready: true
Conditions:
✓ ResourcesAvailable: True
✓ VolumeGroupsReady: True
Example unhealthy output (real case from must-gather):
LVMCluster: my-lvmcluster
❌ State: Degraded
❌ Ready: false
Conditions:
✓ ResourcesAvailable: True
Reason: ResourcesAvailable
Message: Reconciliation is complete and all the resources are available
❌ VolumeGroupsReady: False
Reason: VGsDegraded
Message: One or more VGs are degraded
2. Volume Group Status
Checks volume group creation per node and device availability:
Example output (real case from must-gather):
Volume Group/Device Class: vg1
Nodes: 3
Node: ocpnode1.ocpiopex.growipx.com
⚠ Status: Progressing
Devices: /dev/mapper/3600a098038315048302b586c38397562, /dev/mapper/mpatha
Excluded devices: 24 device(s)
- /dev/sdb: /dev/sdb has children block devices and could not be considered
- /dev/sdb4: /dev/sdb4 has an invalid filesystem signature (xfs) and cannot be used
- /dev/mapper/3600a098038315047433f586c53477272: has an invalid filesystem signature (xfs)
... and 21 more excluded devices
Node: ocpnode2.ocpiopex.growipx.com
❌ Status: Degraded
Reason:
failed to create/extend volume group vg1: failed to extend volume group vg1:
WARNING: VG name vg0 is used by VGs VVnkhP-khYQ-blyc-2TNo-d3cv-b6di-4RbSyY and EUV3xv-ft6q-39xK-J3ki-rglf-9H44-rVIHIq.
Fix duplicate VG names with vgrename uuid, a device filter, or system IDs.
Physical volume '/dev/mapper/3600a098038315048302b586c38397578p3' is already in volume group 'vg0'
Unable to add physical volume '/dev/mapper/3600a098038315048302b586c38397578p3' to volume group 'vg0'
... (truncated, see LVMCluster status for full details)
Devices: /dev/mapper/mpatha
This real example shows a common LVMS issue: duplicate volume group names preventing VG extension.
3. Storage (PVC/PV) Status
Lists pending or failed PVCs:
Example output:
Pending PVCs:
database/postgres-data
❌ Status: Pending (10m)
Storage Class: lvms-vg1
Requested: 100Gi
Recent Events:
⚠ ProvisioningFailed: no node has enough free space
4. Operator Health
Checks LVMS operator pods, deployments, and daemonsets:
Example issues:
❌ vg-manager-abc123 (worker-0)
Status: CrashLoopBackOff
Restarts: 15
Error: volume group "vg1" not found
5. Pod Logs
Extracts and analyzes error/warning messages from pod logs:
Example output (from real must-gather):
═══════════════════════════════════════════════════════════
POD LOGS ANALYSIS
═══════════════════════════════════════════════════════════
Pod: vg-manager-nz4pc
Unique errors/warnings: 1
❌ 2025-10-28T10:47:28Z: Reconciler error
Controller: lvmvolumegroup
Error Details:
failed to create/extend volume group vg1: failed to extend volume group vg1:
WARNING: VG name vg0 is used by VGs WsNJwk-DK3q-tSHg-zvQJ-imF1-SdRv-8oh4e0 ...
Cannot use /dev/dm-10: device is too small (pv_min_size)
Command requires all devices to be found.
Pod: lvms-operator-65df9f4dbb-92jwl
Unique errors/warnings: 1
❌ 2025-10-28T10:52:48Z: failed to validate device class setup
Controller: lvmcluster
Error: VG vg1 on node Degraded is not in ready state (ocpnode1.ocpiopex.growipx.com)
Key Points:
- Logs are parsed from JSON format
- Errors are deduplicated (same error repeated in reconciliation loops)
- Shows unique error messages with first occurrence timestamp
- Provides additional context not visible in resource status
Step 4: Analyze Root Causes
Connect related issues to identify root causes:
Common Pattern 1: Device Filesystem Conflict
Chain of failures:
1. Device /dev/sdb has existing ext4 filesystem
2. vg-manager cannot create volume group
3. Volume group missing on node
4. PVCs stuck in Pending
Root cause: Device not properly wiped before LVMS use
Common Pattern 2: Insufficient Capacity
Chain of failures:
1. Thin pool at 95% capacity
2. No free space for new volumes
3. PVCs stuck in Pending
Root cause: Insufficient storage capacity or old volumes not cleaned up
Common Pattern 3: Node-Specific Failures
Chain of failures:
1. Volume group missing on specific node
2. TopoLVM CSI driver not functional on that node
3. PVCs with node affinity to that node stuck Pending
Root cause: Node-specific device configuration issue
Step 5: Generate Remediation Plan
Based on analysis results, provide prioritized recommendations:
CRITICAL Issues (Fix Immediately):
Device Conflicts:
# Clean device on affected node oc debug node/{node-name} chroot /host wipefs -a /dev/{device} # Restart vg-manager to recreate VG oc delete pod -n openshift-lvm-storage -l app.kubernetes.io/component=vg-managerPod Crashes:
# After fixing underlying issue, restart failed pods oc delete pod -n openshift-lvm-storage {pod-name}LVMCluster Not Ready:
# Review and fix device configuration oc edit lvmcluster -n openshift-lvm-storage # Ensure devices match actual available devices
WARNING Issues (Address Soon):
Capacity Issues:
# Check logical volume usage oc debug node/{node} -- chroot /host lvs --units g # Remove unused volumes or expand thin poolPartial Node Coverage:
# Investigate why daemonsets not on all nodes oc get nodes --show-labels oc describe daemonset -n openshift-lvm-storage
Step 6: Provide Next Steps
Always provide clear next steps:
Review logs (if available in must-gather):
- Operator logs:
namespaces/openshift-lvm-storage/pods/lvms-operator-*/logs/ - VG-manager logs:
namespaces/openshift-lvm-storage/pods/vg-manager-*/logs/ - TopoLVM logs:
namespaces/openshift-lvm-storage/pods/topolvm-*/logs/
- Operator logs:
Verify fixes (if cluster is accessible):
# After implementing fixes, verify: oc get lvmcluster -n openshift-lvm-storage oc get lvmvolumegroup -A oc get pvc -A | grep PendingRe-collect must-gather (if making changes):
oc adm must-gather --image=quay.io/lvms_dev/lvms-must-gather:latest
Error Handling
Script Execution Errors
Script not found:
# Verify script exists
ls plugins/lvms/skills/lvms-analyzer/scripts/analyze_lvms.py
# Ensure it's executable
chmod +x plugins/lvms/skills/lvms-analyzer/scripts/analyze_lvms.py
Python dependencies missing:
# Install PyYAML
pip install pyyaml
# Or use pip3
pip3 install pyyaml
Invalid YAML in must-gather:
- Script handles YAML parsing errors gracefully
- Reports which files failed to parse
- Continues analysis with available data
Must-Gather Issues
Missing directories:
- Script validates required directories exist
- Reports missing components
- Provides guidance on what's missing
Incomplete must-gather:
- If critical resources missing, script reports what it can analyze
- Suggests re-collecting must-gather
Examples
Example 1: Full Analysis
# Run comprehensive analysis
python3 plugins/lvms/skills/lvms-analyzer/scripts/analyze_lvms.py \
./must-gather/registry-ci-openshift-org-origin-4-18.../
Output:
═══════════════════════════════════════════════════════════
LVMCLUSTER STATUS
═══════════════════════════════════════════════════════════
LVMCluster: lvmcluster-sample
❌ State: Failed
❌ Ready: false
...
═══════════════════════════════════════════════════════════
LVMS ANALYSIS SUMMARY
═══════════════════════════════════════════════════════════
❌ CRITICAL ISSUES: 3
- LVMCluster not Ready (state: Failed)
- Volume group vg1 not created on worker-0
- 3 PVCs stuck in Pending state
Example 2: Storage-Only Analysis
# Focus on PVC issues
python3 plugins/lvms/skills/lvms-analyzer/scripts/analyze_lvms.py \
./must-gather/... --component storage
Analyzes only:
- PVC/PV status
- Storage class configuration
- Volume provisioning issues
Example 3: Operator Health Check
# Check operator components
python3 plugins/lvms/skills/lvms-analyzer/scripts/analyze_lvms.py \
./must-gather/... --component operator
Analyzes only:
- LVMCluster resource
- Deployments and daemonsets
- Pod status and crashes
Best Practices
Always validate path first:
- Check for
namespaces/openshift-lvm-storage/directory - Use the correct subdirectory, not parent
- Check for
Run full analysis first:
- Get overall health picture
- Then drill down with component-specific analysis if needed
Correlate issues:
- Look for patterns across components
- Connect pod failures to VG issues to PVC problems
Check timestamps:
- Events and pod restarts have timestamps
- Helps understand sequence of failures
Provide actionable output:
- Don't just list issues
- Explain root causes
- Give specific remediation steps
- Include verification commands
Reference documentation:
- Link to LVMS troubleshooting guide
- Point to relevant sections in must-gather logs