| name | Node Tuning Helper Scripts |
| description | Generate tuned manifests and evaluate node tuning snapshots |
# Node Tuning Helper Scripts

Detailed instructions for invoking the helper utilities that back the `/node-tuning` commands:

- `generate_tuned_profile.py` renders Tuned manifests (`tuned.openshift.io/v1`).
- `analyze_node_tuning.py` inspects live nodes or sosreports for tuning gaps.
## When to Use These Scripts
- Translate structured command inputs into Tuned manifests for the Node Tuning Operator.
- Iterate on generated YAML outside the assistant or integrate the generator into automation.
- Analyze CPU isolation, IRQ affinity, huge pages, sysctl values, and networking counters from live clusters or archived sosreports.
## Prerequisites

- Python 3.8 or newer (`python3 --version`).
- Repository checkout so the scripts under `plugins/node-tuning/skills/scripts/` are accessible.
- Optional: `oc` CLI when validating or applying manifests.
- Optional: Extracted sosreport directory when running the analysis script offline.
- Optional (remote analysis): `oc` CLI access plus a valid `KUBECONFIG` when capturing `/proc`/`/sys` or a sosreport via `oc debug node/<name>`. The sosreport workflow pulls the `registry.redhat.io/rhel9/support-tools` image (override with `--toolbox-image` or `TOOLBOX_IMAGE`) and requires registry access. HTTP(S) proxy environment variables from the host are forwarded automatically when present, but using a proxy is optional.
## Script: generate_tuned_profile.py

### Implementation Steps

#### Collect Inputs

- `--profile-name`: Tuned resource name.
- `--summary`: `[main]` section summary.
- Repeatable options: `--include`, `--main-option`, `--variable`, `--sysctl`, `--section` (`SECTION:KEY=VALUE`).
- Target selectors: `--machine-config-label key=value`, `--match-label key[=value]`.
- Optional: `--priority` (default 20), `--namespace`, `--output`, `--dry-run`.
- Use `--list-nodes`/`--node-selector` to inspect nodes and `--label-node NODE:KEY[=VALUE]` (plus `--overwrite-labels`) to tag machines.
Inspect or Label Nodes (optional)
# List all worker nodes python3 plugins/node-tuning/skills/scripts/generate_tuned_profile.py --list-nodes --node-selector "node-role.kubernetes.io/worker" --skip-manifest # Label a specific node for the worker-hp pool python3 plugins/node-tuning/skills/scripts/generate_tuned_profile.py \ --label-node ip-10-0-1-23.ec2.internal:node-role.kubernetes.io/worker-hp= \ --overwrite-labels \ --skip-manifestRender the Manifest
python3 plugins/node-tuning/skills/scripts/generate_tuned_profile.py \ --profile-name "$PROFILE" \ --summary "$SUMMARY" \ --sysctl net.core.netdev_max_backlog=16384 \ --match-label tuned.openshift.io/custom-net \ --output .work/node-tuning/$PROFILE/tuned.yaml- Omit
--outputto write<profile-name>.yamlin the current directory. - Add
--dry-runto print the manifest to stdout.
- Omit
#### Review Output

- Inspect the generated YAML for accuracy.
- Optionally format with `yq` or open in an editor for readability.
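One low-risk way to review, assuming the Go-based `yq` CLI mentioned above is installed (the flag values are illustrative): render the manifest to stdout first and only write the file once it looks right.

```bash
# Print the manifest without writing a file, then pretty-print it for review
python3 plugins/node-tuning/skills/scripts/generate_tuned_profile.py \
  --profile-name custom-net \
  --summary "Increase netdev backlog for bursty NICs" \
  --sysctl net.core.netdev_max_backlog=16384 \
  --match-label tuned.openshift.io/custom-net \
  --dry-run | yq '.'
```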
#### Validate and Apply

- Dry-run: `oc apply --dry-run=client -f <manifest>`.
- Apply: `oc apply -f <manifest>`.
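After applying, a quick confirmation that the Node Tuning Operator picked the profile up, assuming the default operator namespace:

```bash
# The Tuned object should exist, and each node's Profile should report the selected profile
oc get tuned -n openshift-cluster-node-tuning-operator
oc get profiles.tuned.openshift.io -n openshift-cluster-node-tuning-operator
```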
### Error Handling

- Missing required options raise `ValueError` with descriptive messages.
- The script exits non-zero when no target selectors (`--machine-config-label` or `--match-label`) are supplied.
- Invalid key/value or section inputs identify the failing argument explicitly.
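A quick way to see the selector check, using placeholder values; the command is expected to fail because no `--machine-config-label` or `--match-label` is supplied:

```bash
python3 plugins/node-tuning/skills/scripts/generate_tuned_profile.py \
  --profile-name demo \
  --summary "Demo profile" \
  --dry-run
echo "exit code: $?"   # non-zero because no target selector was provided
```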
### Examples
```bash
python3 plugins/node-tuning/skills/scripts/generate_tuned_profile.py \
  --profile-name realtime-worker \
  --summary "Realtime tuned profile" \
  --include openshift-node --include realtime \
  --variable isolated_cores=1 \
  --section 'bootloader:cmdline_ocp_realtime=+systemd.cpu_affinity=${not_isolated_cores_expanded}' \
  --machine-config-label machineconfiguration.openshift.io/role=worker-rt \
  --priority 25 \
  --output .work/node-tuning/realtime-worker/tuned.yaml
```
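This example targets machines via `machineconfiguration.openshift.io/role=worker-rt`, which normally implies a dedicated MachineConfigPool. If your cluster uses one, a quick pre-flight check (pool name taken from the example) is:

```bash
# Confirm the custom pool exists and inspect the selector the recommend block must match
oc get machineconfigpool worker-rt
oc get machineconfigpool worker-rt -o jsonpath='{.spec.machineConfigSelector}{"\n"}'
```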
```bash
python3 plugins/node-tuning/skills/scripts/generate_tuned_profile.py \
  --profile-name openshift-node-hugepages \
  --summary "Boot time configuration for hugepages" \
  --include openshift-node \
  --section bootloader:cmdline_openshift_node_hugepages="hugepagesz=2M hugepages=50" \
  --machine-config-label machineconfiguration.openshift.io/role=worker-hp \
  --priority 30 \
  --output .work/node-tuning/openshift-node-hugepages/hugepages-tuned-boottime.yaml
```
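Once the profile rolls out and the node reboots with the new kernel arguments, a spot check along these lines (the node name is a placeholder) confirms the huge pages were allocated at boot:

```bash
# Huge page counters as seen on the host
oc debug node/<node-name> -- chroot /host cat /proc/meminfo | grep -i hugepages

# Huge pages as reported by the kubelet
oc describe node <node-name> | grep -i hugepages
```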
## Script: analyze_node_tuning.py

### Purpose

Inspect either a live node (`/proc`, `/sys`) or an extracted sosreport snapshot for tuning signals (CPU isolation, IRQ affinity, huge pages, sysctl state, networking counters) and emit actionable recommendations.
### Usage Patterns

- Live node analysis

  ```bash
  python3 plugins/node-tuning/skills/scripts/analyze_node_tuning.py --format markdown
  ```

- Remote analysis via oc debug

  ```bash
  python3 plugins/node-tuning/skills/scripts/analyze_node_tuning.py \
    --node worker-rt-0 \
    --kubeconfig ~/.kube/prod \
    --format markdown
  ```

- Collect sosreport via oc debug and analyze locally

  ```bash
  python3 plugins/node-tuning/skills/scripts/analyze_node_tuning.py \
    --node worker-rt-0 \
    --toolbox-image registry.example.com/support-tools:latest \
    --sosreport-arg "--case-id=01234567" \
    --sosreport-output .work/node-tuning/sosreports \
    --format json
  ```

- Offline sosreport analysis

  ```bash
  python3 plugins/node-tuning/skills/scripts/analyze_node_tuning.py \
    --sosreport /path/to/sosreport-2025-10-20
  ```

- Automation-friendly JSON (see the jq sketch after this list)

  ```bash
  python3 plugins/node-tuning/skills/scripts/analyze_node_tuning.py \
    --sosreport /path/to/sosreport \
    --format json --output .work/node-tuning/node-analysis.json
  ```
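A sketch of consuming the JSON report in a pipeline with `jq`. The `recommendations` field name is an assumption about the output schema; inspect the real keys first and adjust the path:

```bash
# Discover the report's top-level structure
jq 'keys' .work/node-tuning/node-analysis.json

# Print recommendations if the report exposes them under this (assumed) key
jq '.recommendations? // empty' .work/node-tuning/node-analysis.json
```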
### Implementation Steps

- Select data source
  - Provide `--node <name>` (with optional `--kubeconfig`/`--oc-binary`). By default the helper runs `sosreport` remotely from inside the RHCOS toolbox container (`registry.redhat.io/rhel9/support-tools`). Override the image with `--toolbox-image`, extend the sosreport command with `--sosreport-arg`, or disable the curated OpenShift flags via `--skip-default-sosreport-flags`. Pass `--no-collect-sosreport` to fall back to the direct `/proc` snapshot mode.
  - Provide `--sosreport <dir>` for archived diagnostics; detection finds embedded `proc/` and `sys/`.
  - Omit both switches to query the live filesystem (defaults to `/proc` and `/sys`).
  - Override paths with `--proc-root` or `--sys-root` when the layout differs.
- Run analysis
  - The script parses `cpuinfo`, kernel cmdline parameters (`isolcpus`, `nohz_full`, `tuned.non_isolcpus`), default IRQ affinities, huge page counters, sysctl values (net, vm, kernel), transparent hugepage settings, `netstat`/`sockstat` counters, and `ps` snapshots (when available in sosreport).
- Review the report
  - Markdown output groups findings by section (System Overview, CPU & Isolation, Huge Pages, Sysctl Highlights, Network Signals, IRQ Affinity, Process Snapshot) and lists recommendations.
  - JSON output contains the same information in structured form for pipelines or dashboards.
- Act on recommendations
  - Apply Tuned profiles, MachineConfig updates, or manual sysctl/irqbalance adjustments.
  - Feed actionable items back into `/node-tuning:generate-tuned-profile` to codify desired state (see the sketch below).
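For instance, a recommendation to raise `net.core.netdev_max_backlog` could be codified with the generator script (profile name, label, and value are illustrative):

```bash
python3 plugins/node-tuning/skills/scripts/generate_tuned_profile.py \
  --profile-name custom-net-backlog \
  --summary "Raise netdev backlog per node-tuning analysis" \
  --include openshift-node \
  --sysctl net.core.netdev_max_backlog=32768 \
  --match-label tuned.openshift.io/custom-net-backlog \
  --output .work/node-tuning/custom-net-backlog/tuned.yaml
```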
### Error Handling

- Missing `proc/` or `sys/` directories trigger descriptive errors.
- Unreadable files are skipped gracefully and noted in observations where relevant.
- Non-numeric sysctl values are flagged for manual investigation.
### Example Output (Markdown excerpt)

```markdown
# Node Tuning Analysis

## System Overview
- Hostname: worker-rt-1
- Kernel: 4.18.0-477.el8
- NUMA nodes: 2
- Kernel cmdline: `BOOT_IMAGE=... isolcpus=2-15 tuned.non_isolcpus=0-1`

## CPU & Isolation
- Logical CPUs: 32
- Physical cores: 16 across 2 socket(s)
- SMT detected: yes
- Isolated CPUs: 2-15
...

## Recommended Actions
- Configure net.core.netdev_max_backlog (>=32768) to accommodate bursty NIC traffic.
- Transparent Hugepages are not disabled (`[never]` not selected). Consider setting to `never` for latency-sensitive workloads.
- 4 IRQs overlap isolated CPUs. Relocate interrupt affinities using tuned profiles or irqbalance.
```
## Follow-up Automation Ideas

- Persist JSON results in `.work/node-tuning/<host>/analysis.json` for historical tracing (see the sketch below).
- Gate upgrades by comparing recommendations across nodes.
- Integrate with CI jobs that validate cluster tuning post-change.
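A minimal sketch of the first two ideas, using only flags documented above (node names are placeholders):

```bash
# Capture a per-node JSON snapshot for historical tracing
for node in worker-rt-0 worker-rt-1; do
  mkdir -p ".work/node-tuning/${node}"
  python3 plugins/node-tuning/skills/scripts/analyze_node_tuning.py \
    --node "${node}" \
    --no-collect-sosreport \
    --format json \
    --output ".work/node-tuning/${node}/analysis.json"
done

# Compare two nodes' snapshots (a full-file diff is crude; in practice compare
# just the recommendation entries)
diff .work/node-tuning/worker-rt-0/analysis.json .work/node-tuning/worker-rt-1/analysis.json
```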