| name | parsing-helper |
| description | Customize and test Grok parsing for USP, Cloud Sensor, and External adapters. Helps generate parsing rules from sample logs, validate against test data, and deploy configurations. Use when setting up new log sources, troubleshooting parsing issues, or modifying field extraction for adapters. |
| allowed-tools | Task, AskUserQuestion, Skill, Read, Bash |
Parsing Helper
IMPORTANT: Never call
mcp__plugin_lc-essentials_limacharlie__lc_call_tooldirectly. Always use the Task tool withsubagent_type="lc-essentials:limacharlie-api-executor".
A guided workflow for creating, testing, and deploying Grok parsing configurations for LimaCharlie adapters. This skill helps you customize how log data is parsed and normalized as it's ingested into LimaCharlie.
⚠️ TIMEZONE NOTE: Patterns like
SYSLOGTIMESTAMP(Dec 16 17:50:04) lack timezone info. LimaCharlie assumes UTC. This skill automatically detects timezone mismatches by comparing parsed times to current UTC.
When to Use
Use this skill when:
- Setting up new log parsing: Configure Grok patterns for a new log source
- Troubleshooting parsing issues: Logs aren't being parsed correctly or fields are missing
- Modifying field extraction: Need to add, remove, or change extracted fields
- Testing parsing changes: Validate new parsing rules before deploying to production
- Generating adapter configurations: Create YAML/CLI config for local USP adapters
Common scenarios:
- "Help me parse these firewall logs"
- "The timestamp isn't being extracted correctly from my syslog"
- "I need to add a new field to my adapter parsing"
- "Generate a Grok pattern for these log samples"
- "Test this parsing configuration before I deploy it"
What This Skill Does
This skill provides a 5-phase workflow:
- Organization Selection - Select the LimaCharlie organization
- Adapter Discovery - Identify the adapter type and get current configuration
- Sample Log Collection - Get sample logs from the adapter's sensor or user input
- Grok Pattern Generation - Analyze logs and generate appropriate Grok patterns
- Validation & Deployment - Test parsing and apply configuration
Supported Adapter Types
| Type | Description | Configuration Method |
|---|---|---|
| External Adapter | Cloud-managed adapters (syslog, webhook, API) | Update via set_external_adapter API |
| Cloud Sensor | Virtual sensors for cloud platforms (AWS, Azure, GCP, SaaS) | Update via set_cloud_sensor API |
| One-off/USP Adapter | On-prem adapters with local configuration | Output YAML/CLI config for user |
Required Information
Before starting, gather:
- Organization: The LimaCharlie org where the adapter is configured
- Adapter Type: External Adapter, Cloud Sensor, or One-off/USP Adapter
- Adapter Name: For cloud-managed adapters, the name of the existing adapter
- Sample Logs: Either from the adapter's sensor or user-provided samples
How to Use
Phase 1: Organization Selection
Get the list of available organizations:
Task(
subagent_type="lc-essentials:limacharlie-api-executor",
model="haiku",
prompt="Execute LimaCharlie API call:
- Function: list_user_orgs
- Parameters: {}
- Return: RAW"
)
Use AskUserQuestion to let the user select an organization if multiple are available.
Phase 2: Adapter Discovery & Type Selection
Ask the user which adapter type they're working with using AskUserQuestion:
Option A: External Adapter (cloud-managed syslog, webhook, API)
- List available external adapters:
Task(
subagent_type="lc-essentials:limacharlie-api-executor",
model="haiku",
prompt="Execute LimaCharlie API call:
- Function: list_external_adapters
- Parameters: {\"oid\": \"<SELECTED_ORG_ID>\"}
- Return: RAW"
)
- Get the selected adapter's current configuration:
Task(
subagent_type="lc-essentials:limacharlie-api-executor",
model="haiku",
prompt="Execute LimaCharlie API call:
- Function: get_external_adapter
- Parameters: {\"oid\": \"<SELECTED_ORG_ID>\", \"name\": \"<ADAPTER_NAME>\"}
- Return: RAW"
)
Option B: Cloud Sensor (AWS, Azure, GCP, SaaS)
- List available cloud sensors:
Task(
subagent_type="lc-essentials:limacharlie-api-executor",
model="haiku",
prompt="Execute LimaCharlie API call:
- Function: list_cloud_sensors
- Parameters: {\"oid\": \"<SELECTED_ORG_ID>\"}
- Return: RAW"
)
- Get the selected cloud sensor's current configuration:
Task(
subagent_type="lc-essentials:limacharlie-api-executor",
model="haiku",
prompt="Execute LimaCharlie API call:
- Function: get_cloud_sensor
- Parameters: {\"oid\": \"<SELECTED_ORG_ID>\", \"name\": \"<SENSOR_NAME>\"}
- Return: RAW"
)
Option C: One-off/USP Adapter (local adapter, no cloud config)
No API calls needed. Proceed directly to sample log collection.
Phase 3: Get Sample Log Data
Option A: Query from adapter's sensor (for cloud-managed adapters)
If the adapter is already ingesting data, you can fetch sample events:
- Find the sensor by IID (Installation ID from the adapter's installation key):
Task(
subagent_type="lc-essentials:limacharlie-api-executor",
model="haiku",
prompt="Execute LimaCharlie API call:
- Function: list_sensors
- Parameters: {\"oid\": \"<SELECTED_ORG_ID>\", \"selector\": \"iid == `<IID>`\"}
- Return: RAW"
)
- Get recent events from the sensor:
IMPORTANT: Always calculate timestamps dynamically using Bash FIRST:
# Run this to get current epoch timestamps for the last hour
start=$(date -d '1 hour ago' +%s) && end=$(date +%s) && echo "start=$start end=$end"
Then use the actual output values (e.g., start=1764805928 end=1764809528) directly in the API call - do NOT use placeholder values:
Task(
subagent_type="lc-essentials:limacharlie-api-executor",
model="haiku",
prompt="Execute LimaCharlie API call:
- Function: get_historic_events
- Parameters: {\"oid\": \"<SELECTED_ORG_ID>\", \"sid\": \"<SENSOR_ID>\", \"start\": 1764805928, \"end\": 1764809528, \"limit\": 10}
- Return: RAW"
)
- Extract the raw log content from the events for analysis.
Option B: User provides sample logs
If no data is available or for one-off adapters, ask the user to paste sample log lines.
Use AskUserQuestion or direct text input to collect sample logs.
Recognizing Unparsed Events
When fetching events from a sensor, unparsed logs appear as:
event_type: "unknown_event"in the routing- Event payload contains only a
textfield with the raw log line
Example of an unparsed event:
{
"event": {
"text": "2025-12-03 16:41:19 status installed shared-mime-info:amd64 2.2-1"
},
"routing": {
"event_type": "unknown_event",
"hostname": "my-adapter"
}
}
This indicates no Grok parsing is configured for the adapter. The raw content in the text field is what you need to analyze and create a parsing pattern for.
Phase 4: Generate Grok Pattern
Analyze the sample logs and generate an appropriate Grok pattern.
Common Grok Patterns:
| Pattern | Description | Example Match |
|---|---|---|
%{TIMESTAMP_ISO8601:timestamp} |
ISO 8601 timestamp (YYYY-MM-DD) | 2024-01-15T12:30:45Z or 2024-01-15 12:30:45 |
%{DATESTAMP:timestamp} |
US/EU date format (MM-DD-YY) | 01-15-24 12:30:45 |
%{SYSLOGTIMESTAMP:date} |
Syslog timestamp | Jan 15 12:30:45 |
%{IP:ip_address} |
IPv4/IPv6 address | 192.168.1.100 |
%{HOSTNAME:host} |
Hostname | server01.example.com |
%{NUMBER:value} |
Numeric value | 12345 |
%{INT:count} |
Integer | 42 |
%{WORD:action} |
Single word | ACCEPT |
%{DATA:field} |
Non-greedy match (up to delimiter) | any text |
%{GREEDYDATA:message} |
Greedy match (rest of line) | everything else |
%{LOGLEVEL:level} |
Log level | ERROR, WARN, INFO |
%{POSINT:port} |
Positive integer | 443 |
Timestamp Pattern Warning: Do NOT use
%{DATESTAMP}forYYYY-MM-DDformat logs - it will misparse the year (treating2025-12-03as day=25, month=12, year=03). Use%{TIMESTAMP_ISO8601}instead for anyYYYY-MM-DDformatted timestamps.
IMPORTANT - Grok Field Name for Text Platform:
For text platform adapters, always use message as the key in parsing_grok:
parsing_grok:
message: '%{PATTERN:field} ...' # Always use "message" as the key, NOT "text"
Building the Mapping Configuration:
client_options:
mapping:
parsing_grok:
message: '%{SYSLOGTIMESTAMP:date} %{HOSTNAME:host} %{WORD:service}\[%{INT:pid}\]: %{GREEDYDATA:message}'
event_type_path: "service"
event_time_path: "date"
sensor_hostname_path: "host"
Phase 4.5: Automatic Timestamp Timezone Check
REQUIRED: After validation, MUST perform this sanity check for patterns without explicit timezone.
Applies when Grok pattern uses:
%{SYSLOGTIMESTAMP}- e.g.,Dec 16 17:50:04%{DATESTAMP}- e.g.,12-16-24 17:50:04%{TIMESTAMP_ISO8601}withoutZor offset - e.g.,2024-12-16 17:50:04
Automatic Detection Steps:
After
validate_usp_mappingreturns, extract theevent_timefrom results (epoch milliseconds)Get current UTC epoch:
date -u +%s
- Calculate the offset:
# Example: event_time=1734357004000 (milliseconds)
# current_utc=$(date -u +%s)
# offset_hours=$(( (current_utc - event_time/1000) / 3600 ))
If offset is 4-12 hours: The logs are likely in a local timezone (US timezones are UTC-5 to UTC-8)
Automatically warn the user:
⚠️ Timezone Warning: The parsed event_time appears to be ~[X] hours behind current UTC time. This suggests your log timestamps are in local time (e.g., Pacific Time), not UTC.
LimaCharlie interprets all timestamps without timezone info as UTC. Your events will appear [X] hours in the past.
Recommendation: Configure your log source to emit UTC timestamps:
- rsyslog: Add
$ActionFileDefaultTemplate RSYSLOG_FileFormatto rsyslog.conf - systemd-journald: Already uses UTC by default
- rsyslog: Add
Continue with deployment even if warning is shown (user can address later)
Phase 5: Validate & Apply
Step 1: Validate the parsing configuration
Use validate_usp_mapping to test the Grok pattern against sample data:
Task(
subagent_type="lc-essentials:limacharlie-api-executor",
model="haiku",
prompt="Execute LimaCharlie API call:
- Function: validate_usp_mapping
- Parameters: {
\"oid\": \"<SELECTED_ORG_ID>\",
\"platform\": \"text\",
\"mapping\": {
\"parsing_grok\": {
\"message\": \"%{SYSLOGTIMESTAMP:date} %{HOSTNAME:host} %{WORD:service}\\\\[%{INT:pid}\\\\]: %{GREEDYDATA:msg}\"
},
\"event_type_path\": \"service\",
\"event_time_path\": \"date\",
\"sensor_hostname_path\": \"host\"
},
\"text_input\": \"<SAMPLE_LOG_LINES>\"
}
- Return: RAW"
)
Note: The
mappingobject should haveparsing_grok,event_type_path,event_time_path, andsensor_hostname_pathat the same level - NOT nested under aparsingkey.
Step 2: Review parsed results
Show the parsed events to the user:
- Verify all expected fields are extracted
- Confirm event type assignment
Automatic Timezone Sanity Check:
If the Grok pattern uses SYSLOGTIMESTAMP, DATESTAMP, or TIMESTAMP_ISO8601 without explicit timezone:
Get current UTC:
current_utc=$(date -u +%s)Extract
event_timefrom validation result (milliseconds)Calculate offset:
offset_hours=$(( (current_utc - event_time/1000) / 3600 ))If offset is 4-12 hours, display warning:
"⚠️ Timezone Mismatch Detected: Parsed time is ~[offset] hours behind UTC. Your logs appear to be in local time. Events will be timestamped incorrectly in LimaCharlie. Consider configuring your log source to use UTC."
Step 3: Iterate if needed
If validation fails or fields are missing, adjust the Grok pattern and re-validate.
Step 4: Apply configuration
For External Adapters:
Task(
subagent_type="lc-essentials:limacharlie-api-executor",
model="haiku",
prompt="Execute LimaCharlie API call:
- Function: set_external_adapter
- Parameters: {
\"oid\": \"<SELECTED_ORG_ID>\",
\"name\": \"<ADAPTER_NAME>\",
\"adapter_type\": \"<EXISTING_TYPE>\",
\"config\": {
// ... existing config ...
\"parsing_rules\": {
\"parsing_grok\": {
\"message\": \"<GROK_PATTERN>\"
},
\"event_type_path\": \"<PATH>\",
\"event_time_path\": \"<PATH>\",
\"sensor_hostname_path\": \"<PATH>\"
}
}
}
- Return: RAW"
)
For Cloud Sensors:
Task(
subagent_type="lc-essentials:limacharlie-api-executor",
model="haiku",
prompt="Execute LimaCharlie API call:
- Function: set_cloud_sensor
- Parameters: {
\"oid\": \"<SELECTED_ORG_ID>\",
\"name\": \"<SENSOR_NAME>\",
\"config\": {
// ... existing config with updated parsing ...
}
}
- Return: RAW"
)
For One-off/USP Adapters:
Provide the configuration for the user to apply locally:
YAML Config:
file:
client_options:
platform: text
identity:
installation_key: <IID>
oid: <OID>
sensor_seed_key: my-adapter
mapping:
parsing_grok:
message: '%{SYSLOGTIMESTAMP:date} %{HOSTNAME:host} %{WORD:service}\[%{INT:pid}\]: %{GREEDYDATA:message}'
event_type_path: "service"
event_time_path: "date"
event_time_timezone: "America/New_York" # Required for SYSLOGTIMESTAMP (no timezone in pattern)
sensor_hostname_path: "host"
file_path: "/var/log/messages"
CLI Config:
./lc_adapter file \
client_options.identity.installation_key=<IID> \
client_options.identity.oid=<OID> \
client_options.platform=text \
client_options.sensor_seed_key=my-adapter \
"client_options.mapping.parsing_grok.message=%{SYSLOGTIMESTAMP:date} %{HOSTNAME:host} %{WORD:service}\[%{INT:pid}\]: %{GREEDYDATA:message}" \
client_options.mapping.event_type_path=service \
client_options.mapping.event_time_path=date \
client_options.mapping.event_time_timezone=America/New_York \
client_options.mapping.sensor_hostname_path=host \
file_path=/var/log/messages
IMPORTANT: The
event_time_timezoneis REQUIRED when usingSYSLOGTIMESTAMP,DATESTAMP, orTIMESTAMP_ISO8601without explicit timezone info. Without it, LimaCharlie assumes UTC and timestamps will be incorrect.
When outputting CLI config for test-limacharlie-adapter skill, use this format:
--grok '<PATTERN>'
--event-type <PATH>
--event-time <PATH>
--event-time-tz <TIMEZONE> # Include when timestamp pattern lacks timezone
--hostname-path <PATH>
Example Usage
Example 1: Parse Syslog Logs
User: "Help me parse syslog logs from my Linux server"
Sample log:
Jan 15 12:30:45 server01 sshd[12345]: Accepted publickey for admin from 192.168.1.50 port 54321 ssh2
Generated Grok pattern:
%{SYSLOGTIMESTAMP:date} %{HOSTNAME:host} %{WORD:service}\[%{INT:pid}\]: %{GREEDYDATA:message}
Mapping configuration:
mapping:
parsing_grok:
message: '%{SYSLOGTIMESTAMP:date} %{HOSTNAME:host} %{WORD:service}\[%{INT:pid}\]: %{GREEDYDATA:message}'
event_type_path: "service"
event_time_path: "date"
event_time_timezone: "America/New_York" # Required - SYSLOGTIMESTAMP has no timezone
sensor_hostname_path: "host"
CLI options for test-limacharlie-adapter:
--grok '%{SYSLOGTIMESTAMP:date} %{HOSTNAME:host} %{WORD:service}\[%{INT:pid}\]: %{GREEDYDATA:message}'
--event-type service
--event-time date
--event-time-tz America/New_York
--hostname-path host
Validation result:
{
"valid": true,
"results": [{
"event": {
"timestamp": "Jan 15 12:30:45",
"event_type": "sshd",
"message": "Accepted publickey for admin from 192.168.1.50 port 54321 ssh2"
},
"routing": {
"hostname": "server01"
}
}]
}
Example 2: Parse Firewall Logs
User: "I need to parse these firewall logs and extract the source/destination IPs"
Sample log:
2024-01-15T12:00:00Z ACCEPT TCP 192.168.1.100:54321 10.0.0.5:443 packets=1 bytes=78
Generated Grok pattern:
%{TIMESTAMP_ISO8601:timestamp} %{WORD:action} %{WORD:protocol} %{IP:src_ip}:%{NUMBER:src_port} %{IP:dst_ip}:%{NUMBER:dst_port} packets=%{NUMBER:packets} bytes=%{NUMBER:bytes}
Mapping configuration:
mapping:
parsing_grok:
message: '%{TIMESTAMP_ISO8601:timestamp} %{WORD:action} %{WORD:protocol} %{IP:src_ip}:%{NUMBER:src_port} %{IP:dst_ip}:%{NUMBER:dst_port} packets=%{NUMBER:packets} bytes=%{NUMBER:bytes}'
event_type_path: "action"
event_time_path: "timestamp"
Validation result:
{
"valid": true,
"results": [{
"event": {
"timestamp": "2024-01-15T12:00:00Z",
"event_type": "ACCEPT",
"protocol": "TCP",
"src_ip": "192.168.1.100",
"src_port": "54321",
"dst_ip": "10.0.0.5",
"dst_port": "443",
"packets": "1",
"bytes": "78"
}
}]
}
Example 3: One-off USP Adapter Configuration
User: "I have a local adapter running, just give me the config to parse Apache access logs"
Sample log:
192.168.1.50 - admin [15/Jan/2024:12:30:45 +0000] "GET /api/users HTTP/1.1" 200 1234
Generated configuration (YAML):
file:
client_options:
platform: text
identity:
installation_key: <YOUR_IID>
oid: <YOUR_OID>
sensor_seed_key: apache-logs
mapping:
parsing_grok:
message: '%{COMMONAPACHELOG}'
event_type_path: "verb"
event_time_path: "timestamp"
file_path: "/var/log/apache2/access.log"
Generated configuration (CLI):
./lc_adapter file \
client_options.identity.installation_key=<YOUR_IID> \
client_options.identity.oid=<YOUR_OID> \
client_options.platform=text \
client_options.sensor_seed_key=apache-logs \
"client_options.mapping.parsing_grok.message=%{COMMONAPACHELOG}" \
client_options.mapping.event_type_path=verb \
client_options.mapping.event_time_path=timestamp \
file_path=/var/log/apache2/access.log
Additional Notes
Pre-built Grok Patterns
LimaCharlie supports these pre-built patterns for common log formats:
| Pattern | Use Case |
|---|---|
%{COMMONAPACHELOG} |
Apache common log format |
%{COMBINEDAPACHELOG} |
Apache combined log format |
%{NGINXACCESS} |
Nginx access logs |
Key/Value Parsing (Alternative to Grok)
For logs with key=value format (like CEF), use parsing_re with key-value extraction:
mapping:
parsing_re: '(?:<\d+>\s*)?(\w+)=(".*?"|\S+)'
This regex captures pairs where:
- Submatch 1 = Key name
- Submatch 2 = Value
Field Path Syntax
Use / as separator for nested JSON navigation:
a→ access fieldaa/b/c→ access nested fielda.b.c
Event Type Path Templates
The event_type_path supports template strings for dynamic event type assignment based on event content.
Timestamp Timezone Handling
LimaCharlie assumes UTC for timestamps that don't include explicit timezone information.
Formats with explicit timezone (safe):
- ISO 8601 with
Z:2024-01-15T12:30:45Z - ISO 8601 with offset:
2024-01-15T12:30:45-08:00 - RFC 3339:
2024-01-15T12:30:45.000-08:00
Formats without timezone (assumed UTC):
- Syslog:
Jan 15 12:30:45 - Simple datetime:
2024-01-15 12:30:45
Recommendation: Configure log sources to emit UTC timestamps when possible. For syslog on Linux:
- rsyslog: Use
$ActionFileDefaultTemplate RSYSLOG_FileFormatin rsyslog.conf - journald: Timestamps are already in UTC by default
Common Issues
- Pattern not matching: Ensure special characters are escaped (e.g.,
\[for literal brackets) - Greedy matching: Use
%{DATA}for non-greedy matching,%{GREEDYDATA}only at end of pattern - Timestamp not parsing: Verify timestamp format matches the Grok pattern exactly
Common Pitfalls
Using stale timestamps: Always calculate epoch timestamps dynamically with Bash before querying
get_historic_events. Never use placeholder or hardcoded values - the API will return empty results for timestamps outside the data retention window.Wrong mapping structure: The
mappingobject should haveparsing_grok,event_type_path, andevent_time_pathat the same level - NOT nested under aparsingkey.Wrong Grok field name: For text platform adapters, always use
messageas the key inparsing_grok, nottext.DATESTAMP vs TIMESTAMP_ISO8601:
YYYY-MM-DD HH:MM:SS→ Use%{TIMESTAMP_ISO8601}MM-DD-YY HH:MM:SS→ Use%{DATESTAMP}Jan 15 12:30:45→ Use%{SYSLOGTIMESTAMP}
Not recognizing unparsed events: If you see
event_type: "unknown_event"with only atextfield, that means parsing is not configured - thetextfield contains the raw log data that needs a Grok pattern.Timezone assumptions for timestamps without timezone info:
SYSLOGTIMESTAMP,DATESTAMP, andTIMESTAMP_ISO8601withoutZor offset don't include timezone- LimaCharlie assumes UTC for all such timestamps
- If logs are in local time (e.g., Pacific Time), events will appear 7-8 hours in the future
- Solution: Configure log source to emit UTC timestamps, or verify with user that times appear correct
Indexing Support
For searchable fields, configure custom indexing:
indexing:
- events_included: ["*"]
path: "src_ip"
index_type: "ip"
- events_included: ["*"]
path: "dst_ip"
index_type: "ip"
Supported index types: file_hash, file_path, file_name, domain, ip, user, service_name, package_name
Related Skills
limacharlie-call: For other LimaCharlie API operationsdetection-engineering: Create D&R rules to detect patterns in parsed logstest-limacharlie-adapter: Deploy a test adapter to verify parsinglookup-lc-doc: Search LimaCharlie documentation for more parsing details
Reference
For more details on parsing configuration:
- Adapter Usage:
./docs/limacharlie/doc/Sensors/Adapters/adapter-usage.md - Log Collection Guide:
./docs/limacharlie/doc/Sensors/Reference/logcollectionguide.md