| name | regex-pattern-builder |
| description | Builds and explains regex patterns from natural language, tests patterns, and provides examples. Use when user asks to "create regex", "regex pattern", "match pattern", "validate email/phone", or "regex help". |
| allowed-tools | Read, Write |
Regex Pattern Builder
Creates regex patterns from natural language descriptions, explains existing patterns, and helps test and debug regex.
When to Use
- "Create a regex to match emails"
- "Regex pattern for phone numbers"
- "How do I match URLs"
- "Explain this regex"
- "Test my regex pattern"
- "Validate password regex"
Instructions
1. Understand the Requirement
Ask clarifying questions if needed:
- What format are you trying to match?
- Should it be strict or permissive?
- What language/flavor (JavaScript, Python, etc.)?
- Full match or contains?
- Case sensitive?
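The answers to these questions map fairly directly onto anchors and flags. A minimal sketch, assuming a hypothetical `buildRegex` helper (not part of any library) that receives the pattern body as a string:

```javascript
// Hypothetical helper: builds a regex from a pattern body and two of the
// answers gathered above (full match vs. contains, case sensitivity).
function buildRegex(body, { fullMatch = true, caseSensitive = true } = {}) {
  // Wrap in a non-capturing group so alternations stay inside the anchors.
  const source = fullMatch ? `^(?:${body})$` : body
  const flags = caseSensitive ? '' : 'i'
  return new RegExp(source, flags)
}

const strict = buildRegex('[a-f0-9]{6}')
const loose = buildRegex('[a-f0-9]{6}', { fullMatch: false, caseSensitive: false })

console.log(strict.test('c0ffee'))     // true
console.log(strict.test('id: C0FFEE')) // false (extra text, wrong case)
console.log(loose.test('id: C0FFEE'))  // true
```

The same pattern body gives different results depending on those two answers, which is why it helps to settle them before writing the pattern.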
2. Build Pattern from Description
Common Patterns
Email validation:
// Simple (permissive)
/^[^\s@]+@[^\s@]+\.[^\s@]+$/
// Stricter (a common approximation, not full RFC 5322)
/^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/
// Explanation:
// ^ - Start of string
// [a-zA-Z0-9._%+-]+ - Username: letters, numbers, dots, etc.
// @ - Literal @ symbol
// [a-zA-Z0-9.-]+ - Domain name
// \. - Literal dot
// [a-zA-Z]{2,} - TLD (2+ letters)
// $ - End of string
Phone numbers:
// US phone (flexible)
/^\(?(\d{3})\)?[-.\s]?(\d{3})[-.\s]?(\d{4})$/
// Matches:
// 123-456-7890
// (123) 456-7890
// 123.456.7890
// 1234567890
// International (E.164)
/^\+?[1-9]\d{1,14}$/
// Explanation:
// ^\+? - Optional + at start
// [1-9] - First digit 1-9
// \d{1,14} - 1-14 more digits
// $ - End of string
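The capture groups in the US pattern can be reused to normalize input. A rough sketch, assuming a hypothetical `toE164` helper and assuming the country code is always `+1`:

```javascript
const usPhone = /^\(?(\d{3})\)?[-.\s]?(\d{3})[-.\s]?(\d{4})$/

// Hypothetical helper: returns the number as +1XXXXXXXXXX, or null if the
// input does not match the US pattern.
function toE164(input) {
  const match = usPhone.exec(input.trim())
  if (!match) return null
  const [, area, exchange, line] = match
  return `+1${area}${exchange}${line}`
}

console.log(toE164('(123) 456-7890')) // '+11234567890'
console.log(toE164('123.456.7890'))   // '+11234567890'
console.log(toE164('12-3456-7890'))   // null
```

The normalized result also passes the E.164 pattern above, so the two patterns can be chained: flexible on input, strict on storage.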
URLs:
// Simple URL
/^https?:\/\/[\w\-._~:/?#[\]@!$&'()*+,;=]+$/
// With capture groups
/^(https?):\/\/([\w.-]+)(:\d+)?(\/[\w\-._~:/?#[\]@!$&'()*+,;=]*)?$/
// Groups:
// $1 - protocol (http/https)
// $2 - domain
// $3 - port (optional)
// $4 - path (optional)
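The groups can be read with array destructuring. A minimal sketch, assuming a hypothetical `parseUrl` helper that defaults the path to `/` and returns the port as a number:

```javascript
const urlRegex = /^(https?):\/\/([\w.-]+)(:\d+)?(\/[\w\-._~:/?#[\]@!$&'()*+,;=]*)?$/

// Hypothetical helper: returns the parts of a URL, or null if it does not match.
function parseUrl(url) {
  const match = urlRegex.exec(url)
  if (!match) return null
  const [, protocol, domain, port, path] = match
  return {
    protocol,
    domain,
    port: port ? Number(port.slice(1)) : null, // drop the leading ':'
    path: path ?? '/',                         // unmatched optional groups are undefined
  }
}

console.log(parseUrl('https://example.com:8080/docs?page=2'))
// { protocol: 'https', domain: 'example.com', port: 8080, path: '/docs?page=2' }
console.log(parseUrl('ftp://example.com')) // null
```

Note that `%` is not in the character class, so percent-encoded URLs are rejected by this pattern. For general-purpose parsing, the built-in `URL` constructor is usually a safer choice than a regex.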
Passwords:
// At least 8 chars, 1 uppercase, 1 lowercase, 1 number (only letters, digits, and @$!%*?& allowed)
/^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)[a-zA-Z\d@$!%*?&]{8,}$/
// At least 8 chars, 1 uppercase, 1 lowercase, 1 number, 1 special
/^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$/
// Explanation:
// (?=.*[a-z]) - Lookahead: contains lowercase
// (?=.*[A-Z]) - Lookahead: contains uppercase
// (?=.*\d) - Lookahead: contains digit
// (?=.*[@$!%*?&])- Lookahead: contains special char
// [A-Za-z\d@$!%*?&]{8,} - 8+ valid characters
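A single combined pattern only says pass or fail. When users need to know which rule failed, one option is to split the lookaheads into separate checks. A sketch, with hypothetical rule names and a hypothetical `failedRules` helper:

```javascript
// Each lookahead from the combined pattern becomes its own small regex.
const rules = [
  { name: 'lowercase', regex: /[a-z]/ },
  { name: 'uppercase', regex: /[A-Z]/ },
  { name: 'digit', regex: /\d/ },
  { name: 'special', regex: /[@$!%*?&]/ },
  { name: 'length and allowed characters', regex: /^[A-Za-z\d@$!%*?&]{8,}$/ },
]

// Hypothetical helper: returns the names of the rules the password fails.
function failedRules(password) {
  return rules
    .filter(({ regex }) => !regex.test(password))
    .map(({ name }) => name)
}

console.log(failedRules('Passw0rd!'))  // []
console.log(failedRules('password'))   // ['uppercase', 'digit', 'special']
console.log(failedRules('Pass w0rd!')) // ['length and allowed characters']
```

An empty result corresponds to a match of the combined pattern with the special-character lookahead.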
Dates:
// YYYY-MM-DD (format only; does not check month or day ranges)
/^\d{4}-\d{2}-\d{2}$/
// MM/DD/YYYY or M/D/YYYY
/^(0?[1-9]|1[0-2])\/(0?[1-9]|[12]\d|3[01])\/\d{4}$/
// ISO 8601 (with time)
/^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d{3})?Z?$/
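These patterns check the shape of a date, not whether the date exists. One way to close that gap is to pair the regex with a round-trip through `Date`. A minimal sketch, assuming a hypothetical `isRealDate` helper:

```javascript
const isoDate = /^(\d{4})-(\d{2})-(\d{2})$/

// Hypothetical helper: true only if the string is YYYY-MM-DD and names a real
// calendar day. Years 0000-0099 are rejected because Date.UTC maps them to 1900-1999.
function isRealDate(input) {
  const match = isoDate.exec(input)
  if (!match) return false
  const [year, month, day] = match.slice(1).map(Number)
  // Date.UTC rolls out-of-range values over (Feb 30 becomes Mar 1),
  // so build the date and compare it with the parsed parts.
  const date = new Date(Date.UTC(year, month - 1, day))
  return (
    date.getUTCFullYear() === year &&
    date.getUTCMonth() === month - 1 &&
    date.getUTCDate() === day
  )
}

console.log(isoDate.test('2024-02-30')) // true  (shape is fine)
console.log(isRealDate('2024-02-30'))   // false (no such day)
console.log(isRealDate('2024-02-29'))   // true  (leap year)
console.log(isRealDate('2024-13-01'))   // false
```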
Credit card:
// 13-19 characters made up of digits, spaces, or dashes (separators count toward the length)
/^[\d\s-]{13,19}$/
// Specific cards:
// Visa: /^4\d{12}(?:\d{3})?$/
// Mastercard: /^5[1-5]\d{14}$/ (51-55 prefixes only; does not cover the 2221-2720 range)
// Amex: /^3[47]\d{13}$/
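The brand patterns expect bare digits, so separators need to be stripped first. A sketch under that assumption, with a hypothetical `detectCard` helper:

```javascript
const cardPatterns = {
  visa: /^4\d{12}(?:\d{3})?$/,
  mastercard: /^5[1-5]\d{14}$/,
  amex: /^3[47]\d{13}$/,
}

// Hypothetical helper: returns the first brand whose pattern matches, or null.
function detectCard(input) {
  const digits = input.replace(/[\s-]/g, '') // remove spaces and dashes
  const entry = Object.entries(cardPatterns).find(([, regex]) => regex.test(digits))
  return entry ? entry[0] : null
}

console.log(detectCard('4111 1111 1111 1111')) // 'visa'
console.log(detectCard('3782-822463-10005'))   // 'amex'
console.log(detectCard('1234 5678'))           // null
```

This only checks prefix and length. It does not verify the checksum (Luhn), so a match does not mean the number is a valid card.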
IP addresses:
// IPv4
/^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)$/
// IPv4 (simple; accepts out-of-range octets such as 999)
/^(\d{1,3}\.){3}\d{1,3}$/
// IPv6 (simplified; full 8-group form only, no :: compression)
/^([0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}$/
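To illustrate the trade-off between the two IPv4 patterns, the sketch below runs both against the same inputs (the sample addresses are made up for illustration):

```javascript
const ipv4Strict = /^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d\d?)$/
const ipv4Simple = /^(\d{1,3}\.){3}\d{1,3}$/

for (const ip of ['192.168.0.1', '256.1.1.1', '999.999.999.999', '1.2.3']) {
  console.log(ip, ipv4Strict.test(ip), ipv4Simple.test(ip))
}
// 192.168.0.1 true true
// 256.1.1.1 false true
// 999.999.999.999 false true
// 1.2.3 false false
```

The simple pattern is easier to read but only checks the shape; the strict one also bounds each octet to 0-255.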
Usernames:
// 3-16 chars, alphanumeric + underscore/hyphen
/^[a-zA-Z0-9_-]{3,16}$/
// Must start with letter, 3-16 chars
/^[a-zA-Z][a-zA-Z0-9_-]{2,15}$/
Hex colors:
// #RGB or #RRGGBB
/^#([A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})$/
// With optional alpha (#RGBA or #RRGGBBAA)
/^#([A-Fa-f0-9]{6}|[A-Fa-f0-9]{8}|[A-Fa-f0-9]{3}|[A-Fa-f0-9]{4})$/
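The capture group makes it easy to normalize a validated color. A small sketch, assuming a hypothetical `normalizeHex` helper that expands `#RGB` to lowercase `#rrggbb`:

```javascript
const hexColor = /^#([A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})$/

// Hypothetical helper: returns the color as lowercase #rrggbb, or null if invalid.
function normalizeHex(input) {
  const match = hexColor.exec(input)
  if (!match) return null
  let hex = match[1]
  // In a replacement string, $& stands for the matched text, so this doubles each digit.
  if (hex.length === 3) hex = hex.replace(/./g, '$&$&')
  return `#${hex.toLowerCase()}`
}

console.log(normalizeHex('#FA3'))    // '#ffaa33'
console.log(normalizeHex('#1E90FF')) // '#1e90ff'
console.log(normalizeHex('#12345'))  // null
```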
HTML tags:
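// Match an element: paired open/close tags or a self-closing tag
// (attributes use ([^<]*) rather than ([^<]+)*, whose nested quantifier can backtrack catastrophically)
/<([a-z]+)([^<]*)(?:>(.*)<\/\1>|\s+\/>)/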
// Strip all HTML tags
/<[^>]*>/g
// Match specific tag
/<div\b[^>]*>(.*?)<\/div>/gs
File paths:
// Windows path
/^[a-zA-Z]:\\(?:[^\\/:*?"<>|\r\n]+\\)*[^\\/:*?"<>|\r\n]*$/
// Unix path
/^\/(?:[^\/\0]+\/)*[^\/\0]*$/
// File extension
/\.([a-zA-Z0-9]+)$/
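The file-extension pattern is unanchored at the start, so it works on both Windows and Unix paths. A short sketch with a hypothetical `getExtension` helper, including one edge case worth knowing about:

```javascript
const extension = /\.([a-zA-Z0-9]+)$/

// Hypothetical helper: lower-cased extension without the dot, or null.
function getExtension(path) {
  const match = extension.exec(path)
  return match ? match[1].toLowerCase() : null
}

console.log(getExtension('/var/log/app.log'))         // 'log'
console.log(getExtension('C:\\Users\\me\\Photo.JPG')) // 'jpg'
console.log(getExtension('archive.tar.gz'))           // 'gz'
console.log(getExtension('README'))                   // null
console.log(getExtension('.bashrc'))                  // 'bashrc'
```

The last case shows that a dotfile is reported as having an extension. If that is not wanted, the pattern needs an extra condition, such as requiring at least one character before the dot.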
3. Provide Test Cases
For each pattern, show examples:
const emailRegex = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/
// Valid emails
console.log(emailRegex.test('user@example.com')) // true
console.log(emailRegex.test('first.last@company.co.uk')) // true
console.log(emailRegex.test('user+tag@domain.org')) // true
// Invalid emails
console.log(emailRegex.test('@example.com')) // false
console.log(emailRegex.test('user@')) // false
console.log(emailRegex.test('user example.com')) // false
console.log(emailRegex.test('user@domain')) // false
4. Explain Pattern Components
Basic syntax:
. - Any character except newline
\d - Digit [0-9]
\D - Not digit
\w - Word character [a-zA-Z0-9_]
\W - Not word character
\s - Whitespace [ \t\n\r\f\v]
\S - Not whitespace
^ - Start of string (or of each line in multiline mode)
$ - End of string (or of each line in multiline mode)
\b - Word boundary
\B - Not word boundary
* - 0 or more (greedy)
+ - 1 or more (greedy)
? - 0 or 1 (greedy)
{n} - Exactly n times
{n,} - n or more times
{n,m} - Between n and m times
*? - 0 or more (lazy)
+? - 1 or more (lazy)
?? - 0 or 1 (lazy)
[abc] - Any of a, b, or c
[^abc] - Not a, b, or c
[a-z] - Any lowercase letter
[0-9] - Any digit
(...) - Capture group
(?:...) - Non-capturing group
(?=...) - Positive lookahead
(?!...) - Negative lookahead
(?<=...)- Positive lookbehind
(?<!...)- Negative lookbehind
| - OR
\ - Escape special character
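A few of these components combined on one made-up input, to show how word boundaries and quantifiers change what is matched:

```javascript
const sentence = 'cat concat cats category'

console.log(sentence.match(/cat/g).length) // 4 (also matches inside other words)
console.log(sentence.match(/\bcat\b/g))    // ['cat']
console.log(sentence.match(/\bcats?\b/g))  // ['cat', 'cats']
console.log(sentence.match(/\bcat\w*/g))   // ['cat', 'cats', 'category']
```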
5. Language-Specific Variations
JavaScript:
// Flags
const regex = /pattern/gi
// g - global (find all matches)
// i - case insensitive
// m - multiline (^ and $ match at the start and end of each line)
// s - dotall (. matches newlines)
// u - unicode
// y - sticky
// Methods
'text'.match(/pattern/g) // Array of matches (null if none)
'text'.matchAll(/pattern/g) // Iterator of matches
'text'.search(/pattern/) // Index of first match
'text'.replace(/pattern/g, 'new') // Replace matches
/pattern/.test('text') // Boolean
/pattern/.exec('text') // Match details
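These methods differ mainly in what they return. A brief sketch on a made-up input string:

```javascript
const text = 'w=10, h=20, d=30'

// matchAll keeps capture groups for every match (the regex must have the g flag).
const pairs = [...text.matchAll(/(\w)=(\d+)/g)]
  .map(([, key, value]) => [key, Number(value)])
console.log(Object.fromEntries(pairs)) // { w: 10, h: 20, d: 30 }

// match with g returns only the matched strings, without groups.
console.log(text.match(/\d+/g))      // ['10', '20', '30']

// match without g returns the first match plus details such as index.
console.log(text.match(/\d+/).index) // 2

console.log(text.search(/h=/))       // 6
console.log(text.match(/z=/g))       // null (not an empty array)
```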
Python:
import re
# Flags
re.IGNORECASE # Case insensitive
re.MULTILINE # ^ and $ match at the start and end of each line
re.DOTALL # . matches newlines
re.VERBOSE # Allow comments in regex
# Methods
re.match(pattern, string) # Match at start
re.search(pattern, string) # Find anywhere
re.findall(pattern, string) # All matches
re.finditer(pattern, string) # Iterator
re.sub(pattern, repl, string) # Replace
re.split(pattern, string) # Split
PHP:
// Functions
preg_match($pattern, $subject) // Single match
preg_match_all($pattern, $subject) // All matches
preg_replace($pattern, $replace, $subject) // Replace
preg_split($pattern, $subject) // Split
// Pattern modifiers
/pattern/i // Case insensitive
/pattern/m // Multiline
/pattern/s // Dotall
/pattern/x // Extended: ignore whitespace, allow comments
6. Common Use Cases
Extract data:
const text = "Contact: john@example.com or jane@company.org"
const emails = text.match(/[^\s@]+@[^\s@]+\.[^\s@]+/g)
console.log(emails) // ['john@example.com', 'jane@company.org']
Validate input:
function validateEmail(email) {
const regex = /^[^\s@]+@[^\s@]+\.[^\s@]+$/
return regex.test(email)
}
Replace content:
const text = "Call us at 555-1234 or 555-5678"
const censored = text.replace(/\d{3}-\d{4}/g, 'XXX-XXXX')
console.log(censored) // "Call us at XXX-XXXX or XXX-XXXX"
Parse structured data:
const log = "2024-01-15 ERROR: Failed to connect"
const match = log.match(/^(\d{4}-\d{2}-\d{2}) (\w+): (.+)$/)
if (match) {
const [, date, level, message] = match
console.log({ date, level, message })
}
7. Advanced Patterns
Lookaheads and lookbehinds:
// Password: 8+ chars, must include uppercase, lowercase, number
/^(?=.*[a-z])(?=.*[A-Z])(?=.*\d).{8,}$/
// Match word not followed by another word
/\b\w+\b(?!\s+\w+)/
// Match number preceded by $
/(?<=\$)\d+(\.\d{2})?/
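A quick sketch of two of these patterns in use, on made-up inputs. It assumes an engine with lookbehind support, such as a recent Node.js or browser:

```javascript
const invoice = 'Total: $19.99, discount 5, shipping $4'

// Lookbehind: only numbers directly preceded by a dollar sign.
const priceRegex = /(?<=\$)\d+(\.\d{2})?/g
console.log(invoice.match(priceRegex)) // ['19.99', '4']

// Negative lookahead: first word that is not followed by whitespace and another word.
const lastWord = /\b\w+\b(?!\s+\w+)/
console.log('hello world'.match(lastWord)[0]) // 'world'
```

The lookarounds check context without consuming it, so the `$` is not part of the match.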
Capture groups:
const text = "John Doe (john@example.com)"
const regex = /(\w+)\s+(\w+)\s+\(([^)]+)\)/
const [, firstName, lastName, email] = text.match(regex)
console.log({ firstName, lastName, email })
// { firstName: 'John', lastName: 'Doe', email: 'john@example.com' }
Named groups (modern):
const regex = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/
const match = '2024-01-15'.match(regex)
console.log(match.groups)
// { year: '2024', month: '01', day: '15' }
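Named groups can also be referenced in replacement strings (`$<name>`) and inside the pattern itself (`\k<name>`). A short sketch with illustrative inputs:

```javascript
const isoDateRe = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/

// $<name> in the replacement string inserts the text captured by that group.
console.log('2024-01-15'.replace(isoDateRe, '$<month>/$<day>/$<year>')) // '01/15/2024'

// \k<name> matches the same text the named group captured earlier.
const repeatedWord = /\b(?<word>\w+)\s+\k<word>\b/
console.log(repeatedWord.test('it is is fine')) // true
console.log(repeatedWord.test('it is fine'))    // false
```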
8. Performance Tips
Avoid catastrophic backtracking:
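// ❌ BAD: Nested quantifiers cause catastrophic backtracking
/(a+)+b/
// ✅ GOOD: A single quantifier matches the same strings
// (JavaScript has no atomic groups or possessive quantifiers, so simplify the pattern instead)
/a+b/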
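The two patterns accept the same inputs; the difference only shows up on a long near-miss. A rough sketch, assuming a runtime with a global `performance` object (recent Node.js or a browser) and a hypothetical `timeMs` helper. Absolute timings depend on the machine and engine, so none are shown:

```javascript
const nested = /(a+)+b/ // nested quantifier
const flat = /a+b/      // single quantifier

// Both patterns accept and reject the same inputs.
for (const input of ['aaab', 'b', 'xaab', 'aaa']) {
  console.log(input, nested.test(input), flat.test(input))
}

// Hypothetical timing helper: milliseconds spent in one test() call.
function timeMs(regex, input) {
  const start = performance.now()
  regex.test(input)
  return performance.now() - start
}

// Many a's followed by a character that is not b forces the nested pattern
// to try every way of splitting the a's between the inner and outer +.
const hostile = 'a'.repeat(20) + 'c'
console.log('nested ms:', timeMs(nested, hostile))
console.log('flat ms:  ', timeMs(flat, hostile))
```

The work for the nested pattern roughly doubles with each extra `a`, so the input length is kept small here on purpose.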
Be specific:
// ❌ BAD: Too greedy
/<.*>/
// ✅ GOOD: Lazy quantifier
/<.*?>/
// ✅ BETTER: Specific negation
/<[^>]*>/
Anchor patterns:
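// ❌ BAD for validation: unanchored, so the engine retries at every position and accepts partial matches
/\d{3}-\d{4}/
// ✅ GOOD: Anchored, so the whole string must match and failures are detected early
/^\d{3}-\d{4}$/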
9. Testing Tools
Provide testing code:
function testRegex(pattern, testCases) {
console.log(`Testing: ${pattern}\n`)
testCases.forEach(({ input, expected }) => {
const result = pattern.test(input)
const status = result === expected ? '✓' : '✗'
console.log(`${status} "${input}" -> ${result} (expected ${expected})`)
})
}
// Usage
const emailRegex = /^[^\s@]+@[^\s@]+\.[^\s@]+$/
testRegex(emailRegex, [
{ input: 'user@example.com', expected: true },
{ input: 'invalid@', expected: false },
{ input: '@example.com', expected: false },
{ input: 'user example.com', expected: false }
])
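One caveat when reusing a helper like this: a regex created with the `g` (or `y`) flag remembers `lastIndex` between calls, so repeated `test()` calls on the same object can alternate between `true` and `false`. A short sketch; `testStateless` is a hypothetical wrapper:

```javascript
const digits = /\d+/g

console.log(digits.test('123')) // true
console.log(digits.lastIndex)   // 3
console.log(digits.test('123')) // false (search resumed at index 3)
console.log(digits.lastIndex)   // 0 (reset after the failed match)

// Hypothetical wrapper: reset the position before each test.
function testStateless(pattern, input) {
  pattern.lastIndex = 0
  return pattern.test(input)
}

console.log(testStateless(digits, '123')) // true
console.log(testStateless(digits, '123')) // true
```

Dropping the `g` flag from patterns used only for validation avoids the issue altogether.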
10. Common Mistakes
Forgetting to escape:
// ❌ BAD: . matches any character
/user.example.com/
// ✅ GOOD: Escaped dot
/user\.example\.com/
Not anchoring:
// ❌ BAD: Matches anywhere in string
/\d{3}-\d{4}/ // "xxx-555-1234-yyy" passes
// ✅ GOOD: Anchored to start and end
/^\d{3}-\d{4}$/ // Only "555-1234" passes
Greedy vs lazy:
const html = '<div>content</div><span>more</span>'
// ❌ BAD: Greedy, matches too much
html.match(/<.*>/) // '<div>content</div><span>more</span>'
// ✅ GOOD: Lazy, matches minimally
html.match(/<.*?>/) // '<div>'
Case sensitivity:
// ❌ BAD: Case sensitive when any case should be accepted
/^[a-z]+$/ // Only lowercase
// ✅ GOOD: Case insensitive
/^[a-z]+$/i // Any case
Regex Cheat Sheet
Character classes:
- \d = [0-9]
- \D = [^0-9]
- \w = [a-zA-Z0-9_]
- \W = [^a-zA-Z0-9_]
- \s = [ \t\n\r\f\v]
- \S = [^ \t\n\r\f\v]
Quantifiers:
- * = {0,∞}
- + = {1,∞}
- ? = {0,1}
- {n} = exactly n
- {n,} = n or more
- {n,m} = between n and m
Anchors:
- ^ = start of string/line
- $ = end of string/line
- \b = word boundary
- \A = start of string (Python)
- \Z = end of string (Python)
Groups:
- (...) = capture
- (?:...) = non-capture
- (?<name>...) = named capture
- (?=...) = lookahead
- (?!...) = negative lookahead
- (?<=...) = lookbehind
- (?<!...) = negative lookbehind