| name | Web Application Reconnaissance |
| description | Systematic methodology for mapping web application attack surface, discovering hidden endpoints, and identifying technologies |
| when_to_use | After identifying live web applications during subdomain enumeration, when starting security assessment of a web app, or when hunting for hidden functionality and forgotten endpoints |
| version | 1.0.0 |
| languages | bash, python, javascript |
Web Application Reconnaissance
Overview
Web application reconnaissance goes beyond simple subdomain discovery to map the full attack surface of a web application. This includes discovering hidden endpoints, analyzing client-side code, identifying backend technologies, and understanding the application's architecture.
Core principle: Systematic enumeration combined with intelligent analysis reveals hidden attack surface that automated scanners miss.
When to Use
Use this skill when:
- Starting security assessment of a web application
- Building comprehensive understanding of app structure
- Looking for hidden admin panels, APIs, or debug endpoints
- Analyzing JavaScript for hardcoded secrets or endpoints
- Mapping application functionality before deeper testing
Don't use when:
- Not authorized to test the target
- The application enforces strict rate limiting (throttle the methodology rather than abandoning it)
- You need to remain completely passive (rely only on public sources; a passive-collection sketch follows this list)
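When the engagement must stay passive, URL data from public archives can stand in for active probing. A minimal sketch, assuming waybackurls and gau are installed and target.com stands in for the in-scope domain:

# Collect historical URLs from public archives; no traffic is sent to the target itself
waybackurls target.com > wayback_urls.txt
gau target.com >> wayback_urls.txt

# Deduplicate and flag likely-interesting paths for later (authorized) active testing
sort -u wayback_urls.txt | grep -E "\.(php|json|js|bak|sql)(\?|$)" > interesting_passive.txt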
The Four-Phase Methodology
Phase 1: Initial Discovery and Fingerprinting
Goal: Understand what you're dealing with - technologies, frameworks, and basic structure.
Techniques:
Technology Detection
# Comprehensive tech stack identification
whatweb -v -a 3 https://target.com

# HTTP headers analysis
curl -I https://target.com

# Wappalyzer or similar
wappalyzer https://target.com

Common Files and Directories
# robots.txt - often reveals hidden directories
curl https://target.com/robots.txt

# sitemap.xml - complete site structure
curl https://target.com/sitemap.xml

# security.txt - contact info, may reveal scope
curl https://target.com/.well-known/security.txt

# Common config/info files
for file in readme.md humans.txt crossdomain.xml; do
  curl -s https://target.com/$file
done

SSL/TLS Analysis
# Certificate information may reveal additional domains
echo | openssl s_client -connect target.com:443 2>/dev/null | \
  openssl x509 -noout -text | \
  grep -A1 "Subject Alternative Name"
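If subdomain enumeration already produced a list of live hosts, fingerprinting can be batched rather than run host by host. A minimal sketch using projectdiscovery's httpx, where live_hosts.txt is a hypothetical file carried over from that phase:

# Batch fingerprinting: status code, page title, and detected technologies per host
httpx -l live_hosts.txt -status-code -title -tech-detect -o fingerprints.txt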
Phase 2: Content Discovery
Goal: Find hidden endpoints, forgotten files, backup directories, and undocumented functionality.
Techniques:
Directory and File Fuzzing
# ffuf - fast web fuzzer
ffuf -w /path/to/wordlist.txt \
  -u https://target.com/FUZZ \
  -mc 200,301,302,403 \
  -o directories.json

# gobuster for directory brute-forcing
gobuster dir -u https://target.com \
  -w /path/to/wordlist.txt \
  -x php,html,js,txt,json \
  -o gobuster_results.txt

# feroxbuster - recursive directory discovery
feroxbuster -u https://target.com \
  -w /path/to/wordlist.txt \
  --depth 3 \
  -x php js json

Intelligent Wordlist Selection
# Technology-specific wordlists
# For WordPress:
ffuf -w wordpress_wordlist.txt -u https://target.com/FUZZ

# For APIs:
ffuf -w api_wordlist.txt -u https://target.com/api/FUZZ

# Custom wordlist from discovered technologies
# If tech stack is Python/Django, use Django-specific paths

Backup and Sensitive File Discovery
# Common backup patterns ('~' is quoted so the shell does not expand it to $HOME)
for ext in .bak .old .backup .swp '~'; do
  ffuf -w discovered_files.txt -u https://target.com/FUZZ$ext -mc 200
done

# Source code disclosure
ffuf -w discovered_files.txt -u https://target.com/FUZZ.txt -mc 200

# Git exposure
curl -s https://target.com/.git/HEAD
# If found, use git-dumper or similar to extract repository
Phase 3: JavaScript Analysis
Goal: Extract hardcoded secrets, discover API endpoints, and understand client-side logic.
Techniques:
Enumerate All JavaScript Files
# Extract JS URLs from HTML
curl -s https://target.com | \
  grep -oP 'src="[^"]+\.js"' | \
  sed 's/src="//;s/"$//' > js_files.txt

# Use LinkFinder or similar
python3 linkfinder.py -i https://target.com -o results.html

Search for Sensitive Data in JS
# Download all JS files
mkdir -p js
while read url; do
  curl -s "$url" > "js/$(basename "$url")"
done < js_files.txt

# Search for patterns
grep -r -E "(api_key|apikey|secret|password|token|aws_access)" js/
grep -r -E "(https?://[^\"\'\ ]+)" js/ | grep -v "fonts\|cdn"

# Find API endpoints
grep -r -E "(/api/|/v[0-9]+/)" js/

Beautify and Analyze Minified Code
# Beautify JS for easier analysis
mkdir -p js_beautified
for file in js/*.js; do
  js-beautify "$file" > "js_beautified/$(basename "$file")"
done

# Look for interesting functions
grep -r "function" js_beautified/ | grep -i "admin\|debug\|test"

Extract Subdomains and Endpoints from JS
# Use tools like JSFinder, relative-url-extractor
python3 relative-url-extractor.py -u https://target.com > endpoints.txt
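Tool output is worth cross-checking with a plain grep pass over the JavaScript downloaded earlier. A rough sketch, assuming the files sit under js/ as in the previous step; the regex is deliberately loose and its output needs manual triage:

# Pull quoted relative paths out of the downloaded JS and deduplicate them
grep -rhoE '"/[a-zA-Z0-9_./-]+"' js/ | tr -d '"' | sed 's|^/||' | sort -u > js_relative_paths.txt

# Feed anything new back into content discovery as a custom wordlist
ffuf -w js_relative_paths.txt -u https://target.com/FUZZ -mc 200,403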
Phase 4: Architecture Mapping
Goal: Understand application structure, authentication flows, and data flows.
Techniques:
Crawling and Spidering
# Burp Suite spider (manual)
# Or use automated crawlers
gospider -s https://target.com -d 3 -c 10 -o spider_output

# katana - fast crawler
katana -u https://target.com -d 5 -ps -jc -o crawl_results.txt

Parameter Discovery
# Find URL parameters
arjun -u https://target.com/search -m GET

# ParamSpider - discover parameters from wayback
python3 paramspider.py -d target.com

API Endpoint Enumeration
# If API discovered, enumerate versions and endpoints
for version in v1 v2 v3; do
  ffuf -w api_endpoints.txt -u https://api.target.com/$version/FUZZ
done

# Swagger/OpenAPI documentation
curl https://api.target.com/swagger.json
curl https://api.target.com/openapi.json
curl https://api.target.com/api-docs

Authentication and Session Analysis
# Analyze authentication mechanisms
# - Cookie attributes (HttpOnly, Secure, SameSite)
# - JWT tokens (decode and analyze claims)
# - OAuth flows
# - Session management

# Check for JWT
# Decode JWT token (use jwt_tool or jwt.io)
# Note: base64 -d on a full token only decodes the header segment before the first dot
echo "eyJhbG..." | base64 -d
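Because base64 -d stops at the first dot, only the header of a full token decodes that way. A minimal sketch for decoding both the header and payload segments, where the TOKEN value is a placeholder for a token captured during testing:

decode_segment() {
  local seg=$1
  # Convert base64url to base64, then pad with '=' to a multiple of 4
  seg=$(echo "$seg" | tr '_-' '/+')
  while [ $(( ${#seg} % 4 )) -ne 0 ]; do seg="${seg}="; done
  echo "$seg" | base64 -d
  echo
}

TOKEN="<captured JWT here>"
decode_segment "$(echo "$TOKEN" | cut -d. -f1)"   # header
decode_segment "$(echo "$TOKEN" | cut -d. -f2)"   # payload (claims)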
Automation Pipeline
Complete reconnaissance pipeline:
#!/bin/bash
# web_app_recon.sh
TARGET=$1
if [ -z "$TARGET" ]; then
  echo "Usage: $0 <https://target.com>"
  exit 1
fi
OUTPUT_DIR="${TARGET//[.:\/]/_}_webapp_recon"
mkdir -p "$OUTPUT_DIR"/{js,crawl,endpoints}
echo "[*] Starting web application reconnaissance for $TARGET"
# Phase 1: Fingerprinting
echo "[*] Phase 1: Technology fingerprinting"
whatweb -v -a 3 "$TARGET" > "$OUTPUT_DIR/whatweb.txt"
curl -I "$TARGET" > "$OUTPUT_DIR/headers.txt"
curl -s "$TARGET/robots.txt" > "$OUTPUT_DIR/robots.txt"
curl -s "$TARGET/sitemap.xml" > "$OUTPUT_DIR/sitemap.xml"
# Phase 2: Content Discovery
echo "[*] Phase 2: Content discovery"
feroxbuster -u "$TARGET" \
-w /usr/share/wordlists/seclists/Discovery/Web-Content/common.txt \
-x php,html,js,txt,json \
--depth 2 \
-o "$OUTPUT_DIR/feroxbuster.txt"
# Phase 3: JavaScript Analysis
echo "[*] Phase 3: JavaScript analysis"
katana -u "$TARGET" -jc -o "$OUTPUT_DIR/crawl/katana_js.txt"
# Download and analyze JS files
grep "\.js$" "$OUTPUT_DIR/crawl/katana_js.txt" | while read js_url; do
filename=$(echo "$js_url" | md5sum | cut -d' ' -f1)
curl -s "$js_url" > "$OUTPUT_DIR/js/${filename}.js"
done
# Search for secrets in JS
echo "[*] Searching for sensitive data in JavaScript"
grep -r -E "(api[_-]?key|secret|password|token)" "$OUTPUT_DIR/js/" > "$OUTPUT_DIR/js_secrets.txt"
# Phase 4: Endpoint extraction
echo "[*] Phase 4: Endpoint extraction"
cat "$OUTPUT_DIR/js"/*.js | grep -oP '(/api/[^"'"'"'\s]+)' | sort -u > "$OUTPUT_DIR/endpoints/api_endpoints.txt"
echo "[+] Reconnaissance complete. Results in $OUTPUT_DIR/"
echo "[+] Review the following files:"
echo " - whatweb.txt: Technology stack"
echo " - feroxbuster.txt: Discovered directories/files"
echo " - js_secrets.txt: Potential secrets in JavaScript"
echo " - endpoints/api_endpoints.txt: API endpoints found"
Tool Recommendations
Content Discovery:
- ffuf (fast, flexible, modern)
- feroxbuster (recursive, Rust-based)
- gobuster (reliable, simple)
Crawling:
- katana (fast, modern)
- gospider (feature-rich)
- Burp Suite spider (manual, thorough)
JavaScript Analysis:
- LinkFinder (extract endpoints from JS)
- JSFinder (find subdomains/endpoints)
- relative-url-extractor
- js-beautify (beautify minified code)
General:
- httpx (probing and tech detection)
- nuclei (vulnerability templates)
- waybackurls (historical URLs)
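These tools chain together well. A minimal pipeline sketch, assuming the projectdiscovery tools are installed; the nuclei template path below is illustrative and varies between installs:

# Historical URLs -> live-host probing -> low-noise exposure checks
waybackurls target.com | httpx -silent | nuclei -t http/exposures/ -o nuclei_exposures.txt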
Common Patterns and Findings
High-value targets to look for:
Admin/Debug Panels
/admin, /administrator, /admin.php
/debug, /test, /dev
/phpinfo.php, /info.php
/console, /terminal

Configuration Files
/config.php, /.env, /settings.py
/web.config, /application.yml
/config.json, /.git/config

API Documentation
/api-docs, /swagger, /api/v1/docs
/graphql, /graphiql
/redoc, /openapi.json

Backup Files
/backup, /backups, /old
index.php.bak, database.sql.old
site.tar.gz, backup.zip
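A quick triage pass over paths like these separates live hits from dead ends before deeper fuzzing. A minimal sketch, assuming the candidate paths are saved one per line in a hypothetical high_value_paths.txt:

# Record the HTTP status code for each high-value path, with a small delay between requests
while read -r path; do
  code=$(curl -s -o /dev/null -w "%{http_code}" "https://target.com$path")
  echo "$code $path"
  sleep 0.5
done < high_value_paths.txt | sort > path_triage.txt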
Organizing Findings
Create structured documentation:
# Web App Recon: target.com
## Executive Summary
- Application Type: [E-commerce, API, CMS, etc.]
- Primary Technology: [PHP/Laravel, Python/Django, Node.js, etc.]
- Notable Findings: [X hidden endpoints, Y exposed configs]
## Technology Stack
- Frontend: React 18.2, Bootstrap 5
- Backend: Laravel 9.x
- Server: Nginx 1.21
- Database: MySQL (inferred from error messages)
## Discovered Endpoints
### Public
- /api/v1/products - Product listing API
- /api/v1/users - User profiles (requires auth)
### Hidden/Interesting
- /api/v1/admin - Admin API (403, exists!)
- /api/internal/metrics - Internal metrics endpoint
- /debug/routes - Laravel route list (exposed!)
## Sensitive Files Found
- /storage/logs/laravel.log - Application logs exposed
- /.env.backup - Backup of environment config
- /phpinfo.php - Server info disclosure
## JavaScript Findings
- API keys found: 2 (one appears to be test key)
- Hardcoded API endpoints: 15 additional endpoints
- Subdomains discovered: api-staging.target.com
## Priority Items for Further Testing
1. /debug/routes - Full route disclosure
2. /.env.backup - May contain database credentials
3. /api/internal/metrics - Potential IDOR or info disclosure
4. Staging subdomain - May have weaker security
## Next Steps
- Test IDOR on /api/v1/users endpoints
- Attempt to access admin API with discovered tokens
- Manual review of staging environment
- Test for SQL injection in search parameters
Legal and Ethical Considerations
CRITICAL - Always follow these rules:
Authorization Required
- Never test without explicit permission
- Understand scope and boundaries
- Don't access sensitive data unless authorized
Responsible Disclosure
- Report findings through proper channels
- Don't publicly disclose before remediation
- Follow responsible disclosure timelines
Data Handling
- Don't exfiltrate sensitive data
- Don't store credentials or PII
- Delete reconnaissance data after assessment
Avoid DoS Conditions
- Rate limit your requests (see the throttled examples after this list)
- Don't overload servers
- Use appropriate concurrency settings
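Most of the fuzzers above expose throttling flags directly; the values below are illustrative starting points rather than universal recommendations:

# ffuf: cap requests per second, lower the thread count, and add a small delay
ffuf -w wordlist.txt -u https://target.com/FUZZ -rate 50 -t 10 -p 0.1

# feroxbuster: rate limit plus a reduced thread count
feroxbuster -u https://target.com -w wordlist.txt --rate-limit 50 -t 10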
Common Pitfalls
| Mistake | Impact | Solution |
|---|---|---|
| Relying only on automated tools | Miss context-specific findings | Combine automation with manual analysis |
| Skipping JavaScript analysis | Miss API endpoints and secrets | Always analyze client-side code |
| Not checking robots.txt first | Waste time on known paths | Start with obvious information sources |
| Ignoring error messages | Miss technology fingerprinting | Pay attention to verbose errors |
| Too aggressive fuzzing | Detection, IP blocking | Start with smaller wordlists, increase gradually |
Integration with Other Skills
This skill works with:
- skills/reconnaissance/automated-subdomain-enum - Feeds discovered subdomains here
- skills/exploitation/* - Use discovered endpoints for exploitation
- skills/analysis/static-vuln-analysis - Analyze discovered source code
- skills/documentation/* - Document findings systematically
Success Metrics
A successful web app reconnaissance should:
- Identify all major technologies used
- Discover hidden or forgotten functionality
- Extract API endpoints and parameters
- Find configuration or sensitive file exposures
- Map authentication and authorization flows
- Prioritize findings for further testing
- Complete without triggering security alerts (if stealth required)
References and Further Reading
- OWASP Web Security Testing Guide
- "The Web Application Hacker's Handbook" by Dafydd Stuttard
- "Bug Bounty Bootcamp" by Vickie Li (Chapters 4-5)
- PortSwigger Web Security Academy
- HackerOne disclosed reports for real-world examples