| name | engineering-nba-data |
| description | Extracts, transforms, and analyzes NBA statistics using the nba_api Python library. Use when working with NBA player stats, team data, game logs, shot charts, league statistics, or any NBA-related data engineering tasks. Supports both stats.nba.com endpoints and static player/team lookups. |
Goal: Extract and process NBA statistical data efficiently using the nba_api library for data analysis, reporting, and application development.
IMPORTANT: The nba_api library accesses stats.nba.com endpoints. All data requests return structured datasets that can be output as JSON, dictionaries, or pandas DataFrames.
Workflow
Phase 1: Setup and Installation
- Install nba_api:
pip install nba_apiif not yet installed - Import required modules based on task:
from nba_api.stats.endpoints import [endpoint_name]for stats.nba.com datafrom nba_api.stats.static import players, teamsfor static lookupsfrom nba_api.stats.library.parameters import [parameter_classes]for valid parameter values
Phase 2: Data Retrieval
For Player/Team Lookups (No API Calls):
- Use
players.find_players_by_full_name('player_name')for player searches - Use
teams.find_teams_by_full_name('team_name')for team searches - Both return dictionaries with
id,full_name, and other metadata - No HTTP requests are sent; data is embedded in the package
For Stats Endpoints (API Calls):
- Identify the correct endpoint from table of contents
- Initialize endpoint with required parameters:
endpoint_class(param1=value1, param2=value2) - Access datasets using dot notation:
response_object.dataset_name - Retrieve data in desired format:
.get_json()for JSON string.get_dict()for dictionary.get_data_frame()for pandas DataFrame
Custom Request Configuration:
- Set custom headers:
endpoint_class(player_id=123, headers=custom_headers) - Set proxy:
endpoint_class(player_id=123, proxy='127.0.0.1:80') - Set timeout:
endpoint_class(player_id=123, timeout=100)(in seconds)
Phase 3: Data Processing
- Extract specific datasets from endpoint responses
- Transform data using pandas for aggregations, filtering, joins
- Normalize nested data structures as needed
- Handle multiple datasets returned by single endpoint
Phase 4: Output and Storage
- Export to CSV:
df.to_csv('output.csv', index=False) - Export to JSON: Use
.get_json()ordf.to_json() - Store in database using pandas
.to_sql()method - Cache responses to minimize API calls
Rules
- Required packages:
nba_apimust be installed before use - Static first: Always use static lookups (players/teams) for ID retrieval before making API calls
- Parameter validation: Reference parameters.md for valid parameter values
- Endpoint selection: Check table of contents to find the correct endpoint
- Rate limiting: Be mindful of API rate limits; cache data when possible
- Error handling: Wrap API calls in try-except blocks to handle network failures
- Data formats: Know when to use JSON, dict, or DataFrame based on downstream requirements
- Season format: Seasons use format
YYYY-YY(e.g.,2019-20) - League IDs: NBA=
00, ABA=01, WNBA=10, G-League=20
Acceptance Criteria
- Data retrieved successfully from appropriate endpoint or static source
- Correct parameters used based on documentation
- Data formatted appropriately for intended use case
- Error handling implemented for API failures
- Code follows Python best practices
- Results validated against expected structure
- Documentation references included where relevant
Reference Documentation
Quick access to common resources:
- Table of Contents - Full documentation index
- Examples - Usage examples for endpoints and static data
- Parameters - Valid parameter values and patterns
- Endpoints Data Structure - Response format and methods
- Players - Static player lookup functions
- Teams - Static team lookup functions
- HTTP Library - HTTP request details
Endpoint-specific documentation:
Refer to docs/nba_api/stats/endpoints/[endpoint_name].md for detailed parameter and dataset information for each endpoint.