| name | advanced-rendering |
| description | Master high-performance rendering for large datasets with Datashader. Use this skill when working with datasets exceeding 100M+ points, optimizing visualization performance, or implementing efficient rendering strategies with rasterization and colormapping techniques. |
| compatibility | Requires datashader >= 0.15.0, colorcet >= 3.1.0, holoviews >= 1.18.0, pandas >= 1.0.0, numpy >= 1.15.0 |
Advanced Rendering Skill
Overview
Master high-performance rendering of large datasets with Datashader. This skill covers handling 100M+ point datasets, performance tuning, and efficient visualization strategies.
Dependencies
- datashader >= 0.15.0
- colorcet >= 3.1.0
- holoviews >= 1.18.0
- pandas >= 1.0.0
- numpy >= 1.15.0
Core Capabilities
1. Datashader Fundamentals
Datashader is designed for rasterizing large datasets:
import datashader as ds
import datashader.transfer_functions as tf
import pandas as pd
import holoviews as hv
# Load large dataset (can handle 100M+ points)
df = pd.read_csv('large_dataset.csv') # Millions or billions of rows
# Create datashader canvas
canvas = ds.Canvas(plot_width=800, plot_height=600)
# Rasterize aggregation
agg = canvas.points(df, 'x', 'y')
# Convert the aggregation to an image
img = tf.shade(agg)
2. Efficient Point Rendering
from holoviews.operation.datashader import datashade, aggregate, shade
# Quick datashading with HoloViews
scatter = hv.Scatter(df, 'x', 'y')
shaded = datashade(scatter)
# With custom aggregation
agg = aggregate(scatter, width=800, height=600)
colored = shade(agg, cmap='viridis')
# Control rasterization
from holoviews.operation import rasterize
rasterized = rasterize(
    scatter,
    aggregator=ds.count(),
    pixel_ratio=2
)
3. Color Mapping and Aggregation
import datashader as ds
from colorcet import cm
# Count aggregation (heatmap)
canvas = ds.Canvas()
agg = canvas.points(df, 'x', 'y', agg=ds.count())
# Weighted aggregation
agg = canvas.points(df, 'x', 'y', agg=ds.sum('value'))
# Mean aggregation
agg = canvas.points(df, 'x', 'y', agg=ds.mean('value'))
# Custom colormapping
import datashader.transfer_functions as tf
shaded = tf.shade(agg, cmap=cm['fire'])
shaded_with_spread = tf.spread(shaded, px=2)
4. Image Compositing
# Combine multiple datasets on a shared canvas
import datashader.transfer_functions as tf
canvas = ds.Canvas(x_range=(0, 100), y_range=(0, 100))
agg1 = canvas.points(df1, 'x', 'y')
agg2 = canvas.points(df2, 'x', 'y')
# Shade separately
shaded1 = tf.shade(agg1, cmap=cm['kr'])    # black-to-red colormap
shaded2 = tf.shade(agg2, cmap=cm['blues'])
# Composite the two images
composite = tf.stack(shaded1, shaded2, how='over')
5. Interactive Datashader with HoloViews
from holoviews.operation.datashader import datashade
from holoviews import streams
# Interactive scatter with zooming: the callback receives the current
# x_range/y_range from the RangeXY stream and re-rasterizes that window
def create_datashaded_plot(x_range, y_range):
    scatter = hv.Scatter(df, 'x', 'y')
    return datashade(scatter, cmap='viridis', dynamic=False,
                     x_range=x_range, y_range=y_range)

# Add interaction
range_stream = streams.RangeXY()
interactive_plot = hv.DynamicMap(
    create_datashaded_plot,
    streams=[range_stream]
)
6. Time Series Data Streaming
# Efficient streaming plot for time series
from holoviews.operation.datashader import rasterize
from holoviews import streams
def create_timeseries_plot(df_window):
    curve = hv.Curve(df_window, 'timestamp', 'value')
    return curve

# Rasterize for efficiency
rasterized = rasterize(
    hv.Curve(df, 'timestamp', 'value'),
    aggregator=ds.mean('value'),
    width=1000,
    height=400
)
# For a live, continuously updating plot, see the Buffer-based sketch below
Performance Optimization Strategies
1. Memory Optimization
# Use data types efficiently
df = pd.read_csv(
    'large_file.csv',
    dtype={
        'x': 'float32',
        'y': 'float32',
        'value': 'float32',
        'category': 'category'
    }
)
# Chunk processing for extremely large files.
# Use one canvas with fixed ranges (your data's known extent) so the
# per-chunk grids line up and can be summed.
chunk_size = 1_000_000
canvas = ds.Canvas(plot_width=800, plot_height=600,
                   x_range=(0, 100), y_range=(0, 100))
aggregations = []
for chunk in pd.read_csv('huge.csv', chunksize=chunk_size):
    agg = canvas.points(chunk, 'x', 'y')
    aggregations.append(agg)
# Combine results (count aggregations add element-wise)
combined_agg = aggregations[0]
for agg in aggregations[1:]:
    combined_agg = combined_agg + agg
2. Resolution and Pixel Ratio
# Adjust canvas resolution to hit a target pixel budget while preserving aspect ratio
import numpy as np

def auto_canvas(df, target_pixels=500_000):
    aspect_ratio = (df['x'].max() - df['x'].min()) / (df['y'].max() - df['y'].min())
    height = int(np.sqrt(target_pixels / aspect_ratio))
    width = int(height * aspect_ratio)
    return ds.Canvas(
        plot_width=width,
        plot_height=height,
        x_range=(df['x'].min(), df['x'].max()),
        y_range=(df['y'].min(), df['y'].max())
    )
canvas = auto_canvas(df)
agg = canvas.points(df, 'x', 'y')
3. Aggregation Selection
# Choose appropriate aggregation for your data
canvas = ds.Canvas()
# For counting: count()
agg_count = canvas.points(df, 'x', 'y', agg=ds.count())
# For averages: mean()
agg_mean = canvas.points(df, 'x', 'y', agg=ds.mean('value'))
# For sums: sum()
agg_sum = canvas.points(df, 'x', 'y', agg=ds.sum('value'))
# For max/min
agg_max = canvas.points(df, 'x', 'y', agg=ds.max('value'))
# For per-category counts (the 'category' column must be a pandas categorical dtype)
agg_cat = canvas.points(df, 'x', 'y', agg=ds.count_cat('category'))
Colormapping with Colorcet
1. Perceptually Uniform Colormaps
from colorcet import cm
import datashader.transfer_functions as tf
# Use perceptually uniform colormaps
canvas = ds.Canvas()
agg = canvas.points(df, 'x', 'y', agg=ds.count())
# Gray scale
shaded_gray = tf.shade(agg, cmap=cm['gray'])
# Perceptually uniform colorcet maps
shaded_fire = tf.shade(agg, cmap=cm['fire'])
shaded_bgy = tf.shade(agg, cmap=cm['bgy'])
# Cyclic colormap (for angular/phase data)
shaded_cyclic = tf.shade(agg, cmap=cm['colorwheel'])
2. Custom Color Normalization
# Logarithmic normalization
canvas = ds.Canvas()
agg = canvas.points(df, 'x', 'y', agg=ds.sum('value'))
# Log transform for better visualization of heavy-tailed counts
shaded_log = tf.shade(agg, how='log', cmap=cm['fire'])
# Histogram equalization (datashader's default) spreads values across the full range
shaded_eq = tf.shade(agg, how='eq_hist', cmap=cm['bmy'])
3. Multi-Band Compositing
# Separate visualization of multiple datasets (use a shared canvas range so the images align)
canvas = ds.Canvas(x_range=(0, 100), y_range=(0, 100))
agg_red = canvas.points(df_red, 'x', 'y')
agg_green = canvas.points(df_green, 'x', 'y')
agg_blue = canvas.points(df_blue, 'x', 'y')
# Shade each with a single-hue colormap, then additively stack into one RGB-like image
shaded_r = tf.shade(agg_red, cmap=cm['kr'])    # black-to-red
shaded_g = tf.shade(agg_green, cmap=cm['kg'])  # black-to-green
shaded_b = tf.shade(agg_blue, cmap=cm['kb'])   # black-to-blue
result = tf.stack(shaded_r, shaded_g, shaded_b, how='add')
Integration with Panel and HoloViews
import param
import panel as pn
import holoviews as hv
from colorcet import cm
from holoviews.operation.datashader import datashade, dynspread

pn.extension()
hv.extension('bokeh')

# Create interactive dashboard with datashader
class LargeDataViewer(param.Parameterized):
    cmap = param.Selector(default='fire', objects=sorted(cm.keys()))
    show_spread = param.Boolean(default=False)

    def __init__(self, data, **params):
        super().__init__(**params)
        self.data = data

    @param.depends('cmap', 'show_spread')
    def plot(self):
        scatter = hv.Scatter(self.data, 'x', 'y')
        shaded = datashade(scatter, cmap=cm[self.cmap])
        if self.show_spread:
            # dynspread operates on the HoloViews output, unlike tf.spread
            shaded = dynspread(shaded)
        return shaded

viewer = LargeDataViewer(large_df)
app = pn.Column(
    pn.Param(viewer.param),
    viewer.plot
)
app.servable()
Best Practices
1. Choose the Right Tool
- < 10k points: Use standard HoloViews/hvPlot
- 10k - 1M points: Use rasterize() for dense plots
- 1M - 100M points: Use Datashader
- > 100M points: Use Datashader with chunking (a size-based dispatch sketch follows this list)
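To make the rule of thumb concrete, here is an illustrative dispatcher that picks a rendering path from the row count; choose_renderer and its thresholds are assumptions for this sketch, not a library API.
import holoviews as hv
from holoviews.operation.datashader import rasterize, datashade

def choose_renderer(df):
    # Illustrative thresholds matching the list above
    n = len(df)
    scatter = hv.Scatter(df, 'x', 'y')
    if n < 10_000:
        return scatter             # plain HoloViews handles this directly
    elif n < 1_000_000:
        return rasterize(scatter)  # server-side aggregation, keeps axes and colorbars
    else:
        return datashade(scatter)  # full Datashader pipeline for very large data

plot = choose_renderer(df)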
2. Appropriate Canvas Size
# General rule: 400-1000 pixels on each axis
# Too small: loses detail
# Too large: slow rendering, memory waste
canvas = ds.Canvas(plot_width=800, plot_height=600) # Good default
3. Normalize Large Value Ranges
# When data has extreme outliers
canvas = ds.Canvas()
agg = canvas.points(df, 'x', 'y', agg=ds.mean('value'))
# Use appropriate normalization
shaded = tf.shade(agg, how='log', cmap=cm['fire'])
Common Patterns
Pattern 1: Progressive Disclosure
def create_progressive_plot(df):
    # Start with an aggregated overview of the full dataset
    canvas = ds.Canvas(plot_width=800, plot_height=600)
    agg = canvas.points(df, 'x', 'y')
    return tf.shade(agg, cmap=cm['fire'])
# When wrapped in HoloViews' datashade (see the sketch below), zooming
# re-aggregates at the new resolution so users can drill into detail
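For the zoom-driven recalculation itself, wrap the data in HoloViews' datashade; a minimal sketch, assuming a DataFrame df with x/y columns: datashade returns a DynamicMap, so each pan or zoom in a live Bokeh session re-aggregates at the new viewport resolution.
import colorcet as cc
import holoviews as hv
from holoviews.operation.datashader import datashade

hv.extension('bokeh')

scatter = hv.Scatter(df, 'x', 'y')
# Each zoom/pan re-runs the aggregation for the visible range,
# revealing progressively finer structure
progressive = datashade(scatter, cmap=cc.fire)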
Pattern 2: Categorical Visualization
# Aggregate all categories in one pass; the column must be a categorical dtype
df['category'] = df['category'].astype('category')
canvas = ds.Canvas()
agg = canvas.points(df, 'x', 'y', agg=ds.count_cat('category'))
# Shade with one color per category (Glasbey colors are designed for categorical data)
import colorcet as cc
shaded = tf.shade(agg, color_key=cc.glasbey)
Pattern 3: Time Series Aggregation
def aggregate_time_series(df, time_bucket):
    df['time_bucket'] = pd.cut(df['timestamp'], bins=time_bucket)
    aggregated = df.groupby('time_bucket').agg({
        'x': 'mean',
        'y': 'mean',
        'value': 'sum'
    })
    return aggregated
Common Use Cases
- Scatter Plot Analysis: 100M+ point clouds
- Time Series Visualization: High-frequency trading data
- Geospatial Heat Maps: Global-scale location data
- Scientific Visualization: Climate model outputs
- Network Analysis: Large graph layouts
- Financial Analytics: Tick-by-tick market data
Troubleshooting
Issue: Poor Color Differentiation
- Use perceptually uniform colormaps from colorcet
- Apply appropriate normalization (log, cbrt, eq_hist), as in the sketch after this list
- Adjust canvas size for better resolution
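A small sketch of both fixes, assuming an existing count aggregation agg: histogram equalization or a log transform spreads skewed counts across the colormap, and a colorcet map keeps the steps perceptually even.
import colorcet as cc
import datashader.transfer_functions as tf

# Histogram equalization: each brightness level covers a similar number of pixels
img_eq = tf.shade(agg, how='eq_hist', cmap=cc.fire)
# Log transform: useful when a few hot pixels dominate the counts
img_log = tf.shade(agg, how='log', cmap=cc.fire)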
Issue: Memory Issues with Large Data
- Use chunk processing for files larger than RAM
- Reduce data type precision (float64 → float32); see the sketch after this list
- Aggregate before visualization
- Use categorical data type for strings
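A quick sketch of the dtype advice, assuming a DataFrame df already in memory; the column names are illustrative.
# Inspect the current footprint, then downcast floats and categorize strings
print(df.memory_usage(deep=True).sum() / 1e9, 'GB')
df['x'] = df['x'].astype('float32')
df['y'] = df['y'].astype('float32')
df['category'] = df['category'].astype('category')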
Issue: Slow Performance
- Reduce canvas size (fewer pixels)
- Use simpler aggregation functions
- Parallelize with Dask partitions or enable GPU (cuDF) acceleration if available (see the sketch after this list)
- Profile with Python profilers to find bottlenecks
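One concrete route to faster aggregation is handing Datashader a Dask DataFrame, which it accepts natively and aggregates partition by partition; a minimal sketch, assuming a CSV with x/y columns (the GPU path via cuDF follows the same pattern but needs a CUDA setup).
import dask.dataframe as dd
import datashader as ds

# Partitioned, out-of-core DataFrame; partitions are aggregated in parallel
ddf = dd.read_csv('huge.csv', dtype={'x': 'float32', 'y': 'float32'})
canvas = ds.Canvas(plot_width=800, plot_height=600)
agg = canvas.points(ddf, 'x', 'y', agg=ds.count())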