| name | narwhals |
| description | Effectively use Narwhals to write dataframe-agnostic code that works seamlessly across multiple Python dataframe libraries. Write correct type annotations for code using Narwhals. |
Narwhals - DataFrame Agnostic API
Narwhals is a lightweight, zero-dependency compatibility layer for dataframe libraries in Python that provides a unified interface across different backends.
Docs: https://narwhals-dev.github.io/narwhals/
What is Narwhals?
Narwhals enables writing dataframe-agnostic code that works seamlessly across multiple Python dataframe libraries:
Full API Support:
- cuDF
- Modin
- pandas
- Polars
- PyArrow
Lazy-Only Support:
- Dask
- DuckDB
- Ibis
- PySpark
- SQLFrame
Core Philosophy
Why Narwhals?
- Resolves subtle differences between libraries (e.g., pandas checking index vs Polars checking values)
- Provides unified, simple, and predictable API
- Handles backwards compatibility internally
- Tests against nightly builds of supported libraries
- Maintains negligible performance overhead
- Full static typing support
- Zero dependencies
Target Use Case: Anyone building libraries, applications, or services that consume dataframes and need complete backend independence.
Key Features
- Backend Agnostic: Write once, run on any supported dataframe library
- Polars-Like API: Uses a subset of the Polars API for consistency
- Lazy & Eager Execution: Separate APIs for both execution modes
- Expression Support: Full expression API for complex operations
- Type Safety: Full static typing support
- 100% Branch Coverage: Thoroughly tested
Basic Usage Pattern
Three-Step Workflow
import narwhals as nw
# 1. Convert to Narwhals
df_nw = nw.from_native(df) # Works with pandas, Polars, PyArrow, etc.
# 2. Perform operations using Polars-like API
result = df_nw.select(
a_sum=nw.col("a").sum(), a_mean=nw.col("a").mean(), b_std=nw.col("b").std()
)
# 3. Convert back to original library
result_native = result.to_native()
Using the @narwhalify Decorator
Simplifies function definitions for automatic conversion:
from narwhals.typing import IntoDataFrameT

@nw.narwhalify
def my_func(df: IntoDataFrameT) -> IntoDataFrameT:
return df.select(nw.col("a").sum(), nw.col("b").mean()).filter(nw.col("a") > 0)
# Automatically handles conversion to/from Narwhals
result = my_func(pandas_df) # Works!
result = my_func(polars_df) # Also works!
Top-Level Functions
Conversion Functions
- from_native(df, ...): Convert a native DataFrame/Series into a Narwhals object. Parameters: pass_through, backend, eager_only, allow_series
- to_native(nw_obj): Convert a Narwhals object back to its native library type
- narwhalify(): Decorator for dataframe-agnostic functions with automatic conversion
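A minimal round-trip sketch (the column name is just for illustration):
import pandas as pd
import narwhals as nw
df_native = pd.DataFrame({"a": [1, 2, 3]})
# eager_only=True guarantees an eager nw.DataFrame (raises for lazy-only inputs)
df = nw.from_native(df_native, eager_only=True)
# ... dataframe-agnostic work here ...
df_back = df.to_native()  # back to the original pandas DataFrame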
Data Creation
- new_series(name, values, dtype): Create a new Series
- from_dict(data): Create a DataFrame from a dictionary
- from_dicts(data): Create a DataFrame from a sequence of dictionaries
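A small sketch of creating data directly in Narwhals. The backend keyword used here is an assumption based on recent releases (older releases used native_namespace instead), so check your installed version:
import narwhals as nw
# backend is assumed to accept a backend name such as "pandas" or "polars"
df = nw.from_dict({"a": [1, 2], "b": ["x", "y"]}, backend="pandas")
s = nw.new_series("a", [1, 2, 3], backend="pandas")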
File I/O
Eager Loading:
- read_csv(source, **kwargs): Read a CSV file into a DataFrame
- read_parquet(source, **kwargs): Read a Parquet file into a DataFrame
Lazy Loading:
- scan_csv(source, **kwargs): Lazily scan a CSV file
- scan_parquet(source, **kwargs): Lazily scan a Parquet file
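A hedged sketch of eager vs. lazy loading. The file paths are placeholders, and the backend keyword is an assumption based on recent releases (older releases used native_namespace):
import narwhals as nw
df = nw.read_csv("data.csv", backend="pandas")           # eager DataFrame
lf = nw.scan_parquet("data.parquet", backend="polars")   # LazyFrame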
Aggregation Functions
- sum(), mean(), min(), max(), median()
- sum_horizontal(), mean_horizontal(), etc.
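A minimal sketch, assuming df is a Narwhals DataFrame with numeric columns "q1" and "q2" (hypothetical names):
import narwhals as nw
# Column-wise reductions collapse each column to a single value
totals = df.select(nw.sum("q1"), nw.mean("q2"))
# Horizontal aggregations combine columns row by row
with_row_sum = df.with_columns(total=nw.sum_horizontal("q1", "q2"))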
Expression Creation
- col(name): Reference a column by name
- lit(value): Create a literal expression
- when(condition): Create a conditional expression
- format(template, *args): Format an expression as a string
Utilities
- generate_temporary_column_name(): Generate a unique column name
- get_native_namespace(obj): Get the native library of an object
- show_versions(): Print debugging information
DataFrame Methods
Properties
- columns: List of column names
- schema: Ordered mapping of column names to dtypes
- shape: Tuple of (rows, columns)
- implementation: Name of the native implementation
Column Operations
- select(*exprs): Select columns using expressions
- with_columns(*exprs): Add or modify columns
- drop(*columns): Remove specified columns
- rename(mapping): Rename columns
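A minimal sketch, assuming df is a Narwhals DataFrame with columns "a", "b", and "unused" (hypothetical names):
import narwhals as nw
result = (
    df.with_columns(a_doubled=nw.col("a") * 2)  # add a derived column
    .rename({"b": "b_renamed"})                 # rename an existing column
    .drop("unused")                             # remove a column
    .select("a", "a_doubled", "b_renamed")      # keep only these columns
)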
Row Operations
- filter(predicate): Filter rows based on conditions
- head(n): Get the first n rows
- tail(n): Get the last n rows
- sample(n): Randomly sample n rows
- drop_nulls(): Drop rows with null values
- unique(): Remove duplicate rows
Inspection
- is_empty(): Check if the DataFrame has no rows
- is_duplicated(): Identify duplicated rows
- is_unique(): Identify unique rows
- null_count(): Count null values per column
- estimated_size(): Estimate memory usage
Transformations
- sort(*by): Sort by one or more columns
- group_by(*by): Group by columns for aggregation
- join(other, on, how): Perform SQL-style joins
- pivot(on, index, values): Create a pivot table
- explode(*columns): Expand list columns into long format
- lazy(): Convert to a LazyFrame
Export
- to_native(): Convert to the original library type
- to_numpy(): Convert to a NumPy array
- to_pandas(): Convert to a pandas DataFrame
- to_polars(): Convert to a Polars DataFrame
- clone(): Create a copy
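A short sketch of the export methods, assuming df is an eager Narwhals DataFrame:
native = df.to_native()  # back to whatever library df was created from
pdf = df.to_pandas()     # pandas DataFrame (requires pandas to be installed)
arr = df.to_numpy()      # 2D NumPy array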
LazyFrame Methods
LazyFrame provides largely the same API as DataFrame, but with lazy evaluation:
Key Differences
- Operations build an execution plan without computing
- collect(): Materialize the LazyFrame into a DataFrame
- collect_schema(): Get the schema without collecting data
- sink_parquet(path): Write results directly to Parquet
Common Methods
Most DataFrame transformation methods are also available on LazyFrame:
- select(), filter(), with_columns(), drop()
- group_by(), join(), sort(), unique()
- head(), tail(), top_k()
- gather_every(): Select rows at regular intervals
- unpivot(): Convert from wide to long format
- with_row_index(): Add a row index column
- pipe(): Apply a function to the LazyFrame
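A minimal sketch of a lazy pipeline, assuming df is an eager Narwhals DataFrame with columns "status", "user_id", and "amount" (hypothetical names):
import narwhals as nw
plan = (
    df.lazy()
    .filter(nw.col("status") == "active")
    .group_by("user_id")
    .agg(total=nw.col("amount").sum())
)
schema = plan.collect_schema()  # inspect the schema without computing anything
result = plan.collect()         # materialize the plan into a DataFrame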
Expression (Expr) API
Expressions are the building blocks for column operations.
Creation
nw.col("column_name") # Reference column
nw.lit(42) # Literal value
Filtering
- filter(predicate): Filter elements
- is_in(values): Check membership
- is_between(lower, upper): Check range
- drop_nulls(): Remove nulls
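A minimal sketch, assuming df has a numeric column "score" and a string column "country" (hypothetical names):
import narwhals as nw
result = df.filter(
    nw.col("score").is_between(0, 100),        # keep rows within a range
    nw.col("country").is_in(["NL", "DE", "FR"]),  # keep rows matching a set
)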
Aggregations
- count(): Count non-null elements
- null_count(): Count null values
- n_unique(): Count unique values
- sum(), mean(), median(): Statistical aggregations
- min(), max(): Extremes
- std(), var(): Spread measures
- quantile(q): Quantile values
Transformations
Mathematical:
- abs(): Absolute value
- round(), floor(), ceil(): Rounding
- sqrt(), log(), exp(): Mathematical functions
Type/Value Operations:
- cast(dtype): Change the data type
- fill_null(value): Replace null values
- replace_strict(old, new): Replace specific values
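A minimal sketch, assuming df has a column "count" containing nulls (hypothetical name):
import narwhals as nw
result = df.with_columns(
    # replace missing values, then cast to a Narwhals integer dtype
    count=nw.col("count").fill_null(0).cast(nw.Int64),
)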
Window Operations:
- rolling_mean(window_size): Moving average
- rolling_sum(window_size): Moving sum
- rolling_std(window_size): Moving standard deviation
- shift(n): Shift values by n positions
- over(*by): Compute the expression over groups
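A sketch under assumptions: df is an eager Narwhals DataFrame with columns "group", "date", and "value" (hypothetical names), and note that not every backend supports every window operation:
import narwhals as nw
result = df.sort("date").with_columns(
    prev_value=nw.col("value").shift(1).over("group"),      # previous value within each group
    rolling_avg=nw.col("value").rolling_mean(window_size=3),  # 3-row moving average
)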
Ranking/Uniqueness:
- rank(): Assign ranks
- unique(): Get unique values
- is_duplicated(): Identify duplicates
- is_first_distinct(): Mark first distinct occurrences
Namespace Methods
Expressions have specialized namespaces for specific data types:
String Operations (Expr.str)
- String manipulation methods
DateTime Operations (Expr.dt)
- Date/time manipulation methods
List Operations (Expr.list)
- List column operations
Categorical Operations (Expr.cat)
- Categorical data methods
Struct Operations (Expr.struct)
- Struct/nested data methods
Name Operations (Expr.name)
- Column name operations
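A minimal sketch of the str and dt namespaces, assuming df has a string column "name" and a datetime column "ts" (hypothetical names):
import narwhals as nw
result = df.with_columns(
    name_upper=nw.col("name").str.to_uppercase(),  # string namespace
    year=nw.col("ts").dt.year(),                   # datetime namespace
)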
Series API
Series represents a single column:
Properties
- shape (as on DataFrame), plus Series-specific dtype and name
Methods
- Similar to DataFrame but for single column operations
- Has specialized namespaces: str, dt, list, cat, struct
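A minimal sketch of working with a Series:
import pandas as pd
import narwhals as nw
s = nw.from_native(pd.Series([1, 2, 3], name="a"), series_only=True)
s.name         # 'a'
s.dtype        # Int64
s.mean()       # 2.0
s.to_native()  # back to the original pandas Series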
Type Hints
Full docs: narwhals.typing
TLDR:
DataFrameT
DataFrameT = TypeVar("DataFrameT", bound="DataFrame[Any]")
TypeVar bound to a Narwhals DataFrame.
Use this if your function can accept a Narwhals DataFrame and returns a Narwhals DataFrame backed by the same backend.
Examples:
>>> import narwhals as nw
>>> from narwhals.typing import DataFrameT
>>> @nw.narwhalify
... def func(df: DataFrameT) -> DataFrameT:
... return df.with_columns(c=df["a"] + 1)
Frame
Frame: TypeAlias = Union["DataFrame[Any]", "LazyFrame[Any]"]
Narwhals DataFrame or Narwhals LazyFrame.
Use this if your function can work with either and your function doesn't care about its backend.
Examples:
>>> import narwhals as nw
>>> from narwhals.typing import Frame
>>> @nw.narwhalify
... def agnostic_columns(df: Frame) -> list[str]:
... return df.columns
FrameT
FrameT = TypeVar("FrameT", "DataFrame[Any]", "LazyFrame[Any]")
TypeVar bound to a Narwhals DataFrame or Narwhals LazyFrame.
Use this if your function accepts either nw.DataFrame or nw.LazyFrame and returns an object of the same kind.
Examples:
>>> import narwhals as nw
>>> from narwhals.typing import FrameT
>>> @nw.narwhalify
... def agnostic_func(df: FrameT) -> FrameT:
... return df.with_columns(c=nw.col("a") + 1)
IntoDataFrame
IntoDataFrame: TypeAlias = NativeDataFrame
Anything which can be converted to a Narwhals DataFrame.
Use this if your function accepts a narwhalifiable object but doesn't care about its backend.
Examples:
>>> import narwhals as nw
>>> from narwhals.typing import IntoDataFrame
>>> def agnostic_shape(df_native: IntoDataFrame) -> tuple[int, int]:
... df = nw.from_native(df_native, eager_only=True)
... return df.shape
IntoDataFrameT
IntoDataFrameT = TypeVar("IntoDataFrameT", bound=IntoDataFrame)
TypeVar bound to an object convertible to a Narwhals DataFrame.
Use this if your function accepts an object which can be converted to nw.DataFrame and returns an object of the same class.
Examples:
>>> import narwhals as nw
>>> from narwhals.typing import IntoDataFrameT
>>> def agnostic_func(df_native: IntoDataFrameT) -> IntoDataFrameT:
... df = nw.from_native(df_native, eager_only=True)
... return df.with_columns(c=df["a"] + 1).to_native()
Common Patterns
Group By and Aggregate
result = df.group_by("category").agg(
count=nw.col("id").count(),
total=nw.col("amount").sum(),
average=nw.col("amount").mean(),
)
Conditional Operations
result = df.with_columns(
category=nw.when(nw.col("value") > 100)
.then(nw.lit("high"))
.otherwise(nw.lit("low"))
)
Joins
result = df1.join(
df2,
on="key_column",
how="left", # inner, left, outer, cross
)
Chain Operations
result = (
df.filter(nw.col("status") == "active")
.select("user_id", "amount")
.group_by("user_id")
.agg(total=nw.col("amount").sum())
.sort("total", descending=True)
.head(10)
)
Best Practices
- Use @nw.narwhalify for library functions: Simplifies your API and handles conversion automatically
- Prefer expressions over plain column selection: More flexible and composable
  # Good
  df.select(nw.col("a").sum(), nw.col("b").mean())
  # Also fine, but less composable
  df.select("a", "b")
- Use lazy evaluation when possible: Better performance for complex pipelines
  result = df.lazy().select(...).filter(...).collect()
- Always convert back to native: Remember to call .to_native() when returning from library functions (unless using @narwhalify)
- Type hint your functions: Use IntoDataFrame and FrameT for better IDE support
- Check supported backends: Some operations may not be available on all backends
Important Constraints
Zero Dependencies: Narwhals has no dependencies, keeping it lightweight
Polars API Subset: Uses Polars-style API but may not support all Polars features
Backend Limitations: Some backends (lazy-only) have restricted functionality
Tips
Checking whether a Narwhals frame is a Polars frame:
import polars as pl
import narwhals as nw
df_native = pl.DataFrame({"a": [1, 2, 3]})
df = nw.from_native(df_native)
df.implementation.is_polars()
Asserting series equality:
import pandas as pd
import narwhals as nw
from narwhals.testing import assert_series_equal
s1 = nw.from_native(pd.Series([1, 2, 3]), series_only=True)
s2 = nw.from_native(pd.Series([1, 5, 3]), series_only=True)
assert_series_equal(s1, s2)
Traceback (most recent call last):
...
AssertionError: Series are different (exact value mismatch)
[left]:
┌───────────────┐
|Narwhals Series|
|---------------|
| 0 1 |
| 1 2 |
| 2 3 |
| dtype: int64 |
└───────────────┘
[right]:
┌───────────────┐
|Narwhals Series|
|---------------|
| 0 1 |
| 1 5 |
| 2 3 |
| dtype: int64 |
└───────────────┘