| name | narwhals |
| description | Effectively use Narwhals to write dataframe-agnostic code that works seamlessly across multiple Python dataframe libraries. Write correct type annotations for code using Narwhals. |
Narwhals - DataFrame Agnostic API
Narwhals is a lightweight, zero-dependency compatibility layer for dataframe libraries in Python that provides a unified interface across different backends.
Docs: https://narwhals-dev.github.io/narwhals/
What is Narwhals?
Narwhals enables writing dataframe-agnostic code that works seamlessly across multiple Python dataframe libraries:
Full API Support:
- cuDF
- Modin
- pandas
- Polars
- PyArrow
Lazy-Only Support:
- Dask
- DuckDB
- Ibis
- PySpark
- SQLFrame
Core Philosophy
Why Narwhals?
- Resolves subtle differences between libraries (e.g., pandas checking index vs Polars checking values)
- Provides unified, simple, and predictable API
- Handles backwards compatibility internally
- Tests against nightly builds of supported libraries
- Maintains negligible performance overhead
- Full static typing support
- Zero dependencies
Target Use Case: Anyone building libraries, applications, or services that consume dataframes and need complete backend independence.
Key Features
- Backend Agnostic: Write once, run on any supported dataframe library
- Polars-Like API: Uses a subset of the Polars API for consistency
- Lazy & Eager Execution: Separate APIs for both execution modes
- Expression Support: Full expression API for complex operations
- Type Safety: Full static typing support
- 100% Branch Coverage: Thoroughly tested
Basic Usage Pattern
Three-Step Workflow
import narwhals as nw
# 1. Convert to Narwhals
df_nw = nw.from_native(df) # Works with pandas, Polars, PyArrow, etc.
# 2. Perform operations using Polars-like API
result = df_nw.select(
a_sum=nw.col("a").sum(), a_mean=nw.col("a").mean(), b_std=nw.col("b").std()
)
# 3. Convert back to original library
result_native = result.to_native()
Using the @narwhalify Decorator
Simplifies function definitions for automatic conversion:
from narwhals.typing import IntoDataFrameT

@nw.narwhalify
def my_func(df: IntoDataFrameT) -> IntoDataFrameT:
return df.select(nw.col("a").sum(), nw.col("b").mean()).filter(nw.col("a") > 0)
# Automatically handles conversion to/from Narwhals
result = my_func(pandas_df) # Works!
result = my_func(polars_df) # Also works!
Top-Level Functions
Conversion Functions
- from_native(df, ...): Convert a native DataFrame/Series into a Narwhals object. Parameters: pass_through, backend, eager_only, allow_series
- to_native(nw_obj): Convert a Narwhals object back to its native library type
- narwhalify(): Decorator for dataframe-agnostic functions with automatic conversion
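A minimal round-trip sketch (the column name is just for illustration):
import pandas as pd
import narwhals as nw
df_native = pd.DataFrame({"a": [1, 2, 3]})
# eager_only=True guarantees an eager nw.DataFrame (raises for lazy-only inputs)
df = nw.from_native(df_native, eager_only=True)
# ... dataframe-agnostic work here ...
df_back = df.to_native()  # back to the original pandas DataFrame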
Data Creation
- new_series(name, values, dtype): Create a new Series
- from_dict(data): Create a DataFrame from a dictionary
- from_dicts(data): Create a DataFrame from a sequence of dictionaries
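A small sketch of creating data directly in Narwhals. The backend keyword used here is an assumption based on recent releases (older releases used native_namespace instead), so check your installed version:
import narwhals as nw
# backend is assumed to accept a backend name such as "pandas" or "polars"
df = nw.from_dict({"a": [1, 2], "b": ["x", "y"]}, backend="pandas")
s = nw.new_series("a", [1, 2, 3], backend="pandas")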
File I/O
Eager Loading:
- read_csv(source, **kwargs): Read a CSV file into a DataFrame
- read_parquet(source, **kwargs): Read a Parquet file into a DataFrame
Lazy Loading:
- scan_csv(source, **kwargs): Lazily scan a CSV file
- scan_parquet(source, **kwargs): Lazily scan a Parquet file
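A hedged sketch of eager vs. lazy loading. The file paths are placeholders, and the backend keyword is an assumption based on recent releases (older releases used native_namespace):
import narwhals as nw
df = nw.read_csv("data.csv", backend="pandas")           # eager DataFrame
lf = nw.scan_parquet("data.parquet", backend="polars")   # LazyFrame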
Aggregation Functions
- sum(), mean(), min(), max(), median()
- sum_horizontal(), mean_horizontal(), etc.
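A minimal sketch, assuming df is a Narwhals DataFrame with numeric columns "q1" and "q2" (hypothetical names):
import narwhals as nw
# Column-wise reductions collapse each column to a single value
totals = df.select(nw.sum("q1"), nw.mean("q2"))
# Horizontal aggregations combine columns row by row
with_row_sum = df.with_columns(total=nw.sum_horizontal("q1", "q2"))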
Expression Creation
- col(name): Reference a column by name
- lit(value): Create a literal expression
- when(condition): Create a conditional expression
- format(template, *args): Format an expression as a string
Utilities
- generate_temporary_column_name(): Generate a unique column name
- get_native_namespace(obj): Get the native library of an object
- show_versions(): Print debugging information
DataFrame Methods
Properties
- columns: List of column names
- schema: Ordered mapping of column names to dtypes
- shape: Tuple of (rows, columns)
- implementation: Name of the native implementation
Column Operations
- select(*exprs): Select columns using expressions
- with_columns(*exprs): Add or modify columns
- drop(*columns): Remove specified columns
- rename(mapping): Rename columns
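A minimal sketch, assuming df is a Narwhals DataFrame with columns "a", "b", and "unused" (hypothetical names):
import narwhals as nw
result = (
    df.with_columns(a_doubled=nw.col("a") * 2)  # add a derived column
    .rename({"b": "b_renamed"})                 # rename an existing column
    .drop("unused")                             # remove a column
    .select("a", "a_doubled", "b_renamed")      # keep only these columns
)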
Row Operations
- filter(predicate): Filter rows based on conditions
- head(n): Get the first n rows
- tail(n): Get the last n rows
- sample(n): Randomly sample n rows
- drop_nulls(): Drop rows with null values
- unique(): Remove duplicate rows
Inspection
- is_empty(): Check if the DataFrame has no rows
- is_duplicated(): Identify duplicated rows
- is_unique(): Identify unique rows
- null_count(): Count null values per column
- estimated_size(): Estimate memory usage
Transformations
- sort(*by): Sort by one or more columns
- group_by(*by): Group by columns for aggregation
- join(other, on, how): Perform SQL-style joins
- pivot(on, index, values): Create a pivot table
- explode(*columns): Expand list columns into long format
- lazy(): Convert to a LazyFrame
Export
- to_native(): Convert to the original library type
- to_numpy(): Convert to a NumPy array
- to_pandas(): Convert to a pandas DataFrame
- to_polars(): Convert to a Polars DataFrame
- clone(): Create a copy
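A short sketch of the export methods, assuming df is an eager Narwhals DataFrame:
native = df.to_native()  # back to whatever library df was created from
pdf = df.to_pandas()     # pandas DataFrame (requires pandas to be installed)
arr = df.to_numpy()      # 2D NumPy array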
LazyFrame Methods
LazyFrame provides largely the same API as DataFrame, but with lazy evaluation:
Key Differences
- Operations build an execution plan without computing
- collect(): Materialize the LazyFrame into a DataFrame
- collect_schema(): Get the schema without collecting data
- sink_parquet(path): Write results directly to Parquet
Common Methods
Most DataFrame transformation methods are also available on LazyFrame:
- select(), filter(), with_columns(), drop()
- group_by(), join(), sort(), unique()
- head(), tail(), top_k()
- gather_every(): Select rows at regular intervals
- unpivot(): Convert from wide to long format
- with_row_index(): Add a row index column
- pipe(): Apply a function to the LazyFrame
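A minimal sketch of a lazy pipeline, assuming df is an eager Narwhals DataFrame with columns "status", "user_id", and "amount" (hypothetical names):
import narwhals as nw
plan = (
    df.lazy()
    .filter(nw.col("status") == "active")
    .group_by("user_id")
    .agg(total=nw.col("amount").sum())
)
schema = plan.collect_schema()  # inspect the schema without computing anything
result = plan.collect()         # materialize the plan into a DataFrame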
Expression (Expr) API
Expressions are the building blocks for column operations.
Creation
nw.col("column_name") # Reference column
nw.lit(42) # Literal value
Filtering
- filter(predicate): Filter elements
- is_in(values): Check membership
- is_between(lower, upper): Check range
- drop_nulls(): Remove nulls
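A minimal sketch, assuming df has a numeric column "score" and a string column "country" (hypothetical names):
import narwhals as nw
result = df.filter(
    nw.col("score").is_between(0, 100),        # keep rows within a range
    nw.col("country").is_in(["NL", "DE", "FR"]),  # keep rows matching a set
)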
Aggregations
- count(): Count non-null elements
- null_count(): Count null values
- n_unique(): Count unique values
- sum(), mean(), median(): Statistical aggregations
- min(), max(): Extremes
- std(), var(): Spread measures
- quantile(q): Quantile values
Transformations
Mathematical:
- abs(): Absolute value
- round(), floor(), ceil(): Rounding
- sqrt(), log(), exp(): Mathematical functions
Type/Value Operations:
- cast(dtype): Change the data type
- fill_null(value): Replace null values
- replace_strict(old, new): Replace specific values
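A minimal sketch, assuming df has a column "count" containing nulls (hypothetical name):
import narwhals as nw
result = df.with_columns(
    # replace missing values, then cast to a Narwhals integer dtype
    count=nw.col("count").fill_null(0).cast(nw.Int64),
)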
Window Operations:
- rolling_mean(window_size): Moving average
- rolling_sum(window_size): Moving sum
- rolling_std(window_size): Moving standard deviation
- shift(n): Shift values by n positions
- over(*by): Compute the expression over groups
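A sketch under assumptions: df is an eager Narwhals DataFrame with columns "group", "date", and "value" (hypothetical names), and note that not every backend supports every window operation:
import narwhals as nw
result = df.sort("date").with_columns(
    prev_value=nw.col("value").shift(1).over("group"),      # previous value within each group
    rolling_avg=nw.col("value").rolling_mean(window_size=3),  # 3-row moving average
)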
Ranking/Uniqueness:
- rank(): Assign ranks
- unique(): Get unique values
- is_duplicated(): Identify duplicates
- is_first_distinct(): Mark first distinct occurrences
Namespace Methods
Expressions have specialized namespaces for specific data types:
String Operations (Expr.str)
- String manipulation methods
DateTime Operations (Expr.dt)
- Date/time manipulation methods
List Operations (Expr.list)
- List column operations
Categorical Operations (Expr.cat)
- Categorical data methods
Struct Operations (Expr.struct)
- Struct/nested data methods
Name Operations (Expr.name)
- Column name operations
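A minimal sketch of the str and dt namespaces, assuming df has a string column "name" and a datetime column "ts" (hypothetical names):
import narwhals as nw
result = df.with_columns(
    name_upper=nw.col("name").str.to_uppercase(),  # string namespace
    year=nw.col("ts").dt.year(),                   # datetime namespace
)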
Series API
Series represents a single column:
Properties
- shape (as on DataFrame), plus Series-specific dtype and name
Methods
- Similar to DataFrame but for single column operations
- Has specialized namespaces: str, dt, list, cat, struct
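A minimal sketch of working with a Series:
import pandas as pd
import narwhals as nw
s = nw.from_native(pd.Series([1, 2, 3], name="a"), series_only=True)
s.name         # 'a'
s.dtype        # Int64
s.mean()       # 2.0
s.to_native()  # back to the original pandas Series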
Type Hints
Full docs: narwhals.typing
TLDR:
DataFrameT
DataFrameT = TypeVar("DataFrameT", bound="DataFrame[Any]")
TypeVar bound to a Narwhals DataFrame.
Use this if your function can accept a Narwhals DataFrame and returns a Narwhals DataFrame backed by the same backend.
Examples:
>>> import narwhals as nw
>>> from narwhals.typing import DataFrameT
>>> @nw.narwhalify
... def func(df: DataFrameT) -> DataFrameT:
... return df.with_columns(c=df["a"] + 1)
Frame
Frame: TypeAlias = Union["DataFrame[Any]", "LazyFrame[Any]"]
Narwhals DataFrame or Narwhals LazyFrame.
Use this if your function can work with either and your function doesn't care about its backend.
Examples:
>>> import narwhals as nw
>>> from narwhals.typing import Frame
>>> @nw.narwhalify
... def agnostic_columns(df: Frame) -> list[str]:
... return df.columns
FrameT
FrameT = TypeVar("FrameT", "DataFrame[Any]", "LazyFrame[Any]")
TypeVar bound to a Narwhals DataFrame or Narwhals LazyFrame.
Use this if your function accepts either nw.DataFrame or nw.LazyFrame and returns an object of the same kind.
Examples:
>>> import narwhals as nw
>>> from narwhals.typing import FrameT
>>> @nw.narwhalify
... def agnostic_func(df: FrameT) -> FrameT:
... return df.with_columns(c=nw.col("a") + 1)
IntoDataFrame
IntoDataFrame: TypeAlias = NativeDataFrame
Anything which can be converted to a Narwhals DataFrame.
Use this if your function accepts a narwhalifiable object but doesn't care about its backend.
Examples:
>>> import narwhals as nw
>>> from narwhals.typing import IntoDataFrame
>>> def agnostic_shape(df_native: IntoDataFrame) -> tuple[int, int]:
... df = nw.from_native(df_native, eager_only=True)
... return df.shape
IntoDataFrameT
IntoDataFrameT = TypeVar("IntoDataFrameT", bound=IntoDataFrame)
TypeVar bound to an object convertible to a Narwhals DataFrame.
Use this if your function accepts an object which can be converted to nw.DataFrame and returns an object of the same class.
Examples:
>>> import narwhals as nw
>>> from narwhals.typing import IntoDataFrameT
>>> def agnostic_func(df_native: IntoDataFrameT) -> IntoDataFrameT:
... df = nw.from_native(df_native, eager_only=True)
... return df.with_columns(c=df["a"] + 1).to_native()
Common Patterns
Group By and Aggregate
result = df.group_by("category").agg(
count=nw.col("id").count(),
total=nw.col("amount").sum(),
average=nw.col("amount").mean(),
)
Conditional Operations
result = df.with_columns(
category=nw.when(nw.col("value") > 100)
.then(nw.lit("high"))
.otherwise(nw.lit("low"))
)
Joins
result = df1.join(
df2,
on="key_column",
how="left", # inner, left, outer, cross
)
Chain Operations
result = (
df.filter(nw.col("status") == "active")
.select("user_id", "amount")
.group_by("user_id")
.agg(total=nw.col("amount").sum())
.sort("total", descending=True)
.head(10)
)
Best Practices
- Use @nw.narwhalify for library functions: Simplifies your API and handles conversion automatically
- Prefer expressions over plain column selection: More flexible and composable
  # Good
  df.select(nw.col("a").sum(), nw.col("b").mean())
  # Also fine, but less composable
  df.select("a", "b")
- Use lazy evaluation when possible: Better performance for complex pipelines
  result = df.lazy().select(...).filter(...).collect()
- Always convert back to native: Remember to call .to_native() when returning from library functions (unless using @narwhalify)
- Type hint your functions: Use IntoDataFrame and FrameT for better IDE support
- Check supported backends: Some operations may not be available on all backends
Important Constraints
Zero Dependencies: Narwhals has no dependencies, keeping it lightweight
Polars API Subset: Uses Polars-style API but may not support all Polars features
Backend Limitations: Some backends (lazy-only) have restricted functionality
Tips
Checking whether a Narwhals frame is a Polars frame:
import polars as pl
import narwhals as nw
df_native = pl.DataFrame({"a": [1, 2, 3]})
df = nw.from_native(df_native)
df.implementation.is_polars()
Asserting series equality:
import pandas as pd
import narwhals as nw
from narwhals.testing import assert_series_equal
s1 = nw.from_native(pd.Series([1, 2, 3]), series_only=True)
s2 = nw.from_native(pd.Series([1, 5, 3]), series_only=True)
assert_series_equal(s1, s2)
Traceback (most recent call last):
...
AssertionError: Series are different (exact value mismatch)
[left]:
┌───────────────┐
|Narwhals Series|
|---------------|
| 0 1 |
| 1 2 |
| 2 3 |
| dtype: int64 |
└───────────────┘
[right]:
┌───────────────┐
|Narwhals Series|
|---------------|
| 0 1 |
| 1 5 |
| 2 3 |
| dtype: int64 |
└───────────────┘