---
title: "Data Filtering in Polars: A Modern Approach to DataFrame Operations"
subtitle: "A Beginner's Guide to Row Filtering with Python and Polars"
author: "Alierwai Reng"
date: 2026-02-08
categories: [Python, Polars, Data Analysis]
tags: [python, polars, data-filtering, beginners, dataframes]
image: featured.png
description: "Learn modern data filtering techniques with Polars! This beginner-friendly tutorial covers row filtering, multiple conditions, and lazy evaluation with clear explanations and hands-on exercises."
format:
html:
code-fold: false
code-tools: true
toc: true
toc-depth: 3
execute:
warning: false
message: false
---
# Data Filtering in Polars: A Modern Approach to DataFrame Operations
## A Beginner's Guide to Row Filtering with Python and Polars
Learn modern data filtering techniques with Polars! This beginner-friendly tutorial covers row filtering, multiple conditions, and lazy evaluation with clear explanations and hands-on exercises.
::: {.callout-note}
## Tested With
polars 1.37.1, Python 3.14.0
:::
## Introduction {#sec-intro}
Welcome to this hands-on Polars filtering tutorial! This guide showcases **Polars**—a blazingly fast DataFrame library for Python—and introduces essential filtering functions including `filter()`, `is_in()`, `slice()`, and lazy evaluation with `.lazy()` and `.collect()`.
By the end of this guide, you'll understand how to:
- **Filter rows** using column expressions and boolean conditions
- **Select data** by position with `.slice()` and `.head()`
- **Apply multiple conditions** using logical operators
- **Work with lists** of values using `.is_in()`
- **Optimize queries** with lazy evaluation
- **Chain methods** for readable, maintainable code
We'll work through practical, reproducible examples using a small dataset. The techniques you learn are fully transferable to any dataset—from customer data to scientific measurements to business analytics.
---
## Part 1: Environment Setup {#sec-setup}
### Step 1: Load Required Packages
Every Python analysis starts by importing the libraries we need. For this tutorial, we only need Polars!
```{python}
#| label: load-packages
#| code-summary: "Import Polars library"
# Import Polars (convention: use 'pl' alias)
import polars as pl
# Confirmation
print("✅ Polars loaded successfully!")
print(f"📦 Polars version: {pl.__version__}")
```
::: {.callout-note}
## Package Installation
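If you don't have Polars installed, run **one** of these commands **once** in your terminal:
```bash
# With uv (adds Polars to your project's dependencies)
uv add "polars>=1.37.1"
# Or with pip (the quotes stop the shell from treating > as a redirect)
pip install "polars>=1.37.1"
```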
After installation, you only need to import it in each new Python session.
**Version note:** This tutorial uses Polars 1.37.1. Polars is actively developed, so some features may evolve in future versions.
:::
---
## Part 2: Creating Sample Data {#sec-data}
### Step 2: Create an Example DataFrame
Before filtering data, we need data to work with! We'll create a small, reproducible DataFrame.
```{python}
#| label: create-data
#| code-summary: "Create example DataFrame"
# Create a DataFrame from a dictionary
df = pl.DataFrame({
"id": [1, 2, 3, 4, 5, 6], # Unique identifier
"int_col": [5, 12, 8, 20, 15, 3], # Numeric values
"str_col": ["yes", "no", "yes", "yes", "no", "yes"], # Text categories
"group": ["A", "A", "B", "B", "A", "B"] # Grouping variable
})
# Display the result
print(df)
```
### Step 3: Examine Data Types
```{python}
#| label: examine-schema
#| code-summary: "View column data types"
# Display schema (column names and types)
print(df.schema)
```
---
## Part 3: Basic Row Filtering {#sec-filter-basic}
### Step 4: Filter with a Single Condition
```{python}
#| label: filter-single
#| code-summary: "Filter rows where str_col equals 'yes'"
result = df.filter(pl.col("str_col") == "yes")
print(result)
```
### Step 5: Filter with Numeric Comparisons
```{python}
#| label: filter-numeric
#| code-summary: "Filter rows where int_col is greater than 10"
result = df.filter(pl.col("int_col") > 10)
print(result)
```
---
## Part 4: Position-Based Selection {#sec-position}
### Step 6: Select Rows by Position
```{python}
#| label: slice-rows
#| code-summary: "Select first 5 rows using .slice()"
result = df.slice(0, 5)
print(result)
```
Use `.slice(offset, length)` for position-based selection, or `.head(n)` for the first n rows.
### Step 7: Using `.head()` and `.tail()`
```{python}
#| label: head-tail
#| code-summary: "Convenience methods for first/last rows"
# First 3 rows
print("First 3 rows:")
print(df.head(3))
print("\nLast 3 rows:")
print(df.tail(3))
```
---
## Part 5: Multiple Values and Conditions {#sec-multiple}
### Step 8: Filter Using Multiple Values
Use `.is_in()` to filter for membership in a list:
```{python}
#| label: filter-isin
#| code-summary: "Filter using .is_in() for multiple values"
result = df.filter(pl.col("int_col").is_in([8, 20]))
print(result)
```
### Step 9: Combine Multiple Conditions
Real-world filtering often requires multiple conditions combined with AND or OR logic.
```{python}
#| label: filter-multiple
#| code-summary: "Filter with two conditions using &"
# Filter: int_col > 10 AND str_col == "yes"
result = df.filter(
(pl.col("int_col") > 10) & (pl.col("str_col") == "yes")
)
print(result)
```
::: {.callout-warning}
## Common Pitfall: Forgetting Parentheses
When combining conditions, **each condition must be wrapped in parentheses**.
**Wrong:**
```python
# This will cause an error!
df.filter(pl.col("int_col") > 10 & pl.col("str_col") == "yes")
```
**Correct:**
```python
# Wrap each condition in ()
df.filter((pl.col("int_col") > 10) & (pl.col("str_col") == "yes"))
```
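This is due to Python's operator precedence: `&` and `|` bind more tightly than comparison operators such as `>` and `==`, so without parentheses Python tries to evaluate `10 & pl.col("str_col")` first. Parentheses force each comparison to happen before the logical operation.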
:::
### Step 10: OR Logic
```{python}
#| label: filter-or
#| code-summary: "Filter with OR condition using |"
# Filter: group == "A" OR int_col > 15
result = df.filter(
(pl.col("group") == "A") | (pl.col("int_col") > 15)
)
print(result)
```
::: {.callout-note}
## Logical Operators Reference
- `&` — AND (both conditions must be True)
- `|` — OR (at least one condition must be True)
- `~` — NOT (inverts the condition)
**Example of NOT:**
```python
# Keep rows where str_col is NOT "yes"
df.filter(~(pl.col("str_col") == "yes"))
```
:::
---
## Part 6: Method Chaining and Readability {#sec-chaining}
### Step 11: Chain Multiple Filters
```{python}
#| label: chain-filters
#| code-summary: "Chain multiple .filter() calls"
result = (
df
.filter(pl.col("str_col") == "yes")
.filter(pl.col("int_col") > 10)
)
print(result)
```
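Chaining filters can make complex logic easier to read. In eager mode, each `.filter()` call runs immediately and produces an intermediate DataFrame; in lazy mode (Part 8), Polars' query optimizer can merge consecutive filters into a single predicate, so chained and combined conditions typically end up with the same plan.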
### Step 12: Combine Filtering and Selection
```{python}
#| label: filter-select
#| code-summary: "Chain .filter() and .select()"
result = (
df
.filter(pl.col("str_col") == "yes")
.select(["id", "int_col"])
)
print(result)
```
---
## Part 7: Working with Python Variables {#sec-variables}
### Step 13: Dynamic Filtering with Variables
Polars lets you use Python variables directly in filter expressions—no special syntax needed:
```{python}
#| label: filter-variables
#| code-summary: "Use Python variables in filters"
target_values = [8, 20]
min_threshold = 10
result1 = df.filter(pl.col("int_col").is_in(target_values))
result2 = df.filter(pl.col("int_col") > min_threshold)
print(result1)
print(result2)
```
---
## Part 8: Lazy Evaluation {#sec-lazy}
### Step 14: Introduction to Lazy Mode
Lazy evaluation lets Polars plan and optimize your entire query before execution:
```{python}
#| label: lazy-evaluation
#| code-summary: "Use lazy evaluation for query optimization"
lazy_result = (
df.lazy()
.filter(pl.col("int_col") > 10)
.filter(pl.col("str_col") == "yes")
.select(["id", "int_col"])
)
result = lazy_result.collect()
print(result)
```
Use lazy evaluation for complex queries, large datasets, and production pipelines. Call `.collect()` to execute the optimized plan.
### Step 15: Viewing the Query Plan
Use `.explain()` to see how Polars optimizes your lazy query without executing it:
```{python}
#| label: explain-query
#| code-summary: "View the optimized query plan"
lazy_query = (
df.lazy()
.filter(pl.col("group") == "B")
.filter(pl.col("int_col") > 5)
.select(["id", "int_col"])
)
print(lazy_query.explain())
```
---
## Part 9: Additional Filtering Methods {#sec-advanced}
### Step 16: Range Filtering with `.is_between()`
```{python}
#| label: filter-between
#| code-summary: "Filter values within a range"
result = df.filter(pl.col("int_col").is_between(8, 15))
print(result)
```
`.is_between()` is inclusive by default; use `closed="none"` for exclusive bounds.
### Step 17: Handling Null Values
```{python}
#| label: filter-nulls
#| code-summary: "Create data with nulls and filter"
# Create DataFrame with some null values
df_with_nulls = pl.DataFrame({
"id": [1, 2, 3, 4],
"value": [10, None, 30, None]
})
print("Original data:")
print(df_with_nulls)
# Filter: Keep only non-null values
print("\nNon-null rows:")
print(df_with_nulls.filter(pl.col("value").is_not_null()))
# Filter: Keep only null values
print("\nNull rows:")
print(df_with_nulls.filter(pl.col("value").is_null()))
```
---
## Student Exercises {#sec-exercises}
Reinforce your learning with hands-on practice using a separate dataset.
::: {.callout-note}
## How to Use This Section
1. **Attempt each exercise** before checking expected results
2. **Run your code** and inspect the output
3. **Compare** your results with the expected output
4. **Refine** your code for clarity and correctness
5. **Share solutions** in the **PyStatR+ Learning Community** on Facebook: **@PyStatRPlus-Learning-Community**
Learning deepens when you explain your thinking and learn from others!
:::
### Exercise DataFrame
```{python}
#| label: exercise-data
#| code-summary: "Create practice DataFrame"
exercise_df = pl.DataFrame({
"student_id": [101, 102, 103, 104, 105, 106],
"score": [55, 78, 62, 91, 84, 47],
"passed": ["no", "yes", "no", "yes", "yes", "no"],
"cohort": ["A", "A", "B", "B", "A", "B"]
})
print(exercise_df)
```
---
### Exercise 1 — Filter Students Who Passed
**Goal:** Keep only rows where `passed == "yes"`.
```python
# Your code here:
```
**Expected result (student_id):** `[102, 104, 105]`
<details>
<summary><strong>Solution</strong></summary>
```python
exercise_df.filter(pl.col("passed") == "yes")
```
**Alternative (with column selection):**
```python
exercise_df.filter(pl.col("passed") == "yes").select("student_id")
```
</details>
---
### Exercise 2 — Select the First 4 Students by Position
**Goal:** Get the first 4 rows using position-based selection.
```python
# Your code here:
```
**Expected result (student_id):** `[101, 102, 103, 104]`
<details>
<summary><strong>Solution</strong></summary>
```python
# Method 1: Using .slice()
exercise_df.slice(0, 4)
# Method 2: Using .head()
exercise_df.head(4)
```
</details>
---
### Exercise 3 — Filter Students with Specific Scores
**Goal:** Keep rows where `score` is either 62 or 91.
```python
# Your code here:
```
**Expected result (student_id):** `[103, 104]`
<details>
<summary><strong>Solution</strong></summary>
```python
exercise_df.filter(pl.col("score").is_in([62, 91]))
```
</details>
---
### Exercise 4 — Multiple Conditions with AND
**Goal:** Find students who passed AND scored above 80.
```python
# Your code here:
```
**Expected result (student_id):** `[104, 105]`
<details>
<summary><strong>Solution</strong></summary>
```python
# Method 1: Single filter with &
exercise_df.filter(
(pl.col("score") > 80) & (pl.col("passed") == "yes")
)
# Method 2: Chained filters
exercise_df.filter(
pl.col("score") > 80
).filter(
pl.col("passed") == "yes"
)
```
</details>
---
### Exercise 5 — Use Python Variables in Filters
**Goal:** Store scores `[62, 91]` in a variable, then filter using `.is_in()`.
```python
# Your code here:
```
**Expected result (student_id):** `[103, 104]`
<details>
<summary><strong>Solution</strong></summary>
```python
target_scores = [62, 91]
exercise_df.filter(pl.col("score").is_in(target_scores))
```
</details>
---
### Exercise 6 — Filter and Select Specific Columns
**Goal:** Get only `student_id` and `score` for students who passed.
```python
# Your code here:
```
**Expected result:**
```
shape: (3, 2)
┌────────────┬───────┐
│ student_id ┆ score │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞════════════╪═══════╡
│ 102 ┆ 78 │
│ 104 ┆ 91 │
│ 105 ┆ 84 │
└────────────┴───────┘
```
<details>
<summary><strong>Solution</strong></summary>
```python
exercise_df.filter(
pl.col("passed") == "yes"
).select(["student_id", "score"])
```
</details>
---
### Mini Challenge — Combine Cohort and Pass Status
**Goal:** Find students in cohort B who passed.
<details>
<summary><strong>Example Solutions</strong></summary>
```python
# Method 1: Single filter with &
exercise_df.filter(
(pl.col("cohort") == "B") & (pl.col("passed") == "yes")
)
# Method 2: Chained filters (more readable)
exercise_df.filter(
pl.col("cohort") == "B"
).filter(
pl.col("passed") == "yes"
)
# Method 3: With lazy evaluation
result = (
exercise_df.lazy()
.filter(pl.col("cohort") == "B")
.filter(pl.col("passed") == "yes")
.collect()
)
print(result)
```
**Expected output:**
```
shape: (1, 4)
┌────────────┬───────┬────────┬────────┐
│ student_id ┆ score ┆ passed ┆ cohort │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str ┆ str │
╞════════════╪═══════╪════════╪════════╡
│ 104 ┆ 91 ┆ yes ┆ B │
└────────────┴───────┴────────┴────────┘
```
</details>
---
### Bonus Challenge — Lazy Evaluation Practice
**Goal:** Use lazy evaluation to find students who:

- Scored between 60 and 85 (inclusive)
- Are in cohort A or B

Return only `student_id` and `score`.
<details>
<summary><strong>Solution</strong></summary>
```python
result = (
exercise_df.lazy()
.filter(pl.col("score").is_between(60, 85))
.filter(pl.col("cohort").is_in(["A", "B"]))
.select(["student_id", "score"])
.collect()
)
print(result)
```
**Expected output:**
```
shape: (3, 2)
┌────────────┬───────┐
│ student_id ┆ score │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞════════════╪═══════╡
│ 102 ┆ 78 │
│ 103 ┆ 62 │
│ 105 ┆ 84 │
└────────────┴───────┘
```
</details>
---
## Conclusion {#sec-conclusion}
Congratulations! You've completed a comprehensive introduction to data filtering with **Polars** in Python.
::: {.callout-tip icon="true"}
## What You've Learned
**Core Filtering Skills:**

- ✅ Filtering rows with `.filter()` and `pl.col()`
- ✅ Position-based selection with `.slice()`, `.head()`, and `.tail()`
- ✅ Filtering multiple values with `.is_in()`
- ✅ Combining conditions using `&` (AND) and `|` (OR)

**Advanced Techniques:**

- ✅ Method chaining for readable code
- ✅ Using Python variables in filters
- ✅ Lazy evaluation with `.lazy()` and `.collect()`
- ✅ Range filtering with `.is_between()`
- ✅ Null value handling with `.is_null()` and `.is_not_null()`

**Best Practices:**

- ✅ Using `pl.col()` for column expressions
- ✅ Wrapping conditions in parentheses when combining
- ✅ Choosing between chained filters vs. combined conditions
- ✅ Understanding when lazy evaluation provides benefits
:::
::: {.callout-note}
## Next Steps for Learning
**Beginner:**

1. Practice filtering with your own datasets
2. Experiment with different comparison operators (`>`, `<`, `!=`)
3. Try combining 3 or more conditions

**Intermediate:**

4. Learn **aggregations** with `.group_by()` and `.agg()`
5. Explore **window functions** using `.over()`
6. Study **joins** with `.join()`

**Advanced:**

7. Master lazy evaluation for large datasets
8. Explore the `.str` namespace for text operations
9. Learn the `.dt` namespace for date/time operations
10. Contribute to the Polars community!

**Resources:**

- [Official Polars Documentation](https://docs.pola.rs/) — Comprehensive guides and API reference
- [Polars User Guide](https://docs.pola.rs/user-guide/) — In-depth tutorials and concepts
- [Modern Polars](https://kevinheavey.github.io/modern-polars/) — Community-driven cookbook
- [Polars GitHub](https://github.com/pola-rs/polars) — Source code and issue tracking
:::
---
```{=html}
<!-- Author Card: Alier Reng -->
<hr class="author-section-divider">
<div class="author-card">
<img src="/images/blog/alier-reng-founder.png"
alt="Alier Reng"
class="author-card-photo">
<div class="author-card-info">
<h3>Alier Reng</h3>
<div class="author-card-role">Founder, Lead Educator & Creative Director at PyStatR+</div>
<p class="author-card-bio">
Alier Reng is a Data Scientist, Educator, and Founder of PyStatR+, a platform advancing open and practical data science education. His work blends analytics, philosophy, and storytelling to make complex ideas human and empowering. Knowledge is freedom. Data is truth's language — ethics and transparency, its grammar.
</p>
<div class="author-card-social">
<a href="https://www.pystatrplus.org" title="PyStatR+" aria-label="PyStatR+ Website" class="social-website">
<svg viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg"><path d="M12 2C6.48 2 2 6.48 2 12s4.48 10 10 10 10-4.48 10-10S17.52 2 12 2zm-1 17.93c-3.95-.49-7-3.85-7-7.93 0-.62.08-1.21.21-1.79L9 15v1c0 1.1.9 2 2 2v1.93zm6.9-2.54c-.26-.81-1-1.39-1.9-1.39h-1v-3c0-.55-.45-1-1-1H8v-2h2c.55 0 1-.45 1-1V7h2c1.1 0 2-.9 2-2v-.41c2.93 1.19 5 4.06 5 7.41 0 2.08-.8 3.97-2.1 5.39z"/></svg>
<span>Website</span>
</a>
<a href="https://github.com/Alierwai" title="GitHub" aria-label="GitHub" class="social-github">
<svg viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg"><path d="M12 2A10 10 0 0 0 2 12c0 4.42 2.87 8.17 6.84 9.5.5.08.66-.23.66-.5v-1.69c-2.77.6-3.36-1.34-3.36-1.34-.46-1.16-1.11-1.47-1.11-1.47-.91-.62.07-.6.07-.6 1 .07 1.53 1.03 1.53 1.03.87 1.52 2.34 1.07 2.91.83.09-.65.35-1.09.63-1.34-2.22-.25-4.55-1.11-4.55-4.92 0-1.11.38-2 1.03-2.71-.1-.25-.45-1.29.1-2.64 0 0 .84-.27 2.75 1.02.79-.22 1.65-.33 2.5-.33.85 0 1.71.11 2.5.33 1.91-1.29 2.75-1.02 2.75-1.02.55 1.35.2 2.39.1 2.64.65.71 1.03 1.6 1.03 2.71 0 3.82-2.34 4.66-4.57 4.91.36.31.69.92.69 1.85V21c0 .27.16.59.67.5C19.14 20.16 22 16.42 22 12A10 10 0 0 0 12 2z"/></svg>
<span>GitHub</span>
</a>
<a href="https://www.linkedin.com/in/alierreng" title="LinkedIn" aria-label="LinkedIn" class="social-linkedin">
<svg viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg"><path d="M20.45 20.45h-3.56v-5.57c0-1.33-.02-3.04-1.85-3.04-1.85 0-2.14 1.45-2.14 2.94v5.67H9.34V9h3.41v1.56h.05c.48-.9 1.64-1.85 3.37-1.85 3.6 0 4.27 2.37 4.27 5.46v6.28zM5.34 7.43a2.06 2.06 0 1 1 0-4.12 2.06 2.06 0 0 1 0 4.12zM7.12 20.45H3.56V9h3.56v11.45z"/></svg>
<span>LinkedIn</span>
</a>
<a href="https://youtube.com/@PyStatRPlus" title="YouTube" aria-label="YouTube" class="social-youtube">
<svg viewBox="0 0 24 24" xmlns="http://www.w3.org/2000/svg"><path d="M23.498 6.186a3.016 3.016 0 0 0-2.122-2.136C19.505 3.545 12 3.545 12 3.545s-7.505 0-9.377.505A3.017 3.017 0 0 0 .502 6.186C0 8.07 0 12 0 12s0 3.93.502 5.814a3.016 3.016 0 0 0 2.122 2.136c1.871.505 9.376.505 9.376.505s7.505 0 9.377-.505a3.015 3.015 0 0 0 2.122-2.136C24 15.93 24 12 24 12s0-3.93-.502-5.814zM9.545 15.568V8.432L15.818 12l-6.273 3.568z"/></svg>
<span>YouTube</span>
</a>
</div>
</div>
</div>
```
---
## Editor's Note
This tutorial reflects a deliberate editorial balance between **approachability** and **technical depth**. While Polars offers many advanced features (including streaming mode, custom expressions, and plugin systems), this guide emphasizes the core filtering operations that analysts encounter daily.
The decision to introduce **lazy evaluation** in a beginner tutorial deserves explanation: although lazy mode is technically "advanced," understanding its existence early helps learners grasp Polars' performance advantages and builds good habits. We present lazy evaluation with clear examples and explanations, making it accessible rather than intimidating.
By introducing `pl.col()` expressions from the start, learners develop an intuition for Polars' expression API—the foundation for all advanced operations. This approach aligns with the **PyStatR+ Charter** by emphasizing clarity, honesty, and accessibility without unnecessary complexity.
---
## Acknowledgements
This lesson is part of the broader **PyStatR+ Learning Platform**, developed with gratitude to mentors, learners, and the open-source community that continually advances the Python data science ecosystem. Special thanks to the Polars development team for creating a library that combines performance with elegance, making data analysis faster and more enjoyable for everyone.
---
## References
- [Polars Documentation](https://docs.pola.rs/) — Official documentation and API reference
- [Polars User Guide](https://docs.pola.rs/user-guide/) — Comprehensive tutorials
- [Polars GitHub Repository](https://github.com/pola-rs/polars) — Source code and development
- [Apache Arrow](https://arrow.apache.org/) — The columnar format underlying Polars
---
**PyStatR+** — *Learning Simplified. Communication Amplified.* 🚀