Filtering using mask and where in pandas


Filtering a dataframe can be achieved in multiple ways using pandas. There are times when you simply need to update a column based on a condition which is true or vice-versa. In pandas dataframe there are some inbuilt methods to achieve the same using .where() and .mask().

  • df.where - Replace value when condition is false
  • df.mask - Replace value when condition is true

Initiating a dummy dataframe with some columns to understand the same -

import pandas as pd
import numpy as np
dummy_data = pd.DataFrame(
    np.array([[1, 2, 3], [4, 5, np.nan], [7, 8, 9], [3, 2, np.nan], [5, 6, np.nan]]),
    columns=['Column 1', 'Column 2', 'Column 3'])
dummy_data
Column 1 Column 2 Column 3
0 1.0 2.0 3.0
1 4.0 5.0 NaN
2 7.0 8.0 9.0
3 3.0 2.0 NaN
4 5.0 6.0 NaN
---

Operations over the data frame

  • Using df.where - Replace values where values in Column 3 is not null by null across all the columns
dummy_data.where(dummy_data['Column 3'].isnull(),np.nan)
Column 1 Column 2 Column 3
0 NaN NaN NaN
1 4.0 5.0 NaN
2 NaN NaN NaN
3 3.0 2.0 NaN
4 5.0 6.0 NaN

  • Using df.mask - Replace values where Column 3 values are null with 0 across the dataframe
dummy_data.mask(dummy_data['Column 3'].isnull(),0)
Column 1 Column 2 Column 3
0 1.0 2.0 3.0
1 0.0 0.0 0.0
2 7.0 8.0 9.0
3 0.0 0.0 0.0
4 0.0 0.0 0.0
---

Operations over a particular columns

  • Using df.mask - Replace values in Column 3 by 0 where values are null.
    The following code results in a list with previous value in Column 3 and the value obtained after using .mask()
list(zip(dummy_data['Column 3'],dummy_data['Column 3'].mask(dummy_data['Column 3'].isnull(),0)))

[(3.0, 3.0), (nan, 0.0), (9.0, 9.0), (nan, 0.0), (nan, 0.0)]

  • Using df.where - Replace values in Column 3 by null where values are not null.
    The following code results in a list with previous value in Column 3 & the value obtained after using .where()
list(zip(dummy_data['Column 3'],dummy_data['Column 3'].where(dummy_data['Column 3'].isnull(),np.nan)))

[(3.0, nan), (nan, nan), (9.0, nan), (nan, nan), (nan, nan)]


Notebook Link - Filtering using mask and where in pandas{:target="_blank"}


Related Posts

Attributes, Methods and Functions in python

January 7, 2019

Read More
Indexing and Sorting a dataframe using iloc and loc

January 1, 2019

Read More
String Interpolation in Python

March 28, 2019

Read More