Pandas Transform: Preform operations on groups and concatenate the results
Simple transform
Section titled “Simple transform”First, Lets create a dummy dataframe
Section titled “First, Lets create a dummy dataframe”We assume that a customer can have n orders, an order can have m items, and items can be ordered more multiple times
orders_df = pd.DataFrame()orders_df['customer_id'] = [1,1,1,1,1,2,2,3,3,3,3,3]orders_df['order_id'] = [1,1,1,2,2,3,3,4,5,6,6,6]orders_df['item'] = ['apples', 'chocolate', 'chocolate', 'coffee', 'coffee', 'apples', 'bananas', 'coffee', 'milkshake', 'chocolate', 'strawberry', 'strawberry']
# And this is how the dataframe looks like:print(orders_df)# customer_id order_id item# 0 1 1 apples# 1 1 1 chocolate# 2 1 1 chocolate# 3 1 2 coffee# 4 1 2 coffee# 5 2 3 apples# 6 2 3 bananas# 7 3 4 coffee# 8 3 5 milkshake# 9 3 6 chocolate# 10 3 6 strawberry# 11 3 6 strawberry.
.
Now, we will use pandas transform function to count the number of orders per customer
Section titled “Now, we will use pandas transform function to count the number of orders per customer”# First, we define the function that will be applied per customer_idcount_number_of_orders = lambda x: len(x.unique())
# And now, we can tranform each group using the logic defined aboveorders_df['number_of_orders_per_cient'] = ( # Put the results into a new column that is called 'number_of_orders_per_cient' orders_df # Take the original dataframe .groupby(['customer_id'])['order_id'] # Create a seperate group for each customer_id & select the order_id .transform(count_number_of_orders)) # Apply the function to each group seperatly
# Inspecting the results ...print(orders_df)# customer_id order_id item number_of_orders_per_cient# 0 1 1 apples 2# 1 1 1 chocolate 2# 2 1 1 chocolate 2# 3 1 2 coffee 2# 4 1 2 coffee 2# 5 2 3 apples 1# 6 2 3 bananas 1# 7 3 4 coffee 3# 8 3 5 milkshake 3# 9 3 6 chocolate 3# 10 3 6 strawberry 3# 11 3 6 strawberry 3Multiple results per group
Section titled “Multiple results per group”Using transform functions that return sub-calculations per group
Section titled “Using transform functions that return sub-calculations per group”In the previous example, we had one result per client. However, functions returning different values for the group can also be applied.
# Create a dummy dataframeorders_df = pd.DataFrame()orders_df['customer_id'] = [1,1,1,1,1,2,2,3,3,3,3,3]orders_df['order_id'] = [1,1,1,2,2,3,3,4,5,6,6,6]orders_df['item'] = ['apples', 'chocolate', 'chocolate', 'coffee', 'coffee', 'apples', 'bananas', 'coffee', 'milkshake', 'chocolate', 'strawberry', 'strawberry']
# Let's try to see if the items were ordered more than once in each orders
# First, we define a fuction that will be applied per groupdef multiple_items_per_order(_items): # Apply .duplicated, which will return True is the item occurs more than once. multiple_item_bool = _items.duplicated(keep=False) return(multiple_item_bool)
# Then, we transform each group according to the defined functionorders_df['item_duplicated_per_order'] = ( # Put the results into a new column orders_df # Take the orders dataframe .groupby(['order_id'])['item'] # Create a seperate group for each order_id & select the item .transform(multiple_items_per_order)) # Apply the defined function to each group separately
# Inspecting the results ...print(orders_df)# customer_id order_id item item_duplicated_per_order# 0 1 1 apples False# 1 1 1 chocolate True# 2 1 1 chocolate True# 3 1 2 coffee True# 4 1 2 coffee True# 5 2 3 apples False# 6 2 3 bananas False# 7 3 4 coffee False# 8 3 5 milkshake False# 9 3 6 chocolate False# 10 3 6 strawberry True# 11 3 6 strawberry True