current position:Home>Pandas: how to group and calculate by index

Pandas: how to group and calculate by index

2022-01-30 11:52:26 Stone sword


You can use the following methods in pandas Group by one or more index columns and perform some calculations .

Method 1: Group by an index column

df.groupby('index1')['numeric_column'].max()

 Copy code 

Method 2: Group by multiple index columns

df.groupby(['index1', 'index2'])['numeric_column'].sum()
 Copy code 

Method 3: Group by index column and general column

df.groupby(['index1', 'numeric_column1'])['numeric_column2'].nunique()
 Copy code 

The following example shows how to use the following multi indicator pandas DataFrame To use each method .

import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B'],
                   'position': ['G', 'G', 'G', 'F', 'F', 'G', 'G', 'F', 'F', 'F'],
                   'points': [7, 7, 7, 19, 16, 9, 10, 10, 8, 8],
                   'rebounds': [8, 8, 8, 10, 11, 12, 13, 13, 15, 11]})

#set 'team' column to be index column
df.set_index(['team', 'position'], inplace=True)

#view DataFrame
df

		 points	 rebounds
team	position		
A	G	 7	 8
        G	 7	 8
        G	 7	 8
        F	 19	 10
        F	 16	 11
B	G	 9	 12
        G	 10	 13
        F	 10	 13
        F	 8	 15
        F	 8	 11

 Copy code 

Method 1: Group by an index column

The following code shows how to find 'points' The maximum value of the column , And pass 'position' Index columns are grouped .

#find max value of 'points' grouped by 'position index column
df.groupby('position')['points'].max()

position
F    19
G    10
Name: points, dtype: int64

 Copy code 

Method 2: Group by multiple index columns

The following code shows how to pass " The team " and " Location " Index columns to find " integral " The sum of the columns .

#find max value of 'points' grouped by 'position index column
df.groupby(['team', 'position'])['points'].sum()

team  position
A     F           35
      G           21
B     F           26
      G           19
Name: points, dtype: int64
 Copy code 

Method 3: Group by index column and general column

The following code shows how to ' rebounds ' The number of unique values found in the column , And through the index column ' The team ' And normal columns ' integral ' grouping .

#find max value of 'points' grouped by 'position index column
df.groupby(['team', 'points'])['rebounds'].nunique()

team  points
A     7         1
      16        1
      19        1
B     8         2
      9         1
      10        1
Name: rebounds, dtype: int64
 Copy code 

copyright notice
author[Stone sword],Please bring the original link to reprint, thank you.
https://en.pythonmana.com/2022/01/202201301152237752.html

Random recommended