current position:Home>This pandas exercise must be successfully won

This pandas exercise must be successfully won

2022-01-29 12:50:55 PI dada

official account : Youer cottage
author :Peter
edit :Peter

Hello everyone , I am a Peter~

Wrote a lot Pandas The article , It mainly explains the usage of common operations and functions . Today, I made a fruit order and sales data ( Analog data , Just for learning ), It is mainly used to deepen the understanding of how to use flexibly and quickly Pandas To fulfill our needs .

Pandas article


Data interpretation

1、 The first data of the simulation are 5 A field : The order number 、 Next single 、 goods 、 Price 、 Number

  • The order number : The order number of each order , One or more items exist in an order number
  • Next single : A person may go down 1 One or more orders , For example, Zhang San only placed an order , Li Si placed several orders
  • goods : The same item may appear in multiple orders
  • Price : The price of each item in each order , In different orders , The price of the same commodity may be different , such as SOD Apple in the order is 10, But in DFH In the order is 9.8
  • Number : Sales quantity of each item in each order

2、 There are only two fields in the second data of the simulation : Commodity and origin

At the same time we can see : There are differences between the two data sheet Medium , Storage becomes xlslx file , And there is no missing value data .

demand 1: Read data in different ways

There is the same Excel Different in sheet in , We take different ways to read :

The way 1: Specify both files and sheet The name of

import pandas as pd  #  Import the package first 
 Copy code 

The way 2: Specify a file name and sheet The index number of , Index from 0 Start

demand 2: The combination of the two data

You can see two sheet The data in is through “ goods ” This field is associated with , We use pandas Medium merge function , And keep the first ( On the left left) All the information in the table .

merge Function is a very important function , Can handle flexibly Pandas Data merging in .

The following requirements are processed for the data merged above

demand 3: Order quantity 、 Number of customers 、 Commodity volume

Order quantity : How many orders have been placed in total

unique: Chinese has a unique meaning , The field of order number has several unique characters 、 The only information . The total is 7 Order per order

Same thing : How many order users can you get 、 How many kinds of goods are sold ?

demand 4: Order quantity per user

Is to ask how many orders each user has placed : Use groupby Group and count the order quantity of each issuer .

  • First use groupby Function to group
  • Then use the aggregate function nunique, Count each one “ The order number ” The number of ( To heavy statistics )
  • Finally, reset the index

I saw Li four times 3 Zhang order , The most.

demand 5: Total consumption amount per user

1、 Add a column first : Total

2、 Two different ways of grouping and regrouping

demand 6: Orders from different places of origin 、 sales 、 Total sales

demand 7: The item with the highest price in each order

Find the item with the highest price in each order , such as :SOD The highest price in the order is grapes

The way 1: The first implementation is as follows :

  • First arrange the whole in descending order
  • Then group according to the order number , Take out the first first Data is enough

The way 2: The implementation is as follows

1、 First, each order number is arranged in descending order according to the price

2、 Mixed use of multiple functions , You can run it separately to see the results of each step

df.groupby(" The order number ").apply(lambda x: x.sort_values(" Price ",ascending=False)).reset_index(drop=True).groupby(" The order number ").first().reset_index()
 Copy code 

The way 3: Use... When grouping groupby_keys Parameters

demand 8: The highest price in each order 2 position

Take out the highest price in each order 2 position , If there is only one, take out one .

The above is the highest data after taking out the grouping , Article 1 first. In this requirement, we use head function , You can take out any n Data :Top-N

demand 9: The unit price of each commodity ( Retain 2 Decimal place )

Let's disassemble the meaning of the title :

  • Every product : It is determined that the grouped elements are groupby=" goods "
  • Unit price : First find the total sales of each commodity , I'm looking for the number of orders for each commodity , Final division

How to keep two decimal places for the unit price of the above commodity pen ? Two ways to achieve :

copyright notice
author[PI dada],Please bring the original link to reprint, thank you.

Random recommended