current position:Home>You can easily get started with Excel. Python data analysis package pandas (II): advanced filtering (I)

You can easily get started with Excel. Python data analysis package pandas (II): advanced filtering (I)

2021-08-23 04:42:17 Excel catalyst


I often listen to others Python How powerful in the data field , As a result, I studied for a long time , Even data processing is a death of trouble . Only later , It's not Python Data processing is powerful , But he has a data analysis artifact —— pandas .

The last article started from Excel Screening as the core , Introduce pandas In the implementation of , however ,Excel There is also an advanced filtering function in , Ordinary screening and comparison , It can only be regarded as " Low level screening " The function of . Next , Through simple to complex requirements , See what kind of function this is .


The sample data in this article are as follows :

  • - An examination report form of a school (8 Academic achievements )
  • - The name has been desensitized

Load data

pandas Need to load Excel data , Very easy to , as follows :

  • - Most of the parameters have been described in the previous article , Here's the point
  • - Because there are redundant columns in the worksheet , We just need the front 10 Column , Therefore, it is specified that usecols Parameters . He accepted a list . list(range(0,10)), In fact, it's equivalent to [0,1,2…………,9] A list of


Let's start with a super simple example , Let's see how to operate Excel Advanced filtering .

" Name is A1 The record of ",Excel The operation steps of advanced filtering are as follows :

  • - Functional area " data " page , stay " Filter large icons " There is one on the lower right " senior " , Click out the advanced filtering function window
  • - Mainly look at the picture above 2 Red box of , Select our data source area , Remember to include the title
  • - Upper figure 2 The blue box is the conditional area , The selection of condition area is shown in the figure
  • - Click ok , You can filter out names A1 The record of

Look at the setting of the condition area :

  • - The format is , title + Conditional value ( Up and down cells )
  • - The title must be consistent with the column corresponding to the data source , For example, specify the name column here , So the title of the condition area is also " full name "
  • - Conditional values we use ="=A1"

> If you just specify a text value A1 ,Excel By default, the name will be prefixed A1 Filter out the records .

pandas There is no high-level screening in , Because his selection is flexible , have a look pandas The implementation of the :

  • - Simple and easy to understand , It's all introduced in the previous article , I won't go into that

Specific value filtering

"4、5 or 7 Class records ",Excel The condition area settings for advanced filtering are as follows :

  • - The red box is the condition area
  • - The title is " class ", This should be consistent with the data source
  • - The condition value area is represented by multiple lines " or " Relationship , The picture above shows that the class is 4 or 5 or 7, Any matching record

pandas The implementation is as follows :

  • - Also use query Method
  • - in [4,5,6] , Semantic clarity , If the class is in the list, it meets

pandas Of query Queries can be very flexible , Can accept an external list variable , as follows :

  • - The query string should use external variables , You just need to write "@+ Variable name " that will do

Range filtering

" Total score 450 to 500 Between the records ",Excel The condition area settings for advanced filtering are as follows :

  • - The data source does not have a total breakdown , Add one sum The total breakdown of the formula
  • - The condition area is on the same line , Express " also " Relationship
  • - The condition value can directly use the common comparison symbols
  • - Still pay attention to the condition title " Total score "

pandas The implementation is as follows :

  • - The first sentence , Add new column , Sum column . pandas Adding new columns is very simple ,df[ New column name ]= New column value , that will do
  • - df.loc[:,' Chinese language and literature ':' biological '] , Is to get the data of the column between language and Biology
  • - .sum(axis=1) , Horizontal summation . because pandas You can operate on rows or columns flexibly , adopt axis It can express whether the operation is on rows or columns .
  • - The second sentence is to query , Easy to understand

" Chinese is higher than 90, perhaps , Mathematics is higher than or equal to 100",Excel The condition area settings for advanced filtering are as follows :

pandas The implementation is as follows :

  • - query The query string in can use python Logical keywords in and or or These are all fine

The next trailer

This paper starts from Excel Introduction to simple applications from the perspective of advanced filtering , The next article will cover more complex applications , Let's first look at the complex requirements :

  • - " Students whose total score is higher than the average of the whole class ", This requires that each line of records be compared with the overall average
  • - " A school whose total score is higher than the average score of its class " , This is an upgraded version of the previous article
  • - " In the whole class ,8 Students whose grades exceed the grade average " , The results of each subject should be carefully compared

The next chapter will solve the above problems , Stay tuned .

This article is from WeChat official account. - Excel catalyst (ExcelCuiHuaJi)

The source and reprint of the original text are detailed in the text , If there is any infringement , Please contact the [email protected] Delete .

Original publication time : 2019-08-21

Participation of this paper Tencent cloud media sharing plan , You are welcome to join us , share .

copyright notice
author[Excel catalyst],Please bring the original link to reprint, thank you.

guess what you like

Random recommended