current position:Home>You can easily get started with Excel. Python data analysis package pandas (VII): breakdown

You can easily get started with Excel. Python data analysis package pandas (VII): breakdown

2021-08-22 05:22:35 Excel catalyst

> I often listen to others Python How powerful in the data field , As a result, I studied for a long time , Even data processing is a death of trouble . Only later , It's not Python Data processing is powerful , But he has a data analysis artifact —— pandas

Preface

Today, let's look at the data disaggregation function from two requirements , because Excel The built-in function is weak , When dealing with slightly more complex requirements, it will appear inadequate , therefore , This series will introduce Excel A very efficient data processing plug-in —— Power Query, And look at pandas How to flexibly solve .

The structure of this paper :

  • - Let's start with a simple breakdown
  • - Then try to split and expand into rows
  • - Finally, multi column segmentation and expansion into rows

Excel Dissection

Excel The disaggregation of data in is very simple . as follows :

  • - Select the column to be processed
  • - Function card " data "," Dissection " Button , The settings pop-up window appears
  • - choose " Separator symbol ", Click next
  • - Upper left part , Check " comma ", Click next
  • - Finally, see the result preview , That's all right. , Click directly to finish

pandas Dissection

pandas Sort text columns , It's simple :

  • - DataFrame.str.split() , Sort text columns , The first parameter specifies the delimiter
  • - Besides , Parameters expand , Indicates whether to expand into columns , If set to True , Then each element after segmentation becomes a separate column . This is in line with current needs

Complex requirements

occasionally , What we want to split , Convert to row , Requirements are as follows :

  • - such as , first line Zhang San There are 3 Elements , Then the segmentation result Zhang San has 3 That's ok

Use Excel It is difficult to handle this requirement with its own function , We use Power Query To deal with it :

  • - Functional area "Power Query", spot " From the table / Range "
  • - This will start Power query Edit window
  • - Point selection subject Entire column
  • - Upper ribbon " Start "," transformation " In the area , Point selection " Split column ", choose " By separator "
  • - Most of the settings here are related to Excel The built-in functions are basically the same
  • - It opens at " Advanced options ", Point selection " Split into " Medium " That's ok "
  • - Functional area " Start ", Click the leftmost button " Close and upload ", You can output the results Excel

> Please download and install this plug-in from the official website

that pandas How to achieve this requirement in :

  • - First use str.split Division , But not this time expand
  • - call DataFrame.explode(), Expand columns of a sequence type

> Be careful ,explode The method is pandas 0.25 New method of version

Increase the difficulty

What if there are multiple columns that need to be split and expanded ? as follows :

  • - At the same time, the division of subjects and grades is extended to lines

Look directly at pandas How to solve :

  • - First pair subject And achievement Columns are separated split after , Proceed again explode
  • - And then through concat, Original Sexual name columns

Although it did , But the semantics of the code is not clear enough . The flexibility of programming language can be fully reflected here , We encapsulate the logic into a function hp_explode , When you need to use it later , Simply call :

  • - hp_explode() , Automatically identifiable content is list Expand the columns of

> hp_explode The definition of method is not the core of this article , Small partners who need the source code see the end of the text

I don't want to call .str.split ? Yes, of course :

  • - In a word

summary

  • - Series.str.split() , Split text columns
  • - expand Parameter specifies whether to expand to column
  • - DataFrame.explode() , Expand the columns of the sequence into rows , Usually with Series.str.split() In combination with

This article is from WeChat official account. - Excel catalyst (ExcelCuiHuaJi)

The source and reprint of the original text are detailed in the text , If there is any infringement , Please contact the [email protected] Delete .

Original publication time : 2019-08-22

Participation of this paper Tencent cloud media sharing plan , You are welcome to join us , share .

copyright notice
author[Excel catalyst],Please bring the original link to reprint, thank you.
https://en.pythonmana.com/2021/08/20210822052227007O.html