
Four techniques for row-column conversion in pandas

2022-02-02 08:08:21 PI dada

Official account: Youer Cottage
Author: Peter
Editor: Peter

Hello everyone, I'm Peter~

This article introduces four methods for converting between rows and columns in pandas:

  • melt
  • transposition with T or transpose()
  • wide_to_long
  • explode (the "explosion" function)

Finally, I answer a data-processing question from a reader.

Row-column conversion in pandas

pandas offers several ways to convert between rows and columns.

Import libraries

import pandas as pd
import numpy as np

The melt function

The main parameters of melt:

pandas.melt(frame,
            id_vars=None,
            value_vars=None,
            var_name=None,
            value_name='value',
            col_level=None,
            ignore_index=True)

The parameters are explained below:

  • frame: the DataFrame to be processed.

  • id_vars: the column(s) that stay fixed as identifier columns and are not converted.

  • value_vars: the column(s) to convert (unpivot). If all the remaining columns should be converted, this can be omitted.

  • var_name and value_name: custom names for the two new columns. var_name names the column that holds the original column names (default "variable" when the columns have no name), and value_name names the column that holds their values (default "value").

  • ignore_index: whether to ignore the original index. The default is True, which discards the original index and generates a new natural index 0, 1, 2, 3, 4, ...; set it to False to keep the original index.

  • col_level: if the columns are a multi-level index (MultiIndex), melt using this level; this parameter is rarely used.

Simulated data

# Data to be converted (frame)
df = pd.DataFrame({"col1":[1,1,1,1,1],
                   "col2":[3,3,3,3,3],
                   "col3":["a","a","a","b","b"]
                  })
df

id_vars
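A minimal sketch of id_vars on its own, rebuilding the simulated df from above. Keeping col1 fixed means every other column is stacked into the default variable/value pair.

```python
import pandas as pd

# The simulated data from above
df = pd.DataFrame({"col1": [1, 1, 1, 1, 1],
                   "col2": [3, 3, 3, 3, 3],
                   "col3": ["a", "a", "a", "b", "b"]})

# col1 stays fixed; col2 and col3 are stacked into the default
# "variable" (old column name) and "value" (cell value) columns
out = pd.melt(df, id_vars=["col1"])
print(out)
```

Each original row now appears once per converted column, so 5 rows × 2 columns give 10 rows.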

value_vars
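A sketch of value_vars on its own, using the same df. When id_vars is not given, only the listed columns survive in the result; the others are dropped.

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 1, 1, 1, 1],
                   "col2": [3, 3, 3, 3, 3],
                   "col3": ["a", "a", "a", "b", "b"]})

# Only col3 is converted; with no id_vars, col1 and col2 do not appear
out = pd.melt(df, value_vars=["col3"])
print(out)
```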

Using both parameters at the same time.
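A sketch combining the two parameters on the same df: col1 is kept as the identifier and only col2 is converted, so col3 is left out of the result.

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 1, 1, 1, 1],
                   "col2": [3, 3, 3, 3, 3],
                   "col3": ["a", "a", "a", "b", "b"]})

# Keep col1, convert only col2; col3 is neither kept nor converted
out = pd.melt(df, id_vars=["col1"], value_vars=["col2"])
print(out)
```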

Converting several columns at once.
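A sketch of converting several columns in one call. The choice of col3 as the identifier here is just for illustration: the listed value_vars are stacked one after another, and the id column is repeated for each of them.

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 1, 1, 1, 1],
                   "col2": [3, 3, 3, 3, 3],
                   "col3": ["a", "a", "a", "b", "b"]})

# col1 and col2 are stacked (col1 first, then col2); col3 repeats alongside
out = pd.melt(df, id_vars=["col3"], value_vars=["col1", "col2"])
print(out)
```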

var_name and value_name

pd.melt(
    df,
    id_vars=["col1"],     # kept unchanged
    value_vars=["col3"],  # converted
    var_name="col4",      # name of the new column holding the old column names
    value_name="col5"     # name of the new column holding the corresponding values
)

ignore_index

By default, a new natural index (0, 1, 2, ...) is generated.

Set it to False to keep the original index.
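A sketch comparing the two settings on the same df. Note that the ignore_index parameter only exists in pandas 1.1 and later. With False, each melted row keeps the index label of the row it came from, so labels repeat once per converted column.

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 1, 1, 1, 1],
                   "col2": [3, 3, 3, 3, 3],
                   "col3": ["a", "a", "a", "b", "b"]})

# Default ignore_index=True: a fresh 0..n-1 index
default = pd.melt(df, id_vars=["col1"], value_vars=["col2", "col3"])

# ignore_index=False: original labels 0..4, repeated for col2 and for col3
kept = pd.melt(df, id_vars=["col1"], value_vars=["col2", "col3"],
               ignore_index=False)

print(default.index.tolist())
print(kept.index.tolist())
```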

Transposition: T and transpose

In pandas, the T attribute and the transpose() method swap rows and columns; strictly speaking, they transpose the DataFrame.

Simple transpose

Simulate some data and look at the transposed result.
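A minimal sketch with a hypothetical scores table (not the author's original data). After .T the old column labels become the index and the old index becomes the columns.

```python
import pandas as pd

# Hypothetical data: rows are students, columns are subjects
df = pd.DataFrame({"math": [90, 80], "english": [70, 85]},
                  index=["Tom", "Amy"])

# .T swaps the axes: subjects become rows, students become columns
t = df.T
print(t)
```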

Use the transpose() method to transpose.
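A sketch showing that transpose() gives the same result as the T attribute, using the same hypothetical table; transposing twice restores the original.

```python
import pandas as pd

df = pd.DataFrame({"math": [90, 80], "english": [70, 85]},
                  index=["Tom", "Amy"])

# transpose() and .T produce the same DataFrame
t = df.transpose()
print(t)
```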

There's another way: first transpose the underlying values array, then swap the index and the column names.
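A sketch of this manual approach on the same hypothetical table. The example uses all-integer data on purpose: with mixed column types, .values becomes an object array, so the resulting dtypes can differ from the original columns.

```python
import pandas as pd

df = pd.DataFrame({"math": [90, 80], "english": [70, 85]},
                  index=["Tom", "Amy"])

# Transpose the raw NumPy array, then use the old columns as the new index
# and the old index as the new columns
manual = pd.DataFrame(df.values.T, index=df.columns, columns=df.index)
print(manual)
```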

Finally, let's look at a simple case.

The wide_to_long function

As the name suggests, it converts a dataset from wide format to long format.

pandas.wide_to_long(
    df,
    stubnames,
    i,
    j,
    sep: str = "",
    suffix: str = "\\d+")

The parameters are:

  • df: the DataFrame to be converted
  • stubnames: the stub name(s), i.e. the common prefix shared by the wide-format column names
  • i: the column(s) to use as id variable(s)
  • j: the name of the new column that holds the "suffix" part of the wide column names in long format
  • sep: the separator between stub and suffix in the wide column names, which is removed. For example, if the columns look like A-2020, specify sep='-' to strip the separator. The default is an empty string.
  • suffix: a regular expression that the "suffix" must match. The default '\d+' matches numeric suffixes; for non-numeric suffixes, use something like '\D+'.

Simulated data

The conversion process

Use the function to perform the conversion.
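A minimal sketch with hypothetical data (the column names A1970, B1980 and so on are illustrative). The stubs A and B become columns, and the numeric suffix is moved into a new index level named by j.

```python
import pandas as pd

# Hypothetical wide table: one row per id, one column per stub + year
wide = pd.DataFrame({
    "id": [1, 2, 3],
    "A1970": ["a", "b", "c"],
    "A1980": ["d", "e", "f"],
    "B1970": [2.5, 1.2, 0.7],
    "B1980": [3.2, 1.3, 0.1],
})

# Columns matching stub + suffix are stacked; the suffix goes into "year"
long = pd.wide_to_long(wide, stubnames=["A", "B"], i="id", j="year")
print(long)
```

The result is indexed by (id, year), with one row per id and year. pandas converts the suffix to a number when it can, so the year level holds integers here.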

Setting a multi-level index

First, simulate some data.

If you're not used to multi-level indexes, you can convert the result back into ordinary columns.
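A sketch with hypothetical data, where famid and birth together identify a row. Passing a list to i produces a three-level index, and reset_index() turns those levels back into plain columns.

```python
import pandas as pd

# Hypothetical data: (famid, birth) uniquely identifies each row
df = pd.DataFrame({
    "famid": [1, 1, 2],
    "birth": [1, 2, 1],
    "ht1": [2.8, 2.9, 2.2],
    "ht2": [3.4, 3.8, 2.9],
})

# i is a list, so the result has the index levels famid, birth, age
long = pd.wide_to_long(df, stubnames="ht", i=["famid", "birth"], j="age")

# Move all index levels back into ordinary columns
flat = long.reset_index()
print(flat)
```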

sep and suffix

df5 = pd.DataFrame({
    'a': [1, 1, 2, 2, 3, 3, 3],
    'b': [1, 2, 2, 3, 1, 2, 3],
    'stu_one': [2.8, 2.9, 1.8, 1.9, 2.2, 2.3, 2.1],
    'stu_two': [3.4, 3.8, 2.8, 2.4, 3.3, 3.4, 2.9]
})
df5

pd.wide_to_long(
    df5,
    stubnames='stu',
    i=['a', 'b'],
    j='number',
    sep='_',       # needed when the column names contain a separator; the default is ''
    suffix=r'\w+') # regex the suffix must match; the default r'\d+' matches digits, \w+ here matches letters

The explode function

DataFrame.explode(column, ignore_index=False)

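This method takes only two parameters:

  • column: the column to explode, i.e. the column whose list-like elements are expanded into separate rows
  • ignore_index: whether to ignore the index. The default is False, which keeps the original index (labels repeat for exploded rows); True gives the result a new 0, 1, ..., n-1 index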

Simulated data

Exploding a single column

Explode a single column to turn the wide table into a long table.
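A minimal sketch with hypothetical data (the id and tags columns are illustrative). Each list element gets its own row, and the other columns and the original index label are repeated.

```python
import pandas as pd

# Hypothetical data: each row holds a list in the "tags" column
df = pd.DataFrame({"id": [1, 2], "tags": [["x", "y"], ["z"]]})

# One output row per list element; index label 0 appears twice
out = df.explode("tags")
print(out)
```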

The ignore_index parameter
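A sketch of ignore_index=True on the same hypothetical data. The parameter requires pandas 1.1 or later. Instead of repeating the original labels, the result gets a fresh 0..n-1 index.

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2], "tags": [["x", "y"], ["z"]]})

# Same explosion, but the index is renumbered 0, 1, 2
out = df.explode("tags", ignore_index=True)
print(out)
```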

Exploding multiple columns

Explode several columns one after another.
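A sketch of chaining explode calls on hypothetical data. Exploding a and then b yields every combination of their elements within each original row.

```python
import pandas as pd

# Hypothetical data with two list columns
df = pd.DataFrame({"id": [1, 2],
                   "a": [[1, 2], [3]],
                   "b": [["x", "y"], ["z"]]})

# First explode a, then explode b on that result:
# row 0 becomes 2 x 2 = 4 rows, row 1 stays a single row
out = df.explode("a").explode("b")
print(out)
```

Since pandas 1.3, df.explode(["a", "b"]) explodes both columns simultaneously instead. That version pairs elements position by position, so it requires the lists in each row to have matching lengths, and it does not produce combinations.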

A reader's question

Here I answer a reader's question, using simulated data. Given the data below, the requirement is:

For each shop, compute the proportion that each fruit accounts for in that shop's total sales.

fruit = pd.DataFrame({
    "shop":["shop1","shop3","shop2","shop3",
            "shop2","shop1","shop3","shop2",
            "shop3","shop2","shop3","shop2","shop1"],
    "fruit":["Apple","Grape","Banana","Apple",
             "Grape","Orange","Pear","Hami melon",
             "Grape","Banana","Apple","Grape","Orange"],
    "number":[100,200,340,150,
              200,300,90,80,340,
              150,200,300,90]})
fruit

First, we need the total sales of each fruit in each shop.
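A sketch of this first step with groupby on the simulated fruit data. Grouping by both shop and fruit and summing number merges repeated rows such as shop1's two Orange entries.

```python
import pandas as pd

fruit = pd.DataFrame({
    "shop": ["shop1", "shop3", "shop2", "shop3", "shop2", "shop1", "shop3",
             "shop2", "shop3", "shop2", "shop3", "shop2", "shop1"],
    "fruit": ["Apple", "Grape", "Banana", "Apple", "Grape", "Orange", "Pear",
              "Hami melon", "Grape", "Banana", "Apple", "Grape", "Orange"],
    "number": [100, 200, 340, 150, 200, 300, 90, 80, 340, 150, 200, 300, 90]})

# Total sales of each fruit within each shop, as a flat table
sales = fruit.groupby(["shop", "fruit"], as_index=False)["number"].sum()
print(sales)
```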

Method 1: multiple steps

Method 1 takes a multi-step approach:

1. Compute the total sales of each shop

2. Add the total as a shop_sum column

3. Compute the proportion
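One possible way to carry out these three steps, sketched with groupby and merge. The names sales, merged and ratio are illustrative, and this is not necessarily the author's exact code.

```python
import pandas as pd

fruit = pd.DataFrame({
    "shop": ["shop1", "shop3", "shop2", "shop3", "shop2", "shop1", "shop3",
             "shop2", "shop3", "shop2", "shop3", "shop2", "shop1"],
    "fruit": ["Apple", "Grape", "Banana", "Apple", "Grape", "Orange", "Pear",
              "Hami melon", "Grape", "Banana", "Apple", "Grape", "Orange"],
    "number": [100, 200, 340, 150, 200, 300, 90, 80, 340, 150, 200, 300, 90]})

# Sales of each fruit within each shop
sales = fruit.groupby(["shop", "fruit"], as_index=False)["number"].sum()

# 1. Total sales of each shop
shop_sum = (sales.groupby("shop", as_index=False)["number"].sum()
                 .rename(columns={"number": "shop_sum"}))

# 2. Attach each shop's total to its rows as a shop_sum column
merged = sales.merge(shop_sum, on="shop")

# 3. Proportion of each fruit within its own shop
merged["ratio"] = merged["number"] / merged["shop_sum"]
print(merged)
```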

Method 2: use the transform function
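A sketch of the transform approach. groupby("shop")["number"].transform("sum") returns each shop's total aligned to every row of that shop, so no separate merge is needed. The column names shop_sum and ratio are illustrative.

```python
import pandas as pd

fruit = pd.DataFrame({
    "shop": ["shop1", "shop3", "shop2", "shop3", "shop2", "shop1", "shop3",
             "shop2", "shop3", "shop2", "shop3", "shop2", "shop1"],
    "fruit": ["Apple", "Grape", "Banana", "Apple", "Grape", "Orange", "Pear",
              "Hami melon", "Grape", "Banana", "Apple", "Grape", "Orange"],
    "number": [100, 200, 340, 150, 200, 300, 90, 80, 340, 150, 200, 300, 90]})

# Sales of each fruit within each shop
sales = fruit.groupby(["shop", "fruit"], as_index=False)["number"].sum()

# transform broadcasts each shop's total back onto that shop's rows
sales["shop_sum"] = sales.groupby("shop")["number"].transform("sum")
sales["ratio"] = sales["number"] / sales["shop_sum"]
print(sales)
```

Unlike agg, transform keeps the original row count. That is why its output can be assigned directly as a new column.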

Copyright notice
Author: PI dada. Please include the original link when reprinting. Thank you.
https://en.pythonmana.com/2022/02/202202020808177089.html
