
This pandas GUI tool is amazing, and it generates the code automatically!

2022-02-02 03:11:19 Open source outpost

The following article comes from Python Data Science. Author: Dongge took off

Speaking of GUI tools for pandas: with a visual interface, working with data becomes as simple as using Excel. This time we introduce an even more powerful GUI tool: D-Tale.

Why is the library called D-Tale? It turns out the name is a homophone of "detail"; the intent is to expose every detail of the data. Here is how to use it.

Startup and data loading

D-Tale supports multiple file formats, including CSV, TSV, XLS, and XLSX. It is built with Flask as the back end and React as the front end, and can be installed with pip.

pip install dtale

There are two ways to start D-Tale:

  • Pass a DataFrame object to the D-Tale function to instantiate the GUI in a Jupyter cell.
  • Start D-Tale without passing a DataFrame; it then shows an interactive GUI menu for loading data, along with various other options.
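As a rough sketch of the first way, the snippet below builds a small DataFrame and hands it to dtale.show. The column names and values are made up for illustration, and launch_dtale is a hypothetical helper; dtale is imported inside it so that the DataFrame part still runs where D-Tale is not installed.

```python
import pandas as pd

# Made-up sample data standing in for a real dataset.
movies = pd.DataFrame(
    {
        "title": ["Movie A", "Movie B", "Movie C"],
        "year": [1999, 2005, 2019],
        "avg_vote": [7.1, 6.4, 8.0],
    }
)


def launch_dtale(df):
    """Open a D-Tale session for df (hypothetical helper)."""
    import dtale  # imported lazily; requires `pip install dtale`

    return dtale.show(df)
```

In a Jupyter cell, calling launch_dtale(movies) and leaving the returned object as the last expression should render the grid in the cell output.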

For a better demonstration, this article uses the second way.

import dtale
dtale.show(open_browser=True)

After running the code, you get the following GUI menu:

There are several ways to import data:

  1. Load data from a file.
  2. Load data from a website. You pass the link to the site; CSV, JSON, TSV, and Excel files are supported.
  3. Load a sample dataset. These datasets may require some background downloading to fetch them from the server.
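For comparison, here is roughly how the same formats could be read with plain pandas. This is only a sketch, not D-Tale's own loader: load_table and READERS are hypothetical names, and the sample text is made up. The same readers also accept a URL string in place of a file path.

```python
import io

import pandas as pd

# Map a format name to the pandas reader that handles it.
READERS = {
    "csv": pd.read_csv,
    "tsv": lambda src: pd.read_csv(src, sep="\t"),
    "json": pd.read_json,
    "excel": pd.read_excel,
}


def load_table(source, fmt):
    """Read a path, URL, or file-like object in the given format."""
    try:
        reader = READERS[fmt]
    except KeyError:
        raise ValueError(f"unsupported format: {fmt}") from None
    return reader(source)


# In-memory text stands in for real files here.
csv_text = "title,year\nMovie A,1999\nMovie B,2005\n"
tsv_text = csv_text.replace(",", "\t")

from_csv = load_table(io.StringIO(csv_text), "csv")
from_tsv = load_table(io.StringIO(tsv_text), "tsv")
```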

This article uses the sample movie dataset as an example. Once loaded, the dataset is rendered as a table, just as in pandas. Every cell in the table can be edited, and values can be changed directly as in Excel.

Column menu functions

Clicking a column header brings up a list of options whose contents depend on the column's data type. For example, for the three types below (datetime64, int64, and str), all three option lists offer sorting in ascending or descending order. Beyond that, each data type has its own filtering methods.
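In pandas terms, the ascending and descending sort options correspond to something like the following sketch (the data is made up):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "title": ["Movie B", "Movie C", "Movie A"],
        "year": [2005, 2019, 1999],
    }
)

ascending = df.sort_values("year")  # oldest first
descending = df.sort_values("year", ascending=False)  # newest first
```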

Let's explore these functions one by one.

1. Lock

The lock function is similar to freezing the first row in Excel: it pins the column to the far left so that you can scroll the table freely and still see the locked columns.

2. Hide and delete

The hide option removes a column from the table view without actually deleting it. You can unhide the column from the bar in the upper right corner.

The delete option permanently removes the column from the data frame, similar to the drop function in pandas.
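The difference between the two can be sketched in pandas as follows. The data is made up, and hidden_view is just an illustrative name for a view that leaves out one column.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "imdb_title_id": ["tt001", "tt002"],
        "title": ["Movie A", "Movie B"],
        "year": [1999, 2005],
    }
)

# "Hide": look at a subset of columns; df itself still has all three.
hidden_view = df.loc[:, df.columns != "imdb_title_id"]
still_present = "imdb_title_id" in df.columns

# "Delete": the column is gone from the data frame itself.
df = df.drop(columns=["imdb_title_id"])
```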

3. Replacement and type conversion

The replace option can replace fixed values in the table or fill in missing values. You can replace a column in place (as with inplace=True) or create a new column. The replacement type can be a specific value, whitespace, or a specific string.

For example, the date_published column below should contain only date strings, but it also contains the stray text TV Movie 2019. You can replace this outlier with numpy's nan.

Now the data type of the date_published column can easily be converted from string to datetime. The conversion likewise offers the choice of changing the column in place or creating a new column:
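Done by hand in pandas, the two steps might look like this sketch. The three sample values are made up, and date_published_dt is a hypothetical name for the new column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"date_published": ["1894-10-09", "TV Movie 2019", "1906-12-26"]}
)

# Replace the stray text with a missing value.
cleaned = df["date_published"].replace({"TV Movie 2019": np.nan})

# Write the converted dates to a new column; assigning back to
# "date_published" instead would overwrite the original.
df["date_published_dt"] = pd.to_datetime(cleaned)
```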

4. Descriptive statistics

The describe function in pandas gives a statistical summary of a column or dataset. Describe here works the same way but offers more than plain pandas: it provides a summary tailored to each data type.

For datetime columns, it provides the following details:

In addition, it generates a histogram and a value_counts chart for the feature:

For integer columns, it provides measures of central tendency, frequency, kurtosis, and skewness. It also shows the data in a box plot, a histogram, a value_counts chart, and a Q-Q plot.
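The numeric part of this summary can be approximated with plain pandas, as in the sketch below (the duration values are made up):

```python
import pandas as pd

duration = pd.Series([90, 95, 100, 100, 120, 180], name="duration")

summary = duration.describe()  # count, mean, std, min, quartiles, max
skewness = duration.skew()  # asymmetry of the distribution
kurtosis = duration.kurt()  # excess kurtosis (0 for a normal distribution)
frequencies = duration.value_counts()  # how often each value occurs
```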

For string columns, it provides the most common words and their frequencies, a detailed summary of characters, a word value counts chart, and a value counts chart.
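A simplified version of the word frequency count can be written with pandas string methods. This is a sketch on made-up titles, not D-Tale's own implementation:

```python
import pandas as pd

titles = pd.Series(["The Big Sleep", "The Big Heat", "Sleep"], name="title")

# Lowercase, split on whitespace, put one word per row, then count.
word_counts = titles.str.lower().str.split().explode().value_counts()

# Length of each string, as a simple character-level summary.
char_lengths = titles.str.len()
```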

5. Filtering

Filtering data in D-Tale is very simple: just specify the desired filter type.

When filtering a datetime column, you can also filter by date range. For string columns, data can be filtered in the following ways:
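In pandas, a date range filter and a simple string filter could be sketched as below. The data is made up, and a case-insensitive substring match is only one possible string filter, picked here for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "title": ["Movie A", "Film B", "Movie C"],
        "date_published": pd.to_datetime(
            ["1999-03-01", "2005-07-15", "2019-11-30"]
        ),
    }
)

# Date-range filter (both endpoints included).
start, end = pd.Timestamp("2000-01-01"), pd.Timestamp("2019-12-31")
in_range = df[df["date_published"].between(start, end)]

# String filter: case-insensitive substring match.
movies_only = df[df["title"].str.contains("movie", case=False)]
```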

6. Variance report

This option does not apply to string values. The variance report decides whether a feature has low variance based on the following two criteria:

  • Number of unique values in the feature / sample size < 10%
  • Count of the most common value / count of the second most common value > 20

The results of these calculations are shown, along with a histogram.
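The two criteria above can be computed with pandas as in this sketch. low_variance_checks is a hypothetical helper, and flagging a feature only when both thresholds are met is an assumption about how the two criteria combine.

```python
import pandas as pd


def low_variance_checks(values):
    """Return both ratios and whether both thresholds are met."""
    counts = values.value_counts()  # sorted, most common value first
    unique_ratio = counts.size / len(values)
    # With a single distinct value there is no second count to compare.
    if counts.size > 1:
        top_ratio = counts.iloc[0] / counts.iloc[1]
    else:
        top_ratio = float("inf")
    # Assumption: both criteria must hold for "low variance".
    is_low = bool(unique_ratio < 0.10 and top_ratio > 20)
    return unique_ratio, top_ratio, is_low


# 97 zeros, 2 ones, and 1 two: 3 unique values in 100 samples.
flags = pd.Series([0] * 97 + [1] * 2 + [2])
unique_ratio, top_ratio, is_low = low_variance_checks(flags)
```

Here the unique ratio is 3 / 100 and the top ratio is 97 / 2, so both thresholds are met.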

7. Text cleaning

This option applies only to string values. Text cleaning is a major part of data science projects and, done properly, can improve model performance. D-Tale offers a wide range of text cleaning methods; we only need to select the methods to apply, and the work is done on the back end.
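To give an idea of what such cleaning steps do, here is a sketch of a few common ones chained with pandas string methods. The choice of steps and the sample strings are illustrative, not a list of D-Tale's actual cleaners.

```python
import pandas as pd

raw = pd.Series(["  The  Big Sleep!! ", "the big HEAT", None], name="title")

cleaned = (
    raw.str.strip()  # trim leading and trailing spaces
    .str.lower()  # normalize case
    .str.replace(r"[^\w\s]", "", regex=True)  # drop punctuation
    .str.replace(r"\s+", " ", regex=True)  # collapse repeated spaces
)
```

Missing values pass through the chain unchanged.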

Main menu options

The main menu contains all the functions of the column menu, but they are more general when used from the main menu. For example, an operation can be applied to one or more columns at once instead of selecting each column manually. Here are some of the core functions.

1. Create column

You can create a brand-new column or derive one from existing columns. Feature engineering done before modeling can be carried out this way, for example by performing an arithmetic operation on two columns to create a new one. We can also name the new column and set its data type.
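The pandas equivalent of such an arithmetic column is short. In this sketch the budget and gross columns and the profit name are made up:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "title": ["Movie A", "Movie B"],
        "budget": [10.0, 40.0],
        "gross": [25.0, 30.0],
    }
)

# New column from an arithmetic operation on two existing columns,
# with an explicit name and data type.
df["profit"] = (df["gross"] - df["budget"]).astype("int64")
```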

2. Summarize data

In pandas, we summarize data with groupby or pivot tables. D-Tale can do the same: we just select the columns, the aggregation function, and the columns needed in the final dataset, with no code required. Here is an example.
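For reference, the code-based versions in pandas look roughly like this sketch on made-up data:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "genre": ["Drama", "Drama", "Comedy", "Comedy"],
        "year": [1999, 2005, 1999, 2005],
        "avg_vote": [7.0, 8.0, 6.0, 5.0],
    }
)

# Group by one column and aggregate another.
by_genre = df.groupby("genre")["avg_vote"].agg(["mean", "max"])

# Pivot table: genres as rows, years as columns, mean vote in the cells.
pivot = df.pivot_table(
    index="genre", columns="year", values="avg_vote", aggfunc="mean"
)
```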

3. Missing value analysis

Missing data is a common problem in all datasets: no dataset is perfect, and values go missing either intentionally or unintentionally. D-Tale integrates the missingno library to visualize missing values in the dataset, offering matrix, bar chart, heatmap, and dendrogram views.
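The numbers behind such charts are easy to compute with pandas alone. The sketch below uses made-up data and leaves the plotting itself to missingno:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "title": ["Movie A", "Movie B", "Movie C", "Movie D"],
        "budget": [10.0, np.nan, np.nan, 40.0],
        "language": ["en", None, "fr", "en"],
    }
)

# Fraction of missing values per column.
missing_rate = df.isna().mean()

# Number of rows with at least one missing value.
rows_with_gaps = int(df.isna().any(axis=1).sum())
```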

4. Charts

Plotting is an important step in the exploratory data analysis (EDA) phase of data science. D-Tale integrates plotly to create interactive charts. It offers line charts, bar charts, scatter plots, pie charts, word clouds, heatmaps, 3D scatter plots, surface plots, maps, candlestick charts, treemaps, and funnel charts. Different data types support different chart types, as shown here.

5. Highlighting

This is used to highlight parts of a dataset. Just as we use stylers in pandas to make special values stand out, highlighters can do the same thing. For example, we can highlight missing values, data types, outliers, or ranges. The following example shows how to highlight missing values and outliers:
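Behind any highlight there is a boolean mask saying which cells to mark. The sketch below computes such masks with pandas on made-up data; the 1.5 * IQR rule for outliers is an assumption chosen for illustration, not necessarily the rule D-Tale applies.

```python
import numpy as np
import pandas as pd

duration = pd.Series(
    [90.0, 95.0, 100.0, np.nan, 105.0, 400.0], name="duration"
)

# Cells to mark as missing.
missing_mask = duration.isna()

# Cells to mark as outliers, using the (assumed) 1.5 * IQR rule.
q1, q3 = duration.quantile(0.25), duration.quantile(0.75)
iqr = q3 - q1
outlier_mask = (duration < q1 - 1.5 * iqr) | (duration > q3 + 1.5 * iqr)
```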

6. Code export and data export

All operations performed on the data frame in D-Tale are automatically converted to equivalent python/pandas/plotly code. You can view it through the code export option that appears in the GUI for each action and chart. Here is some automatically generated code.

import numpy as np
import pandas as pd
# {dataset} is a placeholder for the sample dataset loader, left unfilled in the exported code
from dtale.datasets import {dataset}
df = {dataset}()
if isinstance(df, (pd.DatetimeIndex, pd.MultiIndex)):
    df = df.to_frame(index=False)
# remove any pre-existing indices for ease of use in the D-Tale code, but this is not required
df = df.reset_index().drop('index', axis=1, errors='ignore')
df.columns = [str(c) for c in df.columns]  # update columns to strings in case they are numbers
# delete the imdb_title_id column
df = df[[c for c in df.columns if c != 'imdb_title_id']]
# rename the title column
df = df.rename(columns={'title': 'Movie_title'})
# replace the stray text with NaN, then convert the column to datetime
s = df['date_published']
s = s.replace({'TV Movie 2019': np.nan})
df.loc[:, 'date_published'] = s
df.loc[:, 'date_published'] = pd.Series(pd.to_datetime(df['date_published'], infer_datetime_format=True), name='date_published', index=df['date_published'].index)

Finally, we can also use the export option to export the modified dataset as CSV or TSV.
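The same export takes one call each in pandas. In this sketch the data is made up, and the text is returned as a string; passing a file path as the first argument writes to disk instead.

```python
import pandas as pd

df = pd.DataFrame({"title": ["Movie A", "Movie B"], "year": [1999, 2005]})

# With no path given, to_csv returns the text instead of writing a file.
csv_text = df.to_csv(index=False)
tsv_text = df.to_csv(sep="\t", index=False)
```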

Conclusion

D-Tale is a powerful and really appealing GUI. Whenever you don't feel like typing code, you can work visually and then turn the result into code if necessary. Compared with similar tools introduced before, such as pandasGUI and Mito, D-Tale is more powerful.

GitHub home page: github.com/man-group/d…

Reference:

www.analyticsvidhya.com/blog/2021/0…


copyright notice
author[Open source outpost],Please bring the original link to reprint, thank you.
https://en.pythonmana.com/2022/02/202202020311178625.html
