
Making a word cloud of the 19th National Congress report with pyecharts + pandas

2022-01-30 22:27:52 Mo Wuwei


pyecharts: word clouds

Related library              Reference address
jieba (word segmentation)    github.com/fxsjy/jieba
pyecharts                    pyecharts.org/#/zh-cn/
pandas                       pandas.pydata.org/pandas-docs…

Make a word cloud from the contents of the 19th National Congress report .txt file

  1. Specific requirements:
  • Only count words of two or more characters
  • Among those words, keep only the 100 that occur most often
  2. Goals:
  • Learn to use the pyecharts word cloud chart
  • Learn to segment text with the jieba library
  • Learn pandas grouping statistics

1. Data acquisition

Approach to getting the data:

  1. The source text is the 19th National Congress report; any other document can be used instead.

  2. Work out what data format the word cloud needs.

This depends on the specific chart, so check the official pyecharts manual (Chinese).
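The pyecharts word cloud takes its data as a sequence of (word, value) pairs. As a minimal sketch of that shape, the snippet below ranks a small dictionary of counts and returns the pairs. The `freq` dictionary and the `to_data_pairs` helper are hypothetical names used only for illustration; they are not part of pyecharts or of the article's code.

```python
# Hypothetical word counts, standing in for the real statistics
freq = {"发展": 3, "人民": 5, "中国": 2}


def to_data_pairs(freq, top_n=100):
    """Return up to top_n (word, count) pairs, most frequent first."""
    ranked = sorted(freq.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


data_pair = to_data_pairs(freq)
print(data_pair)
```

The rest of the article builds the same kind of pair list from the real text, using jieba for the words and pandas for the counts.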

import jieba
import pandas as pd
from pyecharts import options as opts
from pyecharts.charts import WordCloud

# List that stores one record per word occurrence
wordlist = []
# Read the file (encoding="utf-8" is an assumption; match it to your file)
with open("19 Big report .txt", encoding="utf-8") as file:
    s = file.read()

# Segment the text with jieba's default (accurate) mode
words = jieba.lcut(s)
# Walk through every segmented word
for word in words:
    # Skip words shorter than 2 characters
    if len(word) > 1:
        # Record one occurrence of the word
        wordlist.append({"word": word, "count": 1})
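To see what the length filter does without needing the report file, here is a small sketch that applies the same rule to a hard-coded token list. The tokens are made up to stand in for the output of jieba.lcut, and `build_wordlist` is a hypothetical helper name, not something jieba provides.

```python
def build_wordlist(tokens):
    """Keep tokens of two or more characters as {"word", "count"} records."""
    return [{"word": t, "count": 1} for t in tokens if len(t) > 1]


# Hypothetical tokens, hard-coded to stand in for jieba.lcut output
tokens = ["我们", "的", "发展", "，", "发展", "是"]
wordlist = build_wordlist(tokens)
print(wordlist)
```

Single-character tokens and single punctuation marks are dropped, while repeated words stay as separate records so they can be summed later.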

2. Count the frequency of words

Use pandas grouping statistics to count the words, then sort in descending order and take the first 100 rows.

The pandas DataFrame data structure

  • A DataFrame is a labeled two-dimensional data structure whose columns can hold different types. You can think of it as an Excel sheet or a database table.

2.1 Creating a DataFrame

The common ways to create a DataFrame are:

  • From a dictionary whose values are one-dimensional objects such as a list or Series;
  • From a two-dimensional ndarray object;
  • By reading a two-dimensional table from a file or database.

Official pandas documentation: pandas.pydata.org/pandas-docs…
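The creation routes listed above can be sketched as follows. The sample values are invented, and io.StringIO stands in for a real file on disk. The last line shows the list-of-dictionaries form, which is the one this article's own code passes to pd.DataFrame.

```python
import io

import numpy as np
import pandas as pd

# 1) From a dictionary whose values are one-dimensional lists
df_from_dict = pd.DataFrame({"word": ["发展", "人民"], "count": [3, 5]})

# 2) From a two-dimensional ndarray, with column labels supplied separately
df_from_array = pd.DataFrame(np.array([[1, 2], [3, 4]]), columns=["a", "b"])

# 3) From a file; io.StringIO stands in for a real CSV file on disk
csv_text = "word,count\n发展,3\n人民,5\n"
df_from_file = pd.read_csv(io.StringIO(csv_text))

# 4) From a list of dictionaries, one dictionary per row
df_from_records = pd.DataFrame([{"word": "发展", "count": 1}])

print(df_from_dict.shape, df_from_array.shape, df_from_file.shape)
```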

2.2 Grouping with groupby and summing with sum

Group the rows by word so that each word maps to its total count, in the form word: count (for example, Development: 1).
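A minimal sketch of the grouping step on three made-up records, in the same shape as the wordlist built earlier:

```python
import pandas as pd

# Hypothetical records in the same shape as wordlist
wordlist = [
    {"word": "发展", "count": 1},
    {"word": "人民", "count": 1},
    {"word": "发展", "count": 1},
]

df = pd.DataFrame(wordlist)
# One entry per distinct word; the value is the sum of that word's counts
totals = df.groupby("word")["count"].sum()
print(totals.to_dict())
```

The result is a Series indexed by word, which is why the later code has to copy the index back into a regular column.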

2.3 Sort in descending order and take the first 100 rows

The word frequency code is as follows:

# wordlist is a list of dictionaries: [{"word": "Development", "count": 1}, ...]
df = pd.DataFrame(wordlist)
# Group the rows by the value of "word", then sum "count" within each group
# (groupby is the DataFrame grouping method)
dfword = df.groupby('word')['count'].sum()
# sort_values sorts by value; ascending=False sorts in descending order
dfword2 = dfword.sort_values(ascending=False)
# Put the first 100 rows of dfword2 into a DataFrame
dfword3 = pd.DataFrame(dfword2.head(100), columns=['count'])
# "word" is currently the row index; copy it into a regular column
dfword3['word'] = dfword3.index

The result we want is a DataFrame with a count column and a word column, holding the 100 most frequent words.
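As a cross-check on the pandas result, the same top-N count can be sketched with collections.Counter from the standard library. This is a swapped-in alternative, not the method used in this article, and the token list is made up for illustration.

```python
from collections import Counter

# Hypothetical tokens, standing in for the filtered jieba output
tokens = ["发展", "人民", "发展", "中国", "人民", "发展"]

counter = Counter(tokens)
# most_common(n) returns the n most frequent (word, count) pairs,
# most frequent first
top = counter.most_common(2)
print(top)
```

If the words and counts from Counter match the pandas output for the same text, the grouping and sorting steps are likely doing what was intended.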

3. Making the word cloud

Now that we have the data, drawing is relatively simple, so let's go straight to the code.

# Convert the word column to a list
word = dfword3['word'].tolist()
# Convert the count column to a list
count = dfword3['count'].tolist()
# Pair each word with its count
a = [list(z) for z in zip(word, count)]
c = (
    # Instantiate the WordCloud class
    WordCloud()
    # Add the series name, the data, the font size range, and the shape
    .add("", a, word_size_range=[20, 100], shape="diamond")
    # set_global_opts sets chart-wide options such as the title;
    # there are other fun settings, but they are not covered here
    .set_global_opts(title_opts=opts.TitleOpts(title="Word cloud of the 19th National Congress report"))
)
# Display in Jupyter
c.render_notebook()

Running this renders the finished word cloud in the notebook.

Let me follow the Party and seek development for the motherland.

copyright notice
author[Mo Wuwei],Please bring the original link to reprint, thank you.
https://en.pythonmana.com/2022/01/202201302227500975.html
