current position:Home>Python tiktok 5000+ V, and found that everyone love this video.

Python tiktok 5000+ V, and found that everyone love this video.

2021-08-22 06:22:12 Python Programming

Write it at the front

lately , I saw a problem about shaking tiktok. .

It says , At present, China's per capita swipes short videos every day 110 minute .

 

1

Look at the data , Looks like I'm being averaged again .

But to be honest , Tiktok , There's a feeling that you can't stop ~

So still tiktok less. , How to read , How to write code . Otherwise time would have gone by .

This time we will use data to analyze the tiktok. , What kind of videos are most popular .
Jupyter Code , Data visualization & analysis

 
 
   

"""
Of course, I'm learning Python It's going to be difficult , There is no good learning material , How to learn ?
Study Python I don't know how to recommend to join the communication Q Group number :928946953
There are like-minded partners in the group , Help each other , There are good video tutorials and PDF!
And Daniel's answer !
"""


from
pyecharts.charts import Pie, Bar, TreeMap, Map, Geo from wordcloud import WordCloud, ImageColorGenerator from pyecharts import options as opts from pyecharts.globals import ThemeType import matplotlib.pyplot as plt from PIL import Image import pandas as pd import numpy as np import jieba

 

In [2]:

df = pd.read_csv('../file/douyin.csv',encoding = 'utf-8-sig')
df.head()

 

Out[2]:

 

name

gender

country

province

city

location

category

fans

videos

likes

comments

shares

following

school

custom_verify

enterprise_verify

signature

0

The People's Daily

0

China

Beijing

Beijing

Beijing

government administration versus business enterprise

117259000

2427

1165446000

11906782048

9089061412

18

NaN

NaN

Official account of people's daily

Participate in 、 communicate 、 Record the times .

1

CCTV news

0

China

Beijing

Beijing

Beijing

government administration versus business enterprise

105648000

3681

3814571666

2603872833

1989050522

27

NaN

CCTV news official account

CCTV news official tiktok

I haven't thought of my signature yet

2

Chen He

1

China

NaN

Shanghai

Shanghai

star

68374000

422

570096000

430908721

117639297

131

Shanghai Academy of drama

Actor Chen he

NaN

Too handsome to have any friends ‍*️ There's something live room ️️️️ ️ official account [ Chen He ]

3

Dear- Delireba

0

NaN

NaN

NaN

NaN

star

49790000

29

181167000

202448645

151645265

0

NaN

actor

NaN

NaN

4

Poison tongue film

1

China

guangdong

Guangzhou

Guangzhou

The plot

46355000

616

820393000

28026109

13005392

24

NaN

High quality film and television we media 、 Tiktok

NaN

See a movie , Can change life . Business mail :[email protected] ️ Calendar pre-sale ...

In [27]:

df.loc[df.gender == '0', 'gender'] = ' Unknown '
df.loc[df.gender == '1', 'gender'] = ' men '
df.loc[df.gender == '2', 'gender'] = ' women '
#  Group by sex 
gender_message = df.groupby(['gender'])
#  Count the results after grouping 
gender_com = gender_message['gender'].agg(['count'])
gender_com.reset_index(inplace=True)

#  Pie chart data 
attr = gender_com['gender']
v1 = gender_com['count']

#  Initialize configuration 
pie = Pie(init_opts=opts.InitOpts(width="800px", height="400px",theme=ThemeType.LIGHT))
#  Add data , Set the radius 
pie.add("", [list(z) for z in zip(attr, v1)], radius=["40%", "75%"])
#  Set global configuration item , title 、 legend 、 hold-all ( Download the pictures )
pie.set_global_opts(title_opts=opts.TitleOpts(title=" Tiktok V Gender distribution ", pos_left="center", pos_top="top"),
                    legend_opts=opts.LegendOpts(orient="vertical", pos_left="left"),
                    toolbox_opts=opts.ToolboxOpts(is_show=True, feature={"saveAsImage": {}}))
#  Set series configuration items , Label style 
pie.set_series_opts(label_opts=opts.LabelOpts(is_show=True, formatter="{b}:{d}%",font_size=14))
pie.render_notebook()

 

Out[27]:

In [26]:

df = df.sort_values('likes', ascending=False)
#  obtain TOP10 The data of 
attr = df['name'][0:10]
v1 = [float('%.1f' % (float(i) / 100000000)) for i in df['likes'][0:10]]

#  Initialize configuration 
bar = Bar(init_opts=opts.InitOpts(width="1000px", height="600px"))
# x Axis data 
bar.add_xaxis(list(reversed(attr.tolist())))
# y Axis data 
bar.add_yaxis("", list(reversed(v1)),color = '#84E0E3')
#  Set global configuration item , title 、y Axis split line 
bar.set_global_opts(title_opts=opts.TitleOpts(title=" Tiktok V Number of likes TOP10( Billion )", pos_left="center", pos_top="18"),
                    xaxis_opts=opts.AxisOpts(splitline_opts=opts.SplitLineOpts(is_show=True)),
                    yaxis_opts=opts.AxisOpts(axislabel_opts=opts.LabelOpts(font_size=12))
                   )
#  Set series configuration items , Label style 
bar.set_series_opts(label_opts=opts.LabelOpts(is_show=True, position="right", color="black"))
bar.reversal_axis()
bar.render_notebook()

 

Out[26]:

In [29]:

#  Segment data 
Bins = [0, 1000000, 5000000, 10000000, 25000000, 50000000, 100000000, 5000000000]
Labels = ['0-100', '100-500', '500-1000', '1000-2500', '2500-5000', '5000-10000', '10000 above ']
len_stage = pd.cut(df['likes'], bins=Bins, labels=Labels).value_counts().sort_index()
#  get data 
attr = len_stage.index.tolist()
v1 = len_stage.values.tolist()

#  Generate a histogram 
bar = Bar(init_opts=opts.InitOpts(width="800px", height="400px"))
bar.add_xaxis(attr)
bar.add_yaxis("", v1,color = '#84E0E3')
bar.set_global_opts(title_opts=opts.TitleOpts(title=" Tiktok V Distribution of likes ( ten thousand )", pos_left="center", pos_top="18"),
                    toolbox_opts=opts.ToolboxOpts(is_show=True, feature={"saveAsImage": {}}),
                    yaxis_opts=opts.AxisOpts(splitline_opts=opts.SplitLineOpts(is_show=True)))
bar.set_series_opts(label_opts=opts.LabelOpts(is_show=True, position="top", color="black"))
bar.render_notebook()

  

Out[29]:

In [34]:

df = df.sort_values('fans', ascending=False)
attr = df['name'][0:10]
v1 = ['%.1f' % (float(i) / 10000) for i in df['fans'][0:10]]

bar = Bar(init_opts=opts.InitOpts(width="1000px", height="600px"))
bar.add_xaxis(list(reversed(attr.tolist())))
bar.add_yaxis("", list(reversed(v1)),color = '#84E0E3')
bar.set_global_opts(title_opts=opts.TitleOpts(title=" Tiktok V Number of fans TOP10( ten thousand )", pos_left="center", pos_top="18"),
                    toolbox_opts=opts.ToolboxOpts(is_show=True, feature={"saveAsImage": {}}),
                    xaxis_opts=opts.AxisOpts(splitline_opts=opts.SplitLineOpts(is_show=True)))
bar.set_series_opts(label_opts=opts.LabelOpts(is_show=True, position="right", color="black"))
bar.reversal_axis()
bar.render_notebook()

 

Out[34]:

In [37]:

Bins = [0, 1500000, 2000000, 5000000, 10000000, 25000000, 200000000]
Labels = ['0-150', '150-200', '200-500', '500-1000', '1000-2500', '5000 above ']
len_stage = pd.cut(df['fans'], bins=Bins, labels=Labels).value_counts().sort_index()

attr = len_stage.index.tolist()
v1 = len_stage.values.tolist()

bar = Bar(init_opts=opts.InitOpts(width="800px", height="400px"))
bar.add_xaxis(attr)
bar.add_yaxis("", v1,color = '#84E0E3')
bar.set_global_opts(title_opts=opts.TitleOpts(title=" Tiktok V Distribution of fans ( ten thousand )", pos_left="center", pos_top="18"),
                    toolbox_opts=opts.ToolboxOpts(is_show=True, feature={"saveAsImage": {}}),
                    yaxis_opts=opts.AxisOpts(splitline_opts=opts.SplitLineOpts(is_show=True)))
bar.set_series_opts(label_opts=opts.LabelOpts(is_show=True, position="top", color="black"))
bar.render_notebook()

 

Out[37]:

In [40]:

df = df.sort_values('comments', ascending=False)
attr = df['name'][0:10]
v1 = ['%.1f' % (float(i) / 100000000) for i in df['comments'][0:10]]

bar = Bar(init_opts=opts.InitOpts(width="1000px", height="600px"))
bar.add_xaxis(list(reversed(attr.tolist())))
bar.add_yaxis("", list(reversed(v1)),color = '#84E0E3')
bar.set_global_opts(title_opts=opts.TitleOpts(title=" Tiktok V comments TOP10( Billion )", pos_left="center", pos_top="18"),
                    toolbox_opts=opts.ToolboxOpts(is_show=True, feature={"saveAsImage": {}}),
                    xaxis_opts=opts.AxisOpts(splitline_opts=opts.SplitLineOpts(is_show=True)))
bar.set_series_opts(label_opts=opts.LabelOpts(is_show=True, position="right", color="black"))
bar.reversal_axis()
bar.render_notebook()

 

Out[40]:

In [44]:

df = df.sort_values('shares', ascending=False)
attr = df['name'][0:10]
v1 = ['%.1f' % (float(i) / 100000000) for i in df['shares'][0:10]]

bar = Bar(init_opts=opts.InitOpts(width="1000px", height="600px"))
bar.add_xaxis(list(reversed(attr.tolist())))
bar.add_yaxis("", list(reversed(v1)),color = '#84E0E3')

bar.set_global_opts(title_opts=opts.TitleOpts(title=" Tiktok V Number of shares TOP10( Billion )", pos_left="center", pos_top="18"),
                    toolbox_opts=opts.ToolboxOpts(is_show=True, feature={"saveAsImage": {}}),
                    xaxis_opts=opts.AxisOpts(splitline_opts=opts.SplitLineOpts(is_show=True)),
                    yaxis_opts=opts.AxisOpts(axislabel_opts=opts.LabelOpts(rotate=30))
                   )
bar.set_series_opts(label_opts=opts.LabelOpts(is_show=True, position="right", color="black"))
bar.reversal_axis()
bar.render_notebook()

 

Out[44]:

In [28]:

#  Grouping sum 
likes_type_message = df.groupby(['category'])
likes_type_com = likes_type_message['likes'].agg(['sum'])
likes_type_com.reset_index(inplace=True)
#  Processing data 
dom = [{'name':name, 'value':num} for name, num in zip(likes_type_com['category'], likes_type_com['sum'])]

#  Initialize configuration 
treemap = TreeMap(init_opts=opts.InitOpts(width="1000px", height="600px",theme=ThemeType.LIGHT))
#  Add data 
treemap.add('', dom)
#  Set global configuration item , title 、 hold-all ( Download the pictures )
treemap.set_global_opts(title_opts=opts.TitleOpts(title=" Each type has tiktok. V Summary of likes ", pos_left="center", pos_top="5"),
                        toolbox_opts=opts.ToolboxOpts(is_show=True, feature={"saveAsImage": {}}),
                        legend_opts=opts.LegendOpts(is_show=False),
                        
                       )
treemap.render_notebook()

 

Out[28]:

In [32]:

dom = []
fans_type_message = df.groupby(['category'])
fans_type_com = fans_type_message['fans'].agg(['sum'])
fans_type_com.reset_index(inplace=True)
for name, num in zip(fans_type_com['category'], fans_type_com['sum']):
    data = {}
    data['name'] = name
    data['value'] = num
    dom.append(data)

treemap = TreeMap(init_opts=opts.InitOpts(width="1000px", height="600px",theme=ThemeType.LIGHT))
treemap.add('', dom)
treemap.set_global_opts(title_opts=opts.TitleOpts(title=" Each type has tiktok. V Summary of fans ", pos_left="center", pos_top="5"),
                        toolbox_opts=opts.ToolboxOpts(is_show=True, feature={"saveAsImage": {}}),
                        legend_opts=opts.LegendOpts(is_show=False))
treemap.set_series_opts(treemapbreadcrumb_opts=opts.TreeMapBreadcrumbOpts(is_show=False))
treemap.render_notebook()

 

Out[32]:

In [4]:

#  Screening 
df = df[df['videos'] > 0]
#  Calculate the average likes of a single video 
df.eval('result = likes/(videos*10000)', inplace=True)
df['result'] = df['result'].round(decimals=1)
df = df.sort_values('result', ascending=False)

#  take TOP10
attr = df['name'][0:10]
v1 = ['%.1f' % (float(i)) for i in  df['result'][0:10]]

#  Initialize configuration 
bar = Bar(init_opts=opts.InitOpts(width="1000px", height="600px"))
#  Add data 
bar.add_xaxis(list(reversed(attr.tolist())))
bar.add_yaxis("", list(reversed(v1)),color = '#84E0E3')
#  Set global configuration item , title 、 hold-all ( Download the pictures )、y Axis split line 
bar.set_global_opts(title_opts=opts.TitleOpts(title=" Tiktok V Average video likes TOP10( ten thousand )", pos_left="center", pos_top="18"),
                    toolbox_opts=opts.ToolboxOpts(is_show=True, feature={"saveAsImage": {}}),
                    xaxis_opts=opts.AxisOpts(splitline_opts=opts.SplitLineOpts(is_show=True)))
#  Set series configuration items 
bar.set_series_opts(label_opts=opts.LabelOpts(is_show=True, position="right", color="black"))
#  Flip xy Axis 
bar.reversal_axis()
bar.render_notebook()

 

Out[4]:

In [13]:

#  Filter data 
df = df[df["country"] == " China "]
df1 = df.copy()
#  Data substitution 
df1["province"] = df1["province"].str.replace(" province ", "").str.replace(" Zhuang Autonomous Region ", "").str.replace(" The Uygur Autonomous Region ", "").str.replace(" Autonomous region ", "")
#  Group count 
df_num = df1.groupby("province")["province"].agg(count="count")
df_province = df_num.index.values.tolist()
df_count = df_num["count"].values.tolist()

#  Initialize configuration 
map = Map(init_opts=opts.InitOpts(width="1000px", height="600px"))
#  Map of China 
map.add("", [list(z) for z in zip(df_province, df_count)], "china")
#  Set global configuration item , title 、 hold-all ( Download the pictures )、 Color legend 
map.set_global_opts(title_opts=opts.TitleOpts(title=" Tiktok V Distribution of provinces ", pos_left="center", pos_top="0"),
                    toolbox_opts=opts.ToolboxOpts(is_show=True, feature={"saveAsImage": {}}),
                    #  Set value range 0-600,is_piecewise Label value continuous 
                    visualmap_opts=opts.VisualMapOpts(max_=600, is_piecewise=False))
map.render_notebook()

 

Out[13]:

In [17]:

df1 = df[(df["school"] != "") & (df["school"] != " Graduated ") & (df["school"] != " Unknown ")]
df1 = df1.copy()
df_num = df1.groupby("school")["school"].agg(count="count").reset_index().sort_values(by="count", ascending=False)
df_school = df_num[:10]["school"].values.tolist()
df_count = df_num[:10]["count"].values.tolist()

#  Initialize configuration 
bar = Bar(init_opts=opts.InitOpts(width="1200px", height="400px"))
bar.add_xaxis(df_school)
bar.add_yaxis("", df_count,color = '#84E0E3')
bar.set_global_opts(title_opts=opts.TitleOpts(title=" Tiktok V Graduation school TOP10", pos_left="center", pos_top="18"),
                    toolbox_opts=opts.ToolboxOpts(is_show=True, feature={"saveAsImage": {}}),
                    yaxis_opts=opts.AxisOpts(splitline_opts=opts.SplitLineOpts(is_show=True)))
bar.set_series_opts(label_opts=opts.LabelOpts(is_show=True, position="top", color="black"))
bar.render_notebook()

 

Out[17]:

In [21]:

"""
 Generate title and summary word cloud 
"""
words = pd.read_csv('../file/chineseStopWords.txt', encoding='gbk', sep='\t', names=['stopword'])
#  participle 
text = ''
df1 = df[df["signature"] != ""]
df1 = df1.copy()
for line in df1['signature']:
    text += ' '.join(jieba.cut(str(line).replace(" ", ""), cut_all=False))
#  Stop words 
stopwords = set('')
stopwords.update(words['stopword'])
backgroud_Image = plt.imread('../file/douyin.png')
#  Use tiktok background color 
alice_coloring = np.array(Image.open(r"../file/douyin.png"))
image_colors = ImageColorGenerator(alice_coloring)
wc = WordCloud(
    background_color='white',
    mask=backgroud_Image,
    font_path='../file/simhei.ttf',
    max_words=2000,
    max_font_size=70,
    min_font_size=1,
    prefer_horizontal=1,
    color_func=image_colors,
    random_state=50,
    stopwords=stopwords,
    margin=5
)
wc.generate_from_text(text)
wc.to_file('../file/douyin_word.png')
print(' Word cloud generated successfully !')
 Click and drag to move 
Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\10076\AppData\Local\Temp\jieba.cache
Loading model cost 0.550 seconds.
Prefix dict has been built successfully.
c:\users\10076\appdata\local\programs\python\python38\lib\site-packages\wordcloud\wordcloud.py:995: UserWarning: mask image should be unsigned byte between 0 and 255. Got a float array
  warnings.warn("mask image should be unsigned byte between 0"

 

Word cloud generated successfully !

Data acquisition

The data comes from third-party monitoring , There is a total of 5000+ Tiktok V Data information ( I'll put the required documents at the end of the article , For download )

 

2

It mainly includes large V Our nicknames 、 Gender 、 place 、 type 、 Number of likes 、 Number of fans 、 Number of videos 、 comments 、 Number of shares 、 Pay attention to several 、 Graduation school 、 authentication 、 Introduction and other information .

One of the biggest fans is 「 The People's Daily 」, near 1.2 Billion .「 CCTV news 」 More than 100 million , I remember when I broke 100 million yuan, I was still searching for overheat ~

The bloggers with the least fans also have close friends 150w+ A fan of , this 5000 Multi position big V Cumulative 236.5 Billion fans , More than three times the population of the earth !
( Because of time , These are big V The number of fans must be higher than that )

Data visualization & analysis

Import third-party library , Then read the data

from pyecharts.charts import Pie, Bar, TreeMap, Map, Geo
from wordcloud import WordCloud, ImageColorGenerator
from pyecharts import options as opts
from pyecharts.globals import ThemeType
import matplotlib.pyplot as plt
from PIL import Image
import pandas as pd
import numpy as np
import jieba

df = pd.read_csv('../file/douyin.csv',encoding = 'utf-8-sig')
df.head()

 

Python

Copy

Running results :

 

3

Gender distribution

Gender distribution CODE

 

Tiktok V Gender distribution

 

On the whole , There is little difference in the ratio of men to women .

Remove unknown data , Basic is 1:1

Number of likes

Tiktok V Number of likes CODE


 

 

Tiktok V Number of likes TOP10( Billion )

Number of likes TOP10, except 「 Little League 」 and 「 A poisonous tongue 」, Others are big news media V.

This year, because of the epidemic , Tiktok is the first time that many news reports are on the pitch. , So it has a great influence , I like it a lot .

Remember 「 Sichuan observation 」 Also ridiculed by the comment area as looking around , It means that news is released very quickly .

Tiktok V Distribution of likes CODE


 

 

Tiktok V Distribution of likes ( ten thousand )


 

There are more than 100 million praises 500 Many big V,1000 Wan to 5000 Ten thousand likes V The largest number

Number of fans

Tiktok V Number of fans CODE


 

 

Tiktok V Number of fans TOP10( ten thousand )


 

「 The People's Daily 」 and 「 CCTV news 」 Fans have broken 100 million .

A comparison with last year's quiver tiktok data. ,「 Hot bar 」 Hundreds of thousands of fans are missing , Chen he has a lot of fans .

This year's live broadcast is very popular , Li Jiaqi ranked in the top ten , No wonder , After all, a brother with goods .

Tiktok V Distribution of fans CODE


 

 

Tiktok V Distribution of fans ( ten thousand )

5000 All the above 56 individual , The right big man .

200w~500w The largest number of people , A lot of bloggers who are on fire for a while , After a period of time, it didn't increase much .

It's probably all here , For example, it has been brushed before 「 Three flowers 」, If you don't understand it, it can be a fire ...

comments

comments CODE


 

 

Tiktok V comments TOP10( Billion )


 

Tiktok video review area is also more interesting. .

For example, it's more urgent ,「 Go ahead and update , It's been more than ten minutes , The donkeys in the production team dare not rest for such a long time 」.

And five cats shaking their heads crazy , Also occupied the comment area for a while .

What's more distinctive is @ My friends , Remind them to watch a video , This is probably a tiktok culture. .

in general , There are many media video comments .

Number of shares

Number of shares CODE


 

 

Tiktok V Number of shares TOP10( Billion )


 

Tiktok sharing is a way of video propagation. , More people can see the video .

From the data point of view , We still prefer to share news and food videos .

It may be the new year's Eve , One month at home , Besides ge you, I'm reading the news , It's eating .

everyone , They all have a dream of becoming a chef .

Likes by type / Distribution of fans

Each type has tiktok. V Summary of likes CODE


 

 

Each type has tiktok. V Summary of likes

Each type has tiktok. V Summary of fans CODE


 

 

Each type has tiktok. V Summary of fans


 

I remember once a big man said , Tiktok is killing your time. (Kill Time), Instead of saving time (Save Time), Video with a little deeper technology can't survive .

You can see from the rectangular tree above , Everyone likes it 「 beauty 」 Type of video , After all, who doesn't like beautiful girls ~

For example, look at the sister of the copper man affectionately 、 A girl full of stars for college entrance examination , Peng 16 elf wait , There are too many videos of girls exploding ...

in addition 「 Funny 」、「 game 」、「 The plot 」 Class video is also more attractive , Proper Kill Time.

Tiktok V Graduation school

Tiktok V Graduation school CODE


 

 

Tiktok V Graduation school TOP10


 

North shadow 、 Middle passage 、 Zhejiang biography 、 Chinese opera 、 Play 、 Yangmei , A big man in the show business .

Tiktok V Distribution of provinces

Tiktok V Distribution of provinces CODE


 

 

Tiktok V Distribution of provinces


 

It can be seen that Tibet is a big city V None , So there's no color .

guangdong 、 Zhejiang 、 Sichuan ranked in the top three .

Tiktok V Introduction to CI Yun

Tiktok V Introduction to CI Yun CODE


 

 

douyin_word


 

You can see most of the big ones V All left the information of business cooperation , Good for content creators , Only in this way can we win-win situation .

According to statistics , Tiktok 2200 More than 10000 creators have achieved more than 417 A hundred million dollars in revenue .

From creation to entrepreneurship , This sentence is tiktok very well. .

At the end

Don't let tiktok kill most of your time. , After all, there are many things more interesting than tiktok.

copyright notice
author[Python Programming],Please bring the original link to reprint, thank you.
https://en.pythonmana.com/2021/08/20210822061933988e.html

Random recommended