Are people who like drinking tea successful? I used Python to make a tea guide! Do you like it?
2022-01-30 06:35:01 【Programming small code farmer】
Toss in a handful of green leaves and sigh that the years still pass like a dream; pour in a pool of clear spring water and savor the many forms of this floating life.
Why did I suddenly think of writing a small Python case study on tea appraisal? Because today my manager called me into the office and we had two cups of tea together. But I'm not someone who likes drinking tea, and I have never studied it! So today's tutorial does two things: it teaches some Python, and it fills in this gap in my knowledge. At the very least, the most common tea-drinking etiquette should be clear to you, so that you don't make a fool of yourself in the future!
By reading this article and the source code, you can learn the following along with me: crawling data with XPath expressions, multi-process crawling, basic pandas operations, pyecharts visualization, stylecloud word clouds, text cosine similarity, KMeans, and keyword extraction algorithms (TextRank, TF-IDF, and the LDA topic model).
The source code is available at the end of the article.
I found a tea-related website:
From the home page, enter the tea review section, where you can see the basic information for every tea. The results span multiple pages. Collect all of the basic information: title, score, brand, place of origin, tea type, detail link, and id:
Then follow each collected link and crawl one level deeper to get each tea's recommendation index, general comment, and overall ranking:
Also crawl the corresponding comments, following the pagination when there is more than one page. The fields are reviewer, reviewer rating, score, comment text, and comment time:
Finally, save the results as two CSV files, tea.csv and comment.csv:
That is the whole crawler workflow: XPath extraction plus multi-process crawling. The logic is not complicated; see the source code for the detailed implementation.
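The extraction step can be sketched roughly as follows. This is only an illustration under stated assumptions, not the article's crawler: the markup in `SAMPLE_PAGE`, the class names, and the helper names `parse_listing` and `crawl_pages` are all hypothetical, and the standard library's `xml.etree.ElementTree` (which supports only a subset of XPath and needs well-formed markup) stands in for a full XPath engine so that the sketch runs without extra packages.

```python
import xml.etree.ElementTree as ET
from multiprocessing import Pool

# Hypothetical, well-formed listing markup; the real site's structure is not shown in the article.
SAMPLE_PAGE = """
<html><body>
  <div class="tea-item" id="101">
    <a class="title" href="/tea/101">Brand A Tieguanyin</a>
    <span class="score">9.1</span>
  </div>
  <div class="tea-item" id="102">
    <a class="title" href="/tea/102">Brand B Maojian</a>
    <span class="score">8.4</span>
  </div>
</body></html>
"""


def parse_listing(page):
    """Extract id, title, detail link, and score from one listing page."""
    root = ET.fromstring(page.strip())
    rows = []
    # XPath-style query: every <div class="tea-item"> anywhere in the page.
    for item in root.findall(".//div[@class='tea-item']"):
        link = item.find("./a[@class='title']")
        rows.append({
            "id": item.get("id"),
            "title": link.text.strip(),
            "link": link.get("href"),
            "score": float(item.findtext("./span[@class='score']")),
        })
    return rows


def crawl_pages(pages, workers=2):
    """Parse many pages in parallel worker processes and flatten the results.

    Assumption: `pages` holds already-downloaded page strings; a real crawler
    would fetch each URL inside the worker instead.
    """
    with Pool(workers) as pool:
        nested = pool.map(parse_listing, pages)
    return [row for rows in nested for row in rows]


# Single-process demonstration on the sample page.
rows = parse_listing(SAMPLE_PAGE)
print(rows)
```

When run as a script, `crawl_pages` should be called under an `if __name__ == "__main__":` guard so that worker processes do not re-execute the call on import.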
In total there are about 30,000 records. Once you have the data, you can start exploring.
Look at the titles first. Each title is made up of a brand and a name; process it to keep only the name part, then draw a word cloud.
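A minimal sketch of the title clean-up, assuming each title starts with the brand text (the sample rows and the helper name `name_only` are hypothetical). The resulting name counts are the kind of input a word-cloud tool would take.

```python
from collections import Counter


def name_only(title, brand):
    """Drop a leading brand from the title and return just the tea name."""
    title = title.strip()
    if brand and title.startswith(brand):
        title = title[len(brand):]
    return title.strip()


# Hypothetical sample rows standing in for tea.csv.
sample_rows = [
    {"title": "Brand A Tieguanyin", "brand": "Brand A"},
    {"title": "Brand B Tieguanyin", "brand": "Brand B"},
    {"title": "Brand B Maojian", "brand": "Brand B"},
]

names = [name_only(r["title"], r["brand"]) for r in sample_rows]
name_counts = Counter(names)  # word frequencies to feed into a word cloud
print(name_counts.most_common())
```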
Black tea, Baidudan, Tie Guanyin, green tea, Maojian, and many other familiar tea names show up:
Tea scores range from 0 to 10. Bin the scores into two-point intervals and draw a histogram.
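The binning can be sketched in plain Python as below; pandas' `cut` is the usual tool for this, but the sketch avoids dependencies. The sample scores are made up, and treating a score of exactly 10 as part of the last bin is an assumption.

```python
BIN_LABELS = ["0-2", "2-4", "4-6", "6-8", "8-10"]


def score_bin(score):
    """Map a 0-10 score to its two-point bin; 10 falls into the last bin."""
    index = min(int(score // 2), len(BIN_LABELS) - 1)
    return BIN_LABELS[index]


def histogram(scores):
    """Count how many scores fall into each bin, keeping empty bins."""
    counts = {label: 0 for label in BIN_LABELS}
    for score in scores:
        counts[score_bin(score)] += 1
    return counts


# Hypothetical scores for illustration only.
sample_scores = [9.2, 8.0, 3.5, 10, 7.9, 0]
score_counts = histogram(sample_scores)
print(score_counts)
```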
Judging from the results, the scores are very high overall; only a few teas score below 4. I pulled out those rows and looked at them, and the general comments on these low-scoring teas are not particularly kind:
Nowadays almost every kind of tea is sold under dedicated brands. Count the brands and draw a word cloud.
Douji Tea Industry, Chinese Tea, Great Benefit, and Tianfu Tea stand out. Even if you don't know tea, you have probably heard of these brands or seen them on the street:
Each kind of tea has its own place of origin. Draw a heat map of the places of origin.
Most of the teas come from Yunnan, numbering in the thousands. I looked it up: Yunnan is a major tea-producing region and is regarded as the oldest homeland of tea.
Fujian comes next. It has more than 1,000 years of tea culture and is one of the most important tea-producing areas in China:
At present, tea can be divided into ten categories: Pu'er, green tea, black tea, oolong, dark tea, white tea, scented tea, yellow tea, tea bags, and instant tea. Each category has many subcategories. Count the teas in each category and draw a bar chart.
Pu'er has the most entries, followed by green tea and black tea. Seeing this, I realized that I seldom drink Pu'er:
Hot-search figures reflect whether a tea is popular. I selected the top 10 teas by hot search and pulled out their details.
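A short sketch of the top-N selection, assuming each tea record carries a numeric hot-search value; the field name `hot_search` and the sample records are hypothetical.

```python
import heapq


def top_hot(teas, n=10):
    """Return the n teas with the largest hot-search value, highest first."""
    return heapq.nlargest(n, teas, key=lambda tea: tea["hot_search"])


# Hypothetical records; the real field names are not shown in the article.
sample_teas = [
    {"title": "Tea A", "hot_search": 120},
    {"title": "Tea B", "hot_search": 980},
    {"title": "Tea C", "hot_search": 455},
]

hottest = top_hot(sample_teas, n=2)
print([tea["title"] for tea in hottest])
```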
A classic Pu'er ranked first. Pu'er is also the most diverse tea, so it may be worth buying some to try later:
Break the comment time down by year and by month, and chart the comment trend year over year and month over month.
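Grouping comment times by year and by month can be sketched like this. The timestamp format and the sample values are assumptions; the actual format in comment.csv is not shown.

```python
from collections import Counter
from datetime import datetime


def comment_trend(timestamps, fmt="%Y-%m-%d %H:%M:%S"):
    """Count comments per year and per year-month, sorted chronologically."""
    by_year, by_month = Counter(), Counter()
    for stamp in timestamps:
        moment = datetime.strptime(stamp, fmt)
        by_year[moment.year] += 1
        by_month[moment.strftime("%Y-%m")] += 1
    return dict(sorted(by_year.items())), dict(sorted(by_month.items()))


# Hypothetical timestamps for illustration only.
sample_times = [
    "2016-03-01 10:00:00",
    "2016-03-15 21:30:00",
    "2017-01-02 08:15:00",
]

yearly, monthly = comment_trend(sample_times)
print(yearly)
print(monthly)
```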
Commenting activity rose from 2014 to 2017 and then fell:
That completes the exploratory analysis. It mainly used pandas, stylecloud, jieba, and pyecharts; see the source code for the detailed implementation.
The data contains a general comment field, which is an overall review of each tea, and a field for each user comment. These two fields are used for text keyword extraction.
For the general comments, we want to group together teas whose general comments are similar. The KMeans clustering algorithm can do this, but the general comment is text data.
So the keywords in each general comment must be extracted first, here with the TextRank algorithm. The idea is to segment each sentence into words, assign each word a weight, and take the highest-scoring words as keywords.
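To make the weighting idea concrete, here is a simplified TextRank sketch in plain Python. It assumes the text has already been segmented into tokens (the article uses jieba for Chinese text) and scores words by a PageRank-style iteration over a co-occurrence graph. The function name, the window size, and the sample tokens are illustrative choices, not the article's code.

```python
from collections import defaultdict


def textrank_keywords(tokens, top_k=5, window=2, damping=0.85, iterations=50):
    """Rank words by a PageRank-style score on their co-occurrence graph."""
    # Link each word to the `window` words that follow it (undirected edges).
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # Iteratively pass each word's score to its neighbors.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(neighbors[n]) for n in neighbors[word])
            for word in neighbors
        }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Hypothetical pre-segmented general comment.
sample_tokens = "aroma sweet aroma smooth aroma mellow aroma fresh".split()
keywords = textrank_keywords(sample_tokens, top_k=1, window=1)
print(keywords)
```

Words that co-occur with many other words accumulate the highest scores, which is why the hub word in the sample comes out on top.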
Vectorize the keywords, compute the cosine similarity, and finally apply the clustering algorithm, which yields two categories.
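The vectorize, compare, and cluster steps can be sketched as below. This is a dependency-free illustration under stated assumptions: keyword lists become simple count vectors, the KMeans loop uses Euclidean distance with a naive "first k points" initialization, and the sample keyword lists are made up. A library implementation would normally be used in practice.

```python
import math
from collections import Counter


def to_vector(keywords, vocab):
    """Turn a keyword list into a count vector over a fixed vocabulary."""
    counts = Counter(keywords)
    return [float(counts[word]) for word in vocab]


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def kmeans(points, k=2, iterations=20):
    """Plain KMeans; the first k points serve as initial centroids (naive)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical keyword lists extracted from four general comments.
sample_docs = [
    ["aroma", "taste", "smooth"],
    ["shape", "color", "strands"],
    ["aroma", "smooth", "mellow"],
    ["shape", "strands", "luster"],
]
vocab = sorted({word for doc in sample_docs for word in doc})
vectors = [to_vector(doc, vocab) for doc in sample_docs]

labels = kmeans(vectors, k=2)
print(labels)
print(cosine(vectors[0], vectors[2]))
```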
Category 1 mainly evaluates taste: aroma, flavor, mouthfeel, smoothness, and so on.
Category 2 mainly evaluates appearance: shape, leaf strands, color and luster, raw materials, and so on:
For the user comments, first use the TF-IDF algorithm for keyword extraction. The algorithm has two parts, TF and IDF.
TF (term frequency): how often each word occurs in a text.
IDF (inverse document frequency): for each word, count how many comments it appears in and map that count to a score, so that words appearing in fewer comments score higher.
Finally, rank the words by TF*IDF and select the top 10 keywords:
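The TF and IDF parts can be sketched as follows. The sketch assumes the comments are already segmented into tokens and uses the plain `log(N / df)` form of IDF; libraries often apply smoothed variants. It scores each comment separately, whereas the article ranks across the whole corpus, and the sample comments and function names are illustrative.

```python
import math
from collections import Counter


def tfidf_scores(docs):
    """Return one {word: tf * idf} mapping per tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each word appears.
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))

    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            word: (count / len(doc)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return scores


def top_keywords(doc_scores, top_k=10):
    """Pick the top_k words with the highest TF-IDF score."""
    return sorted(doc_scores, key=doc_scores.get, reverse=True)[:top_k]


# Hypothetical pre-segmented comments.
sample_comments = [
    ["tea", "sweet", "aroma"],
    ["tea", "bitter"],
    ["tea", "sweet", "smooth", "smooth"],
]

comment_scores = tfidf_scores(sample_comments)
print(top_keywords(comment_scores[2], top_k=1))
```

A word such as "tea" that appears in every comment gets an IDF of zero, so it never ranks as a keyword.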
The second method uses the LDA topic model for keyword extraction. You first fix the number of topics and then extract the keywords; here we choose 1 topic and its top 10 keywords:
The source code for
Author: [Programming small code farmer]. Please include the original link when reprinting, thank you.