Python Crawler in Action: Visualizing China's Metro Data with the pyecharts Module
2022-02-01 16:20:50 【Dai mubai】
「This is day 27 of my participation in the November Writing Challenge. For event details, see: The Last Writing Challenge of 2021.」
Preface
This post uses Python to visualize China's metro data. Without further ado, let's get started!
Development tools
Python version: 3.6.4
Related modules:
requests module;
wordcloud module;
pandas module;
numpy module;
jieba module;
pyecharts module;
matplotlib module;
as well as some Python built-in modules.
Environment setup
Install Python and add it to the environment variables, then use pip to install the required modules.
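For example, a single pip command covers everything (a suggested command, not from the original post). Note that the crawler below also needs beautifulsoup4 and lxml, and that the charting code uses the old pyecharts 0.x API (from pyecharts import Geo, Bar), so pinning a 0.5.x release is assumed here:

pip install requests wordcloud pandas numpy jieba matplotlib beautifulsoup4 lxml pyecharts==0.5.11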
This time we collect subway line data and then visually analyze how metro cities are distributed.
Analysis and data acquisition
The subway information is scraped from Amap (Gaode Maps).
The first step is to obtain each city's 「id」, 「cityname」 (pinyin name) and 「name」.
These values are spliced into the request URL, which then returns the detailed subway line data.
Inspecting the network requests shows how to get the lines and stations for every city.
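Before writing the full crawler, it can help to fetch a single city's JSON and look at its structure. The sketch below is an illustration only: it assumes Beijing's id is '1100' and its pinyin name is 'beijing' (values you would normally read from the city list page); the keys 'l', 'ln', 'la', 'st' and 'n' are the ones used in the crawler code that follows.

import requests

headers = {'user-agent': 'Mozilla/5.0'}
# Assumed example values: Beijing's id ('1100') and pinyin name ('beijing')
url = 'http://map.amap.com/service/subway?_1555502190153&srhdata=' + '1100' + '_drw_' + 'beijing' + '.json'
data = requests.get(url, headers=headers).json()
# 'l' is the list of lines; each line has 'ln' (name), 'la' (alias) and 'st' (stations)
first_line = data['l'][0]
print(first_line['ln'], first_line.get('la', ''))
# 'n' is the station name
print([station['n'] for station in first_line['st']][:5])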
Getting the data
The specific code:
import json
import requests
from bs4 import BeautifulSoup

headers = {'user-agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'}


def get_message(ID, cityname, name):
    """Fetch the subway line information for one city"""
    url = 'http://map.amap.com/service/subway?_1555502190153&srhdata=' + ID + '_drw_' + cityname + '.json'
    response = requests.get(url=url, headers=headers)
    html = response.text
    result = json.loads(html)
    for i in result['l']:
        for j in i['st']:
            # If the line has an alias ('la'), append it to the line name
            if len(i['la']) > 0:
                print(name, i['ln'] + '(' + i['la'] + ')', j['n'])
                with open('subway.csv', 'a+', encoding='gbk') as f:
                    f.write(name + ',' + i['ln'] + '(' + i['la'] + ')' + ',' + j['n'] + '\n')
            else:
                print(name, i['ln'], j['n'])
                with open('subway.csv', 'a+', encoding='gbk') as f:
                    f.write(name + ',' + i['ln'] + ',' + j['n'] + '\n')


def get_city():
    """Fetch the city list"""
    url = 'http://map.amap.com/subway/index.html?&1100'
    response = requests.get(url=url, headers=headers)
    html = response.text
    # Fix the encoding
    html = html.encode('ISO-8859-1')
    html = html.decode('utf-8')
    soup = BeautifulSoup(html, 'lxml')
    # City lists (main list and "more cities" list)
    res1 = soup.find_all(class_="city-list fl")[0]
    res2 = soup.find_all(class_="more-city-list")[0]
    for i in res1.find_all('a'):
        # City ID value
        ID = i['id']
        # City pinyin name
        cityname = i['cityname']
        # City name
        name = i.get_text()
        get_message(ID, cityname, name)
    for i in res2.find_all('a'):
        # City ID value
        ID = i['id']
        # City pinyin name
        cityname = i['cityname']
        # City name
        name = i.get_text()
        get_message(ID, cityname, name)


if __name__ == '__main__':
    get_city()
Run the script to obtain the data; the results show:
3541 subway stations in total.
Data visualization
First, clean the data and remove the duplicate transfer-station records.
from wordcloud import WordCloud, ImageColorGenerator
from pyecharts import Geo, Bar
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import jieba

# Align column names with the data when printing
pd.set_option('display.unicode.ambiguous_as_wide', True)
pd.set_option('display.unicode.east_asian_width', True)
# Show at most 10 rows
pd.set_option('display.max_rows', 10)

# Read the data
df = pd.read_csv('subway.csv', header=None, names=['city', 'line', 'station'], encoding='gbk')
# Subway lines in each city
df_line = df.groupby(['city', 'line']).count().reset_index()
print(df_line)
Grouping by city and line gives the total number of subway lines nationwide:
183 subway lines.
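As a quick sanity check, this total is just the number of distinct (city, line) pairs, which pandas can also report directly; a minimal sketch:

# The number of (city, line) groups should match len(df_line)
print(df.groupby(['city', 'line']).ngroups)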
def create_map(df):
    # Draw the map
    value = [i for i in df['line']]
    attr = [i for i in df['city']]
    geo = Geo("Distribution of cities with metro systems", title_pos='center', title_top='0', width=800, height=400, title_color="#fff", background_color="#404a59")
    geo.add("", attr, value, is_visualmap=True, visual_range=[0, 25], visual_text_color="#fff", symbol_size=15)
    geo.render("Distribution of cities with metro systems.html")


def create_line(df):
    """Plot the distribution of the number of subway lines per city"""
    title_len = df['line']
    bins = [0, 5, 10, 15, 20, 25]
    level = ['0-5', '5-10', '10-15', '15-20', '20+']
    len_stage = pd.cut(title_len, bins=bins, labels=level).value_counts().sort_index()
    # Generate a bar chart
    attr = len_stage.index
    v1 = len_stage.values
    bar = Bar("Number of subway lines per city", title_pos='center', title_top='18', width=800, height=400)
    bar.add("", attr, v1, is_stack=True, is_label_show=True)
    bar.render("Number of subway lines per city.html")


# Number of subway lines in each city
df_city = df_line.groupby(['city']).count().reset_index().sort_values(by='line', ascending=False)
print(df_city)
create_map(df_city)
create_line(df_city)
This shows which cities have opened a metro and how many subway lines each city has.
32 cities have opened a metro.
City distribution
Most of them are provincial capitals, along with a few economically strong cities.
Number and distribution of lines
You can see that most cities are still in the 「0-5」 range, though every one of them has at least 1 line.
# Which line in which city has the most subway stations
print(df_line.sort_values(by='station', ascending=False))
The lines with the most stations:
Beijing Line 10 comes first, followed by Chongqing Line 3.
Next, remove the duplicate transfer-station data.
# Remove duplicate transfer stations by grouping on city and station
df_station = df.groupby(['city', 'station']).count().reset_index()
print(df_station)
That leaves 3034 subway stations, roughly 500 fewer than before.
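To double-check the deduplication, you can compare the raw row count with the number of unique (city, station) pairs; a small sketch using the same DataFrame:

# Transfer stations appear once per line in the raw data, but only once per city after deduplication
total_rows = len(df)
unique_stations = df.drop_duplicates(subset=['city', 'station']).shape[0]
print(total_rows, unique_stations, total_rows - unique_stations)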
Next, let's see which city has the most subway stations
# Count the number of subway stations in each city (duplicate transfer stations removed)
print(df_station.groupby(['city']).count().reset_index().sort_values(by='station', ascending=False))
It turns out Wuhan has quite a lot of subway stations.
Finally, let's reproduce the trick from New Weekly and generate a word cloud of subway station names.
def create_wordcloud(df):
    """Generate a word cloud of subway station names"""
    # Word segmentation
    text = ''
    for line in df['station']:
        text += ' '.join(jieba.cut(line, cut_all=False))
        text += ' '
    backgroud_Image = plt.imread('rocket.jpg')
    wc = WordCloud(
        background_color='white',
        mask=backgroud_Image,
        font_path=r'C:\Windows\Fonts\Huakangli golden black W8.TTF',  # path to a local Chinese font
        max_words=1000,
        max_font_size=150,
        min_font_size=15,
        prefer_horizontal=1,
        random_state=50,
    )
    wc.generate_from_text(text)
    img_colors = ImageColorGenerator(backgroud_Image)
    wc.recolor(color_func=img_colors)
    # Print the high-frequency words
    process_word = WordCloud.process_text(wc, text)
    sort = sorted(process_word.items(), key=lambda e: e[1], reverse=True)
    print(sort[:50])
    plt.imshow(wc)
    plt.axis('off')
    wc.to_file("subway_word_cloud.jpg")
    print('Word cloud generated successfully!')


create_wordcloud(df_station)
The resulting word cloud:
Copyright notice
Author: Dai mubai. Please include a link to the original when reprinting, thank you.
https://en.pythonmana.com/2022/02/202202011620488753.html