Using Python to realize national second-hand housing data capture + map display
2021-08-23 19:36:58 【The second brother is not like a programmer】
With the recent introduction of various policies, second-hand house prices have fluctuated considerably. In this article, the second brother uses Lianjia's second-hand housing listings as an example to briefly analyze second-hand housing prices in many parts of the country.
1. Approach
To get Lianjia's second-hand housing information for the whole country, first open the second-hand housing page and take a look (taking Beijing as an example):
Here we can see the second-hand housing listings for Beijing, but there is no option for other provinces and cities. So we go back to the home page to find the list of cities: clicking the city button in the upper-left corner of the home page takes you to the province-city page:
From the province-city page we can get the URL of each city, then visit each URL to scrape its second-hand housing data.
The overall process is: open the province-city page, collect each city's URL, build the second-hand housing URL for each city, scrape the listings into a CSV file, and finally display the data on a map.
2. Get city information
To get the city information, we simply parse the HTML of the city page. Because the HTML structure differs for a few provinces, the code targets the structure used by most provinces.
The code for obtaining city information is as follows:
import random
import time
import csv
import requests
from lxml import etree
import pandas as pd

# The original listing uses `et` without defining it. It has to be the parsed
# province-city page; the URL below is an assumption, so substitute the page
# you reached through the city button if it differs.
et = etree.HTML(requests.get('https://www.lianjia.com/city/').text)


# Get the province, city name and city URL at position (i, j)
def city(i, j):
    try:
        p1 = "//li[@class='city_list_li city_list_li_selected'][{}]/div[@class='city_list']/div[@class='city_province']/div[@class='city_list_tit c_b']/text()".format(
            i)
        province = et.xpath(p1)[0]
        cn1 = "//li[@class='city_list_li city_list_li_selected'][{}]/div[@class='city_list']/div[@class='city_province']/ul/li[{}]/a/text()".format(
            i, j)
        city_name = et.xpath(cn1)[0]
        cu1 = "//li[@class='city_list_li city_list_li_selected'][{}]/div[@class='city_list']/div[@class='city_province']/ul/li[{}]/a/@href".format(
            i, j)
        city_url = et.xpath(cu1)[0]
    except IndexError:
        # The XPath matched nothing: no province/city at this position
        return 0, 0, 0
    return province, city_name, city_url


# Build the province-city-URL dictionary, keyed by a running counter
dic1 = {}
count = 1
for i in range(1, 15):
    for j in range(1, 6):
        province, city_name, city_url = city(i, j)
        if province != 0:
            dic1[count] = [province, city_name, city_url]
            count += 1
# dic1
The results obtained are as follows:
3. Get second-hand housing data
With each city's home page URL in hand, we can build the second-hand housing URLs for many cities: just append ershoufang/pg{}/ to the city URL, where {} is the page number. The data can then be fetched and parsed in the usual way.
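A minimal sketch of that URL construction on its own, using only the standard library. The helper name build_listing_url and the sample city URL are assumptions made for illustration; they are not part of the original code.

```python
def build_listing_url(city_url, page):
    """Append the second-hand housing path and page number to a city home page URL."""
    # Normalize the base so it ends with exactly one slash before the suffix.
    base = city_url.rstrip('/') + '/'
    return base + 'ershoufang/pg{}/'.format(page)


# Hypothetical city URL, used only to show the shape of the result.
urls = [build_listing_url('https://bj.lianjia.com/', page) for page in range(1, 3)]
print(urls)
```

Normalizing the trailing slash means the helper works whether or not the scraped city URL already ends with one.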
# Example User-Agent strings; these values are assumptions, since the
# original listing never defines User_Agent.
User_Agent = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) '
    'Gecko/20100101 Firefox/91.0',
]

f = open(' National second-hand housing data .csv', 'a', encoding='gb18030', newline='')
write = csv.writer(f)


def parser_html(pr_ci, page, User_Agent):
    # Use one randomly chosen User-Agent for this pass over the cities
    headers = {'User-Agent': random.choice(User_Agent)}
    for i in range(1, len(pr_ci) + 1):
        province = pr_ci.get(i)[0]
        city = pr_ci.get(i)[1]
        url = pr_ci.get(i)[2] + 'ershoufang/pg{}/'.format(page)
        print(url)
        html = requests.get(url=url, headers=headers).text
        eobj = etree.HTML(html)
        li_list = eobj.xpath("//li[@class='clear LOGVIEWDATA LOGCLICKDATA']")
        for li in li_list:
            title_list = li.xpath(".//div[@class='title']/a/text()")
            title = title_list[0] if title_list else None
            name_list = li.xpath(".//div[@class='positionInfo']/a[1]/text()")
            name = name_list[0] if name_list else None
            area_list = li.xpath(".//div[@class='positionInfo']/a[2]/text()")
            area = area_list[0] if area_list else None
            info_list = li.xpath(".//div[@class='houseInfo']/text()")
            info = info_list[0] if info_list else None
            model = size = face = decorate = floor = year = type1 = None
            if info:
                # houseInfo is a '|'-separated string; classify each piece by
                # the Chinese keyword it contains (the page text is Chinese)
                for item in info.split("|"):
                    if '室' in item:  # rooms (layout)
                        model = item
                    elif '平米' in item:  # square meters
                        size = item
                    elif '东' in item or '西' in item or '南' in item or '北' in item:  # east / west / south / north
                        face = item
                    elif '装' in item or '毛' in item:  # decorated / unfinished
                        decorate = item
                    elif '层' in item:  # floor
                        floor = item
                    elif '年' in item:  # year built
                        year = item
                    elif '板' in item or '塔' in item:  # slab / tower building
                        type1 = item
            follow_list = li.xpath(".//div[@class='followInfo']/text()")
            follow = follow_list[0].split(
                '/')[0].strip() if follow_list else None
            time1 = follow_list[0].split(
                '/')[1].strip() if follow_list else None
            price_list = li.xpath(".//div[@class='totalPrice']/span/text()")
            # Total price is quoted in units of 万 (ten thousand yuan)
            price = price_list[0] + '万' if price_list else None
            unit_list = li.xpath(".//div[@class='unitPrice']/span/text()")
            unit = unit_list[0][2:-4] if unit_list else None
            # City information plus listing details
            list1 = [
                province, city, url, title, name, area, model, size, face,
                decorate, floor, year, type1, follow, time1, price, unit
            ]
            write.writerow(list1)
        # Pause between cities to avoid hammering the site
        time.sleep(random.randint(2, 5))


def serve_forever():
    # Header row
    write.writerow([
        'province', 'city', 'url', 'title', 'name', 'area', 'model', 'size',
        'face', 'decorate', 'floor', 'year', 'type', 'follow', 'time', 'price',
        'unit'
    ])
    try:
        # Scrape pages 1 and 2 for every city
        for i in range(1, 3):
            parser_html(dic1, i, User_Agent)
            time.sleep(random.randint(1, 3))
    except Exception as exc:
        # Report the failure instead of silently swallowing it
        print('Scraping stopped:', exc)
    finally:
        f.close()


# The original listing defines serve_forever() but never calls it
serve_forever()
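The field-splitting step inside parser_html can be tried in isolation. The sketch below is a simplified stand-alone version, not the author's exact code: the function name parse_house_info and the sample string are assumptions, and the keywords assume the live page uses Chinese tokens such as 室 (rooms), 平米 (square meters), 层 (floor) and 年 (year).

```python
def parse_house_info(info):
    """Split a houseInfo string on '|' and sort each piece into a named field."""
    fields = dict.fromkeys(
        ['model', 'size', 'face', 'decorate', 'floor', 'year', 'type'])
    for part in info.split('|'):
        part = part.strip()
        if '室' in part:                          # layout (rooms)
            fields['model'] = part
        elif '平米' in part:                      # floor area in square meters
            fields['size'] = part
        elif any(d in part for d in '东西南北'):   # orientation: E / W / S / N
            fields['face'] = part
        elif '装' in part or '毛' in part:         # decoration level
            fields['decorate'] = part
        elif '层' in part:                        # floor
            fields['floor'] = part
        elif '年' in part:                        # year built
            fields['year'] = part
        elif '板' in part or '塔' in part:         # slab or tower building
            fields['type'] = part
    return fields


# Hypothetical houseInfo text, written by hand to mimic the page format.
sample = '2室1厅 | 74.51平米 | 南 北 | 精装 | 中楼层(共6层) | 1995年建 | 板楼'
print(parse_house_info(sample))
```

Because the checks run in a fixed order, a piece is assigned to the first keyword it matches; any field whose keyword never appears stays None, which matches how missing values end up empty in the CSV.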
The scraped data looks like this:
4. Map display
Since we scraped nationwide data, a map is the best way to present it. Here we plot the number of listings per province as an example; you can swap in other dimensions of the data.
The implementation is as follows:
from pyecharts import options as opts
from pyecharts.charts import Geo
from pyecharts.globals import ChartType
import pandas as pd

ljdata = pd.read_csv(" National second-hand housing data .csv", encoding='gb18030')
# Number of listings per province
pro_num = ljdata['province'].value_counts()
c = (
    Geo()
    .add_schema(maptype="china")
    .add(
        "Number of listings",
        # Convert to plain Python values so the chart options serialize cleanly
        [list(z) for z in zip(pro_num.index.tolist(), pro_num.values.tolist())],
        type_=ChartType.HEATMAP,
    )
    .set_series_opts(label_opts=opts.LabelOpts(is_show=False))
    .set_global_opts(
        visualmap_opts=opts.VisualMapOpts(),
        title_opts=opts.TitleOpts(title="Geo-HeatMap"),
    )
)
c.render_notebook()  # display inline when run in a Jupyter notebook
c.render()  # also write the chart to an HTML file
The result after running is as follows:
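To map a different dimension, as suggested above, the per-province values only need to be aggregated into the same [name, value] pairs. The sketch below averages the unit price per province using only the standard library; the function name and the sample rows are assumptions, and it presumes the 'unit' column holds a plain number, possibly with thousands separators.

```python
from collections import defaultdict
from statistics import mean


def mean_unit_price_by_province(rows):
    """Return [province, mean unit price] pairs from (province, unit) rows."""
    grouped = defaultdict(list)
    for province, unit in rows:
        if not unit:
            continue  # skip listings where no unit price was captured
        grouped[province].append(float(str(unit).replace(',', '')))
    return [[province, round(mean(values), 2)]
            for province, values in grouped.items()]


# Hypothetical rows standing in for the 'province' and 'unit' CSV columns.
rows = [('北京', '60000'), ('北京', '70,000'), ('上海', '65000'), ('上海', None)]
print(mean_unit_price_by_province(rows))
```

The returned pairs have the same shape as the list passed to Geo().add(...) in the map code, so they could be substituted there in place of the listing counts.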
With that, our data scraping and visualization are complete.
copyright notice
author[The second brother is not like a programmer],Please bring the original link to reprint, thank you.
https://en.pythonmana.com/2021/08/20210823193648116i.html