Building a novel coronavirus epidemic map with a Python crawler (pyecharts 1.7 version)

Before you start:

(1) Map code written for pyecharts 1.0 and above differs from code written for versions below 1.0. Some methods from the older versions no longer work after upgrading, and the two are not compatible with each other. The epidemic map in this post was made with pyecharts 1.7.
(2) For making the epidemic map with a pyecharts version below 1.0, see the map-making posts elsewhere on my blog.

Workflow:

1. Fetch the epidemic data with a Python crawler;
2. Process the epidemic data in Python;
3. Draw the epidemic map.

1. Fetch the data with a crawler:

Data source: DXY (Ding Xiang Yuan), Ding Xiang Doctor epidemic page
Website address: https://3g.dxy.cn/newh5/view/pneumonia_peopleapp

  1. Import the packages:
from bs4 import BeautifulSoup
from urllib.request import urlopen

What is the bs4 package?
bs4 is Beautiful Soup, a library for extracting data from web pages. Beautiful Soup automatically converts incoming documents to Unicode and outgoing documents to UTF-8.
Beautiful Soup is an HTML/XML parser: its main job is to parse HTML/XML and extract data from it.

In short: we fetch the page content as a string, build a BeautifulSoup object from that string, and then extract data through that object's methods.

The urlopen method
The urllib.request.urlopen() function opens the target URL and returns a response object whose content can be read.

  2. Use the urlopen function to read the content of the web page. If the page contains Chinese text, decode the bytes with "utf-8":
html = urlopen(
   "https://3g.dxy.cn/newh5/view/pneumonia_peopleapp"
).read().decode('utf-8')
# Parse the HTML source of the page
bs = BeautifulSoup(html, "html.parser")
print(bs.body)

The page source shows that the per-province epidemic data we need sits inside the body tag, so we inspect the contents of body.
This gives us the string containing the epidemic data from the web page.
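Some sites refuse requests that carry no browser-like User-Agent, and a request without a timeout can hang indefinitely. The following is a minimal sketch of a more defensive fetch, not the method used in this post: the helper names build_request and fetch_html, the User-Agent value, and the 10-second timeout are all assumptions chosen for illustration.

```python
from urllib.request import Request, urlopen

URL = "https://3g.dxy.cn/newh5/view/pneumonia_peopleapp"


def build_request(url, user_agent="Mozilla/5.0"):
    """Return a Request carrying a browser-like User-Agent header (illustrative value)."""
    return Request(url, headers={"User-Agent": user_agent})


def fetch_html(url, timeout=10):
    """Download the page and decode it as UTF-8 (needs network access)."""
    with urlopen(build_request(url), timeout=timeout) as response:
        return response.read().decode("utf-8")


# html = fetch_html(URL)  # uncomment to download the live page
```

The returned string can be passed to BeautifulSoup(html, "html.parser") exactly as before.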

2. Process the fetched epidemic data:

① Work on the page source:
str1 = bs.body.text
print(str1)  # Print everything in the body text
# Find the keyword that marks the domestic per-province data and slice from there
str1 = str1[str1.find('window.getAreaStat = '):]
data = str1[str1.find('[{'):str1.find('}catch')]
data_list = eval(data)    # Convert the string to a list of dicts (eval runs the text as Python, so use it only on trusted input)
print(type(data_list))  # Check the type
print(data_list)

Here the string is sliced at keywords found in the text.
str1=str1[str1.find('window.getAreaStat = '):]
This drops everything before 'window.getAreaStat = '. The current confirmed data for the domestic provinces that we need is in the remaining string, which you can inspect with print(str1).

data = str1[str1.find('[{'):str1.find('}catch')]
This takes the part of str1 from "[{" up to, but not including, "}catch" and assigns it to data.
eval then converts that string into a list of dictionaries.
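eval executes whatever Python expression the string contains, and it also fails on JSON literals such as true, false, or null. A hedged alternative is to locate the block with a regular expression and parse it with json.loads. The sketch below assumes the page wraps the data as window.getAreaStat = [...]}catch, as the slicing code implies; the function name extract_area_stat and the sample string are made up for illustration and are not real data.

```python
import json
import re


def extract_area_stat(page_text):
    """Return the list of province dicts assigned to window.getAreaStat."""
    match = re.search(r"window\.getAreaStat\s*=\s*(\[.*?\])\}catch", page_text, re.S)
    if match is None:
        raise ValueError("window.getAreaStat block not found")
    return json.loads(match.group(1))


# Made-up sample that mimics the structure described above (not real data)
sample = ('try { window.getAreaStat = [{"provinceName":"A","currentConfirmedCount":1,'
          '"confirmedCount":2,"cities":[]}]}catch(e){}')
data_list = extract_area_stat(sample)
print(data_list[0]["confirmedCount"])
```

Because the function works on plain text, it can be given either bs.body.text or the raw html string. Searching the raw html may be the more robust choice, since whether script contents show up in .text can depend on the Beautiful Soup version.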

② Define two dictionaries to store the provincial case counts:

new_dict holds the current confirmed cases
new_dict1 holds the cumulative confirmed cases

new_dict={}   # Current confirmed cases per province
new_dict1={}  # Cumulative confirmed cases per province

③ Loop over the data to get the current and cumulative confirmed counts:
# Loop over data_list to build {province: number of cases}
for province in data_list:
    # Normalize the province name by stripping administrative suffixes.
    # The page reports names in Chinese, so the suffixes are Chinese as well
    # (自治区 = autonomous region, 回族 = Hui, 维吾尔 = Uygur, 省 = province, 市 = city, 壮族 = Zhuang).
    name = province['provinceName'].replace('自治区','').replace('回族','').replace('维吾尔','').replace('省','').replace('市','').replace('壮族','')
    new_dict[name] = province['currentConfirmedCount']   # Current confirmed cases
    new_dict1[name] = province['confirmedCount']         # Cumulative confirmed cases

print(new_dict)
print(new_dict1)

Comparing the page source with the live figures shows that currentConfirmedCount is the number of current confirmed cases and confirmedCount is the cumulative number of confirmed cases.
The replace() method strips the parts of each province name that the map does not accept. pyecharts matches map regions strictly by name, so if a name does not match exactly, its data is not displayed.
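The name cleanup can also be collected into one small helper, which keeps the loop readable and makes the rule easy to check on its own. This is only a sketch: the helper name short_name is hypothetical, and the suffix list assumes the page reports full Chinese names such as 湖北省 while the "china" map expects short names such as 湖北.

```python
# Administrative suffixes to strip (assumed to be Chinese, matching the page's province names)
SUFFIXES = ("自治区", "回族", "维吾尔", "壮族", "省", "市")


def short_name(province_name):
    """Strip administrative suffixes from a province name."""
    for suffix in SUFFIXES:
        province_name = province_name.replace(suffix, "")
    return province_name


print(short_name("湖北省"))
print(short_name("新疆维吾尔自治区"))
```

With this helper, each loop iteration would compute short_name(province['provinceName']) once and use the result as the key in both dictionaries.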

Part of the current and cumulative confirmed data looks like this:

3. Make the China novel coronavirus pneumonia map:

Map of current confirmed cases in China:

1. Import the packages:

from pyecharts.charts import Map,Geo
from pyecharts import options as opts
from pyecharts.globals import ThemeType  # Themes

2. Map code details:

'''
pyecharts 1.7 map syntax
Current confirmed cases in China
'''
province = list(new_dict.keys())   # Province names (dictionary keys) as a list
values = list(new_dict.values())   # Case counts (dictionary values) as a list
list1 = [[province[i], values[i]] for i in range(len(province))]  # List comprehension
# Optional: in Map(), init_opts=opts.InitOpts(theme=ThemeType.ROMANTIC) sets the theme, and bg_color sets the map background color
map_1 = Map()
map_1.set_global_opts(
    title_opts=opts.TitleOpts(title="Current confirmed nCoV pneumonia cases in China", pos_left="left"),
    # visualmap_opts=opts.VisualMapOpts(max_=50),  # Continuous display with a maximum value
    visualmap_opts=opts.VisualMapOpts(
        is_piecewise=True,  # Display the data in segments
        # Custom range, label, and color for each segment
        pieces=[
            {"max": 0, "label": "0 people", "color": "#FFFFFF"},
            {"min": 1, "max": 9, "label": "1-9 people", "color": "#FFEBCD"},
            {"min": 10, "max": 99, "label": "10-99 people", "color": "#FFA07A"},
            {"min": 100, "max": 499, "label": "100-499 people", "color": "#FF4040"},
            {"min": 500, "max": 999, "label": "500-999 people", "color": "#CD2626"},
            {"min": 1000, "max": 9999, "label": "1000-9999 people", "color": "#B22222"},
            {"min": 10000, "label": ">=10000 people", "color": "#8B1A1A"},  # No max: the segment is open-ended
        ],
    )
)
# is_map_symbol_show=False hides the small red dots on the map
map_1.add("Current confirmed data in China", list1, maptype="china", is_map_symbol_show=False)
map_1.render("G:/map of China-Existing diagnosis.html")

The code above uses a list comprehension. If that is unfamiliar, see the post on list comprehensions on my blog.
is_piecewise=True displays the data in segments (each data range is shown in its own color).
visualmap_opts=opts.VisualMapOpts(max_=50) gives a continuous display instead: max_ is the maximum value, and the color changes gradually.
is_map_symbol_show=False hides the small red marker dots on the map.
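To check that a set of segments covers the data without gaps or overlaps, it can help to look up by hand which segment a value falls into. The sketch below only mimics the idea of segment matching; it is not pyecharts or ECharts code. The helper name piece_label is hypothetical, and it assumes that both min and max are inclusive bounds.

```python
def piece_label(value, pieces):
    """Return the label of the first piece whose [min, max] range contains value."""
    for piece in pieces:
        low = piece.get("min", float("-inf"))   # a missing min means no lower bound
        high = piece.get("max", float("inf"))   # a missing max means no upper bound
        if low <= value <= high:
            return piece["label"]
    return None  # value is not covered by any piece


# Shortened, illustrative segment list
pieces = [
    {"max": 0, "label": "0 people"},
    {"min": 1, "max": 9, "label": "1-9 people"},
    {"min": 10, "max": 99, "label": "10-99 people"},
    {"min": 100, "label": ">=100 people"},
]
print(piece_label(57, pieces))
```

A value that returns None points to a gap between segments, and a boundary value that matches two segments points to an overlap.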

The effect is as follows:

Map of cumulative confirmed cases in China:

province1 = list(new_dict1.keys())   # Province names (dictionary keys) as a list
values1 = list(new_dict1.values())   # Case counts (dictionary values) as a list
list2 = [[province1[i], values1[i]] for i in range(len(province1))]  # List comprehension, used a second time
# Optional: in Map(), init_opts=opts.InitOpts(theme=ThemeType.ROMANTIC) sets the theme, and bg_color sets the map background color
map_2 = Map()
map_2.set_global_opts(
    title_opts=opts.TitleOpts(title="Cumulative confirmed nCoV pneumonia cases in China", pos_left="left"),
    visualmap_opts=opts.VisualMapOpts(
        is_piecewise=True,  # Display the data in segments
        # Custom range, label, and color for each segment
        pieces=[
            {"max": 0, "label": "0 people", "color": "#FFFFFF"},
            {"min": 1, "max": 9, "label": "1-9 people", "color": "#FFEBCD"},
            {"min": 10, "max": 99, "label": "10-99 people", "color": "#FFA07A"},
            {"min": 100, "max": 499, "label": "100-499 people", "color": "#EE5C42"},
            {"min": 500, "max": 999, "label": "500-999 people", "color": "#CD3333"},
            {"min": 1000, "max": 9999, "label": "1000-9999 people", "color": "#A52A2A"},
            {"min": 10000, "label": ">=10000 people", "color": "#8B0000"},  # No max: the segment is open-ended
        ],
    )
)
# is_map_symbol_show=False hides the small red dots on the map
map_2.add("Cumulative confirmed data in China", list2, maptype="china", is_map_symbol_show=False)
map_2.render("G:/map of China-Diagnosis.html")
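Both maps build their [province, value] pairs the same way, so the step could be shared. One possible sketch iterates over the dictionary's items() directly instead of indexing two parallel lists. The helper name to_pairs is hypothetical, and the numbers are made up for illustration.

```python
def to_pairs(counts):
    """Turn {province: count} into [[province, count], ...], the shape passed to Map.add()."""
    return [[name, count] for name, count in counts.items()]


# Made-up numbers for illustration only
demo = {"A": 12, "B": 3}
print(to_pairs(demo))
```

Under this sketch, list1 would be to_pairs(new_dict) and list2 would be to_pairs(new_dict1).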

The effect is as follows:

Live data on the DXY (Ding Xiang Yuan) web page:

That is all it takes to crawl the data and draw a simple epidemic map. Suggestions are welcome.
If you want to see a world map or maps of individual provinces, see the other posts on my blog.


Tags: Python encoding xml

Posted on Tue, 10 Mar 2020 07:27:30 -0400 by chyan