Using Python data analysis to choose a mobile phone: picking the right one after Double 11

Analysis approach

The idea is simple: crawl the data for all mobile phones from Jingdong Mall (JD.com), keep only the phones that meet the configuration and price requirements, and then choose the best value for money among them. As a flow chart, it looks roughly like this:

Crawling data

The first step is to crawl the data for all mobile phones on sale at Jingdong Mall. Here we mainly care about price and configuration. On the product page, the price and configuration information appear as shown in the following two figures:

We write code to crawl the price and configuration information of all mobile phones. The core code of the crawler is as follows

import json
import re

import pymongo
import requests
from lxml import etree

DB = "jd"  # MongoDB database name; not given in the original, placeholder value


def fix_url(url):
    # Helper not shown in the original; assumed here to add the scheme to
    # protocol-relative links such as "//item.jd.com/123.html"
    return "https:" + url if url.startswith("//") else url


# Get the price of a mobile phone item
def get_price(skuid):
    url = "https://c0.3.cn/stock?skuId=" + str(skuid) + "&area=1_72_4137_0&venderId=1000004123&cat=9987,653,655&buyNum=1&choseSuitSkuIds=&extraParam={%22originid%22:%221%22}&ch=1&fqsp=0&pduid=15379228074621272760279&pdpin=&detailedAdd=null&callback=jQuery3285040"
    r = requests.get(url, verify=False)
    content = r.content.decode('GBK')
    # The response is JSONP, e.g. jQuery3285040({...}); extract the JSON part
    matched = re.search(r'jQuery\d+\((.*)\)', content, re.M)
    if matched:
        data = json.loads(matched.group(1))
        price = float(data["stock"]["jdPrice"]["p"])
        return price
    return 0


# Get the configuration information of a mobile phone
def get_item(skuid, url):
    price = get_price(skuid)
    r = requests.get(url, verify=False)
    content = r.content
    root = etree.HTML(content)
    nodes = root.xpath('.//div[@class="Ptable"]/div[@class="Ptable-item"]')
    params = {"price": price, "skuid": skuid}
    for node in nodes:
        text_nodes = node.xpath('./dl')[0]
        k = ""
        v = ""
        # Each <dt> holds a parameter name, the following plain <dd> its value
        for text_node in text_nodes:
            if text_node.tag == "dt":
                k = text_node.text
            elif text_node.tag == "dd" and "class" not in text_node.attrib:
                v = text_node.text
                params[k] = v
    return params


# Get all mobile phone information on one list page
def get_cellphone(page):
    url = "https://list.jd.com/list.html?cat=9987,653,655&page={}&sort=sort_rank_asc&trans=1&JL=6_0_0&ms=4#J_main".format(page)
    r = requests.get(url, verify=False)
    content = r.content.decode("utf-8")
    root = etree.HTML(content)
    cell_nodes = root.xpath('.//div[@class="p-img"]/a')
    client = pymongo.MongoClient()
    db = client[DB]
    for node in cell_nodes:
        item_url = fix_url(node.attrib["href"])
        matched = re.search(r'item\.jd\.com/(\d+)\.html', item_url)
        if not matched:
            # Not a product link; skip it
            continue
        skuid = int(matched.group(1))
        # Skip items that are already stored
        if db.items.count_documents({"skuid": skuid}) > 0:
            continue
        item = get_item(skuid, item_url)
        # Store the result in MongoDB
        db.items.insert_one(item)

 

Note that the get_price and get_item functions above fetch data from two different URLs: the configuration information can be parsed directly from the product page, whereas the price has to be obtained from a separate AJAX request. All the crawled data is stored in MongoDB.
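
The price request returns JSONP rather than plain JSON, which is why get_price strips a jQuery...(...) wrapper before decoding. The following is a minimal offline sketch of that step. The helper name parse_jsonp and the sample response string are made up for illustration; only the stock/jdPrice/p path is taken from the crawler code.

```python
import json
import re


def parse_jsonp(text):
    """Strip a JSONP wrapper such as jQuery123({...}) and decode the JSON inside."""
    # re.S lets the payload span several lines
    matched = re.search(r'^\s*[\w$.]+\((.*)\)\s*;?\s*$', text, re.S)
    if not matched:
        return None
    return json.loads(matched.group(1))


# Hypothetical response body, shaped like the fields that get_price reads
sample = 'jQuery3285040({"stock": {"jdPrice": {"p": "1099.00"}}})'
data = parse_jsonp(sample)
price = float(data["stock"]["jdPrice"]["p"])
print(price)
```

Returning None for anything that is not JSONP lets the caller decide how to handle a failed request, much as get_price falls back to 0.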

Filtering data

Among the crawled data, more than 4700 records have complete information. These 4700-plus phones belong to 70 brands, which look like this:
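
Counts like these can be obtained with a few lines of plain Python over the stored records. The sketch below is only an illustration: the records are made up, and treating an item as "complete" when the fields brand, ram, rom, battery_cap and price are all present is an assumption, not necessarily the rule used for the numbers above.

```python
from collections import Counter

# Fields an item needs in order to count as complete (assumed for this sketch)
REQUIRED = ("brand", "ram", "rom", "battery_cap", "price")


def is_complete(item):
    """True if every required field is present and non-empty."""
    return all(item.get(key) not in (None, "") for key in REQUIRED)


def brand_counts(items):
    """Count complete items per brand."""
    return Counter(item["brand"] for item in items if is_complete(item))


# Made-up records for illustration only
sample = [
    {"brand": "A", "ram": 6, "rom": 64, "battery_cap": 4000, "price": 1099.0},
    {"brand": "A", "ram": 4, "rom": 64, "battery_cap": 3000, "price": 899.0},
    {"brand": "B", "ram": 6, "rom": 128, "battery_cap": 3300, "price": 1499.0},
    {"brand": "C", "ram": None, "rom": 64, "battery_cap": 3000, "price": 999.0},
]
counts = brand_counts(sample)
print(sum(counts.values()), "complete records,", len(counts), "brands")
```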

The configuration of mobile phone mainly includes the following parameters

  • Dual SIM, dual standby

  • Body material

  • CPU model

  • Memory (RAM) size

  • Storage capacity

  • Battery capacity

  • Screen material

  • Screen size

  • Resolution

  • Camera

I mostly use my phone to read books, browse Zhihu and WeChat, and shop online. So when buying a new phone, I care most about speed, capacity, and standby time, and not much about the camera or screen material. With that in mind, I set the following conditions when filtering the data:

  • CPU brand is Qualcomm Snapdragon

  • Memory (RAM) size of at least 6 GB

  • Storage capacity of at least 64 GB

  • Battery capacity of at least 3000 mAh

  • Dual SIM, dual standby is required

  • Price no more than 1500 yuan
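
These conditions compare numbers, while values scraped from a product page are text. The sketch below shows one way such text could be turned into numeric columns. It is not the author's preprocess function, which is not shown; the raw keys (ram, rom, battery, sim) and the sample strings are hypothetical stand-ins for whatever the page actually contains.

```python
import re


def first_number(text):
    """Return the first number found in text as a float, or None if there is none."""
    matched = re.search(r'\d+(?:\.\d+)?', text or "")
    return float(matched.group(0)) if matched else None


def preprocess_item(raw):
    """Convert one raw record into typed fields (hypothetical raw key names)."""
    return {
        "ram": first_number(raw.get("ram")),
        "rom": first_number(raw.get("rom")),
        "battery_cap": first_number(raw.get("battery")),
        # Very rough check, good enough for this sketch only
        "dual_sim": "dual" in (raw.get("sim") or "").lower(),
        "price": raw.get("price", 0),
    }


# Made-up raw record for illustration
raw = {"ram": "6GB", "rom": "64GB", "battery": "4000mAh (typ)",
       "sim": "Dual SIM, dual standby", "price": 1099.0}
row = preprocess_item(raw)
print(row)
```

Missing fields come out as None, so incomplete records can be dropped before the comparison step.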

The code for filtering data is as follows

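import pandas as pd
import pymongo

client = pymongo.MongoClient()
db = client[DB]
items = db.items.find({})
# preprocess (not shown here) turns the raw records into typed columns
result = preprocess(items)
df = pd.DataFrame(result)
# Combine the conditions with & ; chaining df[...][...] makes pandas reindex each mask and warn
mask = ((df.cpu_brand == "Xiaolong ( Snapdragon)") & (df.battery_cap >= 3000) & (df.rom >= 64)
        & (df.ram >= 6) & (df.dual_sim == True) & (df.price <= 1500))
df_res = df[mask]
print(df_res[["brand", "model", "color", "cpu_brand", "cpu_freq", "cpu_core", "cpu_model", "rom", "ram", "battery_cap", "price"]].sort_values(by="price"))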

 

First, read the data from MongoDB, then create a DataFrame, and select the data in the DataFrame according to the above conditions. In the last line of the code, the screened mobile phones are printed and sorted by price from low to high.
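
To see the selection step in isolation, here is a small self-contained sketch that applies the same kind of conditions to a hand-written DataFrame. The rows and model names are made up for illustration and are not taken from the crawled data; the column names follow the filtering code.

```python
import pandas as pd

# Made-up rows for illustration only; not taken from the crawled data
df = pd.DataFrame([
    {"model": "A1", "cpu_brand": "Snapdragon", "ram": 6, "rom": 64,
     "battery_cap": 4000, "dual_sim": True, "price": 1099.0},
    {"model": "B2", "cpu_brand": "Snapdragon", "ram": 4, "rom": 64,
     "battery_cap": 3300, "dual_sim": True, "price": 999.0},
    {"model": "C3", "cpu_brand": "Other", "ram": 6, "rom": 128,
     "battery_cap": 3500, "dual_sim": True, "price": 1299.0},
    {"model": "D4", "cpu_brand": "Snapdragon", "ram": 6, "rom": 128,
     "battery_cap": 3000, "dual_sim": True, "price": 1499.0},
    {"model": "E5", "cpu_brand": "Snapdragon", "ram": 8, "rom": 128,
     "battery_cap": 3400, "dual_sim": True, "price": 2699.0},
])

# One boolean mask built from all conditions, applied in a single lookup
mask = (
    (df["cpu_brand"] == "Snapdragon")
    & (df["ram"] >= 6)
    & (df["rom"] >= 64)
    & (df["battery_cap"] >= 3000)
    & df["dual_sim"]
    & (df["price"] <= 1500)
)
df_res = df[mask].sort_values(by="price")
print(df_res[["model", "ram", "rom", "battery_cap", "price"]])
```

In this sample only models A1 and D4 pass every condition: B2 has too little memory, C3 has a different CPU brand, and E5 is over the price limit.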

After such a round of screening, we got the following 38 mobile phones

The configurations of these phones are similar, but Xiaomi generally gets good reviews online, so I picked out all the Xiaomi phones from the list above, which leaves the following seven:

This comes down to a head-to-head between the Redmi Note 5 and the Xiaomi 6X. In price they are no different. In configuration, an online search shows that the CPU of the Redmi Note 5 is the Snapdragon 636 (the table above is missing the CPU model of the Redmi Note 5). Compared with the Snapdragon 660 in the Xiaomi 6X, the 636 is not as fast, but it is more energy-efficient. Taking into account the Redmi Note 5's large 4000 mAh battery, I finally decided to buy the Redmi Note 5. With an 8-core Snapdragon 636, 6 GB of RAM, 64 GB of storage, a 5.99-inch full-screen display, a front camera plus dual rear cameras, and long standby time, this phone is probably the king of the 1000-yuan class.

Tags: Python Mobile MongoDB JQuery

Posted on Tue, 12 Nov 2019 01:50:17 -0500 by ego0