I used Python to crawl 1,000 love letters to help my roommate confess to Banhua, but things took turn after turn... It turns out this was Banhua's ultimate secret!

Hello, I'm La Tiao. This is the second-to-last article in the Banhua series; the next one will conclude it.

Preface

 

Last time, after the mix-up caused by the email script I wrote for Banhua's mailbox, I felt guilty. I had failed to help several times, graduation was approaching, and I didn't want my roommate to leave with regrets. A man should make at least one grand confession, so I thought of the most old-fashioned way to confess: write a love letter! Think about it, they were all the rage back in junior and senior high school. So I crawled a love-letter website and got my roommate a handwritten ten-thousand-character love letter. I figured it would simply either succeed or fail. I never expected things to get complicated. It turns out this was Banhua's ultimate secret... Where do the three of us go from here?

 

Goal of this post

Crawl love-story titles and article content

Crawl target

http://www.qingshu.so/aiqing/aqgs.html

 

Tools used

Development tool: PyCharm
Development environment: Python 3.7, Windows 10
Libraries: requests, pyquery

Key learning points

1. requests
2. Anti-scraping request-header settings
3. Saving crawled files to disk

Page analysis

Press F12 to open the browser developer tools. The page source shows that the site serves its content as static HTML, so the data can be parsed straight from the response.

 

Before running the code, install the third-party libraries requests and pyquery:

pip install requests pyquery

 

 

 

import requests
from pyquery import PyQuery as pq


url = 'http://www.qingshu.so/aiqing/aqgs.html'
headers = {
    # Tell the server which content types we accept
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'Referer': 'http://www.qingshu.so/bbqs.html',  # anti-hotlinking check: tells the server where the request came from
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.61 Safari/537.36'  # browser identity
}
# Send the request; content.decode('utf-8') decodes the raw response bytes
response = requests.get(url, headers=headers).content.decode('utf-8')
print(response)
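A note on why the code uses `.content.decode('utf-8')` instead of `response.text`: `response.text` relies on requests' guess of the page encoding, and Chinese sites often declare it incorrectly, so decoding the raw bytes explicitly is safer. A general illustration, not code from the original post; the bytes below stand in for `response.content`:

```python
# response.content is raw bytes; decoding them explicitly avoids relying on
# requests' encoding guess (response.text), which a mis-declared charset can defeat
raw = '情书大全'.encode('utf-8')  # stand-in for response.content

text = raw.decode('utf-8')  # explicit, correct decode
print(text)                 # 情书大全

# What you would get if the wrong encoding were guessed: mojibake, not the title
guessed = raw.decode('gbk', errors='replace')
print(text == guessed)      # False
```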

Detail page address extraction

 

doc = pq(response)  # create a PyQuery object
details = doc('.t a').items()  # CSS class selector: the <a> tags inside elements with class='t' ('.' selects a class, '#' selects an id)
for i in details:
    href = i.attr('href')  # extract the href attribute from the <a> tag
    urls = 'http://www.qingshu.so' + href  # build the full detail-page URL
    print(urls)
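String concatenation works here because the hrefs on this listing page are site-absolute, but `urllib.parse.urljoin` handles both absolute and relative hrefs correctly and is the more robust choice. A general sketch (the detail-page path is made up for illustration):

```python
from urllib.parse import urljoin

base = 'http://www.qingshu.so/aiqing/aqgs.html'

# Site-absolute href, as on the listing page
print(urljoin(base, '/aiqing/12345.html'))  # http://www.qingshu.so/aiqing/12345.html

# A relative href would also resolve correctly against the listing page
print(urljoin(base, '12345.html'))          # http://www.qingshu.so/aiqing/12345.html
```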

Detail page

 

Detail page code:

response = requests.get(urls).content.decode('utf-8')
doc = pq(response)
title = doc('.a_title').text()  # article title
content = doc('.a_content.clearfix').text()  # article body
print(title)
print(content)

Article storage

import os


def Save(title, content):
    '''
    Save an article to disk
    :param title: article title
    :param content: article body
    :return:
    '''
    path = './Love letter net article/'
    if not os.path.exists(path):  # create the folder if it does not exist
        os.makedirs(path)
    with open(path + '{}.txt'.format(title), 'a', encoding='utf-8') as f:
        f.write(content)
    print('{} download completed....'.format(title))
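One caveat the storage function glosses over: article titles can contain characters that are illegal in Windows file names (`\ / : * ? " < > |`), which would make `open()` fail. A hedged sketch of a sanitizer you could call on `title` before building the path (the function name `safe_filename` is my own, not from the original post):

```python
import re

def safe_filename(title):
    # Replace characters Windows forbids in file names with an underscore,
    # and strip surrounding whitespace left over from the page text
    return re.sub(r'[\\/:*?"<>|]', '_', title).strip()

print(safe_filename('Love? Letters: part 1'))  # Love_ Letters_ part 1
```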

Full code

The code above was getting messy, so here it is refactored into simple functions to decouple the logic:

import requests
from pyquery import PyQuery as pq
import os


def Tools(url):
    '''
    Request helper
    :param url: request URL
    :return: decoded response text
    '''
    headers = {
        # Tell the server which content types we accept
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
        # 'Referer': 'http://www.qingshu.so/bbqs.html',  # anti-hotlinking check: tells the server where the request came from
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.61 Safari/537.36'  # browser identity
    }
    # Send the request; content.decode('utf-8') decodes the raw response bytes
    response = requests.get(url, headers=headers).content.decode('utf-8')
    return response


def Save(title, content):
    '''
    Save an article to disk
    :param title: article title
    :param content: article body
    :return:
    '''
    path = './Love letter net article/'
    if not os.path.exists(path):  # create the folder if it does not exist
        os.makedirs(path)
    with open(path + '{}.txt'.format(title), 'a', encoding='utf-8') as f:
        f.write(content)
    print('{} download completed....'.format(title))


def Details(urls):
    '''
    Request a detail page and extract the title and body
    :param urls: detail-page URL
    :return:
    '''
    response = Tools(urls)
    doc = pq(response)
    title = doc('.a_title').text()  # article title
    content = doc('.a_content.clearfix').text()  # article body
    Save(title, content)


url = 'http://www.qingshu.so/aiqing/aqgs.html'
response = Tools(url)
doc = pq(response)  # create a PyQuery object
details = doc('.t a').items()  # CSS class selector: the <a> tags inside elements with class='t' ('.' selects a class, '#' selects an id)
for i in details:
    href = i.attr('href')  # extract the href attribute from the <a> tag
    urls = 'http://www.qingshu.so' + href  # build the full detail-page URL
    Details(urls)
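The loop above hits the site as fast as it can run. A short pause between requests, plus a simple retry wrapper for transient failures, keeps the crawl polite and more robust. A general-purpose sketch, not from the original post; the helper name `with_retry` is my own:

```python
import time

def with_retry(func, attempts=3, delay=0.5):
    # Call func(); on failure, wait `delay` seconds and try again,
    # up to `attempts` times, re-raising the last error if all fail
    last_error = None
    for _ in range(attempts):
        try:
            return func()
        except Exception as e:
            last_error = e
            time.sleep(delay)
    raise last_error

# Usage inside the crawl loop (sketch):
# for i in details:
#     href = i.attr('href')
#     with_retry(lambda: Details('http://www.qingshu.so' + href))
#     time.sleep(1)  # be polite: pause between detail-page requests
```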

 

Ending

The next day, while I was eagerly waiting for news from my roommate, I received a message from Banhua instead. I was stunned... completely stunned.

I collected my thoughts. So Banhua's secret was really what I think it was??? It turns out her ultimate secret is that she once liked me, a hopelessly oblivious straight guy!!! Even TV dramas wouldn't dare film a plot this dramatic.

My head is a mess... I'll slowly figure out how to deal with this.

 

Previous posts in this series:

I changed Banhua's boot password in Python, and after she logged in again I discovered her secret!

I collected Banhua's Qzone data in Python, and beyond the pretty photos I found another of her secrets!

My roommate's pursuit of Banhua failed, so I crawled a website and sent it to him as an instant cure. A man's happiness is that simple [once a day, forget your first love]

I wrote an email script in Python and sent it to Banhua. Unexpectedly, things blew up

Answers to your most common questions:

Q: Are the Banhua and roommate in this series real, or pure fiction?

A: I answered this long ago, including in my 10k-fan summary and half-year summary, where I explained my creative process in detail, so I won't repeat it here. Honestly, whatever I say about what is true, you won't fully believe it, and a few screenshots wouldn't prove anything either. What I can tell you is this: the first article in the Banhua series is where I found my own style, weaving everyday life into technical blogging, and that greatly fueled my inspiration. I exaggerate things from life and put them into the posts. If you believe they're real, then learn the techniques involved; if you don't, then read the story while learning the techniques anyway.

Q: Is it really okay to expose other people's privacy in these articles?

A: First of all, I have blurred all information about Banhua, my roommate, and myself, whether photos or anything else. As for getting people's consent: I honestly didn't think about it in the first two articles, but for this one I did obtain consent. Because it touches on the content of the article, I won't reveal the details yet; I'll make them public in the final article. Does that put you at ease?

Q: Is it appropriate to post "story articles" on a technology blog site? Do such articles have any value there?

A: First, what counts as a "story article"? Do my Banhua posts not talk about technology? On the contrary, technology is the core of my articles; the story only serves the technology. Yet your eyes always land on the story and ignore the large amounts of code and analysis. Second, what is the standard for value? You can write a deep, brilliant technical article that zero people read; then you haven't found the value of your output, the needs of your audience, or the core of content creation. The judgment of a blog's value lies with its readers: if readers like it, it brings traffic to the site and followers to you. Only when your articles are read, commented on, and bookmarked do they show their value. Of course, if you blog purely out of passion, for your own study and notes, then none of this applies to you; I write for my fans and my audience.

What really keeps me writing is my readers' recognition. Their comments, likes, and bookmarks tell me that the content I produce is valuable, accepted by others, and resonates with them.

 

Tags: Python, crawler

Posted on Fri, 01 Oct 2021 15:56:36 -0400 by pugs1501