Hello, I'm La Tiao. This is the second-to-last article in the Banhua series; the next one will bring the series to a close.
Preface
Last time, after the mix-up with the script I wrote for Banhua's mailbox, I felt bad about it. I had failed to help several times, graduation was around the corner, and I did not want my roommate to leave school with regrets. A man has to confess at least once, so I thought of the most old-fashioned kind of confession: write a love letter! Think about it, love letters were all the rage back in junior and senior high school. So I crawled a love-letter site and put together a ten-thousand-character handwritten love letter for my roommate, figuring the whole thing would either succeed or fail, plain and simple. I just did not expect it to get complicated. It turns out this led to Banhua's ultimate secret... where do the three of us go from here?
The goal of this blog
Scrape the love-story titles and article content.
Crawl target
http://www.qingshu.so/aiqing/aqgs.html
Tools used
Development tool: PyCharm
Environment: Python 3.7, Windows 10
Libraries: requests, pyquery
Key learning points
1. The requests library
2. Setting request headers to get past basic anti-crawling checks
3. Saving the scraped articles to local files
Page analysis
Press F12 to open the browser developer tools. Looking at the page source shows that the site serves its data as a static page, so the article list can be read straight from the HTML.
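A quick sanity-check sketch, using the same libraries installed in the next step: fetch the raw HTML and count the article links (the '.t a' selector is the one used later in this post). This is only a verification aid, not part of the scraper; if the site rejects bare requests, the headers shown below can be added.

import requests
from pyquery import PyQuery as pq

# Fetch the raw HTML and count the article links; a non-zero count confirms
# the list is rendered server-side (static), not loaded by JavaScript.
html = requests.get('http://www.qingshu.so/aiqing/aqgs.html').content.decode('utf-8')
print(len(pq(html)('.t a')))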
Before running the code below:
1. Install the required third-party libraries (requests and pyquery)
2. pip install requests pyquery
import requests
from pyquery import PyQuery as pq

url = 'http://www.qingshu.so/aiqing/aqgs.html'
headers = {
    # Tell the server which content types the client accepts
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    # Anti-hotlinking check: tells the server which page the request came from
    'Referer': 'http://www.qingshu.so/bbqs.html',
    # Browser identification
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.61 Safari/537.36'
}
# Send the request; content.decode('utf-8') decodes the returned bytes
response = requests.get(url, headers=headers).content.decode('utf-8')
print(response)
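One optional check, not in the original script: confirm the request actually succeeded before parsing. A minimal sketch:

resp = requests.get(url, headers=headers)
resp.raise_for_status()  # raises an HTTPError for 4xx/5xx responses, e.g. if the anti-crawl check rejects us
response = resp.content.decode('utf-8')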
Detail page address extraction
doc = pq(response)  # Create the pyquery object
# Use the CSS class selector to pick out the <a> tags under elements with class='t'
# (class selectors use '.', id selectors use '#')
details = doc('.t a').items()
for i in details:
    href = i.attr('href')  # Extract the href attribute from the <a> tag
    urls = 'http://www.qingshu.so' + href  # Full detail-page url
    print(urls)
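The concatenation above assumes every href is a root-relative path. As an optional alternative (not what the original code does), urllib.parse.urljoin builds the full address safely for both relative and absolute links; it would replace the loop above:

from urllib.parse import urljoin

base = 'http://www.qingshu.so/aiqing/aqgs.html'
for i in doc('.t a').items():
    print(urljoin(base, i.attr('href')))  # handles '/xx.html', 'xx.html' and full URLs alike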
Detail page
Detail page code:
response = requests.get(urls).content.decode('utf-8')
doc = pq(response)
title = doc('.a_title').text()  # Title
content = doc('.a_content.clearfix').text()  # Article content
print(title)
print(content)
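Two optional hardening touches, not in the original snippet: pass a timeout so a stalled request cannot hang the script, and pause briefly between detail pages to stay polite to the site.

import time

response = requests.get(urls, headers=headers, timeout=10).content.decode('utf-8')
time.sleep(1)  # wait a second before fetching the next detail page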
Article storage
import os


def Save(title, content):
    '''
    Save an article to disk
    :param title: article title
    :param content: article content
    :return:
    '''
    path = './Love letter net article/'
    if not os.path.exists(path):  # Create the folder if it does not exist
        os.makedirs(path)
    # encoding='utf-8' keeps non-ASCII text intact on Windows
    with open(path + '{}.txt'.format(title), 'a', encoding='utf-8') as f:
        f.write(content)
    print('{} download completed....'.format(title))
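One caveat: the title becomes the file name, and a title containing characters that Windows forbids in file names (such as ? or :) would make open() fail. A small helper of my own, not part of the original code, that could be applied to the title before saving:

import re

def safe_name(title):
    # Replace characters that are invalid in Windows file names with an underscore
    return re.sub(r'[\\/:*?"<>|]', '_', title)

# e.g. inside Save(): open(path + '{}.txt'.format(safe_name(title)), 'a', encoding='utf-8')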
Full code
The code above is a bit scattered, so here it is wrapped into a few simple functions to decouple the logic.
import requests
from pyquery import PyQuery as pq
import os


def Tools(url):
    '''
    Request helper
    :param url: request address
    :return: decoded response text
    '''
    headers = {
        # Tell the server which content types the client accepts
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
        # 'Referer': 'http://www.qingshu.so/bbqs.html',  # anti-hotlinking check: tells the server where the request came from
        # Browser identification
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.61 Safari/537.36'
    }
    # Send the request and decode the returned bytes as UTF-8
    response = requests.get(url, headers=headers).content.decode('utf-8')
    return response


def Save(title, content):
    '''
    Save an article to disk
    :param title: article title
    :param content: article content
    :return:
    '''
    path = './Love letter net article/'
    if not os.path.exists(path):  # Create the folder if it does not exist
        os.makedirs(path)
    # encoding='utf-8' keeps non-ASCII text intact on Windows
    with open(path + '{}.txt'.format(title), 'a', encoding='utf-8') as f:
        f.write(content)
    print('{} download completed....'.format(title))


def Details(urls):
    '''
    Request a detail page and extract the title and article content
    :param urls: detail page address
    :return:
    '''
    response = Tools(urls)
    doc = pq(response)
    title = doc('.a_title').text()  # Title
    content = doc('.a_content.clearfix').text()  # Article content
    Save(title, content)


url = 'http://www.qingshu.so/aiqing/aqgs.html'
response = Tools(url)
doc = pq(response)  # Create the pyquery object
# Use the CSS class selector to pick out the <a> tags under elements with class='t'
# (class selectors use '.', id selectors use '#')
details = doc('.t a').items()
for i in details:
    href = i.attr('href')  # Extract the href attribute from the <a> tag
    urls = 'http://www.qingshu.so' + href  # Full detail-page url
    Details(urls)
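Splitting the script into Tools, Save and Details keeps each concern in one place: if the site later needs cookies or a proxy, only Tools changes; if the storage format changes, only Save changes; the loop at the bottom stays the same.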
Ending
The next day, while I was eagerly waiting for news from my roommate, I instead received a message from Banhua. I was stunned... completely stunned.
Once I gathered my thoughts: so Banhua's secret really is what I think it is??? It turns out that Banhua's ultimate secret is that she once liked me, a hopelessly oblivious straight guy!!! Ah, this... even TV dramas wouldn't dare film a plot this melodramatic.
My head is a mess... I'll take my time figuring out how to deal with this.
Previous articles in this series:
I modified Banhua's boot password in Python and found her secret after she logged in again!
I wrote an email script in Python and sent it to Banhua. Unexpectedly, things blew up.
Answering the main questions from readers:
Are the Banhua and the roommate in this series real, or is it purely a story?
A: I answered this a long time ago, including in my Wanfan summary and my half-year summary, where I described my writing journey in detail, so I won't go over it again here. Honestly, whatever I say about what is true and what is not, you won't simply believe it, and a few screenshots can't prove that the things in my articles really happened. What I can tell you is this: the first article in the Banhua series was the one where I found my own style, weaving everyday life into a technical blog, and that greatly fueled my inspiration. I exaggerate things from life and put them into the posts. If you believe they are true, then learn the technology involved along the way; if you don't, then read it as a story while learning the technology.
Is it really okay to expose other people's privacy in these articles?
A: First of all, I have blurred out everything related to Banhua, my roommate, and myself, whether photos or anything else. As for needing other people's consent before writing about them, I honestly didn't think about that in the first two articles, but for this one I did get consent. Because it touches on the content of the article, I won't make that public yet; I'll reveal it in the final article. Does that put you at ease?
Is it appropriate to post "story articles" on a technology blog site? Does this kind of article have any value there?
A: First, what is the standard for calling something a "story article"? Do my Banhua posts not talk about technology? On the contrary, in my articles the technology is the core and the story only serves it; it's just that your eyes stay on the story and skip over the large blocks of code and analysis. Second, what is the standard for judging value? You can write a deep, impressive technical article that zero people read, which means you haven't found the value of your output, the needs of your audience, or the core of running a content channel. You talk to me about value, yet the judgment of a blog lies with its readers: if readers like it, it brings traffic to the site and quickly attracts followers for you. Only when your articles are actually read, commented on, and bookmarked do they show their value. Of course, if you write purely out of passion, blogging only for your own study and your own use, then never mind what I said; you don't have to read my posts. I write for my fans and my audience.
What really keeps me writing is the recognition from fans: their comments, likes, and bookmarks tell me that what I produce is valuable, is accepted by readers, and resonates with them.