Python Crawler: Get All the Info of a WeChat Public Account and Batch-Download Its Videos

I've written a similar article before: crawling a WeChat public account with Python in the simplest way.

Readers have kept asking questions since then, so some small details were probably unclear. This time I'll go through those details thoroughly.

This article also adds batch downloading of the videos in a public account, so that an entire account can be copied. This is a risky operation, so please do not actually do it! Thank you.

Main functions:
How to crawl a WeChat public account the simple way
Get the information: title, summary, cover, article URL
Automatically download the videos in a public account in bulk

The public account chosen this time: Bear Kid and Pop
It posts new videos every day: bear kid routines, pet routines, and funny videos of bear kids and pets. A good laugh always helps!

Allow me to squeeze in a bit of advertising:

The public account I crawl is the same one every time. It is the account from a year ago; only its topic and name have changed.
A programmer who likes pets but can't keep a cat can still enjoy watching them after work. Take a look!
Friendly reminder: watermarks have been added to the videos to protect them and to keep them from being lost.

1. Get Public Account Information: Title, Summary, Cover, Article URL
Steps:
1. Register a public account of your own first.
2. Log in to the account, create a new article, and click the hyperlink button.
3. In the search box that pops up, search for the public account you want and open its historical articles.
4. Capture the network traffic to locate the request URL that returns the information.

Looking at the response, we find everything we need: the title, summary, cover, and article URL. So this is the URL we want. After clicking to the next page and capturing the URL several times, we see that only the random and begin parameters change.

So the URL for the main information is settled.
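The paging behaviour described above can be sketched roughly like this. It is only an illustration, not the article's own code: the helper name `build_list_url` is hypothetical, the token and fakeid values are simply the ones that appear in this article's request, and treating `random` as "any float between 0 and 1" is an assumption based on the captured value.

```python
import random
from urllib.parse import urlencode

BASE_URL = "https://mp.weixin.qq.com/cgi-bin/appmsg"


def build_list_url(token, fakeid, page, count=5):
    """Build the article-list URL for a zero-based page index (hypothetical helper)."""
    params = {
        "token": token,
        "lang": "zh_CN",
        "f": "json",
        "ajax": 1,
        "random": random.random(),  # assumption: any float in [0, 1) is accepted
        "action": "list_ex",
        "begin": page * count,      # paging offset: 0, 5, 10, ...
        "count": count,
        "query": "",
        "fakeid": fakeid,
        "type": 9,
    }
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    for page in range(3):
        print(build_list_url("1904193044", "MzI4MzkzMTc3OA==", page))
```

Only `begin` (and `random`) differ between the printed URLs, which matches what the captured requests show.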

Let's get started:

It turns out that the values we need to change are token, random, and cookie.

All of these can be read from the captured request at the moment you obtain the URL.

# -*- coding: utf-8 -*-
import csv
import json
import re

import jsonpath
import requests

headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36",
    "Host": "mp.weixin.qq.com",
    "Referer": "https://mp.weixin.qq.com/cgi-bin/appmsg?t=media/appmsg_edit&action=edit&type=10&isMul=1&isNew=1&lang=zh_CN&token=1862390040",
    # Replace with the cookie from your own captured request
    "Cookie": "your own cookie, copied when you captured the request"
}


def getInfo():
    for i in range(80):
        # token and random must be replaced with your own values;
        # begin is the paging offset and grows by count (5) per page
        url = "https://mp.weixin.qq.com/cgi-bin/appmsg?token=1904193044&lang=zh_CN&f=json&ajax=1&random=0.9468236563826882&action=list_ex&begin={}&count=5&query=&fakeid=MzI4MzkzMTc3OA%3D%3D&type=9".format(i * 5)

        response = requests.get(url, headers=headers)
        jsonRes = response.json()

        # jsonpath.jsonpath returns False when nothing matches
        titleList = jsonpath.jsonpath(jsonRes, "$..title") or []
        coverList = jsonpath.jsonpath(jsonRes, "$..cover") or []
        urlList = jsonpath.jsonpath(jsonRes, "$..link") or []
        if not titleList:
            break  # no more articles, or the request was rejected

        # Append one CSV row per article; csv.writer quotes titles containing commas
        with open("info.csv", "a", encoding="gbk", errors="replace", newline='') as f:
            writer = csv.writer(f)
            for title, cover, link in zip(titleList, coverList, urlList):
                writer.writerow([title, cover, link])

Result (success):
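The code above collects titles, covers, and links as three separate lists and leaves out the summary that the section heading mentions. As a rough alternative sketch, each article could be read as one record so the fields cannot drift out of step. Note the assumptions: the helper `extract_articles` is hypothetical, and the field names `app_msg_list` and `digest` are not shown anywhere in this article, so check them against your own captured response before relying on them.

```python
def extract_articles(json_res):
    """Return one dict per article (hypothetical helper).

    Assumption: the articles live under "app_msg_list" and the summary is
    stored as "digest"; verify both against your own captured response.
    """
    articles = []
    for item in json_res.get("app_msg_list", []):
        articles.append({
            "title": item.get("title", ""),
            "digest": item.get("digest", ""),
            "cover": item.get("cover", ""),
            "link": item.get("link", ""),
        })
    return articles


if __name__ == "__main__":
    # Made-up sample shaped like the assumed response, not real data
    sample = {
        "app_msg_list": [
            {
                "title": "demo",
                "digest": "summary",
                "cover": "https://example.com/cover.jpg",
                "link": "https://example.com/article",
            }
        ]
    }
    for article in extract_articles(sample):
        print(article)
```

Missing fields fall back to an empty string, so a partially filled record does not stop the loop.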

2. Get the Videos in Articles: Batch Download

By analyzing a single video article, I found this link:

Opening it in a browser reveals a web download link for the video:

Interesting. The video has a plain web link that can be used for downloading, so let's get started.

The link contains a key parameter, vid, but where does it come from?
It has nothing to do with the other information we have collected, so we just have to dig for it.

It turns out that this parameter appears in the response to the request for a single article's URL, and it can be extracted from there.

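response = requests.get(url_wxv, headers=headers)

# Either a regular expression or XPath would work; a regex is used here
pageText = response.text  # Match, e.g.: wxv_1105179750743556096
dirRe = r"wxv_.{19}"
result = re.search(dirRe, pageText)

# re.search returns None when the article contains no video id
wxv = result.group(0) if result else None
print(wxv)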

Video download:

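import os


def getVideo(video_title, url_wxv):
    os.makedirs('./videoFiles', exist_ok=True)
    video_path = './videoFiles/' + video_title + ".mp4"

    # Extract the vid (wxv_...) from the article page, as shown above
    page = requests.get(url_wxv, headers=headers).text
    result = re.search(r"wxv_.{19}", page)
    if result is None:
        return  # this article has no video
    wxv = result.group(0)

    # Ask the player endpoint for the downloadable video URL
    video_url_temp = "https://mp.weixin.qq.com/mp/videoplayer?action=get_mp_video_play_url&preview=0&__biz=MzI4MzkzMTc3OA==&mid=2247488495&idx=4&vid=" + wxv
    response = requests.get(video_url_temp, headers=headers)
    content = response.json()
    url_info = content.get("url_info")
    if not url_info:
        return  # no playable URL in the response
    video_url2 = url_info[0].get("url")
    print(video_url2)

    # Request the video URL and download it
    video = requests.get(video_url2)
    # .content returns the response body as bytes (binary data)
    with open(video_path, 'wb') as f:
        f.write(video.content)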
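One detail worth watching: getVideo builds the file path directly from the article title, and titles may contain characters such as / or ? that are not allowed in file names. A minimal sketch of one way to handle that is below. The helper `safe_filename` is hypothetical and not part of the original code, and the set of replaced characters (the ones Windows forbids) is my own choice.

```python
import re


def safe_filename(title, max_length=100):
    """Make an article title usable as a file name (hypothetical helper)."""
    # Replace characters that Windows forbids in file names, plus control characters
    cleaned = re.sub(r'[\\/:*?"<>|\x00-\x1f]', "_", title)
    # Truncate, then drop surrounding whitespace and trailing dots,
    # which Windows also rejects at the end of a name
    cleaned = cleaned[:max_length].strip().rstrip(". ")
    return cleaned or "untitled"


if __name__ == "__main__":
    print(safe_filename('cat/dog: who wins?'))
```

It could be applied to video_title before the path is built, for example `'./videoFiles/' + safe_filename(video_title) + ".mp4"`.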

Now all the pieces are in place, and the code can be assembled:

a. Get the public account's article information
b. Pick out each individual article
c. Get the vid from the article
d. Build the video download URL
e. Download the video and save it
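The steps above can be sketched as a single loop. This is a sketch under stated assumptions rather than the author's assembled code: `crawl_account`, `fetch_page`, and `download` are hypothetical names, and the network calls are passed in as functions so the flow can be tried offline. In a real run they would wrap the requests.get calls shown earlier.

```python
import re

VID_PATTERN = re.compile(r"wxv_.{19}")  # same pattern as used earlier in this article


def crawl_account(articles, fetch_page, download):
    """Run steps b to e for a list of articles (hypothetical helper).

    articles:   iterable of (title, article_url) pairs from step a
    fetch_page: callable taking an article URL and returning its page text
    download:   callable taking (title, vid) that fetches and saves the video
    Returns the titles of the articles for which download was called.
    """
    downloaded = []
    for title, article_url in articles:
        page_text = fetch_page(article_url)       # b. one article at a time
        match = VID_PATTERN.search(page_text)     # c. look for the vid
        if match is None:
            continue                              # the article has no video
        download(title, match.group(0))           # d and e. build URL, download, save
        downloaded.append(title)
    return downloaded


if __name__ == "__main__":
    # Fake pages stand in for real requests so the flow can be tried offline
    pages = {
        "u1": "<html>... wxv_1105179750743556096 ...</html>",
        "u2": "<html>text only</html>",
    }
    done = crawl_account(
        [("with video", "u1"), ("no video", "u2")],
        fetch_page=pages.get,
        download=lambda title, vid: print("would download", title, vid),
    )
    print(done)
```

Keeping the network functions as parameters also makes it easy to add a delay or retry logic in one place later.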

Results of running the code:

Retrieved from the public account: title, summary, cover, video.

In other words, everything in a video-based public account can be copied out completely.

This is a risky operation, do not actually do it! Remember! Remember! Remember!

To get the code, reply to the public account with: 20191210 or public number code


Posted on Mon, 09 Dec 2019 22:32:58 -0500 by zesoft.net