Tiktok hot search and topic crawler

This post shows how to scrape the Tiktok hot search list and the related topic data.

Packet capture tool: Charles

Emulator: MuMu emulator

Tiktok hot search

1: Obtain the interface directly with the packet capture tool.

Copy the captured interface address (simplified here): https://aweme-hl.snssdk.com/aweme/v1/hot/search/list/

A plain GET request to this address returns the hot search data.

2: Get the interface through the hot search sharing page. Click the share option in the upper right corner, copy the link, and open it in a browser.

The link opens this page: https://www.iesdouyin.com/share/billboard/

The interface address can be found there as well, and it accepts a direct GET request: https://www.iesdouyin.com/web/api/v2/hotsearch/billboard/word/
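Before writing a parser for either interface, it helps to look at the shape of the JSON that comes back. The following is a minimal sketch of a helper that lists keys and value types. The sample payload is hypothetical; it only borrows the field names (data, word_list, word, hot_value) used in the code part of this post, and the real response may contain more fields.

```python
def summarize(payload, depth=0, max_depth=3):
    """Return lines describing the keys and value types of decoded JSON."""
    lines = []
    indent = "  " * depth
    if isinstance(payload, dict):
        for key, value in payload.items():
            lines.append(f"{indent}{key}: {type(value).__name__}")
            if depth < max_depth:
                lines.extend(summarize(value, depth + 1, max_depth))
    elif isinstance(payload, list) and payload:
        # Only the first element is inspected; lists are assumed homogeneous.
        lines.append(f"{indent}[0]: {type(payload[0]).__name__}")
        if depth < max_depth:
            lines.extend(summarize(payload[0], depth + 1, max_depth))
    return lines


# Hypothetical sample, not a captured response.
sample = {"data": {"word_list": [{"word": "example", "hot_value": 123}]}}
print("\n".join(summarize(sample)))
```

In practice the argument would be the decoded JSON from a GET request to one of the addresses above.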

Topic data for a hot search entry

Click an entry in the hot search list to see its topic data. The play count shown in the upper right corner comes from: https://aweme-hl.snssdk.com/aweme/v1/hot/search/list/?&source=3&os_api=23&version_code=860

Looking through the other data interfaces, we copy this link (simplified here): https://aweme-hl.snssdk.com/aweme/v1/hot/search/video/list/?hotword=Wu Yifan neck

The response contains the data we want, such as the total number of participants in the current topic. A GET request to the interface returns data that can be parsed directly.
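Because a hot word can contain spaces or non-ASCII characters, it is safer to percent-encode it than to paste it into the address by hand. A small sketch using only the standard library; the function name build_hot_video_url is made up for illustration, and only the hotword parameter from the simplified link is included.

```python
from urllib.parse import quote, urlencode

HOT_VIDEO_LIST = "https://aweme-hl.snssdk.com/aweme/v1/hot/search/video/list/"


def build_hot_video_url(hot_word):
    """Return the video-list URL for one hot search word, percent-encoded."""
    # quote_via=quote encodes spaces as %20 rather than '+'.
    query = urlencode({"hotword": hot_word}, quote_via=quote)
    return f"{HOT_VIDEO_LIST}?{query}"


print(build_hot_video_url("Wu Yifan neck"))
```

The full captured link carries more parameters than this simplified form, so the real interface may expect additional ones.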

The hot search data is easy to obtain. For a specified topic, however, the request carries some encrypted parameters that I have not worked out yet. If you understand them, feel free to leave a comment.

To capture topic data we therefore had to find another way, and somewhat unexpectedly, other interfaces did turn up.

Getting data for a specified topic

Take one topic as an example:

What we need is the play count and the videos for the topic. Packet capture reveals the following interface: https://aweme-hl.snssdk.com/aweme/v1/challenge/detail/?query_type=0&ch_id=1635753360881672

We need the ch_id here to get the data. How can this ch_id be obtained quickly and easily? After some analysis, I found that the ch_id of the topic "starting from the earth", 1635753360881672, can be found in the details of the relevant user.

So we use the same trick as before: get the link to the sharing page and open it in the browser.

Then inspect the interface data on the sharing page.

Sure enough, the id we need is there. That raises a new problem: how to obtain the detailed data of the sharing page. For that, refer to the previous blog post:

If you are interested, see: Tiktok user information crawling case

Video details under a topic: how do we get the details of the videos under a topic? Back in the emulator, there is a share option at the top right.

Copy the link, open it in a browser, and the data we need shows up in the interface:

https://www.iesdouyin.com/share/challenge/1635753360881672

Look at the parameters of this interface:

ch_id is already known, and _signature was explained in a previous article, so I won't repeat it here.

If you are interested, see: Tiktok video sharing page signature
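In the example above, the number at the end of the sharing link is the same as the ch_id used by the detail interface. Assuming that holds for other topics too (this post only confirms it for this one), the id can be pulled out of a copied share link and turned into a detail address. The helper names below are made up for illustration.

```python
import re
from urllib.parse import urlencode, urlparse

CHALLENGE_DETAIL = "https://aweme-hl.snssdk.com/aweme/v1/challenge/detail/"


def extract_ch_id(share_url):
    """Return the numeric id ending a /share/challenge/<id> link, or None."""
    path = urlparse(share_url).path
    match = re.search(r"/share/challenge/(\d+)/?$", path)
    return match.group(1) if match else None


def build_detail_url(ch_id):
    """Build the topic detail URL with the query parameters shown in this post."""
    query = urlencode({"query_type": 0, "ch_id": ch_id})
    return f"{CHALLENGE_DETAIL}?{query}"


ch_id = extract_ch_id("https://www.iesdouyin.com/share/challenge/1635753360881672")
print(ch_id)
print(build_detail_url(ch_id))
```

Query strings appended to a copied share link are ignored, because only the path part of the URL is matched.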

Code

The sample code is deliberately simple; improve it as needed for your own use.

Hot search list data:

import pprint
import requests
# Tiktok hot search list interface
hot_search = 'https://aweme-hl.snssdk.com/aweme/v1/hot/search/list/?detail_list=1'
# Mobile browser User-Agent sent with the request
headers = {"User-Agent": "Mozilla/5.0 (Linux; Android 5.0; SM-G900P Build/LRX21T) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Mobile Safari/537.36"}
hot_json = requests.get(hot_search, headers=headers, timeout=10).json()
# Collect one {keyword: hot_value} dict per hot search entry
hot_list = []
for data in hot_json['data']['word_list']:
    keyword = data['word']
    hot_value = data['hot_value']
    hot_list.append({keyword: hot_value})
pprint.pprint(hot_list)
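The list of single-key dicts above is fine for printing, but a sorted list of pairs is often easier to work with. A minimal sketch, assuming the response keeps the data / word_list / word / hot_value layout used above; the sample payload is invented so the function can be tried without a network request.

```python
def rank_hot_words(hot_json, top_n=10):
    """Return [(word, hot_value), ...] sorted by hot_value, highest first."""
    word_list = (hot_json.get("data") or {}).get("word_list") or []
    pairs = [
        (entry["word"], entry["hot_value"])
        for entry in word_list
        if "word" in entry and "hot_value" in entry  # skip incomplete entries
    ]
    pairs.sort(key=lambda pair: pair[1], reverse=True)
    return pairs[:top_n]


# Hypothetical payload, not real hot search data.
sample = {"data": {"word_list": [
    {"word": "topic A", "hot_value": 120},
    {"word": "topic B", "hot_value": 450},
    {"word": "incomplete entry"},
]}}
print(rank_hot_words(sample))
```

Using .get with fallbacks means a response without the expected keys yields an empty list instead of a KeyError.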

Reading volume for a hot search word. One of the hot search words is used here as an example.

import requests
headers = {"User-Agent": "Mozilla/5.0 (Linux; Android 5.0; SM-G900P Build/LRX21T) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Mobile Safari/537.36"}
hot_word = 'Lu Han eats and broadcasts'
hot_reading = 'https://aweme-hl.snssdk.com/aweme/v1/hot/search/video/list/'
# Pass the hot word through params so that requests URL-encodes it
hot_json = requests.get(hot_reading, params={'hotword': hot_word}, headers=headers, timeout=10).json()
# Use the third video in the returned list as the example
video = hot_json['aweme_list'][2]
print("Duration:", video['duration'])
print("Heat value:", video['hot_info']['value'])
print("Current ranking:", video['hot_info']['rank'])
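The snippet above reads only the entry at index 2, which raises an IndexError if fewer than three videos come back. A hedged sketch that walks the whole list instead, assuming each entry has the duration and hot_info fields used above; the sample payload is invented for illustration.

```python
def video_stats(hot_json):
    """Return one dict with duration, heat value and rank per listed video."""
    stats = []
    for video in hot_json.get("aweme_list") or []:
        hot_info = video.get("hot_info") or {}
        stats.append({
            "duration": video.get("duration"),
            "value": hot_info.get("value"),
            "rank": hot_info.get("rank"),
        })
    return stats


# Hypothetical payload, not a captured response.
sample = {"aweme_list": [
    {"duration": 15000, "hot_info": {"value": 900, "rank": 1}},
    {"duration": 8000},
]}
for row in video_stats(sample):
    print(row)
```

Missing fields come back as None, so one incomplete entry does not stop the loop.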

Reading volume of a single topic

import requests
# Topic detail interface; ch_id identifies the topic
dy_topic = 'https://aweme-hl.snssdk.com/aweme/v1/challenge/detail/?query_type=0&ch_id=1635753360881672'
headers = {"User-Agent": "Mozilla/5.0 (Linux; Android 5.0; SM-G900P Build/LRX21T) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Mobile Safari/537.36"}
topic_json = requests.get(dy_topic, headers=headers, timeout=10).json()
view_count = topic_json['ch_info']['view_count']  # Reading volume of the topic
print(view_count)

Update

It turns out that the topic signature on the sharing page is generated the same way as the one on the personal home page. In addition, the GET request for topic videos does not require dytk; passing ch_id and _signature is enough.

Posted on Mon, 22 Nov 2021 07:42:24 -0500 by aquilina