[python] Batch-synchronize Halo blog articles to CSDN


At present, it is hard for a new personal blog to get indexed by search engines. Even after registering the site with each search engine's webmaster platform, submitting a sitemap, adding analytics code and click-heatmap scripts, and doing SEO, the site still does not turn up in searches.

To give your articles more exposure, you can exchange friend links with other sites, but ask yourself: when you land on an article, do you ever look at the site's friend links? Of course, a friend link is also indexed by search engines as part of the page content, but that only helps if the site carrying your link already gets plenty of exposure itself.

So I came up with a plan to increase exposure: post my articles on the major blog platforms and occasionally cite the address of my personal blog in them, which raises the chance of being found through search engines.

I started blogging with Halo, and I have also had a CSDN account for 13 years, so I wanted to synchronize my Halo articles to CSDN in batches.

To avoid reinventing the wheel, I naturally searched GitHub first.

First, I learned that the major blog platforms used to share a unified standard interface, and even Microsoft Word could publish directly to a blog through it. A few years ago, however, the Chinese blog sites closed this standard interface without offering an official public API, so the only option left is to simulate the HTTP requests that their web pages send.

I did not find a ready-made tool for this, but I did find a node.js program that sends local Markdown articles to the major blog sites through their APIs.
I downloaded it and had a look. You first open the page and log in manually, and the program saves the cookie. You then select an md file and the blog site to publish it to; the program simulates the HTTP publish request and verifies whether publishing succeeded.

That program depends on many node.js libraries, takes up a lot of disk space and has many features, so its code is somewhat complex. Articles also have to be exported to md files before it can publish them.

So I decided to write a simple one in Python.


1. First of all, Halo has an API. As long as the API function is enabled and api_access_key is set in the blog settings, you can use the various APIs that Halo provides.

API documentation https://api.halo.run/

2. First get the article list through /api/content/posts. For this interface, set size to a large value, at least the number of articles published on the Halo blog, so that the whole list comes back in one request without paging.

3. Parse the article_id and title of each article; at this point you can drop the articles you do not want to synchronize.

4. Then call /api/content/posts/$article_id with each article_id to get the details of the article (the article id is not displayed on the Halo pages...).

5. Extract the markdown text of the article

6. Since a Halo blog can use various relative references, for example to attachments and to other articles, the relative references in the article content must be replaced with absolute ones.

7. Logging in to CSDN requires a user name, a password and a slider CAPTCHA. Python could handle that, but obtaining the cookie manually is no trouble, so it is simpler to log in manually and save the cookie yourself.

8. Analyzing the HTTP request that CSDN sends to save a draft shows that only a few parameters are needed. Publishing actually calls the same interface with several more parameters, such as the article id, whether the article is original, the category and the tags, and the status changes from draft to published. The node.js program mentioned above reproduces 100% of the page's request parameters, uses an if to distinguish saving a draft from publishing, and even when publishing first simulates saving a draft to obtain the id. In my view, apart from the different status, the id does not need to be passed and the other parameters can all be sent when saving the draft, so there is no need to distinguish draft from publishing at all: every other value to be transmitted is exactly the same.
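The idea in step 8 can be sketched as a single payload builder in which only the status differs between a draft and a published article. This is a minimal sketch, not CSDN's documented API: the helper name `build_payload` is hypothetical, and the field names `markdowncontent`, `content` and `status` are assumptions about what the browser sends. The status codes 0 and 2 and the value `"original"` are the ones that appear in the script in this article.

```python
import json

# Status codes used by the script in this article: 0 = published, 2 = draft.
STATUS_PUBLISHED = 0
STATUS_DRAFT = 2


def build_payload(title, markdown_text, html_text, status):
    """Build one request body for both saving a draft and publishing.

    The field names are assumptions taken from the browser request,
    not from any official CSDN documentation.
    """
    return {
        "title": title,
        "markdowncontent": markdown_text,  # assumed field: Markdown source
        "content": html_text,              # assumed field: rendered HTML
        "type": "original",
        "status": status,                  # the only value that changes
    }


draft = build_payload("Hello", "# Hello", "<h1>Hello</h1>", STATUS_DRAFT)
published = build_payload("Hello", "# Hello", "<h1>Hello</h1>", STATUS_PUBLISHED)

# Everything except "status" is identical, so no draft/publish branch is needed.
differing_keys = [k for k in draft if draft[k] != published[k]]
body = json.dumps(published)  # what would be sent as the POST body
```

Because the two payloads differ in one key only, the caller can pick the status once in the configuration and reuse the same function for every article.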


In the following code I add the article list manually, because I only need to synchronize specific articles and the article id can only be looked up through the interface anyway.
If you need to synchronize the whole site, switch to obtaining the article list through the interface.
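For whole-site synchronization, the article list could be obtained along these lines. This is a sketch under assumptions: the function names are hypothetical, the default `size=1000` is an arbitrary "large enough" value, and the standard-library `urllib` is used here instead of `requests` so the example has no third-party dependency. The `data` / `content` / `title` response layout and the `size` and `api_access_key` query parameters are the ones this article already relies on.

```python
import json
import urllib.parse
import urllib.request


def build_list_url(base_url, api_access_key, size):
    """Build the URL of the post-list endpoint with a large page size."""
    query = urllib.parse.urlencode({"size": size, "api_access_key": api_access_key})
    return base_url.rstrip("/") + "/api/content/posts?" + query


def extract_titles(list_response, skip=()):
    """Return the titles found in a post-list response, minus the skipped ones."""
    posts = list_response["data"]["content"]
    return [p["title"] for p in posts if p["title"] not in skip]


def fetch_all_titles(base_url, api_access_key, size=1000, skip=()):
    """Download the post list in one request and return the titles to synchronize."""
    url = build_list_url(base_url, api_access_key, size)
    with urllib.request.urlopen(url) as response:
        return extract_titles(json.load(response), skip)


# Offline illustration with a hand-written response of the same shape.
sample = {"data": {"content": [{"id": 1, "title": "A"}, {"id": 2, "title": "B"}]}}
titles = extract_titles(sample, skip={"B"})
```

The result of `fetch_all_titles(...)` would then take the place of the manually filled title list, with `skip` holding the articles that should not be synchronized.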


# -*- coding: utf-8 -*-
#Function: batch synchronize halo blog posts to csdn
#Date: October 3, 2021 
#Author: Dark Athena
#email : darkathena@qq.com

"""
Copyright DarkAthena(darkathena@qq.com)
   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at
       http://www.apache.org/licenses/LICENSE-2.0
   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
"""
import requests,json,time
import markdown,urllib 
#Parameter configuration
#1. Base URL of your halo blog; change it to your own (placeholder value)
url_halo_main = 'https://your-halo-blog.example'
#2. api_access_key configured in the halo admin console (placeholder value)
halo_key = 'your_api_access_key'
#3. Text file with the cookie of a logged-in csdn session, saved manually in advance (copy it from the Chrome F12 developer tools); the file name is a placeholder
cookie = open('csdn_cookie.txt', encoding='utf-8').read().strip()
#4. Status: 0 published, 2 draft (the variable name is assumed; draft is a cautious placeholder)
status = 2
#5. Full titles of the articles to synchronize; add your own
title_list = []
title_list.append('[python]Automatic replacement of local HOSTS in github.com of ip Point to minimum delay ip')
title_list.append('[[cloud] object storage service Amazon cloud S3,Tencent cloud cos,Alibaba cloud oss Sorting out the use of command line tools')
title_list.append('[ORACLE]On polymorphic table functions PTF(Polymorphic Table Functions)Use of')

#csdn API for saving an article; do not change
csdn_url = 'https://blog-console-api.csdn.net/v1/mdeditor/saveArticle'

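#Get the content of a halo blog post
def get_content(title,key):
    url_search =url_halo_main+'/api/content/posts'
    url_get_content =url_halo_main+'/api/content/posts/articleId'
    #Find article list; passing params lets requests URL-encode the title
    s =requests.get(url_search,params={'keyword':title,'api_access_key':key})
    t =json.loads(s.text)
    #halo search may return several articles, so match the title exactly to find the unique articleId
    article_id =None
    for i in t['data']['content'] :
        if i['title']==title:
            article_id =i['id']
            break
    if article_id is None:
        raise ValueError('article not found: '+title)
    response =requests.get(url_get_content.replace('articleId',str(article_id)),params={'api_access_key':key})
    t =json.loads(response.text)
    #Get original article content (assumes the Markdown source is in the originalContent field)
    content =t['data']['originalContent']
    #Change the relative paths in the article into absolute paths (assumes Markdown links and images that start with "/")
    newcontent =content.replace('](/',']('+url_halo_main+'/')
    #csdn does not recognize the language type of some code blocks; rename such tags here if needed
    return newcontent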

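#Sync to csdn
def push_csdn(title,content):
    #The body field names markdowncontent, content and status are assumptions; status is parameter 4 above
    data = {"title":title,
            "markdowncontent":content,
            "content":markdown.markdown(content),
            "status":status,
            "type": "original"}

    headers={"content-type": "application/json;charset=UTF-8",
             "cookie": cookie,
             "user-agent": "Mozilla/5.0"}
    response = requests.post(csdn_url,data=json.dumps(data),headers=headers)
    result = json.loads(response.text)
    return result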

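try:
    for title in title_list:
        content = get_content(title,halo_key)
        result = push_csdn(title,content)
        if result["code"]==200:
            print(title+': synchronized')
        else:
            print(title+': failed, '+str(result))
        time.sleep(30)#csdn returns an error when two articles are saved less than 30 seconds apart
except Exception as e:
    print(e)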

First save the cookie file for your CSDN login, then modify parameters 1 to 5 in the code above, and then run the script directly.
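Loading the saved cookie file might look like the sketch below. It assumes the file holds the raw value of the `cookie` request header copied from the browser's developer tools, possibly with a leading `cookie:` label and stray line breaks; the helper names `load_cookie` and `build_headers` are hypothetical. The three headers are the same ones the script above sends.

```python
def load_cookie(path):
    """Read a cookie string saved from the browser and make it header-safe."""
    with open(path, encoding="utf-8") as f:
        text = f.read().strip()
    # Drop a leading "cookie:" label if it was copied along with the value.
    if text.lower().startswith("cookie:"):
        text = text[len("cookie:"):]
    # Collapse line breaks and repeated spaces; header values must be one line.
    return " ".join(text.split())


def build_headers(cookie):
    """Return the request headers used when saving an article."""
    return {
        "content-type": "application/json;charset=UTF-8",
        "cookie": cookie,
        "user-agent": "Mozilla/5.0",
    }
```

A cookie copied this way expires when the CSDN session does, so the file has to be refreshed by logging in again once requests start failing.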


Later I also plan to write code to synchronize to Blog Park (cnblogs) and ITPUB, and then see whether all of this can be integrated into the Halo admin console.

Tags: Python

Posted on Tue, 05 Oct 2021 21:08:51 -0400 by modigy