1. Overview
My earlier post, Notes on curl usage (II) -- downloading a picture remotely, describes how to fetch a remote file with curl. In that example, however, fetching the remote data and writing it to the file are mixed together. In a multithreaded scenario this can lead to read-write conflicts. A cleaner approach is to save the remote data to memory first and write it out to the file afterwards. Fetching remote data into a buffer that only the current thread uses can be treated as a read operation, so it causes no conflict. A good strategy is therefore to read all of the data into an in-memory buffer and then write it to the file in a single step.
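The "write in a single step" half of this strategy can be sketched on its own, independent of curl. This is only a minimal illustration: the helper name write_buffer_to_file is hypothetical, and it uses std::ofstream rather than the C file API so that it stays portable.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper (not part of libcurl): writes `size` bytes starting at
// `data` to `path` in binary mode. Returns true only if the whole buffer
// was written and the file closed without error.
bool write_buffer_to_file(const std::string &path, const char *data, std::size_t size)
{
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out)
    {
        return false; // could not open the file
    }

    out.write(data, static_cast<std::streamsize>(size));
    out.close();

    return !out.fail(); // false if either the write or the close failed
}
```

Because the file is opened only after the whole buffer is available, the time during which the file is open and being modified is kept as short as possible.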
2. Implementation
Building on the example from Notes on curl usage (II) -- downloading a picture remotely, the code is as follows:
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <iostream>
#include <curl/curl.h>

using namespace std;

// Memory block structure
struct MemoryStruct
{
    char *memory;
    size_t size;

    MemoryStruct()
    {
        memory = (char *)malloc(1);
        size = 0;
    }

    ~MemoryStruct()
    {
        free(memory);
        memory = NULL;
    }
};

// Write callback: a single request may invoke it many times
size_t HttpPostWriteBack(void *contents, size_t size, size_t nmemb, void *userp)
{
    size_t realsize = size * nmemb; // number of bytes delivered by this call
    struct MemoryStruct *mem = (struct MemoryStruct *)userp;

    // +1 so the request is never zero bytes, even if the callback gets no data
    char *ptr = (char *)realloc(mem->memory, mem->size + realsize + 1);
    if (ptr == NULL)
    {
        printf("not enough memory (realloc returned NULL)\n");
        return 0;
    }

    mem->memory = ptr;
    memcpy(&(mem->memory[mem->size]), contents, realsize);
    mem->size += realsize;
    mem->memory[mem->size] = 0;

    return realsize; // must return the number of bytes actually handled
}

int main()
{
    const char *netlink = "http://cn.bing.com/th?id=OHR.GrandsCausses_EN-CN3335882379_800x480.jpg";
    const char *output = "D:/dst1.jpg";

    curl_global_init(CURL_GLOBAL_ALL); // initialize global resources
    CURL *curl = curl_easy_init();     // create the easy handle
    if (curl == NULL)
    {
        curl_global_cleanup();
        return 1;
    }

    // Set a proxy if necessary
    //curl_easy_setopt(curl, CURLOPT_PROXY, "127.0.0.1:7890");

    // URL to fetch
    curl_easy_setopt(curl, CURLOPT_URL, netlink);

    // Set the user agent
    curl_easy_setopt(curl, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36");

    // Collect the response body in memory
    MemoryStruct chunk;
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &chunk);
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, HttpPostWriteBack);

    //// Report download progress
    //curl_easy_setopt(curl, CURLOPT_NOPROGRESS, 0L);
    //curl_easy_setopt(curl, CURLOPT_PROGRESSFUNCTION, progress_callback);
    //curl_easy_setopt(curl, CURLOPT_PROGRESSDATA, nullptr);

    // Perform the transfer
    CURLcode res = curl_easy_perform(curl);

    curl_easy_cleanup(curl); // release the handle
    curl_global_cleanup();   // free global resources

    if (res != CURLE_OK)
    {
        fprintf(stderr, "curl_easy_perform() failed: %s\n", curl_easy_strerror(res));
        return 1;
    }

    // Write the data out in one step
    FILE *fp = nullptr;
    if (fopen_s(&fp, output, "wb") != 0)
    {
        return 1; // the handle was already released above, so do not clean it up again
    }

    fwrite(chunk.memory, 1, chunk.size, fp);
    fclose(fp);

    return 0;
}
One key improvement in this code is the custom MemoryStruct structure, which behaves much like a dynamic array. Because the size of the remote file is not known in advance, the data has to be received in pieces: receive a piece, grow the container, receive the next piece, grow it again. This growth is implemented with the C function realloc(). MemoryStruct also uses the C++ RAII mechanism for memory management: the constructor allocates the buffer and the destructor frees it.
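The grow-as-you-go idea can be tried without any network access. The following is a minimal sketch, not the article's code: GrowBuffer and append are hypothetical names, and the copy operations are deleted as an extra precaution so that two objects can never free the same block.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical growable byte buffer, modelled on the MemoryStruct idea.
struct GrowBuffer
{
    char *data = nullptr;
    std::size_t size = 0;

    GrowBuffer() = default;
    ~GrowBuffer() { std::free(data); } // RAII: released when the object goes out of scope

    // Non-copyable: a copy would otherwise free the same block twice.
    GrowBuffer(const GrowBuffer &) = delete;
    GrowBuffer &operator=(const GrowBuffer &) = delete;

    // Appends n bytes from src. Returns false and leaves the buffer
    // unchanged if the allocation fails.
    bool append(const void *src, std::size_t n)
    {
        assert(src != nullptr || n == 0);
        if (n == 0)
        {
            return true; // nothing to do; avoids realloc with a size of zero
        }

        // realloc(nullptr, ...) behaves like malloc, so the first call works too.
        char *p = static_cast<char *>(std::realloc(data, size + n));
        if (p == nullptr)
        {
            return false; // the old block is still valid and still owned by us
        }

        data = p;
        std::memcpy(data + size, src, n);
        size += n;
        return true;
    }
};
```

Calling append once per received piece mirrors what the write callback does on each invocation; the result of realloc is stored in a temporary first so that a failed allocation does not lose the original pointer.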
The other key point is that CURLOPT_WRITEDATA is used together with CURLOPT_WRITEFUNCTION. CURLOPT_WRITEFUNCTION sets the callback function, and CURLOPT_WRITEDATA sets the pointer that is passed to that callback as its last argument (userp). This is typical C-style design, where everything is a pointer: every operation goes through the same function interface, but what the pointer actually refers to differs from case to case.
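The "same interface, different things behind the pointer" idea can be illustrated without libcurl. In the sketch below, deliver_in_pieces is a hypothetical stand-in for the transfer (it is not a libcurl function); it only borrows the shape of the write callback signature. Two callbacks share that signature but interpret userp differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Same shape as the write callback above: (data, size, nmemb, user pointer).
using WriteCallback = std::size_t (*)(void *contents, std::size_t size, std::size_t nmemb, void *userp);

// Hypothetical stand-in for a transfer: hands `payload` to the callback in
// pieces of at most `piece` bytes and stops if the callback reports fewer
// bytes than it was given.
bool deliver_in_pieces(const std::string &payload, std::size_t piece, WriteCallback cb, void *userp)
{
    assert(cb != nullptr);
    if (piece == 0)
    {
        return false;
    }

    std::size_t offset = 0;
    while (offset < payload.size())
    {
        std::size_t n = std::min(piece, payload.size() - offset);
        // const_cast only to match the void* parameter; the callbacks below do not modify the data.
        void *chunk = const_cast<char *>(payload.data() + offset);
        if (cb(chunk, 1, n, userp) != n)
        {
            return false;
        }
        offset += n;
    }
    return true;
}

// Callback 1: userp points to a std::string that accumulates the data.
std::size_t append_to_string(void *contents, std::size_t size, std::size_t nmemb, void *userp)
{
    std::size_t realsize = size * nmemb;
    static_cast<std::string *>(userp)->append(static_cast<char *>(contents), realsize);
    return realsize;
}

// Callback 2: userp points to a counter; the data itself is ignored.
std::size_t count_bytes(void *, std::size_t size, std::size_t nmemb, void *userp)
{
    std::size_t realsize = size * nmemb;
    *static_cast<std::size_t *>(userp) += realsize;
    return realsize;
}
```

The caller decides what userp means simply by pairing a callback with a matching pointer, for example append_to_string with the address of a std::string, or count_bytes with the address of a std::size_t. Nothing checks that the pair matches, so keeping the two settings next to each other in the code is a sensible habit.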