curl usage notes -- get remote data to memory buffer

1. Overview

An earlier post, Notes on curl usage (II) -- downloading a picture remotely, describes how to fetch a remote file with curl. In that example, however, the steps of receiving the remote data and writing it to the file are mixed together. In a multithreaded scenario, this may cause read-write conflicts. A cleaner approach is to save the remote data in memory first and write it out to a file afterwards. Receiving remote data into memory can be regarded as a read operation, and reads do not conflict with one another. A good strategy, therefore, is to read all of the data into a memory buffer and then write it to the file in one step.

2. Implementation

Taking the example from Notes on curl usage (II) -- downloading a picture remotely as the starting point, the code is as follows:

#include <cstdio>	//printf, fwrite, fclose
#include <cstdlib>	//malloc, realloc, free
#include <cstring>	//memcpy
#include <curl/curl.h>

using namespace std;

//Memory block structure
struct MemoryStruct
{
	char *memory;
	size_t size;

	MemoryStruct()
	{
		memory = (char *)malloc(1);
		size = 0;
	}

	~MemoryStruct()
	{
		free(memory);
		memory = NULL;
	}
};

//Write callback: libcurl may call it several times during a single request
size_t HttpPostWriteBack(void *contents, size_t size, size_t nmemb, void *userp)
{
	size_t realsize = size * nmemb;//Number of bytes delivered in this call
	struct MemoryStruct *mem = (struct MemoryStruct *)userp;

	char *ptr = (char *)realloc(mem->memory, mem->size + realsize);
	if (ptr == NULL)
	{
		printf("not enough memory (realloc returned NULL)\n");
		return 0;
	}

	mem->memory = ptr;
	memcpy(&(mem->memory[mem->size]), contents, realsize);
	mem->size += realsize;
	return realsize;//Must return the number of bytes handled, otherwise libcurl aborts the transfer
}

int main()
{
	const char *netlink = "http://cn.bing.com/th?id=OHR.GrandsCausses_EN-CN3335882379_800x480.jpg";
	const char *output = "D:/dst1.jpg";

	curl_global_init(CURL_GLOBAL_ALL);		//Initialize global resources

	CURL *curl = curl_easy_init();		//Create the easy handle
	if (curl == NULL)
	{
		curl_global_cleanup();
		return 1;
	}

	//If necessary, set a proxy
	//curl_easy_setopt(curl, CURLOPT_PROXY, "127.0.0.1:7890");

	//URL to request
	curl_easy_setopt(curl, CURLOPT_URL, netlink);

	//Set the user agent
	curl_easy_setopt(curl, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36");

	//Receive the response body into the memory buffer
	MemoryStruct chunk;
	curl_easy_setopt(curl, CURLOPT_WRITEDATA, &chunk);
	curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, HttpPostWriteBack);

	////Enable download progress reporting
	//curl_easy_setopt(curl, CURLOPT_NOPROGRESS, 0L);
	//curl_easy_setopt(curl, CURLOPT_PROGRESSFUNCTION, progress_callback);
	//curl_easy_setopt(curl, CURLOPT_PROGRESSDATA, nullptr);

	//Perform the transfer
	CURLcode res = curl_easy_perform(curl);

	curl_easy_cleanup(curl);			//Release the handle (exactly once)
	curl_global_cleanup();			//Free global resources

	if (res != CURLE_OK)
	{
		fprintf(stderr, "curl_easy_perform() failed: %s\n", curl_easy_strerror(res));
		return 1;
	}

	//Write the buffered data to the file (fopen_s is MSVC-specific)
	FILE *fp = nullptr;
	if (fopen_s(&fp, output, "wb") != 0)
	{
		return 1;
	}
	fwrite(chunk.memory, 1, chunk.size, fp);
	fclose(fp);

	return 0;	//0 indicates success
}

One key improvement in this code is the custom MemoryStruct structure, which behaves much like a dynamic array. Because the size of the remote file is not known in advance, the data is received a piece at a time: receive a piece, grow the container, receive the next piece, grow again. The growth is done with C's realloc() function. MemoryStruct also uses C++'s RAII mechanism to manage the memory: the constructor allocates the block and the destructor frees it.

The other key point is that CURLOPT_WRITEDATA is used together with CURLOPT_WRITEFUNCTION. CURLOPT_WRITEFUNCTION sets the callback function, and CURLOPT_WRITEDATA sets the pointer that libcurl passes to that callback as its last parameter (userp). This is typical C programming thinking: everything is a pointer. Every operation goes through the same function interface, while what the pointer refers to can differ.

Tags: C++

Posted on Sun, 31 Oct 2021 12:04:49 -0400 by zenabi