Deep Understanding of FastCGI Protocol and Its Implementation in PHP

Before discussing FastCGI, we need to look at how traditional CGI works and get a general understanding of the CGI/1.1 specification.

Analysis of the Working Principle of Traditional CGI

After the client accesses a URL, it submits data via GET, POST, or PUT and sends a request to the Web server over HTTP. The HTTP daemon on the server passes the information described in the HTTP request to the CGI program that the request specifies, through standard input (stdin) and environment variables, and starts that program to process the request (including any database work). The program returns its result to the HTTP daemon through standard output (stdout), and the HTTP daemon sends it back to the client over HTTP.

The paragraph above may be abstract, so let's walk through a GET request as a concrete example.
The following code implements the flow described in the diagram: the Web server starts a socket listening service and executes the CGI program locally. A more detailed explanation of the code follows it.

Web server code

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <signal.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <netinet/in.h>

#define SERV_PORT 9003

char *str_join(const char *str1, const char *str2);
char *html_response(char *res, size_t res_size, const char *buf);

int main(void)
{
    int lfd, cfd;
    struct sockaddr_in serv_addr, clin_addr;
    socklen_t clin_len;
    char buf[1024], web_result[2048];
    size_t len;
    FILE *cin;

    if ((lfd = socket(AF_INET, SOCK_STREAM, 0)) == -1) {
        perror("create socket failed");
        exit(1);
    }

    memset(&serv_addr, 0, sizeof(serv_addr));
    serv_addr.sin_family = AF_INET;
    serv_addr.sin_addr.s_addr = htonl(INADDR_ANY);
    serv_addr.sin_port = htons(SERV_PORT);

    if (bind(lfd, (struct sockaddr *)&serv_addr, sizeof(serv_addr)) == -1) {
        perror("bind error");
        exit(1);
    }

    if (listen(lfd, 128) == -1) {
        perror("listen error");
        exit(1);
    }

    // Ignore SIGCHLD so that finished child processes do not become zombies
    signal(SIGCHLD, SIG_IGN);

    while (1) {
        clin_len = sizeof(clin_addr);
        if ((cfd = accept(lfd, (struct sockaddr *)&clin_addr, &clin_len)) == -1) {
            perror("accept error");
            continue;
        }

        cin = fdopen(cfd, "r");
        if (cin == NULL) {
            close(cfd);
            continue;
        }
        setbuf(cin, (char *)0);
        if (fgets(buf, sizeof(buf), cin) == NULL) { // Read the first line
            fclose(cin);
            continue;
        }
        printf("\n%s", buf);

        //============================ Demonstration of cgi environment variable settings ============================
        // For example, "GET /user.cgi?id=1 HTTP/1.1"
        char *delim = " ";
        char *p;
        char *method, *filename, *query_string;
        char *query_string_pre = "QUERY_STRING=";

        method = strtok(buf, delim);      // GET
        p = strtok(NULL, delim);          // /user.cgi?id=1
        if (method == NULL || p == NULL) {
            fclose(cin);
            continue;
        }
        filename = strtok(p, "?");        // /user.cgi
        if (filename == NULL || strcmp(filename, "/favicon.ico") == 0) {
            fclose(cin);                  // also closes cfd
            continue;
        }
        query_string = strtok(NULL, "?"); // id=1 (NULL when there is no query string)
        // putenv keeps the pointer it is given, so the joined string is not freed
        putenv(str_join(query_string_pre, query_string ? query_string : ""));
        //============================ Demonstration of cgi environment variable settings ============================

        pid_t pid = fork();

        if (pid > 0) {
            fclose(cin); // also closes cfd in the parent
        } else if (pid == 0) {
            close(lfd);
            FILE *stream = popen(str_join(".", filename), "r");
            if (stream == NULL) {
                perror("popen error");
                exit(1);
            }
            len = fread(buf, sizeof(char), sizeof(buf) - 1, stream);
            buf[len] = '\0'; // fread does not terminate the string
            html_response(web_result, sizeof(web_result), buf);
            write(cfd, web_result, strlen(web_result));
            pclose(stream);
            close(cfd);
            exit(0);
        } else {
            perror("fork error");
            exit(1);
        }
    }

    close(lfd);
    return 0;
}

char *str_join(const char *str1, const char *str2)
{
    char *result = malloc(strlen(str1) + strlen(str2) + 1);
    if (result == NULL) exit(1);
    strcpy(result, str1);
    strcat(result, str2);
    return result;
}

char *html_response(char *res, size_t res_size, const char *buf)
{
    const char *html_response_template =
        "HTTP/1.1 200 OK\r\nContent-Type:text/html\r\nContent-Length: %zu\r\nServer: mengkang\r\n\r\n%s";
    snprintf(res, res_size, html_response_template, strlen(buf), buf);
    return res;
}

Key points in the above code:

The section between the two "Demonstration of cgi environment variable settings" comments finds the relative path of the CGI program (for simplicity, its root directory is simply the current directory of the Web server program) so that the CGI program can be executed in the child process, and sets the environment variable that the CGI program reads when it runs;
The popen and fread calls in the child process read the standard output of the CGI program into the Web server's buffer;
The write call sends the wrapped HTML result to the client socket descriptor, returning it to the client connected to the Web server.

CGI program (user.c)

#include <stdio.h>
#include <stdlib.h>

// Query the user's information through the acquired id
int main(void)
{
    //============================ Simulated database ============================
    typedef struct {
        int id;
        char *username;
        int age;
    } user;

    user users[] = {
        { 0, NULL, 0 },
        { 1, "mengkang.zhou", 18 }
    };
    int user_count = sizeof(users) / sizeof(users[0]);
    //============================ Simulated database ============================

    char *query_string;
    int id;

    query_string = getenv("QUERY_STRING");

    if (query_string == NULL) {
        printf("No input data");
    } else if (sscanf(query_string, "id=%d", &id) != 1) {
        printf("No input id");
    } else if (id < 1 || id >= user_count) {
        // Guard against ids that are outside the simulated table
        printf("No such user");
    } else {
        printf("User Information Query<br>Student ID: %d<br>Full name: %s<br>Age: %d",
               id, users[id].username, users[id].age);
    }

    return 0;
}

Compile the above CGI program with gcc user.c -o user.cgi and place it in the same directory as the Web server program.
The getenv("QUERY_STRING") call, which reads the environment variable set by the Web server process, is the focus of this demonstration.

Working Principle Analysis of FastCGI

Under the CGI/1.1 specification, the Web server forks a local child process to execute the CGI program, fills in the predefined CGI environment variables, adds them to the system environment, passes the HTTP body to the child process through standard input, and receives the result back through standard output once processing finishes. The core idea of FastCGI is to abandon this fork-and-execute model, avoiding the heavy startup cost paid on every request (illustrated later with PHP), and to handle requests with resident, long-running processes.

The FastCGI workflow is as follows:

The FastCGI process manager initializes itself, starts multiple CGI interpreter processes, and waits for connections from the Web server.
The Web server communicates with the FastCGI process manager over a socket and sends the CGI environment variables and standard input data to a CGI interpreter process using the FastCGI protocol.
When the CGI interpreter process finishes, it returns the standard output and error information to the Web server over the same connection.
The CGI interpreter process then waits for and handles the next connection from the Web server.

One difference between FastCGI and traditional CGI is that the Web server does not execute the CGI program directly; instead it talks to the FastCGI responder (the FastCGI process manager) over a socket. The Web server must wrap the CGI interface data in FastCGI protocol packets and send them to the FastCGI responder. Because the FastCGI process manager communicates over sockets, it can also be deployed in a distributed fashion, with the Web server and the CGI responder on separate machines.

To repeat, FastCGI is a protocol built on CGI/1.1: it carries the data that CGI/1.1 transmits, in the order and format defined by the FastCGI protocol.

Preparation

The content above may still feel abstract, first because we have not yet looked at the FastCGI protocol itself, and second because we have not studied any real code. It therefore helps to read the FastCGI protocol specification in advance. You do not need to understand it fully; a rough understanding is enough to follow this article, and the two can then be studied and digested together.

http://www.fastcgi.com/devkit... (English original)
http://andylin02.iteye.com/bl... (Chinese version)

Analysis of FastCGI Protocol

The analysis below is based on PHP's FastCGI code. Unless otherwise noted, the code shown comes from the PHP source.

FastCGI message type

FastCGI divides the transmitted messages into many types. Its structure is defined as follows:

typedef enum _fcgi_request_type {
    FCGI_BEGIN_REQUEST      =  1, /* [in]                              */
    FCGI_ABORT_REQUEST      =  2, /* [in]  (not supported)             */
    FCGI_END_REQUEST        =  3, /* [out]                             */
    FCGI_PARAMS             =  4, /* [in]  environment variables       */
    FCGI_STDIN              =  5, /* [in]  post data                   */
    FCGI_STDOUT             =  6, /* [out] response                    */
    FCGI_STDERR             =  7, /* [out] errors                      */
    FCGI_DATA               =  8, /* [in]  filter data (not supported) */
    FCGI_GET_VALUES         =  9, /* [in]                              */
    FCGI_GET_VALUES_RESULT  = 10  /* [out]                             */
} fcgi_request_type;

Sending order of messages

The following figure is a simple messaging process

FCGI_BEGIN_REQUEST is sent first, followed by FCGI_PARAMS and FCGI_STDIN. Because the content length recorded in each message header (described in detail below) cannot exceed 65535, messages of these two types may each be sent several times in a row rather than just once.

When the FastCGI responder has finished processing, it sends FCGI_STDOUT and FCGI_STDERR messages, which may likewise be sent several times in a row. Finally, FCGI_END_REQUEST marks the end of the request.
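
The ordering of the messages sent by the Web server can be expressed as a small check. The following is only an illustrative sketch, not code from PHP: the function name sketch_request_order_ok and the SKETCH_* constants are made up for this example (their numeric values match the message type enum shown earlier), and it assumes a request carries at least one FCGI_PARAMS and at least one FCGI_STDIN message.

```c
#include <stddef.h>

/* Hypothetical constants for this sketch; values follow fcgi_request_type. */
enum { SKETCH_BEGIN_REQUEST = 1, SKETCH_PARAMS = 4, SKETCH_STDIN = 5 };

/* Returns 1 if the sequence is BEGIN_REQUEST, then one or more PARAMS,
 * then one or more STDIN, and nothing else; returns 0 otherwise. */
int sketch_request_order_ok(const int *types, size_t n)
{
    size_t i = 0;

    if (n == 0 || types[i++] != SKETCH_BEGIN_REQUEST) return 0;
    if (i >= n || types[i] != SKETCH_PARAMS) return 0;
    while (i < n && types[i] == SKETCH_PARAMS) i++;   /* PARAMS may repeat */
    if (i >= n || types[i] != SKETCH_STDIN) return 0;
    while (i < n && types[i] == SKETCH_STDIN) i++;    /* STDIN may repeat */
    return i == n;
}
```

A real responder does not validate a whole sequence up front like this; it reads one message at a time, which is what the PHP code later in this article does.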

Note that FCGI_BEGIN_REQUEST and FCGI_END_REQUEST mark the beginning and end of a request and are tightly bound to the protocol itself, so the content of their message bodies is also part of the protocol and has a corresponding structure (detailed later). Environment variables, standard input, standard output, and error output are all business data that is independent of the protocol, so their message bodies have no corresponding structure.

Because all messages are transmitted back to back in binary, a uniform message header must be defined so that the body of each message can be located and split off easily. This is a very common technique in network communication.

FastCGI header

As mentioned above, FastCGI messages are divided into 10 types, some for input and some for output, and every message starts with a header. The header structure is defined as follows:

typedef struct _fcgi_header {
    unsigned char version;
    unsigned char type;
    unsigned char requestIdB1;
    unsigned char requestIdB0;
    unsigned char contentLengthB1;
    unsigned char contentLengthB0;
    unsigned char paddingLength;
    unsigned char reserved;
} fcgi_header;

Field explanation:
version identifies the FastCGI protocol version.
type identifies the FastCGI record type, that is, the general function the record performs.
requestId identifies the FastCGI request to which the record belongs.
contentLength is the number of bytes in the contentData component of the record.
Regarding the xxB1 and xxB0 names above, the protocol states that when two adjacent structure components have the same name except for the suffixes "B1" and "B0", the two components can be viewed as a single number computed as (B1 << 8) + B0. The name of that single number is the name of the components without the suffix. The convention generalizes to numbers represented by more than two bytes.

For example, the maximum value of requestId and contentLength in the protocol header is 65535.

#include <stdio.h>
#include <stdlib.h>
#include <limits.h>

int main()
{
    unsigned char requestIdB1 = UCHAR_MAX;
    unsigned char requestIdB0 = UCHAR_MAX;
    printf("%d\n", (requestIdB1 << 8) + requestIdB0); // 65535
}

You may wonder what happens when a message body is longer than 65535 bytes: it is split into multiple messages of the same type.
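
The splitting rule can be sketched as follows. This is a minimal illustration rather than code from PHP: sketch_record_count and sketch_fill_header are hypothetical helper names, and the byte positions simply follow the fcgi_header structure shown above.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_CONTENT 65535u /* largest value a two-byte contentLength can hold */

/* Number of messages needed to carry len bytes of one message type. */
size_t sketch_record_count(size_t len)
{
    return (len + SKETCH_MAX_CONTENT - 1) / SKETCH_MAX_CONTENT;
}

/* Fill the 8-byte header for one message (hypothetical helper). */
void sketch_fill_header(unsigned char hdr[8], unsigned char type,
                        unsigned int request_id, unsigned int content_len)
{
    memset(hdr, 0, 8);                                      /* paddingLength and reserved stay 0 */
    hdr[0] = 1;                                             /* version */
    hdr[1] = type;                                          /* type */
    hdr[2] = (unsigned char)((request_id >> 8) & 0xff);     /* requestIdB1 */
    hdr[3] = (unsigned char)(request_id & 0xff);            /* requestIdB0 */
    hdr[4] = (unsigned char)((content_len >> 8) & 0xff);    /* contentLengthB1 */
    hdr[5] = (unsigned char)(content_len & 0xff);           /* contentLengthB0 */
}
```

For instance, under these assumptions a body of 65536 bytes needs two messages: one with 65535 bytes of content and one with the remaining byte.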

Definition of FCGI_BEGIN_REQUEST

typedef struct _fcgi_begin_request {
    unsigned char roleB1;
    unsigned char roleB0;
    unsigned char flags;
    unsigned char reserved[5];
} fcgi_begin_request;

Field explanation

role is the role that the Web server expects the application to play. There are three roles (here we are generally talking about the responder role):

typedef enum _fcgi_role {
    FCGI_RESPONDER  = 1,
    FCGI_AUTHORIZER = 2,
    FCGI_FILTER     = 3
} fcgi_role;

The flags component of FCGI_BEGIN_REQUEST contains a bit that controls connection shutdown, flags & FCGI_KEEP_CONN: if it is 0, the application closes the connection after responding to this request. If it is not 0, the application does not close the connection after responding to this request; the Web server retains responsibility for the connection.
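
Decoding this message body can be sketched as below. The helper sketch_parse_begin and the sketch_begin_info type are hypothetical names introduced only for illustration; the value 1 for the keep-connection bit is the one defined by the FastCGI specification, and the byte positions follow the fcgi_begin_request structure above.

```c
#include <stddef.h>

#define SKETCH_FCGI_KEEP_CONN 1 /* keep-connection bit in flags */

typedef struct {
    unsigned int role; /* (roleB1 << 8) + roleB0 */
    int keep_conn;     /* non-zero: do not close the connection after responding */
} sketch_begin_info;

/* Decode the 8-byte body of an FCGI_BEGIN_REQUEST message.
 * Returns 0 on success, -1 if the body does not have the expected length. */
int sketch_parse_begin(const unsigned char *body, size_t len, sketch_begin_info *out)
{
    if (len != 8) return -1;
    out->role = ((unsigned int)body[0] << 8) + body[1];
    out->keep_conn = (body[2] & SKETCH_FCGI_KEEP_CONN) != 0;
    return 0;
}
```

The PHP code shown later performs the same two steps inside fcgi_read_request, storing the result in req->keep and in the FCGI_ROLE environment entry.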

Definition of FCGI_END_REQUEST

typedef struct _fcgi_end_request {
    unsigned char appStatusB3;
    unsigned char appStatusB2;
    unsigned char appStatusB1;
    unsigned char appStatusB0;
    unsigned char protocolStatus;
    unsigned char reserved[3];
} fcgi_end_request;

Field explanation
The appStatus component is an application-level status code.
The protocolStatus component is a protocol-level status code; its possible values are:

FCGI_REQUEST_COMPLETE: the request ended normally.
FCGI_CANT_MPX_CONN: a new request is rejected. This happens when the Web server sends concurrent requests over a single connection to an application that is designed to process one request per connection.
FCGI_OVERLOADED: a new request is rejected. This happens when the application runs out of some resource, such as database connections.
FCGI_UNKNOWN_ROLE: a new request is rejected. This happens when the Web server specifies a role that the application does not recognize.

protocolStatus is defined in PHP as follows

typedef enum _fcgi_protocol_status {
    FCGI_REQUEST_COMPLETE = 0,
    FCGI_CANT_MPX_CONN    = 1,
    FCGI_OVERLOADED       = 2,
    FCGI_UNKNOWN_ROLE     = 3
} dcgi_protocol_status;

It is important to note that the values of dcgi_protocol_status and fcgi_role elements are defined in the FastCGI protocol, rather than customized by PHP.
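
To make the four-byte appStatus concrete, here is a small sketch that encodes and decodes an FCGI_END_REQUEST body. The function names sketch_fill_end_request and sketch_app_status are hypothetical; the byte order (B3 first) follows the fcgi_end_request structure above and the B1/B0 naming convention described in the header section.

```c
#include <string.h>

/* Encode the 8-byte body of an FCGI_END_REQUEST message. */
void sketch_fill_end_request(unsigned char body[8], unsigned long app_status,
                             unsigned char protocol_status)
{
    memset(body, 0, 8);                                     /* clears reserved[3] too */
    body[0] = (unsigned char)((app_status >> 24) & 0xff);   /* appStatusB3 */
    body[1] = (unsigned char)((app_status >> 16) & 0xff);   /* appStatusB2 */
    body[2] = (unsigned char)((app_status >> 8) & 0xff);    /* appStatusB1 */
    body[3] = (unsigned char)(app_status & 0xff);           /* appStatusB0 */
    body[4] = protocol_status;                              /* e.g. 0 = FCGI_REQUEST_COMPLETE */
}

/* Decode appStatus back into a single number. */
unsigned long sketch_app_status(const unsigned char body[8])
{
    return ((unsigned long)body[0] << 24) | ((unsigned long)body[1] << 16) |
           ((unsigned long)body[2] << 8)  |  (unsigned long)body[3];
}
```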

Sample message communication

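For simplicity, each message below shows only the type and request id from its header, followed by the message body; the other header fields are omitted. The example comes from the official website.

{FCGI_BEGIN_REQUEST,   1, {FCGI_RESPONDER, 0}}
{FCGI_PARAMS,          1, "\013\002SERVER_PORT80\013\016SERVER_ADDR199.170.183.42 ... "}
{FCGI_PARAMS,          1, ""}
{FCGI_STDIN,           1, ""}

    {FCGI_STDOUT,      1, "Content-type: text/html\r\n\r\n<html>\n<head> ... "}
    {FCGI_STDOUT,      1, ""}
    {FCGI_END_REQUEST, 1, {0, FCGI_REQUEST_COMPLETE}}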

With the structures above, we can roughly picture how a FastCGI responder parses a request and responds.

It first reads a message header and finds the type FCGI_BEGIN_REQUEST, then parses the message body and learns that the requested role is FCGI_RESPONDER and that flags is 0, meaning the connection is closed after the request. It then parses the second message, whose type is FCGI_PARAMS, splits the message body into name-value pairs using the length bytes that precede each pair, and stores them as environment variables. After processing finishes, it returns the FCGI_STDOUT and FCGI_END_REQUEST messages for the Web server to parse.
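
In the FastCGI specification, the body of an FCGI_PARAMS message is a series of name-value pairs, each preceded by the length of the name and the length of the value. A length below 128 is stored in one byte; a larger length is stored in four bytes with the high bit of the first byte set. The sketch below decodes one pair under those rules. The names sketch_read_len, sketch_pair, and sketch_next_pair are hypothetical and are not taken from PHP.

```c
#include <stddef.h>

/* Read one length field. Returns the number of bytes consumed (1 or 4),
 * or 0 if the buffer is too short. */
static size_t sketch_read_len(const unsigned char *p, size_t avail, size_t *out)
{
    if (avail < 1) return 0;
    if ((p[0] & 0x80) == 0) {           /* short form: one byte */
        *out = p[0];
        return 1;
    }
    if (avail < 4) return 0;            /* long form: four bytes, high bit set */
    *out = ((size_t)(p[0] & 0x7f) << 24) | ((size_t)p[1] << 16) |
           ((size_t)p[2] << 8) | (size_t)p[3];
    return 4;
}

typedef struct {
    const unsigned char *name;  size_t name_len;
    const unsigned char *value; size_t value_len;
} sketch_pair;

/* Decode the pair at the start of buf. Returns the total number of bytes the
 * pair occupies, or 0 if the data is truncated. The pointers in out refer
 * into buf and are not NUL-terminated. */
size_t sketch_next_pair(const unsigned char *buf, size_t len, sketch_pair *out)
{
    size_t used = 0, n, name_len, value_len;

    n = sketch_read_len(buf, len, &name_len);
    if (n == 0) return 0;
    used += n;
    n = sketch_read_len(buf + used, len - used, &value_len);
    if (n == 0) return 0;
    used += n;
    if (name_len > len - used || value_len > len - used - name_len) return 0;

    out->name = buf + used;
    out->name_len = name_len;
    out->value = buf + used + name_len;
    out->value_len = value_len;
    return used + name_len + value_len;
}
```

Calling sketch_next_pair repeatedly, advancing by its return value each time, walks through every pair in a message body; a responder would store each pair in its environment table.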

Implementation of FastCGI in PHP

The code notes below are only my own attempt to sort out and refine what I have learned; if you find mistakes, please point them out. For readers who are not familiar with this code, the notes may serve as a guide and a first introduction; if things still feel vague, you will need to read the code line by line yourself.

This article uses php-src/sapi/cgi/cgi_main.c as the example and assumes a Unix development environment. The definitions of some variables in the main function and the initialization of the SAPI are not discussed here; only the FastCGI parts are covered.

1. Open a socket listening service

fcgi_fd = fcgi_listen(bindpath, 128);

Listening starts here. The fcgi_listen function completes the first three steps of setting up a socket service: socket, bind, and listen.
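
Those three steps can be sketched as follows. This is not PHP's fcgi_listen: sketch_listen is a hypothetical function that handles only a TCP port on the loopback address, whereas the real function takes a bindpath argument, as the call above shows.

```c
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Create a TCP listening socket on 127.0.0.1:port (port 0 lets the system
 * pick a free port). Returns the listening descriptor, or -1 on failure. */
int sketch_listen(unsigned short port, int backlog)
{
    struct sockaddr_in addr;
    int on = 1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);                  /* step 1: socket */

    if (fd < 0) return -1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 || /* step 2: bind */
        listen(fd, backlog) < 0) {                              /* step 3: listen */
        close(fd);
        return -1;
    }
    return fd;
}
```

The returned descriptor plays the role of fcgi_fd in the call above: it is what later gets passed to accept.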

2. Initialize the request object

Allocate memory for the fcgi_request object and bind it to the listening socket.

fcgi_init_request(&request, fcgi_fd);

From input to return, the whole request revolves around the fcgi_request structure object.

typedef struct _fcgi_request {
    int            listen_socket;
    int            fd;
    int            id;
    int            keep;
    int            closed;

    int            in_len;
    int            in_pad;

    fcgi_header   *out_hdr;
    unsigned char *out_pos;
    unsigned char  out_buf[1024*8];
    unsigned char  reserved[sizeof(fcgi_end_request_rec)];

    HashTable     *env;
} fcgi_request;

3. Create multiple CGI parser subprocesses

The default number of child processes is 0. The setting from the configuration file is placed in the PHP_FCGI_CHILDREN environment variable and read by the program, which then creates the specified number of child processes to wait for requests from the Web server.

if (getenv("PHP_FCGI_CHILDREN")) {
    char *children_str = getenv("PHP_FCGI_CHILDREN");
    children = atoi(children_str);
    ...
}

do {
    pid = fork();
    switch (pid) {
    case 0:
        parent = 0; // Mark this process as a child so that it does not fork again

        /* don't catch our signals */
        sigaction(SIGTERM, &old_term, 0);
        sigaction(SIGQUIT, &old_quit, 0);
        sigaction(SIGINT, &old_int, 0);
        break;
    case -1:
        perror("php (pre-forking)");
        exit(1);
        break;
    default:
        /* Fine */
        running++;
        break;
    }
} while (parent && (running < children));
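
The pre-forking pattern can be tried out in isolation with the sketch below. It is a simplified stand-in rather than PHP's logic: sketch_prefork and sketch_idle_worker are hypothetical names, each child runs a worker function and exits instead of entering an accept loop, and the parent simply waits for all children to finish.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Placeholder worker: a real child would loop accepting and serving requests. */
int sketch_idle_worker(void)
{
    return 0;
}

/* Fork `children` processes, run worker() in each, and wait for them all.
 * Returns how many children exited with status 0, or -1 if fork fails. */
int sketch_prefork(int children, int (*worker)(void))
{
    int running = 0, ok = 0, status;

    while (running < children) {
        pid_t pid = fork();
        if (pid == 0) {
            _exit(worker());   /* child: never returns to the fork loop */
        } else if (pid < 0) {
            return -1;
        }
        running++;             /* parent: one more child started */
    }

    while (running-- > 0) {
        if (wait(&status) > 0 && WIFEXITED(status) && WEXITSTATUS(status) == 0) {
            ok++;
        }
    }
    return ok;
}
```

Having the child leave through _exit serves the same purpose as setting parent = 0 in the PHP code: a child must never go around the fork loop again.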

4. Receiving requests in a subprocess

Everything here is still routine socket service code: accept the request, then call fcgi_read_request.

fcgi_accept_request(&request)

int fcgi_accept_request(fcgi_request *req)
{
    int listen_socket = req->listen_socket;
    sa_t sa;
    socklen_t len = sizeof(sa);

    req->fd = accept(listen_socket, (struct sockaddr *)&sa, &len);

    ...

    if (req->fd >= 0) {
        // Multiplexing mechanism: poll until the connection has data to read
        struct pollfd fds;
        int ret;

        fds.fd = req->fd;
        fds.events = POLLIN;
        fds.revents = 0;
        do {
            errno = 0;
            ret = poll(&fds, 1, 5000);
        } while (ret < 0 && errno == EINTR);
        if (ret > 0 && (fds.revents & POLLIN)) {
            break;
        }
        // Only close the socket connection; do not clear req->env
        fcgi_close(req, 1, 0);
    }

    ...

    if (fcgi_read_request(req)) {
        return req->fd;
    }
}

It is also important that the request is stored in the global variable sapi_globals.server_context, which makes the request accessible from elsewhere.

SG(server_context) = (void *) &request;

5. Read data

The code below omits some exception handling and shows only the normal execution order.
fcgi_read_request performs the message reading shown in the message communication sample. It repeatedly uses len = (hdr.contentLengthB1 << 8) | hdr.contentLengthB0;, an operation already explained in the FastCGI header section above.
This is the key to parsing the FastCGI protocol.

static inline ssize_t safe_read(fcgi_request *req, const void *buf, size_t count)
{
    int ret;
    size_t n = 0;

    do {
        errno = 0;
        ret = read(req->fd, ((char*)buf)+n, count-n);
        n += ret; /* the handling of ret <= 0 (end of file, errors) is omitted here */
    } while (n != count);
    return n;
}

static int fcgi_read_request(fcgi_request *req)
{
    ...

    if (safe_read(req, &hdr, sizeof(fcgi_header)) != sizeof(fcgi_header) ||
        hdr.version < FCGI_VERSION_1) {
        return 0;
    }

    len = (hdr.contentLengthB1 << 8) | hdr.contentLengthB0;
    padding = hdr.paddingLength;

    req->id = (hdr.requestIdB1 << 8) + hdr.requestIdB0;

    if (hdr.type == FCGI_BEGIN_REQUEST && len == sizeof(fcgi_begin_request)) {
        char *val;

        if (safe_read(req, buf, len+padding) != len+padding) {
            return 0;
        }

        req->keep = (((fcgi_begin_request*)buf)->flags & FCGI_KEEP_CONN);
        switch ((((fcgi_begin_request*)buf)->roleB1 << 8) + ((fcgi_begin_request*)buf)->roleB0) {
            case FCGI_RESPONDER:
                val = estrdup("RESPONDER");
                zend_hash_update(req->env, "FCGI_ROLE", sizeof("FCGI_ROLE"), &val, sizeof(char*), NULL);
                break;
            ...
            default:
                return 0;
        }

        if (safe_read(req, &hdr, sizeof(fcgi_header)) != sizeof(fcgi_header) ||
            hdr.version < FCGI_VERSION_1) {
            return 0;
        }

        len = (hdr.contentLengthB1 << 8) | hdr.contentLengthB0;
        padding = hdr.paddingLength;

        while (hdr.type == FCGI_PARAMS && len > 0) {
            if (safe_read(req, &hdr, sizeof(fcgi_header)) != sizeof(fcgi_header) ||
                hdr.version < FCGI_VERSION_1) {
                req->keep = 0;
                return 0;
            }
            len = (hdr.contentLengthB1 << 8) | hdr.contentLengthB0;
            padding = hdr.paddingLength;
        }

        ...
    }
}
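
The read pattern used above (fixed-size header first, then contentLength bytes of body, then paddingLength bytes to skip) can be exercised on its own with the following sketch. The helpers sketch_read_full and sketch_read_record are hypothetical and are not part of PHP; unlike the abridged safe_read shown above, this version includes the end-of-file and error handling.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Read exactly count bytes, retrying on short reads and EINTR.
 * Returns 0 on success, -1 on end of file or error. */
static int sketch_read_full(int fd, unsigned char *buf, size_t count)
{
    size_t n = 0;

    while (n < count) {
        ssize_t ret = read(fd, buf + n, count - n);
        if (ret > 0) {
            n += (size_t)ret;
        } else if (ret < 0 && errno == EINTR) {
            continue;
        } else {
            return -1;
        }
    }
    return 0;
}

/* Read one message: 8-byte header, content, then padding (discarded).
 * Stores the type and request id; returns the content length, or -1 on
 * failure or if the content does not fit in cap bytes. */
long sketch_read_record(int fd, unsigned char *type, unsigned int *request_id,
                        unsigned char *content, size_t cap)
{
    unsigned char hdr[8], pad[255];
    size_t len;

    if (sketch_read_full(fd, hdr, sizeof(hdr)) != 0) return -1;
    len = ((size_t)hdr[4] << 8) | hdr[5];               /* contentLength */
    if (len > cap) return -1;
    if (sketch_read_full(fd, content, len) != 0) return -1;
    if (sketch_read_full(fd, pad, hdr[6]) != 0) return -1; /* paddingLength */

    *type = hdr[1];
    *request_id = ((unsigned int)hdr[2] << 8) + hdr[3];
    return (long)len;
}
```

Because it only needs a file descriptor, the sketch can be tried with a pipe instead of a real Web server connection.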

6. Executing scripts

Assuming the request runs in PHP_MODE_STANDARD, php_execute_script is called to execute the PHP file. We will not go into the details here.

7. Closing the request

fcgi_finish_request(&request, 1);

int fcgi_finish_request(fcgi_request *req, int force_close)
{
    int ret = 1;

    if (req->fd >= 0) {
        if (!req->closed) {
            ret = fcgi_flush(req, 1);
            req->closed = 1;
        }
        fcgi_close(req, force_close, 1);
    }
    return ret;
}

fcgi_finish_request calls fcgi_flush, which wraps an FCGI_END_REQUEST message body and writes it to the client descriptor of the socket connection through safe_write.

8. Processing of Standard Input and Standard Output

Standard input and standard output have not been discussed so far, but their handling is defined in the cgi_sapi_module structure. That structure, a sapi_module_struct, is tightly coupled with other code, and I do not understand it deeply myself. Here I only make a brief comparison and hope that other readers can point out mistakes and add to it.

sapi_cgi_read_post is defined in cgi_sapi_module to process POST data reading.

while (read_bytes < count_bytes) {
    fcgi_request *request = (fcgi_request*) SG(server_context);
    tmp_read_bytes = fcgi_read(request, buffer + read_bytes, count_bytes - read_bytes);
    read_bytes += tmp_read_bytes;
}

fcgi_read reads the FCGI_STDIN data.
Likewise, cgi_sapi_module defines sapi_cgibin_ub_write to take over output handling; it calls sapi_cgibin_single_write, which ultimately wraps the output in FCGI_STDOUT FastCGI packets.

fcgi_write(request, FCGI_STDOUT, str, str_length);
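
What wrapping output in an FCGI_STDOUT packet amounts to can be sketched as below. This is an illustration and not PHP's fcgi_write: sketch_pack_stdout is a hypothetical function that packs a single message into a caller-supplied buffer, and padding the content to a multiple of 8 bytes is an assumption of this sketch (the header's paddingLength field exists for this kind of alignment).

```c
#include <stddef.h>
#include <string.h>

/* Pack len bytes (at most 65535) into one FCGI_STDOUT message.
 * Returns the total number of bytes written to out, or 0 if the data is too
 * long for one message or the buffer is too small. */
size_t sketch_pack_stdout(unsigned char *out, size_t cap, unsigned int request_id,
                          const char *data, size_t len)
{
    size_t pad = (8 - (len % 8)) % 8;   /* bytes needed to reach a multiple of 8 */
    size_t total = 8 + len + pad;

    if (len > 65535 || total > cap) return 0;

    out[0] = 1;                                         /* version */
    out[1] = 6;                                         /* type: FCGI_STDOUT */
    out[2] = (unsigned char)((request_id >> 8) & 0xff); /* requestIdB1 */
    out[3] = (unsigned char)(request_id & 0xff);        /* requestIdB0 */
    out[4] = (unsigned char)((len >> 8) & 0xff);        /* contentLengthB1 */
    out[5] = (unsigned char)(len & 0xff);               /* contentLengthB0 */
    out[6] = (unsigned char)pad;                        /* paddingLength */
    out[7] = 0;                                         /* reserved */
    memcpy(out + 8, data, len);
    memset(out + 8 + len, 0, pad);
    return total;
}
```

Packing the 5-byte string "hello" this way yields 16 bytes: the 8-byte header, 5 bytes of content, and 3 bytes of padding.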

Closing remarks

It is not easy to turn the process of learning and understanding FastCGI into notes like these, and to write down my own understanding in an organized way so that others can follow it more easily. Doing so has also given me a deeper understanding of the topic. I still have many open questions about the PHP code, which I will need to work through and digest gradually.

This article has been merged into http://www.php-internals.com/...

This article reflects my own understanding, and my level is limited; if there are errors, I hope you will correct them.