tornado server implementation principles

The tornado version analyzed in this article is 1.0.0. Its code base is relatively small, which makes it easier to find the core parts. You can download tornado 1.0.0 to read along.

1. Basic Process

Use the following code to implement the simplest tornado server:

    import tornado.httpserver
    import tornado.ioloop
    import tornado.web


    class MainHandler(tornado.web.RequestHandler):
        def get(self):
            self.write('hello world')


    if __name__ == '__main__':
        application = tornado.web.Application(
            handlers=[
                (r'/', MainHandler)
            ]
        )
        http_server = tornado.httpserver.HTTPServer(application)
        http_server.listen(8000)
        tornado.ioloop.IOLoop.instance().start()

Three tornado modules are used here: httpserver, ioloop and web. httpserver is the HTTP server that accepts and handles connections; ioloop is the underlying event loop that sends notifications when the events it listens for occur; and the web module is the web application.

Overall, a tornado server can be divided into four layers, and the workflow is roughly as follows:

The diagram may look a bit complicated. It doesn't matter if it isn't clear yet; each layer is explained in detail below.

2. Asynchronous Non-blocking Sockets

tornado's high performance mainly comes from two classes: ioloop.IOLoop and iostream.IOStream. IOLoop is an event loop that monitors and schedules different socket objects through epoll. IOStream wraps a socket object and, relying on IOLoop's event loop, implements non-blocking socket reads and writes with asynchronous callbacks.

The main code for ioloop.IOLoop is as follows:

    import select
    import logging


    class IOLoop(object):
        _EPOLLIN = 0x001
        _EPOLLPRI = 0x002
        _EPOLLOUT = 0x004
        _EPOLLERR = 0x008
        _EPOLLHUP = 0x010
        _EPOLLRDHUP = 0x2000
        _EPOLLONESHOT = (1 << 30)
        _EPOLLET = (1 << 31)

        # Event types that can be listened for; the names are self-explanatory
        NONE = 0
        READ = _EPOLLIN
        WRITE = _EPOLLOUT
        ERROR = _EPOLLERR | _EPOLLHUP | _EPOLLRDHUP

        def __init__(self):
            # On systems without epoll, the event notification mechanism
            # falls back to kqueue or select
            self._impl = select.epoll()
            self._handlers = {}

        @classmethod
        def instance(cls):
            # When an IOLoop object is needed, call this class method instead
            # of instantiating the class directly, so IOLoop stays a singleton
            if not hasattr(cls, "_instance"):
                cls._instance = cls()
            return cls._instance

        def add_handler(self, fd, handler, events):
            self._handlers[fd] = handler
            self._impl.register(fd, events | self.ERROR)

        def update_handler(self, fd, events):
            self._impl.modify(fd, events | self.ERROR)

        def remove_handler(self, fd):
            self._handlers.pop(fd, None)
            try:
                self._impl.unregister(fd)
            except (OSError, IOError):
                logging.debug("Error deleting fd from IOLoop", exc_info=True)

        def start(self):
            while 1:
                event_pairs = self._impl.poll()
                for fd, events in event_pairs:
                    self._handlers[fd](fd, events)

IOLoop is essentially a wrapper around epoll, and it is simple to use. First, we call add_handler, update_handler and remove_handler to set the file descriptors to listen on, the events of interest, and the callback functions. Then, once the start method is called, IOLoop polls epoll and calls the corresponding callback whenever an event arrives. This is how monitoring and scheduling are achieved.
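The register-then-dispatch pattern can be tried out with a small Python 3 sketch (the listings in this article are Python 2). It is not tornado's code: it uses the standard library's selectors module instead of calling epoll directly, and MiniLoop and run_once are hypothetical names chosen for illustration.

```python
import selectors
import socket


class MiniLoop:
    """Toy event loop mapping file descriptors to callbacks (not tornado's IOLoop)."""

    READ = selectors.EVENT_READ
    WRITE = selectors.EVENT_WRITE

    def __init__(self):
        # DefaultSelector picks epoll, kqueue or select depending on the platform.
        self._impl = selectors.DefaultSelector()
        self._handlers = {}

    def add_handler(self, fd, handler, events):
        self._handlers[fd] = handler
        self._impl.register(fd, events)

    def remove_handler(self, fd):
        self._handlers.pop(fd, None)
        self._impl.unregister(fd)

    def run_once(self, timeout=1.0):
        # One iteration of the loop: wait for events, then call the callbacks.
        for key, events in self._impl.select(timeout):
            self._handlers[key.fd](key.fd, events)


received = []
reader, writer = socket.socketpair()
reader.setblocking(False)

loop = MiniLoop()
loop.add_handler(reader.fileno(),
                 lambda fd, events: received.append(reader.recv(1024)),
                 MiniLoop.READ)

writer.send(b"ping")   # makes `reader` readable
loop.run_once()        # dispatches to the callback registered above
loop.remove_handler(reader.fileno())
reader.close()
writer.close()
```

A real loop would call run_once repeatedly, as start does in the listing above; a single iteration is enough here to see the callback fire.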

The main code for the iostream.IOStream class is as follows:

    import errno
    import logging
    import socket


    class IOStream:
        def __init__(self, socket, io_loop, read_chunk_size=4096):
            self.socket = socket
            self.socket.setblocking(False)
            self.io_loop = io_loop
            self.read_chunk_size = read_chunk_size
            self._read_buffer = ""
            self._write_buffer = ""
            self._read_delimiter = None
            self._read_callback = None
            self._write_callback = None
            self._state = self.io_loop.ERROR
            self.io_loop.add_handler(
                self.socket.fileno(), self._handle_events, self._state)

        def read_until(self, delimiter, callback):
            loc = self._read_buffer.find(delimiter)
            if loc != -1:
                callback(self._consume(loc + len(delimiter)))
                return
            self._read_delimiter = delimiter
            self._read_callback = callback
            self._add_io_state(self.io_loop.READ)

        def write(self, data, callback=None):
            self._write_buffer += data
            self._add_io_state(self.io_loop.WRITE)
            self._write_callback = callback

        def _consume(self, loc):
            # Cut the first loc characters off the read buffer and return them
            result = self._read_buffer[:loc]
            self._read_buffer = self._read_buffer[loc:]
            return result

        def close(self):
            if self.socket is not None:
                self.io_loop.remove_handler(self.socket.fileno())
                self.socket.close()
                self.socket = None

        def _add_io_state(self, state):
            # Call this method to add events to listen for
            if not self._state & state:
                self._state = self._state | state
                self.io_loop.update_handler(self.socket.fileno(), self._state)

        def _handle_events(self, fd, events):
            # Called back by the event loop. It first calls the method that
            # matches the event type, then updates the events registered in
            # the event loop based on the outcome
            if events & self.io_loop.READ:
                self._handle_read()
            if not self.socket:
                return
            if events & self.io_loop.WRITE:
                self._handle_write()
            if not self.socket:
                return
            if events & self.io_loop.ERROR:
                self.close()
                return
            # Work out whether data still needs to be read or written,
            # then re-register the events
            state = self.io_loop.ERROR
            if self._read_delimiter:
                state |= self.io_loop.READ
            if self._write_buffer:
                state |= self.io_loop.WRITE
            if state != self._state:
                self._state = state
                self.io_loop.update_handler(self.socket.fileno(), self._state)

        def _handle_read(self):
            # Triggered on readable events: read data from the socket's
            # receive buffer and append it to self._read_buffer
            try:
                chunk = self.socket.recv(self.read_chunk_size)
            except socket.error, e:
                if e[0] in (errno.EWOULDBLOCK, errno.EAGAIN):
                    return
                else:
                    logging.warning("Read error on %d: %s",
                                    self.socket.fileno(), e)
                    self.close()
                    return
            if not chunk:
                self.close()
                return
            self._read_buffer += chunk
            # If a delimiter is set and it has been read, stop reading
            if self._read_delimiter:
                loc = self._read_buffer.find(self._read_delimiter)
                if loc != -1:
                    callback = self._read_callback
                    delimiter_len = len(self._read_delimiter)
                    self._read_callback = None
                    self._read_delimiter = None
                    callback(self._consume(loc + delimiter_len))

        def _handle_write(self):
            # Triggered on writable events: send the data in
            # self._write_buffer until it is all sent or the socket
            # cannot take any more
            while self._write_buffer:
                try:
                    num_bytes = self.socket.send(self._write_buffer)
                    self._write_buffer = self._write_buffer[num_bytes:]
                except socket.error, e:
                    if e[0] in (errno.EWOULDBLOCK, errno.EAGAIN):
                        break
                    else:
                        logging.warning("Write error on %d: %s",
                                        self.socket.fileno(), e)
                        self.close()
                        return
            # Once everything has been written, call the stored callback
            if not self._write_buffer and self._write_callback:
                callback = self._write_callback
                self._write_callback = None
                callback()

IOStream is essentially a socket object that becomes asynchronous through the event loop. When we call its read_until or write method, IOStream does not block waiting on the socket. Unless read_until already finds the delimiter in the read buffer, it stores a callback function and calls the _add_io_state method to register interest in readable or writable events with the event loop. When the event loop detects an event, it calls IOStream's _handle_events method, which in turn calls _handle_read or _handle_write depending on the event type to read or write data, and then calls the previously stored callback. At that point one read or write is complete.
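The buffering side of read_until can be looked at in isolation, without any socket. The following Python 3 sketch is a simplified model rather than tornado's class: DelimitedReader is a hypothetical name, and its feed method stands in for the point where _handle_read appends a received chunk.

```python
class DelimitedReader:
    """Toy model of IOStream's read buffer (hypothetical, socket-free)."""

    def __init__(self):
        self._read_buffer = b""
        self._read_delimiter = None
        self._read_callback = None

    def read_until(self, delimiter, callback):
        # If the delimiter is already buffered, the callback fires right away.
        loc = self._read_buffer.find(delimiter)
        if loc != -1:
            callback(self._consume(loc + len(delimiter)))
            return
        # Otherwise remember what we are waiting for.
        self._read_delimiter = delimiter
        self._read_callback = callback

    def feed(self, chunk):
        # Stand-in for _handle_read: new bytes have arrived from the socket.
        self._read_buffer += chunk
        if self._read_delimiter is None:
            return
        loc = self._read_buffer.find(self._read_delimiter)
        if loc != -1:
            callback = self._read_callback
            end = loc + len(self._read_delimiter)
            # Clear the pending read before calling back, as the listing does.
            self._read_delimiter = None
            self._read_callback = None
            callback(self._consume(end))

    def _consume(self, loc):
        result = self._read_buffer[:loc]
        self._read_buffer = self._read_buffer[loc:]
        return result


results = []
reader = DelimitedReader()
reader.read_until(b"\r\n\r\n", results.append)
reader.feed(b"GET / HTTP/1.0\r\nHo")          # delimiter not seen yet
reader.feed(b"st: example\r\n\r\nextra")      # delimiter arrives, callback runs
```

The callback receives everything up to and including the delimiter, while the bytes after it stay in the buffer for the next read.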

In addition, IOStream puts its socket into non-blocking mode so that calls never block when the socket is not readable or not writable. Event-loop callbacks and non-blocking sockets are the main reasons for tornado's high performance. First, the asynchronous callback mechanism lets tornado maintain many socket connections at the same time in a single thread: when a connection triggers an event, a callback is called to handle it. Second, non-blocking sockets avoid stalls while an event is being handled, which makes the most of CPU time.
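What non-blocking mode means in practice can be seen with a connected socket pair. This is a minimal Python 3 sketch, independent of tornado; in Python 3 the EWOULDBLOCK and EAGAIN cases that the listing checks for surface as BlockingIOError.

```python
import socket

a, b = socket.socketpair()
a.setblocking(False)

try:
    a.recv(1024)            # nothing has been sent yet
    would_block = False
except BlockingIOError:     # raised at once instead of waiting for data
    would_block = True

b.send(b"data")
chunk = a.recv(1024)        # data is available now, so recv returns it

a.close()
b.close()
```

A blocking socket would have stalled the whole thread at the first recv call, which is exactly what a single-threaded event loop cannot afford.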

Overall, the workflow of IOStream + IOLoop is as follows:

3. Web Server

There are three classes in the httpserver module: HTTPServer, HTTPConnection and HTTPRequest. HTTPServer wraps the server-side socket and is responsible for accepting client connections. Each connection is handled by an HTTPConnection, which uses the iostream module to read the client's request data, wraps that data in an HTTPRequest object, and hands the object to the web application for processing.

The main code for HTTPServer is as follows:

    import errno
    import socket

    from tornado import ioloop
    from tornado import iostream


    class HTTPServer:
        def __init__(self, application):
            self.application = application
            self.io_loop = ioloop.IOLoop.instance()
            self._socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM, 0)
            self._socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            self._socket.setblocking(0)

        def listen(self, port, address=''):
            self._socket.bind((address, port))
            self._socket.listen(128)
            self.io_loop.add_handler(self._socket.fileno(),
                                     self._handle_events,
                                     ioloop.IOLoop.READ)

        def _handle_events(self, fd, events):
            while 1:
                try:
                    connection, address = self._socket.accept()
                except socket.error, e:
                    # All pending clients have been accepted; leave the loop
                    if e[0] in (errno.EWOULDBLOCK, errno.EAGAIN):
                        return
                    raise
                stream = iostream.IOStream(connection, io_loop=self.io_loop)
                HTTPConnection(stream, address, self.application)

HTTPServer is the entry point of the web server. First, we specify the web application that the server serves when instantiating it. Then, calling its listen method makes ioloop listen for readable events on the specified port, that is, for client connections. When a client connects, HTTPServer first instantiates an IOStream object, which wraps the client socket, and then creates a new HTTPConnection object to handle the new connection.
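The accept loop can be exercised on its own with a rough Python 3 sketch, again not tornado's code. A select.select call stands in for the event loop's notification, the socket is bound to port 0 on 127.0.0.1 only so that the example picks a free port, and the module-level handle_events function is a hypothetical stand-in for the _handle_events method.

```python
import select
import socket

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
listener.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
listener.listen(128)
listener.setblocking(False)

accepted = []


def handle_events():
    # Accept every pending connection, then return once the queue is empty.
    while True:
        try:
            connection, address = listener.accept()
        except BlockingIOError:      # EWOULDBLOCK / EAGAIN in Python 3
            return
        accepted.append((connection, address))


client = socket.create_connection(listener.getsockname())

# Stand-in for the event loop: wait until the listening socket is readable.
readable, _, _ = select.select([listener], [], [], 5.0)
if readable:
    handle_events()

for connection, _ in accepted:
    connection.close()
client.close()
listener.close()
```

Looping until accept would block matters because one readable event may cover several queued connections.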

The main code for HTTPConnection is as follows:

    import tornado.httputil


    class HTTPConnection:
        def __init__(self, stream, address, application):
            self.stream = stream
            self.address = address
            self.application = application
            self.stream.read_until("\r\n\r\n", self._on_headers)

        def _on_headers(self, data):
            eol = data.find("\r\n")
            start_line = data[:eol]
            method, uri, version = start_line.split(" ")
            # parse() turns the raw header lines into a dictionary-like
            # object and returns it
            headers = tornado.httputil.HTTPHeaders.parse(data[eol:])
            self._request = HTTPRequest(
                connection=self, method=method, uri=uri, version=version,
                headers=headers, remote_ip=self.address[0])
            self.application(self._request)

        def write(self, chunk):
            self.stream.write(chunk, self._on_write_complete)

        def _on_write_complete(self):
            self.stream.close()

After HTTPServer accepts a new connection, an HTTPConnection handles it. First, HTTPConnection uses IOStream to read the client's request data asynchronously. It then parses the request line and the request headers and wraps them in an HTTPRequest object, which it passes to the web application. When the web application has finished processing, it calls HTTPConnection's write method, which writes the response data through IOStream and then closes the socket connection.
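The parsing step in _on_headers can be illustrated with a small Python 3 sketch. It is a simplification and not tornado's HTTPHeaders.parse: parse_request_head is a hypothetical helper that returns a plain dict and ignores details such as continuation lines and header-name normalization.

```python
def parse_request_head(data):
    """Split a request head into the request line fields and a header dict."""
    eol = data.find("\r\n")
    method, uri, version = data[:eol].split(" ")
    headers = {}
    for line in data[eol:].splitlines():
        if line:  # skip the empty strings around the header lines
            name, value = line.split(":", 1)
            headers[name.strip()] = value.strip()
    return method, uri, version, headers


raw = ("GET /hello?name=tornado HTTP/1.1\r\n"
       "Host: localhost:8000\r\n"
       "Accept: */*\r\n"
       "\r\n")
method, uri, version, headers = parse_request_head(raw)
```

Splitting each header on the first colon only keeps values such as localhost:8000 intact.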

HTTPRequest is mainly a wrapper around the request data and needs little explanation. Its main code is as follows:

    import urlparse


    class HTTPRequest:
        def __init__(self, method, uri, version="HTTP/1.0", headers=None,
                     remote_ip=None, connection=None):
            self.method = method
            self.uri = uri
            self.version = version
            self.headers = headers
            self.remote_ip = remote_ip
            self.host = self.headers.get("Host") or "127.0.0.1"
            self.connection = connection
            scheme, netloc, path, query, fragment = urlparse.urlsplit(uri)
            self.path = path
            self.query = query

        def write(self, chunk):
            # The web application calls this method to write response data;
            # it goes through HTTPConnection and is ultimately written
            # by IOStream
            self.connection.write(chunk)
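To see what the urlsplit call in the constructor produces, here is a short Python 3 check. In Python 3 the function lives in urllib.parse rather than in the Python 2 urlparse module used by the listing, and the URI is an invented example.

```python
from urllib.parse import urlsplit

uri = "/hello?name=tornado"
scheme, netloc, path, query, fragment = urlsplit(uri)
```

For a request URI like this one, only path and query are non-empty, which is why the listing keeps just those two.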

This completes the HTTP server, whose process looks like the following:

4. Web Application

The web application's responsibility is to receive request data from the web server, run some logic based on that data, and return the response. In tornado, the web module plays this role.

First, let's look at the web.Application class. Simplified, its code looks like the following:

    import re


    class Application(object):
        """
        In fact, this class does other things, such as setting debug mode,
        specifying wsgi, and so on. In addition, the route mapping is
        actually wrapped by the web.URLSpec class. These are not the main
        points, however. This code is simplified for ease of understanding
        and only illustrates what Application does.
        """

        def __init__(self, handlers):
            self.handlers = handlers

        def __call__(self, request):
            path = request.path
            h_obj = None
            for pattern, handler in self.handlers:
                # re.match only anchors the start of the string, so the end
                # is anchored too; otherwise r'/' would match every path
                if re.match(pattern + "$", path):
                    h_obj = handler(request)
                    break  # the first matching route wins
            # A path that matches no route is not handled in this
            # simplified version
            h_obj._execute()

web.Application is the entry point of the web application. As the code above shows, it is responsible for route dispatch. First we instantiate it, passing in parameters such as handlers=[(r'/', MainHandler)]. Then, when the application object is called with a request, it finds the handler class that matches the request path, instantiates it, and calls the handler's _execute method so that the handler object performs the actual work.
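The route lookup itself is easy to experiment with. The Python 3 sketch below is only an illustration of regex-based dispatch, not tornado's implementation: find_handler, IndexHandler and UserHandler are hypothetical names, and appending "$" to each pattern is an assumption made here so that a pattern must match the whole path.

```python
import re


class IndexHandler:
    pass


class UserHandler:
    pass


def find_handler(handlers, path):
    """Return the first handler class whose pattern matches the whole path."""
    for pattern, handler in handlers:
        # re.match anchors only the start; "$" anchors the end as well.
        if re.match(pattern + "$", path):
            return handler
    return None


routes = [
    (r"/", IndexHandler),
    (r"/user/\d+", UserHandler),
]

index = find_handler(routes, "/")
user = find_handler(routes, "/user/42")
missing = find_handler(routes, "/nope")
```

Without the end anchor, the pattern r"/" would match every path, because every path starts with a slash.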

In general, the handler classes we specify inherit from web.RequestHandler, whose code is roughly as follows:

    import httplib


    class RequestHandler(object):
        """
        The RequestHandler here also lists only the core code. In addition,
        RequestHandler implements features such as reading and setting
        cookies, user authentication, and protection against CSRF attacks.
        """

        def __init__(self, request):
            self.request = request
            self._headers = {
                "Content-Type": "text/html; charset=UTF-8",
            }
            self._write_buffer = []
            self._status_code = 200

        def get(self):
            # We override this method according to the request type; besides
            # get, head, post, delete and put are also supported.
            # HTTPError is defined in the web module itself
            raise HTTPError(405)

        def write(self, chunk):
            self._write_buffer.append(chunk.encode('utf8'))

        def finish(self):
            # First generate the status line and the response headers
            lines = [self.request.version + " " + str(self._status_code) +
                     " " + httplib.responses[self._status_code]]
            lines.extend(["%s: %s" % (n, v)
                          for n, v in self._headers.iteritems()])
            headers = "\r\n".join(lines) + "\r\n\r\n"
            # Then generate the response body
            chunk = "".join(self._write_buffer)
            self.request.write(headers + chunk)

        def _execute(self):
            getattr(self, self.request.method.lower())()
            self.finish()

RequestHandler encapsulates the response. When Application calls its _execute method, it uses reflection (getattr) to look up the method we overrode, such as get, based on the request type. After running that method, it calls its own finish method to build the response message and sends it back through the request object.
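The reflection and response-building steps can be sketched in Python 3 as follows. MiniHandler is a hypothetical stand-in, not tornado's class: it keeps the response in an attribute instead of writing it to a request object, and it takes the status text from http.client.responses, the Python 3 counterpart of httplib.responses.

```python
from http.client import responses


class MiniHandler:
    """Toy handler: dispatches on the method name and builds a raw response."""

    def __init__(self, method, version="HTTP/1.0"):
        self.method = method
        self.version = version
        self._status_code = 200
        self._headers = {"Content-Type": "text/html; charset=UTF-8"}
        self._write_buffer = []
        self.response = None

    def get(self):
        self.write("hello world")

    def write(self, chunk):
        self._write_buffer.append(chunk.encode("utf8"))

    def finish(self):
        # Status line first, then the headers, then a blank line and the body.
        lines = ["%s %d %s" % (self.version, self._status_code,
                               responses[self._status_code])]
        lines.extend("%s: %s" % (n, v) for n, v in self._headers.items())
        head = ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")
        self.response = head + b"".join(self._write_buffer)

    def execute(self):
        # "GET" -> self.get, "POST" -> self.post, and so on.
        getattr(self, self.method.lower())()
        self.finish()


handler = MiniHandler("GET")
handler.execute()
```

Looking methods up by name means that supporting another request type only takes defining a method with the matching lowercase name.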

Together, Application and RequestHandler form a web application framework. To use it, we simply subclass RequestHandler and override the method that corresponds to the request type.

5. Summary

In summary, a tornado server can be divided into four layers: the event loop layer, the TCP transport layer, the HTTP layer, and the web application layer. They work together like this:

When writing the demo application, we did four things:

Subclass RequestHandler and override the method that corresponds to the request type, such as get
Define the routes for Application
Specify the application and the port for HTTPServer
Start IOLoop

This starts a tornado application. The flow of a request is as follows:

IOLoop listens for new client connections and notifies HTTPServer
HTTPServer instantiates an HTTPConnection to process this new client
HTTPConnection uses IOStream to read client request data asynchronously
IOStream registers readable events through IOLoop, reads data when the event is triggered, and then calls the callback function of HTTPConnection
HTTPConnection parses the read request data and encapsulates the parsed request data with an HTTPRequest object
HTTPConnection passes the HTTPRequest to Application
Application uses its routes to find the RequestHandler that should process the request
RequestHandler finds the processing method corresponding to the request type through reflection and processes the request
After processing is complete, RequestHandler calls the write method of HTTPRequest to write the response result
HTTPRequest hands the response results to HTTPConnection, which uses IOStream to write response data
IOStream continues to write data asynchronously using IOLoop and calls the callback function of HTTPConnection when it has finished writing
When HTTPConnection is called back, it closes the socket connection and the request ends (HTTP/1.1 keep-alive is not covered here)

5 May 2020, 08:53 | Views: 6092