tornado server implementation principles

The tornado version analyzed in this article is 1.0.0. Its relatively small codebase makes it easier to find the core parts. You can download version 1.0.0 of tornado here.

1. Basic Process

Use the following code to implement the simplest tornado server:

import tornado.httpserver
import tornado.ioloop
import tornado.web


class MainHandler(tornado.web.RequestHandler):

    def get(self):
        self.write('hello world')


if __name__ == '__main__':
    application = tornado.web.Application(
        handlers=[
            (r'/', MainHandler)
        ]
    )
    http_server = tornado.httpserver.HTTPServer(application)
    http_server.listen(8000)
    tornado.ioloop.IOLoop.instance().start()

Here, tornado's httpserver, ioloop and web modules are used. httpserver is the HTTP server, responsible for accepting and handling connections; ioloop is the underlying event loop, responsible for issuing notifications when events occur on the sockets it listens on; and the web module corresponds to the web application.

Overall, a tornado server can be divided into four layers, and the workflow is roughly as follows:

The picture above may be a bit complicated. It doesn't matter if you don't understand it at the moment. It will be explained in detail later.

2. Asynchronous non-blocking socket

tornado's high performance mainly comes from two classes, ioloop.IOLoop and iostream.IOStream. IOLoop is an event loop that monitors and schedules different socket objects through epoll. IOStream is a wrapper around a socket object; relying on IOLoop's event loop, it implements non-blocking socket reads and writes with asynchronous callbacks.

The main code for ioloop.IOLoop is as follows:

import select
import logging


class IOLoop(object):
    _EPOLLIN = 0x001
    _EPOLLPRI = 0x002
    _EPOLLOUT = 0x004
    _EPOLLERR = 0x008
    _EPOLLHUP = 0x010
    _EPOLLRDHUP = 0x2000
    _EPOLLONESHOT = (1 << 30)
    _EPOLLET = (1 << 31)

    # Event types that can be listened for; the names are self-explanatory
    NONE = 0
    READ = _EPOLLIN
    WRITE = _EPOLLOUT
    ERROR = _EPOLLERR | _EPOLLHUP | _EPOLLRDHUP

    def __init__(self):
        self._impl = select.epoll()  # Real tornado falls back to kqueue or select on systems without epoll
        self._handlers = {}

    @classmethod
    def instance(cls):
        # When an IOLoop object is needed, do not instantiate it directly; call this class method instead, which keeps IOLoop a singleton
        if not hasattr(cls, "_instance"):
            cls._instance = cls()
        return cls._instance

    def add_handler(self, fd, handler, events):
        self._handlers[fd] = handler
        self._impl.register(fd, events | self.ERROR)

    def update_handler(self, fd, events):
        self._impl.modify(fd, events | self.ERROR)

    def remove_handler(self, fd):
        self._handlers.pop(fd, None)
        try:
            self._impl.unregister(fd)
        except (OSError, IOError):
            logging.debug("Error deleting fd from IOLoop", exc_info=True)

    def start(self):
        while 1:
            event_pairs = self._impl.poll()
            for fd, events in event_pairs:
                self._handlers[fd](fd, events)

IOLoop is essentially a wrapper around epoll, and it is simple to use. First, we call the add_handler, update_handler and remove_handler methods to set the file descriptors to listen on, the events of interest, and the callback functions. Then, once the start method is called, IOLoop polls epoll and calls the corresponding callback whenever an event arrives. This is how monitoring and scheduling are implemented.
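
The register-then-dispatch pattern described above can be tried out with a small sketch. It is not tornado's code: it is written for Python 3 (the excerpts in this article are Python 2), it uses the standard selectors module in place of select.epoll so that it also runs on systems without epoll, and the names MiniLoop and run_demo are made up for illustration.

```python
import selectors
import socket


class MiniLoop:
    """Tiny stand-in for IOLoop, built on the standard selectors module."""

    def __init__(self):
        self._selector = selectors.DefaultSelector()
        self._handlers = {}
        self._running = False

    def add_handler(self, fd, handler, events):
        # Remember the callback and ask the selector to watch the descriptor.
        self._handlers[fd] = handler
        self._selector.register(fd, events)

    def remove_handler(self, fd):
        self._handlers.pop(fd, None)
        self._selector.unregister(fd)

    def stop(self):
        self._running = False

    def start(self):
        self._running = True
        while self._running:
            # Wait for events, then call the callback registered for each fd.
            for key, events in self._selector.select(timeout=1):
                self._handlers[key.fd](key.fd, events)


def run_demo():
    loop = MiniLoop()
    reader, writer = socket.socketpair()
    received = []

    def on_readable(fd, events):
        received.append(reader.recv(1024))
        loop.remove_handler(fd)
        loop.stop()

    loop.add_handler(reader.fileno(), on_readable, selectors.EVENT_READ)
    writer.send(b"ping")   # makes the reader side readable
    loop.start()           # returns once on_readable has stopped the loop
    reader.close()
    writer.close()
    return received
```

Calling run_demo() registers one callback, triggers it by sending data on the other end of a socket pair, and returns what the callback collected.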

The main code for the iostream.IOStream class is as follows:

import errno
import logging
import socket


class IOStream:

    def __init__(self, socket, io_loop, read_chunk_size=4096):
        self.socket = socket
        self.socket.setblocking(False)
        self.io_loop = io_loop
        self.read_chunk_size = read_chunk_size
        self._read_buffer = ""
        self._write_buffer = ""
        self._read_delimiter = None
        self._read_callback = None
        self._write_callback = None
        self._state = self.io_loop.ERROR
        self.io_loop.add_handler(
            self.socket.fileno(), self._handle_events, self._state)

    def read_until(self, delimiter, callback):
        loc = self._read_buffer.find(delimiter)
        if loc != -1:
            callback(self._consume(loc + len(delimiter)))
            return
        self._read_delimiter = delimiter
        self._read_callback = callback
        self._add_io_state(self.io_loop.READ)

    def write(self, data, callback=None):
        self._write_buffer += data
        self._add_io_state(self.io_loop.WRITE)
        self._write_callback = callback

    def _consume(self, loc):
        # Cut the first loc characters off the read buffer and return them
        result = self._read_buffer[:loc]
        self._read_buffer = self._read_buffer[loc:]
        return result

    def close(self):
        if self.socket is not None:
            self.io_loop.remove_handler(self.socket.fileno())
            self.socket.close()
            self.socket = None

    def _add_io_state(self, state):
        # Call this method to add events to listen for
        if not self._state & state:
            self._state = self._state | state
            self.io_loop.update_handler(self.socket.fileno(), self._state)

    def _handle_events(self, fd, events):
        # This method is called back by the event loop.
        # It first dispatches to the matching method based on the event type,
        # then updates the events registered in the event loop based on the result
        if events & self.io_loop.READ:
            self._handle_read()
        if not self.socket:
            return
        if events & self.io_loop.WRITE:
            self._handle_write()
        if not self.socket:
            return
        if events & self.io_loop.ERROR:
            self.close()
            return
        # Work out whether there is still data to read or write, then re-register the events
        state = self.io_loop.ERROR
        if self._read_delimiter:
            state |= self.io_loop.READ
        if self._write_buffer:
            state |= self.io_loop.WRITE
        if state != self._state:
            self._state = state
            self.io_loop.update_handler(self.socket.fileno(), self._state)

    def _handle_read(self):
        # Called when there is a readable event; reads data from the socket and appends it to self._read_buffer
        try:
            chunk = self.socket.recv(self.read_chunk_size)
        except socket.error, e:
            if e[0] in (errno.EWOULDBLOCK, errno.EAGAIN):
                return
            else:
                logging.warning("Read error on %d: %s",
                                self.socket.fileno(), e)
                self.close()
                return
        if not chunk:
            self.close()
            return
        self._read_buffer += chunk
        # If a delimiter is set and it has been read, stop reading and run the callback
        if self._read_delimiter:
            loc = self._read_buffer.find(self._read_delimiter)
            if loc != -1:
                callback = self._read_callback
                delimiter_len = len(self._read_delimiter)
                self._read_callback = None
                self._read_delimiter = None
                callback(self._consume(loc + delimiter_len))

    def _handle_write(self):
        # Called when there is a writable event; sends self._write_buffer to the socket until it is empty or the socket cannot take any more
        while self._write_buffer:
            try:
                num_bytes = self.socket.send(self._write_buffer)
                self._write_buffer = self._write_buffer[num_bytes:]
            except socket.error, e:
                if e[0] in (errno.EWOULDBLOCK, errno.EAGAIN):
                    break
                else:
                    logging.warning("Write error on %d: %s",
                                    self.socket.fileno(), e)
                    self.close()
                    return
        # Once everything has been written, call the stored callback
        if not self._write_buffer and self._write_callback:
            callback = self._write_callback
            self._write_callback = None
            callback()

IOStream is essentially a wrapper around a socket object that makes it asynchronous through the event loop. When we call its read_until or write method, IOStream does not block waiting for the data to be read or written. Instead, it stores a callback function and calls the _add_io_state method to start listening for readable or writable events in the event loop. (read_until first checks the read buffer and invokes the callback right away if the delimiter is already there.) When the event loop detects an event, it calls IOStream's _handle_events method, which in turn calls _handle_read or _handle_write depending on the event type to read or write the data, and then invokes the previously stored callback. At that point one read or write is complete.
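
To make the read_until bookkeeping concrete, here is a minimal Python 3 sketch with the socket and the event loop removed. DelimitedBuffer, feed and run_demo are hypothetical names for this illustration only; feed plays the role of _handle_read receiving a chunk.

```python
class DelimitedBuffer:
    """Hypothetical helper mirroring IOStream's read-buffer bookkeeping."""

    def __init__(self):
        self._buffer = b""
        self._delimiter = None
        self._callback = None

    def read_until(self, delimiter, callback):
        # Remember what the caller is waiting for, then check whether the
        # data already buffered satisfies it.
        self._delimiter = delimiter
        self._callback = callback
        self._try_deliver()

    def feed(self, chunk):
        # Stands in for _handle_read: a chunk has arrived from the socket.
        self._buffer += chunk
        self._try_deliver()

    def _try_deliver(self):
        if self._delimiter is None:
            return
        loc = self._buffer.find(self._delimiter)
        if loc == -1:
            return  # delimiter not seen yet, keep buffering
        end = loc + len(self._delimiter)
        data, self._buffer = self._buffer[:end], self._buffer[end:]
        # Clear the pending request before running the callback, so that the
        # callback may itself call read_until again.
        callback, self._callback, self._delimiter = self._callback, None, None
        callback(data)


def run_demo():
    seen = []
    buf = DelimitedBuffer()
    buf.read_until(b"\r\n\r\n", seen.append)
    buf.feed(b"GET / HTTP/1.0\r\nHost: a")   # delimiter not complete yet
    pending = list(seen)
    buf.feed(b"\r\n\r\nleftover")            # delimiter arrives
    return pending, seen, buf._buffer
```

The callback fires only after the second chunk completes the delimiter, and anything after the delimiter stays in the buffer for the next read.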

In addition, IOStream sets its socket to non-blocking mode so that a call never blocks when the socket is not yet readable or writable. The main reasons for tornado's high performance are event loop callbacks and non-blocking sockets. First, the asynchronous callback mechanism allows tornado to maintain many socket connections in a single thread at the same time. When a connection triggers an event, a callback is called to handle it. Second, because the sockets are non-blocking, handling an event never stalls the thread, which makes the best use of CPU time.
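
The effect of non-blocking mode is easy to observe in isolation. The following Python 3 sketch (try_recv and run_demo are names invented here) shows that a read on an empty non-blocking socket returns control immediately; in Python 3 the EWOULDBLOCK / EAGAIN case surfaces as BlockingIOError rather than as the socket.error check used in the Python 2 excerpts.

```python
import select
import socket


def try_recv(sock, size=4096):
    """Return the bytes received, or None if the read would have blocked."""
    try:
        return sock.recv(size)
    except BlockingIOError:
        # Python 3 maps EWOULDBLOCK / EAGAIN to BlockingIOError.
        return None


def run_demo():
    reader, writer = socket.socketpair()
    reader.setblocking(False)
    first = try_recv(reader)              # nothing has been sent yet
    writer.send(b"hi")
    select.select([reader], [], [], 1)    # wait until the socket is readable
    second = try_recv(reader)
    reader.close()
    writer.close()
    return first, second
```

The first call comes back empty-handed instead of stalling the thread; the second succeeds because the code waited for a readable event first, which is the job IOLoop does for IOStream.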

Overall, the workflow of IOStream + IOLoop is as follows:

   

3. Web Server

There are three classes in the httpserver module: HTTPServer, HTTPConnection and HTTPRequest. HTTPServer wraps the server-side socket and is responsible for accepting client connections. Each connection is handled by an HTTPConnection, which uses the iostream module to read the client's request data, wraps that data in an HTTPRequest object, and hands the object to the web application for processing.

The main code for HTTPServer is as follows:

import errno
import socket

from tornado import ioloop
from tornado import iostream


class HTTPServer:

    def __init__(self, application):
        self.application = application
        self.io_loop = ioloop.IOLoop.instance()
        self._socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM, 0)
        self._socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        self._socket.setblocking(0)

    def listen(self, port, address=''):
        self._socket.bind((address, port))
        self._socket.listen(128)
        self.io_loop.add_handler(self._socket.fileno(),
                                 self._handle_events,
                                 ioloop.IOLoop.READ)

    def _handle_events(self, fd, events):
        while 1:
            try:
                connection, address = self._socket.accept()
            except socket.error, e:
                # All pending client connections have been accepted; leave the loop
                if e[0] in (errno.EWOULDBLOCK, errno.EAGAIN):
                    return
                raise
            stream = iostream.IOStream(connection, io_loop=self.io_loop)
            HTTPConnection(stream, address, self.application)

HTTPServer is the entry point of the web server. First, we instantiate it, passing in the web application that the server will serve. Then we call its listen method, which has ioloop listen for readable events on the listening socket bound to the specified port, that is, for incoming client connections. When a client connects, HTTPServer first instantiates an IOStream object, which wraps the client socket, and then creates a new HTTPConnection object to handle the new connection.
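
The "accept until it would block" loop in _handle_events can be exercised on its own. This is a Python 3 sketch, not tornado's code; accept_all and run_demo are hypothetical names, and binding to port 0 (letting the operating system pick a free port) is a convenience of the demo.

```python
import select
import socket


def accept_all(listener):
    """Accept every pending connection on a non-blocking listening socket."""
    accepted = []
    while True:
        try:
            connection, address = listener.accept()
        except BlockingIOError:
            # The backlog is drained; hand control back to the caller.
            return accepted
        accepted.append((connection, address))


def run_demo():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.setblocking(False)
    listener.bind(("127.0.0.1", 0))    # port 0: let the OS choose a free port
    listener.listen(128)
    port = listener.getsockname()[1]

    clients = [socket.create_connection(("127.0.0.1", port)) for _ in range(2)]

    accepted = []
    while len(accepted) < 2:
        select.select([listener], [], [], 1)   # wait for a readable event
        accepted.extend(accept_all(listener))

    for sock in clients + [conn for conn, _ in accepted] + [listener]:
        sock.close()
    return len(accepted)
```

A single readable event on the listening socket can stand for several queued clients, which is why the handler keeps accepting until the socket reports that nothing is left.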

The main code for HTTPConnection is as follows:

import tornado.httputil


class HTTPConnection:

    def __init__(self, stream, address, application):
        self.stream = stream
        self.address = address
        self.application = application
        self.stream.read_until("\r\n\r\n", self._on_headers)

    def _on_headers(self, data):
        eol = data.find("\r\n")
        start_line = data[:eol]
        method, uri, version = start_line.split(" ")
        # Parse the header lines into a dictionary-like object
        headers = tornado.httputil.HTTPHeaders.parse(data[eol:])
        self._request = HTTPRequest(
            connection=self, method=method, uri=uri, version=version,
            headers=headers, remote_ip=self.address[0])
        self.application(self._request)

    def write(self, chunk):
        self.stream.write(chunk, self._on_write_complete)

    def _on_write_complete(self):
        self.stream.close()

After HTTPServer accepts a new connection, HTTPConnection takes over. First, HTTPConnection uses IOStream to read the client's request data asynchronously. It then parses the request line and the request headers, wraps the result in an HTTPRequest object, and passes that object to the web application. After the web application has finished processing the request, it calls HTTPConnection's write method, which writes the response data through IOStream and then closes the socket connection.
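
As a rough illustration of the parsing step, the following Python 3 sketch splits a raw request head into the request line and a plain dict of headers. parse_request_head is a hypothetical function, and it is deliberately simpler than tornado.httputil.HTTPHeaders: it does not normalize header names or handle repeated or continued headers.

```python
def parse_request_head(data):
    """Split raw request bytes (ending in a blank line) into their parts."""
    text = data.decode("latin-1")
    eol = text.find("\r\n")
    # Request line, for example "GET /hello HTTP/1.1"
    method, uri, version = text[:eol].split(" ")
    headers = {}
    for line in text[eol:].split("\r\n"):
        if not line:
            continue  # skip the empty strings around the blank line
        name, value = line.split(":", 1)
        headers[name.strip()] = value.strip()
    return method, uri, version, headers


def run_demo():
    raw = b"GET /hello?x=1 HTTP/1.1\r\nHost: example.com\r\nAccept: */*\r\n\r\n"
    return parse_request_head(raw)
```

The input to this function is exactly what read_until("\r\n\r\n", ...) delivers to _on_headers: everything up to and including the blank line that ends the headers.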

HTTPRequest is mainly a wrapper around the request data and needs little explanation. Its main code is as follows:

import urlparse


class HTTPRequest:

    def __init__(self, method, uri, version="HTTP/1.0", headers=None,
                 remote_ip=None, connection=None):
        self.method = method
        self.uri = uri
        self.version = version
        self.headers = headers
        self.remote_ip = remote_ip
        self.host = self.headers.get("Host") or "127.0.0.1"
        self.connection = connection

        scheme, netloc, path, query, fragment = urlparse.urlsplit(uri)
        self.path = path
        self.query = query

    def write(self, chunk):
        # The web application calls this method to write response data; it goes through HTTPConnection and is ultimately written by IOStream
        self.connection.write(chunk)
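
For reference, this is what the urlsplit call in the constructor produces for a typical request URI. The sketch uses Python 3, where the function lives in urllib.parse instead of the Python 2 urlparse module; split_uri is a hypothetical helper, and the parse_qs step is an addition that is not part of the excerpt above.

```python
from urllib.parse import parse_qs, urlsplit


def split_uri(uri):
    """Return the path, the raw query string and the decoded query arguments."""
    parts = urlsplit(uri)
    return parts.path, parts.query, parse_qs(parts.query)


path, query, arguments = split_uri("/search?q=tornado&page=2")
```

Here path is "/search" and query is "q=tornado&page=2"; the path is what the web application later uses for routing.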

This completes an http server with a process like the following:

   

4. Web Applications

The responsibility of the web application is to receive the request data from the web server and return the response results after performing some logic based on the data. The web module of tornado is responsible for the web application.

First, let's analyze the web.Application class, which, in short, has code similar to the following:

import re


class Application(object):
    """
    //In fact, this class does other things, such as setting debug mode, specifying wsgi, and so on.
    //In addition, the mapping relationship of routes is actually encapsulated by the web.URLSpec class
    //However, these are not the main points. This code is just for ease of understanding and to illustrate what the Application does
    """

    def __init__(self, handlers):
        self.handlers = handlers

    def __call__(self, request):
        path = request.path
        for pattern, handler in self.handlers:
            if re.match(pattern, path):
                # Instantiate the first matching handler and let it process the request
                h_obj = handler(request)
                h_obj._execute()
                break

web.Application is the entry point of the web application. As the code above shows, it is responsible for route dispatching. First we instantiate it, passing in parameters such as handlers=[(r'/', MainHandler)]. Then, when the application object is called with a request, it finds the handler class that matches the request path, instantiates it, and calls the handler's _execute method so that the handler object performs the actual work.
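
Route dispatching boils down to "first pattern that matches wins". The Python 3 sketch below isolates that step; dispatch, home and user are hypothetical names. Note that re.match only anchors at the start of the string, so the sketch appends "$" to each pattern to keep r"/" from matching every path. That anchoring is a choice made for this sketch.

```python
import re


def home():
    return "home page"


def user():
    return "user page"


def dispatch(routes, path):
    """Return the first handler whose pattern matches the whole path, else None."""
    for pattern, handler in routes:
        # Anchor at the end so that r"/" does not also match "/user/42".
        if re.match(pattern + "$", path):
            return handler
    return None


ROUTES = [
    (r"/", home),
    (r"/user/(\d+)", user),
]
```

With these routes, dispatch(ROUTES, "/user/42") returns user, and a path that matches no pattern returns None, which a real server would turn into a 404 response.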

In general, the handler class we specify inherits from web.RequestHandler, whose code is roughly as follows:

import httplib


class RequestHandler(object):
    """
    //The RequestHandler here also lists only the core code
    //In addition, RequestHandler implements functions such as cookie acquisition and setup, user authentication, and anti-csrf attacks.
    """

    def __init__(self, request):
        self.request = request
        self._headers = {
            "Content-Type": "text/html; charset=UTF-8",
        }
        self._write_buffer = []
        self._status_code = 200

    def get(self):
        # Subclasses override this method according to the request type; besides get, head, post, delete and put are also supported
        raise HTTPError(405)

    def write(self, chunk):
        self._write_buffer.append(chunk.encode('utf8'))

    def finish(self):
        # First generate response status and response header
        lines = [self.request.version + " " + str(self._status_code) + " " +
                 httplib.responses[self._status_code]]
        lines.extend(["%s: %s" % (n, v) for n, v in self._headers.iteritems()])
        headers = "\r\n".join(lines) + "\r\n\r\n"
        # Then generate the response content
        chunk = "".join(self._write_buffer)
        self.request.write(headers + chunk)

    def _execute(self):
        getattr(self, self.request.method.lower())()
        self.finish()

RequestHandler encapsulates the response. When Application calls its _execute method, it uses reflection to look up the method we overrode, such as get, based on the request type. After running that method, it calls its own finish method to generate the response message and sends it back through the request object.
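
The reflection step is just a getattr lookup on the lowercased method name. Here is a self-contained Python 3 sketch of it together with the response assembly; MiniHandler and its SUPPORTED tuple are made up for this illustration, and the reason phrases come from http.client, the Python 3 counterpart of httplib.

```python
from http.client import responses


class MiniHandler:
    """Hypothetical handler showing method lookup and response assembly."""

    SUPPORTED = ("get", "head", "post", "delete", "put")

    def __init__(self, method, version="HTTP/1.0"):
        self.method = method
        self.version = version
        self._status_code = 200
        self._chunks = []

    def write(self, chunk):
        self._chunks.append(chunk.encode("utf8"))

    def get(self):
        self.write("hello world")

    def execute(self):
        name = self.method.lower()
        # Reflection: "GET" -> self.get, but only for known HTTP methods.
        func = getattr(self, name, None) if name in self.SUPPORTED else None
        if func is None:
            self._status_code = 405
        else:
            func()
        return self.finish()

    def finish(self):
        # Status line and headers first, then the buffered body.
        status_line = "%s %d %s" % (
            self.version, self._status_code, responses[self._status_code])
        head = status_line + "\r\nContent-Type: text/html; charset=UTF-8\r\n\r\n"
        return head.encode("latin-1") + b"".join(self._chunks)
```

MiniHandler("GET").execute() returns a complete 200 response, while a method the class does not define, such as POST here, produces a 405 status line.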

Together, Application and RequestHandler form a framework for web applications: to use it, we simply inherit from the RequestHandler class and override the method that corresponds to the request type.

5. Summary

In summary, tornado servers can be divided into four layers: the Event Loop Layer, the TCP Transport Layer, the HTTP Layer, and the web application, which work like this:

In the phase of writing demo applications, we did four things:

  • Inherit from RequestHandler and override the method corresponding to the request type, such as the get method
  • Define the routes for Application
  • Specify the application and port for HTTPServer
  • Start IOLoop

This starts a tornado application and the flow of a request is as follows:

  1. IOLoop listens for new client connections and notifies HTTPServer
  2. HTTPServer instantiates an HTTPConnection to process this new client
  3. HTTPConnection uses IOStream to read client request data asynchronously
  4. IOStream registers readable events through IOLoop, reads data when the event is triggered, and then calls the callback function of HTTPConnection
  5. HTTPConnection parses the read request data and encapsulates the parsed request data with an HTTPRequest object
  6. HTTPConnection passes the HTTPRequest to Application
  7. Application uses its routes to find the RequestHandler that should process the request
  8. RequestHandler finds the processing method corresponding to the request type through reflection and processes the request
  9. After processing is complete, RequestHandler calls the write method of HTTPRequest to write the response
  10. HTTPRequest hands the response to HTTPConnection, which uses IOStream to write the response data
  11. IOStream writes the data asynchronously using IOLoop and calls the callback function of HTTPConnection when it has finished writing
  12. When HTTPConnection is called back, it closes the socket connection and the request ends (HTTP/1.1 and keep-alive are not discussed here)
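
The chain of callbacks in steps 3 to 12 can be walked through without any real sockets. In the Python 3 sketch below, FakeStream is a stand-in for IOStream that serves canned bytes and completes every operation immediately, so the event loop of steps 1 and 2 is left out; all class and function names are hypothetical, and the canned input is assumed to contain the header delimiter.

```python
from urllib.parse import urlsplit


class FakeStream:
    """Stand-in for IOStream: serves canned input and records the output."""

    def __init__(self, incoming):
        self.incoming = incoming
        self.written = b""
        self.closed = False

    def read_until(self, delimiter, callback):
        # Assumes the delimiter is present in the canned input.
        end = self.incoming.find(delimiter) + len(delimiter)
        data, self.incoming = self.incoming[:end], self.incoming[end:]
        callback(data)

    def write(self, data, callback=None):
        self.written += data
        if callback:
            callback()

    def close(self):
        self.closed = True


class Request:
    def __init__(self, connection, method, uri, version):
        self.connection = connection
        self.method = method
        self.version = version
        self.path = urlsplit(uri).path

    def write(self, chunk):
        self.connection.write(chunk)                     # step 10


class Connection:
    def __init__(self, stream, application):
        self.stream = stream
        self.application = application
        self.stream.read_until(b"\r\n\r\n", self._on_headers)   # steps 3-4

    def _on_headers(self, data):
        start_line = data.split(b"\r\n", 1)[0].decode("latin-1")
        method, uri, version = start_line.split(" ")             # step 5
        self.application(Request(self, method, uri, version))    # step 6

    def write(self, chunk):
        # Close the stream once the write has completed (steps 11-12).
        self.stream.write(chunk, self.stream.close)


def application(request):                                        # steps 7-9
    found = request.path == "/"
    status = b"200 OK" if found else b"404 Not Found"
    body = b"hello world" if found else b"not found"
    request.write(
        request.version.encode("latin-1") + b" " + status + b"\r\n\r\n" + body)


def run_demo():
    stream = FakeStream(b"GET / HTTP/1.0\r\nHost: localhost\r\n\r\n")
    Connection(stream, application)
    return stream.written, stream.closed
```

Running run_demo() pushes one canned request through the whole chain and leaves the response bytes in stream.written, with the stream marked as closed.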

Posted on Tue, 05 May 2020 08:53:58 -0400 by jon23d