2021-2022-1-diocs-TCP/IP and network programming

1, Task details

Self study textbook chapter 13, submit study notes (10 points)

Summary of knowledge points and their most rewarding content (3 points)

Problems and solutions (2 points)

Practice content and screenshot, code link (3 points)

... (knowledge structure, knowledge integrity, etc., submitting markdown documents, using openeuler system, etc.) (2 points)

2, Summary of knowledge points

This chapter discusses TCP/IP and network programming and is divided into two parts. The first part covers the TCP/IP protocols and their applications, including the TCP/IP stack, IP addresses, host names, DNS, IP packets, and routers. It introduces the UDP and TCP protocols, port numbers, and data flow in TCP/IP networks; describes the server-client computing model and the socket programming interface; and demonstrates network programming with UDP and TCP socket examples. The first programming project implements a pair of TCP server and client programs that perform file operations over the Internet, allowing users to define additional communication protocols to transfer file contents reliably.
The second part of this chapter introduces Web and CGI programming. It explains the HTTP programming model, Web pages, and Web browsers; shows how to configure the Linux HTTPD server to support user Web pages, PHP, and CGI programming; explains client-side and server-side dynamic Web pages; and demonstrates how to create server-side dynamic Web pages using PHP and CGI.

1.TCP/IP protocol

TCP/IP (Comer 1988, 2001; RFC 1180 1991) is the foundation of the Internet. TCP stands for Transmission Control Protocol. IP stands for Internet Protocol. There are currently two versions of IP, IPv4 and IPv6. IPv4 uses 32-bit addresses and IPv6 uses 128-bit addresses. This section focuses on IPv4, which is still the most widely used IP version. TCP/IP is organized into several layers, commonly known as the TCP/IP stack. The figure shows each layer of TCP/IP together with its representative components and functions.

Data transmission at or above the transport layer, between processes and hosts, is only logical. The actual data transmission takes place at the Internet (IP) and link layers, which divide data packets into data frames for transmission across physical networks. The following figure shows the data flow path in a TCP/IP network.

2.IP host and IP address

An IP address is divided into two parts: a NetworkID field and a HostID field. Depending on where the division falls, IP addresses are classified into classes A to E. For example, a class B IP address consists of a 16-bit NetworkID, whose first two bits are 10, followed by a 16-bit HostID field. Packets destined for an IP address are first sent to a router with the same NetworkID. The router then forwards the packet to the specific host in its network identified by the HostID. Every host has a local host name, localhost, whose default IP address is 127.0.0.1. The link layer of the local host is a loopback virtual device, which routes every packet back to the same localhost. This feature allows us to run TCP/IP applications on a single computer without actually connecting to the Internet.

3.IP protocol

The IP protocol is used to send and receive packets between IP hosts. IP is a best-effort protocol: an IP host simply sends packets toward the receiving host, but it cannot guarantee that the packets reach their destination or arrive in order. This means that IP is not a reliable protocol. If reliability is needed, it must be implemented above the IP layer. The following figure shows the IP header format:

4.UDP/TCP

UDP (User Datagram Protocol) (RFC 768 1980; Comer 1988) runs over IP to send and receive datagrams. Like IP, UDP does not guarantee reliability, but it is fast and efficient. It can be used in situations where reliability is not important.

TCP (Transmission Control Protocol) is a connection-oriented protocol used to send and receive data streams. TCP also runs over IP, but it guarantees reliable data transmission. Roughly speaking, UDP is like sending mail through USPS, while TCP is like a telephone connection.

5. Port number

Application = (host IP, protocol, port number)
The protocol is TCP or UDP, and the port number is a unique unsigned short integer assigned to the application. To use UDP or TCP, an application (process) must first choose or obtain a port number. The first 1024 port numbers are reserved. The other port numbers are available for general use. An application can choose an available port number or let the operating system kernel assign one. The following figure shows some applications that use TCP in the transport layer and their default port numbers.

6. Data flow in TCP / IP network

In the figure, data from the application layer is passed to the transport layer, which adds a TCP or UDP header to identify the transport protocol used. The combined data is passed to the IP network layer, which adds an IP header containing the IP addresses that identify the sending and receiving hosts. The combined data is then passed to the network link layer, which divides the data into frames and adds the addresses of the sending and receiving networks for transmission between physical networks. The mapping of IP addresses to network addresses is performed by the Address Resolution Protocol (ARP) (ARP 1982). At the receiving end, the process is reversed. Each layer unpacks the received data by stripping its header, reassembles the data, and passes it to the layer above. The original data of the application on the sending host is eventually delivered to the corresponding application on the receiving host.

7. Socket programming

(1) Socket address

struct sockaddr_in {
    sa_family_t    sin_family;  // AF_INET for TCP/IP
    in_port_t      sin_port;    // port number
    struct in_addr sin_addr;    // IP address
};

// internet address
struct in_addr {
    uint32_t s_addr;            // IP address in network byte order
};

In the socket address structure,
● sin_family is always set to AF_INET for TCP/IP networks.

● sin_port contains the port number in network byte order.

● sin_addr is the host IP address in network byte order.

(2) Socket API

The server must create a socket and bind it to a socket address containing the server IP address and port number. It can use a fixed port number or let the operating system kernel choose one (if sin_port is 0). To communicate with the server, the client must also create a socket. For UDP sockets, binding the socket to the server address is optional. If the socket is not bound to any specific server, the client must supply a socket address containing the server IP and port number in subsequent sendto() / recvfrom() calls.

(3) TCP/UDP socket

UDP sockets use sendto() / recvfrom() to send / receive datagrams.

ssize_t sendto(int sockfd, const void *buf, size_t len, int flags,
               const struct sockaddr *dest_addr, socklen_t addrlen);
ssize_t recvfrom(int sockfd, void *buf, size_t len, int flags,
                 struct sockaddr *src_addr, socklen_t *addrlen);

After creating the socket and binding it to the server address, the TCP server uses listen() and accept() to receive connections from clients.
int listen(int sockfd, int backlog);
listen() marks the socket referred to by sockfd as a socket that will be used to accept incoming connections. The backlog parameter defines the maximum length of the queue of pending connections.
int accept(int sockfd, struct sockaddr *addr, socklen_t *addrlen);

(4) Universal socket address structure

Universal socket address structure: sockaddr

struct sockaddr {
    uint8_t     sa_len;
    sa_family_t sa_family;
    char        sa_data[14];
};

IPv6 socket address structure
The IPv6 socket address structure is defined in the <netinet/in.h> header file.

struct in6_addr
{
    uint8_t s6_addr[16];
};

#define SIN6_LEN
struct sockaddr_in6 {
    uint8_t         sin6_len;
    sa_family_t     sin6_family;
    in_port_t       sin6_port;
    uint32_t        sin6_flowinfo;
    struct in6_addr sin6_addr;
    uint32_t        sin6_scope_id;
};

The newer struct sockaddr_storage is large enough to hold any socket address structure supported by the system. The sockaddr_storage structure is defined in the <netinet/in.h> header file.

struct sockaddr_storage {
    uint8_t     ss_len;
    sa_family_t ss_family;
};

8. Byte ordering functions

Little-endian and big-endian (the two ways to store a two-byte value in memory)

Little-endian: stores the low-order byte at the starting address

Big-endian: stores the high-order byte at the starting address

Host byte order: the byte order used by a given system

Program to output byte order:

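/* unp.h is the helper header from the book "UNIX Network Programming";
   it pulls in the standard headers and defines CPU_VENDOR_OS */
#include "unp.h"

int main(int argc, char **argv)
{
    union {
        short s;
        char  c[sizeof(short)];
    } un;

    un.s = 0x0102;
    printf("%s: ", CPU_VENDOR_OS);
    if (sizeof(short) == 2) {
        /* look at which byte sits at the lowest address */
        if (un.c[0] == 1 && un.c[1] == 2)
            printf("big-endian\n");
        else if (un.c[0] == 2 && un.c[1] == 1)
            printf("little-endian\n");
        else
            printf("unknown\n");
    } else
        printf("sizeof(short) = %zu\n", sizeof(short));
    exit(0);
}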

9. Byte manipulation functions

bzero: sets the specified number of bytes of the target byte string to 0. We often use this function to initialize a socket address structure to 0.

bcopy: moves the specified number of bytes from the source byte string to the destination byte string.

bcmp: compares two arbitrary byte strings. If they are the same, the return value is 0; otherwise, the return value is non-0.

memset: sets the specified number of bytes of the target byte string to the value c.

memcmp: compares two arbitrary byte strings. It returns 0 if they are the same and a non-0 value otherwise. Whether the value is greater than 0 or less than 0 depends on the first unequal byte.

A simple definition of the inet_pton function that supports only IPv4:

int inet_pton(int family, const char *strptr, void *addrptr)
{
    if (family == AF_INET) {
        struct in_addr in_val;

        if (inet_aton(strptr, &in_val)) {
            memcpy(addrptr, &in_val, sizeof(struct in_addr));
            return (1);
        }
        return (0);
    }
    errno = EAFNOSUPPORT;
    return (-1);
}

3, The most rewarding content

Web and CGI programming

The World Wide Web (WWW), or the Web, is a combination of resources and users on the Internet that uses the Hypertext Transfer Protocol (HTTP) (RFC 2616 1999) for information exchange. Since its debut in the early 1990s, and with the continuing expansion of the Internet's capabilities, the Web has become an indispensable part of daily life for people all over the world. It is therefore important for computer science students to understand this technology. This section introduces the basics of HTTP and Web programming. Web programming in general covers the writing, markup, and coding involved in Web development, including Web content, Web client and server scripts, and network security. In a narrower sense, Web programming refers to creating and maintaining Web pages. The most commonly used languages in Web programming are HTML, XHTML, JavaScript, Perl 5, and PHP.

HTTP programming model

HTTP is a server-client based protocol for applications on the Internet. It runs on TCP because it requires reliable file transfer. Figure 13.10 shows the HTTP programming model.

In HTTP, a client can issue multiple URLs to send requests to different HTTP servers. It is neither necessary nor desirable for a client to maintain a permanent connection to a specific server. The client connects to the server only to send a request, and the connection is closed after the request is sent. Similarly, the server connects to the client only to send a response, and the connection is closed again afterwards. Each request or reply requires a separate connection. This means that HTTP is a stateless protocol, because no information needs to be maintained between successive requests or responses. Naturally, this leads to considerable system overhead and inefficiency. To compensate for the lack of state information, HTTP servers and clients can use cookies to provide and maintain some state information between them.

Web pages

Web pages are files written in the HTML markup language. A Web file specifies the layout of a Web page through a series of HTML elements, which a Web browser can interpret and display. Common Web browsers include Internet Explorer, Firefox, Google Chrome, etc. Creating a Web page amounts to creating a text file that uses HTML elements as building blocks. It is more clerical work than programming. Therefore, we will not discuss how to create Web pages. Instead, we will use only a sample HTML file to illustrate the nature of a Web page. A simple HTML Web file is given below.

CGI programming

CGI stands for the Common Gateway Interface (RFC 3875 2004). It is a protocol that allows a Web server to execute programs and dynamically generate Web pages according to user input. With CGI, a Web server does not have to maintain millions of static Web page files to satisfy client requests. Instead, it satisfies client requests by generating Web pages dynamically. Figure 13.14 shows the CGI programming model.

In the CGI programming model, the client sends a request, usually an HTML form containing the input and the name of the CGI program for the server to execute. After receiving the request, the httpd server forks a child process to execute the CGI program. The CGI program can use the user input to query a database system, such as MySQL, and generate an HTML file based on that input. When the child process ends, the httpd server sends the generated HTML file back to the client. CGI programs can be written in any programming language, such as C, sh scripts, and Perl.

4, Practice content (screenshot, code link)

Code link:

https://gitee.com/two_thousand_and_thirteen/zx-code/issues/I4J8R1

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

#define MAXLINE 256
#define PORT 7777

/* Print msg followed by the errno description, then exit. */
void sys_err(const char *msg){
    perror(msg);
    exit(-1);
}

int main(int argc, char **argv){
    int sockFd, n;
    char recvLine[MAXLINE];
    struct sockaddr_in servAddr;

    if (argc != 2) {
        /* not a system call failure, so perror() would be misleading */
        fprintf(stderr, "usage: a.out <IPaddress>\n");
        exit(-1);
    }
    if ((sockFd = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
        sys_err("socket error");
    }
    memset(&servAddr, 0, sizeof(servAddr));
    servAddr.sin_family = AF_INET;
    servAddr.sin_port = htons(PORT);
    if (inet_pton(AF_INET, argv[1], &servAddr.sin_addr) <= 0) {
        sys_err("inet_pton error");
    }
    if (connect(sockFd, (struct sockaddr *)&servAddr, sizeof(servAddr)) < 0) {
        sys_err("connect error");
    }
    /* read at most MAXLINE-1 bytes so the terminating '\0' always fits */
    while ((n = read(sockFd, recvLine, MAXLINE - 1)) > 0) {
        recvLine[n] = '\0';
        if (fputs(recvLine, stdout) == EOF) {
            sys_err("fputs error");
        }
    }
    if (n < 0) {
        sys_err("read error");
    }
    close(sockFd);
    return 0;
}

Screenshot of code operation:

5, Problems and Solutions

What are the five pitfalls of socket programming on Linux?

1. Ignoring return status

The first hidden danger is obvious, but it is the most common mistake for novice developers. If you ignore the return status of functions, you may get lost when they fail or partially succeed. In turn, this may spread errors, making it difficult to locate the source of the problem.

Capture and check every return status instead of ignoring it. Consider the example shown in Listing 1, a socket send function.

Listing 1. Ignoring the return status of an API function

int status, sock, mode;

/* Create a new stream (TCP) socket */
sock = socket( AF_INET, SOCK_STREAM, 0 );

...

status = send( sock, buffer, buflen, MSG_DONTWAIT );

if (status == -1) {
  /* send failed */
  printf( "send failed: %s\n", strerror(errno) );
} else {
  /* send succeeded -- or did it? */
}

Listing 1 shows a code fragment that performs a socket send operation (sending data through the socket). The error status of the function is captured and tested, but the example overlooks one property of send in nonblocking mode (enabled by the MSG_DONTWAIT flag).

The return value of the send API function falls into three cases. If the call fails, it returns -1 (the errno variable gives the reason for the failure). If all of the data was queued for transmission, it returns the full length that was requested. If only part of the data could be queued during the call, it returns the smaller number of bytes that were actually queued.

Because send is nonblocking when the MSG_DONTWAIT flag is set, the call may return after queuing all of the data, some of the data, or none of it. Ignoring the return status here results in incomplete transmission and subsequent data loss.

2. Peer socket closure

The interesting thing about UNIX is that you can treat almost anything as a file. Regular files, directories, pipes, devices, and sockets are all treated as files. This is a novel abstraction, and it means that one complete set of APIs can be used across a wide range of device types.

Consider the read API function, which reads a certain number of bytes from a file. The read function returns the number of bytes read (up to the maximum you specify), or -1 to indicate an error, or 0 if the end of the file has been reached.

If a read on a socket returns 0, the peer at the remote end of the socket has called the close API function. The indication is the same as for file reads: no more data can be read through the descriptor (see Listing 2).

Listing 2. Properly handling the return value of the read API function

int sock, status;

sock = socket( AF_INET, SOCK_STREAM, 0 );

...

status = read( sock, buffer, buflen );

if (status > 0) {
  /* Data read from the socket */
} else if (status == -1) {
  /* Error, check errno, take action... */
} else if (status == 0) {
  /* Peer closed the socket, finish the close */
  close( sock );
  /* Further processing... */
}

Similarly, you can use the write API function to detect that the peer socket has been closed. In that case, the process receives the SIGPIPE signal or, if the signal is blocked, the write function returns -1 and sets errno to EPIPE.

3. Address in use error (EADDRINUSE)

Listing 3. Using the SO_REUSEADDR socket option to avoid address in use errors

int sock, ret, on;
struct sockaddr_in servaddr;

/* Create a new stream (TCP) socket */
sock = socket( AF_INET, SOCK_STREAM, 0 );

/* Enable address reuse */
on = 1;
ret = setsockopt( sock, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on) );

/* Allow connections to port 45000 from any available interface */
memset( &servaddr, 0, sizeof(servaddr) );
servaddr.sin_family = AF_INET;
servaddr.sin_addr.s_addr = htonl( INADDR_ANY );
servaddr.sin_port = htons( 45000 );

/* Bind to the address (interface/port) */
ret = bind( sock, (struct sockaddr *)&servaddr, sizeof(servaddr) );

After the SO_REUSEADDR option has been applied, the bind API function allows the address to be reused immediately.

4. Sending structured data

Sockets are the perfect tool for sending unstructured binary byte streams or ASCII data streams, such as HTTP pages over HTTP or e-mail over SMTP. However, if you try to send structured binary data over a socket, things become more complicated.

Suppose you want to send an integer: are you sure the receiver will interpret the integer in the same way? Applications running on the same architecture can rely on their common platform to interpret this type of data identically. But what happens if a client running on a big-endian IBM PowerPC sends a 32-bit integer to a little-endian Intel x86 machine? The difference in byte order causes the value to be interpreted incorrectly.

5. Frame synchronization assumption in TCP

TCP does not provide frame synchronization, which makes it a perfect fit for byte-stream-oriented protocols. This is an important difference between TCP and UDP (User Datagram Protocol). UDP is a message-oriented protocol that preserves message boundaries between sender and receiver. TCP is a stream-oriented protocol, which assumes that the data being communicated is unstructured.

Listing 5. Usage patterns for the tcpdump tool

Display all traffic on the eth0 interface for the local host
$ tcpdump -l -i eth0

Show all traffic on the network coming from or going to host plato
$ tcpdump host plato

Show all HTTP traffic for host camus
$ tcpdump host camus and (port http)

View traffic coming from or going to TCP port 45000 on the local host
$ tcpdump tcp port 45000

The tcpdump and tcpflow tools have a number of options, including the ability to create complex filter expressions. See resources below for more information on these tools.