1. Problem description
In the previous article (Understanding and use of zmq module) I wrote that zmq offers three modes to choose from. I mainly use the PUB-SUB mode at work. The usage scenario involves one server and multiple clients:
Server side: reads the video, detects and tracks the targets in each frame, and broadcasts the detection and tracking results in PUB mode.
Client side: there are multiple clients. Each one uses SUB mode to receive the data from the PUB side and then processes it.
Recently I ran into a problem: sometimes the client stops receiving data from the server, and the program does not catch any exception. From what I found online, the cause is an unstable network environment. The TCP connection underneath zmq breaks, but the disconnected state is never reported, so zmq's automatic reconnection mechanism does not kick in. There are two types of solutions:
1. Use the TCP keepalive options provided by zmq
2. Implement a heartbeat yourself and reconnect after a timeout
After comparing the two, I decided to adopt the heartbeat approach, which is more flexible and reliable.
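For comparison, the first option might look roughly like the sketch below. It assumes pyzmq; the helper name `make_keepalive_sub` and the timing values are illustrative and not taken from the original setup. The options are set before `connect` so that they apply to the connection being created.

```python
import zmq


def make_keepalive_sub(ctx, idle_s=60, interval_s=10, count=3):
    """Create a SUB socket with TCP keepalive enabled (hypothetical helper)."""
    sock = ctx.socket(zmq.SUB)
    # 1 enables keepalive; the default of -1 leaves the OS setting untouched.
    sock.setsockopt(zmq.TCP_KEEPALIVE, 1)
    # Seconds of idle time before the first keepalive probe is sent.
    sock.setsockopt(zmq.TCP_KEEPALIVE_IDLE, idle_s)
    # Seconds between probes.
    sock.setsockopt(zmq.TCP_KEEPALIVE_INTVL, interval_s)
    # Unanswered probes before the OS treats the connection as dead.
    sock.setsockopt(zmq.TCP_KEEPALIVE_CNT, count)
    return sock
```

How these values are honored depends on the operating system, which is one reason an application-level heartbeat can be easier to reason about.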
2. Solution testing
I wrote simulation code for the sending and receiving sides, and plan to run it for a few days to observe the effect.
server:
The server uses two topics: one for heartbeat data (sent about once per second) and one for business data (target detection). The server example code is as follows:

    import random
    import time

    import zmq


    def start_server(topics, url, port):
        ctx = zmq.Context()
        send_socket = ctx.socket(zmq.PUB)
        responseUrl = "tcp://{}:{}".format(url, port)
        print("bind to: {}".format(responseUrl))
        send_socket.bind(responseUrl)
        last_heartbeat = time.time()
        i = 1
        while True:
            # Send heartbeat data every 1 second
            if time.time() - last_heartbeat > 1:
                send_socket.send_multipart([topics[1].encode("utf-8"), b'heartbeat'])
                last_heartbeat = time.time()
                print(i, "send heartbeat")
            # Send detection data with a certain probability to simulate video target detection
            if random.random() < 0.2:
                detection_message = "message{} for {}".format(i, topics[0])
                send_socket.send_multipart([topics[0].encode("utf-8"), detection_message.encode("utf-8")])
                print(i, "send detection_message")
            i += 1
            time.sleep(0.5)


    if __name__ == "__main__":
        topics = ['detection', 'heartbeat']
        # Binding to 127.0.0.1 only accepts clients on the same machine;
        # bind to the machine's LAN address if clients run on other hosts.
        url = "127.0.0.1"
        port = 4488
        start_server(topics, url, port)
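The server sends every message as two frames, with the topic first and the payload second, and a SUB socket filters by byte prefix on the first frame. The following sketch illustrates that framing and matching rule in plain Python, without any sockets. The helper names `encode_message`, `decode_message`, and `matches` are hypothetical and are not part of pyzmq.

```python
def encode_message(topic, payload):
    """Build the two-frame message the server passes to send_multipart."""
    return [topic.encode("utf-8"), payload.encode("utf-8")]


def decode_message(frames):
    """Split a received two-frame message back into (topic, payload)."""
    topic, payload = frames
    return topic.decode("utf-8"), payload.decode("utf-8")


def matches(subscriptions, frames):
    """Mimic SUB filtering: a message passes if its first frame
    starts with any subscribed prefix."""
    return any(frames[0].startswith(s.encode("utf-8")) for s in subscriptions)


frames = encode_message("heartbeat", "heartbeat")
print(decode_message(frames))
print(matches(["detection", "heartbeat"], frames))
```

Because matching is by prefix, a subscription to `heartbeat` would also receive a topic such as `heartbeat2`, so topic names are best chosen so that none is a prefix of another.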
client:
The client subscribes to both of the server's topics and receives heartbeat data and business data at the same time. On each loop iteration it checks how long it has been since the last heartbeat arrived. If the heartbeat has timed out, it reconnects. The client example code is as follows:

    import time

    import zmq


    def start_client1(topics, url, port):
        ctx = zmq.Context()
        recv_socket = ctx.socket(zmq.SUB)
        requestUrl = "tcp://{}:{}".format(url, port)
        print("connect to: {}".format(requestUrl))
        recv_socket.connect(requestUrl)
        for topic in topics:
            recv_socket.subscribe(topic)
        last_heartbeat = 0
        while True:
            # If the heartbeat data of the server is not received within 30 seconds, reconnect
            if last_heartbeat != 0 and time.time() - last_heartbeat > 30:
                recv_socket.disconnect(requestUrl)
                recv_socket.connect(requestUrl)
                for topic in topics:
                    recv_socket.subscribe(topic)
                print("Reconnect pub server")
                # Retry the connection every 2 seconds. Reconnecting too often
                # results in failure to receive data
                time.sleep(2)
            try:
                # Non-blocking receive; raises zmq.error.Again when no message is waiting
                data = recv_socket.recv_multipart(flags=zmq.NOBLOCK)
                datatopic = data[0].decode()
                if datatopic.startswith("heartbeat"):
                    # After receiving the heartbeat data, update the heartbeat received time
                    last_heartbeat = time.time()
                print("receive message: ", data[1].decode("utf-8"))
            except zmq.error.Again:
                # Nothing to read yet; pause briefly so the loop does not spin the CPU
                time.sleep(0.01)


    if __name__ == "__main__":
        topics = ['detection', 'heartbeat']
        url = "192.168.2.139"
        port = 4488
        start_client1(topics, url, port)
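The timeout bookkeeping in the client loop can also be pulled out into a small helper, which makes it testable without a network. The sketch below is one possible shape, not the code used in the test above. The class name `HeartbeatWatchdog` and its parameters are hypothetical. It assumes timing should start when the watchdog is created, so a server that never sends a first heartbeat also triggers a reconnect, and it uses `time.monotonic` so that system clock adjustments do not affect the timeout.

```python
import time


class HeartbeatWatchdog:
    """Tracks heartbeat arrival and rate-limits reconnect attempts."""

    def __init__(self, timeout_s=30.0, retry_s=2.0, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.retry_s = retry_s
        self._clock = clock
        self._last_seen = clock()   # start timing at creation
        self._last_retry = None     # no reconnect attempted yet

    def beat(self):
        """Call whenever a heartbeat message is received."""
        self._last_seen = self._clock()
        self._last_retry = None

    def should_reconnect(self):
        """Return True when the heartbeat has timed out and at least
        retry_s seconds have passed since the previous attempt."""
        now = self._clock()
        if now - self._last_seen <= self.timeout_s:
            return False
        if self._last_retry is not None and now - self._last_retry < self.retry_s:
            return False
        self._last_retry = now
        return True
```

In the receive loop, the client would call `beat()` when a message on the heartbeat topic arrives, and perform the disconnect and connect steps whenever `should_reconnect()` returns True. This avoids the blocking 2-second sleep while keeping the same retry spacing.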
Reference documents:
Reference 1: (PUB/SUB) Sub Silent Disconnect on Unreliable Connection · Issue #1199 · zeromq/libzmq · GitHub
Reference 2: https://blog.csdn.net/bubbleyang/article/details/107559224
Reference 3: https://blog.csdn.net/sinat_36265222/article/details/107252069