Secret of Xinye voice service architecture


Telemarketing and post-loan contact are an important part of the company's business. To support voice calls, the company built its own voice system. Are you curious about how that system works?

With the rise of the Internet, VoIP (voice over IP, or Internet telephony) has developed as well. VoIP transmits voice and video over the Internet; WeChat voice and video calls, for example, are implemented with VoIP. Because it runs over the data network, VoIP is inexpensive and can offer many features that traditional telephony cannot. It can also interconnect with the PSTN (the traditional telephone network) through a conversion gateway. The company's internal voice system is based on VoIP.

Voice service architecture description

1, Related protocols

Because the basic voice service is built on VoIP, session control and media transmission must follow standard protocols.

Session control

During a call, the session is controlled through the Session Initiation Protocol (SIP, RFC 3261).

SIP is an application-layer signaling control protocol. Its main purpose is to establish, modify and release multimedia sessions in IP networks. Its applications include, but are not limited to, voice, messaging, video and call control. Session participants can communicate through multicast, unicast, or a mixture of both.
SIP follows a peer-to-peer model with two kinds of elements: the SIP user agent (UA) and the SIP network server.

Transmission of media stream

The media stream (voice stream, video stream, etc.) during a call is transmitted through Real-time Transport Protocol (RTP).

RTP is a network transport protocol. It was first published in 1996 as RFC 1889 by the IETF Audio-Video Transport Working Group.

RTP specifies a standard packet format for transmitting audio and video over the Internet. It was originally designed as a multicast protocol, but was later used in many unicast applications. RTP is widely used in streaming media systems (together with RTSP), in video conferencing and in push-to-talk systems (together with H.323 or SIP), which makes it the technical basis of the IP telephony industry. RTP is used together with its companion control protocol, RTCP, and typically runs over UDP.
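
To make the "standard packet format" concrete, here is a minimal Python sketch that decodes the 12-byte fixed RTP header. The `RtpHeader` type, the `parse_rtp_header` function and the sample packet are illustrative assumptions, not part of the company's system; CSRC lists and header extensions are ignored.

```python
import struct
from typing import NamedTuple


class RtpHeader(NamedTuple):
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence: int
    timestamp: int
    ssrc: int


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Decode the 12-byte fixed RTP header (CSRC list and extensions are not parsed)."""
    if len(packet) < 12:
        raise ValueError("RTP packet is shorter than the 12-byte fixed header")
    # Network byte order: two flag bytes, 16-bit sequence, 32-bit timestamp, 32-bit SSRC
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(
        version=b0 >> 6,
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        csrc_count=b0 & 0x0F,
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence=seq,
        timestamp=ts,
        ssrc=ssrc,
    )


# Hypothetical packet: version 2, payload type 0 (PCMU), followed by 160 payload bytes
sample = struct.pack("!BBHII", 0x80, 0x00, 1, 160, 0x12345678) + b"\xff" * 160
header = parse_rtp_header(sample)
print(header)
```

Payload type 0 is the static RTP/AVP assignment for G.711 PCMU, the codec that appears in the media information later in this article.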

SIP signaling composition

  1. SIP signaling can be transmitted over TCP or UDP
  2. A SIP message is either a request or a response; the start line determines which. A request begins with a request line, a response with a status line
  3. The message header stores routing information and the information to be processed in the session
  4. The message body (payload) stores media information, plus some text and XML information
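
The three parts listed above can be separated with a few lines of Python. This is a simplified sketch: the function name `split_sip_message` and the sample message are assumptions for illustration, and details such as folded header lines and `Content-Length` handling are left out.

```python
def split_sip_message(raw: str):
    """Split a SIP message into its start line, header lines and body."""
    # A blank line (CRLF CRLF) separates the headers from the body
    head, _, body = raw.partition("\r\n\r\n")
    start_line, *header_lines = head.split("\r\n")
    return start_line, header_lines, body


# Hypothetical INVITE with a truncated SDP body
message = (
    "INVITE sip:217@192.0.2.20 SIP/2.0\r\n"
    "Via: SIP/2.0/UDP 192.0.2.10:5060\r\n"
    "Content-Type: application/sdp\r\n"
    "\r\n"
    "v=0\r\n"
)
start_line, header_lines, body = split_sip_message(message)
print(start_line)
print(header_lines)
print(repr(body))
```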

  1. A request contains a Method field that determines the type of the request. Different requests have different functions, such as INVITE, ACK, BYE, CANCEL, OPTIONS, MESSAGE, INFO, REGISTER, etc.
  2. A response contains a status code field. Different status codes have different meanings, such as 100, 180, 183, 200, 400, 407, 487, 500, 503, etc.
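
As a rough illustration of how the start line distinguishes a request from a response, the sketch below parses both forms. The function name `parse_start_line` and the example lines are assumptions; the status classes are the standard SIP ones (1xx provisional through 6xx global failure).

```python
STATUS_CLASSES = {
    1: "provisional",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
    6: "global failure",
}


def parse_start_line(line: str) -> dict:
    """Classify a SIP start line as a request line or a status line."""
    parts = line.strip().split(" ", 2)
    if len(parts) != 3:
        raise ValueError("malformed start line: %r" % line)
    if parts[0].startswith("SIP/"):
        # Status line: SIP-Version Status-Code Reason-Phrase
        status = int(parts[1])
        return {
            "kind": "response",
            "status": status,
            "reason": parts[2],
            "class": STATUS_CLASSES.get(status // 100, "unknown"),
        }
    # Request line: Method Request-URI SIP-Version
    return {"kind": "request", "method": parts[0], "uri": parts[1]}


request = parse_start_line("INVITE sip:217@192.0.2.20 SIP/2.0")
response = parse_start_line("SIP/2.0 183 Session Progress")
print(request)
print(response)
```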

  1. Record-Route, Via and Contact contain the call route and the transmission path of the call, which allows the response to find its way back.
  2. From and To contain the calling and called party information of the call
  3. Headers whose names begin with X- are user-defined and can be used to pass business parameters
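
A short sketch of how the user-defined X- headers might be picked out of a message. The helper names and the sample values are hypothetical; `X-Sign-ID` and `X-Call-Type` are header names that appear in the scripts later in this article. Folded (multi-line) headers are not handled.

```python
def parse_headers(header_lines):
    """Turn 'Name: value' lines into (name, value) pairs, keeping their order."""
    headers = []
    for line in header_lines:
        name, _, value = line.partition(":")
        headers.append((name.strip(), value.strip()))
    return headers


def custom_headers(headers):
    """Keep only the user-defined headers, i.e. those whose name starts with X-."""
    return {name: value for name, value in headers if name.lower().startswith("x-")}


# Hypothetical header lines of an INVITE
lines = [
    "Via: SIP/2.0/UDP 192.0.2.10:5060;branch=z9hG4bK776asdhds",
    "From: <sip:215@192.0.2.10>;tag=1928301774",
    "To: <sip:217@192.0.2.20>",
    "Contact: <sip:215@192.0.2.10:5060>",
    "X-Sign-ID: abc123",
    "X-Call-Type: 1",
]
business_params = custom_headers(parse_headers(lines))
print(business_params)
```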

The SDP carried in the message body describes the media information, and the media stream carried over RTP is transmitted according to it.

Media information:

  • Server address: IPv4
  • Media type: audio
  • Media port: 12116
  • Media protocol: RTP/AVP
  • Speech coding: G.711 PCMU
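
The media information above maps onto a handful of SDP lines. The following sketch parses a hypothetical SDP body (the address `192.0.2.10` and the `o=`/`s=` lines are made up; the port, protocol and codec match the list above) and extracts those fields. Only the `c=`, `m=` and `a=rtpmap` lines are interpreted.

```python
def parse_sdp(sdp: str) -> dict:
    """Extract connection address, media line and codec mappings from an SDP body."""
    info = {"codecs": {}}
    for line in sdp.splitlines():
        key, _, value = line.partition("=")
        if key == "c":
            # c=<nettype> <addrtype> <connection-address>
            _nettype, addr_type, address = value.split()
            info["address_type"] = addr_type
            info["address"] = address
        elif key == "m":
            # m=<media> <port> <proto> <fmt> ...
            media, port, proto, *formats = value.split()
            info.update(media=media, port=int(port), protocol=proto, formats=formats)
        elif key == "a" and value.startswith("rtpmap:"):
            # a=rtpmap:<payload type> <encoding name>/<clock rate>
            payload, encoding = value[len("rtpmap:"):].split(" ", 1)
            info["codecs"][payload] = encoding
    return info


# Hypothetical SDP offer for a G.711 PCMU audio stream
sample_sdp = "\r\n".join([
    "v=0",
    "o=- 0 0 IN IP4 192.0.2.10",
    "s=call",
    "c=IN IP4 192.0.2.10",
    "t=0 0",
    "m=audio 12116 RTP/AVP 0",
    "a=rtpmap:0 PCMU/8000",
])
media_info = parse_sdp(sample_sdp)
print(media_info)
```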

Voice call signaling process

  1. 215 sends an INVITE request to 217 to place a call
  2. 217 replies to 215 with a 100 response, meaning "I have received the request and am processing it, please wait"
  3. 217 replies to 215 with a 183 response, meaning "I have started ringing"
  4. The 183 carries SDP, which completes the media negotiation with the SDP in the INVITE. From this point the voice stream can flow, and it is carried over RTP
  5. 217 replies to 215 with a 200 response, meaning "I have answered the call"
  6. 215 sends an ACK request to 217, confirming that the INVITE transaction is complete
  7. 217 sends a BYE request to 215, meaning "I am hanging up" (whichever party hangs up first sends the BYE)
  8. 215 replies to 217 with a 200 response, meaning "understood". Both parties have now hung up and the call ends

Note: a 180 response also indicates ringing, but here it does not carry SDP, so no voice channel is established yet; with 180 the voice channel is not opened until the call is answered (200 OK). A 183 response carries SDP and opens the voice channel early, which is why the ring-back tone (RBT) can be heard once the 183 is received.
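
The difference between 180 and 183 can be modelled with a small helper that walks through a signaling sequence and reports the first message after which RTP can flow. This is a deliberately simplified sketch of SDP offer/answer: the function name and the event tuples are assumptions, and the offer is assumed to travel in the INVITE as in the flow above.

```python
def media_open_index(events):
    """Return the index of the first message that completes SDP negotiation, or None.

    Each event is a (message, has_sdp) pair, e.g. ("INVITE", True) or ("183", True).
    """
    offer_seen = False
    for index, (message, has_sdp) in enumerate(events):
        if message == "INVITE" and has_sdp:
            offer_seen = True
        elif offer_seen and has_sdp and message in ("183", "200"):
            return index
    return None


# Flow from the steps above: 183 carries SDP, so audio opens before the answer
flow_with_183 = [("INVITE", True), ("100", False), ("183", True),
                 ("200", True), ("ACK", False), ("BYE", False), ("200", False)]
# Variant with 180 and no SDP: audio opens only at 200 OK
flow_with_180 = [("INVITE", True), ("100", False), ("180", False),
                 ("200", True), ("ACK", False), ("BYE", False), ("200", False)]

early = media_open_index(flow_with_183)
late = media_open_index(flow_with_180)
print(flow_with_183[early][0], flow_with_180[late][0])
```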

2, Architecture description

Registration server

The registration server is built on the open source SIP server OpenSIPS. It is mainly used for extension management, authentication and registration, proxying SIP signaling and RTP voice streams, and load balancing across media servers.

Telemarketing agents register with the registration server through a phone terminal (webphone / softphone / hard phone). After successful registration, they can make voice calls from that terminal.

Media server

The media server is built on the open source softswitch FreeSWITCH. It mainly controls the call logic and provides various call-related information.

When a telemarketing agent initiates a call, the registration server proxies the call to the media server. The media server controls the call, records it, and returns the call record to the business system through an interface.

SBC gateway proxy server

The SBC gateway proxy server is built on the open source SIP server OpenSIPS. It can limit call concurrency, connect to the lines over SIP, proxy SIP signaling and RTP voice streams, and balance load across lines.

The SBC gateway proxy server connects to the line servers over SIP. A traditional telephone network line must first be converted into a SIP-capable line through a conversion gateway. When the media server needs to place an outbound call, the SBC gateway proxy routes and load balances it to the corresponding line according to the information carried in the SIP signaling sent by the media server, which completes the outgoing call.

System docking

The voice service provides a set of API interfaces to connect with other systems and access voice services.

3, Description of relevant technical points

Softphone / hard phone

A softphone is a software phone installed on a PC, mobile phone or other terminal device. It has an interface and functions similar to those of a real phone. When connected to the network, it can register with the SIP server and then make and receive calls.

The hard phone here is basically the same as an ordinary landline phone, except that it connects to the network and registers with the SIP server. Once configured, it is used just like a traditional landline phone.


WebPhone

WebPhone is implemented with WebRTC and runs in the browser. Unlike a softphone or hard phone, it needs no cumbersome installation or configuration. As long as the browser supports WebRTC, it can be opened and used on any platform with essentially no cross-platform problems.

WebRTC, short for Web Real-Time Communication, is an API that lets web browsers carry real-time voice or video conversations. It was open-sourced on June 1, 2011 and, with the support of Google, Mozilla and Opera, became a W3C (World Wide Web Consortium) recommendation.


OpenSIPS

OpenSIPS is a mature open source SIP server. Besides basic SIP proxy and SIP routing functions, it also provides some application-level functions. Its structure is very flexible: the core routing logic is implemented entirely in scripts, so routing strategies can be customized freely and applied to voice, video communication, IM, presence and other applications. OpenSIPS is also one of the fastest SIP servers available and can be used to build carrier-grade products.


FreeSWITCH

FreeSWITCH is an open source telephony switching platform that scales from a simple softphone client to a carrier-grade softswitch. It runs natively on many 32-bit and 64-bit platforms such as Windows, Mac OS X, Linux, BSD and Solaris. It can be used as a simple switching engine, a PBX, a media gateway or an IVR server. It supports SIP, H.323, Skype, Google Talk and other protocols, and can easily interoperate with open source PBX systems such as sipXecs, CallWeaver, Bayonne, YATE and Asterisk. FreeSWITCH follows the RFCs and supports many advanced SIP features, such as presence, BLF, SLA, TCP, TLS and SRTP. It can also act as an SBC, working as a transparent SIP proxy to support other media such as T.38. FreeSWITCH supports wideband and narrowband speech codecs, and its conference bridge can mix 8, 12, 16, 24, 32 and 48 kHz audio at the same time.

4, Call processing logic

When a call is initiated from the front-end webphone, an INVITE request is sent to the registration server (OpenSIPS):

if (is_method("INVITE")) {
      # The calling extension must be registered
      if (!is_registered("location", "$fu")) {
        send_reply("503", "Attack!");
        exit;
      }
      $var(signid) = $(hdr(X-Sign-ID)[0]);
      xlog("call-id:$ci, signid:$var(signid), call from agent[$fU] to freeswitch[$tU], set AGENT_TO_FS");
      # incoming from user: load balance to freeswitch
      lb_start("100", "pstn", "rs");
      switch ($retcode) {
        case -1:
          xlog("call-id:$ci, system error, unable to find freeswitch");
          send_reply("500", "System Error");
          exit;
        case -2:
          xlog("call-id:$ci, freeswitch lookup failed, freeswitch concurrency is full");
          send_reply("500", "Service Full");
          exit;
        case -3:
          xlog("call-id:$ci, freeswitch lookup failed, no available freeswitch");
          send_reply("500", "Service Down");
          exit;
        case -4:
          xlog("call-id:$ci, freeswitch lookup failed, freeswitch has no pstn resources configured");
          send_reply("500", "No Resource");
          exit;
      }
      xlog("call-id:$ci, call to freeswitch[$du] success");
}

A call initiated from the webphone is first checked with is_registered for registration authentication. If the extension is registered, the call is forwarded to a media server through lb_start (load balancing); a call from an unregistered extension is treated as an attack.
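
The decision logic of the script can be restated as a small Python model. This is not OpenSIPS code: `handle_invite` and its arguments are hypothetical, the reply codes and reason phrases are copied from the script above, and treating every other load-balancer result as success is an assumption.

```python
from typing import Optional, Set, Tuple

# Load-balancer failure codes and the reason phrases used in the script above
LB_FAILURES = {
    -1: "System Error",
    -2: "Service Full",
    -3: "Service Down",
    -4: "No Resource",
}


def handle_invite(extension: str, registered: Set[str], lb_result: int) -> Optional[Tuple[int, str]]:
    """Return the (status, reason) reply to send, or None if the call is forwarded."""
    if extension not in registered:
        # Unregistered extensions are treated as an attack
        return (503, "Attack!")
    if lb_result in LB_FAILURES:
        return (500, LB_FAILURES[lb_result])
    # Assumption: any other result means a media server was selected
    return None


registered_extensions = {"8001"}  # hypothetical extension number
print(handle_invite("8002", registered_extensions, 1))
print(handle_invite("8001", registered_extensions, -2))
print(handle_invite("8001", registered_extensions, 1))
```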

<extension name="decodeCallOut">
      <condition field="destination_number" expression="^de(.*)$">
        <action application="set" data="sip_h_X-accountcode=${accountcode}" />
        <action application="set" data="sip_h_X-Tag=" />
        <action application="set" data="callType=${sip_h_X-Call-Type}"/>
        <action application="set" data="serviceType=1"/>
        <action application="set" data="call_direction=outbound" />
        <action application="set" data="effective_caller_id_number=${outbound_caller_id_number}" />
        <action application="lua" data="${ivrpath}/callout.lua"/>
      </condition>
</extension>
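
The condition in the dialplan extension above only fires for destination numbers that start with `de`, capturing whatever follows. The Python sketch below applies the same regular expression to show which numbers would match; the sample number and the helper name are assumptions, and FreeSWITCH itself evaluates the expression with its own regex engine.

```python
import re

# Same pattern as the dialplan condition on destination_number
DECODE_CALL_OUT = re.compile(r"^de(.*)$")


def match_decode_call_out(destination_number: str):
    """Return the part captured after the 'de' prefix, or None if the extension does not match."""
    match = DECODE_CALL_OUT.match(destination_number)
    return match.group(1) if match else None


print(match_decode_call_out("de13800000000"))  # hypothetical dialed string
print(match_decode_call_out("13800000000"))
```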

When the call arrives at the media server, the media server (FreeSWITCH) wraps each call in a session. It first preprocesses the call through the dialplan and then hands call control to the callout.lua script. FreeSWITCH parses the call information and stores it in the session, so in Lua the call flow can be controlled through the session object:

local caller = session:getVariable("caller_id_number")
local destination_number = session:getVariable("destination_number")

The caller and callee numbers can be read from the channel variables in this way.

function createCall(callNumber, shareChannelVariable)
    -- displayNumber, gateway, routeGroupId and gwPrefix are assumed to be defined elsewhere in the script
    local callString = shareChannelVariable .. displayNumber .. "}sofia/gateway/"
    return callString .. gateway .. routeGroupId .. gwPrefix .. callNumber
end

local channel_variable = "{origination_uuid=" .. uuid .. ",park_after_bridge=true,extension_number=" .. caller .. ",call_number=" .. encodeCallee .. ",sip_h_X-Is-Public=" .. isPublic .. ",origination_caller_id_number="

if session:ready() then
    local dialString = createCall(encodeCallee, channel_variable)
    session:setVariable("media_bug_answer_req", "true")
    -- Start recording before bridging to the callee
    session:execute("record_session", recordPath)
    session:setVariable("recordPath", recordPathPara)
    showLog("info", "dialString", dialString)
    session:execute("bridge", dialString)
    local legB = freeswitch.Session(uuid)
    if legB:ready() and not legB:answered() then
        showLog("info", "waiting exit", "waiting exit for detect thread")
    end
    if legB:answered() then
        -- handling of an answered call is not shown in the source
    end
end

  • Use the ready function to make sure the current call is ready to be controlled
  • The call can be recorded with the record_session command
  • The call can be hung up actively with the hangup function

When everything is ready, the call is actually placed towards the line: the bridge command sends the call request to the SBC server.

if (is_method("INVITE")) {
    # Check call direction: only accept calls coming from freeswitch
    if (lb_is_destination("$si", "$sp", "100")) {
      # incoming from freeswitch: load balance to gateway
      xlog("call-id:$ci, call from freeswitch to gateway, set FS_TO_GW");
      $var(signid) = $(hdr(X-Sign-ID)[0]);
      xlog("call-id:$ci, signid:$var(signid), call from $fU, to $tU");
      $var(callee) = $ruri.user;
      # Merchant prefix -> routing group ID
      $var(prefix) = $(var(callee){s.substr,0,2});
      xlog("call-id:$ci, prefix: $var(prefix), number: $var(callee)");

      lb_start("$(var(prefix){})", "pstn", "rs");
      switch ($retcode) {
        case -1:
          xlog("call-id:$ci, Failed to find gateway, system error");
          send_reply("500", "System Error");
          exit;
        case -2:
          xlog("call-id:$ci, Failed to find gateway. All gateways are full");
          send_reply("500", "Service Full");
          exit;
        case -3:
          xlog("call-id:$ci, Failed to find gateway. No gateway is available");
          send_reply("500", "Service Down");
          exit;
        case -4:
          xlog("call-id:$ci, Failed to find gateway, no pstn gateway resources");
          send_reply("500", "No Resource");
          exit;
      }

      # Called number (without merchant prefix)
      $var(callee) = $(var(callee){s.substr,2,0});
      # The value of info is "line prefix, line calling authentication, line domain"
      dp_translate("$(var(prefix){})", "$du/$var(info)");
      xlog("call-id:$ci, selected prefix and caller is $var(info)\n");
      # Split info with ","
      # Line prefix
      $var(gw_prefix) = $(var(info){,0,,});
      # Line calling authentication
      $var(gw_caller) = $(var(info){,1,,});
      # Line domain
      $var(dest_uri) = $du;
      $var(dest_domain) = $(var(dest_uri){s.substr,4,0});
      # Splice request uri
      $ruri = "sip:" + $var(gw_prefix) + $var(callee) + "@" + $var(dest_domain);
      # Modify to
      xlog("call-id:$ci, new to: $ruri");
      # Modify from
      if ($var(gw_caller) != "" && $var(gw_caller) != null) {
        if (isflagset(FROM_LOCAL)) {
          uac_replace_from("$var(gw_caller)", "sip:$var(gw_caller)@LocalIpV4");
          xlog("call-id:$ci, new from: sip:$var(gw_caller)@LocalIpV4");
        } else {
          uac_replace_from("$var(gw_caller)", "sip:$var(gw_caller)@NetIpV4");
          xlog("call-id:$ci, new from: sip:$var(gw_caller)@NetIpV4");
        }
      }
      xlog("call-id:$ci, call to gateway success");
    } else {
        xlog("Attack from $si:$sp!!!");
        send_reply("500", "Attack!!");
        exit;
    }

    # account only INVITEs
}

When the call arrives at the SBC gateway server, the server finds the actual line through lb_start according to the preset information and finally delivers the call to the called mobile phone. If the network and the line are working normally, the called phone starts ringing and the call has been placed successfully.
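
The number rewriting performed by the SBC script can be followed step by step in Python. This is a sketch under assumptions: `rewrite_for_gateway` is a hypothetical helper, and the sample callee, `info` string and destination URI are made-up values; only the slicing and splicing rules are taken from the script above.

```python
def rewrite_for_gateway(callee: str, info: str, dest_uri: str) -> dict:
    """Mirror the script: strip the merchant prefix and build the request URI for the line."""
    routing_group = callee[:2]      # merchant prefix -> routing group ID
    number = callee[2:]             # called number without the merchant prefix
    fields = info.split(",")        # "line prefix, line calling authentication, line domain"
    gw_prefix = fields[0]
    gw_caller = fields[1] if len(fields) > 1 else ""
    dest_domain = dest_uri[4:]      # drop the leading "sip:"
    return {
        "routing_group": routing_group,
        "request_uri": "sip:" + gw_prefix + number + "@" + dest_domain,
        "from_user": gw_caller or None,  # From is only replaced when a line caller is set
    }


# Hypothetical values for illustration
result = rewrite_for_gateway(
    callee="0113800000000",
    info="9,02100000000,gw.example.com",
    dest_uri="sip:198.51.100.7:5060",
)
print(result)
```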

Voice service application

Besides telemarketing and post-loan contact, the voice service has many other applications within the company:

Empty-number and powered-off detection

The system dials numbers automatically and determines the status of the called party by analyzing the audio heard before the call is answered: the long ring-back tone, the short busy tone, custom ring-back music, and the carrier's announcements for a nonexistent number, a line that is in a call, a powered-off phone and so on. Nonexistent and powered-off numbers can then be filtered out, which improves the efficiency of telemarketing and post-loan contact.

Zhiniu voice robot

The voice system is connected to ASR (speech to text), TTS (text to speech) and NLP (natural language processing) services to build an automatic voice robot. It can complete telemarketing and post-loan contact tasks without a human agent.

Telephone alarm

With TTS connected, the voice system can place a call automatically and play a piece of text or a preset recording to the user, which can be used for alarms and reminders.

Author introduction

Passerby, technical expert of technology output team.

Tags: lua sip

Posted on Thu, 04 Nov 2021 04:22:13 -0400 by aesir5