Into the Linux Kernel: the XFRM Framework

Netfilter network framework into Linux kernel

First published on Juejin (Nuggets)

The author has researched the Linux kernel modules needed to implement kernel-level communication encryption and video stream encryption, including the kernel network protocol stack, the kernel communication modules, the kernel crypto modules, and key generation and distribution.
A dedicated Linux kernel column may follow in the future.

The previous article discussed the basic architecture and implementation principles of the Netfilter subsystem in the Linux kernel. This article covers another important kernel subsystem: the XFRM framework.

Without further ado, let's step into the XFRM framework of the Linux kernel.

Overview: what is the XFRM framework

XFRM is pronounced "transform". The name describes what the framework does: an IPsec packet received by the kernel protocol stack must be transformed before the original packet can be recovered.

Similarly, an original packet to be sent must be transformed into an IPsec packet before it goes out.

Many readers will have heard of IPsec (Internet Protocol Security). IPsec is a suite of protocols that authenticate and encrypt each packet in a communication session to secure IP traffic.

The XFRM framework is the infrastructure on which IPsec is implemented in Linux. XFRM originates from the USAGI project, which aimed to provide production-quality IPv6 and IPsec protocol stacks. The framework was introduced in kernel 2.5. This infrastructure is independent of the protocol family and contains the common parts shared by IPv4 and IPv6. It lives in the net/xfrm/ directory of the kernel source.

The XFRM framework supports network namespaces, a lightweight form of process virtualization that gives one process or a group of processes its own network stack. Each network namespace contains a member named xfrm, an instance of the netns_xfrm structure. This object holds many data structures and variables, such as the XFRM policy hash tables, the XFRM state hash tables, sysctl parameters, the XFRM state garbage collector, counters, and so on.

The netns_xfrm structure is defined in include/net/netns/xfrm.h:

struct netns_xfrm {
        struct hlist_head       *state_bydst;
        struct hlist_head       *state_bysrc;
        struct hlist_head       *state_byspi;
        . . .
        unsigned int            state_num;
        . . .
        struct work_struct      state_gc_work;
        . . .
        u32                     sysctl_aevent_etime;
        u32                     sysctl_aevent_rseqth;
        int                     sysctl_larval_drop;
        u32                     sysctl_acq_expires;
        . . .
};

XFRM Initialization

In IPv4, XFRM is initialized from the ip_rt_init() function (in net/ipv4/route.c), which calls xfrm_init() and xfrm4_init().

In IPv6, the ip6_route_init() function calls xfrm6_init() to initialize XFRM.

User space and the kernel communicate through a netlink socket of type NETLINK_XFRM, over which they send and receive netlink messages. The kernel side of the NETLINK_XFRM socket is created in the following function:

static int __net_init xfrm_user_net_init(struct net *net)
{
        struct sock *nlsk;
        struct netlink_kernel_cfg cfg = {
                .groups = XFRMNLGRP_MAX,
                .input  = xfrm_netlink_rcv,
        };

        nlsk = netlink_kernel_create(net, NETLINK_XFRM, &cfg);
        . . .
        return 0;
}

Messages sent from user space (such as XFRM_MSG_NEWPOLICY, which creates a new security policy, or XFRM_MSG_NEWSA, which creates a new security association) are received by the xfrm_netlink_rcv() method, which in turn invokes the xfrm_user_rcv_msg() method.

XFRM policy and XFRM state are the basic data structures of the XFRM framework. The following sections introduce each of them.
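To make the user-space side of this channel concrete, here is a minimal sketch of opening a NETLINK_XFRM socket and filling in a netlink header for an XFRM_MSG_GETSA dump request. The function names example_open_xfrm_socket() and example_build_getsa_dump() are made up for this illustration. The sketch only builds the header; it sends nothing, and a real tool such as ip xfrm would add a payload, attributes, and error handling.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/xfrm.h>

/* Hypothetical helper: open the user-space end of the NETLINK_XFRM channel.
 * Returns a file descriptor, or -1 on failure (for example, no permission). */
int example_open_xfrm_socket(void)
{
        return socket(AF_NETLINK, SOCK_RAW, NETLINK_XFRM);
}

/* Hypothetical helper: fill in a header-only netlink request asking the
 * kernel to dump all SAs. Payload and attributes are deliberately omitted. */
void example_build_getsa_dump(struct nlmsghdr *nlh, unsigned int seq)
{
        assert(nlh != NULL);

        memset(nlh, 0, sizeof(*nlh));
        nlh->nlmsg_len   = NLMSG_LENGTH(0);             /* header only */
        nlh->nlmsg_type  = XFRM_MSG_GETSA;              /* handled by xfrm_user */
        nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;  /* request a full dump */
        nlh->nlmsg_seq   = seq;                         /* echoed in replies */
}
```

A caller would pass the filled header to sendmsg() on the socket and read the replies with recvmsg(); that part is left out to keep the sketch short.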

XFRM Policy

A security policy (SP) is a rule that tells IPsec whether particular traffic should be processed or bypassed. The xfrm_policy structure describes IPsec policies. A policy contains a selector (an xfrm_selector object) and applies when its selector matches a flow. The selector consists of a set of attributes, such as source and destination addresses, source and destination ports, and protocol, that together identify a flow:

File path: include/uapi/linux/xfrm.h

struct xfrm_selector {
        xfrm_address_t  daddr;
        xfrm_address_t  saddr;
        __be16  dport;
        __be16  dport_mask;
        __be16  sport;
        __be16  sport_mask;
        __u16   family;
        __u8    prefixlen_d;
        __u8    prefixlen_s;
        __u8    proto;
        int     ifindex;
        __kernel_uid32_t        user;
};

The xfrm_selector_match() method takes an XFRM selector, a flow, and a family (AF_INET for IPv4, AF_INET6 for IPv6) as parameters, and returns true when the flow matches the selector. Note that the xfrm_selector structure is also used in XFRM state.

A security policy (SP) is represented in the kernel by the xfrm_policy structure:
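The matching rule can be illustrated with a small user-space sketch. It is a simplified, IPv4-only model: the types sel4 and flow4 and the function sel4_match() are hypothetical stand-ins, addresses and ports are kept in host byte order, and the ifindex and user fields are ignored. It assumes the usual interpretation of the selector fields: addresses are compared over their prefix length, ports are compared under their mask, and a protocol of 0 matches any protocol.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified IPv4-only stand-in for struct xfrm_selector. */
struct sel4 {
        uint32_t daddr, saddr;          /* host byte order, for simplicity */
        uint16_t dport, dport_mask;
        uint16_t sport, sport_mask;
        uint8_t  prefixlen_d, prefixlen_s;
        uint8_t  proto;                 /* 0 means "any protocol" here */
};

/* Hypothetical stand-in for the flow being classified. */
struct flow4 {
        uint32_t daddr, saddr;
        uint16_t dport, sport;
        uint8_t  proto;
};

/* True when the first prefixlen bits of a and b are equal. */
static bool addr4_prefix_match(uint32_t a, uint32_t b, uint8_t prefixlen)
{
        assert(prefixlen <= 32);
        if (prefixlen == 0)
                return true;            /* a /0 prefix matches every address */
        return ((a ^ b) >> (32 - prefixlen)) == 0;
}

/* True when every attribute of the flow falls inside the selector. */
bool sel4_match(const struct sel4 *sel, const struct flow4 *fl)
{
        return addr4_prefix_match(fl->daddr, sel->daddr, sel->prefixlen_d) &&
               addr4_prefix_match(fl->saddr, sel->saddr, sel->prefixlen_s) &&
               !((fl->dport ^ sel->dport) & sel->dport_mask) &&
               !((fl->sport ^ sel->sport) & sel->sport_mask) &&
               (sel->proto == 0 || sel->proto == fl->proto);
}
```

With a port mask of 0 the port is effectively a wildcard, and with a mask of 0xffff it must match exactly, which is why one selector layout can express both "any port" and "this port only".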

File path: include/net/xfrm.h

struct xfrm_policy {
 struct xfrm_policy *next; // Next policy
 struct hlist_node bydst; // Hash chain keyed by destination address
 struct hlist_node byidx; // Hash chain keyed by policy index
 /* This lock only affects elements except for entry. */
 rwlock_t  lock;  // Policy lock
 atomic_t  refcnt; // Reference count
 struct timer_list timer; // Policy timer
 u8   type;     // Policy type
 u32   priority; // Policy priority
 u32   index;    // Policy index
 struct xfrm_selector selector; // Selector
 struct xfrm_lifetime_cfg lft;     // Configured lifetime limits
 struct xfrm_lifetime_cur curlft;  // Current lifetime counters
 struct dst_entry       *bundles;  // List of cached route bundles
 __u16   family;   // Protocol family
 __u8   action;   // Policy action: allow or block
 __u8   flags;    // Flags
 __u8   dead;     // Set once the policy is dead
 __u8   xfrm_nr;  // Number of xfrm_vec entries in use
 struct xfrm_sec_ctx *security; // Security context
 struct xfrm_tmpl        xfrm_vec[XFRM_MAX_DEPTH]; // State templates
};


This structure has many fields, but most of them can be ignored here. We focus on the following:

  • selector: describes the flows that the policy matches
  • action: either XFRM_POLICY_ALLOW (0) or XFRM_POLICY_BLOCK (1); the former allows the traffic, the latter blocks it
  • xfrm_nr: the number of templates associated with the policy. A template can be thought of as a simplified xfrm_state, so xfrm_nr determines how many transformations the traffic goes through. This value is usually 1
  • xfrm_vec: the templates associated with the policy. Each array element is an xfrm_tmpl, and each xfrm_tmpl can be resolved to a complete xfrm_state

You can list the xfrm_policy entries on the current host with the following command:
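The way these fields work together can be sketched in user space. The following is a toy model, not the kernel's lookup code: the names toy_policy, toy_policy_lookup() and toy_policy_verdict() are hypothetical, the selector is reduced to a destination prefix, and two assumptions are made explicit in the comments: a lower priority value wins, and traffic with no matching policy passes through unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { POLICY_ALLOW = 0, POLICY_BLOCK = 1 };  /* mirror XFRM_POLICY_ALLOW/BLOCK */
enum verdict { VERDICT_PASS, VERDICT_TRANSFORM, VERDICT_DROP };

/* Hypothetical, heavily reduced stand-in for struct xfrm_policy. */
struct toy_policy {
        uint32_t daddr;         /* selector: destination network, host order */
        uint8_t  prefixlen_d;   /* selector: destination prefix length */
        uint32_t priority;      /* assumption: lower value takes precedence */
        uint8_t  action;        /* POLICY_ALLOW or POLICY_BLOCK */
        uint8_t  xfrm_nr;       /* number of templates, i.e. transforms */
};

static int toy_selector_match(const struct toy_policy *p, uint32_t daddr)
{
        assert(p->prefixlen_d <= 32);
        if (p->prefixlen_d == 0)
                return 1;
        return ((p->daddr ^ daddr) >> (32 - p->prefixlen_d)) == 0;
}

/* Return the matching policy with the lowest priority value, or NULL. */
const struct toy_policy *toy_policy_lookup(const struct toy_policy *tbl,
                                           size_t n, uint32_t daddr)
{
        const struct toy_policy *best = NULL;

        for (size_t i = 0; i < n; i++) {
                if (!toy_selector_match(&tbl[i], daddr))
                        continue;
                if (best == NULL || tbl[i].priority < best->priority)
                        best = &tbl[i];
        }
        return best;
}

/* Turn the selected policy into a decision for the packet. */
enum verdict toy_policy_verdict(const struct toy_policy *p)
{
        if (p == NULL)
                return VERDICT_PASS;    /* assumption: no policy means bypass */
        if (p->action == POLICY_BLOCK)
                return VERDICT_DROP;
        return p->xfrm_nr > 0 ? VERDICT_TRANSFORM : VERDICT_PASS;
}
```

In this model an allow policy with xfrm_nr of 1 means "apply one transformation", which corresponds to the single tmpl entry seen in typical ip xfrm policy output.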

ip xfrm policy ls
src dst uid 0
	dir out action allow index 5025 priority 383615 ptype main share any flag  (0x00000000)
	lifetime config:
	  limit: soft (INF)(bytes), hard (INF)(bytes)
	  limit: soft (INF)(packets), hard (INF)(packets)
	  expire add: soft 0(sec), hard 0(sec)
	  expire use: soft 0(sec), hard 0(sec)
	lifetime current:
	  0(bytes), 0(packets)
	  add 2019-09-02 10:25:39 use 2019-09-02 10:25:39
	tmpl src dst
		proto esp spi 0xc420a5ed(3290473965) reqid 1(0x00000001) mode tunnel
		level required share any 
		enc-mask ffffffff auth-mask ffffffff comp-mask ffffffff

XFRM State

The xfrm_state structure (include/net/xfrm.h) represents an IPsec Security Association (SA). An SA is unidirectional and holds the encryption keys, flags, request ID, statistics, replay parameters, and other information. To add an XFRM state, user space sends an XFRM_MSG_NEWSA request over a netlink socket; in the kernel the request is handled by xfrm_add_sa() in net/xfrm/xfrm_user.c, which calls xfrm_state_add(). Similarly, to delete a state, user space sends an XFRM_MSG_DELSA message, which the kernel handles in xfrm_del_sa().
The xfrm_state structure is how the kernel represents an SA:

struct xfrm_state {
 /* Note: bydst is re-used during gc */
 // Each state is linked into three hash tables
 struct hlist_node bydst; // Hashed by destination address
 struct hlist_node bysrc; // Hashed by source address
 struct hlist_node byspi; // Hashed by SPI

 atomic_t  refcnt; // Reference count
 spinlock_t  lock;   // State lock

 struct xfrm_id  id; // ID: the (destination address, SPI, protocol) triplet
 struct xfrm_selector sel; // State selector

 u32   genid; // Generation ID, used to avoid collisions

 /* Key manager bits */
 struct {
  u8  state;
  u8  dying;
  u32  seq;
 } km;  // State tracked on behalf of the key manager

 /* Parameters of this state. */
 struct {
  u32  reqid; // Request ID
  u8  mode;  // Mode: transport / tunnel
  u8  replay_window; // Replay window size
  u8  aalgo, ealgo, calgo; // Authentication, encryption, compression algorithm IDs
  u8  flags; // Flags
  u16  family; // Protocol family
  xfrm_address_t saddr;  // Source address
  int  header_len;  // Length of the added protocol header
  int  trailer_len; // Length of the added trailer
 } props; // SA parameters

 struct xfrm_lifetime_cfg lft; // Lifetime configuration

 /* Data for transformer */
 struct xfrm_algo *aalg; // Authentication algorithm
 struct xfrm_algo *ealg; // Encryption algorithm
 struct xfrm_algo *calg; // Compression algorithm

 /* Data for encapsulator */
 struct xfrm_encap_tmpl *encap; // NAT-T encapsulation information

 /* Data for care-of address */
 xfrm_address_t *coaddr;

 /* IPComp needs an IPIP tunnel for handling uncompressed packets */
 struct xfrm_state *tunnel;  // Tunnel, which is itself another SA

 /* If a tunnel, number of users + 1 */
 atomic_t  tunnel_users; // Number of tunnel users

 /* State for replay detection */
 struct xfrm_replay_state replay; // Replay detection state: sequence numbers, bitmap, etc.

 /* Replay detection state at the time we sent the last notification */
 struct xfrm_replay_state preplay; // Replay state at the last notification

 /* internal flag that only holds state for delayed aevent at the
  * moment
  */
 u32   xflags; // Flags

 /* Replay detection notification settings */
 u32   replay_maxage; // Maximum interval between replay notifications
 u32   replay_maxdiff; // Maximum sequence difference between replay notifications

 /* Replay detection notification timer */
 struct timer_list rtimer; // Replay detection timer

 /* Statistics */
 struct xfrm_stats stats; // Statistics

 struct xfrm_lifetime_cur curlft; // Current lifetime counters
 struct timer_list timer;  // SA timer

 /* Last used time */
 u64   lastused; // Time of last use

 /* Reference to data common to all the instances of this
  * transformer. */
 struct xfrm_type *type;  // Protocol: ESP/AH/IPCOMP
 struct xfrm_mode *mode;  // Mode: tunnel or transport

 /* Security context */
 struct xfrm_sec_ctx *security; // Security context

 /* Private data of this transformer, format is opaque,
  * interpreted by xfrm_type methods. */
 void   *data; // Private data
};

xfrm_state contains many fields. The most important ones are:

  • id: an xfrm_id structure holding the destination address, SPI, and protocol (AH/ESP) of the SA
  • props: other properties of the SA, including the IPsec mode (transport/tunnel), the source address, and so on

Each xfrm_state is inserted into several hash tables in the kernel, so the kernel can find the same SA by different keys:

  • xfrm_state_lookup(): find an SA by its SPI
  • xfrm_state_lookup_byaddr(): find an SA by source address
  • xfrm_state_find(): find an SA by destination address

You can list the xfrm_state entries on the current host with the following command:
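The idea of reaching one SA through several hash tables can be sketched in user space as follows. This is a toy model under stated assumptions: the names toy_sa, toy_sa_db and the toy_sa_* functions are hypothetical, the hash function is a placeholder rather than the kernel's, only two of the three tables (by destination and by SPI) are modelled, and locking, reference counting and resizing are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_HASH_SIZE 8   /* tiny fixed table, chosen only for the example */

/* Hypothetical stand-in for an SA: one object, two chain links. */
struct toy_sa {
        uint32_t daddr;
        uint32_t spi;
        uint8_t  proto;
        struct toy_sa *next_bydst;  /* link in the by-destination table */
        struct toy_sa *next_byspi;  /* link in the by-SPI table */
};

struct toy_sa_db {
        struct toy_sa *bydst[TOY_HASH_SIZE];
        struct toy_sa *byspi[TOY_HASH_SIZE];
};

/* Placeholder hash; the kernel uses its own hash functions. */
static unsigned int toy_hash(uint32_t v)
{
        return (v ^ (v >> 16)) % TOY_HASH_SIZE;
}

/* Link the same SA object into both tables. */
void toy_sa_insert(struct toy_sa_db *db, struct toy_sa *x)
{
        assert(db != NULL && x != NULL);

        unsigned int hd = toy_hash(x->daddr);
        unsigned int hs = toy_hash(x->daddr ^ x->spi ^ x->proto);

        x->next_bydst = db->bydst[hd];
        db->bydst[hd] = x;
        x->next_byspi = db->byspi[hs];
        db->byspi[hs] = x;
}

/* Find an SA by (destination, SPI, protocol), as on the receive path. */
struct toy_sa *toy_sa_lookup_byspi(const struct toy_sa_db *db, uint32_t daddr,
                                   uint32_t spi, uint8_t proto)
{
        struct toy_sa *x = db->byspi[toy_hash(daddr ^ spi ^ proto)];

        for (; x != NULL; x = x->next_byspi)
                if (x->spi == spi && x->daddr == daddr && x->proto == proto)
                        return x;
        return NULL;
}

/* Find an SA by destination and protocol, when the SPI is not known yet. */
struct toy_sa *toy_sa_find_bydst(const struct toy_sa_db *db, uint32_t daddr,
                                 uint8_t proto)
{
        struct toy_sa *x = db->bydst[toy_hash(daddr)];

        for (; x != NULL; x = x->next_bydst)
                if (x->daddr == daddr && x->proto == proto)
                        return x;
        return NULL;
}
```

Keeping separate link fields inside the SA, rather than copying the SA into each table, is what lets every lookup path return the same object.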

ip xfrm state ls
src dst
	proto esp spi 0xc420a5ed(3290473965) reqid 1(0x00000001) mode tunnel
	replay-window 0 seq 0x00000000 flag af-unspec (0x00100000)
	auth-trunc hmac(sha256) 0xa65e95de83369bd9f3be3afafc5c363ea5e5e3e12c3017837a7b9dd40fe1901f (256 bits) 128
	enc cbc(aes) 0x61cd9e16bb8c1d9757852ce1ff46791f (128 bits)
	anti-replay context: seq 0x0, oseq 0x1, bitmap 0x00000000
	lifetime config:
	  limit: soft (INF)(bytes), hard (INF)(bytes)
	  limit: soft (INF)(packets), hard (INF)(packets)
	  expire add: soft 1004(sec), hard 1200(sec)
	  expire use: soft 0(sec), hard 0(sec)
	lifetime current:
	  84(bytes), 1(packets)
	  add 2019-09-02 10:25:39 use 2019-09-02 10:25:39
	  replay-window 0 replay 0 failed 0

XFRM Template

The xfrm_tmpl structure is the template used when looking up states and policies:

struct xfrm_tmpl {
/* id in template is interpreted as:
 * daddr - destination of tunnel, may be zero for transport mode.
 * spi   - zero to acquire spi. Not zero if spi is static, then
 *         daddr must be fixed too.
 * proto - AH/ESP/IPCOMP
 */
// SA triplet: destination address, SPI, protocol
 struct xfrm_id  id;

/* Source address of tunnel. Ignored, if it is not a tunnel. */
// Source address
 xfrm_address_t  saddr;

// Request ID
 __u32   reqid;

/* Mode: transport, tunnel etc. */
 __u8   mode;

/* Sharing mode: unique, this session only, this user only etc. */
 __u8   share;

/* May skip this transformation if no SA is found */
 __u8   optional;

/* Bit mask of algos allowed for acquisition */
 __u32   aalgos;
 __u32   ealgos;
 __u32   calgos;
};
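How a template is resolved against existing states can be sketched in user space. This is a simplified model, not the kernel's matching code: the toy_* names are hypothetical, address and algorithm checks are left out, and it assumes that a zero spi or reqid in the template acts as a wildcard and that a template marked optional may be skipped when no SA fits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { TOY_MODE_TRANSPORT = 0, TOY_MODE_TUNNEL = 1 };

/* Hypothetical, reduced stand-in for struct xfrm_tmpl. */
struct toy_tmpl {
        uint32_t spi;       /* 0: any SPI */
        uint8_t  proto;     /* AH/ESP/IPCOMP protocol number */
        uint32_t reqid;     /* 0: any request ID (assumption of this sketch) */
        uint8_t  mode;      /* transport or tunnel */
        uint8_t  optional;  /* nonzero: may be skipped if no SA is found */
};

/* Hypothetical, reduced stand-in for struct xfrm_state. */
struct toy_state {
        uint32_t spi;
        uint8_t  proto;
        uint32_t reqid;
        uint8_t  mode;
};

/* True when state x satisfies template t. */
bool toy_state_ok(const struct toy_tmpl *t, const struct toy_state *x)
{
        assert(t != NULL && x != NULL);
        return x->proto == t->proto &&
               (t->spi == 0 || x->spi == t->spi) &&
               (t->reqid == 0 || x->reqid == t->reqid) &&
               x->mode == t->mode;
}

/* Resolve each template to a state. Returns the number of states written
 * to out[], or -1 if a required template has no matching state. */
int toy_tmpl_resolve(const struct toy_tmpl *vec, int nr,
                     const struct toy_state *states, int nstates,
                     const struct toy_state **out)
{
        int found = 0;

        for (int i = 0; i < nr; i++) {
                const struct toy_state *hit = NULL;

                for (int j = 0; j < nstates && hit == NULL; j++)
                        if (toy_state_ok(&vec[i], &states[j]))
                                hit = &states[j];

                if (hit != NULL)
                        out[found++] = hit;
                else if (!vec[i].optional)
                        return -1;      /* required transform has no SA */
        }
        return found;
}
```

The out[] array must have room for nr entries. The order of the resolved states follows the order of the templates, which mirrors the idea that a policy's templates describe a sequence of transformations.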



Posted on Tue, 21 Sep 2021 03:17:43 -0400 by awpti