算法 - LINUX内核 - RADIX树实现分析

目录

1 介绍

RADIX-TREE,也称为Radix Trie或Compact Prefix Tree,中文翻译基数树,是一种空间优化的字典树,其每一个节点都是由子节点和它的父节点合并而成。其结果是,每个内部节点的子节点数目至少是基数数的基数r,其中r是一个正数,且为 2x,其中x≥1。有别于其他的字典树,RADIX树的边可以标记为元素序列或单个元素,这使得小集合场景下基数树更加有效,由于对于比较长的字符串,以及有比较长前缀的字符串。

普通的树在对比时,总是进行完整的键值比对(比如红黑树、AVL树都是这样)。RADIX树在进行比对时,键值在给定节点上进行位组比对,位组中位的数量是基数树的基数r。当r=2 时为二进制RADIX树,此时最小化稀疏度,有最大的字典深度,最大化到不可分割的键值位串长度。当r为2到4的整数幂时,基数树是r-进制字典树,它牺牲潜在的稀疏程度,获取更小的RADIX树深度。RADIX树在构造键值可表示为字符串的关联数组时非常有用。比如IP Routing领域、文本文档的倒排索引。

RADIX树支持插入、删除、查找三种操作。插入操作添加一个新字符串,同时尝试最小化内存占用。删除操作从树中删除一个字符串。查找操作包括精准查找、前任查找、后继查找,以及相同字符串前缀所有字符串查找。这些操作的时间复杂度是O(k),k是集合中所有字符串的最大长度,此长度用RADIX树基数r决定的位组数量决定的。

上述所用词汇字符串只是为描述方便,实际上可以用任何给定长度的数值作为键值,比如 Linux内核实现的RADIX树key值为unsigned long数值。另外,再次强调,RADIX树不同于其他树的最大特点是,它不在每个分支上比较整个键值,在进行搜索操作时,只需将键值的一部分与节点中存储的键值进行比较。

在LINUX内核中,块设备/文件系统页缓存、VMALLOC内存映射、BRD(Block RamDisk,块内存盘)、IDR基于RADIX树算法实现。

本文描述LINUX内核RADIX的实现,并以一个简单示例结束。

2 结构体描述

2.1 radix_tree_node

  • RADIX_TREE_MAP_SHIFT 定义基数r,内核通过CONFIG选项,可设置为4或6,默认为6;
  • RADIX_TREE_MAP_SIZE 定义为2RADIX_TREE_MAP_SHIFT,等于64。该数值定义指针数组 slot 数量;
  • RADIX_TREE_MAP_MASK 在提取特定位时使用,C语言通常使用移位与掩码操作实现位组提取;
  • RADIX_TREE_MAX_TAGS 定义为3,每个节点有3个tag。比如IDR使用了1个位IDR_FREE;
  • RADIX_TREE_TAG_LONGS 与基数r、unsigned long型别位数目相关相关,r等于6时,32

位平台结果为2,64位平台为1,因此数量总是与 RADIX_TREE_MAP_SIZE 一致;

  • RADIX_TREE_INDEX_BITS 基数数key值为unsigned long,因此bit数合计 8*sizeof(unsigned log)
  • RADIX_TREE_MAX_PATH 根据RADIX定义,可知最大深度为 ⌈RADIX_TREE_INDEX_BITS/RADIX_TREE_MAP_SHIFT⌉{},64位下为11。
  • 结构体定义中,offset记录自己在parent中的偏移,count表示slots已占用数,slots保存

子节点地址或元素值,tags用于保存标记,每个slots元素有3个tag;其他后续函数操作中再分析。

#define RADIX_TREE_MAX_TAGS 3

#ifndef RADIX_TREE_MAP_SHIFT
#define RADIX_TREE_MAP_SHIFT    (CONFIG_BASE_SMALL ? 4 : 6)
#endif

#define RADIX_TREE_MAP_SIZE     (1UL << RADIX_TREE_MAP_SHIFT)
#define RADIX_TREE_MAP_MASK     (RADIX_TREE_MAP_SIZE-1)

#define RADIX_TREE_TAG_LONGS    \
        ((RADIX_TREE_MAP_SIZE + BITS_PER_LONG - 1) / BITS_PER_LONG)

#define RADIX_TREE_INDEX_BITS  (8 /* CHAR_BIT */ * sizeof(unsigned long))
#define RADIX_TREE_MAX_PATH (DIV_ROUND_UP(RADIX_TREE_INDEX_BITS, \
                                          RADIX_TREE_MAP_SHIFT))
struct radix_tree_node {
        unsigned char   shift;          /* Bits remaining in each slot */
        unsigned char   offset;         /* Slot offset in parent */
        unsigned char   count;          /* Total entry count */
        unsigned char   exceptional;    /* Exceptional entry count */
        struct radix_tree_node *parent;         /* Used when ascending tree */
        struct radix_tree_root *root;           /* The tree we belong to */
        union {
                struct list_head private_list;  /* For tree user */
                struct rcu_head rcu_head;       /* Used when freeing node */
        };
        void __rcu      *slots[RADIX_TREE_MAP_SIZE];
        unsigned long   tags[RADIX_TREE_MAX_TAGS][RADIX_TREE_TAG_LONGS];
};

**

/*
 * The bottom two bits of the slot determine how the remaining bits in the
 * slot are interpreted:
 *
 * 00 - data pointer
 * 01 - internal entry
 * 10 - exceptional entry
 * 11 - this bit combination is currently unused/reserved
 *
 * The internal entry may be a pointer to the next level in the tree, a
 * sibling entry, or an indicator that the entry in this slot has been moved
 * to another location in the tree and the lookup should be restarted.  While
 * NULL fits the 'data pointer' pattern, it means that there is no entry in
 * the tree for this index (no matter what level of the tree it is found at).
 * This means that you cannot store NULL in the tree as a value for the index.
 */
#define RADIX_TREE_ENTRY_MASK           3UL
#define RADIX_TREE_INTERNAL_NODE        1UL

2.2 radix_tree_root

radix树根节点。gfp_mask 指定从哪个内存域分配内存,rnode指向第一个节点;

struct radix_tree_root {
        gfp_t                   gfp_mask;
        struct radix_tree_node  __rcu *rnode;
};

3 插入操作

radix_tree_insert 是一个包装函数,将任务派发给 __radix_tree_insert

static inline int radix_tree_insert(struct radix_tree_root *root,
                        unsigned long index, void *entry)
{
        return __radix_tree_insert(root, index, 0, entry);
}

__radix_tree_insert 插入键值为index的元素item。

/**
 *      __radix_tree_insert    -    insert into a radix tree
 *      @root:          radix tree root
 *      @index:         index key
 *      @order:         key covers the 2^order indices around index
 *      @item:          item to insert
 *
 *      Insert an item into the radix tree at position @index.
 */
int __radix_tree_insert(struct radix_tree_root *root, unsigned long index,
                        unsigned order, void *item);

首先检查节点是否为内部节点。RADIX树用指针的低两位保存额外的信息。item低两位用于标识内部节点,或EXCEPTIONAL(用于tmpfs)。

int __radix_tree_insert(struct radix_tree_root *root, unsigned long index,
                        unsigned order, void *item)
{
        struct radix_tree_node *node;
        void __rcu **slot;
        int error;

        BUG_ON(radix_tree_is_internal_node(item));

        error = __radix_tree_create(root, index, order, &node, &slot);
        if (error)
                return error;

        error = insert_entries(node, slot, item, order, false);
        if (error < 0)
                return error;

4 API

4.1 radix_tree_lookup

5 Data Structures

**

#+END_SRC

6 Reference