高精度定时器Hrtimer

上一篇文章介绍了基于tick和时间轮的动态定时器。虽然时间轮的算法非常高效，但是tick的缺陷影响了动态定时器的精度。因此在Linux2.6.16加入了高精度定时器 High-resolution kernel timers，简称hrtimer。 hrtimer根据系统的配置和硬件提供更加精确的定时器解决方案。

hrtimer没有使用时间轮对定时器进行管理，而是选用了更加通用、性能稳定的红黑树。我们知道红黑树查找，插入和删除平均时间复杂度是O(logN)。虽然查找的时间复杂度达不到O(1)，但是避免了时间轮的迁移。因此在平均性能上红黑树和时间轮会较为接近。 hrtimer红黑树是在红黑树的基础上做了简单的封装，hrtimer红黑树节点使用数据结构struct timerqueue_node，比rb_node多了一个expires字段，用于记录超时时刻。

struct timerqueue_node {
    struct rb_node node;
    ktime_t expires;
};

hrtimer的数据结构与动态定时器的数据结构类似：

struct hrtimer {
    struct timerqueue_node          node;
    ktime_t                         _softexpires;
    enum hrtimer_restart            (*function)(struct hrtimer *);
    ......
};

hrtimer和动态定时器一样拥有超时计数字段_softexpires和超时后的回调函数。通过比较当前时间是否大于_softexpires来决定定时器超时，执行回调函数。在红黑树的node中同样保存了一份expires，node.expires大于等于_softexpires。这样做的目的是在高精度模式下，在_softexpires到node.expires期间设定定时器被触发的时间。

hrtimer管理器结构

struct hrtimer_cpu_base {
    struct hrtimer			    *running;
    unsigned int			    cpu;
    unsigned int			    active_bases;
    ktime_t				    expires_next;
    struct hrtimer			*next_timer;
    struct hrtimer_clock_base	clock_base[HRTIMER_MAX_CLOCK_BASES];
    ......
} ____cacheline_aligned;

running字段指向正在执行的hrtimer。 next_timer字段指向第一个超时的hrtimer。 clock_base字段是当前CPU维护的定时器树。目前每个CPU都维护了若干组定时器树，其中包括单条递增时间HRTIMER_BASE_MONOTONIC,上墙时间HRTIMER_BASE_REALTIME等。

在struct hrtimer_clock_base结构体中包含结构struct timerqueue_head active。相对与红黑树的根节点多一个next字段，指向下一个即将到期或者已经到期的hrtimer的红黑树节点。当一个定时器到期，执行回调函数前，会在红黑树中找到下一个定时器节点，更新next字段，并删除当前定时器的红黑树节点，调整红黑树。

检查hrtimer函数会遍历每一组定时器树，通过next字段取出下一个定时器，直到定时器还未超时。下面是检查hrtimer函数的核心代码。

static void __hrtimer_run_queues(struct hrtimer_cpu_base *cpu_base, ktime_t now)
{
    struct hrtimer_clock_base *base = cpu_base->clock_base;
    unsigned int active = cpu_base->active_bases;

    for (; active; base++, active >>= 1) {
	    struct timerqueue_node *node;
	    ktime_t basenow;

	    if (!(active & 0x01))
		    continue;

	    basenow = ktime_add(now, base->offset);

	    while ((node = timerqueue_getnext(&base->active))) {
		    struct hrtimer *timer;

		    timer = container_of(node, struct hrtimer, node);
 
		    if (basenow.tv64 < hrtimer_get_softexpires_tv64(timer))
			    break;

		    __run_hrtimer(cpu_base, base, timer, &basenow);
	    }
    }
}

hrtimer有两种精度模式，高精度模式和低精度模式。那么同样的数据设计，如何实现两种精度呢？这里的关键就是调用检查hrtimer函数的地方。我们来看看两种精度的调用栈。

低精度

timer_interrupt() -> update_process_times() -> run_local_timers() -> hrtimer_run_queues() -> __hrtimer_run_queues()
高精度

hrtimer_interrupt() -> __hrtimer_run_queues()

当hrtimer处于低精度模式时，每次irq0上的时间中断，即每次tick事件，调用一次检查hrtimer函数。 tick的频率是1000hz，此时，hrtimer处于低精度模式。在第一篇文章中，我们提到TSC、HPET都是可以提供纳秒级的时间设备。当hrtimer处于高精度模式时，Linux把hrtimer_interrupt()绑定到高精度的时间设备，这时就可以提供纳秒级的定时器服务了。但是频繁的调用检查hrtimer函数会非常消耗机器性能。为了避免这个缺陷，hrtimer_interrupt()的调用采用了one-shot的模式。每一次调用都会设定下一次调用的时间。

void hrtimer_interrupt(struct clock_event_device *dev)
{
    ....

    delta = ktime_sub(now, entry_time);
    
    if (delta.tv64 > 100 * NSEC_PER_MSEC)
	    expires_next = ktime_add_ns(now, 100 * NSEC_PER_MSEC);
    else
	    expires_next = ktime_add(now, delta);
    tick_program_event(expires_next, 1);
}

在hrtimer_interrupt()后半段的代码中会取出下一次超时时间，把这个时间设定为下一次中断调用的时间。

另外并不是所有系统都支持hrtimer高精度定时器。因此，Linux开始会以低精度的hrtimer运行，然后在每一次的时间中断调用hrtimer_run_queues()时，根据系统状态切换为高精度的定时器。

void hrtimer_run_queues(void)
{
    ...
    if (tick_check_oneshot_change(!hrtimer_is_hres_enabled())) {
	    hrtimer_switch_to_hres();
	    return;
    }

    raw_spin_lock(&cpu_base->lock);
    now = hrtimer_update_base(cpu_base);
    __hrtimer_run_queues(cpu_base, now);
    raw_spin_unlock(&cpu_base->lock);
}

虽然Linux提供了hrtimer高精度的定时器，但是基于tick和时间轮的动态定时器并没有废除。

RangeChow

自律，坚持

高精度定时器Hrtimer