Yanyg - SAN Software Engineer

Linux Scheduler

Contents

1 Scheduler Abbreviations

rt   - realtime
dl   - deadline
tsc  - time stamp counter (x86 rdtsc)

2 Scheduler Config and Options

2.1 config

  • CONFIG_SCHEDSTATS

2.2 /proc/sys/kernel/

2.3 boot arguments

schedstats=[enable|disable]
Enable or disable collection of scheduler statistics (requires CONFIG_SCHEDSTATS).

3 Scheduler Initialization Trace

In the start_kernel routine, sched_init is called to initialize the scheduler:

asmlinkage __visible void __init start_kernel(void)
{
        ... ...
        /* trace_printk can be enabled here */
        early_trace_init();

        /*
         * Set up the scheduler prior starting any interrupts (such as the
         * timer interrupt). Full topology setup happens at smp_init()
         * time - but meanwhile we still have a functioning scheduler.
         */
        sched_init();
        /*
         * Disable preemption - early bootup scheduling is extremely
         * fragile until we cpu_idle() for the first time.
         */
        preempt_disable();
        ... ...
}

sched_init's behavior depends on several CONFIG_* options; in make menuconfig they live under General Setup -> Control Group Support -> CPU Controller.
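For example, a .config fragment enabling the cgroup CPU controllers might look like the following (option names taken from the kernel Kconfig; the exact set varies by kernel version):

```
CONFIG_CGROUPS=y
CONFIG_CGROUP_SCHED=y
CONFIG_FAIR_GROUP_SCHED=y
CONFIG_RT_GROUP_SCHED=y
```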

sched_init iterates over every possible CPU and initializes its run queue (rq):

void __init sched_init(void)
{
        ... ...
        for_each_possible_cpu(i) {
                struct rq *rq;

                rq = cpu_rq(i);
                raw_spin_lock_init(&rq->lock);
                rq->nr_running = 0;
                rq->calc_load_active = 0;
                rq->calc_load_update = jiffies + LOAD_FREQ;
                init_cfs_rq(&rq->cfs);
                init_rt_rq(&rq->rt);
                init_dl_rq(&rq->dl);

        }
        ... ...
}

It then calls init_idle and init_sched_fair_class to complete the remaining low-level initialization.

4 Schedule Routine Analysis

Here is the entry point:

asmlinkage __visible void __sched schedule(void)
{
        struct task_struct *tsk = current;

        sched_submit_work(tsk);
        do {
                preempt_disable();
                __schedule(false);
                sched_preempt_enable_no_resched();
        } while (need_resched());
}
EXPORT_SYMBOL(schedule);

sched_submit_work checks whether the task is blocked on an rt_mutex and whether it has plugged block IO pending:

static inline void sched_submit_work(struct task_struct *tsk)
{
        if (!tsk->state || tsk_is_pi_blocked(tsk))
                return;
        /*
         * If we are going to sleep and we have plugged IO queued,
         * make sure to submit it to avoid deadlocks.
         */
        if (blk_needs_flush_plug(tsk))
                blk_schedule_flush_plug(tsk);
}

If CONFIG_RT_MUTEXES is enabled and the task is currently blocked on an rt_mutex, sched_submit_work returns early and skips the plug flush. The pi_ prefix stands for priority inheritance, the rt_mutex mechanism that counters the classic priority-inversion problem, where a low-priority task holding a lock keeps a high-priority waiter from running.

#ifdef CONFIG_RT_MUTEXES
static inline bool tsk_is_pi_blocked(struct task_struct *tsk)
{
        return tsk->pi_blocked_on != NULL;
}
#else
static inline bool tsk_is_pi_blocked(struct task_struct *tsk)
{
        return false;
}
#endif

blk_needs_flush_plug checks whether the current task has a block plug with requests pending on any of its lists:

static inline bool blk_needs_flush_plug(struct task_struct *tsk)
{
        struct blk_plug *plug = tsk->plug;

        return plug &&
                (!list_empty(&plug->list) ||
                 !list_empty(&plug->mq_list) ||
                 !list_empty(&plug->cb_list));
}

blk_schedule_flush_plug hands the plugged requests to the block layer, calling blk_run_queue_async so the IO is processed asynchronously.

With any pending IO submitted, schedule then disables preemption and runs __schedule, repeating until need_resched() is clear.


5 References