CPU Schedulers


Covenants, without the sword, are but words and of no strength to secure a man at all.

– Thomas Hobbes1

Continuing in the series on systems architecture, this post aims to explain what CPU scheduling is, how it works, and why we care.

Scheduling

Scheduling refers to the technique(s) employed by an operating system to determine which process gets to use the CPU at any given time based on criteria including:

  • CPU utilization,
  • throughput,
  • turnaround time,
  • waiting time,
  • response time.

There are a number of scheduling algorithms we might devise to allocate CPU resources to a collection of processes that need to be executed, the simplest being a naïve First Come First Served approach wherein compute is granted to the first process that requests the CPU.

CPU Allocation

Before specifying how this might work in practice, we'll first define what it means to allocate the CPU to a process.

When a program is executed, it becomes a process at which point the OS reserves a memory block for it.

Within this block of memory, there's a segment called the TEXT section which contains the executable code of the program.

Recall that the CPU tracks a special address register called the Program Counter (PC) which contains the address of the next instruction to be executed. When we say that a CPU is allocated to a specific process, we mean that the PC register is pointing to an address within that process's TEXT section. When the CPU is reallocated to another process, the OS updates the PC to point to the executable code in the TEXT section of a different process. Allocation and reallocation are comparatively costly operations which require a context switch.2

Syscalls can be expensive because they require us to:

  1. capture the state of our process (known as the execution context), which is held in CPU registers,
  2. store it in memory,
  3. yield to the OS to perform the syscall,
  4. store the syscall's result in memory,
  5. restore the execution context of our process,
  6. and finally retrieve the result of the syscall from memory.

Even in this oversimplification, we see that several memory accesses are required for something as simple as e.g. I/O.

Context switching like this, between threads or processes, or even just large and fragmented swaths of code, is resource-intensive and slow.

It's also worth noting that a process is wholly modeled by the OS as a record called a Process Control Block which contains all the information about the process's state, PC, registers, and other management data which proves useful for scheduling and necessary for context switching:

use std::fs::File;
use std::sync::Arc;

pub struct ProcessControlBlock {
	pid: u16,
	state: ProcessState,
	program_counter: u32,
	registers: [u8; 4],
	instruction: u8,
	flags: [bool; 3],
	stack_pointer: u32,
	index_registers: [u32; 2],
	memory_limits: [u32; 2],
	io_devices: Vec<IO_Device>,
	open_files: Vec<File>,
	parent: Arc<ProcessControlBlock>,
	children: Vec<Arc<ProcessControlBlock>>,
}
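
To make the cost of reallocation concrete, here's a minimal sketch of a context switch against that PCB; `CpuState` and its fields are hypothetical stand-ins for whatever register file the hardware actually exposes:

```rust
// Hypothetical snapshot of the CPU's register file.
struct CpuState {
	program_counter: u32,
	stack_pointer: u32,
	registers: [u8; 4],
}

// Save the outgoing process's execution context into its PCB, then restore
// the incoming process's context onto the CPU. Every line here is a memory
// access, which is where the context switch penalty comes from.
fn context_switch(
	cpu: &mut CpuState,
	outgoing: &mut ProcessControlBlock,
	incoming: &ProcessControlBlock,
) {
	// capture the running process's state
	outgoing.program_counter = cpu.program_counter;
	outgoing.stack_pointer = cpu.stack_pointer;
	outgoing.registers = cpu.registers;

	// point the CPU at the incoming process's TEXT section & restore its state
	cpu.program_counter = incoming.program_counter;
	cpu.stack_pointer = incoming.stack_pointer;
	cpu.registers = incoming.registers;
}
```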

First Come First Served

Returning to consideration of the simplest scheduling algorithm, we can implement this as a queue containing the PCBs representing the processes to be allocated. When a process requests the CPU, the OS pushes its PCB to the tail of the queue, and whenever the CPU is free, it is then immediately allocated to whatever process's PCB is at the head of the queue.
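
A minimal sketch of that FIFO bookkeeping, reusing the `ProcessControlBlock` from above:

```rust
use std::collections::VecDeque;

struct FcfsScheduler {
	ready_queue: VecDeque<ProcessControlBlock>,
}

impl FcfsScheduler {
	// A process requesting the CPU is pushed to the tail of the queue.
	fn request_cpu(&mut self, pcb: ProcessControlBlock) {
		self.ready_queue.push_back(pcb);
	}

	// Whenever the CPU is free, it's allocated to the head of the queue.
	fn allocate(&mut self) -> Option<ProcessControlBlock> {
		self.ready_queue.pop_front()
	}
}
```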

I'm glossing over a lot of assumptions about CPU/OS behavior here, namely that the CPU is only free to be reallocated after a process terminates – which is rarely the case since many processes never terminate, e.g. servers or background tasks like the OS itself.3 If we were to truly allocate the CPU to a process that never terminates, we'd block all other processes in the queue indefinitely. Additionally, relying on termination of a process before reallocating the CPU ignores the fact that not all processes are equally important.

Bursting

Programs in execution do not use the CPU continuously. If a terminating task takes two minutes to complete execution, that does not imply that it kept the CPU busy for a full 120,000ms.

During the execution of any arbitrary, non-trivial process, there will be periods of CPU idle time when the process will be awaiting something else that is independent of the CPU.

For example, consider this program which buffers text from one file to another in 64-byte chunks so as not to consume a ton of memory:

use std::fs::File;
use std::io::{Read, Result, Write};

fn main() -> Result<()> {
	let mut source = File::open("src.txt")?;
	let mut destination = File::create("out.txt")?;

	let mut buf = [0u8; 64];
	loop {
		let bytes_read = source.read(&mut buf)?;
		if bytes_read == 0 {
			break;
		}
		destination.write_all(&buf[..bytes_read])?;
	}
	Ok(())
}

The assembly corresponding to the loop which is doing the copying would look something like this:

copy_loop:
	; read from src.txt
	mov rax, 0                 ; sys_read
	mov rdi, [fd_source]       ; file descriptor of src.txt
	lea rsi, [buf]             ; address of buffer
	mov rdx, 64                ; number of bytes to read
	syscall                    ; perform sys_read
	test rax, rax              ; check if EOF or error
	js handle_error            ; if rax < 0, handle error

	jz exit_loop               ; if rax == 0, EOF

	mov [bytes_read], rax      ; store number of bytes read

	; write the read_bytes to out.txt
	mov rax, 1                 ; sys_write
	mov rdi, [fd_destination]  ; file descriptor of out.txt
	lea rsi, [buf]             ; address of buffer
	mov rdx, [bytes_read]      ; number of bytes to write
	syscall                    ; perform sys_write
	test rax, rax              ; check for errors
	js handle_error            ; if rax < 0, handle error

	jmp copy_loop              ; repeat loop

exit_loop:

If we were to profile this machine code, we'd note that, despite the appearance of sequential/synchronous execution that loops at the jmp back to the top of the labeled section, the sys_read and sys_write system calls take far more clock cycles than the innocuous instructions targeting local registers. This is because reading from and writing to files requires the program to invoke the OS, since file manipulation oftentimes requires hardware manipulation to retrieve the IO targets from disk. And IO is slow as hell. Therefore, we cannot execute the test instructions immediately after the syscalls they follow, since those syscalls are dependent on hardware which is external to the CPU itself.

As a result, the program does not use the CPU continuously, and instead is peppered with numerous windows of idle time during which the CPU could be executing segments of other processes. Context switching processes during this idle time is not a premature optimization either since, like Bohr's electron model, this negative space is not the exception, but instead the norm for IO-bound applications and the systems governing those processes which account for like 95% of general programs and 100% of the programs relevant to my claim.

The shaded regions where a process is actually using the CPU are called CPU bursts and the gaps between them are called IO bursts. We can make some relevant, general observations about execution behavior which will inform our scheduler design:

  1. Every process begins and ends with a CPU burst,
  2. every CPU burst is followed by an IO burst and vice versa, with the exception of the final CPU burst which terminates the program.

While the duration of CPU bursts is process-dependent, the general distribution resembles the following histogram with a much higher density surrounding short bursts than long bursts:

To account for all this down time, our ideal scheduler should take advantage of IO burst windows and context switch to another process:

It's worth noting that, implicitly, between each burst we incur a small-though-non-zero context switch penalty, which is the price of giving the illusion of concurrency on even a single-core CPU:

Scheduling Criteria

At this point, it makes sense to revisit and define our scheduling criteria.

  1. CPU utilization: the % of time the CPU is actively executing processes as opposed to idling,
  2. Throughput: processes completed per time unit,
  3. Turnaround time = completion time - arrival time,
  4. Waiting time = turnaround time - burst time,
  5. Response time = time of first execution - arrival time.
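
To make these concrete, here's a minimal sketch computing the per-process criteria from timestamps (field names and time units are arbitrary choices for illustration):

```rust
struct CompletedProcess {
	arrival: u64,    // when the process entered the ready queue
	first_run: u64,  // when it was first allocated the CPU
	completion: u64, // when it terminated
	burst: u64,      // total time actually spent executing on the CPU
}

impl CompletedProcess {
	fn turnaround(&self) -> u64 {
		self.completion - self.arrival
	}

	fn waiting(&self) -> u64 {
		self.turnaround() - self.burst
	}

	fn response(&self) -> u64 {
		self.first_run - self.arrival
	}
}
```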

We'll work our way through each criterion, tweaking our scheduler's design along the way. First, an intelligent scheduler should (all else equal) allocate a process only for the duration of a single CPU burst of that process, rather than until termination (which may never even occur).

Lifecycle of a Process

Achieving burst-granular scheduling adds a layer of complexity since we now must account for the fact that processes can be in different states.

Every process begins in the new state which describes the modal existence of a launched-but-not-yet-executing process since its executable code is still being loaded into memory.

Once loaded and sufficiently resourced, a process enters the ready state, meaning it's ready for execution if/when the CPU is allocated to it.

When a CPU is allocated to the process, it is in the running state.

At some point, the process will likely make a system call (such as an IO request), at which point it will/should voluntarily yield the CPU to the OS so the scheduler can reallocate the CPU to a process in the ready state.

Bear in mind as well that IO is not limited to things like filesystem access or memory allocation. It can also include simply waiting for some external event such as user input or a network response (which we know is just a fancy filesystem access). In any event, the process should enter the waiting state until the IO request is completed such that it may then re-enter the ready state to be scheduled again.

Note that a process rarely/never immediately transitions from waiting straight to running since, in the limit, there are always other processes competing for CPU time, and so the gatekeeping scheduler will always consider the queue(s) of ready processes before the one that just had the CPU. Altogether, the lifecycle can be diagrammed like so:

Most application processes spend the majority of their lifecycles in the event loop and are designed to run indefinitely (barring SIGTERM from the user). General purpose operating systems oftentimes model these broad categories with more granularity (e.g. more stages for new/terminating states to indicate a process initializing/initialized, and terminating/terminated). Note also that it is common to represent all but the running state as additional queues:
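
A sketch of these states and the legal transitions between them (this could serve as the `ProcessState` referenced in the PCB earlier); real OSes add more granularity, as noted:

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum ProcessState {
	New,        // executable still being loaded into memory
	Ready,      // loaded & resourced, awaiting CPU allocation
	Running,    // currently allocated the CPU
	Waiting,    // blocked on IO or some external event
	Terminated, // final CPU burst completed
}

// The lifecycle transitions diagrammed above.
fn is_legal_transition(from: ProcessState, to: ProcessState) -> bool {
	use ProcessState::*;
	matches!(
		(from, to),
		(New, Ready)                // finished loading
			| (Ready, Running)      // dispatched
			| (Running, Waiting)    // made an IO request / syscall
			| (Running, Ready)      // preempted (see Round Robin below)
			| (Waiting, Ready)      // IO request completed
			| (Running, Terminated) // final burst finished
	)
}
```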

Dispatcher

The dispatcher is responsible for allocating a CPU to the process at the head of the ready queue whenever able. It's the element of the OS which actually handles context switches in response to IO bursts or CPU interruptions, managing the record of the running process such that its CPU state can be captured in the relevant PCB before relegating it to the waiting queue. It's also responsible for restoring the CPU state from a queued PCB when that process is allocated the CPU.

Together, the scheduler manages ingress to the ready queue, and the dispatcher manages running processes and waiting queue(s). Accordingly, the dispatcher adds some constant overhead to CPU bursts when performing context switches which, since they occur in proportion to our system's throughput, are not negligible. We denote this penalty as dispatch latency, defined as the time it takes to start/stop a running process.

Since the dispatcher is invoked for every context switch, it's highly optimized at both the hardware and software layers.

FCFS

Under a naïve First Come First Served policy, the scheduler employs a single FIFO queue for the ready state swimlane. As soon as a process makes an IO request, a multi-way pop & shuffle is executed by the dispatcher, moving the running process to the waiting queue, and popping the head of the ready queue into the CPU:

While this FIFO system does take CPU utilization into account, it's still not the most optimal scheduling technique we can devise. To understand why, let's revisit the CPU burst density graph:

Observing the tail behavior for longer CPU bursts, we can note that, while they're less frequent, they still occur with non-zero probability/frequency.4 Processes characterized by a high number of short bursts are said to be IO bound since their execution duration is primarily determined by the speed of the IO subsystems they depend on. In other words, their overall performance would improve in proportion to improvements in those subsystems.

CPU bound processes, on the other hand, are those that are characterized by long CPU bursts. Unlike IO bound processes, these would realize comparatively less overall performance gains if IO subsystems were improved.

The existence of CPU bound processes causes an adverse effect on our current FCFS design. Consider the not-too-contrived example comprising even just a single CPU bound process in a ready queue with several IO bound processes.

$t_0$

Initially, the CPU might be executing a mundane IO bound process.

$t_1$

Eventually, that process will require some IO response, and the OS will move it to the waiting queue and allocate the CPU to the next process in the ready queue.

$t_2$

That process, too, will eventually be moved to the waiting queue and the CPU bound process will begin execution:

$t_?$

This process may now hog the CPU for an indeterminate amount of time, while the other processes' IO requests complete and the ready queue begins to get backed up:

Thus, for any system with a mixed distribution of IO- and CPU-bound processes, at some point the CPU bound process will acquire the CPU, and all the IO bound processes will await (and possibly complete) their IO requests, so the ready queue will be stuck waiting for the CPU, leaving IO devices idle. Once the CPU burst of the CPU bound process is completed, it will be moved to the (possibly idling) waiting queue, all IO bound processes will accumulate behind it, leaving the CPU idle as well until the CPU bound process's IO request is completed, at which point we repeat from the top.

It is both possible and likely that either the CPU or the IO devices will be severely underutilized when subjected to the presence of even a single CPU bound process absent any safeguards against this convoy effect.

Turns out that this is usually slower than if we just let IO bound processes "cut the line" and go ahead of CPU bound processes. This observation gives rise to the next scheduling technique:

Earliest Finish Time First

All else equal, this is actually the most optimal algorithm in terms of average wait time for a set of jobs (see Appendix)

The long and short is that schedule priority is given to the process with the shortest next CPU burst (ties can be broken arbitrarily, I do not care).

This can be implemented as a priority queue where the priority is proportionate to the duration of the next CPU burst s.t. a lower priority value corresponds to a "higher" connotational priority. This allows brief IO bound processes to overtake CPU bound processes in the ready queue, avoiding the aforementioned traffic jams. However, this approach conversely penalizes CPU bound processes which (sans adjustment of our design) may never be allocated the CPU so long as an IO bound process is in a competing queue.
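
A minimal sketch of such a queue using `BinaryHeap` with `Reverse` so the smallest predicted next burst sits at the head (burst durations here are in arbitrary time units, and a bare `pid` stands in for the full PCB):

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

struct EftScheduler {
	// min-heap keyed on the predicted duration of the next CPU burst
	ready_queue: BinaryHeap<(Reverse<u64>, u16)>,
}

impl EftScheduler {
	fn enqueue(&mut self, pid: u16, predicted_burst: u64) {
		self.ready_queue.push((Reverse(predicted_burst), pid));
	}

	// The process with the shortest predicted next burst is allocated first.
	fn allocate(&mut self) -> Option<u16> {
		self.ready_queue.pop().map(|(_, pid)| pid)
	}
}
```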

N.B. that EFT is optimal for a fixed collection of processes which does not change after the optimal schedule is determined. This means that we can rely on it to make the correct scheduling decision at each discrete time step, but taken together in the presence of a constantly changing ready queue, vanilla EFT is insufficient.

Thus we're faced with two issues. The first is that it's not actually possible to know the duration of a process's next CPU burst in order to even determine its finish time with certainty. The second is the issue of starvation of CPU bound processes.

Issue the First: Predicting the Future

It is impossible to know with certainty the duration of the next CPU burst. Even analyzing the machine code following the PC to determine when the next system call will occur is unreliable (isomorphic to the halting problem) since the number of instruction primitives preceding the boundary calls could depend on external input such as user interaction.

The best we can do is attempt to predict burst length (and prioritize accordingly) based on past CPU bursts from that process, assuming that the next burst will likely be similar in profile to the exponentially weighted average:

$$\tau_{n+1} = \alpha t_n + (1 - \alpha)\tau_n$$

where:

  • $\tau_{n+1}$ is the next burst we're predicting,
  • $t_n$ is the actual duration of the most recent burst,
  • $\tau_n$ is the corresponding predicted duration of the most recent burst, and
  • $\alpha$ is a discount hyperparameter.

Since $\tau_{n+1}$ is recursively defined, we can expand $\tau_n$ all the way back to $\tau_0$ (usually initialized to some constant like 100ms), thus capturing the entire history of the process with a single term:

$$
\begin{aligned}
\tau_{n+1} &= \alpha t_n + (1 - \alpha)\tau_n \\
&= \alpha t_n + (1 - \alpha)\big[\alpha t_{n-1} + (1 - \alpha)\tau_{n-1}\big] \\
&= \alpha t_n + (1 - \alpha)\alpha t_{n-1} + (1 - \alpha)^2\tau_{n-1} \\
&= \alpha t_n + (1 - \alpha)\alpha t_{n-1} + (1 - \alpha)^2\big[\alpha t_{n-2} + (1 - \alpha)\tau_{n-2}\big] \\
& \vdots
\end{aligned}
$$

Here, $\alpha$ determines how much weight to assign to historical predictions and observations versus the most recently observed value: $\alpha = 0$ ignores the most recent burst and only considers the history of the process, while $\alpha = 1$ ignores the historical bursts and fully assumes that the next burst will be identical to the most recent burst. A value like $\alpha = 1/2$ gives equal weight to both, with older bursts being increasingly discounted by the exponential $(1 - \alpha)^k$ term.
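
As a sketch, maintaining this prediction per process is a single multiply-and-add each time a burst completes (the `0.5` and `100.0` defaults are just the example values discussed above):

```rust
struct BurstPredictor {
	alpha: f64, // weight on the most recently observed burst
	tau: f64,   // current prediction τ_n, in ms (τ_0 initialized to a constant)
}

impl BurstPredictor {
	fn new() -> Self {
		BurstPredictor { alpha: 0.5, tau: 100.0 }
	}

	// Fold the actual duration t_n of the burst that just completed into
	// the prediction for the next one: τ_{n+1} = α·t_n + (1 - α)·τ_n
	fn observe(&mut self, actual_burst_ms: f64) {
		self.tau = self.alpha * actual_burst_ms + (1.0 - self.alpha) * self.tau;
	}

	fn predicted_next_burst_ms(&self) -> f64 {
		self.tau
	}
}
```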

Issue the Second: Finish your compute, there are processes starving in the L6 queue

As mentioned previously, EFT is optimal only for fixed sets of processes to be scheduled, but this is never the case in practice, begging the question: what should an ideal scheduler do if a CPU bound process has been allocated the CPU, and then an IO bound process enters the ready queue with a predicted next burst shorter than the remainder of the CPU bound process's predicted burst?

$$\tau_{\mathtt{IO}} < \tau_{\mathtt{CPU'}}$$

Should the CPU bound process be interrupted to yield to the IO bound process with an earlier finish time? For most modern, general purpose schedulers, the answer is yes. This is namely due to the non-zero likelihood of infinitely long CPU bursts. Without preemption, all other processes would freeze from the user's perspective, ruining the illusion of concurrency which is achievable via preemption on even a single-core CPU.

Additionally, absent preemption, malware could easily monopolize the CPU if the OS relied on processes to voluntarily yield the CPU. Hence we need some sort of Hobbesian sword in order to enforce cooperation and prevent process starvation.

Round Robin

In order to account for this criterion of minimizing starvation, we can use a preemptive FIFO queue with the addition of a CPU hardware-level "shot clock" which will uniformly preempt any long-running process whose time on the CPU exceeds some threshold $q$ (say 100ms), and force a context switch to the next available process in the ready queue, placing the evicted process at the tail of the ready queue:

This adds a mechanism of fairly distributing CPU time and solves the issue of starvation. For a ready queue with $n$ processes and a shot clock threshold of $q$, each process will receive $1/n$ of the CPU time in chunks of at most $q$ time units, meaning that no process will wait more than $(n-1)q$ time units before it gets to drive the boat. Under this system, wait time optimality is dictated by our threshold $q$. If the scheduler chooses a $q$ too large, our system reduces to FCFS as we'll never utilize the shot clock. Conversely, a stringently small $q$ can lead to too many unnecessary context switches which result in pure overhead penalty since, despite consuming clock cycles, they produce no valuable compute. We can measure the optimality of $q$ in terms of turnaround time, which is the time a process takes from start to finish, including all the time spent in the ready queue, the time spent running, and the time spent awaiting IO completion.
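
A minimal sketch of the rotation, assuming a tick-granular shot clock and treating processes as bare pids for brevity:

```rust
use std::collections::VecDeque;

struct RoundRobin {
	ready_queue: VecDeque<u16>, // pids
	quantum: u32,               // q, in timer ticks
	ticks_on_cpu: u32,          // how long the running process has held the CPU
}

impl RoundRobin {
	// Called on every timer interrupt while `running` holds the CPU.
	// Returns the pid that should hold the CPU after this tick.
	fn on_tick(&mut self, running: u16) -> u16 {
		self.ticks_on_cpu += 1;
		if self.ticks_on_cpu >= self.quantum && !self.ready_queue.is_empty() {
			// shot clock expired: evict to the tail, dispatch the head
			self.ticks_on_cpu = 0;
			self.ready_queue.push_back(running);
			self.ready_queue.pop_front().expect("queue is non-empty")
		} else {
			running
		}
	}
}
```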

Relatedly, throughput of a scheduler is defined in terms of the processes completed (or, more specifically, CPU bursts completed) per time unit. It's important to account for throughput as well when measuring the overall efficacy of our scheduler since, for some $q$ shorter than the time it takes to execute a context switch, we could achieve 100% CPU utilization but zero throughput since we'd be evicting-on-arrival every process. Utilization as a metric alone is useless if the work being done on the CPU is not related to process execution.

Thus, we want to choose $q$ to be just longer than the average next CPU burst in order to force the minimum possible number of context switches, and only preempt long-running processes that may otherwise continue to hog compute.

To quantify this goldilocks zone, we introduce yet another criterion of response time which captures the delta between the time of a process's first request and its first response.

For some "optimal" time quanta, round robin is able to give the impression concurrency across some number of processes beneath a certain threshold which is dictated by the CPUs clock speed. I.e., for any CPU, there is some number of processes at which point the addition of more processes will decrease responsiveness. In order to address this without reducing qq and paying undue penalties in the way of a context switch, we can either:

  1. upgrade our CPU so we can context switch faster, allowing us to decrease $q$ without compromising turnaround time;
  2. utilize multiple processors to decrease the load per CPU.

Both of these are copout answers. Given the constraints of fixed hardware, we must revisit our OS/scheduler's design if we want to milk out any more throughput. The next step would be to add more granular priorities to our processes, perhaps weighted towards user window focus, applications running in the "foreground," and other critical OS processes.

Priority Scheduling

As observed previously, vanilla priority scheduling is still subject to starvation. We can combine round robin with priority scheduling, but this will still falter under a heavy load of high priority processes which may prevent lower priority processes from ever being allocated the CPU.

One fix we might consider is to adjust the priority of a process based on the amount of time it's spent in the ready queue in order to guarantee that even the lowest priority job will eventually mature into a high enough priority (even if only until its next IO burst).
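
A sketch of that aging rule; the boost rate and field names here are arbitrary:

```rust
struct ReadyProcess {
	pid: u16,
	base_priority: u32, // lower value = "higher" priority, as before
	enqueued_at: u64,   // tick at which it entered the ready queue
}

// Effective priority improves (numerically decreases) the longer a process
// has been waiting, so even the lowest priority job eventually matures to
// the front of the queue.
fn effective_priority(p: &ReadyProcess, now: u64, ticks_per_boost: u64) -> u32 {
	let boost = ((now - p.enqueued_at) / ticks_per_boost) as u32;
	p.base_priority.saturating_sub(boost)
}
```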

However, sorted insertion into just a single queue can become costly for large systems with many processes, as an $O(\log n)$ binary search must be executed every time a process enters the ready queue. Thus, we have motivation to introduce different queues for different tiers of priority.

Multilevel Queue Scheduling

Suppose we have an $n=32$-bit priority, distributed across $n$ priority classes such that the class is determined by the kind of process (which we denote with the leading 1-bit in the priority), and a granularity specified by the predicted next CPU burst duration.
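
A sketch of how that class might be recovered from the priority value, assuming the class really is encoded by the position of the leading 1-bit:

```rust
// With a 32-bit priority whose class is denoted by the leading 1-bit,
// the class index is just the number of leading zeros (0 = highest class).
fn priority_class(priority: u32) -> u32 {
	priority.leading_zeros()
}

// e.g. priority_class(0b1000_0000_0000_0000_0000_0000_0000_0000) == 0
//      priority_class(0b0000_0000_0000_0000_0000_0000_0000_0001) == 31
```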

Under this system, the scheduler will traverse the queues in order of priority, selecting an available job from the first non-empty queue. To account for new possibilities of higher order starvation, we can also implement a round robin mechanism on top of this multilevel queue, rotating the "head" of the scheduler across the queues (even if they still have ready processes available for selection) and lingering for some number of $q$ periods in proportion to that queue's priority class.

We can also reprioritize a process each time it enters the ready state; for example, bringing an application from the background to the foreground should increase its priority. For non-interactive processes, we can infer the priority class based on the shot clock for that class (since, while the on-board shot clock exists as a singular timer, the scheduler can maintain different $q$ values for each queue). For example, we can initialize each process to the highest priority class, and then gradually deprioritize it as needed if it behaves like a CPU bound process relative to that queue's $q$ period. Otherwise, if its CPU burst concludes before it would get evicted, it can return to that same ready queue.

As mentioned, within each ready queue, we can order processes by predicted next burst time. With priority aging, this scheduling approach guarantees that low-initial-priority CPU bound processes will still be executed eventually, but this sortition does indeed penalize them.
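
Putting these pieces together, here's a sketch of a multilevel feedback queue with per-level quanta; the number of levels and the demote-on-preemption rule are arbitrary design choices here, and each level is plain FIFO rather than sorted by predicted burst for brevity:

```rust
use std::collections::VecDeque;

struct Level {
	quantum_ms: u64,      // this class's shot clock threshold q
	queue: VecDeque<u16>, // ready pids in this priority class
}

struct MultilevelFeedbackQueue {
	levels: Vec<Level>, // index 0 = highest priority, shortest quantum
}

impl MultilevelFeedbackQueue {
	// New processes are optimistically admitted to the highest class.
	fn admit(&mut self, pid: u16) {
		self.levels[0].queue.push_back(pid);
	}

	// Select from the first non-empty queue, highest priority first.
	// Returns the pid and the quantum it's allowed to run for.
	fn allocate(&mut self) -> Option<(u16, u64)> {
		for level in self.levels.iter_mut() {
			if let Some(pid) = level.queue.pop_front() {
				return Some((pid, level.quantum_ms));
			}
		}
		None
	}

	// Burned its whole quantum: it behaves like a CPU bound process for
	// this class, so demote it one level (if there is one).
	fn preempted(&mut self, pid: u16, level: usize) {
		let demoted = (level + 1).min(self.levels.len() - 1);
		self.levels[demoted].queue.push_back(pid);
	}

	// Yielded (IO request) before its quantum expired: when it becomes
	// ready again it returns to the same class.
	fn became_ready(&mut self, pid: u16, level: usize) {
		self.levels[level].queue.push_back(pid);
	}
}
```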

In practice

It's worth noting as well that IO bound processes are not necessarily interactive applications. Recall that IO refers to any interaction with an external device. The majority of processes are IO bound to some extent. Additionally, CPUs are fast. Even with hundreds of processes running concurrently under light load, the majority of a modern computer's CPUs will be idle:

This implies that the higher priority ready queues are empty most of the time, and the processes that land in them are quickly scheduled, responding in milliseconds. Multi-core processors typify modern personal computers, so many of the chokepoints/drawbacks of the aforementioned scheduler designs are obsolete outside of e.g. benchmarking scenarios. Nevertheless, modern CPUs, OSes, and their schedulers have their own modern problems. Nowadays, schedulers manage individual threads, rather than whole processes, which allows for the "illusion" of concurrency internal to a single process, even on a single-core CPU.

Sophisticated processes will divide compute tasks by thread, with CPU bound-like tasks being delegated to their own thread(s), segregated from the IO/interactive functionality which can be prioritized accordingly. This allows for CPU intensive applications to remain responsive by prioritizing UI thread(s) while still allowing for multi-queue feedback scheduling to appropriately handle the CPU intensive portions "in the background."

We also neglected to enhance our representation of the waiting queue for this post, but it is, of course, never just a singular FIFO queue. It makes far more sense to organize waiting queues by the IO device or the nature of the event the processes within are awaiting, since we don't want a waiting queue pileup just because a lackadaisical disk IO task is in front of a GPU task.

Appendix I: Optimality of EFT for Interval Scheduling

Suppose that $A$ is the set of jobs computed by the Earliest Finish Time algorithm, and that $|A| = k$. We can sort the jobs in ascending order of finish time such that the jobs are ordered

$$a_1, a_2, ..., a_{k-1}, a_k$$

As a result of this ordering, we know that for every $1 \leq t \leq k-1$, the finishing times are related by the inequality $f(a_t) \leq f(a_{t+1})$.

Now, suppose that EFT has not produced an optimal schedule. This implies that there exists some other set of jobs $O$ with $|O| = m > k$ jobs. Since $O$ is a solution to the problem, the jobs it contains must be mutually compatible. We can sort $O$ the same way we did $A$ in order to better compare the properties of the two schedules. Let

$$o_1, o_2, o_3, ..., o_{m-1}, o_m$$

be the jobs in this order, allowing us to infer that for every $1 \leq t \leq m-1$, $f(o_t) \leq f(o_{t+1})$.

The key idea now is to compare the jobs at the same indices in both $A$ and $O$. By construction, they must differ at some index (having different starting and/or ending times), otherwise we would have $A = O$. Let $p$ be the first index where they differ, i.e. for every index

$$q < p, \quad a_q = o_q, \quad a_p \neq o_p$$

Note that it is possible that $p = 1$, that is, the schedules differ from the very first job. Next, we replace $o_p \in O$ with $a_p$ and show that $O$ still contains a compatible set of jobs such that the smallest index $p$ where $A$ differs from $O$ "bubbles up" by at least one index. We have three cases to consider:

Case 1: $a_1 \neq o_1$

Suppose $a_1 \neq o_1$ like so:

We can make some productive observations. Since $a_1$ is the first job scheduled by EFT, its finish time must be the smallest among all jobs in the input. Therefore, we can be sure that:

$$f(a_1) \leq f(o_1)$$

In other words, the situation illustrated in Figure 1(a) above is not possible. Moreover, since the jobs in $O$ are mutually compatible, we have:

$$f(o_1) \leq s(o_2)$$

where $s$ is the function assigning the start time to each job in the schedule:

$$s : O \rightarrow \mathbb{R}$$

Chaining these inequalities together, we get the scenario in Figure 1(b) where:

$$f(a_1) \leq s(o_2)$$

Therefore, if we replace $o_1$ with $a_1$, then the jobs in $O$ remain mutually compatible, as illustrated in Figure 1(c).

Case 2: $a_p \neq o_p$ for some $1 < p \leq k$

Next, suppose that the smallest index where $A$ differs from $O$ is somewhere inclusively between the 2nd index and the last index: $1 < p \leq k$. Recall that this implies that for every index

$$q < p, \quad a_q = o_q, \quad a_p \neq o_p$$

We can make a similar argument as before in two parts (since we now must consider the jobs preceding $a_p, o_p$). First, we can assert that

$$f(a_{p-1}) = f(o_{p-1})$$

since they're the same job. Moreover, since the jobs in $O$ are mutually compatible, we also have:

$$f(o_{p-1}) \leq s(o_p)$$

Combining these inequalities, we get:

$$f(a_{p-1}) \leq s(o_p)$$

Therefore, $o_p$ is compatible with $a_{p-1}$ and would have been in the list of jobs available to the EFT algorithm when it selected $a_p$. Since EFT selects the available job with the smallest finishing time, we can conclude that:

$$f(a_p) \leq f(o_p)$$

For the jobs following $o_p$, we know that they all must be compatible with $o_p$. Since we have just shown that $f(a_p) \leq f(o_p)$, we can conclude that $a_p$ is also compatible with all jobs $o_{i > p} \in O$. In other words, if we replace $o_p$ with $a_p$ in $O$, the set of jobs in $O$ will remain mutually compatible.

We can iterate this exchange argument for every index where $A$ and $O$ differ. We must, however, make this argument index-by-index, starting at the smallest index $p$ and working our way forwards in order to guarantee the base assertion that $f(a_{p-1}) = f(o_{p-1})$.

Case 3: $a_p = o_p$ for all $1 \leq p \leq k$, but $m > k$

As long as the index of the differing job is less than or equal to $k$, we can exchange the job in $O$ with the job in $A$. Therefore, we can ensure that this sequence of jobs is mutually compatible (noting the change at index $k+1$):

$$a_1, a_2, ..., a_{k-1}, a_k, o_{k+1}, o_{k+2}, ..., o_{m-1}, o_m$$

We still have not precluded the possibility that $|O| > |A|$, implying that $O$ has higher throughput than $A$ ~somehow, despite being unable to pin down how. Fortunately, it is easy to address this (im)possibility. If $O$ is indeed structured as above, then $o_{k+1}$ is compatible with $a_k$. Therefore, after EFT selected $a_k$, there would still have been at least one compatible job left to schedule, which contradicts our assumption that EFT terminated with $A$. Therefore, $O$ must also have $k$ jobs.

Footnotes

  1. Leviathan

  2. https://www.murphyandhislaw.com/blog/memory#2--heap

  3. the OS itself is not subject to the class of schedulers we're discussing here – it exists on a higher plane of hypervisation

  4. bayesians in my mentions sound off