Introduction
The scheduler is the orchestrator of your Go programs. It is what makes goroutines run, and run concurrently. It pauses and resumes them when they block on a channel or mutex operation. The scheduler also coordinates system calls, I/O, and garbage collection. Go programs can run hundreds of thousands of goroutines thanks to this scheduler. Understanding the scheduler is important because scheduling decisions affect the performance of Go programs.
Go needs a scheduler to support goroutines, which are user-space threads. Compared with kernel threads, goroutines have a smaller memory footprint (2 KB versus 8 KB) and faster context switches.
The scheduler tries to use a small number of kernel threads to support high concurrency. It will leverage parallelism to scale to many cores.
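To get a feel for how cheap goroutines are, here is a minimal sketch that starts a large number of them and waits for all to finish. The function name spawnAndCount and the figure of 100,000 are illustrative choices, not something taken from the runtime.

```go
package main

import (
	"fmt"
	"sync"
)

// spawnAndCount starts n goroutines, each of which adds 1 to a shared
// counter under a mutex, and returns the final count once all have finished.
func spawnAndCount(n int) int {
	var (
		wg    sync.WaitGroup
		mu    sync.Mutex
		count int
	)
	wg.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer wg.Done()
			mu.Lock()
			count++
			mu.Unlock()
		}()
	}
	wg.Wait()
	return count
}

func main() {
	// 100,000 is an arbitrary figure chosen for illustration.
	fmt.Println(spawnAndCount(100000))
}
```

The program never asks for a thread directly. It only creates goroutines, and the scheduler decides which kernel threads run them.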
How does the multiplexing work?
The scheduler multiplexes goroutines onto kernel threads. To understand how this is done, we need to know when threads are created and how goroutines are distributed among them.
Tracking scheduled goroutines: runqueues
Goroutines that are ready to run and need to be scheduled are tracked in FIFO runqueues. There is one runqueue per CPU, so threads do not contend for work, plus a global runqueue (more on the global runqueue later). When a thread runs out of work in its own queue, it can steal work from any other queue (work-stealing). Work-stealing helps balance the work across threads.
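The idea of per-thread FIFO queues with stealing can be sketched as a toy model. This is not the runtime's implementation: the worker type and the next function are hypothetical names, tasks are plain strings, and the victim is simply the first non-empty queue found.

```go
package main

import "fmt"

// worker is a toy stand-in for a thread with its own FIFO runqueue.
// Tasks are just names here; real runqueues hold goroutines.
type worker struct {
	queue []string
}

// next returns the next task for worker i: the head of its own queue,
// or, if that is empty, a task stolen from the first non-empty queue
// of another worker. ok is false when no work exists anywhere.
func next(workers []*worker, i int) (task string, ok bool) {
	if w := workers[i]; len(w.queue) > 0 {
		task, w.queue = w.queue[0], w.queue[1:]
		return task, true
	}
	for j, victim := range workers {
		if j != i && len(victim.queue) > 0 {
			task, victim.queue = victim.queue[0], victim.queue[1:]
			return task, true
		}
	}
	return "", false
}

func main() {
	workers := []*worker{
		{queue: []string{"a", "b"}},
		{}, // starts with nothing to do
	}
	t, _ := next(workers, 1)
	fmt.Println("worker 1 stole", t) // prints "worker 1 stole a"
}
```

A worker always prefers its own queue, so in the common case it touches no shared state. Stealing only happens when it would otherwise sit idle.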
The Go scheduler creates threads when needed and keeps track of paused threads in a list (thread parking).
GOMAXPROCS limits the number of threads that run goroutines at the same time; by default it is the number of CPU cores. The scheduler reuses parked threads where it can.
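The limit can be inspected and changed from inside a program with runtime.GOMAXPROCS, as in the short sketch below. The helper name setLimit and the value 2 are only for illustration.

```go
package main

import (
	"fmt"
	"runtime"
)

// setLimit sets GOMAXPROCS to n and reports the previous and new values.
func setLimit(n int) (prev, now int) {
	prev = runtime.GOMAXPROCS(n) // a positive argument returns the old setting
	return prev, runtime.GOMAXPROCS(0)
}

func main() {
	// Passing 0 reports the current setting without changing it.
	fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0))
	fmt.Println("NumCPU:    ", runtime.NumCPU())

	prev, now := setLimit(2)
	fmt.Println("previous:", prev, "now:", now)
}
```

The same limit can also be set without touching the code, through the GOMAXPROCS environment variable.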
Handoff: transferring the runqueue of a blocked thread
There could be a situation where there are no paused threads and one of the active threads makes a blocking syscall, which causes the thread to block in the kernel. There may still be work in its runqueue, but the thread cannot run it. Handoff is the mechanism the scheduler uses to deal with blocked threads. A background monitor thread detects a blocked thread, takes its runqueue, and gives it to another thread.
The handoff mechanism transfers the runqueue of a blocked thread to another thread. If there is no paused thread, the scheduler can create a new one. This does not break the limit on running threads, because the blocked thread is not running. One very important property of handoff is that it prevents goroutine starvation.
In summary, work across threads is balanced with work-stealing, and handoff prevents starvation caused by blocked threads.
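Handoff can be pictured with a small toy model. The thread type and the handoff function below are hypothetical and much simpler than the runtime: the queue moves to a parked thread if one exists, and otherwise to a newly created one.

```go
package main

import "fmt"

// thread is a toy model of a kernel thread as the scheduler sees it.
type thread struct {
	name    string
	queue   []string // runnable goroutines, by name
	blocked bool     // true while stuck in a blocking syscall
}

// handoff moves the runqueue of a blocked thread to a parked thread if
// one is available, or to a newly created thread otherwise. It returns
// the thread that now owns the work and the remaining parked list.
func handoff(from *thread, parked []*thread) (*thread, []*thread) {
	var to *thread
	if len(parked) > 0 {
		to, parked = parked[0], parked[1:]
	} else {
		to = &thread{name: "new"}
	}
	to.queue, from.queue = from.queue, nil
	return to, parked
}

func main() {
	t1 := &thread{name: "t1", queue: []string{"g2", "g3"}, blocked: true}
	to, _ := handoff(t1, nil) // no parked threads, so one is created
	fmt.Println(to.name, to.queue, len(t1.queue)) // prints "new [g2 g3] 0"
}
```

The goroutines g2 and g3 no longer depend on t1 returning from its syscall, which is why handoff prevents them from starving.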
Preemption and the global runqueue
Go programs call into the scheduler through channel operations that block, goroutine creation, and so on. But what happens if a goroutine never creates another goroutine and never uses a channel? Consider a Go program that only runs a CPU-intensive algorithm: how does the scheduler stop it from starving the goroutines waiting in the runqueues? The Go scheduler implements preemption using a background thread called sysmon. The sysmon thread detects long-running goroutines (more than 10 ms) and unschedules them when possible. Since this CPU-intensive work was starving other goroutines, the scheduler does not put it back in the distributed runqueues; it puts it in the global runqueue. The global runqueue acts like a low-priority runqueue.
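Preemption can be observed with a small experiment. The sketch below restricts the program to a single running thread and starts a goroutine that only spins on a counter. It assumes a Go toolchain that can preempt loops containing no function calls (Go 1.14 or later); on older versions the program may hang. The function name spin and the 50 ms duration are illustrative.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
	"time"
)

// spin runs a CPU-bound loop in its own goroutine with GOMAXPROCS set
// to 1, asks it to stop after roughly d, and returns how many
// iterations the loop completed.
func spin(d time.Duration) uint64 {
	prev := runtime.GOMAXPROCS(1) // one slot, so the two goroutines compete
	defer runtime.GOMAXPROCS(prev)

	var stop int32
	result := make(chan uint64)
	go func() {
		var n uint64
		// Busy loop: no channel operations and no goroutine creation.
		for atomic.LoadInt32(&stop) == 0 {
			n++
		}
		result <- n
	}()

	time.Sleep(d) // the loop takes over the only slot meanwhile
	// Reached only once the runtime lets this goroutine run again.
	atomic.StoreInt32(&stop, 1)
	return <-result
}

func main() {
	fmt.Println("loop iterations before stop:", spin(50*time.Millisecond))
}
```

If the spinning goroutine could never be interrupted, the calling goroutine would not get a chance to set the stop flag and the program would not terminate.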