Smooth out thread unparking spikes
Currently, the scheduler unparks a new thread (processor) whenever there is any incoming task to be executed. This can cause unwanted spikes in thread unparking, which in turn lead to unnecessary work stealing, especially when the number of worker threads is greater than the number of physical CPU threads.
Smooth this out by implementing a technique similar to what's done in Golang and Tokio: only wake a new thread up if an idle-spinning thread finds new work and there is still incoming work left to be taken. This does not apply if no thread is idle-spinning, of course; in that case a thread is unparked as before. The IdleSpinMax value is also increased to make this approach actually useful, since threads need to stay in the idle-spinning state long enough to pick up incoming work.
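The wake-up rule described above can be sketched roughly as follows. This is a minimal illustration, not the code in this change: the class WakeThrottle and all of its member names are hypothetical, it uses two atomics instead of the mutex-plus-atomic combination mentioned below, and it leaves out parking, unparking and the run queues themselves.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical sketch of the throttled wake-up decision.
// None of these names come from libfibre.
class WakeThrottle {
  std::atomic<std::size_t> spinning{0};  // workers currently idle-spinning
  std::atomic<std::size_t> pending{0};   // tasks queued but not yet taken

public:
  // A worker ran out of local work and starts idle-spinning.
  void onStartSpinning() { spinning.fetch_add(1, std::memory_order_acq_rel); }

  // A spinning worker exhausted its spin budget and is about to park.
  void onGiveUpSpinning() {
    std::size_t prev = spinning.fetch_sub(1, std::memory_order_acq_rel);
    assert(prev > 0);
    (void)prev;
  }

  // A producer enqueued a task. Returns true if the producer should unpark
  // a worker: only when nobody is spinning, because a spinner is expected
  // to find the task on its own.
  bool onTaskQueued() {
    pending.fetch_add(1, std::memory_order_acq_rel);
    return spinning.load(std::memory_order_acquire) == 0;
  }

  // A spinning worker found a task and stops spinning. Returns true if it
  // should unpark one more worker because work is still left to be taken.
  bool onSpinnerFoundWork() {
    std::size_t prevSpinning = spinning.fetch_sub(1, std::memory_order_acq_rel);
    assert(prevSpinning > 0);
    (void)prevSpinning;
    std::size_t prevPending = pending.fetch_sub(1, std::memory_order_acq_rel);
    assert(prevPending > 0);
    return prevPending > 1;  // something remains after taking this task
  }
};
```

With this shape, a burst of N incoming tasks wakes threads one at a time as each spinner picks up work, rather than unparking up to N threads at once.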
The current implementation relies on a global mutex and an atomic, which causes some inconsistency in performance that probably needs to be investigated. However, when benchmarking with more worker threads than physical CPU threads, this change now enables libfibre to perform on the same level as Golang and Tokio.