Description:
I've kept silent for a long time, but enough is enough. Almost three years ago I had to reinvent the wheel yet again and build a scheduler for automated software testing tasks.
Those tasks consisted of containers and virtual machines and had to be placed on appropriate clusters (OpenShift and OpenStack) that were available and had enough resources.
And because there were a lot of such tasks (hundreds of nightly tasks, dozens of daily ones), it was very important that they didn't suffer from compute node overload, Internet outages, maintenance work, and so on. Otherwise we had to dig through our test results and figure out which failures were software defects and which were problems with nodes and services.
That's why I had to collect various metrics from compute nodes and from the infrastructure as a whole (with the help of Zabbix) and use those metrics for task scheduling (Jenkins looked at the metrics and made the decisions). Three years later I have to solve this problem again.
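To give an idea of what that looked like, here is a minimal sketch of the metric-gated dispatch idea. It is not the original code: the metric names, the thresholds, and the cluster.current_metrics()/cluster.submit() calls are all assumptions of mine; in the real setup the values came from Zabbix and the decision lived in Jenkins pipeline code.

```python
# Rough sketch of metric-gated dispatch. All metric names, thresholds and the
# cluster.current_metrics()/cluster.submit() calls are hypothetical.

THRESHOLDS = {
    "cpu_load_15min": 0.8,         # relative load across the compute nodes
    "internet_packet_loss": 0.05,  # fraction of lost probes to an external host
    "dns_resolve_time_ms": 200,    # sanity check that the ISP/DNS is alive
}

def cluster_is_healthy(metrics: dict) -> bool:
    """A cluster is usable only if every watched metric stays below its threshold."""
    return all(
        metrics.get(name, float("inf")) <= limit
        for name, limit in THRESHOLDS.items()
    )

def dispatch(task, clusters):
    """Send the task to the first healthy cluster, otherwise keep it queued."""
    for cluster in clusters:
        if cluster_is_healthy(cluster.current_metrics()):
            return cluster.submit(task)
    return None  # better to wait than to fail on a broken cluster
```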
What's wrong with schedulers:
- They operate on objects inside a cluster; they cannot take into account things like database load or Internet availability outside of the cluster. It means all your tasks will fail just because your ISP is down and DNS queries cannot be resolved. Which automatically means you have to investigate why your tasks failed, clean the queue, etc.
- Schedulers are not aware of a task's usual resource consumption (task profiling). It means they cannot place and reorder tasks in a more effective way. A short example: your task requests 2 CPU cores, and the scheduler waits for a node where it can be placed, but in reality the task uses those 2 cores only for the first 5 minutes and after that uses only 0.5 of a core. The same happens with network, storage, GPU RAM utilization, etc. Hence we get poor hardware utilization and longer task execution.
- Every cluster software has its own scheduling settings, like the task placement timeout, task completion timeout, task response timeout, etc., and of course its own queue. That means you have to take all of this into account and be ready to remove failed tasks from a queue (which can be hard to achieve because of node failures, IO timeouts, etc.). It's much easier not to put tasks into a cluster with problems in the first place.
- People cannot effectively describe/profile their tasks. It would be great if clusters were capable of automatically profiling your tasks in different aspects and taking those metrics into account in future runs. Or just let people describe tasks in a more detailed/natural form, like: low CPU usage on 2 cores, 2 GB of GPU RAM, 10 Mb/s of Internet traffic and 100 Mb/s of intranet traffic, 200 storage IOPS, 10 GB of S3, etc. (see the sketch right after this list).
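For illustration only, a task description along those lines (covering both the "two cores for the first five minutes" profile from above and the more natural resource wishlist) could look something like this. Every field name here is an assumption of mine, not an existing scheduler's format:

```python
# Purely illustrative task "profile": a way people (or an automatic profiler)
# could describe expected resource usage over time. Field names are invented.

task_profile = {
    "name": "nightly-integration-suite",
    "phases": [
        # first ~5 minutes: heavy CPU while the test environment is being built
        {"duration_s": 300, "cpu_cores": 2.0, "ram_gb": 4},
        # the rest of the run: mostly waiting on I/O and external services
        {"duration_s": 5400, "cpu_cores": 0.5, "ram_gb": 4},
    ],
    "gpu_ram_gb": 2,
    "internet_mbps": 10,
    "intranet_mbps": 100,
    "storage_iops": 200,
    "s3_gb": 10,
}
```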
Hence we need a system that meets the following requirements:
- We need a scheduler framework capable of generating custom schedulers (with custom/dynamic rules) for different systems (and the different metrics of those systems). It means that if we generate a scheduler for k8s, it might be integrated with and controlled by k8s itself. If we generate a scheduler for k8s and OpenStack simultaneously, it should live outside of k8s and OpenStack and might be integrated with Jenkins/GitLab, or be a standalone application/API service, or whatever you want.
- The logic of generated schedulers should be based on rules, and rules should work with the provided metrics. A simple example: we need a scheduler that allocates CPU and RAM for tasks as usual (cores and gigabytes), checks that a Clickhouse database isn't overloaded, and checks that the Internet connection works fine and the bandwidth is good enough. So we need several metrics for proper scheduling: CPU, RAM, Clickhouse, network. As you can see, you can make different combinations of such metrics for your own unique scheduling, or make chains of such schedulers (a rough sketch of such a rule set follows right after this list).
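As a very rough sketch of what such rules could look like in a generated scheduler (the rule format, the metric names, and the thresholds are all assumptions, not a finished design):

```python
# A toy rule engine combining "classic" resources with external metrics.
# All names and thresholds below are hypothetical.

from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    name: str
    check: Callable[[dict], bool]  # takes a metrics snapshot, returns pass/fail

rules = [
    Rule("cpu",        lambda m: m["free_cpu_cores"] >= 2),
    Rule("ram",        lambda m: m["free_ram_gb"] >= 4),
    Rule("clickhouse", lambda m: m["clickhouse_queue_size"] < 100),
    Rule("network",    lambda m: m["internet_bandwidth_mbps"] >= 10),
]

def can_schedule(metrics_snapshot: dict) -> bool:
    """Admit the task only if every configured rule passes."""
    failed = [r.name for r in rules if not r.check(metrics_snapshot)]
    if failed:
        print(f"holding the task back, failed rules: {failed}")
    return not failed

# Example usage with a fake metrics snapshot:
snapshot = {
    "free_cpu_cores": 8,
    "free_ram_gb": 32,
    "clickhouse_queue_size": 12,
    "internet_bandwidth_mbps": 200,
}
print(can_schedule(snapshot))  # True for this snapshot
```

Swapping the list of rules, or chaining several such checkers for different clusters, is exactly the kind of combination mentioned above.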
Well, in this short note I just wanted to express my thoughts about task scheduling and the fact that we still don't have tools for effective scheduling. Of course, such tools should be designed more carefully than I've done in this note.