previousnext up

On Multiprocessor System Scheduling

Abstract

In the most general case, a job submitted. for execution ean be represented by an unknown directed acyclic graph (dag, i.e., the data dependency relation graph of a program) revealed as execution proceeds. It is known that, for distributed memory parallel machine models, there is no scheduling algorithm which can complete execution of every unknown dag within a constant factor of optimal execution time (assuming the dag is known) [6]. On the other hand, for shared memory parallel machine models, using parallel profiles to represent parallel jobs, it is shown that the Dynamic Equi-partition Scheduling Policy (DEQ) [27] [33] guarantees a mean response time within a constant factor of the optimum which assumes complete information of job parallel profiles) [5].
As an effort to move those positive theoretical results for shared memory parallel models one step closer towards reality, we extend those results to all the unknown dags. We show the DEQ achieves a mean response time, for any unknown dag, within four times the optimum (assuming the dag is known). Therefore, for the same class of unknown dags, we have a scheduling strategy which achieves a near optimal solution in both the completion time and the mean response time for shared memory parallel models but no scheduling policy can guarantee a solution within a constant factor of the optimum in completion time (nor mean response time) for distributed memory models.
We also study the issue of effectiveness of private eaches for processors. Since time for all processors to access the shared memory simultaneously is usually much longer than the time for a processor to aceess its own private cache, scheduling with private caches falls into the distributed memory model so that the lower bound [6] applies. However, in this paper we are able to show the effectiveness of private caches by proving that a version of DEQ achieves a mean response time with five times the optimal mean response time in the cache clock time for a large class of parallel jobs well accepted in the parallel scheduling community. This shows an improvement of systemn performance by using private caches over that of purely shared memory. We also notice that the same idea can be applied to improve system performance of distributed memory parallel machine for special classes of jobs classified as computation intensive (informally time complexity being sufficiently larger than the space complexity).