ARGOS – a software framework to facilitate user transparent multi-threading

Nils Petersen, Yulian Pastarmov, Didier Stricker
3rd Many-core Applications Research Community (MARC) Symposium (MARC-2011), July 5-6, Ettlingen, Germany

Abstract:
In this paper we present ARGOS, a software framework designed for auto-parallelizing algorithms and distributing their parts among physical machines. The core approach is to split an algorithm into an abstract part, a target-specific static part, and a run-time part. The abstract part is a graph representation of the distinct steps of an algorithm with all data dependencies explicitly resolved. The target-specific part statically distributes graph-parallel parts of the algorithm among threads and physical machines and, in addition, associates each algorithm step with an optimal implementation for the respective target hardware. The run-time part covers load balancing and, in the presence of streaming data, parallelizes sequential branches of the graph by stage-parallel execution. The underlying programming model allows for optimal partitioning of the graph in terms of computational load and memory and bandwidth footprint. All thread synchronization and memory management are carried out by the framework, making the multi-threaded execution completely transparent to the user. The software is in daily use within our research group and has already been deployed in several industrial projects.
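
To illustrate the idea of an abstract graph whose data dependencies are explicit, the following is a minimal sketch in Python. It is not the ARGOS API: the function run_graph, the step names (blur, edges, merge), and the thread-pool scheduling are all assumptions made for illustration. Steps whose inputs are already available are treated as graph-parallel and submitted to worker threads together, so the caller never touches a thread or a lock.

```python
from concurrent.futures import ThreadPoolExecutor


def run_graph(steps, deps, inputs, max_workers=4):
    """Execute a dependency graph of steps (hypothetical helper, not ARGOS).

    steps:  maps a step name to a callable
    deps:   maps a step name to the ordered names of the values it consumes
    inputs: maps the names of externally supplied values to those values
    """
    results = dict(inputs)
    pending = set(steps)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while pending:
            # Every step whose dependencies are resolved can run concurrently.
            ready = sorted(s for s in pending
                           if all(d in results for d in deps[s]))
            if not ready:
                raise ValueError("cyclic or unsatisfied dependency")
            futures = {s: pool.submit(steps[s], *[results[d] for d in deps[s]])
                       for s in ready}
            # Waiting on the futures is the only synchronization point.
            for name, future in futures.items():
                results[name] = future.result()
            pending -= set(ready)
    return results


# Two independent steps read the same input; a third step joins their results.
steps = {
    "blur": lambda img: [0.5 * x for x in img],
    "edges": lambda img: [abs(b - a) for a, b in zip(img, img[1:])],
    "merge": lambda blurred, edges: sum(blurred) + sum(edges),
}
deps = {"blur": ["image"], "edges": ["image"], "merge": ["blur", "edges"]}
results = run_graph(steps, deps, {"image": [1, 4, 2, 8]})
print(results["merge"])  # 18.5
```

Here blur and edges depend only on the input and are submitted in the same round, while merge waits for both. The sketch covers only graph-parallel execution within one process; static distribution across machines, selection of target-specific implementations, load balancing, and stage-parallel execution of streaming data are outside its scope.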
