MPI4Py 101¶
Beginning¶
Since Python is implemented in C, it is possible to wrap an MPI library so that Python can run MPI programs. Hence we have MPI4Py.
Why write this post? There are few documents about MPI4Py, and most of them lack example code. Sometimes you have to figure out the primitives by reading the MPI documentation and the MPI4Py source code. This post should also help you understand the MPI primitives used in our codes.
Most of the following points are generalized from our coding experience, and we recommend you read this overview.
Two styles of functions¶
Some communication functions come in two styles: one passes a buffer, as in plain MPI; the other passes a generic Python object, which is serialized with pickle. For details, see this post. In most cases, we follow the MPI (buffer) style, as sketched below.
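A minimal sketch of the two styles (the payloads and tags are just illustrative, assuming at least two processes):

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Lowercase, pickle-based style: any Python object can be sent.
if rank == 0:
    comm.send({"step": 1, "loss": 0.5}, dest=1, tag=0)
elif rank == 1:
    obj = comm.recv(source=0, tag=0)

# Uppercase, buffer-based (MPI-like) style: pre-allocated buffers.
if rank == 0:
    comm.Send(np.arange(4, dtype="d"), dest=1, tag=1)
elif rank == 1:
    buf = np.empty(4, dtype="d")
    comm.Recv(buf, source=0, tag=1)
```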
Default Communicator¶
A communicator is an object connecting different processes. If you need interprocess message passing with MPI, you use the default communicator or create your own. All MPI processes are connected through the default communicator, MPI.COMM_WORLD. You can also create a communicator that covers only a subset of the processes.
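A minimal sketch of querying the default communicator (launch with something like `mpiexec -n 4 python hello.py`; the file name is just illustrative):

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD      # default communicator covering all processes
rank = comm.Get_rank()     # id of this process within the communicator
size = comm.Get_size()     # total number of processes

print(f"Hello from rank {rank} of {size}")
```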
Collective Communication¶
We have an example, [collective.py], which builds collective patterns out of Send and Recv to help you understand them. MPI also has other collective primitives besides the ones mentioned above; here is a slide from UIUC providing more details about collective communication.
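For illustration only (this is not [collective.py] itself), a minimal sketch of two built-in collectives, Bcast and Reduce, could look like this:

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Broadcast a buffer from rank 0 to every process.
data = np.arange(4, dtype="d") if rank == 0 else np.zeros(4, dtype="d")
comm.Bcast(data, root=0)

# Sum each process's local buffer onto rank 0.
total = np.zeros(4, dtype="d") if rank == 0 else None
comm.Reduce(data, total, op=MPI.SUM, root=0)
if rank == 0:
    print(total)   # each entry equals size * the broadcast value
```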
One-sided communication¶
In the common scenario, both the receiver and the sender must use explicit primitives (e.g. send, recv) to communicate. However, this can be awkward for the programmer in some cases, for example when other processes frequently access the memory of one specific node.
There are three ways to start/end a one-sided communication.
- Fence, Fence
- Start/Complete, Post/Wait
- Lock, Unlock
Warning
Lock and Unlock do not stand for a mutex lock; the naming is just similar to the other pairs.
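A minimal sketch of the Lock/Unlock (passive target) style, with illustrative sizes and ranks; note that some MPI implementations require window memory from MPI.Alloc_mem for passive-target access, which this sketch ignores:

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Each process exposes a small array through a window.
mem = np.full(4, rank, dtype="d")
win = MPI.Win.Create(mem, comm=comm)

if rank == 0:
    buf = np.empty(4, dtype="d")
    win.Lock(1)                     # open an access epoch to rank 1's window
    win.Get(buf, target_rank=1)     # rank 1 makes no matching call
    win.Unlock(1)                   # close the epoch; buf now holds rank 1's data
    print(buf)                      # [1., 1., 1., 1.]

win.Free()
```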
During communication, you can choose from the following common data-movement primitives.
- Get(self, origin, target_rank, target=None)
- Put(self, origin, target_rank, target=None)
- Accumulate(self, origin, target_rank, target=None, op=SUM)
- … More
In some primitives, such as Get and Put, there is a parameter target. It is a (target_disp, target_count, target_datatype) tuple that locates a memory address, just as in plain MPI. Another example, [one_sided.py], explains the details.
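A minimal sketch of a fence-delimited Put using the target tuple (a simplified illustration, not [one_sided.py] itself; the displacement and count are arbitrary):

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Each rank exposes a 4-element double array; disp_unit makes
# target_disp count in elements rather than bytes.
mem = np.zeros(4, dtype="d")
win = MPI.Win.Create(mem, disp_unit=mem.itemsize, comm=comm)

win.Fence()                    # open an access epoch (collective)
if rank == 0:
    origin = np.full(2, 7.0)
    # Write 2 doubles into rank 1's window, starting at displacement 1.
    win.Put(origin, target_rank=1, target=(1, 2, MPI.DOUBLE))
win.Fence()                    # close the epoch; the Put is now visible

if rank == 1:
    print(mem)                 # [0., 7., 7., 0.]
win.Free()
```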
We also provide example code for a gradient method using one-sided communication, [gd_async.py].
In Python, you need to allocate a numpy array to serve as the MPI window memory. When you update the window in the local process, do not write x = x + y: you will get a new x, but the window won't be updated. The right way is x += y, as in [saga.py], or to call Win functions directly with a specific target_rank, as in [gd_async.py].
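A minimal sketch of the in-place update rule (variable names are illustrative, not the code from [saga.py]):

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD

x = np.zeros(4, dtype="d")       # local memory exposed through the window
win = MPI.Win.Create(x, comm=comm)

y = np.ones(4, dtype="d")
# x = x + y   # WRONG: rebinds x to a new array; the window still points at the old memory
x += y        # RIGHT: updates the exposed buffer in place

win.Free()
```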
Here are some helpful readings to help you understand one-sided communication.
Non-blocking communication¶
When a process calls Send or Recv, it does not proceed until the communication has finished; these are called blocking communications. Blocking communication plays a synchronizing role, and it may cause deadlock if the program does not handle it carefully. If you don't need synchronous communication, you can choose the non-blocking functions. (If you are familiar with JavaScript or Python 3+, recall the async/await model.)
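A minimal sketch of non-blocking Isend/Irecv with an explicit Wait (assuming two processes; the buffer size is illustrative):

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    send_buf = np.arange(4, dtype="d")
    req = comm.Isend(send_buf, dest=1, tag=0)   # returns immediately
    # ... overlap communication with other work here ...
    req.Wait()                                  # block until the send completes
elif rank == 1:
    recv_buf = np.empty(4, dtype="d")
    req = comm.Irecv(recv_buf, source=0, tag=0)
    # ... do other work ...
    req.Wait()                                  # block until the data has arrived
    print(recv_buf)
```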