MPI4Py 101¶
Beginning¶
Since Python is implemented in C, it is possible to wrap an MPI library so that Python can run MPI programs. Hence we have MPI4Py.
Why write this post? There are few documents about MPI4Py, and most of them lack example code. Sometimes you have to figure out the primitives by reading the MPI documentation and the MPI4Py source code. This post should also help you understand the MPI primitives used in our codes.
Most of the following points are generalized from our coding experience, and we recommend you read this overview.
Two styles of functions¶
Some communication functions come in two styles: one passes a buffer, as in plain MPI; the other passes a generic Python object, which is serialized with pickle. For details, see this post. In most cases, we follow the MPI (buffer) style, as sketched below.
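A minimal sketch of the two styles (the payloads and tags are just illustrative, assuming at least two processes):

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Lowercase, pickle-based style: any Python object can be sent.
if rank == 0:
    comm.send({"step": 1, "loss": 0.5}, dest=1, tag=0)
elif rank == 1:
    obj = comm.recv(source=0, tag=0)

# Uppercase, buffer-based (MPI-like) style: pre-allocated buffers.
if rank == 0:
    comm.Send(np.arange(4, dtype="d"), dest=1, tag=1)
elif rank == 1:
    buf = np.empty(4, dtype="d")
    comm.Recv(buf, source=0, tag=1)
```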
Default Communicator¶
A communicator is an object connecting different processes. If you need interprocess message passing with MPI, you use the default communicator or create your own. All MPI processes are connected through the default communicator, MPI.COMM_WORLD. You can also create a communicator that covers only a subset of the processes.
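A minimal sketch of querying the default communicator (launch with something like `mpiexec -n 4 python hello.py`; the file name is just illustrative):

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD      # default communicator covering all processes
rank = comm.Get_rank()     # id of this process within the communicator
size = comm.Get_size()     # total number of processes

print(f"Hello from rank {rank} of {size}")
```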
Collective Communication¶
We have an example, [collective.py], which builds collective patterns out of Send and Recv to help you understand them. MPI also has other collective primitives besides the ones mentioned above; here is a slide from UIUC providing more details about collective communication.
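For illustration only (this is not [collective.py] itself), a minimal sketch of two built-in collectives, Bcast and Reduce, could look like this:

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Broadcast a buffer from rank 0 to every process.
data = np.arange(4, dtype="d") if rank == 0 else np.zeros(4, dtype="d")
comm.Bcast(data, root=0)

# Sum each process's local buffer onto rank 0.
total = np.zeros(4, dtype="d") if rank == 0 else None
comm.Reduce(data, total, op=MPI.SUM, root=0)
if rank == 0:
    print(total)   # each entry equals size * the broadcast value
```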
One-sided communication¶
In the common scenario, both the receiver and the sender must use explicit primitives (e.g. send, recv) to communicate. However, this can be awkward for the programmer in some cases, for example when other processes frequently access the memory of one specific node.
There are three ways to start/end a one-sided communication.
- Fence, Fence
- Start/Complete, Post/Wait
- Lock, Unlock
Warning
Lock and Unlock do not stand for a mutex lock; the naming is just similar to the other pairs.
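A minimal sketch of the Lock/Unlock (passive target) style, with illustrative sizes and ranks; note that some MPI implementations require window memory from MPI.Alloc_mem for passive-target access, which this sketch ignores:

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Each process exposes a small array through a window.
mem = np.full(4, rank, dtype="d")
win = MPI.Win.Create(mem, comm=comm)

if rank == 0:
    buf = np.empty(4, dtype="d")
    win.Lock(1)                     # open an access epoch to rank 1's window
    win.Get(buf, target_rank=1)     # rank 1 makes no matching call
    win.Unlock(1)                   # close the epoch; buf now holds rank 1's data
    print(buf)                      # [1., 1., 1., 1.]

win.Free()
```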
During communication, you can choose from the following common data-movement primitives.
- Get(self, origin, target_rank, target=None)
- Put(self, origin, target_rank, target=None)
- Accumulate(self, origin, target_rank, target=None, op=SUM)
- … More
In some primitives, such as Get and Put, there is a parameter target. It is a (target_disp, target_count, target_datatype) tuple that locates a memory address, just as in plain MPI. Another example, [one_sided.py], explains the details.
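A minimal sketch of a fence-delimited Put using the target tuple (a simplified illustration, not [one_sided.py] itself; the displacement and count are arbitrary):

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Each rank exposes a 4-element double array; disp_unit makes
# target_disp count in elements rather than bytes.
mem = np.zeros(4, dtype="d")
win = MPI.Win.Create(mem, disp_unit=mem.itemsize, comm=comm)

win.Fence()                    # open an access epoch (collective)
if rank == 0:
    origin = np.full(2, 7.0)
    # Write 2 doubles into rank 1's window, starting at displacement 1.
    win.Put(origin, target_rank=1, target=(1, 2, MPI.DOUBLE))
win.Fence()                    # close the epoch; the Put is now visible

if rank == 1:
    print(mem)                 # [0., 7., 7., 0.]
win.Free()
```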
We also provide example code for a gradient method using one-sided communication, [gd_async.py].
In Python, you need to allocate a numpy array to serve as the MPI window memory. When you update the window in the local process, do not write x = x + y: you will get a new x, but the window won't be updated. The right way is x += y, as in [saga.py], or to call Win functions directly with a specific target_rank, as in [gd_async.py].
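A minimal sketch of the in-place update rule (variable names are illustrative, not the code from [saga.py]):

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD

x = np.zeros(4, dtype="d")       # local memory exposed through the window
win = MPI.Win.Create(x, comm=comm)

y = np.ones(4, dtype="d")
# x = x + y   # WRONG: rebinds x to a new array; the window still points at the old memory
x += y        # RIGHT: updates the exposed buffer in place

win.Free()
```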
Here are some helpful readings to help you understand one-sided communication.
Non-blocking communication¶
When a process calls Send or Recv, it does not proceed until the communication has finished; these are called blocking communications. Blocking communication plays a synchronizing role, and it may cause deadlock if the program does not handle it carefully. If you don't need synchronous communication, you can choose the non-blocking functions. (If you are familiar with JavaScript or Python 3+, recall the async/await model.)
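A minimal sketch of non-blocking Isend/Irecv with an explicit Wait (assuming two processes; the buffer size is illustrative):

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    send_buf = np.arange(4, dtype="d")
    req = comm.Isend(send_buf, dest=1, tag=0)   # returns immediately
    # ... overlap communication with other work here ...
    req.Wait()                                  # block until the send completes
elif rank == 1:
    recv_buf = np.empty(4, dtype="d")
    req = comm.Irecv(recv_buf, source=0, tag=0)
    # ... do other work ...
    req.Wait()                                  # block until the data has arrived
    print(recv_buf)
```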