For one of our assignments, we wanted to analyze the performance of ChainerMN in a multi-node environment, so we tried to install ChainerMN on OSC. It turns out to be a bit complicated. For the impatient:

  1. Get the base modules:
    module load python/3.6-conda5.2 gnu/6.1.0 cuda/9.1.85 mvapich2/2.3rc2-gpu
    
  2. Create a conda environment and activate it:
    conda create -n myenv python=3.6
    . /usr/local/python/3.6-conda5.2/etc/profile.d/conda.sh
    source activate myenv
    
  3. Download the mpi4py source. We need to build it ourselves rather than install it with pip:
    wget https://bitbucket.org/mpi4py/mpi4py/downloads/mpi4py-3.0.0.tar.gz
    tar xf mpi4py-3.0.0.tar.gz
    cd mpi4py-3.0.0
    
  4. Replace the contents of mpi.cfg with:
    [mpi]
    mpicc = /opt/mvapich2/gnu/6.1/2.3rc2-gpu/bin/mpicc
    mpicxx = /opt/mvapich2/gnu/6.1/2.3rc2-gpu/bin/mpicxx
    include_dirs = /opt/mvapich2/gnu/6.1/2.3rc2-gpu/include
    libraries = cudart
    library_dirs = /opt/mvapich2/gnu/6.1/2.3rc2-gpu/lib:/usr/local/cuda/9.1.85/lib64
    

    Observations:

    • The GPU-enabled MPI needs cudart, hence the libraries entry.
    • A colon (:) separates multiple entries, as in library_dirs.
  5. Build and install mpi4py:
    python setup.py build
    python setup.py install
    
  6. Install chainer.
    pip install chainer
    

    This should also install cupy. If it doesn’t (check the output), you will need to install cupy yourself.

  7. The last command installs cupy-92, which is built for CUDA 9.2, but remember that our MPI build needs CUDA 9.1. So once you log in to the GPU node, cheat as follows:
    # we need to load modules again on the compute node
    module load python/3.6-conda5.2 gnu/6.1.0 cuda/9.1.85 mvapich2/2.3rc2-gpu
    # init conda defaults
    . /usr/local/python/3.6-conda5.2/etc/profile.d/conda.sh
    # activate the environment which has everything installed
    source activate myenv
    # the cheat: put the cuda 9.2 libraries on the path so cupy-92 can load
    # its cublas, while mpi stays happy with cuda 9.1
    export LD_LIBRARY_PATH=/usr/local/cuda/9.2.88/lib64:$LD_LIBRARY_PATH
    # make sure mvapich2 *actually* uses gpu!
    export MV2_USE_CUDA=1
    
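The two exports in step 7 are easy to forget in a fresh shell on the compute node, so a small check before starting a job can save some debugging. The sketch below is only an illustration: `check_runtime_env` is a hypothetical helper (not part of Chainer, mpi4py or MVAPICH2), and it assumes the CUDA 9.2 path used in step 7.

```python
import os

# Path taken from the export in step 7; adjust it if your CUDA install differs.
CUDA_92_LIB = "/usr/local/cuda/9.2.88/lib64"


def check_runtime_env(env=None):
    """Return a list of problems with the step 7 environment (empty if fine).

    Hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    problems = []

    # LD_LIBRARY_PATH is a colon-separated list of directories.
    raw = env.get("LD_LIBRARY_PATH", "")
    lib_dirs = [d.rstrip("/") for d in raw.split(":") if d]
    if CUDA_92_LIB not in lib_dirs:
        problems.append("LD_LIBRARY_PATH is missing " + CUDA_92_LIB)

    # Step 7 sets MV2_USE_CUDA=1 so that MVAPICH2 actually uses the GPU.
    if env.get("MV2_USE_CUDA") != "1":
        problems.append("MV2_USE_CUDA is not set to 1")

    return problems


if __name__ == "__main__":
    for problem in check_runtime_env():
        print("WARNING:", problem)
```

Run it inside the activated environment on the GPU node. No output means both settings are in place.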

With all this set up, you should be able to run ChainerMN on OSC.
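
One way to confirm that the pieces fit together is to ask each layer to identify itself. The following is a minimal sketch, not a benchmark: `stack_report` is a hypothetical helper, and the `naive` communicator is used only because it has the fewest requirements. Save it under any name you like and start it with the MPI launcher on the GPU node, so that every process prints its own report.

```python
def stack_report():
    """Collect one status line per layer of the stack.

    Hypothetical helper for illustration; it only reports, it fixes nothing.
    """
    report = {}

    try:
        from mpi4py import MPI
        world = MPI.COMM_WORLD
        report["mpi4py"] = "rank %d of %d" % (world.Get_rank(), world.Get_size())
    except Exception as exc:  # ImportError, or an error from the MPI library
        report["mpi4py"] = "unavailable: %r" % (exc,)

    try:
        import cupy
        # runtimeGetVersion() returns an integer such as 9020 for CUDA 9.2.
        report["cupy"] = "CUDA runtime %d, %d device(s)" % (
            cupy.cuda.runtime.runtimeGetVersion(),
            cupy.cuda.runtime.getDeviceCount(),
        )
    except Exception as exc:
        report["cupy"] = "unavailable: %r" % (exc,)

    try:
        import chainermn
        comm = chainermn.create_communicator("naive")
        report["chainermn"] = "rank %d of %d" % (comm.rank, comm.size)
    except Exception as exc:
        report["chainermn"] = "unavailable: %r" % (exc,)

    return report


if __name__ == "__main__":
    for layer, status in stack_report().items():
        print("%-10s %s" % (layer, status))
```

If the ranks reported by mpi4py and chainermn agree and cupy lists at least one device, the installation is probably in good shape. An "unavailable" line points at the layer to revisit.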

Happy coding!