Running Batch Jobs in Continuity

Multicore Batch Jobs

Installation

  • To run multicore / multiprocessor jobs you must have the correct MPI execution environment installed on your system. Which package you need depends on your platform.

Windows

  1. Download and install MPICH2 from here: http://www.mpich.org/downloads/. Scroll to the bottom and select the appropriate Windows version (32-bit or 64-bit, matching the bit type of your Continuity installation).

  2. MPICH needs to be able to import numpy; the easiest approach is to use the copy supplied with Continuity. To do that, add a new environment variable named PYTHONPATH and point it at the directory where Continuity's numpy package is installed, for example (see the example after this list):
PYTHONPATH=C:\Continuity\Continuity_6.4\pcty\MglToolsLibWin\pyCompilePkgs
    • Alternatively (not recommended), you can install another copy of numpy on your machine with the same bit type as the version of Python you are using.
  3. Make sure that mpiexec is available on your PATH. On this system it was found here: C:\Program Files\MPICH2\bin\mpiexec.exe
  4. Note: Matlab provides its own mpiexec that is NOT compatible with Continuity. You may need to modify your PATH environment variable so that C:\Program Files\MPICH2\bin\ is listed before Matlab's directory.
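  • For example, from a Command Prompt you can set the variable persistently with setx and then confirm which mpiexec is found first on your PATH with where (the path below is the one used above; adjust it to your installation, and note that setx only affects newly opened windows):
        setx PYTHONPATH "C:\Continuity\Continuity_6.4\pcty\MglToolsLibWin\pyCompilePkgs"
        where mpiexec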

Windows Issues

  1. Follow the above instructions
  2. If you get an error like this:
    • Error while connecting to host.  No connection could be made because
      the target machine actively refused it. (10061)

      You may need to run a Command Prompt as Administrator (type "cmd" in the Start menu, right-click Command Prompt, and select Run as administrator). Then you may need to type something like smpd -install, which installs MPICH2's smpd process manager as a service.
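      For example, from that elevated Command Prompt (these are standard MPICH2 smpd options; exact output varies by version):
        smpd -install
        smpd -status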

  3. If you get an error like this:
    • ImportError: DLL load failed: The specified module could not be found.

      You may have installed the 32-bit version of MPICH2 when you need the 64-bit version, or vice versa. Uninstall the current version and install the one whose bit type matches Continuity.

Mac OS X Installation

  1. Download and install Open MPI from here: http://www.open-mpi.org/software/ompi/v1.2/downloads/openmpi-1.2.4.dmg.gz

  2. Make sure that mpirun is available on your PATH. The installer will probably take care of this for you.

Linux32 Mpi Installation

  1. Download and install Open MPI from here: http://www.open-mpi.org/software/ompi/v1.2/

  2. Open MPI's build instructions can be found here: http://www.open-mpi.org/faq/?category=building

  3. Make sure that mpirun is available on your PATH (a quick check is shown below).
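  • For example, a quick way to confirm which mpirun will be used and what version it is (standard shell commands; output varies by installation):
        which mpirun
        mpirun --version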

Linux64 Mpi Installation

  1. Download and install MPICH from here: http://www-unix.mcs.anl.gov/mpi/mpich1/download.html

  2. Make sure that mpirun is available on your PATH.
  3. MPI's shared libraries may also need to be on your LD_LIBRARY_PATH (see the example below).
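  • For example, if MPICH were installed under /opt/mpich (an illustrative prefix; substitute your actual install location), you might add the following lines to your shell startup file:
        export PATH=/opt/mpich/bin:$PATH
        export LD_LIBRARY_PATH=/opt/mpich/lib:$LD_LIBRARY_PATH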

OpenMPI 1.3 work around

  1. When you run configure, make sure to use these flags (see the example below):
    •         --enable-shared --enable-static

      This issue is discussed here and here.
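  • For example, a typical build sequence might look like this (the install prefix is illustrative; pick whatever location suits your system):
        ./configure --prefix=/opt/openmpi-1.3 --enable-shared --enable-static
        make
        make install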

Custom Installation

  • If your system already has a different implementation of MPI installed and you don't wish to (or can't) install one of the versions above, you will need to compile Continuity's mympi module against that MPI implementation yourself. This can be a little tricky and is not recommended for novice Linux users. You may need to contact your cluster's administrator for help.
    1. Download and extract the mympi source code

    2. Modify the Options_*.mk file that is closest to your platform (an illustrative sketch appears after this list).
      1. CONTROOT should be the root of your Continuity installation
      2. LLIBS and LDFLAGS depend on your MPI implementation.
    3. Copy the completed Options_*.mk file to Options.mk
    4. Type “make”. If you modified the Options file correctly it will compile! If not, keep modifying the Options file until you get it right.
    5. Type “make test” to do a sanity check. If it prints “Hello” from 4 different processes then everything is probably working.
    6. Type “make install” to copy the mpi binary to Continuity.
    7. This ought to work with mpich1, mpich2, and open-mpi. We’ve never seen it work with lam, but it’s probably possible.
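  • A minimal sketch of the kind of values the edited Options.mk might contain, assuming an MPICH-style installation under /opt/mpich (all paths and library names below are illustrative; the correct values depend entirely on your MPI implementation):
        CONTROOT = /home/user/Continuity_6.4
        LDFLAGS  = -L/opt/mpich/lib
        LLIBS    = -lmpich -lpthread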

Distributed SuperLU Installation

  • Continuity can also make use of Distributed SuperLU for its algebraic solver if you build it into your mpimodule.so binary. The main benefit of using Distributed SuperLU is to decrease the amount of memory needed on each node when running a very large problem on a cluster. Whether or not there is any performance gain is still being investigated. This requires having the Intel Fortran compiler and its libraries. Here are the steps:
    1. Download and extract the mympi source code

    2. Modify make.inc to match your platform (an illustrative sketch appears after this list).
    3. Options.mk must also be modified according to your platform.
      1. CONTROOT should be the root of your Continuity installation
      2. LLIBS and LDFLAGS depend on your MPI implementation.
    4. Type “make”. If you modified the Options file correctly it will compile! If not, keep modifying the Options file until you get it right.
    5. Type “make test” to do a sanity check.
    6. Type “make install” to copy the mpi binary to Continuity.
    7. This ought to work with mpich1, mpich2, and open-mpi. We’ve never seen it work with lam, but it’s probably possible.
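  • A rough sketch of the kind of settings make.inc typically needs, assuming the Intel Fortran compiler and an MPI compiler wrapper (variable names and values vary between SuperLU_DIST versions; treat these purely as placeholders):
        CC      = mpicc
        FORTRAN = ifort
        BLASLIB = -L$(MKLROOT)/lib -lmkl_rt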

Execution

Windows

  • ContinuityMultiCore.bat examples\biomechanics01\example_nogui_parallel_full.py 4

    where examples\biomechanics01\example_nogui_parallel_full.py is the example you'd like to run and 4 is the number of processors or cores available for computation.

Mac OS X and Linux

  • mpirun -np 4 continuityparallel pcty/examples/biomechanics01/example_nogui_parallel_full.py

    where pcty/examples/biomechanics01/example_nogui_parallel_full.py is the example you'd like to run and 4 is the number of processors or cores available for computation.

Additional example scripts

  • examples/biomechanics01/example_nogui_parallel_full.py
    examples/electrophysiology01/example_nogui_parallel_full_ep.py
    examples/electrophysiology07/test_whole_heart_nogui_parallel.py
    examples/electrophysiology08/test_mfhn_cylinder_nogui_parallel.py
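  • For example, to run the first electrophysiology example above on 8 cores (the core count is illustrative; use however many are available):
        mpirun -np 8 continuityparallel pcty/examples/electrophysiology01/example_nogui_parallel_full_ep.py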
    

     

Cluster Batch Jobs

  • Clusters generally use some kind of queue submission system so that many users can share a large resource. The steps you need to take depend on which submission system your cluster uses.

SGE (Sun Grid Engine)

  1. Modify our queue submission script (cont_sge.qsub) to match your environment; an illustrative sketch of such a script appears at the end of this section.

  2. Type “qsub cont_sge.qsub” to submit the job.
  3. We recommend that you read this sge tutorial. It is the best we have seen.

  4. If you see an error like this: connect() failed with errno=111, it may mean that two nodes appear to SGE to have the same IP address. You can verify this by running /sbin/ifconfig on various nodes and checking whether "inet" shows the same value. If two interfaces on different nodes appear to use the same IP address, you can disable them for Open MPI by passing the following option to mpirun in your queue submission script:
--mca btl_tcp_if_exclude lo,eth1 
  • In this example, lo and eth1 were trouble-making interfaces and were therefore disabled. See this post for additional details.
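  • For reference, here is a minimal illustrative SGE submission script of the general shape cont_sge.qsub takes (the actual script shipped with Continuity will differ; the parallel environment name, slot count, and script path below are assumptions that must match your cluster):
        #!/bin/bash
        #$ -S /bin/bash
        #$ -cwd
        #$ -N continuity_parallel
        #$ -pe orte 4
        mpirun -np $NSLOTS continuityparallel pcty/examples/biomechanics01/example_nogui_parallel_full.py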

Rocce

  1. More info coming…

Nimrod

  1. More info coming…

Load Leveler

  1. More info coming…

Serial Batch Jobs

Linux / Mac

  1. If you want to run a job in serial, do this:
    • ./continuity --full --no-threads --batch examples/biomechanics01/example_nogui.py

  2. If the job needs to keep running even after you close your terminal window, do this:
    • nohup ./continuity --full --no-threads --batch examples/biomechanics01/example_nogui.py > contoutput.txt &

  3. To check the progress of your job, we recommend using "tail -f" like this:
    • tail -f contoutput.txt

      Press Ctrl-C when you are done reading the output.

Windows

  • cd <continuity>/pcty
    ContinuityClient.bat --full --no-threads --batch examples/biomechanics01/example_nogui.py

Troubleshooting

ImportError: numpy.core.multiarray failed to import

  • ImportError: No module named numpy.core.multiarray
    Traceback (most recent call last):
      File "C:/Documents and Settings/jvandorn/Desktop/continuity/pcty/ContinuityMPI.py", line 11, in <module>
      File "C:/Documents and Settings/jvandorn/Desktop/continuity/pcty\server/mympi_API.py", line 5, in <module>
    ImportError: numpy.core.multiarray failed to import

    Did you forget to install numpy, or to point PYTHONPATH at the copy of numpy bundled with Continuity (see the Windows installation instructions above)?
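    For example, a quick way to check whether the Python you are launching can actually see numpy (the PYTHONPATH value is the one suggested in the Windows installation section above; adjust it to your installation):
      set PYTHONPATH=C:\Continuity\Continuity_6.4\pcty\MglToolsLibWin\pyCompilePkgs
      python -c "import numpy; print(numpy.version.version)"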

Older version notes

Out of Date Notes