To recap, in my last post we went through some of the steps that need to be taken when debugging an MPI application, namely:
-Install the x64 remote debugger
-Copy mpishim to an accessible location
-Modify the registry to avoid UNC path problems in the future
Let’s go ahead and finish the rest of the steps in order to debug an MPI application.
Step 4: Configure an Empty Job with the Job Scheduler
The job scheduler is the utility that manages every job submitted to the cluster. If you want the cluster to do something for you, you need to go through the job scheduler. Debugging is no exception: you need to create an empty job that will host the application you are debugging.
To get started, open the job scheduler and from the File menu, select Submit Job:
Name your job “Debugging Job” and move over to the Processors tab. Select the number of processors you would like to use for this job and then (this is actually quite important) check the box that says “Run Job until end of run time or until cancelled”. If you leave this box unchecked, the empty job will run and finish right away, which is not what we want. We want the job to keep running so that Visual Studio can attach the processes it launches to this specific job. Don’t forget to check this box:
Next, you need to move to the Advanced tab and select which nodes will be part of your debugging scheme. In this case, I will only use 2 nodes, namely Kim03a (the head node) and Kim02a:
Click Submit Job, and you should see your job running. Make sure you write down the ID of the job (in this case, it is 3), as you will need it later on!
Step 5: Configure Visual Studio
Open Visual Studio and the project you are working on. Go to the project properties and open the Debugging section. From there, instead of the Local Debugger, select MPI Cluster Debugger:
The following screenshot shows my debugger properties window with all necessary values filled in:
Let’s go ahead and talk about each of these values:
MPI Run Command: This needs to be mpiexec for MPI applications.
MPIRun Arguments: The first argument, “-job 3.0”, specifies which job in the scheduler to use. In my case, the job ID was 3 when I created the job, and the 0 specifies the task, which every job has by default. We then have “-np 2”, which specifies the number of processes to launch; I use 2 here to match the 2 nodes in this job. Finally, you see I have “-machinefile \\kim03a\bin\machines.txt”. The “-machinefile” argument specifies the UNC location of a text file that contains the names of the machines that will be part of this job. The text file should list one machine name per line.
MPIRun Working Directory: Use this field to specify the path where any output will be written. Remember NOT to use local paths (such as a drive letter path) but rather UNC paths, to make sure that this location is available to every node.
Application Command: This is the UNC path to the MPI application that you would like to debug. This application HAS to be compiled for 64-bit, and the debugging symbols should be in that same directory as well.
MPIShim Location: In this location, specify the path to the mpishim.exe binary that you copied in step 2 of this tutorial. Remember, mpishim should exist on each and every one of the machines at the specified local path.
MPI network security mode: I usually change it to “Accept connections from any address” to avoid problems.
You probably also noticed that there is an Application Arguments row. In this row you would specify any additional arguments you would like to pass to the application.
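Since the MPIRun Arguments string and the machine file are the two pieces that are easiest to get wrong, here is a minimal Python sketch of how they fit together. It is not part of Visual Studio or the job scheduler; the helper names (`is_unc`, `machinefile_text`, `build_mpirun_arguments`) are made up for illustration, and the values are simply the ones from my setup above. The sketch assumes the machine file is nothing more than one machine name per line, as described earlier.

```python
# Hypothetical helpers for preparing the MPI Cluster Debugger values.
# None of these names come from Visual Studio or the job scheduler.


def is_unc(path):
    # A UNC path looks like \\server\share\...; a local path such as C:\bin does not.
    if not path.startswith("\\\\"):
        return False
    parts = path[2:].split("\\")
    # Require a non-empty server name and a non-empty share name.
    return len(parts) >= 2 and all(parts[:2])


def machinefile_text(machines):
    # Assumption: the machine file holds one machine name per line.
    return "\n".join(machines) + "\n"


def build_mpirun_arguments(job_id, task_id, num_processes, machinefile):
    # Refuse local paths, because every node must be able to reach the file.
    if not is_unc(machinefile):
        raise ValueError("machinefile must be a UNC path: " + machinefile)
    return "-job {0}.{1} -np {2} -machinefile {3}".format(
        job_id, task_id, num_processes, machinefile
    )


if __name__ == "__main__":
    # Values taken from the example in this post.
    print(machinefile_text(["Kim03a", "Kim02a"]), end="")
    print(build_mpirun_arguments(3, 0, 2, r"\\kim03a\bin\machines.txt"))
```

You would paste the printed argument string into the MPIRun Arguments row and save the machine names to the file that “-machinefile” points to. The same `is_unc` check can be run against the values you plan to use for MPIRun Working Directory and Application Command.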
Apply the settings, hit F5, and you should be ready to debug your processes. While trying to get this to work, I experienced pretty much every error out there, so post in the comments if you have any issues and I will help you resolve them. Happy debugging!