The Cray compiler generates some of the fastest code available. On a Cray XE6, LAMMPS compiled with the Cray compiler (8.1.2, -O2) runs 1.6 times faster than the same code built with gcc (4.7, -Ofast). If you use the Cray compiler, it makes sense to use Cray's perftools to find bottlenecks in your MPI/OpenMP application. This post collects tips on using these tools, because I always forget the details. As an example, I will build and analyze LAMMPS.
Before doing anything, check which tools are available on your system and pick the newest ones. To see the available versions of the Cray compiler:
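A minimal sketch of that check, assuming your site uses Environment Modules and packages the Cray compiler as the cce module (the usual name on Cray XE systems). The newest_version helper is hypothetical and is only there to pick the highest version out of the listing.

```shell
# Hypothetical helper: read a module listing on stdin, print the highest cce version.
# Relies on GNU sort (-V sorts version numbers naturally).
newest_version() {
    grep -o 'cce/[0-9][0-9.]*' | sort -V | tail -n 1
}

if command -v module >/dev/null 2>&1; then
    # `module avail` usually writes to stderr, hence the 2>&1.
    module avail cce 2>&1
    echo "newest: $(module avail cce 2>&1 | newest_version)"
else
    echo "module command not found; run this on the Cray login node"
fi
```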
If the most recent version is, for example, 8.1.2, load the compiler and switch to that version:
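For example, with the version number from the text. This sketch assumes the Cray programming environment module is called PrgEnv-cray and the compiler module cce; the names may differ on your site.

```shell
CCE_VERSION="8.1.2"   # example version from the text; use the newest one you found

if command -v module >/dev/null 2>&1; then
    module load PrgEnv-cray                    # assumed name of the Cray programming environment
    module swap cce "cce/${CCE_VERSION}"       # replace the default compiler version
    module list 2>&1 | grep cce || echo "cce is not loaded"
else
    echo "module command not found; run this on the Cray login node"
fi
```

If another programming environment is already loaded, it probably has to be swapped out first (module swap with the loaded PrgEnv name) instead of using a plain module load.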
Then check which perftools versions are available and load one:
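Roughly like this, assuming the module is simply called perftools. The check_tool helper is hypothetical; it only confirms that the perftools commands ended up on your PATH.

```shell
# Hypothetical helper: report whether a command is available.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: not found"
    fi
}

if command -v module >/dev/null 2>&1; then
    module avail perftools 2>&1     # list the installed versions
    module load perftools           # or perftools/<version> for a specific one
fi

# After loading the module, both tools should be found.
check_tool pat_build
check_tool pat_report
```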
Now build the application and instrument the executable for profiling:
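A sketch of these two steps for LAMMPS. The machine name xe6 is hypothetical (it stands for whatever Makefile you use in src/MAKE), and the pat_build options shown, -u to trace user functions and -g mpi to trace the MPI group, are one reasonable choice, not necessarily the best one for your case.

```shell
MACHINE="xe6"            # hypothetical LAMMPS machine name, i.e. src/MAKE/Makefile.xe6
EXE="lmp_${MACHINE}"     # LAMMPS names its executable lmp_<machine name>

if command -v pat_build >/dev/null 2>&1; then
    make "${MACHINE}"                  # run inside the LAMMPS src directory, with perftools loaded
    pat_build -u -g mpi "${EXE}"       # trace user functions and the MPI group
else
    echo "pat_build not found; load the perftools module first"
fi

echo "instrumented executable: ${EXE}+pat"
```

For a hybrid code you would probably add the OpenMP group as well, for example -g mpi,omp.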
This creates a new executable, lmp_<machine name>+pat, which you need to use for your job from now on. If you use SLURM, put something like this in your sbatch script:
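As an illustration, the snippet below writes a small job script to job.sbatch. Everything in it is an assumption to adapt: the executable name lmp_xe6+pat, the input file in.lj, the time limit, the counter group 1, and the use of aprun as the launcher (common on Cray XE systems running SLURM).

```shell
# Write a hypothetical job script; adjust names and sizes to your system.
cat > job.sbatch <<'EOF'
#!/bin/bash
#SBATCH --job-name=lmp_pat
#SBATCH --ntasks=32
#SBATCH --time=00:30:00

# Select the hardware performance counter group (1 is only an example value).
export PAT_RT_HWPC=1

# Run the instrumented executable, not the original one.
aprun -n 32 ./lmp_xe6+pat -in in.lj
EOF

echo "submit with: sbatch job.sbatch"
```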
Here -n 32 sets the number of processes (MPI ranks) to 32, not the number of nodes, and the hardware performance counter experiment is selected by setting the environment variable PAT_RT_HWPC. More information about this option can be found at the end of the following page. When your job is done, call pat_report:
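A sketch of this step, assuming the instrumented run left its raw data in a .xf file whose name starts with the executable name. The first_xf helper and the name lmp_xe6+pat are hypothetical.

```shell
# Hypothetical helper: print the first raw data file (*.xf) for a given executable name.
first_xf() {
    for f in "$1"+*.xf; do
        if [ -e "$f" ]; then
            echo "$f"
            break
        fi
    done
}

xf=$(first_xf "lmp_xe6+pat")

if [ -n "$xf" ] && command -v pat_report >/dev/null 2>&1; then
    pat_report "$xf" > report.txt     # text report on stdout, saved to a file
else
    echo "no .xf data file found, or pat_report is not on the PATH"
fi
```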
pat_report also creates a file with the .ap2 extension. You can explore the performance of your application in a text editor (vi, for example) or in a graphical viewer.
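For the graphical route, the .ap2 file can be opened with Cray Apprentice2 (the app2 command). In this sketch the file name is hypothetical, and it is an assumption that the .ap2 file shares the base name of the .xf data file.

```shell
xf="lmp_xe6+pat+12345-67s.xf"    # hypothetical raw data file name
ap2="${xf%.xf}.ap2"              # assumed: same base name, .ap2 extension

if command -v app2 >/dev/null 2>&1 && [ -e "$ap2" ]; then
    app2 "$ap2" &                # opens the Apprentice2 GUI; needs a working X display
else
    echo "would run: app2 $ap2"
fi
```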
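To view the results graphically, you need an X server (X Window System) on your local machine, and when you connect to the cluster you need to allow it to use your display by enabling X11 forwarding (the -Y option of ssh), for example ssh -Y user@cluster, where user and cluster are placeholders.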