Parallelization Recommendation

Abdulrahman_Allangawi
Newbie
Posts: 3
Joined: Wed May 07, 2025 10:19 am

Parallelization Recommendation

#1 Post by Abdulrahman_Allangawi » Tue May 20, 2025 7:39 am

Hello all,

I am working on an HPC cluster with large CPU nodes. Each node has two sockets, each with 96 physical processors, for a total of 192 physical processors per node.

I am struggling to find the best parallelization settings for my projects, especially for Gamma-point calculations with 100+ atoms. As there are many variables to change, I am unsure what the usual recommended recipes are. Of course, I know that it depends on the system, but what are some general tips to follow when trying to find the optimum settings?

Below I have added my current Slurm script. From my initial tests, I found that these settings run well with "NCORE = 24; KPAR = 1". However, I am unsure whether I am utilizing all the processors in my nodes efficiently.

Also, whichever setting I find best for VASP should be fine for VASPsol, right?


henrique_miranda
Global Moderator
Posts: 514
Joined: Mon Nov 04, 2019 12:41 pm

Re: Parallelization Recommendation

#2 Post by henrique_miranda » Tue May 20, 2025 6:51 pm

Finding the best parallelization is not always an easy task.

Performance depends not only on the hardware but also on the type of calculation you are running.
Assuming that you are running a Gamma-only ground-state calculation and you stick to MPI parallelism, there is really only one relevant INCAR tag: NCORE.
How to set it? If you are running the same ground-state calculation many times (for example, in a molecular dynamics run), then it may be worth trying a few different values and timing the code. For that, you can look for "LOOP: cpu time" in the OUTCAR.
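As a rough sketch of that timing check, the shell function below averages the "real time" values on the "LOOP:" lines of an OUTCAR. The name avg_loop_time is made up for illustration, and the function assumes that the elapsed time in seconds is the last field on each matching line.

```shell
# Average the wall-clock ("real time") value of every "LOOP:" line in an OUTCAR.
# Assumption: each such line ends with the real time in seconds, e.g.
#   LOOP:  cpu time     12.3456: real time     12.4000
avg_loop_time() {
    outcar="${1:-OUTCAR}"
    # Match "LOOP:" exactly, so the "LOOP+:" summary lines are not counted.
    awk '$1 == "LOOP:" { sum += $NF; n++ }
         END {
             if (n > 0) printf "%.4f\n", sum / n
             else print "no LOOP: lines found"
         }' "$outcar"
}
```

Running avg_loop_time OUTCAR in each test directory gives one number per NCORE value, which makes the runs easy to compare.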
If you don't want to test it, there are a few heuristics that tend to be reasonable:

  1. choose NCORE approximately equal to the number of atoms in the system
  2. don't choose NCORE larger than the number of CPUs per socket

In your case you have 96 physical processors, maybe on two sockets (?), which would mean NCORE=48, for example. But I think NCORE=24 is pretty reasonable too.
Only by looking at the timings can you know for sure.
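To narrow down which values are worth timing, a small helper like the one below can list candidates. This is only a sketch: ncore_candidates is a hypothetical name, the numbers in the example call follow the reading above (96 ranks, 48 cores per socket), and restricting the list to divisors of the rank count is an added assumption so that the ranks split into equal groups.

```shell
# Print NCORE candidates: divisors of the MPI rank count that do not exceed
# the number of cores per socket (heuristic 2).
# Restricting to divisors is an added assumption, not something VASP enforces here.
ncore_candidates() {
    ranks="$1"; cores_per_socket="$2"; n=1; out=""
    while [ "$n" -le "$cores_per_socket" ] && [ "$n" -le "$ranks" ]; do
        if [ $((ranks % n)) -eq 0 ]; then
            out="${out:+$out }$n"
        fi
        n=$((n + 1))
    done
    echo "$out"
}

ncore_candidates 96 48   # prints: 1 2 3 4 6 8 12 16 24 32 48
```

Both NCORE=24 and NCORE=48 appear in that list, so timing those two plus one or two neighbours is usually enough for a first comparison.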

About the Slurm configuration, I would actually try:

Code:

#SBATCH --ntasks-per-node=96
#SBATCH --cpus-per-task=2
export OMP_NUM_THREADS=1

In a sense, this is just to avoid using multithreading. If you want to try multithreading, then you should probably use:

Code:

#SBATCH --ntasks-per-node=96
#SBATCH --cpus-per-task=2
export OMP_NUM_THREADS=2
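The difference between the two variants is easiest to see as plain arithmetic. The sketch below (layout_summary is a hypothetical helper, not a Slurm or VASP command) compares the CPUs that Slurm reserves per node with the number of threads that actually run. Whether a Slurm "CPU" is a physical core or a hardware thread depends on how the cluster is configured, so treat the numbers as counts of whatever unit your site uses.

```shell
# Compare CPUs reserved by Slurm with threads actually started on one node.
# Arguments: tasks per node, cpus per task, OMP_NUM_THREADS.
layout_summary() {
    tasks="$1"; cpus_per_task="$2"; omp_threads="$3"
    echo "reserved=$((tasks * cpus_per_task)) running=$((tasks * omp_threads))"
}

layout_summary 96 2 1   # prints: reserved=192 running=96
layout_summary 96 2 2   # prints: reserved=192 running=192
```

With OMP_NUM_THREADS=1 each MPI rank keeps two CPUs to itself but runs a single thread; with OMP_NUM_THREADS=2 every reserved CPU runs a thread.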

You can find more information on this page on our wiki:
https://www.vasp.at/wiki/index.php/Cate ... lelization

in particular:
https://www.vasp.at/wiki/index.php/Opti ... lelization

