Memory issue with TDDFT calculation

Posted: Fri May 02, 2025 10:20 am
by shweta_choudhary

Dear support team,

I am trying to do TDDFT calculations for a system of 264 atoms, and I am facing the following error:

 min. memory requirement per mpi rank    579.8 MB, per node  27828.9 MB

 -----------------------------------------------------------------------------
|                                                                             |
|     EEEEEEE  RRRRRR   RRRRRR   OOOOOOO  RRRRRR      ###     ###     ###     |
|     E        R     R  R     R  O     O  R     R     ###     ###     ###     |
|     E        R     R  R     R  O     O  R     R     ###     ###     ###     |
|     EEEEE    RRRRRR   RRRRRR   O     O  RRRRRR       #       #       #      |
|     E        R   R    R   R    O     O  R   R                               |
|     E        R    R   R    R   O     O  R    R      ###     ###     ###     |
|     EEEEEEE  R     R  R     R  OOOOOOO  R     R     ###     ###     ###     |
|                                                                             |
|     Could not allocate body of response function on mpi rank 0 of size:     |
|     0 MB. Reducing NOMEGAPAR or using more computing nodes might solve      |
|     this problem.                                                           |
|                                                                             |
|       ---->  I REFUSE TO CONTINUE WITH THIS SICK JOB ... BYE!!! <----       |
|                                                                             |
 -----------------------------------------------------------------------------

However, I am requesting enough memory in my job script:

#!/bin/bash
#SBATCH -N 3
#SBATCH --ntasks-per-node=48
#SBATCH --job-name=k
#SBATCH --error=error.%J.err
#SBATCH --output=output.%J.out
#SBATCH --time=00-01:00:00
#SBATCH --partition=debug
#SBATCH --mem=672GB

source /home/VASP/vasp_var.sh
mpirun -np $SLURM_NTASKS /home/vasp.6.3.2/bin/vasp_std

The INCAR is as follows:

ISTART =  1; ICHARG =  0; LWAVE=T; LCHARG=F
LREAL=A; ENCUT  =  600; GGA= PS
ISMEAR =  0; SIGMA  =  0.01
EDIFF  = 1E-8; ISIF=2; NSW = 0; IBRION = -1
ALGO=TDHF; NBANDS = 1776; ANTIRES=0
IBSE=0; NBANDSO   = 20 ; NBANDSV   = 20; LORBITALREAL=T
LFXC  =T; LHARTREE  =T; LADDER=F; NOMEGAPAR=1

Could anyone please help with this?

Thanks,
Shweta


Re: Memory issue with TDDFT calculation

Posted: Fri May 02, 2025 12:27 pm
by fabien_tran1

Hi.

How much memory is available on each node of the cluster? The option --mem specifies the memory requirement per node (https://slurm.schedmd.com/archive/slurm ... batch.html), and the maximum possible value of course depends on what is available on the node. According to the error message, each node should provide at least 28 GB. Help regarding memory requirements can be found at wiki/index.php/Category:Memory.
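A quick back-of-the-envelope check of the numbers from the first post, written as a minimal Python sketch. It assumes that VASP's per-node figure is simply the per-rank minimum times the 48 ranks per node, and that --mem is a per-node limit as described above; the helper names are hypothetical. The small difference from the reported 27828.9 MB presumably comes from the rounded per-rank value.

```python
def per_node_mb(per_rank_mb: float, ranks_per_node: int) -> float:
    """Rough per-node memory estimate in MB: per-rank minimum times ranks per node."""
    return per_rank_mb * ranks_per_node


def fits(per_rank_mb: float, ranks_per_node: int, mem_per_node_gb: float) -> bool:
    """True if the estimate fits into a per-node --mem request given in GB."""
    # Decimal units assumed (1 GB = 1000 MB) to keep the comparison simple.
    return per_node_mb(per_rank_mb, ranks_per_node) <= mem_per_node_gb * 1000


if __name__ == "__main__":
    # Numbers from the first post: 579.8 MB per rank, --ntasks-per-node=48, --mem=672GB.
    estimate = per_node_mb(579.8, 48)
    print(f"estimated minimum per node: {estimate:.1f} MB")
    print("fits into --mem=672GB:", fits(579.8, 48, 672))
```

Since roughly 28 GB is far below the 672 GB requested, the --mem setting itself is unlikely to be what stops this job; the printed value is only the minimum that VASP estimates up front.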


Re: Memory issue with TDDFT calculation

Posted: Fri May 02, 2025 8:12 pm
by shweta_choudhary

Hi,

Each node has 750 GB available. I have already tried reducing NTAUPAR and NOMEGAPAR to 1 and NOMEGA to 30, but nothing seems to work. I do not understand why this memory issue is happening. Is VASP not able to read the --mem option?


Re: Memory issue with TDDFT calculation

Posted: Fri May 02, 2025 8:56 pm
by fabien_tran1

Can you please upload the files slurm_xxx.log and OUTCAR? Have you tried more nodes? It seems that you are using 3 nodes.


Re: Memory issue with TDDFT calculation

Posted: Sat May 03, 2025 8:00 am
by shweta_choudhary

Hi,

I tried more nodes as well, but the issue remains the same. Somehow, this forum is not letting me upload the OUTCAR and .out files, so I am pasting the initial lines of the OUTCAR and log files here.

 vasp.6.3.2 27Jun22 (build Mar 17 2025 15:04:40) complex

 executed on             LinuxIFC date 2025.05.02  15:40:50
 running on  144 total cores
 distrk:  each k-point on  144 cores,    1 groups
 distr:  one band on NCORE=   1 cores,  144 groups


--------------------------------------------------------------------------------------------------------


 INCAR:
   ISTART = 1
   ICHARG = 0
   LWAVE = T
   LCHARG = F
   LREAL = A
   ENCUT = 600
   GGA = PS
   ISMEAR = 0
   SIGMA = 0.01
   EDIFF = 1E-8
   ISIF = 2
   NSW = 0
   IBRION = -1
   ALGO = TDHF
   NBANDS = 1776
   ANTIRES = 0
   IBSE = 0
   NBANDSO = 20
   NBANDSV = 20
   LORBITALREAL = T
   LFXC = T
   LHARTREE = T
   LADDER = F
   NOMEGAPAR = 1

 POTCAR:    PAW_PBE Bi_d_GW 14Apr2014
 POTCAR:    PAW_PBE Br_GW 20Mar2012
 POTCAR:    PAW_PBE O_GW 28Sep2005

 values below the HOMO (VB) or above the LUMO (CB) will cause erroneous energies
 E-fermi :  -0.4650

 -----------------------------------------------------------------------------
 WAVEDER not read: bands not compatible    1780    1872
 -----------------------------------------------------------------------------
|                                                                             |
|           W    W    AA    RRRRR   N    N  II  N    N   GGGG   !!!           |
|           W    W   A  A   R    R  NN   N  II  NN   N  G    G  !!!           |
|           W    W  A    A  R    R  N N  N  II  N N  N  G       !!!           |
|           W WW W  AAAAAA  RRRRR   N  N N  II  N  N N  G  GGG   !            |
|           WW  WW  A    A  R   R   N   NN  II  N   NN  G    G                |
|           W    W  A    A  R    R  N    N  II  N    N   GGGG   !!!           |
|                                                                             |
|     The derivative of the wavefunctions with respect to k (WAVEDER) can     |
|     not be found. You should redo the groundstate calculation using         |
|     LOPTICS=.TRUE. in order to write the WAVEDER file. However for          |
|     metals, the present setting is ok.                                      |
|                                                                             |
 -----------------------------------------------------------------------------

 the WAVEDER file was not read
energies w=

 responsefunction array rank=  229968
 LDA part: xc-table for Pade appr. of Perdew

 min. memory requirement per mpi rank    579.8 MB, per node  27828.9 MB

 allocating   0 responsefunctions rank=229968
 -----------------------------------------------------------------------------
|                                                                             |
|     EEEEEEE  RRRRRR   RRRRRR   OOOOOOO  RRRRRR      ###     ###     ###     |
|     E        R     R  R     R  O     O  R     R     ###     ###     ###     |
|     E        R     R  R     R  O     O  R     R     ###     ###     ###     |
|     EEEEE    RRRRRR   RRRRRR   O     O  RRRRRR       #       #       #      |
|     E        R   R    R   R    O     O  R   R                               |
|     E        R    R   R    R   O     O  R    R      ###     ###     ###     |
|     EEEEEEE  R     R  R     R  OOOOOOO  R     R     ###     ###     ###     |
|                                                                             |
|     Could not allocate body of response function on mpi rank 0 of size:     |
|     0 MB. Reducing NOMEGAPAR or using more computing nodes might solve      |
|     this problem.                                                           |
|                                                                             |
|       ---->  I REFUSE TO CONTINUE WITH THIS SICK JOB ... BYE!!! <----       |
|                                                                             |
 -----------------------------------------------------------------------------

Re: Memory issue with TDDFT calculation

Posted: Thu May 08, 2025 8:42 am
by fabien_tran1

Sorry for the late answer. If possible, could you please provide all input files (INCAR, KPOINTS and POSCAR), and give more info about the computer cluster that you are using?


Re: Memory issue with TDDFT calculation

Posted: Sun May 11, 2025 3:34 pm
by shweta_choudhary

Dear sir,

INCAR:

ISTART =  1; ICHARG =  0; LWAVE=T; LCHARG=F
ENCUT  =  600; GGA= PS
ISMEAR =  0; SIGMA  =  0.01
EDIFF  = 1E-8; ISIF=2; NSW = 0; IBRION = -1
ALGO=TDHF; NBANDS = 1872; ANTIRES=0
IBSE=0; NBANDSO   = 20 ; NBANDSV   = 20; LORBITALREAL=T
LFXC  =T; LHARTREE  =T; LADDER=F

KPOINTS:

kmesh
0
Gamma
   1   2   1
0.0  0.0  0.0

POSCAR: attached (POSCAR.tar)

Re: Memory issue with TDDFT calculation

Posted: Mon May 12, 2025 8:27 am
by fabien_tran1

Is an upper limit for the virtual memory set on your nodes? Can you show the output of "ulimit -a" on the command line?
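The same limit can also be queried from Python's standard resource module, shown here as a minimal sketch. RLIMIT_AS is the address-space limit that ulimit -v reports on Linux; getrlimit returns bytes rather than kbytes. Because limits on a login node can differ from those applied inside a batch job, it may be worth running such a check on a compute node from within the job as well.

```python
import resource


def describe_limit(name: str, limit: int) -> str:
    """Format a limit the way `ulimit` does, printing RLIM_INFINITY as 'unlimited'."""
    return f"{name}: {'unlimited' if limit == resource.RLIM_INFINITY else limit}"


def virtual_memory_limits() -> tuple[int, int]:
    """Soft and hard address-space limits in bytes (the `ulimit -v` setting on Linux)."""
    return resource.getrlimit(resource.RLIMIT_AS)


if __name__ == "__main__":
    soft, hard = virtual_memory_limits()
    print(describe_limit("virtual memory (soft)", soft))
    print(describe_limit("virtual memory (hard)", hard))
```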


Re: Memory issue with TDDFT calculation

Posted: Mon May 12, 2025 8:35 am
by shweta_choudhary

Hi,

[shweta_cy.iitr@login02 ~]$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 1541472
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 4096
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1541472
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

Re: Memory issue with TDDFT calculation

Posted: Mon May 12, 2025 9:53 am
by fabien_tran1

The virtual memory was set to unlimited, which is fine. Now, I would like to repeat your calculation, but I need all details:
-The steps of the calculation (DFT followed by TDDFT, etc.)
-INCAR files for all steps.
-OUTCAR and .out files for all steps (if they are too large then compress them with zip).


Re: Memory issue with TDDFT calculation

Posted: Thu May 15, 2025 5:52 am
by shweta_choudhary

Dear Sir,

I have attached the ZIP file.

Many thanks,
Shweta


Re: Memory issue with TDDFT calculation

Posted: Fri May 16, 2025 3:08 pm
by fabien_tran1

Hi,

Your calculation requires much more memory than what is indicated in the output files. Your system is quite large, and my colleague Alexey Tal (a specialist in BSE/TDDFT) will give you recommendations for reducing the memory requirement, to hopefully make the calculation feasible on your computer cluster.

However, before that, we would like you to provide the correct input files, since you have probably made some mistakes:
-The POTCAR file in the folder step3 is different from the other folders.
-The POSCAR files slightly differ.
-The value of NBANDS in step3 is different from step2, leading to the message "WAVEDER not read: bands not compatible 1780 1872".
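One plausible way to end up with "bands not compatible 1780 1872" is that VASP may increase NBANDS so that the bands divide evenly over the band groups. The sketch below assumes a simple round-up to a multiple of the number of band groups; it is an illustration of that idea, not VASP's exact rule, and the function name is hypothetical. With the 144 band groups reported in the OUTCAR header above, NBANDS = 1776 becomes 1872.

```python
def rounded_nbands(nbands: int, band_groups: int) -> int:
    """Round NBANDS up to the next multiple of the number of band groups.

    Assumption: mimics VASP enlarging NBANDS for an even distribution of bands
    over band groups; the exact rule in VASP may differ.
    """
    # Integer ceiling division avoids floating-point rounding issues.
    return -(-nbands // band_groups) * band_groups


if __name__ == "__main__":
    # 144 groups, from the OUTCAR line "one band on NCORE= 1 cores, 144 groups".
    print(rounded_nbands(1776, 144))
```

Under the same assumption, a run with, for example, 20 band groups (a hypothetical value) would land on 1780 instead. Running both steps on the same number of cores and setting NBANDS explicitly to a value that is already a multiple of the band groups should avoid the mismatch.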


Re: Memory issue with TDDFT calculation

Posted: Fri May 16, 2025 4:05 pm
by shweta_choudhary

Dear sir,

I used the GW POTCAR in step 3, as recommended in the VASP tutorial. Also, I copied WAVECAR, CHGCAR and CONTCAR from the previous steps. Please confirm the correct procedure for this. Please also let me know how to resolve this memory issue for a large system. One node in our cluster has around 700 GB of memory, and I could use up to 5 or 6 nodes.

Many thanks,
Shweta


Re: Memory issue with TDDFT calculation

Posted: Fri May 16, 2025 4:51 pm
by alexey.tal

Dear Shweta,

shweta_choudhary wrote:
"I used the GW POTCAR in step 3, as recommended in the VASP tutorial. Also, I copied WAVECAR, CHGCAR and CONTCAR from the previous steps. Please confirm the correct procedure for this."

It is necessary that all input files (WAVECAR, WAVEDER) in your TDDFT (ALGO=TDHF) calculation are produced with the same POTCAR file.
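Since all inputs of the TDHF step must come from the same POTCAR, a simple first check is to compare the "POTCAR:" header lines that VASP writes to each OUTCAR (as in the excerpt earlier in this thread). This is a hypothetical helper sketched under the assumption that these lines identify the POTCAR; matching titles are a useful sanity check but do not prove that the files are byte-identical.

```python
def potcar_labels(outcar_text: str) -> list[str]:
    """Return the POTCAR identifiers found in OUTCAR "POTCAR:" lines, in order.

    Hypothetical helper: assumes lines like " POTCAR:    PAW_PBE Bi_d_GW 14Apr2014".
    Duplicates are dropped because OUTCAR may list the same header more than once.
    """
    labels: list[str] = []
    for line in outcar_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "POTCAR:":
            label = " ".join(fields[1:])
            if label not in labels:
                labels.append(label)
    return labels


def same_potcar(outcar_a: str, outcar_b: str) -> bool:
    """True if both OUTCARs report the same POTCAR identifiers in the same order."""
    return potcar_labels(outcar_a) == potcar_labels(outcar_b)


if __name__ == "__main__":
    step3 = """
 POTCAR:    PAW_PBE Bi_d_GW 14Apr2014
 POTCAR:    PAW_PBE Br_GW 20Mar2012
 POTCAR:    PAW_PBE O_GW 28Sep2005
"""
    print(potcar_labels(step3))
```

In practice, the two texts would be read from the OUTCAR files of the ground-state and TDHF steps and passed to same_potcar.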

shweta_choudhary wrote:
"Please also let me know how to resolve this memory issue for a large system. One node in our cluster has around 700 GB of memory, and I could use up to 5 or 6 nodes."

Indeed, you are trying to perform a large calculation and the memory is likely to be an issue. However, there are a few things you could try to fit this calculation on your system.

The main bottleneck so far is the memory required to store the exchange-correlation kernel \(f_{xc}(G,G')\). As reported in your OUTCAR, the basis set for the response function has "maximum number of plane-waves: 229856". The kernel storage then amounts to \(229856^2 \times 16 \times 10^{-9} \approx 845\) GB of memory. This array is not distributed, so if you have 700 GB per node, increasing the number of nodes will not help, as you need to provide more memory to the first MPI rank. A way to reduce the size of this array is to reduce the basis-set size of the response function, i.e., reduce ENCUTGW. But keep in mind that the calculation needs to be thoroughly converged with respect to ENCUTGW.
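The estimate above, and how much the response-function basis would have to shrink, can be reproduced with a few lines of Python. Assumptions: a dense kernel with 16 bytes per element, 1 GB = 10^9 bytes, and the usual plane-wave counting rule that the number of plane waves grows as ENCUTGW^(3/2), so the cutoff must shrink by a factor (N_target/N)^(2/3). The 350 GB budget is a hypothetical value that leaves room for the other arrays held by rank 0; the helper names are illustrative.

```python
def kernel_gb(n_pw: int, bytes_per_element: int = 16) -> float:
    """Memory of a dense n_pw x n_pw kernel array in GB (1 GB = 1e9 bytes)."""
    return n_pw**2 * bytes_per_element * 1e-9


def max_pw_for(mem_gb: float, bytes_per_element: int = 16) -> int:
    """Largest basis size whose dense kernel fits into mem_gb."""
    return int((mem_gb * 1e9 / bytes_per_element) ** 0.5)


def encutgw_ratio(n_target: int, n_current: int) -> float:
    """Factor by which ENCUTGW must shrink, assuming n_pw scales as ENCUTGW**1.5."""
    return (n_target / n_current) ** (2 / 3)


if __name__ == "__main__":
    n = 229856  # from the OUTCAR: maximum number of plane-waves
    print(f"kernel: {kernel_gb(n):.0f} GB")  # about 845 GB
    n_max = max_pw_for(350)  # hypothetical budget for this one array
    print(n_max, f"ENCUTGW factor <= {encutgw_ratio(n_max, n):.3f}")
```

Because the kernel memory scales as ENCUTGW^3 under these assumptions, even a moderate cutoff reduction has a large effect, which is also why convergence with respect to ENCUTGW needs to be checked carefully.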

The number of bands included in the TDHF kernel calculation is very small in your OUTCAR, i.e., NBANDSO = NBANDSV = 20. Since you have 2688 electrons, i.e., 1344 occupied bands, you only account for about 1.5% of the occupied bands in your calculation, which is too few. For such a system you likely need hundreds or thousands of occupied and unoccupied bands. The rank of the Casida equation in the TDHF algorithm is NBANDSO × NBANDSV × NKPTS, so you are going to need to solve a really large matrix problem to get a reasonably converged spectrum.
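The scaling of the Casida problem with the number of bands can be illustrated the same way. This sketch assumes NKPTS = 2 for the 1 × 2 × 1 Gamma-centred mesh in the KPOINTS file and a dense matrix with 16 bytes per element; VASP's actual storage may differ (for example through symmetry or real orbitals), so the numbers are only an order-of-magnitude guide.

```python
def casida_rank(nbandso: int, nbandsv: int, nkpts: int) -> int:
    """Rank of the Casida problem as stated above: NBANDSO x NBANDSV x NKPTS."""
    return nbandso * nbandsv * nkpts


def dense_matrix_gb(rank: int, bytes_per_element: int = 16) -> float:
    """Memory of a dense rank x rank matrix in GB (16 bytes per element assumed)."""
    return rank**2 * bytes_per_element * 1e-9


if __name__ == "__main__":
    nkpts = 2  # assumption: 1 x 2 x 1 Gamma-centred mesh gives 2 k-points
    for nb in (20, 200, 500):
        r = casida_rank(nb, nb, nkpts)
        print(f"NBANDSO=NBANDSV={nb}: rank {r}, dense matrix {dense_matrix_gb(r):.3g} GB")
    # Fraction of the 1344 occupied bands covered by NBANDSO = 20.
    print(f"occupied bands covered: {20 / (2688 // 2):.1%}")
```

Going from 20 to a few hundred bands per side multiplies the rank by about a hundred and the dense-matrix memory by about ten thousand, which makes the lower-memory ALGO=TIMEEV route look attractive for this system.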

In VASP we have an alternative approach for the TDDFT calculation (ALGO=TIMEEV), which is discussed in detail on our wiki. This approach is much faster when the electron-hole interaction is not taken into account, as in your case (LADDER=.FALSE.), and it requires much less memory.


Re: Memory issue with TDDFT calculation

Posted: Fri May 16, 2025 5:06 pm
by shweta_choudhary

Dear Alexey,

Thank you very much for the detailed response.

1. I will use the GW POTCAR throughout the whole procedure.

2. Could you please comment on whether BSE@GW, or TDHF with parameters from GW, which would account for the electron-hole interaction, is feasible at all for such large systems with our HPC configuration? I faced a similar memory issue during GW band-structure calculations.

So, is optimizing ENCUTGW the only way? How can I utilize the full memory of the nodes? I could decrease the number of tasks per node to 24.

3. Could you please comment on how to select NBANDSO and NBANDSV?

I really appreciate your help.