Can the GW run be restarted from where it stopped?
Hi,
I am doing a GW band-structure run for a semiconductor material.
It is taking a long time, and I think the allotted wall time will not be sufficient. In that case, can I restart the run from where it stopped once the wall time has elapsed?
Also, VASP does not allow me to use the NPAR tag (I thought adding this tag would speed up the run).
Regards
SKM
Re: Can the GW run be restarted from where it stopped?
Dear SKM,
Let me investigate which tags you can use to speed up your calculation.
For now, you might try checkpointing tools. Perhaps MPI-agnostic network-agnostic (MANA) transparent checkpointing can help you safely stop and restart your GW calculation. It is implemented as a plugin for DMTCP (Distributed MultiThreaded CheckPointing).
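As a rough illustration of how transparent checkpointing fits a batch wall-time limit, the Python sketch below picks a checkpoint interval so that the last image is written well before the job is killed, and builds a plain DMTCP launch command. Assumptions, not from this thread: DMTCP's `dmtcp_launch --interval SECONDS` option (check `dmtcp_launch --help` on your cluster); for a multi-node MPI code such as VASP you would need the MANA plugin and your site's launch recipe instead; `vasp_std` and the 24 h / 1 h numbers are placeholders.

```python
from typing import List


def checkpoint_interval(wall_seconds: int, margin_seconds: int, n_checkpoints: int) -> int:
    """Spread n_checkpoints evenly over the wall time minus a safety margin.

    The margin leaves time after the last checkpoint, so the job is not
    killed while an image is still being written.
    """
    usable = wall_seconds - margin_seconds
    if usable <= 0 or n_checkpoints < 1:
        raise ValueError("margin must be smaller than the wall time; need >= 1 checkpoint")
    return usable // n_checkpoints


def dmtcp_command(program: List[str], interval: int) -> List[str]:
    """Wrap a program so DMTCP checkpoints it every `interval` seconds (assumed flag)."""
    return ["dmtcp_launch", "--interval", str(interval), *program]


if __name__ == "__main__":
    # Placeholder job: 24 h wall time, 1 h kept free after the last checkpoint,
    # 4 automatic checkpoints.
    iv = checkpoint_interval(24 * 3600, 3600, 4)
    print(" ".join(dmtcp_command(["vasp_std"], iv)))
```

In a follow-up job, DMTCP normally resumes from the saved images with `dmtcp_restart` (or the generated `dmtcp_restart_script.sh`); whether this works for a multi-node GW run depends on how MANA is set up on your system.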
Please consider sharing your experience here afterward!
Best regards,
Marie-Therese
Re: Can the GW run be restarted from where it stopped?
Thanks, Marie, for the quick reply. I will check the DMTCP link and ask our technical admin whether it is available on our supercomputer systems.
1. I just finished the GW0 run. After I sent the query to the forum, I tested KPAR = number of nodes and KPAR = number of cores per node, and compared the time taken for one NQ step in each run. My system is a semiconductor with 3 element types and 12 atoms in total. Resources: 288 CPUs, 48 cores per node, so 6 nodes.
a) Without KPAR set, each NQ step took around 1 h. I let it run for 7 NQ steps, but thought it would be wasted if it did not complete, so I first tested (b) below.
b) With KPAR=6 (number of nodes), each NQ step took approximately 26 min, i.e. about 2 steps per hour. While this was running, I tested (c) below.
c) With KPAR=48 (number of cores per node), each NQ step took almost the same time, about 26 min, so no improvement over (b). I therefore stopped runs (a) and (b) and let (c) complete.
Run (c) took 8 h 12 min to finish all 19 NQ steps.
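To put these timings in perspective, the small Python sketch below converts the measured time per NQ step into an estimated wall time for all 19 steps and lists how many MPI ranks end up in each k-point group for a given KPAR. It assumes the usual VASP behaviour that the ranks are split into KPAR groups and that KPAR should divide the total rank count; the function names are hypothetical helpers, not part of VASP.

```python
TOTAL_RANKS = 288  # 6 nodes x 48 cores, as reported above
NQ_STEPS = 19      # NQ steps needed for this GW0 run

# Measured minutes per NQ step for the runs described above.
MINUTES_PER_STEP = {
    "run (a)": 60.0,
    "run (b), KPAR=6": 26.0,
    "run (c), KPAR=48": 26.0,
}


def ranks_per_kpoint_group(total_ranks: int, kpar: int) -> int:
    """Ranks per k-point group, assuming KPAR must divide the rank count."""
    if total_ranks % kpar != 0:
        raise ValueError(f"KPAR={kpar} does not divide {total_ranks} ranks")
    return total_ranks // kpar


def estimated_wall_hours(minutes_per_step: float, steps: int = NQ_STEPS) -> float:
    """Total wall time in hours if every NQ step takes the same time."""
    return minutes_per_step * steps / 60.0


if __name__ == "__main__":
    for label, minutes in MINUTES_PER_STEP.items():
        print(f"{label}: ~{estimated_wall_hours(minutes):.1f} h for {NQ_STEPS} NQ steps")
    for kpar in (1, 6, 48):
        print(f"KPAR={kpar}: {ranks_per_kpoint_group(TOTAL_RANKS, kpar)} ranks per group")
    # Cross-check against the measured total of run (c): 8 h 12 min.
    print(f"run (c) measured: {(8 * 60 + 12) / NQ_STEPS:.1f} min per NQ step")
```

With these numbers, run (a) would have needed roughly 19 h, while (b) and (c) both come out near the 8.2 h observed for run (c).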
N.B.: I now understand that to get the GW band structure I should use the Wannier90 tag in the INCAR. Has my run gone to waste?
I asked a similar question using the Si_GW example from the VASP tutorial. My goal is to obtain the GW gap and band structure, and then the BSE optical absorption spectrum. Can I still use the above run? If needed, I will ask this question separately.
Regards
SKM
Re: Can the GW run be restarted from where it stopped?
Dear SKM,
Good news, your run did not go to waste!
You can do selective postprocessing using ALGO=None. For your case, set the following in the INCAR file:
```
ALGO = NONE        ! no electronic changes
NELM = 1

! set these as you want
ISMEAR =
SIGMA =

! set this as in your previous run
NBANDS =

! Wannier90
LWANNIER90 = T
NUM_WANN =         ! number of Wannier orbitals

! do not overwrite
LWAVE = .FALSE.    ! WAVECAR
LCHARG = .FALSE.   ! CHGCAR
```
And do not forget to also provide basic information in the wannier90.win file. In particular, the projections block must be supplied; the k points can be added by VASP automatically.
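To keep NUM_WANN in the INCAR consistent with the projections block in wannier90.win, the count can be scripted. The sketch below is a minimal Python helper, assuming the standard Wannier90 projection labels (s=1, p=3, d=5, f=7, sp=2, sp2=3, sp3=4 functions per atom, several labels joined with ";") and using bulk Si with two atoms and `Si:sp3` projections as a hypothetical example, not the material discussed in this thread.

```python
from typing import Dict, List, Tuple

# Wannier functions per atom for common Wannier90 projection labels.
ORBITALS_PER_LABEL: Dict[str, int] = {
    "s": 1, "p": 3, "d": 5, "f": 7, "sp": 2, "sp2": 3, "sp3": 4,
}


def num_wann(projections: List[Tuple[str, str]], atoms_per_species: Dict[str, int]) -> int:
    """Total Wannier functions = sum over species of (functions per atom x atoms)."""
    total = 0
    for species, labels in projections:
        per_atom = sum(ORBITALS_PER_LABEL[label] for label in labels.split(";"))
        total += per_atom * atoms_per_species[species]
    return total


def projections_block(projections: List[Tuple[str, str]]) -> str:
    """Format a projections block in wannier90.win syntax (species:orbitals)."""
    lines = ["begin projections"]
    lines += [f"{species}:{labels}" for species, labels in projections]
    lines.append("end projections")
    return "\n".join(lines)


if __name__ == "__main__":
    si = [("Si", "sp3")]  # hypothetical example: bulk Si, 2 atoms per cell
    print(projections_block(si))
    print("NUM_WANN =", num_wann(si, {"Si": 2}))
```

For the Si example this gives `NUM_WANN = 8`; for another material, replace the species, labels, and atom counts with those of your structure.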
Regarding restarting and speeding up GW calculations: testing on your own infrastructure is really the best you can do. Did MANA work?
Best regards,
Marie-Therese
Re: Can the GW run be restarted from where it stopped?
Thank you, Marie, for the reply.
I could not try MANA yet. I asked our system team to test it; they said it will take some time.
Maybe once I am more familiar with the computations, I will try it on my own.
For this thread, it is fine, but I ran into an issue with this same run for the GW/BSE combination.
I will start another thread so as not to mix up topics.
Regards
SKM