Parallel Wannier Projections
-
- Newbie
- Posts: 8
- Joined: Thu Jun 03, 2021 1:26 pm
Re: Parallel Wannier Projections
Dear Henrique,
Thanks for the reply. One thing I just realized I forgot to mention is that my newest calculations are spin-polarized, so they would take roughly half as much time to run if they were spin-unpolarized; perhaps that is what you were thinking of in terms of the run time. Indeed, previous calculations I did years ago for my PhD work on spin-unpolarized systems took only 2-4 days at very high quality settings (2x the default ENCUT, dense k-points, etc.).
Furthermore, for my current spin-polarized calculations, I require about 3x the number of occupied bands because I need to wannierize the bottom 492 conduction states during the final wannier localization step. I have 492 occupied bands, so I have set NBANDS = 1476 for my conversion process in order to reliably resolve the first 492 conduction bands. Perhaps I can reduce NBANDS to something like 1100 or 1200 instead, but I need to ensure that the 984th band is reasonably "converged" in order to trust the final wannier step. To summarize, I'm really pushing these calculations in ways that, perhaps, haven't been done before, so the run times have turned out to be pretty severe as a result.
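(For the band counting: 492 occupied + 492 conduction = 984 bands that need to be reliable, and NBANDS = 1476 = 3 x 492 leaves roughly a 50% buffer of extra bands above that 984th band.)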
For my current calculations that are causing me issues, I'll address each of your points below. See attached files if you want to see exactly what I'm doing for the vasp2wannier conversion step.
1) I am using a fairly small vacuum spacing, at least with respect to the calculations I mentioned above: only 13.3 Angstroms. An issue with my current calculations is that I have a molecule attached to one surface facet, and this molecule is quite long on its own. Therefore, even though I am only using 13.3 Angstroms of vacuum spacing, the total surface-normal lattice vector is about 50 Angstroms in length. I could reduce the vacuum to 10 Angstroms, potentially, but that will only help shave off 1 day of time at most. Still, perhaps this is worth doing, so I'll consider it.
2) I split my calculations up into (1) a wavefunction generation step, (2) a vasp2wannier conversion step where I use IALGO=2 to read in the previous WAVECAR file, and (3) a wannier localization step running wannier90.x as a separate calculation. For step 2, which generates the wannier90.* files, I do not use NCORE or KPAR, since the former results in a crash and the latter does not help. For the wavefunction generation step 1, I use both NCORE and KPAR, but that is its own calculation and has no effect on the vasp2wannier conversion process. (An illustrative sketch of the conversion-step INCAR is included after point 4 below.)

As for the core count: for the vasp2wannier conversion step, I use anywhere from 4-12 cores on my University supercomputer's high-memory nodes (2 TB of memory across 48 cores), since this step is very memory intensive. I have found that it doesn't really matter how many cores I use, so I typically use the smallest number of cores that gives me adequate memory. The MMN file writing seems to parallelize over the number of cores somewhat, but those files only take a few hours to write to begin with (roughly 4 hours per file on 4 cores, or 8 hours in total for the up and down channels together). However, the AMN file writing is largely unaffected by the number of cores. This matches your reply above from Apr 28, 2021, where you noticed that VASP compiled with Intel and MKL did not parallelize the AMN calculation over cores; I have noticed the same over the years, since I compile VASP with Intel compilers.

This is at the heart of my problem: the AMN file writing is the bottleneck of my vasp2wannier calculations, taking up almost the entirety of the total run time. To be precise, the AMN file writing takes about 116 hours to complete for both the up and down channels, and the total runtime of the entire calculation, start to finish, was about 131 hours. Perhaps if I reduce the vacuum to 10 Angstroms and reduce NBANDS to 1100 or 1200, this run time can fall under 100 hours, in which case I would have some room to breathe. However, I would still be interested in a KPAR parallelization routine.
3) See the end of my reply for the above point about AMN files and Intel-compiled VASP.
4) I do not use LWANNIER90_RUN=.TRUE. for the reason you mentioned.
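For reference, the conversion-step INCAR mentioned in point 2 boils down to roughly the following (representative tags only, based on the description above; the exact inputs are in the attachments):

ISTART = 1          ! read the WAVECAR from the wavefunction-generation step
IALGO = 2           ! keep the orbitals fixed; no electronic minimization
ISPIN = 2           ! spin-polarized
NBANDS = 1476       ! 3x the 492 occupied bands
LWANNIER90 = .TRUE. ! write the wannier90.win/.eig/.amn/.mmn files for a separate wannier90.x run
! no NCORE or KPAR for this step: NCORE crashes it and KPAR currently does not help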
To address your last comment and to reiterate a point I made previously: I have 5 unique k-points (a 3x3x1 gamma-centered grid with ISYM=0), so if I could use KPAR=5 for these vasp2wannier calculations, I could potentially see a substantial decrease in total run time. I know KPAR doesn't give a perfect 1:1 improvement, but I believe that at worst I would see a 2x speedup, and more realistically a 3-4x speedup, which would make the 7-day max wall-time a non-issue.
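(For a rough, ideal-scaling estimate: if the ~116-hour AMN step split evenly over KPAR=5, it would drop to about 116 / 5 ≈ 23 hours; even the pessimistic 2x case brings it down to ~58 hours, and the ~131-hour total would then fit comfortably within the 168-hour (7-day) wall-time.)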
Many thanks,
Peyton Cline
-
- Newbie
- Posts: 24
- Joined: Thu Nov 26, 2020 10:27 am
Re: Parallel Wannier Projections
Dear Henrique,
Are there any updates on the improvement of the parallelization of the AMN calculation, specifically for large systems with few k-points? You mentioned that k-point parallelization works well for small systems, but for me the main problem is large systems.
Any idea of when the latest changes to the AMN calculation will be released?
Best,
Jonathan
-
- Global Moderator
- Posts: 506
- Joined: Mon Nov 04, 2019 12:41 pm
- Contact:
Re: Parallel Wannier Projections
@joel_eaves
From the description of your workflow, I don't have an easy and obvious recommendation to speed up the calculation. After looking at your input files, I agree that you are really pushing these calculations.
The vacuum you reported is between the tip of the molecule and the other side of the slab, but I had not imagined the molecule was so elongated. Effectively, you are using a large box (which means a lot of plane waves).
A spin-polarized calculation and k-points make everything more complicated.
I see two possibilities to reduce the computational cost:
1. running a gamma-only calculation
2. running spin-unpolarized (ISPIN=1)
These suggestions might or might not affect the quality of the results (depending on what you need to do afterwards).
Other than that and also answering to @jbackman:
In the next release (December-January), we will add support for parallelization with KPAR/=1 in the generation of the MMN and AMN files. It's actually a small change in the code that should speed up generating AMN and MMN for systems with multiple k-points.
-
- Newbie
- Posts: 24
- Joined: Thu Nov 26, 2020 10:27 am
Re: Parallel Wannier Projections
Dear Henrique,
You mentioned before that KPAR parallelization is adequate for small systems with many k-points and that you "are testing other strategies for systems with many atoms and a few k-points". Did you implement anything new apart from KPAR parallelization? KPAR alone will most likely not make a big difference for my large systems with few k-points.
Best,
Jonathan
-
- Global Moderator
- Posts: 506
- Joined: Mon Nov 04, 2019 12:41 pm
- Contact:
Re: Parallel Wannier Projections
Hi Jonathan,
Yes, I am working on an improved scheme for the computation of projections for large systems (besides adding KPAR support which will be included in the next release).
The main difference is that the orbitals onto which the Bloch states are projected are generated in a distributed fashion and communicated instead of generated on all CPUs.
The results so far are promising (faster computation and better scaling).
This will be included in a future release, but I cannot yet guarantee it will be in the next one.
-
- Newbie
- Posts: 24
- Joined: Thu Nov 26, 2020 10:27 am
Re: Parallel Wannier Projections
What scheme for the computation of projections has been included in the latest release (6.3.0)?
Is KPAR support now added for the projections? Was anything done in this release to address large systems with few k-points?
-
- Global Moderator
- Posts: 506
- Joined: Mon Nov 04, 2019 12:41 pm
- Contact:
Re: Parallel Wannier Projections
In the latest release, VASP 6.3.0, we reorganized the algorithm that computes the projections.
Now it scales with KPAR, which is useful for small systems with a lot of k-points.
The generation of the orbitals onto which the Bloch states are projected is also distributed over CPUs which leads to a much better scaling with the number of bands (system size) independently of the number of k-points.
Looking forward to hearing about your experience with the new version.
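As an illustrative example (the KPAR value here is just a placeholder to be matched to your k-point grid and MPI setup), enabling this only requires setting KPAR in the INCAR of the vasp2wannier run:

KPAR = 4            ! number of k-point parallelization groups
LWANNIER90 = .TRUE. ! generate the wannier90 projection/overlap files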
-
- Newbie
- Posts: 24
- Joined: Thu Nov 26, 2020 10:27 am
Re: Parallel Wannier Projections
Dear Henrique,
I have seen a great improvement in the speed of the new projection version. It has been very helpful.
However, I have noticed a new problem. Now, when I do the projections (in parallel) for larger systems, VASP crashes while writing the amn file. In my earlier tests the crash only happened once the amn file was finished, but after testing more materials it keeps happening while the amn file is being written. This happens for several materials where I have larger supercells.
After adding some print statements to the code, it looks like the problem occurs in SUBROUTINE WRITE_AMN_FILE. As you can see in the attached files, the wannier90.amn file was only partly written before the seg fault; the mmn file is written without any problem. (I have cut down the amn file to save space.) I have tried with and without KPAR. When retrying the calculation, it stops at different points in the amn file. You can see the error messages in VASP.err; there does not seem to be a memory issue.
I first run the GPU version of VASP and then restart with the CPU version to use the wannier90 interface. The CPU version is compiled without OpenMP since I had some problems with it.
The original SCF run can be found in data and the amn calculation in data/wannier.
Best,
Jonathan
-
- Global Moderator
- Posts: 506
- Joined: Mon Nov 04, 2019 12:41 pm
- Contact:
Re: Parallel Wannier Projections
Hi Jonathan,
Thank you for the feedback.
I am glad you see an improvement in the new version.
As for the issue you are reporting, it is really hard for me to reproduce and track down.
The calculation is rather expensive with the current settings: it takes a lot of time and uses a lot of memory.
We don't have the resources to run it as is.
I tried running with lower settings:
ENCUT = 208.7  # default from POTCAR
ADDGRID = .False.
EDIFF = 1E-6
and I was able to produce the AMN file without problem.
Do you have access to a memory-monitoring tool on your cluster that you could use to check that you are not running out of memory when computing the AMN?
Can you try running the large calculations with these settings (especially the lower cutoff) and see if the problem still occurs?
-
- Newbie
- Posts: 24
- Joined: Thu Nov 26, 2020 10:27 am
Re: Parallel Wannier Projections
Hi Henrique,
I tried with the settings you recommended and the problem is the same. So it is not related to ENCUT or ADDGRID. Also, the larger ENCUT is needed for this calculation.
As I said, there does not seem to be any memory issue. If that were the case, it would usually be reported in the VASP.err file. To be sure, I tried the same calculations with up to 4 times the number of nodes while keeping the same number of MPI processes, i.e. 4 times the available memory, and the problem is still the same.
-
- Newbie
- Posts: 8
- Joined: Wed Nov 02, 2022 10:30 am
Re: Parallel Wannier Projections
Dear Henrique,
I'm new to using the vasp+wannier interface and I found this discussion very useful. I wonder if you could help me optimize my wannier calculations.
I am using VASP 6.2.1 + Wannier90 v3 to study the anomalous Hall effect induced by spin-orbit coupling. The system contains 32 atoms with a 6x6x6 mesh, and with SOC there are more than 600 bands.
In the NSCF calculation to generate .amn and .mmn files, the calculation could not finish writing the .mmn file within the 24h walltime limit.
The total number of lines in the .mmn should be approximately NBANDS^2 * NKPOINTS * NNtot, which in my case is more than 933,120,000 lines. By the time it reached 24h, only about 10% was written.
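(For example, assuming NBANDS = 600, the 216 k-points of the 6x6x6 mesh, and NNtot = 12 nearest-neighbour b-vectors, that already gives 600^2 x 216 x 12 = 933,120,000 data lines.)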
I've tried different KPAR: 8 and 16 but that didn't really help.
For the machines I used: I've tried 96 nodes with 128 cores/node (--ncore-per-node=16, --ntask-per-core=8). This is quite sufficient for my SCF calculations, so it does not make sense that the .mmn file is written so slowly.
Thank you,
Dongsheng
-
- Global Moderator
- Posts: 506
- Joined: Mon Nov 04, 2019 12:41 pm
- Contact:
Re: Parallel Wannier Projections
@jbackman Sorry for my late reply. I've been trying to reproduce the issue you reported but I was not able to do so.
Since you still have the issue with a lower cutoff, that is of course not the cause.
The point is that with a lower cutoff I was also able to run the calculation on our machines and I did not have any problem.
I tested with three of our toolchains:
https://www.vasp.at/wiki/index.php/Tool ... VASP.6.3.0
lines starting with:
intel-oneapi-compilers-2022.0.1
gcc-11.2.0
nvhpc-22.2 (OpenACC)
Not being able to reproduce the issue on our side makes it really difficult to track down this issue.
Could you maybe try re-running the calculation with LWRITE_MMN_AMN = .FALSE.?
Then we will know if it's really the writing that is causing trouble.
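That is, keeping the rest of the INCAR unchanged, something along these lines:

LWANNIER90 = .TRUE.       ! keep the wannier90 interface active
LWRITE_MMN_AMN = .FALSE.  ! skip writing the wannier90.amn/.mmn files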
-
- Global Moderator
- Posts: 506
- Joined: Mon Nov 04, 2019 12:41 pm
- Contact:
Re: Parallel Wannier Projections
@dongshen_wen Could you try running the calculation using VASP 6.3.0?
There, a new parallelization scheme is used for the computation of the AMN and MMN files, which should speed things up quite a bit.
-
- Newbie
- Posts: 8
- Joined: Wed Nov 02, 2022 10:30 am
Re: Parallel Wannier Projections
Dear Henrique,
Thank you. Following your response, I switched to 6.3.2, but I found that vasp_ncl reported errors like this after the NSCF:
Computing MMN (overlap matrix elements)
=======================================
internal error in: mpi.F at line: 1359
M_sumb_d: invalid vector size n -1953955840
=======================================
I found a similar ongoing discussion for 6.3.0 in the forum.
forum/viewtopic.php?p=23148#p23148
Do you have any idea?
Best,
Dongsheng
-
- Newbie
- Posts: 24
- Joined: Thu Nov 26, 2020 10:27 am
Re: Parallel Wannier Projections
@henrique_miranda Sorry for my late reply. I have also tested with the flag LWRITE_MMN_AMN = .FALSE.
When doing this, the MMN calculation is not performed but the AMN calculation still is; however, no writing of the .amn file should happen. With this flag the same error occurs, after all the projections have been calculated, and the error messages are the same as before.
This is in line with other tests I have made. For most materials I have tried (with a range of different cutoffs and other settings), the crash happens while writing the .amn file, but in one case the crash happened just as the file writing finished.