Trying to write ML_FF
Posted: Tue Jul 08, 2025 12:13 pm
by mo_salha
Hi all
I've been trying to generate an ML_FFN file from my current ML_FF binary.
I need it with the header so that LAMMPS will accept the format for its vaspml library.
I've attached the INCAR, OUTCAR, ML_LOGFILE as well as the POTCAR and POSCAR in case anyone wants to try it themselves.
I was hoping someone could help explain why I'm getting the following error:
Machine learning selected
Setting communicators for machine learning
Initializing machine learning
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 0 PID 1549708 RUNNING AT ddy2.apocrita
= KILLED BY SIGNAL: 9 (Killed)
===================================================================================
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 1 PID 1549709 RUNNING AT ddy2.apocrita
= KILLED BY SIGNAL: 9 (Killed)
===================================================================================
It seems like the HPC cluster (Apocrita) is killing the job, but I can't tell why.
Any and all help is greatly appreciated.
Best,
Mo
Re: Trying to write ML_FF
Posted: Tue Jul 08, 2025 12:28 pm
by mo_salha
Apologies, please refer to the POSCAR-LARGE attached here.
The one from the earlier post is distorted, yet I still get a similar error/termination when using this one.
Mo
Re: Trying to write ML_FF
Posted: Tue Jul 08, 2025 2:29 pm
by alexey.tal
Dear Mo,
Thank you for your question.
I don't see INCAR and KPOINTS files in the provided archives. In order to be able to reproduce this issue I need all the files necessary to run this calculation. Could you please make sure that you include all such files.
Processes are often killed by the operating system when the job runs out of memory, so one should make sure that a sufficient amount of memory is available on the node where the calculation is performed.
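A process that the kernel or the scheduler terminates with signal 9 (SIGKILL) is conventionally reported with exit status 128 + 9 = 137. As a small illustration, the bash helper below decodes such a status using the shell's built-in `kill -l`; the function name `decode_exit_status` is hypothetical and not part of VASP, MPI or SGE. SIGKILL alone does not prove the cause, so it is also worth checking the job's peak memory, for example the `maxvmem` field reported by `qacct -j <job_id>` on SGE-based clusters (assuming job accounting is enabled there).

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn a process exit status into a readable description.
# Statuses above 128 conventionally mean "terminated by signal (status - 128)".
decode_exit_status() {
    local status=$1
    if (( status > 128 )); then
        local sig=$(( status - 128 ))
        # 'kill -l N' prints the signal name without the SIG prefix, e.g. KILL
        echo "killed by signal ${sig} ($(kill -l "${sig}"))"
    else
        echo "exited with status ${status}"
    fi
}

decode_exit_status 137   # prints: killed by signal 9 (KILL)
```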
Also, I see the following warning in the provided OUTCAR:
-----------------------------------------------------------------------------
| |
| W W AA RRRRR N N II N N GGGG !!! |
| W W A A R R NN N II NN N G G !!! |
| W W A A R R N N N II N N N G !!! |
| W WW W AAAAAA RRRRR N N N II N N N G GGG ! |
| WW WW A A R R N NN II N NN G G |
| W W A A R R N N II N N GGGG !!! |
| |
| The distance between some ions is very small. Please check the |
| nearest-neighbor list in the OUTCAR file. |
| I HOPE YOU KNOW WHAT YOU ARE DOING! |
| |
-----------------------------------------------------------------------------
This indicates that some of the atoms in the POSCAR file are too close to each other.
Best wishes,
Alexey
Re: Trying to write ML_FF
Posted: Wed Jul 09, 2025 10:44 am
by mo_salha
Hi Alexey, thanks for this.
The run is performed on 96 cores across 2 ddy nodes with MPI, using this script:
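#!/bin/bash
# Run in the submission directory and merge stderr into stdout
#$ -cwd
#$ -j y
# Request 96 MPI slots (2 ddy nodes here) on the InfiniBand-connected ddy nodes
#$ -pe parallel 96
#$ -l infiniband=ddy-i
# Wall-clock limit of 240 hours
#$ -l h_rt=240:0:0
module load vasp
# Launch one MPI rank per allocated slot
mpirun -np ${NSLOTS} vasp_std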
I've resolved the warning from the OUTCAR that you quoted by using the POSCAR I attached in my previous post.
The INCAR was in the .zip I originally attached, but I've linked it and the KPOINTS here for good measure.
I'm still getting the job-killed error.
Please let me know what you think.
Mo
Re: Trying to write ML_FF
Posted: Thu Jul 10, 2025 7:36 am
by alexey.tal
Thank you. Unfortunately, I cannot run this calculation, as you are using ML_MODE=refit, which requires that an ML_AB file is provided. Could you please attach the ML_AB file?
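A quick way to catch this before submitting is a pre-flight check in the job script. The sketch below is hypothetical (`check_refit_inputs` is an illustrative name, not a VASP tool). It assumes the INCAR contains a line such as `ML_MODE = refit`, matches it without regard to letter case (an assumption about how the INCAR may be written), and refuses to proceed when the ML_AB file is missing or empty.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: ML_MODE = refit needs a non-empty ML_AB file.
# Returns 0 if the inputs look consistent, 1 otherwise.
check_refit_inputs() {
    local dir=${1:-.}
    if [[ ! -f "${dir}/INCAR" ]]; then
        echo "no INCAR found in ${dir}" >&2
        return 1
    fi
    # Case-insensitive match for "ML_MODE = refit" (not e.g. a longer mode name)
    if grep -qiE '^[[:space:]]*ML_MODE[[:space:]]*=[[:space:]]*refit([[:space:]#!]|$)' "${dir}/INCAR"; then
        if [[ ! -s "${dir}/ML_AB" ]]; then
            echo "ML_MODE = refit but ${dir}/ML_AB is missing or empty" >&2
            return 1
        fi
    fi
    return 0
}
```

In the job script it could guard the launch line, e.g. `check_refit_inputs . && mpirun -np ${NSLOTS} vasp_std`.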
Re: Trying to write ML_FF
Posted: Fri Jul 11, 2025 10:47 am
by mo_salha
Hi Alexey,
I can't attach the ML_AB here as it's 12 MB.
Is there any chance you could download it from here: https://github.com/Salha777/ML_AB
Mo
Re: Trying to write ML_FF
Posted: Fri Jul 11, 2025 2:50 pm
by alexey.tal
Hi Mo,
After running your calculation, I found that it requires around 1.2 TB of memory. So your job is indeed being killed because it runs out of memory.
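To put that figure in perspective, the arithmetic below spreads 1.2 TB (taken as 1200 GB) evenly over the 96 MPI ranks and 2 nodes used in the job script. This is only a rough average under the assumption of uniform memory use; actual per-rank usage in VASP need not be uniform. The helper name `mem_share` is hypothetical.

```shell
#!/usr/bin/env bash
# Rough per-rank and per-node memory if the total were spread evenly
# (an assumption: real per-rank memory use is not necessarily uniform).
mem_share() {
    local total_gb=$1 ranks=$2 nodes=$3
    awk -v t="$total_gb" -v r="$ranks" -v n="$nodes" \
        'BEGIN { printf "per rank: %.1f GB, per node: %.1f GB\n", t / r, t / n }'
}

mem_share 1200 96 2   # prints: per rank: 12.5 GB, per node: 600.0 GB
```

Under that even split, each node would need roughly 600 GB available for this job, which can be compared with the node specifications published for the cluster.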
The simplest way to avoid this is to use a "smaller" POSCAR: you should not need this large POSCAR for refitting, as it is not actually used for the ML_FFN anyway. So I would suggest that you run the refit with the first structure from your ML_AB and then use the resulting ML_FFN in the calculation with the large POSCAR.
I attach the POSCAR from your first ML_AB configuration.
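If the production run with the large POSCAR is done in VASP itself, the second step could look roughly like the sketch below. It assumes the refit was run in a directory containing INCAR, KPOINTS, POTCAR and the small POSCAR, that the refit wrote ML_FFN there, and that the large structure is in POSCAR-LARGE with the same species order as the POTCAR. The refitted ML_FFN is copied to ML_FF and ML_MODE is switched from refit to run; the function name `prepare_production_run` and the directory layout are hypothetical, and other INCAR settings may need adjusting for the larger cell. For the LAMMPS workflow mentioned at the start of the thread, the ML_FFN from the refit is the file that is needed.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: turn a finished refit directory into a production run directory.
#   $1 = refit directory (INCAR, KPOINTS, POTCAR and the ML_FFN written by the refit)
#   $2 = new production directory
#   $3 = large structure file, e.g. POSCAR-LARGE
prepare_production_run() {
    local refit_dir=$1 prod_dir=$2 large_poscar=$3
    mkdir -p "${prod_dir}" || return 1
    # The refitted force field (ML_FFN) is read as ML_FF in the next run
    cp "${refit_dir}/ML_FFN" "${prod_dir}/ML_FF" || return 1
    # Production uses the large cell instead of the small refit structure
    cp "${large_poscar}" "${prod_dir}/POSCAR" || return 1
    cp "${refit_dir}/POTCAR" "${refit_dir}/KPOINTS" "${prod_dir}/" || return 1
    # Copy the INCAR, switching ML_MODE = refit to ML_MODE = run (value in any letter case)
    sed -E 's/^([[:space:]]*ML_MODE[[:space:]]*=[[:space:]]*)[Rr][Ee][Ff][Ii][Tt]([[:space:]#!]|$)/\1run\2/' \
        "${refit_dir}/INCAR" > "${prod_dir}/INCAR"
}
```

For example, `prepare_production_run refit production POSCAR-LARGE`, then submit the job script from the `production` directory.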