Calculation crashing with NCORE > 1 (Intel MKL ERROR: Parameter 6 was incorrect on entry to DSTEIN2)




Locked
philipp_schulz
Newbie
Posts: 2
Joined: Thu Oct 05, 2023 10:47 am

Calculation crashing with NCORE > 1 (Intel MKL ERROR: Parameter 6 was incorrect on entry to DSTEIN2)

#1 Post by philipp_schulz » Sat Oct 28, 2023 6:21 am

Hi all,

I'm trying to run a geometry optimisation of a 2 x 2 x 6 slab of Ni/Ni3Al, but the calculation crashes within the first iteration and the output file contains the error message "Intel MKL ERROR: Parameter 6 was incorrect on entry to DSTEIN2". I tried running the simulation on different numbers of cores (80 to 240) and with NCORE from 1 to 12, always with the same result. To launch the parallel processes I use the GERun command (https://github.com/UCL/GERun). In addition to the error message in the output file, GERun reports the following:

Code:

GERun: GErun command being run:
GERun:  mpirun --rsh=ssh -machinefile /tmpdir/job/423545.undefined/machines.unique -np 120 -rr /home/ucempvs/vasp.6.4.2/bin/vasp_std
Fatal error in MPI_Recv: Message truncated, error stack:
MPI_Recv(224).....................: MPI_Recv(buf=0xba92120, count=0, MPI_BYTE, src=118, tag=9, comm=0xc4000007, status=0x7ffefcc6cb50) failed
MPIDI_CH3U_Receive_data_found(131): Message from rank 118 and tag 9 truncated; 512 bytes received but buffer size is 0
Fatal error in MPI_Recv: Message truncated, error stack:
MPI_Recv(224)................: MPI_Recv(buf=0xc9b4f80, count=0, MPI_BYTE, src=111, tag=9, comm=0xc400000c, status=0x7fffb24e0e50) failed
MPID_nem_tmi_handle_rreq(687): Message from rank 111 and tag 9 truncated; 0 bytes received but buffer size is 0 (384 0 61)
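To see which ranks were involved, the "Message truncated" lines can be pulled out of a longer job log with a short script. This is a minimal sketch: `truncated_messages` is a hypothetical helper, and the regular expression only assumes the message wording shown in the log above.

```python
import re

# Matches the "Message truncated" lines printed by the MPI library in the log above.
TRUNCATED = re.compile(
    r"Message from rank (\d+) and tag (\d+) truncated; "
    r"(\d+) bytes received but buffer size is (\d+)"
)


def truncated_messages(log: str) -> list:
    """Return (source_rank, tag, bytes_received, buffer_size) for each truncation."""
    return [tuple(int(x) for x in groups) for groups in TRUNCATED.findall(log)]


if __name__ == "__main__":
    log = """\
MPIDI_CH3U_Receive_data_found(131): Message from rank 118 and tag 9 truncated; 512 bytes received but buffer size is 0
MPID_nem_tmi_handle_rreq(687): Message from rank 111 and tag 9 truncated; 0 bytes received but buffer size is 0 (384 0 61)
"""
    for src, tag, got, size in truncated_messages(log):
        print(f"rank {src}: tag {tag}, {got} bytes received, buffer {size} bytes")
```

Run against a full job log, this lists every sending rank that triggered a truncation, which helps check whether the failures cluster on particular ranks.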

I've only seen this error with this specific supercell; cells with fewer atoms run fine with NCORE > 1. Does anyone know what could cause this?
Attached you can find a zip file containing the input & output files, as well as the job script.
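For reference, the core counts and NCORE values tested above can be checked for uneven splits with a small helper. This is a sketch only: `ncore_layout` is a hypothetical name, it assumes KPAR = 1 so that NPAR = ranks / NCORE, and it assumes 40 cores per node purely for illustration. It flags uneven divisions; it does not predict whether VASP will crash.

```python
"""Sketch: how MPI ranks split into NCORE groups (assumes KPAR = 1)."""


def ncore_layout(ranks: int, ncore: int, cores_per_node: int) -> dict:
    """Summarise the band-group layout for `ranks` processes and a given NCORE.

    NPAR is taken as ranks // NCORE, which assumes KPAR = 1.
    `cores_per_node` is an assumed cluster property, not read from the job.
    """
    return {
        "npar": ranks // ncore,
        # True when every NCORE group has exactly NCORE ranks.
        "divides_ranks": ranks % ncore == 0,
        # True when NCORE groups can tile a node without straddling it
        # (only meaningful if consecutive ranks share a node).
        "divides_node": cores_per_node % ncore == 0,
    }


if __name__ == "__main__":
    CORES_PER_NODE = 40  # assumed; replace with the real core count per node
    for ranks in (80, 120, 240):  # illustrative values from the 80 to 240 range
        for ncore in range(1, 13):
            info = ncore_layout(ranks, ncore, CORES_PER_NODE)
            ok = info["divides_ranks"] and info["divides_node"]
            note = "" if ok else "  <- uneven split"
            print(f"ranks={ranks:3d} NCORE={ncore:2d} NPAR={info['npar']:3d}{note}")
```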

Thank you for your help!

Best wishes,
Philipp

philipp_schulz
Newbie
Posts: 2
Joined: Thu Oct 05, 2023 10:47 am

Re: Calculation crashing with NCORE > 1 (Intel MKL ERROR: Parameter 6 was incorrect on entry to DSTEIN2)

#2 Post by philipp_schulz » Sat Oct 28, 2023 2:22 pm

Apologies, attached to this reply are the input files.
Philipp

marie-therese.huebsch
Full Member
Posts: 212
Joined: Tue Jan 19, 2021 12:01 am

Re: Calculation crashing with NCORE > 1 (Intel MKL ERROR: Parameter 6 was incorrect on entry to DSTEIN2)

#3 Post by marie-therese.huebsch » Wed Nov 01, 2023 10:58 am

Hi,

Thank you for your inquiry and uploading the files!

Let's get to the bottom of this, although so far I cannot reproduce the issue locally. Could you try running the job directly with mpirun instead of GERun? Does that change anything? And could you try running on a single node? Does that change anything?
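Running on a single node also removes any dependence on how ranks are distributed across nodes. The logged command passes `-rr`, which, assuming it is Intel MPI's round-robin placement option, puts consecutive ranks on different hosts. The sketch below compares which nodes the NCORE group containing ranks 111 and 118 (the ranks named in the error) would use under round-robin, fill-node-first, and single-node placement. It assumes 3 nodes with 40 cores each, NCORE = 4, and that each NCORE group is a block of consecutive ranks; all three are illustrative assumptions, and the helper names are hypothetical.

```python
def node_of(rank: int, n_nodes: int, cores_per_node: int, round_robin: bool) -> int:
    """Node index that hosts `rank` under round-robin or fill-node-first placement."""
    return rank % n_nodes if round_robin else rank // cores_per_node


def nodes_for_group(rank: int, ncore: int, n_nodes: int,
                    cores_per_node: int, round_robin: bool) -> set:
    """Nodes used by the NCORE group containing `rank`.

    Assumes (hypothetically) that each group is a block of `ncore`
    consecutive ranks.
    """
    first = (rank // ncore) * ncore
    return {node_of(r, n_nodes, cores_per_node, round_robin)
            for r in range(first, first + ncore)}


if __name__ == "__main__":
    # Assumed layout: 120 ranks on 3 nodes with 40 cores each, NCORE = 4.
    for rank in (111, 118):  # the ranks named in the MPI error messages
        rr = nodes_for_group(rank, 4, 3, 40, round_robin=True)
        block = nodes_for_group(rank, 4, 3, 40, round_robin=False)
        single = nodes_for_group(rank, 4, 1, 120, round_robin=False)
        print(f"rank {rank}: -rr {sorted(rr)}, block {sorted(block)}, one node {sorted(single)}")
```

Under these assumptions, round-robin placement spreads each NCORE group over all three nodes, while fill-node-first or single-node placement keeps it on one node.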

Sorry, I don't have an immediate fix.

Marie-Therese
