MetaGGA tests fail on AMD 7742 based system with GCC

#1 Post by **aturner-epcc** » Fri Mar 25, 2022 12:09 pm

Hi,

I work for the support team on the UK National Supercomputing Service, ARCHER2 (https://www.archer2.ac.uk), and we are seeing failures in some of the tests in the test suite with VASP 6.3.0. The "bulk_BN_SCAN" family of tests produces results that differ from the reference results, and I wondered if anyone had any insight on:

1) Has this been seen before?
2) How significant are these differences?

I have put an example output from the test suite for the "bulk_BN_SCAN" test at the bottom of this message. I have tried the following combinations of compilers, compiler optimisation flags and numerical libraries and they all show the same error (with the same numerical values):

* Compilers: GCC 10.2.0, GCC 9.3.0
* Optimisation: -Ofast, -O2, -O1 (-O0 was also tested but produced executables that did not work)
* Numerical libraries: HPE Cray LibSci 20.08, Intel MKL 21.2-2883, AMD AOCL 3.1

All builds used HPE Cray MPICH 8.1.4 as the MPI library.

Example output from test suite:

Code:

CASE: bulk_BN_SCAN
------------------------------------------------------------------
CASE: bulk_BN_SCAN
entering run_recipe bulk_BN_SCAN
bulk_BN_SCAN step STD
------------------------------------------------------------------
bulk_BN_SCAN step STD
entering run_vasp
 ----------------------------------------------------
    OOO  PPPP  EEEEE N   N M   M PPPP
   O   O P   P E     NN  N MM MM P   P
   O   O PPPP  EEEEE N N N M M M PPPP   -- VERSION
   O   O P     E     N  NN M   M P
    OOO  P     EEEEE N   N M   M P
 ----------------------------------------------------
 running    4 mpi-ranks, with    1 threads/rank
 distrk:  each k-point on    2 cores,    2 groups
 distr:  one band on    1 cores,    2 groups
 vasp.6.3.0 20Jan22 (build Mar 25 2022 10:53:59) complex                         
 POSCAR found :  2 types and       2 ions
 scaLAPACK will be used
 LDA part: xc-table for Pade appr. of Perdew
 POSCAR, INCAR and KPOINTS ok, starting setup
 FFT: planning ... GRIDC
 FFT: planning ... GRID_SOFT
 FFT: planning ... GRID
 WAVECAR not read
 entering main loop
       N       E                     dE             d eps       ncg     rms          rms(c)
DAV:   1     0.129689760597E-01    0.12969E-01   -0.40961E+03   808   0.803E+02
DAV:   2    -0.183142569444E+02   -0.18327E+02   -0.17426E+02  1080   0.119E+02
DAV:   3    -0.183986504527E+02   -0.84394E-01   -0.84394E-01   864   0.100E+01
DAV:   4    -0.183987020585E+02   -0.51606E-04   -0.51606E-04   976   0.226E-01
DAV:   5    -0.183987020752E+02   -0.16781E-07   -0.16781E-07   712   0.314E-03    0.672E+00
DAV:   6    -0.175200105897E+02    0.87869E+00   -0.34250E+00   888   0.130E+01    0.336E+00
DAV:   7    -0.174460993060E+02    0.73911E-01   -0.93666E-02   992   0.295E+00    0.196E+00
DAV:   8    -0.174314335601E+02    0.14666E-01   -0.42219E-02   816   0.187E+00    0.192E-01
DAV:   9    -0.174316296968E+02   -0.19614E-03   -0.58511E-04   832   0.248E-01    0.768E-02
DAV:  10    -0.174317068909E+02   -0.77194E-04   -0.13550E-04   904   0.115E-01
   1 F= -.17431707E+02 E0= -.17431707E+02  d E =-.165088E-10
 writing wavefunctions
exiting run_vasp

bulk_BN_SCAN step MGGA
------------------------------------------------------------------
bulk_BN_SCAN step MGGA
entering run_vasp
 ----------------------------------------------------
    OOO  PPPP  EEEEE N   N M   M PPPP
   O   O P   P E     NN  N MM MM P   P
   O   O PPPP  EEEEE N N N M M M PPPP   -- VERSION
   O   O P     E     N  NN M   M P
    OOO  P     EEEEE N   N M   M P
 ----------------------------------------------------
 running    4 mpi-ranks, with    1 threads/rank
 distrk:  each k-point on    2 cores,    2 groups
 distr:  one band on    1 cores,    2 groups
 vasp.6.3.0 20Jan22 (build Mar 25 2022 10:53:59) complex                         
 POSCAR found :  2 types and       2 ions
 scaLAPACK will be used
 LDA part: xc-table for Pade appr. of Perdew
 found WAVECAR, reading the header
 POSCAR, INCAR and KPOINTS ok, starting setup
 FFT: planning ... GRIDC
 FFT: planning ... GRID_SOFT
 FFT: planning ... GRID
 reading WAVECAR
 the WAVECAR file was read successfully
 initial charge from wavefunction
 entering main loop
       N       E                     dE             d eps       ncg     rms          rms(c)
DAV:   1    -0.195038308774E+02   -0.19504E+02   -0.30364E-01   736   0.582E+00    0.474E-01
DAV:   2    -0.195020211619E+02    0.18097E-02   -0.10520E-02  1000   0.126E+00    0.312E-01
DAV:   3    -0.195016428920E+02    0.37827E-03   -0.40405E-02   920   0.132E+00    0.251E-01
DAV:   4    -0.195002942130E+02    0.13487E-02   -0.35628E-03   856   0.546E-01    0.110E-01
DAV:   5    -0.195033733722E+02   -0.30792E-02   -0.28005E-02   840   0.118E+00    0.350E-01
DAV:   6    -0.194999457118E+02    0.34277E-02   -0.69212E-03   936   0.668E-01    0.942E-02
DAV:   7    -0.195002994457E+02   -0.35373E-03   -0.40555E-03   904   0.431E-01    0.423E-02
DAV:   8    -0.195003676375E+02   -0.68192E-04   -0.13364E-04   752   0.121E-01    0.335E-02
DAV:   9    -0.195003590214E+02    0.86161E-05   -0.48319E-06  1008   0.215E-02    0.324E-02
DAV:  10    -0.195003334372E+02    0.25584E-04   -0.68429E-05   904   0.559E-02    0.894E-03
DAV:  11    -0.195003399149E+02   -0.64777E-05   -0.71866E-05   888   0.569E-02    0.957E-03
DAV:  12    -0.195003448973E+02   -0.49824E-05   -0.31508E-06   736   0.156E-02    0.931E-03
DAV:  13    -0.195003447099E+02    0.18741E-06   -0.17556E-05   936   0.285E-02    0.330E-03
DAV:  14    -0.195003440493E+02    0.66065E-06   -0.20175E-06   792   0.108E-02    0.465E-03
DAV:  15    -0.195003424571E+02    0.15922E-05   -0.16333E-06   896   0.913E-03    0.738E-04
DAV:  16    -0.195003429589E+02   -0.50178E-06   -0.24613E-07   944   0.360E-03    0.501E-04
DAV:  17    -0.195003431762E+02   -0.21731E-06   -0.11303E-08   736   0.841E-04    0.502E-04
DAV:  18    -0.195003432292E+02   -0.53070E-07   -0.36191E-10   920   0.233E-04    0.491E-04
DAV:  19    -0.195003432326E+02   -0.33607E-08   -0.34214E-10   464   0.128E-04    0.488E-04
DAV:  20    -0.195003432278E+02    0.48011E-08   -0.20135E-10   392   0.101E-04    0.489E-04
DAV:  21    -0.195003432160E+02    0.11822E-07   -0.20213E-10   368   0.103E-04    0.488E-04
DAV:  22    -0.195003432185E+02   -0.25257E-08   -0.10464E-10   368   0.751E-05    0.490E-04
DAV:  23    -0.195003432208E+02   -0.23001E-08   -0.95145E-11   384   0.668E-05    0.492E-04
DAV:  24    -0.195003432255E+02   -0.46427E-08   -0.23502E-12   368   0.242E-05    0.491E-04
DAV:  25    -0.195003432258E+02   -0.30514E-09   -0.15432E-12   368   0.801E-06    0.491E-04
DAV:  26    -0.195003432255E+02    0.27990E-09   -0.64277E-13   368   0.453E-06    0.491E-04
DAV:  27    -0.195003432253E+02    0.15871E-09   -0.25817E-13   368   0.236E-06    0.491E-04
DAV:  28    -0.195003432254E+02   -0.84583E-10   -0.15576E-12   368   0.845E-06
   1 F= -.19500343E+02 E0= -.19500343E+02  d E =-.628853E-12
 writing wavefunctions
exiting run_vasp

exiting run_recipe bulk_BN_SCAN
 STOP  
the frequencies are:
the frequencies are correct, run successful
-------------------------------------------
ERROR: the test yields different results for the energies, please check
-----------------------------------------------------------------------
        -19.50034323	        -19.50310506
        -19.50034323	        -19.50310506
 ---------------------------------------------------------------------------
 Comparing files: energy_outcar and energy_outcar.ref
                   2  number(s) differ.
       Max diff.:  2.7618300000007423E-3
  (at row number:  1  column number:  1 )
       Tolerance:  5.0000000000000001E-4
 ---------------------------------------------------------------------------
 
the forces are:
  -0.00000   -0.00000    0.84438
   0.00000    0.00000   -0.84438
the forces are correct, run successful
ERROR: the stress tensor is different, please check
---------------------------------------------------
    -59.73     -59.73     -61.21       3.65       0.00       0.00
 ---------------------------------------------------------------------------
 Comparing files: stress and stress.ref
                   4  number(s) differ.
       Max diff.:  11.830000000000005
  (at row number:  1  column number:  1 )
       Tolerance:  0.10000000000000001
 ---------------------------------------------------------------------------
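
The energy comparison reported above can be reproduced by hand. The following is a minimal Python sketch, not the test suite's own comparison tool (which is not shown in this thread). It assumes the check is a maximum absolute difference against a tolerance, and that the first column of each printed pair is the computed value and the second the reference. The function name `max_abs_diff` is hypothetical; the numbers are copied from the output above.

```python
# Values copied from the test-suite output above.
# Assumption: first column = computed energy, second column = reference energy.
computed = [-19.50034323, -19.50034323]
reference = [-19.50310506, -19.50310506]
tolerance = 5.0e-4  # energy tolerance printed by the test suite


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two equal-length lists."""
    if len(a) != len(b):
        raise ValueError("lists must have the same length")
    return max(abs(x - y) for x, y in zip(a, b))


diff = max_abs_diff(computed, reference)
passed = diff <= tolerance  # assumed pass criterion: difference within tolerance
print(f"max diff = {diff:.6e}, tolerance = {tolerance:.1e}, pass = {passed}")
```

Under these assumptions the energy difference is about 2.76E-3, roughly 5.5 times the 5E-4 tolerance, and the stress difference of 11.83 exceeds its tolerance of 0.1 by a much larger factor.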

#2 Post by **marie-therese.huebsch** » Fri Mar 25, 2022 3:25 pm

Hi,

Thank you for reporting your observation. I would like to investigate this issue further. Could you please upload the makefile.include file that you used to compile the version, which you used to obtain the example output from the test suite?

Cheers,
Marie-Therese

#3 Post by **aturner-epcc** » Fri Mar 25, 2022 7:51 pm

Hi Marie-Therese, thanks for replying so quickly.

Apologies, I should have added the makefile.include as part of the original post.

The contents of my "makefile.include" are below

Code:

# Precompiler options
CPP_OPTIONS= -DHOST=\"HPECrayEX_GCC\" \
             -DMPI -DMPI_BLOCK=32000 -Duse_collective \
             -DscaLAPACK \
             -DCACHE_SIZE=2000 \
             -Davoidalloc \
             -DMPI_INPLACE \
             -DnoAugXCmeta \
             -Dvasp6 \
             -Duse_bse_te \
             -Dtbdyn \
             -Dfock_dblbuf \
             -D_OPENMP

CPP        = gcc -E -P -C -w $*$(FUFFIX) >$*$(SUFFIX) $(CPP_OPTIONS)

FC         = ftn -fopenmp
FCL        = ftn -fopenmp

FREE       = -ffree-form -ffree-line-length-none

MKL_PATH   =

FFLAGS     = -fallow-argument-mismatch -w -ffpe-summary=invalid,zero,overflow
OFLAG      = -Ofast
OFLAG_IN   = $(OFLAG)
DEBUG      = -O0

BLAS       = 
LAPACK     = 
BLACS      = 
SCALAPACK  =

LLIBS      = 

INCS       = 

OBJECTS    = fftmpiw.o fftmpi_map.o  fftw3d.o  fft3dlib.o

OBJECTS_O1 += fftw3d.o fftmpi.o fftmpiw.o
OBJECTS_O2 += fft3dlib.o

# For what used to be vasp.5.lib
CPP_LIB    = $(CPP)
FC_LIB     = $(FC)
CC_LIB     = cc
CFLAGS_LIB = -O
FFLAGS_LIB = -O1
FREE_LIB   = $(FREE)

OBJECTS_LIB= linpack_double.o getshmem.o

# For the parser library
CXX_PARS   = CC
LLIBS      += -lstdc++

### For the fft library
##CXX_FFTLIB = g++ -fopenmp -std=c++11 -DFFTLIB_THREADSAFE 
##INCS_FFTLIB= -I./include -I$(FFTW)/include
##LIBS       += fftlib
##LLIBS      += -ldl

# Normally no need to change this
SRCDIR     = ../../src
BINDIR     = ../../bin

The "ftn" wrapper scripts within the HPE Cray programming environment automatically include all the paths and options required to link in HPE Cray LibSci, HPE Cray MPICH and FFTW 3.3.8.9. As I mentioned, we see the same behaviour if we link to Intel MKL 21.2-2883 or AOCL 3.1, so the choice of numerical library seems to make no difference.

ARCHER2 is an HPE Cray EX system with the Slingshot interconnect (though these tests only used 4 cores on a single node). There are 2x AMD 7742 64-core processors per node. Here are more details on the processor:

Code:

Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
Address sizes:       43 bits physical, 48 bits virtual
CPU(s):              256
On-line CPU(s) list: 0-255
Thread(s) per core:  2
Core(s) per socket:  64
Socket(s):           2
NUMA node(s):        8
Vendor ID:           AuthenticAMD
CPU family:          23
Model:               49
Model name:          AMD EPYC 7742 64-Core Processor
Stepping:            0
CPU MHz:             2250.000
CPU max MHz:         2250.0000
CPU min MHz:         1500.0000
BogoMIPS:            4500.23
Virtualization:      AMD-V
L1d cache:           32K
L1i cache:           32K
L2 cache:            512K
L3 cache:            16384K
NUMA node0 CPU(s):   0-15,128-143
NUMA node1 CPU(s):   16-31,144-159
NUMA node2 CPU(s):   32-47,160-175
NUMA node3 CPU(s):   48-63,176-191
NUMA node4 CPU(s):   64-79,192-207
NUMA node5 CPU(s):   80-95,208-223
NUMA node6 CPU(s):   96-111,224-239
NUMA node7 CPU(s):   112-127,240-255
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_l2 mwaitx cpb cat_l3 cdp_l3 hw_pstate sme ssbd sev ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif umip rdpid overflow_recov succor smca

#4 Post by **andreas.singraber** » Mon Mar 28, 2022 3:50 pm

Hello!

Thank you for your detailed description of the problem and for testing multiple compiler/library combinations. I was able to reproduce the failed tests starting with bulk_BN_SCAN and found that the origin was the deprecated -DnoAugXCmeta flag in makefile.include. Please have a look at the corresponding Wiki section.

In principle, the binaries you obtained when compiling WITH -DnoAugXCmeta are not broken! However, their use should be restricted to special cases, e.g. when you want to work with the functionals mentioned in the Wiki entry above, or when you need to reproduce older results where numerical backward compatibility is required. The flag was removed from the makefile.include examples in the arch directory in March 2017.

In all other cases, in particular if you want to create a general-purpose build of VASP for users, the recommendation is to avoid the -DnoAugXCmeta flag.
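
For anyone maintaining an older makefile.include, the flag can be removed by hand or with a small script. The following is a minimal Python sketch under stated assumptions: the function name `strip_flag` is hypothetical, the sample text is a shortened excerpt modelled on the CPP_OPTIONS block posted above, and the flag is assumed to appear as a whitespace-separated token. VASP still has to be rebuilt after the file is edited.

```python
import re

FLAG = "-DnoAugXCmeta"  # deprecated precompiler flag discussed in this thread


def strip_flag(text, flag=FLAG):
    """Return makefile text with `flag` removed, keeping backslash continuations valid."""
    # Match the flag only as a whole whitespace-delimited token.
    pattern = re.compile(r"(?<!\S)" + re.escape(flag) + r"(?!\S)")
    out = []
    for line in text.splitlines():
        if not pattern.search(line):
            out.append(line)
            continue
        stripped = pattern.sub("", line)
        rest = stripped.strip()
        if rest == "\\":
            # The flag was alone on a continuation line: drop the whole line.
            continue
        if rest == "":
            # The flag was alone on the last line of the assignment:
            # drop it and end the continuation on the previous line.
            if out and out[-1].rstrip().endswith("\\"):
                out[-1] = out[-1].rstrip()[:-1].rstrip()
            continue
        out.append(stripped)
    return "\n".join(out) + ("\n" if text.endswith("\n") else "")


# Shortened, illustrative excerpt of a CPP_OPTIONS block (not the full file).
sample = r"""CPP_OPTIONS= -DMPI -DMPI_BLOCK=32000 \
             -DMPI_INPLACE \
             -DnoAugXCmeta \
             -Dvasp6
"""

cleaned = strip_flag(sample)
print(cleaned)
```

The sketch works on a string rather than editing a file in place, so the result can be inspected before anything is written back to makefile.include.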

I hope that fixes the tests for you too. I would appreciate it if you could report back your findings.

All the best,

Andreas Singraber

#5 Post by **aturner-epcc** » Wed Mar 30, 2022 6:43 pm

Hi Andreas,

Thanks for looking at this and getting back with a solution and explanation so quickly.

I have tested the compile without the -DnoAugXCmeta flag on ARCHER2 and it does, indeed, lead to the "SCAN" tests giving outputs that match the reference values in the test suite. I will update the builds on ARCHER2.

Best regards
Andy
