[molpro-user] Molpro 2010.1 PL20 compilation problem with openmpi 1.4.1 and ga-5-0-2

Panwang Zhou pwzhou at dicp.ac.cn
Thu May 5 03:03:00 BST 2011


Dear Andy,

Thank you for your advice. I have recompiled Molpro with Intel MPI and with MVAPICH2; both builds work fine, whether on a single node or across multiple nodes.

I think the problem is specific to Open MPI and may come from the MPI_Finalize call, since Molpro still gives the correct result.
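
To see which ranks and hosts report the fault, the ARMCI messages in the job's stdout can be tallied. The following is only a sketch: the function name summarise_armci_faults and the file name molpro.stdout are made up for illustration, and it assumes the messages have the "(rank:N hostname:H pid:P)" form seen in the log quoted below.

```shell
# Tally ARMCI fault reports by rank and host.
# Reads the job's stdout on standard input and prints one line per
# rank/host pair in the form "COUNT rank:N hostname:H".
summarise_armci_faults() {
    grep -o 'rank:[0-9]* hostname:[^ ]*' \
        | sort \
        | uniq -c \
        | awk '{print $1, $2, $3}'
}

# Example call (molpro.stdout is a placeholder file name):
# summarise_armci_faults < molpro.stdout
```

If only a few ranks appear, each repeated several times with different pids, that is at least a hint about where to look, although it does not by itself confirm that MPI_Finalize is the cause.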

==============================================
Panwang Zhou   
State Key Laboratory of Molecular Reaction Dynamics
Dalian Institute of Chemical Physics
Chinese Academy of Sciences.
Tel: 0411-84379195 Fax: 0411-84675584
===============================================


-----Original Message-----
From: mayaj1 at Cardiff.ac.uk [mailto:mayaj1 at Cardiff.ac.uk]
Sent: 4 May 2011 20:38
To: Panwang Zhou
Cc: molpro-user at molpro.net
Subject: Re: [molpro-user] Molpro 2010.1 PL20 compilation problem with openmpi 1.4.1 and ga-5-0-2

Panwang,

Our cluster system is currently down for maintenance, so I'm unable to try to reproduce the problem directly.

I believe there are known problems with openmpi and multiple nodes, although I can't recall specifically what they are. Perhaps you could instead try building with mvapich2 to see if this solves the problem.

Best wishes,

Andy

On 03/05/11 06:08, Panwang Zhou wrote:
> Dear all:
> 
> I encounter the following problem while compiling molpro 2010.1 PL20 
> with openmpi and ga-5-0-2 in our Linux Cluster.
> 
> OS:SLES 10 SP3 x86_64
> 
> `uname -a`: Linux cn001 2.6.16.60-0.54.5-smp #1 SMP Fri Sep 4 01:28:03 
> UTC 2009 x86_64 x86_64 x86_64 GNU/Linux
> 
> Openmpi was compiled with icc and ifort, and I have compiled some 
> other programs with it, such as nwchem and CPMD, all of which work fine.
> 
> First I source the intel compiler and openmpi env using the following
> command:
> 
> source /hptc_cluster3/application/env/intel10_openmpi1.4.rc
> 
> The configure parameters for ga:
> 
> mkdir bld && cd bld
> 
> ../configure --prefix=`pwd` --with-scalapack=no --enable-f77 F77=ifort 
> CC=icc CXX=icpc 
> --with-mpi="/hptc_cluster3/application/mpi/openmpi/1.4.1/icc_ifort/lib
> -I/hptc_cluster3/application/mpi/openmpi/1.4.1/icc_ifort/include" 
> --with-openib 2>&1 | tee configure.log
> 
> make -j 8 2>&1 | tee make.log
> 
> make install
> 
> The configure parameters for molpro 2010.1:
> 
> ./configure -icc -ifort -mpp -mppbase 
> /hptc_cluster3/software/chem/molpro_2010.1/build_openmpi/ga-5-0-2/bld
> -openmpi -nohdf5 -var LIBS="-libverbs"
> 
> The compilation completed without problems. When I run molpro on one 
> node with 8 CPU cores, the jobs finish successfully. However, when I 
> run molpro on two nodes with 16 CPU cores, the jobs still finish and 
> the results are correct, but the following messages are printed to stdout 
> (I submit jobs with LSF, and these are written to stdout, not stderr):
> 
> ARMCI configured for 2 cluster nodes. Network protocol is 'OpenIB 
> Verbs API'.
> 
> 0:Segmentation Violation error, status=: 11
> 
> (rank:0 hostname:cn040 pid:8271):ARMCI DASSERT fail. 
> ../../armci/src/signaltrap.c:SigSegvHandler():312 cond:0
> 
> 13:Segmentation Violation error, status=: 11
> 
> (rank:13 hostname:cn043 pid:21877):ARMCI DASSERT fail. 
> ../../armci/src/signaltrap.c:SigSegvHandler():312 cond:0
> 
> 0:Segmentation Violation error, status=: 11
> 
> (rank:0 hostname:cn040 pid:8287):ARMCI DASSERT fail. 
> ../../armci/src/signaltrap.c:SigSegvHandler():312 cond:0
> 
> 0:Segmentation Violation error, status=: 11
> 
> (rank:0 hostname:cn040 pid:8286):ARMCI DASSERT fail. 
> ../../armci/src/signaltrap.c:SigSegvHandler():312 cond:0
> 
> 0:Segmentation Violation error, status=: 11
> 
> (rank:0 hostname:cn040 pid:8288):ARMCI DASSERT fail. 
> ../../armci/src/signaltrap.c:SigSegvHandler():312 cond:0
> 
> 13:Segmentation Violation error, status=: 11
> 
> (rank:13 hostname:cn043 pid:21898):ARMCI DASSERT fail. 
> ../../armci/src/signaltrap.c:SigSegvHandler():312 cond:0
> 
> 13:Segmentation Violation error, status=: 11
> 
> (rank:13 hostname:cn043 pid:21897):ARMCI DASSERT fail. 
> ../../armci/src/signaltrap.c:SigSegvHandler():312 cond:0
> 
> 13:Segmentation Violation error, status=: 11
> 
> (rank:13 hostname:cn043 pid:21899):ARMCI DASSERT fail. 
> ../../armci/src/signaltrap.c:SigSegvHandler():312 cond:0
> 
> Does anybody know how to resolve this problem? Thanks.
> 
> ==============================================
> Panwang Zhou
> State Key Laboratory of Molecular Reaction Dynamics Dalian Institute 
> of Chemical Physics Chinese Academy of Sciences.
> Tel: 0411-84379195 Fax: 0411-84675584
> ===============================================
> 
> 
> 
> _______________________________________________
> Molpro-user mailing list
> Molpro-user at molpro.net
> http://www.molpro.net/mailman/listinfo/molpro-user



