[molpro-user] MPI Error

Quandt, Robert rwquand at ilstu.edu
Thu Aug 1 18:00:26 BST 2013


Andy,
Thank you for the prompt response. Opening the firewall had no effect, but it did get me thinking about mpich. We have openSUSE 12.3 on our machines, and it turns out that mpich is not installed by default. Once I installed it, my test run worked (you may want to mention that in the install notes). I have to wait for some calculations to finish before doing a full test, but it looks encouraging.
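
For anyone who hits the same thing, a rough sketch of the check/install on openSUSE (the package name "mpich" here is an assumption; depending on the release and repository it may be called something else, e.g. mpich2):

# see which MPICH packages your repositories actually provide
zypper search mpich
# install the runtime (name assumed; adjust to whatever the search shows)
sudo zypper install mpich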
Thanks again,
Bob

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Dr. Bob Quandt                             Office: 324 SLB 
Department of Chemistry                    Phone: (309) 438-8576 
Illinois State University                  Fax: (309) 438-5538
Normal, IL 61761-4160                      email: quandt at ilstu.edu
 
Nihil tam absurde dici potest, quod non dicatur ab aliquo philosophorum. 
- Marcus Tullius Cicero (106-43 BC)
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= 


-----Original Message-----
From: mayaj1 at Cardiff.ac.uk [mailto:mayaj1 at Cardiff.ac.uk] 
Sent: Thursday, August 01, 2013 9:24 AM
To: Quandt, Robert
Cc: molpro-user at molpro.net
Subject: Re: [molpro-user] MPI Error

Bob,

The 2012.1.3 binaries are built with MPICH instead of GA's TCGMSG (which is no longer supported). I managed to reproduce your problem, and it went away when I disabled the firewall on all machines involved. 
You can probably do something less extreme by setting, for example:

export MPICH_PORT_RANGE="2000:3000"

so that you don't need to open up all ports.
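
For example (a sketch only, not something I have tested here): with the port range pinned as above, the same range could be opened on each node with iptables rather than disabling the firewall entirely; on openSUSE the SuSEfirewall2 configuration should offer an equivalent setting.

export MPICH_PORT_RANGE="2000:3000"
# allow incoming TCP connections on just that range (run as root on every node)
iptables -A INPUT -p tcp --dport 2000:3000 -j ACCEPT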

Best wishes,

Andy

On 31/07/13 18:23, Quandt, Robert wrote:
> Molpro users,
>
> I recently upgraded from the 2012.1.0 to the 2012.1.3 binaries and now 
> get an error message, see below, when I try to run a multi-node job 
> (it works fine on one node). The job below was running fine before I 
> killed it to do the upgrade, so it isn't an input problem. Has anyone 
> run into this problem with 2012.1.3? Any ideas on how to fix it? Any
> help would be greatly appreciated.
>
> Thanks in advance,
>
> Bob
>
> gimli /home/qg/calcs>molpro -n 12 -N qg:gimli:4,qg:legolas:4,qg:aragorn:4 dzpro.inp &
> [2] 31782
> gimli /home/qg/calcs>Fatal error in PMPI_Comm_dup: A process has failed, error stack:
> PMPI_Comm_dup(175)...................: MPI_Comm_dup(MPI_COMM_WORLD, new_comm=0x1928e4c0) failed
> PMPI_Comm_dup(160)...................:
> MPIR_Comm_dup_impl(55)...............:
> MPIR_Comm_copy(1552).................:
> MPIR_Get_contextid(799)..............:
> MPIR_Get_contextid_sparse_group(1064):
> MPIR_Allreduce_impl(719).............:
> MPIR_Allreduce_intra(201)............:
> allreduce_intra_or_coll_fn(110)......:
> MPIR_Allreduce_intra(539)............:
> MPIDI_CH3U_Recvq_FDU_or_AEP(667).....: Communication error with rank 4
> MPIR_Allreduce_intra(212)............:
> MPIR_Bcast_impl(1369)................:
> MPIR_Bcast_intra(1199)...............:
> MPIR_Bcast_binomial(220).............: Failure during collective
> Fatal error in PMPI_Comm_dup: A process has failed, error stack:
> PMPI_Comm_dup(175)...................: MPI_Comm_dup(MPI_COMM_WORLD, new_comm=0x1928e4c0) failed
> PMPI_Comm_dup(160)...................:
> MPIR_Comm_dup_impl(55)...............:
> MPIR_Comm_copy(1552).................:
> MPIR_Get_contextid(799)..............:
> MPIR_Get_contextid_sparse_group(1064):
> MPIR_Allreduce_impl(719).............:
> MPIR_Allreduce_intra(201)............:
> allreduce_intra_or_coll_fn(110)......:
> MPIR_Allreduce_intra(362)............:
> dequeue_and_set_error(888)...........: Communication error with rank 4
> MPIR_Allreduce_intra(212)............:
> MPIR_Bcast_impl(1369)................:
> MPIR_Bcast_intra(1199)...............:
> MPIR_Bcast_binomial(220).............: Failure during collective
> ===================================================================================
> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> =   EXIT CODE: 1
> =   CLEANING UP REMAINING PROCESSES
> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> ===================================================================================
> [proxy:0:0 at gimli] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:886): assert (!closed) failed
> [proxy:0:0 at gimli] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
> [proxy:0:0 at gimli] main (./pm/pmiserv/pmip.c:206): demux engine error waiting for event
> [mpiexec at gimli] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:76): one of the processes terminated badly; aborting
> [mpiexec at gimli] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
> [mpiexec at gimli] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:217): launcher returned error waiting for completion
> [mpiexec at gimli] main (./ui/mpich/mpiexec.c:331): process manager error waiting for completion
> [2]    Exit 255                      molpro -n 12 -N qg:gimli:4,qg:legolas:4,qg:aragorn:4 dzpro.inp
> gimli /home/qg/calcs>
>


