[molpro-user] MPI Error

Andy May MayAJ1 at cardiff.ac.uk
Thu Aug 1 15:23:45 BST 2013


Bob,

The 2012.1.3 binaries are built with MPICH instead of GA's TCGMSG (which 
is no longer supported). I managed to reproduce your problem, and it 
went away when I disabled the firewall on all machines involved. 
You can probably do something less drastic by setting, for example:

export MPICH_PORT_RANGE="2000:3000"

so that you only need to open that range of ports rather than all of them.
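A minimal sketch of that approach follows. It assumes the nodes use a plain iptables firewall (firewalld, ufw or site-specific setups need different commands) and that the MPI traffic is TCP. The helpers valid_range and print_rules are hypothetical: the first checks that the MPICH_PORT_RANGE value is a sane low:high pair, and the second prints (rather than runs) one rule per node that opens that TCP range. The host names are the ones from the molpro -N list in the original question.

```shell
#!/bin/sh
# Hypothetical helper script, not part of Molpro or MPICH.
# Assumption: each node runs an iptables firewall and MPI uses TCP.

PORT_RANGE="2000:3000"          # same value to be exported as MPICH_PORT_RANGE
NODES="gimli legolas aragorn"   # hosts taken from the molpro -N list

# Succeeds only for "low:high" with 1024 <= low <= high <= 65535.
# The 1024 lower bound is an assumption, to avoid privileged ports.
valid_range() {
    case $1 in
        ''|:*|*:|*:*:*|*[!0-9:]*) return 1 ;;   # empty, stray colons or non-digits
        *:*) ;;                                 # exactly one colon: check the numbers
        *) return 1 ;;                          # no colon at all
    esac
    lo=${1%%:*}
    hi=${1#*:}
    [ "$lo" -ge 1024 ] && [ "$hi" -le 65535 ] && [ "$lo" -le "$hi" ]
}

# Print (do not execute) an iptables rule per node; with -p tcp,
# --dport accepts a first:last port range.
print_rules() {
    range=$1
    shift
    for node in "$@"; do
        echo "$node: iptables -I INPUT -p tcp --dport $range -j ACCEPT"
    done
}

if valid_range "$PORT_RANGE"; then
    export MPICH_PORT_RANGE="$PORT_RANGE"
    # $NODES is left unquoted on purpose so it splits into one argument per host
    print_rules "$PORT_RANGE" $NODES
else
    echo "invalid port range: $PORT_RANGE" >&2
fi
```

Running each printed command on the corresponding node (as root), and then starting molpro from a shell where MPICH_PORT_RANGE is exported, should keep the MPI connections inside the opened range, assuming the launcher forwards the environment to the remote ranks (Hydra's mpiexec normally does).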

Best wishes,

Andy

On 31/07/13 18:23, Quandt, Robert wrote:
> Molpro users,
>
> I recently upgraded from the 2012.1.0 to the 2012.1.3 binaries and now
> get the error message below when I try to run a multi-node job (it
> works fine on one node). The job below was running fine before I killed
> it to do the upgrade, so it isn’t an input problem. Has anyone run into
> this problem with 2012.1.3? Any ideas on how to fix it? Any help would
> be greatly appreciated.
>
> Thanks in advance,
>
> Bob
>
> gimli /home/qg/calcs>molpro -n 12 -N qg:gimli:4,qg:legolas:4,qg:aragorn:4 dzpro.inp &
>
> [2] 31782
>
> gimli /home/qg/calcs>Fatal error in PMPI_Comm_dup: A process has failed, error stack:
> PMPI_Comm_dup(175)...................: MPI_Comm_dup(MPI_COMM_WORLD, new_comm=0x1928e4c0) failed
> PMPI_Comm_dup(160)...................:
> MPIR_Comm_dup_impl(55)...............:
> MPIR_Comm_copy(1552).................:
> MPIR_Get_contextid(799)..............:
> MPIR_Get_contextid_sparse_group(1064):
> MPIR_Allreduce_impl(719).............:
> MPIR_Allreduce_intra(201)............:
> allreduce_intra_or_coll_fn(110)......:
> MPIR_Allreduce_intra(539)............:
> MPIDI_CH3U_Recvq_FDU_or_AEP(667).....: Communication error with rank 4
> MPIR_Allreduce_intra(212)............:
> MPIR_Bcast_impl(1369)................:
> MPIR_Bcast_intra(1199)...............:
> MPIR_Bcast_binomial(220).............: Failure during collective
>
> Fatal error in PMPI_Comm_dup: A process has failed, error stack:
> PMPI_Comm_dup(175)...................: MPI_Comm_dup(MPI_COMM_WORLD, new_comm=0x1928e4c0) failed
> PMPI_Comm_dup(160)...................:
> MPIR_Comm_dup_impl(55)...............:
> MPIR_Comm_copy(1552).................:
> MPIR_Get_contextid(799)..............:
> MPIR_Get_contextid_sparse_group(1064):
> MPIR_Allreduce_impl(719).............:
> MPIR_Allreduce_intra(201)............:
> allreduce_intra_or_coll_fn(110)......:
> MPIR_Allreduce_intra(362)............:
> dequeue_and_set_error(888)...........: Communication error with rank 4
> MPIR_Allreduce_intra(212)............:
> MPIR_Bcast_impl(1369)................:
> MPIR_Bcast_intra(1199)...............:
> MPIR_Bcast_binomial(220).............: Failure during collective
>
> ===================================================================================
> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> =   EXIT CODE: 1
> =   CLEANING UP REMAINING PROCESSES
> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> ===================================================================================
>
> [proxy:0:0 at gimli] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:886): assert (!closed) failed
> [proxy:0:0 at gimli] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
> [proxy:0:0 at gimli] main (./pm/pmiserv/pmip.c:206): demux engine error waiting for event
> [mpiexec at gimli] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:76): one of the processes terminated badly; aborting
> [mpiexec at gimli] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
> [mpiexec at gimli] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:217): launcher returned error waiting for completion
> [mpiexec at gimli] main (./ui/mpich/mpiexec.c:331): process manager error waiting for completion
>
> [2]    Exit 255                      molpro -n 12 -N qg:gimli:4,qg:legolas:4,qg:aragorn:4 dzpro.inp
>
> gimli /home/qg/calcs>
>
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> Dr. Bob Quandt                             Office: 324 SLB
> Department of Chemistry                    Phone: (309) 438-8576
> Illinois State University                  Fax: (309) 438-5538
> Normal, IL 61761-4160                      email: quandt at ilstu.edu
>
> Nihil tam absurde dici potest, quod non dicatur ab aliquo philosophorum.
> - Marcus Tullius Cicero (106-43 BC)
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
>
> _______________________________________________
> Molpro-user mailing list
> Molpro-user at molpro.net
> http://www.molpro.net/mailman/listinfo/molpro-user
>


