[molpro-user] MPI Error

Andy May MayAJ1 at cardiff.ac.uk
Fri Aug 2 09:41:13 BST 2013


Bob,

This is quite strange. All the machines I tested on were also openSUSE 
12.3 but without mpich installed.

I can't see how the mpich package from openSUSE would help: we ship 
MPICH 3.0.4, while the package in the repositories is 1.2.7. The 
Molpro and mpiexec binaries we ship are statically linked, so they 
shouldn't gain anything from the openSUSE mpich libraries being 
installed. The openSUSE package also does not set up PATH to find its 
executables by default, so it's unlikely they are being picked up 
(especially since the launcher was called mpirun in version 1.x and is 
called mpiexec in version 3.x).

Is it possible that installing the package opened some ports on the 
machine or forced some new firewall settings to become active?
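
If a machine is still in the pre-mpich state it might be worth 
comparing. Assuming the stock SuSEfirewall2 setup on openSUSE 12.3, 
something like this (as root) would show what is currently open:

# is the firewall service running at all?
rcSuSEfirewall2 status

# dump the active rules and look for newly opened ports
iptables -L -n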

Anyway, the important thing is that it's working, which is great.

Best wishes,

Andy

On 01/08/13 18:00, Quandt, Robert wrote:
> Andy,
> Thank you for the prompt response. Opening the firewall had no effect, but it did get me thinking about mpich. We have openSUSE 12.3 on our machines, and it turns out that mpich is not installed by default. Once it was installed, my test run worked (you may want to mention that in the install notes). I have to wait for some calculations to finish before doing a full test, but it looks encouraging.
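>
> For anyone else who hits this, it really was just the stock package; on
> openSUSE that should amount to something like the following (as root,
> and the exact package name may differ between releases):
>
> zypper install mpich
>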
> Thanks again,
> Bob
>
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> Dr. Bob Quandt                             Office: 324 SLB
> Department of Chemistry                    Phone: (309) 438-8576
> Illinois State University                  Fax: (309) 438-5538
> Normal, IL 61761-4160                      email: quandt at ilstu.edu
>
> Nihil tam absurde dici potest, quod non dicatur ab aliquo philosophorum.
> - Marcus Tullius Cicero (106-43 BC)
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
>
>
> -----Original Message-----
> From: mayaj1 at Cardiff.ac.uk [mailto:mayaj1 at Cardiff.ac.uk]
> Sent: Thursday, August 01, 2013 9:24 AM
> To: Quandt, Robert
> Cc: molpro-user at molpro.net
> Subject: Re: [molpro-user] MPI Error
>
> Bob,
>
> The 2012.1.3 binaries are built with MPICH instead of GA's TCGMSG (which is no longer supported). I managed to reproduce your problem, and it went away when I disabled the firewall on all machines involved.
> You can probably do something less extreme than disabling it by setting, for example:
>
> export MPICH_PORT_RANGE="2000:3000"
>
> so that you don't need to open up all ports.
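>
> If the machines run the stock SuSEfirewall2, my guess (untested) is
> that opening just that range would be something like the following in
> /etc/sysconfig/SuSEfirewall2 on each node:
>
> FW_SERVICES_EXT_TCP="2000:3000"
>
> followed by restarting the firewall (rcSuSEfirewall2 restart as root).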
>
> Best wishes,
>
> Andy
>
> On 31/07/13 18:23, Quandt, Robert wrote:
>> Molpro users,
>>
>> I recently upgraded from the 2012.1.0 to the 2012.1.3 binaries and now
>> get an error message (see below) when I try to run a multi-node job
>> (it works fine on one node). The job below was running fine before I
>> killed it to do the upgrade, so it isn't an input problem. Has anyone
>> run into this problem with 2012.1.3? Any ideas on how to fix it? Any
>> help would be greatly appreciated.
>>
>> Thanks in advance,
>>
>> Bob
>>
>> gimli /home/qg/calcs>molpro -n 12 -N qg:gimli:4,qg:legolas:4,qg:aragorn:4 dzpro.inp &
>> [2] 31782
>> gimli /home/qg/calcs>Fatal error in PMPI_Comm_dup: A process has failed, error stack:
>> PMPI_Comm_dup(175)...................: MPI_Comm_dup(MPI_COMM_WORLD, new_comm=0x1928e4c0) failed
>> PMPI_Comm_dup(160)...................:
>> MPIR_Comm_dup_impl(55)...............:
>> MPIR_Comm_copy(1552).................:
>> MPIR_Get_contextid(799)..............:
>> MPIR_Get_contextid_sparse_group(1064):
>> MPIR_Allreduce_impl(719).............:
>> MPIR_Allreduce_intra(201)............:
>> allreduce_intra_or_coll_fn(110)......:
>> MPIR_Allreduce_intra(539)............:
>> MPIDI_CH3U_Recvq_FDU_or_AEP(667).....: Communication error with rank 4
>> MPIR_Allreduce_intra(212)............:
>> MPIR_Bcast_impl(1369)................:
>> MPIR_Bcast_intra(1199)...............:
>> MPIR_Bcast_binomial(220).............: Failure during collective
>> Fatal error in PMPI_Comm_dup: A process has failed, error stack:
>> PMPI_Comm_dup(175)...................: MPI_Comm_dup(MPI_COMM_WORLD, new_comm=0x1928e4c0) failed
>> PMPI_Comm_dup(160)...................:
>> MPIR_Comm_dup_impl(55)...............:
>> MPIR_Comm_copy(1552).................:
>> MPIR_Get_contextid(799)..............:
>> MPIR_Get_contextid_sparse_group(1064):
>> MPIR_Allreduce_impl(719).............:
>> MPIR_Allreduce_intra(201)............:
>> allreduce_intra_or_coll_fn(110)......:
>> MPIR_Allreduce_intra(362)............:
>> dequeue_and_set_error(888)...........: Communication error with rank 4
>> MPIR_Allreduce_intra(212)............:
>> MPIR_Bcast_impl(1369)................:
>> MPIR_Bcast_intra(1199)...............:
>> MPIR_Bcast_binomial(220).............: Failure during collective
>> ===================================================================================
>> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>> =   EXIT CODE: 1
>> =   CLEANING UP REMAINING PROCESSES
>> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>> ===================================================================================
>> [proxy:0:0 at gimli] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:886): assert (!closed) failed
>> [proxy:0:0 at gimli] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
>> [proxy:0:0 at gimli] main (./pm/pmiserv/pmip.c:206): demux engine error waiting for event
>> [mpiexec at gimli] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:76): one of the processes terminated badly; aborting
>> [mpiexec at gimli] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
>> [mpiexec at gimli] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:217): launcher returned error waiting for completion
>> [mpiexec at gimli] main (./ui/mpich/mpiexec.c:331): process manager error waiting for completion
>> [2]    Exit 255                      molpro -n 12 -N qg:gimli:4,qg:legolas:4,qg:aragorn:4 dzpro.inp
>> gimli /home/qg/calcs>
>>


