[molpro-user] MRCI problem when running on multiple nodes

Andy May MayAJ1 at cardiff.ac.uk
Thu Dec 10 10:46:54 GMT 2009


Hi,

Could you send me the input file used and a copy of the CONFIG file from
the build?

Thanks,

Andy

On 03/12/09 13:13, aristotle Papakondylis wrote:
> Dear all,
>
> I am trying to run an MRCI calculation with Molpro 2009.1 on two nodes
> (4 Itanium processors each) of my system, but Molpro crashes with the
> error message attached below. However, if I run the same calculation on
> a single node using, for example, 4 processors, the job finishes without
> any problems. Molpro was built with ga-4-2 and tcgmsg, and I use
> InfiniBand. Any suggestions would be appreciated.
>
> Thanks,
> 
> A. Papakondylis
> Laboratory of Physical Chemistry
> Department of Chemistry
> University of Athens
> 
> 
> 
> The output:
> 
> ......................................................................................................................................
>  Number of blocks in overlap matrix:    20   Smallest eigenvalue:  0.30D-06
>  Number of N-2 electron functions:     210
>  Number of N-1 electron functions:  139698
> 
>  Number of internal configurations:                20627
>  Number of singly external configurations:       6446250
>  Number of doubly external configurations:        894852
>  Total number of contracted configurations:      7361729
>  Total number of uncontracted configurations:  636698630
> 
>  Diagonal Coupling coefficients finished.               Storage: 9109747 words, CPU-Time:      5.02 seconds.
>  Energy denominators for pairs finished in 1 passes.    Storage:  893537 words, CPU-time:      0.09 seconds.
> 
>   ITER. STATE  ROOT     SQ.NORM     CORR.ENERGY   TOTAL ENERGY   ENERGY CHANGE       DEN1      VAR(S)    VAR(P)      TIME
>     1     1     1     1.00000000     0.00000000 -1384.06452996     0.00000000    -0.14494506  0.17D-01  0.37D-01    58.08
>     1     2     2     1.00000000     0.00000000 -1384.03046431     0.00000000    -0.18878279  0.16D-01  0.56D-01    58.08
> 
>  GLOBAL ERROR fehler on processor   5
> 
>  GLOBAL ERROR fehler on processor   4
> 
>  GLOBAL ERROR fehler on processor   7
> Last System Error Message from Task 7:: Invalid argument
> Last System Error Message from Task 5:: Invalid argument
> 7:7:fehler:: 1010707757
> (rank:7 hostname:nodeib_08 pid:32674):ARMCI DASSERT fail. armci.c:ARMCI_Error():260 cond:0
> Last System Error Message from Task 4:: Invalid argument
> 5:5:fehler:: 1010707757
> (rank:5 hostname:nodeib_08 pid:32622):ARMCI DASSERT fail. armci.c:ARMCI_Error():260 cond:0
>   5: ARMCI aborting 0 (0).
> system error message: Invalid argument
>   5: ARMCI aborting 0 (0).
>   7: ARMCI aborting 0 (0).
>   7: ARMCI aborting 0 (0).
> system error message: Invalid argument
> 
>  GLOBAL ERROR fehler on processor   6
> Last System Error Message from Task 6:: Invalid argument
> 6:6:fehler:: 1010707757
> (rank:6 hostname:nodeib_08 pid:32648):ARMCI DASSERT fail. armci.c:ARMCI_Error():260 cond:0
>   6: ARMCI aborting 0 (0).
>   6: ARMCI aborting 0 (0).
> system error message: Invalid argument
>   8: interrupt(1)
> 2:SigIntHandler: interrupt signal was caught: 2
> 1:SigIntHandler: interrupt signal was caught: 2
> 3:SigIntHandler: interrupt signal was caught: 2
> Last System Error Message from Task 2:: Numerical result out of range
> Last System Error Message from Task 1:: Numerical result out of range
> Last System Error Message from Task 3:: Numerical result out of range
> 2:SigIntHandler: abort signal was caught: cleaning up: 2
>   2: ARMCI aborting 0 (0).
> agonal Coupling coefficients finished.               Storage: 9109747 
> words, CPU-Time:      5.02 seconds.
>  Energy denominators for pairs  Diagonal Coupling coefficients 
> finished.               Storage: 9109747 words, CPU-Time:      5.02 seconds.
>  Energy denominators for pairs
> system error message: Illegal seek
> 1:SigIntHandler: abort signal was caught: cleaning up: 2
>   1: ARMCI aborting 0 (0).
>   1: ARMCI aborting 0 (0).
> system error message: Illegal seek
> 0:SigIntHandler: interrupt signal was caught: 2
> (rank:0 hostname:nodeib_07 pid:4468):ARMCI DASSERT fail. signaltrap.c:SigIntHandler():69 cond:0
> 3:SigIntHandler: abort signal was caught: cleaning up: 2
>   3: ARMCI aborting 0 (0).
>   3: ARMCI aborting 0 (0).
> system error message: Illegal seek
> Last System Error Message from Task 0:: Inappropriate ioctl for device
> 4:4:fehler:: 1010707757
> (rank:4 hostname:nodeib_08 pid:32596):ARMCI DASSERT fail. armci.c:ARMCI_Error():260 cond:0
> 4:SigIntHandler: interrupt signal was caught: 2
> 4:SigIntHandler: abort signal was caught: cleaning up: 2
>   4: ARMCI aborting 0 (0).
>   4: ARMCI aborting 0 (0).
> system error message: Transport endpoint is not connected
> WaitAll: Child (4471) finished, status=0x100 (exited with code 1).
> WaitAll: Child (4470) finished, status=0x100 (exited with code 1).
> WaitAll: Child (4469) finished, status=0x100 (exited with code 1).
> 