[molpro-user] test job hangs on FC2 Xeon Node

H. -J. Werner werner at theochem.uni-stuttgart.de
Tue Dec 7 09:15:39 GMT 2004


The dsyev problem is known and is apparently due to a bug in MKL.
On some systems I have trouble with mkl701, but mkl61 works fine.
You can probably avoid the problem by setting the ftcflag "olddiag2".
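
For anyone unsure where that flag goes, here is a minimal sketch. It assumes
olddiag2 is set in the FTCFLAGS line of CONFIG in the same way that blas0 is
set in Nick's message quoted below. The other flags (mpp eaf) are simply
copied from his example and may differ on your installation; this has not
been checked against a particular Molpro release.

```shell
# CONFIG fragment (assumption: olddiag2 is added alongside the existing flags,
# which are copied from the quoted message and may not match your CONFIG)
FTCFLAGS="mpp eaf olddiag2"
```

As in the quoted message, the library would then presumably need rebuilding
(rm lib/libmolpro.a, then make) for the change to take effect.
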
Joachim Werner
On Tue, 07 Dec 2004, Dr Seth OLSEN wrote:

>
>
>Hi Nick,
>
>The MKL libraries did not work either.  In the end, I got around the problem by installing RedHat on the nodes in question.  I am not sure that the problem here is with Molpro: I have similar problems installing other program suites on these nodes, and the general symptoms are identical.  Processes become unkillable, taking up all the %CPU (all of which, it appears, is SYSTEM, not USER) and almost %mem.  It appears that the problem was with Fedora on this particular architecture.
>
>Cheers,
>
>Seth
>ccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccms
>
>Dr Seth Olsen, PhD
>Postdoctoral Fellow, Computational Systems Biology Group
>Centre for Computational Molecular Science
>Chemistry Building,
>The University of Queensland
>Qld 4072, Brisbane, Australia
>
>tel (617) 33653732
>fax (617) 33654623
>email: s.olsen1 at uq.edu.au
>Web: www.ccms.uq.edu.au 
>
>ccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccms
>
>
>
>
>----- Original Message -----
>From: Nick Wilson <WilsonNT at Cardiff.ac.uk>
>Date: Wednesday, November 24, 2004 2:52 am
>Subject: Re: [molpro-user] test job hangs on FC2 Xeon Node
>
>> Dear Seth,
>> 
>> Using a combination of dual xeon, redhat7.3, ifc7.0 and atlas I think I
>> might have reproduced a similar race condition.
>> 
>> It went away when I used the mkl_ia32 blas library by editing CONFIG:
>> 
>> BLASLIB_p4="-L/opt/intel/mkl61/lib/32 -lmkl_ia32 -Wl,-rpath,/opt/intel/mkl61/lib/32 "
>> LAPACKLIB_p4="-L/opt/intel/mkl61/lib/32 -lmkl_lapack -Wl,-rpath,/opt/intel/mkl61/lib/32 "
>> 
>> (you might also need -lguide if you don't have -openmp) and doing "make",
>> or when I used the blas shipped with molpro by editing the CONFIG file thus:
>> 
>> FTCFLAGS="mpp eaf blas0"
>> BLASLIB=""
>> LAPACKLIB=""
>> BLASLIB_p4=""
>> LAPACKLIB_p4=""
>> 
>> Then doing:
>> 
>> rm lib/libmolpro.a
>> make
>> 
>> 
>> Can you test whether either intel's or molpro's blas/lapack fixes your
>> problem?
>> Best wishes,
>> Nick
>> 
>> 
>> Dr Seth OLSEN wrote:
>> > Hello Molpro-Users,
>> > 
>> > As outlined in previous communiques, I have been having no luck in getting molpro2002.6 to run on a Dual Xeon node with Fedora Core 2, either as the installed rpm or as a self-compiled version done with ifc7 or ifc8.  The problem is as follows.  After the integral sort, the process writes no more to output but becomes unkillable with 99.9%CPU and 1.0%Mem as given by 'top'.
>> > 
>> > In order to help diagnose the problem, I have turned the 'gprint,io,cpu' directive on in a given failing job (bccd_opt.test).  The following are the last lines written to output for that job with the io printing turned on:
>> > 
>> >  EXTENDING RECORD    1300.1 BY        34949. WORDS TO      38820. IMPLEMENTATION=df    EXTENSION 0
>> >  
>> >  NUMBER OF SORTED TWO-ELECTRON INTEGRALS:      34949.     BUFFER LENGTH:  32768
>> >  NUMBER OF SEGMENTS:   1  SEGMENT LENGTH:      34949      RECORD LENGTH: 524288
>> >  
>> >  Memory used in sort:       0.59 MW
>> >  OPENW FILE 24  NAME=/scratch/root/eaf_T2400002627.TMP  IMPLEMENTATION=eaf   STATUS=scratch   HANDLE=     2
>> >  OPEN EAF FILE 24  NAME=  IMPLEMENTATION=eaf
>> >  CLOSEW FILE 21  NAME=eaf_T2100002627.TMP  IMPLEMENTATION=eaf   HANDLE=     1
>> >  CLOSE EAF FILE 21
>> > 
>> > To determine what files might be opened by molpro at the time that the program stops functioning, I issue a 'lsof | grep molpro' command while the program is running in its 'unkillable' final status.  The following is the output of that command:
>> > 
>> > bash      2210     root  cwd    DIR        8,1    12288     295282 /opt/molpro/testjobs
>> > molpro    2624     root  cwd    DIR        8,2     4096    2796193 /scratch/root
>> > molpro    2624     root  rtd    DIR        8,1     4096          2 /
>> > molpro    2624     root  txt    REG        8,1    41552     491923 /usr/local/lib/molpro-mpp-Linux-i686-i4-2002.6/molpro
>> > molpro    2624     root  mem    REG        8,1  1455084      82119 /lib/tls/libc-2.3.3.so
>> > molpro    2624     root  mem    REG        8,1   106892     375519 /lib/ld-2.3.3.so
>> > molpro    2624     root    0u   CHR      136,1                   3 /dev/pts/1
>> > molpro    2624     root    1u   CHR      136,1                   3 /dev/pts/1
>> > molpro    2624     root    2u   CHR      136,1                   3 /dev/pts/1
>> > molpro    2624     root    4u   REG        8,1      823      18013 /tmp/tmpfuX2LXr (deleted)
>> > parallel  2626     root  txt    REG        8,1    30180     491926 /usr/local/lib/molpro-mpp-Linux-i686-i4-2002.6/parallel
>> > parallel  2626     root    1u   REG        8,1    12113     295235 /opt/molpro/testjobs/bccd_opt.out
>> > molprop_2 2627     root  cwd    DIR        8,2     4096    2796193 /scratch/root
>> > molprop_2 2627     root  rtd    DIR        8,1     4096          2 /
>> > molprop_2 2627     root  txt    REG        8,1 19346064     491925 /usr/local/lib/molpro-mpp-Linux-i686-i4-2002.6/molprop_2002_6_p4_tcgmsg.exe
>> > molprop_2 2627     root  mem    REG        8,1    96248     375542 /lib/libnsl-2.3.3.so
>> > molprop_2 2627     root  mem    REG        8,1   106892     375519 /lib/ld-2.3.3.so
>> > molprop_2 2627     root  mem    REG        8,1  1455084      82119 /lib/tls/libc-2.3.3.so
>> > molprop_2 2627     root  mem    REG        8,1   214796      82121 /lib/tls/libm-2.3.3.so
>> > molprop_2 2627     root  mem    REG        8,1    43528     375552 /lib/libnss_nis-2.3.3.so
>> > molprop_2 2627     root  mem    REG        8,1    50944     375549 /lib/libnss_files-2.3.3.so
>> > molprop_2 2627     root    0u   REG        8,1      823      18013 /tmp/tmpfuX2LXr (deleted)
>> > molprop_2 2627     root    1u   REG        8,1    12113     295235 /opt/molpro/testjobs/bccd_opt.out
>> > molprop_2 2627     root    2u   CHR      136,1                   3 /dev/pts/1
>> > molprop_2 2627     root    3u  IPv4       4464                 TCP sphinx128.giza:32846->sphinx128.giza:32844 (ESTABLISHED)
>> > molprop_2 2627     root    4u   REG        8,1     1457      18014 /tmp/forttempG1uyhO
>> > molprop_2 2627     root    5u   REG        8,1       74      18015 /tmp/forttempfYdJb0
>> > molprop_2 2627     root    6u   REG        8,1        0      18016 /tmp/forttemp2hFU5b
>> > molprop_2 2627     root    7u   REG        8,1        0      18017 /tmp/forttemp9p96Zn
>> > molprop_2 2627     root    8u   REG        8,2  3006888    2796194 /scratch/root/df_T0100002627.TMP (deleted)
>> > molprop_2 2627     root    9u   REG        8,2   182344    2796195 /scratch/root/df_T0200002627.TMP (deleted)
>> > molprop_2 2627     root   10u   REG        8,2   182344    2796196 /scratch/root/df_T0300002627.TMP (deleted)
>> > molprop_2 2627     root   11u   REG        8,2        0    2796197 /scratch/root/df_T0400002627.TMP (deleted)
>> > molprop_2 2627     root   12r   REG        8,1   476967     491914 /usr/local/lib/molpro-mpp-Linux-i686-i4-2002.6/libmol.index
>> > molprop_2 2627     root   13u   REG        8,2  3428352    2796199 /scratch/root/eaf_T2400002627.TMP (deleted)
>> > 
>> > So, it appears that the *.TMP files that molpro has most recently opened and closed are listed as deleted but still open.  I cannot find these files in the specified directory, which makes sense if they are deleted, but if they are deleted then how can they be currently open files?
>> > 
>> > Cheers,
>> > 
>> > Seth Olsen
>> > 
>> 
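
On the question in the quoted message of how the *.TMP files can be listed
as "(deleted)" yet still be open: on Linux, removing a file only removes its
directory entry, and the data stays on disk until the last process holding
the file open closes it. Programs often treat scratch files this way, so the
lsof listing by itself does not indicate a fault. A minimal shell sketch of
the behaviour follows; the file name and contents are hypothetical and have
nothing to do with Molpro's own scratch handling.

```shell
#!/bin/sh
# Sketch: a file stays readable through an open descriptor after its
# name has been removed. File name and contents are hypothetical.
tmp=$(mktemp)                 # create a scratch file
echo "still here" > "$tmp"
exec 3< "$tmp"                # open it for reading on descriptor 3
rm "$tmp"                     # unlink the name; the data survives while fd 3 is open
if [ ! -e "$tmp" ]; then
    echo "name is gone from the directory"
fi
content=$(cat <&3)            # the data is still reachable via the descriptor
echo "read back: $content"
exec 3<&-                     # closing the last descriptor lets the space be freed
```

While descriptor 3 is open, running lsof against the shell's PID should show
the scratch file marked "(deleted)", much like the entries in the listing
above.
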

-- 
Prof. Hans-Joachim Werner
Institute for Theoretical Chemistry
University of Stuttgart
Pfaffenwaldring 55
D-70569 Stuttgart, Germany
Tel.: (0049) 711 / 685 4400
Fax.: (0049) 711 / 685 4442
e-mail: werner at theochem.uni-stuttgart.de


