[molpro-user] test job hangs on FC2 Xeon Node

Dr Seth OLSEN s.olsen1 at uq.edu.au
Tue Dec 7 05:42:22 GMT 2004



Hi Nick,

The MKL libraries did not work either. In the end, I got around the problem by installing RedHat on the nodes in question. I am not sure that the problem here is with Molpro. I have had similar problems installing other program suites on these nodes, and the symptoms are identical: processes become unkillable and take up all the %CPU (all of which, it appears, is SYSTEM, not USER) and almost no %mem. It appears that the problem was with Fedora on this particular architecture.
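
One way to check the SYSTEM-versus-USER split for a stuck process on Linux is to read its /proc entry directly. The following is only a sketch, assuming a Linux /proc filesystem; proc_cpu_split is a hypothetical helper name, not part of Molpro or any standard tool. It prints the scheduler state letter and the accumulated user and system CPU time in whole seconds.

```shell
# Sketch, assuming Linux /proc: print the scheduler state and the user vs.
# system CPU time (in seconds) of a process. Hypothetical helper name.
# Usage: proc_cpu_split PID
proc_cpu_split() {
    stat=$(cat "/proc/$1/stat") || return 1
    # Field 2 (the command name) is wrapped in parentheses and may contain
    # spaces, so strip everything up to the last ") " before splitting.
    rest=${stat##*") "}
    # Now $1 is field 3 (state), ${12} is field 14 (utime) and ${13} is
    # field 15 (stime); both times are counted in clock ticks.
    set -- $rest
    ticks=$(getconf CLK_TCK)
    echo "state=$1 user_s=$((${12} / $ticks)) system_s=$((${13} / $ticks))"
}
```

For example, running proc_cpu_split 2627 against the molprop_2 process in the lsof listing quoted below a few times in a row would show whether the system time keeps growing while the user time stays flat.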

Cheers,

Seth
ccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccms

Dr Seth Olsen, PhD
Postdoctoral Fellow, Computational Systems Biology Group
Centre for Computational Molecular Science
Chemistry Building,
The University of Queensland
Qld 4072, Brisbane, Australia

tel (617) 33653732
fax (617) 33654623
email: s.olsen1 at uq.edu.au
Web: www.ccms.uq.edu.au 

ccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccmsccms




----- Original Message -----
From: Nick Wilson <WilsonNT at Cardiff.ac.uk>
Date: Wednesday, November 24, 2004 2:52 am
Subject: Re: [molpro-user] test job hangs on FC2 Xeon Node

> Dear Seth,
> 
> Using a combination of dual Xeon, RedHat 7.3, ifc 7.0 and ATLAS, I think I
> might have reproduced a similar race condition.
> 
> It went away when I used the mkl_ia32 BLAS library by editing CONFIG:
> 
> BLASLIB_p4="-L/opt/intel/mkl61/lib/32 -lmkl_ia32 -Wl,-rpath,/opt/intel/mkl61/lib/32 "
> LAPACKLIB_p4="-L/opt/intel/mkl61/lib/32 -lmkl_lapack -Wl,-rpath,/opt/intel/mkl61/lib/32 "
> 
> (you might also need -lguide if you don't have -openmp) and doing "make",
> or when I used the BLAS shipped with Molpro by editing the CONFIG file thus:
> 
> FTCFLAGS="mpp eaf blas0"
> BLASLIB=""
> LAPACKLIB=""
> BLASLIB_p4=""
> LAPACKLIB_p4=""
> 
> Then doing:
> 
> rm lib/libmolpro.a
> make
> 
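
The CONFIG edits described above can also be scripted. This is a sketch that assumes, based only on the fragments quoted here, that CONFIG holds one VAR="..." assignment per line; set_config_var and use_molpro_blas are hypothetical helper names, not part of the Molpro distribution.

```shell
# Hypothetical helper: set VAR="value" in a CONFIG file, assuming each
# variable sits on a single line of the form VAR="...".
set_config_var() {
    file=$1 var=$2 value=$3
    if grep -q "^${var}=" "$file"; then
        # Replace the existing assignment; '|' is the sed delimiter
        # because library paths contain '/'.
        sed "s|^${var}=.*|${var}=\"${value}\"|" "$file" > "$file.tmp" &&
            mv "$file.tmp" "$file"
    else
        # Append the assignment if the variable is not present yet.
        printf '%s="%s"\n' "$var" "$value" >> "$file"
    fi
}

# Hypothetical helper: switch a CONFIG file to the BLAS/LAPACK shipped
# with Molpro (the "blas0" variant), keeping a copy in CONFIG.bak.
use_molpro_blas() {
    config=${1:-CONFIG}
    cp "$config" "$config.bak"
    set_config_var "$config" FTCFLAGS "mpp eaf blas0"
    for v in BLASLIB LAPACKLIB BLASLIB_p4 LAPACKLIB_p4; do
        set_config_var "$config" "$v" ""
    done
}
```

After use_molpro_blas CONFIG, the rebuild is the same as above (rm lib/libmolpro.a, then make). The same set_config_var call could set BLASLIB_p4 and LAPACKLIB_p4 to the MKL values instead; values containing '|' or '&' would need extra escaping in the sed expression.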
> 
> Can you test whether either Intel's or Molpro's BLAS/LAPACK fixes your
> problem?
> Best wishes,
> Nick
> 
> 
> Dr Seth OLSEN wrote:
> > Hello Molpro-Users,
> > 
> > As outlined in previous communiques, I have been having no luck 
> in getting molpro2002.6 to run on a Dual Xeon node with Fedora Core 
> 2, either as the installed rpm or as a self-compiled version done 
> with ifc7 or ifc8.  The problem is as follows.  After the integral 
> sort, the process writes no more to output but becomes unkillable 
> with 99.9%CPU and 1.0%Mem as given by 'top'.
> > 
> > To help diagnose the problem, I have turned on the 'gprint,io,cpu' directive in a failing job (bccd_opt.test). These are the last lines written to the output of that job with I/O printing turned on:
> > 
> >  EXTENDING RECORD    1300.1 BY        34949. WORDS TO      38820. IMPLEMENTATION=df    EXTENSION 0
> >  
> >  NUMBER OF SORTED TWO-ELECTRON INTEGRALS:      34949.     BUFFER LENGTH:  32768
> >  NUMBER OF SEGMENTS:   1  SEGMENT LENGTH:      34949      RECORD LENGTH: 524288
> >  
> >  Memory used in sort:       0.59 MW
> >  OPENW FILE 24  NAME=/scratch/root/eaf_T2400002627.TMP  IMPLEMENTATION=eaf   STATUS=scratch   HANDLE=     2
> >  OPEN EAF FILE 24  NAME=  IMPLEMENTATION=eaf
> >  CLOSEW FILE 21  NAME=eaf_T2100002627.TMP  IMPLEMENTATION=eaf   HANDLE=     1
> >  CLOSE EAF FILE 21
> > 
> > To determine which files molpro has open when the program stops functioning, I ran 'lsof | grep molpro' while the program was in its 'unkillable' final state. This is the output of that command:
> > 
> > bash      2210     root  cwd    DIR        8,1    12288     295282 /opt/molpro/testjobs
> > molpro    2624     root  cwd    DIR        8,2     4096    2796193 /scratch/root
> > molpro    2624     root  rtd    DIR        8,1     4096          2 /
> > molpro    2624     root  txt    REG        8,1    41552     491923 /usr/local/lib/molpro-mpp-Linux-i686-i4-2002.6/molpro
> > molpro    2624     root  mem    REG        8,1  1455084      82119 /lib/tls/libc-2.3.3.so
> > molpro    2624     root  mem    REG        8,1   106892     375519 /lib/ld-2.3.3.so
> > molpro    2624     root    0u   CHR      136,1                   3 /dev/pts/1
> > molpro    2624     root    1u   CHR      136,1                   3 /dev/pts/1
> > molpro    2624     root    2u   CHR      136,1                   3 /dev/pts/1
> > molpro    2624     root    4u   REG        8,1      823      18013 /tmp/tmpfuX2LXr (deleted)
> > parallel  2626     root  txt    REG        8,1    30180     491926 /usr/local/lib/molpro-mpp-Linux-i686-i4-2002.6/parallel
> > parallel  2626     root    1u   REG        8,1    12113     295235 /opt/molpro/testjobs/bccd_opt.out
> > molprop_2 2627     root  cwd    DIR        8,2     4096    2796193 /scratch/root
> > molprop_2 2627     root  rtd    DIR        8,1     4096          2 /
> > molprop_2 2627     root  txt    REG        8,1 19346064     491925 /usr/local/lib/molpro-mpp-Linux-i686-i4-2002.6/molprop_2002_6_p4_tcgmsg.exe
> > molprop_2 2627     root  mem    REG        8,1    96248     375542 /lib/libnsl-2.3.3.so
> > molprop_2 2627     root  mem    REG        8,1   106892     375519 /lib/ld-2.3.3.so
> > molprop_2 2627     root  mem    REG        8,1  1455084      82119 /lib/tls/libc-2.3.3.so
> > molprop_2 2627     root  mem    REG        8,1   214796      82121 /lib/tls/libm-2.3.3.so
> > molprop_2 2627     root  mem    REG        8,1    43528     375552 /lib/libnss_nis-2.3.3.so
> > molprop_2 2627     root  mem    REG        8,1    50944     375549 /lib/libnss_files-2.3.3.so
> > molprop_2 2627     root    0u   REG        8,1      823      18013 /tmp/tmpfuX2LXr (deleted)
> > molprop_2 2627     root    1u   REG        8,1    12113     295235 /opt/molpro/testjobs/bccd_opt.out
> > molprop_2 2627     root    2u   CHR      136,1                   3 /dev/pts/1
> > molprop_2 2627     root    3u  IPv4       4464                 TCP sphinx128.giza:32846->sphinx128.giza:32844 (ESTABLISHED)
> > molprop_2 2627     root    4u   REG        8,1     1457      18014 /tmp/forttempG1uyhO
> > molprop_2 2627     root    5u   REG        8,1       74      18015 /tmp/forttempfYdJb0
> > molprop_2 2627     root    6u   REG        8,1        0      18016 /tmp/forttemp2hFU5b
> > molprop_2 2627     root    7u   REG        8,1        0      18017 /tmp/forttemp9p96Zn
> > molprop_2 2627     root    8u   REG        8,2  3006888    2796194 /scratch/root/df_T0100002627.TMP (deleted)
> > molprop_2 2627     root    9u   REG        8,2   182344    2796195 /scratch/root/df_T0200002627.TMP (deleted)
> > molprop_2 2627     root   10u   REG        8,2   182344    2796196 /scratch/root/df_T0300002627.TMP (deleted)
> > molprop_2 2627     root   11u   REG        8,2        0    2796197 /scratch/root/df_T0400002627.TMP (deleted)
> > molprop_2 2627     root   12r   REG        8,1   476967     491914 /usr/local/lib/molpro-mpp-Linux-i686-i4-2002.6/libmol.index
> > molprop_2 2627     root   13u   REG        8,2  3428352    2796199 /scratch/root/eaf_T2400002627.TMP (deleted)
> > 
> > So it appears that the *.TMP files that molpro has most recently opened and closed are listed as deleted but still open. I cannot find these files in the specified directory, which makes sense if they are deleted, but if they are deleted, then how can they be currently open files?
> > 
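
On Unix-like systems a file can be unlinked while a process still holds it open: the name disappears from the directory, but the inode and its data survive until the last descriptor is closed, and lsof marks such files "(deleted)". Opening a scratch file and unlinking it straight away is a common way to guarantee cleanup, so these entries may well be deliberate rather than a symptom of the hang. The sketch below assumes Linux (it reads /proc) and uses a hypothetical file name, demo.TMP.

```shell
# Sketch, assuming Linux /proc: an unlinked file stays usable through a
# descriptor that is still open. demo.TMP is a hypothetical name.
dir=$(mktemp -d)
scratch="$dir/demo.TMP"

exec 3<>"$scratch"                # open read/write on descriptor 3 (creates it)
rm "$scratch"                     # remove the directory entry
echo "still writable" >&3         # writing through the descriptor still works

listing=$(ls -A "$dir")           # empty: the name is gone
link=$(readlink "/proc/$$/fd/3")  # ends in "(deleted)", as in the lsof output
data=$(cat "/proc/$$/fd/3")       # reopening via /proc reads the data back
printf '%s\n' "listing=[$listing]" "link=$link" "data=$data"

exec 3>&-                         # last descriptor closed: space is released
rmdir "$dir"
```

On a live system, lsof +L1 restricts the listing to open files whose link count is below one, that is, to exactly these unlinked-but-open files.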
> > Cheers,
> > 
> > Seth Olsen
> > 
> 
