[molpro-user] Large differences between CPU TIME and REAL TIME
gmagoon at MIT.EDU
Thu Apr 21 17:17:10 BST 2011
Thanks to all for the ideas. For this particular computer, the hard disks
are all in a single-drive configuration (no RAID). Looking at "top", there
seem to be the expected number of processes running, but most are not at
100% CPU usage. We are trying some tests with different memory usage and
will post an update if we reach any firm conclusions.
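For anyone wanting to reproduce this check non-interactively, a single
batch-mode snapshot of the process table can be taken as follows (a sketch
using the Linux procps `top`; the line count is arbitrary):

```shell
# One batch-mode snapshot of the process table.  Job processes sitting far
# below 100% CPU while the run is active suggest they are waiting on disk
# I/O rather than computing.
top -b -n 1 | head -20
```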
Quoting Gerald Knizia <knizia at theochem.uni-stuttgart.de>:
> On Tuesday 19 April 2011 23:26, Gregory Magoon wrote:
>> One of our users has noticed large differences between CPU TIME and REAL
>> TIME in several runs and I was wondering if anyone had any tips for getting
>> the REAL TIME more in line with the CPU TIME.
>> One of the more obvious examples of a large time gap is for an mppx
>> frequency run on a 48 processor compute node (using all 48 processors):
>> PROGRAMS   *      TOTAL      FREQ      OPTG   CCSD(T)        HF       INT
>> CPU TIMES  *   20130.37  10666.84   8936.71    501.37     12.08     13.09
>> REAL TIME  *  193915.08 SEC
>> The real time is over 9 times longer than the CPU time. The full output
>> file for this case is attached.
> As Kirk said, this would typically indicate a problem with the disk
> I/O performance. The CCSD(T) program does everything it can to minimize the
> amount of disk I/O using the memory you give it, but there are some things
> for which it simply cannot be avoided. And of course I/O per node scales
> linearly with the number of processes you run on that node.
> However, that concrete job actually looks rather harmless at first sight
> (it takes less than 1 GB of disk space per process), so I'm surprised by
> this. One would guess that on a 48-core machine there would be almost enough
> memory for the OS to cache the entire working set in this case... apparently
> that does not happen at all.
> One thing you could try is to give Molpro either more memory (to make its
> own caching more efficient) or less memory (to give the OS more freedom with
> its system cache). Apart from that I'm rather puzzled.
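One way to see whether the node really is I/O-bound rather than CPU-bound
while the job runs is with standard Linux tools (a sketch; `vmstat` and
`free` ship with procps on most distributions):

```shell
# The "wa" column is the percentage of CPU time spent waiting on disk I/O;
# consistently high values while the job runs point at the disks.
vmstat 1 5

# How much RAM is left for the OS page cache (the buff/cache figures).
free -m
```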
>  Another thing to look out for is whether the OS actually schedules all 48
> processes on this node in a sensible manner. If for some reason all of them
> ended up running on only 8 of the cores, that would also produce the results
> you've seen.
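Whether the scheduler is actually spreading the job over the cores can be
checked directly. This is an illustrative snippet, not from the original
thread: the PSR column is Linux-specific, and matching on a process name
`molpro` is an assumption about how the binary appears in the process table:

```shell
# PSR = the core each process last ran on.  Count the distinct cores used
# by the molpro processes; far fewer than 48 means they are piling up on
# a subset of the cores.
ps -eo psr,comm | awk '$2 ~ /molpro/ {cores[$1]=1}
                       END {n=0; for (c in cores) n++;
                            print n, "distinct cores in use"}'
```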
> Gerald Knizia