[molpro-user] Permanent installation of dependencies (was: Non-reproducible stuck state when running Molpro on NFS drive)

Gregory Magoon gmagoon at MIT.EDU
Sun Jul 24 04:12:19 BST 2011


After some work I was finally able to trace this to some sort of
incompatibility between NFSv4 and MPICH2: everything runs properly when I
mount the NFS drives as NFSv3 rather than NFSv4, so the issue is now more
or less resolved.
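
For reference, forcing the v3 mount looks something like the following
(server name and export paths are placeholders, not our actual setup;
the rsize/wsize tuning mentioned below is optional):

  # /etc/fstab entries forcing NFSv3 instead of v4
  headnode:/usr/local  /usr/local  nfs  vers=3,ro                          0 0
  headnode:/home       /home       nfs  vers=3,rw,rsize=32768,wsize=32768  0 0

  # or as a one-off mount for testing
  mount -t nfs -o vers=3 headnode:/usr/local /usr/local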

A quick follow-up question: is there a recommended approach for permanently
installing the mpich2 dependency (and perhaps GA as well?) when using the
auto-build approach? By default, the installation scripts seem to leave
mpiexec in the build (compile) directory. I saw that the makefile and
installation scripts mention an option called REDIST, which looked like it
might allow this, but they don't appear to actually use it (REDIST=NO).
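
(To frame what I'm after: something like a stand-alone MPICH2 install,
i.e. the standard procedure below, where the prefix path is just an
example, rather than having to keep the Molpro build tree around for
mpiexec.)

  # build MPICH2 from its own source tree into a permanent prefix
  ./configure --prefix=/usr/local/mpich2
  make
  make install

  # then put its bin directory (mpiexec etc.) on PATH
  export PATH=/usr/local/mpich2/bin:$PATH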

Thanks,
Greg

Quoting Gregory Magoon <gmagoon at MIT.EDU>:

> Hi,
> I have successfully compiled molpro (with Global Arrays/TCGMSG; mpich2
> from the Ubuntu package) on one of the compute nodes of our new server,
> and installed it in an NFS directory on our head node. The initial tests
> on the compute node ran fine, but since the installation I've had issues
> running molpro on the compute nodes (it seems to work fine on the head
> node). Sometimes (sorry I can't be more precise, but it does not seem to
> be reproducible) a job running on a compute node will get stuck in its
> early stages, producing a lot of NFS traffic (~14+ Mbps outbound to the
> head node and ~7 Mbps inbound from it) and causing fairly high nfsd CPU
> usage on the head node. Molpro processes in the stuck state are shown in
> the "top" output at the bottom of this e-mail. I have also attached
> example verbose output for a case that works and a case that gets stuck.
>
> Some notes:
> - /usr/local is mounted as a read-only NFS file system; /home is mounted
>   as a read-write NFS file system
> - Runs with fewer processors (e.g. 6) seem more likely to run
>   successfully
>
> I've tried several approaches to address the issue, including
> (1) mounting /usr/local as a read-write file system and (2) changing the
> rsize and wsize parameters for the NFS file system, but neither seems to
> help. We also tried redirecting stdin from /dev/null (< /dev/null) when
> launching the process, which seemed to help at first, but later tests
> suggested it wasn't actually making a difference.
>
> If anyone has any tips or ideas to help diagnose the issue, it would be
> greatly appreciated. If there are additional details I can provide to
> help describe the problem, I'd be happy to supply them.
>
> Thanks very much,
> Greg
>
> Top processes in the "top" output in the stuck state:
>  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>   10 root      20   0     0    0    0 S   10  0.0   0:16.50 kworker/0:1
>    2 root      20   0     0    0    0 S    6  0.0   0:10.86 kthreadd
> 1496 root      20   0     0    0    0 S    1  0.0   0:04.73 kworker/0:2
>    3 root      20   0     0    0    0 S    1  0.0   0:00.93 ksoftirqd/0
>
> Processes in the "top" output for the user in the stuck state:
>  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> 29961 user      20   0 19452 1508 1072 R    0  0.0   0:00.05 top
> 1176 user      20   0 91708 1824  868 S    0  0.0   0:00.01 sshd
> 1177 user      20   0 24980 7620 1660 S    0  0.0   0:00.41 bash
> 1289 user      20   0 91708 1824  868 S    0  0.0   0:00.00 sshd
> 1290 user      20   0 24980 7600 1640 S    0  0.0   0:00.32 bash
> 1386 user      20   0  4220  664  524 S    0  0.0   0:00.01 molpro
> 1481 user      20   0 18764 1196  900 S    0  0.0   0:00.00 mpiexec
> 1482 user      20   0 18828 1092  820 S    0  0.0   0:00.00 hydra_pmi_proxy
> 1483 user      20   0 18860  488  212 D    0  0.0   0:00.00 hydra_pmi_proxy
> 1484 user      20   0 18860  488  212 D    0  0.0   0:00.00 hydra_pmi_proxy
> 1485 user      20   0 18860  488  212 D    0  0.0   0:00.00 hydra_pmi_proxy
> 1486 user      20   0 18860  488  212 D    0  0.0   0:00.00 hydra_pmi_proxy
> 1487 user      20   0 18860  488  212 D    0  0.0   0:00.00 hydra_pmi_proxy
> 1488 user      20   0 18860  488  212 D    0  0.0   0:00.00 hydra_pmi_proxy
> 1489 user      20   0 18860  488  212 D    0  0.0   0:00.00 hydra_pmi_proxy
> 1490 user      20   0 18860  488  208 D    0  0.0   0:00.00 hydra_pmi_proxy
> 1491 user      20   0 18860  488  208 D    0  0.0   0:00.00 hydra_pmi_proxy
> 1492 user      20   0 18860  488  208 D    0  0.0   0:00.00 hydra_pmi_proxy
> 1493 user      20   0 18860  488  208 D    0  0.0   0:00.00 hydra_pmi_proxy
> 1494 user      20   0 18860  492  212 D    0  0.0   0:00.00 hydra_pmi_proxy
>




