[molpro-user] Job termination during Pipek-Mezey Localisation

Rika Kobayashi Rika.Kobayashi at anu.edu.au
Wed Dec 20 10:28:52 CET 2017


Hello,
Signal 9 usually indicates to us that the job was killed for exceeding its
memory limit, and the logs indeed show:
12/19/2017 20:04:20;0008;pbs_python;Job;2169493.r-man2;Cgroup memory limit exceeded: Killed process 24728 (molpro.exe) total-vm:17508688kB, anon-rss:9439868kB, file-rss:4816kB, shmem-rss:264kB
This indicates a sudden spike in memory use that killed the job before PBS
had a chance to log it.
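
To put the numbers in that log line into more familiar units, here is a minimal Python sketch that parses it and converts the kB fields to GiB. The helper names (parse_oom_line, kb_to_gib) are made up for illustration, and I am assuming the "kB" in the log means 1024 bytes and that the 48.0GB request would be shared evenly by the 8 processes, which need not be the case.

```python
import re

# Log line copied from the message above, joined onto one line.
LOG_LINE = (
    "12/19/2017 20:04:20;0008;pbs_python;Job;2169493.r-man2;"
    "Cgroup memory limit exceeded: Killed process 24728 (molpro.exe) "
    "total-vm:17508688kB, anon-rss:9439868kB, file-rss:4816kB, shmem-rss:264kB"
)

KIB_PER_GIB = 1024 ** 2  # assumption: "kB" in the log means 1024 bytes


def parse_oom_line(line):
    """Return pid, process name and the memory fields (in kB) of a kill record."""
    proc = re.search(r"Killed process\s+(\d+)\s+\(([^)]+)\)", line)
    if proc is None:
        raise ValueError("no 'Killed process' entry found")
    # Collect every "name:<digits>kB" pair, e.g. anon-rss:9439868kB.
    fields = {key: int(val) for key, val in re.findall(r"([\w-]+):(\d+)kB", line)}
    return {"pid": int(proc.group(1)), "name": proc.group(2), "kB": fields}


def kb_to_gib(kb):
    """Convert a kB count to GiB."""
    return kb / KIB_PER_GIB


info = parse_oom_line(LOG_LINE)
mem = info["kB"]
# Resident memory of the killed process: anonymous + file-backed + shared.
rss_kb = mem["anon-rss"] + mem["file-rss"] + mem["shmem-rss"]

# Hypothetical even split of the request quoted below (48.0GB over 8 CPUs).
even_share_gb = 48.0 / 8

print(f"pid {info['pid']} ({info['name']})")
print(f"  resident: {kb_to_gib(rss_kb):.2f} GiB")
print(f"  virtual:  {kb_to_gib(mem['total-vm']):.2f} GiB")
print(f"  even share of the request: {even_share_gb:.1f} GB per process")
```

Under those assumptions, this one process was holding about 9 GiB resident when it was killed, against an even share of 6 GB per process.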
Rika


On 19 December 2017 at 21:10, Seth Olsen <seth.olsen at uq.edu.au> wrote:

> Hi Molpro-User,
>
> I’ve been having a job fail during orbital localization.  It is a CASSCF
> (8 electrons in 8 orbitals) job.  The output ends abruptly:
>  **********************************************************************************************************************************
>
>
>  Program * Orbital Localization         Authors:  W. Meyer, H.-J. Werner
>
>  Pipek-Mezey Localization
>
>  Molecular orbitals read from record     2141.2  Type=MCSCF/NATURAL
>  Density matrix read from record         2141.2  Type=MCSCF/CHARGE (state averaged)
>
> …but the standard output has a little more description
>
> --------------------------------------------------------------------------
> MPI_ABORT was invoked on rank 1 in communicator MPI COMMUNICATOR 4 DUP FROM 0
> with errorcode 15.
>
> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> You may or may not see output from other processes, depending on
> exactly when Open MPI kills them.
> --------------------------------------------------------------------------
> 0:Terminate signal was sent, status=: 15
> (rank:0 hostname:r2398 pid:24716):ARMCI DASSERT fail.
> src/common/signaltrap.c:SigTermHandler():477 cond:0
> --------------------------------------------------------------------------
> mpirun noticed that process rank 7 with PID 24730 on node r2398 exited on signal 9 (Killed).
> --------------------------------------------------------------------------
>
> ======================================================================================
>                   Resource Usage on 2017-12-19 20:04:32:
>    Job Id:             2169493.r-man2
>    Project:            tv58
>    Exit Status:        0
>    Service Units:      0.55
>    NCPUs Requested:    8                      NCPUs Used: 8
>                                            CPU Time Used: 00:24:10
>
>    Memory Requested:   48.0GB                Memory Used: 31.96GB
>    Walltime requested: 03:00:00            Walltime Used: 00:04:09
>    JobFS requested:    150.0GB                JobFS used: 941.47MB
> ======================================================================================
>
> Any ideas?  Anyone seen this before?
>
> Many Thanks,
> Seth
> ===========================
> Seth Olsen, PhD.
> Honorary Fellow
> School of Mathematics & Physics
> The University of Queensland
> QLD 4072  Australia
> Ph: +61 7 3365 2816
> ===========================
> A PGP public key for this address has been uploaded to the key servers.
>
>
> _______________________________________________
> Molpro-user mailing list
> Molpro-user at molpro.net
> http://www.molpro.net/mailman/listinfo/molpro-user
>