[molpro-user] MolPro problem in HEAD NODE of CLUSTER

Jeff Hammond jhammond at alcf.anl.gov
Thu May 30 01:46:21 BST 2013


ListenAndAccept is part of the TCGMSG messaging library inside of
Global Arrays.  It is likely that the login node doesn't have the same
network view as the compute nodes, so the connection it is waiting for
never arrives.  This is quite common.  To work around it, build Molpro
separately for the login and compute nodes.  If that doesn't work, then
the problem is due to how your login node is configured.

Also, Andy is absolutely right that you should talk to your sysadmin.
This is a local problem related to how your machine is configured and
really has nothing to do with Molpro itself.  A competent sysadmin
will be able to figure out what connection is different and/or blocked
on the login node.

You might want to run hostname (usually /bin/hostname or
/usr/bin/hostname) on the compute nodes and on the login node and
compare the output to see what is different.
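
If the hostname output alone doesn't show the difference, a slightly
deeper check is to see whether the name a node reports for itself
resolves to an address on which that node can both listen and accept a
connection.  The Python sketch below does that.  It is only an
illustration: the function name check_hostname and the report keys are
made up for this example, and it is not what TCGMSG does internally, just
a similar listen/connect round trip.

```python
import socket


def check_hostname(name=None, timeout=5.0):
    """Resolve `name` (default: this machine's hostname) and try a
    listen/connect round trip on the resolved address.

    Hypothetical helper for illustration; not part of Molpro or TCGMSG.
    """
    name = name or socket.gethostname()
    report = {"hostname": name, "fqdn": socket.getfqdn(name)}

    # Step 1: does the name resolve at all (DNS or /etc/hosts)?
    try:
        addr = socket.gethostbyname(name)
    except OSError as exc:
        report["error"] = f"cannot resolve: {exc}"
        return report
    report["address"] = addr

    # Step 2: can this machine listen on that address and connect to it?
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
            srv.bind((addr, 0))  # port 0: let the OS pick a free port
            srv.listen(1)
            srv.settimeout(timeout)
            port = srv.getsockname()[1]
            with socket.create_connection((addr, port), timeout=timeout):
                conn, _ = srv.accept()
                conn.close()
        report["connect_ok"] = True
    except OSError as exc:
        report["connect_ok"] = False
        report["error"] = f"listen/connect failed: {exc}"
    return report


if __name__ == "__main__":
    for key, value in check_hostname().items():
        print(f"{key}: {value}")
```

Run it on the login node and on one compute node and compare the
reports.  A resolution error, or a bind failure because the name
resolves to an address the node does not own, would point at the
hostname or /etc/hosts setup rather than at Molpro.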

Best,

Jeff

On Wed, May 29, 2013 at 5:59 PM, Andy May <MayAJ1 at cardiff.ac.uk> wrote:
>
> If nothing Molpro related has changed it seems this is due to a change in
> the system. Perhaps some block has been put in to stop jobs being run on the
> head node, or perhaps some network related change has been made. The
> 'waiting for connection' message could be something like waiting for ssh
> password or trying to connect to an incorrect hostname, maybe /etc/hosts
> does not contain the system hostname.
>
> I suggest contacting the sysadmin to find out about any recent changes.
>
> Best wishes,
>
> Andy
>
> On 25/05/13 19:00, SAIKAT MUKHERJEE wrote:
>>
>>
>> Hello Developers and Users
>>
>> We installed Molpro 2010 on our IBM cluster a long time ago,
>> and it had been running very well in parallel on the cluster.
>> The cluster has one head node and three compute nodes.
>>
>> Recently, no Molpro job (serial or parallel) runs on the
>> head node, although jobs run on the compute nodes
>> without any problem.
>>
>> In the head node, after submitting a job
>>
>>    $ molpro abc.molpro &
>>
>> no out or xml file is produced, and no job is shown in the job list.
>>
>> I am giving the description of submitting a job below:
>>
>> *******************************************************
>> $ molpro abc.molpro &
>> [1] 4232
>> [molpro@cluster test]$ tmp = /export/home/molpro/pdir//usr/local/bin/molpro_bin/molprop_2010_1_Linux_x86_64_i8.exe.p
>>   Creating: host=cluster.hpc.org, user=molpro, file=/usr/local/bin/molpro_bin/molprop_2010_1_Linux_x86_64_i8.exe, port=37632
>> /usr/local/bin/molpro_bin/molprop_2010_1_Linux_x86_64_i8.exe, len=60
>> abc.molpro, len=10
>>     -master, len=7
>> cluster.hpc.org, len=15
>>       37632, len=5
>>           1, len=1
>>           1, len=1
>>           0, len=1
>>           0, len=1
>> *************************************************************
>>
>> After a long time it ends up with this error:
>>
>> 1: ListenAndAccept: timeout waiting for connection 0 (0).
>>    1: ListenAndAccept: timeout waiting for connection 0 (0).
>>    0: interrupt(1)
>>
>>
>> I noticed that in the /tmp directory no account is created.
>>
>> Please guide me to resolve the issue.
>>
>> ---------------------------------------------------------
>> SAIKAT MUKHERJEE
>> Junior Research Fellow (INSPIRE Fellow)
>> Dr. Satrajit Adhikari Group
>> Physical Chemistry Department
>> I.A.C.S; Jadavpur
>> Kolkata - 700032
>>
>>
>>
>>
>> _______________________________________________
>> Molpro-user mailing list
>> Molpro-user at molpro.net
>> http://www.molpro.net/mailman/listinfo/molpro-user
>>



-- 
Jeff Hammond
Argonne Leadership Computing Facility
University of Chicago Computation Institute
jhammond at alcf.anl.gov / (630) 252-5381
http://www.linkedin.com/in/jeffhammond
https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond
ALCF docs: http://www.alcf.anl.gov/user-guides


