[molpro-user] MPI parallel jobs over TCP/IP

Kirk Peterson kipeters at wsu.edu
Mon Nov 7 22:02:31 GMT 2005


I'm hoping someone has run into this problem too and has found where
it lies.  We have a small Opteron cluster consisting of 5 dual-processor
nodes with a simple GigE network.  While parallel Molpro built with
tcgmsg works OK as long as we run just 2-way parallel on a single node,
we often run into problems if we run large jobs across nodes.  To
perhaps bypass this, we wanted to build a version of Molpro using an
MPI implementation.  With the latest GA tools (3-4b) and MPICH
(1.2.7p1), the standard Molpro testjobs work just fine.
The problem occurs for large open-shell CCSD(T) jobs where the
amount of GA memory gets large (~500 MB).  (Note that large MRCI jobs
seem to work fine.)  For example, if we modify the standard Molpro
benchmark normal_ccsd.com by removing the MP4 step and replacing CCSD
by UCCSD(T) and then run this across 2 nodes, the CCSD energy is
correct, but the contribution due to triples is wrong by
many mEh.  I've tried the same job on a Myrinet-based Opteron
cluster (similar build, but of course with the Myrinet-based MPICH
software) and it worked just fine, so it's not anything intrinsic to
the Opteron (I think).

I'd really appreciate any help.

best regards,


Kirk A. Peterson
Professor of Chemistry and Materials Science
Washington State University
Pullman, WA 99164-4630

Office: (509) 335-7867
Fax:    (509) 335-8867
kipeters at wsu.edu

