[molpro-user] MPI parallel jobs over TCP/IP

Kirk Peterson kipeters at wsu.edu
Mon Nov 7 22:02:31 GMT 2005


Hi,

I'm hoping someone has run into this problem too and has found where
the cause lies. We have a small Opteron cluster consisting of 5
dual-processor nodes with a simple GigE network. While parallel Molpro
built with tcgmsg works fine as long as we run just 2-way parallel on
a single node, we often run into problems when we run large jobs
across nodes. To try to bypass this, we wanted to build a version of
Molpro using an MPI implementation. With the latest GA tools (3-4b)
and MPICH (1.2.7p1), the standard Molpro test jobs work just fine.
The problem occurs for large open-shell CCSD(T) jobs where the
amount of GA memory gets large (~500 MB). (Note that large MRCI jobs
seem to work fine.) For example, if we modify the standard Molpro
benchmark normal_ccsd.com by removing the MP4 step and replacing CCSD
by UCCSD(T), and then run this across 2 nodes, the CCSD energy is
correct, but the triples contribution is wrong by many mEh. I've
tried the same job on a Myrinet-based Opteron cluster (similar build,
but of course with the Myrinet-based MPICH software) and it worked
just fine, so it's not anything intrinsic to the Opteron (I think).

I'd really appreciate any help.

best regards,

Kirk



--------------------------------------------
Kirk A. Peterson
Professor of Chemistry and Materials Science
Washington State University
Pullman, WA 99164-4630

Office: (509) 335-7867
Fax:    (509) 335-8867
kipeters at wsu.edu
http://tyr0.chem.wsu.edu/~kipeters/
------------------------------------------------------------------------



