Post-processing of output and databases
Output
Molpro produces an output file that is in XML format with appropriate mark-up for all important results. This file, which has a .xml
suffix, conforms strictly to a well-defined schema, and the the schema definition file can be found in the main Molpro source or installation tree in the directory lib/schema
. The principal elements of marked-up output are
- jobstep The results from one job step.
- molecule A container for data on a single molecule.
- cml:molecule Molecular geometry in the Chemical Markup Language (CML) format.
- property A computed property, for example an energy or dipole moment.
- table The output of Molpro’s
TABLE
command in XHTML format. - basisSet A self-contained description of the orbital basis set.
- orbitals A set of orbitals.
- vibrations Harmonic normal vibrational modes.
- variables Molpro’s internal variables.
- platform Information about the computing system on which a job was run.
Not all of these elements are produced by default in the regular job transcript .xml
file; some of them can result from using the PUT,XML
to make a separate dump file.
molpro-output
is understood by several post-processing programs, including Jmol, sjef and gmolpro.
The following gives an example of extracting and manipulating the details of the wavefunction using Python.
- examples/basis-from-xml.inp
geometry={o;h,o,r;h,o,r,h,theta} r=1 angstrom, theta=104 degree basis,cc-pvqz rhf {casscf;closed,2;expec,delta,o;expec,delta,h} put,xml,h2ocasscf.xml if (.not.iswindows) then system,'./evaluate-charge-density.py',' h2ocasscf.xml',' > basis-from-xml.log' endif
- examples/evaluate-charge-density.py
#!/usr/bin/env python import sys import math import re from lxml import etree from numpy import * import numpy as np tree = etree.parse(sys.argv[1]) document = tree.getroot() namespaces={ 'molpro-output': 'http://www.molpro.net/schema/molpro-output', 'xsd': 'http://www.w3.org/1999/XMLSchema', 'cml': 'http://www.xml-cml.org/schema', 'stm': 'http://www.xml-cml.org/schema', 'xhtml': 'http://www.w3.org/1999/xhtml', 'xlink': 'http://www.w3.org/1999/xlink'} def evaluateBasis(molecule,points): # evaluate basis set at a number of points cartesianAngularQuantumNumbers=array( [[0, 1,0,0, 2,0,0,1,1,0, 3,0,0,2,2,1,0,1,0,1, 4,0,0,3,3,1,0,1,0,2,2,0,2,1,1, 5,4,4,3,3,3,2,2,2,2,1,1,1,1,1,0,0,0,0,0,0, 6,5,5,4,4,4,3,3,3,3,2,2,2,2,2,1,1,1,1,1,1,0,0,0,0,0,0,0], [0, 0,1,0, 0,2,0,1,0,1, 0,3,0,1,0,2,2,0,1,1, 0,4,0,1,0,3,3,0,1,2,0,2,1,2,1, 0,1,0,2,1,0,3,2,1,0,4,3,2,1,0,5,4,3,2,1,0, 0,1,0,2,1,0,3,2,1,0,4,3,2,1,0,5,4,3,2,1,0,6,5,4,3,2,1,0], [0, 0,0,1, 0,0,2,0,1,1, 0,0,3,0,1,0,1,2,2,1, 0,0,4,0,1,0,1,3,3,0,2,2,1,1,2, 0,0,1,0,1,2,0,1,2,3,0,1,2,3,4,0,1,2,3,4,5, 0,0,1,0,1,2,0,1,2,3,0,1,2,3,4,0,1,2,3,4,5,0,1,2,3,4,5,6]]) normfac=array([1, 1,1,1, 3,3,3,1,1,1, 15,15,15,3,3,3,3,3,3,1, 105,105,105,15,15,15,15,15,15,9,9,9,1,1,1, 945,105,105,45,15,45,45,9,9,45,105,15,9,15,105,945,105,45,45,105,945]) sphtran=[] sphtran.append(array([[1]])) sphtran.append(array([[1,0,0],[0,1,0],[0,0,1]])) sphtran.append(array([[-0.5,-0.5,1,0,0,0],[1,-1,0,0,0,0],[0,0,0,0,1,0],[0,0,0,1,0,0],[0,0,0,0,0,1]])) for answer in molecule.xpath('//molpro-output:variables/molpro-output:variable[@name="_ANGSTROM"]/molpro-output:value',namespaces=namespaces): Angstrom=np.float64(answer.text) atoms = molecule.xpath('//cml:atomArray/cml:atom',namespaces=namespaces) basisSets = molecule.xpath('molpro-output:basisSet[@id="ORBITAL"]',namespaces=namespaces) if len(basisSets) != 1: raise Exception('something is wrong: there should be just one orbital basisSet') basisSet = basisSets[0] print('Basis length =',basisSet.get('length'),' angular =',basisSet.get('angular'),' type =',basisSet.get('type'),' groups =',basisSet.get('groups')) basisGroups = basisSet.xpath('molpro-output:basisGroup',namespaces=namespaces) # print #print(str(len(basisGroups))+' basisGroups:') #for basisGroup in basisGroups: # print(basisGroup.get('id')+' '+basisGroup.get('minL')+' '+basisGroup.get('maxL')+' '+basisGroup.get('primitives')+' '+basisGroup.get('angular')+' '+basisGroup.get('contractions')) #print # now walk through basis set and evaluate it at some points basisAtPoints=empty((0,len(points)),dtype=np.float64) nbasis=0 iatom=0 for atom in atoms: nuclearCoordinates=[np.float64(atom.get('x3'))*Angstrom,np.float64(atom.get('y3'))*Angstrom,np.float64(atom.get('z3'))*Angstrom] query = 'molpro-output:association/molpro-output:atoms[@xlink:href[contains(.,"@id=\'' + atom.get('id') + '\'")]]/..' basisGroupAssociation = basisSet.xpath(query,namespaces=namespaces) if len(basisGroupAssociation) != 1: raise Exception('something is wrong: there should be a unique association of an atom with a basis set') bases=basisGroupAssociation[0].xpath('molpro-output:bases',namespaces=namespaces) if len(bases) != 1: raise Exception('something is wrong: there should be a bases node in association') basesString=bases[0].get('{'+namespaces['xlink']+'}href') basesString=basesString[basesString.find('basisGroup['):] basesString=basesString[basesString.find('[')+1:].lstrip() basesString=basesString[:basesString.find(']')].rstrip() basesString=basesString.replace('or','').replace('\n','').replace("'",'') list = basesString.split("@id="); for item in list: item=item.lstrip().rstrip() if item.isalnum() : basisGroup=basisSet.xpath('molpro-output:basisGroup[@id="'+item+'"]',namespaces=namespaces)[0] lquant =int(basisGroup.get('minL')) if lquant > 5: raise Exception("Sorry, I was too lazy to write this for i basis functions and higher") if lquant != int(basisGroup.get('maxL')): raise Exception("This program cannot handle multiple-angular momentum sets") #print(basisGroup.get('id')+' '+basisGroup.get('minL')+' '+basisGroup.get('maxL')+' '+basisGroup.get('primitives')+' '+basisGroup.get('angular')+' '+basisGroup.get('contractions')) alpha=np.float64(re.sub(' +',' ',basisGroup.xpath('molpro-output:basisExponents',namespaces=namespaces)[0].text.replace('\n','').lstrip().rstrip()).split(" ")) #first evaluate all the primitives for this shell on the grid ncompc=int(((lquant+2)*(lquant+1))/2) # number of cartesian components primitivesc = empty((ncompc,len(alpha),len(points))) lqbase=int((lquant*(lquant+1)*(lquant+2))/6) for ip in arange(len(points)): xyz = np.subtract(points[ip],nuclearCoordinates) r2 = np.dot(xyz,xyz) for ia in arange(len(alpha)): alph = alpha[ia] norm = sqrt(sqrt(2*alph/math.pi)**3*(4*alph)**lquant) value = norm * math.exp(-alph*r2) for icomp in range(ncompc): k=cartesianAngularQuantumNumbers[0,icomp+lqbase] l=cartesianAngularQuantumNumbers[1,icomp+lqbase] m=cartesianAngularQuantumNumbers[2,icomp+lqbase] primitivesc[icomp,ia,ip] = value * xyz[0]**k * xyz[1]**l * xyz[2]**m/math.sqrt(normfac[icomp+lqbase]) # transformation to spherical harmonics if molecule.xpath('//molpro-output:orbitals',namespaces=namespaces)[0].get('angular')=='spherical': # Molpro 2012.1 does not produced this, but cartesian only. The spherical code here is not finished, but not presently needed ncomp=2*lquant+1 else: ncomp=ncompc if ncomp < ncompc and lquant >= len(sphtran): raise Exception("Spherical functions not yet coded") elif ncomp < ncompc: primitives = empty((ncomp,len(alpha),len(points))) print("primitivesc",primitivesc) print("sphtran[lquant]",sphtran[lquant]) for ip in arange(len(points)): #print sphtran[lquant] #print primitivesc[:,:,ip] #print sphtran[lquant].shape #print primitivesc[:,:,ip].shape primitives[:,:,ip] = dot(sphtran[lquant],primitivesc[:,:,ip]) #print "primitives",primitives else: primitives = primitivesc # next loop over contractions for basisContraction in basisGroup.xpath('molpro-output:basisContraction',namespaces=namespaces): cc=np.float64(re.sub(' +',' ',basisContraction.text.replace('\n','').lstrip().rstrip()).split(" ")) # loop over angular components in the contraction basisAtPoints=np.append(basisAtPoints,empty((ncomp,len(points)),dtype=np.float64),axis=0) for m in range(ncomp): basisAtPoints[nbasis][:]=dot(cc,primitives[m,:,:]) nbasis+=1 # completed evaluating this basis function iatom+=1 #print "basisAtPoints" #print basisAtPoints #print outer(basisAtPoints[:,1],basisAtPoints[:,1]) return basisAtPoints # end basis evaluation # the main program for answer in document.xpath('//molpro-output:variables/molpro-output:variable[@name="_ANGSTROM"]/molpro-output:value',namespaces=namespaces): Angstrom=np.float64(answer.text) molecules = document.xpath('/*/molpro-output:molecule',namespaces=namespaces) print(str(len(molecules))+' molecules:') for molecule in molecules: print('Molecule ' + molecule.get('id')) metadata = molecule.xpath('stm:metadataList/stm:metadata',namespaces=namespaces) print(str(len(metadata))+' metadata items:') for md in metadata: print(md.get('name') + ' = ' + md.get('content')) atoms = molecule.xpath('//cml:atomArray/cml:atom',namespaces=namespaces) points=zeros((len(atoms),3),dtype=np.float64) print(str(len(atoms))+' atoms:') ids=empty(len((atoms)),dtype='a4') elementTypes=empty((len(atoms)),dtype='a4') iatom=0 for atom in atoms: print(atom.get('id')+' '+atom.get('elementType')+' '+atom.get('x3')+' '+atom.get('y3')+' '+atom.get('z3')) ids[iatom]=atom.get('id') elementTypes[iatom]=atom.get('elementType') points[iatom,:]=[np.float64(atom.get('x3'))*Angstrom,np.float64(atom.get('y3'))*Angstrom,np.float64(atom.get('z3'))*Angstrom] iatom+=1 basisAtPoints = evaluateBasis(molecule,points) # evaluate the basis set at the nuclei results=zeros(len(points),dtype=np.float64) orbitalSets = molecule.xpath('molpro-output:orbitals',namespaces=namespaces) if len(orbitalSets) != 1: raise Exception('something is wrong: there should be just one orbital set') orbitalSet = orbitalSets[0] for orbital in orbitalSet.xpath('molpro-output:orbital',namespaces=namespaces): occ = np.float64(orbital.get('occupation')) if occ > 0: # print 'orbital occupation=',occ mos=array(re.sub(' +',' ',orbital.text.lstrip().rstrip().replace('\n','')).split(" ")) mopoint=zeros(len(points),dtype=np.float64) for ic in arange(len(basisAtPoints)): for ipoint in arange(len(points)): mopoint[ipoint]+=np.float64(mos[ic])*basisAtPoints[ic][ipoint] for ipoint in range(len(points)): results[ipoint]+=mopoint[ipoint]**2*occ print("Calculated densities at the nuclei:") print(results) for answer in molecule.xpath('molpro-output:variables/molpro-output:variable[contains(@name,"DELTA")]/molpro-output:value',namespaces=namespaces): name=answer.xpath('..',namespaces=namespaces)[0].get('name').replace('_','') print('From Molpro:',name,'=',answer.text) closeness=10000000000 for ipoint in range(len(points)): distance=math.sqrt((results[ipoint]-np.float64(answer.text))**2) if distance < closeness: closeness=distance closest=ipoint print('... nearest to result ',closest,' which differs by ',closeness) print('End of molecule '+molecule.get('id'))
Databases
A facility is provided to store and interrogate sets of molecules, together with information about how they are to be combined in balanced chemical equations. This collection of information is referred to as a database, and can be generated completely manually, or partially by running appropriate Molpro calculations. Analysis of the database can give a summary of the energy changes associated with each described reaction, and two or more similar databases can be compared reaction by reaction, to give a statistical analysis of the differences between them.
All of the files associated with the database facility can be found in the directory database
in the main Molpro source or installation tree.
Description and specification of databases
A database is an XML file conforming to the molpro-database
schema, and consists one or more occurrences of each of the following two principal elements.
- molecule Information about a single molecular species in the
molpro-output
XML format. This will usually be the result ofPUT,XML
in a Molpro calculation, but can also be constructed directly from an external data source. The important quantities that are used are the geometry and energy, together with metadata such as the method and basis set, and other quantities such as spin and symmetry that might be useful for constructing a new Molpro job for the molecule. - reaction A list of
species
specifications that point uniquely to one of themolecule
nodes, together with information on how the species appears stoichimetrically in the reaction, and whether it is a special point such as a transition state.species
specifications can also be given without either of these tags, allowing additional geometries, for example along a reaction coordinate or potential surface cut, to be included.
Normally, the molecule
nodes will be in separate self-contained files that are then referenced in the main database file through the syntax of XInclude. There are three reasons for this. Firstly, these files can be produced directly by a Molpro calculation, with the rest of the database being constructed by hand. Secondly, they allow the possibility that the molecule files be replaced in the future by, for example, running all the molecule calculations again using a different method; in that case, the rest of the database, i.e. the reaction specifications, does not need to change. This supports the possibility of having several databases that have the same structure – specification of reactions – but different numerical data, and therefore being capable of numerical comparison. Thirdly, several databases can coexist in the same directory, and share some of the same molecule files. An example of this is a supplementary database that consists of a subset of the reactions contained in the main database.
The following is an example of a complete database of four reactions involving the species $\rm O, H_2, H_2O, H_2O_2$ and $\rm CH_2O$. Note that the association between the species
and the molpro-output:molecule
nodes is achieved through the use of InChI tags, which PUT,XML
will produce provided that OpenBabel is installed on the system. An alternative is through syntax such as
{PUT,XML,file.xml; index,73}
and the use of <species index="73">
in the database file. Note that sometimes different species have the same InChI, and so the use of index
is necessary to resolve ambiguities.
- database/sets/examples/reactions/reactions.xml
<?xml version="1.0"?> <database xmlns="http://www.molpro.net/schema/molpro-database" xmlns:xi="http://www.w3.org/2001/XInclude"> <xi:include href="h2o.xml" /> <xi:include href="h2o2.xml"/> <xi:include href="h2.xml"/> <xi:include href="o2.xml"/> <xi:include href="co.xml"/> <xi:include href="h2co.xml"/> <xi:include href="h2cots.xml"/> <xi:include href="o.xml"/> <reaction title="insertion of O into H2"> <species InChI="InChI=1S/H2O/h1H2" count="1"/> <species InChI="InChI=1S/O" count="-1"/> <species InChI="InChI=1S/2H2/h2*1H" count="-1"/> </reaction> <reaction title="H2O formation"> <species InChI="InChI=1S/H2O/h1H2" count="1"/> <species InChI="InChI=1S/O2/c1-2" count="-0.5"/> <species InChI="InChI=1S/2H2/h2*1H" count="-1"/> </reaction> <reaction title="H2O2 formation"> <species InChI="InChI=1S/H2O2/c1-2/h1-2H" count="1"/> <species InChI="InChI=1S/O2/c1-2" count="-1"/> <species InChI="InChI=1S/2H2/h2*1H" count="-1"/> </reaction> <reaction title="Formaldehyde dissociation"> <species InChI="InChI=1S/CH2O/c1-2/h1H2" count="-1"/> <species InChI="InChI=1S/CO/c1-2" count="1"/> <species InChI="InChI=1S/2H2/h2*1H" count="1"/> <species InChI="InChI=1S/CHO.H/c1-2;/h1H;" transitionState="true"/> </reaction> </database>
For full specification of the possible structure of a database, see the schema file
Interrogation and manipulation of databases
The directory database/utilities
contains several Python scripts that manipulate databases. For convenience, they can be run through the script
molpro –database
script-name arguments …
so long as Python (version 3 preferred) is installed on the system. You need the lxml and requests package included in your Python installation:
pip install lxml requests
(or pip3
if you are using Python 3).
The script validate
checks whether a database conforms to the schema, for example
cd Molpro # assuming below that we are in Molpro source tree, but works from anywhere bin/molpro --database validate \ database/sets/examples/reactions/reactions.xml
Computation of new database data
The script clone
takes an existing database, for which the file name should be provided as an argument, and generates a set of Molpro jobs that will run the same method on each of the molecules, with the end result that a new database is created. If the database master file is the only one in the directory that declares itself to belong to the molpro-database
schema, then you can just give the directory name as the argument to this and other scripts instead. In addition, if the database master file has the suffix .xml
, the suffix does not need to be specified. For the above example, this could be
cd Molpro # assuming below that we are in Molpro source tree, but works from anywhere bin/molpro --database clone \ database/sets/examples/reactions
This will create a new directory reactions.d
with the following contents.
original/ reactions.xml runall/ procedures.molpro run/ reactions.d/original: co.xml h2co.xml h2o.xml o.xml h2.xml h2cots.xml h2o2.xml o2.xml reactions.d/run: co.molpro h2co.molpro h2o.molpro o.molpro h2.molpro h2cots.molpro h2o2.molpro o2.molpro reactions.d/runall: reactions.molpro
The file procedures.molpro
contains a procedure that will be run on every molecule, and it should be edited to use the desired methods. Then the calculations can be run, either via each of the individual Molpro input files in run/
, or the single input file reactions.molpro
in the directory runall
. Once these jobs have completed, then the directory contains a complete database with the original reaction scheme but new data.
Analysis of databases
cd Molpro bin/molpro --database analyse \ database/sets/examples/reactions/reactions.xml
will analyse the database, and report the energy change for each described reaction. If two or more databases are given as arguments, the analysis will be done on each, and also on the difference between the first database and the second and any subsequent, including a statistical summary. For the example given above, one might say
cd Molpro/reactions.d ../bin/molpro --database analyse original .
analyse
has a number of options that are described by running
molpro --database analyse --help
Library of databases
The directory database/sets
contains several standard databases. Within each one is a description of its origin, contents and purpose. The scripts described above can take these databases as inputs, for example database/sets/examples/reactions/reactions.xml
; as a shortcut, one could simply instead use examples/reactions
which will find the system database irrespective of the current working directory.