Post-processing of output and databases

Molpro produces an output file that is in XML format with appropriate mark-up for all important results. This file, which has a .xml suffix, conforms strictly to a well-defined schema, and the the schema definition file can be found in the main Molpro source or installation tree in the directory lib/schema. The principal elements of marked-up output are

jobstep The results from one job step.
molecule A container for data on a single molecule.
cml:molecule Molecular geometry in the Chemical Markup Language (CML) format.
property A computed property, for example an energy or dipole moment.
table The output of Molpro’s TABLE command in XHTML format.
basisSet A self-contained description of the orbital basis set.
orbitals A set of orbitals.
vibrations Harmonic normal vibrational modes.
variables Molpro’s internal variables.
platform Information about the computing system on which a job was run.

Not all of these elements are produced by default in the regular job transcript .xml file; some of them can result from using the PUT,XML to make a separate dump file.

molpro-output is understood by several post-processing programs, including Jmol, sjef and gmolpro.

The following gives an example of extracting and manipulating the details of the wavefunction using Python.

examples/basis-from-xml.inp

geometry={o;h,o,r;h,o,r,h,theta}
r=1 angstrom, theta=104 degree
basis,cc-pvqz
rhf
{casscf;closed,2;expec,delta,o;expec,delta,h}
put,xml,h2ocasscf.xml
if (.not.iswindows) then
system,'./evaluate-charge-density.py',' h2ocasscf.xml',' > basis-from-xml.log'
endif

examples/evaluate-charge-density.py

#!/usr/bin/env python
import sys
import math
import re
from lxml import etree
from numpy import *
import numpy as np

tree = etree.parse(sys.argv[1])
document = tree.getroot()
namespaces={
    'molpro-output': 'http://www.molpro.net/schema/molpro-output',
    'xsd': 'http://www.w3.org/1999/XMLSchema',
    'cml': 'http://www.xml-cml.org/schema',
    'stm': 'http://www.xml-cml.org/schema',
    'xhtml': 'http://www.w3.org/1999/xhtml',
    'xlink': 'http://www.w3.org/1999/xlink'}

def evaluateBasis(molecule,points): # evaluate basis set at a number of points
    cartesianAngularQuantumNumbers=array(
     [[0, 1,0,0, 2,0,0,1,1,0, 3,0,0,2,2,1,0,1,0,1, 4,0,0,3,3,1,0,1,0,2,2,0,2,1,1, 5,4,4,3,3,3,2,2,2,2,1,1,1,1,1,0,0,0,0,0,0, 6,5,5,4,4,4,3,3,3,3,2,2,2,2,2,1,1,1,1,1,1,0,0,0,0,0,0,0],
      [0, 0,1,0, 0,2,0,1,0,1, 0,3,0,1,0,2,2,0,1,1, 0,4,0,1,0,3,3,0,1,2,0,2,1,2,1, 0,1,0,2,1,0,3,2,1,0,4,3,2,1,0,5,4,3,2,1,0, 0,1,0,2,1,0,3,2,1,0,4,3,2,1,0,5,4,3,2,1,0,6,5,4,3,2,1,0],
      [0, 0,0,1, 0,0,2,0,1,1, 0,0,3,0,1,0,1,2,2,1, 0,0,4,0,1,0,1,3,3,0,2,2,1,1,2, 0,0,1,0,1,2,0,1,2,3,0,1,2,3,4,0,1,2,3,4,5, 0,0,1,0,1,2,0,1,2,3,0,1,2,3,4,0,1,2,3,4,5,0,1,2,3,4,5,6]])
    normfac=array([1, 1,1,1, 3,3,3,1,1,1, 15,15,15,3,3,3,3,3,3,1, 105,105,105,15,15,15,15,15,15,9,9,9,1,1,1, 945,105,105,45,15,45,45,9,9,45,105,15,9,15,105,945,105,45,45,105,945])
    sphtran=[]
    sphtran.append(array([[1]]))
    sphtran.append(array([[1,0,0],[0,1,0],[0,0,1]]))
    sphtran.append(array([[-0.5,-0.5,1,0,0,0],[1,-1,0,0,0,0],[0,0,0,0,1,0],[0,0,0,1,0,0],[0,0,0,0,0,1]]))

    for answer in molecule.xpath('//molpro-output:variables/molpro-output:variable[@name="_ANGSTROM"]/molpro-output:value',namespaces=namespaces):
        Angstrom=np.float64(answer.text)
    atoms = molecule.xpath('//cml:atomArray/cml:atom',namespaces=namespaces)

    basisSets = molecule.xpath('molpro-output:basisSet[@id="ORBITAL"]',namespaces=namespaces)
    if len(basisSets) != 1:
        raise Exception('something is wrong: there should be just one orbital basisSet')
    basisSet = basisSets[0]

    print('Basis length =',basisSet.get('length'),' angular =',basisSet.get('angular'),' type =',basisSet.get('type'),' groups =',basisSet.get('groups'))
    basisGroups = basisSet.xpath('molpro-output:basisGroup',namespaces=namespaces)
    #    print
        #print(str(len(basisGroups))+' basisGroups:')
        #for basisGroup in basisGroups:
        #    print(basisGroup.get('id')+' '+basisGroup.get('minL')+' '+basisGroup.get('maxL')+' '+basisGroup.get('primitives')+' '+basisGroup.get('angular')+' '+basisGroup.get('contractions'))
        #print

   # now walk through basis set and evaluate it at some points
    basisAtPoints=empty((0,len(points)),dtype=np.float64)
    nbasis=0
    iatom=0
    for atom in atoms:
        nuclearCoordinates=[np.float64(atom.get('x3'))*Angstrom,np.float64(atom.get('y3'))*Angstrom,np.float64(atom.get('z3'))*Angstrom]
        query = 'molpro-output:association/molpro-output:atoms[@xlink:href[contains(.,"@id=\'' + atom.get('id') + '\'")]]/..'
        basisGroupAssociation = basisSet.xpath(query,namespaces=namespaces)
        if len(basisGroupAssociation) != 1:
            raise Exception('something is wrong: there should be a unique association of an atom with a basis set')
        bases=basisGroupAssociation[0].xpath('molpro-output:bases',namespaces=namespaces)
        if len(bases) != 1:
            raise Exception('something is wrong: there should be a bases node in association')
        basesString=bases[0].get('{'+namespaces['xlink']+'}href')
        basesString=basesString[basesString.find('basisGroup['):]
        basesString=basesString[basesString.find('[')+1:].lstrip()
        basesString=basesString[:basesString.find(']')].rstrip()
        basesString=basesString.replace('or','').replace('\n','').replace("'",'')
        list = basesString.split("@id=");
        for item in list:
            item=item.lstrip().rstrip()
            if item.isalnum() :
                basisGroup=basisSet.xpath('molpro-output:basisGroup[@id="'+item+'"]',namespaces=namespaces)[0]
                lquant =int(basisGroup.get('minL'))
                if lquant > 5:
                    raise Exception("Sorry, I was too lazy to write this for i basis functions and higher")
                if lquant != int(basisGroup.get('maxL')):
                    raise Exception("This program cannot handle multiple-angular momentum sets")
                #print(basisGroup.get('id')+' '+basisGroup.get('minL')+' '+basisGroup.get('maxL')+' '+basisGroup.get('primitives')+' '+basisGroup.get('angular')+' '+basisGroup.get('contractions'))
                alpha=np.float64(re.sub(' +',' ',basisGroup.xpath('molpro-output:basisExponents',namespaces=namespaces)[0].text.replace('\n','').lstrip().rstrip()).split(" "))
                #first evaluate all the primitives for this shell on the grid
                ncompc=int(((lquant+2)*(lquant+1))/2) # number of cartesian components
                primitivesc = empty((ncompc,len(alpha),len(points)))
                lqbase=int((lquant*(lquant+1)*(lquant+2))/6)
                for ip in arange(len(points)):
                    xyz = np.subtract(points[ip],nuclearCoordinates)
                    r2 = np.dot(xyz,xyz)
                    for ia in arange(len(alpha)):
                        alph = alpha[ia]
                        norm = sqrt(sqrt(2*alph/math.pi)**3*(4*alph)**lquant)
                        value = norm * math.exp(-alph*r2)
                        for icomp in range(ncompc):
                            k=cartesianAngularQuantumNumbers[0,icomp+lqbase]
                            l=cartesianAngularQuantumNumbers[1,icomp+lqbase]
                            m=cartesianAngularQuantumNumbers[2,icomp+lqbase]
                            primitivesc[icomp,ia,ip] = value * xyz[0]**k * xyz[1]**l * xyz[2]**m/math.sqrt(normfac[icomp+lqbase])

                # transformation to spherical harmonics
                if molecule.xpath('//molpro-output:orbitals',namespaces=namespaces)[0].get('angular')=='spherical': # Molpro 2012.1 does not produced this, but cartesian only. The spherical code here is not finished, but not presently needed
                    ncomp=2*lquant+1
                else:
                    ncomp=ncompc
                if ncomp < ncompc and lquant >= len(sphtran):
                    raise Exception("Spherical functions not yet coded")
                elif ncomp < ncompc:
                    primitives = empty((ncomp,len(alpha),len(points)))
                    print("primitivesc",primitivesc)
                    print("sphtran[lquant]",sphtran[lquant])
                    for ip in arange(len(points)):
                        #print sphtran[lquant]
                        #print primitivesc[:,:,ip]
                        #print sphtran[lquant].shape
                        #print primitivesc[:,:,ip].shape
                        primitives[:,:,ip] = dot(sphtran[lquant],primitivesc[:,:,ip])
                    #print "primitives",primitives
                else:
                    primitives = primitivesc

                # next loop over contractions
                for basisContraction in basisGroup.xpath('molpro-output:basisContraction',namespaces=namespaces):
                    cc=np.float64(re.sub(' +',' ',basisContraction.text.replace('\n','').lstrip().rstrip()).split(" "))
                    # loop over angular components in the contraction
                    basisAtPoints=np.append(basisAtPoints,empty((ncomp,len(points)),dtype=np.float64),axis=0)
                    for m in range(ncomp):
                        basisAtPoints[nbasis][:]=dot(cc,primitives[m,:,:])
                        nbasis+=1 # completed evaluating this basis function
        iatom+=1
    #print "basisAtPoints"
    #print basisAtPoints
    #print outer(basisAtPoints[:,1],basisAtPoints[:,1])
    return basisAtPoints # end basis evaluation


# the main program

for answer in document.xpath('//molpro-output:variables/molpro-output:variable[@name="_ANGSTROM"]/molpro-output:value',namespaces=namespaces):
    Angstrom=np.float64(answer.text)

molecules = document.xpath('/*/molpro-output:molecule',namespaces=namespaces)
print(str(len(molecules))+' molecules:')
for molecule in molecules:
    print('Molecule ' + molecule.get('id'))


    metadata = molecule.xpath('stm:metadataList/stm:metadata',namespaces=namespaces)
    print(str(len(metadata))+' metadata items:')
    for md in metadata:
        print(md.get('name') + ' = ' + md.get('content'))

    atoms = molecule.xpath('//cml:atomArray/cml:atom',namespaces=namespaces)
    points=zeros((len(atoms),3),dtype=np.float64)
    print(str(len(atoms))+' atoms:')
    ids=empty(len((atoms)),dtype='a4')
    elementTypes=empty((len(atoms)),dtype='a4')
    iatom=0
    for atom in atoms:
        print(atom.get('id')+' '+atom.get('elementType')+' '+atom.get('x3')+' '+atom.get('y3')+' '+atom.get('z3'))
        ids[iatom]=atom.get('id')
        elementTypes[iatom]=atom.get('elementType')
        points[iatom,:]=[np.float64(atom.get('x3'))*Angstrom,np.float64(atom.get('y3'))*Angstrom,np.float64(atom.get('z3'))*Angstrom]
        iatom+=1

    basisAtPoints = evaluateBasis(molecule,points) # evaluate the basis set at the nuclei

    results=zeros(len(points),dtype=np.float64)
    orbitalSets = molecule.xpath('molpro-output:orbitals',namespaces=namespaces)
    if len(orbitalSets) != 1:
        raise Exception('something is wrong: there should be just one orbital set')
    orbitalSet = orbitalSets[0]
    for orbital in orbitalSet.xpath('molpro-output:orbital',namespaces=namespaces):
        occ = np.float64(orbital.get('occupation'))
        if occ > 0:
            #        print 'orbital occupation=',occ
            mos=array(re.sub(' +',' ',orbital.text.lstrip().rstrip().replace('\n','')).split(" "))
            mopoint=zeros(len(points),dtype=np.float64)
            for ic in arange(len(basisAtPoints)):
                for ipoint in arange(len(points)):
                    mopoint[ipoint]+=np.float64(mos[ic])*basisAtPoints[ic][ipoint]
            for ipoint in range(len(points)):
                results[ipoint]+=mopoint[ipoint]**2*occ

    print("Calculated densities at the nuclei:")
    print(results)

    for answer in molecule.xpath('molpro-output:variables/molpro-output:variable[contains(@name,"DELTA")]/molpro-output:value',namespaces=namespaces):
        name=answer.xpath('..',namespaces=namespaces)[0].get('name').replace('_','')
        print('From Molpro:',name,'=',answer.text)
        closeness=10000000000
        for ipoint in range(len(points)):
            distance=math.sqrt((results[ipoint]-np.float64(answer.text))**2)
            if distance < closeness:
                closeness=distance
                closest=ipoint
        print('... nearest to result ',closest,' which differs by ',closeness)


    print('End of molecule '+molecule.get('id'))

A facility is provided to store and interrogate sets of molecules, together with information about how they are to be combined in balanced chemical equations. This collection of information is referred to as a database, and can be generated completely manually, or partially by running appropriate Molpro calculations. Analysis of the database can give a summary of the energy changes associated with each described reaction, and two or more similar databases can be compared reaction by reaction, to give a statistical analysis of the differences between them.

All of the files associated with the database facility can be found in the directory database in the main Molpro source or installation tree.

Description and specification of databases

A database is an XML file conforming to the molpro-database schema, and consists one or more occurrences of each of the following two principal elements.

molecule Information about a single molecular species in the molpro-output XML format. This will usually be the result of PUT,XML in a Molpro calculation, but can also be constructed directly from an external data source. The important quantities that are used are the geometry and energy, together with metadata such as the method and basis set, and other quantities such as spin and symmetry that might be useful for constructing a new Molpro job for the molecule.
reaction A list of species specifications that point uniquely to one of the molecule nodes, together with information on how the species appears stoichimetrically in the reaction, and whether it is a special point such as a transition state. species specifications can also be given without either of these tags, allowing additional geometries, for example along a reaction coordinate or potential surface cut, to be included.

Normally, the molecule nodes will be in separate self-contained files that are then referenced in the main database file through the syntax of XInclude. There are three reasons for this. Firstly, these files can be produced directly by a Molpro calculation, with the rest of the database being constructed by hand. Secondly, they allow the possibility that the molecule files be replaced in the future by, for example, running all the molecule calculations again using a different method; in that case, the rest of the database, i.e. the reaction specifications, does not need to change. This supports the possibility of having several databases that have the same structure – specification of reactions – but different numerical data, and therefore being capable of numerical comparison. Thirdly, several databases can coexist in the same directory, and share some of the same molecule files. An example of this is a supplementary database that consists of a subset of the reactions contained in the main database.

The following is an example of a complete database of four reactions involving the species $\rm O, H_2, H_2O, H_2O_2$ and $\rm CH_2O$. Note that the association between the species and the molpro-output:molecule nodes is achieved through the use of InChI tags, which PUT,XML will produce provided that OpenBabel is installed on the system. An alternative is through syntax such as

{PUT,XML,file.xml; index,73}

and the use of <species index="73"> in the database file. Note that sometimes different species have the same InChI, and so the use of index is necessary to resolve ambiguities.

database/sets/examples/reactions/reactions.xml

<?xml version="1.0"?>
<database xmlns="http://www.molpro.net/schema/molpro-database"
	  xmlns:xi="http://www.w3.org/2001/XInclude">
     <xi:include href="h2o.xml" />
     <xi:include href="h2o2.xml"/>
     <xi:include href="h2.xml"/>
     <xi:include href="o2.xml"/>
     <xi:include href="co.xml"/>
     <xi:include href="h2co.xml"/>
     <xi:include href="h2cots.xml"/>
     <xi:include href="o.xml"/>
     <reaction title="insertion of O into H2">
       <species InChI="InChI=1S/H2O/h1H2" count="1"/>
       <species InChI="InChI=1S/O" count="-1"/>
       <species InChI="InChI=1S/2H2/h2*1H" count="-1"/>
     </reaction>
     <reaction title="H2O formation">
       <species InChI="InChI=1S/H2O/h1H2" count="1"/>
       <species InChI="InChI=1S/O2/c1-2" count="-0.5"/>
       <species InChI="InChI=1S/2H2/h2*1H" count="-1"/>
     </reaction>
     <reaction title="H2O2 formation">
       <species InChI="InChI=1S/H2O2/c1-2/h1-2H" count="1"/>
       <species InChI="InChI=1S/O2/c1-2" count="-1"/>
       <species InChI="InChI=1S/2H2/h2*1H" count="-1"/>
     </reaction>
     <reaction title="Formaldehyde dissociation">
       <species InChI="InChI=1S/CH2O/c1-2/h1H2" count="-1"/>
       <species InChI="InChI=1S/CO/c1-2" count="1"/>
       <species InChI="InChI=1S/2H2/h2*1H" count="1"/>
       <species InChI="InChI=1S/CHO.H/c1-2;/h1H;"
		 transitionState="true"/>
     </reaction>
</database>

For full specification of the possible structure of a database, see the schema file

The directory database/utilities contains several Python scripts that manipulate databases. For convenience, they can be run through the script

molpro –database script-name arguments …

so long as Python (version 3 preferred) is installed on the system. You need the lxml and requests package included in your Python installation:

pip install lxml requests

(or pip3 if you are using Python 3).

The script validate checks whether a database conforms to the schema, for example

cd Molpro  # assuming below that we are in Molpro source tree, but works from anywhere
bin/molpro --database validate \
 database/sets/examples/reactions/reactions.xml

Computation of new database data

The script clone takes an existing database, for which the file name should be provided as an argument, and generates a set of Molpro jobs that will run the same method on each of the molecules, with the end result that a new database is created. If the database master file is the only one in the directory that declares itself to belong to the molpro-database schema, then you can just give the directory name as the argument to this and other scripts instead. In addition, if the database master file has the suffix .xml, the suffix does not need to be specified. For the above example, this could be

cd Molpro  # assuming below that we are in Molpro source tree, but works from anywhere
bin/molpro --database clone \
 database/sets/examples/reactions

This will create a new directory reactions.d with the following contents.

original/          reactions.xml      runall/
procedures.molpro  run/

reactions.d/original:
co.xml      h2co.xml    h2o.xml     o.xml
h2.xml      h2cots.xml  h2o2.xml    o2.xml

reactions.d/run:
co.molpro      h2co.molpro    h2o.molpro     o.molpro
h2.molpro      h2cots.molpro  h2o2.molpro    o2.molpro

reactions.d/runall:
reactions.molpro

The file procedures.molpro contains a procedure that will be run on every molecule, and it should be edited to use the desired methods. Then the calculations can be run, either via each of the individual Molpro input files in run/, or the single input file reactions.molpro in the directory runall. Once these jobs have completed, then the directory contains a complete database with the original reaction scheme but new data.

Analysis of databases

cd Molpro
bin/molpro --database analyse \
 database/sets/examples/reactions/reactions.xml

will analyse the database, and report the energy change for each described reaction. If two or more databases are given as arguments, the analysis will be done on each, and also on the difference between the first database and the second and any subsequent, including a statistical summary. For the example given above, one might say

cd Molpro/reactions.d
../bin/molpro --database analyse original .

analyse has a number of options that are described by running

molpro --database analyse --help

The directory database/sets contains several standard databases. Within each one is a description of its origin, contents and purpose. The scripts described above can take these databases as inputs, for example database/sets/examples/reactions/reactions.xml; as a shortcut, one could simply instead use examples/reactions which will find the system database irrespective of the current working directory.

Post-processing of output and databases

Output

Databases

Description and specification of databases

Interrogation and manipulation of databases

Computation of new database data

Analysis of databases

Library of databases