66.2.1 Description and specification of databases

A database is an XML file conforming to the molpro-database schema. It consists of one or more occurrences of each of the following two principal elements.
molecule
Information about a single molecular species in the molpro-output XML format. This will usually be the result of PUT,XML in a MOLPRO calculation, but can also be constructed directly from an external data source. The important quantities that are used are the geometry and energy, together with metadata such as the method and basis set, and other quantities such as spin and symmetry that might be useful for constructing a new MOLPRO job for the molecule.
reaction
A list of species specifications, each of which points uniquely to one of the molecule nodes. Each specification also records how the species appears stoichiometrically in the reaction, and whether it is a special point such as a transition state. Species specifications can also be given without either of these tags. This allows additional geometries to be included, for example along a reaction coordinate or a potential surface cut.
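To make the roles of the two elements concrete, here is a minimal Python sketch of how a reaction's energy change might be computed once each species has been resolved to its molecule node. The Species class and its field names are hypothetical and do not come from the molpro-database schema. The sign convention for stoichiometric coefficients is also an assumption: negative for reactants, positive for products. Species that carry no coefficient, such as extra points along a reaction coordinate, are skipped.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Species:
    """Hypothetical in-memory view of one species entry in a reaction."""
    name: str
    energy: float                  # total energy (e.g. hartree) taken from the molecule node
    stoich: Optional[int] = None   # assumed sign convention: < 0 reactant, > 0 product
    transition_state: bool = False


def reaction_energy(species: List[Species]) -> float:
    """Sum of stoich * energy over the species that carry a coefficient."""
    return sum(s.stoich * s.energy for s in species if s.stoich is not None)


def barrier_height(species: List[Species]) -> Optional[float]:
    """Energy of the first transition state relative to the summed reactants, or None."""
    ts = [s for s in species if s.transition_state]
    if not ts:
        return None
    reactants = sum(-s.stoich * s.energy
                    for s in species if s.stoich is not None and s.stoich < 0)
    return ts[0].energy - reactants


if __name__ == "__main__":
    # Made-up energies, chosen only to make the arithmetic easy to follow.
    rxn = [
        Species("A", -1.0, stoich=-1),
        Species("B", -2.0, stoich=-1),
        Species("AB", -3.5, stoich=1),
        Species("TS", -2.75, transition_state=True),
        Species("scan point", -2.9),   # no tags: extra geometry, ignored by both functions
    ]
    print(reaction_energy(rxn))  # -0.5
    print(barrier_height(rxn))   # 0.25
```

In this design, the untagged "scan point" stays in the same list as the other species but contributes to neither quantity. This mirrors the way such geometries can be stored in a reaction without being part of its stoichiometry.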
Normally, the molecule nodes are kept in separate self-contained files, which the main database file references through XInclude. There are three reasons for this. Firstly, these files can be produced directly by a MOLPRO calculation, while the rest of the database is constructed by hand. Secondly, the molecule files can later be replaced without changing the rest of the database, i.e. the reaction specifications. For example, all the molecule calculations might be repeated with a different method. This makes it possible to have several databases with the same structure (the same reaction specifications) but different numerical data, which can therefore be compared numerically. Thirdly, several databases can coexist in the same directory and share some of the same molecule files. An example is a supplementary database consisting of a subset of the reactions in the main database.
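The XInclude mechanism can be illustrated with Python's standard library. The sketch below parses a database file and replaces each xi:include element with the root element of the referenced molecule file. Relative paths are resolved against the database file's own directory. It is a minimal illustration, not how the database utilities work internally. The element names database, molecule and reaction in the demo are simplified and un-namespaced, whereas real files use the molpro-database and molpro-output vocabularies.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude


def load_database(path):
    """Parse a database file and expand its xi:include elements in place."""
    base = os.path.dirname(os.path.abspath(path))

    def loader(href, parse, encoding=None):
        # Resolve href relative to the database file rather than the current directory.
        full = os.path.join(base, href)
        if parse == "xml":
            return ET.parse(full).getroot()
        with open(full, encoding=encoding or "utf-8") as f:
            return f.read()

    root = ET.parse(path).getroot()
    ElementInclude.include(root, loader=loader)
    return root


def demo():
    """Build a two-file toy database in a temporary directory and load it."""
    with tempfile.TemporaryDirectory() as d:
        with open(os.path.join(d, "h2o.xml"), "w") as f:
            f.write('<molecule name="H2O"/>')
        with open(os.path.join(d, "db.xml"), "w") as f:
            f.write('<database xmlns:xi="http://www.w3.org/2001/XInclude">'
                    '<xi:include href="h2o.xml"/><reaction/></database>')
        root = load_database(os.path.join(d, "db.xml"))
        return [child.tag for child in root]


if __name__ == "__main__":
    print(demo())  # ['molecule', 'reaction']
```

The standard-library module supports XInclude only in limited form. With lxml installed, calling xinclude() on a tree returned by etree.parse() performs the same expansion using libxml2's fuller implementation.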

An example of a complete database of four reactions involving the species $\rm O, H_2, H_2O, H_2O_2$ and $\rm CH_2O$ is given in the file

../database/sets/examples/reactions/reactions.xml

The species are associated with the molpro-output:molecule nodes through InChI (http://www.iupac.org/home/publications/e-resources/inchi.html) tags. PUT,XML produces these tags provided that OpenBabel (http://openbabel.org) is installed on the system. Alternatively, a molecule can be given an explicit index with syntax such as

{PUT,XML,file.xml; index,73}

and then referenced by <species index="73"> in the database file. Different species sometimes share the same InChI, and in that case the index is needed to resolve the ambiguity.
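The lookup rule described above can be sketched as follows. Each molecule is represented here as a plain dictionary with the hypothetical keys index, inchi and label. Two further choices are assumptions made for this sketch: an explicit index takes precedence over the InChI, and any reference that does not match exactly one molecule is reported as an error. The two oxygen entries show why an index can be necessary. The ground (3P) and excited (1D) states of the O atom share the InChI InChI=1S/O, so only the index distinguishes them.

```python
from typing import List, Optional


def resolve_species(molecules: List[dict], inchi: Optional[str] = None,
                    index: Optional[int] = None) -> dict:
    """Return the unique molecule referred to by a species specification."""
    if index is not None:
        # Assumed precedence: an explicit index overrides any InChI.
        matches = [m for m in molecules if m.get("index") == index]
    elif inchi is not None:
        matches = [m for m in molecules if m.get("inchi") == inchi]
    else:
        raise ValueError("species needs an InChI or an index")
    if len(matches) != 1:
        raise LookupError(
            f"{len(matches)} molecules match inchi={inchi!r}, index={index!r}")
    return matches[0]


# Hypothetical molecule records, e.g. gathered from the molecule nodes.
MOLECULES = [
    {"index": 1, "inchi": "InChI=1S/H2O/h1H2", "label": "H2O"},
    {"index": 73, "inchi": "InChI=1S/O", "label": "O(3P)"},
    {"index": 74, "inchi": "InChI=1S/O", "label": "O(1D)"},
]

if __name__ == "__main__":
    print(resolve_species(MOLECULES, inchi="InChI=1S/H2O/h1H2")["label"])  # H2O
    print(resolve_species(MOLECULES, index=73)["label"])                   # O(3P)
    try:
        resolve_species(MOLECULES, inchi="InChI=1S/O")
    except LookupError as err:
        print("ambiguous:", err)
```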

For full specification of the possible structure of a database, see the schema file

molpro-database.xsd

The directory database/utilities contains several Python scripts that manipulate databases. For convenience, they can be run through the script

molpro --database script-name arguments ...

provided that Python (version 3 preferred) is installed on the system. The lxml and requests packages must also be available in your Python installation:

pip install lxml requests
(or pip3 if you are using Python 3).
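A small convenience check like the following can confirm that both packages are visible to a given Python installation. It is a sketch for the reader's own use and is not part of the Molpro distribution.

```python
import importlib.util

REQUIRED = ("lxml", "requests")


def missing_packages(names=REQUIRED):
    """Return the names that this interpreter's import system cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("missing packages:", ", ".join(missing))
        print("install with: pip install " + " ".join(missing))
    else:
        print("lxml and requests are available")
```

Run it with the same interpreter that will execute the database scripts. For top-level names, find_spec only locates a package and does not import it.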

The script validate checks whether a database conforms to the schema, for example

cd Molpro  # the path below assumes the Molpro source tree; the command works from any directory
bin/molpro --database validate \
 database/sets/examples/reactions/reactions.xml
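The steps in a schema check can be sketched with lxml, which is already a prerequisite. This is not the code of the validate script. In particular, expanding XIncludes before validation is an assumption made here so that the included molecule files are checked together with the main file. The hypothetical function schema_errors returns the validator's messages, or an empty list for a valid file.

```python
import sys

from lxml import etree


def schema_errors(db_path, xsd_path):
    """Validate db_path against xsd_path; return a list of 'line: message' strings."""
    doc = etree.parse(db_path)
    doc.xinclude()  # assumption: splice in XIncluded molecule files before validating
    schema = etree.XMLSchema(etree.parse(xsd_path))
    if schema.validate(doc):
        return []
    return [f"{err.line}: {err.message}" for err in schema.error_log]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python check_schema.py reactions.xml molpro-database.xsd
    errors = schema_errors(sys.argv[1], sys.argv[2])
    print("valid" if not errors else "\n".join(errors))
    sys.exit(1 if errors else 0)
```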

molpro@molpro.net 2018-12-10