Getting started

Installation

We recommend installing oximachine_featurizer in a clean virtual environment (e.g., a conda environment). The latest stable release can be installed from the Python Package Index (PyPI):

pip install oximachine_featurizer

The development version can be installed directly from GitHub:

pip install git+https://github.com/kjappelbaum/oximachine_featurizer.git

Some parts of the code are accelerated with just-in-time (JIT) compilation using numba, which can benefit from a threading layer. You can enable the TBB threading layer with pip install tbb. If you do not, you might see warnings like The TBB threading layer requires TBB version 2019.5 or later.
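To see whether a pip-installed tbb distribution is visible to the current interpreter, a small standard-library check can help. This is a minimal sketch: the helper name installed_version is hypothetical and not part of oximachine_featurizer, and it only detects distributions that carry Python package metadata (a TBB library installed in some other way may not show up).

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Hypothetical usage: report whether `pip install tbb` has been run here.
tbb_version = installed_version("tbb")
if tbb_version is None:
    print("tbb not found; numba may warn about the TBB threading layer.")
else:
    print(f"tbb {tbb_version} is installed.")
```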

Featurizing a structure

To featurize one structure with the default options, you can use the following Python snippet:

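from pymatgen.core import Structure
from oximachine_featurizer import featurize

structure = Structure.from_file("examples/structures/ACODAA.cif")
X, metal_indices, metals = featurize(structure)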

Here, structure is a pymatgen.Structure object. Under the hood, this function calls two classes: GetFeatures, which computes all features that we considered during development, and FeatureCollector, which selects the relevant ones.
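The three return values can be inspected together. The sketch below assumes (this is not confirmed by the text above) that they are aligned, so that row i of X belongs to metals[i] at site metal_indices[i]. The helper pair_features_with_metals is hypothetical, and the sample values are made up so that the snippet runs without a structure file.

```python
from typing import Any, Dict, List, Sequence


def pair_features_with_metals(
    X: Sequence[Sequence[float]],
    metal_indices: Sequence[int],
    metals: Sequence[str],
) -> List[Dict[str, Any]]:
    """Pair each feature row with the site index and symbol of its metal.

    Assumption: the three inputs are aligned element by element.
    """
    if not (len(X) == len(metal_indices) == len(metals)):
        raise ValueError("X, metal_indices and metals must have the same length")
    return [
        {"site_index": int(idx), "metal": str(symbol), "n_features": len(row)}
        for row, idx, symbol in zip(X, metal_indices, metals)
    ]


# Made-up stand-in data; real values come from featurize(structure).
rows = pair_features_with_metals(
    [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]], [0, 5], ["Cu", "Cu"]
)
print(rows)
```

Because the helper only relies on len() and iteration, it should work for nested lists as well as for a two-dimensional numpy array.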

Alternatively, to featurize directly from the command line, use the following syntax:

run_featurization <structurefile> <outname>

For example,

run_featurization examples/structures/ACODAA.cif test.npy

This command line tool attempts to read structurefile using pymatgen and then writes the features as an npy file to outname. The numpy array in this file can be fed directly into the StandardScaler and VotingClassifier objects that can be created with the learnmofox Python package.
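To featurize a whole folder, the command line tool can be called once per file. The following is a minimal sketch, assuming that all structures are CIF files in one directory and that one npy file per structure is wanted. The helper names build_featurization_commands and run_all are hypothetical and not part of the package.

```python
import subprocess
from pathlib import Path
from typing import List


def build_featurization_commands(structure_dir: str, out_dir: str) -> List[List[str]]:
    """Build one `run_featurization <structurefile> <outname>` call per CIF file."""
    commands = []
    for cif in sorted(Path(structure_dir).glob("*.cif")):
        outname = Path(out_dir) / f"{cif.stem}.npy"
        commands.append(["run_featurization", str(cif), str(outname)])
    return commands


def run_all(commands: List[List[str]]) -> None:
    """Run the commands one after another and stop at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

Calling run_all(build_featurization_commands("examples/structures", "features")) would then process every CIF file in turn, provided the output directory exists. Each resulting file can be read back with numpy.load.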

Additional tools

Scripts that are prefixed with an underscore are part of the private API and may contain hard-coded paths. For example, _run_featurization_slurm_serial.py contains code that is specific to our cluster infrastructure.

Parsing the CSD

The GetOxStatesCSD class can be used to retrieve the oxidation states for a list of CSD identifiers. This feature requires a CSD license, and you need to export CSD_HOME for the CSD API to work.
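A missing environment variable is easier to diagnose when it is checked up front. The sketch below uses only the standard library; require_env is a hypothetical helper, not something the package provides.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name` or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before starting Python.")
    return value
```

Calling require_env("CSD_HOME") before creating a GetOxStatesCSD instance would fail early with a readable message instead of an error from deeper inside the CSD API.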

For example, you can use the following Python snippet:

from oximachine_featurizer.parse import GetOxStatesCSD
# names_cleaned is a list of CSD identifiers
getoxstates_instance = GetOxStatesCSD(names_cleaned)

outputdict = getoxstates_instance.run_parsing(njobs=4)

outputdict will be a nested dictionary of the form {'id': {'symbol': [oxidation states]}}.
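A nested dictionary of this form is often easier to work with as flat rows. The following sketch shows one way to do that; flatten_oxidation_states is a hypothetical helper, and the identifiers and oxidation states in the example are made-up placeholders in the documented shape.

```python
from typing import Dict, List, Tuple


def flatten_oxidation_states(
    outputdict: Dict[str, Dict[str, List[int]]]
) -> List[Tuple[str, str, int]]:
    """Turn {'id': {'symbol': [states]}} into (id, symbol, state) rows."""
    rows = []
    for identifier, per_symbol in outputdict.items():
        for symbol, states in per_symbol.items():
            for state in states:
                rows.append((identifier, symbol, state))
    return rows


# Made-up example data; real dictionaries come from run_parsing().
example = {"REFCODE01": {"Cu": [2, 2]}, "REFCODE02": {"Fe": [3], "Zn": [2]}}
flat_rows = flatten_oxidation_states(example)
print(flat_rows)
```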

The run_parsing command line tool allows you to run the parsing for a folder of structures that are named with their CSD refcodes:

run_parsing <indir> <outname>

The output dictionary will be saved to a pickle file with the name outname.
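The pickle file can be read back with the standard library. This is a minimal sketch; load_parsing_results is a hypothetical helper name. As with any pickle, only load files that you created yourself or otherwise trust.

```python
import pickle
from pathlib import Path
from typing import Any


def load_parsing_results(outname: str) -> Any:
    """Load the object that was pickled to `outname`."""
    with Path(outname).open("rb") as handle:
        return pickle.load(handle)
```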

Parsing the Materials Project

Using this code requires that you export the MP_API_KEY environment variable containing your API key for the Materials Project. For example, the oximachine_featurizer.run.run_mine_mp.py script will retrieve all binary halides, sulfides, oxides, … that are stable (zero energy above the convex hull) and calculate the oxidation states.

run_mine_mp

This writes a dataframe with the results to mp_parsing_results.csv in the current working directory.
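The column layout of the CSV file is not described here, so a quick look at the header and the first few rows is a reasonable first step. The sketch below uses the standard csv module and makes no assumption about column names; peek_csv is a hypothetical helper.

```python
import csv
from typing import Dict, List, Tuple


def peek_csv(path: str, n_rows: int = 5) -> Tuple[List[str], List[Dict[str, str]]]:
    """Return the column names and the first `n_rows` rows of a CSV file."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        # zip stops as soon as range(n_rows) is exhausted.
        rows = [row for _, row in zip(range(n_rows), reader)]
        return list(reader.fieldnames or []), rows
```

For example, columns, first_rows = peek_csv("mp_parsing_results.csv") returns the header and up to five rows, with all values as strings.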