Getting started¶
Installation¶
We recommend installing oximachine_featurizer in a clean virtual environment (e.g., a conda environment). The latest stable release can be installed from the Python Package Index (PyPI):
pip install oximachine_featurizer
The development version can be installed directly from GitHub:
pip install git+https://github.com/kjappelbaum/oximachine_featurizer.git
Some parts of the code are accelerated with just-in-time (JIT) compilation via numba, which can benefit from a threading layer. You can make the TBB threading layer available with pip install tbb. If you do not, you might see warnings like The TBB threading layer requires TBB version 2019.5 or later.
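To check which of the packages mentioned above are visible in the current environment, a minimal sketch along the following lines may help. It uses only the standard library; the helper name installed_version is made up for this illustration, and a package installed through conda rather than pip may not report metadata under the same distribution name.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# Report the packages mentioned in this section.
for name in ("oximachine_featurizer", "numba", "tbb"):
    version = installed_version(name)
    print(f"{name}: {version if version is not None else 'not installed'}")
```

If tbb is reported as not installed, the warning quoted above is the likely consequence; the featurizer itself should still run.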
Featurizing a structure¶
To featurize one structure with the default options, you can use the following Python snippet:
from oximachine_featurizer import featurize
X, metal_indices, metals = featurize(structure)
Here, structure is a pymatgen.Structure object.
Under the hood, this function calls two different classes: the GetFeatures class, which computes all features that we considered during development, and the FeatureCollector class, which selects the relevant ones.
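If the structure is stored in a file, it first has to be read into a pymatgen object. The following is a minimal sketch of that step, assuming a recent pymatgen in which Structure is importable from pymatgen.core; the wrapper name featurize_cif is hypothetical and not part of the package.

```python
from pathlib import Path


def featurize_cif(cif_path):
    """Read a structure file with pymatgen and featurize it with the default options."""
    path = Path(cif_path)
    if not path.is_file():
        raise FileNotFoundError(f"Structure file not found: {path}")

    # Imported lazily so a missing file is reported before the heavy imports run.
    from pymatgen.core import Structure
    from oximachine_featurizer import featurize

    structure = Structure.from_file(str(path))
    # Same call and return values as in the snippet above.
    return featurize(structure)


# Usage (path of the example structure shipped with the repository):
# X, metal_indices, metals = featurize_cif("examples/structures/ACODAA.cif")
```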
Alternatively, if you want to featurize directly on the command line, you can use the following syntax:
run_featurization <structurefile> <outname>
For example,
run_featurization examples/structures/ACODAA.cif test.npy
This command line tool will attempt to read the structurefile using pymatgen and then write the features as an npy file to outname. The numpy array in this file can be fed directly into the StandardScaler and VotingClassifier objects that can be created with the learnmofox Python package.
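As a rough sketch of that last step, the saved array can be loaded with numpy and passed through a scaler and a classifier. The function name predict_from_features is hypothetical, and the sketch assumes that both objects follow the scikit-learn interface (transform and predict); how they are created or loaded with learnmofox is not shown here.

```python
import numpy as np


def predict_from_features(feature_file, scaler, classifier):
    """Load a feature array written by run_featurization and classify it."""
    X = np.load(feature_file)
    # Assumption: scaler exposes transform() and classifier exposes predict(),
    # as scikit-learn's StandardScaler and VotingClassifier do.
    X_scaled = scaler.transform(X)
    return classifier.predict(X_scaled)


# Usage, with scaler and classifier obtained elsewhere:
# predictions = predict_from_features("test.npy", scaler, classifier)
```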
Additional tools¶
Scripts that are prefixed with an underscore are part of the private API and may contain hard-coded paths. For example, _run_featurization_slurm_serial.py contains code that is specific to our cluster infrastructure.
Parsing the CSD¶
The GetOxStatesCSD class can be used to retrieve the oxidation states for a list of CSD identifiers. This feature requires a CSD license, and you need to export CSD_HOME for the CSD API to work.
For example, you can use the following Python snippet:
from oximachine_featurizer.parse import GetOxStatesCSD
# names_cleaned is a list of CSD identifiers
getoxstates_instance = GetOxStatesCSD(names_cleaned)
outputdict = getoxstates_instance.run_parsing(njobs=4)
outputdict will be a nested dictionary of the form {'id': {'symbol': [oxidation states]}}.
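To make the nesting concrete, the following sketch flattens such a dictionary into one tuple per oxidation state. The function name and the sample dictionary are made up for illustration; the sample is placeholder data, not actual CSD output.

```python
def flatten_oxidation_states(outputdict):
    """Turn {'id': {'symbol': [states]}} into a list of (id, symbol, state) tuples."""
    rows = []
    for identifier, per_element in outputdict.items():
        for symbol, states in per_element.items():
            for state in states:
                rows.append((identifier, symbol, state))
    return rows


# Made-up placeholder data, only to show the shape of the mapping.
example = {"REFCODE01": {"Cu": [2, 2]}, "REFCODE02": {"Fe": [3]}}
for row in flatten_oxidation_states(example):
    print(row)
```

A flat list like this is easier to load into a table or to count per element than the nested form.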
The run_parsing command line tool allows you to run the parsing for a folder of structures that are named with their CSD refcodes:
run_parsing <indir> <outname>
The output dictionary will be saved to a pickle file with the name outname.
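Reading the results back into Python could look like the sketch below, which uses the standard pickle module. The helper name load_parsing_results is hypothetical. As with any pickle file, only load files that you created yourself or otherwise trust.

```python
import pickle


def load_parsing_results(path):
    """Load the dictionary that run_parsing saved as a pickle file."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Usage, where "outname" stands for the file name passed to run_parsing:
# outputdict = load_parsing_results("outname")
```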
Parsing the Materials Project¶
Using this code requires that you export the MP_API_KEY environment variable containing your API key for the Materials Project.
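A small guard such as the following can give a clearer error message when the variable is missing. This is only a sketch; the helper name require_env is made up and is not part of oximachine_featurizer.

```python
import os


def require_env(name):
    """Return the value of an environment variable or raise a clear error."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Please export {name} before running this code.")
    return value


# Usage:
# api_key = require_env("MP_API_KEY")
```

The same check could be applied to CSD_HOME before parsing the CSD.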
For example, the oximachine_featurizer.run.run_mine_mp.py script will retrieve all binary halides, sulfides, oxides, … that are stable (zero energy above the convex hull) and calculate the oxidation states.
run_mine_mp
This command will write a dataframe with the results to mp_parsing_results.csv in the current working directory.
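To inspect the results file without assuming anything about its columns, a sketch using the standard csv module is shown below; pandas.read_csv would work equally well. The function name read_results is hypothetical, and the column names are read from the file rather than assumed.

```python
import csv


def read_results(path="mp_parsing_results.csv"):
    """Return the column names and the rows of the results file."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)
        return reader.fieldnames, rows


# Usage:
# columns, rows = read_results()
# print(columns, len(rows))
```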