Getting started
====================

Installation
--------------

We recommend installing oximachine_featurizer in a clean virtual environment (e.g., a conda environment).

The latest stable release can be installed from the Python package index (PyPI):

.. code:: bash

    pip install oximachine_featurizer

The development version can be installed directly from GitHub:

.. code:: bash

    pip install git+https://github.com/kjappelbaum/oximachine_featurizer.git

Some parts of the code are accelerated with just-in-time (JIT) compilation using numba, which can benefit from threading layers. You can enable the TBB threading layer with :code:`pip install tbb`. If you do not, you might see warnings like :code:`The TBB threading layer requires TBB version 2019.5 or later`.

Featurizing a structure
--------------------------

To featurize one structure with the default options, you can use the following Python snippet:

.. code:: python

    from oximachine_featurizer import featurize

    # structure is a pymatgen.Structure object
    X, metal_indices, metals = featurize(structure)

Here, :code:`structure` is a :code:`pymatgen.Structure` object. Under the hood, this function uses two classes: the :py:obj:`~oximachine_featurizer.featurize.GetFeatures` class, which computes all features that we considered during development, and the :py:obj:`~oximachine_featurizer.featurize.FeatureCollector` class, which selects the relevant ones.

Alternatively, if you want to featurize directly on the command line, you can use the following syntax:

.. code:: bash

    run_featurization <structurefile> <outname>

For example,

.. code:: bash

    run_featurization examples/structures/ACODAA.cif test.npy

This command line tool will attempt to read the :code:`structurefile` using pymatgen and then write the features as an npy file to :code:`outname`. The numpy array in this file can be fed directly into the :code:`StandardScaler` and :code:`VotingClassifier` objects that can be created with the :code:`learnmofox` Python package.

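
If you need to featurize many structures, one option is to generate one :code:`run_featurization` call per file. The following is a minimal sketch that only assumes the two positional arguments shown in the example above (structure file, then output name). The helper :code:`build_featurization_commands` and the :code:`features` output folder are hypothetical names used for illustration and are not part of the package.

.. code:: python

    from pathlib import Path


    def build_featurization_commands(structure_dir, output_dir):
        """Return one ``run_featurization`` argument list per CIF file.

        Hypothetical helper: it only builds the commands and does not run them.
        """
        structure_dir = Path(structure_dir)
        output_dir = Path(output_dir)
        commands = []
        # Sort the files so that the order of the commands is reproducible
        for cif in sorted(structure_dir.glob("*.cif")):
            # Name the output after the structure, e.g. ACODAA.cif -> ACODAA.npy
            outname = output_dir / (cif.stem + ".npy")
            commands.append(["run_featurization", str(cif), str(outname)])
        return commands


    if __name__ == "__main__":
        for command in build_featurization_commands("examples/structures", "features"):
            print(" ".join(command))

Each argument list can then be passed to :code:`subprocess.run(command, check=True)`, provided the output folder already exists.
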
Additional tools
------------------

Scripts that are prefixed with an underscore are part of the private API and may contain hard-coded paths. For example, :code:`_run_featurization_slurm_serial.py` contains code that is specific to our cluster infrastructure.

Parsing the CSD
.................

The :py:class:`~oximachine_featurizer.parse.GetOxStatesCSD` class can be used to retrieve the oxidation states for a list of CSD identifiers. This feature requires a CSD license, and you need to export :code:`CSD_HOME` for the CSD API to work.

You can, for example, use the following Python snippet:

.. code-block:: python

    from oximachine_featurizer.parse import GetOxStatesCSD

    # names_cleaned is a list of CSD identifiers
    getoxstates_instance = GetOxStatesCSD(names_cleaned)
    outputdict = getoxstates_instance.run_parsing(njobs=4)

:code:`outputdict` will be a nested dictionary of the form :code:`{'id': {'symbol': [oxidation states]}}`.

The :py:mod:`~oximachine_featurizer.run.run_parsing` command line tool allows you to run the parsing for a folder of structures that are named with the CSD refcodes.

.. code-block:: bash

    run_parsing

The output dictionary will be saved to a pickle file with the name :code:`outname`.

Parsing the Materials Project
................................

Using this code requires that you export the :code:`MP_API_KEY` environment variable containing your API key for the Materials Project.

For example, the :py:mod:`oximachine_featurizer.run.run_mine_mp.py` script will retrieve all binary halides, sulfides, oxides, ... that are stable (zero energy above the convex hull) and calculate the oxidation states.

.. code-block:: bash

    run_mine_mp

This writes a dataframe with the results to :code:`mp_parsing_results.csv` in the current working directory.
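
Both parsing tools above rely on exported environment variables (:code:`CSD_HOME` for the CSD and :code:`MP_API_KEY` for the Materials Project). A quick check before a long run can be sketched as follows. The helper :code:`missing_environment_variables` is a hypothetical name used for illustration and is not part of the package.

.. code-block:: python

    import os


    def missing_environment_variables(required, environ=None):
        """Return the names in ``required`` that are unset or empty.

        Hypothetical helper; ``environ`` defaults to ``os.environ``.
        """
        environ = os.environ if environ is None else environ
        return [name for name in required if not environ.get(name)]


    if __name__ == "__main__":
        missing = missing_environment_variables(["CSD_HOME", "MP_API_KEY"])
        if missing:
            print("Please export: " + ", ".join(missing))

Passing a plain dictionary as :code:`environ` makes the check easy to try without changing your shell environment.
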