oximachine_featurizer API documentation
The featurization module
Featurization functions for the oxidation state mining project. A wrapper around matminer.
-
class
oximachine_featurizer.featurize.
FeatureCollector
(inpath=None, labelpath=None, outdir_labels='data/labels', outdir_features='data/features', outdir_helper='data/helper', percentage_holdout=0, outdir_holdout=None, forbidden_picklepath=None, exclude_dir=None, selected_features=['local_property_stats', 'column', 'row', 'valenceelectrons', 'diffto18electrons', 'sunfilled', 'punfilled', 'dunfilled', 'crystal_nn_fingerprint'], old_format=False, training_set_size=None, racsfile=None, selectedracs=['D_mc-I-0-all', 'D_mc-I-1-all', 'D_mc-I-2-all', 'D_mc-I-3-all', 'D_mc-S-0-all', 'D_mc-S-1-all', 'D_mc-S-2-all', 'D_mc-S-3-all', 'D_mc-T-0-all', 'D_mc-T-1-all', 'D_mc-T-2-all', 'D_mc-T-3-all', 'D_mc-Z-0-all', 'D_mc-Z-1-all', 'D_mc-Z-2-all', 'D_mc-Z-3-all', 'D_mc-chi-0-all', 'D_mc-chi-1-all', 'D_mc-chi-2-all', 'D_mc-chi-3-all', 'mc-I-0-all', 'mc-I-1-all', 'mc-I-2-all', 'mc-I-3-all', 'mc-S-0-all', 'mc-S-1-all', 'mc-S-2-all', 'mc-S-3-all', 'mc-T-0-all', 'mc-T-1-all', 'mc-T-2-all', 'mc-T-3-all', 'mc-Z-0-all', 'mc-Z-1-all', 'mc-Z-2-all', 'mc-Z-3-all', 'mc-chi-0-all', 'mc-chi-1-all', 'mc-chi-2-all', 'mc-chi-3-all'], drop_duplicates=True)[source]¶ Bases:
object
Converts features from a folder of pickle files into three pickle files: the feature matrix, the label vector, and the list of names.
-
__init__
(inpath=None, labelpath=None, outdir_labels='data/labels', outdir_features='data/features', outdir_helper='data/helper', percentage_holdout=0, outdir_holdout=None, forbidden_picklepath=None, exclude_dir=None, selected_features=['local_property_stats', 'column', 'row', 'valenceelectrons', 'diffto18electrons', 'sunfilled', 'punfilled', 'dunfilled', 'crystal_nn_fingerprint'], old_format=False, training_set_size=None, racsfile=None, selectedracs=['D_mc-I-0-all', 'D_mc-I-1-all', 'D_mc-I-2-all', 'D_mc-I-3-all', 'D_mc-S-0-all', 'D_mc-S-1-all', 'D_mc-S-2-all', 'D_mc-S-3-all', 'D_mc-T-0-all', 'D_mc-T-1-all', 'D_mc-T-2-all', 'D_mc-T-3-all', 'D_mc-Z-0-all', 'D_mc-Z-1-all', 'D_mc-Z-2-all', 'D_mc-Z-3-all', 'D_mc-chi-0-all', 'D_mc-chi-1-all', 'D_mc-chi-2-all', 'D_mc-chi-3-all', 'mc-I-0-all', 'mc-I-1-all', 'mc-I-2-all', 'mc-I-3-all', 'mc-S-0-all', 'mc-S-1-all', 'mc-S-2-all', 'mc-S-3-all', 'mc-T-0-all', 'mc-T-1-all', 'mc-T-2-all', 'mc-T-3-all', 'mc-Z-0-all', 'mc-Z-1-all', 'mc-Z-2-all', 'mc-Z-3-all', 'mc-chi-0-all', 'mc-chi-1-all', 'mc-chi-2-all', 'mc-chi-3-all'], drop_duplicates=True)[source]¶ Initializes a feature collector.
WARNING! The fingerprint selection function assumes that the full feature vector in the pickle files has the columns as specified in FEATURE_LABELS_ALL
- Keyword Arguments
inpath (Union[str, Path]) – (default: None)
labelpath (Union[str, Path]) – (default: None)
outdir_labels (Union[str, Path]) – (default: “data/labels”)
outdir_features (Union[str, Path]) – (default: “data/features”)
outdir_helper (Union[str, Path]) – path to output directory for helper files (feature names, structure names) (default: “data/helper”)
percentage_holdout (float) – (default: 0)
outdir_holdout (Union[str, Path]) – directory into which the files for the holdout set are written (names, X and y) (default: None)
forbidden_picklepath (Union[str, Path]) – (default: None)
exclude_dir (Union[str, Path]) – (default: None)
selected_features (List[str]) – (default: [“local_property_stats”, “column”, “row”, “valenceelectrons”, “diffto18electrons”, “sunfilled”, “punfilled”, “dunfilled”, “crystal_nn_fingerprint”])
old_format (bool) – (default: False)
training_set_size (int) – (default: None)
racsfile (str) – path to file with RACs (pd.DataFrame saved as csv) (default: None)
selectedracs (List[str]) – (default: as listed in the signature)
-
__weakref__
¶ list of weak references to the object (if defined)
-
static
create_dict_for_feature_table
(picklefile)[source]¶ Reads in a pickle with features and returns a list of dictionaries with one dictionary per metal site.
- Parameters
picklefile (Union[str, Path]) –
- Return type
List[dict]
- Returns
List[dict] – list of dictionaries
-
static
create_dict_for_feature_table_from_dict
(d)[source]¶ Takes a dictionary with features and returns a list of dictionaries with one dictionary per metal site.
- Parameters
d (dict) –
- Return type
List[dict]
- Returns
List[dict] – list of dictionaries
-
static
create_feature_list
(picklefiles, forbidden_list, old_format=True)[source]¶ Reads a list of pickle files and returns their parsed contents as a list
- Parameters
picklefiles (List[Union[str, Path]]) –
forbidden_list (list) – list of "forbidden" names (CSD naming convention) that will not be used
old_format (bool) – “legacy” format. Default: True
- Return type
list
- Returns
list – parsed pickle contents
-
dump_featurecollection
()[source]¶ Collect features and write features, labels and names to separate files
- Return type
None
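A minimal usage sketch, assuming the package is installed and that a folder of per-structure feature pickles and a label file already exist. It uses only the constructor keywords and the `dump_featurecollection()` method documented above. The helper name `collect_features` and all paths are hypothetical, and the import is deferred into the function so that defining it does not require the package.

```python
from pathlib import Path


def collect_features(inpath, labelpath, outdir="data"):
    """Run a FeatureCollector over `inpath` and write the output files.

    Sketch only: relies on the documented constructor keywords and on
    dump_featurecollection(); every path here is hypothetical.
    """
    # Deferred import: requires oximachine_featurizer to be installed.
    from oximachine_featurizer.featurize import FeatureCollector

    outdir = Path(outdir)
    collector = FeatureCollector(
        inpath=inpath,
        labelpath=labelpath,
        outdir_labels=outdir / "labels",
        outdir_features=outdir / "features",
        outdir_helper=outdir / "helper",
    )
    # Writes features, labels and names to separate files; returns None.
    collector.dump_featurecollection()
    return collector
```

A call might look like `collect_features("features_raw/", "labels.pkl")`, where both arguments are placeholder paths.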
-
static
make_labels_table
(raw_labels)[source]¶ Read raw labeling output into a dictionary format that can be used to construct pd.DataFrames
Warning: assumes that each metal in the structure has the same oxidation states as it takes the first list element. Cases in which this is not fulfilled need to be filtered out earlier.
- Parameters
raw_labels (Dict[str, dict]) – dictionary of the form {name: {metal: [oxidationstates]}}
- Returns
list of dictionaries of the form [{‘name’:, ‘metal’:, ‘oxidationstate’:}]
- Return type
List[dict]
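The reshaping described for `make_labels_table` can be illustrated with a short standalone sketch. It assumes the raw labels are nested as {name: {metal: [oxidation states]}} and keeps only the first list element, as the warning states. The function name `make_labels_table_sketch` and the structure names are hypothetical; this is not the library's implementation.

```python
def make_labels_table_sketch(raw_labels):
    """Flatten {name: {metal: [oxidation states]}} into one dict per metal."""
    rows = []
    for name, metals in raw_labels.items():
        for metal, oxidation_states in metals.items():
            rows.append(
                {
                    "name": name,
                    "metal": metal,
                    # Only the first entry is kept, so structures in which one
                    # metal has several oxidation states must be filtered out
                    # before this step.
                    "oxidationstate": oxidation_states[0],
                }
            )
    return rows


# Hypothetical structure names used purely for illustration.
raw = {"STRUCT_A": {"Cu": [2, 2]}, "STRUCT_B": {"Fe": [3], "Zn": [2]}}
table = make_labels_table_sketch(raw)
```

The resulting list of dictionaries can be passed directly to `pd.DataFrame(table)` to obtain one row per metal.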
-
-
class
oximachine_featurizer.featurize.
GetFeatures
(structure, outpath)[source]¶ Bases:
object
Featurizer
-
__init__
(structure, outpath)[source]¶ Generates features for a structure
- Parameters
structure (Structure) – Pymatgen Structure object
outpath (Union[str, Path]) – path to which the features will be dumped
-
__weakref__
¶ list of weak references to the object (if defined)
-
property
cutoff
¶ Choose a cutoff for a given structure
-
property
featurizer
¶ Return the featurizer (with the suitable cutoff)
-
classmethod
from_file
(structurepath, outpath)[source]¶ Construct a featurizer class from a path to a structure and an output path
- Parameters
structurepath (Union[str, Path]) – Path to structure file
outpath (Union[str, Path]) – Path to which the outputs should be written.
- Returns
Instance of the GetFeatures class
- Return type
object
-
classmethod
from_string
(structurestring, outpath)[source]¶ Constructor for the webapp, using a string of a structure file, e.g., a CIF
- Parameters
structurestring (str) – File content of a CIF as a string
outpath (Union[str, Path]) – Path to which the output should be written.
- Raises
ValueError – In case the CIF could not be parsed
- Returns
Instance of GetFeatures
- Return type
object
-
-
oximachine_featurizer.featurize.
featurize
(structure, featureset=['local_property_stats', 'column', 'row', 'valenceelectrons', 'diffto18electrons', 'sunfilled', 'punfilled', 'dunfilled', 'crystal_nn_no_steinhardt'])[source]¶ Finds metals in the structure, featurizes the metal sites and collects the features
- Parameters
structure (pymatgen.Structure) – Structure to featurize
featureset (List[str]) – Features to be used in the final output
- Returns
[description]
- Return type
Union[np.array, list, list]
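A minimal usage sketch, assuming both pymatgen and oximachine_featurizer are installed. The helper name `featurize_cif` and the file name in the usage note are hypothetical. The imports are deferred into the function so that defining it does not require either package.

```python
def featurize_cif(cif_path):
    """Load a structure file and featurize its metal sites (sketch)."""
    # Deferred imports: both packages must be installed for the call to work.
    from pymatgen.core import Structure
    from oximachine_featurizer.featurize import featurize

    structure = Structure.from_file(cif_path)
    # Called with the default featureset. The documented return type lists
    # three components (an array and two lists); they are returned unchanged.
    return featurize(structure)
```

A call such as `featurize_cif("example.cif")` would then return whatever `featurize` produces for that structure.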
-
oximachine_featurizer.featurize.
get_feature_names
(selected_features, offset=0)[source]¶ Given a set of selected feature categories, return all feature names
- Parameters
selected_features (List[str]) – feature categories
offset (int, optional) – To offset the feature ranges, to be used with RACs. Defaults to 0.
- Returns
list of feature names
- Return type
List[str]
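One way to picture the mapping from feature categories to feature names is as a lookup of column ranges in the full label list. The sketch below is an illustration under stated assumptions, not the library's code: `ALL_LABELS`, `CATEGORY_RANGES`, and the function name are hypothetical stand-ins, and the way `offset` shifts the ranges is an assumption based on the parameter description.

```python
# Hypothetical stand-ins for the package's full label list and per-category
# column ranges; the real values live in the library and are not shown here.
ALL_LABELS = ["f0", "f1", "f2", "f3", "f4", "f5"]
CATEGORY_RANGES = {"row": [(0, 1)], "column": [(1, 2)], "fingerprint": [(2, 5)]}


def get_feature_names_sketch(selected_features, offset=0):
    """Collect the labels of every column range belonging to the selection."""
    names = []
    for category in selected_features:
        for start, end in CATEGORY_RANGES[category]:
            # Assumption: the offset shifts each half-open range [start, end).
            names.extend(ALL_LABELS[start + offset : end + offset])
    return names


names = get_feature_names_sketch(["row", "fingerprint"])
shifted = get_feature_names_sketch(["row", "fingerprint"], offset=1)
```

With these toy tables, `names` holds the labels of columns 0 and 2 to 4, while `shifted` holds those of columns 1 and 3 to 5.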
The parsing module
Parsing functions for the oxidation state mining project
-
class
oximachine_featurizer.parse.
GetOxStatesCSD
(cds_ids)[source]¶ Bases:
object
Main parsing class
-
__init__
(cds_ids)[source]¶ Parses CSD structures for oxidation states
- Parameters
cds_ids (List[str]) – list of CSD database identifiers
- Returns
None
-
__weakref__
¶ list of weak references to the object (if defined)
-
parse_csd_entry
(database_id)[source]¶ Looks up a CSD id and runs the parsing
- Parameters
database_id (str) – CSD database identifier
- Returns
symbol - oxidation state dictionary
- Return type
dict
- Exception
returns an empty dict upon exception (if it cannot find the structure in the database)
-
parse_name
(chemical_name_string)[source]¶ Takes the chemical name string from the CSD database and returns, if it finds it, a dictionary with the oxidation states for the metals
- Parameters
chemical_name_string (str) – full chemical name
- Returns
dictionary of symbol: oxidation states (list)
- Return type
dict
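To illustrate the kind of parsing `parse_name` performs, the sketch below extracts oxidation states from a chemical name, assuming they appear as Roman numerals in parentheses directly after the metal name (for example "copper(ii)"). The lookup tables, the regular expression, and the function name are hypothetical and deliberately small; the library's actual parsing rules may differ.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny lookup tables for illustration only.
METAL_SYMBOLS = {"copper": "Cu", "iron": "Fe", "zinc": "Zn"}
ROMAN = {"0": 0, "i": 1, "ii": 2, "iii": 3, "iv": 4,
         "v": 5, "vi": 6, "vii": 7, "viii": 8}

# A run of letters directly followed by a parenthesised numeral.
PATTERN = re.compile(r"([a-z]+)\((0|[iv]+)\)")


def parse_name_sketch(chemical_name_string):
    """Return {symbol: [oxidation states]} for the metals found in the name."""
    found = defaultdict(list)
    for word, numeral in PATTERN.findall(chemical_name_string.lower()):
        if numeral not in ROMAN:
            continue  # skip letter runs that are not valid numerals
        for metal, symbol in METAL_SYMBOLS.items():
            # endswith() also accepts prefixed forms such as "diiron".
            if word.endswith(metal):
                found[symbol].append(ROMAN[numeral])
    return dict(found)


parsed = parse_name_sketch("Tetraaqua-copper(II) diiron(III) sulfate")
```

Names without a recognised metal and numeral pair yield an empty dictionary, which mirrors the "if it finds it" wording of the description.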
-