Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

crystal_space #189

Merged
merged 2 commits into from
Dec 6, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
210 changes: 210 additions & 0 deletions examples/Crystal_Space/0_screening.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,210 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Exploring Chemical Space with SMACT and Materials Project Database"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this notebook, we undertake a comprehensive exploration of binary chemical compositions. This approach can also be extended to explore ternary and quaternary compositions. Our methodology involves two primary tools: the SMACT filter for generating compositions and the Materials Project database for additional data acquisition. \n",
"\n",
"The final phase will categorize the compositions into four distinct categories based on their properties. The categorization is based on whether a composition is allowed by the SMACT filter (smact_allowed) and whether it is present in the Materials Project database (mp). The categories are as follows:\n",
"\n",
"| smact_allowed | mp | label |\n",
"|---------------|------|------------|\n",
"| yes | yes | standard |\n",
"| yes | no | missing |\n",
"| no | yes | interesting|\n",
"| no | no | unlikely |"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Generate compositions with the SMACT filter"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We begin by generating binary compositions using the SMACT filter. The SMACT filter serves as a chemical filter including oxidation states and electronegativity test.\n",
"\n",
"[`generate_composition_with_smact`](./generate_composition_with_smact.py) function generates a composition with the SMACT filter. The function takes in the following parameters:\n",
"\n",
"num_elements: number of elements in the composition\n",
"\n",
"max_stoich: maximum stoichiometry of each element\n",
"\n",
"max_atomic_num: maximum atomic number of each element\n",
"\n",
"num_processes: number of processes to run in parallel\n",
"\n",
"save_path: path to save the dataframe containing the compositions with the SMACT filter"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from generate_composition_with_smact import generate_composition_with_smact"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df_smact = generate_composition_with_smact(\n",
" num_elements=2,\n",
" max_stoich=8,\n",
" max_atomic_num=103,\n",
" num_processes=8,\n",
" save_path=\"data/binary/df_binary_label.pkl\",\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 2. Download data from the Materials Project database"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, we download data from the Materials Project database using the `MPRester` class from the [`pymatgen`](https://pymatgen.org/) library. \n",
"\n",
"[`download_mp_data`](./download_compounds_with_mp_api.py) function takes in the following parameters:\n",
"\n",
"mp_api_key: Materials Project API key\n",
"\n",
"num_elements: number of elements in the composition\n",
"\n",
"max_stoich: maximum stoichiometry of each element\n",
"\n",
"save_dir: path to save the downloaded data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"mp_api_key = None # replace with your own MP API key"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from download_compounds_with_mp_api import download_mp_data\n",
"\n",
"# download data from MP for binary compounds\n",
"save_mp_dir = \"data/binary/mp_data\"\n",
"docs = download_mp_data(\n",
" mp_api_key=mp_api_key,\n",
" num_elements=2,\n",
" max_stoich=8,\n",
" save_dir=save_mp_dir,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Categorize compositions"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, we categorize the compositions into four lables: standard, missing, interesting, and unlikely."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from pathlib import Path\n",
"import pandas as pd"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"mp_data = {p.stem: True for p in Path(save_mp_dir).glob(\"*.json\")}\n",
"df_mp = pd.DataFrame.from_dict(mp_data, orient=\"index\", columns=[\"mp\"])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# make category dataframe\n",
"df_category = df_smact.join(df_mp, how=\"left\").fillna(False)\n",
"# make label for each category\n",
"dict_label = {\n",
" (True, True): \"standard\",\n",
" (True, False): \"missing\",\n",
" (False, True): \"interesting\",\n",
" (False, False): \"unlikely\",\n",
"}\n",
"df_category[\"label\"] = df_category.apply(\n",
" lambda x: dict_label[(x[\"smact_allowed\"], x[\"mp\"])], axis=1\n",
")\n",
"df_category[\"label\"].apply(dict_label.get)\n",
"\n",
"# count number of each label\n",
"print(df_category[\"label\"].value_counts())\n",
"\n",
"# save dataframe\n",
"df_category.to_pickle(\"data/binary/df_binary_category.pkl\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "smact",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.18"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
Loading