Skip to content

Commit

Permalink
crystal_space
Browse files Browse the repository at this point in the history
  • Loading branch information
hspark1212 committed Dec 6, 2023
1 parent 133b065 commit 6ccfba3
Show file tree
Hide file tree
Showing 5 changed files with 1,007 additions and 0 deletions.
210 changes: 210 additions & 0 deletions examples/Crystal_Space/0_screening.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,210 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Exploring Chemical Space with SMACT and Materials Project Database"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this notebook, we undertake a comprehensive exploration of binary chemical compositions. This approach can also be extended to explore ternary and quaternary compositions. Our methodology involves two primary tools: the SMACT filter for generating compositions and the Materials Project database for additional data acquisition. \n",
"\n",
"The final phase will categorize the compositions into four distinct categories based on their properties. The categorization is based on whether a composition is allowed by the SMACT filter (smact_allowed) and whether it is present in the Materials Project database (mp). The categories are as follows:\n",
"\n",
"| smact_allowed | mp | label |\n",
"|---------------|------|------------|\n",
"| yes | yes | standard |\n",
"| yes | no | missing |\n",
"| no | yes | interesting|\n",
"| no | no | unlikely |"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Generate compositions with the SMACT filter"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We begin by generating binary compositions using the SMACT filter. The SMACT filter serves as a chemical filter including oxidation states and electronegativity test.\n",
"\n",
"[`generate_composition_with_smact`](./generate_composition_with_smact.py) function generates a composition with the SMACT filter. The function takes in the following parameters:\n",
"\n",
"num_elements: number of elements in the composition\n",
"\n",
"max_stoich: maximum stoichiometry of each element\n",
"\n",
"max_atomic_num: maximum atomic number of each element\n",
"\n",
"num_processes: number of processes to run in parallel\n",
"\n",
"save_path: path to save the dataframe containing the compositions with the SMACT filter"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from generate_composition_with_smact import generate_composition_with_smact"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df_smact = generate_composition_with_smact(\n",
" num_elements=2,\n",
" max_stoich=8,\n",
" max_atomic_num=103,\n",
" num_processes=8,\n",
" save_path=\"data/binary/df_binary_label.pkl\",\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 2. Download data from the Materials Project database"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, we download data from the Materials Project database using the `MPRester` class from the [`pymatgen`](https://pymatgen.org/) library. \n",
"\n",
"[`download_mp_data`](./download_compounds_with_mp_api.py) function takes in the following parameters:\n",
"\n",
"mp_api_key: Materials Project API key\n",
"\n",
"num_elements: number of elements in the composition\n",
"\n",
"max_stoich: maximum stoichiometry of each element\n",
"\n",
"save_dir: path to save the downloaded data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"mp_api_key = None # replace with your own MP API key"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from download_compounds_with_mp_api import download_mp_data\n",
"\n",
"# download data from MP for binary compounds\n",
"save_mp_dir = \"data/binary/mp_data\"\n",
"docs = download_mp_data(\n",
" mp_api_key=mp_api_key,\n",
" num_elements=2,\n",
" max_stoich=8,\n",
" save_dir=save_mp_dir,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Categorize compositions"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, we categorize the compositions into four lables: standard, missing, interesting, and unlikely."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from pathlib import Path\n",
"import pandas as pd"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"mp_data = {p.stem: True for p in Path(save_mp_dir).glob(\"*.json\")}\n",
"df_mp = pd.DataFrame.from_dict(mp_data, orient=\"index\", columns=[\"mp\"])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# make category dataframe\n",
"df_category = df_smact.join(df_mp, how=\"left\").fillna(False)\n",
"# make label for each category\n",
"dict_label = {\n",
" (True, True): \"standard\",\n",
" (True, False): \"missing\",\n",
" (False, True): \"interesting\",\n",
" (False, False): \"unlikely\",\n",
"}\n",
"df_category[\"label\"] = df_category.apply(\n",
" lambda x: dict_label[(x[\"smact_allowed\"], x[\"mp\"])], axis=1\n",
")\n",
"df_category[\"label\"].apply(dict_label.get)\n",
"\n",
"# count number of each label\n",
"print(df_category[\"label\"].value_counts())\n",
"\n",
"# save dataframe\n",
"df_category.to_pickle(\"data/binary/df_binary_category.pkl\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "smact",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.18"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
Loading

0 comments on commit 6ccfba3

Please sign in to comment.