Fix parse_species to handle non-integer oxidation states #170

Merged (6 commits) on Sep 18, 2024

4 changes: 4 additions & 0 deletions .pre-commit-config.yaml
@@ -53,3 +53,7 @@ repos:
args: [--toml, pyproject.toml]
additional_dependencies:
- tomli
- repo: https://github.com/adamchainz/blacken-docs
rev: 1.18.0
hooks:
- id: blacken-docs
52 changes: 35 additions & 17 deletions README.md
@@ -71,21 +71,25 @@ With -e pip will create links to the source folder so that changes to the code w

For simple usage, you can instantiate an Embedding object using one of the embeddings in the [data directory](src/elementembeddings/data/element_representations/README.md). For this example, let's use the magpie elemental representation.

```python
```pycon
# Import the class
>>> from elementembeddings.core import Embedding

# Load the magpie data
>>> magpie = Embedding.load_data('magpie')
>>> magpie = Embedding.load_data("magpie")
```

We can access some of the properties of the `Embedding` class. For example, we can find the dimensions of the elemental representation and the list of elements for which an embedding exists.

```python
```pycon
# Print out some of the properties of the ElementEmbeddings class
>>> print(f'The magpie representation has embeddings of dimension {magpie.dim}')
>>> print(f'The magpie representation contains these elements: \n {magpie.element_list}') # prints out all the elements considered for this representation
>>> print(f'The magpie representation contains these features: \n {magpie.feature_labels}') # Prints out the feature labels of the chosen representation
>>> print(f"The magpie representation has embeddings of dimension {magpie.dim}")
>>> print(
... f"The magpie representation contains these elements: \n {magpie.element_list}"
... ) # prints out all the elements considered for this representation
>>> print(
... f"The magpie representation contains these features: \n {magpie.feature_labels}"
... ) # Prints out the feature labels of the chosen representation

The magpie representation has embeddings of dimension 22
The magpie representation contains these elements:
@@ -102,26 +106,40 @@ We can quickly generate heatmaps of distance/similarity measures between the ele
from elementembeddings.plotter import heatmap_plotter, dimension_plotter
import matplotlib.pyplot as plt

magpie.standardise(inplace=True) # Standardises the representation
magpie.standardise(inplace=True) # Standardises the representation

fig, ax = plt.subplots(1, 1, figsize=(6,6))
fig, ax = plt.subplots(1, 1, figsize=(6, 6))
heatmap_params = {"vmin": -1, "vmax": 1}
heatmap_plotter(embedding=magpie, metric="cosine_similarity",show_axislabels=False,cmap="Blues_r",ax=ax, **heatmap_params)
heatmap_plotter(
embedding=magpie,
metric="cosine_similarity",
show_axislabels=False,
cmap="Blues_r",
ax=ax,
**heatmap_params
)
ax.set_title("Magpie cosine similarities")
fig.tight_layout()
fig.show()

```

<img src="resources/magpie_cosine_sim_heatmap.png" alt = "Cosine similarity heatmap of the magpie representation" width="50%"/>

```python
fig, ax = plt.subplots(1, 1, figsize=(6,6))

reducer_params={"n_neighbors": 30, "random_state":42}
scatter_params = {"s":100}

dimension_plotter(embedding=magpie, reducer="umap",n_components=2,ax=ax,adjusttext=True,reducer_params=reducer_params, scatter_params=scatter_params)
fig, ax = plt.subplots(1, 1, figsize=(6, 6))

reducer_params = {"n_neighbors": 30, "random_state": 42}
scatter_params = {"s": 100}

dimension_plotter(
embedding=magpie,
reducer="umap",
n_components=2,
ax=ax,
adjusttext=True,
reducer_params=reducer_params,
scatter_params=scatter_params,
)
ax.set_title("Magpie UMAP (n_neighbours=30)")
ax.legend().remove()
handles, labels = ax1.get_legend_handles_labels()
@@ -149,7 +167,7 @@ The `composition_featuriser` function can be used to featurise the data. The com
```python
from elementembeddings.composition import composition_featuriser

df_featurised = composition_featuriser(df, embedding="magpie", stats=["mean","sum"])
df_featurised = composition_featuriser(df, embedding="magpie", stats=["mean", "sum"])

df_featurised
```
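For readers new to `stats=["mean", "sum"]`, here is a minimal pure-Python sketch of what pooling element vectors over a composition can look like. It assumes that "mean" weights each element vector by its atomic fraction and "sum" weights it by its raw stoichiometric amount. The function name `pool_composition` and the toy two-dimensional `TOY_EMBEDDING` values are hypothetical; this is an illustration, not the implementation of `composition_featuriser`.

```python
from __future__ import annotations

# Hypothetical toy 2-D element vectors; real embeddings such as magpie are longer.
TOY_EMBEDDING: dict[str, list[float]] = {"Fe": [1.0, 2.0], "O": [0.0, 1.0]}


def pool_composition(
    composition: dict[str, float], embedding: dict[str, list[float]]
) -> dict[str, list[float]]:
    """Return stoichiometry-weighted 'mean' and 'sum' vectors (assumed semantics)."""
    dim = len(next(iter(embedding.values())))
    total = sum(composition.values())
    mean = [0.0] * dim
    summed = [0.0] * dim
    for element, amount in composition.items():
        for i, value in enumerate(embedding[element]):
            mean[i] += (amount / total) * value  # weight by atomic fraction
            summed[i] += amount * value  # weight by raw stoichiometric amount
    return {"mean": mean, "sum": summed}


# Fe2O3 written as an element -> amount mapping
features = pool_composition({"Fe": 2, "O": 3}, TOY_EMBEDDING)
print(features)
```

For Fe2O3 with these toy vectors the mean vector is [0.4, 1.4] (up to floating-point rounding) and the sum vector is [2.0, 7.0]: the mean is independent of formula-unit size, while the sum grows with it.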
3 changes: 2 additions & 1 deletion contributing.md
@@ -1,4 +1,4 @@
# Contributing
`# Contributing

This is a quick guide on how to follow best practice and contribute smoothly to `ElementEmbeddings`.

@@ -49,3 +49,4 @@ pre-commit run --all-files # optionally run hooks on all files
```

Pre-commit hooks will check all files when you commit changes, automatically fixing any files which are not formatted correctly. Those files will need to be staged again before re-attempting the commit.
`
4 changes: 2 additions & 2 deletions docs/embeddings/element.md
@@ -162,8 +162,8 @@ The 118 200-dimensional vectors in `random_200_new` were generated using the fol
```python
import numpy as np

mu , sigma = 0 , 1 # mean and standard deviation s = np.random.normal(mu, sigma, 1000)
s = np.random.default_rng(seed=42).normal(mu, sigma, (118,200))
mu, sigma = 0, 1 # mean and standard deviation s = np.random.normal(mu, sigma, 1000)
s = np.random.default_rng(seed=42).normal(mu, sigma, (118, 200))
```

### skipatom
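The `random_200_new` snippet relies on a seeded `numpy.random.Generator`, so the 118 × 200 matrix can be regenerated on demand. The short check below repeats the documented call and confirms the shape and the reproducibility under the same seed. It assumes NumPy 1.17 or newer (for `default_rng`); NumPy does not promise an identical bit stream across all versions. Note that the trailing `s = np.random.normal(mu, sigma, 1000)` in the documented line sits inside a comment and is never executed.

```python
import numpy as np

mu, sigma = 0, 1  # mean and standard deviation of the normal distribution

# Same call as in the docs: one 200-dimensional vector for each of 118 elements.
s = np.random.default_rng(seed=42).normal(mu, sigma, (118, 200))

# A fresh generator with the same seed reproduces the same draws
# (within one NumPy version).
s_again = np.random.default_rng(seed=42).normal(mu, sigma, (118, 200))
assert np.array_equal(s, s_again)

print(s.shape)  # (118, 200)
```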
183 changes: 161 additions & 22 deletions docs/tutorials.md
@@ -8,25 +8,150 @@ For simple usage, you can instantiate an Embedding object using one of the embed

```python
# Import the class
>>> from elementembeddings.core import Embedding
from elementembeddings.core import Embedding

# Load the magpie data
>>> magpie = Embedding.load_data('magpie')
magpie = Embedding.load_data("magpie")
```

We can access some of the properties of the `Embedding` class. For example, we can find the dimensions of the elemental representation and the list of elements for which an embedding exists.

```python
# Print out some of the properties of the ElementEmbeddings class
>>> print(f'The magpie representation has embeddings of dimension {magpie.dim}')
>>> print(f'The magpie representation contains these elements: \n {magpie.element_list}') # prints out all the elements considered for this representation
>>> print(f'The magpie representation contains these features: \n {magpie.feature_labels}') # Prints out the feature labels of the chosen representation

The magpie representation has embeddings of dimension 22
The magpie representation contains these elements:
['H', 'He', 'Li', 'Be', 'B', 'C', 'N', 'O', 'F', 'Ne', 'Na', 'Mg', 'Al', 'Si', 'P', 'S', 'Cl', 'Ar', 'K', 'Ca', 'Sc', 'Ti', 'V', 'Cr', 'Mn', 'Fe', 'Co', 'Ni', 'Cu', 'Zn', 'Ga', 'Ge', 'As', 'Se', 'Br', 'Kr', 'Rb', 'Sr', 'Y', 'Zr', 'Nb', 'Mo', 'Tc', 'Ru', 'Rh', 'Pd', 'Ag', 'Cd', 'In', 'Sn', 'Sb', 'Te', 'I', 'Xe', 'Cs', 'Ba', 'La', 'Ce', 'Pr', 'Nd', 'Pm', 'Sm', 'Eu', 'Gd', 'Tb', 'Dy', 'Ho', 'Er', 'Tm', 'Yb', 'Lu', 'Hf', 'Ta', 'W', 'Re', 'Os', 'Ir', 'Pt', 'Au', 'Hg', 'Tl', 'Pb', 'Bi', 'Po', 'At', 'Rn', 'Fr', 'Ra', 'Ac', 'Th', 'Pa', 'U', 'Np', 'Pu', 'Am', 'Cm', 'Bk']
The magpie representation contains these features:
['Number', 'MendeleevNumber', 'AtomicWeight', 'MeltingT', 'Column', 'Row', 'CovalentRadius', 'Electronegativity', 'NsValence', 'NpValence', 'NdValence', 'NfValence', 'NValence', 'NsUnfilled', 'NpUnfilled', 'NdUnfilled', 'NfUnfilled', 'NUnfilled', 'GSvolume_pa', 'GSbandgap', 'GSmagmom', 'SpaceGroupNumber']
print(f"The magpie representation has embeddings of dimension {magpie.dim}")
print(
f"The magpie representation contains these elements: \n {magpie.element_list}"
) # prints out all the elements considered for this representation
print(
f"The magpie representation contains these features: \n {magpie.feature_labels}"
) # Prints out the feature labels of the chosen representation

# The magpie representation has embeddings of dimension 22
# The magpie representation contains these elements:
[
"H",
"He",
"Li",
"Be",
"B",
"C",
"N",
"O",
"F",
"Ne",
"Na",
"Mg",
"Al",
"Si",
"P",
"S",
"Cl",
"Ar",
"K",
"Ca",
"Sc",
"Ti",
"V",
"Cr",
"Mn",
"Fe",
"Co",
"Ni",
"Cu",
"Zn",
"Ga",
"Ge",
"As",
"Se",
"Br",
"Kr",
"Rb",
"Sr",
"Y",
"Zr",
"Nb",
"Mo",
"Tc",
"Ru",
"Rh",
"Pd",
"Ag",
"Cd",
"In",
"Sn",
"Sb",
"Te",
"I",
"Xe",
"Cs",
"Ba",
"La",
"Ce",
"Pr",
"Nd",
"Pm",
"Sm",
"Eu",
"Gd",
"Tb",
"Dy",
"Ho",
"Er",
"Tm",
"Yb",
"Lu",
"Hf",
"Ta",
"W",
"Re",
"Os",
"Ir",
"Pt",
"Au",
"Hg",
"Tl",
"Pb",
"Bi",
"Po",
"At",
"Rn",
"Fr",
"Ra",
"Ac",
"Th",
"Pa",
"U",
"Np",
"Pu",
"Am",
"Cm",
"Bk",
]
# The magpie representation contains these features:
[
"Number",
"MendeleevNumber",
"AtomicWeight",
"MeltingT",
"Column",
"Row",
"CovalentRadius",
"Electronegativity",
"NsValence",
"NpValence",
"NdValence",
"NfValence",
"NValence",
"NsUnfilled",
"NpUnfilled",
"NdUnfilled",
"NfUnfilled",
"NUnfilled",
"GSvolume_pa",
"GSbandgap",
"GSmagmom",
"SpaceGroupNumber",
]
```

### Plotting
@@ -37,26 +162,40 @@ We can quickly generate heatmaps of distance/similarity measures between the ele
from elementembeddings.plotter import heatmap_plotter, dimension_plotter
import matplotlib.pyplot as plt

magpie.standardise(inplace=True) # Standardises the representation
magpie.standardise(inplace=True) # Standardises the representation

fig, ax = plt.subplots(1, 1, figsize=(6,6))
fig, ax = plt.subplots(1, 1, figsize=(6, 6))
heatmap_params = {"vmin": -1, "vmax": 1}
heatmap_plotter(embedding=magpie, metric="cosine_similarity",show_axislabels=False,cmap="Blues_r",ax=ax, **heatmap_params)
heatmap_plotter(
embedding=magpie,
metric="cosine_similarity",
show_axislabels=False,
cmap="Blues_r",
ax=ax,
**heatmap_params
)
ax.set_title("Magpie cosine similarities")
fig.tight_layout()
fig.show()

```

![Magpie cosine similarity heatmap](images/magpie_cosine_sim_heatmap.png)

```python
fig, ax = plt.subplots(1, 1, figsize=(6,6))

reducer_params={"n_neighbors": 30, "random_state":42}
scatter_params = {"s":100}

dimension_plotter(embedding=magpie, reducer="umap",n_components=2,ax=ax,adjusttext=True,reducer_params=reducer_params, scatter_params=scatter_params)
fig, ax = plt.subplots(1, 1, figsize=(6, 6))

reducer_params = {"n_neighbors": 30, "random_state": 42}
scatter_params = {"s": 100}

dimension_plotter(
embedding=magpie,
reducer="umap",
n_components=2,
ax=ax,
adjusttext=True,
reducer_params=reducer_params,
scatter_params=scatter_params,
)
ax.set_title("Magpie UMAP (n_neighbours=30)")
ax.legend().remove()
handles, labels = ax1.get_legend_handles_labels()
@@ -84,7 +223,7 @@ The `composition_featuriser` function can be used to featurise the data. The com
```python
from elementembeddings.composition import composition_featuriser

df_featurised = composition_featuriser(df, embedding="magpie", stats=["mean","sum"])
df_featurised = composition_featuriser(df, embedding="magpie", stats=["mean", "sum"])

df_featurised
```
2 changes: 1 addition & 1 deletion setup.py
@@ -8,7 +8,7 @@

module_dir = os.path.dirname(os.path.abspath(__file__))

VERSION = "0.6"
VERSION = "0.6.1"
DESCRIPTION = "Element Embeddings"
with open(os.path.join(module_dir, "README.md"), encoding="utf-8") as f:
LONG_DESCRIPTION = f.read()
4 changes: 2 additions & 2 deletions src/elementembeddings/data/element_representations/README.md
@@ -162,8 +162,8 @@ The 118 200-dimensional vectors in `random_200_new` were generated using the fol
```python
import numpy as np

mu , sigma = 0 , 1 # mean and standard deviation s = np.random.normal(mu, sigma, 1000)
s = np.random.default_rng(seed=42).normal(mu, sigma, (118,200))
mu, sigma = 0, 1 # mean and standard deviation s = np.random.normal(mu, sigma, 1000)
s = np.random.default_rng(seed=42).normal(mu, sigma, (118, 200))
```

### skipatom
2 changes: 1 addition & 1 deletion src/elementembeddings/plotter.py
@@ -175,7 +175,7 @@ def dimension_plotter(
signs = [get_sign(charge) for _, charge in parsed_species]

species_labels = [
rf"$\mathregular{{{element}^{{{abs(charge)}{sign}}}}}}}$"
rf"$\mathregular{{{element}^{{{abs(charge)}{sign}}}}}$"
for (element, charge), sign in zip(parsed_species, signs)
]

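The `plotter.py` fix removes two surplus closing braces from the mathtext label template. In an f-string every `}}` emits one literal `}`, so the old template ended with three literal `}` against only two opening `{`, producing an unbalanced label. The sketch below wraps both templates, copied from the hunk, in hypothetical helpers (`old_label`, `new_label`) so their output can be compared. The element, charge and sign arguments are illustrative, and the library's `get_sign` is not reproduced.

```python
# Both templates are copied from the plotter.py hunk; the helpers are hypothetical.
def old_label(element: str, charge: float, sign: str) -> str:
    return rf"$\mathregular{{{element}^{{{abs(charge)}{sign}}}}}}}$"


def new_label(element: str, charge: float, sign: str) -> str:
    return rf"$\mathregular{{{element}^{{{abs(charge)}{sign}}}}}$"


def braces_balanced(label: str) -> bool:
    """True when the label has as many '{' as '}'."""
    return label.count("{") == label.count("}")


print(old_label("Fe", -2.5, "-"))  # $\mathregular{Fe^{2.5-}}}$  (one '}' too many)
print(new_label("Fe", -2.5, "-"))  # $\mathregular{Fe^{2.5-}}$
```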
3 changes: 3 additions & 0 deletions src/elementembeddings/tests/test_utils.py
@@ -57,3 +57,6 @@ def test_parse_species(self):
assert species.parse_species("Fe1-") == ("Fe", -1)
assert species.parse_species("Fe+") == ("Fe", 1)
assert species.parse_species("Fe-") == ("Fe", -1)
assert species.parse_species("Fe2.5+") == ("Fe", 2.5)
assert species.parse_species("Fe2.5-") == ("Fe", -2.5)
assert species.parse_species("Fe2.555+") == ("Fe", 2.555)
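The test cases in this hunk pin down the expected behaviour: an element symbol, an optional integer or decimal magnitude, then a sign, where a bare sign means a magnitude of 1. A minimal stand-alone parser that satisfies all seven assertions shown here might look like the following. `parse_species_sketch` and its regular expression are hypothetical and are not the library's `parse_species`.

```python
from __future__ import annotations

import re

# Hypothetical pattern: element symbol, optional int/decimal magnitude, optional sign.
_SPECIES_RE = re.compile(r"^([A-Z][a-z]?)(\d+(?:\.\d+)?)?([+-]?)$")


def parse_species_sketch(species: str) -> tuple[str, float]:
    """Split a species string such as 'Fe2.5+' into (element, charge)."""
    match = _SPECIES_RE.match(species)
    if match is None:
        raise ValueError(f"cannot parse species string: {species!r}")
    element, magnitude, sign = match.groups()
    if magnitude is None and not sign:
        return element, 0  # a bare symbol such as 'Fe' is treated as neutral
    # A sign with no digits ('Fe+', 'Fe-') means a magnitude of 1.
    value = float(magnitude) if magnitude else 1.0
    if value.is_integer():
        value = int(value)  # keep integral states as ints: 2 rather than 2.0
    return element, -value if sign == "-" else value


print(parse_species_sketch("Fe2.5-"))  # ('Fe', -2.5)
```

Returning an `int` for integral magnitudes is a design choice in this sketch, so that a plot label would read `2+` rather than `2.0+`; the assertions in the hunk would pass either way, because `2.0 == 2` in Python.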
4 changes: 2 additions & 2 deletions src/elementembeddings/utils/species.py
@@ -34,8 +34,8 @@ def _parse_species_old(species: str) -> tuple[str, int]:
"""
ele = re.match(r"[A-Za-z]+", species).group(0)

charge_match = re.search(r"\d+", species)
ox_state = int(charge_match.group(0)) if charge_match else 0
charge_match = re.search(r"(\d+\.\d+|\d+)", species)
ox_state = float(charge_match.group(1)) if charge_match else 0

if "-" in species:
ox_state *= -1
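The `species.py` change swaps the pattern `\d+` for `(\d+\.\d+|\d+)` and `int()` for `float()`. The sketch below runs both versions of that extraction step side by side. `old_magnitude` and `new_magnitude` are hypothetical wrappers around the two code paths in the hunk, with the sign handling (`if "-" in species`) left out.

```python
import re


def old_magnitude(species: str) -> int:
    # Before this PR: first run of digits, parsed as an int.
    match = re.search(r"\d+", species)
    return int(match.group(0)) if match else 0


def new_magnitude(species: str) -> float:
    # After this PR: a decimal number if present, otherwise an integer, parsed as a float.
    match = re.search(r"(\d+\.\d+|\d+)", species)
    return float(match.group(1)) if match else 0


for s in ["Fe2+", "Fe2.5+", "Fe2.555-"]:
    print(s, old_magnitude(s), new_magnitude(s))

# Alternation is tried left to right, so the decimal branch must come first:
print(re.search(r"(\d+|\d+\.\d+)", "Fe2.5+").group(1))  # 2
```

Python's `re` accepts the first alternative that matches at a given position, which is why the decimal branch is listed first; `\d+(?:\.\d+)?` is an equivalent single-branch pattern. Because `float()` is now applied unconditionally, an integer state such as `Fe2+` yields `2.0`.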