Merge pull request #68 from jan-janssen/book
Add jupyter book
jan-janssen authored Sep 13, 2024
2 parents b54150f + 748ab37 commit 8b961cc
Showing 6 changed files with 212 additions and 0 deletions.
23 changes: 23 additions & 0 deletions .github/workflows/book.yml
@@ -0,0 +1,23 @@
name: Jupyterbook

on:
  pull_request:
    branches: [ main ]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v4
    - uses: conda-incubator/setup-miniconda@v3
      with:
        auto-update-conda: true
        python-version: "3.11"
        auto-activate-base: false
    - name: Install Jupyterbook
      shell: bash -l {0}
      run: |
        conda install -y -c conda-forge jupyter-book
        jupyter-book build . --path-output public
    - run: mv public/_build/html public_html
    - run: touch public_html/.nojekyll
30 changes: 30 additions & 0 deletions .github/workflows/deploy.yml
@@ -0,0 +1,30 @@
name: Deploy

on:
  push:
    branches: [ main ]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v4
    - uses: conda-incubator/setup-miniconda@v3
      with:
        auto-update-conda: true
        python-version: "3.11"
        auto-activate-base: false
    - name: Install Jupyterbook
      shell: bash -l {0}
      run: |
        conda install -y -c conda-forge jupyter-book
        jupyter-book build . --path-output public
    - run: mv public/_build/html public_html
    - run: touch public_html/.nojekyll
    - name: Deploy 🚀
      uses: JamesIves/[email protected]
      with:
        GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        BRANCH: gh-pages # The branch the action should deploy to.
        FOLDER: public_html # The folder the action should deploy.
        CLEAN: true
13 changes: 13 additions & 0 deletions _config.yml
@@ -0,0 +1,13 @@
title: LangSim
author: Jan Janssen

execute:
  execute_notebooks : off

repository:
  url : https://github.com/jan-janssen/langsim
  path_to_book : ""

launch_buttons:
  notebook_interface : jupyterlab
  binderhub_url : https://mybinder.org
4 changes: 4 additions & 0 deletions _toc.yml
@@ -0,0 +1,4 @@
format: jb-book
root: book/Introduction.md
chapters:
- file: book/Benchmark.md
104 changes: 104 additions & 0 deletions book/Benchmark.md
@@ -0,0 +1,104 @@
# Benchmark
Hallucinations of large language models (LLMs) limit their use for controlling atomistic simulations. It is especially
hard for beginners to judge the results an LLM predicts. Even when very specific inputs are provided, the
current generation of LLMs struggles to create the input for popular software tools like the [Atomic Simulation Environment (ASE)](https://wiki.fysik.dtu.dk/ase/).

Example Input:
```
Calculate the bulk modulus for aluminium with ASE using the EMT simulation code.
```

## ChatGPT 3.5
On the 2nd of May 2024, ChatGPT 3.5 replied with the following Python code:
```python
from ase import Atoms
from ase.calculators.emt import EMT
from ase.optimize import BFGS
from ase.io import write
import numpy as np

# Step 1: Create aluminum structure
aluminum = Atoms('Al', [(0, 0, 0)], cell=(5, 5, 5))

# Step 2: Relax the structure
relaxation = BFGS(aluminum)
relaxation.run(fmax=0.01)

# Step 3: Apply hydrostatic pressure
pressures = np.linspace(0, 10, 11) # Example pressures in GPa
energies = []

for pressure in pressures:
    volume_scaling_factor = (1 - 3 * pressure * 1e-9) ** (1 / 3)
    strained_structure = aluminum.copy()
    strained_structure.set_cell(aluminum.cell * volume_scaling_factor, scale_atoms=True)

    # Step 4: Calculate energy
    strained_structure.set_calculator(EMT())
    energy = strained_structure.get_potential_energy()
    energies.append(energy)

# Step 5: Fit equation of state
# (Not included here, but you can use scipy.optimize.curve_fit)

# Step 6: Determine bulk modulus
# (From the fitted equation of state)
```
This is wrong in a number of ways:
* No code is provided for Step 5 and Step 6 - a beginner without prior experience with ASE has no chance to
  complete this Python script.
* By applying 11 pressures ranging from 0 to 10, the script only compresses the cell, so only one side of the
  energy-volume curve is calculated. Even with the code for steps 5 and 6 filled in, this script would lead to poor
  predictions.
* Finally, the script has unused imports like `from ase.io import write` and does not use the `calculate_eos()` function
  which ASE provides to simplify the energy-volume curve calculation.
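
The need to sample both sides of the minimum can be illustrated without ASE. The following is a minimal sketch, not the ASE workflow: `model_energy` is a hypothetical harmonic stand-in for a real calculator such as EMT, and `V0_TRUE` and `B_TRUE` are assumed values chosen only for illustration. The bulk modulus is estimated as `B = V * d2E/dV2` from volumes that bracket the energy minimum.

```python
# Hypothetical stand-in for a real calculator such as EMT: a harmonic
# energy-volume curve with assumed parameters (not fitted to aluminium).
V0_TRUE = 16.0   # equilibrium volume in Angstrom^3 (assumed)
B_TRUE = 0.25    # bulk modulus in eV/Angstrom^3 (assumed)


def model_energy(volume):
    """Energy in eV of the assumed harmonic model at the given volume."""
    return 0.5 * B_TRUE / V0_TRUE * (volume - V0_TRUE) ** 2


def bulk_modulus_central(energy, v0, rel_step=0.02):
    """Estimate B = V * d2E/dV2 at v0 from volumes on both sides of v0."""
    h = rel_step * v0
    curvature = (energy(v0 + h) - 2.0 * energy(v0) + energy(v0 - h)) / h**2
    return v0 * curvature


# Volumes from 95 % to 105 % of V0 bracket the minimum on both sides.
volumes = [V0_TRUE * (0.95 + 0.025 * i) for i in range(5)]
energies = [model_energy(v) for v in volumes]
v_min = volumes[energies.index(min(energies))]

b_estimate = bulk_modulus_central(model_energy, v_min)
print(f"minimum near V = {v_min:.2f} A^3, B = {b_estimate:.4f} eV/A^3")
```

With a scan that only compresses the cell, the lowest sampled energy sits at the edge of the scan, so neither the equilibrium volume nor the curvature around it is well determined.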

## ChatGPT 4.0
On the 2nd of May 2024, ChatGPT 4.0 replied with the following Python code:
```python
from ase.build import bulk
from ase.calculators.emt import EMT
from ase.eos import calculate_eos

# Create a bulk aluminium structure (face-centered cubic)
al = bulk('Al', 'fcc', a=4.05) # You can adjust 'a' to experimental lattice constant

# Set up the EMT calculator
al.calc = EMT()

# Calculate the equation of state
eos = calculate_eos(al)
v0, e0, B = eos.fit()

# Print the bulk modulus (B is in eV/Angstrom^3 which can be converted to GPa)
print(f"Bulk Modulus: {B / 1.60219e-19 * 1e21 / 1e9} GPa")
```
This code uses the `calculate_eos()` function ASE provides to simplify the calculation of the bulk modulus and it is a
complete code example which can be executed without any modification. The only mistake is the unit conversion to
Gigapascal (GPa). The correct unit conversion is:
```python
from ase.units import kJ
print(B / kJ * 1.0e24, 'GPa')
```
This information is even provided in the docstring of the `eos.fit()` method:
```
Signature: eos.fit(warn=True)
Docstring:
Calculate volume, energy, and bulk modulus.
Returns the optimal volume, the minimum energy, and the bulk
modulus. Notice that the ASE units for the bulk modulus is
eV/Angstrom^3 - to get the value in GPa, do this::
  v0, e0, B = eos.fit()
  print(B / kJ * 1.0e24, 'GPa')
```
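
The size of the error can be checked with plain arithmetic. The sketch below uses only the SI definition of the electronvolt and the ångström, and the constant names are chosen here for illustration. It computes the factor that converts eV/Angstrom^3 to GPa and compares it with the factor implied by the expression in the ChatGPT 4.0 reply. Because ASE's `kJ` is 1000 J expressed in eV, `B / kJ * 1.0e24` reduces to the same conversion factor of about 160.2.

```python
ELEMENTARY_CHARGE = 1.602176634e-19  # joule per electronvolt (exact SI value)
ANGSTROM = 1.0e-10                   # metre

# 1 eV/Angstrom^3 expressed in pascal, then in gigapascal.
EV_PER_A3_IN_GPA = ELEMENTARY_CHARGE / ANGSTROM**3 / 1.0e9

# Factor implied by the expression in the ChatGPT 4.0 reply.
REPLY_FACTOR = 1.0 / 1.60219e-19 * 1e21 / 1e9

b_example = 0.25  # eV/Angstrom^3, an arbitrary illustrative value
print(f"correct factor: {EV_PER_A3_IN_GPA:.4f} GPa per eV/A^3")
print(f"reply factor:   {REPLY_FACTOR:.3e}")
print(f"{b_example} eV/A^3 = {b_example * EV_PER_A3_IN_GPA:.2f} GPa")
```

The reply divides by the elementary charge where it should multiply, so its result is off by many orders of magnitude, not by a rounding error.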
So the result provided by ChatGPT 4.0 is 90% correct, and a scientist without prior knowledge of ASE would be able to
correct the unit conversion. Still, the risk of discarding a calculation because of a wrong unit conversion is too high.

## Summary
While the performance improves with increasing training size from ChatGPT 3.5 to 4.0, the risk of small hallucinations
like a wrong unit conversion leading to a wrong calculation result is too high. For science, it is not sufficient to be
right 90% of the time or even 99%.

Based on this experience, the LangSim team decided to develop simulation agents which can be called from the LLM to
produce reliable and scientifically correct predictions.
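
The idea of calling vetted functions instead of generating code can be sketched as follows. This is a hypothetical illustration and not the LangSim implementation: the `TOOLS` registry, the `tool` decorator, the `dispatch` helper, and the JSON request format are all assumptions made for this example. The point is that the LLM only selects a function and its arguments, while the unit conversion lives in tested Python code.

```python
import json

# Hypothetical registry mapping tool names to vetted Python functions.
TOOLS = {}


def tool(func):
    """Register a function so an LLM front end could call it by name."""
    TOOLS[func.__name__] = func
    return func


@tool
def convert_bulk_modulus_to_gpa(bulk_modulus_ev_per_a3: float) -> float:
    """Convert a bulk modulus from eV/Angstrom^3 to GPa with a fixed factor."""
    return bulk_modulus_ev_per_a3 * 160.2176634


def dispatch(tool_call_json: str):
    """Run the tool named in a request like {"name": ..., "arguments": {...}}."""
    request = json.loads(tool_call_json)
    func = TOOLS[request["name"]]
    return func(**request["arguments"])


# An assumed tool call, as an LLM might emit it, with an illustrative value.
call = '{"name": "convert_bulk_modulus_to_gpa", "arguments": {"bulk_modulus_ev_per_a3": 0.25}}'
result = dispatch(call)
print(f"{result:.2f} GPa")
```

An unknown tool name raises a `KeyError` here instead of silently producing a number, which is the behaviour one would want from such a layer.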
38 changes: 38 additions & 0 deletions book/Introduction.md
@@ -0,0 +1,38 @@
# LangSim
The LangSim project aims to couple Large Language Models with atomistic simulation and provide a Language Simulation
Engine (LangSim). This project started as part of the [LLM Hackathon for Applications in Materials and Chemistry](https://www.eventbrite.com/e/llm-hackathon-for-applications-in-materials-and-chemistry-tickets-868303598437)
on May 8th 2024 organized by [Benjamin J. Blaiszik](https://github.com/blaiszik) from Argonne
National Laboratory. The LangSim team was able to [win the first prize](https://medium.com/@blaiszik/llms-to-accelerate-discovery-in-materials-science-and-chemistry-refections-on-a-hackathon-b8364ca32242)
sponsored by [RadicalAI](https://www.radical-ai.com).

## Winning submission
The aim of the hackathon was to present a video which demonstrates the functionality of the prototype and highlights the
use case for Large Language Models and their application to materials and chemistry:
[![Demo](https://img.youtube.com/vi/7JFncD9WaIY/0.jpg)](https://www.youtube.com/watch?v=7JFncD9WaIY)

A full list of all submissions is available on [github.com/llmhackathon](https://github.com/llmhackathon).

## Contributors
Led by: [Jan Janssen](https://github.com/jan-janssen) (Max-Planck Institute for Sustainable Materials)

List of contributors to the LangSim project during the hackathon:
* [Yuan Chiang](https://github.com/chiang-yuan) (UC Berkeley, Lawrence Berkeley National Laboratory)
* [Giuseppe Fisicaro](https://github.com/giuseppefisicaro) (CNR Institute for Microelectronics and Microsystems)
* [Greg Juhasz](https://github.com/gjuhasz) (Tokyo Institute of Technology)
* [Sarom Leang](https://github.com/saromleang) (EP Analytics, Inc.)
* [Bernadette Mohr](https://github.com/Bernadette-Mohr) (FAIRmat — HU Berlin, University of Amsterdam)
* [Utkarsh Pratiush](https://github.com/utkarshp1161) (University of Tennessee, Knoxville)
* [Francesco Ricci](https://github.com/fraricci) (Lawrence Berkeley National Laboratory)
* [Leopold Talirz](https://github.com/ltalirz) (Schott)
* [Pablo Andres Unzueta](https://github.com/pablo-unzueta) (Stanford University)
* [Trung Vo](https://github.com/btrungvo) (University of Illinois Chicago)
* [Gabriel Vogel](https://github.com/GaVogel) (Delft University of Technology)
* [Sebastian Pagel](https://github.com/pagel-s) (University of Glasgow)

Collaboration as part of the [Center for Scientific Foundation Models](https://scifm.ai):
* [Mohammad Babar](https://github.com/mbabar09) (University of Michigan)
* [Ziqi Wang](https://github.com/wuziqiqiqi) (University of Michigan)
* [Hancheng Zhao](https://github.com/hancheng2000) (University of Michigan)

Students at the [Max-Planck Institute for Sustainable Materials](https://www.mpie.de):
* [Kishan Limbasiya](https://github.com/limbasiya521)
