Commit
Merge pull request #68 from jan-janssen/book
Add jupyter book
Showing 6 changed files with 212 additions and 0 deletions.
@@ -0,0 +1,23 @@
name: Jupyterbook

on:
  pull_request:
    branches: [ main ]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v4
    - uses: conda-incubator/setup-miniconda@v3
      with:
        auto-update-conda: true
        python-version: "3.11"
        auto-activate-base: false
    - name: Install Jupyterbook
      shell: bash -l {0}
      run: |
        conda install -y -c conda-forge jupyter-book
        jupyter-book build . --path-output public
    - run: mv public/_build/html public_html
    - run: touch public_html/.nojekyll
@@ -0,0 +1,30 @@
name: Deploy

on:
  push:
    branches: [ main ]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v4
    - uses: conda-incubator/setup-miniconda@v3
      with:
        auto-update-conda: true
        python-version: "3.11"
        auto-activate-base: false
    - name: Install Jupyterbook
      shell: bash -l {0}
      run: |
        conda install -y -c conda-forge jupyter-book
        jupyter-book build . --path-output public
    - run: mv public/_build/html public_html
    - run: touch public_html/.nojekyll
    - name: Deploy 🚀
      uses: JamesIves/[email protected]
      with:
        GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        BRANCH: gh-pages # The branch the action should deploy to.
        FOLDER: public_html # The folder the action should deploy.
        CLEAN: true
@@ -0,0 +1,13 @@
title: LangSim
author: Jan Janssen

execute:
  execute_notebooks : off

repository:
  url : https://github.com/jan-janssen/langsim
  path_to_book : ""

launch_buttons:
  notebook_interface : jupyterlab
  binderhub_url : https://mybinder.org
@@ -0,0 +1,4 @@
format: jb-book
root: book/Introduction.md
chapters:
- file: book/Benchmark.md
@@ -0,0 +1,104 @@
# Benchmark
Hallucinations of large language models (LLMs) limit their use for controlling atomistic simulations. Beginners in
particular find it hard to judge the results an LLM predicts. Even when very specific inputs are provided, the
current generation of LLMs struggles to create the input for popular software tools like the [Atomic Simulation Environment (ASE)](https://wiki.fysik.dtu.dk/ase/).

Example Input:
```
Calculate the bulk modulus for aluminium with ASE using the EMT simulation code.
```

## ChatGPT 3.5
On 2 May 2024, ChatGPT 3.5 replied with the following Python code:
```python
from ase import Atoms
from ase.calculators.emt import EMT
from ase.optimize import BFGS
from ase.io import write
import numpy as np

# Step 1: Create aluminum structure
aluminum = Atoms('Al', [(0, 0, 0)], cell=(5, 5, 5))

# Step 2: Relax the structure
relaxation = BFGS(aluminum)
relaxation.run(fmax=0.01)

# Step 3: Apply hydrostatic pressure
pressures = np.linspace(0, 10, 11)  # Example pressures in GPa
energies = []

for pressure in pressures:
    volume_scaling_factor = (1 - 3 * pressure * 1e-9) ** (1 / 3)
    strained_structure = aluminum.copy()
    strained_structure.set_cell(aluminum.cell * volume_scaling_factor, scale_atoms=True)

    # Step 4: Calculate energy
    strained_structure.set_calculator(EMT())
    energy = strained_structure.get_potential_energy()
    energies.append(energy)

# Step 5: Fit equation of state
# (Not included here, but you can use scipy.optimize.curve_fit)

# Step 6: Determine bulk modulus
# (From the fitted equation of state)
```
This is wrong in a number of ways:
* No code is provided for Step 5 and Step 6, so a beginner without prior ASE experience has no chance of
  completing this Python script.
* By applying 11 pressures ranging from 0 to 10, only one half of the energy-volume curve is calculated, so even
  with steps 5 and 6 implemented this script would lead to poor predictions.
* Finally, the script has unused imports like `from ase.io import write` and does not use the `calculate_eos()` function
  which ASE provides to simplify the energy-volume curve calculation.

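To make the second point in the list above concrete, the following minimal sketch evaluates the scaling factors of the quoted script in plain Python, without ASE. It only reproduces the arithmetic of the ChatGPT 3.5 code; the names `volume_ratios`, `largest_change` and `only_compression` are introduced here for illustration and are not part of the original script.

```python
# Pressures exactly as in the quoted script: np.linspace(0, 10, 11) gives 0, 1, ..., 10.
pressures = [float(p) for p in range(11)]

# Linear scaling factor that the quoted script applies to the cell vectors.
scaling_factors = [(1 - 3 * p * 1e-9) ** (1 / 3) for p in pressures]

# Scaling all cell vectors by s changes the volume by s**3.
volume_ratios = [s ** 3 for s in scaling_factors]

# Illustrative diagnostics (names are assumptions, not from the source).
largest_change = max(abs(1.0 - v) for v in volume_ratios)
only_compression = all(v <= 1.0 for v in volume_ratios)

print(f"largest relative volume change: {largest_change:.1e}")
print(f"compression only: {only_compression}")
```

Under these assumptions, every sampled volume is at or below the starting volume, and the volumes differ by only a few parts in 10^8, which is probably too small a range to fit an equation of state reliably.
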
## ChatGPT 4.0
On 2 May 2024, ChatGPT 4.0 replied with the following Python code:
```python
from ase.build import bulk
from ase.calculators.emt import EMT
from ase.eos import calculate_eos

# Create a bulk aluminium structure (face-centered cubic)
al = bulk('Al', 'fcc', a=4.05)  # You can adjust 'a' to experimental lattice constant

# Set up the EMT calculator
al.calc = EMT()

# Calculate the equation of state
eos = calculate_eos(al)
v0, e0, B = eos.fit()

# Print the bulk modulus (B is in eV/Angstrom^3 which can be converted to GPa)
print(f"Bulk Modulus: {B / 1.60219e-19 * 1e21 / 1e9} GPa")
```
This code uses the `calculate_eos()` function ASE provides to simplify the calculation of the bulk modulus, and it is a
complete code example which can be executed without any modification. The only mistake is the unit conversion to
gigapascal (GPa). The correct unit conversion is:
```python
from ase.units import kJ
print(B / kJ * 1.0e24, 'GPa')
```
This information is even provided in the docstring of the `fit()` method of the object returned by `calculate_eos()`:
```
Signature: eos.fit(warn=True)
Docstring:
Calculate volume, energy, and bulk modulus.
Returns the optimal volume, the minimum energy, and the bulk
modulus. Notice that the ASE units for the bulk modulus is
eV/Angstrom^3 - to get the value in GPa, do this::
    v0, e0, B = eos.fit()
    print(B / kJ * 1.0e24, 'GPa')
```
So the result provided by ChatGPT 4.0 is 90% correct, and a scientist without prior knowledge of ASE would be able to
correct the unit conversion, but the risk of disregarding a calculation because of a wrong unit conversion is too high.

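The size of the unit error can be checked by hand. The sketch below derives the eV/Angstrom^3 to GPa factor from SI constants and compares it with the expression printed by the ChatGPT 4.0 script. It is an illustration only: the value `b_example` is a hypothetical number and not a computed bulk modulus, and the constant names are assumptions made here. The result should agree with ASE's `B / kJ * 1.0e24` up to the precision of the elementary charge that ASE uses.

```python
# Exact SI value of one electronvolt in joule, and one Angstrom in metre.
EV_IN_JOULE = 1.602176634e-19
ANGSTROM_IN_METRE = 1.0e-10

# 1 eV/Angstrom^3 expressed in pascal, then in gigapascal.
pascal_per_ev_a3 = EV_IN_JOULE / ANGSTROM_IN_METRE ** 3
gpa_per_ev_a3 = pascal_per_ev_a3 / 1.0e9


def chatgpt4_conversion(b):
    """Reproduce the conversion expression from the quoted ChatGPT 4.0 script."""
    return b / 1.60219e-19 * 1e21 / 1e9


# Hypothetical bulk modulus in eV/Angstrom^3, chosen only for illustration.
b_example = 0.25

correct = b_example * gpa_per_ev_a3
wrong = chatgpt4_conversion(b_example)

print(f"conversion factor: {gpa_per_ev_a3:.4f} GPa per eV/Angstrom^3")
print(f"correct: {correct:.2f} GPa, ChatGPT 4.0 expression: {wrong:.3e} GPa")
```

The quoted expression divides by the electronvolt-to-joule factor instead of multiplying by it, so its result is off by many orders of magnitude rather than by a small amount.
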
## Summary
While the performance improves with increasing training size from ChatGPT 3.5 to 4.0, the risk of small hallucinations,
like a wrong unit conversion leading to a wrong calculation result, is too high. For science, it is not sufficient to be
right 90% of the time or even 99%.

Based on this experience, the LangSim team decided to develop simulation agents which can be called from the LLM to
produce reliable and scientifically correct predictions.
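
One way to picture this idea is that the LLM only selects a function and its arguments, while the physics and the unit handling live in tested code. The sketch below is purely hypothetical and does not show LangSim's actual interface: the names `TOOLS`, `dispatch` and `convert_bulk_modulus`, as well as the JSON layout of the call, are invented here for illustration.

```python
import json

# Exact SI value of one electronvolt in joule, and one Angstrom in metre.
EV_IN_JOULE = 1.602176634e-19
ANGSTROM_IN_METRE = 1.0e-10


def convert_bulk_modulus(value_ev_per_a3: float) -> float:
    """Convert a bulk modulus from eV/Angstrom^3 to GPa (hypothetical tool)."""
    return value_ev_per_a3 * EV_IN_JOULE / ANGSTROM_IN_METRE ** 3 / 1.0e9


# Hypothetical registry of functions that a language model is allowed to call.
TOOLS = {"convert_bulk_modulus": convert_bulk_modulus}


def dispatch(tool_call_json: str) -> float:
    """Run the tool named in a JSON request; unknown names raise KeyError."""
    call = json.loads(tool_call_json)
    func = TOOLS[call["name"]]
    return func(**call["arguments"])


# A request as a model might emit it (format assumed for this sketch).
request = '{"name": "convert_bulk_modulus", "arguments": {"value_ev_per_a3": 0.25}}'
result = dispatch(request)
print(f"{result:.2f} GPa")
```

In such a design the model never writes the conversion itself, so an error like the one in the ChatGPT 4.0 script cannot reach the result through generated code.
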
@@ -0,0 +1,38 @@
# LangSim
The LangSim project aims to couple large language models with atomistic simulation and provide a Language Simulation
Engine (LangSim). This project started as part of the [LLM Hackathon for Applications in Materials and Chemistry](https://www.eventbrite.com/e/llm-hackathon-for-applications-in-materials-and-chemistry-tickets-868303598437)
on May 8th 2024, organized by [Benjamin J. Blaiszik](https://github.com/blaiszik) from Argonne
National Laboratory. The LangSim team won the [first prize](https://medium.com/@blaiszik/llms-to-accelerate-discovery-in-materials-science-and-chemistry-refections-on-a-hackathon-b8364ca32242)
sponsored by [RadicalAI](https://www.radical-ai.com).

## Winning submission
The aim of the hackathon was to present a video which demonstrates the functionality of the prototype and highlights the
use case for large language models and their application to materials and chemistry:
[![Demo](https://img.youtube.com/vi/7JFncD9WaIY/0.jpg)](https://www.youtube.com/watch?v=7JFncD9WaIY)

A full list of all submissions is available on [github.com/llmhackathon](https://github.com/llmhackathon).

## Contributors
Led by: [Jan Janssen](https://github.com/jan-janssen) (Max-Planck Institute for Sustainable Materials)

List of contributors to the LangSim project during the hackathon:
* [Yuan Chiang](https://github.com/chiang-yuan) (UC Berkeley, Lawrence Berkeley National Laboratory)
* [Giuseppe Fisicaro](https://github.com/giuseppefisicaro) (CNR Institute for Microelectronics and Microsystems)
* [Greg Juhasz](https://github.com/gjuhasz) (Tokyo Institute of Technology)
* [Sarom Leang](https://github.com/saromleang) (EP Analytics, Inc.)
* [Bernadette Mohr](https://github.com/Bernadette-Mohr) (FAIRmat — HU Berlin, University of Amsterdam)
* [Utkarsh Pratiush](https://github.com/utkarshp1161) (University of Tennessee, Knoxville)
* [Francesco Ricci](https://github.com/fraricci) (Lawrence Berkeley National Laboratory)
* [Leopold Talirz](https://github.com/ltalirz) (Schott)
* [Pablo Andres Unzueta](https://github.com/pablo-unzueta) (Stanford University)
* [Trung Vo](https://github.com/btrungvo) (University of Illinois Chicago)
* [Gabriel Vogel](https://github.com/GaVogel) (Delft University of Technology)
* [Sebastian Pagel](https://github.com/pagel-s) (University of Glasgow)

Collaboration as part of the [Center for Scientific Foundation Models](https://scifm.ai):
* [Mohammad Babar](https://github.com/mbabar09) (University of Michigan)
* [Ziqi Wang](https://github.com/wuziqiqiqi) (University of Michigan)
* [Hancheng Zhao](https://github.com/hancheng2000) (University of Michigan)

Students at the [Max-Planck Institute for Sustainable Materials](https://www.mpie.de):
* [Kishan Limbasiya](https://github.com/limbasiya521)