Merge pull request #68 from jan-janssen/book

Add jupyter book
jan-janssen · Sep 13, 2024 · 8b961cc
2 parents b54150f + 748ab37

Showing 6 changed files with 212 additions and 0 deletions.
diff --git a/.github/workflows/book.yml b/.github/workflows/book.yml
@@ -0,0 +1,23 @@
+name: Jupyterbook
+
+on:
+  pull_request:
+    branches: [ main ]
+
+jobs:
+  build:
+    runs-on: ubuntu-latest
+    steps:
+    - uses: actions/checkout@v4
+    - uses: conda-incubator/setup-miniconda@v3
+      with:
+        auto-update-conda: true
+        python-version: "3.11"
+        auto-activate-base: false
+    - name: Install Jupyterbook
+      shell: bash -l {0}
+      run: |
+        conda install -y -c conda-forge jupyter-book 
+        jupyter-book build . --path-output public
+    - run: mv public/_build/html public_html
+    - run: touch public_html/.nojekyll
diff --git a/.github/workflows/deploy.yml b/.github/workflows/deploy.yml
@@ -0,0 +1,30 @@
+name: Deploy
+
+on:
+  push:
+    branches: [ main ]
+
+jobs:
+  build:
+    runs-on: ubuntu-latest
+    steps:
+    - uses: actions/checkout@v4
+    - uses: conda-incubator/setup-miniconda@v3
+      with:
+        auto-update-conda: true
+        python-version: "3.11"
+        auto-activate-base: false
+    - name: Install Jupyterbook
+      shell: bash -l {0}
+      run: |
+        conda install -y -c conda-forge jupyter-book 
+        jupyter-book build . --path-output public
+    - run: mv public/_build/html public_html
+    - run: touch public_html/.nojekyll
+    - name: Deploy 🚀
+      uses: JamesIves/[email protected]
+      with:
+        GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+        BRANCH: gh-pages # The branch the action should deploy to.
+        FOLDER: public_html # The folder the action should deploy.
+        CLEAN: true
diff --git a/_config.yml b/_config.yml
@@ -0,0 +1,13 @@
+title: LangSim
+author: Jan Janssen
+
+execute:
+  execute_notebooks           : off
+
+repository:
+    url                       : https://github.com/jan-janssen/langsim
+    path_to_book              : ""
+
+launch_buttons:
+  notebook_interface          : jupyterlab
+  binderhub_url               : https://mybinder.org
diff --git a/_toc.yml b/_toc.yml
@@ -0,0 +1,4 @@
+format: jb-book
+root: book/Introduction.md
+chapters:
+- file: book/Benchmark.md
diff --git a/book/Benchmark.md b/book/Benchmark.md
@@ -0,0 +1,104 @@
+# Benchmark
+The hallucinations of large language models (LLMs) limit their use for controlling atomistic simulations. It is
+especially hard for beginners to judge the results predicted by an LLM. Even when very specific inputs are provided, the
+current generation of LLMs struggles to create the input for popular software tools like the [Atomic Simulation Environment (ASE)](https://wiki.fysik.dtu.dk/ase/).
+
+Example Input:
+```
+Calculate the bulk modulus for aluminium with ASE using the EMT simulation code.
+```
+
+## ChatGPT 3.5
+On the 2nd of May 2024, ChatGPT 3.5 replied with the following Python code:
+```python
+from ase import Atoms
+from ase.calculators.emt import EMT
+from ase.optimize import BFGS
+from ase.io import write
+import numpy as np
+
+# Step 1: Create aluminum structure
+aluminum = Atoms('Al', [(0, 0, 0)], cell=(5, 5, 5))
+
+# Step 2: Relax the structure
+relaxation = BFGS(aluminum)
+relaxation.run(fmax=0.01)
+
+# Step 3: Apply hydrostatic pressure
+pressures = np.linspace(0, 10, 11)  # Example pressures in GPa
+energies = []
+
+for pressure in pressures:
+    volume_scaling_factor = (1 - 3 * pressure * 1e-9) ** (1 / 3)
+    strained_structure = aluminum.copy()
+    strained_structure.set_cell(aluminum.cell * volume_scaling_factor, scale_atoms=True)
+
+    # Step 4: Calculate energy
+    strained_structure.set_calculator(EMT())
+    energy = strained_structure.get_potential_energy()
+    energies.append(energy)
+
+# Step 5: Fit equation of state
+# (Not included here, but you can use scipy.optimize.curve_fit)
+
+# Step 6: Determine bulk modulus
+# (From the fitted equation of state)
+```
+This is wrong in a number of ways:
+* No code is provided for Step 5 and Step 6, so a beginner without prior experience with ASE has no chance of
+  completing this Python script.
+* By applying 11 pressures ranging from 0 to 10, all of which shrink the cell, only one side of the energy-volume curve
+  (the compressed side) is calculated, so even with code for Steps 5 and 6 this script would lead to poor predictions.
+* Finally, the script has unused imports like `from ase.io import write` and does not use the `calculate_eos()` function
+  which ASE provides to simplify energy-volume curve calculations.
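
The sampling problem in the second bullet, and the missing Steps 5 and 6, can be sketched in plain Python. This is a minimal illustration and not an EMT calculation: the energy model is a synthetic parabola, and `V0_TRUE`, `B_TRUE`, `E0_TRUE`, `v_guess` and `fit_parabola` are hypothetical names and values chosen only for this example. A parabola is also only a rough stand-in for a proper equation-of-state fit.

```python
# Part 1: cell scaling factors produced by the loop in the ChatGPT 3.5 script.
pressures = [float(p) for p in range(11)]  # same values as np.linspace(0, 10, 11)
script_factors = [(1 - 3 * p * 1e-9) ** (1 / 3) for p in pressures]
one_sided = all(f <= 1.0 for f in script_factors)   # only compression is sampled
largest_compression = 1.0 - min(script_factors)     # and the strain is tiny
print("one-sided:", one_sided, "largest compression:", largest_compression)

# Part 2: sample volumes on both sides of a guess and fit a parabola (Steps 5 and 6).
V0_TRUE, B_TRUE, E0_TRUE = 16.0, 0.5, -3.0  # hypothetical: A^3, eV/A^3, eV


def synthetic_energy(volume):
    """Harmonic model E(V) = E0 + B/(2*V0) * (V - V0)**2, a stand-in for a calculator."""
    return E0_TRUE + 0.5 * B_TRUE / V0_TRUE * (volume - V0_TRUE) ** 2


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_parabola(volumes, energies):
    """Least-squares fit of E = a*x**2 + b*x + c with x = V - mean(V)."""
    n = len(volumes)
    v_mean = sum(volumes) / n
    x = [v - v_mean for v in volumes]
    s1, s2 = sum(x), sum(xi ** 2 for xi in x)
    s3, s4 = sum(xi ** 3 for xi in x), sum(xi ** 4 for xi in x)
    rhs = [sum(xi ** 2 * e for xi, e in zip(x, energies)),
           sum(xi * e for xi, e in zip(x, energies)),
           sum(energies)]
    matrix = [[s4, s3, s2], [s3, s2, s1], [s2, s1, n]]
    d = det3(matrix)
    coeffs = []
    for col in range(3):  # Cramer's rule on the normal equations
        m = [row[:] for row in matrix]
        for r in range(3):
            m[r][col] = rhs[r]
        coeffs.append(det3(m) / d)
    a, b, c = coeffs
    v0 = v_mean - b / (2 * a)        # volume at the energy minimum
    e0 = c - b ** 2 / (4 * a)        # minimum energy
    bulk_modulus = 2 * a * v0        # B = V0 * d2E/dV2, in eV/Angstrom^3
    return v0, e0, bulk_modulus


v_guess = 16.5  # deliberately not the true minimum
volumes = [v_guess * (0.95 + 0.01 * i) for i in range(11)]  # -5 % to +5 %
energies = [synthetic_energy(v) for v in volumes]
v0_fit, e0_fit, b_fit = fit_parabola(volumes, energies)
print("V0:", v0_fit, "E0:", e0_fit, "B (eV/A^3):", b_fit)
```

Because the sampled volumes bracket the minimum, the fit recovers the model parameters. The factors from the original script all sit within roughly 1e-8 of 1, so they would leave the curvature essentially undetermined.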
+
+## ChatGPT 4.0
+On the 2nd of May 2024, ChatGPT 4.0 replied with the following Python code:
+```python
+from ase.build import bulk
+from ase.calculators.emt import EMT
+from ase.eos import calculate_eos
+
+# Create a bulk aluminium structure (face-centered cubic)
+al = bulk('Al', 'fcc', a=4.05)  # You can adjust 'a' to experimental lattice constant
+
+# Set up the EMT calculator
+al.calc = EMT()
+
+# Calculate the equation of state
+eos = calculate_eos(al)
+v0, e0, B = eos.fit()
+
+# Print the bulk modulus (B is in eV/Angstrom^3 which can be converted to GPa)
+print(f"Bulk Modulus: {B / 1.60219e-19 * 1e21 / 1e9} GPa")
+```
+This code uses the `calculate_eos()` function ASE provides to simplify the calculation of the bulk modulus and it is a
+complete code example which can be executed without any modification. The only mistake is the unit conversion to 
+Gigapascal (GPa). The correct unit conversion is: 
+```python
+from ase.units import kJ
+print(B / kJ * 1.0e24, 'GPa')
+```
+This information is even provided in the docstring of the `fit()` method of the object returned by `calculate_eos()`:
+```
+Signature: eos.fit(warn=True)
+Docstring:
+Calculate volume, energy, and bulk modulus.
+
+Returns the optimal volume, the minimum energy, and the bulk
+modulus.  Notice that the ASE units for the bulk modulus is
+eV/Angstrom^3 - to get the value in GPa, do this::
+
+  v0, e0, B = eos.fit()
+  print(B / kJ * 1.0e24, 'GPa')
+```
+So the result provided by ChatGPT 4.0 is 90% correct, and a scientist without prior knowledge of ASE would be able to
+correct the unit conversion, but the risk of disregarding a calculation because of a wrong unit conversion is too high.
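
The size of the unit error can be checked without ASE. The sketch below derives the eV/Angstrom^3 to GPa factor from the SI definition of the electron volt and compares it with the expression printed in the ChatGPT 4.0 answer. `b_example` is a hypothetical value used only for illustration, and the constant names are not part of ASE. To my understanding, `B / kJ * 1.0e24` from `ase.units` applies the same factor of about 160.2.

```python
# 1 eV in joule (exact since the 2019 SI redefinition) and 1 Angstrom in metre.
EV_IN_JOULE = 1.602176634e-19
ANGSTROM_IN_METRE = 1e-10

# 1 eV/Angstrom^3 = EV_IN_JOULE / ANGSTROM_IN_METRE**3 Pa; divide by 1e9 for GPa.
EV_PER_A3_IN_GPA = EV_IN_JOULE / ANGSTROM_IN_METRE ** 3 / 1e9


def chatgpt4_conversion(b):
    """Conversion exactly as printed in the ChatGPT 4.0 answer."""
    return b / 1.60219e-19 * 1e21 / 1e9


b_example = 0.25  # hypothetical bulk modulus in eV/Angstrom^3
correct_gpa = b_example * EV_PER_A3_IN_GPA
wrong_gpa = chatgpt4_conversion(b_example)
print(f"factor: {EV_PER_A3_IN_GPA:.4f} GPa per eV/A^3")
print(f"correct: {correct_gpa:.2f} GPa, ChatGPT 4.0 formula: {wrong_gpa:.3e} GPa")
```

The ChatGPT 4.0 expression divides by the elementary charge instead of multiplying by it, so the printed value is off by many orders of magnitude rather than by a small factor.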
+
+## Summary
+While the performance improves with increasing training size from ChatGPT 3.5 to 4.0, the risk of small hallucinations
+like a wrong unit conversion leading to a wrong calculation result is too high. For science, it is not sufficient to be
+right 90% of the time or even 99%. 
+
+Based on this experience the LangSim team decided to develop simulation agents which can be called from the LLM to 
+produce reliable and scientifically correct predictions. 
diff --git a/book/Introduction.md b/book/Introduction.md
@@ -0,0 +1,38 @@
+# LangSim 
+The LangSim project aims to couple Large Language Models with atomistic simulation and provide a Language Simulation 
+Engine (LangSim). This project started as part of the [LLM Hackathon for Applications in Materials and Chemistry](https://www.eventbrite.com/e/llm-hackathon-for-applications-in-materials-and-chemistry-tickets-868303598437)
+on May 8th 2024 organized by [Benjamin J. Blaiszik](https://github.com/blaiszik) from Argonne 
+National Laboratory. The LangSim team was able to [win the first prize](https://medium.com/@blaiszik/llms-to-accelerate-discovery-in-materials-science-and-chemistry-refections-on-a-hackathon-b8364ca32242)
+sponsored by [RadicalAI](https://www.radical-ai.com). 
+
+## Winning submission
+The aim of the hackathon was to present a video which demonstrates the functionality of the prototype and highlights the
+use case for Large Language Models and their application to materials and chemistry: 
+[![Demo](https://img.youtube.com/vi/7JFncD9WaIY/0.jpg)](https://www.youtube.com/watch?v=7JFncD9WaIY)
+
+A full list of all submissions is available on [github.com/llmhackathon](https://github.com/llmhackathon).
+
+## Contributors
+Led by: [Jan Janssen](https://github.com/jan-janssen) (Max-Planck Institute for Sustainable Materials)
+
+List of contributors to the LangSim project during the hackathon: 
+* [Yuan Chiang](https://github.com/chiang-yuan) (UC Berkeley, Lawrence Berkeley National Laboratory)
+* [Giuseppe Fisicaro](https://github.com/giuseppefisicaro) (CNR Institute for Microelectronics and Microsystems)
+* [Greg Juhasz](https://github.com/gjuhasz) (Tokyo Institute of Technology)
+* [Sarom Leang](https://github.com/saromleang) (EP Analytics, Inc.)
+* [Bernadette Mohr](https://github.com/Bernadette-Mohr) (FAIRmat — HU Berlin, University of Amsterdam)
+* [Utkarsh Pratiush](https://github.com/utkarshp1161) (University of Tennessee, Knoxville)
+* [Francesco Ricci](https://github.com/fraricci) (Lawrence Berkeley National Laboratory)
+* [Leopold Talirz](https://github.com/ltalirz) (Schott)
+* [Pablo Andres Unzueta](https://github.com/pablo-unzueta) (Stanford University)
+* [Trung Vo](https://github.com/btrungvo) (University of Illinois Chicago)
+* [Gabriel Vogel](https://github.com/GaVogel) (Delft University of Technology)
+* [Sebastian Pagel](https://github.com/pagel-s) (University of Glasgow)
+
+Collaboration as part of the [Center for Scientific Foundation Models](https://scifm.ai): 
+* [Mohammad Babar](https://github.com/mbabar09) (University of Michigan)
+* [Ziqi Wang](https://github.com/wuziqiqiqi) (University of Michigan)
+* [Hancheng Zhao](https://github.com/hancheng2000) (University of Michigan)
+
+Students at the [Max-Planck Institute for Sustainable Materials](https://www.mpie.de):
+* [Kishan Limbasiya](https://github.com/limbasiya521)