Merge branch 'main' into add_adsorp
WardLT committed Oct 24, 2023
2 parents 70ff68f + c7e87d8 commit 7136722
Showing 50 changed files with 1,668 additions and 1,152 deletions.
13 changes: 10 additions & 3 deletions .github/workflows/python-app.yml
@@ -24,11 +24,18 @@ jobs:

steps:
- uses: actions/checkout@v3
- name: Set up environment
uses: conda-incubator/setup-miniconda@v2
- uses: conda-incubator/setup-miniconda@v2
with:
environment-file: ${{ matrix.os == 'ubuntu-latest' && 'envs/environment-cpu.yml' || 'envs/environment-macos.yml' }}
mamba-version: ${{ matrix.os == 'ubuntu-latest' && '*' || null }}
activate-environment: test
auto-activate-base: true
auto-update-conda: false
remove-profiles: true
architecture: x64
clean-patched-environment-file: true
run-post: true
use-mamba: true
miniforge-version: latest
- name: Display Environment
run: conda list
- name: Install test dependencies
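The ``environment-file`` line in this hunk chooses a Conda environment file for each operating system. It uses the ``cond && a || b`` pattern, which GitHub Actions expressions use in place of a ternary operator. Python's ``and``/``or`` also return one of their operands, so a rough analogue can be sketched as follows. ``environment_file`` is a hypothetical helper for illustration only and plays no part in the workflow.

```python
def environment_file(os_name: str) -> str:
    """Mimic the workflow expression
    matrix.os == 'ubuntu-latest' && 'envs/environment-cpu.yml' || 'envs/environment-macos.yml'

    Hypothetical helper: GitHub evaluates the real expression, not Python.
    """
    # `and`/`or` return an operand, so this yields the CPU file on Ubuntu and the macOS file otherwise
    return (os_name == 'ubuntu-latest') and 'envs/environment-cpu.yml' or 'envs/environment-macos.yml'


for os_name in ['ubuntu-latest', 'macos-latest']:
    print(os_name, '->', environment_file(os_name))
```

The pattern only acts like a ternary when the middle value is truthy. If it were an empty string, evaluation would fall through to the last operand in both languages.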
1 change: 1 addition & 0 deletions docs/api/examol.rst
@@ -7,6 +7,7 @@ API Documentation
examol.score
examol.select
examol.simulate
examol.solution
examol.specify
examol.start
examol.steer
7 changes: 7 additions & 0 deletions docs/api/examol.solution.rst
@@ -0,0 +1,7 @@
examol.solution
===============

.. automodule:: examol.solution
   :members:
   :undoc-members:
   :show-inheritance:
8 changes: 8 additions & 0 deletions docs/api/examol.steer.rst
@@ -13,6 +13,14 @@ examol.steer.base
   :undoc-members:
   :show-inheritance:

examol.steer.baseline
---------------------

.. automodule:: examol.steer.baseline
   :members:
   :undoc-members:
   :show-inheritance:

examol.steer.single
-------------------

22 changes: 22 additions & 0 deletions docs/api/examol.store.rst
@@ -5,6 +5,28 @@ examol.store
   :members:
   :show-inheritance:

examol.store.db
---------------

.. automodule:: examol.store.db
   :members:
   :show-inheritance:

examol.store.db.base
--------------------

.. automodule:: examol.store.db.base
   :members:
   :show-inheritance:

examol.store.db.memory
----------------------

.. automodule:: examol.store.db.memory
   :members:
   :show-inheritance:


examol.store.models
-------------------

1 change: 1 addition & 0 deletions docs/components/index.rst
@@ -9,6 +9,7 @@ The ExaMol library is built around components each dedicated to different aspect
score
select
simulate
solution
start
steer
store
21 changes: 21 additions & 0 deletions docs/components/solution.rst
@@ -0,0 +1,21 @@
Solution
========

The Solution modules of an ExaMol specification contain the components used for different strategies of optimizing a material.
Descriptions of a solution use different ExaMol components (e.g., `Scorer classes <score.html>`_)
and the same solution can be enacted with different `Steering strategies <steer.html>`_.

Available Methods
-----------------

ExaMol provides multiple solution methods, each described using a different class.

.. list-table::
   :header-rows: 1

   * - Class
     - Description
   * - :class:`~examol.specify.base.SolutionSpecification`
     - The base strategy, which does not use new data to select the next computations.
   * - :class:`~examol.specify.solution.SingleFidelityActiveLearning`
     - Use predictions from machine learning models to select the next computation.
18 changes: 8 additions & 10 deletions docs/components/start.rst
@@ -7,7 +7,7 @@ there is enough data available to train a machine learning model.
Available Methods
-----------------

ExaMol provides a few different start methods
ExaMol provides a few different start methods, each with a maximum recommended search space size.

.. list-table::
   :header-rows: 1
@@ -25,16 +25,14 @@ ExaMol provides a few different start methods
Using a Starter
---------------

All :class:`~examol.start.base.Starter` methods require setting
the dataset size under which it will be run,
and the maximum number of molecules to consider.

There is an (optional) threshold on the size of molecules to consider as ExaMol is intended to be used
for enormous search spaces.

Once defined, provide an iterator over the names of molecules to consider:
Simply provide an iterator over the names of molecules to consider:

.. code-block:: python
starter = RandomStarter(threshold=4)
starter = RandomStarter()
starting_pool = starter.select(['C', 'O', 'N'], 2) # Will generate two choices
The starter returns a list of SMILES strings selected from those provided.

Increase the speed of selection by setting the ``max_to_consider`` option of the Starter,
which truncates the list of molecule strings at a specific size before running the selection algorithm.
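A minimal sketch of random selection with a ``max_to_consider`` cutoff follows. ``random_start`` is a hypothetical helper and not ExaMol's ``RandomStarter``. It reads at most ``max_to_consider`` names from the iterator and then samples from them without replacement.

```python
import random
from itertools import islice
from typing import Iterable, List, Optional


def random_start(names: Iterable[str], count: int,
                 max_to_consider: Optional[int] = None,
                 seed: Optional[int] = None) -> List[str]:
    """Choose ``count`` names at random (hypothetical helper, not ExaMol's RandomStarter)."""
    # islice(names, None) reads everything; a number stops reading after that many names
    pool = list(islice(names, max_to_consider))
    rng = random.Random(seed)
    # Sample without replacement, never asking for more names than are available
    return rng.sample(pool, min(count, len(pool)))


print(random_start(['C', 'O', 'N'], 2, seed=0))  # two distinct choices
```

Truncating before sampling means an enormous generator of SMILES strings is never read past the cutoff.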
25 changes: 22 additions & 3 deletions docs/components/steer.rst
@@ -4,21 +4,40 @@ Steer
ExaMol scales to use large supercomputers by managing many tasks together.
The logic for when to launch tasks and how to process completed tasks is defined
as `Colmena <https://colmena.readthedocs.io/>`_ "Thinker" classes.
ExaMol will contain several different Thinkers, which each use different strategies
ExaMol contains several different Thinkers, which each use different strategies
for deploying tasks on a supercomputer.

Available Methods
-----------------

Each steering strategy is associated with a specific `Solution strategy <solution.html>`_.

.. list-table::
   :header-rows: 1

   * - Class
     - Solution
     - Description
   * - :class:`~examol.steer.baseline.BruteForceThinker`
     - :class:`~examol.specify.base.SolutionSpecification`
     - Evaluate all molecules in an initial population
   * - :class:`~examol.steer.single.SingleStepThinker`
     - :class:`~examol.specify.solution.SingleFidelityActiveLearning`
     - Run all recipes for each selected molecule


Single Objective Thinker as an Example
--------------------------------------

The :class:`~examol.steer.single.SingleStepThinker` is a good example for explaining how Thinkers work in ExaMol.

The strategy for this thinker has three parts:

#. Never leave nodes on the super
#. Never leave nodes on the supercomputer idle
#. Update the list of selected calculations with new data as quickly as possible
#. Wait until resources are free before submitting the next calculation.

This strategy is achieved by writing out a series of simple policies, such as:
This strategy is achieved by a series of simple policies, such as:

- Submit a new quantum chemistry calculation when another completes
- Begin re-training models as soon as a recipe is complete for any molecule
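The first policy in this list, which submits a new calculation whenever another completes, can be sketched with the standard library's ``concurrent.futures`` standing in for Colmena. This sketch does not show how ``SingleStepThinker`` is implemented. ``fake_simulation`` and ``run_steady`` are hypothetical names. A real Thinker would also re-train models and re-rank the queue at the point marked in the comments.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from typing import Dict, List


def fake_simulation(smiles: str) -> float:
    """Hypothetical stand-in for a quantum chemistry calculation."""
    return float(len(smiles))


def run_steady(queue: List[str], num_workers: int = 2) -> Dict[str, float]:
    """Keep ``num_workers`` tasks running and start a new one whenever one finishes."""
    to_run = list(queue)
    results: Dict[str, float] = {}
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        running = {}

        def submit_next():
            name = to_run.pop(0)
            running[pool.submit(fake_simulation, name)] = name

        # Fill every worker slot up front so no worker sits idle
        while to_run and len(running) < num_workers:
            submit_next()
        while running:
            done, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for future in done:
                results[running.pop(future)] = future.result()
                # A real Thinker might re-train models or re-rank ``to_run`` here
                if to_run:
                    submit_next()
    return results


print(run_steady(['C', 'CC', 'CCO']))
```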
41 changes: 34 additions & 7 deletions docs/components/store.rst
@@ -3,6 +3,39 @@ Store

The Store module handles capturing data about molecules and using collected data to compute derived properties.

Data Stores
-----------

ExaMol provides access to the data via a :class:`~examol.store.db.base.MoleculeStore` interface.
There are several implementations of these types of stores, each with different use cases.

.. list-table::
   :header-rows: 1

   * - Store
     - Description
   * - :class:`~examol.store.db.memory.InMemoryStore`
     - Stores all data in memory and periodically writes it to a single file

Using a Data Store
++++++++++++++++++

All ``MoleculeStore`` classes are used the same way.
The most important note is that all operations for a store should be performed inside a context,
which ensures that the operations on the dataset will be persisted to disk after the context closes.

.. code-block:: python
with InMemoryStore('db.json') as store:
for record in records:
store.update_record(record)
print(f'The store contains {len(store)} records')
It is also important to know that the store is thread-safe,
but it must not be accessed from separate processes.
Specifically, any data written to a store with the :meth:`~examol.store.db.base.MoleculeStore.update_record`
method will be available to all threads before the call exits.
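A toy store can illustrate this context-manager and thread-safety behavior. ``ToyInMemoryStore`` and ``Record`` are hypothetical classes, much simpler than ``InMemoryStore`` and ``MoleculeRecord``. The sketch keeps records in a dictionary and guards writes with a ``threading.Lock``, which offers no protection across processes. It writes line-delimited JSON when the context exits.

```python
import json
import os
import tempfile
import threading
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Dict


@dataclass
class Record:
    """Hypothetical, much-simplified stand-in for a molecule record."""
    identifier: str
    properties: dict = field(default_factory=dict)


class ToyInMemoryStore:
    """Holds records in memory and writes them to one file when the context closes."""

    def __init__(self, path):
        self.path = Path(path)
        self._records: Dict[str, Record] = {}
        self._lock = threading.Lock()

    def __enter__(self):
        # Load any records saved by a previous session
        if self.path.exists():
            for line in self.path.read_text().splitlines():
                record = Record(**json.loads(line))
                self._records[record.identifier] = record
        return self

    def __exit__(self, *exc_info):
        # Persist every record once the context closes, even after an error
        with self._lock:
            lines = [json.dumps(asdict(r)) for r in self._records.values()]
        self.path.write_text('\n'.join(lines))
        return False  # do not suppress exceptions

    def __len__(self):
        with self._lock:
            return len(self._records)

    def update_record(self, record: Record):
        # The lock serializes concurrent writers; the record is stored before this call returns
        with self._lock:
            self._records[record.identifier] = record


path = os.path.join(tempfile.mkdtemp(), 'db.json')
with ToyInMemoryStore(path) as store:
    workers = [threading.Thread(target=store.update_record, args=(Record(s),)) for s in ['C', 'O', 'N']]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(f'The store contains {len(store)} records')
```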

Data Models
-----------

@@ -18,7 +51,7 @@ Each of the Conformer objects are different geometries\ [2]_ and we store the en

Create a record and populate information about it by
creating a blank Record from a molecule identifier (i.e., SMILES)
then providing a simulation result to its `add_energies` method.
then providing a simulation result to its ``add_energies`` method.

.. code-block:: python
@@ -41,12 +74,6 @@ For example, ExaMol provides a utility operation for finding the lowest-energy c
assert isclose(energy, -1)
assert conf.xyz.startswith('5\nmethane\n0.0000')
Technical Details
~~~~~~~~~~~~~~~~~

The data models are implemented as MongoEngine :class:`~mongoengine.Document` objects
so that they are easy to store in MongoDB, convert to JSON objects, etc.

Recipes
-------

77 changes: 52 additions & 25 deletions docs/quickstart.rst
@@ -6,7 +6,7 @@ Let us consider the
as a way to learn how to use ExaMol.

.. note::
This example assumes you installed xTB and other optional dependencies.
This example assumes you installed MOPAC and other optional dependencies.
We recommend you install the `CPU version of ExaMol via Anaconda <installation#recommended-anaconda>`_.

Running ExaMol
@@ -22,7 +22,7 @@ within that file which is the specification object.
examol run examples/redoxmers/spec.py:spec
ExaMol will start writing logging messages to the screen to tell you what it is doing,
which is first to execute the specification file and load in the file you want
which starts with loading the specification

.. code-block::
@@ -72,15 +72,20 @@ A simple example looks something like:
.. code-block:: python
recipe = RedoxEnergy(charge=1, compute_config='xtb') # What we're trying to optimize
spec = ExaMolSpecification(
database='training-data.json',
recipes=[recipe], # ExaMol supports multi-objective optimization
search_space=['search_space.smi'],
selector=GreedySelector(n_to_select=8, maximize=True),
simulator=ASESimulator(scratch_dir='/tmp'),
solution = SingleFidelityActiveLearning( # How we are going to optimize it
starter=RandomStarter(),
minimum_training_size=4,
scorer=RDKitScorer(),
models=[[KNeighborsRegressor()]], # Ensemble of models for each recipe
selector=GreedySelector(10, maximize=True),
num_to_run=8,
)
spec = ExaMolSpecification( # How to set up ExaMol
database=(my_path / 'training-data.json'),
recipes=[recipe],
search_space=[(my_path / 'search_space.smi')],
solution=solution,
simulator=ASESimulator(scratch_dir='./tmp'),
thinker=SingleStepThinker,
thinker_options=dict(num_workers=2),
compute_config=config,
@@ -95,9 +100,9 @@ Quantum Chemistry

The ``recipes`` and ``simulator`` options define which molecule property to compute
and an interface for ExaMol to compute it, respectively.

Both recipes and simulator are designed to ensure all calculations in a set are performed with consistent settings.
ExaMol defines a set of pre-defined levels of accuracies, which are enumerated in

ExaMol defines a set of quantum chemistry methods, which are accessible via the Simulator and enumerated in
`the Simulate documentation <components/simulate.html#levels>`_.

Recipes are based on the :class:`~examol.store.recipes.base.PropertyRecipe` class,
@@ -115,24 +120,48 @@ See how to create one in the `Simulate documentation <components/simulate.html#t
Starting Data
~~~~~~~~~~~~~

The starting data for a project is a line-delimited JSON describing what molecular properties are already known.
Each line of the file is a different molecule, with data following the :class:`~examol.store.models.MoleculeRecord` format.
The starting data for this project is a line-delimited JSON file describing what molecular properties are already known.
Each line is a different molecule, with data following the :class:`~examol.store.models.MoleculeRecord` format.

ExaMol supports a few different kinds of stores for molecule data.
Learn more in the `Store documentation <components/store.html>`_.

Use a `starter <components/start.html>`_ method if your dataset is too small to train machine learning models.
Search Space
~~~~~~~~~~~~

The ``search_space`` parameter defines a list of molecules from which to search.
It expects a list of files, each either a ``*.smi`` file containing SMILES strings
or a ``*.json`` file containing ``MoleculeRecord`` objects.
Either type of file can be compressed using GZIP.
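One possible way to read such a search space is sketched below. ``iterate_search_space`` is a hypothetical helper, not ExaMol's loader. It assumes a file layout that the text does not specify: each ``*.smi`` line holds a SMILES string in its first column, and each ``*.json`` line holds one JSON object with a hypothetical ``smiles`` key. A trailing ``.gz`` is decompressed with ``gzip``.

```python
import gzip
import json
import tempfile
from pathlib import Path
from typing import Iterable, Iterator


def iterate_search_space(paths: Iterable, key: str = 'smiles') -> Iterator[str]:
    """Yield SMILES strings from .smi/.json files, optionally gzip-compressed (hypothetical)."""
    for path in map(Path, paths):
        is_gzip = path.suffix == '.gz'
        # For "space.smi.gz", the file type comes from the suffix before ".gz"
        kind = Path(path.stem).suffix if is_gzip else path.suffix
        if kind not in ('.smi', '.json'):
            raise ValueError(f'Unrecognized search space file: {path}')
        opener = gzip.open if is_gzip else open
        with opener(path, 'rt') as fp:
            for line in fp:
                line = line.strip()
                if not line:
                    continue  # skip blank lines
                if kind == '.smi':
                    yield line.split()[0]  # first whitespace-separated column
                else:
                    yield json.loads(line)[key]  # assumed one JSON object per line


# Demo with throwaway files (hypothetical contents)
demo_dir = Path(tempfile.mkdtemp())
(demo_dir / 'space.smi').write_text('C\nCC ethane\n\n')
with gzip.open(demo_dir / 'more.smi.gz', 'wt') as fp:
    fp.write('O\n')
(demo_dir / 'records.json').write_text(json.dumps({'smiles': 'N'}) + '\n')
found = list(iterate_search_space([demo_dir / 'space.smi', demo_dir / 'more.smi.gz', demo_dir / 'records.json']))
print(found)
```

A generator keeps memory use flat even when the search space holds millions of molecules.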

Solution Strategy
~~~~~~~~~~~~~~~~~

There are many ways to solve an optimization problem, and ExaMol provides :class:`~examol.specify.base.SolutionSpecification`
classes to describe different routes.
Solution classes themselves are built from common components, and
:class:`~examol.specify.solution.SingleFidelityActiveLearning` uses all of the major ones.

Starting
++++++++

`Starter <components/start.html>`_ methods are used when a dataset is too small to train machine learning models.
The solution specification includes a :class:`~examol.start.base.Starter` class and
a ``minimum_training_size`` to define when to start using machine learning.
The default for ExaMol is to train models once at least 10 molecules are available for training,
and select computations randomly by default.
and select computations randomly for smaller datasets.
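The switch between a starter and machine learning can be written as a small policy function. This is a hypothetical sketch, not ExaMol code. ``model_ranker`` stands in for whatever model-based selection a solution uses, and the default of ``10`` mirrors the threshold described above.

```python
import random
from typing import Callable, List, Optional, Sequence


def next_computations(pool: Sequence[str], num_training: int, n: int,
                      model_ranker: Callable[[Sequence[str], int], List[str]],
                      minimum_training_size: int = 10,
                      rng: Optional[random.Random] = None) -> List[str]:
    """Pick ``n`` molecules: randomly while data are scarce, by model otherwise (hypothetical)."""
    if num_training < minimum_training_size:
        # Too little data to train models: fall back to random selection, like a starter
        if rng is None:
            rng = random.Random()
        return rng.sample(list(pool), min(n, len(pool)))
    # Enough data: defer to the machine-learning-based ranking
    return model_ranker(pool, n)


def alphabetical(pool, n):
    """Toy stand-in for a model-based ranking."""
    return sorted(pool)[:n]


pool = ['CCO', 'C', 'CC', 'O']
print(next_computations(pool, num_training=3, n=2, model_ranker=alphabetical, rng=random.Random(0)))
print(next_computations(pool, num_training=12, n=2, model_ranker=alphabetical))  # ['C', 'CC']
```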

.. tip::

We recommend creating the initial database by running a seed set of molecules with purpose-built scripts.
See our `validation scripts from the redoxmer example <https://github.com/exalearn/ExaMol/tree/main/scripts/redoxmers/check-chemistry-settings>`_
See `scripts from the redoxmer example <https://github.com/exalearn/ExaMol/tree/main/scripts/redoxmers/2_initial-data>`_
to see how to run simulations outside of the ``examol`` CLI then compile them into a database.

Machine Learning
~~~~~~~~~~~~~~~~

ExaMol uses machine learning (ML) to estimate the output of computations.
The specification requires you to define an interface to run machine learning models (``scorer``) and
The solution specification requires you to define an interface to run machine learning models (``scorer``) and
then a set of models (``models``) to be trained using that interface.

The Scorer, like the `Simulator used in quantum chemistry <#quantum-chemistry>`_, defines an interface
@@ -145,16 +174,14 @@ Each model for each recipe will be trained using a different subset of the train
and the predictions of all models will be combined to produce predictions with uncertainties for each molecule.
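The ensemble idea above can be sketched without any machine learning library. In this sketch, each member of a hypothetical ensemble is a one-variable least-squares line fit to a bootstrap resample of the training data. The lines stand in for the models listed in ``models``. The spread of the member predictions serves as the uncertainty. This is one common way to obtain ensemble uncertainties, not necessarily how ExaMol partitions its data.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple

Model = Callable[[float], float]


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Model:
    """Ordinary least squares for y = a*x + b (toy model)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:  # degenerate resample with a single distinct x: predict the mean
        return lambda x: my
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    b = my - a * mx
    return lambda x: a * x + b


def bootstrap_ensemble(xs: Sequence[float], ys: Sequence[float],
                       num_models: int = 8, seed: int = 0) -> List[Model]:
    """Train each model on a different bootstrap resample of the training set."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(num_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def predict_with_uncertainty(models: Sequence[Model], x: float) -> Tuple[float, float]:
    """Combine ensemble predictions into a mean and a (population) standard deviation."""
    preds = [m(x) for m in models]
    return statistics.fmean(preds), statistics.pstdev(preds)


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]
ensemble = bootstrap_ensemble(xs, ys, num_models=8, seed=1)
mean, std = predict_with_uncertainty(ensemble, 5.0)
print(f'prediction: {mean:.2f} +/- {std:.2f}')
```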

Search Algorithm
~~~~~~~~~~~~~~~~

The design process is defined by the space of molecules (``search_space``),
how to search through them (``selector``),
and how many quantum chemistry computations will be run (``num_to_run``).
++++++++++++++++

The ``search_space`` option requires the path to a list of SMILES strings as a list of files.
A search algorithm is defined by how to search (``selector``),
and how many quantum chemistry computations to run (``num_to_run``).

The selector defines an adaptive experimental design algorithm -- an algorithm which uses the predictions
The ``selector`` defines an adaptive experimental design algorithm -- an algorithm which uses the predictions
from machine learning models to identify the best computations.

ExaMol includes `several selection routines <components/select.html#available-selectors>`_.
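As an illustration of the simplest case, a greedy selector takes the molecules with the best predicted values. ``greedy_select`` is a hypothetical function, not ExaMol's ``GreedySelector``. Its ``n_to_select`` and ``maximize`` parameters echo the options shown in the specification example.

```python
import heapq
from typing import Dict, List


def greedy_select(predictions: Dict[str, float], n_to_select: int, maximize: bool = True) -> List[str]:
    """Return the ``n_to_select`` keys with the highest (or lowest) predicted value."""
    pick = heapq.nlargest if maximize else heapq.nsmallest
    return pick(n_to_select, predictions, key=predictions.get)


predicted = {'C': 0.1, 'O': 0.9, 'N': 0.5}
print(greedy_select(predicted, 2))                  # ['O', 'N']
print(greedy_select(predicted, 2, maximize=False))  # ['C', 'N']
```

A purely greedy rule ignores the uncertainty estimates, so it exploits the current models rather than exploring poorly understood regions.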

Steering Strategy
@@ -163,8 +190,8 @@ Steering Strategy
The ``thinker`` provides the core capability behind ExaMol scaling to large supercomputers:
the ability to schedule many different tasks at once.
A Thinker strategy defines when to submit new tasks and what to do once they complete.
There is only one strategy available in ExaMol right now, :class:`~examol.steer.single.SingleStepThinker`,
but more will become available as we build the library.
For example, the :class:`~examol.steer.single.SingleStepThinker` runs all calculations for all recipes
for each molecule when it is selected by the ``selector``.

Learn more in the `component documentation <components/steer.html>`_.
