v1.0.0 (#1)
First release of jDAS
martijnende authored Sep 8, 2021
1 parent fb51ca1 commit cb4d31a
Showing 33 changed files with 2,431 additions and 28 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -1,3 +1,5 @@
.virtual_documents/

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
24 changes: 24 additions & 0 deletions .travis.yml
@@ -0,0 +1,24 @@
language: python

python:
- 3.8

before_install:
# Install the latest version of Miniconda
- wget http://repo.continuum.io/miniconda/Miniconda-latest-Linux-x86_64.sh -O miniconda.sh
- chmod +x miniconda.sh
- ./miniconda.sh -b
- export PATH=/home/travis/miniconda2/bin:$PATH
- conda update --yes conda # Update conda without prompting for confirmation

install:
# Create a new Conda environment
- conda create --yes -n test python=$TRAVIS_PYTHON_VERSION
# Activate it
- source activate test
# Install various dependencies
- conda install --yes -c conda-forge "tensorflow-gpu>=2.2.0" numpy scipy

script:
- cd $TRAVIS_BUILD_DIR/test/
- "travis_wait python build_test.py"
98 changes: 96 additions & 2 deletions README.md
@@ -1,2 +1,96 @@
# jDAS
Coherence-based Deep Learning denoising of DAS data
<p align="center">
<img src="docs/source/img/jDAS_logo.svg" alt="jDAS logo" height="200" />
</p>

# Deep Learning denoising of DAS data

[![Documentation Status](https://readthedocs.org/projects/jdas/badge/?version=latest)](https://jdas.readthedocs.io/en/latest/?badge=latest)

--------------

Contents: [overview](#overview) | [example](#example) | [quickstart](#quickstart) | [5-minute explainer video](#explainer-video) | [citing _jDAS_](#citing-jdas)

--------------

## Overview

_jDAS_ is a self-supervised Deep Learning model for denoising of Distributed Acoustic Sensing (DAS) data. The principle that underlies _jDAS_ is that spatio-temporally coherent signals can be interpolated, while incoherent noise cannot. Leveraging the framework laid out by Batson & Royer ([2019; ICML](http://arxiv.org/abs/1901.11365)), _jDAS_ predicts the recordings made at a target channel using the target's surrounding channels. As such, it is a self-supervised method that does not require "clean" (noise-free) waveforms as labels.

Retraining the model on new data is quick and easy, and tailors the separation between coherent signals and incoherent noise to your specific dataset:
```
from jDAS import JDAS
jdas = JDAS()
# "data" is your DAS recording: a 2-D array of shape (channels, time samples)
data_loader = jdas.init_dataloader(data)
model = jdas.load_model()
model.fit(data_loader, epochs=50)
```
Denoising your data is then done through a single function call:
```
clean_data = jdas.denoise(data)
```
That's all!
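
As a minimal end-to-end sketch (the HDF5 file and dataset names below are illustrative, and the 2-D channels-by-time layout is an assumption about your recording, not a prescribed format):
```
import h5py
from jDAS import JDAS

# Illustrative file/dataset names; adapt to your own recordings
with h5py.File("my_das_data.h5", "r") as f:
    data = f["data"][:]  # 2-D array: (channels, time samples)

jdas = JDAS()
clean_data = jdas.denoise(data)
```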

For a more detailed description of the methods, see the [documentation](https://jdas.readthedocs.io/). In-depth examples on *jDAS* denoising and retraining are provided in the `examples` directory.

--------------

## Example

The example below is taken from a submarine DAS experiment conducted offshore Greece. At around 25 seconds, an earthquake hits the DAS cable and induces a spatio-temporally coherent strain field. _jDAS_ removes the incoherent background noise while keeping the earthquake signals.

<p align="center">
<img src="docs/source/img/jDAS_example.jpg" alt="Example of jDAS denoising performance" />
</p>

Note that some aliasing artifacts have been introduced in rendering this static JPEG. A code example to reproduce this figure is included in the `examples` directory of the project.

--------------

## Quickstart

_jDAS_ depends on the following Python libraries:

- [TensorFlow](https://www.tensorflow.org/) (`>= 2.2.0`): while training and inference are much faster on a GPU, the CPU version of TensorFlow suffices in case problems arise installing the CUDA dependencies (see the CPU-only command below).
- [NumPy](https://numpy.org/) and [SciPy](https://scipy.org/) for numerical manipulations.
- [Matplotlib](https://matplotlib.org/) for visualisation.
- [h5py](https://www.h5py.org/) for IO.
- (Optional) [Jupyter](https://jupyter.org/) notebook or lab to run the examples.

All of these dependencies can be installed with [Anaconda](https://www.anaconda.com/products/individual):
```
conda install -c conda-forge numpy scipy matplotlib h5py "tensorflow-gpu>=2.2.0"
```
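
If installing the CUDA dependencies proves problematic, the CPU-only build should work as well (assuming the conda-forge `tensorflow` package; training will be slower):
```
conda install -c conda-forge numpy scipy matplotlib h5py "tensorflow>=2.2.0"
```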

To obtain the _jDAS_ source code, you can pull it directly from the GitHub repository:
```
git clone https://github.com/martijnende/jDAS.git
```
No additional building is required. To test the installation, try running one of the example Jupyter notebooks in the `examples` directory.

Please open a ticket under the tab "Issues" on the GitHub repository if you have trouble setting up _jDAS_.

--------------

## Explainer video

<p align="center">
<a href="https://youtu.be/9NNElFOIzK8">
<img src="docs/source/img/jDAS_youtube.png" alt="Example of jDAS denoising performance" />
</a>
</p>

--------------

## Citing _jDAS_

For use of _jDAS_ in scientific applications, please consider citing the following publication:

```
@article{vandenEnde2021,
author={van den Ende, Martijn Peter Anton and Lior, Itzhak and Ampuero, Jean-Paul and Sladen, Anthony and Ferrari, André and Richard, Cédric},
title={A Self-Supervised Deep Learning Approach for Blind Denoising and Waveform Coherence Enhancement in Distributed Acoustic Sensing Data},
publisher={EarthArxiv}, doi={10.31223/X55K63}, year={2021}, volume={0}
}
```

<!-- An identical preprint is available from EarthArxiv: https://eartharxiv.org/repository/view/2136/ -->
4 changes: 2 additions & 2 deletions docs/Makefile
@@ -5,8 +5,8 @@
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = .
BUILDDIR = _build
SOURCEDIR = source
BUILDDIR = build

# Put it first so that "make" without argument is like "make help".
help:
20 changes: 0 additions & 20 deletions docs/index.rst

This file was deleted.

4 changes: 2 additions & 2 deletions docs/make.bat
@@ -7,8 +7,8 @@ REM Command file for Sphinx documentation
if "%SPHINXBUILD%" == "" (
set SPHINXBUILD=sphinx-build
)
set SOURCEDIR=.
set BUILDDIR=_build
set SOURCEDIR=source
set BUILDDIR=build

if "%1" == "" goto help

8 changes: 6 additions & 2 deletions docs/conf.py → docs/source/conf.py
@@ -36,7 +36,7 @@
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store']
exclude_patterns = []


# -- Options for HTML output -------------------------------------------------
@@ -49,4 +49,8 @@
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']
html_static_path = ['_static']

html_theme_options = {
# "logo": "J_logo_small.svg",
}
55 changes: 55 additions & 0 deletions docs/source/details.rst
@@ -0,0 +1,55 @@
Technical details
-----------------

This section describes first qualitatively, then quantitatively, the underlying principles of *jDAS*. These descriptions are an interpretation of the framework laid out by Batson & Royer (`2019; ICML <http://arxiv.org/abs/1901.11365>`_), who developed a detailed and thorough body of theory with additional proofs and numerical demonstrations. In this section we will restrict ourselves to the application of :math:`J`-invariance in Deep Learning.


Qualitative description
=======================

Consider a checkerboard as an example. If someone were to cover a part of the checkerboard with a piece of paper, you would still be able to predict the pattern hidden by the paper with great accuracy. This is because the checkerboard pattern exhibits long-range correlations that can be used for interpolation (and extrapolation). On the other hand, if someone were to introduce a random speckle pattern onto the checkerboard, the details of the speckle pattern underneath the cover paper could not be predicted; the speckle pattern exhibits no spatial correlations, and an observation of the speckles at one location cannot be used to inform predictions about a different location.

.. _concept-figure:

.. figure:: img/jDAS_concept.png
:width: 100%
:align: center

Fig. 1: concept of :math:`J`-invariance underlying *jDAS*

This notion that long-range correlations can be interpolated, while short-range correlations cannot, is what drives *jDAS*. Imagine that you were given a checkerboard sprayed with a black-and-white speckle pattern (for clarity shown in red and cyan in Fig. 1), but that a few tiles are missing. You are then tasked to reconstruct those tiles as accurately as possible. Aside from reconstructing the missing tiles, you could decide to add a self-made speckle pattern on top. But because you don't know *exactly* which speckle goes where, you will likely never guess all the speckles correctly. The best you can do to reconstruct the missing tiles is to estimate the *average* of the speckles, which is zero if you assume that black and white speckles cancel each other out. Hence, your reconstruction will be informed by the long-range patterns on the checkerboard, but will not include the individual speckles. If you now repeat this procedure for different parts of the checkerboard, reconstructing a few tiles at a time, you end up with a reconstruction free of speckle noise.

A similar idea underlies the *jDAS* filtering approach. Given some DAS data with a spatial and a temporal component, a Deep Learning model can learn to extract correlated patterns in the data, and use those to interpolate gaps. If we create a gap in the data and ask the *jDAS* model to predict what is inside the gap, and systematically repeat this procedure such that all the data points are "gapped" once, we can collect the Deep Learning predictions for each gap and put them together to make a noise-free reconstruction of the DAS data. Note that this procedure is based entirely on the presence (or absence) of coherent patterns; we do not need to know *a priori* what the noise-free data actually look like. This renders *jDAS* a so-called "self-supervised" Deep Learning method. The main advantage over "supervised" methods (for which you need to know what the clean data look like) is that you can easily retrain the model on new data (for instance: a new DAS experiment in a different location).


Quantitative description
========================

To make the above description more quantitative and precise, define a feature-space partition :math:`J`. In the case of an image, the feature-space is defined by the pixels, so :math:`J` would represent a patch of pixels. The values of the pixels in :math:`J` are collectively denoted by :math:`x_J`. Let's now define some function :math:`f: x \rightarrow y`, which takes :math:`x` as an argument and produces some output :math:`y`. We say that this function is :math:`J`-invariant if :math:`f(x)_J = y_J` does not depend on :math:`x_J`.

To bring this definition back to the example of the checkerboard, the colour of the tiles (including the speckles) at a given location is denoted by :math:`x`, and we hide a part of the checkerboard under a piece of paper (the partition :math:`J`). We then give :math:`x` to a function :math:`f` that produces a reconstruction of the input, :math:`y`. But as we've seen above, we do not need to see what is underneath the paper (:math:`x_J`) in order to make a good reconstruction (:math:`y_J`). We can therefore say that interpolating the checkerboard patterns is a :math:`J`-invariant operation.

It would of course be a trivial exercise to predict :math:`y_J` if we had direct access to :math:`x_J`, which is basically the identity operation. In order to efficiently train a Deep Learning model to *not* learn the identity operation, we need to restrict the input of our model to the complement of :math:`J`, denoted by :math:`J^c`. In that way, the Deep Learning model needs to use the surroundings of :math:`x_J` to predict :math:`y_J`. Practically this is achieved through a masking operation :math:`\Pi_J(x)`, which sets all the values of :math:`x` outside of :math:`J` to zero.
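
As a minimal sketch of this masking step (a NumPy illustration of the idea, not the *jDAS* implementation), with the partition :math:`J` taken as a rectangular patch of pixels:

.. code-block:: python

   import numpy as np

   def mask_partition(x, rows, cols):
       """Split x into Pi_{J^c}(x) (model input) and Pi_J(x) (target).

       The partition J is the rectangular patch rows x cols; the mask
       sets all values outside of J (resp. inside of J) to zero.
       """
       mask = np.zeros_like(x)
       mask[np.ix_(rows, cols)] = 1.0
       return (1.0 - mask) * x, mask * x

   x = np.random.randn(64, 64)  # a speckled "checkerboard"
   u, target = mask_partition(x, range(16, 24), range(16, 24))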

As opposed to the original procedure adopted by Batson & Royer (2019), we train our Deep Learning model on batches of data, sampled from a larger dataset, and we try to optimise the model parameters by averaging the performance over an entire batch :math:`K`. Let :math:`f(\cdot | \theta)` denote the Deep Learning model parametrised by :math:`\theta`. The model input for the :math:`k`-th sample (:math:`k \in K`) is then :math:`u_k := \Pi_{J^c_k} \left( x_k \right)`, and its output is :math:`v_k := \Pi_{J_k} \left( f (u_k | \theta) \right)`. The training objective is then defined as:

.. math::
   \hat{\theta} = \arg \min_{\theta} \frac{1}{|K|} \sum_{k \in K} || v_k - \Pi_{J_k} \left(x_k \right)||^2

While this precise training objective is a bit heavy on the notation, it says nothing more than "*find the model parameters* :math:`\theta` *that minimise the mean squared difference between* :math:`x_J` *and the prediction* :math:`y_J`*, without seeing* :math:`x_J` *directly*".
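
A sketch of one training step under these definitions (TensorFlow, with the masks :math:`\Pi_{J_k}` represented as 0/1 arrays; an illustration of the objective, not the verbatim *jDAS* training loop):

.. code-block:: python

   import tensorflow as tf

   @tf.function
   def train_step(model, optimizer, x, mask):
       """One batch update of the J-invariant objective.

       x:    a batch of data patches
       mask: 1 inside the partition J_k, 0 elsewhere (same shape as x)
       """
       with tf.GradientTape() as tape:
           u = (1.0 - mask) * x                # Pi_{J^c}(x): model input
           v = mask * model(u, training=True)  # Pi_J(f(u|theta)): output
           # || v_k - Pi_{J_k}(x_k) ||^2, averaged over the batch K
           loss = tf.reduce_mean(tf.square(v - mask * x))
       grads = tape.gradient(loss, model.trainable_variables)
       optimizer.apply_gradients(zip(grads, model.trainable_variables))
       return loss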



Model architecture
==================

To describe the *jDAS* model architecture, we need to move (slightly) away from the checkerboard analogy, in which the length scale of the correlations was the same along each dimension. In DAS data, however, the two dimensions represent time and space, and the correlations of interest have different wavelengths in each dimension. So instead of applying a square patch as in Fig. 1, we mask the waveform recorded at one randomly-selected DAS channel by setting it to zero ("blanking"). The blanked DAS channel defines the partition :math:`J`, and so the target for the model is to predict the waveform in :math:`J` using only the neighbouring DAS channels (:math:`J^c`). In total we use a set of 11 consecutive channels, each 2048 time samples in length. The pretrained model provided in the GitHub repository was trained at a 50 Hz sampling rate, so 2048 samples correspond to roughly 41 seconds in time.
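
A sketch of this sampling-and-blanking scheme (illustrative; the 11-channel, 2048-sample patch size follows the description above):

.. code-block:: python

   import numpy as np

   def make_training_sample(data, n_ch=11, n_t=2048, rng=np.random):
       """Cut a random (n_ch x n_t) patch from a (channels x time) array,
       then blank one channel to define the partition J."""
       i = rng.randint(data.shape[0] - n_ch + 1)  # first channel of the patch
       j = rng.randint(data.shape[1] - n_t + 1)   # first time sample
       patch = data[i:i + n_ch, j:j + n_t]
       mask = np.zeros_like(patch)
       mask[rng.randint(n_ch)] = 1.0              # the blanked channel (J)
       return (1.0 - mask) * patch, mask, patch   # input, mask, target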

.. figure:: img/jDAS_architecture.png
:width: 100%
:align: center

Fig. 2: *jDAS* model architecture

The Deep Learning model is based on the U-Net architecture (Ronneberger *et al.*, `2015; MICCAI <http://arxiv.org/abs/1505.04597>`_), and features a number of convolutional layers followed by anti-aliasing and resampling layers, as well as the skip connections that are the hallmark of U-Nets. Empirically we found that anti-aliasing before downsampling improves the model performance, possibly because the progressive downsampling brings the equivalent Nyquist frequency well below the data frequency band (1-10 Hz). See Zhang (`2019; ICML <http://arxiv.org/abs/1904.11486>`_) for a detailed exposition of internal anti-aliasing.
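
A minimal sketch of such an anti-aliased downsampling ("blur-pool") layer, assuming a fixed binomial low-pass kernel applied along one axis (a generic rendering of Zhang's idea, not the exact *jDAS* layer):

.. code-block:: python

   import numpy as np
   import tensorflow as tf

   class BlurPool1D(tf.keras.layers.Layer):
       """Low-pass filter with a fixed [1, 2, 1] / 4 binomial kernel,
       followed by stride-2 subsampling (Zhang, 2019)."""

       def build(self, input_shape):
           n_ch = int(input_shape[-1])
           kernel = np.array([1.0, 2.0, 1.0]) / 4.0
           # Same kernel for every channel: a diagonal (depthwise) filter
           k = kernel[:, None, None] * np.eye(n_ch)[None, :, :]
           self.blur_kernel = tf.constant(k, dtype=tf.float32)

       def call(self, x):
           # Blurring removes the energy above the new Nyquist frequency,
           # so the stride-2 subsampling does not introduce aliasing
           return tf.nn.conv1d(x, self.blur_kernel, stride=2, padding="SAME")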
Binary file added docs/source/img/jDAS_architecture.png
Binary file added docs/source/img/jDAS_concept.png
Binary file added docs/source/img/jDAS_example.jpg