Readthedocs documentation #160

Open
wants to merge 11 commits into
base: master
Changes from 6 commits
3 changes: 3 additions & 0 deletions docs/.gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
build
_static
_templates
20 changes: 20 additions & 0 deletions docs/Makefile
@@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line.
SPHINXOPTS =
SPHINXBUILD = sphinx-build
SPHINXPROJ = SmartDispatch
SOURCEDIR = source
BUILDDIR = build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
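
For readers less familiar with Sphinx's "make mode", the catch-all rule in this Makefile amounts to running ``sphinx-build -M <target> source build``. Below is a minimal Python sketch of the same command construction, assuming the default variable values from the Makefile. The helper names ``sphinx_command`` and ``build`` are hypothetical and not part of this PR.

```python
import subprocess

# Defaults mirrored from docs/Makefile.
SPHINXBUILD = "sphinx-build"
SOURCEDIR = "source"
BUILDDIR = "build"


def sphinx_command(target, sphinxopts=()):
    """Return the argv that `make <target>` runs through the catch-all rule."""
    return [SPHINXBUILD, "-M", target, SOURCEDIR, BUILDDIR] + list(sphinxopts)


def build(target="html"):
    """Run the build from the docs/ directory (requires Sphinx to be installed)."""
    subprocess.check_call(sphinx_command(target))
```

Calling ``build("html")`` from ``docs/`` should then be equivalent to ``make html``.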
14 changes: 14 additions & 0 deletions docs/source/autoresume.rst
@@ -0,0 +1,14 @@
====================
Automatic resumption
====================

Oftentimes, there is a hard limit on the maximum amount of time you can run your
job for on the cluster (we refer to it as **walltime**). Smart Dispatch allows you
to partially overcome that and run your tasks for longer periods. This is done
by enhancing the generated PBS files with additional code that reschedules your
tasks as soon as they hit the walltime. The caveat here is that your tasks
**must be resumable**, i.e. be capable of restoring their state after being
killed and rerun.

You can engage the autoresumption by passing ``-m`` or ``--autoresume`` during
Member

-m -> -r

``smart-dispatch`` execution. See :doc:`usage` for details.
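
To illustrate what "resumable" means in practice, here is a minimal sketch of a task that checkpoints its progress and picks up where it left off when rerun. It is only an illustration under stated assumptions: the checkpoint format, the function names, and the ``stop_after`` parameter (which simulates being killed at the walltime) are all hypothetical and not part of Smart Dispatch.

```python
import json
import os
import tempfile


def load_state(path):
    """Return the saved state, or a fresh one if no checkpoint exists yet."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "total": 0}


def save_state(path, state):
    """Write the checkpoint via a temp file and rename, so a kill mid-write
    cannot leave a half-written checkpoint behind."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)


def run(path, n_steps, stop_after=None):
    """Sum the integers 0..n_steps-1, checkpointing after every step.

    `stop_after` (hypothetical) stops the loop early to simulate a job
    being killed when it hits the walltime.
    """
    state = load_state(path)
    done_this_run = 0
    while state["step"] < n_steps:
        if stop_after is not None and done_this_run >= stop_after:
            break
        state["total"] += state["step"]
        state["step"] += 1
        save_state(path, state)
        done_this_run += 1
    return state
```

Running ``run(path, 10, stop_after=4)`` and then ``run(path, 10)`` gives the same final state as a single uninterrupted run, which is the property rescheduled tasks need.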
156 changes: 156 additions & 0 deletions docs/source/conf.py
@@ -0,0 +1,156 @@
# -*- coding: utf-8 -*-
#
# Smart Dispatch documentation build configuration file, created by
# sphinx-quickstart on Fri Feb 17 15:44:10 2017.
#
# This file is execfile()d with the current directory set to its
# containing dir.
#
# Note that not all possible configuration values are present in this
# autogenerated file.
#
# All configuration values have a default; values that are commented out
# serve to show the default.

# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
# import os
# import sys
# sys.path.insert(0, os.path.abspath('.'))


# -- General configuration ------------------------------------------------

# If your documentation needs a minimal Sphinx version, state it here.
#
# needs_sphinx = '1.0'

# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = ['sphinx.ext.autodoc', 'sphinxcontrib.autoprogram']

# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']

# The suffix(es) of source filenames.
# You can specify multiple suffix as a list of string:
#
# source_suffix = ['.rst', '.md']
source_suffix = '.rst'

# The master toctree document.
master_doc = 'index'

# General information about the project.
project = u'Smart Dispatch'
copyright = u'2017, Stanislas Lauly, Marc-Alexandre Côté, Mathieu Germain'
author = u'Stanislas Lauly, Marc-Alexandre Côté, Mathieu Germain'

# The version info for the project you're documenting, acts as replacement for
# |version| and |release|, also used in various other places throughout the
# built documents.
#
# The short X.Y version.
version = u'2.0'
# The full version, including alpha/beta/rc tags.
release = u'2.0.1'

# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
#
# This is also used if you do content translation via gettext catalogs.
# Usually you set "language" from the command line for these cases.
language = None

# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# These patterns also affect html_static_path and html_extra_path.
exclude_patterns = []

# The name of the Pygments (syntax highlighting) style to use.
pygments_style = 'sphinx'

# If true, `todo` and `todoList` produce output, else they produce nothing.
todo_include_todos = False


# -- Options for HTML output ----------------------------------------------

# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
# html_theme = 'alabaster'

# Theme options are theme-specific and customize the look and feel of a theme
# further. For a list of options available for each theme, see the
# documentation.
#
# html_theme_options = {}

# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']


# -- Options for HTMLHelp output ------------------------------------------

# Output file base name for HTML help builder.
htmlhelp_basename = 'SmartDispatchdoc'


# -- Options for LaTeX output ---------------------------------------------

latex_elements = {
# The paper size ('letterpaper' or 'a4paper').
#
# 'papersize': 'letterpaper',

# The font size ('10pt', '11pt' or '12pt').
#
# 'pointsize': '10pt',

# Additional stuff for the LaTeX preamble.
#
# 'preamble': '',

# Latex figure (float) alignment
#
# 'figure_align': 'htbp',
}

# Grouping the document tree into LaTeX files. List of tuples
# (source start file, target name, title,
# author, documentclass [howto, manual, or own class]).
latex_documents = [
(master_doc, 'SmartDispatch.tex', u'Smart Dispatch Documentation',
u'Stanislas Lauly, Marc-Alexandre Côté, Mathieu Germain', 'manual'),
]


# -- Options for manual page output ---------------------------------------

# One entry per manual page. List of tuples
# (source start file, name, description, authors, manual section).
man_pages = [
(master_doc, 'smartdispatch', u'Smart Dispatch Documentation',
[author], 1)
]


# -- Options for Texinfo output -------------------------------------------

# Grouping the document tree into Texinfo files. List of tuples
# (source start file, target name, title, author,
# dir menu entry, description, category)
texinfo_documents = [
(master_doc, 'SmartDispatch', u'Smart Dispatch Documentation',
author, 'SmartDispatch', 'One line description of project.',
'Miscellaneous'),
]



62 changes: 62 additions & 0 deletions docs/source/examples.rst
@@ -0,0 +1,62 @@
========
Examples
========

Launch job
----------

::

    smart-dispatch -q qtest@mp2 launch python my_script.py 2 80 tanh 0.1

Will launch ``python my_script.py 2 80 tanh 0.1`` on the queue ``qtest@mp2``.


Launch batch of jobs
--------------------

Automatically generate commands from combinations of arguments. ::

    smart-dispatch -q qtest@mp2 launch python my_script.py [1 2] 80 [tanh sigmoid] 0.1

Will generate 4 different commands and launch them on the queue ``qtest@mp2``: ::

    python my_script.py 1 80 sigmoid 0.1
    python my_script.py 1 80 tanh 0.1
    python my_script.py 2 80 sigmoid 0.1
    python my_script.py 2 80 tanh 0.1


Another possibility is to generate arguments from a range. ::

    smart-dispatch -q qtest@mp2 launch python my_script.py [1:4]

Will generate: ::

    python my_script.py 1
    python my_script.py 2
    python my_script.py 3

You can also add a step size to the range as the 3rd argument. ::

    smart-dispatch -q qtest@mp2 launch python my_script.py [1:10:2]

Will generate: ::

    python my_script.py 1
    python my_script.py 3
    python my_script.py 5
    python my_script.py 7
    python my_script.py 9
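
The expansion rules shown in these examples can be sketched in a few lines of Python. This is an illustrative approximation based only on the examples on this page, not Smart Dispatch's actual parser; the function names are hypothetical, and the order of the generated commands may differ from what the tool produces.

```python
import itertools
import re

# "[start:stop]" or "[start:stop:step]", stop excluded as in the examples.
RANGE_RE = re.compile(r"^(-?\d+):(-?\d+)(?::(-?\d+))?$")


def expand_token(token):
    """Return the list of values a single command-line token stands for."""
    if not (token.startswith("[") and token.endswith("]")):
        return [token]  # plain argument: a single choice
    body = token[1:-1].strip()
    match = RANGE_RE.match(body)
    if match:
        start, stop = int(match.group(1)), int(match.group(2))
        step = int(match.group(3)) if match.group(3) else 1
        return [str(value) for value in range(start, stop, step)]
    return body.split()  # "[a b c]": explicit list of choices


def tokenize(command):
    """Split on whitespace, keeping each bracketed group as one token."""
    return re.findall(r"\[[^\]]*\]|\S+", command)


def expand_command(command):
    """Generate every command obtained by combining the choices of each token."""
    choices = [expand_token(token) for token in tokenize(command)]
    return [" ".join(combo) for combo in itertools.product(*choices)]
```

For instance, ``expand_command("python my_script.py [1:10:2]")`` yields the five commands listed above, and the ``[1 2] 80 [tanh sigmoid] 0.1`` example yields four.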


Resuming job/batch
------------------

::

    smart-dispatch -q qtest@mp2 resume {batch_id}

Jobs that did not terminate properly (for example, because they exceeded the walltime) can be resumed using the ``{batch_id}`` given to you upon launch. Of course, this assumes your script is resumable.

*Note: Jobs are always in a batch, even if it's a batch of one.*
40 changes: 40 additions & 0 deletions docs/source/index.rst
@@ -0,0 +1,40 @@
Welcome to Smart Dispatch's documentation!
==========================================

Smart Dispatch is an easy-to-use job launcher for supercomputers with a PBS-compatible job manager.

Features
--------
* Launch multiple jobs with a single line.
* Automatically generate combinations of arguments. See :doc:`examples`.
* Automatic resource management: determines the optimal fit for your commands on nodes.
* Resume a batch of commands.
* Easily manage logs.
* Advanced mode for complete control.
* Use automatic rescheduling of jobs that hit the walltime. See :doc:`autoresume`.


Installing
----------

Use ``pip`` package manager: ::

    pip install git+https://github.com/SMART-Lab/smartdispatch


Contents
--------

.. toctree::
   :maxdepth: 2

   usage
   examples
   autoresume


Indices and tables
==================

* :ref:`genindex`
* :ref:`search`
76 changes: 76 additions & 0 deletions docs/source/usage.rst
@@ -0,0 +1,76 @@
=====
Usage
=====

.. autoprogram:: scripts/smart-dispatch:get_parser()
   :prog: smart-dispatch


Hierarchy of generated files
----------------------------

In order to understand the contents of the generated folders/files, it's good to know how ``smart-dispatch`` deals with **commands** that a user requests to launch on the cluster:

* Each invokation of ``smart-dispatch`` creates a so-called **batch** of **jobs**. Smart Dispatch will do its best to create as many simultaneous jobs so as to effectively utilize the allocated resources.
Collaborator

invokation -> invocation

Collaborator

I stumble each time I try to read the second sentence. 😟

What about something like this?

Smart Dispatch will distribute commands to jobs such that each of the latter uses an entire node. Jobs may run many commands concurrently if necessary to use a maximum number of cores and GPUs. The distribution is based on number of cores per node / per command and number of GPUs per node / per command.

* Each job is basically a single PBS file that is run by the queue management system on the cluster (either ``msub`` or ``qsub``).
* A job spawns mulitple concurrent **workers** that all cooperate to execute the requested commands.
Member

mulitple -> multiple

Collaborator

@bouthilx bouthilx Jul 20, 2017

Note: You will need to adapt this line based on how you change line 14. (See comment)

* Each worker (basically, a python script) is executing commands sequentially.
Collaborator

Is it the worker or the command that is basically a python script? If the prior, I don't understand why it needs to be specified.


A typical hierarchy of ``./SMART_DISPATCH_LOGS/{batch_id}/`` should look like this: ::

    commands/
        job_commands_0.sh
        job_commands_1.sh
        ...
        commands.txt
        running_commands.txt
        failed_commands.txt
    logs/
        job/
            150472.gpu-srv1.helios.err
            150472.gpu-srv1.helios.out
            ...
        worker/
            150472.gpu-srv1.helios_worker_0.e
            150472.gpu-srv1.helios_worker_0.o
            150472.gpu-srv1.helios_worker_1.e
            150472.gpu-srv1.helios_worker_1.o
            ...
        4d501b8b9805796ee913153e2493d7069a8bfb1aa469a50279940752bf26c935.err
        4d501b8b9805796ee913153e2493d7069a8bfb1aa469a50279940752bf26c935.out
        ...
    command_line.log
    jobs_id.txt

The root directory contains two files:

``command_line.log``:
    A full command that was used to invoke ``smart-dispatch``.
``jobs_id.txt``:
    A list of job IDs being run.

Now let's go through the subdirectories.


``commands/``
^^^^^^^^^^^^^

This directory holds generated PBS files (``job_commands_{pbs_index}.sh``) as well as three command lists:

``commands.txt``:
    A list pending commands (this is where the workers are taking their next commands to execute from).
Member

list pending -> list of pending

``running_commands.txt``:
    A list of currently running commands.
``failed_commands.txt``:
    A list of failed commands.


``logs/``
^^^^^^^^^

Output and error logs in are saved in this directory. The root level contains logs for actual commands. There are also two additional subfolder:
Member

@MarcCote MarcCote Mar 3, 2017

in are -> are

Member

subfolder -> subfolders


``job/``:
    Holds logs for the PBS files.
``worker/``:
    Holds logs for workers.
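
When inspecting a batch, it can help to group the worker logs by job and worker index. The sketch below assumes the file naming visible in the example hierarchy (``{job_id}_worker_{index}.e`` and ``.o``); the function name and the returned structure are hypothetical.

```python
import os
import re

# Naming assumed from the example hierarchy: <job_id>_worker_<index>.<e|o>
WORKER_LOG_RE = re.compile(
    r"^(?P<job_id>.+)_worker_(?P<index>\d+)\.(?P<stream>[eo])$"
)


def worker_logs(batch_dir):
    """Map (job_id, worker_index) to {"e": error_log_path, "o": output_log_path}."""
    worker_dir = os.path.join(batch_dir, "logs", "worker")
    logs = {}
    for name in sorted(os.listdir(worker_dir)):
        match = WORKER_LOG_RE.match(name)
        if match is None:
            continue  # ignore files that do not follow the assumed pattern
        key = (match.group("job_id"), int(match.group("index")))
        logs.setdefault(key, {})[match.group("stream")] = os.path.join(worker_dir, name)
    return logs
```

Here ``batch_dir`` would be ``./SMART_DISPATCH_LOGS/{batch_id}/``.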
7 changes: 6 additions & 1 deletion scripts/smart-dispatch
@@ -185,7 +185,7 @@ def main():
print "\nLogs, command, and jobs id related to this batch will be in:\n {smartdispatch_folder}".format(smartdispatch_folder=path_job)


-def parse_arguments():
+def get_parser():
     parser = argparse.ArgumentParser()
     parser.add_argument('-q', '--queueName', required=True, help='Queue used (ex: qwork@mp2, qfat256@mp2, gpu_1)')
     parser.add_argument('-n', '--batchName', required=False, help='The name of the batch. Default: The commands launched.')
@@ -215,6 +215,11 @@ def parse_arguments():
     resume_parser.add_argument('--expandPool', type=int, nargs='?', const=sys.maxsize, help='Add workers to the given batch. Default: # pending jobs.')
     resume_parser.add_argument("batch_uid", help="Batch UID of the jobs to resume.")

+    return parser
+
+
+def parse_arguments():
+    parser = get_parser()
     args = parser.parse_args()

     # Check for invalid arguments in
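
The point of this refactoring appears to be that building the parser is now separate from parsing ``sys.argv``, so the ``autoprogram`` directive in ``usage.rst`` can obtain the parser without running the script's argument handling. A trimmed-down sketch of the pattern follows; only two of the real options are reproduced, and the ``argv`` parameter on ``parse_arguments`` is an addition made here for testability, not part of the PR.

```python
import argparse


def get_parser():
    """Build and return the parser without reading sys.argv."""
    parser = argparse.ArgumentParser(prog="smart-dispatch")
    parser.add_argument("-q", "--queueName", required=True, help="Queue used")
    parser.add_argument("-n", "--batchName", required=False, help="The name of the batch.")
    return parser


def parse_arguments(argv=None):
    """Parse argv (sys.argv[1:] when None) using the shared parser."""
    parser = get_parser()
    return parser.parse_args(argv)
```

Documentation tooling can call ``get_parser()`` and inspect the result (for example via ``format_help()``), while the script itself keeps calling ``parse_arguments()``.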