Skip to content

Latest commit

 

History

History
204 lines (156 loc) · 6.29 KB

example.md

File metadata and controls

204 lines (156 loc) · 6.29 KB

Notedown example

This is an example of a notebook generated from markdown using notedown. The markdown source is on github and the generated output can be viewed using nbviewer.

Try opening example.ipynb as a notebook and running the cells to see how notedown works.

ipython notebook example.ipynb 

Notedown is simple. It converts markdown to the IPython notebook JSON format. It does this by splitting the markdown source into code blocks and not-code blocks and then creates a notebook using these as the input for the new cells (code and markdown).

We make use of the IPython api:

import re
import sys
import argparse

from IPython.nbformat.v3.rwbase import NotebookReader
from IPython.nbformat.v3.nbjson import JSONWriter

import IPython.nbformat.v3.nbbase as nbbase

We create a new class MarkdownReader that inherits from NotebookReader. The only requirement on this new class is that it has a .reads(self, s, **kwargs) method that returns a notebook JSON string.

We search for code blocks using regular expressions, making use of named groups:

fenced_regex = r"""
\n*                     # any number of newlines followed by
^(?P<fence>`{3,}|~{3,}) # a line starting with a fence of 3 or more ` or ~
(?P<language>           # followed by the group 'language',
[\w+-]*)                # a word of alphanumerics, _, - or +
[ ]*                    # followed by spaces
(?P<options>.*)         # followed by any text
\n                      # followed by a newline
(?P<content>            # start a group 'content'
[\s\S]*?)               # that includes anything
\n(?P=fence)$           # up until the same fence that we started with
\n*                     # followed by any number of newlines
"""

# indented code
indented_regex = r"""
\n*                        # any number of newlines
(?P<icontent>              # start group 'icontent'
(?P<indent>^([ ]{4,}|\t))  # an indent of at least four spaces or one tab
[\s\S]*?)                  # any code
\n*                        # any number of newlines
^(?!(?P=indent))           # stop when there is a line without at least
                            # the indent of the first one
"""

code_regex = r"({}|{})".format(fenced_regex, indented_regex)
code_pattern = re.compile(code_regex, re.MULTILINE | re.VERBOSE)

Then say we have some input text:

text = u"""### Create IPython Notebooks from markdown

This is a simple tool to convert markdown with code into an IPython
Notebook.

Usage:

```
notedown input.md > output.ipynb
```


It is really simple and separates your markdown into code and not
code. Code goes into code cells, not-code goes into markdown cells.

Installation:

    pip install notedown
"""

We can parse out the code block matches with

code_matches = [m for m in code_pattern.finditer(text)]

Each of the matches has a start() and end() method telling us the position in the string where each code block starts and finishes. We use this and do some rearranging to get a list of all of the blocks (text and code) in the order in which they appear in the text:

code = u'code'
markdown = u'markdown'
python = u'python'

def pre_process_code_block(block):
    """Preprocess the content of a code block, modifying the code
    block in place.

    Remove indentation and do magic with the cell language
    if applicable.
    """
    # homogenise content attribute of fenced and indented blocks
    block['content'] = block.get('content') or block['icontent']

    # dedent indented code blocks
    if 'indent' in block and block['indent']:
        indent = r"^" + block['indent']
        content = block['content'].splitlines()
        dedented = [re.sub(indent, '', line) for line in content]
        block['content'] = '\n'.join(dedented)

    # alternate descriptions for python code
    python_aliases = ['python', 'py', '', None]
    # ensure one identifier for python code
    if 'language' in block and block['language'] in python_aliases:
        block['language'] = python

    # add alternate language execution magic
    if 'language' in block and block['language'] != python:
        code_magic = "%%{}\n".format(block['language'])
        block['content'] = code_magic + block['content']

# determine where the limits of the non code bits are
# based on the code block edges
text_starts = [0] + [m.end() for m in code_matches]
text_stops = [m.start() for m in code_matches] + [len(text)]
text_limits = zip(text_starts, text_stops)

# list of the groups from the code blocks
code_blocks = [m.groupdict() for m in code_matches]
# update with a type field
code_blocks = [dict(d.items() + [('type', code)]) for d in
                                                            code_blocks]

# remove indents, add code magic, etc.
map(pre_process_code_block, code_blocks)

text_blocks = [{'content': text[i:j], 'type': markdown} for i, j
                                                   in text_limits]

# create a list of the right length
all_blocks = range(len(text_blocks) + len(code_blocks))

# cells must alternate in order
all_blocks[::2] = text_blocks
all_blocks[1::2] = code_blocks

# remove possible empty first, last text cells
all_blocks = [cell for cell in all_blocks if cell['content']]

Once we've done that it is easy to convert the blocks into IPython notebook cells and create a new notebook:

cells = []
for block in all_blocks:
    if block['type'] == code:
        kwargs = {'input': block['content'],
                  'language': block['language']}

        code_cell = nbbase.new_code_cell(**kwargs)
        cells.append(code_cell)

    elif block['type'] == markdown:
        kwargs = {'cell_type': block['type'],
                  'source': block['content']}

        markdown_cell = nbbase.new_text_cell(**kwargs)
        cells.append(markdown_cell)

    else:
        raise NotImplementedError("{} is not supported as a cell"
                                    "type".format(block['type']))

ws = nbbase.new_worksheet(cells=cells)
nb = nbbase.new_notebook(worksheets=[ws])

JSONWriter gives us nicely formatted JSON output:

writer = JSONWriter()
print writer.writes(nb)