A CLI and Python module to parse Markdown and Mdx
pip install git+https://github.com/11/touchdown@main
Touchdown has the capability to parse Markdown files into HTML
, or into an alternative web-friendly JSON
format. This alternative JSON
format is useful for non-static websites, creating your own website build tools, or in situations where markup is asynchronously added to a page.
# Parse Markdown files into HTML
touchdown blog.md
touchdown --output=HTML blog.md
# Write HTML output to file
touchdown blog.md > blog.html
# Parse Markdown to JSON
touchdown --output=JSON blog.md
touchdown --output=JSON blog.md > blog.json
# Write JSON output to file
touchdown --output=JSON blog.md > blog.json
Touchdown will automatically sanatize your Markdown. This in turn means that syntax errors are raised while parsing.
As an example, if you were to try parse the following markdown:
* This text should be bold
In this scenario, Touchdown would throw an error because special characters (bold, italic, strikethrough, math, code, etc.) require a closing character. When an error occurs, Touchdown will print an error message to stderror
MarkdownSyntaxError - "File example.md": line 1
`*` does not have a matching closing character
Touchdown is also a python module that you can use in your own projects.
from pathlib import Path
from touchdown import (
to_html,
to_dict,
to_json,
MarkdownSyntaxError,
)
blog = Path('./blogs/blog.md')
try:
blog_html = to_html(blog) # parses blod.md into a string of HTML
blog_dict = to_dict(blog) # parses blog.md into a web-friendly dictionary format
blog_json = to_json(blog) # parses blog.md into a web-friendly dictionary format, but then returns the result as a JSON string
except MarkdownSyntaxError as md_err:
print(md_err)
As it is common in programs that parse text, Touchdown creates an intermediate format of the Markdown it parses. This intermediate format takes on the structure of an abstract syntax tree. This abstract syntax tree was designed to be easily parsable by anyone wanting more control over the parsing process.
from pathlib import Path
from touchdown import (
to_ast,
MarkdownSyntaxError,
)
blog = Path('./blogs/blog.md')
try:
ast = to_ast(blog)
# The abstract syntax tree object is an iterable object.
# This was an intentional design choice so users could easily
# iterate through each node in the tree tree without having
# to understand the details of the abstract syntax tree structure.
for token in ast:
if token['type'] == 'header':
token['type'] == 'paragraph'
token['tag'] = 'p'
except MarkdownSyntaxError as md_err:
print(md_error)
Touchdown's Markdown syntax comes in 2 flavors depending on the file extension you use with your Markdown:
.md
: If you use the.md
extension, Touchdown will support the generic syntax spec.mdx
: If you use the.mdx
extension, Touchdown will use its custom extended syntax spec