At The Marshall Project, stories are edited in Google Docs. I wrote a quick tool to convert the HTML export from a Google Doc to Markdown. (Internally, our stories are stored as Markdown.) Turns out, parsing CSS with regexes is not a great idea. This gem is the next iteration.
Here's the strategy:
- Inline the CSS for `font-weight: bold;` and `font-style: italic;` based on the `.c01` (etc.) classes with the `roadie` gem.
- Parse the inline styles into a hash of CSS properties with the `css_parser` gem.
- Wrap the `<span>` with either a `<strong>` or `<em>` based on the CSS properties on it. A single `<span>` may get wrapped multiple times if the text is both bold and italic, for example. Then remove all the `<span>`s.
- Pass this cleaned HTML to `kramdown` to yield Markdown.

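
As a rough illustration of what the first two steps produce, here is a plain-Ruby sketch. It does not use `roadie` or `css_parser`. The helpers `parse_declarations` and `resolve_styles` are hypothetical names, the class rules are assumed to be already extracted from the export's `<style>` block into a `Hash`, and the naive split on `;` and `:` only holds for simple declarations (no comments, no quoted strings containing `;`, no `!important`).

```ruby
# Hypothetical stand-in for steps 1 and 2; the gem itself uses roadie and
# css_parser. Only handles simple "property: value;" declarations.

# Turn "font-weight:700;font-style:italic" into
# { "font-weight" => "700", "font-style" => "italic" }.
def parse_declarations(declarations)
  declarations.split(';').each_with_object({}) do |pair, props|
    name, value = pair.split(':', 2)
    next if value.nil? # skip fragments without a colon
    name = name.strip.downcase
    value = value.strip
    props[name] = value unless name.empty? || value.empty?
  end
end

# Rough equivalent of inlining: apply every class rule that matches the
# span, in stylesheet order (the Hash's insertion order), then let the
# span's own style attribute override the class rules.
def resolve_styles(class_rules, class_attr, style_attr = '')
  classes = class_attr.split
  props = {}
  class_rules.each do |klass, rule|
    props.merge!(rule) if classes.include?(klass)
  end
  props.merge(parse_declarations(style_attr))
end

rules = {
  'c1' => parse_declarations('font-weight:700'),
  'c2' => parse_declarations('font-style:italic')
}
styles = resolve_styles(rules, 'c1 c2')
# styles maps "font-weight" to "700" and "font-style" to "italic"
```

Applying rules in stylesheet order rather than in the order the classes appear on the element mirrors how the CSS cascade resolves equally specific single-class selectors; a real inliner also has to handle specificity and `!important`.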
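
The wrapping step can be sketched the same way, assuming the span's properties arrive as a plain `Hash` (roughly what `css_parser` would hand back). The helpers `bold?`, `italic?`, and `wrap_span_contents` are hypothetical names, not this gem's API. The sketch treats `bold` or a numeric weight of 700 or more as bold, in case the export uses a number instead of the keyword, and returns the span's inner HTML wrapped in `<em>` and/or `<strong>` with the `<span>` itself dropped.

```ruby
# Hypothetical sketch of step 3: decide which tags replace a <span>,
# given its CSS properties as a Hash of property => value strings.

# "bold" is equivalent to a weight of 700, so accept the keyword or
# any purely numeric weight of 700 or more.
def bold?(props)
  weight = props['font-weight'].to_s.strip.downcase
  return true if weight == 'bold'
  weight.match?(/\A\d+\z/) && weight.to_i >= 700
end

def italic?(props)
  props['font-style'].to_s.strip.downcase == 'italic'
end

# Wrap the span's inner HTML once per matching style; the <span> tag
# itself is not reproduced, which removes it from the output.
def wrap_span_contents(inner_html, props)
  html = inner_html
  html = "<em>#{html}</em>" if italic?(props)
  html = "<strong>#{html}</strong>" if bold?(props)
  html
end

wrap_span_contents('Hello', { 'font-weight' => '700', 'font-style' => 'italic' })
# => "<strong><em>Hello</em></strong>"
```

Putting `<em>` inside `<strong>` is an arbitrary choice in this sketch; either nesting expresses the same bold-and-italic text.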
Add this line to your application's Gemfile:

```ruby
gem 'googledoc_markdown', github: 'ivarvong/googledoc_markdown', tag: 'v0.1.1'
```

And then execute:

```
$ bundle
```

This gem is not stable and probably shouldn't be used yet. The spec might still be useful as a reference.

```ruby
require 'googledoc_markdown'

# your_google_doc_html: a String holding the HTML exported from the Google Doc
converter = GoogledocMarkdown::Converter.new(html: your_google_doc_html)
markdown = converter.to_markdown
```

After checking out the repo, run `bin/setup` to install dependencies. Then, run `guard` to run the tests.
Bug reports and pull requests are welcome on GitHub at https://github.com/ivarvong/googledoc_markdown.
The gem is available as open source under the terms of the MIT License.