cmark-gfm is used by a number of apps that interface with text-generating Large Language Models (LLMs) (this one and this one are the ones I know of). These models produce a few characters of Markdown every 200 ms (on my machine), and cmark-gfm is called repeatedly to render the text generated so far as Markdown. This is inefficient because, as far as I can tell, the entire generated Markdown has to be re-parsed from the beginning for every generated token, even though everything except the latest token has already been parsed.
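To put a rough number on that cost, here is a minimal sketch that counts how many bytes a parser has to read when the whole text so far is re-parsed after every chunk, compared with a parser that could resume where it left off. Both function names are hypothetical helpers for illustration and are not part of cmark-gfm; the sketch only counts bytes and does no parsing.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of cmark-gfm): total bytes read when the
   full text so far is re-parsed from the start after each new chunk. */
size_t bytes_parsed_from_scratch(const char *const *chunks, size_t n) {
    size_t prefix_len = 0; /* length of the text generated so far */
    size_t total = 0;      /* bytes read across all re-parses */
    for (size_t i = 0; i < n; i++) {
        prefix_len += strlen(chunks[i]);
        total += prefix_len; /* every re-parse reads the whole prefix */
    }
    return total;
}

/* Hypothetical helper: total bytes read if the parser could resume,
   so that each chunk is read exactly once. */
size_t bytes_parsed_incrementally(const char *const *chunks, size_t n) {
    size_t total = 0;
    for (size_t i = 0; i < n; i++) {
        total += strlen(chunks[i]);
    }
    return total;
}
```

For n chunks of roughly equal size, the first count grows with n squared while the second grows linearly, so the gap widens as the response gets longer.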
cmark-gfm has a streaming interface, cmark_parser_feed and cmark_parser_finish, but it seems I need to call cmark_parser_finish every time I actually want to parse, and I need to re-create the parser after that; I can't feed more tokens and re-parse. I would have expected a way to call cmark_parser_feed, then something like cmark_parser_parse, then cmark_parser_feed again, or a more complicated interface for editing the parse tree like the one tree-sitter has.
Also, while we're at it, the other issue is that the rendering isn't stable while the parser hasn't yet seen the entire input. Namely, a trailing single backtick ` should open a code span that runs until the end of the line/input even if there's no closing backtick. The current behavior leads to jitter in the UI: it first prints a literal backtick and, a few seconds later, removes it and re-renders everything after it in monospace once the LLM generates the closing backtick. This is also a problem for horizontal rules and bold/italic, though the latter definitely isn't doable because many people use single * characters for multiplication.
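Until the parser itself offers something like this, an application could approximate the behavior by provisionally closing an open code span before handing the text to the renderer. The sketch below is an application-side heuristic, not cmark-gfm behavior: provisional_closer_len and tick_run are hypothetical names, only the last line is examined, and the caller is assumed to track separately whether the text is currently inside a fenced code block.

```c
#include <stddef.h>
#include <string.h>

/* Length of the run of backticks starting at p. */
static size_t tick_run(const char *p) {
    size_t n = 0;
    while (p[n] == '`') n++;
    return n;
}

/* Hypothetical helper (not part of cmark-gfm): returns how many backticks
   to append to `text` so that a code span left open on its last line is
   provisionally closed. Returns 0 when nothing should be appended. */
size_t provisional_closer_len(const char *text) {
    /* Only the last line is examined. */
    const char *line = strrchr(text, '\n');
    line = line ? line + 1 : text;

    size_t len = strlen(line);
    /* A trailing run of backticks may still grow, so never close after it. */
    int ends_with_tick = len > 0 && line[len - 1] == '`';

    /* Leave lines that look like code fences alone. */
    const char *p = line;
    while (*p == ' ') p++;
    if (tick_run(p) >= 3) return 0;

    p = line;
    while (*p) {
        if (*p == '\\' && p[1] != '\0') { p += 2; continue; } /* escaped char */
        if (*p != '`') { p++; continue; }

        size_t open = tick_run(p); /* opening run of `open` backticks */
        const char *q = p + open;
        int closed = 0;
        while (*q) {               /* look for a run of the same length */
            if (*q != '`') { q++; continue; }
            size_t n = tick_run(q);
            q += n;
            if (n == open) { closed = 1; break; }
        }
        if (!closed) return ends_with_tick ? 0 : open;
        p = q;                     /* continue after the closed span */
    }
    return 0;
}
```

The caller would append that many backticks to a temporary copy of the text before rendering, and keep the original buffer unchanged for the next token. A line that ends in a backtick is deliberately left alone, since the model may still be emitting a longer delimiter.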
nerocui pushed a commit to nerocui/cmark-gfm that referenced this issue on May 29, 2024