The milestone was to eliminate dfmt's dependency on the libdparse lexer by replacing libdparse.lexer with dmd.tokens and dmd.lexer across the dfmt codebase.
Upon realising that the rewrite would not be a simple one-to-one port, I used the opportunity to explore how dfmt could be better designed around the DMD library. Currently, dfmt roughly works as follows:
- Lex the source code and store the token stream
- Build the AST and walk it, storing location information for different types of nodes
- Traverse the token stream and format it, occasionally using the location information to make decisions
While this works, it requires storing both the token stream and the AST in memory to be able to format the code. I looked into two different approaches:
- dfmt does not perform any semantic analysis or use any of the data that the AST provides. Validating the correctness of the code being formatted (whether it can generate a valid AST or not) is not the responsibility of the formatter. So, we might be able to perform the formatting passes purely on the token stream from the lexer. After researching this approach further, I decided not to pursue it due to poor error handling, lack of semantic information, and poor future-proofing in case we want to add context-sensitive format rules going forward.
- By using the AST built with DMD, we can walk the nodes in preorder and format them without having to store the lexed token stream in memory. The location information would also not be required, since the context of a particular node can be identified by climbing up the tree through the node's parents. This is the approach currently being used to implement the formatting passes.
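The second approach can be illustrated with a minimal sketch. It uses a hypothetical toy AST (the `Node` and `Kind` types and the `depthOf`, `formatNode` and `render` functions are made up for illustration and are not part of the DMD library or dfmt) to show how a preorder walk can derive indentation by climbing the parent chain, rather than consulting stored location information or a token stream:

```d
import std.stdio : write;

// Hypothetical toy AST for illustration only; not the DMD node classes.
enum Kind { module_, function_, block, statement }

final class Node
{
    Kind kind;
    string text;
    Node parent;     // lets a node discover its context by climbing up
    Node[] children;

    this(Kind kind, string text = "")
    {
        this.kind = kind;
        this.text = text;
    }

    // Attach a child and return this node so calls can be chained.
    Node add(Node child)
    {
        child.parent = this;
        children ~= child;
        return this;
    }
}

// Indentation depth is the number of enclosing blocks, found via the parents.
size_t depthOf(Node node)
{
    size_t depth = 0;
    for (Node p = node.parent; p !is null; p = p.parent)
        if (p.kind == Kind.block)
            ++depth;
    return depth;
}

// Preorder walk: emit the node itself, then its children in order.
void formatNode(Node node, ref string output)
{
    string indent;
    foreach (_; 0 .. depthOf(node))
        indent ~= "    ";

    if (node.kind == Kind.block)
        output ~= indent ~ "{\n";
    else if (node.text.length)
        output ~= indent ~ node.text ~ "\n";

    foreach (child; node.children)
        formatNode(child, output);

    // A block closes its brace once all of its children have been emitted.
    if (node.kind == Kind.block)
        output ~= indent ~ "}\n";
}

string render(Node root)
{
    string output;
    formatNode(root, output);
    return output;
}

void main()
{
    auto inner = new Node(Kind.block).add(new Node(Kind.statement, "x++;"));
    auto fnBody = new Node(Kind.block)
        .add(new Node(Kind.statement, "int x = 1;"))
        .add(inner);
    auto root = new Node(Kind.module_)
        .add(new Node(Kind.function_, "void main()").add(fnBody));

    write(render(root));
}
```

Only the tree is kept in memory here; the nested statement receives two levels of indentation purely because it has two block ancestors. The real DMD AST is traversed differently, so this only conveys the idea.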
I read a lot of articles and forum posts, but the two most insightful ones are below:
The first step was to introduce the DMD library as a dependency of dfmt in dub, using the dmd package. Then, the correct compile commands had to be added to the Makefile to compile the necessary library files and expose all the features provided by the library. A number of these features are gated behind the DMDLIB version flag, which was identified and added to the compile commands later. Following this, the rewrite mainly involved replacing all instances of libdparse.lexer.tok with dmd.tokens.TOK. It also required a change in DMD to mark a utility function as pure @nogc. Adding these attributes enabled us to retain the existing attributes on most of the functions in dfmt.
- chore(deps): add dmd library
- refactor: lexer in tokens.d
- refactor: lexer in indentation.d and wrapping.d
- feat: make it compile with dmd AST
The next milestone involves two components:
- Eliminate the parser dependency from libdparse and completely move to the DMD library
- Implement 4 transformation passes using the DMD AST