Skip to content

Commit

Permalink
updated shortened md
Browse files Browse the repository at this point in the history
  • Loading branch information
dorchard committed Dec 3, 2024
1 parent c6eab0e commit aeac051
Showing 1 changed file with 28 additions and 131 deletions.
159 changes: 28 additions & 131 deletions joss/paper.md
Original file line number Diff line number Diff line change
Expand Up @@ -55,8 +55,7 @@ and static analysis of Fortran source code. It provides an
interface to build other tools, e.g., for
static analysis, automated refactoring, verification, and compilation.
The library supports FORTRAN 66, FORTRAN 77, Fortran 90, Fortran 95,
some legacy extensions, and partially Fortran 2003, with
a shared Abstract Syntax Tree representation.
some legacy extensions, and partially Fortran 2003.
The library has been deployed in several language tool projects in academia and industry.

# Statement of need
Expand All @@ -65,7 +64,7 @@ As one of the oldest surviving programming languages [@backus1978history], Fortr
of legacy software, but is also used to write new software. Fortran remains a popular language in the international scientific community; @vanderbauwhede2022making reports data from 2016 on the UK's \`\`Archer'' supercomputer, showing the
vast majority of use being Fortran code. Fortran is
particularly notable for its prevalence in earth sciences, e.g., for
implementing global climate models that inform international policy
implementing climate models that inform international policy
decisions [@mendez2014climate]. In 2024, Fortran re-entered the Top 10 programming languages in
the [TIOBE Index](https://www.tiobe.com/tiobe-index/), showing its enduring popularity.
The continued use of Fortran, particularly in
Expand All @@ -77,8 +76,8 @@ I-IV, FORTRAN 66 and 77, Fortran 90, 95, 2003, 2008, etc.).
Newer standards often deprecate features
which are known to be a ready source of errors, or difficult to
specify or understand. However, compilers often support an amalgam of features across
standards (@urmaetal2014).
This enables developers to keep using deprecated features and mix
standards (@urmaetal2014),
enabling developers to keep using deprecated features and mix
language standards. This complicates the development of new tools for manipulating Fortran
source code; one must tame the weight of decades of language evolution.

Expand Down Expand Up @@ -107,126 +106,33 @@ tools for refactoring Fortran [@vanderbauwhede2022making]:
No comprehensive lexing, parsing, and analysis library was available from which to
build new tools.

# Functionality
# Functionality in brief

* Lexing and parsing of Fortran to an expressive Abstract Syntax Tree;
* Various static analyses, e.g., data flow analysis;
* Lexing (of both fixed and free form code) and parsing of Fortran to an expressive unified Abstract Syntax Tree;
* Static analyses, e.g., general data flow analysis including:
- Reaching definitions;
- Def-use/use-def;
- Constant evaluation;
- Constant propagation;
- Live variable analysis;
- Induction variable analysis.
* Type checking;
* Module graph analysis;
* Pretty printing;
* "Reprinting", or patching sections of source code without removing secondary
notation such as comments;
* "Reprinting" (patching sections of source code without removing secondary
notation such as comments);
* Exporting to JSON.

fortran-src is primarily a Haskell library but it also packages a command-line
tool. By exporting parsed code to JSON, the
parsing and analyses that fortran-src provides may be utilized by
non-Haskell tools.

The library's top-level module is `Language.Fortran`.

## Lexing and parsing

Static analysis of Fortran requires a choice in the lexing and parsing
front end: either to take the approach of many compilers, allowing an amalgam of features (e.g., gfortran with
its hand-written parser), or to
enforce language standards at the exclusion of some code that is
accepted by major compilers. fortran-src takes roughly the latter
approach, though it has an extended Fortran 77 mode for supporting
legacy extensions influenced by vendor-specific compilers popular in the past.

The Fortran language has evolved through two broad syntactic forms:

* _fixed source form_, used by FORTRAN 66 and FORTRAN 77 standards, where each
line of source code follows a strict format (motivated by its original use
with punched cards). The first 6 columns of a line are reserved for labels
and continuation markers. The character `C` in column 1 indicates a comment line
to be ignored by the compiler, else the line properly begins from column 7.

* _free source form_, first specified in Fortran 90 and subsequent standards, which has fewer restrictions on the line format and a different method
of encoding line continuations.

Therefore, two lexers are provided: the fixed form lexer, for handling earlier
versions of the language: FORTRAN 66 and FORTRAN 77 (and additional
`Legacy` and `Extended` modes), and the free form lexer, for Fortran
90 onwards.

fortran-src defines one parser per supported standard (grouped
under `Language.Fortran.Parser.Fixed` and `Language.Fortran.Parser.Free` depending
on the lexing form), plus a parser
for handling non-standard extended features.
Later standards such as Fortran 2003 are generally comparable to Fortran
90, but with additional syntactic constructs. The parser `gates' certain features by the language standard being parsed.

The lexers are auto-generated via the [`alex`](https://github.com/haskell/alex) tool.
The suite of parsers is automatically generated from
attribute grammar definitions in the Bison format, via the
[`happy`](https://github.com/haskell/happy) tool.
CPP (the C pre-processor) can be run prior to lexing or parsing.

## Unified Fortran AST

The parsers share a common abstract syntax tree (AST) representation (`Language.Fortran.AST`)
defined via mutually-recursive data
types. All such data types are _parametric data types_, parameterised by
the type of "annotations" that can be stored in the nodes of the
tree. For example, the top-level of the AST is the `ProgramFile a`
type, which comprises a list of `ProgramUnit a` values, parameterised
by the annotation type `a` (i.e., that is the generic type parameter).
The annotation facility is useful for collecting information about types within the tree nodes or flagging whether the particular node of the tree has been rewritten.

Some simple transformations are provided on ASTs:

* Grouping transformation, turning unstructured ASTs into structured ASTs;
(`Language.Fortran.Transformation.Grouping`);
* Disambiguation of array indexing vs. function calls (as they share
the same syntax in Fortran) (`Language.Fortran.Transformation.Disambiguation`),
and intrinsic calls from regular function calls,
(`Language.Fortran.Transformation.Disambiguation.Intrinsic`),
e.g.
`a(i)` is both the syntax for indexing array `a` at index `i` and
for calling a function named `a` with argument `i`;
* Fresh name transformation (obeying scoping) (`Language.Fortran.Analysis.Renaming`).

These transformations are applied to the AST following
parsing (with some slight permutations on the grouping transformations
depending on whether the code is FORTRAN 66 or not).

## Static analyses

Static analysis techniques available within fortran-src:

* Control-flow analysis (building a super graph) (`Language.Fortran.Analysis.BBlocks`);
* General data flow analyses (`Language.Fortran.Analysis.DataFlow`), including:
- Reaching definitions;
- Def-use/use-def;
- Constant evaluation;
- Constant propagation;
- Live variable analysis;
- Induction variable analysis.
* Type analysis (`Language.Fortran.Analysis.Types`);
* Module graph analysis (`Language.Fortran.Analysis.ModGraph`);

An abstract representation
is provided for evaluation of expressions and for semantic analysis
(`Language.Fortran.Repr`). Constant expression evaluation
(`Language.Fortran.Repr.Eval.Value`) leverages this representation
and enables some symbolic manipulation too, providing some partial evaluation.

Functionality and example usage of the tool and library is described in detail on the [fortran-src wiki](https://github.com/camfort/fortran-src/wiki/).
A demonstration of fortran-src for static analysis is provided
by a small demo tool which detects if an allocatable array is used
before it has been allocated.\footnote{\url{https://github.com/camfort/allocate-analysis-example}}

## Pretty printing, reprinting, and rewriting

A common feature of language tools is to generate source code.
We thus provide pretty printing features to generate textual source
code from the internal AST (`Language.Fortran.PrettyPrint`).

Furthermore, fortran-src provides a diff-like patching feature for
(unparsed) Fortran source code that accounts for the fixed-form style,
handling the fixed-form lexing of lines, and comments in its
application of patches (`Language.Fortran.Rewriter`). This aids development of refactoring tools.

# Work building on fortran-src

## CamFort
Expand All @@ -245,25 +151,21 @@ of which fortran-src is the core infrastructure.
CamFort provides automatic refactoring of
deprecated or error-prone programming patterns, with the goal of helping
to meet core quality requirements, such as maintainability
(@DBLP:conf/oopsla/OrchardR13). For example, it can rewrite
(@DBLP:conf/oopsla/OrchardR13). It can rewrite
EQUIVALENCE and COMMON blocks (both of which were deprecated in the
Fortran 90 standard) into more modern Fortran style.
Fortran 90 standard) into more modern style.

CamFort also provides code analysis and
lightweight verification tools (@contrastin2016lightning). Source-code
annotations (comments) provide specifications of certain aspects of a
program's meaning or behaviour. CamFort can then check that code
program's meaning or behaviour. CamFort can check that code
conforms to these specifications (and for some features can suggest places to insert specifications or infer specifications
from existing code). Facilities include: units-of-measure typing
(@DBLP:journals/corr/abs-2011-06094,@DBLP:journals/jocs/OrchardRO15,@danish2024incremental),
array access patterns (for capturing the shape of stencil computations)
(@orchard2017verifying), deductive reasoning via pre- and
post-conditions in Hoare logic style, and various code safety checks.

CamFort also provides an advanced rewriting alogrithm
that fuses a depth-first traversal of the AST with a textual diff algorithm
on the original source code, called "reprinting" (@clarke2017scrap).

CamFort has been previously
deployed at the Met Office, with its analysis tooling run on the Unified
Model (@walters2017met) to ensure internal code quality standards are met.
Expand All @@ -285,26 +187,21 @@ instead of variable names.

## Nonstandard INTEGER refactoring

fortran-src has been used to build other (closed
source) refactoring tools to help migration and improve the quality
of large legacy codebases, building on top of the library's AST, analysis, and
reprinting features.

One example of this has been an effort to fix a number of issues regarding the
use of integers used where logical types are expected. A tool was written
to refactor many expressions by using the fortran-vars typechecker to find
integer expressions and normalise them while flagging anything
potentially changing behaviour for further manual inspection. These might be situations
fortran-src has been used to build refactoring tools to help migration and improve the quality
of large legacy codebases. One example is an effort to fix issues around the
use of integers where logical types are expected. This tool
uses the typechecker to find
integer expressions which are then normalised while flagging anything
potentially changing behaviour for further manual inspection.
These might be situations
in which some code is hard to statically analyse but safe, or it may have uncovered an existing
bug. The tool uncovered many such bugs in a particular codebase during this effort, including several in the
form of the snippet above.

This effort, along with a number of others, allowed the team working at Bloomberg
(a subset of the authors here) to eventually migrate a
codebase from a legacy compiler to a modified GFortran, with no change in
behaviour. Ongoing efforts are using fortran-src to remove the patches on top of
GFortran, as well as to introduce interfaces for more robust type checking in this
code base.
behaviour.

# Project maintenance and documentation

Expand Down

0 comments on commit aeac051

Please sign in to comment.