# Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

## TBD

### Fixed

- Updated triton dependency [#418]
- Fixed strides for QKV gradients for cutlass attention [#535]

### Added

## [0.0.12] - 2022-08-08

### Fixed

- Removed duplicated biases in the FusedMLP layers [#317]
- Rotary embeddings respecting input types [#326]
- Poolformer style instantiating useless projection layers [#349]
- Fix layer position not being properly tracked, causing extra layernorms for programmatic xformers [#348]
- Pass use_triton flag to LayerNorm module [#336]

### Added

- Four blocksparsity layouts from DeepSpeed [#320]
- Support several initialization options [#312]
- Conv2DFeedforward feedforward part [#321]
- VisualAttention [#329]
- Automatic blocksparse for causal attention [#334]
- Better hierarchical transformer generation [#345]
- Fused operations with AOTAutograd/NVFuser, integration into MLP [#357]
- Refactor LRA code to use PyTorch Lightning [#343]

## [0.0.11] - 2022-05-30

### Fixed

- Fix some torchscriptability [#246]
- Fix FourierMix compatibility with AMP [#258]
- Better asserts on QKV dimensions [#264]
- Better performance for FusedMLP and FusedLinearLayer [#283]
- Deepnorm init missing self-attention [#284]

### Added

- Simplicial Embeddings [#259]
- Memory-efficient attention, forward pass [#267]
- MHA benchmark
- MLP benchmark
- Move all Triton kernels to Triton v2 [#272]
- Memory-efficient attention, backward pass [#281]
- Metaformer support [#294]

## [0.0.10] - 2022-03-14

### Fixed

- Expose bias flag for feedforwards, same default as Timm [#220]
- Update eps value for layernorm, same default as torch [#221]
- PreNorm bugfix, only one input was normalized [#233]
- Fix bug where embedding dimensions that did not match model dim would lead to a crash [#244]

### Added

- Add DeepNet (DeepNorm) residual path and init [#227]

## [0.0.9] - 2022-02-09

### Added

- Compositional Attention [#41]
- Experimental Ragged attention [#189]
- Mixture of Experts [#181]
- BlockSparseTensor [#202]
- Nd-tensor support for Triton softmax [#210]

### Fixed

- Bugfix Favor, single feature map [#183]
- Sanity check blocksparse settings [#207]
- Fixed some picklability [#204]

## [0.0.8] - 2022-01-07

### Fixed

- Much faster fused dropout [#164]
- Fused dropout repeatability [#173]

### Added

- Embedding weight tying option [#172]

## [0.0.7] - 2021-11-30

### Fixed

- Dropout setting not properly passed in many attentions [#123]

## [0.0.6] - 2021-11-24

### Fixed

- Fix self-attention optimization not being triggered, broken residual path [#119]
- Improve speed by not using contiguous Tensors when not needed [#119]

### Added

- Attention mask wrapper [#113]
- ViT comparison benchmark [#117]

## [0.0.4] - 2021-11-16

### Fixed

- Homogenizing the masks, additive or bool [#79][#85][#86]
- Fix causality flag not being respected [#103]
- Enabling FusedLayerNorm by default in the factory if Triton is available
- Fixing Favor with fp16
- Fixing Favor trainability

### Added

- Fused dropout/bias/activation layer [#58]
- Fused layernorm used by default in the factory [#92]

## [0.0.3] - 2021-11-01

### Fixed

- Nystrom causal attention [#75]

## [0.0.2] - 2021-11-01

### Fixed

- More robust blocksparse [#24]

### Added

- Rotary embeddings [#32]
- More flexible layernorm [#50]