Skip to content

friedelwolff/fmmap

Repository files navigation

fmmap -- A fast implementation of mmap

This module is a reimplementation of Python's builtin module mmap. It aims to provide better performance while being API compatible with the builtin module. Development tracks new Python versions, therefore this module is mostly usable as a backport to older Python versions -- consult the documentation about any changes to the mmap API in Python. You should be able to shadow the builtin module and forget about it.

Install on the command line:

pip install --upgrade fmmap

Import in Python under the name mmap:

import fmmap as mmap

Memory mapping is a technique of providing access to a file by mapping it into the virtual address space of the process and letting the operating system handle the input and output instead of explicitly reading from or writing to the file. It can provide better performance over normal file access in some cases. The builtin mmap module in Python exposes this functionality, but some of the implementation is not as fast as possible.

Summary of the project status:

The find() and rfind() functions in fmmap should be faster than the version in the standard library. These two functions also release the global interpreter lock (GIL) while searching, which might provide some benefit if you have multi-threaded code.

A number of features, bug fixes and API changes introduced in the standard library between Python 3.5 - Python 3.9 are supported in fmmap when running on older versions, notably:

  • The API of flush() works like Python > 3.7.
  • madvise() is implemented and most of the MADV_... constants are exposed.

Requirements and Assumptions

The following requirements are supported and tested:

  • Python versions: 3.4, 3.5, 3.6, 3.7, 3.8.
  • Interpreters: CPython.
  • Operating systems:
    • Linux
    • BSD systems (FreeBSD, NetBSD, OpenBSD)
    • SunOS/Solaris (illumos/OpenIndiana)

The following operating systems receive limited testing, but should work:

  • macOS
  • Windows

To implement the searching functionality, fmmap makes use of functions in the C library. The performance characteristics therefore are platform and version dependent. Recent versions of glibc is known to be very good. Some characteristics of your data can also influence performance. The performance of fmmap should be better than the built-in mmap module in most cases.

For non-Windows platforms fmmap currently assumes that your platform has an madvise(2) implementation and the header file <sys/mman.h>.

Credits and Resources

The code and tests in this project are based on the standard library's mmap module. Additional tests from the pypy project are also duplicated here which helped to identify a few bugs. Most functionality is just inherited from the current runtime. The rest is implemented in optimized Cython code.

Further reading on Wikipedia:

Contributing

  1. Clone this repository (git clone ...)
  2. Create a virtualenv
  3. Install package dependencies: pip install --upgrade pytest tox
  4. Install package in development mode: pip install -e .
  5. Change some code
  6. Generate the compiled module: cythonize src/fmmap.pyx
  7. Run the tests: in the project root simply execute pytest, and afterwards preferably tox to test the full test matrix. Consider installing as many supported interpreters as possible (having them in your PATH is often sufficient).
  8. Submit a pull request and check for any errors reported by the Continuous Integration service.

License

The MPL 2.0 License

Copyright (c) 2020 Friedel Wolff.