A simple implementation of sparse bitsets for non-negative integers.
The representation is very simple, and uses a sequence of (offset, bits) pairs. It is similar to that of Go's x/tools/container/intsets
and Java's java.util.BitSet
. However, Go's package caters to negative integers as well, which I do not need.
The original motivation for sparsebitset
comes from a need to store custom indexes of documents in a database. Accordingly, sparsebitset
trades CPU time for space.
sparsebitset
may be useful for sparse sets for which the space overhead of bitset
(BitSet) may be high.
For purely in-memory operations, when adequate memory is available, bitset
usually performs much better than sparsebitset
. In particular, when data is dense or periodic with a short period, bitset
is a better choice. Please test with your real life data - in reasonable volumes - before choosing the package that is more suitable for you.
sparsebitset
tries to provide an API that is mostly similar to that of bitset
. Users can switch between the two implementations with a little effort, based on the evolution (either dense --> sparse or sparse --> dense) of their data.
Here are a few differences to note.
sparsebitset
operates withuint64
rather thanuint
almost everywhere. This makes several parts of the code uniform.sparsebitset
does not panic. Upon encountering errors, it returnsnil
where*BitSet
is expected. Elsewhere, it returns an additionalerror
value that must be checked.- A few methods are not implemented. Examples include JSON (de)serialisation methods.
The tests have been adopted from those for bitset
, and modified appropriately to account for the small API differences. Therefore, the tests are governed by the license of bitset
.
sparsebitset
has no external dependencies.
go get -v 'github.com/js-ojus/sparsebitset'
sparsebitset
passes all applicable tests of bitset
, and is hence reasonably correct. Please report an issue, should you encounter any incorrect behaviour.
sparsebitset
has not been optimised in any way, yet. All help is highly appreciated!
Please see the tests for several examples of usage. Here is a quick example.
a := New(100)
b := New(200)
for i := uint64(1); i < 100; i += 2 {
a.Set(i)
b.Set(i - 1)
}
for i := uint64(100); i < 200; i++ {
b.Set(i)
}
na, _ := a.UnionCardinality(b)
if na != 200 {
t.Errorf("Union should have 200 bits set, but had %d", na)
}
nb, _ := b.UnionCardinality(a)
if na != nb {
t.Errorf("Union should be symmetric")
}
c := a.Union(b)
d := b.Union(a)
if c.Count() != 200 {
t.Errorf("Union should have 200 bits set, but had %d", c.Count())
}
if !c.Equal(d) {
t.Errorf("Union should be symmetric")
}
Here is another.
a := New(500)
b := New(300)
c := New(200)
// Setup bitsets
// a and b overlap
// only c is (strict) super set
for i := uint64(0); i < 100; i++ {
a.Set(i)
}
for i := uint64(50); i < 150; i++ {
b.Set(i)
}
for i := uint64(0); i < 200; i++ {
c.Set(i)
}
if a.IsSuperSet(b) == true {
t.Errorf("IsSuperSet fails")
}
if a.IsSuperSet(c) == true {
t.Errorf("IsSuperSet fails")
}
if b.IsSuperSet(a) == true {
t.Errorf("IsSuperSet fails")
}
if b.IsSuperSet(c) == true {
t.Errorf("IsSuperSet fails")
}
if c.IsSuperSet(a) != true {
t.Errorf("IsSuperSet fails")
}
if c.IsSuperSet(b) != true {
t.Errorf("IsSuperSet fails")
}
if a.IsStrictSuperSet(b) == true {
t.Errorf("IsStrictSuperSet fails")
}
if a.IsStrictSuperSet(c) == true {
t.Errorf("IsStrictSuperSet fails")
}
if b.IsStrictSuperSet(a) == true {
t.Errorf("IsStrictSuperSet fails")
}
if b.IsStrictSuperSet(c) == true {
t.Errorf("IsStrictSuperSet fails")
}
if c.IsStrictSuperSet(a) != true {
t.Errorf("IsStrictSuperSet fails")
}
if c.IsStrictSuperSet(b) != true {
t.Errorf("IsStrictSuperSet fails")
}