Bug: Non-linear, very long index restore durations (python 3.10.12, usearch==2.16.0) #514

kennon · 2024-10-30T19:32:52Z

Describe the bug

With larger usearch indexes, restore times become impractically long. Across the index sizes we tested, restore durations range from ~10s for an 11GB / 6M-embedding index up to 45 minutes (!) for a 32GB / 20M-embedding index. This happens with memory mapping on or off (i.e. view=True or view=False). For the entire load, one CPU core is pegged at 100%. Once loaded, the index appears to behave normally.

We are running this on an EC2 instance with 64GB of RAM, so the entire index should fit very comfortably in memory even with memory mapping turned off (view=False). The index files are loaded from ephemeral SSDs attached to the EC2 instance, so disk read time should not be a major factor.

We are running this inside Docker (ECS). However, we have not seen similar file-load issues with other software: we use a variety of Python and non-Python libraries that load large files from this same storage, regularly >= 100GB. So it seems unlikely to be something at the OS/Docker level 🤷 (the ECS task has access to the full amount of memory).

Steps to reproduce

import usearch.index
index = usearch.index.Index.restore(path_to_index, view=True) # also happens with view=False

The index was built with usearch using all defaults, then saved to disk via index.save(index_path). Once loaded, the index functions normally.
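To tell a CPU-bound load from an I/O-bound one, it can help to record both wall-clock and CPU time around the restore call. The following is a minimal sketch using only the standard library. `time_load` is a hypothetical helper, not part of usearch, and a stand-in workload replaces the real restore call so the snippet runs without an index file.

```python
import time
from typing import Any, Callable, Tuple


def time_load(loader: Callable[[], Any]) -> Tuple[Any, float, float]:
    """Run loader() once and return (result, wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # CPU time of this process, summed over threads
    result = loader()
    cpu_seconds = time.process_time() - cpu_start
    wall_seconds = time.perf_counter() - wall_start
    return result, wall_seconds, cpu_seconds


if __name__ == "__main__":
    # Stand-in workload so the sketch runs anywhere.
    # With usearch installed, the loader would instead be something like:
    #   lambda: usearch.index.Index.restore(path_to_index, view=True)
    _, wall, cpu = time_load(lambda: sum(i * i for i in range(1_000_000)))
    print(f"wall={wall:.3f}s cpu={cpu:.3f}s cpu/wall={cpu / wall:.2f}")
```

A cpu/wall ratio near 1.0 would match the single pegged core described above, while a ratio well below 1.0 would point at time spent waiting on disk instead.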

Expected behavior

We would expect a roughly linear relationship between index size / embedding count and load time.
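As a rough illustration of how far the observed numbers are from linear, the two data points quoted in this report can be turned into an apparent scaling exponent. This is only a sketch: it uses the approximate figures above (6M vectors in ~10s, 20M vectors in 45 minutes), and two points are not enough for a real fit.

```python
import math

# Approximate data points quoted in this report.
small_vectors, small_seconds = 6_000_000, 10.0        # ~10 s
large_vectors, large_seconds = 20_000_000, 45 * 60.0  # 45 min

# If time ~ n**k, then k = log(t2 / t1) / log(n2 / n1); k == 1 would be linear.
exponent = math.log(large_seconds / small_seconds) / math.log(large_vectors / small_vectors)

# What a linear extrapolation from the small index would predict for the large one.
linear_seconds = small_seconds * large_vectors / small_vectors

print(f"apparent exponent: {exponent:.2f}")
print(f"linear prediction: {linear_seconds:.1f} s vs observed {large_seconds:.0f} s")
```

Under linear scaling the larger index would be expected to load in roughly half a minute, not 45 minutes.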

Thank you for such an awesome project, we have fallen in love with usearch and hope we can figure this one out, which is currently blocking us from using it!

USearch version

v2.16.0

Operating System

Ubuntu 22.04 (dockerized ECS)

Hardware architecture

x86

Which interface are you using?

Python bindings

Contact Details

[email protected]

Are you open to being tagged as a contributor?

I am open to being mentioned in the project .git history as a contributor

Is there an existing issue for this?

I have searched the existing issues

Code of Conduct

I agree to follow this project's Code of Conduct


kennon · 2024-10-30T19:34:40Z

An example index info for what we're building:

usearch.Index(ScalarKind.BF16 x 768, MetricKind.IP, multi: False, connectivity: 16, expansion: 128 & 64, 6,738,822 vectors in 5 levels, haswell hardware acceleration)

This one took ~10s to load; another one with ~20M vectors took 45 minutes.
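As a sanity check on the file sizes, the raw vector payload can be estimated from the index info above. This is a sketch under stated assumptions: it counts only the vectors (not the graph structure), and it assumes the ~20M-vector index also stores 768-dimensional BF16 vectors, which is not confirmed here. `vector_payload_bytes` is a hypothetical helper, not a usearch function.

```python
def vector_payload_bytes(count: int, dimensions: int, bytes_per_scalar: int) -> int:
    """Raw bytes needed for the vectors alone, excluding the graph structure."""
    return count * dimensions * bytes_per_scalar


BF16_BYTES = 2  # bfloat16 is a 16-bit format

small = vector_payload_bytes(6_738_822, 768, BF16_BYTES)
# Assumption: the larger index also uses 768-dimensional BF16 vectors.
large = vector_payload_bytes(20_000_000, 768, BF16_BYTES)

print(f"small index vectors: {small / 1e9:.2f} GB")
print(f"large index vectors: {large / 1e9:.2f} GB")
```

These estimates come out a little under the 11GB and 32GB file sizes reported above, so the file sizes themselves scale about linearly with vector count even though the load times do not.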

ashvardanian · 2024-10-31T09:56:06Z

@kennon, interesting, looking into it!

kennon · 2024-10-31T11:43:45Z

@ashvardanian awesome, thanks! I don’t want to post a public URL, but if you drop me an email I can send you a link to the index files we are trying to load. Let me know if there is any more information I can provide, thanks!

kennon added the bug label on Oct 30, 2024
