Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

literal storage: 4x performance improvement using a hashmap #5131

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

ronanj
Copy link

@ronanj ronanj commented Mar 20, 2024

This PR is used to improve the literal storage performance:

Currently, literals are stored using a linked list: The drawback of the linked list is that when the number of literal entries increases, the time it takes to add a new entry increases proportionally to the number of entries: The reason is that before adding new entries, the storage engine first needs to iterate over all existing entries to find out if the "new" entry is a duplicate of an existing one or a new entity.

When running on a 200 Mhz CPU with a heap stored in PSRAM, if 1000 entries are already in the literal storage linked list, then it takes almost 1 millisecond to add a new entry (eg 1 millisecond to iterate over all existing 1000 entries). This makes loading snapshots very slow, especially when many background modules have already been loaded (where each background module adds its literals). Empirically, we measured that it takes 200 milliseconds for jerry_exec_snapshot to load and execute a snapshot containing about 200 literals when 1000 literals are already in the list before loading (measured on 200Mhz CPU with heap in PSRAM).

The change introduces an additional hashmap to speed up the look-up of existing literals: This way, it is not necessary to iterate through the entire literal linked list (in O(n)) - and instead, an O(1) lookup operation can be used. Empirically, we have noticed that the application that took 200 milliseconds to load now takes 50 ms (measured from jerry_exec_snapshot), bringing a 4 times performance improvement.

This feature can be enabled using the flag JERRY_LIT_HASHMAP. The test configuration has been updated to include this feature flag. Unit and functional tests have successfully passed. Test has also been done on the Open Harmony Jerry script version.

JerryScript-DCO-1.0-Signed-off-by: Ronan Jezequel [email protected]

@ronanj ronanj changed the title literal storage: 4x performance improvement using an hashmap literal storage: 4x performance improvement using a hashmap Mar 20, 2024
JerryScript-DCO-1.0-Signed-off-by: Ronan Jezequel [email protected]
@ronanj ronanj marked this pull request as ready for review March 20, 2024 08:07
Copy link
Member

@zherczeg zherczeg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A 4x times speedup is pretty nice! Nice job!
The patch looks good. I am not sure I understand the "unlicense" license of the original work.

hashmap_uint32_t initial_capacity; /**< initial hashmap capacity */
} hashmap_create_options_t;

/// @brief Create a hashmap.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this type of comment is used in jerryscript


/*
The lit-hashmap library is based on the public domain software
available from https://github.com/sheredom/hashmap.h
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The license of https://github.com/sheredom/hashmap.h, the "Unlincense" is problematic: https://groups.google.com/g/unlicense/c/G6pzAFtgrx0/m/9TCxE-je2qMJ

Unlicense will not be reviewed by the OSI because it is a "crayon" licence (i.e. drafted by non legal professionals). Such licences have been problematic in the past.

Although Unlicense will not be reviewed, some (supposed) flaws have been
highlighted.
https://web.archive.org/web/20130421094711/http://projects.opensource.org/pipermail/license-review/2012-January/000052.html

In general the Apache or MIT licenses are recommended instead of various "public domain" licenses such as this Unlicense or CC0.

@ronanj
Copy link
Author

ronanj commented Mar 22, 2024

ok, let me rewrite the hashmap library, and at the same time use the standard comment syntax from jerryscript.

JerryScript-DCO-1.0-Signed-off-by: Ronan Jezequel [email protected]
@ronanj
Copy link
Author

ronanj commented Apr 29, 2024

New commit added, with hashmap completely written from scratch - and also with better performance.

@ronanj ronanj requested review from zherczeg and matetokodi April 29, 2024 09:11
@zherczeg
Copy link
Member

Looks like a nice patch! We check it further soon

@zherczeg
Copy link
Member

zherczeg commented May 3, 2024

It looks like tests/jerry/arithmetics.js fails. Do you know why?

Copy link
Member

@zherczeg zherczeg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Please squash the commits and add DCO to the commit message

@matetokodi
Copy link
Contributor

matetokodi commented Jun 3, 2024

@ronanj I apologize for the delay, the CI issues for OSX have been resolved, please squash and rebase your changes, and add DCO to the commit message.

Copy link
Contributor

@matetokodi matetokodi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I mentioned in my previous comment, the CI issues for OSX have been resolved. Please squash and rebase your changes, and add DCO to the commit message.

@LaszloLango
Copy link
Contributor

Close and reopen to re-trigger the CI

@LaszloLango LaszloLango reopened this Nov 13, 2024
@LaszloLango
Copy link
Contributor

@ronanj could you please rebase the PR? It will make the CI green (probably).

@lygstate
Copy link
Contributor

I have some idea about this, currently, the ecma_string_t is super complicated and have many issue about that, I'd like to take the idea Atom from quickjs to implement string and string in jerryscript, and Atom is Hash-Table based, so the performance issue addressed by this MR will also be resolved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants