Bulk update extras of Nodes #6587

Open

GeigerJ2 opened this issue Oct 22, 2024 · 1 comment

@GeigerJ2 (Contributor)

As mentioned by @giovannipizzi, it would be nice if bulk-updating Node extras could be achieved using, e.g., a dictionary of the form:

{
    <UUID>: {
        'extra_key1': <VALUE>,
        'extra_key2': <VALUE>,
    },
    <UUID>: ...
}
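
For illustration, a call accepting such a mapping could look as follows; `bulk_update_extras` is a hypothetical helper name for the proposed feature, not an existing API:

# Hypothetical helper, illustrating the proposed interface only.
bulk_update_extras({
    'da44c605-70ee-4a24-9844-045c47fbebd9': {'extra_key1': 1, 'extra_key2': 'foo'},
    '56e619b0-f894-4417-bb85-aa430f4bcacb': {'extra_key1': 2, 'extra_key2': 'bar'},
})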

Currently, the various storage backends already implement bulk_insert and bulk_update methods, which are used when importing an archive. With the existing bulk_update method, the following works to replace the extras:

# `storage_backend` is the loaded profile's storage backend, obtainable
# e.g. via `get_manager().get_profile_storage()` from `aiida.manage`.
from aiida.orm.entities import EntityTypes

storage_backend.bulk_update(
    EntityTypes.NODE,
    [
        {
            # "uuid": "da44c605-70ee-4a24-9844-045c47fbebd9",
            "id": 1,
            "extras": {"a": 1, "b": 2, "c": 3},
        },
        {
            # "uuid": "56e619b0-f894-4417-bb85-aa430f4bcacb",
            "id": 2,
            "extras": {"a": 4, "b": 5, "c": 6},
        },
    ],
)

However, in the current implementation, node selection only works via the id (probably for efficiency reasons), and existing extras, e.g. _aiida_hash, are overwritten, so it does not behave like an extend operation, which is usually what one wants. One could either add a dedicated method for updating the node extras or extend the current one. Care has to be taken to keep it efficient, though, and not iterate over individual nodes in the implementation, which could slow things down.
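As a stop-gap with the current overwrite semantics, one can emulate extending by merging on the client side before calling bulk_update. A minimal sketch, assuming the nodes' current extras have already been fetched (the variable names are illustrative):

# current_extras: pk -> extras currently stored for each node (fetched
# beforehand, e.g. with the QueryBuilder); updates: pk -> extras to add.
current_extras = {1: {'_aiida_hash': 'abc'}, 2: {'_aiida_hash': 'def'}}
updates = {1: {'a': 1}, 2: {'a': 4}}

rows = [
    {'id': pk, 'extras': {**current_extras[pk], **new}}  # shallow merge, new keys win
    for pk, new in updates.items()
]
storage_backend.bulk_update(EntityTypes.NODE, rows)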

As all other Node properties should be immutable once stored, I currently cannot think of bulk modifications other than changing the extras that would be of interest for nodes.

@rabbull (Contributor) commented Dec 5, 2024

As discussed with @GeigerJ2, there are two problems regarding bulk updates, and the following are our plans to address them:

  1. Bulk updates currently only support selecting nodes by their primary key.
    This limitation might not be a significant issue, as users typically already have the ids at hand. Given that a round trip to the database is generally more expensive than looping over an in-memory array to gather ids, the current implementation may suffice. While supporting selection by UUID would be beneficial, it is acceptable to deprioritize or drop it if it proves too time-consuming to implement; a sketch of resolving UUIDs to ids up front is shown after this list. I will explore this further after addressing the second problem.

  2. Extending vs. overwriting JSON fields.
    Users are more likely to expect the existing JSON object to be extended (particularly the extras field, since the other JSON fields are mostly immutable in storage) rather than overwritten entirely. To maintain compatibility with the current behaviour and still allow key deletion in JSON fields, the plan is to introduce a flag in the bulk_update method indicating whether JSON fields should be extended or overwritten.

    • For PostgreSQL, this can be implemented using the || operator or jsonb_concat.
    • For SQLite, json_patch will be used.

    Note, however, that there are slight disparities between json_patch and jsonb_concat. For instance, with json_patch, assigning a null value to a key removes it, whereas jsonb_concat retains the key with a null value. These differences need to be carefully accounted for in the implementation; the second sketch after this list makes them concrete.
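For the first point, a sketch of resolving UUIDs to primary keys with a single QueryBuilder query before calling bulk_update; `extras_by_uuid` is illustrative, and `storage_backend`/`EntityTypes` are as in the snippet above:

from aiida.orm import Node, QueryBuilder

# The user-facing mapping proposed above: uuid -> new extras.
extras_by_uuid = {
    'da44c605-70ee-4a24-9844-045c47fbebd9': {'a': 1},
    '56e619b0-f894-4417-bb85-aa430f4bcacb': {'a': 4},
}

# Resolve all UUIDs to primary keys in one query, then build the rows.
qb = QueryBuilder().append(
    Node, filters={'uuid': {'in': list(extras_by_uuid)}}, project=['uuid', 'id']
)
uuid_to_pk = dict(qb.all())

rows = [{'id': uuid_to_pk[uuid], 'extras': extras} for uuid, extras in extras_by_uuid.items()]
storage_backend.bulk_update(EntityTypes.NODE, rows)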
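For the second point, a pure-Python sketch of the two merge semantics (a reference model of the database operators, not the actual backend implementation) to make the null-handling disparity concrete:

def jsonb_concat(target: dict, patch: dict) -> dict:
    # PostgreSQL `target || patch`: shallow, top-level merge; keys from
    # patch win, and a null (None) value is kept as-is.
    return {**target, **patch}

def json_patch(target: dict, patch: dict) -> dict:
    # SQLite json_patch (RFC 7396 merge patch): recursive merge into
    # nested objects; a null (None) value deletes the key instead.
    result = dict(target)
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)
        elif isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = json_patch(result[key], value)
        else:
            result[key] = value
    return result

extras = {'a': 1, '_aiida_hash': 'abc'}
print(jsonb_concat(extras, {'a': None}))  # {'a': None, '_aiida_hash': 'abc'}
print(json_patch(extras, {'a': None}))    # {'_aiida_hash': 'abc'}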
