[Feature] Zero-downtime full re-index of Elasticsearch #161

Open
Nek- opened this issue May 3, 2023 · 1 comment
Labels
help wanted Extra attention is needed

Comments

Nek- (Contributor) commented May 3, 2023

A common need while working with ES is to rebuild the index from scratch.

Let's take an example to highlight the issue. I have an index of Products. My products database changes a lot: stock updates, new products being created, others being disabled. It holds 1 000 000 products.
I want to rebuild my products index: this process will take a while, and in the meantime I need updates to be applied consistently to both the new and the old index.

To stay consistent, it would be convenient if the IndexationRequest could update several indexes at once. An extra argument would make this possible. Something like this:

$request = new IndexationRequest($document, ['indexes' => [
    $liveIndex->getName(),
    $newIndex->getName(),
]]);

damienalexandre (Member) commented

Hello 👋

As far as I know, there is no bulletproof method at the moment to rebuild an entire index while keeping updates running in the background.

I have two great sources of knowledge when it comes to tackling reindexing without downtime:

As you can see, there are pros and cons to each method, and every application is different.

What I did one time was to tweak the Indexer to ALWAYS push to a Schrödinger alias next_index and ignore the 404; that way all my updates were always sent to the new index I was building. But that worked only because I didn't do any DELETE, and didn't care about performance.
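
A minimal sketch of that idea, not Elastically's actual API: every write goes to the live index and is also attempted against the `next_index` alias, and a "not found" failure on the alias is swallowed because it only means no rebuild is running. `IndexNotFound`, `writeToBoth` and the `$send` callable are hypothetical names used for illustration; in a real application `$send` would wrap the Elasticsearch client call.

```php
<?php
declare(strict_types=1);

/** Hypothetical exception standing in for a 404 "index not found" response. */
final class IndexNotFound extends RuntimeException
{
}

/**
 * Send one document to the live index and, best effort, to the alias
 * of the index being rebuilt.
 *
 * @param callable(string, array): void $send hypothetical transport: (indexName, document)
 *
 * @return bool true when the rebuild alias existed and received the write
 */
function writeToBoth(callable $send, array $document, string $live, string $next = 'next_index'): bool
{
    // Failures on the live index are not caught: they must still surface.
    $send($live, $document);

    try {
        $send($next, $document);

        return true;
    } catch (IndexNotFound $e) {
        // No rebuild in progress: the alias points nowhere, which is fine.
        return false;
    }
}
```

As in the original setup, this doubles the number of write calls and says nothing about deletes.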

I would love for Elastically to provide an actual implementation of a real zero downtime reindex!

In the meantime, about your suggestion - how does your application know about the $newIndex? For example if I update a Product from the Admin? 🤔 The new index would have to be stored somewhere?

One alternative proposal could be to tell the Indexer "there is a new Index in progress", and tweak

public function flush(): ?Bulk\ResponseSet
to add the new Index in all the Bulk operations 😋
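
To make the proposal concrete, here is a rough sketch of the fan-out, written against a hypothetical `FanOutBuffer` class rather than Elastically's real `Indexer` (whose `flush()` returns a `?Bulk\ResponseSet`, not an array). It assumes the name of the index being built is handed to the buffer from some shared place (cache, database, config), which is exactly the open question above.

```php
<?php
declare(strict_types=1);

/** Hypothetical stand-in for an indexer buffer; not Elastically's Indexer. */
final class FanOutBuffer
{
    /** @var list<array{op: string, index: string, id: string, body: ?array}> */
    private array $operations = [];

    private ?string $indexInProgress = null;

    /** Tell the buffer that a rebuild is running (null once it is done). */
    public function setIndexInProgress(?string $name): void
    {
        $this->indexInProgress = $name;
    }

    public function scheduleIndex(string $index, string $id, array $body): void
    {
        $this->operations[] = ['op' => 'index', 'index' => $index, 'id' => $id, 'body' => $body];
    }

    public function scheduleDelete(string $index, string $id): void
    {
        $this->operations[] = ['op' => 'delete', 'index' => $index, 'id' => $id, 'body' => null];
    }

    /**
     * Return the operations to send in one bulk request and empty the buffer.
     * While a rebuild is running, each operation is mirrored to the new index.
     *
     * @return list<array{op: string, index: string, id: string, body: ?array}>
     */
    public function flush(): array
    {
        $out = [];
        foreach ($this->operations as $operation) {
            $out[] = $operation;
            if (null !== $this->indexInProgress && $operation['index'] !== $this->indexInProgress) {
                $out[] = array_merge($operation, ['index' => $this->indexInProgress]);
            }
        }
        $this->operations = [];

        return $out;
    }
}
```

Unlike the alias trick, deletes are mirrored too, and both copies travel in the same bulk request. The sketch does not address ordering against the rebuild's own writes: a mirrored delete that arrives before the rebuild has copied that document would be a no-op on the new index.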

damienalexandre changed the title from "RFC: specify array of index names in IndexationRequest message" to "[Feature] Zero-downtime full re-index of Elasticsearch" on Nov 13, 2024
damienalexandre added the "help wanted" label on Nov 13, 2024