Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added support for incremental download of translog files #16204

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

rawahars
Copy link

@rawahars rawahars commented Oct 7, 2024

Description

Presently, the download workflow for remote backed storage works in a manner that causes the download of the same translog files multiple times, each time deleting all the older files before downloading them again. This causes significant wasted network bandwidth, along with the time taken for the shard to become active.

This change adds support for downloading the translog files incrementally and omitting the same if they are present locally.

Implementation

The key changes include-

Incremental Download Logic in RemoteFsTranslog:

  • The download logic now checks if the local translog files are present and have the same checksum as the remote files.
    • If the local and remote checksums match, the download is skipped for that generation.
    • If the checksums do not match, the download is performed for that generation.

Translog Footer Handling:

  • The TranslogFooter class has been added to handle the writing and reading of the translog footer, which contains the checksum of the translog data.
  • The footer is written when the TranslogWriter is closed, and the checksum is stored in the TranslogReader.
  • The TranslogReader is updated to handle reading the footer and ensuring that read requests do not overlap with the footer.

Generation-to-Checksum Mapping in TranslogTransferMetadata:

  • The TranslogTransferMetadata now includes a mapping between the translog generation and the corresponding checksum.
  • This mapping is used to compare the local and remote checksums during the incremental download process.
  • The TranslogTransferMetadataHandler is updated to read and write this generation-to-checksum mapping.

Testing

Manual Testing

Manually tested failover and restore workflow on a 3-node EC2 cluster.
Also, tested backward compatibility wherein initial cluster was created and indexed with older OpenSearch process. It was then switched to incremental download.

Newly added tests

  • The RemoteFsTranslogTests class has been updated to include new test cases for the incremental download logic:
    • testIncrementalDownloadWithMatchingChecksum tests the scenario where the local and remote translog files have the same checksum, and the download is skipped.
    • testIncrementalDownloadWithDifferentChecksum tests the scenario where the local and remote translog files have different checksums, and only the missing generation is downloaded.
  • The TranslogFooterTests class has been added to verify the functionality of the TranslogFooter class, including writing and reading the footer.
  • The TranslogTransferMetadataHandlerTests class has been updated to include test cases for handling the generation-to-checksum map in the TranslogTransferMetadata.

Related Issues

One of the optimisations for #15277

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Copy link
Contributor

github-actions bot commented Oct 7, 2024

❌ Gradle check result for cd08d89: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

github-actions bot commented Oct 7, 2024

✅ Gradle check result for f17a378: SUCCESS

Copy link

codecov bot commented Oct 7, 2024

Codecov Report

Attention: Patch coverage is 81.95489% with 24 lines in your changes missing coverage. Please review.

Project coverage is 72.08%. Comparing base (35c366d) to head (1211703).
Report is 142 commits behind head on main.

Files with missing lines Patch % Lines
.../org/opensearch/index/translog/TranslogFooter.java 70.96% 6 Missing and 3 partials ⚠️
.../org/opensearch/index/translog/TranslogWriter.java 73.68% 2 Missing and 3 partials ⚠️
...rg/opensearch/index/translog/RemoteFsTranslog.java 84.00% 3 Missing and 1 partial ⚠️
.../org/opensearch/index/translog/TranslogReader.java 76.47% 1 Missing and 3 partials ⚠️
...n/java/org/opensearch/index/translog/Translog.java 75.00% 1 Missing ⚠️
...dex/translog/transfer/TranslogTransferManager.java 90.90% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #16204      +/-   ##
============================================
- Coverage     72.10%   72.08%   -0.03%     
+ Complexity    64862    64858       -4     
============================================
  Files          5307     5308       +1     
  Lines        302606   302720     +114     
  Branches      43717    43734      +17     
============================================
+ Hits         218208   218228      +20     
- Misses        66541    66542       +1     
- Partials      17857    17950      +93     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@rawahars rawahars force-pushed the incremental_download_with_footer branch from f17a378 to b8f56ea Compare October 8, 2024 09:15
Copy link
Contributor

github-actions bot commented Oct 8, 2024

❌ Gradle check result for b8f56ea: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

github-actions bot commented Oct 9, 2024

❌ Gradle check result for 535bc28: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

github-actions bot commented Oct 9, 2024

❌ Gradle check result for ce18c9b: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Comment on lines 57 to 62
*
* - Magic number (Uint32): A constant value that identifies the start of the footer.
* - Algorithm ID (Uint32): The identifier of the checksum algorithm used. Currently, this is always 0.
* - Checksum (Uint64): The checksum of the entire translog data, calculated using the specified algorithm.
*/
public class TranslogFooter {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious will Translog Footer only apply to remote translog?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the Translog footer will be applied to both Remote and Local translog. The footer will be applied at the time of generation rollover, when the translog file becomes immutable. By instantiating TranslogCheckedContainer for both Local and Remote translog, we will now have the checksum pre-calculated for both, which will be written into footer.

For remote translog, upon each sync, the files are transferred to remote and the generation is rolled over.
For local translog, the sync can happen independently of generation rollover. So, we will keep writing to file until the rollover happens.

@rawahars rawahars force-pushed the incremental_download_with_footer branch from ce18c9b to 6ea2adb Compare October 10, 2024 05:39
Copy link
Contributor

❌ Gradle check result for 6ea2adb: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

translogCheckedContainer = new TranslogCheckedContainer(byteArrayOutputStream.toByteArray());
}

// Enable translogCheckedContainer for remote as well as local translog.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we enabling translogCheckedContainer for local translogs?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per discussion with @Bukhtawar and @sachinpkale, for ensuring consistent view of the world, we are enabling footer for both local and remote translogs.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am assuming there would be low/minimal overhead of checksum for the local translog? Can we run a perf benchmark?

Copy link
Author

@rawahars rawahars Oct 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I am running the OSB (opensearch benchmarks) for the same to understand the impact. But from code workflow POV, it seems to be a low overhead operation.

@@ -438,8 +438,43 @@ public TranslogReader closeIntoReader() throws IOException {
synchronized (syncLock) {
try (ReleasableLock toClose = writeLock.acquire()) {
synchronized (this) {
Long translogWithoutFooterChecksum = null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we don't need to explicitly initialise Long with null value

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am more comfortable with overt initialisation of the variable prior and post the workflow. It find that it helps ensure that code is cleaner.

If you feel strongly about it or implicit initialisation leads to some performance benefits, we can change the same.

(translogCheckedContainer != null) ? translogCheckedContainer.getChecksum() : null
translogWithoutFooterChecksum,
translogWithFooterChecksum,
true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it, that we are always passing hasFooter to true here? There can be cases when using older header and therefore, no footer right?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This particular instance of TranslogReader creation is from TranslogWriter.closeIntoReader. This method is called only by current writers when they are rolling generation and creating readers out of them. Now, once this PR is merged, all the writers will have headers and therefore, whenever they try to close into reader, they will indeed have header.

Note that older header will never go through this workflow.

Copy link
Contributor

@skumawat2025 skumawat2025 Oct 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. From what i understand by your comment. This below particular else code block looks redundant?

else {
                            // If we reach here then it means we are using older header and therefore, no footer.
                            // So, both translogWithoutFooterChecksum and translogWithFooterChecksum are same.
                            translogWithoutFooterChecksum = (translogCheckedContainer != null)
                                ? translogCheckedContainer.getChecksum()
                                : null;
                            translogWithFooterChecksum = translogWithoutFooterChecksum;
                        }

In that case can we think of removing it?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is there for safety purpose. Imagine a scenario wherein we have an open translog partially written when the OpenSearch process is upgraded, especially since it is under customer control. Now, when we will finally close this writer, then we will omit the footer in this case.

Now, for remote translog, we do not have a problem as the expectation is that any data which was in memory and not synced with remote will be lost. So the new footer will start anew.
But for local translog, the older partial translog will be present on disk which is of older version. So we still need to close it but not write the footer.

// When we open a reader to Translog from a path, we want to fetch the checksum
// as that would be needed later on while creating the metadata map for
// generation to checksum.
Long translogChecksum = null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Long translogChecksum = null;
Long translogChecksum;

Copy link
Author

@rawahars rawahars Oct 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer explicit initialisation of the variable especially since we are initialising hasFooter with false. It ensures that code is consistent and overt. If you feel strongly about it, we can make the change.

Long translogChecksum = null;
boolean hasFooter = false;
try {
translogChecksum = TranslogFooter.readChecksum(path);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit confusing. As readChecksum returns null value of checksum for an older version file that doesn't have footer. Here we are storing this translogChecksum as null and setting hasFooter to true even though the file doesn't have footer.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion. I have refactored this block to make it better streamlined. Now we will directly check-

        boolean hasFooter = translogChecksum != null;

Comment on lines 111 to 120
public Long getTranslogWithFooterChecksum() {
return translogWithFooterChecksum;
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we rename it better, like fullChecksum and the other contentChecksum

* @return the checksum value from the translog footer, or `null` if the footer is not present
* @throws IOException if an I/O error occurs while reading the footer
*/
static Long readChecksum(Path path) throws IOException {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would prefer non-static functions? Are they getting invoked from a static context?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this method is invoked from static contexts in all instances.
downloadOnce in RemoteFSTranslog is one instance and open in TranslogReader is another one.

byte[] footer = TranslogFooter.write(
channel,
translogCheckedContainer.getChecksum(),
!Boolean.TRUE.equals(remoteTranslogEnabled)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doubt: as we have already written the translog to disk, after that we are writing footer to file. Why is it that we are forcing a sync again only in case of when remoteTranslogEnabled is false.?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So presently Remote translog does not flush the changes to disk and only keeps it in memory. During sync workflow, it would write the data to file channel in memory and then during rollover, we will upload it directly to remote store.

On the other hand, for local translog, we are flushing the changes to disk on each sync.

Now, in TranslogFooter.write method, we are doing the same operation- For local translog, we flush to disk and for remote translog, we write the data to file channel without flushing to disk.

Harsh Rawat added 2 commits October 15, 2024 13:54
Presently, the download workflow for remote backed storage works in a manner which causes the download for same translog files multiple times, each time deleting all the older files before downloading them again. This causes significant wasted network bandwidth, along with the time taken for the shard to become active.

This change adds support for downloading the translog files incrementally and omitting the same if they are present locally.

Signed-off-by: Harsh Rawat <[email protected]>
This commit adds the unit tests applicable for the changes made to support incremental download of translog files. Primarily, the unit tests cover the changes in-
- TranslogFooter to write and read the footer of Translog
- RemoteFSTranslog to skip the download of locally present translog
- TranslogWriter to create reader with checksum upon close
- TranslogReader closeIntoReader functionality

Signed-off-by: Harsh Rawat <[email protected]>
@rawahars rawahars force-pushed the incremental_download_with_footer branch from 6ea2adb to 1211703 Compare October 15, 2024 08:24
@rawahars
Copy link
Author

Rebased over mainline and addressed the comments which needed any refactoring.

Copy link
Contributor

❕ Gradle check result for 1211703: UNSTABLE

  • TEST FAILURES:
      1 org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

@opensearch-trigger-bot
Copy link
Contributor

This PR is stalled because it has been open for 30 days with no activity.

@opensearch-trigger-bot opensearch-trigger-bot bot added stalled Issues that have stalled and removed stalled Issues that have stalled labels Nov 14, 2024
@opensearch-trigger-bot
Copy link
Contributor

This PR is stalled because it has been open for 30 days with no activity.

@opensearch-trigger-bot opensearch-trigger-bot bot added stalled Issues that have stalled and removed stalled Issues that have stalled labels Dec 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants