feat: tpu_queued_resources_startup_script/create_network/time_bound #3907

Open (7 commits, base: main)

gryczj
Contributor

@gryczj gryczj commented Oct 23, 2024

Description

Fixes #

Note: Before submitting a pull request, please open an issue for discussion if you are not associated with Google.

Checklist

  • I have followed guidelines from CONTRIBUTING.MD and Samples Style Guide
  • Tests pass: npm test (see Testing)
  • Lint pass: npm run lint (see Style)
  • These samples need a new API enabled in testing projects to pass (let us know which ones)
  • These samples need a new/updated env vars in testing projects set to pass (let us know which ones)
  • This pull request is from a branch created directly off of GoogleCloudPlatform/nodejs-docs-samples. Not a fork.
  • This sample adds a new sample directory, and I updated the CODEOWNERS file with the codeowners for this sample
  • This sample adds a new sample directory, and I created GitHub Actions workflow for this sample
  • This sample adds a new Product API, and I updated the Blunderbuss issue/PR auto-assigner with the codeowners for this sample
  • Please merge this PR for me once it is approved

@gryczj added the kokoro:force-run and kokoro:run labels Oct 23, 2024
@gryczj requested review from a team as code owners October 23, 2024 11:57

snippet-bot bot commented Oct 23, 2024

Here is the summary of changes.

You are about to add 3 region tags.

This comment is generated by snippet-bot.
If you find problems with this result, please file an issue at:
https://github.com/googleapis/repo-automation-bots/issues.
To update this comment, add snippet-bot:force-run label or use the checkbox below:

  • Refresh this comment

@product-auto-label bot added the samples and api: workflows labels Oct 23, 2024
@gryczj added the api: tpu label Oct 23, 2024
@gryczj force-pushed the tpu_queued_resources_startup_script branch 2 times, most recently from 3276b2a to 3ef4622 October 23, 2024 18:12
@gryczj changed the title from "feat: tpu_queued_resources_startup_script" to "feat: tpu_queued_resources_startup_script/create_network" Oct 24, 2024
@gryczj force-pushed the tpu_queued_resources_startup_script branch from 83378d8 to cb97989 October 25, 2024 09:57
@gryczj changed the title from "feat: tpu_queued_resources_startup_script/create_network" to "feat: tpu_queued_resources_startup_script/create_network/time_bound" Oct 25, 2024
@gryczj force-pushed the tpu_queued_resources_startup_script branch 2 times, most recently from e6c1d2c to 964cafb October 30, 2024 11:48
@gryczj force-pushed the tpu_queued_resources_startup_script branch from 964cafb to a66c4eb November 12, 2024 17:05
@gryczj
Contributor Author

gryczj commented Nov 12, 2024

Due to the high cost of TPU nodes, it was decided to use mocks in the tests.
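
To illustrate the mocking approach described above, here is a minimal sketch in which the sample receives its client as a parameter and a test passes a hand-rolled mock. The function names `getQueuedResourceName`, `createMockTpuClient`, and `runMockedCheck`, as well as the resource name, are hypothetical and not taken from this PR; the `[response]` return shape is an assumption about how the real client responds.

```javascript
// Hypothetical sample function: the client is a parameter, so a test can
// pass a mock instead of a real (and costly) TPU client.
async function getQueuedResourceName(tpuClient, name) {
  // Assumption: the client method resolves to an array whose first
  // element is the response object.
  const [response] = await tpuClient.getQueuedResource({name});
  return response.name;
}

// Hand-rolled mock that exposes only the method the sample calls and
// records every request it receives.
function createMockTpuClient() {
  const calls = [];
  return {
    calls,
    async getQueuedResource(request) {
      calls.push(request);
      return [{name: request.name}];
    },
  };
}

// Runs the sample against the mock and returns what a test would assert on.
async function runMockedCheck() {
  const mock = createMockTpuClient();
  // Placeholder resource name, for illustration only.
  const name =
    'projects/my-project/locations/europe-west4-a/queuedResources/my-qr';
  const result = await getQueuedResourceName(mock, name);
  return {result, calls: mock.calls};
}
```

A test can then check both the returned value and the recorded requests, without creating any TPU resources.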

@gryczj force-pushed the tpu_queued_resources_startup_script branch from a66c4eb to 9a724f9 November 20, 2024 12:30
@iennae
Contributor

iennae commented Nov 21, 2024

@gemini review

Contributor

@code-review-assist-experimental bot left a comment

This pull request adds support for TPU queued resources and includes creating a network, startup script, and time-bound resource. The implementation looks good, and the tests are comprehensive. Here are a few suggestions:

  • Consider adding more details to the PR description about the specific functionalities added and how they address the linked issue. This will help reviewers understand the context better.
  • Ensure all TODO comments are addressed before merging.
  • Update the README files to reflect the new functionalities added by this PR.

id-token: 'write'
if: github.event_name == 'schedule' && always() # always() submits logs even if tests fail
uses: ./.github/workflows/flakybot.yaml
needs: [test]
Contributor

unknown

Consider adding a newline at the end of the file for consistency.

Suggested change
- needs: [test]
+ needs: [test]

Comment on lines +28 to +29
// TODO(developer): Uncomment below line before running the sample.
// tpuClient = new TpuClient();
Contributor

medium

Consider instantiating the tpuClient here since you're suggesting to import the class above. This makes the example runnable.

Suggested change
- // TODO(developer): Uncomment below line before running the sample.
- // tpuClient = new TpuClient();
+ // Instantiate a tpuClient
+ const tpuClient = new TpuClient();

Contributor Author

It is commented out because a mocked tpuClient is used for testing, and I pass the mock to this function. If users want to use the real TpuClient, they will need to uncomment this line.

Contributor

Understood. For clarity, perhaps a comment indicating that the tpuClient instantiation is for non-testing scenarios would be beneficial. This would help users understand the purpose of the commented-out code more easily.

Suggested change
- // TODO(developer): Uncomment below line before running the sample.
- // tpuClient = new TpuClient();
+ // Instantiate a tpuClient (uncomment for non-testing scenarios)
+ // const tpuClient = new TpuClient();
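
One possible way to reconcile the two positions in this thread, sketched here as an assumption rather than as what the PR does, is a default parameter: the real client is created only when the caller does not inject one, so tests can still pass a mock and no line needs to be uncommented. The names `createDefaultClient` and `listQueuedResourceNames` are hypothetical, and the `[resources]` return shape is an assumption about the client.

```javascript
// Hypothetical factory: the real library is loaded only if no client is
// injected, so tests that pass a mock never touch the real TpuClient.
function createDefaultClient() {
  const {TpuClient} = require('@google-cloud/tpu').v2alpha1;
  return new TpuClient();
}

// Hypothetical sample: the default parameter expression is evaluated only
// when `tpuClient` is undefined, i.e. in non-testing scenarios.
async function listQueuedResourceNames(
  parent,
  tpuClient = createDefaultClient()
) {
  // Assumption: the client method resolves to an array whose first
  // element is the list of resources.
  const [resources] = await tpuClient.listQueuedResources({parent});
  return resources.map(resource => resource.name);
}
```

In a test, the second argument would be the mock; a user running the sample would call the function with only `parent`.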

Comment on lines +22 to +23
// TODO(developer): Uncomment below line before running the sample.
// const {TpuClient} = require('@google-cloud/tpu').v2alpha1;
Contributor

medium

Since TpuClient is already imported in the tests, consider importing it here as well instead of relying on the parameter. This makes the code more self-contained and easier to understand.

Suggested change
- // TODO(developer): Uncomment below line before running the sample.
- // const {TpuClient} = require('@google-cloud/tpu').v2alpha1;
+ const {TpuClient} = require('@google-cloud/tpu').v2alpha1;
+ const {Node, NetworkConfig, QueuedResource} = require('@google-cloud/tpu').protos.google.cloud.tpu.v2alpha1;

Comment on lines +31 to +33
/**
* TODO(developer): Update/uncomment these variables before running the sample.
*/
Contributor

medium

Update the TODO with instructions on how to set up the required environment variables or configuration.

Suggested change
- /**
- * TODO(developer): Update/uncomment these variables before running the sample.
- */
+ /**
+ * TODO(developer): Before running the sample,
+ * 1. Set the `GOOGLE_APPLICATION_CREDENTIALS` environment variable to the path of your service account key file.
+ * 2. Create a network named 'compute-tpu-network' in the 'europe-west4' region.
+ * 3. Create a subnetwork with the same name as the network in the 'europe-west4' region.
+ */
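
As a rough sketch of how the network and subnetwork named in the suggestion above might be referenced, the helper below builds full resource paths in the usual Compute Engine format. The helper name `buildNetworkConfig` and the project ID are hypothetical, and whether the sample in this PR passes full paths in this form is an assumption.

```javascript
// Hypothetical helper: builds resource paths for a network and a
// subnetwork that shares the network's name.
function buildNetworkConfig(projectId, region, networkName) {
  return {
    // Networks are global resources.
    network: `projects/${projectId}/global/networks/${networkName}`,
    // Subnetworks are regional resources.
    subnetwork: `projects/${projectId}/regions/${region}/subnetworks/${networkName}`,
  };
}

// Example with the values from the suggestion and a placeholder project ID.
const exampleNetworkConfig = buildNetworkConfig(
  'my-project',
  'europe-west4',
  'compute-tpu-network'
);
```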

@@ -0,0 +1,3 @@
---
env:
  mocha: true
Contributor

unknown

Consider adding a newline at the end of the file for consistency.

Suggested change
- mocha: true
+ mocha: true

@gryczj force-pushed the tpu_queued_resources_startup_script branch from d06385d to 23feb0f November 26, 2024 09:08
@gryczj assigned iennae and unassigned iennae Nov 28, 2024
@gryczj requested a review from iennae November 28, 2024 09:08
@gryczj force-pushed the tpu_queued_resources_startup_script branch 3 times, most recently from 1a62542 to 7362b9d December 3, 2024 08:27
@gryczj requested a review from a team as a code owner December 3, 2024 08:27
const networkName = 'compute-tpu-network';

// The region of the network, that you want the node to connect to.
const region = 'europe-west4';
Contributor

As in the other PR, this value can be derived from the zone value. There is no need to define it explicitly.
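
A minimal sketch of what deriving the region from the zone could look like, assuming zone names follow the usual `<region>-<zone letter>` pattern. The helper name `regionFromZone` is hypothetical and not part of this PR.

```javascript
// Hypothetical helper: strips the trailing zone suffix, e.g. the "-a" in
// "europe-west4-a", to obtain the region name.
function regionFromZone(zone) {
  const lastDash = zone.lastIndexOf('-');
  if (lastDash <= 0) {
    throw new Error(`Unexpected zone format: ${zone}`);
  }
  return zone.slice(0, lastDash);
}

const zone = 'europe-west4-a';
const region = regionFromZone(zone); // 'europe-west4'
```

With this, only `zone` needs to be defined in the sample, and `region` cannot drift out of sync with it.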

Contributor Author

@gryczj force-pushed the tpu_queued_resources_startup_script branch from 7362b9d to 9f48401 December 3, 2024 16:53
@gryczj requested a review from m-strzelczyk December 3, 2024 16:53
@gryczj force-pushed the tpu_queued_resources_startup_script branch from 9f48401 to a9ee1f7 December 4, 2024 13:57
@BigBlackWolf self-requested a review December 5, 2024 11:54
@BigBlackWolf

Hi @iennae, could you please take another look at this PR?

cc: @rsamborski

Labels: api: tpu, api: workflows, kokoro:force-run, kokoro:run, samples
4 participants