Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Deployment Pre-Release APIs #479

Merged
merged 6 commits into from
Nov 19, 2024
Merged

Conversation

ShahabT
Copy link
Contributor

@ShahabT ShahabT commented Nov 17, 2024

What changed?
Added the following APIs:

  • DescribeDeployment
  • ListDeployments
  • GetCurrentDeployment
  • SetCurrentDeployment (low-level placeholder for Rollout API)
  • GetDeploymentReachability

Why?
Needed for Worker Versioning V3.

Breaking changes

Server PR

@ShahabT ShahabT requested review from a team as code owners November 17, 2024 21:22
Copy link
Member

@cretz cretz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly just small things, nothing major.

temporal/api/deployment/v1/message.proto Outdated Show resolved Hide resolved
temporal/api/deployment/v1/message.proto Outdated Show resolved Hide resolved
temporal/api/history/v1/message.proto Outdated Show resolved Hide resolved
temporal/api/workflowservice/v1/request_response.proto Outdated Show resolved Hide resolved
temporal/api/deployment/v1/message.proto Outdated Show resolved Hide resolved
Comment on lines 42 to 43
// The deployment does not have a special status.
DEPLOYMENT_STATUS_NO_STATUS = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this? is UNSPECIFIED not enough? Clarify in the comments...


// Used as part of Deployment write APIs to update metadata attached to a deployment.
message UpdateDeploymentMetadata {
map<string, bytes> upsert_entries = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need values to be binary or just string. It could be more user friendly if they were all strings.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ended up making them Payload similar to the memo in workflows and schedules.

temporal/api/deployment/v1/message.proto Show resolved Hide resolved
temporal/api/enums/v1/deployment.proto Outdated Show resolved Hide resolved
temporal/api/workflow/v1/message.proto Outdated Show resolved Hide resolved
deployment.v1.Deployment deployment = 1;
enums.v1.DeploymentStatus status = 2;
google.protobuf.Timestamp status_change_time = 3;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why redefine DeploymentInfo without metadata and task_queue_infos here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so the list is smaller? is the idea that if the caller wants more detailed info they would subsequently call DescribeDeployment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also we don't want to make the list API expensive. For example, TQ info will not be immediately accessible from Visibility.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you call it something different to avoid confusion, like DeploymentSummary or DeploymentBriefs or something like that...

Copy link

@drewhoskins-temporal drewhoskins-temporal Nov 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also seems odd to me. Why do users care if this API is expensive?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@drewhoskins-temporal are you suggesting the list should return complete info of all deployments?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can change the name to DeploymentListEntry or DeploymentSummary. Same pattern is used for ListSchedules where we return a abbreviated set of info in the list response:

// ScheduleListInfo is an abbreviated set of values from Schedule and ScheduleInfo

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed it to DeploymentListInfo similar to schedules.

string name = 1;
temporal.api.enums.v1.TaskQueueType type = 2;
// When server saw the first poller for this task queue in this deployment.
google.protobuf.Timestamp first_poller_time = 3;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add last_poller_time if technically possible. Can be updated lazily.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 That's possible and I agree we should have it. Will create a task for it (can't do right away because of more technical work).

string name = 1;
temporal.api.enums.v1.TaskQueueType type = 2;
// When server saw the first poller for this task queue in this deployment.
google.protobuf.Timestamp first_poller_time = 3;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it belongs here. As this time is per build id. Or this message should be renamed to the DeploymentTaskQueueInfo

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The message is already nested in DeploymentInfo so I thought it's less verbose like this. But please lmk if you think it should still be renamed to DeploymentTaskQueueInfo.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Message names are top level. So someone looking at the name will not see where it is used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mfateev This message name is nested, not top level. Are suggesting we should not have nested message definitions and only top level?

// The deployment is the ramping deployment for the deployment series.
DEPLOYMENT_STATUS_RAMPING = 2;
// The deployment is the current deployment for the deployment series.
DEPLOYMENT_STATUS_CURRENT = 3;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we consider a future when we are able to use a different current deployment for different workflow types or task queues? To me it looks like we need a separate Table/Message/API to get all Start/ContinueAsNew records that point to deployments.

Copy link

@drewhoskins-temporal drewhoskins-temporal Nov 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've thought about some of these cases:

  • for task queues, at least on our current roadmap, we'll insist that users create a different deployment series if they want to rollout task queues separately. (Note that we will support the ability to move a task queue from one deployment series to another) We don't see this as being a big burden for such an advanced use case.
  • for workflow types, we've decided to lean on the UpdateWorkflowOptions for these power-user scenarios for now. (or they can create a different deployment series) But you could imagine adding more fine-grained control to the rollout API. (i.e. instead of a single % ramp, we could provide various ramps for different types. Shahab mentioned there could be multi-cursor problems to think about with such, though). In that world, we could keep the high-level status of DEPLOYMENT_STATUS_RAMPING and create a more fine-grained API for power users. So, it seems premature to think too deeply now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding to @drewhoskins-temporal comments, basically we envision eventually adding task queue and workflow type granularity to Rollout so user can ramp them one-by-one. However, setting different current deployment for different workflow types or task queues is not currently on the road map and users should create new deployment series for such cases.

// `behavior_override` is PINNED. Takes precedence over `deployment`.
temporal.api.common.v1.WorkerDeployment deployment_override = 4;
temporal.api.deployment.v1.Deployment deployment = 2;
// Manual override for execution's versioning behavior. Takes precedence over `behavior`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like "Manual" as it can be done by a script through an API. May be call it something like "WorkflowSpecificOverride" or something similar?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WorkflowExecutionOverride maybe? I feel like "WorkflowSpecific" is a bit ambiguous because "workflow" could mean Workflow Execution or Workflow Type/Workflow Definition

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got rid of the "manual" in the field name and comment. Mentioned "execution-specific" in the comment. Renamed the message to VersioningOverride as discussed before. Also renamed the field name to be consistent to the message name.

}
}

message SetCurrentDeploymentRequest {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this API should exist as all changes to a deployment should happen through a rollout.

Copy link

@drewhoskins-temporal drewhoskins-temporal Nov 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mfateev I've been meaning to mention to you--for pre-release, we're planning to ship this lower-level API. We're working on the design for the rollout API for public preview, at which time we will support ramp pre-deploy testing as part of a bigger rollout API. Then, we can deprecate this API.
The team is currently focused on execution, but expect to review the rollout API design in December.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I annotated this and other APIs as "Experimental" in service.proto so it's clear that they all can change or be removed in the next release.

As @drewhoskins-temporal said, this one in particular is merely a low-level placeholder until the rollout API arrives before public preview.

string build_id = 2;
}

// Holds information about a worker deployment.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this comment doesn't add much information. Should define "deployment"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added more information for both Deployment and DeploymentInfo.

message DeploymentInfo {
Deployment deployment = 1;
temporal.api.enums.v1.DeploymentStatus status = 2;
google.protobuf.Timestamp last_became_current_time = 3;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is confusingly worded and perhaps the concept isn't quite right.
Would it be 0 if status is not equal to current?
And why do users care about that specific status transition, but not the others?

It's unclear to me where timestamps should go--on the rollout, on the deployment, or both?
For example, we might link to the associated rollouts, once we have that.

Do we need this for pre-release? Maybe cut for now, or just stick with the normal "created time" and we can add stuff later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initially it was more generic and was called status_change_time. But I realized for internal use while there is no rollout yet, we also need timestamp of the last time that a deployment becomes current. But you are right, that info should not go to the public API. Will revert this to status_change_time.

// Holds information about a worker deployment.
message DeploymentInfo {
Deployment deployment = 1;
temporal.api.enums.v1.DeploymentStatus status = 2;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a denormalization of the rollout status. Which might be fine, but it's also blurring the lines between a rollout and a deployment in a way that's a bit confusing. (why isn't it RolloutStatus) ?
And it's making deployments stateful when perhaps rollouts are the stateful thing.
I'm okay thinking of this as a placeholder for pre-release, but let's flag this to revisit.

Copy link

@drewhoskins-temporal drewhoskins-temporal Nov 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about this more, until we design the state machine, maybe best is just to expose an is_current boolean that we can choose to deprecate. It might be more challenging and confusing to evolve if we put in this enum, which might be wrong in some way. (wrong name, wrong states, sitting on the wrong object)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspected that in the UI, for example, we would want to label statuses such as RAMPING and CURRENT so the Describe and List APIs might want to return these info in some format. It does not necessarily mean that the Deployment itself is stateful, as the status could be logically populated based on the Rollout records.

For now, I can get rid of the status and the change timestamp if you prefer. That means for now the only way for the user to get currentness of a deployment is to compare it with the result of GetCurrentDeployment.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I buy the user scenario, and I think this design may well be valid, but it just needs a design pass.
I think users would like to see a created timestamp on this object so they know when the command to create the deploy got run (and ideally, by whom, for auditing purposes). I presume that would be the sort order and it's a fairly straightforward field that doesn't have any design considerations, and would give people an intuition for these rollouts until we figure out what we want to do with state tracking.

Up to you if you want to completely remove the status/changed status fields for now. I'm okay adding the bool for pre-release.

Copy link
Member

@cretz cretz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only one additional minor suggestion, otherwise LGTM (knowing it's going into non master and may be adjusted as implementation dictates)

google.protobuf.Timestamp last_became_current_time = 3;
// A user-defined set of key-values. Can be updated as part of write operations to the
// deployment, such as `SetCurrentDeployment`.
map<string, string> metadata = 4;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should consider making the values payloads if users should be allowed to encrypt this information

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. changed it to Payload, now it is consistent with workflow and schedules memo.

//
// (-- api-linter: core::0216::synonyms=disabled
// aip.dev/not-precedent: This enum specifies the status of the deployment among all deployments
// in the same series, therefore `DeploymentStatus` is a better name that `DeploymentState`. --)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the reasoning here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getting rid of this enum for now. just putting an is_current bool in DeploymentInfo until we finish the rollout API design.

string series_name = 4;
}

message ListDeploymentsResponse {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we clarify the ordering of the results in a comment here?

Copy link
Contributor Author

@ShahabT ShahabT Nov 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to be roughly sorted based on deployment creation time, decending. Although if a deployment has a very high number of TQs (or very high number of events happening to it) it will CaN and the sort will be mixed up. In reality we are relying on Visibility sort which is based on WF start time.

I prefer to keep this unspecified until we have a guaranteed order in the future.

Copy link

@drewhoskins-temporal drewhoskins-temporal Nov 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something's not right here. I don't see a use case for listing all the deployments, intermixed, regardless of deployment name. The two scenarios I can think of are users asking:
a) what are the deployment series? (or perhaps the current deployment for each series)
b) what's the history of a given deployment series?

If users do want them all in one endpoint, or if we want that for our UI, it would be a list of lists. But that seems puntable for the unblock , here.

Copy link

@drewhoskins-temporal drewhoskins-temporal Nov 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also question the value of an undisclosed, flaky ordering. Users will naturally expect a time-reverse ordering regardless of what we say here.
I would say a total ordering should be a hard requirement.
Is the scenario you mentioned very likely? Is it sufficiently unlikely to ship for the pre-release and then task up the improvement?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the shuffle is sufficiently unlikely and we can fix it after pre-release. It also won't be a total shuffle, one/few over active deployment will be at the top, the fact that they are over active might mean they are also new builds.
Regarding list of lists, I agree such a UI will make sense and likely preferable. This is something I actually tried to bring up and get attention to but was not successful. Hopefully, it will be discussed during UI work and I'll be OK changing the list APIs then.

// Holds information about a worker deployment.
message DeploymentInfo {
Deployment deployment = 1;
temporal.api.enums.v1.DeploymentStatus status = 2;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I buy the user scenario, and I think this design may well be valid, but it just needs a design pass.
I think users would like to see a created timestamp on this object so they know when the command to create the deploy got run (and ideally, by whom, for auditing purposes). I presume that would be the sort order and it's a fairly straightforward field that doesn't have any design considerations, and would give people an intuition for these rollouts until we figure out what we want to do with state tracking.

Up to you if you want to completely remove the status/changed status fields for now. I'm okay adding the bool for pre-release.

@ShahabT ShahabT changed the title Add Deployment APIs Add Deployment Pre-Release APIs Nov 19, 2024
@ShahabT ShahabT merged commit d4826db into versioning-3 Nov 19, 2024
2 checks passed
@ShahabT ShahabT deleted the shahab/deployment-apis branch November 19, 2024 23:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants