-
Notifications
You must be signed in to change notification settings - Fork 66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Deployment Pre-Release APIs #479
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly just small things, nothing major.
// The deployment does not have a special status. | ||
DEPLOYMENT_STATUS_NO_STATUS = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this? is UNSPECIFIED not enough? Clarify in the comments...
|
||
// Used as part of Deployment write APIs to update metadata attached to a deployment. | ||
message UpdateDeploymentMetadata { | ||
map<string, bytes> upsert_entries = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need values to be binary or just string. It could be more user friendly if they were all strings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ended up making them Payload similar to the memo in workflows and schedules.
deployment.v1.Deployment deployment = 1; | ||
enums.v1.DeploymentStatus status = 2; | ||
google.protobuf.Timestamp status_change_time = 3; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why redefine DeploymentInfo
without metadata and task_queue_infos here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so the list is smaller? is the idea that if the caller wants more detailed info they would subsequently call DescribeDeployment
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also we don't want to make the list API expensive. For example, TQ info will not be immediately accessible from Visibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you call it something different to avoid confusion, like DeploymentSummary or DeploymentBriefs or something like that...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also seems odd to me. Why do users care if this API is expensive?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@drewhoskins-temporal are you suggesting the list should return complete info of all deployments?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can change the name to DeploymentListEntry
or DeploymentSummary
. Same pattern is used for ListSchedules where we return a abbreviated set of info in the list response:
api/temporal/api/schedule/v1/message.proto
Line 372 in a3ed90f
// ScheduleListInfo is an abbreviated set of values from Schedule and ScheduleInfo |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed it to DeploymentListInfo similar to schedules.
string name = 1; | ||
temporal.api.enums.v1.TaskQueueType type = 2; | ||
// When server saw the first poller for this task queue in this deployment. | ||
google.protobuf.Timestamp first_poller_time = 3; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would add last_poller_time if technically possible. Can be updated lazily.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 That's possible and I agree we should have it. Will create a task for it (can't do right away because of more technical work).
string name = 1; | ||
temporal.api.enums.v1.TaskQueueType type = 2; | ||
// When server saw the first poller for this task queue in this deployment. | ||
google.protobuf.Timestamp first_poller_time = 3; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it belongs here. As this time is per build id. Or this message should be renamed to the DeploymentTaskQueueInfo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The message is already nested in DeploymentInfo
so I thought it's less verbose like this. But please lmk if you think it should still be renamed to DeploymentTaskQueueInfo
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Message names are top level. So someone looking at the name will not see where it is used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mfateev This message name is nested, not top level. Are suggesting we should not have nested message definitions and only top level?
// The deployment is the ramping deployment for the deployment series. | ||
DEPLOYMENT_STATUS_RAMPING = 2; | ||
// The deployment is the current deployment for the deployment series. | ||
DEPLOYMENT_STATUS_CURRENT = 3; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we consider a future when we are able to use a different current deployment for different workflow types or task queues? To me it looks like we need a separate Table/Message/API to get all Start/ContinueAsNew records that point to deployments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've thought about some of these cases:
- for task queues, at least on our current roadmap, we'll insist that users create a different deployment series if they want to rollout task queues separately. (Note that we will support the ability to move a task queue from one deployment series to another) We don't see this as being a big burden for such an advanced use case.
- for workflow types, we've decided to lean on the UpdateWorkflowOptions for these power-user scenarios for now. (or they can create a different deployment series) But you could imagine adding more fine-grained control to the
rollout
API. (i.e. instead of a single % ramp, we could provide various ramps for different types. Shahab mentioned there could be multi-cursor problems to think about with such, though). In that world, we could keep the high-level status ofDEPLOYMENT_STATUS_RAMPING
and create a more fine-grained API for power users. So, it seems premature to think too deeply now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding to @drewhoskins-temporal comments, basically we envision eventually adding task queue and workflow type granularity to Rollout so user can ramp them one-by-one. However, setting different current deployment for different workflow types or task queues is not currently on the road map and users should create new deployment series for such cases.
// `behavior_override` is PINNED. Takes precedence over `deployment`. | ||
temporal.api.common.v1.WorkerDeployment deployment_override = 4; | ||
temporal.api.deployment.v1.Deployment deployment = 2; | ||
// Manual override for execution's versioning behavior. Takes precedence over `behavior` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like "Manual" as it can be done by a script through an API. May be call it something like "WorkflowSpecificOverride" or something similar?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WorkflowExecutionOverride maybe? I feel like "WorkflowSpecific" is a bit ambiguous because "workflow" could mean Workflow Execution or Workflow Type/Workflow Definition
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got rid of the "manual" in the field name and comment. Mentioned "execution-specific" in the comment. Renamed the message to VersioningOverride
as discussed before. Also renamed the field name to be consistent to the message name.
} | ||
} | ||
|
||
message SetCurrentDeploymentRequest { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this API should exist as all changes to a deployment should happen through a rollout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mfateev I've been meaning to mention to you--for pre-release, we're planning to ship this lower-level API. We're working on the design for the rollout API for public preview, at which time we will support ramp pre-deploy testing as part of a bigger rollout API. Then, we can deprecate this API.
The team is currently focused on execution, but expect to review the rollout API design in December.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I annotated this and other APIs as "Experimental" in service.proto
so it's clear that they all can change or be removed in the next release.
As @drewhoskins-temporal said, this one in particular is merely a low-level placeholder until the rollout API arrives before public preview.
string build_id = 2; | ||
} | ||
|
||
// Holds information about a worker deployment. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this comment doesn't add much information. Should define "deployment"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added more information for both Deployment and DeploymentInfo.
message DeploymentInfo { | ||
Deployment deployment = 1; | ||
temporal.api.enums.v1.DeploymentStatus status = 2; | ||
google.protobuf.Timestamp last_became_current_time = 3; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is confusingly worded and perhaps the concept isn't quite right.
Would it be 0 if status is not equal to current?
And why do users care about that specific status transition, but not the others?
It's unclear to me where timestamps should go--on the rollout, on the deployment, or both?
For example, we might link to the associated rollouts, once we have that.
Do we need this for pre-release? Maybe cut for now, or just stick with the normal "created time" and we can add stuff later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initially it was more generic and was called status_change_time
. But I realized for internal use while there is no rollout yet, we also need timestamp of the last time that a deployment becomes current. But you are right, that info should not go to the public API. Will revert this to status_change_time.
// Holds information about a worker deployment. | ||
message DeploymentInfo { | ||
Deployment deployment = 1; | ||
temporal.api.enums.v1.DeploymentStatus status = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a denormalization of the rollout status. Which might be fine, but it's also blurring the lines between a rollout and a deployment in a way that's a bit confusing. (why isn't it RolloutStatus
) ?
And it's making deployments stateful when perhaps rollouts are the stateful thing.
I'm okay thinking of this as a placeholder for pre-release, but let's flag this to revisit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about this more, until we design the state machine, maybe best is just to expose an is_current
boolean that we can choose to deprecate. It might be more challenging and confusing to evolve if we put in this enum, which might be wrong in some way. (wrong name, wrong states, sitting on the wrong object)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspected that in the UI, for example, we would want to label statuses such as RAMPING and CURRENT so the Describe and List APIs might want to return these info in some format. It does not necessarily mean that the Deployment itself is stateful, as the status could be logically populated based on the Rollout records.
For now, I can get rid of the status and the change timestamp if you prefer. That means for now the only way for the user to get currentness of a deployment is to compare it with the result of GetCurrentDeployment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I buy the user scenario, and I think this design may well be valid, but it just needs a design pass.
I think users would like to see a created timestamp on this object so they know when the command to create the deploy got run (and ideally, by whom, for auditing purposes). I presume that would be the sort order and it's a fairly straightforward field that doesn't have any design considerations, and would give people an intuition for these rollouts until we figure out what we want to do with state tracking.
Up to you if you want to completely remove the status/changed status fields for now. I'm okay adding the bool for pre-release.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only one additional minor suggestion, otherwise LGTM (knowing it's going into non master
and may be adjusted as implementation dictates)
google.protobuf.Timestamp last_became_current_time = 3; | ||
// A user-defined set of key-values. Can be updated as part of write operations to the | ||
// deployment, such as `SetCurrentDeployment`. | ||
map<string, string> metadata = 4; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should consider making the values payloads if users should be allowed to encrypt this information
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. changed it to Payload, now it is consistent with workflow and schedules memo.
// | ||
// (-- api-linter: core::0216::synonyms=disabled | ||
// aip.dev/not-precedent: This enum specifies the status of the deployment among all deployments | ||
// in the same series, therefore `DeploymentStatus` is a better name that `DeploymentState`. --) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand the reasoning here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getting rid of this enum for now. just putting an is_current
bool in DeploymentInfo until we finish the rollout API design.
string series_name = 4; | ||
} | ||
|
||
message ListDeploymentsResponse { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we clarify the ordering of the results in a comment here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to be roughly sorted based on deployment creation time, decending. Although if a deployment has a very high number of TQs (or very high number of events happening to it) it will CaN and the sort will be mixed up. In reality we are relying on Visibility sort which is based on WF start time.
I prefer to keep this unspecified until we have a guaranteed order in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something's not right here. I don't see a use case for listing all the deployments, intermixed, regardless of deployment name. The two scenarios I can think of are users asking:
a) what are the deployment series? (or perhaps the current deployment for each series)
b) what's the history of a given deployment series?
If users do want them all in one endpoint, or if we want that for our UI, it would be a list of lists. But that seems puntable for the unblock , here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also question the value of an undisclosed, flaky ordering. Users will naturally expect a time-reverse ordering regardless of what we say here.
I would say a total ordering should be a hard requirement.
Is the scenario you mentioned very likely? Is it sufficiently unlikely to ship for the pre-release and then task up the improvement?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the shuffle is sufficiently unlikely and we can fix it after pre-release. It also won't be a total shuffle, one/few over active deployment will be at the top, the fact that they are over active might mean they are also new builds.
Regarding list of lists, I agree such a UI will make sense and likely preferable. This is something I actually tried to bring up and get attention to but was not successful. Hopefully, it will be discussed during UI work and I'll be OK changing the list APIs then.
// Holds information about a worker deployment. | ||
message DeploymentInfo { | ||
Deployment deployment = 1; | ||
temporal.api.enums.v1.DeploymentStatus status = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I buy the user scenario, and I think this design may well be valid, but it just needs a design pass.
I think users would like to see a created timestamp on this object so they know when the command to create the deploy got run (and ideally, by whom, for auditing purposes). I presume that would be the sort order and it's a fairly straightforward field that doesn't have any design considerations, and would give people an intuition for these rollouts until we figure out what we want to do with state tracking.
Up to you if you want to completely remove the status/changed status fields for now. I'm okay adding the bool for pre-release.
What changed?
Added the following APIs:
Why?
Needed for Worker Versioning V3.
Breaking changes
Server PR