Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CORE-17476 Modifying helm charts to create ClusterIP services for workers, passing to FlowWorker #4777

Merged
merged 7 commits into from
Oct 6, 2023

Conversation

ben-millar
Copy link
Contributor

@ben-millar ben-millar commented Oct 4, 2023

As part of the Corda topology redesign, communication between the flow engine and Corda workers is moving from a Kafka message bus to synchronous RPC calls. In order to support this, we're adding a load balancing layer (Here, a ClusterIP service per worker) which will expose a single endpoint which distributes calls across all worker instances.

The endpoint for this ClusterIP service is now being passed into the additional startup args of the FlowWorker; a future PR will use these to define a routing component for external messages.

Related PRs:

@ben-millar ben-millar marked this pull request as ready for review October 4, 2023 17:26
@ben-millar ben-millar requested a review from a team as a code owner October 4, 2023 17:26
@ben-millar ben-millar requested a review from a team October 4, 2023 17:26
davidcurrie
davidcurrie previously approved these changes Oct 5, 2023
Copy link
Member

@davidcurrie davidcurrie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect the port isn't configurable, and if not, making it so would not be part of this PR. The other suggestions are just stylistic.

@@ -5,5 +5,5 @@ internal const val HTTP_SERVICE_UNAVAILABLE_CODE = 503
internal const val HTTP_HEALTH_ROUTE = "/isHealthy"
internal const val HTTP_METRICS_ROUTE = "/metrics"
internal const val HTTP_STATUS_ROUTE = "/status"
internal const val WORKER_SERVER_PORT = 7000
internal const val WORKER_SERVER_PORT = 7000 // Note: This variable is defined in charts/corda-lib/templates/_helpers.kt:622
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really the definition of the port. The Helm chart variable tells the rest of the world what this value is. I don't know whether this port is configurable on a worker but, if it is, the Helm chart could pass the value in and then be sure that everything ties up nicely (even if the value we pass in is still 7000!).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It defaults to 7000, but you can override it using the -p param:

(just FYI)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to know! 🤔 I'll remove the comment then, as it's not adding any clarity to the code 😅

{{- include "corda.worker" ( list $ .Values.workers.flow "flow"
( dict "stateManagerDbAccess" true )
( dict "stateManagerDbAccess" true "additionalWorkerArgs" $args )
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It probably doesn't make much difference while the flow worker is the only worker doing this but I'd still be tempted to pass servicesAccessed in the dictionary and then do the processing of that within corda.worker rather than here.

Comment on lines 11 to 12
{{- $endpoint := include "corda.getWorkerEndpoint" (dict "context" $ "worker" $worker) }}
{{- $args = append $args (printf "--endpoint=%s" $endpoint) }}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: just for readability, I'd be tempted to indent these lines to indicate that they're nested within the range (the white space being eaten by {{- anyway).

@@ -5,5 +5,5 @@ internal const val HTTP_SERVICE_UNAVAILABLE_CODE = 503
internal const val HTTP_HEALTH_ROUTE = "/isHealthy"
internal const val HTTP_METRICS_ROUTE = "/metrics"
internal const val HTTP_STATUS_ROUTE = "/status"
internal const val WORKER_SERVER_PORT = 7000
internal const val WORKER_SERVER_PORT = 7000 // Note: This variable is defined in charts/corda-lib/templates/_helpers.kt:622
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It defaults to 7000, but you can override it using the -p param:

(just FYI)

{{/*
The port which should be used to connect to Corda worker instances
*/}}
{{- define "worker.port" -}}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bit nitpicky (and don't propose it needs to be changed), but I guess technically this is the LB port, not the worker port?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically its both the port for the worker and for the load balancer; the LB forwards port :7000 to port :7000 on the respective worker. I'm on the fence about it, but I figure worker.port makes the overall intent more clear.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh - but now you've highlighted it again, I realise this breaks the naming convention for functions. They are all corda. something. This is because the functions are in a global namespace when charts are combined together.


@Option(names = ["--endpoint"], description = ["Internal RPC endpoints for Corda workers"], required = true)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: can we update the description to HTTP endpoints instead, as that what they are?. Just trying to avoid weeks of works in the future trying to refactor from RPC to HTTP again (it already happened for the old rpc-worker)?.

davidcurrie
davidcurrie previously approved these changes Oct 5, 2023
Copy link
Member

@davidcurrie davidcurrie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

driessamyn
driessamyn previously approved these changes Oct 5, 2023
Copy link
Contributor

@driessamyn driessamyn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

jujoramos
jujoramos previously approved these changes Oct 5, 2023
Copy link
Contributor

@jujoramos jujoramos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ben-millar ben-millar force-pushed the bm/CORE-17476/load-balancer branch from 869fdac to 702ae56 Compare October 5, 2023 13:58
@ben-millar ben-millar dismissed stale reviews from jujoramos, davidcurrie, and driessamyn via 9072689 October 6, 2023 11:26
@corda-jenkins-ci02
Copy link
Contributor

Jenkins build for PR 4777 build 7

Build Successful:
Jar artifact version produced by this PR: 5.1.0.0-alpha-1696591598556
Helm chart version produced by this PR: 5.1.0-alpha.1696591598556
Helm chart pushed to: oci://corda-os-docker-dev.software.r3.com/helm-charts/pr-4777/corda

Copy link
Contributor

@driessamyn driessamyn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ben-millar ben-millar merged commit 5ca49de into release/os/5.1 Oct 6, 2023
2 checks passed
@ben-millar ben-millar deleted the bm/CORE-17476/load-balancer branch October 6, 2023 13:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants