Assumed background: Kubernetes controllers, operator-sdk or similar, custom resource definitions
This is a Go project. To build and test this operator you also need the operator-sdk.
Per the Go modules wiki, using Go modules requires invoking Go outside of the src
folder or setting export GO111MODULE=on.
See the Operator-Framework Community Operators docs for instructions for PR'ing OperatorHub.io with new versions.
The AkkaCluster operator is similar in concept to a Deployment controller, in that it watches for a top-level resource and then drives changes down into sub-resources. So just as a Deployment drives changes into a ReplicaSet, the AkkaCluster drives changes into a Deployment, ServiceAccount, Role, and RoleBinding.
The spec
of an AkkaCluster is just a Deployment spec, with a set of defaults that are
used if certain fields are blank. In the main reconcile loop, the AkkaCluster resource is
turned into an ideal specification for a set of sub-resources, which is the goal state.
In-cluster resources are then compared to the goal state, and created or updated as needed.
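As a rough illustration, the goal-state comparison might look like the sketch below. This is not the operator's actual code; it assumes a recent controller-runtime client API and takes the list of ideal objects as a parameter.

```go
// Sketch only: the goal-state pattern described above.
import (
	"context"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

func reconcileGoalState(ctx context.Context, c client.Client, goalState []client.Object) error {
	for _, goal := range goalState {
		current := goal.DeepCopyObject().(client.Object)
		err := c.Get(ctx, client.ObjectKeyFromObject(goal), current)
		switch {
		case apierrors.IsNotFound(err):
			// Missing in-cluster: create it from the ideal spec.
			if err := c.Create(ctx, goal); err != nil {
				return err
			}
		case err != nil:
			return err
		default:
			// Present but possibly drifted from the goal state: update it.
			// (The real controller first uses SubsetEqual to decide whether
			// an update is needed at all.)
			if err := c.Update(ctx, goal); err != nil {
				return err
			}
		}
	}
	return nil
}
```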
The status
of an AkkaCluster is fed to the controller by a helper loop, run as an actor,
which keeps a list of clusters, their leader endpoints, and their Akka Management
endpoint results. If the status actor sees a change, it pushes a reconcile event to the
main controller, where the change is picked up.
The above custom logic should hook pretty easily into any operator framework, and there's nothing that particularly requires operator-sdk except where it is used to validate OLM things. The heavy lifting outside of the above logic is all done in Kubernetes client code. Since this area is a fast-moving target, the main thought here was just to use the native client libs, which happen to be in Go, so this is in Go too. But this could all be written in Scala if one wanted to be adventurous.
The project skeleton was generated by operator-sdk.
akkacluster_types.go
is the primary source for serialization / deserialization. It has
Go structures with json tags for all the AkkaCluster spec and status bits. If you change
things here, you must run "generate k8s" to regenerate all the various DeepCopy functions.
operator-sdk generate k8s
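An illustrative sketch of the shape of those structures is below. It is not the exact source: the spec mirrors a Deployment spec per the description above, and the status field shown is hypothetical.

```go
// Illustrative sketch of the pattern in akkacluster_types.go, not the exact source.
import (
	appsv1 "k8s.io/api/apps/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

type AkkaClusterStatus struct {
	// Illustrative field: the real status tracks Akka Management results.
	Leader string `json:"leader,omitempty"`
}

type AkkaCluster struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   appsv1.DeploymentSpec `json:"spec,omitempty"`
	Status AkkaClusterStatus     `json:"status,omitempty"`
}
```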
akkacluster_controller.go
is the primary source for the controller, and is where
Watch()
is called to set up reconcile triggers, and where Reconcile()
is defined.
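The wiring looks roughly like the sketch below, in the style operator-sdk generated for controller-runtime of that era; it is not the actual file, and appv1alpha1 stands for the project's generated API package (import path omitted).

```go
// Sketch of the watch/reconcile wiring, not the actual akkacluster_controller.go.
import (
	appsv1 "k8s.io/api/apps/v1"
	"sigs.k8s.io/controller-runtime/pkg/controller"
	"sigs.k8s.io/controller-runtime/pkg/handler"
	"sigs.k8s.io/controller-runtime/pkg/manager"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"
	"sigs.k8s.io/controller-runtime/pkg/source"
)

func add(mgr manager.Manager, r reconcile.Reconciler) error {
	c, err := controller.New("akkacluster-controller", mgr, controller.Options{Reconciler: r})
	if err != nil {
		return err
	}
	// Reconcile whenever an AkkaCluster changes...
	err = c.Watch(&source.Kind{Type: &appv1alpha1.AkkaCluster{}}, &handler.EnqueueRequestForObject{})
	if err != nil {
		return err
	}
	// ...and whenever an owned sub-resource, such as the Deployment, changes.
	return c.Watch(&source.Kind{Type: &appsv1.Deployment{}},
		&handler.EnqueueRequestForOwner{IsController: true, OwnerType: &appv1alpha1.AkkaCluster{}})
}
```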
deploy_builder.go
takes an AkkaCluster, fills in defaults, returns a set of ideal
resources.
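For instance, the Deployment builder conceptually looks like the sketch below. The function name, the default shown, and the naming of the sub-resource are illustrative, not the real builder; appv1alpha1 again stands for the project's API package.

```go
// Sketch of the deploy_builder idea: fill defaults for blank fields, then
// emit an ideal sub-resource.
import (
	appsv1 "k8s.io/api/apps/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func buildDeployment(cluster *appv1alpha1.AkkaCluster) *appsv1.Deployment {
	spec := *cluster.Spec.DeepCopy() // the AkkaCluster spec is essentially a Deployment spec
	if spec.Replicas == nil {
		defaultReplicas := int32(1) // illustrative default, not necessarily the real one
		spec.Replicas = &defaultReplicas
	}
	return &appsv1.Deployment{
		ObjectMeta: metav1.ObjectMeta{
			Name:      cluster.Name,
			Namespace: cluster.Namespace,
		},
		Spec: spec,
	}
}
```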
subset.go
is a generic SubsetEqual implementation, using reflection to support arbitrary
Go structures. SubsetEqual(A,B) returns true if A is a subset of B. This is handy for
comparing pristine ideal resources with mucked in-cluster resources, ignoring all the
ephemera of live resources like timestamps, uids, and other run time housekeeping.
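A greatly simplified version of the idea is sketched below (assuming Go 1.13+ for reflect.Value.IsZero). The real subset.go handles more kinds and guards against cycles; the key point is that anything left zero/unset in the ideal object is treated as "don't care", which is how the run-time housekeeping in live objects gets ignored.

```go
import "reflect"

// SubsetEqualSketch reports whether a is a subset of b.
// Usage: SubsetEqualSketch(idealDeployment, liveDeployment).
func SubsetEqualSketch(a, b interface{}) bool {
	return subsetEqual(reflect.ValueOf(a), reflect.ValueOf(b))
}

func subsetEqual(a, b reflect.Value) bool {
	if !a.IsValid() || a.IsZero() {
		return true // nothing requested on this branch of the ideal object
	}
	if !b.IsValid() {
		return false
	}
	switch a.Kind() {
	case reflect.Ptr, reflect.Interface:
		if b.Kind() != a.Kind() {
			return false
		}
		return subsetEqual(a.Elem(), b.Elem())
	}
	if a.Type() != b.Type() {
		return false
	}
	switch a.Kind() {
	case reflect.Struct:
		for i := 0; i < a.NumField(); i++ {
			if !subsetEqual(a.Field(i), b.Field(i)) {
				return false
			}
		}
		return true
	case reflect.Map:
		for _, key := range a.MapKeys() {
			if !subsetEqual(a.MapIndex(key), b.MapIndex(key)) {
				return false
			}
		}
		return true
	case reflect.Slice:
		if a.Len() > b.Len() {
			return false
		}
		for i := 0; i < a.Len(); i++ {
			if !subsetEqual(a.Index(i), b.Index(i)) {
				return false
			}
		}
		return true
	default:
		return reflect.DeepEqual(a.Interface(), b.Interface())
	}
}
```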
status.go
is an actor responsible for polling cluster leaders for Akka Management
status. It provides status to Reconcile(), and triggers reconcile when it sees status
changes. It bootstraps in a way very similar to Akka Management, in that it uses the
AkkaCluster pod selector to list running pods and then starts talking to them to locate
the leader.
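One way a side loop like this can trigger the main controller is via a controller-runtime channel source, sketched below. The GenericEvent field layout shown matches the controller-runtime versions of the operator-sdk v0.14 era; the real status.go may differ in detail, and the helper name is hypothetical.

```go
// Sketch: pushing reconcile triggers from a status loop to the controller.
// In the controller setup, something like:
//
//   statusEvents := make(chan event.GenericEvent)
//   c.Watch(&source.Channel{Source: statusEvents}, &handler.EnqueueRequestForObject{})
//
import "sigs.k8s.io/controller-runtime/pkg/event"

// notifyStatusChange is a hypothetical helper the status actor could call
// when it sees a change in Akka Management results for a cluster.
func notifyStatusChange(events chan<- event.GenericEvent, cluster *appv1alpha1.AkkaCluster) {
	events <- event.GenericEvent{
		Meta:   &cluster.ObjectMeta, // dropped in newer controller-runtime versions
		Object: cluster,
	}
}
```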
./deploy/*.yaml
is the operator Deployment, ServiceAccount, Role, and RoleBinding. These
were all generated by operator-sdk, meaning nothing special here, just generic operator
things.
./deploy/crds/app.lightbend.com_akkaclusters_crd.yaml
has the custom resource definition. This is where
new top level fields and basic validation go, if you want kubectl
to be able to tell a valid AkkaCluster from an
invalid one.
./deploy/olm-catalog/
has a nested OLM package, which should be updated on releases. The
primary sources here are the "*clusterserviceversion.yaml" files, one for each version published.
Certified versions of the CSVs are of the format -certified.*.clusterserviceversion.yaml.
Certified container images, including the Operator, are published and distributed separately.
The controller has several unit tests that can be run with the usual go tools.
go test -race ./...
akkacluster_controller_test.go
uses controller-runtime client mocks to emulate a
kubernetes environment behind the client interface.
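The general shape of such a test is sketched below. It is not the actual test file: the fake-client constructor and the Reconcile signature vary with the controller-runtime version, the reconciler construction is hypothetical, and appv1alpha1 stands for the project's generated API package.

```go
// Sketch of the fake-client test style; not the actual test file.
import (
	"context"
	"testing"

	appsv1 "k8s.io/api/apps/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes/scheme"
	"sigs.k8s.io/controller-runtime/pkg/client/fake"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"
)

func TestReconcileCreatesDeployment(t *testing.T) {
	// Register the AkkaCluster types and seed the fake "cluster" with one resource.
	s := scheme.Scheme
	s.AddKnownTypes(appv1alpha1.SchemeGroupVersion, &appv1alpha1.AkkaCluster{})
	cluster := &appv1alpha1.AkkaCluster{
		ObjectMeta: metav1.ObjectMeta{Name: "test", Namespace: "default"},
	}
	cl := fake.NewFakeClientWithScheme(s, cluster)

	// Hypothetical reconciler construction; the real one is in akkacluster_controller.go.
	r := &ReconcileAkkaCluster{client: cl, scheme: s}
	req := reconcile.Request{NamespacedName: types.NamespacedName{Name: "test", Namespace: "default"}}
	if _, err := r.Reconcile(req); err != nil {
		t.Fatalf("reconcile: %v", err)
	}

	// The ideal Deployment (assumed here to share the AkkaCluster's name)
	// should now exist behind the client interface.
	var dep appsv1.Deployment
	if err := cl.Get(context.TODO(), types.NamespacedName{Name: "test", Namespace: "default"}, &dep); err != nil {
		t.Fatalf("expected deployment to be created: %v", err)
	}
}
```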
deploy_builder_test.go
uses a yamlizer and gold files: it takes input yaml, generates ideal
resources, and compares them to output yaml files. Gold file tests are a kind of regression test,
and may need to be updated if the schema changes or deploy builder defaults change.
To update gold files, run tests with the -update
flag:
go test -update ./...
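The -update flag follows the common Go gold-file idiom, roughly as sketched below. The helper and gold file name are illustrative, not the project's exact test code.

```go
// Generic gold-file idiom, illustrating how -update works.
import (
	"bytes"
	"flag"
	"io/ioutil"
	"path/filepath"
	"testing"
)

var update = flag.Bool("update", false, "rewrite gold files with current output")

func TestDeployBuilderGold(t *testing.T) {
	got := renderIdealResourcesAsYAML(t) // hypothetical helper producing the builder's output

	gold := filepath.Join("testdata", "ideal.yaml") // hypothetical gold file name
	if *update {
		if err := ioutil.WriteFile(gold, got, 0644); err != nil {
			t.Fatal(err)
		}
	}
	want, err := ioutil.ReadFile(gold)
	if err != nil {
		t.Fatal(err)
	}
	if !bytes.Equal(got, want) {
		t.Errorf("output differs from %s; run `go test -update ./...` to refresh", gold)
	}
}
```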
subset_test.go
has a mix of whitebox, blackbox, edge, and gold file tests. The whitebox
tests just say if the subset was found or not. The blackbox tests say how many nodes in
the object tree were compared equal, to ensure that the tree is walked completely. Edge
tests include various empty comparisons, off-by-one comparisons, and validate that short-circuiting
works on recursive objects. There is one gold file test against a complex
Deployment object found in OpenShift, with tons of extraneous fields to ignore.
status_test.go
is a step toward property testing, with various generators for
AkkaClusters. It also defines a mock urlReader and podLister so the test can provide the
status actor with arbitrary cluster results. It then runs a status actor in high speed
mode and verifies that status changes are correctly signaled back to the controller.
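The mocking idea is sketched below. Only the names urlReader and podLister come from the source; the interface signature and the fake shown here are hypothetical, not the real ones in status.go.

```go
// Hypothetical sketch: the status actor's dependencies are small interfaces,
// so tests can feed it arbitrary Akka Management responses.
import "fmt"

type urlReader interface {
	Read(url string) ([]byte, error)
}

type fakeURLReader struct {
	responses map[string][]byte // canned Akka Management JSON keyed by URL
}

func (f fakeURLReader) Read(url string) ([]byte, error) {
	body, ok := f.responses[url]
	if !ok {
		return nil, fmt.Errorf("no canned response for %s", url)
	}
	return body, nil
}
```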
- install operator-sdk
- start minikube
- install the CRD
- route pod network to macbook so operator can query akka management endpoints
sudo route -n add 172.17.0.0/16 $(minikube ip)
then loop on:
operator-sdk up local
and a demo app in minikube. This lets you run the operator locally, watch logs, and watch mutations to resources within minikube.
While typically you would run locally using operator-sdk up local,
you can also build the
docker image using operator-sdk.
operator-sdk build akkacluster-operator:latest
Note that you'll then need to modify the Deployment to point to your local image.
CI/CD is done via GitHub Actions, as seen in ./.github/main.workflow. PRs must pass unit
tests, and merges to master trigger goreportcard updates and docker image builds that get
pushed to the Lightbend registry.
On every Pull Request (pull_request.yml):
- execute
go test -race ./...
On every push to master (push.yml):
- checkout master.
- build artifact akkacluster-operator:latest
- publish to Lightbend registry.
- credentials are available in settings/secrets of the repo.
The ./.github/actions/operator-sdk/
docker image works off a pinned version of
operator-sdk, so it will have to be updated regularly to keep up with changes.
In order to install versions of the Operator other than those published on OperatorHub, one can:
- Install manually, without OLM, which is described below.
- Use the configMap technique.
- Publish to a test Quay.io repository, see testing-operators in community-operators.
For Kubernetes to understand AkkaCluster resources, each cluster must have the CRD
installed. This yaml specifies the schema of an AkkaCluster resource, and registers it in
the API layer so that kubectl
and controllers and other API clients can interact with
AkkaCluster resources just like any other.
This CRD must be installed once for each cluster:
kubectl apply -f ./deploy/crds/app.lightbend.com_akkaclusters_crd.yaml
One way to test whether this worked is to run kubectl get akkaclusters. This should return an
error if the CRD is not yet installed, explaining that this is an unknown resource type.
$ kubectl get akkaclusters
error: the server doesn't have a resource type "akkaclusters"
If the CRD is installed, you should see a normal response, like "No resources found."
$ kubectl get akkaclusters
No resources found.
Once the CRD is installed, the run-time controller for that resource must be installed and running for Kubernetes to act on AkkaCluster resources. This controller watches for resource changes related to AkkaClusters, and reconciles what is running with what is wanted just like a Deployment controller.
This must be installed in each namespace where AkkaClusters are expected.
kubectl apply -f ./deploy
If this works, you should see an akka-cluster-operator
deployment in the namespace.
The OperatorHub dev site, https://dev.operatorhub.io/, includes 'beta' CSV generation and validation tooling that helps developers manage the manifests.
The OLM manifest is a ClusterServiceVersion (CSV) yaml file. Broadly speaking, this is a collection of (1) marketing material describing the catalog web page content and (2) an installation specification corresponding to a version of the operator. It may be desirable to build a release template for this manifest, as it will need to be copied and slightly altered for each release. In practice, it has been easier to regenerate the CSV using the OperatorHub Beta Package tool.
The spec.install
section is a kind of copy of the operator Deployment, so changes to the
deploy resources will need to be reflected here.
The spec.customresourcedefinitions section is a mix of marketing material and validation
collateral. The list of resources here corresponds to each alm-examples entry, and the
description and displayName will be published in the web page Example box. The
resources, specDescriptors, and statusDescriptors are used by the operator-sdk
scorecard, but it isn't clear how else they are used.
Most of the rest is marketing material and can be previewed on OperatorHub (see the preview
link below), but the main content is in spec.description, which
is in markdown format and can include links, images, etc. metadata.description is the
headline blurb shown in several places, and should be kept quite short.
Release versions will touch several fields, including:
- metadata.annotations.containerImage
- metadata.name
- spec.install...image
- spec.version
For manual preview of the marketing material as it might appear on OperatorHub, you can use https://operatorhub.io/preview
Then to validate the OLM manifest without running a Kubernetes cluster, one can use operator-courier like so:
operator-courier --verbose verify ./deploy/olm-catalog/akka-cluster-operator
Given a working Kubernetes cluster, including minikube, one can run the scorecard
tests.
With operator-sdk
version v0.14.1, a config file, .osdk-scorecard.yaml, is used to run scorecard.
Enable Go 1.11 modules before running.
export GO111MODULE=on
operator-sdk scorecard