diff --git a/README.md b/README.md index d4e1e17..e804ac4 100644 --- a/README.md +++ b/README.md @@ -1,36 +1,377 @@ -This helm chart will deploy the vertica kafka scheduler. It will deploy the vertica/vertica-kafka-scheduler in two modes: initializer and launcher. When deployed as the initializer it will run the container so that you can exec into it and do your setup. When deployed as the launcher, the container will automatically call 'vkconfig launch'. This is expected to be done once everything has been setup. - -| Parameter Name | Description | Default Value | -|----------------|-------------|---------------| -| affinity | Affinity to use with the pods to control where it is scheduled | | -| conf.configMapName | The name of the ConfigMap to use and optionally generate. If omitted, the chart will pick a suitable default. | | -| conf.content | Set of key/values pairs that will be included in the generated ConfigMap. This is ignored if conf.generate is false. | | -| conf.generate | If true, the helm chart will control creation of the vkconfig.conf ConfigMap. | true | -| fullNameOverride | Gives full controls over the name of the objects that get created. This takes precedence over nameOverride. | | -| initializerEnabled | If true, the initializer pod is created. This can be used to run any setup tasks needed. | true | -| image.pullPolicy | The pull policy to use for the image | IfNotPresent | -| image.repository | The image repository and name that contains the Vertica Kafka Scheduler | vertica/kafka-scheduler | -| image.tag | The tag corresponds to the version to use. The version of the Vertica Kafka Scheduler must match version of the Vertica server you are connecting to | Defaults to the charts appVersion | -| imagePullSecrets | A list of Secret's that are needed to be able to pull the image | | -| launcherEnabled | If true, the launch deployment is created. This should only be enabled once everything has been setup. 
| true | -| jvmOpts | Values to assign to the VKCONFIG_JVM_OPTS environment variable in the pods. You can omit most trustrtore/keystore settings as they are controlled by tls.* | | -| nameOverride | Controls the name of the objects that get created. This is combined with the helm chart release to form the name | | -| nodeSelector | A node selector to use with the pod to control where it is scheduled | | -| podAnnotations | Annotations to attach to the pods | | -| podSecurityContext | A PodSecurityContext to use for the pods | | -| replicaCount | If you want more than one launch pod deployed set this to a value greater than 1. | 1 | -| resourecs | Resources to use with the pod | | -| securityContext | A SecurityContext to use for the container in the pod | | -| serviceAccount.annotations | Annotations to attach to the ServiceAccount | | -| serviceAccount.create | If true, a ServiceAccount is created as part of the deployment | true | -| serviceAccount.name | Name of the service account. If not set and create is true, a name is generated using the fullname template | | -| tls.enabled | If true, we setup with the assumption that TLS authentication will be used. | false | -| tls.keyStoreMountPath | Directory name where the keystore will be mounted in the pod. This controls the name of the keystore within the pod. The full path to the keystore will be constructed by combining this parameter with tls.keyStoreSecretKey. | | -| tls.keyStorePassword | The password to use along with the keystore. If omitted, then no password is used. | | -| tls.keyStoreSecretKey | A key within tls.keyStoreSecretName that will be used as the keystore file name. This is used along with tls.keyStoreMountPath to form the full path to the key in the pod. | | -| tls.keyStoreSecretName | Name of an existing Secret that contains the keystore. If this is omitted, then no keystore information is included. | | -| tls.trustStoreMountPath | Directory name where the truststore will be mounted in the pod. 
This controls the name of the truststore within the pod. The full path to the truststore will be constructed by combining this parameter with tls.trustStoreSecretKey. | |
-| tls.trustStorePassword | The password to use along with the truststore. If omitted, then no password is used. | |
-| tls.trustStoreSecretKey | A key within tls.trustStoreSecretName that will be used as the truststore file name. This is used along with tls.trustStoreMountPath to form the full path to the key in the pod. | |
-| tls.trustStoreSecretName | Name of an existing Secret that contains the truststore. If this is omitted, then no truststore information is included. | |
-| tolerations | Tolerations to use with the pods to control where it is scheduled | |
+This Helm chart deploys the [vertica-kafka-scheduler](https://github.com/vertica/vertica-containers/tree/main/vertica-kafka-scheduler) in two modes:
+- **initializer**: Configuration mode. Starts a container so that you can `exec` into it and configure it.
+- **launcher**: Launch mode. Starts a container that calls `vkconfig launch` automatically to launch the vkconfig scheduler. Run this mode after you configure the container in `initializer` mode.
+
+## Install the charts
+
+Add the Vertica chart repository, then install the Helm chart. The following `helm install` command uses the `image.tag` [parameter](#parameters) to install version 24.1.0:
+
+```shell
+$ helm repo add vertica-charts https://vertica.github.io/charts
+$ helm repo update
+$ helm install vkscheduler vertica-charts/vertica-kafka-scheduler \
+    --set "image.tag=24.1.0"
+```
+
+## Sample manifests
+
+The following dropdowns provide sample manifests for a Kafka cluster, VerticaDB operator and custom resource (CR), and vkconfig scheduler. These manifests are applied in [Usage](#usage) to demonstrate a simple deployment:
+
+ kafka-cluster.yaml (with Strimzi operator) + + ```yaml + apiVersion: kafka.strimzi.io/v1beta2 + kind: Kafka + metadata: + + namespace: kafka + name: my-cluster + spec: + kafka: + version: 3.6.0 + replicas: 1 + listeners: + - name: plain + port: 9092 + type: internal + tls: false + - name: tls + port: 9093 + type: internal + tls: true + config: + offsets.topic.replication.factor: 1 + transaction.state.log.replication.factor: 1 + transaction.state.log.min.isr: 1 + default.replication.factor: 1 + min.insync.replicas: 1 + inter.broker.protocol.version: "3.6" + storage: + type: jbod + volumes: + - id: 0 + type: persistent-claim + size: 100Gi + deleteClaim: false + zookeeper: + replicas: 1 + storage: + type: persistent-claim + size: 100Gi + deleteClaim: false + entityOperator: + topicOperator: {} + userOperator: {} + ``` +
+ +
+ vdb-op-cr.yaml + + ```yaml + apiVersion: vertica.com/v1 + kind: VerticaDB + metadata: + annotations: + vertica.com/include-uid-in-path: "false" + vertica.com/vcluster-ops: "false" + name: vdb-1203 + spec: + communal: + credentialSecret: "" + endpoint: https://s3.amazonaws.com + path: s3://// + image: vertica/vertica-k8s:12.0.3-0 + initPolicy: Create + subclusters: + - name: sc0 + size: 3 + type: primary + ``` +
+ +
+ vertica-kafka-scheduler.yaml + + ```yaml + image: + repository: vertica/kafka-scheduler + pullPolicy: IfNotPresent + tag: 12.0.3 + launcherEnabled: false + replicaCount: 1 + initializerEnabled: true + conf: + generate: true + content: + config-schema: Scheduler + username: dbadmin + dbport: '5433' + enable-ssl: 'false' + dbhost: 10.20.30.40 + tls: + enabled: false + serviceAccount: + create: true + ``` +
+
+## Usage
+
+The following sections deploy a Kafka cluster and a VerticaDB operator and CR on Kubernetes. Then, they show you how to configure Vertica to consume data from Kafka by setting up the necessary tables and configuring the scheduler. Finally, you launch the scheduler and send data on the command line to test the implementation.
+
+### Deploy the manifests
+
+Apply manifests on Kubernetes to create a Kafka cluster, VerticaDB operator, and VerticaDB CR:
+
+1. Create a namespace. The following command creates a namespace named `kafka`:
+   ```shell
+   kubectl create namespace kafka
+   ```
+1. Create the Kafka custom resource. Apply the [kafka-cluster.yaml](#sample-manifests) manifest:
+   ```shell
+   kubectl apply -f kafka-cluster.yaml
+   ```
+1. Deploy the VerticaDB operator and custom resource. The [vdb-op-cr.yaml](#sample-manifests) manifest deploys version 12.0.3. Before you apply the manifest, edit `spec.communal.path` to provide a path to an existing S3 bucket:
+   ```shell
+   kubectl apply -f vdb-op-cr.yaml
+   ```
+
+### Set up Vertica
+
+Create tables and resources so that Vertica can consume data from a Kafka topic:
+
+1. Create a Vertica flex table to store messages from the Kafka topic:
+   ```sql
+   CREATE FLEX TABLE KafkaFlex();
+   ```
+1. Create the Kafka user:
+   ```sql
+   CREATE USER KafkaUser;
+   ```
+1. Create a resource pool:
+   ```sql
+   CREATE RESOURCE POOL scheduler_pool PLANNEDCONCURRENCY 1;
+   ```
+
+### Create a Kafka topic
+
+Start the Kafka service, and create a Kafka topic that the scheduler can consume data from:
+
+1. Open a new shell and start the Kafka producer:
+   ```shell
+   kubectl --namespace kafka run kafka-producer -ti --image=quay.io/strimzi/kafka:0.38.0-kafka-3.6.0 --rm=true --restart=Never -- bash
+   ```
+1. 
Create the Kafka topic that the scheduler subscribes to:
+   ```shell
+   bin/kafka-console-producer.sh --bootstrap-server my-cluster-kafka-bootstrap.kafka:9092 --topic KafkaTopic1
+   ```
+
+### Configure the scheduler
+
+Deploy the scheduler container in initializer mode, and configure the scheduler to consume data from the [Kafka topic](#create-a-kafka-topic):
+
+1. Install the vertica-kafka-scheduler Helm chart with the [vertica-kafka-scheduler.yaml](#sample-manifests) values file. This file has `initializerEnabled` set to `true` so you can configure the vkconfig container before you launch the scheduler:
+   ```shell
+   helm install --namespace main --create-namespace vk1 vertica-charts/vertica-kafka-scheduler \
+     -f vertica-kafka-scheduler.yaml
+   ```
+1. Use `kubectl exec` to get a shell in the initializer pod:
+   ```shell
+   kubectl exec --namespace main -it vk1-vertica-kafka-scheduler-initializer -- bash
+   ```
+1. Set configuration options for the scheduler. For descriptions of each of the following options, see [vkconfig script options](https://docs.vertica.com/23.4.x/en/kafka-integration/vkconfig-script-options/):
+   ```shell
+   # scheduler options
+   vkconfig scheduler --conf /opt/vertica/packages/kafka/config/vkconfig.conf \
+     --frame-duration 00:00:10 \
+     --create --operator KafkaUser \
+     --eof-timeout-ms 2000 \
+     --config-refresh 00:01:00 \
+     --new-source-policy START \
+     --resource-pool scheduler_pool
+
+   # target options
+   vkconfig target --add --conf /opt/vertica/packages/kafka/config/vkconfig.conf \
+     --target-schema public \
+     --target-table KafkaFlex
+
+   # load spec options
+   vkconfig load-spec --add --conf /opt/vertica/packages/kafka/config/vkconfig.conf \
+     --load-spec KafkaSpec \
+     --parser kafkajsonparser \
+     --load-method DIRECT \
+     --message-max-bytes 1000000
+
+   # cluster options
+   vkconfig cluster --add --conf /opt/vertica/packages/kafka/config/vkconfig.conf \
+     --cluster KafkaCluster \
+     --hosts my-cluster-kafka-bootstrap.kafka:9092
+
+   # source options
+   vkconfig source --add --conf /opt/vertica/packages/kafka/config/vkconfig.conf \
+     --cluster KafkaCluster \
+     --source 
KafkaTopic1 \
+     --partitions 1
+
+   # microbatch options
+   vkconfig microbatch --add --conf /opt/vertica/packages/kafka/config/vkconfig.conf \
+     --microbatch KafkaBatch1 \
+     --add-source KafkaTopic1 \
+     --add-source-cluster KafkaCluster \
+     --target-schema public \
+     --target-table KafkaFlex \
+     --rejection-schema public \
+     --rejection-table KafkaFlex_rej \
+     --load-spec KafkaSpec
+   ```
+
+### Launch the scheduler
+
+After you configure the scheduler options, you can deploy it in launcher mode. Pass the same values file so that the rest of your configuration is preserved across the upgrade:
+
+```shell
+helm upgrade --namespace main vk1 vertica-charts/vertica-kafka-scheduler \
+  -f vertica-kafka-scheduler.yaml \
+  --set "launcherEnabled=true"
+```
+
+### Test the deployment
+
+Now that you have a containerized Kafka cluster and VerticaDB CR running, you can test that the scheduler is automatically sending data from the Kafka producer to Vertica:
+
+1. In the terminal that is running your Kafka producer, send sample JSON data:
+   ```shell
+   >{"a": 1}
+   >{"a": 1000}
+   ```
+1. In a different terminal, open `vsql`, build the flex table view, and query it for the data:
+   ```sql
+   => SELECT compute_flextable_keys_and_build_view('KafkaFlex');
+                          compute_flextable_keys_and_build_view
+   --------------------------------------------------------------------------------------------------------
+    Please see public.KafkaFlex_keys for updated keys
+    The view public.KafkaFlex_view is ready for querying
+   (1 row)
+
+   => SELECT a FROM KafkaFlex_view;
+     a
+   ------
+       1
+    1000
+   (2 rows)
+   ```
+
+## Parameters
+
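+You can set any of the following parameters with `--set` on the command line, or collect them in a values file and pass it with `-f`. As a sketch, the following values file overrides a few common settings using values from the sample manifests above; the database host is a placeholder:
+
+```yaml
+# values.yaml: illustrative overrides; see the parameter list below
+image:
+  tag: 24.1.0
+replicaCount: 1
+conf:
+  generate: true
+  content:
+    config-schema: Scheduler
+    username: dbadmin
+    dbport: '5433'
+    dbhost: 10.20.30.40   # placeholder Vertica host
+```
+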
+
affinity
+
Applies affinity rules that constrain the scheduler to specific nodes.
+ +
conf.configMapName
+
Name of the ConfigMap to use and optionally generate. If omitted, the chart picks a suitable default.
+ +
conf.content
+
Set of key-value pairs in the generated ConfigMap. If conf.generate is false, this setting is ignored.
+ +
conf.generate
+
When set to true, the Helm chart controls the creation of the vkconfig.conf ConfigMap.
+
Default: true
+ +
fullNameOverride
+
Gives the Helm chart full control over the name of the objects that get created. This takes precedence over nameOverride.
+ +
initializerEnabled
+
When set to true, the initializer pod is created. This can be used to run any setup tasks needed.
+
Default: true
+ +
image.pullPolicy
+
How often Kubernetes pulls the image. For details, see Updating Images in the Kubernetes documentation.
+
Default: IfNotPresent
+ +
image.repository
+
The image repository and name that contains the Vertica Kafka Scheduler.
+
Default: vertica/kafka-scheduler
+ +
image.tag
+
Version of the Vertica Kafka Scheduler. This setting must match the version of the Vertica server that the scheduler connects to.
+
Default: Helm chart's appVersion
+ +
imagePullSecrets
+
List of Secrets that contain the required credentials to pull the image.
+ +
launcherEnabled
+
When set to true, the Helm chart creates the launch deployment. Enable this setting after you configure the scheduler options in the container.
+
Default: true
+ +
jvmOpts
+
Values to assign to the VKCONFIG_JVM_OPTS environment variable in the pods. + + > **NOTE** + > You can omit most truststore and keystore settings because they are set by tls.* parameters.
+ +
nameOverride
+
Controls the name of the objects that get created. This is combined with the Helm chart release to form the name.
+ +
nodeSelector
+
nodeSelector that controls where the pod is scheduled.
+ +
podAnnotations
+
Annotations that you want to attach to the pods.
+ +
podSecurityContext
+
Security context for the pods.
+ +
replicaCount
+
Number of launch pods that the chart deploys.
+
Default: 1
+ +
resources
+
CPU and memory resources to request for the pod.
+ +
securityContext
+
Security context for the container in the pod.
+ +
serviceAccount.annotations
+
Annotations to attach to the ServiceAccount.
+ +
serviceAccount.create
+
When set to true, a ServiceAccount is created as part of the deployment.
+
Default: true
+ +
serviceAccount.name
+
Name of the service account. If this parameter is not set and serviceAccount.create is set to true, a name is generated using the fullname template.
+ +
tls.enabled
+
When set to true, the scheduler is set up for TLS authentication.
+
Default: false
+ +
tls.keyStoreMountPath
+
Directory name where the keystore is mounted in the pod. This setting controls the name of the keystore within the pod. The full path to the keystore is constructed by combining this parameter and tls.keyStoreSecretKey.
+ +
tls.keyStorePassword
+
Password that protects the keystore. If this setting is omitted, then no password is used.
+ +
tls.keyStoreSecretKey
+
Key within tls.keyStoreSecretName that is used as the keystore file name. This setting and tls.keyStoreMountPath form the full path to the key in the pod.
+ +
tls.keyStoreSecretName
+
Name of an existing Secret that contains the keystore. If this setting is omitted, no keystore information is included.
+ +
tls.trustStoreMountPath
+
Directory name where the truststore is mounted in the pod. This setting controls the name of the truststore within the pod. The full path to the truststore is constructed by combining this parameter with tls.trustStoreSecretKey.
+ +
tls.trustStorePassword
+
Password that protects the truststore. If this setting is omitted, then no password is used.
+ +
tls.trustStoreSecretKey
+
Key within tls.trustStoreSecretName that is used as the truststore file name. This is used with tls.trustStoreMountPath to form the full path to the key in the pod.
+ +
tls.trustStoreSecretName
+
Name of an existing Secret that contains the truststore. If this setting is omitted, then no truststore information is included.
+ +
tolerations
+
Applies tolerations that control where the pod is scheduled.
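+
+As an example of how the tls.* parameters fit together, the following values-file sketch mounts a keystore and truststore from existing Secrets. All Secret names, keys, and passwords shown here are illustrative:
+
+```yaml
+tls:
+  enabled: true
+  keyStoreSecretName: scheduler-keystore      # existing Secret that contains the keystore (illustrative)
+  keyStoreSecretKey: keystore.jks             # key in the Secret; becomes the keystore file name
+  keyStoreMountPath: /opt/vertica/tls         # keystore path in the pod: /opt/vertica/tls/keystore.jks
+  keyStorePassword: changeit                  # illustrative password
+  trustStoreSecretName: scheduler-truststore  # existing Secret that contains the truststore (illustrative)
+  trustStoreSecretKey: truststore.jks
+  trustStoreMountPath: /opt/vertica/tls       # truststore path in the pod: /opt/vertica/tls/truststore.jks
+  trustStorePassword: changeit
+```
+
+With values like these, jvmOpts does not need explicit keystore or truststore flags, because the tls.* parameters control them.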
+