[bitnami/mongodb] Enabling TLS using a custom certificate runs into multiple issues. - user creation / current_primary detection / all probes #16719

Closed
dtrts opened this issue May 17, 2023 · 12 comments · Fixed by #25397
Labels
mongodb solved tech-issues The user has a technical issue about an application

Comments

@dtrts
Contributor

dtrts commented May 17, 2023

Name and Version

bitnami/mongodb 13.9.4

What architecture are you using?

amd64

What steps will reproduce the bug?

  1. Create a TLS certificate with a single wildcard domain. *.mongodb.example.com
  2. Enable TLS in the chart with requireTLS.

a) The root users are unable to be created during the startup
b) New nodes are unable to register with the primary host
c) The probes fail to run commands
d) The metrics server is unable to connect to mongod

The root cause of this issue is that our TLS certificate can only cover a single (wildcard) domain.

The certificate cannot have 127.0.0.1, localhost or any of the k8s domains on there.
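
To make the restriction concrete, here is a minimal sketch of the usual single-label wildcard rule, under which `*.mongodb.example.com` covers exactly one label in front of the suffix. The helper `host_matches_san` and the release name `myrelease` are hypothetical, and real TLS libraries apply further rules (case-insensitivity, IP SANs) that are not modelled here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if hostname $1 is covered by certificate name $2.
# Assumption: a leading "*." wildcard matches exactly one DNS label.
host_matches_san() {
  local host="$1" san="$2"
  if [[ "$san" == \*.* ]]; then
    local suffix="${san#\*}"              # e.g. ".mongodb.example.com"
    [[ "$host" == *"$suffix" ]] || return 1
    local label="${host%"$suffix"}"       # the part in front of the suffix
    [[ -n "$label" && "$label" != *.* ]]  # one non-empty label, no extra dots
  else
    [[ "$host" == "$san" ]]
  fi
}

# Example hosts the chart tries to reach during startup (names are illustrative).
for h in myrelease-0.mongodb.example.com 127.0.0.1 localhost \
         myrelease-0.myrelease-headless.default.svc.cluster.local; do
  if host_matches_san "$h" '*.mongodb.example.com'; then
    echo "$h: covered"
  else
    echo "$h: NOT covered"
  fi
done
```

Only the first host passes, which is why every connection that goes through 127.0.0.1, localhost or the headless service names fails verification.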

Are you using any custom parameters or values?

tls: 
  enabled: true 
  standalone/replicaset/hidden/arbiter:
  existingSecret(s): [ "mongodb-example-com-cert" ]
externalAccess: 
  enabled: true
  autoDiscovery: 
    enabled: true
  annotations:
      external-dns.alpha.kubernetes.io/hostname: "{{ .targetPod }}.mongodb.example.com"
  hidden:
    enabled: true
    service:
      annotations:
        external-dns.alpha.kubernetes.io/hostname: "{{ .targetPod }}.mongodb.example.com"

SIDE NOTE: We have enabled autodiscovery to mount the /shared volume and populate info.txt.
The DNS address we use does not match the value returned by the autoDiscovery script, since we are using external-dns on AWS.
Our FQDN is releasename-0.mongodb.example.com and autoDiscovery returns the AWS load balancer address.
If we switch auto-discovery off then the MONGODB_ADVERTISED_HOSTNAME reverts to the loadbalancer IPs which we do not want.


What is the expected behavior?

The mongodb database initializes correctly:

  • root user and additional users are created
  • replicaset config is configured and nodes join existing replica sets.

The probes are able to connect to mongod using tls
The metrics server connects to mongod using tls

What do you see instead?

  • All commands of mongosh time out since the TLS requirements are not met.
  • The script continues, configuring mongo.conf and restarting the mongod server.
  • Mongod is then running but without any users or a replicaset configured.

Additional information

We have implemented significant workarounds to get the chart working. I think these are good candidates for being included in the Chart (and additionally the Container)

Below I describe the various points where the certificate restrictions cause problems.


Current Primary / Mongodb Advertised Hostname / Mongodb Initial Primary Host.

Description

The MONGODB_ADVERTISED_HOSTNAME env var is populated through autoDiscovery.sh and takes its default from the container env vars. It is used as the hostname when configuring the replica set, and it is also compared against the hostname returned from an existing replica set to verify whether the current node is the primary.

The MONGODB_INITIAL_PRIMARY_HOST env var is used as the host when configuring a new secondary node onto an existing replicaset.

The current_primary variable is used in the setup.sh script and populated through a mongosh function. The chart only uses the hostnames for the headless service.

The environment variables can be overwritten through the chart extraEnvVars: but this is not sufficient.
MONGODB_ADVERTISED_HOSTNAME is overwritten in the setup.sh script, either through the autoDiscovery.sh init container or using other details for the load balancer.

Issues

All three of these variables cause issues with our TLS set-up.

The current_primary node cannot be found, so the replica set configuration cannot be applied. Even if the connection succeeded, the hostname added to the replica set would also fail TLS verification.

Workaround

First we override the environment variables with our domain name:

extraEnvVars:
      - name: MONGODB_INITIAL_PRIMARY_HOST
        value: {{ printf "%s-0.%s" (include "mongodb.fullname" .) "mongodb.example.com" }}

      - name: MONGODB_ADVERTISED_HOSTNAME
        value: $(MY_POD_NAME).mongodb.example.com

This sets the defaults to use our Valid Domain name and will ensure that the replicaset config can be done.
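
As a rough illustration of the names these overrides resolve to, the per-pod FQDNs can be listed in plain shell. This is only a sketch: `build_tls_host_list` is a hypothetical helper, and the release name `myrelease`, the replica count and port 27017 are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<fullname>-<i>.<domain>:<port>" for every replica,
# joined with commas.
build_tls_host_list() {
  local fullname="$1" domain="$2" replicas="$3" port="$4"
  local hosts=() i
  for ((i = 0; i < replicas; i++)); do
    hosts+=("${fullname}-${i}.${domain}:${port}")
  done
  local IFS=,            # join the array elements with commas
  echo "${hosts[*]}"
}

build_tls_host_list myrelease mongodb.example.com 3 27017
```

The first entry corresponds to the value used for MONGODB_INITIAL_PRIMARY_HOST, and each pod's own entry to its MONGODB_ADVERTISED_HOSTNAME.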

We alter the args: of the container to run a preparatory script. This script runs first to perform some small changes. It also creates an altered copy of the /scripts/setup.sh file, and then runs the altered script.

The preparatory script first inserts our Valid FQDN for that pod into the response from the initContainer.

    echo -n "$MONGODB_ADVERTISED_HOSTNAME" > /shared/info.txt

Next it takes /scripts/setup.sh and replaces the connection string for the current_primary variable with our preferred host list.
NOTE: This requires us to mount an additional emptyDir volume since we cannot alter the script in place

    {{- $replicaCount := int .Values.replicaCount }}
    {{- $portNumber := int .Values.service.ports.mongodb }}
    {{- $fullname := include "mongodb.fullname" . }}
    {{- $releaseNamespace := include "mongodb.namespace" . }}
    {{- $clusterDomain := .Values.mongodb.clusterDomain }}
    {{- $loadBalancerIPListLength := len .Values.externalAccess.service.loadBalancerIPs }}
    {{- $mongoList := list }}
    {{- /* New: collects the host list that is valid for the TLS certificate. */ -}}
    {{- $mongoListTLS := list }}
    {{- range $e, $i := until $replicaCount }}
    {{- $mongoList = append $mongoList (printf "%s-%d.%s-headless.%s.svc.%s:%d" $fullname $i $fullname $releaseNamespace $clusterDomain $portNumber) }}
    {{- /* New: builds the connection string from the TLS addresses. */ -}}
    {{- $mongoListTLS = append $mongoListTLS (printf "%s-%d.%s:%d" $fullname $i "mongodb.example.com" $portNumber) }}
    {{- end }}

    CONNECTION_STRING_OLD="\"{{ join "," $mongoList }}\""
    CONNECTION_STRING_NEW="\"{{ join "," $mongoListTLS }}\""

    CONNECTION_STRING_NEW="$CONNECTION_STRING_NEW --tls --tlsCertificateKeyFile=/certs/mongodb.pem --tlsCAFile=/certs/mongodb-ca-cert"

    info "Old Connection String: $CONNECTION_STRING_OLD"
    info "New Connection String: $CONNECTION_STRING_NEW"

    for sourceFile in "/scripts/setup.sh" "/scripts/setup-hidden.sh"; do
      if [[ -f "$sourceFile" ]]; then
        destFile="/alteredScripts/$(basename -- "$sourceFile")"
        info "Amending $sourceFile and moving to $destFile..."

        # Escape regex metacharacters in the search string and "/", "\" and "&" in the replacement.
        ESCAPED_CONNECTION_STRING_OLD=$(printf '%s\n' "$CONNECTION_STRING_OLD" | sed -e 's/[]\/$*.^[]/\\&/g')
        ESCAPED_CONNECTION_STRING_NEW=$(printf '%s\n' "$CONNECTION_STRING_NEW" | sed -e 's/[\/&]/\\&/g')

        sed "s/$ESCAPED_CONNECTION_STRING_OLD/$ESCAPED_CONNECTION_STRING_NEW/g" "$sourceFile" > "$destFile"

        chmod 0755 "$destFile"
      fi
    done

We can now run /alteredScripts/setup.sh or /alteredScripts/setup-hidden.sh as required.
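
The escaping step is the fragile part of this substitution, so here is a small self-contained sketch of the same technique that can be tried outside the pod. `replace_literal` is a hypothetical helper, and the sample hostnames are made up.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace every literal occurrence of $1 with $2 on stdin.
replace_literal() {
  local old_escaped new_escaped
  # Escape regex metacharacters so the search string is matched literally.
  old_escaped=$(printf '%s\n' "$1" | sed -e 's/[]\/$*.^[]/\\&/g')
  # Escape "/", "\" and "&", which are special in a sed replacement.
  new_escaped=$(printf '%s\n' "$2" | sed -e 's/[\/&]/\\&/g')
  sed "s/${old_escaped}/${new_escaped}/g"
}

old='"a-0.a-headless.ns.svc.cluster.local:27017"'
new='"a-0.mongodb.example.com:27017" --tls --tlsCAFile=/certs/mongodb-ca-cert'
echo "mongosh --host $old" | replace_literal "$old" "$new"
```

Without the first escape, the dots in the old host list would match any character; without the second, the slashes in the certificate paths would terminate the sed expression early.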

We also needed a wait to ensure the hostnames we want to reach are ready.

    for mongodb_host in "${MONGODB_INITIAL_PRIMARY_HOST}" "${MONGODB_ADVERTISED_HOSTNAME}"; do
      info "Testing host: $mongodb_host..."
      for i in {1..180}; do
        if getent ahosts "$mongodb_host"; then
          break
        fi
        sleep 1
        debug "Waiting for $mongodb_host to be ready... $i"
        if [ "$i" == "180" ]; then
          warn "Unable to connect to host: $mongodb_host"
          exit 1
        fi
      done
    done

This waits for up to 3 minutes; if the hosts still cannot be resolved, the pod exits and restarts.
We have increased the startupProbe delay by 2 minutes to avoid a CrashLoopBackoff slowing down the startup.
(Perhaps a better balance can be found between the wait in the pod and the risk of triggering a long crashLoopBackoff?)
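
One way to experiment with that balance is to factor the wait into a generic retry helper with a configurable attempt count and delay. The following is a sketch only; `retry_until` is a hypothetical name and is not part of the Bitnami scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: retry_until MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_TRIES attempts have been made.
retry_until() {
  local max_tries="$1" delay="$2" attempt
  shift 2
  for ((attempt = 1; attempt <= max_tries; attempt++)); do
    if "$@"; then
      return 0
    fi
    # Do not sleep after the final failed attempt.
    if ((attempt < max_tries)); then
      sleep "$delay"
    fi
  done
  return 1
}

# Usage sketch (mirrors the loop above: 180 attempts, 1 second apart):
#   retry_until 180 1 getent ahosts "$MONGODB_ADVERTISED_HOSTNAME" >/dev/null || exit 1
```

Keeping the attempt count and delay as arguments makes it easy to shorten the in-pod wait and rely on the startup probe instead, or the other way around.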

BONUS FEATURE - Attach to existing replicaSet

In our prep script we have also added an option to completely overwrite the connection string for current_primary using a hardcoded string from the values.

    {{- if .Values.global.existingReplicaSetConnectionString }}
    CONNECTION_STRING_NEW="\"{{ .Values.global.existingReplicaSetConnectionString }}\""
    {{- end }}

Provided that the replicaset name, key and connection string match an existing replicaset, all nodes are deployed as secondaries of that replica set.

We are using this for migration purposes.
Once the data has been synced we can move the primary to the cluster, update our connection strings, remove the additional value and destroy the old nodes.

Suggested Chart Changes

To implement something similar in the chart, some or all of these changes could be made:

  • Provide a method to manually set MONGODB_ADVERTISED_HOSTNAME even when autoDiscovery is disabled
    • OR
    • provide a way to customise autoDiscovery so that it can take/output our custom domain. This initContainer could also verify the DNS address is reachable.
  • Provide a method to change the structure of the hosts generated in the host list for populating current_primary. This way we can guarantee a TLS connection.
  • Bonus: Provide a way to override the host list when populating current_primary. This will enable deploying all nodes as part of an existing replica set.

The environment variable override is working well, but it would be good to find some way of tying these changes together and avoiding bad combinations.

TLS Connection over 127.0.0.1

Description

The libmongodb.sh script runs commands against localhost (127.0.0.1) during initialization. These operations create all the users and, importantly, configure the connection to a replicaset.

Issues

All these commands fail through TLS unless 127.0.0.1 is part of the certificate. (Uncommon for external certificates)

Workaround

During setup the mongod server is bound to the localhost IP.

There are two options for this workaround:

  1. Provide a way of connecting to 127.0.0.1 using a domain in the certificate
  2. Use the external domain and alter the mongodb.conf to bind all ips before the first time mongod is started allowing connections from outside the pod.

We preferred option 1.

Workaround 1

First add a localhost.mongodb.example.com domain to our certificate and route that domain to 127.0.0.1

hostAliases:
- ip: 127.0.0.1
  hostnames:
  - "localhost.mongodb.example.com"

We add another step to the preparatory script described above to alter the libmongodb.sh file.

sed -i "s/127\.0\.0\.1/localhost.mongodb.example.com/g" /opt/bitnami/scripts/libmongodb.sh

This file can be altered in place and then you can continue as normal!
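
Before touching the real library, the substitution can be tried on a scratch file. This is a sketch under assumptions: `rewrite_loopback` is a hypothetical helper, and it assumes the replacement hostname contains no `/` or `&` characters. The dots are escaped so that only the literal address 127.0.0.1 is replaced, and a `.bak` copy is kept.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace the literal loopback address in FILE with NAME,
# keeping a backup at FILE.bak.
rewrite_loopback() {
  local file="$1" name="$2"
  sed -i.bak "s/127\.0\.0\.1/${name}/g" "$file"
}

# Try it on a scratch file first.
scratch=$(mktemp)
printf '%s\n' 'mongosh --host 127.0.0.1 --port 27017' > "$scratch"
rewrite_loopback "$scratch" "localhost.mongodb.example.com"
cat "$scratch"
rm -f "$scratch" "$scratch.bak"
```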

NOTE: I have been unable to find a way to add a pod specific domain to the hosts file. If this is possible we could remove the need for a localhost.* domain.

Workaround 2:

Similar to workaround 1, we replace the 127.0.0.1 in libmongodb.sh, this time with the advertised hostname.

Since this will resolve to an external IP, we also need to bind to all IPs before mongod starts.

Something like:

sed -i "s/127\.0\.0\.1/${MONGODB_ADVERTISED_HOSTNAME}/g" /opt/bitnami/scripts/libmongodb.sh


# Load libraries
. /opt/bitnami/scripts/libmongodb.sh
# Load environment
. /opt/bitnami/scripts/mongodb-env.sh
mongodb_set_listen_all_conf "$MONGODB_CONF_FILE"

There is a risk that a bad actor will come along and create a root user before you can since the server is now exposed, but it is minimal.

Suggested Chart Changes

The fix for this will require a change to the scripts in the container.

  • Introduce an environment variable to use in place of 127.0.0.1 in the libmongodb.sh script. (MONGODB_LOCALHOST_NAME?)
  • The chart changes to provide this environment variable.

The scripts could also check that this host only resolves to 127.0.0.1, or the chart could set it as a default hostAlias.
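
A minimal sketch of what such a lookup could look like in the container scripts, assuming the variable name MONGODB_LOCALHOST_NAME proposed above. The helper `mongodb_local_host` is hypothetical and does not exist in libmongodb.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the host to use for in-pod connections.
# Falls back to 127.0.0.1 when MONGODB_LOCALHOST_NAME is unset or empty,
# so existing deployments would keep their current behaviour.
mongodb_local_host() {
  printf '%s\n' "${MONGODB_LOCALHOST_NAME:-127.0.0.1}"
}

# Usage sketch:
#   MONGODB_LOCALHOST_NAME=localhost.mongodb.example.com
#   mongosh --host "$(mongodb_local_host)" ...
```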

Probes don't work

Description

The probes connect with the TLS options, but do not specify a hostname which defaults to 127.0.0.1.

Workaround

This issue has forced me to provide custom probes which are only altered to provide the hostname of MONGODB_ADVERTISED_HOSTNAME

We could also use MONGODB_LOCALHOST_NAME / localhost.mongodb.example.com if it is available through the TLS certificate and configured in the hostAliases.

This has the benefit of routing traffic internally, so it is directly checking on the pod health and not implicitly checking network connectivity.
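
For illustration, a custom probe along these lines could assemble its mongosh arguments as follows. This is a sketch, not the chart's probe script: `build_probe_args` is a hypothetical helper, port 27017 is an assumed default, and the certificate paths are the ones used in the connection string earlier in this issue.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one mongosh argument per line for a TLS ping.
# Host defaults to MONGODB_ADVERTISED_HOSTNAME, then to 127.0.0.1.
build_probe_args() {
  local host="${1:-${MONGODB_ADVERTISED_HOSTNAME:-127.0.0.1}}"
  local port="${2:-27017}"
  printf '%s\n' --host "$host" --port "$port" --tls \
    "--tlsCertificateKeyFile=/certs/mongodb.pem" \
    "--tlsCAFile=/certs/mongodb-ca-cert" \
    --eval "db.adminCommand('ping')"
}

# Usage sketch (requires bash 4+ for mapfile):
#   mapfile -t probe_args < <(build_probe_args localhost.mongodb.example.com)
#   mongosh "${probe_args[@]}"
```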

Metrics Can't Connect

Description

The metrics sidecar uses localhost as the host in its connection string by default.

As a result, the sidecar cannot connect to mongod because the TLS hostname check fails.

Workaround

I have manually defined the args for the sidecar to use localhost.mongodb.example.com.

Suggested Chart Changes

Having access to MONGODB_ADVERTISED_HOSTNAME, /shared/info.txt or MONGODB_LOCALHOST_NAME would make a more DRY approach to correcting this connection string.
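
As a sketch of that DRY approach, the connection string could be built from a single host value. `build_metrics_uri` is a hypothetical helper; the `tls`, `tlsCertificateKeyFile` and `tlsCAFile` options are standard MongoDB connection string options, but whether the exporter version in use honours all of them is an assumption, and credentials are left out.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a TLS connection URI for the metrics sidecar.
build_metrics_uri() {
  local host="${1:-localhost}" port="${2:-27017}"
  printf 'mongodb://%s:%s/admin?tls=true&tlsCertificateKeyFile=%s&tlsCAFile=%s\n' \
    "$host" "$port" "/certs/mongodb.pem" "/certs/mongodb-ca-cert"
}

build_metrics_uri "${MONGODB_LOCALHOST_NAME:-localhost.mongodb.example.com}"
```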

(There are other issues with metrics, but that is outside the scope of this issue)

@dtrts dtrts added the tech-issues The user has a technical issue about an application label May 17, 2023
@github-actions github-actions bot added the triage Triage is needed label May 17, 2023
@dtrts
Contributor Author

dtrts commented May 17, 2023

My apologies for the rambling length of the issue. I will attempt to edit it down if I have time.

@migruiz4
Member

Hi @dtrts,

Thank you very much for reporting this issue and for all the details you provided in the issue description.

There is no need to edit and reduce the length of the issue, I think all the information and examples you provided are helpful.

I see you put a lot of thought into the issues you encountered in your use case and that you already got to the technical part of the issue. Would you like to contribute with a PR?

Otherwise, I will need to move this issue to on-hold and create an internal task, which may take a bit until we can fit it into our priorities.

@dtrts
Contributor Author

dtrts commented May 18, 2023

Hi @migruiz4

I have made one PR to enable the configuration of the localhost name during set up: bitnami/containers#34297

I will look at creating a PR for the chart depending on how that PR is received.

Will there be any issues with the chart requiring a change in the scripts embedded on the image?

@dtrts
Contributor Author

dtrts commented May 20, 2023

I've started to look at a PR for these issues and I'm not sure where to start.

I notice in the Redis chart a similar issue has been addressed: #8570

This specifically implements options for handling ExternalDNS. This aligns with my current issue and if implemented would give me a great way to solve the above. I can supply the domain suffix to the external DNS options, it would be used to generate the FQDNs in the INITIAL_PRIMARY_HOST and ADVERTISED_HOSTNAME environment variables. We could even pull out the FQDN from the service annotation during the autoDiscovery init container and add in a wait for the domain to respond to a getent command.

This approach is specifically geared toward the usage of external DNS, which even though it is a popular option does limit more custom setups.

If a user specifies a LoadBalancer service with IPs, or a ClusterIP service and they manage the DNS routing themselves, then they would still run into issues with the TLS certificates I imagine.

The other approach is to just provide a list of external domains / an external domain which can get formatted.

I think it is also worth noting that the hidden node only registers with the initial primary host. If the -0 pod is down then it won't join the replica set. I will also look into adding all the replicaset hosts + hidden hosts to the current_primary search.

@dtrts
Contributor Author

dtrts commented May 20, 2023

Another thing I don't fully understand is the headless service. Would that ever be used as a way to connect to the pods through a custom DNS address?

If we are providing an external access service, and ensuring that all connections are made through that connection string, are we able to disable the headless service?

@github-actions

github-actions bot commented Jun 5, 2023

This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.

@github-actions github-actions bot added the stale 15 days without activity label Jun 5, 2023
@dtrts
Contributor Author

dtrts commented Jun 5, 2023

Bump.

I've been unable to find the time to contribute, would it be possible to create an internal task so that these issues don't get lost?

We currently have production workarounds in place, but I think it would be difficult to translate them into permanent changes.

@github-actions github-actions bot removed the stale 15 days without activity label Jun 6, 2023
@github-actions

This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.

@github-actions github-actions bot added the stale 15 days without activity label Jun 22, 2023
@github-actions

Due to the lack of activity in the last 5 days since it was marked as "stale", we proceed to close this Issue. Do not hesitate to reopen it later if necessary.

@bitnami-bot bitnami-bot closed this as not planned Won't fix, can't repro, duplicate, stale Jun 27, 2023
@rafariossaa
Contributor

Hi,
There is an internal task to review this.
Meanwhile feel free to send a PR, we will be glad to review and merge it.

@rafariossaa rafariossaa reopened this Jul 6, 2023
@rafariossaa rafariossaa added on-hold Issues or Pull Requests with this label will never be considered stale and removed solved labels Jul 6, 2023
@github-actions github-actions bot added triage Triage is needed and removed on-hold Issues or Pull Requests with this label will never be considered stale labels Jul 6, 2023
@carrodher carrodher added on-hold Issues or Pull Requests with this label will never be considered stale and removed stale 15 days without activity triage Triage is needed labels Jul 6, 2023
@rrileyca
Contributor

rrileyca commented Apr 4, 2024

We have this issue - related: #16341 (comment)

What we do is:

  1. For the initial install, we run mongo without TLS configured.
  2. Connect to Mongo manually and add the FQDNs in the ReplicaSet config using mongosh:

cfg = rs.conf()
cfg.members[0].host = "mongodb-0.my.domain.com:27017"
cfg.members[1].host = "mongodb-1.my.domain.com:27017"
cfg.members[2].host = "mongodb-2.my.domain.com:27017"
rs.reconfig(cfg)

  3. Add the coreDNS translation as described here.
  4. Finally, redeploy the chart with the TLS values set and everything works. Our certs are signed by Let's Encrypt and trusted implicitly.

This process only has to be run on the initial install.
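
For larger replica sets, the reconfig commands from step 2 could be generated rather than typed by hand. The following is a sketch only: `gen_reconfig_js` is a hypothetical helper, and it assumes that member index i in rs.conf() corresponds to pod ordinal i, which should be checked against the actual configuration first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print mongosh commands that point each replica set
# member at "<prefix>-<i>.<domain>:<port>".
# Assumption: cfg.members[i] belongs to pod ordinal i.
gen_reconfig_js() {
  local prefix="$1" domain="$2" port="$3" count="$4" i
  echo 'cfg = rs.conf()'
  for ((i = 0; i < count; i++)); do
    printf 'cfg.members[%d].host = "%s-%d.%s:%s"\n' \
      "$i" "$prefix" "$i" "$domain" "$port"
  done
  echo 'rs.reconfig(cfg)'
}

gen_reconfig_js mongodb my.domain.com 27017 3
```

The output can be reviewed and then pasted into a mongosh session connected to the primary.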

It's really not ideal but it was the best solution I could come up with without sinking a tonne of dev time into the chart. Hopefully someone else has time to fix this.

@fmulero
Collaborator

fmulero commented Apr 25, 2024

Hi @dtrts and @rrileyca

I've just created a PR #25397 to solve these problems. It is still a draft but I hope to close it soon. Your thoughts and comments are welcome

@github-actions github-actions bot added solved and removed on-hold Issues or Pull Requests with this label will never be considered stale labels May 10, 2024
8 participants