
[bitnami/mongodb] Mongodb Replicaset existingSecrets doesn't work with cert-manager #16341

Open
rrileyca opened this issue May 2, 2023 · 14 comments
Labels: mongodb, on-hold, tech-issues

@rrileyca (Contributor) commented May 2, 2023

Name and Version

bitnami/mongodb 13.9.4

What architecture are you using?

amd64

What steps will reproduce the bug?

When setting values.yaml with the following values:

tls:
  enabled: true
  autoGenerated: false
  replicaset:
    existingSecrets:
      - mongodb-tls-0
      - mongodb-tls-1
      - mongodb-tls-2

Each secret must have a field ca.crt, as documented in the values.yaml comment. However, Jetstack cert-manager (arguably a very common, if not "standard", Kubernetes application) does not necessarily produce that field: with ACME issuers such as Let's Encrypt, the issued Secret contains only tls.crt and tls.key. Without the ability to inject the field manually via the Helm chart, there is no out-of-the-box integration.
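For illustration, you can check which keys cert-manager actually put into one of those Secrets (the secret and namespace names are taken from the values and commands in this issue; jq is assumed to be available):

kubectl get secret mongodb-tls-0 -n mongodb -o json | jq -r '.data | keys[]'
# An ACME-issued certificate typically yields only tls.crt and tls.key, with no ca.crt,
# while the chart's init container expects ca.crt to be present.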

Are you using any custom parameters or values?

values:

tls:
  enabled: true
  autoGenerated: false
  replicaset:
    existingSecrets:
      - mongodb-tls-0
      - mongodb-tls-1
      - mongodb-tls-2

What is the expected behavior?

Either the Helm chart should be able to deploy without a ca.crt, or it should provide a way to inject one.

What do you see instead?

The Pods fail during init.

Running the command:

kubectl logs mongodb-0 -n mongodb -c generate-tls-certs 

I see the output:

cp: cannot stat '/certs-0/ca.crt': No such file or directory
chmod: cannot access '/certs/mongodb-ca-cert': No such file or directory

Additional information

No response

@rrileyca added the tech-issues label on May 2, 2023
@github-actions bot added the triage label on May 2, 2023
@rrileyca (Contributor, Author) commented May 2, 2023

I actually did something similar to the Kafka chart, see here: #9422

@javsalgar changed the title from "Mongodb Replicaset existingSecrets doesn't work with cert-manager" to "[bitnami/mongodb] Mongodb Replicaset existingSecrets doesn't work with cert-manager" on May 3, 2023
@github-actions bot added the in-progress label and removed the triage label on May 3, 2023
@bitnami-bot assigned jotamartos and unassigned javsalgar on May 3, 2023
@jotamartos (Contributor) commented:

Hi @rrileyca,

Thank you for taking the time to report this issue. Would you like to contribute again by creating a PR to solve it? As you are familiar with the solution and the repo, you will be able to make the necessary changes to get this working. The Bitnami team will be happy to review it and provide feedback. As always, you can find the contributing guidelines here.

Thanks

@rrileyca (Contributor, Author) commented May 8, 2023

Hi @jotamartos,

I'm actually kind of confused as to how this chart would support any PKI certificates in ReplicaSet architecture. There are several problems:

  1. A given MongoDB host requires that the rs.conf().members array contain a host name that resolves to itself ("itself" meaning one of the IP addresses on the host). In the case of a Kubernetes Pod, that means the value in the ReplicaSet configuration needs to be in the /etc/hosts file. I don't know where this occurs, but currently only the headless svc name is added to the hosts file. I don't see a way around this, unless there is a way to resolve a k8s fabric IP address for the container somehow (see the sketch after this list).
  2. The headless Service can't be used with PKI. I define PKI (at least for this explanation) as public key infrastructure, meaning a Certificate Authority accessible on the Internet. The domain svc.cluster.local will never be signed by a PKI CA, as you cannot prove ownership of that domain.
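A rough sketch of the check involved, run from inside one of the replica set Pods (the host name is illustrative, and auth/TLS flags for mongosh are omitted):

# List the host names recorded in the replica set configuration
mongosh --quiet --eval 'rs.conf().members.map(function (m) { return m.host; })'
# Each of those names must resolve to one of this Pod's own addresses
getent hosts mongodb-0.mongodb-headless.mongodb.svc.cluster.local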

A potential solution here might be to add the FQDN to the hosts file when .values.externalAccess is set, but it's not clear to me where this should occur. Where is the headless service name added to the hosts file?

What I have described above is a separate but related issue. If Bitnami can give me direction on the above comments, I will open a separate issue and PR to fix it.

@dtrts (Contributor) commented May 18, 2023

Hi @rrileyca - It looks like we have been on a very similar journey.

Your points about how the domains used throughout the chart do not integrate well with a TLS certificate provided by cert-manager have also been causing us trouble.

I have described my approaches to some of these issues here: #16719
I have a PR open to help ease the initialization of the databases here: bitnami/containers#34297

As you've said, we have been adding an FQDN to the hosts file using the hostAliases: value.
Unfortunately I haven't yet found a way to add the node-specific domain to the hosts file through the chart, so I have resorted to using localhost.external-domain-name.com as the hostAlias and ensuring our certificate works for that address.


I still believe taking the same approach as the kafka chart would be super valuable.

I am also working with a certificate which is missing the ca.crt attribute. I've resorted to setting tls.enabled: false and then reimplementing TLS through additional values and scripts, all to get around the fact that the init container will fail.

@dtrts (Contributor) commented May 18, 2023

I have created a PR to match the kafka changes: #16731

I was wondering if we should also consider an option to use /etc/ssl/certs/ca-certificates.crt as the CA file.
It was a valid workaround for our issues since it worked for our cert issuer.
(NOTE: It does not work for Let's Encrypt staging environments, since that root is not as widely distributed.)
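As a rough sketch of that idea (this is not an option the chart exposes today, and the certificate path below is illustrative), the corresponding mongod configuration would be something like:

net:
  tls:
    mode: requireTLS
    certificateKeyFile: /certs/mongodb.pem        # illustrative path to the server certificate
    CAFile: /etc/ssl/certs/ca-certificates.crt    # system trust bundle instead of a dedicated ca.crt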

I have a few more experiments to run but will be running out of time this week; let me know if this helps things progress.

@jotamartos (Contributor) commented:

Thank you for taking the time to create this PR. I just saw that the team already reviewed and merged it. I hope that solves the issues you found.

@rrileyca (Contributor, Author) commented:

I think it solves this problem, thanks. Special thanks to @dtrts.

I wonder what solution you have for the FQDN problem. No public ACME CA will sign a certificate for 127.0.0.1 or any hostname ending in .cluster.local.

For me, I manually edited the replica set config to make the hostnames the FQDN, and then I configured CoreDNS to rewrite the FQDN to the svc.cluster.local name so that the pods would resolve their own fabric IP and believe they are in the replica set. I couldn't quite think of a way to solve it entirely within the chart.
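Roughly, that manual edit amounts to something like the following (a sketch only, run against the current primary; host names are placeholders and auth/TLS flags are omitted):

mongosh "mongodb://mongodb-0.mongodb-headless.mongodb.svc.cluster.local:27017/admin" --eval '
  var cfg = rs.conf();
  cfg.members.forEach(function (m, i) { m.host = "mongodb-" + i + ".mydomain.com:27017"; });
  rs.reconfig(cfg);
'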

I may file a separate issue but wanted to check with you first @dtrts.

@dtrts (Contributor) commented May 22, 2023

@rrileyca All thanks to your Kafka fix!

I'm not sure I follow your question completely, but here are the other fixes I applied:

Configuration:

  • Added this to my values:

    hostAliases:
    - ip: 127.0.0.1
      hostnames:
      - "localhost.my-custom-domain.com"

  • This requires the ability to add that domain to your certificate.

Pre setup.sh script

Before I let the setup script run, I make a bunch of changes.

Replace hostname in libmongodb.sh

Running this command:

sed -i "s/127.0.0.1/localhost.my-custom-domain.com/g" /opt/bitnami/scripts/libmongodb.sh

It alters the lines in libmongodb.sh that hard-code 127.0.0.1.

This fixes the initialization issue. We're still connecting to 127.0.0.1 thanks to the hostAlias addition, but we're also coming through our domain name. This passes the TLS check without having to bind the MongoDB server to all IPs beforehand.

(If you can't create a certificate like this, then there is the more insecure option of altering the config file to bind to all IPs beforehand. You can then use your external domain name, but the traffic will go through "the internet".)

# Load libraries
. /opt/bitnami/scripts/libmongodb.sh
# Load environment
. /opt/bitnami/scripts/mongodb-env.sh
# Bind mongod to all interfaces instead of only the loopback and pod addresses
mongodb_set_listen_all_conf

# Rewrite the hard-coded loopback address to this pod's external FQDN
sed -i "s/127.0.0.1/${MY_POD_NAME}.my-custom-domain.com/g" /opt/bitnami/scripts/libmongodb.sh

Amending Advertised Hostname

If you have autoDiscovery enabled, you will be grabbing the LoadBalancer address; otherwise the svc.cluster.local address is used.

The MONGODB_ADVERTISED_HOSTNAME is initially populated by the chart. In my pre-setup script I export the FQDN of the first pod.

If autoDiscovery is enabled then you can just overwrite the /shared/info.txt file it generates:

echo -n "${MY_POD_NAME}.my-custom-domain.com" > /shared/info.txt

Current Primary

This one required the most effort.

When starting, a new node will try to find the existing primary to register with. This helps if your initial primary node has lost all of its data and should start as a secondary.
https://github.com/bitnami/charts/blob/a907ce5ee58e6cce0728cd4309073b12bd9e3054/bitnami/mongodb/templates/replicaset/scripts-configmap.yaml#LL119C10-L119C10

The hostnames it uses are the domains for the headless service. This will time out due to TLS, and then it will fall back to MONGODB_INITIAL_PRIMARY_HOST as the current primary, but that will also not work due to TLS.

The easy option for this is to export your own MONGODB_INITIAL_PRIMARY_HOST before the setup script runs. You have to let the original command time out, but that's not too bad.
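For example, something like this in a pre-setup script (the domain is a placeholder):

# Point the scripts at an externally resolvable primary instead of the headless Service name
export MONGODB_INITIAL_PRIMARY_HOST="mongodb-0.my-custom-domain.com"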

The hard option is to find and replace the connection string with one you generate yourself using the domains you want.

The benefit here is that you can put in the connection string of an existing replica set. This allows you to deploy all nodes as secondaries of a completely different cluster.
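A hypothetical replacement connection string for that approach (hosts, replica set name and options are purely illustrative):

# Illustrative only; credentials, TLS options and the replica set name depend on your deployment
mongodb://mongodb-0.my-custom-domain.com:27017,mongodb-1.my-custom-domain.com:27017,mongodb-2.my-custom-domain.com:27017/admin?tls=true&replicaSet=rs0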


I think the MONGODB_ADVERTISED_HOSTNAME is the most vital one for you, as it is used to add nodes to the replica set. That way, once you look at rs.conf(), it will contain only your custom domain.

I think that's it. But I've already covered this in more detail here:

#16719

@rrileyca (Contributor, Author) commented:

@dtrts That's an interesting solution. I wouldn't say it's "proper" to add localhost.yourdomain.com to a certificate, but it is functional and clever. Hopefully your PR that allows the $HOSTNAME of a pod to be configured in the chart goes through.

My solution was to edit the rs.conf() (Replica Set configuration) and set all the host fields to be the FQDN. This presents a problem, though, as the MongoDB internal isSelf() command essentially does a DNS lookup and compares the resolved IP address to the listening IP addresses (in this case, 127.0.0.1 and the Pod's fabric IP). Since Pod fabric IPs change almost every time a Pod is re-created, you cannot add a static DNS entry for this.

My solution was to create a CoreDNS rewrite rule, but this is platform-dependent, as your Kubernetes cluster must be running CoreDNS. My understanding is that this is fairly common, but not universal. In any case, the rewrite rule allows you to intercept a DNS request and edit the name being looked up. The rules look like this:

apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns-custom
  namespace: kube-system
data:
  my.server: |
    mongodb-0.mydomain.com:53 {
      log
      errors
      rewrite stop {
        name mongodb-0.mydomain.com. mongodb-0.mongodb-headless.mongodb.svc.cluster.local
      }
      forward . 127.0.0.1
    }
    mongodb-1.mydomain.com:53 {
      log
      errors
      rewrite stop {
        name mongodb-1.mydomain.com. mongodb-1.mongodb-headless.mongodb.svc.cluster.local
      }
      forward . 127.0.0.1
    }
    mongodb-2.mydomain.com:53 {
      log
      errors
      rewrite stop {
        name mongodb-2.mydomain.com. mongodb-2.mongodb-headless.mongodb.svc.cluster.local
      }
      forward . 127.0.0.1
    }

So when a request for mongodb-0.mydomain.com comes in, it is rewritten to mongodb-0.mongodb-headless.mongodb.svc.cluster.local and then passed along to the CoreDNS resolver, which returns the Pod's fabric IP to the client looking up the name. This allows TLS to function properly and lets intra-cluster traffic avoid the LoadBalancer.
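Assuming the names above, the rewrite can be verified from inside the cluster; the lookup should return the Pod's fabric IP rather than an external address:

kubectl exec -n mongodb mongodb-0 -- getent hosts mongodb-0.mydomain.com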

@github-actions (bot):

This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.

@github-actions bot added the stale label on Jun 11, 2023
@rrileyca (Contributor, Author) commented:

Bump

@github-actions bot removed the stale label on Jun 12, 2023
@jotamartos (Contributor) commented:

Hi @rrileyca,

I understand you want to keep this ticket open to continue discussing with @dtrts the issue you are running into, correct? That's not a problem at all, just confirming.

As you know, if your use case doesn't work as expected when using our solution, you can always contribute to improve it.

Thanks

@rrileyca (Contributor, Author) commented:

Hello @jotamartos,

@dtrts and I have proposed some workarounds, but the chart is still broken. If I don't bump the thread then the issue gets closed by your bot, and then the chart remains broken. I can appreciate why you'd close stale issues in scenarios where an issue is unclear or the reporter stops answering questions required to proceed, but that isn't the case here. The chart is still affected by this issue.

I don't have time to contribute at the moment. How does Bitnami file bugs like this? I think it is worth fixing. I doubt we will be the last users to try to use cert-manager with the MongoDB helm chart. Does Bitnami not contribute to the Helm charts themselves anymore?

@jotamartos (Contributor) commented Jun 19, 2023

Hi @rrileyca,

Yes, we continue maintaining our solutions and updating them when an issue is reported. I'm going to create a task on our side to review the solution and the information you posted here, but please note that our bandwidth is limited and it can take some time for this to be fixed. That's why we always suggest contributing to improve the solution: this way you help the community, and our team will review the changes to make sure they follow best practices. If you do not have time to contribute, do not worry; as I mentioned above, I'll create a task to work on this in the future.

Note: I'll add the on-hold label to prevent the ticket from being closed by the bot.

Thanks
