
Issue with Exported Services in Federated Consul Cluster Using Peering #4322

Open
andrei-ziminov opened this issue Sep 14, 2024 · 1 comment
Labels
type/bug Something isn't working

Comments

andrei-ziminov commented Sep 14, 2024

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request. Searching for pre-existing feature requests helps us consolidate datapoints for identical requirements into a single place, thank you!
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request.
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment.

The Problem:

Hi Consul team,

I noticed some unexpected behavior in my federated Consul cluster setup. Here’s a breakdown of the current architecture (the full cluster list and Helm values are under Environment details below):

I’ve followed the tutorial below to configure the federation:
https://developer.hashicorp.com/consul/docs/k8s/deployment-configurations/multi-cluster/kubernetes

Everything is working as expected so far — I can centrally manage ACLs and interact with clusters from the primary cluster.

I’m now trying to centralize logging by exporting specific services, like Loki (registered via Connect Inject in the service mesh), from my secondary clusters to my primary cluster. For this, I’ve tried using the PeeringAcceptor and PeeringDialer CRDs. Additionally, I established a Mesh CRD connection using the Consul CLI.
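
For reference, the peering itself was set up roughly along these lines (a sketch; the resource and secret names are illustrative, and which cluster accepts versus dials is shown only as an example):

```yaml
# In cluster-management [Primary]: accept the peering and have consul-k8s write
# the generated peering token into a Kubernetes secret (names are illustrative).
apiVersion: consul.hashicorp.com/v1alpha1
kind: PeeringAcceptor
metadata:
  name: application-dev
spec:
  peer:
    secret:
      name: peering-token-application-dev
      key: data
      backend: kubernetes
---
# In application-dev [Secondary]: dial the primary using the copied token secret.
apiVersion: consul.hashicorp.com/v1alpha1
kind: PeeringDialer
metadata:
  name: cluster-management
spec:
  peer:
    secret:
      name: peering-token-application-dev
      key: data
      backend: kubernetes
```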

Both connections were established successfully. However, once cluster-management [Primary] is peered with application-dev [Secondary] and I try to export a service from the secondary (see Reproduction Steps below), the following errors occur.

```
2024-09-14T18:43:58.711Z [ERROR] agent.http: Request error: method=GET url=/v1/config/exported-services/default?dc=application-dev from=10.96.2.91:36496 error="Config entry not found for 'exported-services' / 'default'"
2024-09-14T18:43:58.711Z [ERROR] agent.http: Request error: method=PUT url=/v1/config?dc=application-dev from=10.96.2.91:36496 error="exported-services writes must not target secondary datacenters."
```
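
For completeness, the same lookup can be done from the CLI (a sketch; I’m assuming the standard `consul config` subcommands hit the same `/v1/config` endpoints as the requests in the log above):

```shell
# Check whether the exported-services entry exists in the secondary datacenter
# (this returns the 404 "Config entry not found" from the log above, as I understand it).
consul config read -kind exported-services -name default -datacenter application-dev

# List all exported-services entries in that datacenter, to rule out a naming issue.
consul config list -kind exported-services -datacenter application-dev
```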

What I Tried:

I reviewed the relevant documentation, including the federation tutorial linked above, and followed all the steps carefully.

The Question:

Why does Consul throw the error: "exported-services writes must not target secondary datacenters"? Is there a limitation when exporting services between primary and secondary datacenters in a federated setup? Or is there something else I’m missing?
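
To make the question concrete, this is the CLI-level view as I understand it (a sketch; `exported-services.hcl` is a hypothetical file containing the raw config entry, sketched under Reproduction Steps below, and I have not verified the second command):

```shell
# 1) What the CRD controller in application-dev appears to do, and what the PUT in
#    the log above shows being rejected (assumption: the CRD maps to a plain config write).
consul config write -datacenter application-dev exported-services.hcl

# 2) What the error message seems to demand instead. It is unclear to me whether this
#    can export loki-write, which is only registered in the secondary datacenter.
consul config write -datacenter cluster-management exported-services.hcl
```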

I appreciate any guidance or clarification on this issue!

Reproduction Steps

  1. Peering is successfully established between cluster-management [Primary] and application-dev [Secondary].

  2. In application-dev [Secondary], I created the following resource to export the loki-write service to the primary cluster:

    apiVersion: consul.hashicorp.com/v1alpha1
    kind: ExportedServices
    metadata:
      name: default
      namespace: cluster-system
    spec:
      services:
        - name: loki-write
          consumers:
            - peer: cluster-management
  3. After creating this resource, Consul throws the following errors in application-dev [Secondary]:

    2024-09-14T18:43:58.711Z [ERROR] agent.http: Request error: method=GET url=/v1/config/exported-services/default?dc=application-dev from=10.96.2.91:36496 error="Config entry not found for 'exported-services' / 'default'"
    2024-09-14T18:43:58.711Z [ERROR] agent.http: Request error: method=PUT url=/v1/config?dc=application-dev from=10.96.2.91:36496 error="exported-services writes must not target secondary datacenters."
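
For completeness, my understanding is that the ExportedServices resource from step 2 corresponds to a raw config entry roughly like the following (a sketch of the hypothetical `exported-services.hcl` referenced above, based on the exported-services config entry format):

```hcl
Kind = "exported-services"
Name = "default"

Services = [
  {
    Name = "loki-write"
    Consumers = [
      {
        Peer = "cluster-management"
      }
    ]
  }
]
```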
    

Logs

```
2024-09-14T18:43:58.711Z [ERROR] agent.http: Request error: method=GET url=/v1/config/exported-services/default?dc=application-dev from=10.96.2.91:36496 error="Config entry not found for 'exported-services' / 'default'"
2024-09-14T18:43:58.711Z [ERROR] agent.http: Request error: method=PUT url=/v1/config?dc=application-dev from=10.96.2.91:36496 error="exported-services writes must not target secondary datacenters."
```

Expected behavior

The ExportedServices config entry is accepted in application-dev [Secondary], and the loki-write service becomes available to the cluster-management [Primary] peer.

Environment details

Consul Helm chart version on all clusters: 1.5.3
Kubernetes version on all clusters: 1.29.6

Cluster Setup:

  • cluster-management [Primary]
  • application-dev [Secondary]
  • application-qa [Secondary]
  • application-prod [Secondary]

Configuration:

  • Primary cluster:

```yaml
  - name: consul
    namespace: {{ .Values.namespace }}
    chart: hashicorp/consul
    version: 1.5.3
    values:
      - global:
          name: consul
          datacenter: {{ .Values.consul.global.datacenter }}
          peering:
            enabled: true
          tls:
            enabled: true
            enableAutoEncrypt: true
            verify: true
            serverAdditionalDNSSANs:
              - "consul-server.{{ .Values.namespace }}.svc.cluster.local"

          federation:
            enabled: true
            createFederationSecret: true
            
          acls:
            enabled: true
            manageSystemACLs: true
            createReplicationToken: true
            nodeSelector: |
              {{ .Values.consul.global.nodeSelector }}
            
          gossipEncryption:
            autoGenerate: true
            
          metrics:
            enabled: true
            enableAgentMetrics: true
            agentMetricsRetentionTime: "1m"
            
          storage:
            size: {{ .Values.consul.storage.size }}
            
          nodeSelector: |
            {{ .Values.consul.global.nodeSelector }}

      - server:
          replicas: {{ .Values.consul.replicas }}
          bootstrapExpect: {{ .Values.consul.bootstrapExpect }}
          disruptionBudget:
            maxUnavailable: 0
          securityContext:
            runAsNonRoot: false
            runAsUser: 0
          nodeSelector: |
            {{ .Values.consul.global.nodeSelector }}

      - ui:
          enabled: true
          service:
            enabled: true
            type: "ClusterIP"
          metrics:
            enabled: true
            provider: "prometheus"
            baseURL: http://prometheus-kube-prometheus-prometheus.monitoring.svc.cluster.local

      - controller:
          enabled: true

      - prometheus:
          enabled: false

      - grafana:
          enabled: false

      - client:
          enabled: true
          grpc: true

      - connectInject:
          enabled: true
          default: false
          metrics:
            defaultEnabled: true
          transparentProxy:
            defaultEnabled: true
          nodeSelector: |
            {{ .Values.consul.global.nodeSelector }}

      - syncCatalog:
          enabled: true
          default: false
          toConsul: true
          toK8S: false
          nodeSelector: |
            {{ .Values.consul.global.nodeSelector }}
            
      - meshGateway:
          enabled: true
          nodeSelector: |
            {{ .Values.consul.global.nodeSelector }}
      - webhookCertManager:
          nodeSelector: |
            {{ .Values.consul.global.nodeSelector }}
```

  • Secondary clusters:

```yaml
- name: consul
  namespace: {{ .Values.namespace }}
  chart: hashicorp/consul
  version: 1.5.3
  values:
    - global:
        name: consul
        datacenter: {{ .Values.consul.global.datacenter }}
        peering:
          enabled: true
        tls:
          enabled: true
          enableAutoEncrypt: true
          verify: true
          caCert:
            secretName: consul-federation
            secretKey: caCert
          caKey:
            secretName: consul-federation
            secretKey: caKey
          serverAdditionalDNSSANs:
            - "consul-server.{{ .Values.namespace }}.svc.cluster.local"
        
        federation:
          enabled: {{ .Values.consul.global.federation.enabled }}
          k8sAuthMethodHost: {{ .Values.consul.global.federation.k8sAuthMethodHost }}
          primaryDatacenter: {{ .Values.consul.global.federation.primaryDatacenter }} # cluster-management
        
        acls:
          manageSystemACLs: true
          replicationToken:
            secretName: consul-federation
            secretKey: replicationToken
          nodeSelector: |
            {{ .Values.consul.global.nodeSelector }}

        gossipEncryption:
          secretName: consul-federation
          secretKey: gossipEncryptionKey
          
        metrics:
          enabled: true
          enableAgentMetrics: true
          agentMetricsRetentionTime: "1m"
        
        storage:
          size: {{ .Values.consul.storage.size }}
        
        nodeSelector: |
          {{ .Values.consul.global.nodeSelector }}

    - server:
        replicas: {{ .Values.consul.replicas }}
        bootstrapExpect: {{ .Values.consul.bootstrapExpect }}
        disruptionBudget:
          maxUnavailable: 0
        securityContext:
          runAsNonRoot: false
          runAsUser: 0
        nodeSelector: |
          {{ .Values.consul.global.nodeSelector }}
        extraVolumes:
          - type: secret
            name: consul-federation
            items:
              - key: serverConfigJSON
                path: config.json
            load: true
          
    - ui:
        enabled: true
        service:
          enabled: true
          type: "ClusterIP"
        metrics:
          enabled: true
          provider: "prometheus"
          baseURL: http://prometheus-prometheus.monitoring.svc.cluster.local
          
    - controller:
        enabled: true
        
    - prometheus:
        enabled: false
        
    - grafana:
        enabled: false
        
    - client:
        enabled: true
        grpc: true
    - connectInject:
        enabled: true
        default: false
        metrics:
          defaultEnabled: true
        transparentProxy:
          defaultEnabled: true
        nodeSelector: |
          {{ .Values.consul.global.nodeSelector }}

    - syncCatalog:
        enabled: true
        default: false
        toConsul: true
        toK8S: false
        nodeSelector: |
          {{ .Values.consul.global.nodeSelector }}

    - meshGateway:
        enabled: true
        nodeSelector: |
          {{ .Values.consul.global.nodeSelector }}
    - webhookCertManager:
        nodeSelector: |
          {{ .Values.consul.global.nodeSelector }}
```

Features in use:

  • Mesh Gateways
  • ACLs with Federation
  • Gossip Encryption
@andrei-ziminov andrei-ziminov added the type/bug Something isn't working label Sep 14, 2024
@andrei-ziminov
Author

The issue still persists, even with the newest Consul version. Could it be that federated datacenters cannot be used together with cluster peering?
