Cortex 1.13.0 Ingester Readiness probe failed; context deadline exceeded (Client.Timeout exceeded while awaiting headers) #391

Closed
vk0125 opened this issue Aug 21, 2022 · 4 comments

Comments

@vk0125

vk0125 commented Aug 21, 2022

Labels: bug

I have started Cortex with a basic setup consisting of nginx, the distributor, and the ingester only. The ingester readiness probe fails consistently, whether I deploy it as a Deployment or a StatefulSet.

Desired Result
All instances are up and running end to end. The ingester should be able to write data to the Storage Account.

Testing
Even if I remove the readiness probe for a while, the pod shows as running but, unlike the distributor, the ingester doesn't show any connection to memberlist.

Below are the environment setup details:

  • Cloud provider: Azure
  • Cluster: AKS [Latest version]
  • Long term Storage : Azure Storage Account [Blob storage]
  • Helm chart Repo: cortex-helm-charts

values-override.yaml file is as below:

alertmanager:
  enabled: false
ruler:
  enabled: false
query_scheduler:
  enabled: false
querier:
  enabled: false
query_frontend:
  enabled: false
overrides_exporter:
  enabled: false
store_gateway:
  enabled: false
compactor:
  enabled: false

config:
  auth_enabled: false
  storage:
    engine: blocks
  blocks_storage:
    backend: azure
    azure:
      account_name: "****"
      account_key: "****"
      container_name: "cortex"
      endpoint_suffix: "core.windows.net"
    tsdb:
      dir: /tmp/cortex/tsdb
    bucket_store:
      sync_dir: /tmp/cortex/tsdb-sync
  server:
    http_listen_port: 9009
    grpc_server_max_recv_msg_size: 104857600
    grpc_server_max_send_msg_size: 104857600
    grpc_server_max_concurrent_streams: 1000
  ingester:
    lifecycler:
      join_after: 0s
      min_ready_duration: 0s
      final_sleep: 0s
  ingester_client:
    grpc_client_config:
      max_recv_msg_size: 104857600
      max_send_msg_size: 104857600
      grpc_compression: gzip
  distributor:
    ring:
      kvstore:
        store: memberlist
nginx:
  service:
    type: LoadBalancer
    annotations:
      service.beta.kubernetes.io/azure-load-balancer-internal: "true"
  autoscaling:
    enabled: true
    minReplicas: 1
    maxReplicas: 2
  resources:
    limits:
      memory: 512M
      cpu: 0.5
    requests:
      memory: 256M
      cpu: 0.2

ingester:
  statefulSet:
    enabled: true
  resources:
    requests:
      memory: 2G
      cpu: 1
    limits:
      memory: 2G
      cpu: 1
      
distributor:
  resources:
    limits:
      memory: 512M
      cpu: 0.5
    requests:
      memory: 256M
      cpu: 0.2

Distributor Logs:

cortexuser@cortexvm:~/cortex-helm-chart-master$ kubectl logs cortex-distributor-7d64cc7447-g2zpz   -n cortex
level=info caller=main.go:193 msg="Starting Cortex" version="(version=1.13.0, branch=HEAD, revision=69139ac)"
level=info caller=server.go:260 http=[::]:9009 grpc=[::]:9095 msg="server listening on addresses"
level=info caller=memberlist_client.go:395 msg="Using memberlist cluster node name" name=cortex-distributor-7d64c
level=info caller=module_service.go:64 msg=initialising module=server
level=info caller=module_service.go:64 msg=initialising module=memberlist-kv
level=info caller=module_service.go:64 msg=initialising module=runtime-config
level=info caller=module_service.go:64 msg=initialising module=ring
level=info caller=ring.go:269 msg="ring doesn't exist in KV store yet"
level=info caller=module_service.go:64 msg=initialising module=distributor-service
level=info caller=cortex.go:436 msg="Cortex started"
level=info caller=memberlist_client.go:523 msg="joined memberlist cluster" reached_nodes=1

Ingester Logs:

level=info caller=main.go:193 msg="Starting Cortex" version="(version=1.13.0, branch=HEAD, revision=69139ac)"
level=info caller=server.go:260 http=[::]:9009 grpc=[::]:9095 msg="server listening on addresses"

Ingester Describe pod details:

  Type     Reason     Age                      From               Message
  ----     ------     ----                     ----               -------
  Normal   Scheduled  27m                      default-scheduler  Successfully assigned cortex/cortex-ingester-0
  Warning  BackOff    26m (x10 over 27m)       kubelet            Back-off restarting failed container
  Normal   Pulled     26m (x5 over 27m)        kubelet            Container image "quay.io/cortexproject/cortex:v1.13.0" already present on machine
  Normal   Created    26m (x5 over 27m)        kubelet            Created container ingester
  Normal   Started    26m (x5 over 27m)        kubelet            Started container ingester
  Warning  Unhealthy  2m40s (x151 over 24m)    kubelet            Readiness probe failed: Get "http://172.17.0.6:9009/ready": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

I tried some debugging steps that I found on Google:

cortexuser@cortexvm:~/cortex-helm-chart-master$ curl https://172.17.0.6:9009/ready
curl: (7) Failed to connect to 172.17.0.6 port 9009: No route to host

Please advise if there is anything incorrect in my approach.

@nschad
Collaborator

nschad commented Aug 22, 2022

I tried some debugging found on google:
cortexuser@cortexvm:~/cortex-helm-chart-master$ curl https://172.17.0.6:9009/ready
curl: (7) Failed to connect to 172.17.0.6 port 9009: No route to host

That seems like a Linux/CNI issue, especially given the "No route to host" message. Maybe there is a bug in the helm chart related to changing http_listen_port, but I couldn't find anything. Can you try with the default port? (Simply remove that line from your values.)

@nschad
Collaborator

nschad commented Aug 22, 2022

Also, and probably more likely, you might have run into this: #322

Try setting "publishNotReadyAddresses" to true.
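
For readers unfamiliar with the setting, here is a minimal sketch of a headless Service with `publishNotReadyAddresses` enabled, so that DNS returns pod addresses before the pods pass their readiness probes. The Service name `cortex-memberlist` and the `cortex` namespace come from this thread. The selector label, port name, and port number are assumptions for illustration and may not match what the helm chart actually renders.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: cortex-memberlist
  namespace: cortex
spec:
  # Headless service: DNS resolves directly to pod IPs.
  clusterIP: None
  # Publish pod addresses even while the pods are not Ready, so that
  # memberlist peers can discover each other during startup.
  publishNotReadyAddresses: true
  selector:
    # Assumed label; check the labels on your distributor/ingester pods.
    app.kubernetes.io/part-of: memberlist
  ports:
    # Assumed gossip port; adjust if memberlist binds to a different port.
    - name: gossip
      port: 7946
      targetPort: 7946
      protocol: TCP
```

Without this flag, a pod that is not yet Ready is left out of the Service's DNS records, which can keep ring members from finding each other while they are still starting up.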

@vk0125
Author

vk0125 commented Aug 22, 2022

@nschad thanks for reviewing the issue. I tried the default http_listen_port (8080) and also added publishNotReadyAddresses to cortex-memberlist (the dynamic headless service that has the Distributor and Ingester pods as endpoints), but there was no progress.
The Ingester shows the same status.

I am using "memberlist" as the Ingester ring KV store (the default set by cortex-helm-charts for the ingester).

As expected, I ran into issue #322.
image

Distributor logs got updated as below:
caller=memberlist_logger.go:74 level=warn msg="Failed to resolve cortex-memberlist: lookup cortex-memberlist on 10.251.0.10:53: no such host"

MemberList status as shown below:
image

Distributor ring status as shown below:
image

Ingester ring status as shown below:
image

image

My only concern is with this output. Is the config below expected?

image

@vk0125
Author

vk0125 commented Aug 24, 2022

With debug mode on, I found that the Ingester component was not able to connect to Azure Blob storage.
The endpoint_suffix under the blocks_storage config was causing the issue.
What I passed:
endpoint_suffix: "core.windows.net"
What is expected:
endpoint_suffix: "blob.core.windows.net"

The correct config allowed the Ingester to complete the connection to Azure and pass the readiness probe.
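
For reference, the relevant part of values-override.yaml with the fix applied would look roughly like the sketch below. The storage keys are taken from the values posted at the top of this issue. The `log_level: debug` entry is an assumption about how debug logging could be enabled and is not something confirmed in this thread.

```yaml
config:
  server:
    # Assumption: optional, raises log verbosity so storage connection
    # errors become visible in the ingester logs.
    log_level: debug
  blocks_storage:
    backend: azure
    azure:
      account_name: "****"
      account_key: "****"
      container_name: "cortex"
      # Must include the "blob." prefix; "core.windows.net" alone did not work.
      endpoint_suffix: "blob.core.windows.net"
```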

@vk0125 vk0125 closed this as completed Aug 24, 2022