Google chat notifications are not received when bulk #83

Open
diegocejasprieto opened this issue Nov 8, 2024 · 5 comments

Comments

@diegocejasprieto

diegocejasprieto commented Nov 8, 2024

When multiple alerts fire at the same time for the same endpoint, only the first one is received in Google Chat. The rest seem to be discarded.
The pods show errors like "error sending message","error":"non ok response from gchat", which do not say exactly what is going on.

The app should include a mechanism to retry sending alerts when this happens, or space them a few seconds apart, to avoid these kinds of errors.

I'm using image ghcr.io/mr-karan/calert:v2.1.1, deployed via the Helm chart.

```
{"time":"2024-11-08T15:23:33.800146915Z","level":"INFO","source":{"function":"github.com/mr-karan/calert/internal/notifier.(*Notifier).Dispatch","file":"/home/runner/work/calert/calert/internal/notifier/notifier.go","line":41},"msg":"dispatching alerts","count":2}
{"time":"2024-11-08T15:23:33.800173642Z","level":"INFO","source":{"function":"github.com/mr-karan/calert/internal/providers/google_chat.(*GoogleChatManager).Push","file":"/home/runner/work/calert/calert/internal/providers/google_chat/google_chat.go","line":109},"msg":"dispatching alerts to google chat","count":2}
{"time":"2024-11-08T15:23:33.800273706Z","level":"INFO","source":{"function":"github.com/mr-karan/calert/internal/notifier.(*Notifier).Dispatch","file":"/home/runner/work/calert/calert/internal/notifier/notifier.go","line":41},"msg":"dispatching alerts","count":1}
{"time":"2024-11-08T15:23:33.800346855Z","level":"INFO","source":{"function":"github.com/mr-karan/calert/internal/providers/google_chat.(*GoogleChatManager).Push","file":"/home/runner/work/calert/calert/internal/providers/google_chat/google_chat.go","line":109},"msg":"dispatching alerts to google chat","count":1}
{"time":"2024-11-08T15:23:33.877335705Z","level":"ERROR","source":{"function":"github.com/mr-karan/calert/internal/providers/google_chat.(*GoogleChatManager).Push","file":"/home/runner/work/calert/calert/internal/providers/google_chat/google_chat.go","line":140},"msg":"error sending message","error":"non ok response from gchat"}
{"time":"2024-11-08T15:23:33.957092268Z","level":"ERROR","source":{"function":"github.com/mr-karan/calert/internal/providers/google_chat.(*GoogleChatManager).Push","file":"/home/runner/work/calert/calert/internal/providers/google_chat/google_chat.go","line":140},"msg":"error sending message","error":"non ok response from gchat"}
```
@diegocejasprieto
Author

diegocejasprieto commented Nov 11, 2024

Additional info: I put the pod in debug mode and got the following:

```
{"time":"2024-11-11T12:46:37.226655855Z","level":"DEBUG","source":{"function":"github.com/mr-karan/calert/internal/providers/google_chat.(*GoogleChatManager).sendMessage","file":"/home/runner/work/calert/calert/internal/providers/google_chat/message.go","line":110},"msg":"Non OK HTTP Response received from Google Chat Webhook endpoint","status":429,"responseBody":"{\n  \"error\": {\n    \"code\": 429,\n    \"message\": \"Resource has been exhausted (e.g. check quota).\",\n    \"status\": \"RESOURCE_EXHAUSTED\"\n  }\n}\n"}
```

I did some research, and it seems the Google Chat API cannot handle multiple messages sent at the same time (I didn't hit any other quota, such as a limit on messages per hour).
That's why Calert should have a mechanism to retry those alerts instead of discarding them.
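The debug log above shows that Google Chat returns a JSON error body along with the 429. As a minimal sketch, assuming the body always has the `{"error": {"code", "message", "status"}}` shape seen in that log, decoding it would let the error name the real cause instead of the generic "non ok response from gchat". The names `googleAPIError` and `describeChatError` are hypothetical, not part of calert.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// googleAPIError mirrors the error body seen in the debug log:
// {"error": {"code": 429, "message": "...", "status": "RESOURCE_EXHAUSTED"}}.
// Hypothetical type; only these three fields are assumed to exist.
type googleAPIError struct {
	Error struct {
		Code    int    `json:"code"`
		Message string `json:"message"`
		Status  string `json:"status"`
	} `json:"error"`
}

// describeChatError turns a non-OK webhook response into a readable error.
// If the body is not the expected JSON, it falls back to the HTTP status only.
func describeChatError(statusCode int, body []byte) error {
	var apiErr googleAPIError
	if err := json.Unmarshal(body, &apiErr); err != nil || apiErr.Error.Status == "" {
		return fmt.Errorf("non ok response from gchat: http %d", statusCode)
	}
	return fmt.Errorf("non ok response from gchat: http %d %s: %s",
		statusCode, apiErr.Error.Status, apiErr.Error.Message)
}
```

With the body from the log, the error would read `non ok response from gchat: http 429 RESOURCE_EXHAUSTED: Resource has been exhausted (e.g. check quota).`, which points straight at rate limiting.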

@e100

e100 commented Nov 22, 2024

I have confirmed that it is related to how quickly messages are sent to the Google API.
I tried a 50 ms delay between messages and still had the problem, so I tried 500 ms, which seems to work.

```diff
diff --git a/internal/providers/google_chat/google_chat.go b/internal/providers/google_chat/google_chat.go
index 953c84d..df07d37 100644
--- a/internal/providers/google_chat/google_chat.go
+++ b/internal/providers/google_chat/google_chat.go
@@ -143,6 +143,7 @@ func (m *GoogleChatManager) Push(alerts []alertmgrtmpl.Alert) error {
                        }
 
                        m.metrics.Duration(fmt.Sprintf(`alerts_dispatched_duration_seconds{provider="%s", room="%s"}`, m.ID(), m.Room()), now)
+            time.Sleep(500 * time.Millisecond)
                }
        }
```

Google recommends retrying a few times when this happens, with a longer delay between each attempt (truncated exponential backoff):
https://developers.google.com/workspace/chat/limits#resolve_time-based_quota_errors

@diegocejasprieto
Author

diegocejasprieto commented Nov 22, 2024 via email

@allexf

allexf commented Dec 5, 2024

Same problem. Adding `time.Sleep(500 * time.Millisecond)` as shown in the post above didn't help. Apparently a retry mechanism is needed.

@allexf

allexf commented Dec 6, 2024

I changed the sendMessage function in internal/providers/google_chat/message.go slightly so that it resends the request when a non-OK response is received. It doesn't look pretty, but it works for me.

```go
// Needs these imports in message.go: bytes, encoding/json, fmt, io, net/http, net/url, time.
func (m *GoogleChatManager) sendMessage(msg ChatMessage, threadKey string) error {
	// Retry policy of this patch: up to 3 attempts with a fixed 500 ms pause.
	const (
		maxAttempts = 3
		retryDelay  = 500 * time.Millisecond
	)

	out, err := json.Marshal(msg)
	if err != nil {
		return err
	}

	// Parse the webhook URL to add `?threadKey` param.
	u, err := url.Parse(m.endpoint)
	if err != nil {
		return err
	}
	q := u.Query()
	// Default behaviour is to start a new thread for every alert.
	q.Set("messageReplyOption", "MESSAGE_REPLY_OPTION_UNSPECIFIED")
	if m.threadedReplies {
		// If threaded replies are enabled, use the threadKey to reply to the same thread.
		q.Set("messageReplyOption", "REPLY_MESSAGE_FALLBACK_TO_NEW_THREAD")
		q.Set("threadKey", threadKey)
	}
	u.RawQuery = q.Encode()
	endpoint := u.String()

	for attempt := 1; attempt <= maxAttempts; attempt++ {
		// Build a fresh request on every attempt: Do consumes the body reader.
		req, err := http.NewRequest(http.MethodPost, endpoint, bytes.NewReader(out))
		if err != nil {
			return err
		}
		req.Header.Set("Content-Type", "application/json")

		// Send the request. Transport errors are returned without retrying.
		m.lo.Debug("sending alert", "url", endpoint, "msg", msg.Text, "attempt", attempt)
		resp, err := m.client.Do(req)
		if err != nil {
			return err
		}

		if resp.StatusCode == http.StatusOK {
			resp.Body.Close()
			return nil
		}

		// Read and close the body inside the loop; a deferred Close here
		// would keep every response open until the function returns.
		bodyBytes, readErr := io.ReadAll(resp.Body)
		resp.Body.Close()
		if readErr != nil {
			m.lo.Error("Failed to read response body", "error", readErr)
			return fmt.Errorf("failed to read response body: %w", readErr)
		}

		// Log the status code and response body at the debug level.
		m.lo.Debug("Non OK HTTP Response received from Google Chat Webhook endpoint",
			"status", resp.StatusCode, "responseBody", string(bodyBytes), "attempt", attempt)

		// Pause before the next attempt, but not after the last one.
		if attempt < maxAttempts {
			m.lo.Warn("retrying message to google chat", "status", resp.StatusCode, "attempt", attempt)
			time.Sleep(retryDelay)
		}
	}

	return fmt.Errorf("non ok response from gchat")
}
```

3 participants