
[Bug]: the buff of eventloop.cache used cannot be released #659

Closed
3 tasks done
szza opened this issue Nov 14, 2024 · 22 comments · Fixed by #660
Labels: bug, needs investigation, needs more info, waiting for response

Comments

@szza

szza commented Nov 14, 2024

Actions I've taken before I'm here

  • I've thoroughly read the documentation on this issue but still have no clue.
  • I've searched the GitHub Issues but didn't find any duplicate issues that have been resolved.
  • I've searched the internet for this issue but didn't find anything helpful.

What happened?

The buffer of eventloop.cache is always occupied.
1

Major version of gnet

v2

Specific version of gnet

github.com/panjf2000/gnet/v2 v2.5.2

Operating system

Linux

OS version

Linux 6.2.0-39-generic x86_64

Go version

1.21.6

Relevant log output

1

Code snippets (optional)

1

How to Reproduce

The buffer of eventloop.cache stays occupied even after data traffic stops.

Does this issue reproduce with the latest release?

It can be reproduced with the latest release.

@szza added the bug label Nov 14, 2024
@panjf2000
Owner

The eventloop.cache is supposed to be retained for the life of your gnet application; that's how it was designed.

Did it lead to any issues?

@panjf2000 added the invalid and waiting for response labels and removed the bug label Nov 14, 2024
@szza
Author

szza commented Nov 14, 2024

The eventloop.cache is supposed to be retained for the life of your gnet application; that's how it was designed.

Did it lead to any issues?

The issue is: when the data traffic is very low, the memory previously occupied by eventloop.cache cannot be released and keeps occupying a large amount of memory.

@panjf2000
Owner

There is one buffer for each event loop, and the number of event loops is typically equal to the number of CPUs. Besides, the buffer is not expected to be large because it only caches the bytes each time you read data from the connection.

When you said "a large amount of memory", how many bytes were you referring to? Any statistics you can show me?

@szza
Author

szza commented Nov 14, 2024

There is one buffer for each event loop, and the number of event loops is typically equal to the number of CPUs. Besides, the buffer is not expected to be large because it only caches the bytes each time you read data from the connection.

When you said "a large amount of memory", how many bytes were you referring to? Any statistics you can show me?

Memory usage is still at 71% after the data traffic decreased.

@panjf2000
Owner

This is uncanny; normally eventloop.cache shouldn't grow this big. Could you show me the code you wrote in OnTraffic?

@szza
Author

szza commented Nov 14, 2024

Here is the reduced code:

func (cm *connectionManager) OnTraffic(c gnet.Conn) (action gnet.Action) {
	for c.InboundBuffered() > 0 {
		// 1. decode one packet
		h, msg, err := ZeroCopyDecode(c)
		if err != nil {
			if err == io.ErrShortBuffer {
				// not enough bytes for a full packet yet, wait for more data
				return gnet.None
			}
			return gnet.Close
		}

		// ... handle the msg (elided in this reduced snippet)
		_, _ = h, msg
	}

	return gnet.None
}

// used for gnet to receive messages
func ZeroCopyDecode(c gnet.Reader) (*Header, pb.Message, error) {
	total, err := c.Peek(MaxIntSize)
	if err != nil {
		return nil, nil, err
	}
	lenSize, length := DecodeInt(total)
	if lenSize > length {
		return nil, nil, proto.ErrorInvalidPacket
	}
	buff, err := c.Peek(length)
	if err != nil {
		return nil, nil, err
	}

	defer c.Discard(length)
	return DocodeBody(buff[lenSize:])
}


func DocodeBody(buff []byte) (h *Header, msg pb.Message, err error) {
	// 1. decode header
	h = FetchHeader(0)
	h.ReadFromBytes(buff)

	// 2. decode body length
	bodySize, bodyLen := DecodeInt(buff[h.Size():])
	

	// 3. decode body
	msg = proto.NewMessage(h.Uri)
	if msg == nil || bodyLen == 0 {
		return
	}

	err = pb.Unmarshal(buff[h.Size()+bodySize:h.Size()+bodySize+bodyLen], msg)
	return
}

@szza
Author

szza commented Nov 14, 2024

This is uncanny; normally eventloop.cache shouldn't grow this big. Could you show me the code you wrote in OnTraffic?

The code has been pasted above.

@panjf2000
Owner

total, err := c.Peek(MaxIntSize)

What's this MaxIntSize? Is it the maximum value of the int type? If so, it should be the first thing you fix, because that's the root cause of this issue. You should peek one packet at a time; otherwise the eventloop.cache will be bloated if you peek a large size.
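
To illustrate the advice above, here is a minimal sketch of peeking one length-prefixed packet at a time with the gnet.Reader Peek/Discard calls already used in the snippet earlier. The 8-byte big-endian length prefix (prefixSize) and the codec package name are assumptions made for the example, not details confirmed by the reporter:

package codec

import (
	"encoding/binary"

	"github.com/panjf2000/gnet/v2"
)

// prefixSize is an assumed fixed width for the length prefix.
const prefixSize = 8

// decodeOnePacket peeks exactly one packet and discards it only after the
// payload has been copied out, so the event-loop cache never has to hold
// more than a single packet.
func decodeOnePacket(c gnet.Reader) ([]byte, error) {
	header, err := c.Peek(prefixSize)
	if err != nil {
		return nil, err // typically io.ErrShortBuffer: wait for more bytes
	}
	// total packet length, prefix included (assumed encoding)
	length := int(binary.BigEndian.Uint64(header))
	frame, err := c.Peek(length)
	if err != nil {
		return nil, err
	}
	payload := make([]byte, length-prefixSize)
	copy(payload, frame[prefixSize:]) // copy before Discard invalidates the peeked slice
	_, _ = c.Discard(length)
	return payload, nil
}

Copying the payload out before calling Discard keeps the returned slice valid, since the peeked slice may point into gnet's internal buffers.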

@szza
Author

szza commented Nov 14, 2024

total, err := c.Peek(MaxIntSize)

What's this MaxIntSize? Is it the maximum value of the int type? If so, it should be the first thing you fix, because that's the root cause of this issue. You should peek one packet at a time; otherwise the eventloop.cache will be bloated if you peek a large size.

const (
	MaxIntSize = 8
)

In addition, the location where the memory grew is 'buff, err := c.Peek(length)'.

@szza
Author

szza commented Nov 14, 2024

total, err := c.Peek(MaxIntSize)

What's this MaxIntSize? Is it the maximum value of the int type? If so, it should be the first thing you fix, because that's the root cause of this issue. You should peek one packet at a time; otherwise the eventloop.cache will be bloated if you peek a large size.

Actually, we decode one packet at a time, as in the code pasted above:

  1. ZeroCopyDecode a packet to get a msg, and then
  2. handle the msg

We don't return from the OnTraffic func until we hit io.ErrShortBuffer or the read buffer is empty.

@panjf2000
Owner

How big is a packet? Also, what's the version of gnet you're using? You didn't fill out the issue template right.

@szza
Author

szza commented Nov 14, 2024

How big is a packet? Also, what's the version of gnet you're using? You didn't fill out the issue template right.

Sorry, the gnet version is 'v2.5.2'; I mistakenly put the Linux version where the gnet version should go.

The average packet size generally doesn't exceed 32 KB; under certain exceptional circumstances it may double, or perhaps grow to a few megabytes.

@panjf2000
Owner

If you keep reading data until io.ErrShortBuffer in OnTraffic, the eventloop.cache shouldn't wind up bigger than your maximum packet size. Are you absolutely sure there isn't some particularly large packet coming in?

@panjf2000
Owner

Like over one hundred megabytes?

@szza
Author

szza commented Nov 14, 2024

Like over one hundred megabytes?

It's impossible for a packet to exceed 10 MB.

@szza
Author

szza commented Nov 14, 2024

I think using bsPool to fetch memory and release it in Discard func might be a solution.

@panjf2000
Owner

I think using bsPool to fetch memory and release it in Discard func might be a solution.

This could work for Conn.Peek(), but not for Conn.Next().

I'll run some tests with big packets and see if I can reproduce your issue. The next move will depend on the test results.
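
As a rough sketch of the idea proposed above (this is not gnet's actual internals; a plain sync.Pool stands in for bsPool, and peekCache/coalesce are hypothetical names): borrow the coalescing buffer only while a peeked slice is in use, and hand it back when Discard is called, so an idle event loop doesn't keep pinning a large cache:

package sketch

import "sync"

// cachePool stands in for gnet's internal byte-slice pool in this illustration.
var cachePool = sync.Pool{
	New: func() any { b := make([]byte, 0, 32*1024); return &b },
}

// peekCache is a hypothetical per-event-loop cache that holds pooled memory
// only while a peeked (coalesced) slice is still in flight.
type peekCache struct {
	borrowed *[]byte
}

// coalesce joins two byte spans into one pooled buffer, as a Peek that
// crosses a ring-buffer boundary would need to do.
func (pc *peekCache) coalesce(head, tail []byte) []byte {
	bp := cachePool.Get().(*[]byte)
	buf := append((*bp)[:0], head...)
	buf = append(buf, tail...)
	*bp = buf
	pc.borrowed = bp
	return buf
}

// discard returns the borrowed buffer to the pool instead of retaining it
// for the lifetime of the event loop.
func (pc *peekCache) discard() {
	if pc.borrowed != nil {
		cachePool.Put(pc.borrowed)
		pc.borrowed = nil
	}
}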

@panjf2000
Owner

panjf2000 commented Nov 16, 2024

Updated

Never mind, I've tracked down the root cause and fixed it.


#660 should fix this issue.

However, I discovered another uncanny "issue" when I ran pprof on gnet while transmitting 10MB packets: idle connections would still hold in-use memory in the elastic.RingBuffer, which should have been released:
(pprof screenshot: gnet-conn-mem-pprof)
Did you see this when you ran pprof with your gnet server? @szza

@szza
Author

szza commented Nov 18, 2024

Updated

Never mind, I've tracked down the root cause and fixed it.

#660 should fix this issue.

However, I discovered another uncanny "issue" when I ran pprof on gnet while transmitting 10MB packets: idle connections would still hold in-use memory in the elastic.RingBuffer, which should have been released: (pprof screenshot: gnet-conn-mem-pprof) Did you see this when you ran pprof with your gnet server? @szza

Yes, I've just found this problem too; I'm trying to solve it.

@szza
Author

szza commented Nov 18, 2024

Idle connections still occupy memory that the ringbuf cannot release.

@szza
Author

szza commented Nov 18, 2024

Is enabling ET mode a temporary solution?
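
For context, this is roughly how edge-triggered I/O could be switched on at startup. The WithEdgeTriggeredIO option name is my recollection of the gnet v2 API and should be verified against the release in use; the server type and the listen address are illustrative:

package main

import (
	"log"

	"github.com/panjf2000/gnet/v2"
)

// server is a minimal handler; gnet.BuiltinEventEngine provides no-op
// implementations of the EventHandler methods.
type server struct {
	gnet.BuiltinEventEngine
}

func main() {
	// Sketch only: WithEdgeTriggeredIO is assumed to be the ET-mode option in
	// this gnet v2 release; check the docs for the version you run.
	err := gnet.Run(&server{}, "tcp://:9000",
		gnet.WithMulticore(true),
		gnet.WithEdgeTriggeredIO(true),
	)
	if err != nil {
		log.Fatalln(err)
	}
}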

@szza
Author

szza commented Nov 18, 2024

I opened a new issue about the ringbuf: #662
