Potential Memory Overflow Risk in the getLines function in src/parse.ts #88

Open
PreciousnessX opened this issue Nov 7, 2024 · 0 comments

PreciousnessX commented Nov 7, 2024

In the getLines function in parse.ts (shown below), there is a potential issue that can lead to unbounded memory growth and, in the worst case, crash the browser.

export function getLines(onLine: (line: Uint8Array, fieldLength: number) => void) {
    let buffer: Uint8Array | undefined;
    let position: number; // current read position
    let fieldLength: number; // length of the `field` portion of the line
    let discardTrailingNewline = false;

    // return a function that can process each incoming byte chunk:
    return function onChunk(arr: Uint8Array) {
        if (buffer === undefined) {
            buffer = arr;
            position = 0;
            fieldLength = -1;
        } else {
            // we're still parsing the old line. Append the new bytes into buffer:
            buffer = concat(buffer, arr);
        }

        const bufLength = buffer.length;
        let lineStart = 0; // index where the current line starts
        while (position < bufLength) {
            if (discardTrailingNewline) {
                if (buffer[position] === ControlChars.NewLine) {
                    lineStart = ++position; // skip to next char
                }

                discardTrailingNewline = false;
            }

            // start looking forward till the end of line:
            let lineEnd = -1; // index of the \r or \n char
            for (; position < bufLength && lineEnd === -1; ++position) {
                switch (buffer[position]) {
                    case ControlChars.Colon:
                        if (fieldLength === -1) { // first colon in line
                            fieldLength = position - lineStart;
                        }
                        break;
                    // @ts-ignore:7029 \r case below should fallthrough to \n:
                    case ControlChars.CarriageReturn:
                        discardTrailingNewline = true;
                    case ControlChars.NewLine:
                        lineEnd = position;
                        break;
                }
            }

            if (lineEnd === -1) {
                // We reached the end of the buffer but the line hasn't ended.
                // Wait for the next arr and then continue parsing:
                break;
            }

            // we've reached the line end, send it out:
            onLine(buffer.subarray(lineStart, lineEnd), fieldLength);
            lineStart = position; // we're now on the next line
            fieldLength = -1;
        }

        if (lineStart === bufLength) {
            buffer = undefined; // we've finished reading it
        } else if (lineStart !== 0) {
            // Create a new view into buffer beginning at lineStart so we don't
            // need to copy over the previous lines when we get the new arr:
            buffer = buffer.subarray(lineStart);
            position -= lineStart;
        }
    }
}

When byte chunks are fed into getLines continuously and the previous buffer has not yet yielded a complete line when a new chunk arrives, the function appends the new data onto the existing buffer with concat. If newlines occur infrequently, for example when individual lines are very long or a misbehaving server never sends a line terminator, the buffer keeps growing and consumes more and more memory. Eventually this can make the browser run out of memory and crash.
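
Here is a minimal sketch of the failure mode. The driver loop is hypothetical; it stands in for a server that streams bytes without ever sending a \n or \r:

const onChunk = getLines(() => {
    // never called: no line terminator ever arrives
});

const chunk = new Uint8Array(64 * 1024).fill(0x61); // 64 KiB of 'a' bytes
for (let i = 0; i < 1024; i++) {
    // each call concat()s onto the old buffer, re-copying everything buffered
    // so far; after this loop the buffer holds ~64 MiB and keeps climbing
    onChunk(chunk);
}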

Proposed Solutions

  1. Buffer Size Limitation: Set a maximum length for the buffer. When it reaches the limit, apply an appropriate strategy (such as discarding the oldest data while keeping line data intact, or failing fast) so the buffer cannot grow indefinitely; see the first sketch after this list.

  2. Faster Data Processing: Optimize the logic inside the onLine callback. If its work is time-consuming, consider asynchronous processing or a faster algorithm so that it returns quickly, allowing getLines to handle subsequent data promptly and reducing buffer accumulation.

  3. Stream-based Processing: Explore moving to a stream-based model that reads and parses data chunk by chunk with backpressure, instead of accumulating unparsed data in the buffer; see the second sketch after this list.
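
A minimal sketch of idea 1 follows. getLinesBounded and maxBufferLength are hypothetical additions, not part of the current API, and the limit check is approximate because it counts bytes received since the last completed line:

export function getLinesBounded(
    onLine: (line: Uint8Array, fieldLength: number) => void,
    maxBufferLength = 256 * 1024, // arbitrary 256 KiB default
) {
    let pending = 0; // bytes received since the last completed line
    const onChunk = getLines((line, fieldLength) => {
        pending = 0; // a line completed, so the internal buffer can shrink again
        onLine(line, fieldLength);
    });
    return function boundedOnChunk(arr: Uint8Array) {
        pending += arr.length;
        if (pending > maxBufferLength) {
            // One possible strategy: fail fast rather than let concat() grow
            // the buffer indefinitely. Discarding old data would also work,
            // but would corrupt the line currently in progress.
            throw new Error(`no line terminator within ${maxBufferLength} bytes`);
        }
        onChunk(arr);
    };
}

And a minimal sketch of idea 3, assuming the caller has a ReadableStream such as response.body from fetch. Awaiting each read() applies backpressure, so the next chunk is not pulled until the previous one has been parsed, though this alone does not bound the buffer if the stream never contains a newline:

export async function parseStream(
    body: ReadableStream<Uint8Array>,
    onLine: (line: Uint8Array, fieldLength: number) => void,
): Promise<void> {
    const onChunk = getLines(onLine);
    const reader = body.getReader();
    for (;;) {
        const {done, value} = await reader.read();
        if (done) {
            break;
        }
        onChunk(value); // finish parsing this chunk before pulling the next
    }
}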

I hope the development team can address this potential memory issue. Thank you for your hard work!
