Regex seems to always take priority #377

Open
afreeland opened this issue Feb 19, 2024 · 1 comment
Labels
bug, duplicate, help wanted

Comments

@afreeland
Contributor

afreeland commented Feb 19, 2024

I'm new to Rust and new to Logos, so this could just be me, but when I use a regex it seems like it always stomps on the other tokens. The snippet below defines three tokens: one represents the action (alert), another the protocol (tls), and the third the network information.

Here is the code with a regex:

use logos::Logos;

#[derive(Debug, Logos, PartialEq)]
enum Token {
    #[token("alert", priority = 2500)]
    Action,

    #[token("tls", priority = 200)]
    Protocol,

    #[regex(r"([^\s]+) ([^\s]+) (->|<-) ([^\s]+) ([^\s]+)", priority = 0)]
    NetworkInfo,

    Error,
}

struct SuricataLexer<'a> {
    lexer: logos::Lexer<'a, Token>,
}

impl<'a> SuricataLexer<'a> {
    fn new(input: &'a str) -> Self {
        SuricataLexer {
            lexer: Token::lexer(input),
        }
    }

    fn next_token(&mut self) -> Token {
        self.lexer.next().unwrap().unwrap_or(Token::Error)
    }
}

fn main() {
    // Sample Suricata rule
    let input = "alert tls $HOME_NET any -> $EXTERNAL_NET any (msg:\"some bs\")";

    // Create a SuricataLexer instance
    let mut lexer = SuricataLexer::new(input);

    // Tokenize the input and print the results
    while lexer.lexer.span().end < input.len() {
        let token = lexer.next_token();
        println!("{:?}", token);
    }
}

This outputs:

Error
Error
NetworkInfo
Error
Error

However, if I comment out the NetworkInfo variant, Action and Protocol are matched just fine.
Output:

Action
Error
Protocol
Error
Error
....

This part of the input, $HOME_NET any -> $EXTERNAL_NET any, represents a source host, source port, direction, destination host and destination port. These fields are pretty fluid, so outside of regex I'm not really sure how I would go about targeting them.

Is there a way to keep the regex from overpowering everything around it, or am I doing something incorrectly? I read the token disambiguation docs but couldn't find a way to lower the regex priority.
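
For comparison, here is a minimal sketch of targeting the five network fields without a regex, by splitting the rule header on whitespace. It is plain std Rust and does not use Logos at all, so it is not a fix for the behaviour above. The names RuleToken and parse_header are hypothetical, and the sketch assumes the header always has the shape "alert tls <host> <port> <direction> <host> <port>" followed by a parenthesised options block.

```rust
/// Hypothetical token type for this sketch. The variant names mirror the
/// enum above, but nothing here is generated by Logos.
#[derive(Debug, PartialEq)]
enum RuleToken<'a> {
    Action,
    Protocol,
    NetworkInfo {
        src_host: &'a str,
        src_port: &'a str,
        direction: &'a str,
        dst_host: &'a str,
        dst_port: &'a str,
    },
}

/// Splits the rule header on whitespace and groups the five network fields.
/// Returns None if the header does not have the assumed shape.
fn parse_header(input: &str) -> Option<Vec<RuleToken<'_>>> {
    // Assumption: everything before the first '(' is the header.
    // The options block in parentheses is ignored in this sketch.
    let header = input.split('(').next()?;
    let mut words = header.split_whitespace();
    let mut tokens = Vec::new();

    // Assumption: only the "alert" action and "tls" protocol are recognised.
    if words.next()? != "alert" {
        return None;
    }
    tokens.push(RuleToken::Action);

    if words.next()? != "tls" {
        return None;
    }
    tokens.push(RuleToken::Protocol);

    // The next five words are host, port, direction, host, port.
    let src_host = words.next()?;
    let src_port = words.next()?;
    let direction = words.next()?;
    if direction != "->" && direction != "<-" {
        return None;
    }
    let dst_host = words.next()?;
    let dst_port = words.next()?;

    tokens.push(RuleToken::NetworkInfo {
        src_host,
        src_port,
        direction,
        dst_host,
        dst_port,
    });
    Some(tokens)
}

fn main() {
    let input = "alert tls $HOME_NET any -> $EXTERNAL_NET any (msg:\"some bs\")";
    println!("{:?}", parse_header(input));
}
```

The trade-off is that the grammar lives in hand-written code instead of in token attributes, so every new action or protocol needs its own branch.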

@jeertmans added the duplicate, bug, and help wanted labels Feb 19, 2024
@jeertmans
Collaborator

jeertmans commented Feb 19, 2024

Hello, thanks for sharing your issue!

First, let me suggest this simpler MWE, so it is easier to debug:

use logos::Logos;

#[derive(Debug, Logos)]
enum Token {
    #[token("alert")]
    Action,

    #[token("tls")]
    Protocol,

    #[regex(r"([^\s]+) ([^\s]+) (->|<-) ([^\s]+) ([^\s]+)")]
    NetworkInfo,
}

fn main() {
    let input = "alert tls $HOME_NET any -> $EXTERNAL_NET any (msg:\"some bs\")";

    let mut lexer = Token::lexer(input);
    while let Some(token) = lexer.next() {
        println!("{:?}", token);
    }
}

Second, I think this is a duplicate of #358, and maybe #265. Hopefully, the bug fix mentioned in #265 by @jameshurt might solve this, but I am waiting for a reply :-)
