Allow a custom set of characters in a token #87

Merged 1 commit on Oct 4, 2019
8 changes: 6 additions & 2 deletions boolean/boolean.py
@@ -114,7 +114,8 @@ class BooleanAlgebra(object):
     """

     def __init__(self, TRUE_class=None, FALSE_class=None, Symbol_class=None,
-                 NOT_class=None, AND_class=None, OR_class=None):
+                 NOT_class=None, AND_class=None, OR_class=None,
+                 allowed_in_token=('.', ':', '_')):
Collaborator

Could it be better to provide a string that lists all allowed characters explicitly (including "alnum")?

Contributor Author

The idea is good, but it may not be practical to match what isalnum() allows. According to the Python docs, isalpha() (one of the checks behind isalnum()) matches any Unicode character categorized as a 'Letter'. I'm not sure how many that is exactly, but probably more than one would want to specify.
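
To give a rough sense of the scale involved, the characters accepted by `str.isalnum()` can be counted directly. This is only a sketch: the total depends on the Unicode version bundled with the interpreter, and the helper name `count_alnum_code_points` is made up for this illustration.

```python
import sys


def count_alnum_code_points():
    """Count the code points for which str.isalnum() is true in this interpreter."""
    return sum(1 for cp in range(sys.maxunicode + 1) if chr(cp).isalnum())


# ASCII only: 26 lowercase + 26 uppercase letters + 10 digits.
ascii_alnum = sum(1 for cp in range(128) if chr(cp).isalnum())

# Whole Unicode range; the exact value varies between Python versions.
alnum_total = count_alnum_code_points()

print("ASCII alphanumerics:", ascii_alnum)
print("All alphanumerics:  ", alnum_total)

# Non-ASCII letters and digits pass, while '_' and '-' do not.
print("é".isalnum(), "٣".isalnum(), "_".isalnum(), "-".isalnum())
```

Listing every such character in a single string would be impractical, which is why the change keeps `isalnum()` and only makes the extra characters configurable.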

Collaborator

Good point. Using a string vs a list is minor too. So I am merging as-is.
Thank you again!
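
On the string-versus-list point: for the single-character membership test used by the tokenizer, a string and a tuple behave the same. The sketch below illustrates this under the assumption that only one character is tested at a time; `is_token_char` and the two `allowed_as_*` names are hypothetical and not part of the library.

```python
allowed_as_tuple = ('.', ':', '_')
allowed_as_string = '.:_'


def is_token_char(char, allowed):
    """Mirror the tokenizer's check: alphanumeric, or explicitly allowed."""
    return char.isalnum() or char in allowed


# For single characters the two containers give identical answers.
for c in "a9._:-+ ":
    assert is_token_char(c, allowed_as_tuple) == is_token_char(c, allowed_as_string)

# They differ only for multi-character operands: `in` on a string
# performs a substring search, while `in` on a tuple compares elements.
print('.:' in allowed_as_string)  # substring match
print('.:' in allowed_as_tuple)   # no element equals '.:'
```

Since the tokenizer inspects one character per iteration, the substring behavior never comes into play there.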

Contributor Author

Thank you!

         """
         The types for TRUE, FALSE, NOT, AND, OR and Symbol define the boolean
         algebra elements, operations and Symbol variable. They default to the
@@ -158,6 +159,9 @@ def __init__(self, TRUE_class=None, FALSE_class=None, Symbol_class=None,
             for name, value in tf_nao.items():
                 setattr(obj, name, value)

+        # Set the set of characters allowed in tokens
+        self.allowed_in_token = allowed_in_token
+
     def definition(self):
         """
         Return a tuple of this algebra defined elements and types as:
@@ -461,7 +465,7 @@ def tokenize(self, expr):
                 position += 1
                 while position < length:
                     char = expr[position]
-                    if char.isalnum() or char in ('.', ':', '_'):
+                    if char.isalnum() or char in self.allowed_in_token:
                         position += 1
                         tok += char
                     else:
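
The effect of this hunk can be seen by reading the scanning loop in isolation. The following is a simplified, standalone sketch of that loop, not the library's actual `tokenize` method: the function name `scan_symbol`, its signature, and its return value are assumptions made for illustration.

```python
def scan_symbol(expr, start, allowed_in_token=('.', ':', '_')):
    """
    Starting at expr[start], consume characters for as long as they are
    alphanumeric or listed in allowed_in_token. Return the token and the
    position of the first character that was not consumed.
    """
    tok = expr[start]
    position = start + 1
    length = len(expr)
    while position < length:
        char = expr[position]
        if char.isalnum() or char in allowed_in_token:
            position += 1
            tok += char
        else:
            break
    return tok, position


# With the default set, '-' ends the token after the first letter.
print(scan_symbol('l-a AND b+c', 0))

# With '-' and '+' allowed, the whole of 'l-a' is kept as one token.
print(scan_symbol('l-a AND b+c', 0, allowed_in_token=('.', '_', '-', '+')))
```

In the first call the scan stops at `-`, which is why the default configuration cannot parse `l-a` as a single symbol.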
11 changes: 11 additions & 0 deletions boolean/test_boolean.py
@@ -273,6 +273,17 @@ def build_symbol(current_dotted):
         )
         self.assertEqual(expected, expr)

+    def test_allowing_additional_characters_in_tokens(self):
+        algebra = BooleanAlgebra(allowed_in_token=('.', '_', '-', '+'))
+        test_expr = 'l-a AND b+c'
+
+        expr = algebra.parse(test_expr)
+        expected = algebra.AND(
+            algebra.Symbol('l-a'),
+            algebra.Symbol('b+c')
+        )
+        self.assertEqual(expected, expr)
+
     def test_parse_raise_ParseError1(self):
         algebra = BooleanAlgebra()
         expr = 'l-a AND none'