Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(mask): allow empty mask file, warn not error #1287

Open
wants to merge 1 commit into
base: feat/mask-all-gaps
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions CHANGES.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,13 +5,15 @@
### Features

* mask: Add `--mask-gaps` option to `augur mask` to allow masking of gaps at terminals via `--mask-gaps terminals` or all gaps via `--mask-gaps all`. [#1286][] (@corneliusroemer)
* mask: Warn on empty mask file instead of error. Introduced to allow empty stub files when setting up builds and to prevent necessary workflow changes when all mask sites are removed. [#1287][] (@corneliusroemer)

### Bug fixes

* distance: Improve documentation by describing how gaps get treated as indels and how users can ignore specific characters in distance calculations. [#1285][] (@huddlej)

[#1285]: https://github.com/nextstrain/augur/pull/1285
[#1286]: https://github.com/nextstrain/augur/pull/1286
[#1287]: https://github.com/nextstrain/augur/pull/1286

## 22.3.0 (14 August 2023)

Expand Down
3 changes: 1 addition & 2 deletions augur/mask.py
Original file line number Diff line number Diff line change
Expand Up @@ -231,8 +231,7 @@ def run(args):
print("ERROR: File {} does not exist!".format(args.mask_file))
sys.exit(1)
if os.path.getsize(args.mask_file) == 0:
print("ERROR: {} is an empty file.".format(args.mask_file))
sys.exit(1)
Comment on lines -234 to -235
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pytest failure isn't very clear, but it fails on this test:

augur/tests/test_mask.py

Lines 257 to 261 in 2810abe

def test_run_handle_empty_mask_file(self, vcf_file, bed_file, argparser):
open(bed_file, "w").close()
args = argparser("-s %s --mask %s" % (vcf_file, bed_file))
with pytest.raises(SystemExit):
mask.run(args)

Failing there makes sense because a SystemExit is no longer raised. However, by warning and letting augur mask continue to run with an empty mask file, there is a new error downstream:

augur/mask.py:244: in run
    mask_sites.update(load_mask_sites(args.mask_file))
augur/utils.py:531: in load_mask_sites
    mask_sites = read_bed_file(mask_file)
augur/utils.py:479: in read_bed_file
    bed = pd.read_csv(bed_file, sep='\t', header=None, usecols=[1,2],
…/site-packages/pandas/util/_decorators.py:211: in wrapper
    return func(*args, **kwargs)
…/site-packages/pandas/util/_decorators.py:331: in wrapper
    return func(*args, **kwargs)
…/site-packages/pandas/io/parsers/readers.py:950: in read_csv
    return _read(filepath_or_buffer, kwds)
…/site-packages/pandas/io/parsers/readers.py:605: in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
…/site-packages/pandas/io/parsers/readers.py:1442: in __init__
    self._engine = self._make_engine(f, self.engine)
…/site-packages/pandas/io/parsers/readers.py:1753: in _make_engine
    return mapping[engine](f, **self.options)
…/site-packages/pandas/io/parsers/c_parser_wrapper.py:79: in __init__
    self._reader = parsers.TextReader(src, **kwds)

>   ???
E   pandas.errors.EmptyDataError: No columns to parse from file

pandas/_libs/parsers.pyx:554: EmptyDataError

The downstream code should be updated to support empty mask files.

print(f"WARN: The mask file {args.mask_file} is an empty file.")
if not any((args.mask_file, args.mask_gaps, args.mask_from_beginning, args.mask_from_end, args.mask_sites, args.mask_invalid)):
print("No masking sites provided. Must include one of --mask, --mask-gaps, --mask-from-beginning, --mask-from-end, --mask-invalid, or --mask-sites")
sys.exit(1)
Expand Down