Commit

Fixed bug for thread_dna function when using a ClipKIT log file. Input protein alignment must be the untrimmed alignment.
JLSteenwyk committed Aug 13, 2024
1 parent 2bc1d48 commit 0427f33
Showing 11 changed files with 510 additions and 582 deletions.
2 changes: 1 addition & 1 deletion Makefile
@@ -118,7 +118,7 @@ test.fast:
 	python -m pytest -m "not (integration or slow)"
 	rm -rf output/
 	mkdir output/
-	python -m pytest --basetemp=output -m "integration and not slow"
+	python -m pytest --basetemp=output -m "integration and not slow" -vv
 	rm test.fa test.occupancy test.partition

 # used by GitHub actions during CI workflow
3 changes: 3 additions & 0 deletions change_log.txt
@@ -1,5 +1,8 @@
 Major changes to PhyKIT are summarized here.

+1.20.0
+- Fixed bug for thread_dna function when using a ClipKIT log file. Input protein alignment must be the untrimmed alignment.
+
 1.19.4
 - Saturation function forces y-intercept to be zero when calculating slope

3 changes: 3 additions & 0 deletions docs/change_log/index.rst
@@ -8,6 +8,9 @@ Change log

 Major changes to PhyKIT are summarized here.

+**1.20.0**:
+Fixed bug for thread_dna function when using a ClipKIT log file. Input protein alignment must be the untrimmed alignment.
+
 **1.19.9**:
 Saturation function now also reports the absolute value of 1-saturation. Lower values are indicative of less saturation.

17 changes: 14 additions & 3 deletions docs/usage/index.rst
@@ -500,9 +500,20 @@ Thread DNA sequence onto a protein alignment to create a
 codon-based alignment.

 This function requires input alignments are in fasta format.
-Codon alignments are then printed to stdout. Note, sequences
-are assumed to occur in the same order in the protein and
-nucleotide alignment.
+Codon alignments are then printed to stdout. Note, paired
+sequences are assumed to have the same name between the
+protein and nucleotide file. The order does not matter.
+
+To thread nucleotide sequences over a trimmed amino acid
+alignment, provide PhyKIT with a log file specifying which
+sites have been trimmed and which have been kept. The log
+file must be formatted the same as the log files outputted
+by the alignment trimming toolkit ClipKIT (see -l in ClipKIT
+documentation.) Details about ClipKIT can be seen here:
+https://github.com/JLSteenwyk/ClipKIT.
+
+If using a ClipKIT log file, the untrimmed protein alignment
+should be provided in the -p/--protein argument.

 .. code-block:: shell
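
The threading behaviour documented in the hunk above can be sketched as follows. This is a minimal illustration under stated assumptions, not PhyKIT's implementation: `thread_codons` and `keep_sites` are hypothetical names, the per-site keep/trim list stands in for whatever a ClipKIT log encodes, and only `-` is treated as a gap. It suggests why the protein row needs to be untrimmed: the nucleotide cursor has to advance past residues that are later trimmed, otherwise every downstream codon would be shifted.

```python
from typing import Optional, Sequence


def thread_codons(
    protein_row: str,
    nucleotides: str,
    keep_sites: Optional[Sequence[bool]] = None,
) -> str:
    """Thread an ungapped nucleotide sequence onto one untrimmed protein row.

    keep_sites is a hypothetical per-site mask (True = keep, False = trim)
    with one entry per column of the untrimmed protein alignment.
    """
    if keep_sites is None:
        keep_sites = [True] * len(protein_row)
    if len(keep_sites) != len(protein_row):
        raise ValueError("mask length must equal the untrimmed alignment length")

    codons = []
    cursor = 0  # position in the ungapped nucleotide sequence
    for residue, keep in zip(protein_row, keep_sites):
        if residue == "-":
            codon = "---"  # a protein gap becomes a three-column gap
        else:
            codon = nucleotides[cursor:cursor + 3]
            if len(codon) != 3:
                raise ValueError("nucleotide sequence is too short")
            cursor += 3  # advance even if this site is trimmed below
        if keep:
            codons.append(codon)
    return "".join(codons)


untrimmed = thread_codons("M-KL", "ATGAAACTG")
trimmed = thread_codons("M-KL", "ATGAAACTG", [True, True, False, True])
print(untrimmed)
print(trimmed)
```

In the second call the third site is trimmed, so its codon is consumed from the nucleotide sequence but left out of the output.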
9 changes: 6 additions & 3 deletions phykit/phykit.py
@@ -2606,9 +2606,9 @@ def thread_dna(argv):
 codon-based alignment.

 This function requires input alignments are in fasta format.
-Codon alignments are then printed to stdout. Note, sequences
-are assumed to occur in the same order in the protein and
-nucleotide alignment.
+Codon alignments are then printed to stdout. Note, paired
+sequences are assumed to have the same name between the
+protein and nucleotide file. The order does not matter.

 To thread nucleotide sequences over a trimmed amino acid
 alignment, provide PhyKIT with a log file specifying which
@@ -2618,6 +2618,9 @@ def thread_dna(argv):
 documentation.) Details about ClipKIT can be seen here:
 https://github.com/JLSteenwyk/ClipKIT.

+If using a ClipKIT log file, the untrimmed protein alignment
+should be provided in the -p/--protein argument.
+
 Aliases:
 thread_dna, pal2nal, p2n
 Command line interfaces:
22 changes: 12 additions & 10 deletions phykit/services/alignment/dna_threader.py
@@ -61,16 +61,18 @@ def create_mask(self, length):
         return keep_mask

     def normalize_p_seq(self, p_seq, mask):
-        if self.clipkit_log_data:
-            untrimmed = []
-            offset = 0
-            for idx, value in enumerate(mask[::3]):
-                if value is True:
-                    untrimmed.append(p_seq[idx - offset])
-                else:
-                    offset += 1
-                    untrimmed.append("#")
-            p_seq = "".join(untrimmed)
+        #TODO: write MP
+        #TODO: update tests
+        # if self.clipkit_log_data:
+        #     untrimmed = []
+        #     offset = 0
+        #     for idx, value in enumerate(mask[::3]):
+        #         if value is True:
+        #             untrimmed.append(p_seq[idx - offset])
+        #         else:
+        #             offset += 1
+        #             untrimmed.append("#")
+        #     p_seq = "".join(untrimmed)
         return "".join([c * 3 for c in p_seq])

     def normalize_n_seq(self, n_seq, normalized_p_seq):
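
With the log-handling branch commented out, `normalize_p_seq` reduces to its final `return` expression: every protein character is repeated three times so that the string lines up, position for position, with codon triplets. A standalone illustration of that one expression follows; the function name `expand_to_codon_width` is hypothetical and not part of PhyKIT.

```python
def expand_to_codon_width(p_seq: str) -> str:
    """Repeat every character three times, one copy per codon position."""
    return "".join([c * 3 for c in p_seq])


expanded = expand_to_codon_width("M-K")
print(expanded)  # MMM---KKK
```

The retired branch rebuilt an untrimmed-length string from a trimmed protein sequence by inserting a `#` placeholder at each trimmed site. Requiring the untrimmed alignment as input appears to make that reconstruction unnecessary.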
2 changes: 1 addition & 1 deletion phykit/version.py
@@ -1 +1 @@
-__version__ = "1.19.9"
+__version__ = "1.20.0"