Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add reserve method to Cigar for use in get_cigar #206

Merged
merged 6 commits into from
Oct 9, 2023

Conversation

tshauck
Copy link
Contributor

@tshauck tshauck commented Oct 1, 2023

Hi,

I'd like to propose a PR that I think could improve BAM cigar parsing performance. I noticed while profiling a tool of mine that calls to get_cigar may have do n allocations based on the number of cigar ops. Looking at the get_cigar function, it already takes the number of ops, so this PR adds a reserve method on Cigar so that get_cigar can make use of it.

I did a little benchmark via criterion that seems to indicate reduced runtime and reduced runtime variance.

image

I put the (slightly overcomplicated 😅) benchmark code in a separate branch here: https://github.com/tshauck/noodles/tree/add-reserve-to-cigar-decode-benches/target/criterion (with some changes to noodles-bam too).

@zaeleus zaeleus added the bam label Oct 3, 2023
@zaeleus
Copy link
Owner

zaeleus commented Oct 3, 2023

While this performance difference can be observed in how your benchmarks are defined, i.e., creating a new buffer for each iteration; in practice, there shouldn't be a difference with the current BAM decoder. The CIGAR buffer in a sam::alignment::Record is reused for each record, and Vec::clear does not affect its internal capacity. (Also note that Vec::reserve is for an additional number of entries, not for the expected count unless the capacity is already 0.)

I noticed while profiling a tool of mine that calls to get_cigar may have do n allocations based on the number of cigar ops.

Can you share this report? When a Vec is empty, the first push allocates for a minimum of 4 items, and when at capacity, the size of the Vec doubles. E.g., there will be an allocation at item 1 (capacity 4), 5 (capacity 8), 9 (capacity 16), etc.

@tshauck
Copy link
Contributor Author

tshauck commented Oct 3, 2023

Thanks for the reply. I should've added a little more color to my use case. I'll have a closer look tonight and follow up, but I think the lack of reuse is due to me working with a lazy BAM record, then calling let cigar: Cigar = record.cigar().try_into()?; to convert the lazy cigar into a regular one, and I think TryFrom does create a new Cigar.

impl<'a> TryFrom<Cigar<'a>> for sam::record::Cigar {
type Error = io::Error;
fn try_from(bam_cigar: Cigar<'a>) -> Result<Self, Self::Error> {
use crate::record::codec::decoder::get_cigar;
let mut src = bam_cigar.0;
let mut cigar = Self::default();
let op_count = bam_cigar.len();
get_cigar(&mut src, &mut cigar, op_count)
.map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
Ok(cigar)
}
}

Edit: If reserve is to be included, perhaps moving it into try_from might make more sense?

@zaeleus
Copy link
Owner

zaeleus commented Oct 5, 2023

Ah, I see. Since this only affects the conversion use case, can we use the ops iterator instead of the decoder in the implementation of lazy::record::Cigar::try_from, i.e., call self.iter().collect()? The iterator has the correct size hint set, which preallocates the resulting Vec.

@tshauck
Copy link
Contributor Author

tshauck commented Oct 6, 2023

Cool, thanks for the feedback. Please let me know if the update is what you had in mind.

noodles-bam/src/lazy/record/cigar.rs Outdated Show resolved Hide resolved
@tshauck tshauck requested a review from zaeleus October 7, 2023 02:11
@zaeleus zaeleus merged commit 7257105 into zaeleus:master Oct 9, 2023
3 checks passed
@zaeleus
Copy link
Owner

zaeleus commented Oct 9, 2023

Thanks! (And sorry for the slow responses; I just got back from vacation.)

@tshauck
Copy link
Contributor Author

tshauck commented Oct 9, 2023

All good -- appreciate the education/patience

@tshauck tshauck deleted the add-reserve-to-cigar-decode branch October 9, 2023 14:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants