Add reserve method to Cigar for use in `get_cigar` #206

tshauck · 2023-10-01T18:11:32Z

Hi,

I'd like to propose a PR that I think could improve BAM cigar parsing performance. While profiling a tool of mine, I noticed that calls to `get_cigar` may have to do n allocations, depending on the number of cigar ops. The `get_cigar` function already takes the number of ops, so this PR adds a `reserve` method on `Cigar` that `get_cigar` can make use of.

I did a little benchmark via criterion that seems to indicate reduced runtime and reduced runtime variance.

I put the (slightly overcomplicated 😅) benchmark code in a separate branch here: https://github.com/tshauck/noodles/tree/add-reserve-to-cigar-decode-benches/target/criterion (with some changes to noodles-bam too).

zaeleus · 2023-10-03T14:54:20Z

While this performance difference can be observed with how your benchmarks are defined, i.e., creating a new buffer for each iteration, there shouldn't be a difference in practice with the current BAM decoder. The CIGAR buffer in a `sam::alignment::Record` is reused for each record, and `Vec::clear` does not affect its internal capacity. (Also note that `Vec::reserve` takes an additional number of entries on top of the current length, so it matches the expected count only when the `Vec` is empty.)
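The two `Vec` behaviors mentioned here can be checked with a small standalone sketch. It uses a plain `Vec<u32>` as a stand-in for the CIGAR buffer, and the `refill` helper is hypothetical; neither is the noodles type or API.

```rust
/// Clears `buf` and refills it with `n` placeholder ops, mimicking a decoder
/// that reuses one buffer across records. Returns the capacity afterwards.
/// (Hypothetical helper for illustration only.)
fn refill(buf: &mut Vec<u32>, n: usize) -> usize {
    buf.clear(); // drops the elements but keeps the allocation
    buf.extend((0..n).map(|i| i as u32));
    buf.capacity()
}

fn main() {
    let mut buf: Vec<u32> = Vec::new();

    let cap_after_first = refill(&mut buf, 100);
    let cap_after_second = refill(&mut buf, 10);
    // The smaller record fits in the existing allocation, so the capacity is unchanged.
    assert_eq!(cap_after_first, cap_after_second);

    // `reserve` takes an *additional* count on top of the current length.
    let mut v = vec![0u32; 10];
    v.reserve(5);
    assert!(v.capacity() >= 15); // room for len (10) + additional (5)

    println!(
        "reused capacity: {cap_after_second}, reserved capacity: {}",
        v.capacity()
    );
}
```

With a reused buffer, an up-front `reserve` only matters for the first record or for a record longer than any seen so far.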

> I noticed while profiling a tool of mine that calls to get_cigar may have to do n allocations based on the number of cigar ops.

Can you share this report? When a Vec is empty, the first push allocates for a minimum of 4 items, and when at capacity, the size of the Vec doubles. E.g., there will be an allocation at item 1 (capacity 4), 5 (capacity 8), 9 (capacity 16), etc.
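A rough way to observe this growth pattern is to push items one at a time and record each capacity change. This is only a sketch: `capacity_steps` is a hypothetical helper, and the exact capacities are an implementation detail of the standard library, so the code prints them rather than assuming specific values.

```rust
/// Pushes `n` items one at a time and records the new capacity each time it
/// changes, i.e., each time the `Vec` has to grow.
/// (Hypothetical helper for illustration only.)
fn capacity_steps(mut v: Vec<u32>, n: usize) -> Vec<usize> {
    let mut steps = Vec::new();
    let mut last = v.capacity();

    for i in 0..n {
        v.push(i as u32);

        if v.capacity() != last {
            last = v.capacity();
            steps.push(last);
        }
    }

    steps
}

fn main() {
    // Starting empty: a handful of growth steps, far fewer than one per item.
    let grown = capacity_steps(Vec::new(), 20);
    println!("growth steps from an empty Vec: {grown:?}");

    // Starting with enough capacity: no growth while pushing.
    let preallocated = capacity_steps(Vec::with_capacity(20), 20);
    println!("growth steps when preallocated: {preallocated:?}");
}
```

Because the growth is geometric, the number of allocations is roughly logarithmic in the number of ops, not one per op.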

tshauck · 2023-10-03T15:49:40Z

Thanks for the reply. I should've added a little more color to my use case. I'll have a closer look tonight and follow up, but I think the lack of reuse is due to me working with a lazy BAM record, then calling `let cigar: Cigar = record.cigar().try_into()?;` to convert the lazy cigar into a regular one, and I think `TryFrom` does create a new `Cigar`.

noodles/noodles-bam/src/lazy/record/cigar.rs

Lines 48 to 62 in 38836a4

    
    impl<'a> TryFrom<Cigar<'a>> for sam::record::Cigar {
        type Error = io::Error;

        fn try_from(bam_cigar: Cigar<'a>) -> Result<Self, Self::Error> {
            use crate::record::codec::decoder::get_cigar;

            let mut src = bam_cigar.0;
            let mut cigar = Self::default();
            let op_count = bam_cigar.len();

            get_cigar(&mut src, &mut cigar, op_count)
                .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
            Ok(cigar)
        }
    }

Edit: If reserve is to be included, perhaps moving it into try_from might make more sense?

zaeleus · 2023-10-05T15:15:01Z

Ah, I see. Since this only affects the conversion use case, can we use the ops iterator instead of the decoder in the implementation of `lazy::record::Cigar::try_from`, i.e., call `self.iter().collect()`? The iterator has the correct size hint set, which preallocates the resulting `Vec`.
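The size-hint idea can be sketched in isolation. The `Op` struct and `ops` function below are hypothetical stand-ins, not the noodles types; the only borrowed fact is that BAM packs each CIGAR op as a little-endian `u32` with the length in the upper 28 bits and the kind in the lower 4. The sketch uses an infallible iterator, so it says nothing about how a fallible iterator behaves when collected into a `Result`.

```rust
/// Hypothetical stand-in for a CIGAR op: a kind code and a length.
#[derive(Debug, PartialEq)]
struct Op {
    kind: u8,
    len: u32,
}

/// Returns an iterator over ops packed as little-endian `u32`s
/// (`len << 4 | kind`). `chunks_exact` knows how many chunks remain, and
/// `map` passes that size hint through unchanged.
fn ops(src: &[u8]) -> impl Iterator<Item = Op> + '_ {
    src.chunks_exact(4).map(|chunk| {
        let n = u32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]);

        Op {
            kind: (n & 0x0f) as u8,
            len: n >> 4,
        }
    })
}

fn main() {
    // 10M and 2I, where M = 0 and I = 1.
    let src: Vec<u8> = [10u32 << 4, 2 << 4 | 1]
        .iter()
        .flat_map(|n| n.to_le_bytes())
        .collect();

    let iter = ops(&src);
    // The exact op count is available before any item is decoded.
    assert_eq!(iter.size_hint(), (2, Some(2)));

    // `collect` can use the size hint to size the `Vec` up front.
    let cigar: Vec<Op> = iter.collect();
    assert_eq!(cigar, [Op { kind: 0, len: 10 }, Op { kind: 1, len: 2 }]);
}
```

Collecting from such an iterator avoids the need for a separate `reserve` call, since the count travels with the iterator itself.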

tshauck · 2023-10-06T16:15:24Z

Cool, thanks for the feedback. Please let me know if the update is what you had in mind.

zaeleus · 2023-10-09T14:54:20Z

Thanks! (And sorry for the slow responses; I just got back from vacation.)

tshauck · 2023-10-09T14:57:16Z

All good -- appreciate the education/patience

tshauck added 3 commits October 1, 2023 10:45

feat: reserve cap during cigar parsing

f68188e

feat: add reserve fn

ed23e9d

docs: simplfy docstring

f1010c2

zaeleus added the bam label Oct 3, 2023

tshauck added 2 commits October 5, 2023 09:05

feat: update

d010d1c

feat: better construction

895ead4

zaeleus requested changes Oct 7, 2023

noodles-bam/src/lazy/record/cigar.rs (outdated)

Update noodles-bam/src/lazy/record/cigar.rs

ed02da9

Co-authored-by: Michael Macias

tshauck requested a review from zaeleus October 7, 2023 02:11

zaeleus approved these changes Oct 9, 2023

zaeleus merged commit 7257105 into zaeleus:master Oct 9, 2023
3 checks passed

tshauck deleted the add-reserve-to-cigar-decode branch October 9, 2023 14:57
