WIP: Page Processing Design

Tom Kwong edited this page May 12, 2018 · 1 revision

Summary

Per issue #44, it has become clear that a two-pass approach is required to read sas7bdat files:

1. Read all meta info by scanning all pages
2. Read data

How do we do that efficiently? Pseudocode below. Ref: sas7bdat spec

Pass #1

Read header as usual.  Starting position SP = 1024 or 8192
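Loop over pages, starting at SP and advancing by page length each time
  Read the first 18 bytes (32-bit file) or 34 bytes (64-bit file) of the page from disk
  Parse page type (bytes 17-18 or 33-34 of the page)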
  If page type is META, MIX, or AMD 
    Read the rest of page from disk (seek to beginning of page & read page length bytes)
    Process page
  End
  If it is the first MIX or DATA page
    Remember file position of the start of this page (pass #2 starts here)
  End
End
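
A rough Python sketch of pass #1, for illustration only. Assumptions not stated above: the file is little-endian, the page type is a signed 16-bit integer, and the numeric page type codes (META = 0, DATA = 256, MIX = 512 or 640, AMD = 1024) are the ones commonly documented for sas7bdat. The function name `scan_meta_pages` and the `process_page` callback are hypothetical, and `header_length`, `page_length`, and `page_count` are assumed to have been read from the header already.

```python
import struct

# Page type codes as commonly documented for sas7bdat (assumed, not from this page).
PAGE_META = {0}
PAGE_DATA = {256}
PAGE_MIX = {512, 640}
PAGE_AMD = {1024}
NEEDS_PROCESSING = PAGE_META | PAGE_MIX | PAGE_AMD
HOLDS_DATA = PAGE_MIX | PAGE_DATA


def scan_meta_pages(f, header_length, page_length, page_count, is_64bit, process_page):
    """Pass 1: visit every page, fully read only those that carry meta info.

    Returns the file position of the first MIX or DATA page, or None if
    no such page was found.
    """
    # Page type sits at 0-based offset 16 (32-bit) or 32 (64-bit),
    # i.e. bytes 17-18 or 33-34 when counting from 1.
    type_offset = 32 if is_64bit else 16
    first_data_pos = None

    for i in range(page_count):
        page_start = header_length + i * page_length
        f.seek(page_start)
        prefix = f.read(type_offset + 2)  # 18 or 34 bytes
        if len(prefix) < type_offset + 2:
            break  # truncated file

        # Assumes little-endian byte order.
        (page_type,) = struct.unpack_from("<h", prefix, type_offset)

        if page_type in NEEDS_PROCESSING:
            # Seek back to the start of the page and read all of it.
            f.seek(page_start)
            process_page(page_type, f.read(page_length))

        if first_data_pos is None and page_type in HOLDS_DATA:
            first_data_pos = page_start

    return first_data_pos
```

The point of reading only the short prefix is that pure DATA pages are skipped without pulling the whole page from disk; only pages that may hold meta info are read in full.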

Pass #2

Seek to the file position remembered in pass #1
Loop over the remaining pages
  Read page and parse page
End
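
Pass #2 could be sketched along the same lines. This assumes the position remembered in pass #1 is passed in as `start_pos`; `read_data_pages` and the `parse_page` callback are hypothetical names, and the loop simply stops at the first incomplete page.

```python
def read_data_pages(f, start_pos, page_length, parse_page):
    """Pass 2: read whole pages sequentially from start_pos to end of file.

    Returns the number of pages handed to parse_page.
    """
    f.seek(start_pos)
    pages_read = 0
    while True:
        page = f.read(page_length)
        if len(page) < page_length:
            break  # end of file (or a truncated trailing page)
        parse_page(page)
        pages_read += 1
    return pages_read
```

Because pass #2 starts at the first MIX or DATA page, the leading run of META pages already handled in pass #1 is not read a second time.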