WIP: Page Processing Design
Tom Kwong edited this page May 12, 2018 · 1 revision
Issue #44 made it clear that reading sas7bdat files requires a two-pass approach:
- Pass 1: scan all pages and read all metadata
- Pass 2: read the data
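Scanning every page in the first pass stays cheap only if each page can be classified from a short prefix, so that just the metadata-bearing pages are read in full. The Python sketch below decodes that page type. It assumes the file's bitness and byte order are already known from the header, and the numeric codes for META, DATA, MIX and AMD are assumptions taken from the community sas7bdat spec, not from SASLib's own code.

```python
import struct

# Page type codes as listed in the community sas7bdat spec (assumed values;
# verify against the spec before relying on them).
PAGE_META = 0
PAGE_DATA = 256
PAGE_MIX = (512, 640)  # assumption: both codes are treated as MIX pages
PAGE_AMD = 1024


def page_type(prefix: bytes, is_64bit: bool, little_endian: bool) -> int:
    """Decode the signed 16-bit page type from the first bytes of a page.

    The type sits at 0-based offset 16 (32-bit files) or 32 (64-bit files),
    i.e. bytes 17-18 or 33-34 counting from 1.
    """
    offset = 32 if is_64bit else 16
    fmt = ("<" if little_endian else ">") + "h"
    return struct.unpack_from(fmt, prefix, offset)[0]


def page_kind(ptype: int) -> str:
    """Map a raw page type code to a coarse category."""
    if ptype == PAGE_META:
        return "META"
    if ptype == PAGE_DATA:
        return "DATA"
    if ptype in PAGE_MIX:
        return "MIX"
    if ptype == PAGE_AMD:
        return "AMD"
    return "OTHER"


def has_metadata(ptype: int) -> bool:
    """Pass 1 only needs the full page for META, MIX and AMD pages."""
    return page_kind(ptype) in ("META", "MIX", "AMD")
```

For example, a 32-bit little-endian page whose bytes 17-18 encode 512 classifies as MIX, so the first pass would read it in full, while a DATA page (256) needs only its 18-byte prefix.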
How can we do this efficiently? The pseudocode below outlines one approach (ref: sas7bdat spec).
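Read header as usual. Starting position SP = header length (1024 or 8192)
Loop over all pages (the page count is given in the header)
    Read the first 18 or 34 bytes of the page from disk (32-bit or 64-bit file)
    Parse page type (bytes 17-18 or 33-34, counting from 1)
    If page type is META, MIX, or AMD
        Read the whole page from disk (seek to beginning of page & read page length bytes)
        Process page (collect metadata)
    End
    If it is the first MIX or DATA page
        Remember its file position (beginning of the page)
    End
    SP = SP + page length; seek to SP
End
Seek to the remembered file position
Loop over the remaining pages
    Read page and parse page (extract rows from DATA and MIX pages)
End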
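The two-pass scan above can be sketched in Python against any seekable binary file. This is a minimal sketch, not SASLib's implementation: it assumes the header has already been parsed into `header_length`, `page_length`, `page_count`, bitness and byte order, the page type codes are assumptions taken from the sas7bdat spec, and `process_meta` / `process_data` are hypothetical callbacks standing in for the real subheader and row parsing.

```python
import struct
from typing import BinaryIO, Callable, Optional

# Page type codes assumed from the community sas7bdat spec; verify before use.
META, DATA, AMD = 0, 256, 1024
MIX = (512, 640)  # assumption: both codes are treated as MIX pages


def two_pass_read(
    f: BinaryIO,
    header_length: int,  # e.g. 1024 or 8192, from the already-parsed header
    page_length: int,
    page_count: int,
    is_64bit: bool,
    little_endian: bool,
    process_meta: Callable[[bytes], None],  # hypothetical metadata handler
    process_data: Callable[[bytes], None],  # hypothetical row extractor
) -> Optional[int]:
    """Scan pages twice; return the offset where pass 2 began (None if no data pages)."""
    type_offset = 32 if is_64bit else 16
    prefix_len = type_offset + 2  # 18 or 34 bytes
    fmt = ("<" if little_endian else ">") + "h"  # signed 16-bit page type
    first_data_index: Optional[int] = None

    # Pass 1: peek at every page; read the whole page only if it carries metadata.
    for i in range(page_count):
        pos = header_length + i * page_length
        f.seek(pos)
        ptype = struct.unpack_from(fmt, f.read(prefix_len), type_offset)[0]
        if ptype in (META, AMD) or ptype in MIX:
            f.seek(pos)  # back to the beginning of the page
            process_meta(f.read(page_length))
        if first_data_index is None and (ptype == DATA or ptype in MIX):
            first_data_index = i  # first page that can hold rows

    if first_data_index is None:
        return None

    # Pass 2: all metadata is now known, so rows can be decoded from here on.
    start = header_length + first_data_index * page_length
    f.seek(start)
    for _ in range(first_data_index, page_count):
        page = f.read(page_length)
        ptype = struct.unpack_from(fmt, page, type_offset)[0]
        if ptype == DATA or ptype in MIX:
            process_data(page)  # META/AMD pages in this range are skipped
    return start
```

Reading only an 18- or 34-byte prefix of DATA pages keeps pass 1 cheap, at the cost of one extra seek per metadata page. Recording the index of the first DATA or MIX page lets pass 2 skip any leading META pages without re-reading them.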