Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Bittide.ObliviousFifo #299

Draft
wants to merge 2 commits into
base: staging
Choose a base branch
from
Draft

Add Bittide.ObliviousFifo #299

wants to merge 2 commits into from

Conversation

kleinreact
Copy link
Contributor

This FIFO combines Clash.Cores.Xilinx.DcFifo.dcFifo with Bittide.Counter.domainDiffCounter to create an oblivious FIFO that supports a much larger virtual data counter than the number of elements that it can actually hold. The FIFO is able to hold 2^n-1 virtual elements and 2^m-1 real data elements, where n can be much larger than m. When its real size m is exceeded, then the FIFO does not loose the frames, but only forgets about their data. This means, that the frames are still counted, "traverse" the FIFO and are output at the other end in the same order as they are written to the component. The only difference is that frames, which are written to the FIFO, when its real capacity is reached, are output with a lost status instead.

Note: This implementation is completely untested and currently only serves as a "proof-of-concept".

Copy link
Contributor

@martijnbastiaan martijnbastiaan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I'm not sure about this. This seems to "stack" complexity instead of composing it. What I had in mind was the following reset strategy (i.e., after each step the reset of the next step is deasserted):

  1. Clock board initialization
  2. Transceiver initialization
  3. Counters + clock control
  4. DcFifo and its control logic
  5. ....

This seems to squash together (3 + 4), though not exactly everything. I guess my question is: what is the advantage of this over "just" keeping them separate?

bittide/src/Bittide/ObliviousFifo.hs Outdated Show resolved Hide resolved
bittide/src/Bittide/ObliviousFifo.hs Outdated Show resolved Hide resolved
@kleinreact kleinreact force-pushed the oblivious-fifo branch 4 times, most recently from 15e137d to c880d9c Compare June 12, 2024 10:44
Copy link
Contributor

@martijnbastiaan martijnbastiaan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've got trouble parsing the implementation, I thought the whole point of the oblivious FIFO was to eliminate the separate domainDiffCounter. This just seems to move it, which I don't really see the point of.

If we want to fold it up into one FIFO, couldn't we:

  • Count the write-on-full transactions on the write side
  • Count the read-on-empty transactions on the read side

Then:

  • On the write side
    • full: up counter
    • not full and counter != 0: decrease counter, write garbage
    • otherwise: write valid frame
  • On the read side:
    • empty: up counter
    • not empty and counter != 0: decrease counter, report garbage
    • otherwise: read word (might be garbage if write side indicated so)

If we then want to know the virtual data counter on the read side then: count reported by FIFO - read_on_empty_counter + synced_write_on_full_counter. Where synced_write_on_full_counter is periodically synchronized from the write side by a handshake.

bittide/src/Bittide/ObliviousFifo.hs Outdated Show resolved Hide resolved
bittide/src/Bittide/ObliviousFifo.hs Outdated Show resolved Hide resolved
Comment on lines +97 to +37
-- | This FIFO combines 'Clash.Cores.Xilinx.DcFifo.dcFifo' with
-- 'Bittide.Counter.domainDiffCounter' to create an oblivious FIFO
-- that supports a much larger _virtual_ data counter than the number
-- of elements that it can actually hold. The FIFO is able to hold
-- @2^n-1@ virtual elements and @2^m-1@ real data elements, where @n@
-- can be much larger than @m@. When its real size @m@ is exceeded,
-- then the FIFO does not loose the frames, but only forgets about
-- their data. This means, that the frames are still counted,
-- "traverse" the FIFO and are output at the other end in the same
-- order as they are written to the component. The only difference is
-- that frames, which are written to the FIFO, when its real capacity
-- is already utilized, are output with a @lost@ status instead.
Copy link
Contributor

@martijnbastiaan martijnbastiaan Jun 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Edit: In retrospect this documentation might just be hard because I don't understand the point of this implementation, see main PR comment.

I found it hard to read this documentation. It's hard to put a finger on what exactly is causing this difficulty but I think it is because:

  • It assumes the reader is familiar with both dcFifo and domainDiffCounter
  • It assumes it's "obvious" how dcFifo and domainDiffCounter would marry so-to-speak
  • It introduces the name "oblivious" without explaining what exactly is oblivious about it
  • It is quick to introduce very exact notation (mentioning type variables)

I think it should be restructured such that:

$what

$why

$details

E.g.,

Like a normal dual clock FIFO, but instead of dropping a write transaction on a write-on-full it inserts a "lost" (garbage) word instead. It does this without dropping any existing data. Lost words are kept track of using a counter, so the virtual size of the FIFO can grow arbitrarily large.

This is used within Bittide to....

Mention n, m, ...

IMO it's worth renaming to noDropFifo or something like that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

noDropFifo also sounds misleading to me, as this FIFO still drops the data, but keeps the frame skeleton in the right order.

No doubt, however, naming the functionality of this thing turns out to be hard 🙃

bittide/src/Bittide/ObliviousFifo.hs Outdated Show resolved Hide resolved
bittide/src/Bittide/ObliviousFifo.hs Outdated Show resolved Hide resolved
bittide/src/Bittide/ObliviousFifo.hs Outdated Show resolved Hide resolved
@kleinreact
Copy link
Contributor Author

kleinreact commented Jun 27, 2024

I've got trouble parsing the implementation, I thought the whole point of the oblivious FIFO was to eliminate the separate domainDiffCounter. This just seems to move it, which I don't really see the point of.

No, the idea behind the oblivious FIFO is that it elides the need for a detection mechanism figuring out the point in time at which the real buffer, i.e. the dcFifo, remains within the limits. It also does not need the assumption that there is a point in time at which the buffer stays within these limits forever, because the oblivious FIFO allows for running out of the real limits without affecting clock control at any point in time (not talking about the virtual limits here).

Where synced_write_on_full_counter is periodically synchronized from the write side by a handshake.

But with the proposed implementation the value would not be cycle accurate. With the current implementation it is.


Please also note that the PR still is WIP and by no means complete. Nevertheless, thanks a lot for the early review comments.

@martijnbastiaan
Copy link
Contributor

No, the idea behind the oblivious FIFO is that it elides the need for a detection mechanism figuring out the point in time at which the real buffer, i.e. the dcFifo, remains within the limits.

Ah ok, then I think I might have been conflating multiple ideas. What would a "detection mechanism" look like otherwise? Isn't it just observing the out-of-bound signal from the FIFO on either end?

But with the proposed implementation the value would not be cycle accurate. With the current implementation it is.

Does this matter at all? A handshake takes between ~10-20 cycles (depending on settings), while the clock offsets are measured in parts per million. I.e., for a clock offset of 25 ppm we expect to do ~2000-4000 handshakes before measuring a single tick too much or tock too little.

I also don't really understand the idea on why the current implementation would be "cycle accurate" btw, but I'll have to study the current algo more.

Please also note that the PR still is WIP and by no means complete. Nevertheless, thanks a lot for the early review comments.

Yes, noted. I saw some activity on it, and we mentioned pulling it from the shelve again a few weeks ago so I figure I'd take another look.

@martijnbastiaan martijnbastiaan self-requested a review June 28, 2024 09:57
@martijnbastiaan
Copy link
Contributor

Dismissing my review as the changes requested was more for dropping the BitPack instance. I'll have to revisit on the general idea of the oblivious FIFO as I still don't fully grasp its purpose (which is not a critique of the author / implementation).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants