read-write lookup offline-memory-checking logUp-based implementation in (Super)Nova R1CS #48
base: dev
Conversation
Thank you so much, this looks very, very nice!
I left a few comments on the Rust, which should help further review.
Hi @huitseeker, thanks for the review :)
If you rebase on top of #76 you should get better results with CI!
First off, great work on getting this type of extension working on top of Nova! I've looked through your hackmd and the code, here are some thoughts. I've got a few code-related comments coming later too.

### Soundness

Both the lookup gadget and the logic behind the lookup SNARK look good to me. I found the latter a bit confusing though, do you have any spec you're basing this off of? What does the process look like for finalizing the proof?

It seems like a good idea to have some separation between both proofs to ensure users can opt out of lookups, but there are also some missed opportunities for batching and minimizing the number of verifier hashes.

### Cost estimation

It would be great to get an estimate of the number of constraints/variables required per read and write operation in each folding step. Depending on the context (read vs. r/w, table size, number of accesses), it may be cheaper to consider the "vanilla" approach of performing these accesses solely with R1CS constraints.

### Compressing SNARK

The Quarks approach for proving a grand product of $n$ values requires committing to a polynomial with $2n$ evaluations. The polynomial also needs to be opened at 2 different points, which can double the cost of the PCS opening. In the case of ZeroMorph, the reduction to univariate polynomials needs to be performed twice, doubling the opening work.

In contrast, when the proof system has a mechanism for accessing "shifted" values or "rotations" of a vector, the grand product argument from PlonK can be used, which only requires the commitment to a single column of length $n$. With the introduction of ZeroMorph, it would be possible to leverage this new PCS to implement the more efficient variant of product-check.

### Witness generation

From a user perspective, using lookups in this way means that the entire witness trace needs to be precomputed ahead of time, since the challenges depend on the full trace. This is fine for some applications, but can be impractical for others. It would be good to note what type of computations this SNARK is well-suited for. I think your example is a good one: an algorithm defined in terms of a loop, where the body has access to an array. In contrast to a non-folding-based circuit, the complexity of the compressing SNARK only depends on the size of the body.

### Different techniques

Efficient lookup arguments will most likely all require modifications to some parts of the proof system. There are also some proposals (such as Origami and the LogUp variant in Protostar) that integrate lookups directly as part of the folding. The main issue is that they require the recursive folding verifier to perform more scalar multiplications. With an implementation of CycleFold, this may have a negligible effect on the recursive circuit, and may be competitive compared to the number of Poseidon hashes performed with the approach in this PR.

What your work shows though, is that the RAM/ROM abstraction is far more useful than fixed lookups. A recent paper surveys the different approaches to memory-consistency, and highlights how different proof systems implement this abstraction. It would be very neat to see these adapted to Nova, to get some intuition about the different performance profiles of the techniques. In any case, being able to benchmark against an existing implementation will be very useful to figure out what approach works best!
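To make the comparison concrete, here is a sketch of the two grand-product relations for a vector $v$ of length $n$ (notation mine, not from the PR):

$$\text{Quarks:}\quad \hat f(0,\vec x)=\hat v(\vec x),\qquad \hat f(1,\vec x)=\hat f(\vec x,0)\cdot\hat f(\vec x,1)$$

so the committed tree polynomial $\hat f$ has $2n$ evaluations and must be opened at the two related points of the constraint, whereas

$$\text{PlonK:}\quad z_1=1,\qquad z_{i+1}=z_i\cdot v_i,\qquad z_{n+1}=\prod_{i=1}^{n}v_i$$

needs only the single column $z$, at the price of the proof system exposing the rotation $z_{i+1}$ next to $z_i$.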
Hi @adr1anh, thanks for the insightful feedback! I will try to address your thoughts in comments gradually :)

### Soundness

The specs/flows are based on the hackmd write-up. And yes, the R1CS verification should create the challenges […].

### Computations this SNARK is well-suited for

Actually, this SNARK is especially useful for applications which involve ALL of the scenarios in the step circuit.
Because the current approach is based on a single table, a strategy is: we can place different tables into a single table in a well-defined order, such that tables whose addresses are well-constrained (such as RAM, where the address to read/write is derived from the computation) sit at low memory addresses, while tables whose addresses are supposed to be non-deterministic from the prover (such as range checks) sit at higher addresses. Then we can replace the non-deterministic address from the prover by an offset into the concatenated table (sketched below).
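A minimal sketch of this address-layout idea (all names hypothetical, not from this PR):

```rust
/// Hypothetical layout for the concatenated table: RAM, whose addresses
/// are derived in-circuit, occupies the low address space; fixed tables
/// whose indices come from the prover (e.g. range checks) sit above it.
#[derive(Clone, Copy)]
enum TableKind {
    Ram,     // addresses [0, RAM_SIZE)
    RangeU8, // addresses [RAM_SIZE, RAM_SIZE + 256)
}

const RAM_SIZE: u64 = 1 << 20;

/// Map a per-table index to an address in the single concatenated table.
fn global_addr(kind: TableKind, local_index: u64) -> u64 {
    match kind {
        TableKind::Ram => local_index,
        TableKind::RangeU8 => RAM_SIZE + local_index,
    }
}
```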
### Cost estimation
Given that we concatenate all tables into a single table with separated address spaces:
Here the "vanilla approach" indeed outperforms. But for another fixed lookup, e.g. 64-bit XOR:
Here this lookup approach shows a significant improvement over the vanilla approach. The above estimates show the current lookup approach is fruitful on R1CS for applications with a complex step circuit, e.g. a zkVM.
### Witness && memory overhead

For this LookupSNARK, the witness for the lookup needs to be known in advance, outside of the circuit, which means there needs to be an external witness generator. An external witness generator is very common in zkVM applications: first execute the program on a fast x86 machine, then hand the trace to the zk proving system to generate the proof. It's a good idea to document this somewhere, or maybe let a unit test demonstrate it?
I am not sure about this part. I think the linear amount of storage is not due to this LookupSNARK design; rather, it's due to Nova's non-deterministic step circuit design, so we need to instantiate the immutable step circuits in advance. Each step circuit only holds the non-deterministic input for its round, so the size remains constant. @adr1anh, would you mind explaining more about the storage being linear in the number of folding steps?

### New approach & techniques

In this PR, the product-sum argument is leveraged on the old […].

Update: with the logUp approach implemented in MemorySumcheck, we can reduce the 2 challenges alpha/gamma into just one, and so reduce one more Nova state variable, at an extra cost of just +1 constraint and +1 variable per term (see the sketch below). The verifier cost of the final SNARK will be amortized for applications with massive folding steps. Comparing with existing SOTA lookup attempts on R1CS Nova, I think this first attempt of […].
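The "+1 constraint + 1 variable" reads to me as the standard inverse-witness trick for a logUp term: instead of computing the inverse in-circuit, the prover supplies it and a single constraint checks it. A hypothetical bellpepper-style sketch (not this PR's exact code):

```rust
use bellpepper_core::{num::AllocatedNum, ConstraintSystem, SynthesisError};
use ff::PrimeField;

/// Allocate inv = 1 / (alpha - fp) and enforce inv * (alpha - fp) = 1.
/// One extra variable and one extra R1CS constraint per logUp term.
fn alloc_logup_term<F: PrimeField, CS: ConstraintSystem<F>>(
    mut cs: CS,
    alpha: &AllocatedNum<F>,
    fp: &AllocatedNum<F>,
) -> Result<AllocatedNum<F>, SynthesisError> {
    let inv = AllocatedNum::alloc(cs.namespace(|| "inv"), || {
        let denom = alpha.get_value().ok_or(SynthesisError::AssignmentMissing)?
            - fp.get_value().ok_or(SynthesisError::AssignmentMissing)?;
        Option::from(denom.invert()).ok_or(SynthesisError::DivisionByZero)
    })?;
    cs.enforce(
        || "inv * (alpha - fp) = 1",
        |lc| lc + inv.get_variable(),
        |lc| lc + alpha.get_variable() - fp.get_variable(),
        |lc| lc + CS::one(),
    );
    Ok(inv)
}
```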
Updated the PR with the below change summary: […]
So with the above optimized refactor, my lookup approach in folding will be extremely low cost, since in folding everything works on field values without touching the challenge circuit; the challenge check is deferred to the final SNARK. I guess this should be the ultimate solution 😄? Since so far I do not have a clear idea of how to implement the deferred challenge check in the final CompressedSNARK R1CS, it would be good to get some input from you.
Thanks for this great write-up and PoC @hero78119! This approach seems very promising for zkVMs where the execution trace is done in advance, followed by the ZKP work. This all boils down to: how do we know that the prover did not cheat and change […]?

### Removing challenge Poseidon Circuit (costs, benefits)

I do not follow all of the details, specifically in your last checkbox above, but having access to […]. This seems like a very good first version. How do we make this more modular?
But keeping the hash allows devs to add more values into it in a compressed form: hash(…).

For pseudo-IVC, the prover could pre-commit to values at possible "stop points", representing what the hash should be at those points (or public values); if the IVC circuit is stopped at a break point, it could run the SNARK needed for the proof. This would work something like Risc0 continuations. While completely getting rid of the costs of the hash by removing the challenge Poseidon circuit, it might be helpful to think in terms of an execution trace, what values are public, and/or prover pre-commitments to possible stop points. In the case where the full execution trace is not known, or is not needed, a user could decide to only do these steps where lookups are used. MLE-structured tables (F.3.1) would be used instead. It would really be cool if we get this and also have great modularity :)

EDIT: other observations: "Address […]"
> Removing challenge Poseidon Circuit (costs, benefits)
Hi @wyattbenno777, thanks for the feedback. Let me try to expand with more context on the challenge refactor plans. You proposed many good points which are not covered by this PR, as this PR is still a preliminary version with the goals […]
For this lookup implementation, the read_set/write_set accumulation implemented in the folding step circuit is already pretty low cost: just a few field operations. Here is a raw summary: #48 (comment). As you can see, the cost for an RW access is just +15 R1CS variables + 16 R1CS constraints. But when we introduce this lookup approach, there is a cost for deriving the challenges in-circuit. So one optimisation suggested by @adr1anh is: can we seek another low-cost way to accumulate and verify the challenge?
With this optimization, we do not need to encode the Poseidon circuit in the folding step circuit, which somehow achieves the real meaning of […] (see the sketch below).
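A minimal sketch of how I read this deferral (all names hypothetical, not this PR's API): the folding steps carry the challenges as plain field elements, and only the final SNARK verifier re-derives and compares them natively.

```rust
use ff::PrimeField;

/// Hypothetical IVC state fragment: the challenges ride along as plain
/// field elements, so no Poseidon circuit runs inside the folding steps.
struct LookupState<F: PrimeField> {
    alpha: F,     // fingerprint challenge (claimed, unchecked in-circuit)
    gamma: F,     // tuple-combining challenge
    read_acc: F,  // running accumulator over the read set
    write_acc: F, // running accumulator over the write set
}

/// Placeholder for the real Poseidon-based derivation (NOT secure).
fn derive_challenges<F: PrimeField>(trace_digest: F) -> (F, F) {
    (trace_digest.square(), trace_digest.square().square())
}

/// The deferred check, run once and natively by the final SNARK verifier:
/// the claimed challenges must match what the committed trace implies.
fn verify_deferred_challenges<F: PrimeField>(
    trace_digest: F,
    state: &LookupState<F>,
) -> bool {
    let (alpha, gamma) = derive_challenges(trace_digest);
    state.alpha == alpha && state.gamma == gamma
}
```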
Actually, I am not quite sure about this question related to modularity when removing the challenge from the folding circuit (the vision seems bigger), because the challenges in the lookup gadget are assumed to be a black box, with no application-level manipulation expected, so it will be easy to use and resistant to bugs. Hopefully my elaboration of the deferred challenge-integrity check is clear :)
Thank you for that very clear write-up on the proposed LogUp changes @hero78119. That all seems very clearly on track to be a SOTA implementation :) I am messing around with your code a bit in a fork, specifically in the context of memory consistency checks.
I assume it was done this way in research and in Lasso for soundness.
Hi all, and thank you for all the insightful discussion and work! I'd like to explain some thoughts around the construction using logUp, and to try and figure out how to make it more modular. Apologies in advance for the length of this response 😅.

The main point I would like to make though, is that I think we may want to consider doing the audit as a circuit, rather than a specialized SNARK. Even though this may be less efficient, it will allow us to explore these primitives and figure out how they can be applied more generally to zkVM designs. It should also allow us to explore different approaches such as ECMH (described in the Spice paper), which do not depend on challenges.

### Background

In the framework of Blum referenced by @wyattbenno777, we are maintaining two multi-sets: a read set $RS$ and a write set $WS$. The original paper instantiated the multi-sets with hash functions. The scheme can be made more efficient for the verifier by implementing the multi-set abstraction in a more "ZK way" using some newer techniques.

As I understand it, in Lasso we are constructing a set as a grand product of fingerprints,
$$\prod_{(a,v,c)\in S}\big(\alpha - (a + \gamma\, v + \gamma^2\, c)\big).$$
Alternatively, using logUp, we can write the same multi-set as a sum of inverses,
$$\sum_{(a,v,c)\in S}\frac{1}{\alpha - (a + \gamma\, v + \gamma^2\, c)}.$$
Both of these formulations are equivalent, assuming that the challenges $\alpha, \gamma$ are derived only after the multi-sets are fixed. When applied to memory checking, both parties need to derive these challenges from the full access trace. The observation that @hero78119 implemented using logUp, is that we can combine both multi-set accumulators into running sums that are updated at each folding step.

### Description

Init: for each address $a$ with initial value $v_a$, add $(a, v_a, 0)$ to $WS$.
Read $a$: the prover supplies the current $(v, c)$; the circuit adds $(a, v, c)$ to $RS$ and $(a, v, c+1)$ to $WS$.

Write $(a, v')$: the prover supplies the current $(v, c)$; the circuit adds $(a, v, c)$ to $RS$ and $(a, v', c+1)$ to $WS$.

Audit: for each address $a$, add its final $(a, v, c)$ to $RS$; the memory was consistent iff $RS = WS$ as multi-sets (see the sketch below).
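A plain, non-circuit Rust sketch of this bookkeeping, to make the rules concrete (my own illustration, not code from the PR; the in-circuit version replaces the explicit vectors with running fingerprint accumulators):

```rust
use std::collections::HashMap;

/// One multiset entry: (address, value, counter).
type Entry = (u64, u64, u64);

/// Prover-side bookkeeping for offline memory checking.
struct Memory {
    state: HashMap<u64, (u64, u64)>, // addr -> (value, counter)
    rs: Vec<Entry>,                  // read set RS
    ws: Vec<Entry>,                  // write set WS
}

impl Memory {
    fn new() -> Self {
        Self { state: HashMap::new(), rs: Vec::new(), ws: Vec::new() }
    }

    /// Lazy Init: the first touch of an address seeds WS with (addr, 0, 0).
    fn touch(&mut self, addr: u64) -> (u64, u64) {
        if let Some(&vc) = self.state.get(&addr) {
            vc
        } else {
            self.ws.push((addr, 0, 0));
            self.state.insert(addr, (0, 0));
            (0, 0)
        }
    }

    /// Read: retire the current (v, c) into RS, re-insert (v, c+1) into WS.
    fn read(&mut self, addr: u64) -> u64 {
        let (v, c) = self.touch(addr);
        self.rs.push((addr, v, c));
        self.ws.push((addr, v, c + 1));
        self.state.insert(addr, (v, c + 1));
        v
    }

    /// Write: retire the current (v, c) into RS, insert (new_v, c+1) into WS.
    fn write(&mut self, addr: u64, new_v: u64) {
        let (v, c) = self.touch(addr);
        self.rs.push((addr, v, c));
        self.ws.push((addr, new_v, c + 1));
        self.state.insert(addr, (new_v, c + 1));
    }

    /// Audit: absorb the final state into RS; then RS == WS as multisets
    /// iff every read returned the value of the latest write.
    fn audit(mut self) -> bool {
        for (addr, (v, c)) in self.state.drain() {
            self.rs.push((addr, v, c));
        }
        self.rs.sort_unstable();
        self.ws.sort_unstable();
        self.rs == self.ws
    }
}
```

Sorting only stands in for multi-set equality here; the actual protocol compares challenge-derived fingerprints instead.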
### Application to NIVC

The above construction can easily be implemented purely in an NIVC setting. In the first phase, we modify each circuit's IO so that it takes as input the running $RS$/$WS$ accumulators.

The initial input of the program in the first iteration is the empty accumulator state. Each step circuit updates the accumulators according to the reads and writes it performs.

The final SuperNova verifier will need to check that, once the audit circuit has absorbed the final memory state, the $RS$ and $WS$ accumulators are equal.
We can probably find some better set of conditions to ensure uniqueness in the NIVC setting (I'm not yet 100% sure it is sound), but for now the definition of the audit circuit should satisfy our requirements. By implementing the audit entirely in-circuit, we can ensure that the construction is entirely compatible with an existing SuperNova verifier, and it would be a lot cheaper since we do not need any additional Sumcheck or PCS openings to verify the state. It is slightly more expensive in-circuit, but it does bring us closer to a more modular framework as @wyattbenno777 was talking about.

### Using a SNARK for the audit

We need to impose several extra restrictions if we want to use a SNARK-like approach for performing the audit. Firstly, we will need to define a fixed-size RAM where all addresses are integers in a range $[0, N)$. The read and write operations can be defined as before, and we can use the known bound $N$ when committing to the final state. During the audit SNARK, we need to supply 2 committed vectors: the final values $v$ and counters $c$. The Sumcheck relation is then given by the logUp sum over the final state,
$$\sum_{i=0}^{N-1}\frac{1}{\alpha - (i + \gamma\, v_i + \gamma^2\, c_i)}.$$
Here, we exploit the fact that the verifier can easily evaluate the polynomial whose evaluations are the first $N$ integers (the address column). We then need to check that this sum is consistent with the $RS$/$WS$ accumulators produced during folding.

### Removing hashes

Unfortunately, we cannot avoid the hashes that are performed during the read and write operations by the circuit. The main reason has to do with the semantics of the underlying interactive protocol that implements the multi-set we are emulating in-circuit. Essentially, every tuple we add to or remove from the set can be thought of as a value provided by the prover. It therefore appears in the corresponding Fiat-Shamir transcript. In our circuit implementation, we can think of the interaction as follows:
In the circuit, we are exploiting the non-determinism and can supply the prover's messages $(v, c)$ directly as witnesses, with the in-circuit hash chain playing the role of the transcript.

### Soundness

It seems to me that we can easily adapt the existing security proof from the existing literature to the above construction. Though for the sake of discussion, we can try to understand intuitively why it should hold. Before the audit happens, we can enforce the following invariant: by performing the reads and writes inside the circuit, we are ensuring that every update to $RS$ and $WS$ is bound to the transcript. At each step, the tuple retired into $RS$ must match the most recent insertion into $WS$ for that address. Two terms for the same address therefore cancel between the two sets, and any inconsistent access leaves a mismatch that the audit detects.
@adr1anh your summary and terminology formalizing the overall idea is SUPER SUPER x 100 helpful for the rest of us to speak the same math language 💯 💯 Thanks again for such a thorough and detailed post. Based on your formalized terms, I can expand more on what the current PR is doing, so it will be helpful for other audiences.

### Background & Application in (N)IVC

Given the multi-set equality $RS = WS$ that we want to check, we can further expand it, in logUp form, into
$$\sum_{(a,v,c)\in RS}\frac{1}{\alpha - (a + \gamma\, v + \gamma^2\, c)} = \sum_{(a,v,c)\in WS}\frac{1}{\alpha - (a + \gamma\, v + \gamma^2\, c)}.$$

To elaborate each term: $a$ is the address, $v$ the value, $c$ the access counter; $\gamma$ combines a tuple into a single fingerprint and $\alpha$ is the logUp evaluation point.

Currently the PR implements the logUp sumcheck with these sums maintained as running accumulators across folding steps, so each folding step only costs a few field operations per access.

Currently in the audit phase, I am checking that, after the final table state is absorbed, the read-set accumulator equals the write-set accumulator (see the prover-side sketch below).
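To make the accumulator concrete, here is a prover-side sketch of the running logUp sum (a hypothetical helper over the `ff` crate; in-circuit, each term instead allocates its inverse as a witness):

```rust
use ff::Field;

/// Running logUp accumulator over (addr, val, cnt) entries:
///   acc += 1 / (alpha - (addr + gamma*val + gamma^2*cnt))
fn logup_sum<F: Field>(alpha: F, gamma: F, entries: &[(F, F, F)]) -> F {
    let mut acc = F::ZERO;
    for &(addr, val, cnt) in entries {
        // fingerprint in Horner form: addr + gamma*(val + gamma*cnt)
        let fp = addr + gamma * (val + gamma * cnt);
        // with a randomly sampled alpha, a zero denominator occurs only
        // with negligible probability
        acc += Option::<F>::from((alpha - fp).invert()).expect("nonzero");
    }
    acc
}
```

The audit then amounts to comparing `logup_sum` over the read entries plus the final state against the sum over the write entries plus the init entries.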
### Challenge derivation
Even if the two sets are equal, they are constructed in a different order, and therefore the insertions into each set will lead to different hash-chain digests. In the description of the protocol I wrote about, there is a possible optimization that reduces the number of hashes, where we only use a single hash-chain for both sets.
During any read or write, the prover has to supply the current tuple $(v, c)$ non-deterministically; absorbing this supplied tuple once into the chain already binds both the $RS$ and the $WS$ insertion derived from it.
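In symbols (my notation): a single chain absorbs one tuple per access, and the challenges are derived from the final digest,

$$h_{i+1} = H\big(h_i,\ a_i,\ v_i,\ c_i\big),\qquad (\alpha,\gamma) = H'\big(h_{\text{final}}\big),$$

where $H$ is the in-circuit hash and $H'$ the final challenge derivation; the matching $RS$ and $WS$ insertions are both determined by the absorbed tuple together with the circuit's own write value.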
### Hashing using PCS

Adding polynomial commitments to the transcript/hash accumulator would let the challenges be derived from commitments instead of in-circuit hashing. At the moment though, we can only use polynomial commitments alongside a SNARK-like protocol where we can use Sumcheck. Long-term, I think we should be able to incorporate these directly into the arithmetization layer, so that the Poseidon hashes can be removed entirely from the circuit. The main idea is that we allow the R1CS circuit to have additional committed witness columns […]. Both techniques are morally equivalent, yet the PCS approach requires significantly more engineering to be usable. However, it seems like this will be the ultimate way of making the construction as efficient as possible in the long term. Moreover, I think it will lead to a cleaner overall design, where the entirety of the lookup logic is contained in-circuit, which gives an easier way to "opt out".
I don't think this is any more true than with a SNARK, and in fact it is more flexible. With the SNARK approach, we are forced to use a fixed-size memory and the run-time of the prover will always be the size of the array. With the circuit approach, we only need to remove entries that were actually accessed. This allows us to consider a much larger address space, allowing for a mapping with arbitrary keys. Given that the number of unique addresses is variable in each IVC execution, we can write the audit circuit as I did above to be recursive, which allows us to run it an arbitrary number of times, where each iteration can audit a fixed number of entries. From a practical standpoint though, and given how much we still need to figure out about the construction in terms of efficiency and security, I think we should design the construction in such a way that its efficiency can be better analyzed from a more theoretical perspective, which will help guide us towards an "optimal" solution that is also practical to implement.
The problem is that each entry is loaded non-deterministically twice and could be different (once when it is written, and once when it is read). In the honest case, both of these values will be equal, but we must assume this is not the case and protect against it. We need to prevent the case where the value and address being written can be chosen non-deterministically, as they could then be chosen in a way that depends on the challenges. It may technically be possible to argue that the entry added to the write set […]. As I'm writing this though, I think we might be able to argue that adding […]
Squashed commits:

- add rwlookup test: heaplify
- add pre-computed challenge mechanism
- fix type mutabble in bench
- make clippy happy
- optimise constraints
- wip add lookupsnark
- reformat constraints scope to immutable stepcircuit
- lookupsnark test pass
- fix clippy
- sumcheck back to private module
- rollback sumcheck under ppsnark module to private
- enable defering challenge check
- separate permutation fingerprint challenges to alpha and gamma
- chores: mutability, typo and better rust syntax
- chores: use let-else instead
- cleanup map_aux_dirty
- read_row/write_row should be pass to verify func as arguments
- chores: code cosmetics
- implement intoiterator for lookup
- chores: fix let-else usage
- chores: remove tad pedantic usage
- optimize lookuprtace rw_trace processing
- tmp having RelaxedR1CSSNARKV2 for support table lookup efficiently
- implemenet compressedsnarkv2
- implement LogUp approach and padding
I am rebasing this here and will make a PR in the next few days. I also moved/renamed some things into their own files so we do not have a bunch of V2 in one long file.
Rebased and PRed here.
Implementation idea derived from the "Lookup Argument in Nova" tricks: permutation checking without re-ordering.
This idea is a demo, aiming to bridge the last mile of Nova R1CS to handle the below scenarios (well known for their high constraint cost in R1CS):
at pretty low cost (compared with the Merkle tree approach; maybe SOTA?) per incremental computation (folding step): a few field arithmetic operations + one random oracle hash for the challenge, agnostic to the lookup table size |T|! At the end, just invoke a product sumcheck, which can be integrated naturally into CompressionSnark together with the other claims.
The test case shows heapify on an array to build a min-heap. The array is represented as a table supporting read/write. There will be O(N) rounds of folding, where N is the size of the array. In folding step i, there are 3 reads (parent node, 2 child nodes) and 4 writes (update nodes). This case simply shows that a table write in folding step i can be read later in folding step j, for j > i.
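For illustration, here is how one folding step of the heapify example might drive a read/write table (the `LookupTable` API is a hypothetical stand-in, not this PR's gadget, and the sketch is simplified to two writes instead of the test's four):

```rust
/// Stand-in for the PR's read/write lookup gadget (illustrative only).
struct LookupTable {
    data: Vec<u64>,
}

impl LookupTable {
    fn read(&self, addr: usize) -> u64 {
        // the real gadget also pushes (addr, val, cnt) into the RS/WS accumulators
        self.data[addr]
    }
    fn write(&mut self, addr: usize, val: u64) {
        // the real gadget retires the old entry into RS and adds the new one to WS
        self.data[addr] = val;
    }
}

/// One simplified sift-down step at node `i` of a min-heap: 3 reads,
/// then unconditional writes so every folding step has the same shape.
fn heapify_step(table: &mut LookupTable, i: usize) {
    let parent = table.read(i);
    let left = table.read(2 * i + 1);
    let right = table.read(2 * i + 2);

    let (child_idx, child_val) =
        if left <= right { (2 * i + 1, left) } else { (2 * i + 2, right) };

    if child_val < parent {
        table.write(i, child_val);
        table.write(child_idx, parent);
    } else {
        table.write(i, parent); // no-op writes keep the trace uniform
        table.write(child_idx, child_val);
    }
}
```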
R1CS constraint cost of an rw lookup:
Table read/write is wrapped into a gadget so it can be easily integrated into other applications. The gadget is supposed to be used in the non-deterministic primary circuit, so there are other changes involved; basically, the primary circuit is set as a mutable reference so the lookup gadget can be updated and fed the non-deterministic aux witness easily.

The PR is still in draft with the following pending tasks (excuse the bad coding style at this early stage :P):
- [ ] `prove`/`decompress` in `verify` got different result

Will keep the work ongoing; any review and feedback is welcome and appreciated :)