Robust, replayable proofs using unsat cores, (aka, hints, or how to replay verification in milliseconds instead of minutes)

In order to speed up verification replay times, F* is capable of recording Z3 unsat-cores. Unsat-cores contain a minimal set of facts that allow Z3 to prove unsat. By pruning the context and just keeping the unsat-core, verification can be made much faster, because Z3 doesn't see irrelevant facts.

However, although Z3 guarantees that an unsat core that it reports is logically unsatisfiable, it does not guarantee that it will be able to prove that an unsat core is indeed unsatisfiable, since such a proof may depend on heuristics triggered by facts that are not strictly part of the unsat core.

The mechanism works as follows:

if the file type-checks, then fstar --record_hints foo.fst writes into foo.fst.hints an unsat-core for every proof obligation that was successfully verified;
fstar --use_hints foo.fst first tries to use the hints (if any); if the hint does not result in a successful verification, then the verification condition is fed again to Z3, without any hints.

The two flags can be combined. A third flag, --hint_info, will tell you if the hints were successful or not. If the hints file is too stale, then it is time to regenerate it.

A fourth related flag --detail_hint_replay can help you diagnose unsat cores that Z3 fails to prove unsatisfiable.

Using hints in interactive mode

I use, in my ~/.emacs:

(setq fstar-subp-prover-args (lambda () `(
  "--use_hints" "--record_hints"
)))

The other technique is to have your Makefile generate the F* flags for you and use that to fill fstar-subp-prover-args, see https://github.com/mitls/mitls-fstar/#configuring-emacs-mode for examples.

However, as of 20170718, it looks like F* can only read hints in interactive mode, not write them. So, remember to refresh hints from the command-line once in a while.

When working in F* itself

Usually, make OTHERFLAGS=--record_hints is a good way to regenerate hints in the current directory. To regenerate all hints (it's good practice): cd src && make utest OTHERFLAGS=--record_hints

Achieving robust & resilient verification using hints

Having successful hint replay is a prerequisite to ensure that your file will continue to verify in the face of changes to F* and Z3. To check the health of a given file, make sure you record hints, then run F* again, passing it --use_hints --hint_info. If you get a message along the lines of:

(C:/Build/Agent1/_work/3/s/code/experimental/aesgcm/Crypto.Symmetric.GF128.fst(170,0-183,5))	Query-stats (Crypto.Symmetric.GF128.gf128_mul_loop, 3)	failed {reason-unknown=unknown because (incomplete quantifiers)} (with hint) in 171 milliseconds with fuel 1 and ifuel 0 and rlimit 21786240

(Hint-replay failed): C:/Build/Agent1/_work/3/s/code/experimental/aesgcm/Crypto.Symmetric.GF128.fst(168,43-168,63): (Info) could not prove post-condition

then your proof is not robust and may start failing as F* and Z3 evolve.

The first line of the message tells you that for the top-level term at Crypto.Symmetric.GF128.fst(170,0-183,5), specifically Crypto.Symmetric.GF128.gf128_mul_loop, the third proof obligation (that's the 3), F* tried to replay a previously discovered unsat core (that's the with hint) but failed because the SMT solver said unknown giving the reason incomplete quantifiers. This is a very common reason: for any problem with quantifiers (i.e., all F* problems) the SMT solver is inherently incomplete. Occasionally, the SMT solver may also say things like incomplete (theory arithmetic), if the proof requires non-linear arithmetic etc.

The next line of the message is not guaranteed to be present, but it should usually be there. Among all the properties that needed to be proved about Crypto.Symmetric.GF128.gf128_mul_loop, in this case only one property caused the unsat core to be unreplayable. In particular, the proof of the post-condition specified at Crypto.Symmetric.GF128.fst(168,43-168,63) is the culprit that's making your entire proof brittle.

At this point, you can provide a more detailed proof in F* for just those brittle parts, helping the SMT solver find a better unsat core from which it can easily rediscover a proof of unsatisfiability. Or, if you're an expert and comfortable working at the level of the smt2 files emitted by F* to Z3, you can try to understand what it is about those particular goals that are causing Z3 to find a brittle, unreplayable proof and fix the root cause of brittleness once and for all. This is not always easy. It requires a manual analysis of the libraries and their lemmas that your code is relying on to figure out which lemmas are stated in a brittle manner. An example of this is at the end of this page.

Occasionally, F* will not be able to report the the second Hint-replay failed diagnostic. This may happen when the SMT solver simply times out without providing F* with enough diagnostics to reconstruct an error message.

When this happens (hopefully not too often), there are two ways to proceed:

Run F* again, but this time with the option --detail_hint_replay

Here is a sample run of this feature.

$ fstar testhintreplay.fst --use_hints --detail_hint_replay
Detailed hint replay report follows for .\testhintreplay.fst(2,0-5,41)
Taking 5 seconds per proof obligation (3 proofs in total)
3, 2, 1,
OK: proof obligation at .\testhintreplay.fst(3,9-3,21) was proven
.\testhintreplay.fst(4,2-4,8): (Warning) Hint failed to replay this sub-proof: assertion failed (see also .\testhintreplay.fst(4,9-4,37))
.\testhintreplay.fst(5,2-5,8): (Warning) Hint failed to replay this sub-proof: assertion failed (see also .\testhintreplay.fst(5,9-5,41))
Verified module: TestHintReplay (569 milliseconds)
All verification conditions discharged successfully

This tells you that there were 3 sub-proofs involved in the top-level term at .\testhintreplay.fst(2,0-5,41).

The first sub-proof was easily rediscovered from the unsat core, but the two others failed.

If even --detail_hint_replay doesn't give you the information you need, then you can proceed manually

The --detail_hint_replay feature is relatively new (July 19th, 2017).

Prior to this feature, finding the sub-proof that was failing was very manual. The description of this manual process does have many useful insights, in particular, what one can try to do after having isolated the sub-proof that is brittle.

The steps are, roughly:

do a binary search on the post-condition to figure out which sub-term of the post-condition cannot be replayed, e.g. if you find that turning (ensures (a /\ b)) into (ensures a) makes the hint replayable, then you need to provide a better proof for b
once b has been identified, call extra lemmas to provide intermediary proof steps.

Generally speaking, a hint that fails to replay indicates some poorly-understood triggering behavior. At this stage, Profiling Z3 queries is what you need.

Analyzing one file manually is fine, but analyzing the build logs for an entire project is better done with a tool. One can use query-stats.py, located in the .scripts directory of F*. Some sample invocations:

# List all hints that failed to replay
    $FSTAR_HOME/.scripts/query-stats.py -f my_log_file \
      -n all --filter=fstar_usedhints=+ \
      --filter=fstar_tag=-
# List the 10 queries that took the longest time to verify along with some statistics
    $FSTAR_HOME/.scripts/query-stats.py -f my_log_file \
      -c -g -n 10

https://github.com/FStarLang/FStar/issues/900 may contain some useful discussion

An example analysis of the root cause of instability

An interesting source of brittleness was discovered in an old version of FStar.HyperStack.ST.

It causes even simple proofs like this to produce an unreplayable unsat core. This meant that any program making non-trivial use of FStar.HyperStack.ST's StackInline effect would have a brittle proof.

open FStar.HyperStack 
open FStar.HyperStack.ST
let test (h0:mem) (h1:mem) (h2:mem) = 
  assume (inline_stack_inv h0 h1);
  assume (inline_stack_inv h1 h2);  
  assert (inline_stack_inv h0 h2)

If you're writing libraries and making use of quantifiers then it's really worth your time to understand the causes of this kind of brittleness.

Here's the reason why the transitivity proof fails to replay.

This is the definition on inline_stack_inv:

  (* The frame invariant is enforced *)
  h.tip = h'.tip
  (* The heap structure is unchanged *)
  /\ Map.domain h.h == Map.domain h'.h
  (* Any region that is not the tip has no seen any allocation *)
  /\ (forall (r:HH.rid). {:pattern (Map.contains h.h r)} (r <> h.tip /\ Map.contains h.h r)
       ==> Heap.equal_dom (Map.sel h.h r) (Map.sel h'.h r))

The important part is the third clause. If the goal is inline_stack_inv h0 h2 then the goal is skolemized and we have an occurrence of Map.contains h0.h fresh_r

This triggers the quantifier in the context from inline_stack_inv h0 h1 producing an equal_dom (Map.sel h0.h fresh_r) (Map.sel h1.h fresh_r)

But, notice that there's no term Map.contains h1.h fresh_r term that's produced.

So, now we're stuck: there's no way to trigger the quantifier in inline_stack_inv h1 h2

Why the proof succeeds at all the first time is a mystery. Somehow, there must be some other assertion in the context contributing a Map.contains h1.h fresh_r probably resulting from Map.sel h1.h fresh_r.

Whatever that assertion is, it only contributes to triggering and so is excluded from the unsat core by Z3. Lacking it, the unsat core is unreplayable.

If we redefine inline_stack_inv like so:

  (* The frame invariant is enforced *)
  h.tip = h'.tip
  (* The heap structure is unchanged *)
  /\ Map.domain h.h == Map.domain h'.h
  (* Any region that is not the tip has no seen any allocation *)
  /\ (forall (r:HH.rid). {:pattern (Map.contains h.h r)} (r <> h.tip /\ Map.contains h.h r)
       ==> Heap.equal_dom (Map.sel h.h r) (Map.sel h'.h r) /\
           Map.contains h'.h r)

then all is well (notice the Map.contains h'.h r on the last line.

It's easy to prove this logically equivalent to the existing inline_stack_inv. But this one has more robust triggering behavior.

Notes from a hacking session on making one proof robust when all else fails

This is an example taken from SHA2_512.fst. The following function verifies, but its hints do not replay.

[@"substitute"]
private val update_core:
  hash_w :uint64_p {length hash_w = v size_hash_w} ->
  data   :uint8_p  {length data = v size_block /\ disjoint data hash_w} ->
  data_w :uint64_p {length data_w = v size_block_w /\ disjoint data_w hash_w} ->
  ws_w   :uint64_p {length ws_w = v size_ws_w /\ disjoint ws_w hash_w} ->
  k_w    :uint64_p {length k_w = v size_k_w /\ disjoint k_w hash_w} ->
  Stack unit
        (requires (fun h0 -> live h0 hash_w /\ live h0 data /\ live h0 data_w /\ live h0 ws_w /\ live h0 k_w
                  /\ reveal_h64s (as_seq h0 k_w) == Spec.k
                  /\ (reveal_h64s (as_seq h0 data_w) = Spec.Lib.uint64s_from_be (v size_block_w) (reveal_sbytes (as_seq h0 data)))
                  /\ (let w = reveal_h64s (as_seq h0 ws_w) in
                  let b = reveal_h64s (as_seq h0 data_w) in
                  (forall (i:nat). {:pattern (Seq.index w i)} i < 80 ==> Seq.index w i == Spec.ws b i))))
        (ensures  (fun h0 r h1 -> live h0 hash_w /\ live h0 data /\ live h0 data_w /\ live h1 hash_w /\ modifies_1 hash_w h0 h1
                  /\ (let seq_hash_0 = reveal_h64s (as_seq h0 hash_w) in
                  let seq_hash_1 = reveal_h64s (as_seq h1 hash_w) in
                  let seq_block = reveal_sbytes (as_seq h0 data) in
                  let res = Spec.update seq_hash_0 seq_block in
                  seq_hash_1 == res)))

#reset-options "--z3refresh --max_fuel 0  --z3rlimit 400"

[@"substitute"]
let update_core hash_w data data_w ws_w k_w =
  (**) assert_norm(pow2 32 = 0x100000000);
  (**) assert_norm(pow2 64 = 0x10000000000000000);
  (**) assert_norm(pow2 125 = 42535295865117307932921825928971026432);
  (**) let h0 = ST.get() in

  (* Push a new frame *)
  (**) push_frame();
  (**) let h1 = ST.get() in
  (**) assert(let b = Spec.words_from_be Spec.size_block_w (reveal_sbytes (as_seq h0 data)) in
              reveal_h64s (as_seq h0 data_w) == b);

  (* Allocate space for converting the data block *)
  let hash_0 = Buffer.create (u64_to_h64 0uL) size_hash_w in
  (**) let h2 = ST.get() in
  (**) no_upd_lemma_0 h1 h2 data;
  (**) no_upd_lemma_0 h1 h2 data_w;
  (**) no_upd_lemma_0 h1 h2 ws_w;
  (**) no_upd_lemma_0 h1 h2 k_w;
  (**) no_upd_lemma_0 h1 h2 hash_w;

  (* Keep track of the the current working hash from the state *)
  copy_hash hash_0 hash_w;
  (**) let h3 = ST.get() in
  (**) no_upd_lemma_1 h2 h3 hash_0 data;
  (**) no_upd_lemma_1 h2 h3 hash_0 data_w;
  (**) no_upd_lemma_1 h2 h3 hash_0 ws_w;
  (**) no_upd_lemma_1 h2 h3 hash_0 k_w;
  (**) no_upd_lemma_1 h2 h3 hash_0 hash_w;

  (* Step 2 : Initialize the eight working variables *)
  (* Step 3 : Perform logical operations on the working variables *)
  (* Step 4 : Compute the ith intermediate hash value *)
  shuffle hash_0 data_w ws_w k_w;
  (**) let h4 = ST.get() in
  (**) assert(let b = Spec.words_from_be Spec.size_block_w (reveal_sbytes (as_seq h0 data)) in
              let ha = Spec.shuffle (reveal_h64s (as_seq h0 hash_w)) b in
              as_seq h4 hash_w == as_seq h0 hash_w /\
              reveal_h64s (as_seq h4 hash_0) == ha);
  (**) no_upd_lemma_1 h3 h4 hash_0 data;
  (**) no_upd_lemma_1 h3 h4 hash_0 data_w;
  (**) no_upd_lemma_1 h3 h4 hash_0 ws_w;
  (**) no_upd_lemma_1 h3 h4 hash_0 k_w;
  (**) no_upd_lemma_1 h3 h4 hash_0 hash_w;

  (* Use the previous one to update it inplace *)
  sum_hash hash_w hash_0;
  (**) let h5 = ST.get() in
  (**) assert(let x = reveal_h64s (as_seq h4 hash_w) in
          let y = reveal_h64s (as_seq h4 hash_0) in
          x == reveal_h64s (as_seq h0 hash_w) /\
          y == Spec.shuffle (reveal_h64s (as_seq h0 hash_w)) (Spec.words_from_be Spec.size_block_w (reveal_sbytes (as_seq h0 data))));
  (**) assert(let x = reveal_h64s (as_seq h0 hash_w) in
         let y = Spec.shuffle (reveal_h64s (as_seq h0 hash_w)) (Spec.words_from_be Spec.size_block_w (reveal_sbytes (as_seq h0 data))) in
         let z = reveal_h64s (as_seq h5 hash_w) in
         let z' = Spec.Loops.seq_map2 (fun x y -> FStar.UInt64.(x +%^ y)) x y in
         z == z');
  (**) no_upd_lemma_1 h4 h5 hash_w data;
  (**) no_upd_lemma_1 h4 h5 hash_w data_w;
  (**) no_upd_lemma_1 h4 h5 hash_w ws_w;
  (**) no_upd_lemma_1 h4 h5 hash_w k_w;

  (* Pop the frame *)
  (**) pop_frame()

To figure out that this function's hints do not replay, I ran the following twice (once to record, once to replay):

jonathan@chartreuse:~/Code/hacl-star/code/hash (migrating_sts_bis) $ make Hacl.Hash.SHA2_512.fst-verify OTHERFLAGS="--hint_info --record_hints" 2>&1 | tee log

The second time, I got:

(./Hacl.Hash.SHA2_512.fst(587,0-650,18))	Query-stats (Hacl.Hash.SHA2_512.update_core, 1)	failed (with hint) in 216 milliseconds with fuel 0 and ifuel 1 and rlimit 217862400 
Failed hint:
...

There is a lot of reasoning in this file, to show that the stateful implementation matches the pure specification. These proofs are typically expensive, requiring a lot of Z3 reasoning. Making sure these proofs are replayable is thus important.

The next step is figuring out which part of the post-condition is responsible for non-replayability. I manually edited the file as follows, making everything lax-checked except for the body of the function I was interested in.

#set-options "--lax"
... many definitions ...
val update_core: ...
#reset-options
let update_core = ...
#set-options "--lax"

Running my F* invocation twice again confirmed that the hint was still failing to replay: the culprit was not in the val but, really, in the let.

I started bisecting: I commented out various bits of the post-condition, running my F* invocation twice every time, until I found that the modifies_1 clause was the culprit: just leaving this in the post-condition generates a non-replayable hint.

There are a lot of modification-related lemmas that are being called, but none of them seem to help with the replayability: it is thus time to do a manual proof.

I commented out every single assert in the function, to ensure that Z3 is not working to prove unrelated things, and set the post-condition to True. I switched to interactive mode: the function was now going through in a fraction of a second.

The various liveness buffers are in FStar/ulib/FStar.Buffer.fst. I briefly skimmed the file to refresh my memory, then wrote out intermediary asserts.

  (**) let h0 = ST.get() in

  (* Push a new frame *)
  (**) push_frame();
  (**) let h1 = ST.get() in

  (* Allocate space for converting the data block *)
  let hash_0 = Buffer.create (u64_to_h64 0uL) size_hash_w in
  (**) let h2 = ST.get() in
  assert (modifies_0 h1 h2);

  (* Keep track of the the current working hash from the state *)
  copy_hash hash_0 hash_w;
  (**) let h3 = ST.get() in
  assert (modifies_1 hash_0 h2 h3);

  shuffle hash_0 data_w ws_w k_w;
  (**) let h4 = ST.get() in
  assert (modifies_1 hash_0 h3 h4);

  (* Use the previous one to update it inplace *)
  sum_hash hash_w hash_0;
  (**) let h5 = ST.get() in
  assert (modifies_1 hash_w h4 h5);

  (* Pop the frame *)
  (**) pop_frame();
  let h6 = ST.get () in
  ()

These asserts are easy: I found them by looking at the post-conditions of copy_hash, shuffle and sum_hash. I now have intermediary asserts that relate h1 to h2; h2 to h3; h3 to h4; h4 to h5. Time to use some transitivity lemmas.

Going from h2 to h3 modifies hash_0; going from h3 to h4 modifies hash_0. I can use transitivity of the modifies_1 clause: calling lemma_modifies_1_trans hash_0 h2 h3 h4 allows assert (modifies_1 hash_0 h2 h4) to succeed and now relates h2 to h4.
Relating h2 to h5 requires a different modifies clause, because going from h2 to h4 modifies hash_0 while going from h4 to h5 modifies hash_w. I can combine two modifies_1 into a modifies_2:

  lemma_modifies_1_1 hash_0 hash_w h2 h4 h5;
  assert (modifies_2 hash_0 hash_w h2 h5);

We have now related h2 to h5.

Interestingly, h1 and h5 mark the beginning and the end of the function's frame. Note that hash_0 was allocated in that temporary frame, and dies after the pop_frame. Thus, going from h0 to h6, among all addresses in h0, hash_w is the only buffer that's modified. In other words: modifies_1 hash_w h0 h6.

The lemma that we need, in order to work with the popped frame and conclude modifies_1 hash_w h0 h6 is:

let modifies_popped_1 (#t:Type) (a:buffer t) h0 h1 h2 h3 : Lemma
  (requires (live h0 a /\ fresh_frame h0 h1 /\ popped h2 h3 /\ modifies_2_1 a h1 h2))
  (ensures  (modifies_1 a h0 h3))

This lemma uses a new modifies clause: while modifies_1 b means that, among all buffers in all frames, only b was modified, modifies_2_1 b means that, in b's frame, b is the only modified buffer; all other frames remain unchanged, except for possibly the tip.

This modifies_2_1 clause is just what we need: if the tip was modified in an unspecified manner, and if, among all other frames, b is the only modified buffer, then once we pop the tip, modifies_1 b holds.

  lemma_modifies_0_2' hash_w hash_0 h1 h2 h5;
  assert (modifies_2_1 hash_w h1 h5);

We can kill two birds with one stone and stitch h1 and h5 together, obtaining a modifies_2_1 clause.

The final lemma call is:

  modifies_popped_1 hash_w h0 h1 h5 h6

At this stage, I have a proof of modifies_1 that goes through, in the interactive mode, in a fraction of a second.

  (**) let h0 = ST.get() in

  (* Push a new frame *)
  (**) push_frame();
  (**) let h1 = ST.get() in

  (* Allocate space for converting the data block *)
  let hash_0 = Buffer.create (u64_to_h64 0uL) size_hash_w in
  (**) let h2 = ST.get() in
  assert (modifies_0 h1 h2);

  (* Keep track of the the current working hash from the state *)
  copy_hash hash_0 hash_w;
  (**) let h3 = ST.get() in
  assert (modifies_1 hash_0 h2 h3);

  shuffle hash_0 data_w ws_w k_w;
  (**) let h4 = ST.get() in
  assert (modifies_1 hash_0 h3 h4);
  lemma_modifies_1_trans hash_0 h2 h3 h4;
  assert (modifies_1 hash_0 h2 h4);

  (* Use the previous one to update it inplace *)
  sum_hash hash_w hash_0;
  (**) let h5 = ST.get() in
  assert (modifies_1 hash_w h4 h5);
  lemma_modifies_1_1 hash_0 hash_w h2 h4 h5;
  assert (modifies_2 hash_0 hash_w h2 h5);
  lemma_modifies_0_2' hash_w hash_0 h1 h2 h5;
  assert (modifies_2_1 hash_w h1 h5);

  (* Pop the frame *)
  (**) pop_frame();
  let h6 = ST.get () in
  modifies_popped_1 hash_w h0 h1 h5 h6

Since modifies implies liveness, none of the original no_upd_* lemmas are necessary.

I un-comment the modifies_1 post-condition; re-running my F* invocation twice, I can see that the hints are now replayable. Un-commenting the rest of the post-condition and restoring the original asserts and lemmas for the functional correctness of the function, the hints are still replayable.

Provide feedback

Saved searches