-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rethinking the bounds calculations #75
Comments
Note that the encoding in Appendix D of the spec actually does not work. In particular, the requirement for the representable region having |
@vmurali Looks thorough! I haven't digested the whole thing, but would like to ask about one detail which might help elucidate the rest.
Is there value in allowing Es that push B and T beyond the 32-bit range? We have typically capped valid Es at the value that puts the implied msb of l at the 33rd bit, that is, the entire address space (0x00000000-0xFFFFFFFF). |
Yes, I mean, all E's are now valid, including 24 to 31. It makes it seamless to use memcpy using capability-load and capability-store while storing only expanded caps in the registers (where top and base are both 41 bits) |
Is there a problem with interpreting everything above 23 as 23 when encoding/decoding while storing the original E? I'm not sure what all the tradeoffs are, but there should be fewer muxes into the upper bits of expanded T & B, if that's worth anything. |
See issue #45 |
You still need to store some version of the compressed caps' T and B if you want to use cap-load and cap-store for memcpy's (which is necessary because in a struct, you don't know which fields are caps). This encoding (storing 41 bits) eliminates the need for special logic to decide if E is |
This incorporates the first suggestion you gave in #45 , namely reusing a bit from |
I think I understand why you're storing the 41 bits. We have previously solved the same problem by changing the interpretation of E >= 23 to being the same as 23 to avoid needing to shift T & B higher than 33. This also works with memcpy; you preserve all the bits of T & B; you just leave them at the same place as you would if E==23. |
Right. But you still need to store the |
How does "storing 41 bits" for each bound use "the same number of bits" as storing 33 bits for each bound? This would be at the cost of "special case handling", as you say, but I don't know what the complexity of that is after boolean optimization. (The logic that feeds each layer of muxing in the shifter would be different, though I really know if it would be more complex. It would likely be simpler as you go up to higher bits.) |
If you just have 33 bits, when a cap-load instruction is used, you get 64 bits and a tag bit. Let's say the tag is not set (i.e. this is part of memcpy of a struct). Now, you need to preserve the 64 bits when writing it back. How will you do that unless you recover the Anyway, this overall is a minor point. The algorithm (and the circuit decomplexity that comes with it) is more pertinent than whether a few bits are saved. |
If I understood your proposal, you keep 41 bits per bound with ~9 bits potentially non-zero, and the rest zero. When you store it out, you select the ~9 bits back out of the 41-bit value to derive the original T & B for the 64-bit format. I'm suggesting that if you decode the T and B into 33 bits instead, interpreting all E>=23 as 23, you will also preserve T and B in all cases and can select them out when re-deriving the original 64-bit format. For example, if E happens to be 30, you shift T and B up to the top of their 33-bit decompressed bounds, preserving E=30 in the register. When storing back out, you interpret E=30 as 23 so that you correctly select the upper bits to put in back in the T and B fields of the 64-bit format. |
Right, I think that works too. |
I implemented the algorithm in sail in #76 . I also changed the encoding to reflect base (B) and difference (M) rather than base (B) and top (T). M is 8 bits, while B is still 9 bits, and there's an extra ME field of 1 bit used either for exponent (when exponent is not zero) or the MSB of M. As expected, all tests and examples in cheriot-rtos pass. |
There's a different algorithm to calculate the exponent and B(ase), T(op) given a base and length that has the following properties:
Here's the algorithm. Given a base$b$ and length $l$ , calculate the following:
$S$ = mantissa size ($S = 9$ for CHERIoT; $S \ge 1$ should hold for the proof below)
$F$ = full size ($F = 32$ for CHERIoT; $F \ge S$ should hold for the proof below)
$e = \left\lceil lg(\left\lfloor \frac{l}{2^S} \right\rfloor) \right\rceil$ (This can be implemented similar to how it is done in sail specs using $F-S$ and a correction subtracting 1 if
$d = \left\lfloor \frac{l}{2^e} \right\rfloor$ (This is just $e$ LSB bits are removed)
$i = \left\lceil \frac{l\mod{2^e} + b\mod{2^e}}{2^e} \right\rceil$ (Uses an adder and a couple of$\frac{x \mod{2^{e+1}}}{2^e}$ is just the 2 MSB bits of $x[e+1:0]$ ))
countLeadingZeros
and subtracting fromcountOnes
is 1)truncToMSB_variable
wherebitSelect
's (The quantity$(d + i) \cdot 2^e$ is such that $(b - (b \mod{2^e})+ ((d + i) \cdot 2^e) \ge b + l)$ , that is $e$ and $(d+i)$ gives (decoded) exponent and the required "difference between $\lt 2^S$ ; but let's come back to this mantissa problem in a bit.)
T
andB
". (We are still not done because we need the mantissa a.k.a difference to beLemma 1:$(b - (b\mod{2^e}) + ((d + i) \cdot 2^e)) \ge (b + l)$
Proof:
Now let's get the bounds for$d$ and $i$ .
Lemma 2: if$e \ge 1$ , $d \ge 2^{S-1}$ . (This means the MSB of the difference between $e \ge 1$ , so no need to do any more shifts to the left.)
T
andB
is already guaranteed to be 1 whenProof:
Lemma 3:$d \lt 2^{S+1}$ (This bounds the number of shifts we need to do to the right; we also need to know the bound for $i$ for the maximum number of shifts for $d + i$ .)
Proof:
Lemma 4:$i \le 2$
Proof:
Lemma 5:$(d + i) \le 2^{S+1} + 1$ . (So we need to shift $(d + i)$ right by at most 3 to keep it $\le 2^9 - 1$ ; you need 1 shift to bring from 11 bits to 10 bits, and you need two shifts to bring from 10 bits to 9 bits (for example, when $(d+i) = 2^{10}-1$ ))
$\square$
Proof:
From Lemmas 3 and 4
Lemma 6:$x \cdot 2^y \le (\left\lfloor \frac{x}{2} \right\rfloor + x \mod{2}) \cdot 2^{y+1}$ . This gives us a way to right shift $(d+i)$ to keep it $\le 2^S - 1$ )
Proof:
Let the final value (after right shifts) of mantissa be$m$ and final value of exponent be $E$ . $m$ represents $\left\lfloor \frac{b}{2^{E}} \right\rfloor$ , which is essentially $m$ to $\left\lfloor \frac{b}{2^E}\right\rfloor$ is split into MSB and "mid" bits exactly as in the current algorithm and $c_b$ , $c_t$ are calculated in the same manner as the current algorithm.)
T
-B
.B
can be calculated usingtruncToMSB_variable
.T
can be calculated by addingB
. (I am ignoring the distinction between MSB bits and the "mid" bits here.Since$m =$ $E \ge 1$ ), we don't have to represent $m$ , i.e. the difference, instead of
T
-B
is guaranteed to have MSB set to 1 (whenT
with 9 bits, and instead we can use 8 bits to representT
(This part is Jon Woodruff's optimization suggestion; we need an adder to calculate the MSB ofT
. Alternatively, we can just storeT
, which will avoid the addition.)Note that for all values of$E \lt (F-S)$ any expanded cap will always have a canonical representation as a compressed cap since the MSB of $m = 1$ . So, any 64-bit value loaded during a memcpy can be expanded as a cap and stored in the registers canonically if $E \lt (F-S)$ . For $E \ge (F-S)$ however, the left shifting will push the $m$ out of 32-bit range, and so we need additional 9 bits to hold the shifted values. In fact, the additional 9 bits will hold exactly the $m$ . So, if we store 41 bits for base and 41 for length (or top) in the registers, recompressing them will give back the original compressed encoding always, even if the original value was not a cap.
B
andT
orB
andT
orThe text was updated successfully, but these errors were encountered: