-
Notifications
You must be signed in to change notification settings - Fork 0
/
ch07.xml
143 lines (134 loc) · 8.63 KB
/
ch07.xml
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">
<chapter id="ch07">
<title>Assembly language</title>
<para><indexterm><primary>Assembly language</primary></indexterm>This chapter shows you how to build and link against a POWER8 SHA assembly language routine. The function is Cryptogams SHA-256 compression function.</para>
<para>A second example of PowerPC assembly language is provided for the DARN random number generator. The generator is available on POWER9 machines, but the compiler intrinsics and builtins to access the generator are mostly missing.</para>
<section id="asm_cryptogams">
<title>Cryptogams</title>
<para><indexterm><primary>Cryptogams</primary></indexterm><indexterm><primary>Andy Polyakov</primary></indexterm><indexterm><primary>Cryptogams</primary><secondary>Andy Polyakov</secondary></indexterm><ulink url="https://www.openssl.org/~appro/cryptogams/">Cryptogams</ulink> is the incubator used by Andy Polyakov to develop assembly language routines for OpenSSL. Andy dual licenses his implementations and a more permissive license is available for his assembly language source code.</para>
<para>The steps that follow were carried out on <systemitem>gcc112</systemitem>, which is ppc64-le. Andy's GitHub is located at <ulink url="https://github.com/dot-asm">dot-asm</ulink>, so clone the project and read the README.</para>
<screen>$ git clone https://github.com/dot-asm/cryptogams
$ cd cryptogams
</screen>
<para>The README contains instructions for using the source files:</para>
<blockquote>
<para>"Flavor" refers to ABI family or specific OS. E.g. x86_64 scripts recognize "elf", "elf32", "macosx", "mingw64", "nasm". PPC scripts recognize "linux32", "linux64", "linux64le", "aix32", "aix64", "osx32", "osx64", and so on...</para>
</blockquote>
<para>Unfortunately Andy has not uploaded the SHA gear to Cryptogams so you will have to switch to OpenSSL to get the Cryptogams sources. Make a <systemitem>cryptogams</systemitem> directory, and then copy <systemitem>sha512p8-ppc.pl</systemitem> and <systemitem>ppc-xlate.pl</systemitem> from the OpenSSL source directory:</para>
<screen>$ mkdir cryptogams
$ cp openssl/crypto/sha/asm/sha512p8-ppc.pl cryptogams/
$ cp openssl/crypto/perlasm/ppc-xlate.pl cryptogams/
$ cd cryptogams/</screen>
<para>Next examine the head notes in <systemitem>sha512p8-ppc.pl</systemitem>, which is used to create the source files for SHA-256 and SHA-512. The comments say the script takes two arguments. The first is a <quote>flavor</quote>, and the 32 or 64 is used to convey the platform architecture. Adding <quote>le</quote> to flavor will produce a source file for a little endian machine. The second argument is <quote>output</quote>, and 256 or 512 in the output filename selects either SHA-256 or SHA-512.</para>
<para>The commands to produce a SHA-256 assembly source file for <systemitem>gcc112</systemitem> and assemble it are shown below.</para>
<screen>$ ./sha512p8-ppc.pl linux64le sha256le_compress.s
$ as -mpower8 sha256le_compress.s -o sha256le_compress.o
</screen>
<para>The head notes in <systemitem>sha512p8-ppc.pl</systemitem> do not state the public API. However the source file <systemitem>crypto/ppccap.c</systemitem> says:</para>
<screen>$ grep -IR sha256_block_p8 *
crypto/ppccap.c:void sha256_block_p8(void *ctx, const void *inp,
size_t len);
...</screen>
<para>The signature using <systemitem>len</systemitem> is a bit misleading since it is a block count, and not a byte count. The signature for <systemitem>sha256_block_p8</systemitem> is better documented as shown below. There are no alignment requirements for <systemitem>state</systemitem> or <systemitem>input</systemitem>.</para>
<programlisting><?code-font-size 75% ?>void sha256_block_p8(uint32_t *state,
const uint8_t *input, size_t blocks);
</programlisting>
<para>Finally, a program that links to <systemitem>sha256_block_p8</systemitem> might look like the following.</para>
<programlisting><?code-font-size 75% ?>$ cat test.cxx
#include <stdio.h>
#include <string.h>
#include <stdint.h>
extern "C" {
void sha256_block_p8(uint32_t*, const uint8_t*, size_t);
}
int main(int argc, char* argv[])
{
/* empty message with padding */
uint8_t message[64];
memset(message, 0x00, sizeof(message));
message[0] = 0x80;
/* initial state */
uint32_t state[8] = {
0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19
};
size_t blocks = sizeof(message)/64;
sha256_block_p8(state, message, blocks);
const uint8_t b1 = (uint8_t)(state[0] >> 24);
const uint8_t b2 = (uint8_t)(state[0] >> 16);
const uint8_t b3 = (uint8_t)(state[0] >> 8);
const uint8_t b4 = (uint8_t)(state[0] >> 0);
const uint8_t b5 = (uint8_t)(state[1] >> 24);
const uint8_t b6 = (uint8_t)(state[1] >> 16);
const uint8_t b7 = (uint8_t)(state[1] >> 8);
const uint8_t b8 = (uint8_t)(state[1] >> 0);
/* e3b0c44298fc1c14... */
printf("SHA256 hash of empty message: ");
printf("%02X%02X%02X%02X%02X%02X%02X%02X...\n",
b1, b2, b3, b4, b5, b6, b7, b8);
int success = ((b1 == 0xE3) && (b2 == 0xB0) &&
(b3 == 0xC4) && (b4 == 0x42) &&
(b5 == 0x98) && (b6 == 0xFC) &&
(b7 == 0x1C) && (b8 == 0x14));
if (success)
printf("Success!\n");
else
printf("Failure!\n");
return (success != 0 ? 0 : 1);
}
</programlisting>
<para>Compiling and linking to <systemitem>sha256le_compress.o</systemitem> would look similar to below.</para>
<screen>$ g++ -mcpu=power8 test.cxx sha256le_compress.o -o test.exe
$ ./test.exe
SHA256 hash of empty message: E3B0C44298FC1C14...
Success!
</screen>
</section>
<section id="asm_darn">
<title>DARN generator</title>
<para><indexterm><primary>DARN</primary></indexterm>POWER9 added the DARN random number generator. The DARN generator is a hardware based random number generator like <indexterm><primary>RDRAND</primary></indexterm><systemitem>RDRAND</systemitem> on Intel systems. The generator returns either 32-bits or 64-bits of random material.</para>
<para>The generator is accessed using builtins or assembly language. <indexterm><primary>__builtin_darn_32</primary></indexterm><systemitem>__builtin_darn_32</systemitem> and <indexterm><primary>__builtin_darn_64</primary></indexterm><systemitem>__builtin_darn_64</systemitem> are available on GCC 7.0 and above, but missing on XLC and Clang. To access the generator under most compilers you should use assembly language. To cover old compilers, like GCC 4.8.5, you can issue the instruction's byte codes.</para>
<para>The code below provides a 32-bit random number. The while loop checks for error conditions. On failure <systemitem>0xffffffff:ffffffff</systemitem> is returned, but the Power ISA 3.0 specification recommends only checking the low 32-bit word.</para>
<programlisting><?code-font-size 75% ?>inline void DARN32(uint32_t* output)
{
do
{
// This is "darn r3, 0". When L=0 a conditioned
// 32-bit word is returned.
__asm__ __volatile__ (
#if (__BIG_ENDIAN__)
".byte 0x7c, 0x60, 0x05, 0xe6 \n\t" // r3 = darn 3, 0
"mr %0, 3 \n\t" // output = r3
#else
".byte 0xe6, 0x05, 0x60, 0x7c \n\t" // r3 = darn 3, 0
"mr %0, 3 \n\t" // output = r3
#endif
: "=r" (*output) : : "r3"
);
} while (*output == 0xFFFFFFFF);
}
</programlisting>
<para>And the following is the 64-bit version of the function. On failure <systemitem>0xffffffff:ffffffff</systemitem> is returned, and the full value is checked.</para>
<programlisting><?code-font-size 75% ?>inline void DARN64(uint64_t* output)
{
do
{
// This is "darn r3, 1". When L=1 a conditioned
// 64-bit word is returned.
__asm__ __volatile__ (
#if (__BIG_ENDIAN__)
".byte 0x7c, 0x61, 0x05, 0xe6 \n\t" // r3 = darn 3, 1
"mr %0, 3 \n\t" // output = r3
#else
".byte 0xe6, 0x05, 0x61, 0x7c \n\t" // r3 = darn 3, 1
"mr %0, 3 \n\t" // output = r3
#endif
: "=r" (*output) : : "r3"
);
} while (*output == 0xFFFFFFFFFFFFFFFFull);
}
</programlisting>
<para><indexterm><primary>Compile farm</primary></indexterm><indexterm><primary>Compile farm</primary><secondary>gcc135</secondary></indexterm>You can test the code on <systemitem class="domainname">gcc135</systemitem> at the <xref linkend="intro_cfarm"/>.</para>
</section>
</chapter>