mirror of https://github.com/tevador/RandomX
Fixed some undefined behavior with signed types
Fixed different results on big endian systems
Removed unused code files
Restored FNEG_R instructions
Updated documentation
pull/20/head
parent
a586751f6b
commit
32d827d0a6
@ -1,130 +1,103 @@
|
||||
|
||||
# RandomX instruction listing
|
||||
There are 31 unique instructions divided into 3 groups:
|
||||
|
||||
|group|# operations|# opcodes||
|
||||
|---------|-----------------|----|-|
|
||||
|integer (IA)|22|144|56.3%|
|
||||
|floating point (FP)|5|76|29.7%|
|
||||
|control (CL)|4|36|14.0%
|
||||
||**31**|**256**|**100%**
|
||||
|
||||
|
||||
## Integer instructions
|
||||
There are 22 integer instructions. They are divided into 3 classes (MATH, DIV, SHIFT) with different B operand selection rules.
|
||||
For integer instructions, the destination is always an integer register (register group R). The source operand (if applicable) can be either an integer register or a memory value. If `dst` and `src` refer to the same register, most instructions use `imm32` as the source operand instead of the register. This is indicated in the 'src == dst' column.
|
||||
|
||||
|# opcodes|instruction|class|signed|A width|B width|C|C width|
|
||||
Memory operands are loaded as 8-byte values from the address indicated by `src`. This indirect addressing is marked with square brackets: `[src]`.
|
||||
|
||||
|frequency|instruction|dst|src|`src == dst ?`|operation|
|
||||
|-|-|-|-|-|-|-|-|
|
||||
|12|ADD_64|MATH|no|64|64|`A + B`|64|
|
||||
|2|ADD_32|MATH|no|32|32|`A + B`|32|
|
||||
|12|SUB_64|MATH|no|64|64|`A - B`|64|
|
||||
|2|SUB_32|MATH|no|32|32|`A - B`|32|
|
||||
|21|MUL_64|MATH|no|64|64|`A * B`|64|
|
||||
|10|MULH_64|MATH|no|64|64|`A * B`|64|
|
||||
|15|MUL_32|MATH|no|32|32|`A * B`|64|
|
||||
|15|IMUL_32|MATH|yes|32|32|`A * B`|64|
|
||||
|10|IMULH_64|MATH|yes|64|64|`A * B`|64|
|
||||
|4|DIV_64|DIV|no|64|32|`A / B`|64|
|
||||
|4|IDIV_64|DIV|yes|64|32|`A / B`|64|
|
||||
|4|AND_64|MATH|no|64|64|`A & B`|64|
|
||||
|2|AND_32|MATH|no|32|32|`A & B`|32|
|
||||
|4|OR_64|MATH|no|64|64|`A | B`|64|
|
||||
|2|OR_32|MATH|no|32|32|`A | B`|32|
|
||||
|4|XOR_64|MATH|no|64|64|`A ^ B`|64|
|
||||
|2|XOR_32|MATH|no|32|32|`A ^ B`|32|
|
||||
|3|SHL_64|SHIFT|no|64|6|`A << B`|64|
|
||||
|3|SHR_64|SHIFT|no|64|6|`A >> B`|64|
|
||||
|3|SAR_64|SHIFT|yes|64|6|`A >> B`|64|
|
||||
|6|ROL_64|SHIFT|no|64|6|`A <<< B`|64|
|
||||
|6|ROR_64|SHIFT|no|64|6|`A >>> B`|64|
|
||||
|
||||
#### 32-bit operations
|
||||
Instructions ADD_32, SUB_32, AND_32, OR_32, XOR_32 only use the low-order 32 bits of the input operands. The result of these operations is 32 bits long and bits 32-63 of C are set to zero.
|
||||
|
||||
#### Multiplication
|
||||
There are 5 different multiplication operations. MUL_64 and MULH_64 both take 64-bit unsigned operands, but MUL_64 produces the low 64 bits of the result and MULH_64 produces the high 64 bits. MUL_32 and IMUL_32 use only the low-order 32 bits of the operands and produce a 64-bit result. The signed variant interprets the arguments as signed integers. IMULH_64 takes two 64-bit signed operands and produces the high-order 64 bits of the result.
|
||||
|
||||
#### Division
|
||||
For the division instructions, the dividend is 64 bits long and the divisor 32 bits long. The IDIV_64 instruction interprets both operands as signed integers. In case of division by zero or signed overflow, the result is equal to the dividend `A`.
|
||||
|
||||
75% of division instructions use a runtime-constant divisor and can be optimized using a multiplication and shifts.
|
||||
|
||||
#### Shift and rotate
|
||||
The shift/rotate instructions use just the bottom 6 bits of the `B` operand (`imm8` is used as the immediate value). All treat `A` as unsigned except SAR_64, which performs an arithmetic right shift by copying the sign bit.
|
||||
|12/256|IADD_R|R|R|`src = imm32`|`dst = dst + src`|
|
||||
|7/256|IADD_M|R|mem|`src = imm32`|`dst = dst + [src]`|
|
||||
|16/256|IADD_RC|R|R|`src = dst`|`dst = dst + src + imm32`|
|
||||
|12/256|ISUB_R|R|R|`src = imm32`|`dst = dst - src`|
|
||||
|7/256|ISUB_M|R|mem|`src = imm32`|`dst = dst - [src]`|
|
||||
|9/256|IMUL_9C|R|-|-|`dst = 9 * dst + imm32`|
|
||||
|16/256|IMUL_R|R|R|`src = imm32`|`dst = dst * src`|
|
||||
|4/256|IMUL_M|R|mem|`src = imm32`|`dst = dst * [src]`|
|
||||
|4/256|IMULH_R|R|R|`src = dst`|`dst = (dst * src) >> 64`|
|
||||
|1/256|IMULH_M|R|mem|`src = imm32`|`dst = (dst * [src]) >> 64`|
|
||||
|4/256|ISMULH_R|R|R|`src = dst`|`dst = (dst * src) >> 64` (signed)|
|
||||
|1/256|ISMULH_M|R|mem|`src = imm32`|`dst = (dst * [src]) >> 64` (signed)|
|
||||
|4/256|IDIV_C|R|-|-|`dst = dst + dst / imm32`|
|
||||
|4/256|ISDIV_C|R|-|-|`dst = dst + dst / imm32` (signed)|
|
||||
|2/256|INEG_R|R|-|-|`dst = -dst`|
|
||||
|16/256|IXOR_R|R|R|`src = imm32`|`dst = dst ^ src`|
|
||||
|4/256|IXOR_M|R|mem|`src = imm32`|`dst = dst ^ [src]`|
|
||||
|10/256|IROR_R|R|R|`src = imm32`|`dst = dst >>> src`|
|
||||
|4/256|ISWAP_R|R|R|`src = dst`|`temp = src; src = dst; dst = temp`|
|
||||
|
||||
#### IMULH and ISMULH
|
||||
These instructions output the high 64 bits of the whole 128-bit multiplication result. The result differs for signed and unsigned multiplication (`IMULH` is unsigned, `ISMULH` is signed). The variants with a register source operand do not use `imm32` (they perform a squaring operation if `dst` equals `src`).
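
As an illustration of the high-half multiplication described above, here is a minimal portable sketch. The function names `mulh` and `smulh` are hypothetical and are not taken from the RandomX sources. The unsigned product is assembled from 32-bit halves, and the signed variant is derived from it with the usual two's complement correction.

```cpp
#include <cstdint>

// Unsigned: high 64 bits of the 128-bit product a * b,
// built from four 32x32 -> 64-bit partial products.
uint64_t mulh(uint64_t a, uint64_t b) {
	const uint64_t aLo = a & 0xFFFFFFFFu, aHi = a >> 32;
	const uint64_t bLo = b & 0xFFFFFFFFu, bHi = b >> 32;
	const uint64_t lo = aLo * bLo;
	// Neither sum below can overflow 64 bits.
	const uint64_t mid1 = aHi * bLo + (lo >> 32);
	const uint64_t mid2 = aLo * bHi + (mid1 & 0xFFFFFFFFu);
	return aHi * bHi + (mid1 >> 32) + (mid2 >> 32);
}

// Signed: start from the unsigned high half and subtract the other
// operand once for each negative input (arithmetic modulo 2^64).
int64_t smulh(int64_t a, int64_t b) {
	uint64_t hi = mulh((uint64_t)a, (uint64_t)b);
	if (a < 0) hi -= (uint64_t)b;
	if (b < 0) hi -= (uint64_t)a;
	return (int64_t)hi; // assumes a two's complement conversion
}
```

With `dst == src`, calling either function with the same value for both arguments gives the squaring behaviour mentioned above. Compilers that provide a 128-bit integer type or a dedicated intrinsic can replace this with a single multiplication.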
|
||||
|
||||
#### IDIV_C and ISDIV_C
|
||||
The division instructions use a constant divisor, so they can be optimized into a [multiplication by fixed-point reciprocal](https://en.wikipedia.org/wiki/Division_algorithm#Division_by_a_constant). `IDIV_C` performs unsigned division (`imm32` is zero-extended to 64 bits), while `ISDIV_C` performs signed division. In the case of division by zero, the instructions become a no-op. In the very rare case of signed overflow, the destination register is set to zero.
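
The special cases above can be sketched as follows. This is an illustration only: the function names are hypothetical, the addition is assumed to wrap modulo 2^64, and `imm32` is assumed to be sign-extended for the signed variant. It uses plain division rather than the reciprocal optimization.

```cpp
#include <cstdint>
#include <limits>

// IDIV_C sketch: dst = dst + dst / imm32, with imm32 zero-extended.
uint64_t idiv_c(uint64_t dst, uint32_t imm32) {
	if (imm32 == 0) return dst;           // division by zero: no-op
	return dst + dst / (uint64_t)imm32;   // unsigned addition wraps
}

// ISDIV_C sketch: same operation on signed values.
int64_t isdiv_c(int64_t dst, int32_t imm32) {
	if (imm32 == 0) return dst;           // division by zero: no-op
	if (dst == std::numeric_limits<int64_t>::min() && imm32 == -1)
		return 0;                         // signed overflow: result is zero
	const int64_t quotient = dst / (int64_t)imm32;
	// Add through unsigned types so that wrap-around is well defined.
	return (int64_t)((uint64_t)dst + (uint64_t)quotient);
}
```

The overflow case is consistent with wrapping arithmetic: the smallest 64-bit value plus its (wrapped) negation is zero.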
|
||||
|
||||
#### ISWAP_R
|
||||
This instruction swaps the values of two registers. If source and destination refer to the same register, the result is a no-op.
|
||||
|
||||
## Floating point instructions
|
||||
There are 5 floating point instructions. All floating point instructions are vector instructions that operate on two packed double precision floating point values.
|
||||
|
||||
|# opcodes|instruction|C|
|
||||
|-|-|-|
|
||||
|20|FPADD|`A + B`|
|
||||
|20|FPSUB|`A - B`|
|
||||
|22|FPMUL|`A * B`|
|
||||
|8|FPDIV|`A / B`|
|
||||
|6|FPSQRT|`sqrt(abs(A))`|
|
||||
|
||||
#### Conversion of operand A
|
||||
Operand A is loaded from memory as a 64-bit value. All floating point instructions interpret A as two packed 32-bit signed integers and convert them into two packed double precision floating point values.
|
||||
For floating point instructions, the destination can be a group F or group E register. Source operand is either a group A register or a memory value.
|
||||
|
||||
Memory operands are loaded as 8-byte values from the address indicated by `src`. The 8 byte value is interpreted as two 32-bit signed integers and implicitly converted to floating point format. The lower and upper memory operands are marked as `[src][0]` and `[src][1]`.
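
A minimal sketch of this conversion, assuming the lower operand `[src][0]` occupies the first four bytes (little-endian order). The helper names `loadLE32` and `loadFloatOperand` are hypothetical.

```cpp
#include <cstdint>
#include <utility>

// Reads a 32-bit little-endian value byte by byte, so the result
// does not depend on the byte order of the host.
static uint32_t loadLE32(const uint8_t* p) {
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
		((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

// Interprets 8 bytes as two signed 32-bit integers and converts each
// one to double precision. first = [src][0], second = [src][1].
std::pair<double, double> loadFloatOperand(const uint8_t* mem) {
	const int32_t lo = (int32_t)loadLE32(mem);
	const int32_t hi = (int32_t)loadLE32(mem + 4);
	return { (double)lo, (double)hi };
}
```

Every 32-bit integer is exactly representable as a double, so the conversion itself never rounds.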
|
||||
|
||||
|frequency|instruction|dst|src|operation|
|
||||
|-|-|-|-|-|
|
||||
|8/256|FSWAP_R|F+E|-|`(dst0, dst1) = (dst1, dst0)`|
|
||||
|20/256|FADD_R|F|A|`(dst0, dst1) = (dst0 + src0, dst1 + src1)`|
|
||||
|5/256|FADD_M|F|mem|`(dst0, dst1) = (dst0 + [src][0], dst1 + [src][1])`|
|
||||
|20/256|FSUB_R|F|A|`(dst0, dst1) = (dst0 - src0, dst1 - src1)`|
|
||||
|5/256|FSUB_M|F|mem|`(dst0, dst1) = (dst0 - [src][0], dst1 - [src][1])`|
|
||||
|6/256|FNEG_R|F|-|`(dst0, dst1) = (-dst0, -dst1)`|
|
||||
|20/256|FMUL_R|E|A|`(dst0, dst1) = (dst0 * src0, dst1 * src1)`|
|
||||
|4/256|FDIV_M|E|mem|`(dst0, dst1) = (dst0 / [src][0], dst1 / [src][1])`|
|
||||
|6/256|FSQRT_R|E|-|`(dst0, dst1) = (√dst0, √dst1)`|
|
||||
|
||||
#### Denormal and NaN values
|
||||
Due to restrictions on the values of the floating point registers, no operation results in `NaN`.
|
||||
`FDIV_M` can produce a denormal result. In that case, the result is set to `DBL_MIN = 2.22507385850720138309e-308`, which is the smallest positive normal number.
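
The denormal rule can be sketched for one lane of the register as shown below. The function name `fdivLane` is hypothetical, and the sketch only covers the replacement of denormal results, not any other constraint on the operands.

```cpp
#include <cfloat>
#include <cmath>

// Divides one lane and replaces a denormal (subnormal) quotient
// with DBL_MIN, the smallest positive normal double.
double fdivLane(double dst, double divisor) {
	const double quotient = dst / divisor;
	if (std::fpclassify(quotient) == FP_SUBNORMAL)
		return DBL_MIN;
	return quotient;
}
```

This assumes the code is compiled without options that flush denormals to zero, otherwise the check never triggers.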
|
||||
|
||||
#### Rounding
|
||||
FPU instructions conform to the IEEE-754 specification, so they must give correctly rounded results. The initial rounding mode is *roundTiesToEven*. The rounding mode can be changed by the `FPROUND` control instruction. Denormal values must always be flushed to zero.
|
||||
|
||||
#### NaN
|
||||
If an operation produces NaN, the result is converted into positive zero. NaN results may never be written into registers or memory. Only division and multiplication must be checked for NaN results (`0.0 / 0.0` and `0.0 * Infinity` result in NaN).
|
||||
|
||||
## Control instructions
|
||||
There are 4 control instructions.
|
||||
|
||||
|# opcodes|instruction|description|condition|
|
||||
|-|-|-|-|
|
||||
|2|FPROUND|change floating point rounding mode|-
|
||||
|11|JUMP|conditional jump|(see condition table below)
|
||||
|11|CALL|conditional procedure call|(see condition table below)
|
||||
|12|RET|return from procedure|stack is not empty
|
||||
All floating point instructions give correctly rounded results. The rounding mode depends on the value of the `fprc` register:
|
||||
|
||||
All control instructions behave as 'arithmetic no-op' and simply copy the input operand A into the destination C.
|
||||
|
||||
The JUMP and CALL instructions use a condition function, which takes the lower 32 bits of operand B (register) and the value `imm32` and evaluates a condition based on the `B.LOC.C` flag:
|
||||
|
||||
|`B.LOC.C`|signed|jump condition|probability|*x86*|*ARM*
|
||||
|---|---|----------|-----|--|----|
|
||||
|0|no|`B <= imm32`|0% - 100%|`JBE`|`BLS`
|
||||
|1|no|`B > imm32`|0% - 100%|`JA`|`BHI`
|
||||
|2|yes|`B - imm32 < 0`|50%|`JS`|`BMI`
|
||||
|3|yes|`B - imm32 >= 0`|50%|`JNS`|`BPL`
|
||||
|4|yes|`B - imm32` overflows|0% - 50%|`JO`|`BVS`
|
||||
|5|yes|`B - imm32` doesn't overflow|50% - 100%|`JNO`|`BVC`
|
||||
|6|yes|`B < imm32`|0% - 100%|`JL`|`BLT`
|
||||
|7|yes|`B >= imm32`|0% - 100%|`JGE`|`BGE`
|
||||
|
||||
The 'signed' column specifies if the operands are interpreted as signed or unsigned 32-bit numbers. Column 'probability' lists the expected jump probability (range means that the actual value for a specific instruction depends on `imm32`). *Columns 'x86' and 'ARM' list the corresponding hardware instructions (following a `CMP` instruction).*
|
||||
|
||||
### FPROUND
|
||||
The FPROUND instruction changes the rounding mode for all subsequent FPU operations depending on a two-bit flag. The flag is calculated by rotating A `imm8` bits to the right and taking the two least-significant bits:
|
||||
|
||||
```
|
||||
rounding flag = (A >>> imm8)[1:0]
|
||||
```
|
||||
|
||||
|rounding flag|rounding mode|
|
||||
|`fprc`|rounding mode|
|
||||
|-------|------------|
|
||||
|00|roundTiesToEven|
|
||||
|01|roundTowardNegative|
|
||||
|10|roundTowardPositive|
|
||||
|11|roundTowardZero|
|
||||
|0|roundTiesToEven|
|
||||
|1|roundTowardNegative|
|
||||
|2|roundTowardPositive|
|
||||
|3|roundTowardZero|
|
||||
|
||||
The rounding modes are defined by the IEEE-754 standard.
|
||||
|
||||
*The two-bit flag value exactly corresponds to bits 13-14 of the x86 `MXCSR` register and bits 23 and 22 (reversed) of the ARM `FPSCR` register.*
|
||||
## Other instructions
|
||||
There are 4 special instructions that have more than one source operand or whose destination operand is a memory value.
|
||||
|
||||
### JUMP
|
||||
If the jump condition is `true`, the JUMP instruction performs a forward jump relative to the value of `pc`. The forward offset is equal to `16 * (imm8[6:0] + 1)` bytes (1-128 instructions forward).
|
||||
|frequency|instruction|dst|src|operation|
|
||||
|-|-|-|-|-|
|
||||
|7/256|COND_R|R|R, `imm32`|`if(condition(src, imm32)) dst = dst + 1`
|
||||
|1/256|COND_M|R|mem, `imm32`|`if(condition([src], imm32)) dst = dst + 1`
|
||||
|1/256|CFROUND|`fprc`|R, `imm32`|`fprc = src >>> imm32`
|
||||
|16/256|ISTORE|mem|R|`[dst] = src`
|
||||
|
||||
### CALL
|
||||
If the jump condition is `true`, the CALL instruction pushes the value of `pc` (program counter) onto the stack and then performs a forward jump relative to the value of `pc`. The forward offset is equal to `16 * (imm8[6:0] + 1)` bytes (1-128 instructions forward).
|
||||
#### COND
|
||||
|
||||
### RET
|
||||
If the stack is not empty, the RET instruction pops the return address from the stack (it's the instruction following the previous CALL) and jumps to it.
|
||||
These instructions conditionally increment the destination register. The condition function depends on the `mod.cond` flag and takes the lower 32 bits of the source operand and the value `imm32`.
|
||||
|
||||
## Reference implementation
|
||||
A portable C++ implementation of all integer and floating point instructions is available in [instructionsPortable.cpp](../src/instructionsPortable.cpp).
|
||||
|`mod.cond`|signed|`condition`|probability|*x86*|*ARM*
|
||||
|---|---|----------|-----|--|----|
|
||||
|0|no|`src <= imm32`|0% - 100%|`JBE`|`BLS`
|
||||
|1|no|`src > imm32`|0% - 100%|`JA`|`BHI`
|
||||
|2|yes|`src - imm32 < 0`|50%|`JS`|`BMI`
|
||||
|3|yes|`src - imm32 >= 0`|50%|`JNS`|`BPL`
|
||||
|4|yes|`src - imm32` overflows|0% - 50%|`JO`|`BVS`
|
||||
|5|yes|`src - imm32` doesn't overflow|50% - 100%|`JNO`|`BVC`
|
||||
|6|yes|`src < imm32`|0% - 100%|`JL`|`BLT`
|
||||
|7|yes|`src >= imm32`|0% - 100%|`JGE`|`BGE`
|
||||
|
||||
The 'signed' column specifies if the operands are interpreted as signed or unsigned 32-bit numbers. Column 'probability' lists the expected probability the condition is true (range means that the actual value for a specific instruction depends on `imm32`). *Columns 'x86' and 'ARM' list the corresponding hardware instructions (following a `CMP` instruction).*
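
The condition table can be read as the following sketch. The function signature is an assumption made for illustration. Conditions 2 and 3 test the sign of the wrapped 32-bit difference, which is why they differ from the true signed comparisons 6 and 7 whenever the subtraction overflows.

```cpp
#include <cstdint>

// Evaluates the condition selected by the 3-bit mod.cond value on the
// lower 32 bits of the source operand and on imm32.
bool condition(uint32_t cond, uint32_t src, uint32_t imm32) {
	const int32_t a = (int32_t)src;
	const int32_t b = (int32_t)imm32;
	const int64_t exact = (int64_t)a - (int64_t)b;       // never overflows
	const bool overflow = exact < INT32_MIN || exact > INT32_MAX;
	const int32_t wrapped = (int32_t)(src - imm32);      // 32-bit result
	switch (cond & 7) {
		case 0: return src <= imm32;   // unsigned
		case 1: return src > imm32;    // unsigned
		case 2: return wrapped < 0;    // sign bit of the difference set
		case 3: return wrapped >= 0;   // sign bit of the difference clear
		case 4: return overflow;
		case 5: return !overflow;
		case 6: return a < b;          // signed
		default: return a >= b;        // signed
	}
}
```

For example, with `src = 0x80000000` and `imm32 = 1` the subtraction overflows, so condition 6 is true while condition 2 is false.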
|
||||
|
||||
#### CFROUND
|
||||
This instruction sets the value of the `fprc` register to the 2 least significant bits of the source register rotated right by `imm32`. This changes the rounding mode of all subsequent floating point instructions.
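
A sketch of how the `fprc` value could be derived and mapped to a host rounding mode. The function names are hypothetical, the rotation count is assumed to be taken modulo 64, and the `FE_*` macros are assumed to be available on the target platform (the C++ standard makes them optional).

```cpp
#include <cfenv>
#include <cstdint>

// Rotates src right by imm32 (modulo 64) and keeps the two lowest bits.
uint32_t cfround(uint64_t src, uint32_t imm32) {
	const unsigned r = imm32 & 63;
	const uint64_t rotated = (src >> r) | (src << ((64 - r) & 63));
	return (uint32_t)(rotated & 3);
}

// Maps the fprc value to the matching <cfenv> rounding mode constant.
int toFenvMode(uint32_t fprc) {
	switch (fprc & 3) {
		case 0: return FE_TONEAREST;   // roundTiesToEven
		case 1: return FE_DOWNWARD;    // roundTowardNegative
		case 2: return FE_UPWARD;      // roundTowardPositive
		default: return FE_TOWARDZERO; // roundTowardZero
	}
}
```

A software implementation could then call `std::fesetround(toFenvMode(fprc))` before executing the following floating point instructions.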
|
||||
|
||||
#### ISTORE
|
||||
The `ISTORE` instruction stores the value of the source integer register to the memory at the address specified by the destination register. The `src` and `dst` registers can be the same.
|
||||
|
@ -1,182 +1,91 @@
|
||||
# RandomX instruction encoding
|
||||
The instruction set was designed in such a way that any random 16-byte word is a valid instruction and any sequence of valid instructions is a valid program. There are no syntax rules.
|
||||
|
||||
The encoding of each 128-bit instruction word is as follows:
|
||||
# RandomX instruction set architecture
|
||||
RandomX VM is a complex instruction set computer ([CISC](https://en.wikipedia.org/wiki/Complex_instruction_set_computer)). All data are loaded and stored in little-endian byte order. Signed integer numbers are represented using [two's complement](https://en.wikipedia.org/wiki/Two%27s_complement). Floating point numbers are represented using the [IEEE-754 double precision format](https://en.wikipedia.org/wiki/Double-precision_floating-point_format).
|
||||
|
||||
![Imgur](https://i.imgur.com/xi8zuAZ.png)
|
||||
## Registers
|
||||
|
||||
## opcode
|
||||
There are 256 opcodes, which are distributed between 3 groups of instructions. There are 31 distinct operations (each operation can be encoded using multiple opcodes - for example opcodes `0x00` to `0x0d` correspond to integer addition).
|
||||
RandomX has 8 integer registers `r0`-`r7` (group R) and a total of 12 floating point registers split into 3 groups: `a0`-`a3` (group A), `f0`-`f3` (group F) and `e0`-`e3` (group E). Integer registers are 64 bits wide, while floating point registers are 128 bits wide and contain a pair of floating point numbers. The lower and upper half of floating point registers are not separately addressable.
|
||||
|
||||
**Table 1: Instruction groups**
|
||||
*Table 1: Addressable register groups*
|
||||
|
||||
|group|# operations|# opcodes||
|
||||
|---------|-----------------|----|-|
|
||||
|integer (IA)|22|144|56.3%|
|
||||
|floating point (FP)|5|76|29.7%|
|
||||
|control (CL)|4|36|14.0%
|
||||
||**31**|**256**|**100%**
|
||||
|
||||
Full description of all instructions: [isa-ops.md](isa-ops.md).
|
||||
|index|R|A|F|E|F+E|
|
||||
|--|--|--|--|--|--|
|
||||
|0|`r0`|`a0`|`f0`|`e0`|`f0`|
|
||||
|1|`r1`|`a1`|`f1`|`e1`|`f1`|
|
||||
|2|`r2`|`a2`|`f2`|`e2`|`f2`|
|
||||
|3|`r3`|`a3`|`f3`|`e3`|`f3`|
|
||||
|4|`r4`||||`e0`|
|
||||
|5|`r5`||||`e1`|
|
||||
|6|`r6`||||`e2`|
|
||||
|7|`r7`||||`e3`|
|
||||
|
||||
## A.LOC
|
||||
**Table 2: `A.LOC` encoding**
|
||||
Besides the directly addressable registers above, there is a 2-bit `fprc` register for rounding control, which is an implicit destination register of the `CFROUND` instruction, and two architectural 32-bit registers `ma` and `mx`, which are not accessible to any instruction.
|
||||
|
||||
|bits|description|
|
||||
|----|--------|
|
||||
|0-1|`A.LOC.W` flag|
|
||||
|2-5|Reserved|
|
||||
|6-7|`A.LOC.X` flag|
|
||||
|
||||
The `A.LOC.W` flag determines the address width when reading operand A from the scratchpad:
|
||||
Integer registers `r0`-`r7` can be the source or the destination operands of integer instructions or may be used as address registers for loading the source operand from the memory (scratchpad).
|
||||
|
||||
**Table 3: Operand A read address width**
|
||||
Floating point registers `a0`-`a3` are read-only and may not be written to except at the moment a program is loaded into the VM. They can be the source operand of any floating point instruction. The value of these registers is restricted to the interval `[1, 4294967296)`.
|
||||
|
||||
|`A.LOC.W`|address width (W)|
|
||||
|---------|-|
|
||||
|0|15 bits (256 KiB)|
|
||||
|1-3|11 bits (16 KiB)|
|
||||
Floating point registers `f0`-`f3` are the *additive* registers, which can be the destination of floating point addition and subtraction instructions. The absolute value of these registers will not exceed `1.0e+12`.
|
||||
|
||||
If the `A.LOC.W` flag is zero, the address space covers the whole 256 KiB scratchpad. Otherwise, just the first 16 KiB of the scratchpad are addressed.
|
||||
Floating point registers `e0`-`e3` are the *multiplicative* registers, which can be the destination of floating point multiplication, division and square root instructions. Their value is always positive.
|
||||
|
||||
If the `A.LOC.X` flag is zero, the instruction mixes the scratchpad read address into the `mx` register using XOR. This mixing happens before the address is truncated to W bits (see pseudocode below).
|
||||
## Instruction encoding
|
||||
|
||||
## A.REG
|
||||
**Table 4: `A.REG` encoding**
|
||||
Each instruction word is 64 bits long and has the following format:
|
||||
|
||||
|bits|description|
|
||||
|----|--------|
|
||||
|0-2|`A.REG.R` flag|
|
||||
|3-7|Reserved|
|
||||
|
||||
The `A.REG.R` flag encodes "readAddressRegister", which is an integer register `r0`-`r7` to be used for scratchpad read address generation. Read address is generated as follows (pseudocode):
|
||||
|
||||
```python
|
||||
readAddressRegister = IntegerRegister(A.REG.R)
|
||||
readAddressRegister = readAddressRegister XOR SignExtend(A.mask32)
|
||||
readAddress = readAddressRegister[31:0]
|
||||
# dataset is read if the ic register is divisible by 64
|
||||
IF ic mod 64 == 0:
|
||||
DatasetRead(readAddress)
|
||||
# optional mixing into the mx register
|
||||
IF A.LOC.X == 0:
|
||||
mx = mx XOR readAddress
|
||||
# truncate to W bits
|
||||
W = GetAddressWidth(A.LOC.W)
|
||||
readAddress = readAddress[W-1:0]
|
||||
```
|
||||
|
||||
Note that the value of the read address register is modified during address generation.
|
||||
|
||||
## B.LOC
|
||||
**Table 5: `B.LOC` encoding**
|
||||
|
||||
|bits|description|
|
||||
|----|--------|
|
||||
|0-1|`B.LOC.L` flag|
|
||||
|0-2|`B.LOC.C` flag|
|
||||
|3-7|Reserved|
|
||||
![Imgur](https://i.imgur.com/FtkWRwe.png)
|
||||
|
||||
The `B.LOC.L` flag determines the B operand. It can be either a register or immediate value.
|
||||
### opcode
|
||||
There are 256 opcodes, which are distributed between 35 distinct instructions. Each instruction can be encoded using multiple opcodes (the number of opcodes specifies the frequency of the instruction in a random program).
|
||||
|
||||
**Table 6: Operand B**
|
||||
*Table 2: Instruction groups*
|
||||
|
||||
|`B.LOC.L`|IA/DIV|IA/SHIFT|IA/MATH|FP|CL|
|
||||
|----|--------|----|------|----|---|
|
||||
|0|register|`imm8`|`imm32`|register|register|
|
||||
|1|`imm32`|register|register|register|register|
|
||||
|2|`imm32`|`imm8`|register|register|register|
|
||||
|3|`imm32`|register|register|register|register|
|
||||
|group|# instructions|# opcodes||
|
||||
|---------|-----------------|----|-|
|
||||
|integer |20|143|55.9%|
|
||||
|floating point |11|88|34.4%|
|
||||
|other |4|25|9.7%|
|
||||
||**35**|**256**|**100%**
|
||||
|
||||
Integer instructions are split into 3 classes: integer division (IA/DIV), shift and rotate (IA/SHIFT) and other (IA/MATH). Floating point (FP) and control (CL) instructions always use a register operand.
|
||||
Full description of all instructions: [isa-ops.md](isa-ops.md).
|
||||
|
||||
Register to be used as operand B is encoded in the `B.REG.R` flag (see below).
|
||||
### dst
|
||||
Destination register. Only bits 0-1 (register groups A, F, E) or 0-2 (groups R, F+E) are used to encode a register according to Table 1.
|
||||
|
||||
The `B.LOC.C` flag determines the condition for the JUMP and CALL instructions. The flag partially overlaps with the `B.LOC.L` flag.
|
||||
### src
|
||||
|
||||
## B.REG
|
||||
**Table 7: `B.REG` encoding**
|
||||
The `src` flag encodes a source operand register according to Table 1 (only bits 0-1 or 0-2 are used).
|
||||
|
||||
|bits|description|
|
||||
|----|--------|
|
||||
|0-2|`B.REG.R` flag|
|
||||
|3-7|Reserved|
|
||||
Immediate value `imm32` is used as the source operand in cases when `dst` and `src` encode the same register.
|
||||
|
||||
The register encoded by the `B.REG.R` flag depends on the instruction group:
|
||||
For register-memory instructions, the source operand determines the `address_base` value for calculating the memory address (see below).
|
||||
|
||||
**Table 8: Register operands by group**
|
||||
### mod
|
||||
|
||||
|group|registers|
|
||||
|----|--------|
|
||||
|IA|`r0`-`r7`|
|
||||
|FP|`f0`-`f7`|
|
||||
|CL|`r0`-`r7`|
|
||||
The `mod` flag is encoded as:
|
||||
|
||||
## C.LOC
|
||||
**Table 9: `C.LOC` encoding**
|
||||
*Table 3: mod flag encoding*
|
||||
|
||||
|bits|description|
|
||||
|`mod`|description|
|
||||
|----|--------|
|
||||
|0-1|`C.LOC.W` flag|
|
||||
|2|`C.LOC.R` flag|
|
||||
|3-6|Reserved|
|
||||
|7|`C.LOC.H` flag|
|
||||
|
||||
The `C.LOC.W` flag determines the address width when writing operand C to the scratchpad:
|
||||
|0-1|`mod.mem` flag|
|
||||
|2-4|`mod.cond` flag|
|
||||
|5-7|Reserved|
|
||||
|
||||
**Table 10: Operand C write address width**
|
||||
|
||||
|`C.LOC.W`|address width (W)|
|
||||
|---------|-|
|
||||
|0|15 bits (256 KiB)|
|
||||
|1-3|11 bits (16 KiB)|
|
||||
|
||||
If the `C.LOC.W` flag is zero, the address space covers the whole 256 KiB scratchpad. Otherwise, just the first 16 KiB of the scratchpad are addressed.
|
||||
|
||||
The `C.LOC.R` flag determines the destination where operand C is written:
|
||||
|
||||
**Table 11: Operand C destination**
|
||||
|
||||
|`C.LOC.R`|groups IA, CL|group FP
|
||||
|---------|-|-|
|
||||
|0|scratchpad|register
|
||||
|1|register|register + scratchpad
|
||||
|
||||
Integer and control instructions (groups IA and CL) write either to the scratchpad or to a register. Floating point instructions always write to a register and can also write to the scratchpad. In that case, flag `C.LOC.H` determines if the low or high half of the register is written:
|
||||
|
||||
**Table 12: Floating point register write**
|
||||
|
||||
|`C.LOC.H`|write bits|
|
||||
|---------|----------|
|
||||
|0|0-63|
|
||||
|1|64-127|
|
||||
|
||||
## C.REG
|
||||
**Table 13: `C.REG` encoding**
|
||||
|
||||
|bits|description|
|
||||
|----|--------|
|
||||
|0-2|`C.REG.R` flag|
|
||||
|3-7|Reserved|
|
||||
The `mod.mem` flag determines the address mask when reading from or writing to memory:
|
||||
|
||||
The `C.REG.R` flag encodes both the write address register (if writing to the scratchpad) and the destination register (if writing to a register). The destination register depends on the instruction group (see Table 8). The write address is always generated from an integer register:
|
||||
*Table 3: memory address mask*
|
||||
|
||||
```python
|
||||
writeAddressRegister = IntegerRegister(C.REG.R)
|
||||
writeAddress = writeAddressRegister[31:0] XOR C.mask32
|
||||
# truncate to W bits
|
||||
W = GetAddressWidth(C.LOC.W)
|
||||
writeAddress = writeAddress [W-1:0]
|
||||
```
|
||||
|`mod.mem`|`address_mask`|(scratchpad level)|
|
||||
|---------|-|---|
|
||||
|0|262136|(L2)|
|
||||
|1-3|16376|(L1)|
|
||||
|
||||
## imm8
|
||||
`imm8` is an 8-bit immediate value that is used as the B operand by IA/SHIFT instructions (see Table 6). Additionally, it's used by some control instructions.
|
||||
The memory address mask table above applies to all memory accesses except when the source operand is an immediate value. In that case, `address_mask` is equal to 2097144 (L3).
|
||||
|
||||
## A.mask32
|
||||
`A.mask32` is a 32-bit address mask that is used to calculate the read address for the A operand. It's sign-extended to 64 bits before use.
|
||||
The address for reading/writing is calculated by applying bitwise AND operation to `address_base` and `address_mask`.
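
The masking step can be sketched as follows, using the mask values from the table above. The function names and the `srcIsImmediate` parameter are hypothetical. Each mask is the scratchpad level size minus 8, so the resulting address is always aligned to 8 bytes.

```cpp
#include <cstdint>

constexpr uint32_t MaskL1 = 16376;    // 16 KiB - 8
constexpr uint32_t MaskL2 = 262136;   // 256 KiB - 8
constexpr uint32_t MaskL3 = 2097144;  // 2 MiB - 8

// Selects address_mask from the mod.mem flag (bits 0-1 of mod).
// An immediate source operand always selects the L3 mask.
uint32_t addressMask(uint32_t mod, bool srcIsImmediate) {
	if (srcIsImmediate) return MaskL3;
	return (mod & 3) == 0 ? MaskL2 : MaskL1;
}

// address = address_base AND address_mask
uint32_t memoryAddress(uint32_t addressBase, uint32_t mod, bool srcIsImmediate) {
	return addressBase & addressMask(mod, srcIsImmediate);
}
```

Since three of the four `mod.mem` values select the L1 mask, most memory accesses stay within the first 16 KiB of the scratchpad.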
|
||||
|
||||
## imm32
|
||||
`imm32` is a 32-bit immediate value which is used for integer instructions from groups IA/DIV and IA/MATH (see Table 6). The immediate value is sign-extended for instructions that expect 64-bit operands.
|
||||
The `mod.cond` flag is used only by the `COND` instruction to select a condition to be tested.
|
||||
|
||||
## C.mask32
|
||||
`C.mask32` is a 32-bit address mask that is used to calculate the write address for the C operand. `C.mask32` is equal to `imm32`.
|
||||
### imm32
|
||||
A 32-bit immediate value that can be used as the source operand. The immediate value is sign-extended to 64 bits in most cases.
|
||||
|
@ -1,72 +0,0 @@
/*
Copyright (c) 2018 tevador

This file is part of RandomX.

RandomX is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

RandomX is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with RandomX. If not, see <http://www.gnu.org/licenses/>.
*/

// Based on:
// *Really* minimal PCG32 code / (c) 2014 M.E. O'Neill / pcg-random.org
// Licensed under Apache License 2.0 (NO WARRANTY, etc. see website)

#pragma once
#include <cstdint>

#if defined(_MSC_VER)
#pragma warning (disable : 4146)
#endif

class Pcg32 {
public:
	typedef uint32_t result_type;
	static constexpr result_type min() { return 0U; }
	static constexpr result_type max() { return UINT32_MAX; }
	Pcg32(const void* seed) {
		auto* u64seed = (const uint64_t*)seed;
		state = *(u64seed + 0);
		inc = *(u64seed + 1) | 1ull;
	}
	Pcg32(uint64_t state, uint64_t inc) : state(state), inc(inc | 1ull) {
	}
	result_type operator()() {
		return next();
	}
	result_type getUniform(result_type min, result_type max) {
		const result_type range = max - min;
		const result_type erange = range + 1;
		result_type ret;

		for (;;) {
			ret = next();
			if (ret / erange < UINT32_MAX / erange || UINT32_MAX % erange == range) {
				ret %= erange;
				break;
			}
		}
		return ret + min;
	}
private:
	uint64_t state;
	uint64_t inc;
	result_type next() {
		uint64_t oldstate = state;
		// Advance internal state
		state = oldstate * 6364136223846793005ULL + inc;
		// Calculate output function (XSH RR), uses old state for max ILP
		uint32_t xorshifted = ((oldstate >> 18u) ^ oldstate) >> 27u;
		uint32_t rot = oldstate >> 59u;
		return (xorshifted >> rot) | (xorshifted << (-rot & 31));
	}
};
@ -0,0 +1,99 @@
#pragma once
#include <stdint.h>
#include <string.h>

#if defined(_MSC_VER)
#define FORCE_INLINE __inline
#elif defined(__GNUC__) || defined(__clang__)
#define FORCE_INLINE __inline__
#else
#define FORCE_INLINE
#endif

/* Argon2 Team - Begin Code */
/*
Not an exhaustive list, but should cover the majority of modern platforms
Additionally, the code will always be correct---this is only a performance
tweak.
*/
#if (defined(__BYTE_ORDER__) && \
	(__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)) || \
	defined(__LITTLE_ENDIAN__) || defined(__ARMEL__) || defined(__MIPSEL__) || \
	defined(__AARCH64EL__) || defined(__amd64__) || defined(__i386__) || \
	defined(_M_IX86) || defined(_M_X64) || defined(_M_AMD64) || \
	defined(_M_ARM)
#define NATIVE_LITTLE_ENDIAN
#endif
/* Argon2 Team - End Code */

static FORCE_INLINE uint32_t load32(const void *src) {
#if defined(NATIVE_LITTLE_ENDIAN)
	uint32_t w;
	memcpy(&w, src, sizeof w);
	return w;
#else
	const uint8_t *p = (const uint8_t *)src;
	uint32_t w = *p++;
	w |= (uint32_t)(*p++) << 8;
	w |= (uint32_t)(*p++) << 16;
	w |= (uint32_t)(*p++) << 24;
	return w;
#endif
}

static FORCE_INLINE uint64_t load64(const void *src) {
#if defined(NATIVE_LITTLE_ENDIAN)
	uint64_t w;
	memcpy(&w, src, sizeof w);
	return w;
#else
	const uint8_t *p = (const uint8_t *)src;
	uint64_t w = *p++;
	w |= (uint64_t)(*p++) << 8;
	w |= (uint64_t)(*p++) << 16;
	w |= (uint64_t)(*p++) << 24;
	w |= (uint64_t)(*p++) << 32;
	w |= (uint64_t)(*p++) << 40;
	w |= (uint64_t)(*p++) << 48;
	w |= (uint64_t)(*p++) << 56;
	return w;
#endif
}

static FORCE_INLINE void store32(void *dst, uint32_t w) {
#if defined(NATIVE_LITTLE_ENDIAN)
	memcpy(dst, &w, sizeof w);
#else
	uint8_t *p = (uint8_t *)dst;
	*p++ = (uint8_t)w;
	w >>= 8;
	*p++ = (uint8_t)w;
	w >>= 8;
	*p++ = (uint8_t)w;
	w >>= 8;
	*p++ = (uint8_t)w;
#endif
}

static FORCE_INLINE void store64(void *dst, uint64_t w) {
#if defined(NATIVE_LITTLE_ENDIAN)
	memcpy(dst, &w, sizeof w);
#else
	uint8_t *p = (uint8_t *)dst;
	*p++ = (uint8_t)w;
	w >>= 8;
	*p++ = (uint8_t)w;
	w >>= 8;
	*p++ = (uint8_t)w;
	w >>= 8;
	*p++ = (uint8_t)w;
	w >>= 8;
	*p++ = (uint8_t)w;
	w >>= 8;
	*p++ = (uint8_t)w;
	w >>= 8;
	*p++ = (uint8_t)w;
	w >>= 8;
	*p++ = (uint8_t)w;
#endif
}
@ -1,57 +0,0 @@
/*
Copyright (c) 2018 tevador

This file is part of RandomX.

RandomX is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

RandomX is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with RandomX. If not, see <http://www.gnu.org/licenses/>.
*/

#include <cstdint>
#include "common.hpp"

namespace RandomX {

	extern "C" {
		void ADD_64(convertible_t& a, convertible_t& b, convertible_t& c);
		void ADD_32(convertible_t& a, convertible_t& b, convertible_t& c);
		void SUB_64(convertible_t& a, convertible_t& b, convertible_t& c);
		void SUB_32(convertible_t& a, convertible_t& b, convertible_t& c);
		void MUL_64(convertible_t& a, convertible_t& b, convertible_t& c);
		void MULH_64(convertible_t& a, convertible_t& b, convertible_t& c);
		void MUL_32(convertible_t& a, convertible_t& b, convertible_t& c);
		void IMUL_32(convertible_t& a, convertible_t& b, convertible_t& c);
		void IMULH_64(convertible_t& a, convertible_t& b, convertible_t& c);
		void DIV_64(convertible_t& a, convertible_t& b, convertible_t& c);
		void IDIV_64(convertible_t& a, convertible_t& b, convertible_t& c);
		void AND_64(convertible_t& a, convertible_t& b, convertible_t& c);
		void AND_32(convertible_t& a, convertible_t& b, convertible_t& c);
		void OR_64(convertible_t& a, convertible_t& b, convertible_t& c);
		void OR_32(convertible_t& a, convertible_t& b, convertible_t& c);
		void XOR_64(convertible_t& a, convertible_t& b, convertible_t& c);
		void XOR_32(convertible_t& a, convertible_t& b, convertible_t& c);
		void SHL_64(convertible_t& a, convertible_t& b, convertible_t& c);
		void SHR_64(convertible_t& a, convertible_t& b, convertible_t& c);
		void SAR_64(convertible_t& a, convertible_t& b, convertible_t& c);
		void ROL_64(convertible_t& a, convertible_t& b, convertible_t& c);
		void ROR_64(convertible_t& a, convertible_t& b, convertible_t& c);
		bool JMP_COND(uint8_t, convertible_t&, int32_t);
		void FPINIT();
		void FPROUND(convertible_t, uint8_t);
		void FADD(convertible_t& a, fpu_reg_t& b, fpu_reg_t& c);
		void FSUB(convertible_t& a, fpu_reg_t& b, fpu_reg_t& c);
		void FMUL(convertible_t& a, fpu_reg_t& b, fpu_reg_t& c);
		void FDIV(convertible_t& a, fpu_reg_t& b, fpu_reg_t& c);
		void FSQRT(convertible_t& a, fpu_reg_t& b, fpu_reg_t& c);
	}
}
File diff suppressed because it is too large