* Optimized dataset read
There was a false dependency on readReg2 and readReg3 (caused by `xor rbp, rax` instruction) when reading dataset item (see design.md - 4.6.2 Loop execution, steps 5 and 7). This change uses `ma` register to read dataset item before the whole `rbp` (`ma` and `mx`) is changed, so superscalar and out-of-order CPU can start executing it earlier.
Results: https://i.imgur.com/Bpeq9mx.png
~1% speedup on modern Intel/AMD CPUs.
* ARMv8: optimized dataset read
Break dependency from readReg2 and readReg3.
* Fixed light mode hashing
Fix compilation and JIT support on NetBSD
1. Disable hugepages (not supported).
2. Force W^X (required).
3. When allocating JIT memory, PROT_EXEC must be reserved
in order to set the pages executable later.
1. Disable hugepages (not supported).
2. Force W^X (required).
3. When allocating JIT memory, PROT_EXEC must be reserved
in order to set the pages executable later.
Adds more parallelizm into AES loop so modern CPUs can take advantage of it. Also, scratchpad data moves between L1 and L3 caches only one time which saves time and energy per hash.
To build portable binaries, omit the `ARCH` option when executing cmake.
### Windows
On Windows, it is possible to build using MinGW (same procedure as on Linux) or using Visual Studio (solution file is provided).
@ -63,6 +65,8 @@ RandomX was primarily designed as a PoW algorithm for [Monero](https://www.getmo
* The key `K` is selected to be the hash of a block in the blockchain - this block is called the 'key block'. For optimal mining and verification performance, the key should change every 2048 blocks (~2.8 days) and there should be a delay of 64 blocks (~2 hours) between the key block and the change of the key `K`. This can be achieved by changing the key when `blockHeight % 2048 == 64` and selecting key block such that `keyBlockHeight % 2048 == 0`.
* The input `H` is the standard hashing blob with a selected nonce value.
RandomX was successfully activated on the Monero network on the 30th November 2019.
If you wish to use RandomX as a PoW algorithm for your cryptocurrency, please follow the [configuration guidelines](doc/configuration.md).
**Note**: To achieve ASIC resistance, the key `K` must change and must not be miner-selectable. We recommend to use blockchain data as the key in a similar way to the Monero example above. If blockchain data cannot be used for some reason, use a predefined sequence of keys.
@ -108,7 +112,12 @@ Most Intel and AMD CPUs made since 2011 should be fairly efficient at RandomX. M
* DDR4 memory is limited to about 4000-6000 H/s per channel (depending on frequency and timings)
### Does RandomX facilitate botnets/malware mining or web mining?
Efficient mining requires more than 2 GiB of memory, which is difficult to hide in an infected computer and disqualifies many low-end machines such as IoT devices. Web mining is infeasible due to the large memory requirement and the lack of directed rounding support for floating point operations in both Javascript and WebAssembly.
Due to the way the algorithm works, mining malware is much easier to detect. [RandomX Sniffer](https://github.com/tevador/randomx-sniffer) is a proof of concept tool that can detect illicit mining activity on Windows.
Efficient mining requires more than 2 GiB of memory, which also disqualifies many low-end machines such as IoT devices, which are often parts of large botnets.
Web mining is infeasible due to the large memory requirement and the lack of directed rounding support for floating point operations in both Javascript and WebAssembly.
### Since RandomX uses floating point math, does it give reproducible results on different platforms?
The L3 cache is much larger and located further from the CPU core. As a result, its access latencies are much higher and can cause stalls in program execution.
#elif defined(__PPC64__) && defined(__ALTIVEC__) && defined(__VSX__) //sadly only POWER7 and newer will be able to use SIMD acceleration. Earlier processors cant use doubles or 64 bit integers with SIMD
#include<cstdint>
#include<stdexcept>
@ -201,6 +206,7 @@ typedef union{
#define rx_aligned_alloc(a, b) malloc(a)
#define rx_aligned_free(a) free(a)
#define rx_prefetch_nta(x)
#define rx_prefetch_t0(x)
/* Splat 64-bit long long to 2 64-bit long longs */
// Affected CPU models and stepping numbers are taken from https://www.intel.com/content/dam/support/us/en/documents/processors/mitigations-jump-conditional-code-erratum.pdf