* Optimized dataset read
Reading a dataset item had a false dependency on readReg2 and readReg3, caused by the `xor rbp, rax` instruction (see design.md - 4.6.2 Loop execution, steps 5 and 7). This change reads the dataset item through the `ma` register before `rbp` as a whole (`ma` and `mx`) is modified, so superscalar, out-of-order CPUs can start the read earlier.
Results: https://i.imgur.com/Bpeq9mx.png
~1% speedup on modern Intel/AMD CPUs.
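The reordering can be illustrated with a simplified C++ model. This is only a sketch: the `Packed` type, the function names, and the assumption that `mx` sits in the low half and `ma` in the high half of the 64-bit value are hypothetical and are not taken from the JIT code. It shows that the read address is the same in both orderings, while only the second one is free of the dependency on readReg2 and readReg3.

```cpp
#include <cstdint>

// Hypothetical model of one 64-bit register (standing in for rbp) that
// holds both mx and ma. The bit layout is an assumption for illustration.
struct Packed {
    std::uint64_t rbp;
    std::uint32_t mx() const { return static_cast<std::uint32_t>(rbp); }
    std::uint32_t ma() const { return static_cast<std::uint32_t>(rbp >> 32); }
};

// Old ordering: the read address is taken after rbp has been modified,
// so it has to wait for r2 ^ r3 even though ma itself does not change.
std::uint32_t read_addr_after_xor(Packed p, std::uint64_t r2, std::uint64_t r3) {
    std::uint64_t rax = static_cast<std::uint32_t>(r2 ^ r3); // low 32 bits only
    p.rbp ^= rax;                                            // updates mx only
    return p.ma();
}

// New ordering: ma is copied out first, so the read address depends only
// on the previous value of rbp and the load can be issued earlier.
std::uint32_t read_addr_before_xor(Packed p, std::uint64_t r2, std::uint64_t r3) {
    std::uint32_t addr = p.ma();
    std::uint64_t rax = static_cast<std::uint32_t>(r2 ^ r3);
    p.rbp ^= rax;
    return addr;
}
```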
* ARMv8: optimized dataset read
Break the dependency on readReg2 and readReg3.
* Fixed light mode hashing
Fix compilation and JIT support on NetBSD
1. Disable hugepages (not supported).
2. Force W^X (required).
3. When allocating JIT memory, PROT_EXEC must be reserved
in order to set the pages executable later.
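The three points above can be sketched roughly as follows. The helper names `alloc_jit_memory` and `make_executable` are hypothetical and are not the functions changed by this commit. The sketch assumes NetBSD's `PROT_MPROTECT()` macro from `<sys/mman.h>` is the mechanism for reserving PROT_EXEC at allocation time.

```cpp
#include <sys/mman.h>
#include <cstddef>

// Hypothetical helper: allocate read/write memory for JIT code without
// huge pages. On NetBSD, also reserve the right to add PROT_EXEC later.
void* alloc_jit_memory(std::size_t bytes) {
    int prot = PROT_READ | PROT_WRITE;
#if defined(__NetBSD__)
    // Assumption: protections requested later through mprotect() must be
    // declared up front with PROT_MPROTECT().
    prot |= PROT_MPROTECT(PROT_EXEC);
#endif
    void* p = mmap(nullptr, bytes, prot, MAP_PRIVATE | MAP_ANON, -1, 0);
    return p == MAP_FAILED ? nullptr : p;
}

// Hypothetical helper for W^X: drop write access when enabling execution,
// so the pages are never writable and executable at the same time.
bool make_executable(void* p, std::size_t bytes) {
    return mprotect(p, bytes, PROT_READ | PROT_EXEC) == 0;
}
```

On other platforms the `#if` branch is skipped and the same two calls are used unchanged.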
Adds more parallelism to the AES loop so modern CPUs can take advantage of it. Also, scratchpad data now moves between the L1 and L3 caches only once, which saves time and energy per hash.
* Fixed CMake configuration for Visual Studio build
Added proper asm source and set correct type.
* Disabled standard layout check of randomx_cache for Visual Studio debug
Required to silence a static_assert that fails in the Visual Studio Debug
configuration.
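A guard of this kind might look like the sketch below. The struct `example_cache` is a hypothetical stand-in for randomx_cache, and the exact preprocessor condition used in the commit is not shown here. The sketch assumes `_MSC_VER` together with `_DEBUG` identifies a Visual Studio Debug build.

```cpp
#include <type_traits>

// Hypothetical stand-in for randomx_cache; the real type has other members.
struct example_cache {
    void* memory;
    unsigned flags;
};

// Skip the layout check only for Visual Studio Debug builds (assumed to be
// detectable through _MSC_VER and _DEBUG); keep it everywhere else.
#if !(defined(_MSC_VER) && defined(_DEBUG))
static_assert(std::is_standard_layout<example_cache>::value,
              "example_cache must be a standard-layout struct");
#endif
```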
* Fixed warning message and defines check
* Removed unsupported flags for MSVC compiler
* Enabled AVX2 for MSVC
* Fixed formatting in CMakeLists
* Added generation of configuration.asm by CMake for MSVC