From e3561d661e57ba4ca81a6ef44273ff26bc89434c Mon Sep 17 00:00:00 2001
From: tevador <tevador@gmail.com>
Date: Sun, 1 Dec 2019 11:52:56 +0100
Subject: [PATCH 1/8] Updated readme with RandomX Sniffer

---
 README.md | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 8f82fdd..95dce40 100644
--- a/README.md
+++ b/README.md
@@ -63,6 +63,8 @@ RandomX was primarily designed as a PoW algorithm for [Monero](https://www.getmo
 * The key `K` is selected to be the hash of a block in the blockchain - this block is called the 'key block'. For optimal mining and verification performance, the key should change every 2048 blocks (~2.8 days) and there should be a delay of 64 blocks (~2 hours) between the key block and the change of the key `K`. This can be achieved by changing the key when `blockHeight % 2048 == 64` and selecting key block such that `keyBlockHeight % 2048 == 0`.
 * The input `H` is the standard hashing blob with a selected nonce value.
 
+RandomX was successfully activated on the Monero network on the 30th November 2019.
+
 If you wish to use RandomX as a PoW algorithm for your cryptocurrency, please follow the [configuration guidelines](doc/configuration.md).
 
 **Note**: To achieve ASIC resistance, the key `K` must change and must not be miner-selectable. We recommend to use blockchain data as the key in a similar way to the Monero example above. If blockchain data cannot be used for some reason, use a predefined sequence of keys.
@@ -108,7 +110,12 @@ Most Intel and AMD CPUs made since 2011 should be fairly efficient at RandomX. M
     * DDR4 memory is limited to about 4000-6000 H/s per channel  (depending on frequency and timings)
 
 ### Does RandomX facilitate botnets/malware mining or web mining?
-Efficient mining requires more than 2 GiB of memory, which is difficult to hide in an infected computer and disqualifies many low-end machines such as IoT devices. Web mining is infeasible due to the large memory requirement and the lack of directed rounding support for floating point operations in both Javascript and WebAssembly.
+
+Due to the way the algorithm works, mining malware is much easier to detect. [RandomX Sniffer](https://github.com/tevador/randomx-sniffer) is a proof of concept tool that can detect illicit mining activity on Windows.
+
+Efficient mining requires more than 2 GiB of memory, which also disqualifies many low-end machines such as IoT devices, which are often parts of large botnets.
+
+Web mining is infeasible due to the large memory requirement and the lack of directed rounding support for floating point operations in both Javascript and WebAssembly.
 
 ### Since RandomX uses floating point math, does it give reproducible results on different platforms?
 

From 6235852e007e216278b826833b3358a8e2212f47 Mon Sep 17 00:00:00 2001
From: tevador <tevador@gmail.com>
Date: Sun, 1 Dec 2019 16:39:42 +0100
Subject: [PATCH 2/8] Add .gitattributes

---
 .gitattributes | 3 +++
 1 file changed, 3 insertions(+)
 create mode 100644 .gitattributes

diff --git a/.gitattributes b/.gitattributes
new file mode 100644
index 0000000..2b93d41
--- /dev/null
+++ b/.gitattributes
@@ -0,0 +1,3 @@
+.gitignore export-ignore
+.gitattributes export-ignore
+audits export-ignore

From 219c02e1e508bea3b14a7b38f6614d0d5897a71a Mon Sep 17 00:00:00 2001
From: SChernykh <sergey.v.chernykh@gmail.com>
Date: Sun, 1 Dec 2019 16:58:38 +0100
Subject: [PATCH 3/8] Combined hash and fill AES loop (#166)

Adds more parallelizm into AES loop so modern CPUs can take advantage of it. Also, scratchpad data moves between L1 and L3 caches only one time which saves time and energy per hash.
---
 src/aes_hash.cpp        | 81 +++++++++++++++++++++++++++++++++++++++++
 src/aes_hash.hpp        |  3 ++
 src/intrin_portable.h   |  7 ++++
 src/randomx.cpp         | 17 +++++++++
 src/randomx.h           | 16 ++++++++
 src/tests/benchmark.cpp |  9 ++++-
 src/virtual_machine.cpp |  6 +++
 src/virtual_machine.hpp |  2 +
 8 files changed, 139 insertions(+), 2 deletions(-)

diff --git a/src/aes_hash.cpp b/src/aes_hash.cpp
index 1aff37f..a3b7395 100644
--- a/src/aes_hash.cpp
+++ b/src/aes_hash.cpp
@@ -239,3 +239,84 @@ void fillAes4Rx4(void *state, size_t outputSize, void *buffer) {
 
 template void fillAes4Rx4<true>(void *state, size_t outputSize, void *buffer);
 template void fillAes4Rx4<false>(void *state, size_t outputSize, void *buffer);
+
+template<bool softAes>
+void hashAndFillAes1Rx4(void *scratchpad, size_t scratchpadSize, void *hash, void* fill_state) {
+	uint8_t* scratchpadPtr = (uint8_t*)scratchpad;
+	const uint8_t* scratchpadEnd = scratchpadPtr + scratchpadSize;
+
+	// initial state
+	rx_vec_i128 hash_state0 = rx_set_int_vec_i128(AES_HASH_1R_STATE0);
+	rx_vec_i128 hash_state1 = rx_set_int_vec_i128(AES_HASH_1R_STATE1);
+	rx_vec_i128 hash_state2 = rx_set_int_vec_i128(AES_HASH_1R_STATE2);
+	rx_vec_i128 hash_state3 = rx_set_int_vec_i128(AES_HASH_1R_STATE3);
+
+	const rx_vec_i128 key0 = rx_set_int_vec_i128(AES_GEN_1R_KEY0);
+	const rx_vec_i128 key1 = rx_set_int_vec_i128(AES_GEN_1R_KEY1);
+	const rx_vec_i128 key2 = rx_set_int_vec_i128(AES_GEN_1R_KEY2);
+	const rx_vec_i128 key3 = rx_set_int_vec_i128(AES_GEN_1R_KEY3);
+
+	rx_vec_i128 fill_state0 = rx_load_vec_i128((rx_vec_i128*)fill_state + 0);
+	rx_vec_i128 fill_state1 = rx_load_vec_i128((rx_vec_i128*)fill_state + 1);
+	rx_vec_i128 fill_state2 = rx_load_vec_i128((rx_vec_i128*)fill_state + 2);
+	rx_vec_i128 fill_state3 = rx_load_vec_i128((rx_vec_i128*)fill_state + 3);
+
+	constexpr int PREFETCH_DISTANCE = 4096;
+	const char* prefetchPtr = ((const char*)scratchpad) + PREFETCH_DISTANCE;
+	scratchpadEnd -= PREFETCH_DISTANCE;
+
+	for (int i = 0; i < 2; ++i) {
+		//process 64 bytes at a time in 4 lanes
+		while (scratchpadPtr < scratchpadEnd) {
+			hash_state0 = aesenc<softAes>(hash_state0, rx_load_vec_i128((rx_vec_i128*)scratchpadPtr + 0));
+			hash_state1 = aesdec<softAes>(hash_state1, rx_load_vec_i128((rx_vec_i128*)scratchpadPtr + 1));
+			hash_state2 = aesenc<softAes>(hash_state2, rx_load_vec_i128((rx_vec_i128*)scratchpadPtr + 2));
+			hash_state3 = aesdec<softAes>(hash_state3, rx_load_vec_i128((rx_vec_i128*)scratchpadPtr + 3));
+
+			fill_state0 = aesdec<softAes>(fill_state0, key0);
+			fill_state1 = aesenc<softAes>(fill_state1, key1);
+			fill_state2 = aesdec<softAes>(fill_state2, key2);
+			fill_state3 = aesenc<softAes>(fill_state3, key3);
+
+			rx_store_vec_i128((rx_vec_i128*)scratchpadPtr + 0, fill_state0);
+			rx_store_vec_i128((rx_vec_i128*)scratchpadPtr + 1, fill_state1);
+			rx_store_vec_i128((rx_vec_i128*)scratchpadPtr + 2, fill_state2);
+			rx_store_vec_i128((rx_vec_i128*)scratchpadPtr + 3, fill_state3);
+
+			rx_prefetch_t0(prefetchPtr);
+
+			scratchpadPtr += 64;
+			prefetchPtr += 64;
+		}
+		prefetchPtr = (const char*) scratchpad;
+		scratchpadEnd += PREFETCH_DISTANCE;
+	}
+
+	rx_store_vec_i128((rx_vec_i128*)fill_state + 0, fill_state0);
+	rx_store_vec_i128((rx_vec_i128*)fill_state + 1, fill_state1);
+	rx_store_vec_i128((rx_vec_i128*)fill_state + 2, fill_state2);
+	rx_store_vec_i128((rx_vec_i128*)fill_state + 3, fill_state3);
+
+	//two extra rounds to achieve full diffusion
+	rx_vec_i128 xkey0 = rx_set_int_vec_i128(AES_HASH_1R_XKEY0);
+	rx_vec_i128 xkey1 = rx_set_int_vec_i128(AES_HASH_1R_XKEY1);
+
+	hash_state0 = aesenc<softAes>(hash_state0, xkey0);
+	hash_state1 = aesdec<softAes>(hash_state1, xkey0);
+	hash_state2 = aesenc<softAes>(hash_state2, xkey0);
+	hash_state3 = aesdec<softAes>(hash_state3, xkey0);
+
+	hash_state0 = aesenc<softAes>(hash_state0, xkey1);
+	hash_state1 = aesdec<softAes>(hash_state1, xkey1);
+	hash_state2 = aesenc<softAes>(hash_state2, xkey1);
+	hash_state3 = aesdec<softAes>(hash_state3, xkey1);
+
+	//output hash
+	rx_store_vec_i128((rx_vec_i128*)hash + 0, hash_state0);
+	rx_store_vec_i128((rx_vec_i128*)hash + 1, hash_state1);
+	rx_store_vec_i128((rx_vec_i128*)hash + 2, hash_state2);
+	rx_store_vec_i128((rx_vec_i128*)hash + 3, hash_state3);
+}
+
+template void hashAndFillAes1Rx4<false>(void *scratchpad, size_t scratchpadSize, void *hash, void* fill_state);
+template void hashAndFillAes1Rx4<true>(void *scratchpad, size_t scratchpadSize, void *hash, void* fill_state);
diff --git a/src/aes_hash.hpp b/src/aes_hash.hpp
index b4d0e94..9f75f73 100644
--- a/src/aes_hash.hpp
+++ b/src/aes_hash.hpp
@@ -38,3 +38,6 @@ void fillAes1Rx4(void *state, size_t outputSize, void *buffer);
 
 template<bool softAes>
 void fillAes4Rx4(void *state, size_t outputSize, void *buffer);
+
+template<bool softAes>
+void hashAndFillAes1Rx4(void *scratchpad, size_t scratchpadSize, void *hash, void* fill_state);
diff --git a/src/intrin_portable.h b/src/intrin_portable.h
index b5ad91a..c9d4475 100644
--- a/src/intrin_portable.h
+++ b/src/intrin_portable.h
@@ -102,6 +102,7 @@ typedef __m128d rx_vec_f128;
 #define rx_aligned_alloc(a, b) _mm_malloc(a,b)
 #define rx_aligned_free(a) _mm_free(a)
 #define rx_prefetch_nta(x) _mm_prefetch((const char *)(x), _MM_HINT_NTA)
+#define rx_prefetch_t0(x) _mm_prefetch((const char *)(x), _MM_HINT_T0)
 
 #define rx_load_vec_f128 _mm_load_pd
 #define rx_store_vec_f128 _mm_store_pd
@@ -201,6 +202,7 @@ typedef union{
 #define rx_aligned_alloc(a, b) malloc(a)
 #define rx_aligned_free(a) free(a)
 #define rx_prefetch_nta(x)
+#define rx_prefetch_t0(x)
 
 /* Splat 64-bit long long to 2 64-bit long longs */
 FORCE_INLINE __m128i vec_splat2sd (int64_t scalar)
@@ -399,6 +401,10 @@ inline void rx_prefetch_nta(void* ptr) {
 	asm volatile ("prfm pldl1strm, [%0]\n" : : "r" (ptr));
 }
 
+inline void rx_prefetch_t0(const void* ptr) {
+	asm volatile ("prfm pldl1strm, [%0]\n" : : "r" (ptr));
+}
+
 FORCE_INLINE rx_vec_f128 rx_load_vec_f128(const double* pd) {
 	return vld1q_f64((const float64_t*)pd);
 }
@@ -532,6 +538,7 @@ typedef union {
 #define rx_aligned_alloc(a, b) malloc(a)
 #define rx_aligned_free(a) free(a)
 #define rx_prefetch_nta(x)
+#define rx_prefetch_t0(x)
 
 FORCE_INLINE rx_vec_f128 rx_load_vec_f128(const double* pd) {
 	rx_vec_f128 x;
diff --git a/src/randomx.cpp b/src/randomx.cpp
index 90fc46a..c4281b3 100644
--- a/src/randomx.cpp
+++ b/src/randomx.cpp
@@ -363,4 +363,21 @@ extern "C" {
 		machine->getFinalResult(output, RANDOMX_HASH_SIZE);
 	}
 
+	void randomx_calculate_hash_first(randomx_vm* machine, uint64_t *tempHash, const void* input, size_t inputSize) {
+		blake2b(tempHash, sizeof(uint64_t) * 8, input, inputSize, nullptr, 0);
+		machine->initScratchpad(tempHash);
+	}
+
+	void randomx_calculate_hash_next(randomx_vm* machine, uint64_t *tempHash, const void* nextInput, size_t nextInputSize, void* output) {
+		machine->resetRoundingMode();
+		for (uint32_t chain = 0; chain < RANDOMX_PROGRAM_COUNT - 1; ++chain) {
+			machine->run(tempHash);
+			blake2b(tempHash, sizeof(uint64_t) * 8, machine->getRegisterFile(), sizeof(randomx::RegisterFile), nullptr, 0);
+		}
+		machine->run(tempHash);
+
+		// Finish current hash and fill the scratchpad for the next hash at the same time
+		blake2b(tempHash, sizeof(uint64_t) * 8, nextInput, nextInputSize, nullptr, 0);
+		machine->hashAndFill(output, RANDOMX_HASH_SIZE, tempHash);
+	}
 }
diff --git a/src/randomx.h b/src/randomx.h
index c06002b..4a7fcbf 100644
--- a/src/randomx.h
+++ b/src/randomx.h
@@ -30,6 +30,7 @@ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 #define RANDOMX_H
 
 #include <stddef.h>
+#include <stdint.h>
 
 #define RANDOMX_HASH_SIZE 32
 #define RANDOMX_DATASET_ITEM_SIZE 64
@@ -238,6 +239,21 @@ RANDOMX_EXPORT void randomx_destroy_vm(randomx_vm *machine);
 */
 RANDOMX_EXPORT void randomx_calculate_hash(randomx_vm *machine, const void *input, size_t inputSize, void *output);
 
+/**
+ * Paired functions used to calculate multiple RandomX hashes during mining for example.
+ *
+ * @param machine is a pointer to a randomx_vm structure. Must not be NULL.
+ * @param tempHash an array of 8 64-bit values used to store intermediate data between calls to randomx_calculate_hash_first and randomx_calculate_hash_next.
+ * @param input is a pointer to memory to be hashed. Must not be NULL.
+ * @param inputSize is the number of bytes to be hashed.
+ * @param nextInput is a pointer to memory to be hashed for the next hash. Must not be NULL.
+ * @param nextInputSize is the number of bytes to be hashed for the next hash.
+ * @param output is a pointer to memory where the hash will be stored. Must not
+ *        be NULL and at least RANDOMX_HASH_SIZE bytes must be available for writing.
+*/
+RANDOMX_EXPORT void randomx_calculate_hash_first(randomx_vm* machine, uint64_t *tempHash, const void* input, size_t inputSize);
+RANDOMX_EXPORT void randomx_calculate_hash_next(randomx_vm* machine, uint64_t *tempHash, const void* nextInput, size_t nextInputSize, void* output);
+
 #if defined(__cplusplus)
 }
 #endif
diff --git a/src/tests/benchmark.cpp b/src/tests/benchmark.cpp
index 03023ab..868dd28 100644
--- a/src/tests/benchmark.cpp
+++ b/src/tests/benchmark.cpp
@@ -122,11 +122,16 @@ void mine(randomx_vm* vm, std::atomic<uint32_t>& atomicNonce, AtomicHash& result
 	void* noncePtr = blockTemplate + 39;
 	auto nonce = atomicNonce.fetch_add(1);
 
+	uint64_t tempHash[8];
+
+	store32(noncePtr, nonce);
+	randomx_calculate_hash_first(vm, tempHash, blockTemplate, sizeof(blockTemplate));
+
 	while (nonce < noncesCount) {
+		nonce = atomicNonce.fetch_add(1);
 		store32(noncePtr, nonce);
-		randomx_calculate_hash(vm, blockTemplate, sizeof(blockTemplate), &hash);
+		randomx_calculate_hash_next(vm, tempHash, blockTemplate, sizeof(blockTemplate), &hash);
 		result.xorWith(hash);
-		nonce = atomicNonce.fetch_add(1);
 	}
 }
 
diff --git a/src/virtual_machine.cpp b/src/virtual_machine.cpp
index d73a024..2d5d2be 100644
--- a/src/virtual_machine.cpp
+++ b/src/virtual_machine.cpp
@@ -120,6 +120,12 @@ namespace randomx {
 		blake2b(out, outSize, &reg, sizeof(RegisterFile), nullptr, 0);
 	}
 
+	template<class Allocator, bool softAes>
+	void VmBase<Allocator, softAes>::hashAndFill(void* out, size_t outSize, uint64_t *fill_state) {
+		hashAndFillAes1Rx4<softAes>((void*) getScratchpad(), ScratchpadSize, &reg.a, fill_state);
+		blake2b(out, outSize, &reg, sizeof(RegisterFile), nullptr, 0);
+	}
+
 	template<class Allocator, bool softAes>
 	void VmBase<Allocator, softAes>::initScratchpad(void* seed) {
 		fillAes1Rx4<softAes>(seed, ScratchpadSize, scratchpad);
diff --git a/src/virtual_machine.hpp b/src/virtual_machine.hpp
index d662c89..4e89366 100644
--- a/src/virtual_machine.hpp
+++ b/src/virtual_machine.hpp
@@ -38,6 +38,7 @@ public:
 	virtual ~randomx_vm() = 0;
 	virtual void allocate() = 0;
 	virtual void getFinalResult(void* out, size_t outSize) = 0;
+	virtual void hashAndFill(void* out, size_t outSize, uint64_t *fill_state) = 0;
 	virtual void setDataset(randomx_dataset* dataset) { }
 	virtual void setCache(randomx_cache* cache) { }
 	virtual void initScratchpad(void* seed) = 0;
@@ -78,6 +79,7 @@ namespace randomx {
 		void allocate() override;
 		void initScratchpad(void* seed) override;
 		void getFinalResult(void* out, size_t outSize) override;
+		void hashAndFill(void* out, size_t outSize, uint64_t *fill_state) override;
 	protected:
 		void generateProgram(void* seed);
 	};

From 3910d49b498fe974c971771ec0a48c0ef44e9196 Mon Sep 17 00:00:00 2001
From: tevador <tevador@gmail.com>
Date: Sun, 1 Dec 2019 18:19:09 +0100
Subject: [PATCH 4/8] Hide tempHash from the public API

---
 src/randomx.cpp         | 18 +++++++++---------
 src/randomx.h           |  9 +++++----
 src/tests/benchmark.cpp |  6 ++----
 src/virtual_machine.hpp |  1 +
 4 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/src/randomx.cpp b/src/randomx.cpp
index c4281b3..898425c 100644
--- a/src/randomx.cpp
+++ b/src/randomx.cpp
@@ -363,21 +363,21 @@ extern "C" {
 		machine->getFinalResult(output, RANDOMX_HASH_SIZE);
 	}
 
-	void randomx_calculate_hash_first(randomx_vm* machine, uint64_t *tempHash, const void* input, size_t inputSize) {
-		blake2b(tempHash, sizeof(uint64_t) * 8, input, inputSize, nullptr, 0);
-		machine->initScratchpad(tempHash);
+	void randomx_calculate_hash_first(randomx_vm* machine, const void* input, size_t inputSize) {
+		blake2b(machine->tempHash, sizeof(machine->tempHash), input, inputSize, nullptr, 0);
+		machine->initScratchpad(machine->tempHash);
 	}
 
-	void randomx_calculate_hash_next(randomx_vm* machine, uint64_t *tempHash, const void* nextInput, size_t nextInputSize, void* output) {
+	void randomx_calculate_hash_next(randomx_vm* machine, const void* nextInput, size_t nextInputSize, void* output) {
 		machine->resetRoundingMode();
 		for (uint32_t chain = 0; chain < RANDOMX_PROGRAM_COUNT - 1; ++chain) {
-			machine->run(tempHash);
-			blake2b(tempHash, sizeof(uint64_t) * 8, machine->getRegisterFile(), sizeof(randomx::RegisterFile), nullptr, 0);
+			machine->run(machine->tempHash);
+			blake2b(machine->tempHash, sizeof(machine->tempHash), machine->getRegisterFile(), sizeof(randomx::RegisterFile), nullptr, 0);
 		}
-		machine->run(tempHash);
+		machine->run(machine->tempHash);
 
 		// Finish current hash and fill the scratchpad for the next hash at the same time
-		blake2b(tempHash, sizeof(uint64_t) * 8, nextInput, nextInputSize, nullptr, 0);
-		machine->hashAndFill(output, RANDOMX_HASH_SIZE, tempHash);
+		blake2b(machine->tempHash, sizeof(machine->tempHash), nextInput, nextInputSize, nullptr, 0);
+		machine->hashAndFill(output, RANDOMX_HASH_SIZE, machine->tempHash);
 	}
 }
diff --git a/src/randomx.h b/src/randomx.h
index 4a7fcbf..d582cf0 100644
--- a/src/randomx.h
+++ b/src/randomx.h
@@ -240,10 +240,11 @@ RANDOMX_EXPORT void randomx_destroy_vm(randomx_vm *machine);
 RANDOMX_EXPORT void randomx_calculate_hash(randomx_vm *machine, const void *input, size_t inputSize, void *output);
 
 /**
- * Paired functions used to calculate multiple RandomX hashes during mining for example.
+ * Paired functions used to calculate multiple RandomX hashes more efficiently.
+ * randomx_calculate_hash_first is called for the first input value.
+ * randomx_calculate_hash_next will output the hash value of the previous input.
  *
  * @param machine is a pointer to a randomx_vm structure. Must not be NULL.
- * @param tempHash an array of 8 64-bit values used to store intermediate data between calls to randomx_calculate_hash_first and randomx_calculate_hash_next.
  * @param input is a pointer to memory to be hashed. Must not be NULL.
  * @param inputSize is the number of bytes to be hashed.
  * @param nextInput is a pointer to memory to be hashed for the next hash. Must not be NULL.
@@ -251,8 +252,8 @@ RANDOMX_EXPORT void randomx_calculate_hash(randomx_vm *machine, const void *inpu
  * @param output is a pointer to memory where the hash will be stored. Must not
  *        be NULL and at least RANDOMX_HASH_SIZE bytes must be available for writing.
 */
-RANDOMX_EXPORT void randomx_calculate_hash_first(randomx_vm* machine, uint64_t *tempHash, const void* input, size_t inputSize);
-RANDOMX_EXPORT void randomx_calculate_hash_next(randomx_vm* machine, uint64_t *tempHash, const void* nextInput, size_t nextInputSize, void* output);
+RANDOMX_EXPORT void randomx_calculate_hash_first(randomx_vm* machine, const void* input, size_t inputSize);
+RANDOMX_EXPORT void randomx_calculate_hash_next(randomx_vm* machine, const void* nextInput, size_t nextInputSize, void* output);
 
 #if defined(__cplusplus)
 }
diff --git a/src/tests/benchmark.cpp b/src/tests/benchmark.cpp
index 868dd28..ed450e1 100644
--- a/src/tests/benchmark.cpp
+++ b/src/tests/benchmark.cpp
@@ -122,15 +122,13 @@ void mine(randomx_vm* vm, std::atomic<uint32_t>& atomicNonce, AtomicHash& result
 	void* noncePtr = blockTemplate + 39;
 	auto nonce = atomicNonce.fetch_add(1);
 
-	uint64_t tempHash[8];
-
 	store32(noncePtr, nonce);
-	randomx_calculate_hash_first(vm, tempHash, blockTemplate, sizeof(blockTemplate));
+	randomx_calculate_hash_first(vm, blockTemplate, sizeof(blockTemplate));
 
 	while (nonce < noncesCount) {
 		nonce = atomicNonce.fetch_add(1);
 		store32(noncePtr, nonce);
-		randomx_calculate_hash_next(vm, tempHash, blockTemplate, sizeof(blockTemplate), &hash);
+		randomx_calculate_hash_next(vm, blockTemplate, sizeof(blockTemplate), &hash);
 		result.xorWith(hash);
 	}
 }
diff --git a/src/virtual_machine.hpp b/src/virtual_machine.hpp
index 4e89366..d72a918 100644
--- a/src/virtual_machine.hpp
+++ b/src/virtual_machine.hpp
@@ -68,6 +68,7 @@ protected:
 	uint64_t datasetOffset;
 public:
 	std::string cacheKey;
+	alignas(16) uint64_t tempHash[8]; //8 64-bit values used to store intermediate data
 };
 
 namespace randomx {

From aa19c5b9b6c388bb0699d0f0d8ff5ae596a339d9 Mon Sep 17 00:00:00 2001
From: tevador <tevador@gmail.com>
Date: Sun, 1 Dec 2019 20:41:20 +0100
Subject: [PATCH 5/8] Disable thread affinity on FreeBSD and Android

---
 src/tests/affinity.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/tests/affinity.cpp b/src/tests/affinity.cpp
index a2e37aa..40f6458 100644
--- a/src/tests/affinity.cpp
+++ b/src/tests/affinity.cpp
@@ -65,7 +65,7 @@ set_thread_affinity(std::thread::native_handle_type thread,
             (thread_policy_t)&policy, 1);
 #elif defined(_WIN32) || defined(__CYGWIN__)
     rc = SetThreadAffinityMask(reinterpret_cast<HANDLE>(thread), 1ULL << cpuid) == 0 ? -2 : 0;
-#elif !defined(__OpenBSD__)
+#elif !defined(__OpenBSD__) && !defined(__FreeBSD__) && !defined(__ANDROID__)
     cpu_set_t cs;
     CPU_ZERO(&cs);
     CPU_SET(cpuid, &cs);

From 30969c0e25274eb7fe8c3dfede8edb607a1563d4 Mon Sep 17 00:00:00 2001
From: tevador <tevador@gmail.com>
Date: Sun, 1 Dec 2019 21:07:13 +0100
Subject: [PATCH 6/8] Benchmark version

---
 src/tests/benchmark.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/tests/benchmark.cpp b/src/tests/benchmark.cpp
index ed450e1..47cd026 100644
--- a/src/tests/benchmark.cpp
+++ b/src/tests/benchmark.cpp
@@ -161,7 +161,7 @@ int main(int argc, char** argv) {
 
 	store32(&seed, seedValue);
 
-	std::cout << "RandomX benchmark v1.1.5" << std::endl;
+	std::cout << "RandomX benchmark v1.1.7" << std::endl;
 
 	if (help) {
 		printUsage(argv[0]);

From a223b6b33ba7bf6765f59bac38471716a9cf6919 Mon Sep 17 00:00:00 2001
From: tevador <37503146+tevador@users.noreply.github.com>
Date: Wed, 18 Dec 2019 12:30:49 +0100
Subject: [PATCH 7/8] Fixed an incorrect URL the the documentation

---
 doc/design.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/doc/design.md b/doc/design.md
index 1a77458..7a1b8ef 100644
--- a/doc/design.md
+++ b/doc/design.md
@@ -255,7 +255,7 @@ The Scratchpad is split into 3 levels to mimic the typical CPU cache hierarchy [
 |----------------|----------|----------|----------|------|
 ARM Cortex A55|2|6|-|[[24](https://www.anandtech.com/show/11441/dynamiq-and-arms-new-cpus-cortex-a75-a55/4)]
 |AMD Zen+|4|12|40|[[25](https://en.wikichip.org/wiki/amd/microarchitectures/zen%2B#Memory_Hierarchy)]|
-|Intel Skylake|4|12|42|[[26](https://en.wikichip.org/wiki/amd/microarchitectures/zen%2B#Memory_Hierarchy)]
+|Intel Skylake|4|12|42|[[26](https://en.wikichip.org/wiki/intel/microarchitectures/skylake_(client)#Memory_Hierarchy)]
 
 The L3 cache is much larger and located further from the CPU core. As a result, its access latencies are much higher and can cause stalls in program execution.
 
@@ -638,7 +638,7 @@ state3 = 00000000000000000000000000000000
 
 [25] AMD Zen+ Microarchitecture - https://en.wikichip.org/wiki/amd/microarchitectures/zen%2B#Memory_Hierarchy
 
-[26] Intel Skylake Microarchitecture - https://en.wikichip.org/wiki/amd/microarchitectures/zen%2B#Memory_Hierarchy
+[26] Intel Skylake Microarchitecture - https://en.wikichip.org/wiki/intel/microarchitectures/skylake_(client)#Memory_Hierarchy
 
 [27] Biryukov et al.: Fast and Tradeoff-Resilient Memory-Hard Functions for
 Cryptocurrencies and Password Hashing - https://eprint.iacr.org/2015/430.pdf Table 2, page 8
@@ -647,4 +647,4 @@ Cryptocurrencies and Password Hashing - https://eprint.iacr.org/2015/430.pdf Tab
 
 [29] 7-Zip File archiver - https://www.7-zip.org/
 
-[30] TestU01 library - http://simul.iro.umontreal.ca/testu01/tu01.html
\ No newline at end of file
+[30] TestU01 library - http://simul.iro.umontreal.ca/testu01/tu01.html

From bbbb34757b4b355d023f9226f11463f9c1c928fa Mon Sep 17 00:00:00 2001
From: tevador <37503146+tevador@users.noreply.github.com>
Date: Thu, 26 Dec 2019 12:32:04 +0100
Subject: [PATCH 8/8] Add a note about building portable binaries

---
 README.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/README.md b/README.md
index 95dce40..4c1dabb 100644
--- a/README.md
+++ b/README.md
@@ -48,6 +48,8 @@ cmake -DARCH=native ..
 make
 ```
 
+To build portable binaries, omit the `ARCH` option when executing cmake.
+
 ### Windows
 
 On Windows, it is possible to build using MinGW (same procedure as on Linux) or using Visual Studio (solution file is provided).