wallet: mitigate statistical dependence for decoy selection within rings

Since we are required to check for uniqueness of decoy picks within any given ring, and since some decoy picks may fail due to unlock time or malformed EC points, the wallet2 decoy selection code was building up a larger than needed *unique* set of decoys for each ring according to a certain distribution *without replacement*. After filtering out the outputs that it couldn't use, it chooses from the remaining decoys uniformly random *without replacement*. The problem with this is that the picks later in the picking process are not independent from the picks earlier in the picking process, and the later picks do not follow the intended decoy distribution as closely as the earlier picks. To understand this intuitively, imagine that you have 1023 marbles. You label 512 marbles with the letter A, label 256 with the letter B, so on and so forth, finally labelling one marble with the letter J. You put them all into a bag, shake it well, and pick 8 marbles from the bag, but everytime you pick a marble of a certain letter, you remove all the other marbles from that bag with the same letter. That very first pick, the odds of picking a certain marble are exactly how you would expect: you are twice as likely to pick A as you are B, twice as likely to pick B as you are C, etc. However, on the second pick, the odds of getting the first pick are 0%, and the chances for everything else is higher. As you go down the line, your picked marbles will have letters that are increasingly more unlikely to pick if you hadn't remove the other marbles. In other words, the distribution of the later marbles will be more "skewed" in comparison to your original distribution of marbles. In Monero's decoy selection, this same statistical effect applies. It is not as dramatic since the distribution is not so steep, and we have more unique values to choose from, but the effect *is* measureable. Because of the protocol rules, we cannot have duplicate ring members, so unless that restriction is removed, we will never have perfectly independent picking. However, since the earlier picks are less affected by this statistical effect, the workaround that this commit offers is to store the order that the outputs were picked and commit to this order after fetching output information over RPC.
7 months ago · dfb990e8bb
parent 8eab181fe1
commit dfb990e8bb
1 changed files with 43 additions and 16 deletions
--- a/src/wallet/wallet2.cpp
+++ b/src/wallet/wallet2.cpp
@ -8909,6 +8909,26 @@ void wallet2::get_outs(std::vector<std::vector<tools::wallet2::get_outs_entry>>
    COMMAND_RPC_GET_OUTPUTS_BIN::request req = AUTO_VAL_INIT(req);
    COMMAND_RPC_GET_OUTPUTS_BIN::response daemon_resp = AUTO_VAL_INIT(daemon_resp);

+    // The secret picking order contains outputs in the order that we selected them.
+    //
+    // We will later sort the output request entries in a pre-determined order so that the daemon
+    // that we're requesting information from doesn't learn any information about the true spend
+    // for each ring. However, internally, we want to prefer to construct our rings using the
+    // outputs that we picked first versus outputs picked later.
+    //
+    // The reason why is because each consecutive output pick within a ring becomes increasing less
+    // statistically independent from other picks, since we pick outputs from a finite set
+    // *without replacement*, due to the protocol not allowing duplicate ring members. This effect
+    // is exacerbated by the fact that we pick 1.5x + 75 as many outputs as we need per RPC
+    // request to account for unusable outputs. This effect is small, but non-neglibile and gets
+    // worse with larger ring sizes.
+    std::vector<get_outputs_out> secret_picking_order;
+
+    // Convenience/safety lambda to make sure that both output lists req.outputs and secret_picking_order are updated together
+    // Each ring section of req.outputs gets sorted later after selecting all outputs for that ring
+    const auto add_output_to_lists = [&req, &secret_picking_order](const get_outputs_out &goo)
+      { req.outputs.push_back(goo); secret_picking_order.push_back(goo); };
+
    std::unique_ptr<gamma_picker> gamma;
    if (has_rct)
      gamma.reset(new gamma_picker(rct_offsets));
@ -9043,7 +9063,7 @@ void wallet2::get_outs(std::vector<std::vector<tools::wallet2::get_outs_entry>>
            if (out < num_outs)
            {
              MINFO("Using it");
-              req.outputs.push_back({amount, out});
+              add_output_to_lists({amount, out});
              ++num_found;
              seen_indices.emplace(out);
              if (out == td.m_global_output_index)
@ -9065,12 +9085,12 @@ void wallet2::get_outs(std::vector<std::vector<tools::wallet2::get_outs_entry>>
      if (num_outs <= requested_outputs_count)
      {
        for (uint64_t i = 0; i < num_outs; i++)
-          req.outputs.push_back({amount, i});
+          add_output_to_lists({amount, i});
        // duplicate to make up shortfall: this will be caught after the RPC call,
        // so we can also output the amounts for which we can't reach the required
        // mixin after checking the actual unlockedness
        for (uint64_t i = num_outs; i < requested_outputs_count; ++i)
-          req.outputs.push_back({amount, num_outs - 1});
+          add_output_to_lists({amount, num_outs - 1});
      }
      else
      {
@ -9079,7 +9099,7 @@ void wallet2::get_outs(std::vector<std::vector<tools::wallet2::get_outs_entry>>
        {
          num_found = 1;
          seen_indices.emplace(td.m_global_output_index);
-          req.outputs.push_back({amount, td.m_global_output_index});
+          add_output_to_lists({amount, td.m_global_output_index});
          LOG_PRINT_L1("Selecting real output: " << td.m_global_output_index << " for " << print_money(amount));
        }

@ -9187,7 +9207,7 @@ void wallet2::get_outs(std::vector<std::vector<tools::wallet2::get_outs_entry>>
          seen_indices.emplace(i);

          picks[type].insert(i);
-          req.outputs.push_back({amount, i});
+          add_output_to_lists({amount, i});
          ++num_found;
          MDEBUG("picked " << i << ", " << num_found << " now picked");
        }
@ -9201,7 +9221,7 @@ void wallet2::get_outs(std::vector<std::vector<tools::wallet2::get_outs_entry>>
        // we'll error out later
        while (num_found < requested_outputs_count)
        {
-          req.outputs.push_back({amount, 0});
+          add_output_to_lists({amount, 0});
          ++num_found;
        }
      }
@ -9211,6 +9231,10 @@ void wallet2::get_outs(std::vector<std::vector<tools::wallet2::get_outs_entry>>
          [](const get_outputs_out &a, const get_outputs_out &b) { return a.index < b.index; });
    }

+    THROW_WALLET_EXCEPTION_IF(req.outputs.size() != secret_picking_order.size(), error::wallet_internal_error,
+        "bug: we did not update req.outputs/secret_picking_order in tandem");
+
+    // List all requested outputs to debug log
    if (ELPP->vRegistry()->allowed(el::Level::Debug, MONERO_DEFAULT_LOG_CATEGORY))
    {
      std::map<uint64_t, std::set<uint64_t>> outs;
@ -9331,18 +9355,21 @@ void wallet2::get_outs(std::vector<std::vector<tools::wallet2::get_outs_entry>>
        }
      }

-      // then pick others in random order till we reach the required number
-      // since we use an equiprobable pick here, we don't upset the triangular distribution
-      std::vector<size_t> order;
-      order.resize(requested_outputs_count);
-      for (size_t n = 0; n < order.size(); ++n)
-        order[n] = n;
-      std::shuffle(order.begin(), order.end(), crypto::random_device{});
-
+      // While we are still lacking outputs in this result ring, in our secret pick order...
      LOG_PRINT_L2("Looking for " << (fake_outputs_count+1) << " outputs of size " << print_money(td.is_rct() ? 0 : td.amount()));
-      for (size_t o = 0; o < requested_outputs_count && outs.back().size() < fake_outputs_count + 1; ++o)
+      for (size_t ring_pick_idx = base; ring_pick_idx < base + requested_outputs_count && outs.back().size() < fake_outputs_count + 1; ++ring_pick_idx)
      {
-        size_t i = base + order[o];
+        const get_outputs_out attempted_output = secret_picking_order[ring_pick_idx];
+
+        // Find the index i of our pick in the request/response arrays
+        size_t i;
+        for (i = base; i < base + requested_outputs_count; ++i)
+          if (req.outputs[i].index == attempted_output.index)
+            break;
+        THROW_WALLET_EXCEPTION_IF(i == base + requested_outputs_count, error::wallet_internal_error,
+          "Could not find index of picked output in requested outputs");
+
+        // Try adding this output's information to result ring if output isn't invalid
        LOG_PRINT_L2("Index " << i << "/" << requested_outputs_count << ": idx " << req.outputs[i].index << " (real " << td.m_global_output_index << "), unlocked " << daemon_resp.outs[i].unlocked << ", key " << daemon_resp.outs[i].key);
        tx_add_fake_output(outs, req.outputs[i].index, daemon_resp.outs[i].key, daemon_resp.outs[i].mask, td.m_global_output_index, daemon_resp.outs[i].unlocked, valid_public_keys_cache);
      }