feat: conventional lookups using log-deriv (#7020)
Replaces the plookup-based lookup argument in UltraHonk/MegaHonk with a
log-derivative-based argument. This is advantageous in a number of ways,
including 1) no more committing to the expensive z_lookup poly (especially
relevant for structured trace), and 2) prover work is proportional to the
number of lookups, not the size of the circuit.

Below are the benchmark results (master first, then this branch). Most of
the benefit comes from no longer needing to commit to `z_lookup`, which
was previously the most expensive single commitment. The Lookup relation
also goes from 45% of total accumulate work to only about 10%.

```
--------------------------------------------------------------------------------
Benchmark                      Time             CPU   Iterations UserCounters...
--------------------------------------------------------------------------------
ClientIVCBench/Full/6      16151 ms        11247 ms            1 
function                                  ms     % sum
construct_circuits(t)                   3241    20.23%
ProverInstance(Circuit&)(t)             1575     9.83%
ProtogalaxyProver::fold_instances(t)    7627    47.61%
Decider::construct_proof(t)              503     3.14%
ECCVMProver(CircuitBuilder&)(t)          173     1.08%
ECCVMProver::construct_proof(t)         1904    11.88%
TranslatorProver::construct_proof(t)     819     5.11%
Goblin::merge(t)                         177     1.11%

Total time accounted for: 16019ms/16151ms = 99.18%

Major contributors:
function                                  ms    % sum
commit(t)                               4114   25.68%
compute_combiner(t)                     2824   17.63%
compute_perturbator(t)                   795    4.96%
compute_univariate(t)                   1149    7.17%

Breakdown of ProtogalaxyProver::fold_instances:
ProtoGalaxyProver_::preparation_round(t)           3721    48.79%
ProtoGalaxyProver_::perturbator_round(t)            795    10.43%
ProtoGalaxyProver_::combiner_quotient_round(t)     2826    37.05%
ProtoGalaxyProver_::accumulator_update_round(t)     284     3.73%

Relation contributions (times to be interpreted relatively):
function                        ms     % sum
Arithmetic::accumulate(t)     1490     5.78%
Permutation::accumulate(t)    6746    26.18%
Lookup::accumulate(t)        11679    45.33%
DeltaRange::accumulate(t)      977     3.79%
Elliptic::accumulate(t)        635     2.46%
Auxiliary::accumulate(t)      3812    14.79%
EccOp::accumulate(t)             0     0.00%
DatabusRead::accumulate(t)       1     0.01%
PoseidonExt::accumulate(t)     110     0.43%
PoseidonInt::accumulate(t)     316     1.23%
```

```
--------------------------------------------------------------------------------
Benchmark                      Time             CPU   Iterations UserCounters...
--------------------------------------------------------------------------------
ClientIVCBench/Full/6      15345 ms        10707 ms            1 
function                                  ms     % sum
construct_circuits(t)                   3183    21.01%
ProverInstance(Circuit&)(t)             1697    11.20%
ProtogalaxyProver::fold_instances(t)    6754    44.57%
Decider::construct_proof(t)              467     3.08%
ECCVMProver(CircuitBuilder&)(t)          176     1.16%
ECCVMProver::construct_proof(t)         1910    12.60%
TranslatorProver::construct_proof(t)     815     5.38%
Goblin::merge(t)                         150     0.99%

Total time accounted for: 15153ms/15345ms = 98.74%

Major contributors:
function                                  ms    % sum
commit(t)                               3379   22.30%
compute_combiner(t)                     2200   14.52%
compute_perturbator(t)                   739    4.87%
compute_univariate(t)                   1118    7.38%

Breakdown of ProtogalaxyProver::fold_instances:
ProtoGalaxyProver_::preparation_round(t)           3529    52.24%
ProtoGalaxyProver_::perturbator_round(t)            739    10.94%
ProtoGalaxyProver_::combiner_quotient_round(t)     2202    32.60%
ProtoGalaxyProver_::accumulator_update_round(t)     285     4.22%

Relation contributions (times to be interpreted relatively):
function                        ms     % sum
Arithmetic::accumulate(t)     1474     9.69%
Permutation::accumulate(t)    6515    42.84%
Lookup::accumulate(t)         1622    10.67%
DeltaRange::accumulate(t)      952     6.26%
Elliptic::accumulate(t)        605     3.98%
Auxiliary::accumulate(t)      3616    23.78%
EccOp::accumulate(t)             0     0.00%
DatabusRead::accumulate(t)       1     0.01%
PoseidonExt::accumulate(t)     112     0.74%
PoseidonInt::accumulate(t)     309     2.03%
```
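
To compare the two listings directly, the relative savings can be computed from the figures they report. The helper below is a hypothetical convenience, not part of the commit; every number passed to it is copied from the listings above. Keep in mind that the relation timings are marked as "to be interpreted relatively", so their absolute values should not be compared against the wall-clock total.

```cpp
#include <cassert>
#include <cstdio>

// Percentage of `before_ms` saved when the time drops to `after_ms`.
double percent_saved(double before_ms, double after_ms)
{
    assert(before_ms > 0.0);
    return 100.0 * (before_ms - after_ms) / before_ms;
}

// Print one comparison row (master time, branch time, relative saving).
void report(const char* label, double before_ms, double after_ms)
{
    std::printf("%-26s %7.0f -> %7.0f ms (%.1f%% saved)\n",
                label,
                before_ms,
                after_ms,
                percent_saved(before_ms, after_ms));
}

// All figures are copied from the two benchmark listings (master, then branch).
void print_benchmark_deltas()
{
    report("ClientIVCBench/Full/6", 16151, 15345);
    report("commit(t)", 4114, 3379);
    report("fold_instances(t)", 7627, 6754);
    report("Lookup::accumulate(t)", 11679, 1622);
}
```

Calling `print_benchmark_deltas()` shows an overall saving of roughly 5%, with the commitment time and the Lookup relation accounting for the largest individual reductions.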
ledwards2225 authored Jun 25, 2024
1 parent 9aabc32 commit 6f1212f
Showing 38 changed files with 960 additions and 924 deletions.
@@ -3,7 +3,7 @@
# BIN: to specify a different binary to test with (e.g. bb.js or bb.js-dev).
set -eu

-BIN=${BIN:-../cpp/build-debug/bin/bb}
+BIN=${BIN:-../cpp/build/bin/bb}
CRS_PATH=~/.bb-crs
BRANCH=master
VERBOSE=${VERBOSE:-}
@@ -65,7 +65,7 @@ BENCHMARK(execute_relation_for_pg_univariates<UltraFlavor, UltraArithmeticRelati
BENCHMARK(execute_relation_for_pg_univariates<UltraFlavor, DeltaRangeConstraintRelation<Fr>>);
BENCHMARK(execute_relation_for_pg_univariates<UltraFlavor, EllipticRelation<Fr>>);
BENCHMARK(execute_relation_for_pg_univariates<UltraFlavor, AuxiliaryRelation<Fr>>);
-BENCHMARK(execute_relation_for_pg_univariates<UltraFlavor, LookupRelation<Fr>>);
+BENCHMARK(execute_relation_for_pg_univariates<UltraFlavor, LogDerivLookupRelation<Fr>>);
BENCHMARK(execute_relation_for_pg_univariates<UltraFlavor, UltraPermutationRelation<Fr>>);

// Goblin-Ultra only relations (PG prover combiner work)
@@ -79,7 +79,7 @@ BENCHMARK(execute_relation_for_univariates<UltraFlavor, UltraArithmeticRelation<
BENCHMARK(execute_relation_for_univariates<UltraFlavor, DeltaRangeConstraintRelation<Fr>>);
BENCHMARK(execute_relation_for_univariates<UltraFlavor, EllipticRelation<Fr>>);
BENCHMARK(execute_relation_for_univariates<UltraFlavor, AuxiliaryRelation<Fr>>);
-BENCHMARK(execute_relation_for_univariates<UltraFlavor, LookupRelation<Fr>>);
+BENCHMARK(execute_relation_for_univariates<UltraFlavor, LogDerivLookupRelation<Fr>>);
BENCHMARK(execute_relation_for_univariates<UltraFlavor, UltraPermutationRelation<Fr>>);

// Goblin-Ultra only relations (Sumcheck prover work)
@@ -93,7 +93,7 @@ BENCHMARK(execute_relation_for_values<UltraFlavor, UltraArithmeticRelation<Fr>>)
BENCHMARK(execute_relation_for_values<UltraFlavor, DeltaRangeConstraintRelation<Fr>>);
BENCHMARK(execute_relation_for_values<UltraFlavor, EllipticRelation<Fr>>);
BENCHMARK(execute_relation_for_values<UltraFlavor, AuxiliaryRelation<Fr>>);
-BENCHMARK(execute_relation_for_values<UltraFlavor, LookupRelation<Fr>>);
+BENCHMARK(execute_relation_for_values<UltraFlavor, LogDerivLookupRelation<Fr>>);
BENCHMARK(execute_relation_for_values<UltraFlavor, UltraPermutationRelation<Fr>>);

// Goblin-Ultra only relations (verifier work)
@@ -1,6 +1,7 @@
#include "barretenberg/stdlib_circuit_builders/ultra_circuit_builder.hpp"
#include "barretenberg/circuit_checker/circuit_checker.hpp"
#include "barretenberg/crypto/pedersen_commitment/pedersen.hpp"
#include "barretenberg/stdlib_circuit_builders/mock_circuits.hpp"
#include "barretenberg/stdlib_circuit_builders/plookup_tables/fixed_base/fixed_base.hpp"

#include <gtest/gtest.h>
@@ -104,6 +105,22 @@ TEST(ultra_circuit_constructor, create_gates_from_plookup_accumulators)
EXPECT_EQ(result, true);
}

TEST(ultra_circuit_constructor, bad_lookup_failure)
{
UltraCircuitBuilder builder;
MockCircuits::add_lookup_gates(builder);

// Erroneously set a non-zero wire value to zero in one of the lookup gates
for (auto& wire_3_witness_idx : builder.blocks.lookup.w_o()) {
if (wire_3_witness_idx != builder.zero_idx) {
wire_3_witness_idx = builder.zero_idx;
break;
}
}

EXPECT_FALSE(CircuitChecker::check(builder));
}

TEST(ultra_circuit_constructor, base_case)
{
UltraCircuitBuilder circuit_constructor = UltraCircuitBuilder();
1 change: 0 additions & 1 deletion barretenberg/cpp/src/barretenberg/eccvm/eccvm_prover.cpp
@@ -7,7 +7,6 @@
#include "barretenberg/honk/proof_system/permutation_library.hpp"
#include "barretenberg/plonk_honk_shared/library/grand_product_library.hpp"
#include "barretenberg/polynomials/polynomial.hpp"
-#include "barretenberg/relations/lookup_relation.hpp"
#include "barretenberg/relations/permutation_relation.hpp"
#include "barretenberg/sumcheck/sumcheck.hpp"

75 changes: 75 additions & 0 deletions barretenberg/cpp/src/barretenberg/plonk/composer/composer_lib.hpp
@@ -48,4 +48,79 @@ std::shared_ptr<plonk::verification_key> compute_verification_key_common(
// silencing for now but need to figure out where to extract type of VerifierCrs from :-/
std::shared_ptr<bb::srs::factories::VerifierCrs<curve::BN254>> const& vrs);

/**
* @brief Construct polynomials containing the sorted concatenation of the lookups and the lookup tables
*
* @tparam Flavor
* @param circuit
* @param dyadic_circuit_size
* @param additional_offset Additional space needed in polynomials to add randomness for zk (Plonk only)
* @return std::array<typename Flavor::Polynomial, 4>
*/
template <typename Flavor>
std::array<typename Flavor::Polynomial, 4> construct_sorted_list_polynomials(typename Flavor::CircuitBuilder& circuit,
const size_t dyadic_circuit_size,
size_t additional_offset = 0)
{
using Polynomial = typename Flavor::Polynomial;
std::array<Polynomial, 4> sorted_polynomials;
// Initialise the sorted concatenated list polynomials for the lookup argument
for (auto& s_i : sorted_polynomials) {
s_i = Polynomial(dyadic_circuit_size);
}

// The sorted list polynomials have (tables_size + lookups_size) populated entries. We define the index below so
// that these entries are written into the last indices of the polynomials. The values on the first
// dyadic_circuit_size - (tables_size + lookups_size) indices are automatically initialized to zero via the
// polynomial constructor.
size_t s_index = dyadic_circuit_size - (circuit.get_tables_size() + circuit.get_lookups_size()) - additional_offset;
ASSERT(s_index > 0); // We need at least 1 row of zeroes for the permutation argument

for (auto& table : circuit.lookup_tables) {
const fr table_index(table.table_index);
auto& lookup_gates = table.lookup_gates;
for (size_t i = 0; i < table.size(); ++i) {
if (table.use_twin_keys) {
lookup_gates.push_back({
{
table.column_1[i].from_montgomery_form().data[0],
table.column_2[i].from_montgomery_form().data[0],
},
{
table.column_3[i],
0,
},
});
} else {
lookup_gates.push_back({
{
table.column_1[i].from_montgomery_form().data[0],
0,
},
{
table.column_2[i],
table.column_3[i],
},
});
}
}

#ifdef NO_TBB
std::sort(lookup_gates.begin(), lookup_gates.end());
#else
std::sort(std::execution::par_unseq, lookup_gates.begin(), lookup_gates.end());
#endif

for (const auto& entry : lookup_gates) {
const auto components = entry.to_table_components(table.use_twin_keys);
sorted_polynomials[0][s_index] = components[0];
sorted_polynomials[1][s_index] = components[1];
sorted_polynomials[2][s_index] = components[2];
sorted_polynomials[3][s_index] = table_index;
++s_index;
}
}
return sorted_polynomials;
}

} // namespace bb::plonk
@@ -1 +1 @@
-barretenberg_module(plonk_honk_shared polynomials)
+barretenberg_module(plonk_honk_shared polynomials ultra_honk)
@@ -2,6 +2,7 @@
#include "barretenberg/common/ref_array.hpp"
#include "barretenberg/flavor/flavor.hpp"
#include "barretenberg/polynomials/polynomial_store.hpp"
#include "barretenberg/stdlib_circuit_builders/plookup_tables/types.hpp"

#include <memory>

@@ -20,6 +21,7 @@ void construct_lookup_table_polynomials(RefArray<typename Flavor::Polynomial, 4>
// ^^^^^^^^^ ^^^^^^^^ ^^^^^^^ ^nonzero to ensure uniqueness and to avoid infinity commitments
// | table randomness
// ignored, as used for regular constraints and padding to the next power of 2.
// TODO(https://github.com/AztecProtocol/barretenberg/issues/1033): construct tables and counts at top of trace
ASSERT(dyadic_circuit_size > circuit.get_tables_size() + additional_offset);
size_t offset = dyadic_circuit_size - circuit.get_tables_size() - additional_offset;

@@ -37,78 +39,40 @@ void construct_lookup_table_polynomials(RefArray<typename Flavor::Polynomial, 4>
}

/**
* @brief Construct polynomials containing the sorted concatenation of the lookups and the lookup tables
*
* @tparam Flavor
* @param circuit
* @param dyadic_circuit_size
* @param additional_offset Additional space needed in polynomials to add randomness for zk (Plonk only)
* @return std::array<typename Flavor::Polynomial, 4>
* @brief Construct polynomial whose value at index i is the number of times the table entry at that index has been
* read.
* @details Read counts are needed for the log derivative lookup argument. The table polynomials are constructed as a
* concatenation of basic 3-column tables. Similarly, the read counts polynomial is constructed as the concatenation of
* read counts for the individual tables.
*/
template <typename Flavor>
std::array<typename Flavor::Polynomial, 4> construct_sorted_list_polynomials(typename Flavor::CircuitBuilder& circuit,
const size_t dyadic_circuit_size,
size_t additional_offset = 0)
void construct_lookup_read_counts(typename Flavor::Polynomial& read_counts,
typename Flavor::Polynomial& read_tags,
typename Flavor::CircuitBuilder& circuit,
size_t dyadic_circuit_size)
{
using Polynomial = typename Flavor::Polynomial;
std::array<Polynomial, 4> sorted_polynomials;
// Initialise the sorted concatenated list polynomials for the lookup argument
for (auto& s_i : sorted_polynomials) {
s_i = Polynomial(dyadic_circuit_size);
}

// The sorted list polynomials have (tables_size + lookups_size) populated entries. We define the index below so
// that these entries are written into the last indices of the polynomials. The values on the first
// dyadic_circuit_size - (tables_size + lookups_size) indices are automatically initialized to zero via the
// polynomial constructor.
size_t s_index = dyadic_circuit_size - (circuit.get_tables_size() + circuit.get_lookups_size()) - additional_offset;
ASSERT(s_index > 0); // We need at least 1 row of zeroes for the permutation argument
// TODO(https://github.com/AztecProtocol/barretenberg/issues/1033): construct tables and counts at top of trace
size_t offset = dyadic_circuit_size - circuit.get_tables_size();

size_t table_offset = offset; // offset of the present table in the table polynomials
// loop over all tables used in the circuit; each table contains data about the lookups made on it
for (auto& table : circuit.lookup_tables) {
const fr table_index(table.table_index);
auto& lookup_gates = table.lookup_gates;
for (size_t i = 0; i < table.size(); ++i) {
if (table.use_twin_keys) {
lookup_gates.push_back({
{
table.column_1[i].from_montgomery_form().data[0],
table.column_2[i].from_montgomery_form().data[0],
},
{
table.column_3[i],
0,
},
});
} else {
lookup_gates.push_back({
{
table.column_1[i].from_montgomery_form().data[0],
0,
},
{
table.column_2[i],
table.column_3[i],
},
});
}
}
table.initialize_index_map();

for (auto& gate_data : table.lookup_gates) {
// convert lookup gate data to an array of three field elements, one for each of the 3 columns
auto table_entry = gate_data.to_table_components(table.use_twin_keys);

#ifdef NO_TBB
std::sort(lookup_gates.begin(), lookup_gates.end());
#else
std::sort(std::execution::par_unseq, lookup_gates.begin(), lookup_gates.end());
#endif
// find the index of the entry in the table
auto index_in_table = table.index_map[table_entry];

for (const auto& entry : lookup_gates) {
const auto components = entry.to_sorted_list_components(table.use_twin_keys);
sorted_polynomials[0][s_index] = components[0];
sorted_polynomials[1][s_index] = components[1];
sorted_polynomials[2][s_index] = components[2];
sorted_polynomials[3][s_index] = table_index;
++s_index;
// increment the read count at the corresponding index in the full polynomial
size_t index_in_poly = table_offset + index_in_table;
read_counts[index_in_poly]++;
read_tags[index_in_poly] = 1; // tag is 1 if entry has been read 1 or more times
}
table_offset += table.size(); // set the offset of the next table within the polynomials
}
return sorted_polynomials;
}

} // namespace bb
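
The read-count construction in the hunk above can be illustrated with a small standalone sketch. It mirrors the same steps (place the tables at the end of the polynomial, map each looked-up entry to its row, increment the count, set the tag), but `ToyTable`, `ReadData`, and `toy_read_counts` are hypothetical stand-ins that use plain integers and `std::map` rather than the barretenberg field, polynomial, and table types.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// One row of a basic 3-column table (plain integers stand in for field elements).
using Entry = std::array<uint64_t, 3>;

// Hypothetical stand-in for a lookup table and the entries read from it.
struct ToyTable {
    std::vector<Entry> rows;
    std::vector<Entry> lookups;
};

struct ReadData {
    std::vector<uint64_t> counts; // number of reads of the table row stored at each index
    std::vector<uint64_t> tags;   // 1 if the row was read at least once, else 0
};

ReadData toy_read_counts(const std::vector<ToyTable>& tables, size_t poly_size)
{
    size_t tables_size = 0;
    for (const auto& table : tables) {
        tables_size += table.rows.size();
    }
    assert(poly_size >= tables_size);

    ReadData out{ std::vector<uint64_t>(poly_size, 0), std::vector<uint64_t>(poly_size, 0) };

    // Tables occupy the last `tables_size` indices, as in the code above.
    size_t table_offset = poly_size - tables_size;
    for (const auto& table : tables) {
        // Map each table row to its position within this table.
        std::map<Entry, size_t> index_map;
        for (size_t i = 0; i < table.rows.size(); ++i) {
            index_map[table.rows[i]] = i;
        }
        for (const auto& entry : table.lookups) {
            // .at() throws if a lookup does not correspond to any table row.
            size_t index_in_poly = table_offset + index_map.at(entry);
            out.counts[index_in_poly]++;
            out.tags[index_in_poly] = 1;
        }
        table_offset += table.rows.size(); // the next table starts right after this one
    }
    return out;
}
```

For a 2-bit XOR table with rows `{0,0,0}, {0,1,1}, {1,0,1}, {1,1,0}`, two reads of `{0,1,1}`, one read of `{1,1,0}`, and a polynomial of size 8, the counts at indices 5 and 7 become 2 and 1 and all other counts and tags stay zero.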
@@ -1,21 +1,78 @@
#include "barretenberg/plonk_honk_shared/composer/composer_lib.hpp"
#include "barretenberg/common/slab_allocator.hpp"
#include "barretenberg/plonk_honk_shared/types/circuit_type.hpp"
#include "barretenberg/srs/factories/crs_factory.hpp"
#include "barretenberg/stdlib_circuit_builders/ultra_circuit_builder.hpp"
#include "barretenberg/stdlib_circuit_builders/ultra_flavor.hpp"

#include <array>
#include <gtest/gtest.h>

using namespace bb;

class ComposerLibTests : public ::testing::Test {
public:
using Flavor = UltraFlavor;
using FF = typename Flavor::FF;

protected:
static void SetUpTestSuite() { bb::srs::init_crs_factory("../srs_db/ignition"); }
};

/**
* @brief A test to demonstrate that lookup read counts/tags are computed correctly for a simple 'hand-computable' case
* using the uint32 XOR table
*
*/
TEST_F(ComposerLibTests, LookupReadCounts)
{
using Builder = UltraCircuitBuilder;
using Flavor = UltraFlavor;
using FF = typename Flavor::FF;
Flavor::CircuitBuilder circuit_constructor;
Flavor::ProvingKey proving_key = []() {
auto crs_factory = srs::factories::CrsFactory<bb::curve::BN254>();
auto crs = crs_factory.get_prover_crs(4);
return Flavor::ProvingKey(/*circuit_size=*/8, /*num_public_inputs=*/0);
}();
};
using Polynomial = typename Flavor::Polynomial;
auto UINT32_XOR = plookup::MultiTableId::UINT32_XOR;

Builder builder;

// define some very simple inputs to XOR
FF left{ 1 };
FF right{ 5 };

auto left_idx = builder.add_variable(left);
auto right_idx = builder.add_variable(right);

// create a single lookup from the uint32 XOR table
auto accumulators = plookup::get_lookup_accumulators(UINT32_XOR, left, right, /*is_2_to_1_lookup*/ true);
builder.create_gates_from_plookup_accumulators(UINT32_XOR, accumulators, left_idx, right_idx);

EXPECT_EQ(builder.lookup_tables.size(), 1); // we only used a single table
EXPECT_EQ(builder.lookup_tables[0].size(), 4096); // table has size 64*64 (6 bit operands)

size_t circuit_size = 8192;

Polynomial read_counts{ circuit_size };
Polynomial read_tags{ circuit_size };

construct_lookup_read_counts<Flavor>(read_counts, read_tags, builder, circuit_size);

// The table polys are constructed at the bottom of the trace, and thus so too are the counts/tags
// TODO(https://github.com/AztecProtocol/barretenberg/issues/1033): construct tables and counts at top of trace
size_t offset = circuit_size - builder.get_tables_size();

// The uint32 XOR lookup table is constructed for 6 bit operands via double for loop that iterates through the left
// operand externally (0 to 63) then the right operand internally (0 to 63). Computing (1 XOR 5) will thus result in
// 1 lookup from the (1*64 + 5)th index in the table and 5 lookups from the (0*64 + 0)th index (for the remaining 5
// limbs that are all 0). The counts and tags at all other indices should be zero.
size_t idx = 0;
for (auto [count, tag] : zip_view(read_counts, read_tags)) {
if (idx == (0 + offset)) {
EXPECT_EQ(count, 5);
EXPECT_EQ(tag, 1);
} else if (idx == (69 + offset)) {
EXPECT_EQ(count, 1);
EXPECT_EQ(tag, 1);
} else {
EXPECT_EQ(count, 0);
EXPECT_EQ(tag, 0);
}
idx++;
}
}
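
The expected indices in this test (69 once, 0 five times) follow from slicing each 32-bit operand into base-64 limbs and indexing the 64*64 table as `left_limb * 64 + right_limb`. The sketch below reproduces that arithmetic. The function name and the assumption of six limbs per operand are inferred from the test comment rather than taken from the plookup table code, so treat it as an illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint64_t LIMB_BITS = 6;
constexpr uint64_t LIMB_BASE = 1ULL << LIMB_BITS; // 64 values per 6-bit limb
constexpr size_t NUM_LIMBS = 6;                   // assumed: ceil(32 / 6) limbs per operand

// Table row read for each limb pair, assuming the table is filled by an outer
// loop over the left operand and an inner loop over the right operand.
std::vector<size_t> xor_table_indices(uint32_t left, uint32_t right)
{
    std::vector<size_t> indices;
    for (size_t i = 0; i < NUM_LIMBS; ++i) {
        // Extract the i-th 6-bit limb of each operand (least significant first).
        uint64_t left_limb = (static_cast<uint64_t>(left) >> (i * LIMB_BITS)) & (LIMB_BASE - 1);
        uint64_t right_limb = (static_cast<uint64_t>(right) >> (i * LIMB_BITS)) & (LIMB_BASE - 1);
        indices.push_back(static_cast<size_t>(left_limb * LIMB_BASE + right_limb));
    }
    return indices;
}
```

For `left = 1` and `right = 5`, the first limb pair is (1, 5), giving index 1*64 + 5 = 69, and the remaining five limb pairs are (0, 0), giving index 0 each. This matches the counts asserted in the test.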