Skip to content

Commit

Permalink
fix: reduce calls to MinNamespace & MaxNamespace (#285)
Browse files Browse the repository at this point in the history
In this issue #216, it was
raised that
1. More time was spent allocating than hashing in the HashLeaf function
2. The min and max namespace functions were taking up a large amount of
RAM

---

## More time was spent allocating than hashing in the HashLeaf function

The benchmark test described in the issue listed above no longer exists,
and when I run the `BenchmarkComputeRoot` in the nmt package
[HERE](https://github.com/bruscar010/nmt/blob/main/nmt_test.go#L696) I
can't replicate memory allocation taking longer than the hashing,
although this may be related to my test parameters.

Below are FlameGraphs of CPU/Mem usage (I've updated the Benchmark to
have 20k leaf nodes with 512 bytes transaction size)

[Base Mem
Flamegraph](https://flamegraph.com/share/6a04ffab-c866-11ef-9832-26c3e5347170)
[Base CPU
Flamegraph](https://flamegraph.com/share/efce1d16-c868-11ef-b027-3a58f36d25ce)

```
BenchmarkComputeRoot/64-leaves-12         	     100	    141785 ns/op	   65166 B/op	    1535 allocs/op
BenchmarkComputeRoot/128-leaves-12        	     100	    280652 ns/op	  123894 B/op	    3074 allocs/op
BenchmarkComputeRoot/256-leaves-12        	     100	    501392 ns/op	  254611 B/op	    6154 allocs/op
```

I've instead focused on the second issue

---

## The min and max namespace functions were taking up a large amount of
RAM

### Solution 1: Reduce the number of calls to MinNamespace &
MaxNamespace

The HashNode function made 4 duplicate function calls to
MaxNamespace/MinNamespace for the same node.
```
1. HashNode -> ValidateNodes -> ValidateNodeFormat.Min/MaxNamespace
2. HashNode -> ValidateNodes -> validateSiblingsNamespaceOrder -> ValidateNodeFormat(left|right).Min/MaxNamespace
3. HashNode -> ValidateNodes -> validateSiblingsNamespaceOrder.Min/MaxNamespace
4. HashNode.Min/MaxNamespace
```

After reducing to a single call, here is the new benchmark output
(running `go test -run="none" -bench=. -benchtime=100x -benchmem`)
```
BenchmarkComputeRoot/64-leaves-12         	     100	    137292 ns/op	   60100 B/op	     905 allocs/op
BenchmarkComputeRoot/128-leaves-12        	     100	    237664 ns/op	  113758 B/op	    1804 allocs/op
BenchmarkComputeRoot/256-leaves-12        	     100	    462802 ns/op	  234241 B/op	    3604 allocs/op
```

### Solution 2: Update the MinNamespace/MaxNamespace to return a slice
of the existing array rather than creating a new underlying array

Copying the slice stops the possibility [array capacity memory
leaks](https://itnext.io/mastering-memory-management-in-go-avoiding-slice-related-leaks-7d5e97ee48d4)
if for some reason the namespace ID variable's lifetime is longer than
the lifetime of the original slice, but I don't think that is the case.
[Slicing with a max capacity also stops the potential for appends to the
namespace ID slice editing the
underlying](https://build-your-own.org/blog/20230316_go_full_slice/)
`ns|ns|hash` array.

Benchmark output after this change
```
BenchmarkComputeRoot/64-leaves-12         	     100	    136436 ns/op	   58180 B/op	     653 allocs/op
BenchmarkComputeRoot/128-leaves-12        	     100	    241836 ns/op	  109747 B/op	    1297 allocs/op
BenchmarkComputeRoot/256-leaves-12        	     100	    470231 ns/op	  225924 B/op	    2583 allocs/op
```

[New Mem
Flamegraph](https://flamegraph.com/share/71b8ef18-c86a-11ef-b027-3a58f36d25ce)
[New CPU
Flamegraph](https://flamegraph.com/share/7f46d1b4-c86a-11ef-9832-26c3e5347170)

---

The improvements made by the first solution are made redundant by the
second, but I was initially reluctant to replace the slice copy in case
it would lead to a memory leak. As I got more familiar with the package,
I became more confident that a slicing the array may be more
appropriate.

**Note**
- Given that validation is done in the `Push` method, adding an
`UnsafeHashNode` that skips validation was also a method I considered,
but I felt that this could lead to more complications & would not make
any notable differences compared to either of the other solutions.
- After reworking the `ValidateSiblings` function, I also removed the
associated tests, but it looks like the same test cases also exist in
`TestValidateNodes`
[HERE](https://github.com/bruscar010/nmt/blob/main/hasher_test.go#L684)

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

- **New Features**
- Enhanced namespace validation methods with improved error reporting
and context.
- Introduced `nsIDRange` struct for better namespace ID range
management.

- **Refactor**
- Streamlined namespace-related functions in `MinNamespace` and
`MaxNamespace`.
- Improved code readability and maintainability in namespace merkle tree
hasher.

- **Tests**
	- Added a benchmark for `ComputeRoot` with 20,000 leaves.
	- Removed existing sibling node validation test.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: francis <28549164+FrancisLennon17@users.noreply.github.com>
  • Loading branch information
bruscar010 and bruscar010 authored Jan 17, 2025
1 parent 2fb6332 commit 5b1b3d8
Show file tree
Hide file tree
Showing 4 changed files with 53 additions and 90 deletions.
86 changes: 50 additions & 36 deletions hasher.go
Original file line number Diff line number Diff line change
Expand Up @@ -154,9 +154,8 @@ func (n *NmtHasher) BlockSize() int {

func (n *NmtHasher) EmptyRoot() []byte {
n.baseHasher.Reset()
emptyNs := bytes.Repeat([]byte{0}, int(n.NamespaceLen))
h := n.baseHasher.Sum(nil)
digest := append(append(emptyNs, emptyNs...), h...)
digest := append(make([]byte, int(n.NamespaceLen)*2), h...)

return digest
}
Expand Down Expand Up @@ -212,56 +211,73 @@ func (n *NmtHasher) MustHashLeaf(ndata []byte) []byte {
return res
}

// ValidateNodeFormat checks whether the supplied node conforms to the
// namespaced hash format and returns ErrInvalidNodeLen if not.
func (n *NmtHasher) ValidateNodeFormat(node []byte) (err error) {
// nsIDRange represents the range of namespace IDs with minimum and maximum values.
type nsIDRange struct {
Min, Max namespace.ID
}

// tryFetchNodeNSRange attempts to return the min and max namespace ids.
// It will return an ErrInvalidNodeLen | ErrInvalidNodeNamespaceOrder
// if the supplied node does not conform to the namespaced hash format.
func (n *NmtHasher) tryFetchNodeNSRange(node []byte) (nsIDRange, error) {
expectedNodeLen := n.Size()
nodeLen := len(node)
if nodeLen != expectedNodeLen {
return fmt.Errorf("%w: got: %v, want %v", ErrInvalidNodeLen, nodeLen, expectedNodeLen)
return nsIDRange{}, fmt.Errorf("%w: got: %v, want %v", ErrInvalidNodeLen, nodeLen, expectedNodeLen)
}
// check the namespace order
minNID := namespace.ID(MinNamespace(node, n.NamespaceSize()))
maxNID := namespace.ID(MaxNamespace(node, n.NamespaceSize()))
if maxNID.Less(minNID) {
return fmt.Errorf("%w: max namespace ID %d is less than min namespace ID %d ", ErrInvalidNodeNamespaceOrder, maxNID, minNID)
return nsIDRange{}, fmt.Errorf("%w: max namespace ID %d is less than min namespace ID %d ", ErrInvalidNodeNamespaceOrder, maxNID, minNID)
}
return nil
return nsIDRange{Min: minNID, Max: maxNID}, nil
}

// validateSiblingsNamespaceOrder checks whether left and right as two sibling
// nodes in an NMT have correct namespace IDs relative to each other, more
// specifically, the maximum namespace ID of the left sibling should not exceed
// the minimum namespace ID of the right sibling. It returns ErrUnorderedSiblings error if the check fails.
func (n *NmtHasher) validateSiblingsNamespaceOrder(left, right []byte) (err error) {
if err := n.ValidateNodeFormat(left); err != nil {
return fmt.Errorf("%w: left node does not match the namesapce hash format", err)
// ValidateNodeFormat checks whether the supplied node conforms to the
// namespaced hash format and returns an error if not.
func (n *NmtHasher) ValidateNodeFormat(node []byte) error {
_, err := n.tryFetchNodeNSRange(node)
return err
}

// tryFetchLeftAndRightNSRange attempts to return the min/max namespace ids of both
// the left and right nodes. It verifies whether left
// and right comply by the namespace hash format, and are correctly ordered
// according to their namespace IDs.
func (n *NmtHasher) tryFetchLeftAndRightNSRanges(left, right []byte) (
nsIDRange,
nsIDRange,
error,
) {
var lNsRange nsIDRange
var rNsRange nsIDRange
var err error

lNsRange, err = n.tryFetchNodeNSRange(left)
if err != nil {
return lNsRange, rNsRange, err
}
if err := n.ValidateNodeFormat(right); err != nil {
return fmt.Errorf("%w: right node does not match the namesapce hash format", err)
rNsRange, err = n.tryFetchNodeNSRange(right)
if err != nil {
return lNsRange, rNsRange, err
}
leftMaxNs := namespace.ID(MaxNamespace(left, n.NamespaceSize()))
rightMinNs := namespace.ID(MinNamespace(right, n.NamespaceSize()))

// check the namespace range of the left and right children
if rightMinNs.Less(leftMaxNs) {
return fmt.Errorf("%w: the maximum namespace of the left child %x is greater than the min namespace of the right child %x", ErrUnorderedSiblings, leftMaxNs, rightMinNs)
if rNsRange.Min.Less(lNsRange.Max) {
err = fmt.Errorf("%w: the min namespace ID of the right child %d is less than the max namespace ID of the left child %d", ErrUnorderedSiblings, rNsRange.Min, lNsRange.Max)
}
return nil

return lNsRange, rNsRange, err
}

// ValidateNodes is a helper function to verify the
// validity of the inputs of HashNode. It verifies whether left
// and right comply by the namespace hash format, and are correctly ordered
// according to their namespace IDs.
func (n *NmtHasher) ValidateNodes(left, right []byte) error {
if err := n.ValidateNodeFormat(left); err != nil {
return err
}
if err := n.ValidateNodeFormat(right); err != nil {
return err
}
return n.validateSiblingsNamespaceOrder(left, right)
_, _, err := n.tryFetchLeftAndRightNSRanges(left, right)
return err
}

// HashNode calculates a namespaced hash of a node using the supplied left and
Expand All @@ -278,21 +294,19 @@ func (n *NmtHasher) ValidateNodes(left, right []byte) error {
// If the namespace range of the right child is start=end=MAXNID, indicating that it represents the root of a subtree whose leaves all have the namespace ID of `MAXNID`, then exclude the right child from the namespace range calculation. Instead,
// assign the namespace range of the left child as the parent's namespace range.
func (n *NmtHasher) HashNode(left, right []byte) ([]byte, error) {
// validate the inputs
if err := n.ValidateNodes(left, right); err != nil {
// validate the inputs & fetch the namespace ranges
lRange, rRange, err := n.tryFetchLeftAndRightNSRanges(left, right)
if err != nil {
return nil, err
}

h := n.baseHasher
h.Reset()

leftMinNs, leftMaxNs := MinNamespace(left, n.NamespaceLen), MaxNamespace(left, n.NamespaceLen)
rightMinNs, rightMaxNs := MinNamespace(right, n.NamespaceLen), MaxNamespace(right, n.NamespaceLen)

// compute the namespace range of the parent node
minNs, maxNs := computeNsRange(leftMinNs, leftMaxNs, rightMinNs, rightMaxNs, n.ignoreMaxNs, n.precomputedMaxNs)
minNs, maxNs := computeNsRange(lRange.Min, lRange.Max, rRange.Min, rRange.Max, n.ignoreMaxNs, n.precomputedMaxNs)

res := make([]byte, 0)
res := make([]byte, 0, len(minNs)*2)
res = append(res, minNs...)
res = append(res, maxNs...)

Expand Down
50 changes: 0 additions & 50 deletions hasher_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -283,56 +283,6 @@ func TestHashNode_Error(t *testing.T) {
}
}

func TestValidateSiblings(t *testing.T) {
// create a dummy hash to use as the digest of the left and right child
randHash := createByteSlice(sha256.Size, 0x01)

type children struct {
l []byte // namespace hash of the left child with the format of MinNs||MaxNs||h
r []byte // namespace hash of the right child with the format of MinNs||MaxNs||h
}

tests := []struct {
name string
nidLen namespace.IDSize
children children
wantErr bool
}{
{
"wrong left node format", 2,
children{concat([]byte{0, 0, 1, 1}, randHash[:len(randHash)-1]), concat([]byte{0, 0, 1, 1}, randHash)},
true,
},
{
"wrong right node format", 2,
children{concat([]byte{0, 0, 1, 1}, randHash), concat([]byte{0, 0, 1, 1}, randHash[:len(randHash)-1])},
true,
},
{
"left.maxNs>right.minNs", 2,
children{concat([]byte{0, 0, 1, 1}, randHash), concat([]byte{0, 0, 1, 1}, randHash)},
true,
},
{
"left.maxNs=right.minNs", 2,
children{concat([]byte{0, 0, 1, 1}, randHash), concat([]byte{1, 1, 2, 2}, randHash)},
false,
},
{
"left.maxNs<right.minNs", 2,
children{concat([]byte{0, 0, 1, 1}, randHash), concat([]byte{2, 2, 3, 3}, randHash)},
false,
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
n := NewNmtHasher(sha256.New(), tt.nidLen, false)
err := n.validateSiblingsNamespaceOrder(tt.children.l, tt.children.r)
assert.Equal(t, tt.wantErr, err != nil)
})
}
}

func TestValidateNodeFormat(t *testing.T) {
hashValue := createByteSlice(sha256.Size, 0x01)
minNID := createByteSlice(2, 0x00)
Expand Down
6 changes: 2 additions & 4 deletions nmt.go
Original file line number Diff line number Diff line change
Expand Up @@ -682,16 +682,14 @@ type LeafRange struct {
// which is formatted as: minimum namespace ID || maximum namespace ID || hash
// digest.
func MinNamespace(hash []byte, size namespace.IDSize) []byte {
min := make([]byte, 0, size)
return append(min, hash[:size]...)
return hash[:size:size]
}

// MaxNamespace extracts the maximum namespace ID from a given namespace hash,
// which is formatted as: minimum namespace ID || maximum namespace ID || hash
// digest.
func MaxNamespace(hash []byte, size namespace.IDSize) []byte {
max := make([]byte, 0, size)
return append(max, hash[size:size*2]...)
return hash[size : size*2 : size*2]
}

// Size returns the number of leaves in the tree.
Expand Down
1 change: 1 addition & 0 deletions nmt_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -704,6 +704,7 @@ func BenchmarkComputeRoot(b *testing.B) {
{"64-leaves", 64, 8, 256},
{"128-leaves", 128, 8, 256},
{"256-leaves", 256, 8, 256},
{"20k-leaves", 20000, 8, 512},
}

for _, tt := range tests {
Expand Down

0 comments on commit 5b1b3d8

Please sign in to comment.