Summary or problem description
When there are many elements cached in the FIFOSet, its Add and ExceptWith performance degrades badly. This is very obvious during performance testing: in the TaskManager's RestartTask (neo/src/neo/Network/P2P/TaskManager.cs, lines 140 to 142 at 44e7f3b), the ExceptWith method is used to exclude 16,000 of the 62,800 knownHashes. Each deletion takes about 0.015s, roughly 240s in total, and the process gets stuck. That means that under heavy load, if a node is out of sync because it is missing a large number of transactions, obtaining those transactions becomes extremely slow.
The root cause is the poor computational efficiency of the OrderedDictionary that backs the FIFOSet. The unit test below shows its performance with a large number of elements.
```csharp
[TestMethod]
public void TestFIFOSet()
{
    Stopwatch stopwatch = new Stopwatch();
    var bucket = new FIFOSet<int>(150_000);

    stopwatch.Start();
    for (int i = 1; i <= 550_000; i++)
    {
        bucket.Add(i);
    }
    stopwatch.Stop();
    Console.WriteLine($"Add timespan: {stopwatch.Elapsed.TotalSeconds}s");
    stopwatch.Reset();

    var items = new int[10000];
    var value = 550_000;
    for (int i = 0; i <= 9999; i++)
    {
        items[i] = value;
        value -= 50;
    }
    stopwatch.Start();
    bucket.ExceptWith(items);
    stopwatch.Stop();
    Console.WriteLine($"except with timespan: {stopwatch.Elapsed.TotalSeconds}s");
    stopwatch.Reset();

    stopwatch.Start();
    var ret = bucket.Contains(140_000);
    stopwatch.Stop();
    Console.WriteLine($"contains with timespan: {stopwatch.Elapsed.TotalSeconds}s result: {ret}");
    stopwatch.Reset();

    stopwatch.Start();
    ret = bucket.Contains(545_001);
    stopwatch.Stop();
    Console.WriteLine($"contains with timespan: {stopwatch.Elapsed.TotalSeconds}s result: {ret}");
    stopwatch.Reset();
}
```
Console prints
Add timespan: 21.3015352s
except with timespan: 10.8901768s
contains with timespan: 0.0001552s result: False
contains with timespan: 5E-07s result: True
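For context, here is a minimal sketch of a FIFO set backed by System.Collections.Specialized.OrderedDictionary — not the exact Neo implementation, and the type and member names are illustrative assumptions. OrderedDictionary maintains an internal ArrayList alongside its Hashtable to preserve insertion order, so Remove and RemoveAt are documented as O(n) operations; that per-removal list shift is exactly what the ExceptWith numbers above reflect.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.Specialized;

// Sketch of a FIFO set over OrderedDictionary (illustrative, not Neo's code).
public class NaiveFIFOSet<T> where T : IEquatable<T>
{
    private readonly int maxCapacity;
    private readonly OrderedDictionary dictionary = new OrderedDictionary();

    public NaiveFIFOSet(int maxCapacity)
    {
        this.maxCapacity = maxCapacity;
    }

    public bool Add(T item)
    {
        if (dictionary.Contains(item)) return false; // O(1) hash lookup
        if (dictionary.Count >= maxCapacity)
            dictionary.RemoveAt(0);                  // evict oldest: O(n) list shift
        dictionary.Add(item, null);
        return true;
    }

    public void ExceptWith(IEnumerable<T> items)
    {
        foreach (var item in items)
            dictionary.Remove(item);                 // O(n) per item: the bottleneck
    }
}
```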
Do you have any solution you want to propose?
Currently, we only use FIFOSet to cache UInt256 hashes, and elements are evicted in insertion order; the exact eviction order is not critical in this case. Therefore, we can use a list of HashSets instead: create a List containing multiple HashSets and fill the HashSets in chronological order. When the current HashSet is full, start filling the next one. When all HashSets are full, remove the first HashSet in the List and append a new empty HashSet at the end to hold newly added elements.
The design of this class is as follows
```csharp
public class HashSetCache<T> : IEnumerable<T> where T : IEquatable<T>
{
    // maximum number of elements in each HashSet
    private readonly int hashSetCapacity;

    private readonly List<HashSet<T>> sets = new List<HashSet<T>>();

    public int Size
    {
        get
        {
            int size = 0;
            foreach (var set in sets)
            {
                size += set.Count;
            }
            return size;
        }
    }

    public HashSetCache(int hashSetCapacity, int hashSetCount = 10)
    {
        if (hashSetCapacity <= 0) throw new ArgumentOutOfRangeException(nameof(hashSetCapacity));
        if (hashSetCount <= 0 || hashSetCount > 20) throw new ArgumentOutOfRangeException($"{nameof(hashSetCount)} should be between 1 and 20");

        this.hashSetCapacity = hashSetCapacity;
        for (int i = 0; i < hashSetCount; i++)
        {
            sets.Add(new HashSet<T>());
        }
    }

    public bool Add(T item)
    {
        if (Contains(item)) return false;
        foreach (var set in sets)
        {
            if (set.Count < hashSetCapacity)
            {
                return set.Add(item);
            }
        }
        // all sets are full: drop the oldest set and start a new one
        sets.RemoveAt(0);
        var newSet = new HashSet<T> { item };
        sets.Add(newSet);
        return true;
    }

    public bool Contains(T item)
    {
        foreach (var set in sets)
        {
            if (set.Contains(item)) return true;
        }
        return false;
    }

    public void ExceptWith(IEnumerable<T> items)
    {
        foreach (var item in items)
        {
            foreach (var set in sets)
            {
                if (set.Remove(item)) break;
            }
        }
    }

    public IEnumerator<T> GetEnumerator()
    {
        foreach (var set in sets)
        {
            foreach (var item in set)
            {
                yield return item;
            }
        }
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```
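One property of this design worth spelling out before the benchmark: eviction happens at bucket granularity rather than per element, so the order is only approximately FIFO. A small hypothetical example:

```csharp
// Hypothetical example: 2 buckets of capacity 2. Once both buckets are full,
// the oldest bucket is dropped wholesale to make room.
var cache = new HashSetCache<int>(hashSetCapacity: 2, hashSetCount: 2);
foreach (var i in new[] { 1, 2, 3, 4 }) cache.Add(i); // buckets: {1,2} {3,4}
cache.Add(5);                          // all full: drop {1,2}, append {5}
Console.WriteLine(cache.Contains(1));  // False - evicted with its bucket
Console.WriteLine(cache.Contains(3));  // True
Console.WriteLine(cache.Size);         // 3
```

This is acceptable here because, as noted above, the elimination order is not critical for the cached hashes.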
The unit test shows this design is much faster than the OrderedDictionary-based FIFOSet under the same input conditions.
```csharp
[TestMethod]
public void TestHashSetCache()
{
    Stopwatch stopwatch = new Stopwatch();
    var bucket = new HashSetCache<int>(15_000);

    stopwatch.Start();
    for (int i = 1; i <= 550_000; i++)
    {
        bucket.Add(i);
    }
    stopwatch.Stop();
    Console.WriteLine($"Add timespan: {stopwatch.Elapsed.TotalSeconds}s");
    stopwatch.Reset();

    var items = new int[10000];
    var value = 550_000;
    for (int i = 0; i <= 9999; i++)
    {
        items[i] = value;
        value -= 50;
    }
    stopwatch.Start();
    bucket.ExceptWith(items);
    stopwatch.Stop();
    Console.WriteLine($"except with timespan: {stopwatch.Elapsed.TotalSeconds}s");
    stopwatch.Reset();

    stopwatch.Start();
    var ret = bucket.Contains(140_000);
    stopwatch.Stop();
    Console.WriteLine($"contains with timespan: {stopwatch.Elapsed.TotalSeconds}s result: {ret}");
    stopwatch.Reset();

    stopwatch.Start();
    ret = bucket.Contains(545_001);
    stopwatch.Stop();
    Console.WriteLine($"contains with timespan: {stopwatch.Elapsed.TotalSeconds}s result: {ret}");
    stopwatch.Reset();
}
```
Here is the Console print
Add timespan: 0.2357612s
except with timespan: 0.0035483s
contains with timespan: 2.1E-06s result: False
contains with timespan: 4.6E-06s result: True
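With b buckets, Contains and Add cost O(b) hash probes instead of O(n) list shifts, and ExceptWith costs O(b) per removed item, which matches the numbers above. As a hypothetical usage sketch for the knownHashes case in TaskManager (the variable names are assumptions, and 10 buckets of 15,000 entries keep the same 150,000 total capacity as the old FIFOSet):

```csharp
// Hypothetical drop-in for TaskManager's knownHashes cache.
var knownHashes = new HashSetCache<UInt256>(15_000, 10);

// Recording a hash stays cheap: O(bucket count) hash probes.
knownHashes.Add(UInt256.Zero);

// Bulk removal no longer shifts an ArrayList: O(bucket count) per hash.
knownHashes.ExceptWith(new[] { UInt256.Zero });
```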
Neo Version
Neo 3
Where in the software does this update apply?
Consensus
Ledger