From 9a0f8a08fd46ae998f6ccf322dd1106eda54f2fb Mon Sep 17 00:00:00 2001 From: Jordi Arranz Date: Tue, 26 Sep 2023 10:16:43 +0100 Subject: [PATCH 01/20] Created doc --- rlog/2023-09-26-wakurtosis-retro.mdx | 67 ++++++++++++++++++++++++++++ 1 file changed, 67 insertions(+) create mode 100644 rlog/2023-09-26-wakurtosis-retro.mdx diff --git a/rlog/2023-09-26-wakurtosis-retro.mdx b/rlog/2023-09-26-wakurtosis-retro.mdx new file mode 100644 index 00000000..1da74525 --- /dev/null +++ b/rlog/2023-09-26-wakurtosis-retro.mdx @@ -0,0 +1,67 @@ +WakurtosisRetro.md + +# Wakurtosis: Lessons Learned for Large-Scale Protocol Simulation +### VAC-DST Team (Alberto Rendo, Ganesh Narayanaswamy, Jordi Arranz) +### Sep 29, 2023 + +### TL;DR +The Wakurtosis framework aimed to simulate and test the behaviour of the Waku protocol at large scales but faced a plethora of challenges that ultimately led us to pivot to a hybrid approach that relies on Shadow and Kubernetes for greater reliability, flexibility, and scaling. + +### Introduction +Wakurtosis sought to stress-test Waku implementations at large scales over 10K nodes. While it achieved success with small-to-medium scale simulations, running intensive tests at larger scales revealed major bottlenecks, largely stemming from inherent restrictions imposed by Kurtosis – the testing and orchestration framework Wakurtosis is built on top of. + +Specifically, the most significant issues arose during middle-scale simulations of 600 nodes and high-traffic patterns exceeding 100 msg/s. In these scenarios, most simulations either failed to complete reliably or broke down entirely before finishing. Even when simulations managed to fully run, results were often skewed due to the inability of the infrastructure to inject the traffic. + +These challenges stemmed from the massive hardware requirements for simulations at such scales and traffic loads, which had to be run on a single machine under Kurtosis. 
This led to inadequate sampling rates, message loss, and other data inconsistencies. The system struggled to provide the computational power, memory capacity, and I/O throughput needed for smooth operations under such loads. + +In summary, while Wakurtosis successfully handled small-to-medium scales, simulations in the range of 600 nodes and 100 msg/s and beyond exposed restrictive bottlenecks tied to the limitations of the underlying Kurtosis platform and constraints around single-machine deployment. + +### Key Challenges with the Initial Kurtosis Approach + +Wakurtosis faced two fundamental challenges in achieving its goal of large-scale Waku protocol testing under the initial Kurtosis framework: + +#### Hardware Limitations +Kurtosis' constraint of running all simulations on a single machine led to severe resource bottlenecks approaching 1000+ nodes. Specific limitations included: + +##### CPU +To run the required parallel containers, our simulations demanded a minimum of 16 cores with 32 cores (64 threads) often employed. The essence of Wakurtosis simulations involved running multiple containers in parallel to mimic a network and its topology, with each container functioning as a separate node. Having the containers operate in parallel provided a basic yet realistic representation of network behavior. In this scenario, the CPU acts as the workhorse, needing to process the activities of every node simultaneously. Our computations indicated a need for at least 16 cores to ensure seamless simulations without lag or delays from overloading. However, even higher core counts could not robustly reach our target scale due to inherent single-machine limitations. Commercial constraints also exist regarding the maximum CPU cores available in a single machine. Ultimately, the single-machine approach proved insufficient for the parallelism required to smoothly simulate the intended network sizes. 
+ +##### Memory +Memory serves as the temporary storage during simulations, holding data that's currently in use. Each container in our simulation had a baseline memory requirement of approximately 20MB RAM to operate efficiently. While this is minimal on a per-container basis, the aggregate demand could scale up significantly when operating over 10k nodes. However, even at full scale, memory consumption never exceeded 128GB, and remained manageable for the Wakurtosis simulations. So although combined memory requirements could escalate for massive simulations, it was never a major limiting factor for Wakurtosis itself or our hardware infrastructure. + +##### Disk I/O throttling +Disk Input/Output (I/O) refers to the reading (input) and writing (output) of data in the system. In our scenario, the simulations created a heavy load on the I/O operations due to continuous data flow and logging activities for each container. As the number of containers (nodes) increased, the simultaneous read/write operations caused throttling, akin to a traffic jam, leading to slower data access and potential data loss. + +##### ARP table exhaustion +Another important issue we encounteres is the exhaustion of the ARP table. \The Address Resolution Protocol (ARP) is pivotal for routing, translating IP addresses to MAC addresses so data packets can be correctly delivered within a local network. However, ARP tables have a size limit. With the vast number of containers running, we quickly ran into situations where the ARP tables were filled to capacity, leading to routing failures. + + +#### Restrictive Software Environment +While initially promising, Kurtosis imposed restrictions that challenged large-scale testing: +- No multi-cluster support, limiting simulations to a single machine's resources. +- Strategic deprioritization of large simulations, influenced by partnerships. Nullified promised multi-cluster capabilities. 
+- Discontinuation of advanced networking features critical for flexible topology modeling. +- No straightforward way to model key QoS parameters like delay, loss, and bandwidth configurations. +- Constraints from orchestration language limitations that complicated dynamic topology modeling. + +#### Impact on Testing Scope +These hardware and software limitations manifested in two key ways: +- Inflexible Network Topologies: The inability to realistically simulate diverse network configurations and conditions. +- Superficial Protocol Implementation: A gossip model that lacked nuances critical for gathering meaningful insights. + +### The Pivot to Kubernetes and Shadow + +To circumvent most of the limitations of our previous approach, we decided to make a strategic transition to Kubernetes, primarily drawn to its inherent capabilities for cluster orchestration and scaling. The major advantage that Kubernetes brings to the table is its robust support for multi-cluster simulations, allowing us to effectively reach 10K-node simulations with high granularity. The ongoing shift to Kubernetes is expected to leverage its significant advantage in multi-cluster simulation capabilities. Even though this transition demands a considerable architectural overhaul, we believe that the potential benefits of Kubernetes' flexibility and scalability are worth the effort. + +Alongside Kubernetes, we incorporated Shadow into our testing and simulation toolkit. Shadow's unique strength lies in its ability to run real application binaries on a simulated network, offering a high level of accuracy even at greater scales. With Shadow, we're confident in pushing our simulations beyond the 50K-node mark. Moreover, since Shadow employs an event-based approach, it not only allows us to achieve these scales but also opens up the potential for simulations that run faster than real-time scenarios. 
Additionally, Shadow provides out-of-the-box support for simulating different QoS parameters like delay, loss, and bandwidth configurations on the virtual network. + +By combining both Kubernetes and Shadow, we aim to substantially enhance our testing framework. Kubernetes, with its multi-cluster simulation capabilities, will offer a wider array of practical insights during large-scale simulations. On the other hand, Shadow's theoretical modeling strengths allow us to develop a deeper comprehension of potential behaviors in even larger network environments. + +#### Conclusion +The journey to develop Wakurtosis has underscored the inherent challenges in large-scale protocol simulation. While the Kurtosis platform initially showed promise, it quickly struggled to handle the scale and features we were aiming to. Still, Wakurtosis still proved a useful tool for small and medium-scale simulations of the Waku protocol. + +These limitations forced a pivot to a hybrid Kubernetes and Shadow approach, promising enhanced scalability, flexibility, and accuracy for large-scale simulations. This experience emphasized the importance of anticipating potential bottlenecks when scaling up complexity. It also highlighted the value of blending practical testing and theoretical modeling to gain meaningful insights. + +Integrating Kubernetes and Shadow represents a renewed commitment to pushing the boundaries of what is possible in large-scale protocol simulation. This aims not just to rigorously stress test Waku, but to set a precedent for how to approach, design, and execute such simulations overall going forward. Through continuous learning, adaptation, and innovation, we remain dedicated to achieving the most accurate, reliable, and extensive simulations possible. 
+ + From b216d1477d90e78f2604271845a0021567f76750 Mon Sep 17 00:00:00 2001 From: Jordi Arranz Date: Wed, 27 Sep 2023 16:50:15 +0100 Subject: [PATCH 02/20] Re-structured the Kurtosis section --- rlog/2023-09-26-wakurtosis-retro.mdx | 44 +++++++++++++++------------- 1 file changed, 24 insertions(+), 20 deletions(-) diff --git a/rlog/2023-09-26-wakurtosis-retro.mdx b/rlog/2023-09-26-wakurtosis-retro.mdx index 1da74525..dd341173 100644 --- a/rlog/2023-09-26-wakurtosis-retro.mdx +++ b/rlog/2023-09-26-wakurtosis-retro.mdx @@ -1,11 +1,21 @@ -WakurtosisRetro.md +--- +title: 'Wakurtosis: Lessons Learned for Large-Scale Protocol Simulation' +date: 2023-09-26 12:00:00 +authors: daimakaimura +published: true +slug: Wakurtosis-Retrospective +categories: wakurtosis, waku, dst -# Wakurtosis: Lessons Learned for Large-Scale Protocol Simulation -### VAC-DST Team (Alberto Rendo, Ganesh Narayanaswamy, Jordi Arranz) -### Sep 29, 2023 +toc_min_heading_level: 2 +toc_max_heading_level: 5 +--- -### TL;DR -The Wakurtosis framework aimed to simulate and test the behaviour of the Waku protocol at large scales but faced a plethora of challenges that ultimately led us to pivot to a hybrid approach that relies on Shadow and Kubernetes for greater reliability, flexibility, and scaling. +## Wakurtosis: Lessons Learned for Large-Scale Protocol Simulation + + + +**TL;DR** +The Wakurtosis framework aimed to simulate and test the behaviour of the Waku protocol at large scales but faced a plethora of challenges that ultimately led us to pivot to a hybrid approach that relies on Shadow and Kubernetes for greater reliability, flexibility, and scaling. Here we will discuss some of the most important issues we faced and their potential solutions in a new hybrid framework. ### Introduction Wakurtosis sought to stress-test Waku implementations at large scales over 10K nodes. 
While it achieved success with small-to-medium scale simulations, running intensive tests at larger scales revealed major bottlenecks, largely stemming from inherent restrictions imposed by Kurtosis – the testing and orchestration framework Wakurtosis is built on top of. @@ -14,7 +24,7 @@ Specifically, the most significant issues arose during middle-scale simulations These challenges stemmed from the massive hardware requirements for simulations at such scales and traffic loads, which had to be run on a single machine under Kurtosis. This led to inadequate sampling rates, message loss, and other data inconsistencies. The system struggled to provide the computational power, memory capacity, and I/O throughput needed for smooth operations under such loads. -In summary, while Wakurtosis successfully handled small-to-medium scales, simulations in the range of 600 nodes and 100 msg/s and beyond exposed restrictive bottlenecks tied to the limitations of the underlying Kurtosis platform and constraints around single-machine deployment. +In summary, while Wakurtosis successfully handled small-to-medium scales, with simulations in the range of 600 nodes and 10 msg/s and beyond exposed restrictive bottlenecks tied to the limitations of the underlying Kurtosis platform and constraints around single-machine deployment. ### Key Challenges with the Initial Kurtosis Approach @@ -33,21 +43,15 @@ Memory serves as the temporary storage during simulations, holding data that's c Disk Input/Output (I/O) refers to the reading (input) and writing (output) of data in the system. In our scenario, the simulations created a heavy load on the I/O operations due to continuous data flow and logging activities for each container. As the number of containers (nodes) increased, the simultaneous read/write operations caused throttling, akin to a traffic jam, leading to slower data access and potential data loss. 
##### ARP table exhaustion -Another important issue we encounteres is the exhaustion of the ARP table. \The Address Resolution Protocol (ARP) is pivotal for routing, translating IP addresses to MAC addresses so data packets can be correctly delivered within a local network. However, ARP tables have a size limit. With the vast number of containers running, we quickly ran into situations where the ARP tables were filled to capacity, leading to routing failures. +Another important issue we encounteres is the exhaustion of the ARP table. The Address Resolution Protocol (ARP) is pivotal for routing, translating IP addresses to MAC addresses so data packets can be correctly delivered within a local network. However, ARP tables have a size limit. With the vast number of containers running, we quickly ran into situations where the ARP tables were filled to capacity, leading to routing failures. + +#### Kurtosis +The Kurtosis framework, though initially appearing to be a promising solution, presented multiple limitations when applied to large-scale testing. One of its major constraints was the lack of multi-cluster support, which restricted simulations to the resources of a single machine. This limitation became even more pronounced when the platform strategically deprioritized large-scale simulations, a decision seemingly influenced by specific partnerships. This decision effectively nullified any anticipated multi-cluster capabilities. -#### Restrictive Software Environment -While initially promising, Kurtosis imposed restrictions that challenged large-scale testing: -- No multi-cluster support, limiting simulations to a single machine's resources. -- Strategic deprioritization of large simulations, influenced by partnerships. Nullified promised multi-cluster capabilities. -- Discontinuation of advanced networking features critical for flexible topology modeling. -- No straightforward way to model key QoS parameters like delay, loss, and bandwidth configurations. 
-- Constraints from orchestration language limitations that complicated dynamic topology modeling. +Further complicating the situation was Kurtosis's decision to discontinue certain advanced networking features that were previously critical for modeling flexible network topologies. Additionally, the platform lacked an intuitive mechanism to represent key Quality of Service (QoS) parameters, such as delay, loss, and bandwidth configurations. These constraints were exacerbated by limitations in the orchestration language used by Kurtosis, which added complexity to dynamic topology modeling. -#### Impact on Testing Scope -These hardware and software limitations manifested in two key ways: -- Inflexible Network Topologies: The inability to realistically simulate diverse network configurations and conditions. -- Superficial Protocol Implementation: A gossip model that lacked nuances critical for gathering meaningful insights. +The array of hardware and software limitations imposed by Kurtosis had significant ramifications on our testing capabilities. The constraints primarily manifested in the inability to realistically simulate diverse network configurations and conditions. This inflexibility in network topologies was a significant setback. Moreover, when it came to protocol implementation, Kurtosis's approach was rather rudimentary. Relying on a basic gossip model, the platform missed capturing the nuances that are critical for deriving meaningful insights from the simulations. ### The Pivot to Kubernetes and Shadow @@ -58,7 +62,7 @@ Alongside Kubernetes, we incorporated Shadow into our testing and simulation too By combining both Kubernetes and Shadow, we aim to substantially enhance our testing framework. Kubernetes, with its multi-cluster simulation capabilities, will offer a wider array of practical insights during large-scale simulations. 
On the other hand, Shadow's theoretical modeling strengths allow us to develop a deeper comprehension of potential behaviors in even larger network environments. #### Conclusion -The journey to develop Wakurtosis has underscored the inherent challenges in large-scale protocol simulation. While the Kurtosis platform initially showed promise, it quickly struggled to handle the scale and features we were aiming to. Still, Wakurtosis still proved a useful tool for small and medium-scale simulations of the Waku protocol. +The journey to develop Wakurtosis has underscored the inherent challenges in large-scale protocol simulation. While the Kurtosis platform initially showed promise, it quickly struggled to handle the scale and features we were aiming to. Still, Wakurtosis proved a useful tool for analysing the protocol at moderate scales and loads. These limitations forced a pivot to a hybrid Kubernetes and Shadow approach, promising enhanced scalability, flexibility, and accuracy for large-scale simulations. This experience emphasized the importance of anticipating potential bottlenecks when scaling up complexity. It also highlighted the value of blending practical testing and theoretical modeling to gain meaningful insights. 
From 39dadc71cd3943ef6feab1e2bd469c58633c2054 Mon Sep 17 00:00:00 2001 From: Daimakaimura <17453177+Daimakaimura@users.noreply.github.com> Date: Mon, 23 Oct 2023 14:13:25 +0100 Subject: [PATCH 03/20] Added Semantic Line Breaks --- rlog/2023-09-26-wakurtosis-retro.mdx | 82 ++++++++++++++++++++++------ 1 file changed, 64 insertions(+), 18 deletions(-) diff --git a/rlog/2023-09-26-wakurtosis-retro.mdx b/rlog/2023-09-26-wakurtosis-retro.mdx index dd341173..8eb0d45d 100644 --- a/rlog/2023-09-26-wakurtosis-retro.mdx +++ b/rlog/2023-09-26-wakurtosis-retro.mdx @@ -15,14 +15,20 @@ toc_max_heading_level: 5 **TL;DR** -The Wakurtosis framework aimed to simulate and test the behaviour of the Waku protocol at large scales but faced a plethora of challenges that ultimately led us to pivot to a hybrid approach that relies on Shadow and Kubernetes for greater reliability, flexibility, and scaling. Here we will discuss some of the most important issues we faced and their potential solutions in a new hybrid framework. +The Wakurtosis framework aimed to simulate and test the behaviour of the Waku protocol at large scales but faced a plethora of challenges that ultimately led us to pivot to a hybrid approach that relies on Shadow and Kubernetes for greater reliability, flexibility, and scaling. +Here we will discuss some of the most important issues we faced and their potential solutions in a new hybrid framework. ### Introduction -Wakurtosis sought to stress-test Waku implementations at large scales over 10K nodes. While it achieved success with small-to-medium scale simulations, running intensive tests at larger scales revealed major bottlenecks, largely stemming from inherent restrictions imposed by Kurtosis – the testing and orchestration framework Wakurtosis is built on top of. +Wakurtosis sought to stress-test Waku implementations at large scales over 10K nodes. 
+While it achieved success with small-to-medium scale simulations, running intensive tests at larger scales revealed major bottlenecks, largely stemming from inherent restrictions imposed by Kurtosis – the testing and orchestration framework Wakurtosis is built on top of. -Specifically, the most significant issues arose during middle-scale simulations of 600 nodes and high-traffic patterns exceeding 100 msg/s. In these scenarios, most simulations either failed to complete reliably or broke down entirely before finishing. Even when simulations managed to fully run, results were often skewed due to the inability of the infrastructure to inject the traffic. +Specifically, the most significant issues arose during middle-scale simulations of 600 nodes and high-traffic patterns exceeding 100 msg/s. +In these scenarios, most simulations either failed to complete reliably or broke down entirely before finishing. +Even when simulations managed to fully run, results were often skewed due to the inability of the infrastructure to inject the traffic. -These challenges stemmed from the massive hardware requirements for simulations at such scales and traffic loads, which had to be run on a single machine under Kurtosis. This led to inadequate sampling rates, message loss, and other data inconsistencies. The system struggled to provide the computational power, memory capacity, and I/O throughput needed for smooth operations under such loads. +These challenges stemmed from the massive hardware requirements for simulations at such scales and traffic loads, which had to be run on a single machine under Kurtosis. +This led to inadequate sampling rates, message loss, and other data inconsistencies. +The system struggled to provide the computational power, memory capacity, and I/O throughput needed for smooth operations under such loads. 
In summary, while Wakurtosis successfully handled small-to-medium scales, with simulations in the range of 600 nodes and 10 msg/s and beyond exposed restrictive bottlenecks tied to the limitations of the underlying Kurtosis platform and constraints around single-machine deployment. @@ -31,41 +37,81 @@ In summary, while Wakurtosis successfully handled small-to-medium scales, with s Wakurtosis faced two fundamental challenges in achieving its goal of large-scale Waku protocol testing under the initial Kurtosis framework: #### Hardware Limitations -Kurtosis' constraint of running all simulations on a single machine led to severe resource bottlenecks approaching 1000+ nodes. Specific limitations included: +Kurtosis' constraint of running all simulations on a single machine led to severe resource bottlenecks approaching 1000+ nodes. +Specific limitations included: ##### CPU -To run the required parallel containers, our simulations demanded a minimum of 16 cores with 32 cores (64 threads) often employed. The essence of Wakurtosis simulations involved running multiple containers in parallel to mimic a network and its topology, with each container functioning as a separate node. Having the containers operate in parallel provided a basic yet realistic representation of network behavior. In this scenario, the CPU acts as the workhorse, needing to process the activities of every node simultaneously. Our computations indicated a need for at least 16 cores to ensure seamless simulations without lag or delays from overloading. However, even higher core counts could not robustly reach our target scale due to inherent single-machine limitations. Commercial constraints also exist regarding the maximum CPU cores available in a single machine. Ultimately, the single-machine approach proved insufficient for the parallelism required to smoothly simulate the intended network sizes. 
+To run the required parallel containers, our simulations demanded a minimum of 16 cores with 32 cores (64 threads) often employed. +The essence of Wakurtosis simulations involved running multiple containers in parallel to mimic a network and its topology, with each container functioning as a separate node. +Having the containers operate in parallel provided a basic yet realistic representation of network behavior. +In this scenario, the CPU acts as the workhorse, needing to process the activities of every node simultaneously. +Our computations indicated a need for at least 16 cores to ensure seamless simulations without lag or delays from overloading. +However, even higher core counts could not robustly reach our target scale due to inherent single-machine limitations. +Commercial constraints also exist regarding the maximum CPU cores available in a single machine. +Ultimately, the single-machine approach proved insufficient for the parallelism required to smoothly simulate the intended network sizes. ##### Memory -Memory serves as the temporary storage during simulations, holding data that's currently in use. Each container in our simulation had a baseline memory requirement of approximately 20MB RAM to operate efficiently. While this is minimal on a per-container basis, the aggregate demand could scale up significantly when operating over 10k nodes. However, even at full scale, memory consumption never exceeded 128GB, and remained manageable for the Wakurtosis simulations. So although combined memory requirements could escalate for massive simulations, it was never a major limiting factor for Wakurtosis itself or our hardware infrastructure. +Memory serves as the temporary storage during simulations, holding data that's currently in use. +Each container in our simulation had a baseline memory requirement of approximately 20MB RAM to operate efficiently. 
+While this is minimal on a per-container basis, the aggregate demand could scale up significantly when operating over 10k nodes. +Still, even at full scale, memory consumption never exceeded 128GB, and remained manageable for the Wakurtosis simulations. +So although combined memory requirements could escalate for massive simulations, it was never a major limiting factor for Wakurtosis itself or our hardware infrastructure. ##### Disk I/O throttling -Disk Input/Output (I/O) refers to the reading (input) and writing (output) of data in the system. In our scenario, the simulations created a heavy load on the I/O operations due to continuous data flow and logging activities for each container. As the number of containers (nodes) increased, the simultaneous read/write operations caused throttling, akin to a traffic jam, leading to slower data access and potential data loss. +Disk Input/Output (I/O) refers to the reading (input) and writing (output) of data in the system. +In our scenario, the simulations created a heavy load on the I/O operations due to continuous data flow and logging activities for each container. +As the number of containers (nodes) increased, the simultaneous read/write operations caused throttling, akin to a traffic jam, leading to slower data access and potential data loss. ##### ARP table exhaustion -Another important issue we encounteres is the exhaustion of the ARP table. The Address Resolution Protocol (ARP) is pivotal for routing, translating IP addresses to MAC addresses so data packets can be correctly delivered within a local network. However, ARP tables have a size limit. With the vast number of containers running, we quickly ran into situations where the ARP tables were filled to capacity, leading to routing failures. +Another important issue we encountered was the exhaustion of the ARP table. 
+The Address Resolution Protocol (ARP) is pivotal for routing, translating IP addresses to MAC addresses so data packets can be correctly delivered within a local network. +However, ARP tables have a size limit. With the vast number of containers running, we quickly ran into situations where the ARP tables were filled to capacity, leading to routing failures. #### Kurtosis -The Kurtosis framework, though initially appearing to be a promising solution, presented multiple limitations when applied to large-scale testing. One of its major constraints was the lack of multi-cluster support, which restricted simulations to the resources of a single machine. This limitation became even more pronounced when the platform strategically deprioritized large-scale simulations, a decision seemingly influenced by specific partnerships. This decision effectively nullified any anticipated multi-cluster capabilities. +The Kurtosis framework, though initially appearing to be a promising solution, presented multiple limitations when applied to large-scale testing. +One of its major constraints was the lack of multi-cluster support, which restricted simulations to the resources of a single machine. +This limitation became even more pronounced when the platform strategically deprioritized large-scale simulations, a decision seemingly influenced by specific partnerships. +This decision effectively nullified any anticipated multi-cluster capabilities. -Further complicating the situation was Kurtosis's decision to discontinue certain advanced networking features that were previously critical for modeling flexible network topologies. Additionally, the platform lacked an intuitive mechanism to represent key Quality of Service (QoS) parameters, such as delay, loss, and bandwidth configurations. These constraints were exacerbated by limitations in the orchestration language used by Kurtosis, which added complexity to dynamic topology modeling. 
+Further complicating the situation was Kurtosis's decision to discontinue certain advanced networking features that were previously critical for modeling flexible network topologies. +Additionally, the platform lacked an intuitive mechanism to represent key Quality of Service (QoS) parameters, such as delay, loss, and bandwidth configurations. +These constraints were exacerbated by limitations in the orchestration language used by Kurtosis, which added complexity to dynamic topology modeling. -The array of hardware and software limitations imposed by Kurtosis had significant ramifications on our testing capabilities. The constraints primarily manifested in the inability to realistically simulate diverse network configurations and conditions. This inflexibility in network topologies was a significant setback. Moreover, when it came to protocol implementation, Kurtosis's approach was rather rudimentary. Relying on a basic gossip model, the platform missed capturing the nuances that are critical for deriving meaningful insights from the simulations. +The array of hardware and software limitations imposed by Kurtosis had significant ramifications on our testing capabilities. +The constraints primarily manifested in the inability to realistically simulate diverse network configurations and conditions. +This inflexibility in network topologies was a significant setback. +Moreover, when it came to protocol implementation, Kurtosis's approach was rather rudimentary. +Relying on a basic gossip model, the platform missed capturing the nuances that are critical for deriving meaningful insights from the simulations. ### The Pivot to Kubernetes and Shadow -To circumvent most of the limitations of our previous approach, we decided to make a strategic transition to Kubernetes, primarily drawn to its inherent capabilities for cluster orchestration and scaling. 
The major advantage that Kubernetes brings to the table is its robust support for multi-cluster simulations, allowing us to effectively reach 10K-node simulations with high granularity. The ongoing shift to Kubernetes is expected to leverage its significant advantage in multi-cluster simulation capabilities. Even though this transition demands a considerable architectural overhaul, we believe that the potential benefits of Kubernetes' flexibility and scalability are worth the effort. +To circumvent most of the limitations of our previous approach, we decided to make a strategic transition to Kubernetes, primarily drawn to its inherent capabilities for cluster orchestration and scaling. +The major advantage that Kubernetes brings to the table is its robust support for multi-cluster simulations, allowing us to effectively reach 10K-node simulations with high granularity. +The ongoing shift to Kubernetes is expected to leverage its significant advantage in multi-cluster simulation capabilities. +Even though this transition demands a considerable architectural overhaul, we believe that the potential benefits of Kubernetes' flexibility and scalability are worth the effort. -Alongside Kubernetes, we incorporated Shadow into our testing and simulation toolkit. Shadow's unique strength lies in its ability to run real application binaries on a simulated network, offering a high level of accuracy even at greater scales. With Shadow, we're confident in pushing our simulations beyond the 50K-node mark. Moreover, since Shadow employs an event-based approach, it not only allows us to achieve these scales but also opens up the potential for simulations that run faster than real-time scenarios. Additionally, Shadow provides out-of-the-box support for simulating different QoS parameters like delay, loss, and bandwidth configurations on the virtual network. +Alongside Kubernetes, we incorporated Shadow into our testing and simulation toolkit. 
+Shadow's unique strength lies in its ability to run real application binaries on a simulated network, offering a high level of accuracy even at greater scales. +With Shadow, we're confident in pushing our simulations beyond the 50K-node mark. +Moreover, since Shadow employs an event-based approach, it not only allows us to achieve these scales but also opens up the potential for simulations that run faster than real-time scenarios. +Additionally, Shadow provides out-of-the-box support for simulating different QoS parameters like delay, loss, and bandwidth configurations on the virtual network. -By combining both Kubernetes and Shadow, we aim to substantially enhance our testing framework. Kubernetes, with its multi-cluster simulation capabilities, will offer a wider array of practical insights during large-scale simulations. On the other hand, Shadow's theoretical modeling strengths allow us to develop a deeper comprehension of potential behaviors in even larger network environments. +By combining both Kubernetes and Shadow, we aim to substantially enhance our testing framework. +Kubernetes, with its multi-cluster simulation capabilities, will offer a wider array of practical insights during large-scale simulations. +On the other hand, Shadow's theoretical modeling strengths allow us to develop a deeper comprehension of potential behaviors in even larger network environments. #### Conclusion -The journey to develop Wakurtosis has underscored the inherent challenges in large-scale protocol simulation. While the Kurtosis platform initially showed promise, it quickly struggled to handle the scale and features we were aiming to. Still, Wakurtosis proved a useful tool for analysing the protocol at moderate scales and loads. +The journey to develop Wakurtosis has underscored the inherent challenges in large-scale protocol simulation. +While the Kurtosis platform initially showed promise, it quickly struggled to handle the scale and features we were aiming for.
+Still, Wakurtosis proved a useful tool for analysing the protocol at moderate scales and loads. -These limitations forced a pivot to a hybrid Kubernetes and Shadow approach, promising enhanced scalability, flexibility, and accuracy for large-scale simulations. This experience emphasized the importance of anticipating potential bottlenecks when scaling up complexity. It also highlighted the value of blending practical testing and theoretical modeling to gain meaningful insights. +These limitations forced a pivot to a hybrid Kubernetes and Shadow approach, promising enhanced scalability, flexibility, and accuracy for large-scale simulations. +This experience emphasized the importance of anticipating potential bottlenecks when scaling up complexity. +It also highlighted the value of blending practical testing and theoretical modeling to gain meaningful insights. -Integrating Kubernetes and Shadow represents a renewed commitment to pushing the boundaries of what is possible in large-scale protocol simulation. This aims not just to rigorously stress test Waku, but to set a precedent for how to approach, design, and execute such simulations overall going forward. Through continuous learning, adaptation, and innovation, we remain dedicated to achieving the most accurate, reliable, and extensive simulations possible. +Integrating Kubernetes and Shadow represents a renewed commitment to pushing the boundaries of what is possible in large-scale protocol simulation. +This aims not just to rigorously stress test Waku, but to set a precedent for how to approach, design, and execute such simulations overall going forward. +Through continuous learning, adaptation, and innovation, we remain dedicated to achieving the most accurate, reliable, and extensive simulations possible. 
From 0a86bd07961529f7746a143aa72be9e0780ec0cd Mon Sep 17 00:00:00 2001 From: Daimakaimura <17453177+Daimakaimura@users.noreply.github.com> Date: Wed, 1 Nov 2023 07:49:03 +0000 Subject: [PATCH 04/20] Update rlog/2023-09-26-wakurtosis-retro.mdx Co-authored-by: kaiserd <1684595+kaiserd@users.noreply.github.com> --- rlog/2023-09-26-wakurtosis-retro.mdx | 1 - 1 file changed, 1 deletion(-) diff --git a/rlog/2023-09-26-wakurtosis-retro.mdx b/rlog/2023-09-26-wakurtosis-retro.mdx index 8eb0d45d..3165f15a 100644 --- a/rlog/2023-09-26-wakurtosis-retro.mdx +++ b/rlog/2023-09-26-wakurtosis-retro.mdx @@ -14,7 +14,6 @@ toc_max_heading_level: 5 -**TL;DR** The Wakurtosis framework aimed to simulate and test the behaviour of the Waku protocol at large scales but faced a plethora of challenges that ultimately led us to pivot to a hybrid approach that relies on Shadow and Kubernetes for greater reliability, flexibility, and scaling. Here we will discuss some of the most important issues we faced and their potential solutions in a new hybrid framework. From 25ea263da8df71068a1bc103f77fda44fc2b4015 Mon Sep 17 00:00:00 2001 From: Daimakaimura <17453177+Daimakaimura@users.noreply.github.com> Date: Wed, 1 Nov 2023 07:49:17 +0000 Subject: [PATCH 05/20] Update rlog/2023-09-26-wakurtosis-retro.mdx Co-authored-by: kaiserd <1684595+kaiserd@users.noreply.github.com> --- rlog/2023-09-26-wakurtosis-retro.mdx | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/rlog/2023-09-26-wakurtosis-retro.mdx b/rlog/2023-09-26-wakurtosis-retro.mdx index 3165f15a..ae0cfe9e 100644 --- a/rlog/2023-09-26-wakurtosis-retro.mdx +++ b/rlog/2023-09-26-wakurtosis-retro.mdx @@ -14,7 +14,8 @@ toc_max_heading_level: 5 -The Wakurtosis framework aimed to simulate and test the behaviour of the Waku protocol at large scales but faced a plethora of challenges that ultimately led us to pivot to a hybrid approach that relies on Shadow and Kubernetes for greater reliability, flexibility, and scaling. 
+The Wakurtosis framework aimed to simulate and test the behaviour of the Waku protocol at large scales +but faced a plethora of challenges that ultimately led us to pivot to a hybrid approach that relies on Shadow and Kubernetes for greater reliability, flexibility, and scaling. Here we will discuss some of the most important issues we faced and their potential solutions in a new hybrid framework. ### Introduction From 2c71725a02a052f8a5f75445276ff9e6b14692d1 Mon Sep 17 00:00:00 2001 From: Daimakaimura <17453177+Daimakaimura@users.noreply.github.com> Date: Wed, 1 Nov 2023 07:49:26 +0000 Subject: [PATCH 06/20] Update rlog/2023-09-26-wakurtosis-retro.mdx Co-authored-by: kaiserd <1684595+kaiserd@users.noreply.github.com> --- rlog/2023-09-26-wakurtosis-retro.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rlog/2023-09-26-wakurtosis-retro.mdx b/rlog/2023-09-26-wakurtosis-retro.mdx index ae0cfe9e..9527ac95 100644 --- a/rlog/2023-09-26-wakurtosis-retro.mdx +++ b/rlog/2023-09-26-wakurtosis-retro.mdx @@ -16,7 +16,7 @@ toc_max_heading_level: 5 The Wakurtosis framework aimed to simulate and test the behaviour of the Waku protocol at large scales but faced a plethora of challenges that ultimately led us to pivot to a hybrid approach that relies on Shadow and Kubernetes for greater reliability, flexibility, and scaling. -Here we will discuss some of the most important issues we faced and their potential solutions in a new hybrid framework. +This blog post will discuss some of the most important issues we faced and their potential solutions in a new hybrid framework. ### Introduction Wakurtosis sought to stress-test Waku implementations at large scales over 10K nodes. 
From d5919f965e9de4e15ca889f6336ad7e9cfcef63c Mon Sep 17 00:00:00 2001 From: Daimakaimura <17453177+Daimakaimura@users.noreply.github.com> Date: Wed, 1 Nov 2023 07:50:22 +0000 Subject: [PATCH 07/20] Update rlog/2023-09-26-wakurtosis-retro.mdx Co-authored-by: kaiserd <1684595+kaiserd@users.noreply.github.com> --- rlog/2023-09-26-wakurtosis-retro.mdx | 1 - 1 file changed, 1 deletion(-) diff --git a/rlog/2023-09-26-wakurtosis-retro.mdx b/rlog/2023-09-26-wakurtosis-retro.mdx index 9527ac95..74084b52 100644 --- a/rlog/2023-09-26-wakurtosis-retro.mdx +++ b/rlog/2023-09-26-wakurtosis-retro.mdx @@ -88,7 +88,6 @@ Relying on a basic gossip model, the platform missed capturing the nuances that To circumvent most of the limitations of our previous approach, we decided to make a strategic transition to Kubernetes, primarily drawn to its inherent capabilities for cluster orchestration and scaling. The major advantage that Kubernetes brings to the table is its robust support for multi-cluster simulations, allowing us to effectively reach 10K-node simulations with high granularity. -The ongoing shift to Kubernetes is expected to leverage its significant advantage in multi-cluster simulation capabilities. Even though this transition demands a considerable architectural overhaul, we believe that the potential benefits of Kubernetes' flexibility and scalability are worth the effort. Alongside Kubernetes, we incorporated Shadow into our testing and simulation toolkit. 
From 4e836df7b6d28669d63e79ddd22169f29d5b3b0c Mon Sep 17 00:00:00 2001 From: Daimakaimura <17453177+Daimakaimura@users.noreply.github.com> Date: Wed, 1 Nov 2023 08:09:18 +0000 Subject: [PATCH 08/20] Added link to shadow --- rlog/2023-09-26-wakurtosis-retro.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rlog/2023-09-26-wakurtosis-retro.mdx b/rlog/2023-09-26-wakurtosis-retro.mdx index 74084b52..f1e7fbe3 100644 --- a/rlog/2023-09-26-wakurtosis-retro.mdx +++ b/rlog/2023-09-26-wakurtosis-retro.mdx @@ -90,7 +90,7 @@ To circumvent most of the limitations of our previous approach, we decided to ma The major advantage that Kubernetes brings to the table is its robust support for multi-cluster simulations, allowing us to effectively reach 10K-node simulations with high granularity. Even though this transition demands a considerable architectural overhaul, we believe that the potential benefits of Kubernetes' flexibility and scalability are worth the effort. -Alongside Kubernetes, we incorporated Shadow into our testing and simulation toolkit. +Alongside Kubernetes, we incorporated [https://shadow.github.io/](Shadow) into our testing and simulation toolkit. Shadow's unique strength lies in its ability to run real application binaries on a simulated network, offering a high level of accuracy even at greater scales. With Shadow, we're confident in pushing our simulations beyond the 50K-node mark. Moreover, since Shadow employs an event-based approach, it not only allows us to achieve these scales but also opens up the potential for simulations that run faster than real-time scenarios. 
From 1de152fe670f6c3087ae9bc08282e020d46afbd0 Mon Sep 17 00:00:00 2001 From: Daimakaimura <17453177+Daimakaimura@users.noreply.github.com> Date: Wed, 1 Nov 2023 08:10:51 +0000 Subject: [PATCH 09/20] Update rlog/2023-09-26-wakurtosis-retro.mdx Co-authored-by: kaiserd <1684595+kaiserd@users.noreply.github.com> --- rlog/2023-09-26-wakurtosis-retro.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rlog/2023-09-26-wakurtosis-retro.mdx b/rlog/2023-09-26-wakurtosis-retro.mdx index f1e7fbe3..0dff7e07 100644 --- a/rlog/2023-09-26-wakurtosis-retro.mdx +++ b/rlog/2023-09-26-wakurtosis-retro.mdx @@ -30,7 +30,7 @@ These challenges stemmed from the massive hardware requirements for simulations This led to inadequate sampling rates, message loss, and other data inconsistencies. The system struggled to provide the computational power, memory capacity, and I/O throughput needed for smooth operations under such loads. -In summary, while Wakurtosis successfully handled small-to-medium scales, with simulations in the range of 600 nodes and 10 msg/s and beyond exposed restrictive bottlenecks tied to the limitations of the underlying Kurtosis platform and constraints around single-machine deployment. +In summary, while Wakurtosis successfully handled small-to-medium scales, simulations in the range of 600 nodes and 10 msg/s and beyond exposed restrictive bottlenecks tied to the limitations of the underlying Kurtosis platform and constraints around single-machine deployment. 
### Key Challenges with the Initial Kurtosis Approach From 701c96358a956f00d07b20ea06e2775fb14f33d7 Mon Sep 17 00:00:00 2001 From: Daimakaimura <17453177+Daimakaimura@users.noreply.github.com> Date: Wed, 1 Nov 2023 08:11:13 +0000 Subject: [PATCH 10/20] Update rlog/2023-09-26-wakurtosis-retro.mdx Co-authored-by: kaiserd <1684595+kaiserd@users.noreply.github.com> --- rlog/2023-09-26-wakurtosis-retro.mdx | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/rlog/2023-09-26-wakurtosis-retro.mdx b/rlog/2023-09-26-wakurtosis-retro.mdx index 0dff7e07..83073950 100644 --- a/rlog/2023-09-26-wakurtosis-retro.mdx +++ b/rlog/2023-09-26-wakurtosis-retro.mdx @@ -20,7 +20,8 @@ This blog post will discuss some of the most important issues we faced and their ### Introduction Wakurtosis sought to stress-test Waku implementations at large scales over 10K nodes. -While it achieved success with small-to-medium scale simulations, running intensive tests at larger scales revealed major bottlenecks, largely stemming from inherent restrictions imposed by Kurtosis – the testing and orchestration framework Wakurtosis is built on top of. +While it achieved success with small-to-medium scale simulations, running intensive tests at larger scales revealed major bottlenecks, +largely stemming from inherent restrictions imposed by Kurtosis – the testing and orchestration framework Wakurtosis is built on top of. Specifically, the most significant issues arose during middle-scale simulations of 600 nodes and high-traffic patterns exceeding 100 msg/s. In these scenarios, most simulations either failed to complete reliably or broke down entirely before finishing. 
From 1577a7c1fa049b53e6593cf081f1f298756a07e0 Mon Sep 17 00:00:00 2001 From: Daimakaimura <17453177+Daimakaimura@users.noreply.github.com> Date: Wed, 1 Nov 2023 08:16:17 +0000 Subject: [PATCH 11/20] Added link to Kurtosis --- rlog/2023-09-26-wakurtosis-retro.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rlog/2023-09-26-wakurtosis-retro.mdx b/rlog/2023-09-26-wakurtosis-retro.mdx index 83073950..8e982e4b 100644 --- a/rlog/2023-09-26-wakurtosis-retro.mdx +++ b/rlog/2023-09-26-wakurtosis-retro.mdx @@ -21,7 +21,7 @@ This blog post will discuss some of the most important issues we faced and their ### Introduction Wakurtosis sought to stress-test Waku implementations at large scales over 10K nodes. While it achieved success with small-to-medium scale simulations, running intensive tests at larger scales revealed major bottlenecks, -largely stemming from inherent restrictions imposed by Kurtosis – the testing and orchestration framework Wakurtosis is built on top of. +largely stemming from inherent restrictions imposed by [Kurtosis](https://www.kurtosis.com/) – the testing and orchestration framework Wakurtosis is built on top of. Specifically, the most significant issues arose during middle-scale simulations of 600 nodes and high-traffic patterns exceeding 100 msg/s. In these scenarios, most simulations either failed to complete reliably or broke down entirely before finishing. 
From b063c4cc3b0ee2837ae7dbbb81f0dd3858759f4e Mon Sep 17 00:00:00 2001 From: Daimakaimura <17453177+Daimakaimura@users.noreply.github.com> Date: Thu, 2 Nov 2023 07:35:40 +0000 Subject: [PATCH 12/20] Review --- rlog/2023-09-26-wakurtosis-retro.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rlog/2023-09-26-wakurtosis-retro.mdx b/rlog/2023-09-26-wakurtosis-retro.mdx index 8e982e4b..316eb050 100644 --- a/rlog/2023-09-26-wakurtosis-retro.mdx +++ b/rlog/2023-09-26-wakurtosis-retro.mdx @@ -111,7 +111,7 @@ This experience emphasized the importance of anticipating potential bottlenecks It also highlighted the value of blending practical testing and theoretical modeling to gain meaningful insights. Integrating Kubernetes and Shadow represents a renewed commitment to pushing the boundaries of what is possible in large-scale protocol simulation. -This aims not just to rigorously stress test Waku, but to set a precedent for how to approach, design, and execute such simulations overall going forward. +This aims not just to rigorously stress test Waku and other P2P network nodes, but to set a precedent for how to approach, design, and execute such simulations overall going forward. Through continuous learning, adaptation, and innovation, we remain dedicated to achieving the most accurate, reliable, and extensive simulations possible. From c06a2e762db51941a55289c85abbbee1eb7325b2 Mon Sep 17 00:00:00 2001 From: Daimakaimura <17453177+Daimakaimura@users.noreply.github.com> Date: Thu, 2 Nov 2023 07:40:19 +0000 Subject: [PATCH 13/20] "For many scenarios ..." 
--- rlog/2023-09-26-wakurtosis-retro.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rlog/2023-09-26-wakurtosis-retro.mdx b/rlog/2023-09-26-wakurtosis-retro.mdx index 316eb050..c43a27e4 100644 --- a/rlog/2023-09-26-wakurtosis-retro.mdx +++ b/rlog/2023-09-26-wakurtosis-retro.mdx @@ -42,7 +42,7 @@ Kurtosis' constraint of running all simulations on a single machine led to sever Specific limitations included: ##### CPU -To run the required parallel containers, our simulations demanded a minimum of 16 cores with 32 cores (64 threads) often employed. +To run the required parallel containers, our simulations demanded a minimum of 16 cores. For many scenarios we scaled up to 32 cores (64 threads). The essence of Wakurtosis simulations involved running multiple containers in parallel to mimic a network and its topology, with each container functioning as a separate node. Having the containers operate in parallel provided a basic yet realistic representation of network behavior. In this scenario, the CPU acts as the workhorse, needing to process the activities of every node simultaneously. From 5d85d8c9c78c01b0d5e2a069aa9e682692e06982 Mon Sep 17 00:00:00 2001 From: Daimakaimura <17453177+Daimakaimura@users.noreply.github.com> Date: Thu, 2 Nov 2023 07:50:22 +0000 Subject: [PATCH 14/20] "These challenges stemmed from the massive ..." --- rlog/2023-09-26-wakurtosis-retro.mdx | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/rlog/2023-09-26-wakurtosis-retro.mdx b/rlog/2023-09-26-wakurtosis-retro.mdx index c43a27e4..a8100f3c 100644 --- a/rlog/2023-09-26-wakurtosis-retro.mdx +++ b/rlog/2023-09-26-wakurtosis-retro.mdx @@ -27,7 +27,8 @@ Specifically, the most significant issues arose during middle-scale simulations In these scenarios, most simulations either failed to complete reliably or broke down entirely before finishing. 
Even when simulations managed to fully run, results were often skewed due to the inability of the infrastructure to inject the traffic. -These challenges stemmed from the massive hardware requirements for simulations at such scales and traffic loads, which had to be run on a single machine under Kurtosis. +These challenges stemmed from the massive hardware requirements for simulations. +Despite Kurtosis being relatively lightweight, it requires that the simulation be run on a single machine, which presents considerable hardware challenges given the scale and traffic load of the simulations. This led to inadequate sampling rates, message loss, and other data inconsistencies. The system struggled to provide the computational power, memory capacity, and I/O throughput needed for smooth operations under such loads. From f0ec723cc8246b8df7c1f8bc8f85ff423a5fed0d Mon Sep 17 00:00:00 2001 From: Daimakaimura <17453177+Daimakaimura@users.noreply.github.com> Date: Thu, 2 Nov 2023 08:15:40 +0000 Subject: [PATCH 15/20] Line 48 rephrasing --- rlog/2023-09-26-wakurtosis-retro.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rlog/2023-09-26-wakurtosis-retro.mdx b/rlog/2023-09-26-wakurtosis-retro.mdx index a8100f3c..dc141f78 100644 --- a/rlog/2023-09-26-wakurtosis-retro.mdx +++ b/rlog/2023-09-26-wakurtosis-retro.mdx @@ -45,7 +45,7 @@ Specific limitations included: ##### CPU To run the required parallel containers, our simulations demanded a minimum of 16 cores. For many scenarios we scaled up to 32 cores (64 threads). The essence of Wakurtosis simulations involved running multiple containers in parallel to mimic a network and its topology, with each container functioning as a separate node. -Having the containers operate in parallel provided a basic yet realistic representation of network behavior. 
+Operating the containers concurrently—as opposed to a sequential, one-at-a-time approach—allowed us to simulate network behavior with greater fidelity, closely mirroring the simultaneous node interactions that naturally occur within real-world network infrastructures. In this scenario, the CPU acts as the workhorse, needing to process the activities of every node simultaneously. Our computations indicated a need for at least 16 cores to ensure seamless simulations without lag or delays from overloading. However, even higher core counts could not robustly reach our target scale due to inherent single-machine limitations. From eaf6d75be3dd6af8807bc32383c5a8d25172fe90 Mon Sep 17 00:00:00 2001 From: Daimakaimura <17453177+Daimakaimura@users.noreply.github.com> Date: Thu, 2 Nov 2023 09:48:44 +0000 Subject: [PATCH 16/20] Line 70 edit --- rlog/2023-09-26-wakurtosis-retro.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rlog/2023-09-26-wakurtosis-retro.mdx b/rlog/2023-09-26-wakurtosis-retro.mdx index dc141f78..435cae96 100644 --- a/rlog/2023-09-26-wakurtosis-retro.mdx +++ b/rlog/2023-09-26-wakurtosis-retro.mdx @@ -66,7 +66,7 @@ As the number of containers (nodes) increased, the simultaneous read/write opera ##### ARP table exhaustion Another important issue we encounteres is the exhaustion of the ARP table. -The Address Resolution Protocol (ARP) is pivotal for routing, translating IP addresses to MAC addresses so data packets can be correctly delivered within a local network. +The Address Resolution Protocol (ARP) is pivotal for delivering Ethernet frames, translating IP addresses to MAC addresses so data packets can be correctly delivered within a local network. However, ARP tables have a size limit. With the vast number of containers running, we quickly ran into situations where the ARP tables were filled to capacity, leading to routing failures. 
From 22b4bccfe461a712f179495ff3cf56037452b1a4 Mon Sep 17 00:00:00 2001 From: Daimakaimura <17453177+Daimakaimura@users.noreply.github.com> Date: Thu, 2 Nov 2023 09:53:27 +0000 Subject: [PATCH 17/20] Update rlog/2023-09-26-wakurtosis-retro.mdx Co-authored-by: kaiserd <1684595+kaiserd@users.noreply.github.com> --- rlog/2023-09-26-wakurtosis-retro.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rlog/2023-09-26-wakurtosis-retro.mdx b/rlog/2023-09-26-wakurtosis-retro.mdx index dc141f78..ab3633c5 100644 --- a/rlog/2023-09-26-wakurtosis-retro.mdx +++ b/rlog/2023-09-26-wakurtosis-retro.mdx @@ -83,7 +83,7 @@ These constraints were exacerbated by limitations in the orchestration language The array of hardware and software limitations imposed by Kurtosis had significant ramifications on our testing capabilities. The constraints primarily manifested in the inability to realistically simulate diverse network configurations and conditions. This inflexibility in network topologies was a significant setback. -Moreover, when it came to protocol implementation, Kurtosis's approach was rather rudimentary. +Moreover, when it came to protocol implementation, Kurtosis' approach was rather rudimentary. Relying on a basic gossip model, the platform missed capturing the nuances that are critical for deriving meaningful insights from the simulations. 
### The Pivot to Kubernetes and Shadow From 21f9b3385d9626dfe354bec5c1bc284ad58c6fb6 Mon Sep 17 00:00:00 2001 From: Daimakaimura <17453177+Daimakaimura@users.noreply.github.com> Date: Thu, 2 Nov 2023 10:28:47 +0000 Subject: [PATCH 18/20] Added references section --- rlog/2023-09-26-wakurtosis-retro.mdx | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/rlog/2023-09-26-wakurtosis-retro.mdx b/rlog/2023-09-26-wakurtosis-retro.mdx index 78335ae8..330811b0 100644 --- a/rlog/2023-09-26-wakurtosis-retro.mdx +++ b/rlog/2023-09-26-wakurtosis-retro.mdx @@ -115,4 +115,12 @@ Integrating Kubernetes and Shadow represents a renewed commitment to pushing the This aims not just to rigorously stress test Waku and other P2P network nodes, but to set a precedent for how to approach, design, and execute such simulations overall going forward. Through continuous learning, adaptation, and innovation, we remain dedicated to achieving the most accurate, reliable, and extensive simulations possible. +#### References + +1. [Kurtosis Framework](https://www.kurtosis.com/) +2. [The Shadow Network Simulator](https://shadow.github.io/) +3. [Kubernetes](https://kubernetes.io/docs/) +4. [Waku Protocol](https://rfc.vac.dev/spec/10/) +5. [Wakurtosis](https://github.com/vacp2p/wakurtosis) +6. 
[Address Resolution Protocol (ARP)](https://datatracker.ietf.org/doc/html/rfc826) From 8719d818899deabddff690b924023453d1f370ea Mon Sep 17 00:00:00 2001 From: Daimakaimura <17453177+Daimakaimura@users.noreply.github.com> Date: Thu, 2 Nov 2023 10:33:58 +0000 Subject: [PATCH 19/20] Added comment on shadow limitations --- rlog/2023-09-26-wakurtosis-retro.mdx | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/rlog/2023-09-26-wakurtosis-retro.mdx b/rlog/2023-09-26-wakurtosis-retro.mdx index 330811b0..1ac869a6 100644 --- a/rlog/2023-09-26-wakurtosis-retro.mdx +++ b/rlog/2023-09-26-wakurtosis-retro.mdx @@ -93,8 +93,8 @@ The major advantage that Kubernetes brings to the table is its robust support fo Even though this transition demands a considerable architectural overhaul, we believe that the potential benefits of Kubernetes' flexibility and scalability are worth the effort. Alongside Kubernetes, we incorporated [https://shadow.github.io/](Shadow) into our testing and simulation toolkit. -Shadow's unique strength lies in its ability to run real application binaries on a simulated network, offering a high level of accuracy even at greater scales. -With Shadow, we're confident in pushing our simulations beyond the 50K-node mark. +Shadow's unique strength lies in its ability to run real application binaries on a simulated network, offering a high level of accuracy even at greater scales. However, this approach also has limitations, as it does not accurately simulate CPU times and resource contention, which can lead to less realistic performance modeling in scenarios where these factors are significant. +With Shadow, we are hopeful of pushing our simulations beyond the 50K-node mark. Moreover, since Shadow employs an event-based approach, it not only allows us to achieve these scales but also opens up the potential for simulations that run faster than real-time scenarios.
Additionally, Shadow provides out-of-the-box support for simulating different QoS parameters like delay, loss, and bandwidth configurations on the virtual network. From bf8802b82c89df6f630c4dbdaf9e084be6804237 Mon Sep 17 00:00:00 2001 From: Daimakaimura <17453177+Daimakaimura@users.noreply.github.com> Date: Thu, 2 Nov 2023 10:35:45 +0000 Subject: [PATCH 20/20] Fixed references formating to match other posts --- rlog/2023-09-26-wakurtosis-retro.mdx | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/rlog/2023-09-26-wakurtosis-retro.mdx b/rlog/2023-09-26-wakurtosis-retro.mdx index 1ac869a6..a419838e 100644 --- a/rlog/2023-09-26-wakurtosis-retro.mdx +++ b/rlog/2023-09-26-wakurtosis-retro.mdx @@ -117,10 +117,10 @@ Through continuous learning, adaptation, and innovation, we remain dedicated to #### References -1. [Kurtosis Framework](https://www.kurtosis.com/) -2. [The Shadow Network Simulator](https://shadow.github.io/) -3. [Kubernetes](https://kubernetes.io/docs/) -4. [Waku Protocol](https://rfc.vac.dev/spec/10/) -5. [Wakurtosis](https://github.com/vacp2p/wakurtosis) -6. [Address Resolution Protocol (ARP)](https://datatracker.ietf.org/doc/html/rfc826) +- [Kurtosis Framework](https://www.kurtosis.com/) +- [The Shadow Network Simulator](https://shadow.github.io/) +- [Kubernetes](https://kubernetes.io/docs/) +- [Waku Protocol](https://rfc.vac.dev/spec/10/) +- [Wakurtosis](https://github.com/vacp2p/wakurtosis) +- [Address Resolution Protocol (ARP)](https://datatracker.ietf.org/doc/html/rfc826)