- The downtime of the IOTA Tangle last week was caused by an external spam test.
- The test was probably conducted by a larger company that may be interested in adopting IOTA.
As reported by CNF, the IOTA mainnet stood still for more than half a day from November 12 to 13. While rumors circulated about the possible cause of the Tangle’s downtime, the only thing clear at the time was that the coordinator had stopped generating milestones. In a report published yesterday on GitHub, the IOTA Foundation has now provided a detailed account of the events.
According to the report, the incident began on November 12, 2020 at 19:45 UTC and was fully resolved on November 13, 2020 at 12:37 UTC. The Tangle came to a standstill because of a spam test that was initiated by the community and coordinated in advance with the IOTA Foundation. As the organization behind IOTA states, “no single change in node software” or any other component of the network was responsible for the incident.
During the test, the network reached over 1,000 TPS. According to the IOTA Foundation, the network collapsed during the final phase of the test, when “the infrastructure hosting some of the critical components of the network reached its IOPS processing limits.” In terms of the cause, the IOTA Foundation further explained:
The infrastructure and by that subsequently the deployed software were not able to process all the transactions coming into the network. This resulted in the coordinator emitting milestones for transactions that were not persisted due to the aforementioned IO bottleneck.
Due to the load on the network as a whole, the Coordinator was unable to gossip the necessary transactions to solidify the emitted milestones before it was halted, leading to the entire network not being able to solidify.
To resolve the incident, a new version of Hornet was released on November 13, and the infrastructure was updated to this version of the node software. According to the IOTA Foundation, the incident and the preceding spam event helped to refine its internal “response protocols”. The Foundation also learned that “the IOPS restrictions on the infrastructure around the coordinator need to be improved”.
As a result, a redundancy mechanism has been set up to ensure that all transactions sent to and from the coordinator are stored. In addition, a mechanism has been implemented so that the coordinator reacts to the state of the network and automatically pauses the issuing of milestones when it approaches the limits of its hardware resources.
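The report does not describe how this safeguard is implemented. Purely as an illustration of the idea, a load-aware gate for milestone issuance could look like the following sketch. The class name `MilestoneGate`, the thresholds, and the IOPS figures are hypothetical and are not taken from the Coordinator or Hornet code.

```python
from dataclasses import dataclass


@dataclass
class MilestoneGate:
    """Hypothetical gate that pauses milestone issuance under heavy IO load."""

    iops_limit: float           # assumed hardware ceiling (illustrative value)
    pause_ratio: float = 0.9    # pause once utilization reaches 90% (assumption)
    resume_ratio: float = 0.7   # resume once utilization falls to 70% (assumption)
    paused: bool = False

    def update(self, observed_iops: float) -> bool:
        """Record the latest IOPS reading; return True if a milestone may be issued."""
        utilization = observed_iops / self.iops_limit
        if self.paused:
            # Stay paused until the load has clearly dropped (hysteresis).
            if utilization <= self.resume_ratio:
                self.paused = False
        elif utilization >= self.pause_ratio:
            self.paused = True
        return not self.paused


gate = MilestoneGate(iops_limit=1000)
for reading in (500, 950, 800, 600):
    print(reading, gate.update(reading))
```

Using two thresholds instead of one keeps the gate from flapping between paused and active when the load hovers around the limit.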
Details of the IOTA community spam test
Via IOTA’s Discord channel, an organizer of the spam test also revealed more details about the specifications of the test. According to the organizer, over 2,000 nodes were in use. On average, 1,892 nodes were fully synchronized during the test, and they were distributed across 24 different locations worldwide.
In addition to explaining the procedure for the spam test, the organizer also hinted that there might be “a little bit more behind” the test, specifically a larger company:
There were 3 waves, 5 minutes about 600-750, 5 minutes break to calm down the network and potentially get back into sync for external nodes, 10 minutes about 1000TPS, again 5 minutes break and then actually 30 minutes 1800-2000 TPS were planned, but here the monitoring metrics left us and the mentioned problem caught up with us.
Apart from these statistics, which I don’t find very interesting, maybe another approach …is it not more exciting to think about why someone would do such a spam test? So much for that: Not just for the fun of it. I mean it costs money, a lot of money, you can’t just take these orders of magnitude of resources. Isn’t it nice when you know that there is a little bit more behind it?! Future adaptation incomming. Therefore a small thanks to my company.
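To put the three waves described by the organizer into perspective, the planned transaction volume can be estimated from the stated durations and TPS ranges. The sketch below is simple arithmetic under the assumption that each wave ran at a constant rate within the quoted range. The third wave was only planned and, according to the organizer, was not completed.

```python
# (label, duration in minutes, lower TPS bound, upper TPS bound), as quoted above
WAVES = [
    ("wave 1", 5, 600, 750),
    ("wave 2", 10, 1000, 1000),
    ("wave 3 (planned)", 30, 1800, 2000),
]


def wave_volume(minutes, tps_low, tps_high):
    """Return the (low, high) number of transactions for one wave."""
    seconds = minutes * 60
    return seconds * tps_low, seconds * tps_high


def total_volume(waves):
    """Sum the (low, high) transaction estimates over all waves."""
    volumes = [wave_volume(m, lo, hi) for _, m, lo, hi in waves]
    return sum(v[0] for v in volumes), sum(v[1] for v in volumes)


for label, minutes, lo, hi in WAVES:
    print(label, wave_volume(minutes, lo, hi))
print("total", total_volume(WAVES))
```

Under these assumptions, the full plan would have amounted to roughly 4.0 to 4.4 million transactions, most of them in the final 30-minute wave.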