- Stellar’s network remained online thanks to validators that stayed active during the failure.
- The point of failure has not been determined; the investigation is ongoing.
- Most nodes in the Stellar network were unaffected; the exceptions include nodes operated by Lobstr and the SDF.
The Stellar network and XLM holders experienced a shock yesterday when a reported bug prompted Bitfinex to halt withdrawals. By 19:00 UTC on April 6, however, the Stellar Development Foundation (SDF) engineering team had completed repairs on the Horizon API cluster and the SDF validators. The organization has now released a report with more details.
As the blog post clarifies, Stellar’s network remained “online”, supported by validators that were unaffected by the failure and continued to process transactions on the blockchain without problems. The SDF stated:
(…) which is just the way a decentralized network is intended to work, and many of those validators continued to publish archives that keep track of ledger history, and that allow the halted nodes to fill in gaps when they need to recover from downtime.
Despite this, some exchanges, such as Bitfinex, halted XLM withdrawals, as its CTO Paolo Ardoino reported via Twitter. SDF engineers mitigated the failure by “quickly bringing new validators and a new instance of Horizon” back online.
What was the cause of the Stellar network’s problems?
The precise cause of the failure has not yet been determined. A preliminary investigation indicates that it may have been triggered by a particular ledger or by an operation in a specific ledger. According to the SDF, the “majority” of nodes in the network did not experience the failure; those that did include some operated by the SDF itself and by a wallet called Lobstr.
(…) because there is sufficient validator redundancy, the network continued to function as normal despite the temporary unavailability of SDF’s infrastructure. While we realize this was inconvenient for organizations that rely on public network access, it also demonstrates that the Stellar network persists independent of SDF validators.
In total, the affected nodes were down for around 10 hours. The failure was detected by the SDF’s infrastructure monitoring, which combines Runscope and Prometheus alerts. According to the report, the team responded “immediately”.
The organization emphasized that the network as a whole never stopped. Validators that remained in sync continued to process transactions in less than 5 seconds. The SDF adds a caveat:
However, some nodes, including those run by SDF and Lobstr, ceased to process transactions for about 9 hours. If you access the network via one of the affected nodes, you were not able to access the network to submit transactions during that time. If, however, you rely on one of the many unaffected nodes, your network access continued unabated.
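The difference between relying on one node and being able to fall back to another can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the endpoint URLs are hypothetical placeholders, `first_responsive` is a helper invented for this example, and `fake_probe` stands in for a real health check (for instance, an HTTP request with a timeout). None of this is taken from the SDF report.

```python
from typing import Callable, Iterable, Optional


def first_responsive(endpoints: Iterable[str],
                     probe: Callable[[str], bool]) -> Optional[str]:
    """Return the first endpoint for which probe() succeeds, else None."""
    for url in endpoints:
        try:
            if probe(url):
                return url
        except Exception:
            # Treat any probe error (timeout, refused connection) as "node down".
            continue
    return None


# Hypothetical endpoints: the first stands in for a halted node,
# the second for a node run by an unaffected operator.
ENDPOINTS = [
    "https://horizon.example-affected.org",
    "https://horizon.example-healthy.org",
]


def fake_probe(url: str) -> bool:
    # Stand-in for a real health check; simulates the halted node timing out.
    if "affected" in url:
        raise TimeoutError("node halted")
    return True


chosen = first_responsive(ENDPOINTS, fake_probe)
print(chosen)
```

A client configured with only the first endpoint would have been cut off for the duration of the outage, while one that can fall back to a second operator's node keeps submitting transactions.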
During the failure, the history of some ledgers was lost, but the engineering team is working to restore it with support from Tier 1 organizations. These organizations are responsible for publishing the complete network history precisely for situations such as yesterday’s.
It is estimated that about 43 ledgers, or 5 minutes of network history, will need to be restored. Furthermore, the SDF reported that withdrawals on all exchange platforms have resumed.
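As a rough sanity check on these figures, the short Python sketch below relates the number of ledgers to elapsed time. The function names are illustrative, and the inputs are the approximate numbers quoted in this article, so the results are estimates only.

```python
def implied_close_seconds(ledgers: int, minutes: float) -> float:
    """Average seconds per ledger implied by a ledger count and a duration."""
    return minutes * 60 / ledgers


def gap_minutes(ledgers: int, close_seconds: float) -> float:
    """Minutes of history covered by a number of ledgers at a given close time."""
    return ledgers * close_seconds / 60


# Figures quoted in the article: 43 ledgers, about 5 minutes.
per_ledger = implied_close_seconds(43, 5)

# The same 43 ledgers at a 5-second close time, the upper bound cited above.
at_five_seconds = gap_minutes(43, 5.0)

print(f"{per_ledger:.1f} s per ledger, {at_five_seconds:.1f} min at 5 s per ledger")
```

The quoted figures work out to roughly 7 seconds per ledger, while a 5-second close time would put the same 43 ledgers at about 3.6 minutes, so the "5 minutes" is best read as a rounded estimate.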