Cloudflare Enterprise Plan for Subscan from August 2021 to October 2021
Subscan is an aggregate high-precision blockchain explorer for Substrate-based chains. For over a year, we have supported several influential blockchains, including Polkadot, Kusama, Rococo, and Westend. For many users, Subscan is their first stop in the blockchain world. As we continue to deliver new features and support more blockchains, our user base is growing steadily. Therefore, high robustness and availability are currently among our major concerns.
Why Can't We Do Without a Firewall?
In the past few months, we have been the target of multiple DDoS attacks. For example, here are some screenshots from the Google Cloud Platform Load Balancer and our Nginx monitoring system taken during one of the attacks:
The attacker generated many more packets and requests per second than usual. This forced us to find a firewall to protect Subscan and filter out DDoS traffic. Within the next few days, we enabled Cloudflare as our firewall and CDN provider. However, attackers still regard Subscan as a valuable target. The following chart from Cloudflare shows that it successfully blocked many potentially malicious requests in a single day:
Fortunately, we have been able to mitigate the impact of those attempts, and Subscan has not had any major outage due to DDoS attacks since we started using Cloudflare.
Why Do We Need the Cloudflare Enterprise Plan?
By June, we had been using Cloudflare for several months. During that month, a considerable number of users in certain regions reported multiple service interruptions. Thanks to the complete monitoring pipeline built by the Subscan DevOps team, we noticed the issues shortly after they were raised. For example, here are two dashboard snapshots from June 2nd and June 8th, which show a very similar pattern:
After careful diagnosis and analysis, we pinpointed the cause: the interruptions coincided with Cloudflare incidents affecting the Pro plan. The screenshot below shows the announcement and postmortem our DevOps team published at the time to explain the incident:
After consulting with Cloudflare sales and support, we concluded, just as they suggested, that the enterprise plan is a good choice that meets our high-availability requirements. Argo Smart Routing, a core feature of the enterprise plan, can re-route our traffic around regional network failures. The enhanced WAF rules can also block attackers more accurately and effectively. Beyond that, the plan includes several other features that we can use to improve Subscan's accessibility and reliability.
What Other Features Can the Cloudflare Enterprise Plan Offer Us?
Opting into the Cloudflare enterprise plan will bring other advantages besides higher availability.
For example, Argo Smart Routing promises an estimated 30% improvement in network latency.
Also, a relatively large proportion of Subscan users come from mainland China, where Internet access can sometimes be unpredictable. With the Cloudflare enterprise plan, we can use Cloudflare's China network infrastructure to offer a faster and more reliable experience for visitors inside China.
Technical support for Cloudflare enterprise users is also clearly better than for other plans. In the event of an incident, we will be able to reach the support engineers by email and even by phone, rather than only through tickets. This allows us to be more proactive in providing highly available services to Subscan users in the Polkadot ecosystem.
The Financial Support
We have received financial support from the Polkadot treasury during the past month, and that support has proved fruitful. This proposal requests 967 DOT to finance Subscan's upgrade to the Cloudflare enterprise plan: 5,000 USD per month for 3 months in total (August 2021 to October 2021), converted at the EMA30 price of 1 DOT ≈ 15.5 USD.
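The requested amount follows directly from the monthly price, the three-month term, and the EMA30 price. The quick check below reproduces it; note that the proposal does not state its rounding rule, so rounding down to a whole DOT is our assumption, chosen because it matches the stated 967 DOT.

```python
import math

MONTHLY_COST_USD = 5000  # quoted monthly price of the enterprise plan
MONTHS = 3               # August 2021 to October 2021
DOT_PRICE_USD = 15.5     # EMA30 price used in the proposal

total_usd = MONTHLY_COST_USD * MONTHS  # 15000 USD
total_dot = total_usd / DOT_PRICE_USD  # about 967.74 DOT
requested_dot = math.floor(total_dot)  # assumption: rounded down to a whole DOT

print(total_usd, round(total_dot, 2), requested_dot)  # 15000 967.74 967
```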
The quotation of the enterprise plan for Subscan can be found on GitHub.
If we have missed any information or you have any questions, please feel free to contact us.
References
- Cloudflare Enterprise plan: https://www.cloudflare.com/plans/enterprise/
- Cloudflare Argo: https://www.cloudflare.com/products/argo-smart-routing/
- Cloudflare incident on June 2nd: https://www.cloudflarestatus.com/incidents/zbzjv8sm3g94
- Cloudflare incident on June 8th: https://www.cloudflarestatus.com/incidents/2hnqwq90dlrk
The source code of this proposal is hosted on GitHub.
Comments
While subscan is admittedly a centralized service, relying on Cloudflare sounds like yet another dependency on centralized services and yet another NON-trustless intermediary.
If Cloudflare goes down (and that happens!), subscan goes down.
If Cloudflare gets hacked or becomes malicious itself, none of the data served by subscan can be trusted any more :(
The more centralized services subscan depends on, the higher the chances for the service to go down without having any say.
Did the team investigate options that would not ONLY rely on yet another centralized solution?
On a side note:
is not a surprise; they will likely not send you to other options, and they will likely not tell you to go for a smaller plan... They are "sales".
Hello Chevdor,
Thank you for your feedback! My name is Way. I'm a DevOps engineer on the Subscan team, and I'd like to answer your questions and give the community more details on how Subscan came to use Cloudflare.
We tried other approaches before choosing Cloudflare:
In fact, we had concerns very similar to yours. Before migrating to Cloudflare, we tried a few simpler solutions, such as increasing the number of Nginx instances, tuning the Nginx parameters, enabling rate limiting by client IP address, and adding a backup Google Cloud Platform Load Balancer ingress IP. We also consulted Nginx's official blog post "Mitigating DDoS Attacks with NGINX and NGINX Plus" to make sure we followed best practices for tuning Nginx.
However, the results were not good. Even when we redirected traffic to the backup load balancer after the primary one went down, the service could remain unavailable for quite a while, because the attackers soon targeted the backup as well.
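In Nginx, rate limiting by client IP address is usually configured with the `limit_req` module, whose documentation describes it as using the "leaky bucket" method. The Python sketch below illustrates that kind of per-address accounting under our own assumptions: `LeakyBucketLimiter`, `allow`, and the `rate_per_sec`/`burst` parameters are hypothetical names, the logic only approximates Nginx's behavior, and it is not Subscan's actual configuration.

```python
import time
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class _Bucket:
    excess: float  # requests above the allowed rate still "in the bucket"
    last: float    # time of the last accepted request, in seconds


class LeakyBucketLimiter:
    """Hypothetical per-client-IP leaky-bucket limiter (illustration only)."""

    def __init__(self, rate_per_sec: float, burst: int) -> None:
        self.rate = rate_per_sec  # how fast the bucket drains
        self.burst = burst        # how many excess requests may queue up
        self._buckets: Dict[str, _Bucket] = {}

    def allow(self, client_ip: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        bucket = self._buckets.get(client_ip)
        if bucket is None:
            # First request from this address: accept with an empty bucket.
            self._buckets[client_ip] = _Bucket(excess=0.0, last=now)
            return True
        # Drain the bucket for the elapsed time, then add this request.
        excess = max(0.0, bucket.excess - self.rate * (now - bucket.last) + 1.0)
        if excess > self.burst:
            return False  # over the limit: reject without updating state
        bucket.excess, bucket.last = excess, now
        return True


limiter = LeakyBucketLimiter(rate_per_sec=1.0, burst=2)
print([limiter.allow("203.0.113.7", now=0.0) for _ in range(4)])  # [True, True, True, False]
print(limiter.allow("198.51.100.9", now=0.0))  # True: a different IP has its own bucket
```

Because every client address gets its own bucket, an attack spread across many addresses can keep each individual bucket under its limit, which is one general reason why per-IP rate limiting alone struggles against distributed attacks.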
We have a complete incident postmortem that covers our attempts during the past attacks. However, I'm afraid we can't make it public, since it is written in Chinese and contains sensitive information related to the company's monitoring and personnel. If the report would be helpful for the proposal, we would be happy to provide it privately.
We also researched other solutions while looking for a firewall provider:
We initially intended to use the DDoS mitigation service provided by the Google Cloud Platform HTTP Load Balancer. However, the problems are:
Our second option was an external firewall/CDN provider, for example, Akamai. The problems are:
Another choice was to deploy our own network infrastructure and firewall using open-source software. The problems are:
We consulted with sales AND support. Our DevOps engineers had multiple meetings with Cloudflare's technical support engineers, focusing on the technical details of the features included in the enterprise plan. In particular, we discussed whether those features could solve our existing problems. This helped us make an independent decision.
The enterprise plan is also a solution we identified on our own through the research mentioned above. Basically, we asked Cloudflare "can feature XX address the specific issue we had?", not "how can we address the issue?". We would not have approached them if a cheaper option, such as the Pro plan we had been using for months, had worked well.
Subscan currently cannot drop Cloudflare. Given the facts above, we are not able to remove the firewall, that is, Cloudflare:
Explorers are the most direct point of contact for end users. If Subscan's service is disrupted by attacks, users may attribute the problem to the blockchain network itself, not just to the explorer. Continuing to use the Pro plan instead of adopting the enterprise plan or deploying other solutions could harm users' experience and perception. I believe this is something neither we nor the Polkadot treasury want to see, so our interests are aligned in this regard.
From my personal perspective as a DevOps engineer, Subscan could certainly rely on fewer third-party services or dependencies to become more decentralized, for example by hiring network engineers and building our own network infrastructure. But for a relatively centralized blockchain explorer, the economic benefit of being completely free from external dependencies is too low. We have to find and maintain a "balance", and I think using Cloudflare as a CDN and firewall provider for Subscan does not break this balance. In the unlikely event that Cloudflare acts maliciously, whether deliberately or because it has been hacked, we can quickly switch from Cloudflare to another available provider (e.g. Akamai). Subscan still retains control, and the data stored in Subscan's database won't be tampered with. Using Cloudflare's Anycast network also reduces the risk of Subscan becoming inaccessible worldwide if our single ingress endpoint suffers a network failure, which is more decentralized.
I hope this helps answer your questions!
Regards,
Way