
substrate-etl Retroactive funding: Q3+Q4 2022-Q1+Q2 2023

in Medium Spender
2 years ago
data
infrastructure
analytics
Executed

substrate-etl is a large-scale index of Polkadot and Kusama and their parachains, based on Polkaholic.io, a multichain block explorer/indexer. Using this data warehouse, anyone can query decoded blocks, extrinsics, events, and calls across both ecosystems.

We onboarded Polkadot Data into Google Public Datasets, announced in September 2023:

  • Enhancing Google Cloud’s blockchain data offering with 11 new chains in BigQuery
  • Polkadot - Harness the power of Polkadot's BigQuery Public Dataset

The substrate-etl index results in two large datasets (crypto_polkadot and crypto_kusama) covering the two relay chains and over 70 parachains. This places Polkadot/Kusama alongside other Web3 ecosystems in BigQuery and supports complex large-scale SQL analyses using BigQuery's fast query performance on extremely large datasets. substrate-etl enables Dune-like analytics to guide business intelligence, inform engineering decision making, and showcase ecosystem growth.

Power users include Parity Data (led by Karim Jedda and Pavla Mijic), Subwallet/DotInsights, Web3metrics,
Dolpha.com, ChainViz and many data analysts within parachains and outside the ecosystem.

Quick Start

For a full guide, see substrate-etl, but you can get started with the following Analytics Hub links:

  • Polkadot on Google Analytics Hub
  • Kusama on Google Analytics Hub
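As a minimal sketch of what querying these datasets looks like (the `google-cloud-bigquery` client usage is standard, but the `bigquery-public-data.crypto_polkadot` dataset path and the `blocks`/`block_time` table and column names are assumptions to verify against the live schema):

```python
def daily_block_count_sql(dataset: str = "bigquery-public-data.crypto_polkadot",
                          days: int = 7) -> str:
    """Build a BigQuery SQL string counting blocks per day over a recent window.

    The dataset path and the blocks/block_time names are assumptions about
    the public crypto_polkadot schema; verify them in the BigQuery console.
    """
    return f"""
        SELECT DATE(block_time) AS day, COUNT(*) AS blocks
        FROM `{dataset}.blocks`
        WHERE block_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL {days} DAY)
        GROUP BY day
        ORDER BY day
    """

# Running it requires the google-cloud-bigquery package and Cloud credentials
# (e.g. `gcloud auth application-default login`):
# from google.cloud import bigquery
# for row in bigquery.Client().query(daily_block_count_sql()):
#     print(row.day, row.blocks)
```

The same pattern extends to the extrinsics, events, and calls tables mentioned above.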

Roadmap

In addition to continuing support for substrate-etl/Polkaholic.io for existing and future chains, our plans for 2024 are to:

  • chart a course within Decentralized Futures for 2024, supporting Web3 Foundation's Data Analysis Tools for Substrate-based Blockchains (e.g. traces and staking), and position substrate-etl + Polkaholic APIs to support a Dune Analytics-style data service for Polkadot/Kusama with leading teams in the ecosystem
  • develop + support a Polkadot Builder Collective with a "builders supporting builders" orientation
  • build support for CoreJam+CorePlay+DA and extend support for ink!/WASM contracts
  • extend substrate-etl to support raw + fully-decoded EVM datasets (transactions+logs), enabling Substrate+EVM hybrid chain analytics, as more parachains incorporate hybrid chain designs

Proposal Details

Requested: 99,025 DOT

Proposal Details here

Proposal Costs here

About Colorful Notion

Colorful Notion is led by Sourabh Niyogi and Michael Chung, based in the SF Bay Area. We have been active in the Polkadot ecosystem over the last 2 years and look forward to building with others in CoreJam/Polkadot 2.0’s “map reduce” ubiquitous computing future.

We are grateful to everyone who has helped us and hope we can pay it forward by helping others.

Comments (7)

2 years ago

Your web site seems rough, and why do we need you? We already have Subscan.

2 years ago

Thanks for your note. However, see this for an alternative viewpoint. We have the highest respect for Subscan and have focused on substrate-etl indexing + analytics instead of replicating their level of polish.

To address the roughness differently, we are looking to work with new partners like ChainIDE (after a successful collaboration with them on ink! WASM Contract Explorer this summer), and in general designing a strategy for dealing with the complexity of hybrid Substrate+EVM chains, CoreTime/CoreJam/CorePlay, and more of a Dune Analytics type future for Polkaholic.io which requires a data warehouse backend.

On the "why we need X, we already have Y" point: I am glad the Polkadot founders didn't listen to the people who said "why do we need Polkadot, we already have Ethereum" -- they didn't, and our subsection of the world is better for it. It's always important for there to be people who push the boundaries to do new things. We see our last year as training us to do new things in the CoreXYZ future, and I can't convey enough how committed we are to doing something with more impact than building a block explorer with a little more polish, or an indexer alone. We would like to build high-growth, high-scale solutions for millions of people here; we did not show up to build a web site.

2 years ago

Hi,

Thanks for the proposal, some questions below.

  1. Have you ever been funded by the W3F, and if so, could you please provide the details including any open/undelivered applications?
  2. Can you please provide the details of any treasury grants you received for this or related projects and the related referenda? Also related to this, how is the team related to Polkaholic?
  3. You have two more full-time people on your team as can be seen on your costs sheet, but the sheet only has initials. Can you provide their details? Also, it would be great to get to know more about the web presence of all 4 team members.
  4. We're already in Q4 '23. Why wasn't Q3 '23 included in the proposal? Also, you probably have an estimate for Q3 and Q4 summed up; it would be good if you could share that too so we can have an overall up-to-date yearly cost.
  5. Can you provide the source code repository? I went through the documents, sorry if I missed it.
  6. I have previously tried and accessed the data sets on Google, and the APIs on Polkaholic, and found the XCM API quite useful. Yet the data sets on Google seemed quite limited in scope. Is there documentation of the data sets and the extent of the data they store? For example, can one observe the change of the staked amount for a nominator over time?

Overall, although I find the direction useful, your costs seem to exceed 600K USD per year, even more if you have received other funding from either the treasury or the W3F, and I find this total very unconvincing for the product provided.

Thanks,
Kutsal | Helikon

2 years ago

@helikon  Very awesome questions

  1. (a) We have an open W3F application on Deep Account Analytics and delivered Milestone 1 + Milestone 2 for $30K. The W3F grant process does not support running production applications with high compute/storage to serve the community, but this may change or not. (b) We have completed a ChainIDE+Polkaholic WASM Contract Explorer Integration.
    We worked on 1(a)+(b) in Q2+Q3 2023 and adjusted our cost model for Q2 2023 accordingly (and would adjust Q3/Q4 2023 if we have to do this via the Treasury). I do think it's important that you ask this type of accountability question and check the math so there isn't any double counting by any team. I appreciate what you are doing; it's very important.

  2. We did not receive any previous Treasury grants for this. We did attempt one Polkaholic Treasury proposal in Kusama OpenGov1 (it got 8 AYEs and 0 NAYs, insufficient to pass; 4 council members were too busy), but after Decoded 2022 we decided to focus on the substrate-etl/analytics side and work with Parity Data on a Polkadot Data Alliance bounty at Parity Data's direction. That did not materialize this Spring/Summer, and given the "decentralize Parity" effort it is on hold until decentralisation activities take concrete shape.

  3. TJ+FN have no web presence -- but since you ask, TJ is our systems engineer Tapas, who keeps this running: he operates the Google Cloud setup with a sizeable GKE cluster of Substrate archive nodes + crawlers. FN (Francesca, my wife) makes sure Google and our team get paid; she is part-time admin but in a literal sense has been funding this and is our food+water+cloud. Some people are amazed we can do so much with a 4-person team, but we are hardly the only team like this.

  4. Q3 involved SN+MC (a) working on the W3F grants referenced in 1(a); (b) the inkubator program referenced in 1(b); and (c) a parallel effort called evm-etl, where we decode EVM chains like Ethereum, Arbitrum, Optimism, and Base (started in mid-May), which can also be applied to Moonbeam, Astar, etc. but was of secondary importance given the relative DAUs. This month, we were asked by Pavla to model the small but growing number of hybrid chains in substrate-etl, which we are now ready to do given (c). We got paid for (a) in Nov but did not get paid for (b), and (c) is far more of an EVM project than a Substrate project -- we wished to have a basic 1.0 solution that works across Substrate+EVM ecosystems and account for the (a-c) payments before modeling Q3+Q4 2023 completely. At the end of Q2, we split our Google Cloud projects into substrate and non-substrate to model Colorful Notion's Polkadot and non-Polkadot activities, but it is a bit tricky with hybrid Substrate+EVM chains having two chain IDs (1284 vs 2004, 592 vs 2006, etc.). As for a forecast of yearly costs going forward: we made substrate-etl run smoothly enough that it doesn't need both SN+MC working on it full time, but there is maintenance (TJ+GC+.25 SN) and then there is innovation (SN+MC+X FTEs doing new things). We are looking to chart a course for innovation with W3F now, but the maintenance of that innovation may still reside here in the Treasury. I am not the architect of this; this is just what they (W3F+Parity) tell me to do...

  5. Colorful Notion's relevant repos are Polkaholic and substrate-etl.

  6. substrate-etl documents the "decoded" datasets, but generally a SQL-knowledgeable data analyst has to spend a few minutes to an hour mapping those decoded datasets into SQL for any specific question. For high-interest "deep" activity (staking + democracy now, assetconversion soon, coretime + coreplay later) it is reasonable to build special-purpose pallet-centric datasets, or area-specific datasets like "dextrades", that take the decoded datasets to the next level rather than simply bundling them into DefiLlama dashboards or block explorer charts alone. For your specific question on staking nominators, join the Polkadot Data Alliance Matrix chat room and/or email michael@colorfulnotion.com -- your question likely maps to a new dataset output of polkadot_analytics, and we are happy to figure out a plan that is both specific (here is the pallet-specific data for Polkadot) and general (how should multichain pallets + domains like "dextrades" be analyzed across the ecosystem) together.
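As an illustration of that decoded-datasets-to-SQL mapping (the `events` table and its `section`/`method`/`data`/`block_time` column names are assumptions about the decoded schema, not confirmed here), a staking Bonded history for one account might be queried like this:

```python
def nominator_bonded_sql(account: str,
                         dataset: str = "bigquery-public-data.crypto_polkadot") -> str:
    """Build a BigQuery SQL string listing staking Bonded events for one account.

    The events table and its section/method/data/block_time columns are
    assumptions about the decoded schema; verify them in BigQuery first.
    """
    return f"""
        SELECT block_time, data
        FROM `{dataset}.events`
        WHERE section = 'staking'
          AND method = 'Bonded'
          AND JSON_VALUE(data, '$[0]') = '{account}'
        ORDER BY block_time
    """

# Running it requires the google-cloud-bigquery package and Cloud credentials:
# from google.cloud import bigquery
# rows = bigquery.Client().query(nominator_bonded_sql("<nominator SS58 address>"))
```

Tracking the bonded amount over time would then be a running sum over these rows, which is the kind of question a purpose-built staking dataset could answer directly.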

On the overall figure that we are at 120K DOT/yr in aggregate for all of this (with GC being a significant component): that's accurate. I'm proud of the work we've done as a 4-person team and think we can do more, not just to maintain and extend this, but to innovate. I hope the combination of Treasury+W3F can help us chart a course on the innovation possibilities in 2.0's CoreJam+CorePlay+CoreTime with this work as a foundation.

Proposal Passed
