To respond to the feature matrix Dan talked about in the BP channel, I thought useful of describing what the dfuse stack does, and what it doesn’t do.
I hope Igor (or Rio :P) drops note about what Hyperion does and doesn’t do.
dfuse Search offers a simple indexing of actions. Think of it as an Elasticsearch collection called actions
, with those fields: “receiver, account, action, auth, scheduled, status, notif, input, event, ram.consumed, ram.released, db.key, db.table, data.account, data.active, data.active_key, data.actor, data.amount, data.auth, data.authority, data.bid, data.bidder, data.canceler, data.creator, data.executer, data.from, data.is_active, data.is_priv, data.isproxy, data.issuer, data.level, data.location, data.maximum_supply, data.name, data.newname, data.owner, data.parent, data.payer, data.permission, data.producer, data.producer_key, data.proposal_name, data.proposal_hash, data.proposer, data.proxy, data.public_key, data.producers, data.quant, data.quantity, data.ram_payer, data.receiver, data.requested, data.requirement, data.symbol, data.threshold, data.to, data.transfer, data.voter, data.voter_name, data.weight, data.abi, data.code”. You can then query on those fields to retrieve actions from history. The search engine will return the whole transaction, highlighting the actions that matched your query.
The product itself is fork-aware, provides multiple guarantees (thanks to custom cursors) not present in other products. It is a distributed system, which can provide more or less replication factors for different parts of the chain, and reversible segments of the chain are queryable in tiny 1-block indexes. This means real-time querying of the reversible segment, in both directions (ASC or DESC), also means you can have streaming search (real-time listening) on incoming blocks. The solution also allows you to filter out what you don’t want when indexing, on two axis: only a part of the history (so time-wise), and/or by flushing out unwanted content (filtering). Note also that this software (like all dfuse components) is detached from nodeos
execution: it can be used to re-index large networks in a matter of minutes (provided enough CPU power of course :), without the need to replay the chain. All dfuse components are also designed with parallelism in mind, to allow those re-indexings to be done in parallel. dfuse Search feeds from the Firehose.
But mind you, this is an index of raw actions. It doesn’t provide the current state, or the past state (although actions do convey state changes they caused, in the form of old_row
and new_row
). It does not do aggregation queries. It is not useful to get all the latest token balances of an account. It is pretty much overkill for wallets that need a (often short) list of recent transactions for a new account they’re serving. It requires huge amounts of RAM, and of storage if you want to keep everything. It also requires a K/V store to be loaded with the actual transactions and blocks contents, since the indexes only contain that: indexes. It’s great to find a needle in a haystack fast, but you better have a need for it, because the cost can be much higher (especially if you don’t filter anything
You can see it in action here. Here’s an architecture overview of its components. Search query docs.
The dfuse Firehose contains a stream of all the data. Think of it as a better SHIP. Something that can be consumed online because it doesn’t hit a node. It can be filtered on-demand when the user queries the service (instead of needing to configure a nodeos
process to filter out what it writes). It contains block state (with data to generate merkle proofs), all transactions traces, and rich data about actions (including RAM consumes/releases and their cause, and state deltas at the action level), all feature upgrade operations, both global and user-centric resource limits deltas, all deferred transactions events, creations, cancellations, etc… (yeah I know its deprecated but its there). Basically, it lacks absolutely nothing you could desire if you squeezed a node executing transactions. It is backed by two things: 1) files that contain past blocks (and their traces, etc…), chunked by 100 blocks, usually stored in some object storage, shared disk or whatever, those files include all forks seen… 2) a live feed from one or more nodeos
nodes (for high-availability). This service is what feeds all the other higher-level services. It is fork-aware (helps a consumer navigate forks), through a similar use of cursors as Search, for guaranteed linearity, across disconnections, etc…
It’s a very raw form of streaming blockchain data access, and does that extremely well, reliably and is also the fastest thing you’ll see as nodes race to push out data to consumers (if there’s 3 nodes in the cluster, the first to see a block pushes it out to consumers, the other 2 will be dedup’d out).
The Firehose is currently served as a gRPC service, with data being binary packed in Protobuf. See GitHub - dfuse-io/playground-firehose-eosio-go: Playground to play with EOSIO Firehose service (with stats for sample code and to start using it.
This service is not useful to query for a random transaction. It’s not made to query the current state, nor the past state (although actions will come with their state deltas). It is not fit for searching the history if you’re looking for a needle in the haystack (unless you want the system to process TBs of data, by opening all the files and parsing them, and taking a lot of time). It will certainly not help you listing your current token balances.
The dfuse State DB is a purpose-built piece of software, and specialized database to provide a full snapshot of the whole state at each block. Of course, it does not make sense to clone 8GB of RAM to storage each 0.5s, and thankfully, the whole 8GB doesn’t change at each block. State DB, backed solely by a K/V store, uses special-purpose indexing strategy to allow for quick querying of any state, at any block height. The service can also do on-the-fly decoding of ABIs, or provide the rows in binary. Its main purpose is to allow fetching of large tables in one consistent swift, which would be impossible to do reliably/consistently by hitting /v1/.../get_table_rows
: when iterating by chunks of 1000, at any moment, a new block could come and invalidate what you already fetched.
The general purpose tech is GitHub - dfuse-io/fluxdb: A temporal database for blockchain state … applied in EOSIO as the StateDB here: dfuse-eosio/statedb at develop · dfuse-io/dfuse-eosio · GitHub It’s mostly exposed through REST today, but exists as a gRPC service defined here: proto-eosio/statedb.proto at master · dfuse-io/proto-eosio · GitHub (which supports streaming of the rows in the table, instead of a buffered dump).
This service also supports parallelized processing for extremely quick processing of large networks. However, by design it requires the linear history of row additions, updates and removals for a given table, so a first pass of parallel processing slices the full history into 100 full histories (each containing only a subset of tables). A second parallel operation can then insert those much smaller full-histories, tackling only 1/100th of the tables in each slice.
The live server is fork-aware, and allows querying of the reversible segment.
This service does not do any aggregation. It also does not support conditional filtering, or pagination (although the 2 latter could eventually be implemented). Today, it does not implement secondary indexes either. It is used by other services that bootstrap from a snapshot, and streams changes onwards (like the tokenmeta
service).
See GET /v0/state/table | dfuse docs and other REST endpoints under /v0/state
.
The dfuse Account History, is yet another purpose-built database, also backed by a K/V store. Its design is to provide a fixed number of historical actions, for each account, or for each tuple of contract+account. Say 1000 transactions per account. The goal is to be able to provide the full history for accounts that don’t do crazy amounts of transactions, yet flush out those realllly spammy accounts that do a lot.
Again, this process was designed to be able to process the history in parallel, and purge the extraneous data going forward. It is an autonomous service, and runs indepenently of the other user-facing services. It feeds from the dfuse Firehose.
It is exposed through GraphQL. You can try a sample query here: GraphiQL: Discover the dfuse GraphQL interface
It also has a gRPC interface (not exposed publicly on our hosted version). You can find its gRPC definition here: proto-eosio/accounthist.proto at master · dfuse-io/proto-eosio · GitHub
tokenmeta
is yet another specialized service to serve the token balances of users, and the token hodlers for a given contract, extremely fast, and with a single query. It holds everything in memory, boots by fetching consistent snapshots from StateDB, and then stays up-to-date with the Firehose. It otherwise runs completely independently from dfuse Search and dfuse Account History.
It is accessible here through GraphQL: GraphiQL: Discover the dfuse GraphQL interface
It also has an internal gRPC interface defined here: proto-eosio/tokenmeta.proto at master · dfuse-io/proto-eosio · GitHub
Two other services are not directly exposed but exist in the dfuse stack:
More recently, we’ve released GitHub - dfuse-io/dkafka: kafka cloudevent integration which is a Firehose to Kafka pipeline, which deals with reorgs to give good guarantees to kafka stream consumers. Not a data service you’d expose online, but just to show that people get creative when the Firehose is available
I think that covers most of the networked services… let me know if things aren’t clear.