37

As stated in the yellow paper:

Transaction Receipt. In order to encode information about a transaction concerning which it may be useful to form a zero-knowledge proof, or index and search, we encode a receipt of each transaction containing certain information from its execution. Each receipt, denoted BR[i] for the ith transaction, is placed in an index-keyed trie and the root recorded in the header as He.

The transaction receipt is a tuple of four items comprising the post-transaction state, Rσ, the cumulative gas used in the block containing the transaction receipt as of immediately after the transaction has happened, Ru, the set of logs created through execution of the transaction, Rl, and the Bloom filter composed from information in those logs, Rb:

R ≡ (Rσ, Ru, Rb, Rl)

Can anyone give more details of how this Rl (logs) is structured and how the Rb (bloom filters) are constructed from it?

I've been doing some research about bloom filters and Broder and Mitzenmacher state that:

Wherever a list or set is used, and space is at a premium, consider using a Bloom filter if the effect of false positives can be mitigated.

So how does this relate to Ethereum's design rationale?

Henrique Barcelos
  • 2,481
  • 4
  • 20
  • 38

1 Answer

52

Events in the Ethereum system must be easily searchable, so that applications can filter and display events, including historical ones, without undue overhead. At the same time, storage space is expensive, so we don't want to store a lot of duplicate data, such as the list of transactions and the logs they generate. The logs bloom filter exists to resolve this tension.

When a block is generated or verified, the address of any logging contract, and all the indexed fields from the logs generated by executing those transactions, are added to a bloom filter, which is included in the block header. The actual logs are not included in the block data, in order to save space.

When an application wants to find all the log entries from a given contract, or with specific indexed fields (or both), the node can quickly scan over the header of each block, checking the bloom filter to see if it may contain relevant logs. If it does, the node re-executes the transactions from that block, regenerating the logs, and returning the relevant ones to the application.
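The bit-setting procedure above can be sketched in a few lines of Python. This is a minimal illustration of the yellow paper's scheme (each address or topic hash contributes three bits, taken from the first three byte pairs of its hash modulo 2048), not a client implementation. Note the loud assumption: `hashlib.sha3_256` is NIST SHA-3, which pads differently from Ethereum's Keccak-256, so it stands in here only to keep the sketch self-contained; a real node must use a true Keccak-256 library. The address and topic values are placeholders.

```python
import hashlib

BLOOM_BITS = 2048  # the header's logsBloom field is 2048 bits (256 bytes)

def keccak256(data: bytes) -> bytes:
    # ASSUMPTION: sha3_256 stands in for Keccak-256. They differ in
    # padding, so real Ethereum hashes will not match this output.
    return hashlib.sha3_256(data).digest()

def bloom_bits(value: bytes):
    """Yield the three bit indexes a single address or topic sets."""
    h = keccak256(value)
    for i in (0, 2, 4):  # the first three pairs of bytes of the hash
        # each big-endian byte pair mod 2048, i.e. its low-order 11 bits
        yield int.from_bytes(h[i:i + 2], "big") % BLOOM_BITS

def bloom_insert(bloom: int, value: bytes) -> int:
    """Set the three bits for `value` in the 2048-bit filter."""
    for bit in bloom_bits(value):
        bloom |= 1 << bit
    return bloom

def bloom_contains(bloom: int, value: bytes) -> bool:
    """True if `value` MAY be in the filter (false positives possible),
    False if it definitely is not."""
    return all((bloom >> bit) & 1 for bit in bloom_bits(value))

# Hypothetical log entry: a contract address plus one indexed topic.
address = bytes(20)  # placeholder 20-byte contract address
topic = keccak256(b"Transfer(address,address,uint256)")

bloom = 0
for item in (address, topic):
    bloom = bloom_insert(bloom, item)
```

Since each value sets at most three of the 2,048 bits, even a block full of logs yields a compact filter, and a node scanning headers only needs to re-execute a block when every bit for every queried value is set.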

Nick Johnson
  • 8,144
  • 1
  • 28
  • 35
  • Thanks =]. Do you know where I can find some material defining how this bloom filter is created and how Ethereum deals with false positives? – Henrique Barcelos Apr 28 '16 at 14:06
  • @HenriqueBarcelos The bloom filter creation is described in the yellowpaper, in and just after the section you quoted. The notation's a bit obscure, unfortunately. False positives are handled by filtering the list of events generated by re-executing the transactions against the original filter criteria. – Nick Johnson Apr 28 '16 at 14:16
  • 1
    Yeah, that's the problem. I can't make any sense of the definition in the yellow paper. I was hoping to get something "for dummies". Anyway, I'll keep digging. Thank you very much! – Henrique Barcelos Apr 28 '16 at 14:17
  • 4
    @HenriqueBarcelos It's fairly straightforward: for the address, and each of the indexed topics, Keccak-256 hash them, then take the first 11 bits of each of the three least significant pairs of bytes of the hash. These are the indexes of the 3 bits in the bloom filter that you should set. – Nick Johnson Apr 28 '16 at 14:32
  • Got it. But Bloom filters are supposed to use k hashes, with k >= 1, but when k = 1, it behaves exactly like a hash. If we're using only Keccak-256, what are the advantages? – Henrique Barcelos Apr 28 '16 at 16:57
  • 1
    @HenriqueBarcelos k hashes, and k distinct subsets of a larger hash, are equivalent to each other, because all the bits of a hash are independent. – Nick Johnson Apr 28 '16 at 18:09
  • 1
    @NickJohnson: Are you aware of any simple code that would take an address and the bloom filter from the block head and return true/false if the address is in the bloom filter? – Thomas Jay Rush Oct 10 '16 at 00:01
  • @NickJohnson You say above: "For the address, and each of the indexed topics, Keccak-256 hash them, then take the first 11 bits of each of the three least significant pairs of bytes of the hash." I get this except for the first part. Does this mean concatenate the address and each non-empty topic and then sha3 them? I'm pretty sure that's what it means, but I thought I'd ask. – Thomas Jay Rush Feb 02 '17 at 19:34
  • @NickJohnson Also--when I "take the first 11 bits of the three least significant pairs of bytes," what do I do with them. Confused by 3*11 bits when concatenated would be 33 bits--but where does that go in the 2048 bit long filter? I must be missing something. – Thomas Jay Rush Feb 02 '17 at 19:35
  • You don't concatenate them - you take each one independently, hash it, and insert it into the bloom filter. Then, use the 11 LSBs of the result as an index into the bloom filter. – Nick Johnson Feb 03 '17 at 20:34
  • 1
    Aren't the relevant logs downloaded with each block, without any re-execution (miners have already generated the output of the Tx results)? Is a Bloom filter also used for the other tries (state, transaction and receipt)? @Nick Johnson – alper Dec 13 '17 at 11:26