
I'm trying to parse the blockchain and extract some of the transactions and addresses related to a specific contract (The DAO) into a .txt file, so that I can reuse the output later for analysis.


2 Methods

Hi, I've succeeded in parsing the Ethereum blockchain with Web3.js, both by simply using the geth console

$ cd /path/to/file
$ geth --exec 'loadScript("myEthereumBlockchainParses.js")' attach

and by connecting a Node.js instance to geth over RPC

$ geth --rpc

$ node myEthereumBlockchainParses.js

Problems

I'm having 2 different problems:

  1. If I use the geth console method, I can't use the Node filesystem module "fs" to write to a text file:

err: Cannot find module 'fs'

  2. If I use the Node.js method, I can save to a local file, but the script is very slow and freezes my computer after only about 1000 blocks, even if I minify and simplify the code. Bear in mind it's the same parsing code that works fine in the geth console.

The first method, in the geth console, is much faster.


Questions

  1. Is there a faster/better way to do what I'm trying to do (parse part of the blockchain and extract it to the filesystem)?
  2. Is there a way to use Node modules and their syntax in the geth console (which is faster than --rpc)?
  3. Is there a way to make --rpc faster?
  4. Initially I thought of saving to a simple .txt file, but would it be better to store that amount of information in a database like MongoDB?

SOLUTION

After the excellent answers by @BoppyKooBah, here is some code: https://github.com/lyricalpolymath/Ethereum_DaoExtraBalanceOwners/blob/master/extraBalanceRunScript

user3498
  • Related: http://ethereum.stackexchange.com/q/2184/2460 – galahad Jul 22 '16 at 14:43
  • Re:node.js modules: http://ethereum.stackexchange.com/q/6696/2460 – galahad Jul 22 '16 at 14:44
  • Welcome to Ethereum! It is preferred if you can post separate questions instead of combining your questions into one. That way, it helps the people answering your question and also others hunting for at least one of your questions. Thanks! – q9f Jul 23 '16 at 09:04
  • @5chdn thanks, will do. It's just that questions are like cherries: one always leads to another :) – user3498 Jul 23 '16 at 10:29
  • Nice GitHub stuff. The data I generated for the extraBalance owners can be found through "Which accounts contributed to The DAO's extraBalance account?". @BoppyKooBah. – BokkyPooBah Jul 26 '16 at 23:27
  • @BokkyPooBah your stuff is great. I've seen your other post but I can't comment or ask you questions there (newbie account). I have many more, like how do you read the stack and how did you find out what the function bytecode reference is (I'm going through the Yellow Paper to solve that but I don't have all the answers yet) :) Anyway, thanks for these tips – user3498 Jul 28 '16 at 22:19

2 Answers


From "How many The DAO recursive call vulnerability attacks have occurred to date?", this is what I do to get simple scripts to extract and save data:

Copy the following script into a file named getTheDAOTransferEvents:

#!/bin/sh

# First search from 1428757 (The DAO creation) to 1736131
# First Transfer event in block 1599207

FIRSTBLOCK=${1:-1599207}
LASTBLOCK=${2:-"'latest'"}

echo "Searching for The DAO Transfer events to address 0x0000000000000000000000000000000000000000 between blocks $FIRSTBLOCK and $LASTBLOCK"


# Feed the JavaScript below to the geth console and keep only the CSV header and data lines
geth attach << EOF | egrep -e ",0x|^No,"

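// ABI fragment containing only The DAO's Transfer event
var theDAOABI = [{"anonymous":false,"inputs":[{"indexed":true,"name":"_from","type":"address"},{"indexed":true,"name":"_to","type":"address"},{"indexed":false,"name":"_amount","type":"uint256"}],"name":"Transfer","type":"event"}];

// The DAO contract address
var theDAOAddress = "0xBB9bc244D798123fDe783fCc1C72d3Bb8C189413";

var theDAO = web3.eth.contract(theDAOABI).at(theDAOAddress);

// Filter for Transfer events in the requested block range (both bounds inclusive)
var theDAOTransferEvent = theDAO.Transfer({}, {fromBlock: $FIRSTBLOCK, toBlock: $LASTBLOCK});

console.log("No,From,Block,DAOs");
var i = 0;
theDAOTransferEvent.watch(function(error, result){
  // Skip failed lookups instead of crashing on an undefined result
  if (error) {
    console.log("Error: " + error);
    return;
  }
  var args = result.args;
  // Only keep tokens sent to the zero address (burned, e.g. by splitDAO)
  if (args._to == "0x0000000000000000000000000000000000000000") {
    i++;
    // DAO tokens have 16 decimal places
    var daos = args._amount / 1e16;
    console.log(i + "," + args._from + "," + result.blockNumber + "," + daos);
  }
});
theDAOTransferEvent.stopWatching();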

EOF

Set the executable bit of the file using chmod 700 getTheDAOTransferEvents.

Then, with geth running, run the script in a separate terminal window to extract all Transfer events of interest:

./getTheDAOTransferEvents > output.txt
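If you later move this processing into Node.js, the filtering and CSV formatting done inside the geth script can be kept as a small pure function that is easy to test without a node. A minimal sketch, assuming each event looks like the web3 result object used above (`args._from`, `args._to`, `args._amount`, `blockNumber`) with `_amount` already converted to a decimal string (for example with BigNumber's `toString(10)`); the function names are hypothetical. It uses BigInt instead of `/ 1e16` so large amounts are converted without floating-point rounding:

```javascript
// Zero address: Transfer events to it mark burned DAO tokens.
const ZERO_ADDRESS = "0x0000000000000000000000000000000000000000";
const DAO_DECIMALS = 16n; // The DAO token uses 16 decimal places

// Convert an integer amount in base units (decimal string) into a decimal
// string of whole DAO tokens, without going through floating point.
function toDaoTokens(amount) {
  const base = 10n ** DAO_DECIMALS;
  const value = BigInt(amount);
  const whole = value / base;
  const frac = (value % base)
    .toString()
    .padStart(Number(DAO_DECIMALS), "0")
    .replace(/0+$/, ""); // drop trailing zeros of the fractional part
  return frac ? `${whole}.${frac}` : `${whole}`;
}

// events: array of { args: { _from, _to, _amount }, blockNumber }.
// Returns the same "No,From,Block,DAOs" CSV the geth script prints.
function transferEventsToCsv(events) {
  const lines = ["No,From,Block,DAOs"];
  let i = 0;
  for (const ev of events) {
    if (ev.args._to.toLowerCase() !== ZERO_ADDRESS) continue;
    i++;
    lines.push(`${i},${ev.args._from},${ev.blockNumber},${toDaoTokens(ev.args._amount)}`);
  }
  return lines.join("\n");
}
```

Keeping the conversion exact matters mostly if the CSV is later summed or compared per account; for a quick look, the original `/ 1e16` is fine.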


Q: is there a faster/better way to do what I'm trying to do (parse and extract to filesystem part of the blockchain)?

Very likely. You'll just have to test out different methods for your use cases.
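For the Node.js route, one variant worth testing is processing blocks strictly one at a time and streaming each result straight to disk, instead of issuing many asynchronous requests at once or collecting everything in memory before writing. Whether that is what makes the original Node script freeze is an assumption, not something established here. A minimal sketch: `fetchBlock` is a hypothetical async function you would supply (for example a Promise wrapper around web3's `getBlock(n, true, callback)`), and the CSV columns are illustrative:

```javascript
// Walk blocks [from, to] one at a time and append one CSV line per
// top-level transaction sent to `contractAddress`. Internal calls and
// events are not visible this way, only the transactions in each block.
// fetchBlock: hypothetical async (blockNumber) -> { transactions: [...] }
// out: any Writable stream, e.g. fs.createWriteStream("output.txt")
async function extractRange(fetchBlock, from, to, contractAddress, out) {
  const target = contractAddress.toLowerCase();
  let matches = 0;
  for (let n = from; n <= to; n++) {
    const block = await fetchBlock(n); // only one request in flight
    for (const tx of block.transactions) {
      // tx.to is null for contract-creation transactions
      if (tx.to && tx.to.toLowerCase() === target) {
        matches++;
        // Respect back-pressure: wait for 'drain' when the buffer is full
        if (!out.write(`${n},${tx.hash},${tx.from}\n`)) {
          await new Promise((resolve) => out.once("drain", resolve));
        }
      }
    }
  }
  return matches;
}
```

Usage would look like `extractRange(fetchBlock, 1599207, 1600206, "0xBB9bc244D798123fDe783fCc1C72d3Bb8C189413", fs.createWriteStream("output.txt"))`. Sequential requests are slower per block than a burst of parallel ones, but memory stays flat; combining this with the block-range splitting below is one way to trade the two off.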

Q: is there a way to use node modules and that syntax in the geth console (which is faster than --rpc)?

I've not evaluated this.

Q: is there a way to make the --rpc faster?

You can run multiple copies of the shell script at the same time, each over a different block range, for example:

./getTheDAOTransferEvents 0 999999 > output0.txt &
./getTheDAOTransferEvents 1000000 1999999 > output1.txt &
./getTheDAOTransferEvents 2000000 2999999 > output2.txt &
./getTheDAOTransferEvents 3000000 3999999 > output3.txt &
...
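Picking the ranges by hand is error-prone. Since `fromBlock` and `toBlock` are both inclusive, each range should end one block before the next begins. A small sketch that computes contiguous, non-overlapping ranges and prints commands in the same style as above; the helper names are hypothetical, and how much faster the parallel jobs run depends on how well your geth node handles concurrent requests:

```javascript
// Split the inclusive block range [first, last] into at most `parts`
// contiguous, non-overlapping inclusive sub-ranges.
function splitRange(first, last, parts) {
  if (parts < 1 || last < first) throw new RangeError("invalid range or part count");
  const total = last - first + 1;
  const size = Math.ceil(total / parts);
  const ranges = [];
  for (let start = first; start <= last; start += size) {
    ranges.push([start, Math.min(start + size - 1, last)]);
  }
  return ranges;
}

// One background shell command per sub-range, each writing its own file.
function parallelCommands(script, first, last, parts) {
  return splitRange(first, last, parts).map(
    ([from, to], i) => `${script} ${from} ${to} > output${i}.txt &`
  );
}
```

For example, `parallelCommands("./getTheDAOTransferEvents", 0, 3999999, 4)` yields the four commands listed above. Adding a final `wait` line in the shell makes it block until all jobs have finished, after which the files can be concatenated in order.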

Q: Initially I thought of saving to a simple .txt file, but would it be better to store that amount of info in a db like MongoDB?

A database is useful if you want to access the data using indices. Redis is nice as well.

I like the Unix philosophy "that emphasizes building simple, short, clear, modular, and extensible code that can be easily maintained and repurposed by developers other than its creators."

I start off just creating comma (or tab) separated value files and then employ other processes to move the data into a SQL/NoSQL database.

Then if I need more speed/logic/coupling, I look to improve the process.
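As an intermediate step before a real database, the CSV file can be loaded into an index keyed by address. A minimal sketch, assuming the four-column `No,From,Block,DAOs` output of the script above; a `Map` stands in for a SQL table, MongoDB collection or Redis hash, and `indexByAddress` is a hypothetical name:

```javascript
// Build an in-memory index: lowercase address -> { totalDaos, blocks }.
function indexByAddress(csvText) {
  const index = new Map();
  for (const line of csvText.split(/\r?\n/)) {
    const fields = line.split(",");
    // Skip the header and any blank or malformed lines
    if (fields.length !== 4 || fields[0] === "No") continue;
    const [, from, block, daos] = fields;
    const key = from.toLowerCase();
    const entry = index.get(key) || { totalDaos: 0, blocks: [] };
    entry.totalDaos += Number(daos);
    entry.blocks.push(Number(block));
    index.set(key, entry);
  }
  return index;
}
```

In Node this would be fed with `fs.readFileSync("output.txt", "utf8")`. If lookups like this turn out to be all you need, a flat file plus a small loader may be enough; moving to a database pays off once the data no longer fits comfortably in memory or several tools need to query it.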



EDIT 23/07/2016 - Response to comments below


Q: Is there a way to save different parts of the console output to different files?

I sometimes use grep to separate data. Here's an example using theDAOVoter from https://github.com/bokkypoobah/TheDAOVoter to list the DAO splits.

# Generate list of DAO splits
user@Kumquat:~$ theDAOVoter --sumsplits > sumSplits

# Extract data for account 0x1368
user@Kumquat:~$ egrep "0x1368|Status" sumSplits 
       Prop Status                Yea             Nay Recipient                                  Description            
      1 Expired         967598.22      4276278.60 0x13680fa2a60fd551894199f009cca20fb63a3e31                                         
     18 Expired           2200.20      3913649.01 0x13680fa2a60fd551894199f009cca20fb63a3e31                                    

# Extract data for account 0x3d55
user@Kumquat:~$ egrep "0x3d55|Status" sumSplits
       Prop Status                Yea             Nay Recipient                                  Description
      4 Expired           5279.34      4322941.58 0x3d5507b53d1613d8491a606ecf5c9268301095dd split

# Extract data for accounts other than 0x1368 and 0x3d55
user@Kumquat:/tmp$ egrep -v -e "0x1368|0x3d55" sumSplits
       Prop Status                Yea             Nay Recipient                                  Description
      6 Expired              1.99       175453.91 0xbeb0b93c01297146782a5581370489a36b24deca Original intent, non-interventionist cur
      7 Expired         118006.68      3967413.62 0xe82d5b10ad98d34df448b07a5a62c1affbef758f Leave me alone
      8 Expired         199999.99      3931880.95 0xa72ded5c1122312d9f4ed66bf4a396139eadaf56
   ...
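When one pass over the output should produce several files at once, rather than re-running egrep over a saved file, the same idea can be written as a small line router in Node.js. A sketch, assuming the console output has already been captured as an array of lines; the file names and patterns are hypothetical, and the patterns should not use the `g` flag because `RegExp.test` is stateful with it:

```javascript
const fs = require("fs");

// Send each line to every file whose pattern matches it, like running one
// egrep per output file. Lines that match nothing go to `fallback`
// (similar to egrep -v) when a fallback file name is given.
function routeLines(lines, routes, fallback) {
  const buckets = {};
  for (const line of lines) {
    let matched = false;
    for (const [file, pattern] of Object.entries(routes)) {
      if (pattern.test(line)) {
        (buckets[file] = buckets[file] || []).push(line);
        matched = true;
      }
    }
    if (!matched && fallback) {
      (buckets[fallback] = buckets[fallback] || []).push(line);
    }
  }
  return buckets;
}

// Write every bucket to its own file, one line per entry.
function writeBuckets(buckets) {
  for (const [file, lines] of Object.entries(buckets)) {
    fs.writeFileSync(file, lines.join("\n") + "\n");
  }
}
```

For the example above, `routes` could be `{ "split0x1368.txt": /0x1368|Status/, "split0x3d55.txt": /0x3d55|Status/ }` with `"otherSplits.txt"` as the fallback. Unlike `egrep -v`, the fallback file here does not get the header, because the header already matched a route.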

Q: this doesn't give a visual feedback of the execution of the command. is there a way to show the progress?

I use the Unix watch command to display the changes: -d highlights the differences between updates, and -n5, for example, refreshes the display every 5 seconds.

# Watch the file size growing as the data gets written
watch -d -n5 'ls -al'

# Watch the number of lines in the file as the data gets written
watch -d -n5 'wc -l sumSplits'

# Watch the last 3 lines of the file change as the data gets written
watch -d -n5 'tail -n3 sumSplits'

Or use tail -f sumSplits to display the file contents as the data gets written.
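If you run the Node.js version instead, progress can also be reported from inside the script. Writing it to stderr keeps it out of the data file when stdout is redirected with `>`. This would not help inside the geth heredoc above, where console output passes through the egrep filter and lines without a match are dropped. A minimal sketch; the function names and message format are hypothetical:

```javascript
// One-line progress report for a scan over blocks [first, last],
// with a naive ETA based on the average rate so far.
function progressLine(current, first, last, startedAtMs, nowMs) {
  const done = current - first + 1;
  const total = last - first + 1;
  const pct = ((100 * done) / total).toFixed(1);
  const elapsedSec = (nowMs - startedAtMs) / 1000;
  const rate = elapsedSec > 0 ? done / elapsedSec : 0; // blocks per second
  const eta = rate > 0 ? `${Math.round((total - done) / rate)}s` : "?";
  return `block ${current} (${done}/${total}, ${pct}%), ETA ${eta}`;
}

// Overwrite the same terminal line on stderr so stdout stays clean.
function reportProgress(line) {
  process.stderr.write(`\r${line}`);
}
```

Calling `reportProgress(progressLine(n, first, last, startedAt, Date.now()))` every hundred blocks or so is usually enough; reporting on every block adds noticeable overhead to a long scan.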

BokkyPooBah
  • Let me say it: "geth attach << EOF | egrep -e ",0x"" == MAGIC! :) That is some hardcore mumbo jumbo :) However, I need to change the regular expression to fit my needs. I'll play with this and post a full solution. The only thing is that this saves to only one file. Is there a way to save different parts of the console output to different files? Also, this doesn't give any visual feedback on the execution of the command. Is there a way to show the progress? Thanks!
    – user3498 Jul 22 '16 at 19:16
  • See edit above. – BokkyPooBah Jul 23 '16 at 01:18
  • Awesome, thanks! I will play with these as well. In the meantime I've found a clean way for my case thanks to these tips. I will clean up the code and post the solution here. It's amazing how slow Node/RPC can be compared to running the commands directly in the geth console (my code takes 2 hours in geth and 7.5 hours in Node). – user3498 Jul 23 '16 at 10:32

Perhaps not the detailed answer you were looking for - and it certainly doesn't cover your specific queries - but the EthSlurp tool might be of use.

It allows you to:

... 'slurp' the Ethereum blockchain extracting transactions from any Ethereum address, including smart contracts and, from there, store these transactions in familiar file formats such as .csv (MS Excel) or .txt.

The author of the tool has been running continuous daily scrapings of TheDAO since it went live, the data from which is available on the DAO Deep Dive website.

Richard Horrocks
  • Thanks, I'll check it out. However, I'd like to learn how to do it myself :) It seems odd that we have a "universally accessible" and distributed database but no easy way to parse it and do data analysis on top of it :) – user3498 Jul 22 '16 at 14:44