2

This is a question for all those that are familiar with crafting Google BigData queries to extract data.

My request is fairly simple. I'm looking to extract a definitive list of smart contracts with unique bytecode.

The first query I sent through only gave me the # of contracts with unique bytecode (believe the # is about 25k or so).

Then I changed my query to the following:

SELECT
    DISTINCT `bigquery-public-data.ethereum_blockchain.contracts`.bytecode AS unique_bytecode,
  FROM
    `bigquery-public-data.ethereum_blockchain.contracts`

All this does is return the bytecode for all results that match the query...which is close to what I'm looking for but still no cigar. Instead, I'm looking for the accompanying smart contract addresses for the results that match the query (hope this makes sense how I've worded it).

In laymen's terms, I'm trying to craft a Google BigData query that returns the addresses of all smart contracts with unique bytecode (i.e., no smart contract address that's returned should have a bytecode that matches any other returned result from the query).

librehash
  • 33
  • 4
  • 1
    don't know bigquery, but in any other SQL database I would try a grouped query, something like SELECT MIN(address), bytecode FROM contracts GROUP BY bytecode – user253751 Dec 12 '22 at 11:36
  • @user253751 Thanks for that tip. Do I still need the DISTINCT before the bytecode modifier to stipulate that the results should be unique? If you happen to know a SQL query that will achieve this same goal, that works too bc Google BigData has a feature that allows you to convert SQL queries to BigData format. – librehash Dec 12 '22 at 17:11
  • I literally just told you an SQL query that would achieve the goal – user253751 Dec 14 '22 at 08:33
  • @user253751 you're right. I forgot to double back to this. Put in the SQL command & it gave me exactly what I was looking for. I sprinkled in the AS phrases for the sake of labeling the columns on output, but that was just for aesthetic / visual purposes. You nailed it man. Thanks! Wish I could give you best answer but it won't let me do it for comments. – librehash Dec 14 '22 at 09:12
  • 1
    now it's an answer – user253751 Dec 14 '22 at 09:44

1 Answers1

1

I don't know Bigquery's language but in SQL you can do this:

SELECT MIN(address), bytecode FROM contracts GROUP BY bytecode

This puts all contracts with the same bytecode into a group and from each group it selects the bytecode and the lowest address.

user253751
  • 244
  • 1
  • 6
  • Thanks! This solved it for me. Want to note for anyone that may come across this question in the future that the query above is valid as written as a BigData query.

    Additionally, I made some minor additions to the code for the sake of having it label columns for the output as well as specifying the data source: SELECT MIN(address) AS address, , bigquery-public-data.ethereum_blockchain.contracts.bytecode AS bytecode FROM bigquery-public-data.ethereum_blockchain.contracts GROUP BY bytecode

    – librehash Dec 14 '22 at 16:01