
Maybe somebody can help me.

Starting with a CSV file like so:

Ticker,"Price","Market Cap"
ZUMZ,30.00,933.90
XTEX,16.02,811.57
AAC,9.83,80.02

I managed to read it into an array:

require 'csv'
tickers = CSV.read("stocks.csv", {:headers => true, :return_headers => true, :header_converters => :symbol, :converters => :all} )

To verify data, this works:

puts tickers[1][:ticker]
ZUMZ

However this doesn't:

puts tickers[:ticker => "XTEX"][:price]

How would I go about turning this array into a hash, using the ticker field as the unique key, so that I could easily look up any other field associatively, as defined in line 1 of the input? I'm dealing with many more columns and rows than shown here.

Much appreciated!

Marcos

6 Answers


Like this (it works with other CSVs too, not just the one you specified):

require 'csv'

tickers = {}

CSV.foreach("stocks.csv", :headers => true, :header_converters => :symbol, :converters => :all) do |row|
  tickers[row.fields[0]] = Hash[row.headers[1..-1].zip(row.fields[1..-1])]
end

Result:

{"ZUMZ"=>{:price=>30.0, :market_cap=>933.9}, "XTEX"=>{:price=>16.02, :market_cap=>811.57}, "AAC"=>{:price=>9.83, :market_cap=>80.02}}

You can access elements in this data structure like this:

puts tickers["XTEX"][:price] #=> 16.02

Edit (in response to the comment below): For selecting elements, you can do something like

tickers.select { |ticker, vals| vals[:price] > 10.0 }
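
Multiple criteria can be combined inside the block. A small sketch against the sample data above (both thresholds are arbitrary):

# Keep tickers priced above 2.01 whose market cap also exceeds 100.0
tickers.select { |ticker, vals| vals[:price] > 2.01 && vals[:market_cap] > 100.0 }
#=> {"ZUMZ"=>{:price=>30.0, :market_cap=>933.9}, "XTEX"=>{:price=>16.02, :market_cap=>811.57}}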
Michael Kohl
  • Wow quick reply thank you!! Still getting the hang of this lang. Next I'm researching how to apply one or more filters (e.g. return this hashed array w/ all prices over 2.01) – Marcos Dec 12 '11 at 16:27
  • If this answer helped you, please upvote and/or accept (the little tick mark below the voting arrows) it, that's StackOverflow etiquette. I'll update my answer to address the filtering question :-) – Michael Kohl Dec 12 '11 at 16:30
  • My 2.8MB file with under 7000 rows and ~40 columns takes way too long on this foreach loop, over 5min, reading in only a few cols [1..4] testing in irb. Have to stick to snappy fast awk query to keep my script under 20s till I can figure this out within Ruby :( – Marcos Dec 12 '11 at 16:55
  • Dang, ain't got the reputation to upvote you...been consulting this great site for yrs but only now created acct :) – Marcos Dec 12 '11 at 16:57
  • Then just accept the answer, you should always be able to do that. As for the speed, Ruby isn't known for being fast, but 5 minutes does sound like a lot. You could try JRuby, but if you have a 20s limit, JVM startup may offset potential speed gains. But then there's nothing wrong with using `awk` for parsing CSV, it's actually very well suited for the task :-) – Michael Kohl Dec 12 '11 at 17:04
  • BTW my sample CSV datasource http://tinyurl.com/AllStocksFinviz if it helps. Thanks for all explaining & examples! – Marcos Dec 12 '11 at 17:05
  • This discussion is getting too long, but on my laptop this is how long it took me to parse the entire file (all columns): `ruby foo.rb Downloads/finviz.csv 36,00s user 1,07s system 92% cpu 40,260 total`. If you are on 1.8 you should try the `FasterCSV` gem. – Michael Kohl Dec 12 '11 at 17:27
  • try without the :converters => :all param. I'm not sure why that's there and it will likely slow it down a bit. – pguardiario Dec 12 '11 at 23:49
  • Now that my framework is maturing, and needing more fields of various types to read and process flexibly, I'm looking to shift back from my fast `awk` solution to 100% native Ruby. But it has to be just as fast and available to my scripts, so it seems `memcached` will be a good fit for me. I.e. I will structure to preload my CSV file into a memory object for Ruby to later use instantly. – Marcos Feb 25 '12 at 15:12
  • I wrote a gem that yields hashed CSV records, with no extra effort: https://rubygems.org/gems/hasherize_csv – brentiumbrent Feb 07 '13 at 16:40
require 'csv'

rows = CSV.read(file_path, headers: true, header_converters: :symbol, converters: :all).collect do |row|
  Hash[row.collect { |c, r| [c, r] }]
end
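
This yields one hash per row (with the ticker included as a field) rather than a ticker-keyed hash, so lookups scan the array:

rows.find { |r| r[:ticker] == "XTEX" }[:price] #=> 16.02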
John Bachir
Mr. Demetrius Michael

To add on to Michael Kohl's answer, if you want to access the elements in the following manner

puts tickers[:price]["XTEX"] #=> 16.02

You can try the following code snippet:

CSV.foreach("Workbook1.csv", :headers => true, :header_converters => :symbol, :converters => :all) do |row|
    hash_row =  row.headers[1..-1].zip( (Array.new(row.fields.length-1, row.fields[0]).zip(row.fields[1..-1])) ).to_h
    hash_row.each{|key, value| tickers[key] ? tickers[key].merge!([value].to_h) : tickers[key] = [value].to_h}
end
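
With the sample file from the question this builds the inverted structure:

{:price=>{"ZUMZ"=>30.0, "XTEX"=>16.02, "AAC"=>9.83}, :market_cap=>{"ZUMZ"=>933.9, "XTEX"=>811.57, "AAC"=>80.02}}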
clouddra

Not as one-liner-y, but this was clearer to me.

require 'csv'

csv_headers = CSV.parse(STDIN.gets)
csv = CSV.new(STDIN)

kick_list = []
csv.each do |row|
  row_hash = {}
  row.each_with_index do |field, j|
    row_hash[csv_headers[0][j]] = field
  end
  kick_list << row_hash
end
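
Feed it the CSV on standard input, e.g. `ruby script.rb < stocks.csv` (the script name is hypothetical). Since no header or field converters are used here, keys and values stay plain strings:

kick_list #=> [{"Ticker"=>"ZUMZ", "Price"=>"30.00", "Market Cap"=>"933.90"}, ...]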
Jesse Smith

To get the best of both worlds (very fast reading from a huge file AND the benefits of a native Ruby CSV object), my code has since evolved into this method:

$stock="XTEX"
csv_data = CSV.parse IO.read(%`|sed -n "1p; /^#{$stock},/p" stocks.csv`), {:headers => true, :return_headers => false, :header_converters => :symbol, :converters => :all}

# Now the 1-row CSV object is ready for use, eg:
$company = csv_data[:company][0]
$volatility_month = csv_data[:volatility_month][0].to_f
$sector = csv_data[:sector][0]
$industry = csv_data[:industry][0]
$rsi14d = csv_data[:relative_strength_index_14][0].to_f

which is closer to my original method, but only reads in one record plus line 1 of the input CSV file containing the headers. The inline sed instructions take care of that, and the whole thing is noticeably instant. This is better than the previous approach because now I can access all the fields from Ruby, and associatively, no longer caring about column numbers as I did with awk.
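
A minimal sketch of the same idea wrapped in a reusable method, assuming the ticker sits in column 1 and contains no regex metacharacters (the method name is hypothetical):

require 'csv'

# Read only the header line plus the single row matching one ticker;
# sed does the filtering before Ruby parses anything.
def stock_row(ticker, file = "stocks.csv")
  raw = IO.read(%`|sed -n "1p; /^#{ticker},/p" #{file}`)
  CSV.parse(raw, {:headers => true, :return_headers => false, :header_converters => :symbol, :converters => :all})
end

row = stock_row("XTEX")
puts row[:price][0] #=> 16.02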

Marcos
  • This isn't Perl -- you don't need the `$`, and in fact you shouldn't have it, because in Ruby, `$` indicates global variables, and use of globals is generally bad practice. – Marnen Laibow-Koser Nov 19 '12 at 06:19
  • In this case the globals are intentional. But I do realize that if the entire program were more OO, things would be written better. – Marcos Nov 19 '12 at 09:43
  • Exactly. Excessive use of globals (i.e. really *any* use except for things like configuration data) usually indicates a design problem that you'll want to fix. – Marnen Laibow-Koser Nov 19 '12 at 17:21
  • Downvoter note: This solution, crossing multiple technologies (not just Ruby), has been our chosen one for years, since the original purely-Ruby answer, while it works, is highly inefficient and simply too slow in production. – Marcos Aug 19 '14 at 17:53

While this isn't a 100% native Ruby solution to the original question, should others stumble here and wonder what awk call I wound up using for now, here it is:

$dividend_yield = IO.readlines("|awk -F, '$1==\"#{$stock}\" {print $9}' datafile.csv")[0].to_f

where $stock is the variable I had previously assigned to a company's ticker symbol (the would-be key field). It conveniently survives problems by returning 0.0 if the ticker, file, or field #9 is not found or is empty, or if the value cannot be cast to a float. So any trailing '%' in my case gets nicely truncated.

Note that at this point one could easily add more filters within awk to have IO.readlines return a one-dimensional array of output lines from the smaller resulting CSV, e.g.

awk -F, '$9 >= 2.01 && $2 > 99.99 {print $0}' datafile.csv

outputs in bash which lines have a DivYld (col 9) over 2.01 and a price (col 2) over 99.99. (Unfortunately I'm not using the header row to determine field numbers, which is where I was ultimately hoping for some searchable associative Ruby array.)
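One way to recover that associativity while keeping awk for the filtering is to resolve the field number from the header row in Ruby first. A sketch, assuming the header cell is literally "Dividend Yield" (hypothetical; use whatever line 1 of the file actually says):

require 'csv'

# Map a header name to its awk field number (awk fields are 1-based).
headers = CSV.parse_line(File.open("datafile.csv", &:gets))
col = headers.index("Dividend Yield") + 1

$dividend_yield = IO.readlines("|awk -F, '$1==\"#{$stock}\" {print $#{col}}' datafile.csv")[0].to_f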

Marcos
  • The problem with `-F,` is that `awk` treats every comma as delimiter even when it occurs inside a quoted field, like a company's name: `"Apple, Inc."` Only `"Apple` returns when I ask for field `$15` of that row. – Marcos Feb 25 '12 at 14:59