Ruby: How can I read a CSV file that contains two headers in Ruby?


"721","Air Force","09/01/12",

Thanks, this is very helpful. I had looked through the Ruby CSV documentation and didn't see anything, so I'm glad to see I'm not going blind!

When you start parsing your data, if the first column represents an integer, you can convert it to an int: if the result is > 0, then you know you're dealing with a valid data row and not a header.

You'll have to write your own logic. CSV is really just rows and columns; by itself it has no inherent idea of what each column or row means. It's just raw data. CSV therefore has no concept of "two header rows" (that's a human convention), so you'll need to build your own heuristics.
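As a minimal sketch of the integer-first-column heuristic described here: the helper name data_row? is hypothetical, and the two header lines in the sample are made up for illustration (only the data row comes from the question). It assumes that every real data row starts with a positive integer ID.

```ruby
require 'csv'

# Hypothetical helper: true when the first field parses as a positive integer.
def data_row?(row)
  Integer(row.first.to_s, 10) > 0
rescue ArgumentError
  # Non-numeric or empty first field, so treat the row as a header.
  false
end

# Made-up sample: two header lines followed by one data row.
sample = <<~TEXT
  "Team","","Game"
  "Institution ID","Institution","Game Date"
  "721","Air Force","09/01/12"
TEXT

# Keep only the rows that pass the heuristic.
data_rows = CSV.parse(sample).select { |row| data_row?(row) }
```

This filters by content rather than by position, so it would also drop header lines that reappear further down the file, at the cost of silently discarding any data row whose first field is not a positive integer.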



It looks like your CSV file was produced from an Excel spreadsheet that has columns grouped like this:

... |        Rushing        |         Passing         | ...
... |Rushes|Gain|Loss|Net|TD|Att|Cmp|Int|Yards|TD|Conv| ...

(Not sure if I restored the groups properly.)

There are no standard tools for working with this kind of CSV file, AFAIK. You have to do the job manually:

  • Read the first line, treat it as the first header line.
  • Read the second line, treat it as the second header line.
  • Read the third line, treat it as the first data line.
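The manual approach from this answer (first line as group header, second line as column header, everything after as data) could be sketched roughly as follows. The sample data is made up, and the way the two header lines are merged (group name, a space, column name) is an assumption, not something the answer specifies.

```ruby
require 'csv'

# Made-up sample shaped like the grouped layout above; a group cell is blank
# wherever its group spans more than one column.
sample = <<~TEXT
  Rushing,,Passing,
  Gain,Loss,Att,Cmp
  12,3,20,11
TEXT

# First line: group header. Second line: column header. The rest: data.
group_row, column_row, *data_rows = CSV.parse(sample)

# Carry each group name forward over the blank cells it spans.
current_group = nil
headers = column_row.each_with_index.map do |column, i|
  group = group_row[i]
  current_group = group unless group.nil? || group.empty?
  [current_group, column].compact.join(' ')
end

# Pair every data row with the merged header names.
records = data_rows.map { |row| headers.zip(row).to_h }
```

Prefixing with the group name keeps otherwise duplicated column names (such as the two TD columns in the layout above) distinct.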



Read the CSV in and skip the first two lines (the header rows) when iterating:

require 'csv'

# CSV.read returns an array of arrays, one inner array per line
arr_of_arrs = CSV.read("path/to/file.csv")
# start at index 2 to skip both header lines
arr_of_arrs[2..arr_of_arrs.length].each do |x|
  # operation here
end

As noted in the edit, I got my Python and Ruby confused. Vote the answer down, if it makes you feel better.

Instead of doing [2..arr_of_arrs.length] you could do [2..-1], where -1 refers to the last element in the Array. It's a lot cleaner in my opinion.

Shouldn't it be [2.. instead of [1.. ?

[1:] - what syntax is this?
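For reference, [1:] is Python slice syntax, not Ruby. The Ruby spellings discussed in these comments can be compared side by side; the rows below are made-up stand-ins for what CSV.read would return, and Array#drop is an additional option not mentioned in the thread.

```ruby
# Made-up rows standing in for the result of CSV.read.
rows = [
  ["header line 1"],
  ["header line 2"],
  ["721", "Air Force", "09/01/12"]
]

with_length = rows[2..rows.length] # form used in the answer
with_minus1 = rows[2..-1]          # form suggested in the comments
with_drop   = rows.drop(2)         # Array#drop skips the first n elements

# Edge case: a file with fewer lines than header rows.
short = [["only one line"]]
short_slice = short[2..-1]  # nil, because the start index is past the end
short_drop  = short.drop(2) # empty array
```

All three forms give the same result for a normal file. They differ only when the file is shorter than the number of header lines, where drop is the one that can still be chained with each safely.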



I'd recommend using the smarter_csv gem and manually providing the correct headers:

require 'smarter_csv'
options = {:user_provided_headers => ["Institution ID","Institution","Game Date","Uniform Number","Last Name","First Name", ... provide all headers here ... ],
           :headers_in_file => false}
data = SmarterCSV.process(filename, options)
# the header lines are parsed as the first rows, so remove them from the front
# (shift removes the first element; pop would remove the last)
data.shift # to ignore the first header line
data.shift # to ignore the second header line
# data now contains an array of hashes with your data

One option you should use is :user_provided_headers; simply specify the headers you want in an array. This way you can work around cases like this.

Please check the GitHub page for the options and examples: https://github.com/tilo/smarter_csv

Thanks I'll take a look!

You will have to do data.shift to ignore the header lines in the file, since they are parsed as the first rows (data.pop would remove rows from the end instead).

Can you please upload a small sample CSV file somewhere, e.g. as a gist? It should be easy to add a feature to smarter_csv for that.

Probably the easiest way to do this is to use the :user_provided_headers option in smarter_csv.

Yes, I can try to auto-merge the two header lines. The second line is two columns shorter than the first line... weird.
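For comparison, the same "supply your own headers" idea can be sketched with only Ruby's standard csv library. This is not smarter_csv and does not reproduce its behaviour; it is a rough equivalent under the assumption that the file has exactly two header lines. The header names are the first three from the answer above, and the first header line in the sample is made up.

```ruby
require 'csv'

# Header names supplied by hand; the real file has more columns.
my_headers = ["Institution ID", "Institution", "Game Date"]

# Made-up sample: two header lines followed by one data row.
sample = <<~TEXT
  "Team","","Game"
  "Institution ID","Institution","Game Date"
  "721","Air Force","09/01/12"
TEXT

# Parse without header detection, drop the two header lines from the front,
# then pair each remaining row with the supplied names.
data = CSV.parse(sample).drop(2).map { |row| my_headers.zip(row).to_h }
```

Here data is an array of hashes keyed by the supplied header strings, one hash per data row.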
