
What you don't want to do is feed FasterCSV invalid CSV. Because of the way the CSV format works, it's common for a parser to need to read until the end of the file to be sure a field is invalid. This eats a lot of time and memory.

Luckily, when working with invalid CSV, Ruby's built-in string methods will almost always be the better tool. For example, parsing non-quoted fields is as easy as:

This would give you an array. If you really want valid CSV (for example, because you rescued the MalformedCSVError), then there is... FasterCSV!

require 'csv'
str = %q{abc,hello mahmoud,this is" description, bad}
puts str.split(',').to_csv
#=> abc,hello mahmoud,"this is"" description", bad
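One way to combine the two approaches is to try the real parser first and fall back to a plain split only when it raises. This is a minimal sketch: the helper name lenient_fields is made up for illustration, and the fallback assumes the broken line has no commas inside quoted fields.

```ruby
require 'csv'

# Hypothetical helper: parse one line strictly, fall back to a naive split
# when the CSV library rejects it as malformed.
def lenient_fields(line)
  CSV.parse_line(line)
rescue CSV::MalformedCSVError
  line.chomp.split(',')
end

good = lenient_fields(%q{abc,"hello, mahmoud",ok})
bad  = lenient_fields(%q{abc,hello mahmoud,this is" description, bad})

p good
p bad
```

The valid line keeps its quoted comma intact, while the malformed one is simply cut at every comma.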

This fixes the unquoted problem, but what if the CSV file is malformed for another reason that OpenOffice can also fix? For example: stackoverflow.com/questions/9098759/ Is there a generic solution for all these problems?

Just for clarification: FasterCSV was an external library in Ruby 1.8 and was then included as the CSV standard library in Ruby 1.9.

ruby - How to reformat CSV file to match proper CSV format - Stack Ove...

ruby csv openoffice.org fastercsv

Using a native CSV parser

Edit: Having my big file around, I also tested Uri Agassi's approach using grep to get the lines of the file with empty fields:

File.new(filename).grep(/(^,|,(,|$))/)

It's about 10 times faster. If you need access to the fields you can use CSV.parse:

require 'csv'

File.new("/tmp/big.csv").grep(/(^,|,(,|$))/).each do |row_string|
  CSV.parse(row_string) do |row|
    puts row[1]
  end
end

Otherwise, if you have to parse the whole CSV file anyway, the answer is most likely no. Try running your script without the checking part - just reading the CSV rows. You will see no change in running time. This is because most of the time is spent reading and parsing the CSV file.

You might wonder if there is a faster CSV library for Ruby. There is indeed a gem called FasterCSV, but Ruby 1.9 has adopted it as its built-in CSV library, so it probably won't get much faster using Ruby only.

There is a Ruby gem named excelsior which uses a native CSV parser. You can install it via gem install excelsior and use it like this:

require 'excelsior'

Excelsior::Reader.rows(File.open('/tmp/big.csv')) do |row|
  row.each do |column|
    unless column
      puts "empty field"
    end
  end
end

I tested this code with a file like yours (72M, ~30k entries, 2.5k fields) and it is about twice as fast. However, it segfaults after a few lines, so the gem might not be stable.

As you mentioned in your comment, there are a few more idiomatic ways to write this, such as using each instead of the for loop or using unless instead of if !, and using two spaces for indentation, which will turn it into:

require 'csv'

CSV.foreach('/tmp/big.csv') do |row|
  row.each do |column|
    unless column
      puts "empty field"
    end
  end
end
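If you want to know which rows have gaps rather than just print a message, something like the following may help. It is a sketch on made-up inline data; note that Ruby's CSV returns nil for an unquoted empty field but an empty string for a quoted one (""), so it checks for both.

```ruby
require 'csv'

# Made-up sample: row 2 has an unquoted empty field, row 3 a quoted empty one.
data = "a,b,c\n1,,3\n4,\"\",6\n7,8,9\n"

rows_with_gaps = []
CSV.parse(data).each_with_index do |row, index|
  # nil  -> unquoted empty field (1,,3)
  # ""   -> quoted empty field   (4,"",6)
  rows_with_gaps << index + 1 if row.any? { |field| field.nil? || field.empty? }
end

p rows_with_gaps
```

For a file on disk, the same block body should work with CSV.foreach(path).each_with_index.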

Find out if CSV file contains empty field in Ruby? - Stack Overflow

ruby csv

As mikeb pointed out, there are the docs - http://ruby-doc.org/stdlib-1.9.3/libdoc/csv/rdoc/CSV.html - or you can follow along with the examples below (all are tested and working):

To create a new file:

In this file we'll have two rows, a header row and a data row, very simple CSV:

require "csv"
CSV.open("file.csv", "wb") do |csv|
  csv << ["animal", "count", "price"]
  csv << ["fox", "1", "$90.00"]
end

Result: a file called "file.csv" with the following:

animal,count,price
fox,1,$90.00

Appending uses almost the same formula as above, only instead of "wb" mode we'll use "a+" mode. For more information on these modes, see this Stack Overflow answer: What are the Ruby File.open modes and options?

CSV.open("file.csv", "a+") do |csv|
  csv << ["cow", "3","2500"]
end

Now when we open our file.csv we have:

animal,count,price
fox,1,$90.00
cow,3,2500

Now you know how to write to a file. To read a CSV file and grab the data for manipulation, you just do:

CSV.foreach("file.csv") do |row|
  puts row #first row would be ["animal", "count", "price"] - etc.
end

Of course, this is just one of a hundred different ways you can pull info from a CSV using this library. For more info, I suggest visiting the docs now that you have a primer: http://ruby-doc.org/stdlib-1.9.3/libdoc/csv/rdoc/CSV.html

What if I want to open the file without writing to it right away? Just don't use the block?
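On the question of opening without a block: CSV.open called without a block returns the open CSV object, which you can write to later and must then close yourself. A minimal sketch, using a temporary directory as a stand-in location:

```ruby
require 'csv'
require 'tmpdir'

written = nil
Dir.mktmpdir do |dir|
  path = File.join(dir, 'file.csv')    # stand-in path for this sketch

  csv = CSV.open(path, 'wb')           # no block: returns the open CSV object
  csv << ['animal', 'count', 'price']  # write whenever you are ready
  csv << ['fox', '1', '$90.00']
  csv.close                            # without a block, closing is up to you

  written = File.read(path)
end

puts written
```

The block form is usually preferable because it closes the file even if an exception is raised.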

How do I create a new CSV file in Ruby? - Stack Overflow

ruby csv

Using Ruby 1.9 and above, you can get an indexable object:

require 'csv'

CSV.foreach('my_file.csv', :headers => true) do |row|
  puts row['foo'] # prints 1 the first time, "blah" the second time, etc.
  puts row['bar'] # prints 2 the first time, 7 the second time, etc.
end

It's not dot syntax but it is much nicer to work with than numeric indexes.

As an aside, for Ruby 1.8.x FasterCSV is what you need to use the above syntax.

FasterCSV was incorporated into Ruby, I think it was in Ruby 1.9+.

ruby - Parse CSV file with header fields as attributes for each row - ...

ruby parsing csv

You'll likely get a massive speed boost by simply updating to a current version of Ruby. In version 1.9, FasterCSV was integrated as Ruby's standard CSV library.

Why is Ruby CSV file reading very slow? - Stack Overflow

ruby csv

Here is an example of the symbolic syntax using Ruby 1.9. In the examples below, the code reads a CSV file named data.csv from the Rails db directory.

:headers => true treats the first row as a header instead of a data row. The :header_converters => :symbol parameter then converts each cell in the header row into a Ruby symbol.

Ruby 1.9 and above:

CSV.foreach("#{Rails.root}/db/data.csv", :headers => true, :header_converters => :symbol) do |row|
  puts "#{row[:foo]},#{row[:bar]},#{row[:baz]}"
end

Ruby 1.8 (with the FasterCSV gem):

require 'fastercsv'
FasterCSV.foreach("#{Rails.root}/db/data.csv", :headers => true, :header_converters => :symbol) do |row|
  puts "#{row[:foo]},#{row[:bar]},#{row[:baz]}"
end

Based on the CSV provided by Poul (the Stack Overflow asker), the output from the example code above will be:

1,2,3
blah,7,blam
4,5,6

Depending on the characters used in the headers of the CSV file, it may be necessary to output the headers in order to see how CSV (FasterCSV) converted the string headers to symbols. You can output the array of headers from within the CSV.foreach.

row.headers

So I loaded the CSV file into an array with only allstocks << row inside the loop. How do I read one cell myrow[:company] where myrow[:ticker] == "ANAD"? There is only one record, and ticker is my key field anyway.

Marcos - If the CSV has been converted into an array, you may have lost the hashes (symbols). If this is the case, just reference the cell by the column number, e.g. myrow[0].
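As long as the rows were stored as CSV::Row objects (which is what allstocks << row does), the symbol keys are still available and the lookup can be done with find. A small sketch on made-up data; the company names are placeholders:

```ruby
require 'csv'

# Made-up sample standing in for the asker's stock file.
data = <<~ROWS
  ticker,company
  ANAD,Sample Co
  XYZ,Example Corp
ROWS

# With headers: true, CSV.parse returns a CSV::Table whose rows are CSV::Row objects.
allstocks = CSV.parse(data, headers: true, header_converters: :symbol)

# find returns the first matching row, or nil if the ticker is absent.
myrow = allstocks.find { |row| row[:ticker] == 'ANAD' }
company = myrow && myrow[:company]

puts company
```

If lookups by ticker are frequent, building a hash keyed on row[:ticker] once would avoid scanning the rows each time.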

ruby - Parse CSV file with header fields as attributes for each row - ...

ruby parsing csv

$ curl -s http://jamesabbottdd.com/examples/testfile.csv | xxd | head -n3
0000000: fffe 4300 6100 6d00 7000 6100 6900 6700  ..C.a.m.p.a.i.g.
0000010: 6e00 0900 4300 7500 7200 7200 6500 6e00  n...C.u.r.r.e.n.
0000020: 6300 7900 0900 4200 7500 6400 6700 6500  c.y...B.u.d.g.e.

The byte order mark ff fe at the start suggests the file encoding is little-endian UTF-16, and the 00 bytes at every other position back this up.

This would suggest that you should be able to do this:

However that gives me invalid byte sequence in UTF-16LE (ArgumentError) coming from inside the CSV library. I think this is due to IO#gets only returning a single byte for some reason when faced with the BOM when called in CSV, resulting in the invalid UTF-16.

You can get CSV to strip off the BOM by using bom|utf-16le as the encoding:

You might prefer to convert the string to a more familiar encoding instead, in which case you could do:

CSV.foreach('./testfile.csv', :encoding => 'utf-16le:utf-8') do |row| ...

Both of these appear to work okay.
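If you would rather not rely on the encoding option at all, a more explicit route is to decode the bytes yourself and hand CSV a UTF-8 string. This is an alternative sketch, not the approach above: the sample file is generated on the fly with made-up values, and the tab separator is assumed from the 09 00 bytes in the hex dump.

```ruby
require 'csv'
require 'tempfile'

# Build a tiny UTF-16LE, tab-separated file with a BOM (made-up contents).
sample = "\uFEFFCampaign\tCurrency\nSpring\tUSD\n".encode('UTF-16LE')

rows = Tempfile.create(['testfile', '.csv']) do |f|
  f.binmode
  f.write(sample)
  f.flush

  # Read raw bytes, label them as UTF-16LE, then transcode to UTF-8.
  text = File.binread(f.path).force_encoding('UTF-16LE').encode('UTF-8')
  text = text.delete_prefix("\uFEFF") # drop the byte order mark

  CSV.parse(text, col_sep: "\t")
end

p rows
```

This reads the whole file into memory, so for large exports the streaming CSV.foreach form is likely the better fit.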

Not only spot on, but very educational as well. Top job - thanks!

ruby - Parsing a CSV file using different encodings and libraries - St...

ruby parsing csv google-adwords

Assuming you have a CSV in the following format:

Zipcode  Network ID  Network Name  Zone  New Network?  Display Name
64024    275         Kansas City   2     No            Kansas City
64034    275         Kansas City   2     No            Kansas City

You can use FasterCSV. If your CSV has a header row, specify :headers => true; you can then fetch the data row by row, as shown below:

FasterCSV.foreach(path_to_file, { :headers => true, :row_sep => :auto }) do |row|

Each iteration yields one row from your CSV file. Fields are indexed from zero, so "Network ID" (the second column) is row[1] and "Network Name" (the third column) is row[2]. With :headers => true you can also look them up by header, e.g. network_id = row['Network ID'] and network_name = row['Network Name'].
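Here is a complete version of that loop as a sketch. It uses the standard csv library (FasterCSV's successor since Ruby 1.9) instead of the gem, and it assumes the columns are comma-separated, which the flattened sample above does not confirm.

```ruby
require 'csv'

# The sample rows from above, assumed to be comma-separated.
data = <<~ROWS
  Zipcode,Network ID,Network Name,Zone,New Network?,Display Name
  64024,275,Kansas City,2,No,Kansas City
  64034,275,Kansas City,2,No,Kansas City
ROWS

networks = []
CSV.parse(data, headers: true, row_sep: :auto) do |row|
  network_id   = row[1]              # by zero-based position
  network_name = row['Network Name'] # or by header text
  networks << [network_id, network_name]
end

p networks
```

Looking fields up by header is usually more robust than by position, since it survives column reordering.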

Ruby csv file related operations - Stack Overflow

ruby

require 'csv'

csv_text = File.read('...')
csv = CSV.parse(csv_text, :headers => true)
csv.each do |row|
  Moulding.create!(row.to_hash)
end

You can put it in a Rake task, or in a controller action, or anywhere you like.

It worked perfectly. However, I have a beginner-level question: when I tried to look up the described methods in the Ruby and Rails API documentation, I was unable to find them (I looked at the official Ruby and Rails sites and API docs). E.g. I couldn't find what object CSV.parse() returns, and I didn't find the to_hash() and with_indifferent_access() methods... Maybe I looked in the wrong place or missed some basic principle of how to navigate the Ruby & Rails API docs. Can anyone share the best practice for reading the Ruby API docs?
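A quick way to answer the "what does it return" part is to ask the objects themselves. The sketch below uses made-up data and no Rails; it shows that CSV.parse with headers returns a CSV::Table, that each row is a CSV::Row, and that to_hash turns a row into a plain Hash keyed by header.

```ruby
require 'csv'

csv_text = "name,length\noak,120\n" # made-up sample data

table = CSV.parse(csv_text, headers: true)
row   = table.first

table_class = table.class   # CSV::Table
row_class   = row.class     # CSV::Row
attributes  = row.to_hash   # plain Hash, suitable for Model.create!

p table_class, row_class, attributes
```

Knowing the class names tells you where to look in the docs: CSV::Table and CSV::Row are documented under the csv standard library, while with_indifferent_access comes from Rails' ActiveSupport.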


Ruby on Rails - Import Data from a CSV file - Stack Overflow

ruby-on-rails csv import

Check out the smarter_csv gem, which has special options for handling huge files by reading data in chunks.

It also returns the CSV data as hashes, which can make it easier to insert or update the data in a database.
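For comparison, a similar chunked pattern can be approximated with only the standard library. This is not smarter_csv's API, just a sketch of the idea: CSV.foreach without a block returns an enumerator, and each_slice groups the rows into batches. The file contents and the batch size of 2 are made up.

```ruby
require 'csv'
require 'tempfile'

batch_sizes = []
Tempfile.create(['big', '.csv']) do |f|
  # Made-up file: a header line plus five data rows.
  f.write("id,name\n" + (1..5).map { |i| "#{i},item#{i}\n" }.join)
  f.flush

  # Stream the file and handle two rows at a time.
  CSV.foreach(f.path, headers: true).each_slice(2) do |batch|
    hashes = batch.map(&:to_h) # e.g. hand these to a bulk insert
    batch_sizes << hashes.size
  end
end

p batch_sizes
```

Only one batch is held in memory at a time, which is the main point of chunking a huge file.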

Why is Ruby CSV file reading very slow? - Stack Overflow

ruby csv

require 'csv'

CSV.foreach(data_file, headers: true) do |row|
  puts row.inspect # a CSV::Row, which can be used much like a hash
end

From there, you can manipulate the hash however you like.

(Tested with Ruby 2.0, but I think this has worked for quite a while.)

You say you don't have any headers - could you add a header line to the beginning of the file contents after reading them?

Technically, I think each row is an instance of CSV::Row, which acts like a Hash but doesn't actually inherit Hash.

Thanks for this! I could not find this option and I was banging my head against it quite hard

You can also use CSV.parse(data, headers: true).map(&:to_h) to similar effect, taking into account Jared's note above. This turns your CSV into an array of hashes with headers as keys. You may also want to toss in the option header_converters: :symbol to use symbols as keys instead of the column names as strings.
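Putting that one-liner together with the symbol converter, a minimal sketch on made-up data looks like this:

```ruby
require 'csv'

data = "name,qty\nfox,1\ncow,3\n" # made-up sample data

# Each CSV::Row becomes a real Hash; header_converters: :symbol makes the keys symbols.
records = CSV.parse(data, headers: true, header_converters: :symbol).map(&:to_h)

p records
```

Note that the values stay strings ("1", not 1) unless you also pass a converters: option.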

ruby - Convert CSV file into array of hashes - Stack Overflow

ruby csv multidimensional-array

It looks like your CSV file was produced from an Excel spreadsheet that has columns grouped like this:

... |        Rushing        |         Passing         | ...
... |Rushes|Gain|Loss|Net|TD|Att|Cmp|Int|Yards|TD|Conv| ...

(Not sure if I restored the groups properly.)

There are no standard tools for working with this kind of CSV file, as far as I know. You have to do the job manually.

  • Read the first line, treat it as first header line.
  • Read the second line, treat it as second header line.
  • Read the third line, treat it as first data line.
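The steps above can be sketched as follows. The sample is a made-up, shortened version of the grouped layout, and the merging rule (carry the last non-empty group name forward and join it to the column name) is an assumption about how the export leaves continuation cells blank.

```ruby
require 'csv'

# Made-up sample: group names on line 1, column names on line 2, data after that.
data = <<~ROWS
  Rushing,,Passing,
  Gain,Loss,Att,Cmp
  10,2,5,3
ROWS

rows = CSV.parse(data)
groups, names = rows.shift(2) # first and second header lines

# Forward-fill blank group cells, then build "Group Column" headers.
current_group = nil
headers = groups.zip(names).map do |group, name|
  current_group = group unless group.nil? || group.empty?
  [current_group, name].compact.join(' ')
end

# Whatever is left in rows is data.
records = rows.map { |row| headers.zip(row).to_h }

p headers
p records
```

Prefixing with the group name keeps columns such as the two "TD" fields distinct.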

Ruby: How can I read a CSV file that contains two headers in Ruby? - S...

ruby-on-rails ruby parsing csv

require 'csv'

CSV.open('test.csv', 'w',
    :write_headers => true,
    :headers => ["numerator", "denominator", "calculation"] # column headers
  ) do |csv|
  1.upto(12) do |numerator|
    1.upto(12) do |denominator|
      data_out = [numerator, denominator, numerator / denominator.to_f]
      csv << data_out
    end
  end
end

If you can't use the w option and you really need the a+ (e.g., the data isn't available all at once), then you could try the following trick:

require 'csv'

column_header = ["numerator", "denominator", "calculation"]
1.upto(12) do |numerator|
  1.upto(12) do |denominator|
    CSV.open('test.csv', 'a+',
        :write_headers => true,
        :headers => column_header
      ) do |csv|
      column_header = nil # no header after the first insertion
      data_out = [numerator, denominator, numerator / denominator.to_f]
      csv << data_out
    end
  end
end
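A variation on this trick is to decide from the file itself whether the header is still needed, so that re-running the script does not add a second header line. This is a sketch of that idea rather than part of the answer above; the helper name append_row and the temporary path are made up.

```ruby
require 'csv'
require 'tmpdir'

COLUMN_HEADER = %w[numerator denominator calculation].freeze

# Hypothetical helper: append one row, writing the header only when the
# file does not exist yet or is still empty.
def append_row(path, row)
  need_headers = !File.exist?(path) || File.zero?(path)
  CSV.open(path, 'a', write_headers: need_headers, headers: COLUMN_HEADER) do |csv|
    csv << row
  end
end

lines = nil
Dir.mktmpdir do |dir|
  path = File.join(dir, 'test.csv')
  append_row(path, [1, 2, 1 / 2.to_f])
  append_row(path, [1, 4, 1 / 4.to_f])
  lines = File.readlines(path)
end

puts lines
```

Because the check happens on every call, the rows can arrive at any time, including across separate runs.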

How to write columns header to a csv file with Ruby? - Stack Overflow

ruby ruby-on-rails-3 csv fastercsv

I'd recommend using the smarter_csv gem and manually providing the correct headers:

require 'smarter_csv'
options = {:user_provided_headers => ["Institution ID","Institution","Game Date","Uniform Number","Last Name","First Name", ... provide all headers here ... ],
           :headers_in_file => false}
data = SmarterCSV.process(filename, options)
data.shift # to ignore the first header line
data.shift # to ignore the second header line
# data now contains an array of hashes with your data

Please check the GitHub page for the options and examples: https://github.com/tilo/smarter_csv

One option you should use is :user_provided_headers; simply specify the headers you want in an array. This way you can work around cases like this.

Because the two header lines in the file are then read as data, you will have to call data.shift twice to drop them from the front of the array (data.pop would remove rows from the end instead).

Thanks I'll take a look!

Can you please upload a small sample CSV file somewhere, e.g. as a gist? It should be easy to add a feature to smarter_csv for that.

Yes, I can try to auto-merge the two header lines. The second line is two columns shorter than the first line... weird.

Probably the easiest way to do this is to use the :user_provided_headers option in smarter_csv.

Ruby: How can I read a CSV file that contains two headers in Ruby? - S...

ruby-on-rails ruby parsing csv

You'll have to write your own logic. CSV is really just rows and columns, and by itself it has no inherent idea of what each column or row really is; it's just raw data. Thus, CSV has no concept or awareness that it has two header rows. That's a human thing, so you'll need to build your own heuristics.

"721","Air Force","09/01/12",

When you start parsing your data, if the first column is supposed to hold an integer, try converting it to an int: if the result is > 0, then you know you're dealing with a valid data row and not a header.
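That heuristic might look like the sketch below. The two header lines are made up for illustration (only the data line comes from the sample above), and Integer(..., exception: false) needs Ruby 2.6 or later.

```ruby
require 'csv'

# Two made-up header lines followed by the sample data line.
data = <<~ROWS
  "Institution ID","Institution","Game Date"
  "","","Date"
  "721","Air Force","09/01/12"
ROWS

data_rows = CSV.parse(data).select do |row|
  # Integer(...) returns nil instead of raising when the text is not a number.
  id = Integer(row[0].to_s, 10, exception: false)
  id && id > 0
end

p data_rows
```

Using Integer rather than to_i matters here: "Institution ID".to_i is 0, which happens to work, but to_i would also accept text like "721abc" as 721.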

Thanks this is very helpful. I had looked through the Ruby CSV documentation and didn't see anything so I'm glad to see I'm not going blind!

Ruby: How can I read a CSV file that contains two headers in Ruby? - S...

ruby-on-rails ruby parsing csv