$ curl -s http://jamesabbottdd.com/examples/testfile.csv | xxd | head -n3
0000000: fffe 4300 6100 6d00 7000 6100 6900 6700 ..C.a.m.p.a.i.g.
0000010: 6e00 0900 4300 7500 7200 7200 6500 6e00 n...C.u.r.r.e.n.
0000020: 6300 7900 0900 4200 7500 6400 6700 6500 c.y...B.u.d.g.e.
CSV.foreach('./testfile.csv', :encoding => 'utf-16le:utf-8') do |row| ...
Both of these appear to work okay.
However, that gives me "invalid byte sequence in UTF-16LE (ArgumentError)" coming from inside the CSV library. I think this is because IO#gets, when called from CSV, returns only a single byte when it meets the BOM, which results in invalid UTF-16.
Not only spot on, but very educational as well. Top job - thanks!
The byte order mark fffe at the start suggests the file encoding is little-endian UTF-16, and the 00 bytes at every other position back this up.
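The same check can be made from Ruby rather than by reading the hex dump. This is only a rough sketch: utf16_byte_order is a hypothetical helper name, and the sample bytes are just the first six bytes of the dump above.

```ruby
# Hypothetical helper: report which UTF-16 byte order a leading BOM indicates.
# Returns nil when the data does not start with a UTF-16 BOM.
def utf16_byte_order(bytes)
  case bytes.byteslice(0, 2).b
  when "\xFF\xFE".b then :little_endian
  when "\xFE\xFF".b then :big_endian
  end
end

# First six bytes of the hex dump: the BOM, then "C" and "a".
head = ['fffe43006100'].pack('H*')
order = utf16_byte_order(head)

# Drop the two BOM bytes and decode the rest as UTF-16LE.
decoded = head.byteslice(2, head.bytesize).force_encoding('UTF-16LE').encode('UTF-8')
```

Here order should come out as :little_endian and decoded as "Ca", matching the start of "Campaign".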
This would suggest that you should be able to do this:
You can get CSV to strip off the BOM by using bom|utf-16le as the encoding:
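A minimal, self-contained sketch of that call follows. It does not fetch the real file; instead it writes a small stand-in with the same shape (BOM, UTF-16LE, tab-separated, as the 0900 bytes in the dump suggest). The header names are taken from the hex dump, while the data row and the col_sep option are assumptions for illustration.

```ruby
require 'csv'
require 'tempfile'

# Stand-in for testfile.csv: a BOM followed by tab-separated UTF-16LE text.
# The data row ("Spring", "GBP") is invented for this example.
text = "Campaign\tCurrency\nSpring\tGBP\n"
file = Tempfile.new(['testfile', '.csv'])
file.binmode
file.write("\xFF\xFE".b + text.encode('UTF-16LE').b)
file.close

rows = []
# 'bom|utf-16le' asks Ruby to consume a leading BOM before CSV sees the data.
CSV.foreach(file.path, encoding: 'bom|utf-16le', col_sep: "\t") do |row|
  rows << row
end
file.unlink

# The fields are still UTF-16LE strings; convert them only where needed.
header = rows.first.map { |field| field.encode('UTF-8') }
```

Note that the parsed fields stay in UTF-16LE with this form, so comparing them against ordinary UTF-8 literals will not match until they are converted.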
You might prefer to convert the string to a more familiar encoding instead, in which case you could do:
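A sketch of the converting form, again against an invented stand-in file rather than the real one. The 'utf-16le:utf-8' value means "read as UTF-16LE, hand back UTF-8"; the data row and the col_sep option are assumptions. As far as I can tell, this form does not strip the BOM by itself, so the sketch removes a leading U+FEFF from the first header defensively.

```ruby
require 'csv'
require 'tempfile'

# Stand-in for testfile.csv: BOM plus tab-separated UTF-16LE text.
# The data row is invented for this example.
text = "Campaign\tCurrency\nSpring\tGBP\n"
file = Tempfile.new(['testfile', '.csv'])
file.binmode
file.write("\xFF\xFE".b + text.encode('UTF-16LE').b)
file.close

rows = []
# External encoding UTF-16LE, internal encoding UTF-8: fields arrive as UTF-8.
CSV.foreach(file.path, encoding: 'utf-16le:utf-8', col_sep: "\t") do |row|
  rows << row
end
file.unlink

# Remove a BOM character from the first field if it survived transcoding.
first_header = rows.first.first.delete_prefix("\uFEFF")
```

If the stray U+FEFF is a nuisance, combining the two forms as 'bom|utf-16le:utf-8' may be worth trying, though I have not verified that against the original file.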