Reading DataFrames with non-UTF8 encoding in Julia

By: perfectionatic

Re-posted from: http://perfectionatic.org/?p=414

Recently I ran into problem where I was trying to read a CSV files from a Scandinavian friend into a DataFrame. I was getting errors it could not properly parse the latin1 encoded names.

I tried running

using DataFrames
dataT=readtable("example.csv", encoding=:latin1)

but the got this error

ArgumentError: Argument 'encoding' only supports ':utf8' currently.

The solution make use of (StringEncodings.jl)[https://github.com/nalimilan/StringEncodings.jl] to wrap the file data stream before presenting it to the readtable function.

f=open("example.csv","r")
s=StringDecoder(f,"LATIN1", "UTF-8")
dataT=readtable(s)
close(s)
close(f)

The StringDecoder generates an IO stream that appears to be utf8 for the readtable function.