Python Pandas: read_csv C-engine CParserError: Error tokenizing data


When writing a DataFrame to CSV and reading it back with read_csv like this:

import pandas as pd

df = pd.read_pickle('faulty_row.pkl')
df.to_csv('faulty_row.csv', encoding='utf8', index=False)
df = pd.read_csv('faulty_row.csv', encoding='utf8')  # read_csv is a pandas function, not a DataFrame method

You get the following exception:

CParserError: Error tokenizing data. C error: Buffer overflow caught - possible malformed input file.
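Before choosing a workaround, it can help to check whether the file contains lone carriage returns, one of the causes discussed in Solution 2. The sketch below is a hypothetical helper (`find_stray_carriage_returns` is not part of pandas). It assumes the file uses `\n` or `\r\n` line endings and reports any line whose content still contains a `\r`.

```python
import os
import tempfile


def find_stray_carriage_returns(path):
    """Hypothetical helper: return 1-based line numbers (lines split on b'\\n')
    that contain a carriage return which is not part of a trailing CRLF."""
    hits = []
    with open(path, "rb") as f:  # binary mode: no newline translation
        for lineno, raw in enumerate(f, start=1):
            # Remove a Windows-style CRLF or a plain LF line ending.
            if raw.endswith(b"\r\n"):
                body = raw[:-2]
            else:
                body = raw.rstrip(b"\n")
            if b"\r" in body:
                hits.append(lineno)
    return hits


# Usage: a small sample file whose second line hides a lone '\r'.
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write(b"a,b\n1,2\r3,4\n5,6\r\n")
    sample_path = tmp.name

stray_lines = find_stray_carriage_returns(sample_path)
os.remove(sample_path)
print(stray_lines)  # [2]
```

Reading in binary mode matters here: a text-mode read would already have translated the `\r` away and hidden the problem.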

Solution 1

Read the CSV with the Python parsing engine instead. It does not raise this exception, although it is slower than the default C engine:

df = pd.read_csv('faulty_row.csv', encoding='utf8', engine='python')
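Because the Python engine is slower, one option is to try the C engine first and fall back only when it fails. This is a minimal sketch. `read_csv_with_fallback` is a hypothetical helper, not a pandas API. It catches `pandas.errors.ParserError`, the exception class recent pandas versions raise in place of `CParserError`. It also assumes a file path is passed rather than an open buffer, so the retry reads the file from the beginning.

```python
import os
import tempfile

import pandas as pd
from pandas.errors import ParserError  # modern name for CParserError


def read_csv_with_fallback(path, **kwargs):
    """Hypothetical helper: parse with the fast C engine, and retry with the
    Python engine if the C tokenizer raises ParserError.

    `path` should be a file path so that the second attempt starts from the
    top of the file; do not pass `engine` in kwargs.
    """
    try:
        return pd.read_csv(path, engine="c", **kwargs)
    except ParserError:
        return pd.read_csv(path, engine="python", **kwargs)


# Usage with a small, well-formed sample file.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, encoding="utf8") as tmp:
    tmp.write("a,b\n1,2\n3,4\n")
    csv_path = tmp.name

df = read_csv_with_fallback(csv_path, encoding="utf8")
os.remove(csv_path)
print(df.shape)  # (2, 2)
```

Well-formed files keep the speed of the C engine, and only the files that trip the tokenizer pay the cost of the Python engine.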

Solution 2

If the second-to-last line of your file contains a stray '\r' line break, opening the file in universal-newline mode can fix the error:

with open('test.csv', encoding='utf-8', newline=None) as f:  # Python 3 equivalent of 'rU' (removed in Python 3.11)
    df = pd.read_csv(f, engine='c')
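To see what universal-newline mode does to the data, the sketch below writes a small sample file (made up for illustration) with a lone `\r` and reads it back through a text-mode handle. In Python 3, `newline=None` is the default for text files and translates `\r`, `\n` and `\r\n` into `\n` before pandas ever sees the text, which is the behaviour the old `'rU'` flag provided in Python 2.

```python
import os
import tempfile

import pandas as pd

# Sample file whose middle record ends with a lone '\r' instead of '\n'.
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write(b"a,b\n1,2\r3,4\n")
    csv_path = tmp.name

# Text mode with newline=None normalizes every line ending to '\n'.
with open(csv_path, encoding="utf-8", newline=None) as f:
    text_seen = f.read()  # what the parser will receive
    f.seek(0)             # rewind before handing the handle to pandas
    df = pd.read_csv(f, engine="c")

os.remove(csv_path)
print(repr(text_seen))  # 'a,b\n1,2\n3,4\n'
print(df.shape)         # (2, 2)
```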