Python Pandas: read_csv C-engine CParserError: Error tokenizing data

When using read_csv like this:

import pandas as pd
df = pd.read_pickle('faulty_row.pkl')
df.to_csv('faulty_row.csv', encoding='utf8', index=False)
pd.read_csv('faulty_row.csv', encoding='utf8')

You get the following exception:

CParserError: Error tokenizing data. C error: Buffer overflow caught - possible malformed input file.

Solution 1

You can read the CSV with the Python engine instead of the default C engine. It is slower, but it does not throw the exception:

pd.read_csv('faulty_row.csv', encoding='utf8', engine='python')
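If you want to keep the speed of the C engine for files that parse cleanly, one option is to try it first and only fall back to the Python engine on failure. The following is a minimal sketch, not part of the original solution: the helper name `read_csv_with_fallback` is hypothetical, and it assumes a pandas version that exposes the tokenizing error as `pd.errors.ParserError` (older releases called it `CParserError`).

```python
import pandas as pd


def read_csv_with_fallback(path, **kwargs):
    """Try the fast C engine first; retry with the Python engine on a parser error.

    Hypothetical helper for illustration only.
    """
    try:
        return pd.read_csv(path, engine='c', **kwargs)
    except pd.errors.ParserError:
        # The Python engine is slower but more tolerant of irregular input.
        return pd.read_csv(path, engine='python', **kwargs)
```

It would be called the same way as read_csv, for example `read_csv_with_fallback('faulty_row.csv', encoding='utf8')`.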

Solution 2

If the second-to-last line of your file contains a stray '\r' line break, opening the file in universal-newline mode can solve the error. In Python 2 this was the 'rU' mode. In Python 3 the 'U' flag is deprecated (and removed in 3.11); text mode with newline=None, which is the default, does the same thing:

with open('test.csv', 'r', newline=None, encoding='utf-8') as f:
    df = pd.read_csv(f, engine='c')
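To see what universal-newline mode does to a stray '\r', here is a small standard-library sketch. The file contents and the temporary path are made up for illustration; only the behaviour of the `newline` argument of `open()` is being shown.

```python
import os
import tempfile

# Write a small file whose second-to-last line ends in a bare '\r'.
path = os.path.join(tempfile.mkdtemp(), 'test.csv')
with open(path, 'wb') as f:
    f.write(b'a,b\n1,2\r3,4\n')

# newline=None (the default) translates '\r', '\n' and '\r\n' to '\n' on read.
with open(path, 'r', newline=None, encoding='utf-8') as f:
    translated = f.read()

# newline='' disables the translation, so the bare '\r' survives.
with open(path, 'r', newline='', encoding='utf-8') as f:
    raw = f.read()

print(repr(translated))  # 'a,b\n1,2\n3,4\n'
print(repr(raw))         # 'a,b\n1,2\r3,4\n'
```

With the translation applied, the parser sees three ordinary '\n'-terminated lines instead of a line with an embedded carriage return.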