But in the source CSV there appears to be no problem:
"Kelsey Long";"679 Lindsay Drive Suite 413
Rogersfort, OR 90448";"jessicagarcia@example.com";"566-408-9606x633";"1992-08-29";"Dixon PLC";"Designer, ceramics/pottery";"GB70EVGJ62649904070295";"213141750145257";"2009-06-22"
"Christopher Collins";"5240 Williams Forge Suite 570
Port Gary, WY 31010";"kennedybarbara@example.org";"(578)971-0366x10767";"1977-07-15";"Reed, Edwards and Nguyen";"Engineer, electrical";"GB18OYSE78644896429599";"4767664624172615290";"1973-09-21"
"Ronnie Giles";"USNS Brown
FPO AP 51442";"julia23@example.net";"001-632-763-2460x0516";"1997-11-28";"Holt-Hale";"Patent attorney";"GB19MXNA87198353574367";"30192887443942";"1993-11-30"
And in fact, if you try to load just the three rows above with the header prepended, it works without problems:
The bug does not occur with n_threads=1, so I believe this to be a bug in how we split up CSV files for parallel reading.
Ah yes, we do some looking around to see if we made a valid split. Will have to increase strictness there. If we cannot find valid splits, we fall back to a single-threaded read.
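To make the failure mode concrete, here is a minimal standard-library sketch of why a chunk boundary chosen at "the next newline" can land inside a quoted field, and how tracking quote parity avoids it. This is not polars' actual splitting code: `naive_split` and `quote_aware_split` are hypothetical helpers, and the parity check scans from the start of the data, which a real parallel reader would avoid by using cheaper heuristics.

```python
import csv
import io


def naive_split(data: str, pos: int) -> int:
    """Return the index just past the first newline at or after pos."""
    return data.index("\n", pos) + 1


def quote_aware_split(data: str, pos: int, quote: str = '"') -> int:
    """Return the index just past the first newline at or after pos
    that is not inside a quoted field."""
    # An odd number of quote characters before pos means pos sits inside
    # a quoted field (escaped quotes "" toggle twice, so parity still holds).
    in_quotes = data.count(quote, 0, pos) % 2 == 1
    for i in range(pos, len(data)):
        ch = data[i]
        if ch == quote:
            in_quotes = not in_quotes
        elif ch == "\n" and not in_quotes:
            return i + 1
    return len(data)


# Two records whose second column contains an embedded newline,
# in the same shape as the reproducer in this thread.
data = 'index,foo\n0,"ABCDE FGHIJ\nKLMNOP"\n1,"ABCDE FGHIJ\nKLMNOP"\n'

# Pretend a worker thread was handed byte offset 15, inside the first quoted field.
bad = data[naive_split(data, 15):]         # chunk starts mid-record
good = data[quote_aware_split(data, 15):]  # chunk starts at a record boundary

rows = list(csv.reader(io.StringIO(good)))
```

With the naive split, the chunk begins with the tail of a quoted field, which is the same kind of fragment that shows up in the `could not parse` error reported here. The quote-aware split starts at the next full record, so `rows` parses cleanly.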
test.csv works for me also, but this seems to reproduce the problem:
import polars as pl

N = 1041
pl.read_csv(
    pl.DataFrame({"foo": ["ABCDE FGHIJ\nKLMNOP"] * N})
    .with_row_index()
    .write_csv()
    .encode()
)
# ComputeError: could not parse `KLMNOP"` as dtype `i64` at column 'index' (column number 1)
Given the following
test.csv
if we try to load it as such:
We notice that the 5002nd entry is broken up across two rows: