
CSV write has SEGV when trying to write data 2GB or larger #129409

Closed

bdrosen96 opened this issue Jan 28, 2025 · 4 comments
Labels: 3.12 (bugs and security fixes), 3.13 (bugs and security fixes), 3.14 (new features, bugs and security fixes), extension-modules (C modules in the Modules dir), type-crash (a hard crash of the interpreter, possibly with a core dump)

Comments


bdrosen96 commented Jan 28, 2025

Crash report

What happened?

import csv

# One byte past 2 GiB: the smallest length whose index range
# no longer fits in a signed 32-bit int
bad_size = 2 * 1024 * 1024 * 1024 + 1
val = 'x' * bad_size

print("Total size of data {}".format(len(val)))
# INT_MAX, INT_MAX + 1, INT_MAX + 2
for size in [2147483647, 2147483648, 2147483649]:
    data = val[0:size]
    print("Trying to write data of size {}".format(len(data)))
    with open('dump.csv', 'w', newline='') as csvfile:
        spamwriter = csv.writer(csvfile, delimiter=',',
                                quotechar='|', quoting=csv.QUOTE_MINIMAL)
        spamwriter.writerow([data])
$ python dump2.py
Total size of data 2147483649
Trying to write data of size 2147483647
Trying to write data of size 2147483648
Segmentation fault (core dumped)

This happens with both Python 3.10 and 3.12.
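
As a stopgap on unpatched builds, the C writer can be bypassed for oversized rows by emitting the quoting by hand, since pure-Python string operations index with Py_ssize_t throughout. A minimal sketch, reusing the '|' quotechar from the repro (write_large_row is a hypothetical helper, not part of the csv module, and it always quotes instead of doing QUOTE_MINIMAL):

def write_large_row(f, fields, delimiter=',', quotechar='|'):
    # Quote every field and double any embedded quote characters, so the
    # >2 GiB string never reaches the overflowing loop in _csv.c.
    quoted = (quotechar + str(field).replace(quotechar, quotechar * 2) + quotechar
              for field in fields)
    f.write(delimiter.join(quoted) + '\r\n')

with open('dump.csv', 'w', newline='') as csvfile:
    write_large_row(csvfile, [data])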

When I reproduce this with python-dbg inside gdb, I see the following backtrace:

#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140737352495104) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=140737352495104) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=140737352495104, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007ffff7c42476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007ffff7c287f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x00007ffff7c2871b in __assert_fail_base (fmt=0x7ffff7ddd130 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", 
    assertion=0x814bed "index >= 0", file=0x7ee008 "../Include/cpython/unicodeobject.h", line=318, 
    function=<optimized out>) at ./assert/assert.c:92
#6  0x00007ffff7c39e96 in __GI___assert_fail (assertion=assertion@entry=0x814bed "index >= 0", 
    file=file@entry=0x7ee008 "../Include/cpython/unicodeobject.h", line=line@entry=318, 
    function=function@entry=0x97ae28 <__PRETTY_FUNCTION__.4.lto_priv.56> "PyUnicode_READ") at ./assert/assert.c:101
#7  0x00000000006d46c3 in PyUnicode_READ (index=-2147483648, data=0x7ffe772fd058, kind=1)
    at ../Include/cpython/unicodeobject.h:318
#8  join_append_data (self=self@entry=0x7ffff74a0050, field_kind=field_kind@entry=1, 
    field_data=field_data@entry=0x7ffe772fd058, field_len=field_len@entry=2147483648, quoted=quoted@entry=0x7fffffffd0ec, 
    copy_phase=copy_phase@entry=0) at ../Modules/_csv.c:1108
#9  0x00000000006d49ea in join_append (self=self@entry=0x7ffff74a0050, 
    field=field@entry='xxxx…' <2 GiB run of 'x' elided>, quoted=<optimized out>, quoted@entry=0) at ../Modules/_csv.c:1213
#10 0x00000000006d4c9a in csv_writerow (self=self@entry=0x7ffff74a0050, 
    seq=seq@entry=['xxxx…' <2 GiB run of 'x' elided>]) at ../Modules/_csv.c:1303
#11 0x000000000062b002 in _PyEval_EvalFrameDefault (tstate=0xcd8d80 <_PyRuntime+475008>, frame=0x7ffff7fb0020, throwflag=0)
    at Python/bytecodes.c:3094

CPython versions tested on:

3.12

Operating systems tested on:

Linux

Output from running 'python -VV' on the command line:

Python 3.12.8 (main, Dec 4 2024, 08:54:12) [GCC 11.4.0]

Linked PRs: GH-129413, GH-129436, GH-129437

bdrosen96 added the type-crash label on Jan 28, 2025
ZeroIntensity added the extension-modules, 3.12, 3.13, and 3.14 labels on Jan 28, 2025
bdrosen96 (Author)

I think the issue is here:

https://github.com/python/cpython/blob/3.12/Modules/_csv.c#L1077

This declares the loop variable i as a plain C int, which is 32 bits on most platforms, while the field length is a 64-bit Py_ssize_t. Once the length exceeds INT_MAX, i overflows and wraps negative, which is exactly the index=-2147483648 seen in the backtrace.
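
The wraparound is easy to demonstrate from Python with ctypes (purely illustrative; the real truncation happens to the C loop counter, and ctypes integer types deliberately do no overflow checking):

import ctypes

# c_int is 32 bits on common platforms; c_ssize_t matches Py_ssize_t
# (64 bits on a 64-bit build). ctypes truncates silently, mirroring C.
for length in (2147483647, 2147483648, 2147483649):
    print(length, 'as int:', ctypes.c_int(length).value,
          '| as Py_ssize_t:', ctypes.c_ssize_t(length).value)

On a 64-bit build this prints -2147483648 for 2147483648 as int, matching the negative index in the failed PyUnicode_READ assertion above.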

bdrosen96 (Author)

Patching it locally and building from source fixes the problem:

--- Modules/_csv.c	2024-02-06 15:19:44.000000000 -0500
+++ Modules/_csv.c.new	2025-01-28 11:39:16.165889509 -0500
@@ -1072,7 +1072,7 @@
                  int copy_phase)
 {
     DialectObj *dialect = self->dialect;
-    int i;
+    Py_ssize_t i;
     Py_ssize_t rec_len;
 
 #define INCLEN \
$ python dump2.py
Total size of data 2147483649
Trying to write data of size 2147483647
Trying to write data of size 2147483648
Trying to write data of size 2147483649
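
With the fix applied, a big-memory regression test along these lines would guard against reintroduction (a sketch only; the test that actually landed with the PR may differ, and this needs several GiB of RAM):

import csv
import io
import unittest

class TestLargeCSVField(unittest.TestCase):
    def test_writerow_field_over_2gb(self):
        # One byte past INT_MAX used to crash the C writer (gh-129409).
        field = 'x' * (2**31 + 1)
        buf = io.StringIO()
        csv.writer(buf).writerow([field])
        # Default lineterminator is '\r\n'; a run of 'x' needs no quoting.
        self.assertEqual(len(buf.getvalue()), len(field) + 2)

if __name__ == '__main__':
    unittest.main()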

sobolevn (Member) commented Jan 28, 2025

I agree, since i is only ever passed to PyUnicode_READ, which accepts a Py_ssize_t index, so widening it is safe.

srinivasreddy added a commit to srinivasreddy/cpython that referenced this issue on Jan 29, 2025

miss-islington pushed a commit to miss-islington/cpython that referenced this issue on Jan 29, 2025

…than 2GB in CSV file (pythonGH-129413)

(cherry picked from commit 97b0ef0)

Co-authored-by: Srinivas Reddy Thatiparthy (తాటిపర్తి శ్రీనివాస్ రెడ్డి) <thatiparthysreenivas@gmail.com>
sobolevn (Member)

Thanks for the report and detailed analysis, @bdrosen96!
Thanks to @srinivasreddy for the PR.

sobolevn pushed a commit that referenced this issue on Jan 29, 2025

gh-129409: Fix Integer overflow - SEGV while writing data more than 2GB in CSV file (GH-129413) (#129437)
(cherry picked from commit 97b0ef0)

Co-authored-by: Srinivas Reddy Thatiparthy (తాటిపర్తి శ్రీనివాస్ రెడ్డి) <thatiparthysreenivas@gmail.com>

sobolevn pushed a commit that referenced this issue on Jan 29, 2025

gh-129409: Fix Integer overflow - SEGV while writing data more than 2GB in CSV file (GH-129413) (#129436)
(cherry picked from commit 97b0ef0)

Co-authored-by: Srinivas Reddy Thatiparthy (తాటిపర్తి శ్రీనివాస్ రెడ్డి) <thatiparthysreenivas@gmail.com>