-
Notifications
You must be signed in to change notification settings - Fork 252
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PyRDP Convert to MP4: out of memory issues and corrupted video #352
Comments
https://ffmpeg.org/ffmpeg-formats.html#mov_002c-mp4_002c-ismv:
Specifically:
😮 |
After video encoding changes but before I started backporting Alex's patches. After a couple of minutes, the process started to consume around 800Mb. After 1 hour or so it was still around this use. After 3 hours it started increasing but sometimes it drops significantly. Here's a screenshot after 4h30m at around 1.8GB RES: |
Tried Pympler without great results. The TK reference browser didn't work. The summary views were very similar to gumpy's and didn't speak for themselves. Tried fil-profiler. Segfaulted and trying to get a backtrace from gdb annoyed me greatly (args, python dev symbols, etc.) so I let it go. Looked at valgrind and it felt like an adventure (compile python, using specific tooling to avoid noise in valgrind, etc.) so I skipped. Tried tracemalloc. Adding instrumentation around the diff --git a/pyrdp/convert/ReplayConverter.py b/pyrdp/convert/ReplayConverter.py
index 0e66ccd..98f3a2b 100644
--- a/pyrdp/convert/ReplayConverter.py
+++ b/pyrdp/convert/ReplayConverter.py
@@ -26,8 +26,23 @@ class ReplayConverter(Converter):
print("The input file is already a replay file. Nothing to do.")
sys.exit(1)
- for event, _ in progressbar(replay):
+ import tracemalloc
+ snapshot1 = tracemalloc.take_snapshot()
+ for i, (event, _) in enumerate(replay):
+ #for event, _ in progressbar(replay):
+
handler.onPDUReceived(event)
+ if i % 100 == 0:
+ snapshot2 = tracemalloc.take_snapshot()
+ top_stats = snapshot2.compare_to(snapshot1, 'lineno')
+
+ print("[ 100 events later ... Top 3 memory differences ]")
+ #for stat in top_stats[:10]:
+ for stat in top_stats[:3]:
+ print(stat)
+ snapshot1 = snapshot2
+
+
print(f"\n[+] Succesfully wrote '{outputPath}'")
handler.cleanup() and then calling the conversion using this shim: #!/usr/bin/python3
from pathlib import Path
import tracemalloc
from pyrdp.convert.ReplayConverter import ReplayConverter
tracemalloc.start()
converter = ReplayConverter(Path('/home/olivier/Documents/gosecure/src/pyrdp/pyrdp_output/replays/rdp_replay_20210826_12-15-33_512_Stephen215343.pyrdp'),
'', 'mp4')
converter.process()
snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics('lineno')
print("[ Top 10 ]")
for stat in top_stats[:10]:
print(stat) With the above we get a top 3 changes to memory consumption in the event processing tight loop every 100 events. This gave the following trace (with many many lines omitted for brievety):
Line 62 of buf = bytearray(width * height * 2) # <-- this is line 62
rle.bitmap_decompress(buf, width, height, data, 2)
image = QImage(buf, width, height, QImage.Format_RGB16) Why would this part of the code keep growing? If This It did:
Qt is no longer a top memory changer and there are no signs of uncontrolled memory growth. I read the documentation on Python C Extensions and realized that we need to manage the lifetime of the buffer variables passed to the
-- https://docs.python.org/3/c-api/arg.html#c.PyArg_ParseTuple Adding Bad: Good: I thought using a passed buffer was not pythonic so I wrote an alternate implementation of the diff --git a/ext/rle.c b/ext/rle.c
index 17e3226..35dbdb2 100644
--- a/ext/rle.c
+++ b/ext/rle.c
@@ -24,6 +24,7 @@
/* indent is confused by this file */
/* *INDENT-OFF* */
+#define PY_SSIZE_T_CLEAN
#include <Python.h>
/* Specific rename for RDPY integration */
@@ -918,6 +919,41 @@ bitmap_decompress(uint8 * output, int width, int height, uint8* input, int size,
/* *INDENT-ON* */
+static PyObject*
+bitmap_decompress_wrapper_ng(PyObject* self, PyObject* args)
+{
+ unsigned char* input;
+ Py_ssize_t input_len;
+ int width = 0, height = 0, bpp = 0;
+ unsigned char* dest;
+ int dest_size;
+ PyObject *result;
+
+ if (!PyArg_ParseTuple(args, "y#iii", &input, &input_len, &width, &height, &bpp))
+ return NULL;
+
+ dest_size = width * height * bpp;
+ dest = PyMem_Malloc (dest_size);
+ if (dest == NULL)
+ {
+ return PyErr_NoMemory();
+ }
+
+ if(bitmap_decompress((uint8*)dest, width, height, (uint8*)input, input_len, bpp) == False) {
+ return NULL;
+ }
+
+ result = PyByteArray_FromStringAndSize(dest, dest_size);
+
+ PyMem_Free (dest);
+ if (result == NULL)
+ {
+ return PyErr_NoMemory ();
+ }
+
+ return result;
+}
+
static PyObject*
bitmap_decompress_wrapper(PyObject* self, PyObject* args)
{
@@ -936,6 +972,7 @@ bitmap_decompress_wrapper(PyObject* self, PyObject* args)
static PyMethodDef rle_methods[] =
{
{"bitmap_decompress", bitmap_decompress_wrapper, METH_VARARGS, "decompress bitmap from microsoft rle algorithm."},
+ {"bitmap_decompress_ng", bitmap_decompress_wrapper_ng, METH_VARARGS, "decompress bitmap from microsoft rle algorithm."},
{NULL, NULL, 0, NULL}
};
Remember that everytime you make a change to This is where I go crazy: I added code that ran diff --git a/pyrdp/ui/qt.py b/pyrdp/ui/qt.py
index 05e07c4..64bc367 100644
--- a/pyrdp/ui/qt.py
+++ b/pyrdp/ui/qt.py
@@ -60,8 +60,15 @@ def RDPBitmapToQtImage(width: int, height: int, bitsPerPixel: int, isCompressed:
elif bitsPerPixel == 16:
if isCompressed:
buf = bytearray(width * height * 2)
+ # original
rle.bitmap_decompress(buf, width, height, data, 2)
- image = QImage(buf, width, height, QImage.Format_RGB16)
+ #image = QImage(buf, width, height, QImage.Format_RGB16)
+ # new
+ buf2 = rle.bitmap_decompress_ng(data, width, height, 2)
+ if buf != buf2:
+ # never triggered
+ import ipdb; ipdb.set_trace()
+ image = QImage(buf2, width, height, QImage.Format_RGB16)
else:
image = QImage(data, width, height, QImage.Format_RGB16).transformed(QMatrix(1.0, 0.0, 0.0, -1.0, 0.0, 0.0))
See the resulting video. I can share the replay file privately if there's interest. rdp_replay_20210826_12-15-33_512_Stephen215343.mp4The eventual release of the buffer by python's GC seem to corrupt the GDI drawing cache. But how can this happen given that it's a new object (QImage) that is returned for caching!? I also tried to make an explicit full copy of the bytearray and this didn't work either. I'm out of options. Maybe the |
from: https://doc.qt.io/qtforpython-5/PySide2/QtGui/QImage.html Tried to get the I added a reference to the buffer to the Bitmap cache. Reverted the code to use the previous working After looking for known QImage memory leaks for a while, I decided to do a minimal test: import rle
import tracemalloc
from PySide2.QtGui import QImage
tracemalloc.start()
snapshot1 = tracemalloc.take_snapshot()
width, height = (800, 600)
for i in range(1000):
buf = bytearray(width * height * 2)
#rle.bitmap_decompress(buf, width, height, bytearray(120), 2)
#image = QImage.fromData(buf, QImage.Format_RGB16)
image = QImage(buf, width, height, QImage.Format_RGB16)
if i % 100 == 0:
snapshot2 = tracemalloc.take_snapshot()
top_stats = snapshot2.compare_to(snapshot1, 'lineno')
print("[ 100 events later ... Top 3 memory differences ]")
for stat in top_stats[:3]:
print(stat)
snapshot1 = snapshot2 Without the Without
With
|
With the reference to the buffer in the BitmapCache, I tried again the Now, back to the initial memory issue, in the BitmapCache update I also added an explicit cache eviction on key updates. I'm not sure this is required. Let's benchmark. Converting Stephen replay with BitmapCache (QImage, buffer), old
Converting Stephen replay with BitmapCache (QImage, buffer), new
A drastic reduction on top 2 consumers. A 15x reduction in memory allocations for this conversion job! Converting Stephen replay with BitmapCache (QImage, buffer), new
Python does its job. Explicit eviction is not required. Dropping that code. |
Returns a bytearray and takes an immutable bytes object as input. Reference counting was checked to be valid. See #352 for the whole memory analysis
The code is in #353. I'm testing with a jumbo replay now. |
The conversion of an 11-hours long replay to MP4 completed succesfully this morning. I can watch the video, seek throught it. The conversion took 13.75 hours (21 CPU hours!) and I never saw RAM usage go beyond 685MB of RAM. The capture is 973MB and the resulting video is 3.4GB. The player seem to have benefited from these improvements as well. However, seeking past 6h can get slow until you get to recoverable UI freezes. Once caught up, scaling + 10x speed replaying was smooth. RAM usage below 400MB. |
Fixed with the merge of #353. |
I can reliably make the
pyrdp-convert.py
tool go out of memory:My OOM logs say:
So 10GB of RAM for a 1GB replay file:
@alxbl created patches to mitigate these issues. I'll take a look at them.
Additionally, the converted MP4 is unplayable, its
moov
atom was not written out to disk so the resulting file is useless. I will try to address that problem separately by using a streaming container format instead of a fixed MP4.The text was updated successfully, but these errors were encountered: