Carver and file signature for Shareaza download control files (.sd) #1816

Closed
wladimirleite opened this issue Aug 15, 2023 · 15 comments · Fixed by #1826

@wladimirleite
Member

While working on a CSAM case with very little concrete evidence, I used an external carver and a parser written by @felipecampanini, who is working on another similar case, and found very important Shareaza download control files (SDL).
A really great "byte-level" analysis by @felipecampanini!

These SDL files have information about ongoing Shareaza downloads.
In cases where the "main" Shareaza library files (like "Library1.dat", already parsed by IPED) were lost, these SDLs can be very important.

Another observation is that these SDL files are "one per download file", so they are small, while "Library1.dat" contains all the library entries in a single file.
In my case, all SDLs were recovered from file slacks. The suspect wiped the unallocated space, but the tool he used didn't seem to clean the slacks.

This issue is about the carver of these "Shareaza Downloads". 
I believe @felipecampanini is still working on the standalone parser we are using for now.
When it is finalized, we can create another issue to implement an IPED parser.

@lfcnassif
Member

Awesome work @felipecampanini and @tc-wleite! I don't remember well whether this is Shareaza's behavior, but in the past some file download/sharing apps used to append download info to the end of the file being downloaded and, when the download finished, truncate the file to its final size, so the download info was left in the file slack.
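A minimal sketch of why data appended past a file's final size can survive in slack: a file occupies whole clusters, so the bytes between its logical end and the end of its last cluster are not part of its content but are not necessarily overwritten either. Only the portion that falls inside the last allocated cluster stays in slack; clusters beyond it are released to unallocated space. Whether truncation leaves those bytes intact depends on the file system, and the append-then-truncate behavior is not confirmed for Shareaza above. The 4 KiB cluster size, the 10,000-byte file and the `SlackRange` helper are assumptions for illustration.

```java
/**
 * Hypothetical helper: computes how many slack bytes follow a file's logical end.
 * Assumes a cluster-based file system that allocates whole clusters per file.
 */
class SlackRange {

    /** Bytes allocated on disk: logical size rounded up to a whole number of clusters. */
    static long allocatedSize(long logicalSize, long clusterSize) {
        return ((logicalSize + clusterSize - 1) / clusterSize) * clusterSize;
    }

    /** Slack bytes between the logical end of the file and the end of its last cluster. */
    static long slackSize(long logicalSize, long clusterSize) {
        return allocatedSize(logicalSize, clusterSize) - logicalSize;
    }

    public static void main(String[] args) {
        long cluster = 4096;  // assumed cluster size
        long size = 10_000;   // assumed final (truncated) file size
        System.out.println(allocatedSize(size, cluster)); // 12288
        System.out.println(slackSize(size, cluster));     // 2288
    }
}
```

In this example, up to 2,288 bytes of whatever was written past offset 10,000 (such as appended download info) could remain in the last cluster's slack.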

@wladimirleite wladimirleite changed the title Carver for Shareaza download control files Carver and file signature for Shareaza download control files (.sd) Aug 17, 2023
wladimirleite added a commit that referenced this issue Aug 17, 2023
@wladimirleite
Member Author

wladimirleite commented Aug 17, 2023

@lfcnassif, a question about the current carving implementation for types that have only the header definition and a max length (i.e. no footer and no length information inside the file).
Suppose the header signature is xyz and maxLength is 12.
If we have the following sequence of bytes,

MNOPxyz0123xyzABCDEFGHIJKLM
    |----------|
    xyz0123xyzAB  <-- First carved item
           |----------|
           xyzABCDEFGHI  <-- Second carved item

Two entries will be carved, as expected, but the first one will contain the beginning of the second one, which I don't think would be the best guess.
To avoid missing part of the carved item, maxLength has to be set to a relatively large value (for the given file type). And the larger maxLength is, the higher the chance that this situation happens.

I changed this behavior in AbstractCarver (locally, for testing; not committed) to limit the length of these carved items (with maxLength only), so they end when a new hit is found for the same carver type.
In this example, the first carved item would be smaller, stopping when the second item starts.

MNOPxyz0123xyzABCDEFGHIJKLM
    |-----|
    xyz0123  <-- First carved item
           |----------|
           xyzABCDEFGHI  <-- Second carved item

If I didn't miss anything, currently only ARES, MOV and FLV use the default carver, with header and maximum length only.
For containers (like ZIP), the current behavior would make sense (a ZIP inside another). But for types whose end is not clear, I think the second behavior would make more sense.

@lfcnassif, is there a reason to keep only the first behavior?
What do you think about changing to second?
Another option would be to have an explicit parameter (in the carver configuration) to choose between these two behaviors.
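A minimal sketch of the two behaviors on the byte sequence from the diagrams, assuming the carver simply scans for header hits and assigns each hit a [start, end) range capped at maxLength. `HeaderOnlyCarver`, `carve` and the boolean `stopOnNextHeader` flag are hypothetical names for illustration, not IPED's AbstractCarver API.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical header-plus-maxLength carver (no footer, no embedded length). */
class HeaderOnlyCarver {

    /** Returns every offset where the header signature occurs in data. */
    static List<Integer> findHits(byte[] data, byte[] header) {
        List<Integer> hits = new ArrayList<>();
        outer:
        for (int i = 0; i + header.length <= data.length; i++) {
            for (int j = 0; j < header.length; j++) {
                if (data[i + j] != header[j]) continue outer;
            }
            hits.add(i);
        }
        return hits;
    }

    /**
     * Computes carved [start, end) ranges. With stopOnNextHeader=false each item
     * spans maxLength bytes (or up to the end of data); with true it is also cut
     * at the next header hit, whichever comes first.
     */
    static List<int[]> carve(byte[] data, byte[] header, int maxLength, boolean stopOnNextHeader) {
        List<Integer> hits = findHits(data, header);
        List<int[]> items = new ArrayList<>();
        for (int k = 0; k < hits.size(); k++) {
            int start = hits.get(k);
            int end = Math.min(start + maxLength, data.length);
            if (stopOnNextHeader && k + 1 < hits.size()) {
                end = Math.min(end, hits.get(k + 1));
            }
            items.add(new int[] {start, end});
        }
        return items;
    }

    static String show(byte[] data, int[] range) {
        return new String(data, range[0], range[1] - range[0], StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] data = "MNOPxyz0123xyzABCDEFGHIJKLM".getBytes(StandardCharsets.US_ASCII);
        byte[] header = "xyz".getBytes(StandardCharsets.US_ASCII);
        for (boolean stop : new boolean[] {false, true}) {
            System.out.println("stopOnNextHeader=" + stop);
            for (int[] r : carve(data, header, 12, stop)) {
                System.out.println("  " + show(data, r));
            }
        }
    }
}
```

With the flag off it prints `xyz0123xyzAB` and `xyzABCDEFGHI`; with it on, `xyz0123` and `xyzABCDEFGHI`, matching the two diagrams. Keeping the flag per carver type would let containers keep the first behavior.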

@lfcnassif
Member

lfcnassif commented Aug 17, 2023

> @lfcnassif, is there a reason to keep only the first behavior?
> What do you think about changing to second?
> Another option would be to have an explicit parameter (in the carver configuration) to choose between these two behaviors.

Hi @tc-wleite. Honestly, I don't remember whether there was a specific reason; I implemented the carving algorithm years ago. I think the idea was to carve as much info as possible, but I'm not sure...

Not only can ordinary containers like ZIP have multiple headers of embedded items; less common "containers" can too. For instance, JPEG files usually have thumbnails at multiple resolutions embedded in their Exif data.

I think the above may also happen with files without a clear footer. Of course, this is file-type specific, and the newly proposed approach seems useful. This would be a sensitive change, ideally tested against a reasonable set of images...

Since I think the current behavior can be useful depending on file type and the change may cause some regression, my vote is to have a configurable parameter.

A somewhat related old idea was to have a configurable parameter to recover files of types that have a footer in the configuration, even if the footer wasn't found within the specified max range. This would result in more carved files, which can be good, but also in more garbage, which is of course bad...
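A minimal sketch of that old idea, assuming a header+footer carver that currently drops a hit when the footer is not found within maxLength. `FooterFallbackCarver`, `carveAt` and the `keepIfFooterMissing` flag are hypothetical names for illustration; IPED's actual configuration keys are not shown here.

```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;

/** Hypothetical header+footer carving with an optional "keep without footer" fallback. */
class FooterFallbackCarver {

    /** First index of pattern fully inside [from, to), or -1 if absent. */
    static int indexOf(byte[] data, byte[] pattern, int from, int to) {
        for (int i = from; i + pattern.length <= to; i++) {
            boolean match = true;
            for (int j = 0; j < pattern.length && match; j++) {
                match = data[i + j] == pattern[j];
            }
            if (match) return i;
        }
        return -1;
    }

    /** Carved [start, end) range for a header found at start, if any. */
    static Optional<int[]> carveAt(byte[] data, int start, byte[] footer, int maxLength,
                                   boolean keepIfFooterMissing) {
        int limit = Math.min(start + maxLength, data.length);
        int f = indexOf(data, footer, start, limit);
        if (f >= 0) {
            return Optional.of(new int[] {start, f + footer.length}); // item includes the footer
        }
        // Footer not found within maxLength: drop the hit, or keep a maxLength-sized item.
        return keepIfFooterMissing ? Optional.of(new int[] {start, limit}) : Optional.empty();
    }

    static byte[] ascii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    static void print(Optional<int[]> r) {
        System.out.println(r.map(x -> "[" + x[0] + ", " + x[1] + ")").orElse("no item"));
    }

    public static void main(String[] args) {
        byte[] footer = ascii("END");
        print(carveAt(ascii("HDRabENDzz"), 0, footer, 20, false));  // [0, 8)
        print(carveAt(ascii("HDRabcdefgh"), 0, footer, 8, false)); // no item
        print(carveAt(ascii("HDRabcdefgh"), 0, footer, 8, true));  // [0, 8)
    }
}
```

The trade-off mentioned above shows up directly: the fallback turns every unmatched header into an item, so it recovers truncated files but also emits garbage for false-positive headers.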

@lfcnassif
Member

lfcnassif commented Aug 17, 2023

Just to be clear, my opinion can be wrong, just testing would tell us what is the best default behavior, but a configurable parameter seems interesting...

@lfcnassif
Member

> MOV

I think MOV is from QuickTime family and is handled by the same carver that handles MP4, 3GP and QT movies.

@wladimirleite
Member Author

> I think MOV is from QuickTime family and is handled by the same carver that handles MP4, 3GP and QT movies.

Sorry, MOV uses a custom carver indeed.

@wladimirleite
Member Author

I agree that the parameter would leave things more flexible and clearer.
I will slightly change the code I wrote so that it either keeps the current behavior (as default) or not, depending on a parameter.
What could be its name? breakOnNextHeader?

@lfcnassif
Member

> What could be its name? breakOnNextHeader?

Fine to me! Or maybe stopOnNextHeader.

@lfcnassif
Member

lfcnassif commented Aug 18, 2023

> Just to be clear, my opinion can be wrong, just testing would tell us what is the best default behavior, but a configurable parameter seems interesting...

On second thought, the current FLV (and Shareaza) carving is possibly merging two different files into the item that starts at the first header... For FLV this should not be hard to detect, since the video content, if playable, would change abruptly. Maybe we should run tests to evaluate this...

@lfcnassif
Member

> For FLV this should not be hard to detect

Actually it would be easy: the number of carved FLV files smaller than the max size would increase. Today we can already get some, when the parent item ends before reaching the max size.
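A minimal sketch of that metric, assuming sorted header offsets within one parent item. Without the new option an item is only shorter than maxLength when the parent ends first; with it, every item cut at a following header is also short, so the difference hints at how many carved files were previously merged with a neighbor. `ShortItemCounter` is a hypothetical helper and the offsets are made-up inputs, not real FLV data.

```java
import java.util.List;

/** Hypothetical helper: counts carved items shorter than maxLength. */
class ShortItemCounter {

    static int countShort(List<Integer> hits, int dataLength, int maxLength, boolean stopOnNextHeader) {
        int shortItems = 0;
        for (int k = 0; k < hits.size(); k++) {
            int start = hits.get(k);
            // Default behavior: cap at maxLength or at the end of the parent item.
            int end = Math.min(start + maxLength, dataLength);
            // Optional behavior: also stop at the next header hit.
            if (stopOnNextHeader && k + 1 < hits.size()) {
                end = Math.min(end, hits.get(k + 1));
            }
            if (end - start < maxLength) shortItems++;
        }
        return shortItems;
    }

    public static void main(String[] args) {
        List<Integer> hits = List.of(0, 300, 5_000); // sorted header offsets
        System.out.println(countShort(hits, 6_000, 1_000, false)); // 0
        System.out.println(countShort(hits, 6_000, 1_000, true));  // 1
    }
}
```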

@wladimirleite
Member Author

wladimirleite commented Aug 18, 2023

@lfcnassif, let me explain why I considered this other behavior (stopOnNextHeader).
These SD files are relatively small, but their sizes vary a lot (I have samples here of active files ranging from 200 bytes to 60 KB).
In the case where I found the carved files, with the current behavior a single carved item sometimes contains 2-4 original SDs. The file names are stored in Unicode, so IPED extracted them even without a parser. But the result mixed different file names/paths, which is odd. Obviously, once the parser is integrated, that issue would be solved.

Another inconvenience is that from the carved file with multiple SDs, other SDs (after the first one) are carved. Something like slack >> carved-123.sd >> carved-23.sd >> carved-3.sd, which can be confusing.

EDIT: Sorry, I repeated the test with more attention and in fact what happens is that 3 items were created, like slack >> carved-123.sd, slack >> carved-23.sd and slack >> carved-3.sd, i.e. the 3rd original SD appeared 3 times in the carved content.
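A minimal sketch reproducing that duplication, assuming three back-to-back 200-byte SDs in a 600-byte slack and a maxLength sized for the largest SD (about 60 KB). It counts how many carved items fully contain each original SD. `CarvedCopies` and its methods are hypothetical helpers, not IPED code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: how many carved items contain each original file. */
class CarvedCopies {

    /** Half-open byte range [start, end). */
    static final class Range {
        final int start;
        final int end;
        Range(int start, int end) { this.start = start; this.end = end; }
        boolean contains(Range other) { return start <= other.start && other.end <= end; }
    }

    /** Header-plus-maxLength carving over sorted hit offsets. */
    static List<Range> carve(List<Integer> hits, int dataLength, int maxLength, boolean stopOnNextHeader) {
        List<Range> items = new ArrayList<>();
        for (int k = 0; k < hits.size(); k++) {
            int start = hits.get(k);
            int end = Math.min(start + maxLength, dataLength);
            if (stopOnNextHeader && k + 1 < hits.size()) {
                end = Math.min(end, hits.get(k + 1));
            }
            items.add(new Range(start, end));
        }
        return items;
    }

    /** Number of carved items that fully contain the original file. */
    static int timesCarved(Range original, List<Range> carved) {
        int n = 0;
        for (Range r : carved) {
            if (r.contains(original)) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        List<Range> originals = List.of(new Range(0, 200), new Range(200, 400), new Range(400, 600));
        List<Integer> hits = List.of(0, 200, 400);
        for (boolean stop : new boolean[] {false, true}) {
            List<Range> carved = carve(hits, 600, 60_000, stop);
            StringBuilder sb = new StringBuilder("stopOnNextHeader=" + stop + ":");
            for (Range o : originals) sb.append(' ').append(timesCarved(o, carved));
            System.out.println(sb); // false: 1 2 3, true: 1 1 1
        }
    }
}
```

With the current behavior the third SD appears in all three carved items, as in the test described above; stopping at the next header leaves each SD in exactly one item.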

@lfcnassif
Member

Thanks @tc-wleite I got it. I realized different FLV files could be merged together (as you explained before).

> Another inconvenience is that from the carved file with multiple SDs, other SDs (after the first one) are carved. Something like slack >> carved-123.sd >> carved-23.sd >> carved-3.sd, which can be confusing.

This is odd, because we have a rule to skip carving on carved files:

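    protected boolean isToProcess(IItem evidence) {
        // Skip items that were themselves carved or are file fragments,
        // and media types that the carver configuration does not cover.
        if (evidence.isCarved() || evidence.getExtraAttribute(BaseCarveTask.FILE_FRAGMENT) != null
                || !carverConfig.isToProcess(evidence.getMediaType())) {
            return false;
        }
        return true;
    }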

@wladimirleite
Copy link
Member Author

> This is odd, because we have a rule to skip carving on carved files:

Hmm, I thought we had that, but I didn't check the code.
Let me check what is going on.

@wladimirleite
Copy link
Member Author

I am sorry @lfcnassif, I ran the test again (without stopOnNextHeader) and carving from carved items didn't happen.
I edited my previous message.

@lfcnassif
Copy link
Member

lfcnassif commented Aug 19, 2023

Don't worry @tc-wleite, thanks for testing again. After you finish the stopOnNextHeader option, I think we can test FLV carving at least with it and compare with current results, I can do that.
