-
Notifications
You must be signed in to change notification settings - Fork 165
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
mach: Support archive entries in fat binaries #322
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I've been hesitant in the past to add any more binary files to goblin, but it's probably ok. However, could you strip the binaries to make sure they're minimally sized? 110kb seems like a lot to me for single function.
I'd also suggest changing the C lib as suggested in my comment.
@@ -0,0 +1,7 @@ | |||
// This is a file use to compile some of the binaries in this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm just noticing we have both assets and tests/bin/
but whatever, not your fault
examples/dyldinfo.rs
Outdated
@@ -125,12 +126,18 @@ fn print_multi_arch( | |||
if let Some((cputype, _)) = mach::constants::cputype::get_arch_from_flag(&arch) { | |||
for bin in multi_arch.into_iter() { | |||
match bin { | |||
Ok(bin) => { | |||
Ok(MultiArchEntry::MachO(bin)) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the name MultiArchEntry
, is this some kind of "prior art", e.g., did you see this in LLVM, or in otool/lipo references? Wondering about different possible names, for example, I'll show enum expanded with the Macho variant:
Entry::MachO
MultiBin::MachO
MultiEntry::MachO
ArchEntry::MachO
BinaryKind::MachO
EntryKind::MachO
I personally like what you have, and 1, 4, and 5. Do you have any thoughts on names here?
This is not the most interesting to talk about names here, but since this will become a public API, will be good to take small time to think about alternatives :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just made it up. Having a look around, I can't find reference to them in LLVM. In lipo
, the man pages say this:
lipo supports a number of commands for creating universal files from single-architecture files, extracting single-architecture files from universal files, and displaying architecture information.
So perhaps SingleArch
, then you could get a collection of SingleArch
's from a MultiArch
?
Another alternative would be something to do with "member" because multi-arch binaries are expanded archives and archives are made up of "members" (or at least that's the language ar
uses).
Of the options you listed, I quite like 4. I think that the name indicating that this is a particular arch is good.
src/mach/mod.rs
Outdated
/// Iterator over every `MachO` binary contained in this `MultiArch` container | ||
pub struct MachOIterator<'a> { | ||
/// Iterator over every entry contained in this `MultiArch` container | ||
pub struct MultiArchEntryIterator<'a> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto name here; I'm glad you named this after the Variant above, so if that changes, we should keep it consistent like you've done here
src/mach/mod.rs
Outdated
#[test] | ||
fn parse_multi_arch_of_macho_binaries() { | ||
// Create via: | ||
// gcc -arch arm64 -o /tmp/hello_world_arm hello_world.c |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang is likely more accurate here on macos systems?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair call, gcc
is really clang
on my laptop apparently 😁
$ gcc --version
Apple clang version 13.1.6 (clang-1316.0.21.2.5)
Target: arm64-apple-darwin21.1.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin
} | ||
|
||
#[test] | ||
fn parse_multi_arch_of_archives() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is great
assets/hello_world.c
Outdated
// directory for testing purposes. | ||
#include <stdio.h> | ||
|
||
int main(int argc, char *argv[]) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this is the best example file for .so + .a files;
I'd expect something more like:
extern int foo(int x) {
return x * 2;
}
Since it's meant to have some public symbol that other downstream binaries/libraries link against and use.
MachO does allow multi-arch executables (i.e., something with a main and the executable flags set in macho headers, etc.), but I'd rather test the .so + .a versions, so executables are just .so's with special header values.
src/mach/mod.rs
Outdated
for entry in fat.into_iter() { | ||
let entry = entry.expect("failed to read entry"); | ||
match entry { | ||
MultiArchEntry::Archive(_) => (), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it would be good to assert existence of > 1 symbol perhaps, just as basic sanity test?
src/mach/mod.rs
Outdated
for entry in fat.into_iter() { | ||
let entry = entry.expect("failed to read entry"); | ||
match entry { | ||
MultiArchEntry::MachO(_) => (), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto might be good to assert > 1 symbol as sanity test
this looks great, other than some minor comments i had |
re this CI failure: https://github.com/m4b/goblin/runs/7601981804?check_suite_focus=true#step:10:52 I think you'll have to make macho32 + macho64 depend on archive feature? |
Thanks for the quick review! I don’t think I’ll have a chance to come back to this today. Hopefully tomorrow though |
No rush :) |
2733aa0
to
0c78713
Compare
src/lib.rs
Outdated
@@ -201,6 +201,32 @@ pub mod container { | |||
} | |||
} | |||
|
|||
/// Takes a reference to the first 16 bytes of the total bytes slice and convert it to an array for `peek_bytes` to use. | |||
/// Returns None if bytes's length is less than 16. | |||
pub fn take_hint_bytes(bytes: &[u8]) -> Option<&[u8; 16]> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made this public, rather than working through the feature combination that would stop the compiler complaining about it being unused, because I felt it'd be a bit more maintainable - totally your call though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry I should have caught this earlier, can you make this private and then just do:
#[allow(unused)]
to silence any warnings for any of the feature combos.
I'd like to have bare pub functions as small as possible, and etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense 👍
I've tried stripping the binaries but I get this error:
|
I think I've addressed all you comments, except for the naming of |
Ok I think I like |
0c78713
to
dd0b76b
Compare
I've updated it to be |
Actually, just playing around with this branch on my linker project and I've broken something in the code that looks at the hint bytes to determine the file type. |
bb9bcc1
to
9d7dd58
Compare
Okay. The issue was that I'd missed the check for fat binaries in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok last thing is to fix the spelling error and remove pub on the hint function as suggested, and this is good to go. thanks for your patience!
src/lib.rs
Outdated
@@ -201,6 +201,32 @@ pub mod container { | |||
} | |||
} | |||
|
|||
/// Takes a reference to the first 16 bytes of the total bytes slice and convert it to an array for `peek_bytes` to use. | |||
/// Returns None if bytes's length is less than 16. | |||
pub fn take_hint_bytes(bytes: &[u8]) -> Option<&[u8; 16]> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry I should have caught this earlier, can you make this private and then just do:
#[allow(unused)]
to silence any warnings for any of the feature combos.
I'd like to have bare pub functions as small as possible, and etc.
src/mach/mod.rs
Outdated
/// Iterator over every `MachO` binary contained in this `MultiArch` container | ||
pub struct MachOIterator<'a> { | ||
/// Iterator over every entry contained in this `MultiArch` container | ||
pub struct SingelArchIterator<'a> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is mispelled, should be SIngle not Singel
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤦 good catch, thanks
Multi-arch containers can be made up of archives or Mach-O binaries. This adds support for archives. It is a breaking change because previously the `MachO` struct was returned and now we're returning a new enum: `SingleArch`. This required some refactoring of the `lib.rs` file to share the required functions and data structures for parsing the hint at the top of files. `take_hint_bytes`, `Hint` and `HintData` dont't require any special features but I think were inside the `if_everything!` because that's the only case they were used in. I did expand the API by making `take_byte_hints` public, this was mainly because I thought it was better solution than trying to maintain the various combinations of features required to stop the compiler warning about the function being unused. It's also a function that may be useful for goblin users. resolves m4b#320
This tests parsing fat binaries made up of Mach-O binaries and archives. I've checked in the binaries to make testing easier as they're quite small (both are built from the hello_world.c file in the same directory). Above the tests themselves are instructions for how to compile the binaries they use.
9d7dd58
to
4021007
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
@nick96 thanks so much for this PR and your patience! Are you ok to wait a bit for a release (since this is a breaking change, as I mentioned, I like to roll them up if possible, though ideally they become less and less) |
No worries on both counts 😊. Makes sense to batch them up. |
@nick96 thank you so much for your patience, this is now released in 0.6.0 :) |
Adds support for archives entries in a fat binary. I ended up going down the breaking change route because I wasn't sure of a good way of doing it without one. Super happy to do another pass on this if you think there is a way!
Basically, there's just an enum
MultiArchEntry
with variants for aMachO
or anArchive
. All the functions for retrieving entries inMultiArch
now return this enum, rather thanMachO
. This does mean you could technically represent a fat binary with a mixture of binaries and archives even though I'm not sure if that is actually allowed.Testing-wise, I've added tests for parsing fat binaries (both the binary and archive variants). They require binaries to be checked in which I've done. I thought it was easiest to check them in as they're pretty small and I saw there was already a DLL file checked in. Alternatively, we could build them as part of the test if you'd prefer, though that would mean that the tests could only run on MacOS because I don't think
lipo
exists on Linux.