Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mach: Support archive entries in fat binaries #322

Merged
merged 2 commits into from
Sep 12, 2022

Conversation

nick96
Copy link
Contributor

@nick96 nick96 commented Jul 31, 2022

Adds support for archives entries in a fat binary. I ended up going down the breaking change route because I wasn't sure of a good way of doing it without one. Super happy to do another pass on this if you think there is a way!

Basically, there's just an enum MultiArchEntry with variants for a MachO or an Archive. All the functions for retrieving entries in MultiArch now return this enum, rather than MachO. This does mean you could technically represent a fat binary with a mixture of binaries and archives even though I'm not sure if that is actually allowed.

Testing-wise, I've added tests for parsing fat binaries (both the binary and archive variants). They require binaries to be checked in which I've done. I thought it was easiest to check them in as they're pretty small and I saw there was already a DLL file checked in. Alternatively, we could build them as part of the test if you'd prefer, though that would mean that the tests could only run on MacOS because I don't think lipo exists on Linux.

Copy link
Owner

@m4b m4b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I've been hesitant in the past to add any more binary files to goblin, but it's probably ok. However, could you strip the binaries to make sure they're minimally sized? 110kb seems like a lot to me for single function.
I'd also suggest changing the C lib as suggested in my comment.

@@ -0,0 +1,7 @@
// This is a file use to compile some of the binaries in this
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just noticing we have both assets and tests/bin/ but whatever, not your fault

@@ -125,12 +126,18 @@ fn print_multi_arch(
if let Some((cputype, _)) = mach::constants::cputype::get_arch_from_flag(&arch) {
for bin in multi_arch.into_iter() {
match bin {
Ok(bin) => {
Ok(MultiArchEntry::MachO(bin)) => {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the name MultiArchEntry, is this some kind of "prior art", e.g., did you see this in LLVM, or in otool/lipo references? Wondering about different possible names, for example, I'll show enum expanded with the Macho variant:

  1. Entry::MachO
  2. MultiBin::MachO
  3. MultiEntry::MachO
  4. ArchEntry::MachO
  5. BinaryKind::MachO
  6. EntryKind::MachO

I personally like what you have, and 1, 4, and 5. Do you have any thoughts on names here?

This is not the most interesting to talk about names here, but since this will become a public API, will be good to take small time to think about alternatives :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just made it up. Having a look around, I can't find reference to them in LLVM. In lipo, the man pages say this:

lipo supports a number of commands for creating universal files from single-architecture files, extracting single-architecture files from universal files, and displaying architecture information.

So perhaps SingleArch, then you could get a collection of SingleArch's from a MultiArch?

Another alternative would be something to do with "member" because multi-arch binaries are expanded archives and archives are made up of "members" (or at least that's the language ar uses).

Of the options you listed, I quite like 4. I think that the name indicating that this is a particular arch is good.

src/mach/mod.rs Outdated
/// Iterator over every `MachO` binary contained in this `MultiArch` container
pub struct MachOIterator<'a> {
/// Iterator over every entry contained in this `MultiArch` container
pub struct MultiArchEntryIterator<'a> {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto name here; I'm glad you named this after the Variant above, so if that changes, we should keep it consistent like you've done here

src/mach/mod.rs Outdated
#[test]
fn parse_multi_arch_of_macho_binaries() {
// Create via:
// gcc -arch arm64 -o /tmp/hello_world_arm hello_world.c
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang is likely more accurate here on macos systems?

Copy link
Contributor Author

@nick96 nick96 Aug 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair call, gcc is really clang on my laptop apparently 😁

$ gcc --version
Apple clang version 13.1.6 (clang-1316.0.21.2.5)
Target: arm64-apple-darwin21.1.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

}

#[test]
fn parse_multi_arch_of_archives() {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is great

// directory for testing purposes.
#include <stdio.h>

int main(int argc, char *argv[]) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is the best example file for .so + .a files;

I'd expect something more like:

extern int foo(int x) {
  return x * 2;
}

Since it's meant to have some public symbol that other downstream binaries/libraries link against and use.

MachO does allow multi-arch executables (i.e., something with a main and the executable flags set in macho headers, etc.), but I'd rather test the .so + .a versions, so executables are just .so's with special header values.

src/mach/mod.rs Outdated
for entry in fat.into_iter() {
let entry = entry.expect("failed to read entry");
match entry {
MultiArchEntry::Archive(_) => (),
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be good to assert existence of > 1 symbol perhaps, just as basic sanity test?

src/mach/mod.rs Outdated
for entry in fat.into_iter() {
let entry = entry.expect("failed to read entry");
match entry {
MultiArchEntry::MachO(_) => (),
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto might be good to assert > 1 symbol as sanity test

@m4b
Copy link
Owner

m4b commented Jul 31, 2022

this looks great, other than some minor comments i had

@m4b
Copy link
Owner

m4b commented Jul 31, 2022

re this CI failure: https://github.com/m4b/goblin/runs/7601981804?check_suite_focus=true#step:10:52

I think you'll have to make macho32 + macho64 depend on archive feature?

@nick96
Copy link
Contributor Author

nick96 commented Aug 1, 2022

Thanks for the quick review! I don’t think I’ll have a chance to come back to this today. Hopefully tomorrow though

@m4b
Copy link
Owner

m4b commented Aug 1, 2022

No rush :)

@nick96 nick96 force-pushed the multi-arch-with-archives branch from 2733aa0 to 0c78713 Compare August 3, 2022 11:52
src/lib.rs Outdated
@@ -201,6 +201,32 @@ pub mod container {
}
}

/// Takes a reference to the first 16 bytes of the total bytes slice and convert it to an array for `peek_bytes` to use.
/// Returns None if bytes's length is less than 16.
pub fn take_hint_bytes(bytes: &[u8]) -> Option<&[u8; 16]> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made this public, rather than working through the feature combination that would stop the compiler complaining about it being unused, because I felt it'd be a bit more maintainable - totally your call though.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry I should have caught this earlier, can you make this private and then just do:
#[allow(unused)] to silence any warnings for any of the feature combos.
I'd like to have bare pub functions as small as possible, and etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense 👍

@nick96
Copy link
Contributor Author

nick96 commented Aug 3, 2022

However, could you strip the binaries to make sure they're minimally sized? 110kb seems like a lot to me for single function.

I've tried stripping the binaries but I get this error:

strip: error: symbols referenced by indirect symbol table entries that can't be stripped in: /Users/nspain/devel/personal/goblin/assets/hello_world_fat_binaries (for architecture x86_64)
_printf

@nick96
Copy link
Contributor Author

nick96 commented Aug 3, 2022

I think I've addressed all you comments, except for the naming of MultiArchEntry. I'll update that once we reach a decision

@m4b
Copy link
Owner

m4b commented Aug 15, 2022

Ok I think I like SingleArch the best so far, let's proceed with that; in case anyone else is monitoring this PR, please add your opinion too :)

@nick96 nick96 force-pushed the multi-arch-with-archives branch from 0c78713 to dd0b76b Compare August 15, 2022 10:04
@nick96
Copy link
Contributor Author

nick96 commented Aug 16, 2022

I've updated it to be SingleArch. Ready for re-review when you're free!

@nick96
Copy link
Contributor Author

nick96 commented Aug 16, 2022

Actually, just playing around with this branch on my linker project and I've broken something in the code that looks at the hint bytes to determine the file type.

@nick96 nick96 force-pushed the multi-arch-with-archives branch 2 times, most recently from bb9bcc1 to 9d7dd58 Compare August 16, 2022 11:10
@nick96
Copy link
Contributor Author

nick96 commented Aug 16, 2022

Okay. The issue was that I'd missed the check for fat binaries in macho::peek_bytes, fixed now. What I thought was an archive was actually a fat binary 🤦

Copy link
Owner

@m4b m4b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok last thing is to fix the spelling error and remove pub on the hint function as suggested, and this is good to go. thanks for your patience!

src/lib.rs Outdated
@@ -201,6 +201,32 @@ pub mod container {
}
}

/// Takes a reference to the first 16 bytes of the total bytes slice and convert it to an array for `peek_bytes` to use.
/// Returns None if bytes's length is less than 16.
pub fn take_hint_bytes(bytes: &[u8]) -> Option<&[u8; 16]> {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry I should have caught this earlier, can you make this private and then just do:
#[allow(unused)] to silence any warnings for any of the feature combos.
I'd like to have bare pub functions as small as possible, and etc.

src/mach/mod.rs Outdated
/// Iterator over every `MachO` binary contained in this `MultiArch` container
pub struct MachOIterator<'a> {
/// Iterator over every entry contained in this `MultiArch` container
pub struct SingelArchIterator<'a> {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is mispelled, should be SIngle not Singel

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤦 good catch, thanks

Nick Spain added 2 commits September 12, 2022 14:29
Multi-arch containers can be made up of archives or Mach-O
binaries. This adds support for archives. It is a breaking change
because previously the `MachO` struct was returned and now we're
returning a new enum: `SingleArch`.

This required some refactoring of the `lib.rs` file to share the
required functions and data structures for parsing the hint at the top
of files. `take_hint_bytes`, `Hint` and `HintData` dont't require any
special features but I think were inside the `if_everything!` because
that's the only case they were used in. I did expand the API by making
`take_byte_hints` public, this was mainly because I thought it was
better solution than trying to maintain the various combinations of
features required to stop the compiler warning about the function
being unused. It's also a function that may be useful for goblin users.

resolves m4b#320
This tests parsing fat binaries made up of Mach-O binaries and
archives. I've checked in the binaries to make testing easier as
they're quite small (both are built from the hello_world.c file in the
same directory). Above the tests themselves are instructions for how
to compile the binaries they use.
@nick96 nick96 force-pushed the multi-arch-with-archives branch from 9d7dd58 to 4021007 Compare September 12, 2022 04:29
Copy link
Owner

@m4b m4b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@m4b m4b merged commit e2b5207 into m4b:master Sep 12, 2022
@m4b
Copy link
Owner

m4b commented Sep 12, 2022

@nick96 thanks so much for this PR and your patience! Are you ok to wait a bit for a release (since this is a breaking change, as I mentioned, I like to roll them up if possible, though ideally they become less and less)

@nick96
Copy link
Contributor Author

nick96 commented Sep 12, 2022

@nick96 thanks so much for this PR and your patience! Are you ok to wait a bit for a release (since this is a breaking change, as I mentioned, I like to roll them up if possible, though ideally they become less and less)

No worries on both counts 😊. Makes sense to batch them up.

@m4b
Copy link
Owner

m4b commented Oct 24, 2022

@nick96 thank you so much for your patience, this is now released in 0.6.0 :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants