Emscripten ABI compatibility checks for side modules? #15917

Open
hoodmane opened this issue Jan 7, 2022 · 19 comments

@hoodmane
Collaborator

hoodmane commented Jan 7, 2022

There has been discussion with the CPython folks about trying to add wasm32-emscripten wheels to PyPI. For this, we would like to have some way to decide which wheels are compatible with which main modules. Looking around, I see there used to be EMSCRIPTEN_ABI_MAJOR and EMSCRIPTEN_ABI_MINOR but they were removed.

Suppose we compile a main module and a side module at separate times using possibly different versions of Emscripten. Is there any way to check whether they are compatible? Could such a feature be added? What will happen at load time if they are not compatible?

@hoodmane changed the title from "Emscripten ABI compatibility checks" to "Emscripten ABI compatibility checks for side modules?" on Jan 7, 2022
@sbc100
Collaborator

sbc100 commented Jan 7, 2022

As of now we don't have any guarantees like that. Would it be possible to treat the entire emscripten version as the ABI version for now? Or perhaps that would be too limiting? Perhaps it would work if you selected just a few specific emscripten releases to support?

Can you explain the use case a little more? Is the idea that folks who use the emscripten version of python would be able to download pre-built side modules? Is there some way to disable pre-built binaries and have the side module always compile from source instead? Or might these users not even have emscripten itself installed?

@hoodmane
Collaborator Author

hoodmane commented Jan 7, 2022

This is still in an exploratory phase, so we don't have a completely clear set of requirements (and other people involved might have slightly different ideas about what our requirements are).

> Can you explain the use case a little more? Is the idea that folks who use the emscripten version of python would be able to download pre-built side modules?

Yeah exactly. The way Pyodide currently works is that we build a bunch of side modules with Python packages in tree with the Python interpreter.

The way the broader Python system works is that different people build wheels for different targets -- packages build and distribute their own wheels, wheels for Raspberry Pi are built by people other than the original author and distributed on https://www.piwheels.org/, etc. In order to try to make this work, wheels are tagged with some platform information which tries to indicate which systems they can run on.
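As a rough illustration of how a platform tag could carry this information, here is a minimal sketch that combines the tagging idea with the earlier suggestion of treating the whole Emscripten version as the ABI version. The tag layout `emscripten_<major>_<minor>_<patch>_wasm32` and the helper names `parse_platform_tag` and `wheel_is_compatible` are assumptions made for this example, not an agreed standard.

```python
from __future__ import annotations

import re

# Hypothetical tag layout: emscripten_<major>_<minor>_<patch>_wasm32.
# This is an assumption for illustration, not an agreed-upon standard.
TAG_RE = re.compile(r"^emscripten_(\d+)_(\d+)_(\d+)_wasm32$")


def parse_platform_tag(tag: str) -> tuple[int, int, int]:
    """Return the Emscripten version encoded in a hypothetical platform tag."""
    match = TAG_RE.match(tag)
    if match is None:
        raise ValueError(f"not an Emscripten platform tag: {tag!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def wheel_is_compatible(wheel_tag: str, main_module_version: tuple[int, int, int]) -> bool:
    """Treat the full Emscripten version as the ABI version: exact match only."""
    try:
        return parse_platform_tag(wheel_tag) == main_module_version
    except ValueError:
        # Tags for other platforms are never compatible with an Emscripten main module.
        return False
```

With an exact-match rule like this, a wheel tagged for 3.1.52 would be rejected by a main module built with 3.1.38 instead of failing at load time.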

The Python people suggest that in the long term we could add Emscripten as a supported wheel platform, but we would need to define the compatibility more precisely. Ideally we should discuss our plans with the Emscripten team before doing that.

> Or might these users not even have emscripten itself installed?

End users probably don't have Emscripten and won't rebuild packages. Many people making applications based on Pyodide have a webdev background and struggle with a complicated and unfamiliar toolchain like Emscripten, though if we can make the tools easy to install and use, more people will manage.

> Would it be possible to treat the entire emscripten version as the ABI version for now? Or perhaps that would be too limiting? Perhaps it would work if you selected just a few specific emscripten releases to support?

Yeah, these are certainly options.

I guess we have a pretty different use case from typical Emscripten users who are porting fixed applications like games and don't have any need for side module dependency management.

@sbc100
Collaborator

sbc100 commented Jan 7, 2022

Don't get me wrong, a stable ABI for dynamic linking is certainly something we would like to have in the future. It could be that this use case is compelling enough to try to make progress on it.

@akpmilot

Hi,

The lack of clear ABI compatibility between Emscripten versions is very problematic for us. Our product is game middleware that is pluggable in various game engines like Unity, which supports the Web platform via Emscripten. I'll direct you to Unity's documentation page on external native plug-ins, which is what our product is:

https://docs.unity.cn/2023.3/Documentation/Manual/webgl-native-plugins-with-emscripten.html

In particular, this line in their documentation explains the problem clearly:

> If you choose to build plug-in code in advance, you should use the Emscripten compiler toolchain. To ensure LLVM binary format compatibility, the version of Emscripten that’s used to compile the plug-ins must match the version of Emscripten that Unity uses.

Since we distribute pre-compiled binaries for our plug-in, we must provide binaries that are supported across all maintained Unity versions. Each Unity version uses a different version of Emscripten. Because ABI compatibility is not clearly defined between Emscripten versions, it is difficult for us to determine which version of Emscripten we should use to support Unity.

We got bitten by this recently: we upgraded to a newer version of Emscripten, 3.1.52, and then ran into #20233 when integrating into Unity, because they use 3.1.38.

I humbly suggest the Emscripten project revises its approach to versioning to more clearly advertise when ABI compatibility is broken.

@sbc100
Collaborator

sbc100 commented Feb 15, 2024

@akpmilot, in a world where we were able to detect ABI breakages and document them, presumably you would still need a process for updating your libraries to match unity's versions right?

With that process in place, can you not work with the current status quo, which is basically that each emscripten release should be considered an ABI-breaking release? i.e. is this just an issue of degree? You can publish new versions of your library for emscripten releases but you would rather not do it for all releases?

@sbc100
Collaborator

sbc100 commented Feb 15, 2024

Honestly, it's hard for me to imagine very many emscripten releases not breaking some kind of ABI, especially when you consider that native object files and libraries can contain JS code which can refer to the entire JS library code in emscripten. Any change to emscripten's JS library code could conceivably be an ABI breakage in that case.

@sbc100
Collaborator

sbc100 commented Feb 15, 2024

Would it help if we somehow embedded the emscripten version in each object file or shared library so that linking objects from different versions would cause an error? At least that might help the problem show up quicker perhaps?

@hoodmane
Collaborator Author

I think that would be helpful. Currently if you load a wrong shared library sometimes the only sign is a "memory access out of bounds error".
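To make the version-stamping idea concrete, here is a minimal sketch. It is not something Emscripten does today. It appends a WebAssembly custom section to a module and refuses to pair a main module and a side module whose stamps differ. The section name `emscripten_version` and the helpers `stamp`, `read_stamp`, and `check_loadable` are hypothetical; only the custom-section layout (section id 0, LEB128 size, name, payload) comes from the WebAssembly binary format.

```python
from __future__ import annotations

WASM_HEADER = b"\x00asm\x01\x00\x00\x00"  # magic number plus binary format version 1
SECTION_NAME = "emscripten_version"  # hypothetical; Emscripten emits no such section


def encode_uleb128(value: int) -> bytes:
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_uleb128(data: bytes, pos: int) -> tuple[int, int]:
    """Decode unsigned LEB128 at pos; return (value, position after it)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def stamp(module: bytes, version: str) -> bytes:
    """Append a custom section (id 0) carrying the toolchain version."""
    name = SECTION_NAME.encode()
    body = encode_uleb128(len(name)) + name + version.encode()
    return module + b"\x00" + encode_uleb128(len(body)) + body


def read_stamp(module: bytes) -> str | None:
    """Return the stamped version, or None if the module carries no stamp."""
    if module[:8] != WASM_HEADER:
        raise ValueError("not a WebAssembly binary")
    pos = 8
    while pos < len(module):
        section_id = module[pos]
        size, pos = decode_uleb128(module, pos + 1)
        end = pos + size
        if section_id == 0:  # custom section: name first, then payload
            name_len, name_start = decode_uleb128(module, pos)
            name = module[name_start:name_start + name_len].decode()
            if name == SECTION_NAME:
                return module[name_start + name_len:end].decode()
        pos = end
    return None


def check_loadable(main_module: bytes, side_module: bytes) -> None:
    """Fail early with a clear message instead of a late memory error."""
    main_v, side_v = read_stamp(main_module), read_stamp(side_module)
    if main_v is None or side_v is None or main_v != side_v:
        raise RuntimeError(f"Emscripten version mismatch: main={main_v}, side={side_v}")
```

A loader that ran a check like `check_loadable` before instantiating a side module would report the mismatch up front rather than surfacing it as an out-of-bounds memory access.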

@akpmilot

@sbc100 Publishing different binaries targeting different versions of Emscripten is a possibility. In fact, it is something we do for other platforms. For example, in the Apple ecosystem ABI breakage is allowed between major versions of Xcode. Therefore, we publish separate binaries for the Xcode 14.x series and Xcode 15.x currently.

This works for platforms where ABI compatibility is clearly defined. However, if the definition of ABI breakage for Emscripten is "every release", we would not be able to do this. The reason is that we do not only support Unity games, but also games based on other engines, including custom ones for which we cannot predict the version of Emscripten being used. It's simply not feasible for us to publish binaries for every Emscripten release.

On the subject of the actual definition of "ABI compatibility", you bring up a good point that Emscripten is quite special due to the ability to embed Javascript code in the object files. I think it would be reasonable to exclude the Javascript environment from the definition of ABI. After all, Javascript code is interpreted at runtime, not precompiled; one can always write embedded Javascript in a way that tests for the existence of functions and global objects to cover multiple variants of the environment.

What if Emscripten continued to follow a X.Y.Z numbering scheme, but native code ABI is maintained for a given X.Y, while the Javascript environment is allowed to change at every Z release? Would that make it more feasible?
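As a sketch of what that policy would mean for a compatibility check, assuming versions are plain `X.Y.Z` strings (the helper names `parse_version` and `same_native_abi` are made up for this example):

```python
from __future__ import annotations


def parse_version(text: str) -> tuple[int, int, int]:
    """Split an 'X.Y.Z' version string into integers."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def same_native_abi(a: str, b: str) -> bool:
    """Under the proposed scheme, only X.Y would identify the native ABI."""
    return parse_version(a)[:2] == parse_version(b)[:2]
```

Under this rule 3.1.38 and 3.1.52 would count as the same native ABI. That describes the proposed policy only; as reported earlier in this thread, binaries built with those two releases were not interchangeable in practice.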

@sbc100
Collaborator

sbc100 commented Feb 16, 2024

> It's simply not feasible for us to publish binaries for every Emscripten release.

Can you elaborate a little on what makes this not feasible? Why is it feasible to do this once in a while but not for every release? Are there a lot of manual steps involved? If so I wonder if this could somehow be automated? Perhaps via github actions or some other mechanism?

@sbc100
Collaborator

sbc100 commented Feb 16, 2024

> What if Emscripten continued to follow a X.Y.Z numbering scheme, but native code ABI is maintained for a given X.Y, while the Javascript environment is allowed to change at every Z release? Would that make it more feasible?

In such a scheme the "native code ABI" would also include all of libc and libc++ and all the other native libraries that get included in the main module (libwasmfs, libmalloc, libembind, etc, etc). So I think it might also be rare that a release doesn't include some change to those libraries. Remember that even an internal bug fix that doesn't change the interface can break users of the library who were somehow dependent on that bug.

@sbc100
Collaborator

sbc100 commented Feb 16, 2024

I've been thinking recently we should drop the X.Y.Z versioning completely and move to a simple XX version number like chrome or firefox.

An interesting argument against compound versions: the last time we considered bumping the minor version we decided not to, since some folks like to be able to effectively measure time/distance between two versions using the X version alone. i.e. one can look at two versions and see roughly how far apart they are in time. e.g. Chrome 60 and Chrome 100 are clearly several years apart.

@akpmilot

> Can you elaborate a little on what makes this not feasible? Why is it feasible to do this once in a while but not for every release? Are there a lot of manual steps involved? If so I wonder if this could somehow be automated?

Building our product takes about 4-5 minutes for Emscripten. We already produce two builds: one with pthreads, and one without. We also have three build configs: Debug, Profile, and Release, although our builds are parallelized so they don't strictly add up; total build time for all Emscripten artifacts in our nightly is about 12 minutes. The resulting archive is about 117 megs. If we were to support every Emscripten version between 3.1.38 (what Unity 2023 uses) and 3.1.8 (what Unity 2022 uses), we're already at 6 hours and 3.5 GB of artifacts. That's an unrealistic ask, considering 90% of these binaries will never be used by anybody.
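The 6 hours and 3.5 GB figures follow from the per-version numbers. A quick check of the arithmetic, assuming every patch number from 3.1.8 through 3.1.38 corresponds to a release that would need its own build:

```python
# Figures quoted in the comment above.
MINUTES_PER_VERSION = 12      # total nightly build time for all Emscripten artifacts
MEGABYTES_PER_VERSION = 117   # size of the resulting archive
OLDEST_PATCH, NEWEST_PATCH = 8, 38  # 3.1.8 (Unity 2022) through 3.1.38 (Unity 2023)

# Assumption: every patch number in the range is a release that needs a build.
versions = NEWEST_PATCH - OLDEST_PATCH + 1
total_hours = versions * MINUTES_PER_VERSION / 60
total_gib = versions * MEGABYTES_PER_VERSION / 1024

print(f"{versions} versions: {total_hours:.1f} hours, {total_gib:.1f} GiB")
# prints "31 versions: 6.2 hours, 3.5 GiB"
```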

> In such a scheme the "native code ABI" would also include all of libc and libc++ and all the other native libraries that get included in the main module (libwasmfs, libmalloc, libembind, etc, etc). So I think it might also be rare that a release doesn't include some change to those libraries. Remember that even an internal bug fix that doesn't change the interface can break users of the library who were somehow dependent on that bug.

I am not a toolchain developer, but as I understand it, these are the same concerns that other LLVM-based platforms like Xcode and the Android NDK deal with. Unreal Engine also offers some guarantees about ABI compatibility using their LLVM cross-platform toolchain: https://docs.unrealengine.com/5.3/en-US/linux-development-requirements-for-unreal-engine/
Perhaps the Emscripten project can borrow some techniques or policies from these other toolchain makers?

> I've been thinking recently we should drop the X.Y.Z versioning completely and move to a simple XX version number like chrome or firefox.

Regardless of the numbering scheme, have you considered producing less frequent "LTS" releases? Perhaps this could be the solution for projects like ours and Unity's that need to agree on a version to use. If we were to standardize on "LTS" releases that happen only at specific intervals (once a year, for example), it would simplify things. As middleware developers, it would be reasonable for us to simply state "we will only provide binaries for LTS releases of Emscripten" and ask downstream projects to use LTS as well.

@hoodmane
Collaborator Author

@akpmilot's situation is quite similar to what the Pyodide downstream faces. I would say that Emscripten ABI instability is the largest problem for the Python-Emscripten packaging ecosystem. Python itself has decided that having package authors compile against many different ABIs is untenable. manylinux and abi3 make it possible for packages to upload just one version for glibc linux and have it work for a long time. There is a perfectly viable way forward for us by only updating the Emscripten version when we also update the Python version. That way people only have one version per year. Unity could also consider yearly updates.

But the combination of:

  • Emscripten updates bring lots of confusing regressions, including exposure to llvm's tip of tree
  • Emscripten updates bring important improvements
  • Emscripten updates are completely ABI unstable

is difficult to deal with. If Emscripten either had better ABI compatibility between versions or used stable llvm, I think things would be much better for us. As it is, updating the Emscripten version once a year is a bit painful to consider.

@sbc100
Collaborator

sbc100 commented Feb 16, 2024

> > Can you elaborate a little on what makes this not feasible? Why is it feasible to do this once in a while but not for every release? Are there a lot of manual steps involved? If so I wonder if this could somehow be automated?
>
> Building our product takes about 4-5 minutes for Emscripten. We already produce two builds: one with pthreads, and one without. We also have three build configs: Debug, Profile, and Release, although our builds are parallelized so they don't strictly add up; total build time for all Emscripten artifacts in our nightly is about 12 minutes. The resulting archive is about 117 megs. If we were to support every Emscripten version between 3.1.38 (what Unity 2023 uses) and 3.1.8 (what Unity 2022 uses), we're already at 6 hours and 3.5 GB of artifacts. That's an unrealistic ask, considering 90% of these binaries will never be used by anybody.
>
> > In such a scheme the "native code ABI" would also include all of libc and libc++ and all the other native libraries that get included in the main module (libwasmfs, libmalloc, libembind, etc, etc). So I think it might also be rare that a release doesn't include some change to those libraries. Remember that even an internal bug fix that doesn't change the interface can break users of the library who were somehow dependent on that bug.
>
> I am not a toolchain developer, but as I understand it, these are the same concerns that other LLVM-based platforms like Xcode and the Android NDK deal with. Unreal Engine also offers some guarantees about ABI compatibility using their LLVM cross-platform toolchain: https://docs.unrealengine.com/5.3/en-US/linux-development-requirements-for-unreal-engine/ Perhaps the Emscripten project can borrow some techniques or policies from these other toolchain makers?

I'll take a look at that, thanks.

> > I've been thinking recently we should drop the X.Y.Z versioning completely and move to a simple XX version number like chrome or firefox.
>
> Regardless of the numbering scheme, have you considered producing less frequent "LTS" releases? Perhaps this could be the solution for projects like ours and Unity's that need to agree on a version to use. If we were to standardize on "LTS" releases that happen only at specific intervals (once a year, for example), it would simplify things. As middleware developers, it would be reasonable for us to simply state "we will only provide binaries for LTS releases of Emscripten" and ask downstream projects to use LTS as well.

We don't currently have any concept of LTS releases. We support all our releases equally, but we also don't do any patching of existing releases.

This sounds like maybe something that could be worked out between you and your users. For example, you could say "we only support one in every 10 releases" (e.g. 3.1.10 and 3.1.20)?

@hoodmane
Collaborator Author

> maybe something that could be worked out between you and your users

They probably will want to ask unity to do this since telling their users that their plugin will only work with certain versions of unity isn't ideal.

@akpmilot

The landscape in gaming is quite complex; it's not strictly a question of upstream/downstream users. You have game studios, game engine providers, middleware providers, and various platform toolchains. Each game project might have to juggle a game engine AND 2-3 middleware plug-ins, all of which must agree on the same ABI for a deployment platform.

@hoodmane
Collaborator Author

Surely the game engine picks the ABI and the plugin developers go along with it?

@akpmilot

Most game middleware products support several game engines, as well as basic native libraries for use by custom game engines. For example, in addition to the 4 Emscripten versions required by the 4 different supported Unity versions, you also have Godot wanting 1.39.9 and Unreal Engine 4.26 wanting 1.38.31.

This is why a clear stable ABI becomes necessary. Coming back to Apple's case, every party can agree that "Xcode 15" is the target, and libraries built with Xcode 15 can be linked together.
