-
That's horrible. pybind11 and nanobind work very similarly at a high level; the main difference is that nanobind moves a large amount of template code, which would otherwise be compiled many times, into a separate library. If anything, compilation should become much simpler, which is what you are seeing on GCC/macOS. I am not really sure how to approach this problem because your bindings are so large. I tried two things: I changed Then I tried deleting the bottom half of the code in
-
Hello, Many thanks for your answer! I had initially tried to split the file, and did not see any improvements. But this was because I was not patient enough, and I had given up hope after 30 minutes. Sorry for this. I gave it another try and observed that:
So the lessons are:
And on my side, I need to study whether it is feasible to split generated binding files (I have several generated binding files that exceed 4000 lines). PS: Would you agree that when building the bindings with optimization disabled (but building the rest of the libraries with full optimization), the performance should likely remain good, since binding functions merely provide a link?
-
Additional detail: it is sufficient to split the long function into several functions while keeping them in the same file (see this action, which built in 5 minutes). Many thanks for your help, and for your continued work on nanobind and pybind! On my side, I'll study how to get the generator to split its bindings into several functions. It should be enough. Thanks again
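On the generator side, the split described above could look roughly like the sketch below. It is only an illustration: the helper name `split_into_functions`, the chunk size, and the emitted function names (`register_part_0`, `register_all`, ...) are hypothetical and are not taken from litgen. It assumes each input string is one complete, self-contained C++ binding statement that uses a module variable named `m`.

```python
from typing import List


def split_into_functions(statements: List[str], chunk_size: int = 200,
                         prefix: str = "register_part") -> str:
    """Emit C++ source in which `statements` are spread over several functions.

    Assumption of this sketch: every statement is self-contained and only
    refers to the module variable `m`, so statements can be regrouped freely.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")

    # Group the statements into consecutive chunks of at most `chunk_size`.
    chunks = [statements[i:i + chunk_size]
              for i in range(0, len(statements), chunk_size)]

    parts: List[str] = []
    for index, chunk in enumerate(chunks):
        body = "\n".join(f"    {stmt}" for stmt in chunk)
        parts.append(
            f"static void {prefix}_{index}(nb::module_& m)\n{{\n{body}\n}}\n"
        )

    # One entry function that calls every chunk function in order.
    calls = "\n".join(f"    {prefix}_{i}(m);" for i in range(len(chunks)))
    parts.append(f"void register_all(nb::module_& m)\n{{\n{calls}\n}}\n")
    return "\n".join(parts)
```

Calling `split_into_functions(stmts, chunk_size=2)` on five statements yields three small functions plus `register_all`. Statements that depend on a local variable declared in an earlier statement (for example a class object reused for several `.def` calls) would need to stay in the same chunk, which this sketch does not handle.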
-
Hi @wjakob, Good news: see this action. The reported size is 2.4 MB, very close to the 2.2 MB we get when waiting for 3 hours, and the build time is 3 min 4 s (done with this commit). Thanks! Pascal
-
More results on a larger library (imgui bundle), when disabling __forceinline:
-
FYI, my compilation time for bindings using Pyodide (under Linux) went down from 15 minutes (using pybind11) to 2 minutes (using nanobind). The link step, which was very slow when using pybind11 (12 minutes, using 6 GB of memory), is now much more reasonable.
-
This all sounds great, I am glad that the build performance is now better. It is surprising to me that the pybind11 wheels are smaller than the nanobind ones, however. Is it possible that the number of wheels in that count is different? (I guess the average wheel size might be a more useful "bloat" measurement.)
-
It was not different, but it went from containing Python 3.10, 3.11, 3.12 to containing 3.11, 3.12, 3.13. However, a wheel is a zipped file that includes a compiled library together with other artifacts, so its size is not a very precise measure. I took some time to extract the wheels for imgui bundle and compare the pybind and nanobind builds. Those wheels were produced one week apart, and the changes in the code were small (all related to switching from pybind to nanobind). Here are the raw results (size in bytes)
And the results as tables:
macOS
Ubuntu
Windows
=> Honestly, I do not know what explains the difference observed on Windows. It might come from an option I inadvertently changed during the migration, or from nanobind; I don't know for sure. I did check the changes on my side (I checked the diffs before and after this commit), and saw no important changes in the compilation options (apart from switching to nanobind).
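Because a wheel is a zip archive, the comparison above can be reproduced with the Python standard library alone. The following is a minimal sketch, not the script used for the figures in this thread: the function name `extension_sizes` and the list of suffixes are assumptions, and it reports the uncompressed size of each compiled library found inside a wheel.

```python
import zipfile
from typing import Dict, Tuple

# File suffixes treated as compiled libraries (an assumption of this sketch).
BINARY_SUFFIXES: Tuple[str, ...] = (".so", ".pyd", ".dylib", ".dll")


def extension_sizes(wheel_path: str) -> Dict[str, int]:
    """Map each compiled library inside the wheel to its uncompressed size in bytes."""
    with zipfile.ZipFile(wheel_path) as wheel:
        return {
            info.filename: info.file_size
            for info in wheel.infolist()
            if info.filename.endswith(BINARY_SUFFIXES)
        }
```

Comparing `sum(extension_sizes(path).values())` for a pybind11 wheel and a nanobind wheel separates the binary size from the other assets packaged in the wheel.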
-
Hi,
I noticed extremely slow build times on Windows (more than 3 hours!) when using any optimization switch (/Os, /O1, or /O2). However, builds are much faster when using /Od (no optimization), completing in under 2 minutes.
This issue seems to be specific to Windows, as similar slowdowns were not observed on Linux or macOS.
For context, my bindings were generated using litgen, a tool I authored, which I mentioned in this previous PR. I encountered this issue while porting imgui_bundle from Pybind11 to Nanobind.
I’ve conducted a detailed analysis and set up a minimal reproduction repository, which I will describe below.
Reproduction Repository
To investigate this issue, I created a minimal reproduction repository: nano_study_link. Its purpose is to analyze build times and library sizes for medium-sized projects (~8,300 lines of binding code) generated by litgen for both Pybind11 and Nanobind.
You can find detailed results in its GitHub Actions workflows.
Results Overview
Here are the build times and library sizes for both Pybind11 and Nanobind across platforms:
When using Pybind11
When using Nanobind
Quick Analysis (on this particular project)
On Linux and macOS
Optimization switches (-O3 vs -Os) have a minor impact on build times and binary sizes.
On Windows
Reproduction repository summary
Here is a summarized version of the repository structure:
A More Comprehensive Test on a Large Library
The tables below are based on imgui_bundle, a project that builds wheels for Dear ImGui along with 19 additional libraries.
This project is currently transitioning from Pybind11 to Nanobind, and the figures below show the build times and library sizes for both tools. For Windows, I forcefully disabled optimizations (/Od) to allow a fair comparison of build times and sizes.
Results Overview
Total Size Comparison
Sources
Summary of Findings
Nanobind significantly outperforms Pybind11 in build times across all platforms. On macOS, the build time is reduced from ~50 minutes to ~13 minutes, while on Windows, it drops from ~29 minutes to ~22 minutes.
Library sizes are comparable between Nanobind and Pybind11, even on Windows, where the dramatic size difference observed with ImGui is mitigated when combined with other libraries in the bundle.
Details about Wheels Action
/Od (no optimization)
Summary
This part addresses nanobind only:
Observations:
On Windows, build times with optimization switches (/Os, /O1, /O2) may be prohibitively slow on large projects, taking several hours. In contrast, using /Od (no optimization) results in much faster builds (~2 minutes), albeit with larger binary sizes.
On Linux and macOS, testing across 20 different libraries showed fast build times with both -Os and -O3, and only minor differences in performance or binary size. However, slowdowns may still occur on different codebases.
What could be done:
Addressing multiple platforms and scenarios is hard (I know :-).
Perhaps a note in the documentation could inform developers about the potential for extremely slow build times on Windows when using /Os or other optimization switches. For example:
Alternatively, Nanobind’s CMake configuration could include an optional toggle to allow developers to switch between /Os (default) and /Od for faster builds when needed.
A draft could be:
This implementation is a starting point and would need further testing, especially to ensure compatibility with the existing NOMINSIZE option, which currently controls calls to nanobind_opt_size.
I hope this analysis is helpful! Please let me know if additional details or testing are needed!