-
Notifications
You must be signed in to change notification settings - Fork 29
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-13: Set up JNI build (dataset, etc.) #449
Conversation
ac6bbb9
to
d98309e
Compare
The docker image takes so long to build that we really need to cache it. |
0598a43
to
c8eb06b
Compare
It's going to be a while before I can get back to this. |
fc17d5e
to
dcef949
Compare
Hmm, this problem doesn't happen on a local build if I change to a debug build... |
Ah, interesting, LD_PRELOADING ASan (even if the binaries are built without it) "fixes" the issue. That's rather unfortunate, since ASan was how I was trying to figure out the corruption issue... |
Valgrind also seems to "fix" the issue :/ |
I may disable the ORC test for now to get CI passing...we'll have to revisit a lot of the JNI work. |
What is the error message for ORC test? |
It is crashing in CI. When I debug locally it appears that a malloc assertion fails. However it no longer reproduces for me after fiddling around. |
I tried to use ASan and Valgrind (separately) to identify possible memory corruption but it turns out under these tools, the crash no longer reproduces. Also after using the tools, even with them turned off now, I can't reproduce the crash locally anymore. |
I suspect ORC crash is due to missing the timezone database: apache/arrow#36026. |
It crashes in malloc, though. I do have the core dump:
|
And interestingly now ORC passes in CI. |
Anyways, for now I've disabled the test so that the JNI build can pass. It turns out one of the fixes here is needed to fix the Arrow CI (apache/arrow#45128). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
ci/scripts/java_full_build.sh
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove java_
prefix because this is the apache/arrow-java
repository?
.github/workflows/test_jni.yml
Outdated
ARCH_ALIAS: ${{ matrix.platform.archery_arch_alias }} | ||
ARCH_SHORT: ${{ matrix.platform.archery_arch_short }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can remove them.
.github/workflows/test_jni.yml
Outdated
password: ${{ secrets.GITHUB_TOKEN }} | ||
- name: Build C++ libraries | ||
env: | ||
VCPKG_BINARY_SOURCES: "clear;nuget,GitHub,readwrite" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may not work because we need more codes for this.
For example, https://github.com/apache/arrow/pull/44644/files#diff-e45e45baeda1c1e73482975a664062aa56f20c03dd9d64a827aba57775bed0d3R2135 and so on are needed.
But we can work on this as a follow-up task.
I found that the JNI libraries built on ubuntu has linked with both jemalloc and mimalloc. The coredump indicates an invalid initial state in the sysmalloc. I'm not sure if it is an undefined behavior if we have enabled both jemalloc and mimalloc. Should we consider disabling jemalloc by default? |
: "${ARROW_GANDIVA:=ON}" | ||
export ARROW_GANDIVA | ||
: "${ARROW_GCS:=ON}" | ||
: "${ARROW_JEMALLOC:=ON}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
: "${ARROW_JEMALLOC:=ON}" | |
: "${ARROW_JEMALLOC:=OFF}" | |
: "${ARROW_MIMALLOC:=ON}" |
I can try that. But I thought we've shipped multiple allocators in one binary before. (Arrow doesn't use jemalloc or mimalloc to replace system malloc.) |
This reverts commit f0bcf4d.
Huh, that passed @wgtmac. Let's hope it stays that way 😅 I addressed Kou's feedback too. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
Hmm. The ORC crash is reproduced...?
|
I think it's flaky then. Are we ok with disabling it for now? |
Yes, we need to disable ORC test in this PR. Is it helpful to enable ASAN in a separate PR to build JNI libraries and use it for the ORC test? |
I can try again in another PR. I think I found that ASAN hid the problem (Valgrind too). (Also both tools are finicky with the JVM.) |
Filed #473 |
Now that we have some CI I'm going to start merging Dependabot updates again |
I tried to enable ASAN on Apache ORC and no issue has been found: https://github.com/apache/orc/actions/runs/12577822275/job/35055736586?pr=2097 |
ASan replaces the malloc implementation, so the error may get masked. (Though if it is indeed memory corruption presumably ASan would find that instead.) |
One other thing we could do is try various combinations of MALLOC_CHECK and MALLOC_PERTURB |
I still don't understand why we cannot see symbols of mimalloc from the coredump backtrace if we have linked mimalloc. |
We don't use mimalloc to replace malloc. It's only used by the Arrow memory pool. So we are still using glibc malloc for regular allocations. |
…ipts (#45165) ### Rationale for this change apache/arrow-java removed `java_` prefix from scripts by apache/arrow-java#449 . ### What changes are included in this PR? Follow the script name change. ### Are these changes tested? Yes. ### Are there any user-facing changes? No. * GitHub Issue: #45164 Authored-by: Sutou Kouhei <kou@clear-code.com> Signed-off-by: Sutou Kouhei <kou@clear-code.com>
Fixes #13.