Add Arm64 builds to Travis #5932

Closed
wants to merge 5 commits into from

adamretter
Collaborator

@adamretter adamretter commented Oct 16, 2019

Subsumes #5928

@adamretter adamretter added the arm label Oct 16, 2019
@adamretter adamretter force-pushed the feature/travis-arm64 branch 5 times, most recently from 52354c9 to bdef03c Compare October 18, 2019 20:36
@adamretter adamretter requested a review from siying October 19, 2019 08:50
@adamretter
Collaborator Author

adamretter commented Oct 19, 2019

Hey @HouBingjian @guyuqi @cnqpzhang @wangxiyuan @Zhiwei-Dai, Travis-CI now offers builds on Arm64 based machines.

Good news - I have added this to our .travis.yml: https://travis-ci.org/facebook/rocksdb/builds/599829524?utm_source=github_status&utm_medium=notification

We have two test cases which are failing on Arm64: JOB_NAME=platform_dependent and TEST_GROUP=4. I don't have a great deal of ARM expertise. Would one (or more) of you be willing to take a look?

TEST_GROUP=1 is also failing, but only because the ARM64 containers lack disk space (10GB). I have opened an issue with Travis, and they will hopefully provide a new image build this coming week with a larger disk quota.

@wangxiyuan

@adamretter Good news. Looking forward to ARM support. @Yikun can help with the ARM problems. Some of the errors are covered in the issues I created earlier:

#5736
#5751

If you need an ARM VM for debugging, please let us know. We can provide one for you.

@Yikun
Contributor

Yikun commented Oct 21, 2019

The TEST_GROUP=4 problem is due to #5736. The JOB_NAME=platform_dependent failure looks like a new problem.

@guyuqi
Contributor

guyuqi commented Oct 21, 2019

TEST_GROUP=4 failed in db_universal_compaction_test.

I also ran db_universal_compaction_test in my local environment, and every test case in it passed:

builder@vm-rocksdb-yq:~/rocksdb$ ./db_universal_compaction_test
[==========] Running 147 tests from 5 test cases.
[----------] Global test environment set-up.
[----------] 5 tests from DBTestUniversalDeleteTrigCompaction
[ RUN      ] DBTestUniversalDeleteTrigCompaction.BasicL0toL1
[       OK ] DBTestUniversalDeleteTrigCompaction.BasicL0toL1 (1190 ms)
[ RUN      ] DBTestUniversalDeleteTrigCompaction.SingleLevel
[       OK ] DBTestUniversalDeleteTrigCompaction.SingleLevel (972 ms)
[ RUN      ] DBTestUniversalDeleteTrigCompaction.MultipleLevels
[       OK ] DBTestUniversalDeleteTrigCompaction.MultipleLevels (2210 ms)
[ RUN      ] DBTestUniversalDeleteTrigCompaction.OverlappingL0
[       OK ] DBTestUniversalDeleteTrigCompaction.OverlappingL0 (1093 ms)
[ RUN      ] DBTestUniversalDeleteTrigCompaction.IngestBehind
[       OK ] DBTestUniversalDeleteTrigCompaction.IngestBehind (984 ms)
[----------] 5 tests from DBTestUniversalDeleteTrigCompaction (6449 ms total)

[----------] 126 tests from UniversalCompactionNumLevels/DBTestUniversalCompaction
DBTestUniversalCompactionParallel/DBTestUniversalCompactionParallel.PickByFileNumberBug/1 (6588 ms)
[----------] 4 tests from DBTestUniversalCompactionParallel/DBTestUniversalCompactionParallel (47709 ms total)
.....................
.................
.............

[----------] 4 tests from DBTestUniversalManualCompactionOutputPathId/DBTestUniversalManualCompactionOutputPathId
[ RUN      ] DBTestUniversalManualCompactionOutputPathId/DBTestUniversalManualCompactionOutputPathId.ManualCompactionOutputPathId/0
[       OK ] DBTestUniversalManualCompactionOutputPathId/DBTestUniversalManualCompactionOutputPathId.ManualCompactionOutputPathId/0 (3238 ms)
[ RUN      ] DBTestUniversalManualCompactionOutputPathId/DBTestUniversalManualCompactionOutputPathId.ManualCompactionOutputPathId/1
[       OK ] DBTestUniversalManualCompactionOutputPathId/DBTestUniversalManualCompactionOutputPathId.ManualCompactionOutputPathId/1 (3038 ms)
[ RUN      ] DBTestUniversalManualCompactionOutputPathId/DBTestUniversalManualCompactionOutputPathId.ManualCompactionOutputPathId/2
[       OK ] DBTestUniversalManualCompactionOutputPathId/DBTestUniversalManualCompactionOutputPathId.ManualCompactionOutputPathId/2 (2828 ms)
[ RUN      ] DBTestUniversalManualCompactionOutputPathId/DBTestUniversalManualCompactionOutputPathId.ManualCompactionOutputPathId/3
[       OK ] DBTestUniversalManualCompactionOutputPathId/DBTestUniversalManualCompactionOutputPathId.ManualCompactionOutputPathId/3 (3184 ms)
[----------] 4 tests from DBTestUniversalManualCompactionOutputPathId/DBTestUniversalManualCompactionOutputPathId (12288 ms total)

[----------] Global test environment tear-down
[==========] 147 tests from 5 test cases ran. (2159491 ms total)
[  PASSED  ] 147 tests.

db_properties_test, which failed in #5736, also passed:

builder@vm-rocksdb-yq:~/rocksdb$ ./db_properties_test
[==========] Running 21 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 21 tests from DBPropertiesTest
[ RUN      ] DBPropertiesTest.Empty
[       OK ] DBPropertiesTest.Empty (37985 ms)
[ RUN      ] DBPropertiesTest.CurrentVersionNumber
.............
..........
.......
[ RUN      ] DBPropertiesTest.SstFilesSize
[       OK ] DBPropertiesTest.SstFilesSize (951 ms)
[ RUN      ] DBPropertiesTest.MinObsoleteSstNumberToKeep
[       OK ] DBPropertiesTest.MinObsoleteSstNumberToKeep (1329 ms)
[ RUN      ] DBPropertiesTest.BlockCacheProperties
[       OK ] DBPropertiesTest.BlockCacheProperties (1095 ms)
[----------] 21 tests from DBPropertiesTest (359952 ms total)

[----------] Global test environment tear-down
[==========] 21 tests from 1 test case ran. (359952 ms total)
[  PASSED  ] 21 tests.

But bloom_test failed:

builder@vm-rocksdb-yq:~/rocksdb$ ./bloom_test
[==========] Running 11 tests from 2 test cases.
[----------] Global test environment set-up.
[----------] 4 tests from BloomTest
[ RUN      ] BloomTest.EmptyFilter
[       OK ] BloomTest.EmptyFilter (0 ms)
[ RUN      ] BloomTest.Small
[       OK ] BloomTest.Small (0 ms)
[ RUN      ] BloomTest.VaryingLengths
False positives:  0.23% @ length =      1 ; bytes =      9
...............
.............
..........
......
....
[ RUN      ] FullBloomTest.RawSchema
[       OK ] FullBloomTest.RawSchema (0 ms)
[ RUN      ] FullBloomTest.CorruptFilters
bloom_test: util/bloom.cc:288: virtual rocksdb::FilterBitsReader* rocksdb::{anonymous}::BloomFilterPolicy::GetFilterBitsReader(const rocksdb::Slice&) const: Assertion `num_probes <= 127' failed.
Aborted (core dumped)

@wangxiyuan

db_properties_test passed in my latest test, so I removed it from my issue.
db_universal_compaction_test still fails in the same way that Travis CI shows.

@adamretter adamretter force-pushed the feature/travis-arm64 branch from bdef03c to f0e26ca Compare October 29, 2019 10:05
@adamretter
Collaborator Author

@wangxiyuan I also just rebased to bring in any recent changes.

@pdillinger
Contributor

pdillinger commented Dec 5, 2019

Things may have stabilized a bit for release (including #6024). I just did "make check" on master (f32a311) on our ARM64 instance on EC2, and only folly_synchronization_distributed_mutex_test failed consistently. Added PR #6126 for that.

Inconsistent failures:

facebook-github-bot pushed a commit that referenced this pull request Dec 5, 2019
…6126)

Summary:
This test is crashing on ARM but is not yet production code.
Let's not let it block ARM CI. See PR #5932
Pull Request resolved: #6126

Test Plan:
./folly_synchronization_distributed_mutex_test, on Linux/ARM,
on Linux/x86_64, and with LITE=1 on Linux/x86_64 (also disabled)

Differential Revision: D18836576

Pulled By: pdillinger

fbshipit-source-id: d8a36eea2f048e8330411d994435d1c58a15d978
@wangxiyuan

@pdillinger Thanks for your fix. Could you take a look at #5751?
It is a failure of make jtest on aarch64.

@adamretter
Collaborator Author

Thanks @pdillinger I have now rebased this on your fixes, let's see how she flies now...

@adamretter
Collaborator Author

adamretter commented Dec 10, 2019

Now just a single failure in c_test::filter

@wangxiyuan

Closed #5751; it passes now.

@wangxiyuan

@adamretter Cool, make jtest passed in my Ubuntu 16.04 Arm environment as well.

But when I ran make jtest on Ubuntu 18.04, it failed with:

java: db/file_indexer.cc:71: void rocksdb::FileIndexer::GetNextLevelIndex(size_t, size_t, int, int, int32_t*, int32_t*) const: Assertion `*left_bound <= *right_bound + 1' failed.
Aborted (core dumped)
Makefile:306: recipe for target 'run_test' failed
make[1]: *** [run_test] Error 134
make[1]: Leaving directory '/opt/rocksdb/java'
Makefile:2029: recipe for target 'jtest' failed
make: *** [jtest] Error 2

The Java version is the same:

openjdk version "1.8.0_222"
OpenJDK Runtime Environment (build 1.8.0_222-8u222-b10-1ubuntu1~18.04.1-b10)
OpenJDK 64-Bit Server VM (build 25.222-b10, mixed mode)

But the gcc version is not (it is 5.4.0 on Ubuntu 16.04):

gcc (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04.1) 7.4.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

@pdillinger
Contributor

Now just a single failure in c_test::filter

Ah, I know why. Different legacy filter schema on ARM due to cache line size. Will prepare a PR
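
To illustrate why a filter's serialized layout can vary with cache line size, here is a minimal sketch of a cache-line-blocked Bloom filter. It is not RocksDB's legacy schema: the struct, the FNV-1a hash, the rotation used for double hashing, and all names (BlockedBloom, MakeFilter, Add, MayContain) are assumptions made for this example. The point is only that both the rounded-up filter size and the bit positions chosen for a key depend on line_bytes, so a test that hard-codes results for one line size may not hold for another.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical cache-line-blocked Bloom filter (NOT RocksDB's actual schema).
// All probes for a key land inside one block of `line_bytes` bytes, so the
// serialized bytes depend on the cache line size chosen at build time.
struct BlockedBloom {
  std::size_t line_bytes;          // assumed cache line size, e.g. 64 or 128
  std::vector<std::uint8_t> bits;  // filter payload, a whole number of lines
};

// FNV-1a, used here only as a simple deterministic stand-in hash.
inline std::uint32_t Fnv1a(const std::string& key) {
  std::uint32_t h = 2166136261u;
  for (unsigned char c : key) {
    h ^= c;
    h *= 16777619u;
  }
  return h;
}

// Rounds the requested size up to a whole number of cache lines.
// Assumes line_bytes > 0.
inline BlockedBloom MakeFilter(std::size_t min_bytes, std::size_t line_bytes) {
  std::size_t lines = (min_bytes + line_bytes - 1) / line_bytes;
  if (lines == 0) lines = 1;
  return BlockedBloom{line_bytes,
                      std::vector<std::uint8_t>(lines * line_bytes, 0)};
}

// Shared probe walk: pick one line from the hash, then visit num_probes
// (byte index, bit mask) pairs inside that line.
template <typename Visit>
inline void ForEachProbe(const BlockedBloom& f, const std::string& key,
                         int num_probes, Visit visit) {
  std::uint32_t h = Fnv1a(key);
  const std::size_t num_lines = f.bits.size() / f.line_bytes;
  const std::size_t line_start = (h % num_lines) * f.line_bytes;
  const std::uint32_t delta = (h >> 17) | (h << 15);  // rotate right 17 bits
  const std::size_t bits_per_line = f.line_bytes * 8;
  for (int i = 0; i < num_probes; ++i) {
    const std::size_t bit = h % bits_per_line;
    visit(line_start + bit / 8, static_cast<std::uint8_t>(1u << (bit % 8)));
    h += delta;
  }
}

// Sets every probed bit for the key.
inline void Add(BlockedBloom& f, const std::string& key, int num_probes) {
  ForEachProbe(f, key, num_probes, [&](std::size_t byte, std::uint8_t mask) {
    f.bits[byte] |= mask;
  });
}

// Returns false only if the key was definitely never added.
inline bool MayContain(const BlockedBloom& f, const std::string& key,
                       int num_probes) {
  bool all_set = true;
  ForEachProbe(f, key, num_probes, [&](std::size_t byte, std::uint8_t mask) {
    if ((f.bits[byte] & mask) == 0) all_set = false;
  });
  return all_set;
}
```

With this sketch, MakeFilter(64, 64) yields a 64-byte payload while MakeFilter(64, 128) yields 128 bytes, and the same keys set different bits in each. Inserted keys are always reported as present, but which absent keys produce false positives can differ between the two builds.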

@adamretter
Collaborator Author

@wangxiyuan it would probably be useful to examine the stack trace

facebook-github-bot pushed a commit that referenced this pull request Dec 11, 2019
Summary:
This test was recently updated but failed to account for Bloom
schema variance by CACHE_LINE_SIZE. (Since CACHE_LINE_SIZE is not
defined in our C code, the test now simply allows a valid result for any
CACHE_LINE_SIZE, not just the current one.)

Unblock #5932
Pull Request resolved: #6153

Test Plan:
ran unit test with builds TEST_CACHE_LINE_SIZE=128, =256, and
unset (64 on Intel)

Differential Revision: D18936015

Pulled By: pdillinger

fbshipit-source-id: e5e3852f95283d34d624632c1ae8d3adb2f2662c
@adamretter adamretter force-pushed the feature/travis-arm64 branch 2 times, most recently from f0bed2b to eaa8758 Compare December 14, 2019 20:56
@JunHe77
Contributor

JunHe77 commented Feb 13, 2020

I tried JOB_NAME=platform_dependent on a local Arm64 host running Ubuntu 16.04 in several modes (VM, Docker, LXD), and the DBWALTest.TruncateLastLogAfterRecoverWithoutFlush test passed in all of them.
I added a simple log to db_wal_test.cc to print the log file info. On the local Arm platform, it output:
fsize:27 blocks:35992 blocksz: 4096
But on Travis it output:
fsize:27 blocks:1 blocksz: 512
I am not sure whether this is a Travis-related issue (I know little about the Travis CI file system). But given what is described here, and that the failing test is file-system dependent, it might be helpful to check with the Travis team.
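
The logging code added to db_wal_test.cc is not shown in this thread. As a rough sketch of how such values could be collected, the helper below reads them with POSIX stat(). The struct and function names (FileAllocInfo, GetFileAllocInfo, PrintFileAllocInfo) are hypothetical, and mapping the logged fields to st_size, st_blocks, and st_blksize is an assumption based on the field names in the output above.

```cpp
#include <sys/stat.h>

#include <cstdio>
#include <string>

// Hypothetical helper: field names mirror the log lines quoted in the thread,
// but the actual logging added to db_wal_test.cc is not shown there.
struct FileAllocInfo {
  long long fsize;    // st_size: logical file size in bytes
  long long blocks;   // st_blocks: allocated blocks (512-byte units on Linux)
  long long blocksz;  // st_blksize: preferred I/O block size in bytes
};

// Fills `out` from stat(); returns false if the call fails.
inline bool GetFileAllocInfo(const std::string& path, FileAllocInfo* out) {
  struct stat st;
  if (stat(path.c_str(), &st) != 0) return false;
  out->fsize = static_cast<long long>(st.st_size);
  out->blocks = static_cast<long long>(st.st_blocks);
  out->blocksz = static_cast<long long>(st.st_blksize);
  return true;
}

// Prints the info in the same shape as the quoted log output.
inline void PrintFileAllocInfo(const FileAllocInfo& info) {
  std::printf("fsize:%lld blocks:%lld blocksz: %lld\n", info.fsize,
              info.blocks, info.blocksz);
}
```

If the mapping above is right, a 27-byte file reporting 35992 blocks would mean space was allocated well beyond its logical size, while 1 block would mean little or no extra allocation. That reading is an inference from the numbers, not something confirmed in this thread.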

@JunHe77
Contributor

JunHe77 commented Feb 13, 2020

I created a topic with Travis; comments are welcome. 😃
I tested the platform_dependent case on drone.io's Arm platform and the result is good:
https://cloud.drone.io/JunHe77/rocksdb/1/1/2

@pdillinger
Contributor

Using #6436 instead

@pdillinger pdillinger closed this Mar 13, 2020
facebook-github-bot pushed a commit that referenced this pull request Mar 13, 2020
Summary:
This patch, based on #5932, offers a better solution for adding arm64 to the Travis CI matrix.
Many thanks to adamretter for initiating the Arm CI setup.

Differences compared to amd64:
1. For CMake, no official arm64 release is available on the Kitware page,
so a third-party release (from conda-forge) is used instead of
building from source. The main reason is to save CI time.
2. Explicitly export JAVA_HOME on arm64
3. Disable the mingw test

Signed-off-by: Yuqi Gu <yuqi.gu@arm.com>
Pull Request resolved: #6436

Differential Revision: D20428505

Pulled By: pdillinger

fbshipit-source-id: 81ef02435e41480bb71710b783d85ebf452ce926