[C++] Add hash table to Redis-Module #4911
Conversation
src/ray/gcs/tables.h
Outdated
/// \param remove_callback HashRemoveCallback that is called once the data has been
/// written to the GCS no matter whether the key exists in the hash table.
/// \return Status
Status RemoveEntry(const DriverID &driver_id, const ID &id,
Should this be RemoveEntries?
Done.
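For illustration, a hedged sketch of what the plural RemoveEntries interface could look like; the parameter list is an assumption modeled on the doc comment above, not necessarily the exact merged signature:

/// Remove several keys of a hash table entry from the GCS.
///
/// \param driver_id The ID of the job (driver).
/// \param id The ID of the hash table entry to modify.
/// \param keys The keys to remove from the hash table.
/// \param remove_callback HashRemoveCallback that is called once the data has been
///        written to the GCS, no matter whether the keys exist in the hash table.
/// \return Status
Status RemoveEntries(const DriverID &driver_id, const ID &id,
                     const std::vector<std::string> &keys,
                     const HashRemoveCallback &remove_callback);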
src/ray/gcs/tables.h
Outdated
using HashNotificationCallback = std::function<void(
    AsyncGcsClient *client, const ID &id,
    const GcsTableNotificationMode notification_mode, const DataMap &data)>;
using SubscriptionCallback = typename Log<ID, Data>::SubscriptionCallback;
Document the args of these callback functions?
I moved the documentation from class Hash to class HashInterface, and also documented the args for the callbacks.
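For reference, a sketch of how the callback arguments might be documented; the wording is illustrative and based on the aliases shown in the diff above, not necessarily the exact merged comment:

/// Callback invoked when a notification for a hash table entry is received.
///
/// \param client The GCS client that received the notification.
/// \param id The ID of the hash table entry that changed.
/// \param notification_mode Whether the entries were appended/added or removed.
/// \param data The key-value pairs affected by the change.
using HashNotificationCallback = std::function<void(
    AsyncGcsClient *client, const ID &id,
    const GcsTableNotificationMode notification_mode, const DataMap &data)>;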
src/ray/gcs/format/gcs.fbs
Outdated
@@ -22,6 +22,7 @@ enum TablePrefix:int {
  TASK_LEASE,
  ACTOR_CHECKPOINT,
  ACTOR_CHECKPOINT_ID,
  DYNAMIC_RESOURCE,
Maybe name this node_resource? I feel node_resource explains more clearly that this table stores the resources for each node.
In that case, should we change the dynamic_resource_table to resource_table?
Yes.
@@ -518,6 +532,142 @@ int SetRemove_RedisCommand(RedisModuleCtx *ctx, RedisModuleString **argv, int ar
  return RedisModule_ReplyWithSimpleString(ctx, "OK");
}

int Hash_DoPublish(RedisModuleCtx *ctx, RedisModuleString **argv,
                   GcsTableNotificationMode notification_mode) {
It seems that notification_mode is not used.
Good catch.
/// \param notification_mode Output the mode of the operation: APPEND_OR_ADD or REMOVE.
/// \param deleted_data Output data if the deleted data is not the same as required.
int HashUpdate_DoWrite(RedisModuleCtx *ctx, RedisModuleString **argv, int argc,
                       GcsTableNotificationMode *notification_mode,
Can we change GcsTableNotificationMode to a more meaningful name? You use it to represent the kind of operation, not a notification.
I renamed it to GcsChangeMode.
Thanks!
/// \param deleted_data Output data if the deleted data is not the same as required.
int HashUpdate_DoWrite(RedisModuleCtx *ctx, RedisModuleString **argv, int argc,
                       GcsTableNotificationMode *notification_mode,
                       RedisModuleString *&deleted_data) {
The convention we now use is to have all out-arguments be a pointer instead of a reference.
My bad. This breaks the Google style guide.
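To make the convention concrete, a sketch of the signature with the out-arguments passed as pointers; this assumes the reference parameter simply becomes a double pointer and is illustrative, not the exact merged code:

// Out-arguments passed as pointers make mutation explicit at the call site.
int HashUpdate_DoWrite(RedisModuleCtx *ctx, RedisModuleString **argv, int argc,
                       GcsTableNotificationMode *notification_mode,
                       RedisModuleString **deleted_data);

// Call site: taking the address shows that the arguments may be modified.
// HashUpdate_DoWrite(ctx, argv, argc, &notification_mode, &deleted_data);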
  deleted_flags[i] = deleted_num != 0;
  remove_count += deleted_num;
}
if (remove_count != deleted_flags.size()) {
Maybe it's a little slower, but I would consider removing this if-statement and always creating the GcsTableEntry here, no matter how many keys were successfully deleted, for code clarity. It may also make sense to do this even for additions, when the operation should always succeed.
Done.
Hmm it seems like this is still the original code?
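To make the suggestion concrete, a rough sketch of the simplification being asked for; num_keys and the two helper calls are hypothetical stand-ins for the surrounding Redis-module code:

std::vector<bool> deleted_flags(num_keys);
size_t remove_count = 0;
for (size_t i = 0; i < num_keys; ++i) {
  int deleted_num = DeleteKeyFromHash(i);  // hypothetical wrapper around the hash delete
  deleted_flags[i] = deleted_num != 0;
  remove_count += deleted_num;
}
// Always build the entry describing the removed keys, with no special case
// for whether remove_count matches deleted_flags.size().
PublishRemovedKeys(deleted_flags);  // hypothetical publish helper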
src/ray/gcs/tables.cc
Outdated
  data_vec.push_back(fbb.CreateString(data));
}

fbb.Finish(CreateGcsTableEntry(fbb, GcsTableNotificationMode::APPEND_OR_ADD,
I'm not sure if it makes sense to overload GcsTableEntry with hash table semantics. What do you think about creating a separate flatbuffer type specific for hash table operations?
Maybe GcsTableEntry should be renamed to GcsEntry to make it more meaningful for hashes. At the beginning, I created a separate GcsHashEntry structure, but after checking the HGETALL documentation, which returns a 2*n list, I decided to reuse GcsEntry since it aligns better with the Redis semantics.
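HGETALL replies with a flat list of 2*n strings, alternating field and value, so an interleaved key/value vector inside a single GcsEntry mirrors the Redis reply layout. A minimal sketch of building such an entry with the FlatBuffers C++ API follows; CreateGcsEntry and its field order are assumptions about the generated schema code, and id.Binary() is assumed to return the raw ID bytes:

flatbuffers::FlatBufferBuilder fbb;
std::vector<flatbuffers::Offset<flatbuffers::String>> entries;
// Interleave keys and values, HGETALL-style: key1, value1, key2, value2, ...
for (const auto &pair : data_map) {
  entries.push_back(fbb.CreateString(pair.first));
  entries.push_back(fbb.CreateString(pair.second));
}
fbb.Finish(CreateGcsEntry(fbb, GcsChangeMode::APPEND_OR_ADD,
                          fbb.CreateString(id.Binary()),
                          fbb.CreateVector(entries)));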
src/ray/gcs/format/gcs.fbs
Outdated
-table GcsTableEntry {
-  notification_mode: GcsTableNotificationMode;
+table GcsEntry {
+  chagne_mode: GcsChangeMode;
This comment was marked as resolved.
const char *update_data_buf = RedisModule_StringPtrLen(update_data, &update_data_len);

auto data_vec = flatbuffers::GetRoot<GcsEntry>(update_data_buf);
*chagne_mode = data_vec->chagne_mode();
This comment was marked as resolved.
  deleted_flags[i] = deleted_num != 0;
  remove_count += deleted_num;
}
if (remove_count != deleted_flags.size()) {
Hmm it seems like this is still the original code?
Do you have more comments?
I still don't see any changes from my last review? If possible, can you avoid force-pushing, to make it easier to see new diffs?
@stephanie-wang Sorry for the inconvenience. I rebased the code onto the newest master and did the force push. There are now 2 commits: 1 commit to fix the typo and 1 commit to remove
It seems that if we use "merge master + normal push", sometimes GitHub can't properly squash merge the PR. So I use "rebase + force push" as well to avoid this problem.
No problem, thanks! It's just easier to see what has changed if we don't use force-push :) I'm a little surprised that GitHub isn't able to handle the merge+push though; I use this myself all the time. Also, you probably know this already, but there usually isn't a need to merge unless the same files have changed in master.
Thanks!
Looks like there are just some lint errors on Travis.
I will fix the lint errors and do the rebasing.
Co-Authored-By: Hao Chen <chenh1024@gmail.com>
Co-Authored-By: Stephanie Wang <swang@cs.berkeley.edu>
* [rllib] Remove dependency on TensorFlow (ray-project#4764) * remove hard tf dep * add test * comment fix * fix test * Dynamic Custom Resources - create and delete resources (ray-project#3742) * Update tutorial link in doc (ray-project#4777) * [rllib] Implement learn_on_batch() in torch policy graph * Fix `ray stop` by killing raylet before plasma (ray-project#4778) * Fatal check if object store dies (ray-project#4763) * [rllib] fix clip by value issue as TF upgraded (ray-project#4697) * fix clip_by_value issue * fix typo * [autoscaler] Fix submit (ray-project#4782) * Queue tasks in the raylet in between async callbacks (ray-project#4766) * Add a SWAP TaskQueue so that we can keep track of tasks that are temporarily dequeued * Fix bug where tasks that fail to be forwarded don't appear to be local by adding them to SWAP queue * cleanups * updates * updates * [Java][Bazel] Refine auto-generated pom files (ray-project#4780) * Bump version to 0.7.0 (ray-project#4791) * [JAVA] setDefaultUncaughtExceptionHandler to log uncaught exception in user thread. (ray-project#4798) * Add WorkerUncaughtExceptionHandler * Fix * revert bazel and pom * [tune] Fix CLI test (ray-project#4801) * Fix pom file generation (ray-project#4800) * [rllib] Support continuous action distributions in IMPALA/APPO (ray-project#4771) * [rllib] TensorFlow 2 compatibility (ray-project#4802) * Change tagline in documentation and README. (ray-project#4807) * Update README.rst, index.rst, tutorial.rst and _config.yml * [tune] Support non-arg submit (ray-project#4803) * [autoscaler] rsync cluster (ray-project#4785) * [tune] Remove extra parsing functionality (ray-project#4804) * Fix Java worker log dir (ray-project#4781) * [tune] Initial track integration (ray-project#4362) Introduces a minimally invasive utility for logging experiment results. A broad requirement for this tool is that it should integrate seamlessly with Tune execution. * [rllib] [RFC] Dynamic definition of loss functions and modularization support (ray-project#4795) * dynamic graph * wip * clean up * fix * document trainer * wip * initialize the graph using a fake batch * clean up dynamic init * wip * spelling * use builder for ppo pol graph * add ppo graph * fix naming * order * docs * set class name correctly * add torch builder * add custom model support in builder * cleanup * remove underscores * fix py2 compat * Update dynamic_tf_policy_graph.py * Update tracking_dict.py * wip * rename * debug level * rename policy_graph -> policy in new classes * fix test * rename ppo tf policy * port appo too * forgot grads * default policy optimizer * make default config optional * add config to optimizer * use lr by default in optimizer * update * comments * remove optimizer * fix tuple actions support in dynamic tf graph * [rllib] Rename PolicyGraph => Policy, move from evaluation/ to policy/ (ray-project#4819) This implements some of the renames proposed in ray-project#4813 We leave behind backwards-compatibility aliases for *PolicyGraph and SampleBatch. * [Java] Dynamic resource API in Java (ray-project#4824) * Add default values for Wgym flags * Fix import * Fix issue when starting `raylet_monitor` (ray-project#4829) * Refactor ID Serial 1: Separate ObjectID and TaskID from UniqueID (ray-project#4776) * Enable BaseId. * Change TaskID and make python test pass * Remove unnecessary functions and fix test failure and change TaskID to 16 bytes. 
* Java code change draft * Refine * Lint * Update java/api/src/main/java/org/ray/api/id/TaskId.java Co-Authored-By: Hao Chen <chenh1024@gmail.com> * Update java/api/src/main/java/org/ray/api/id/BaseId.java Co-Authored-By: Hao Chen <chenh1024@gmail.com> * Update java/api/src/main/java/org/ray/api/id/BaseId.java Co-Authored-By: Hao Chen <chenh1024@gmail.com> * Update java/api/src/main/java/org/ray/api/id/ObjectId.java Co-Authored-By: Hao Chen <chenh1024@gmail.com> * Address comment * Lint * Fix SINGLE_PROCESS * Fix comments * Refine code * Refine test * Resolve conflict * Fix bug in which actor classes are not exported multiple times. (ray-project#4838) * Bump Ray master version to 0.8.0.dev0 (ray-project#4845) * Add section to bump version of master branch and cleanup release docs (ray-project#4846) * Fix import * Export remote functions when first used and also fix bug in which rem… (ray-project#4844) * Export remote functions when first used and also fix bug in which remote functions and actor classes are not exported from workers during subsequent ray sessions. * Documentation update * Fix tests. * Fix grammar * Update wheel versions in documentation to 0.8.0.dev0 and 0.7.0. (ray-project#4847) * [tune] Later expansion of local_dir (ray-project#4806) * [rllib] [RFC] Deprecate Python 2 / RLlib (ray-project#4832) * Fix a typo in kubernetes yaml (ray-project#4872) * Move global state API out of global_state object. (ray-project#4857) * Install bazel in autoscaler development configs. (ray-project#4874) * [tune] Fix up Ax Search and Examples (ray-project#4851) * update Ax for cleaner API * docs update * [rllib] Update concepts docs and add "Building Policies in Torch/TensorFlow" section (ray-project#4821) * wip * fix index * fix bugs * todo * add imports * note on get ph * note on get ph * rename to building custom algs * add rnn state info * [rllib] Fix error getting kl when simple_optimizer: True in multi-agent PPO * Replace ReturnIds with NumReturns in TaskInfo to reduce the size (ray-project#4854) * Refine TaskInfo * Fix * Add a test to print task info size * Lint * Refine * Update deps commits of opencensus to support building with bzl 0.25.x (ray-project#4862) * Update deps to support bzl 2.5.x * Fix * Upgrade arrow to latest master (ray-project#4858) * [tune] Auto-init Ray + default SearchAlg (ray-project#4815) * Bump version from 0.8.0.dev0 to 0.7.1. (ray-project#4890) * [rllib] Allow access to batches prior to postprocessing (ray-project#4871) * [rllib] Fix Multidiscrete support (ray-project#4869) * Refactor redis callback handling (ray-project#4841) * Add CallbackReply * Fix * fix linting by format.sh * Fix linting * Address comments. * Fix * Initial high-level code structure of CoreWorker. (ray-project#4875) * Drop duplicated string format (ray-project#4897) This string format is unnecessary. java_worker_options has been appended to the commandline later. * Refactor ID Serial 2: change all ID functions to `CamelCase` (ray-project#4896) * Hotfix for change of from_random to FromRandom (ray-project#4909) * [rllib] Fix documentation on custom policies (ray-project#4910) * wip * add docs * lint * todo sections * fix doc * [rllib] Allow Torch policies access to full action input dict in extra_action_out_fn (ray-project#4894) * fix torch extra out * preserve setitem * fix docs * [tune] Pretty print params json in logger.py (ray-project#4903) * [sgd] Distributed Training via PyTorch (ray-project#4797) Implements distributed SGD using distributed PyTorch. 
* [rllib] Rough port of DQN to build_tf_policy() pattern (ray-project#4823) * fetching objects in parallel in _get_arguments_for_execution (ray-project#4775) * [tune] Disallow setting resources_per_trial when it is already configured (ray-project#4880) * disallow it * import fix * fix example * fix test * fix tests * Update mock.py * fix * make less convoluted * fix tests * [rllib] Rename PolicyEvaluator => RolloutWorker (ray-project#4820) * Fix local cluster yaml (ray-project#4918) * [tune] Directional metrics for components (ray-project#4120) (ray-project#4915) * [Core Worker] implement ObjectInterface and add test framework (ray-project#4899) * [tune] Make PBT Quantile fraction configurable (ray-project#4912) * Better organize ray_common module (ray-project#4898) * Fix error * [tune] Add requirements-dev.txt and update docs for contributing (ray-project#4925) * Add requirements-dev.txt and update docs. * Update doc/source/tune-contrib.rst Co-Authored-By: Richard Liaw <rliaw@berkeley.edu> * Unpin everything except for yapf. * Fix compute actions return value * Bump version from 0.7.1 to 0.8.0.dev1. (ray-project#4937) * Update version number in documentation after release 0.7.0 -> 0.7.1 and 0.8.0.dev0 -> 0.8.0.dev1. (ray-project#4941) * [doc] Update developer docs with bazel instructions (ray-project#4944) * [C++] Add hash table to Redis-Module (ray-project#4911) * Flush lineage cache on task submission instead of execution (ray-project#4942) * [rllib] Add docs on how to use TF eager execution (ray-project#4927) * [rllib] Port remainder of algorithms to build_trainer() pattern (ray-project#4920) * Fix resource bookkeeping bug with acquiring unknown resource. (ray-project#4945) * Update aws keys for uploading wheels to s3. (ray-project#4948) * Upload wheels on Travis to branchname/commit_id. (ray-project#4949) * [Java] Fix serializing issues of `RaySerializer` (ray-project#4887) * Fix * Address comment. * fix (ray-project#4950) * [Java] Add inner class `Builder` to build call options. (ray-project#4956) * Add Builder class * format * Refactor by IDE * Remove uncessary dependency * Make release stress tests work and improve them. (ray-project#4955) * Use proper session directory for debug_string.txt (ray-project#4960) * [core] Use int64_t instead of int to keep track of fractional resources (ray-project#4959) * [core worker] add task submission & execution interface (ray-project#4922) * [sgd] Add non-distributed PyTorch runner (ray-project#4933) * Add non-distributed PyTorch runner * use dist.is_available() instead of checking OS * Nicer exception * Fix bug in choosing port * Refactor some code * Address comments * Address comments * Flush all tasks from local lineage cache after a node failure (ray-project#4964) * Remove typing from setup.py install_requirements. (ray-project#4971) * [Java] Fix bug of `BaseID` in multi-threading case. (ray-project#4974) * [rllib] Fix DDPG example (ray-project#4973) * Upgrade CI clang-format to 6.0 (ray-project#4976) * [Core worker] add store & task provider (ray-project#4966) * Fix bugs in the a3c code template. 
(ray-project#4984) * Inherit Function Docstrings and other metedata (ray-project#4985) * Fix a crash when unknown worker registering to raylet (ray-project#4992) * [gRPC] Use gRPC for inter-node-manager communication (ray-project#4968) * Fix Java CI failure (ray-project#4995) * fix handling of non-integral timeout values in signal.receive (ray-project#5002) * temp fix for build (ray-project#5006) * [tune] Tutorial UX Changes (ray-project#4990) * add integration, iris, ASHA, recursive changes, set reuse_actors=True, and enable Analysis as a return object * docstring * fix up example * fix * cleanup tests * experiment analysis * Fix valgrind build by installing new version of valgrind (ray-project#5008) * Fix no cpus test (ray-project#5009) * Fix tensorflow-1.14 installation in jenkins (ray-project#5007) * Add dynamic worker options for worker command. (ray-project#4970) * Add fields for fbs * WIP * Fix complition errors * Add java part * FIx * Fix * Fix * Fix lint * Refine API * address comments and add test * Fix * Address comment. * Address comments. * Fix linting * Refine * Fix lint * WIP: address comment. * Fix java * Fix py * Refin * Fix * Fix * Fix linting * Fix lint * Address comments * WIP * Fix * Fix * minor refine * Fix lint * Fix raylet test. * Fix lint * Update src/ray/raylet/worker_pool.h Co-Authored-By: Hao Chen <chenh1024@gmail.com> * Update java/runtime/src/main/java/org/ray/runtime/AbstractRayRuntime.java Co-Authored-By: Hao Chen <chenh1024@gmail.com> * Address comments. * Address comments. * Fix test. * Update src/ray/raylet/worker_pool.h Co-Authored-By: Hao Chen <chenh1024@gmail.com> * Address comments. * Address comments. * Fix * Fix lint * Fix lint * Fix * Address comments. * Fix linting * [docs] docs for running Tensorboard without sudo (ray-project#5015) * Instructions for running Tensorboard without sudo When we run Tensorboard to visualize the results of Ray outputs on multi-user clusters where we don't have sudo access, such as RISE clusters, a few commands need to first be run to make sure tensorboard can edit the tmp directory. This is a pretty common usecase so I figured we may as well put it in the documentation for Tune. * Update tune-usage.rst * [ci] Change Jenkins to py3 (ray-project#5022) * conda3 * integration * add nevergrad, remotedata * pytest 0.3.1 * otherdockers * setup * tune * [gRPC] Migrate gcs data structures to protobuf (ray-project#5024) * [rllib] Add QMIX mixer parameters to optimizer param list (ray-project#5014) * add mixer params * Update qmix_policy.py * [grpc] refactor rpc server to support multiple io services (ray-project#5023) * [rllib] Give error if sample_async is used with pytorch for A3C (ray-project#5000) * give error if sample_async is used with pytorch * update * Update a3c.py * [tune] Update MNIST Example (ray-project#4991) * Add entropy coeff schedule * Revert "Merge with ray master" This reverts commit 108bfa2, reversing changes made to 2e0eec9. * Revert "Revert "Merge with ray master"" This reverts commit 92c0f88. * Remove entropy decay stuff
What do these changes do?
This PR adds the Hash table to Ray. The PR is a bit big, so DynamicResourceTable is not actually used yet; it is currently only used for test purposes. I will use it in the next PR to refine the dynamic resource feature. The current dynamic resource feature appends data to the LOG-based client table, which grows indefinitely and becomes slower and slower. With the special structure of a hash, the interface to Hash is a bit different from Log's and Table's.
Related issue number
Linter
I've run scripts/format.sh to lint the changes in this PR.
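To illustrate why a hash fits dynamic resources better than a log, here is a self-contained toy comparison of append-only versus keyed storage; it models only the semantics described above and is not the Ray GCS API:

#include <iostream>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

int main() {
  // Log semantics: every resource update appends a new entry, so the data
  // for a node grows indefinitely as its resources change.
  std::vector<std::pair<std::string, double>> log;
  log.push_back({"CPU", 4});
  log.push_back({"CPU", 8});  // the stale "CPU" entry remains in the log

  // Hash semantics: updates overwrite by key and keys can be removed, so the
  // table stays bounded by the number of distinct resources per node.
  std::unordered_map<std::string, double> hash;
  hash["CPU"] = 4;
  hash["CPU"] = 8;   // overwrites the previous value in place
  hash.erase("CPU"); // removal is possible, unlike in an append-only log

  std::cout << "log entries: " << log.size()
            << ", hash entries: " << hash.size() << std::endl;
  return 0;
}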