Refactor ID Serial 1: Separate ObjectID and TaskID from UniqueID #4776

Merged
merged 17 commits into from
May 22, 2019

@guoyuhong guoyuhong commented May 12, 2019

What do these changes do?

  1. Use a template BaseID as the base class for all IDs, and separate the ObjectID and TaskID definitions from UniqueID. In the long run UniqueID should go away, but this PR keeps it. TaskID now holds 16 bytes of data, and ObjectID contains one TaskID instance. As a result, the functions ComputeReturnId, ComputePutId, ComputeObjectIndex, ComputeTaskId, and FinishTaskId are no longer needed. Because TaskID is now only 16 bytes long, without the 4 unused zero bytes, storing it in memory and transferring it should be slightly more efficient. (An earlier revision used a 12-byte TaskID; the length is easy to change.)
  2. Refactor Ray IDs to support different lengths. Previously, code that handled IDs assumed that every ID was 20 bytes long. This PR removes that assumption, so once the new ID schema is defined, changing the IDs will be much easier.
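
To make the layout described above concrete, here is a minimal Python sketch. It is not the PR's C++ code. It assumes a 16-byte TaskID and an ObjectID made of the TaskID bytes followed by a 4-byte index. The class names mirror the PR, but the `SIZE` attribute, the all-0xff nil value, and the method bodies are illustrative assumptions.

```python
import os


class BaseID:
    """Hypothetical base class: each subclass declares its own SIZE in bytes."""

    SIZE = 0  # overridden by subclasses; nothing assumes a fixed 20 bytes

    def __init__(self, binary: bytes):
        if len(binary) != self.SIZE:
            raise ValueError(
                f"{type(self).__name__} expects {self.SIZE} bytes, got {len(binary)}"
            )
        self._binary = bytes(binary)

    @classmethod
    def nil(cls):
        # Assumption for this sketch: the nil ID is all 0xff bytes.
        return cls(b"\xff" * cls.SIZE)

    @classmethod
    def from_random(cls):
        return cls(os.urandom(cls.SIZE))

    def binary(self) -> bytes:
        return self._binary

    def is_nil(self) -> bool:
        return self == type(self).nil()

    def __eq__(self, other):
        return type(other) is type(self) and self._binary == other._binary

    def __hash__(self):
        return hash(self._binary)


class TaskID(BaseID):
    SIZE = 16


class ObjectID(BaseID):
    INDEX_SIZE = 4
    SIZE = TaskID.SIZE + INDEX_SIZE  # TaskID bytes followed by an index

    def task_id(self) -> TaskID:
        # The owning task is read directly from the leading bytes.
        return TaskID(self._binary[:TaskID.SIZE])
```

Because the ObjectID embeds the TaskID bytes, the owning task can be recovered by slicing rather than by a separate compute function, which is the reason the description gives for dropping the `Compute*` helpers.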

Work items:

  • Backend ID change.
  • Python ID change.
  • Java ID change.

Related issue number

Linter

  • I've run scripts/format.sh to lint the changes in this PR.

@guoyuhong guoyuhong changed the title [WIP] Refactor ID Serial 1: Separate ObjectID and TaskID. [WIP] Refactor ID Serial 1: Separate ObjectID and TaskID from UniqueID May 12, 2019
@AmplabJenkins
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-Perf-Integration-PRB/769/

@AmplabJenkins
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/14195/

@AmplabJenkins
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-Perf-Integration-PRB/778/

@AmplabJenkins
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/14205/

@AmplabJenkins
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-Perf-Integration-PRB/779/

@AmplabJenkins
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/14206/

@AmplabJenkins
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-Perf-Integration-PRB/780/

@AmplabJenkins
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-Perf-Integration-PRB/783/

@guoyuhong guoyuhong changed the title [WIP] Refactor ID Serial 1: Separate ObjectID and TaskID from UniqueID Refactor ID Serial 1: Separate ObjectID and TaskID from UniqueID May 14, 2019
@AmplabJenkins
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-Perf-Integration-PRB/784/

@guoyuhong guoyuhong requested review from raulchen and jovany-wang and removed request for raulchen May 14, 2019 15:17
@AmplabJenkins
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/14207/

@AmplabJenkins
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/14210/

@AmplabJenkins
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/14211/

src/ray/id.h Outdated
}
};

#pragma pack(push, 1)
Contributor

What is this pragma for? It should probably be documented.

Contributor Author

This pushes the current compiler alignment setting onto a stack and changes the alignment to 1 byte. The default compiler alignment is 8 bytes on x64 machines. #pragma pack(pop) restores the previous alignment setting.

Without it, when TaskID is set to 12 bytes and embedded in ObjectID, there are always 4 empty padding bytes to align it to 16 bytes.

I will add a comment to it.
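
The padding effect can be reproduced outside C++ as a rough illustration. The sketch below uses Python's `struct` module as an analogue: the `@` prefix applies native alignment (like the compiler default), while `=` applies no alignment padding (similar in spirit to `#pragma pack(push, 1)`). The layout, a 12-byte ID followed by an 8-byte integer, is a hypothetical one chosen to show the effect and is not the actual member layout in `id.h`.

```python
import struct

# Hypothetical layout, not the PR's actual one: a 12-byte ID followed by an
# 8-byte unsigned integer field.
NATIVE_LAYOUT = "@12sQ"  # "@": native sizes and alignment (compiler-like default)
PACKED_LAYOUT = "=12sQ"  # "=": standard sizes, no alignment padding


def layout_sizes():
    """Return (native_size, packed_size) in bytes for the two layouts."""
    return struct.calcsize(NATIVE_LAYOUT), struct.calcsize(PACKED_LAYOUT)


def padding_bytes():
    """Number of bytes that alignment adds on this platform."""
    native_size, packed_size = layout_sizes()
    return native_size - packed_size
```

The packed layout is always 20 bytes. On typical 64-bit platforms the native layout is larger, because the 8-byte field is moved to an 8-byte boundary. The trade-off is that packed fields may be unaligned, which some architectures access more slowly.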

@pcmoritz
Contributor

I went over the C++ and Cython changes and they look good to me.

java/api/src/main/java/org/ray/api/id/BaseId.java Outdated
/**
* Create a BaseId instance according to the input byte array.
*/
public BaseId(byte[] id) {
Contributor

Nit, move the constructor before other methods

java/api/src/main/java/org/ray/api/id/BaseId.java Outdated
java/api/src/main/java/org/ray/api/id/ObjectId.java Outdated
/**
* Create a copy of this ObjectId.
*/
public ObjectId copy() {
Contributor

copy() isn't being used. we can just remove it.

/**
* Generate a nil ObjectId.
*/
public static ObjectId genNil() {
Contributor

genNil can be private?

return DatatypeConverter.parseHexBinary(hex);
}

public static byte[] byteBuffer2Bytes(ByteBuffer bb) {
Contributor

hexString2Bytes and byteBuffer2Bytes can be protected?

java/api/src/main/java/org/ray/api/id/TaskId.java Outdated
ObjectId[] returnIds = new ObjectId[uniqueIds.length];
for (int i = 0; i < uniqueIds.length; i++) {
  returnIds[i] = new ObjectId(uniqueIds[i].getBytes());
}
Contributor

This is a bit weird. What about having another helper function that converts a ByteBuffer to List<byte[]>, and then converting each byte[] to an ObjectId?
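
A rough sketch of that helper idea, written in Python rather than Java. It assumes the buffer holds fixed-size IDs concatenated back to back, which the thread does not confirm; `split_ids` and `ID_SIZE` are hypothetical names used only for illustration.

```python
ID_SIZE = 20  # assumption: size in bytes of one ID in the buffer


def split_ids(buffer: bytes, id_size: int = ID_SIZE) -> list:
    """Split a flat buffer of concatenated IDs into one bytes object per ID."""
    if id_size <= 0:
        raise ValueError("id_size must be positive")
    if len(buffer) % id_size != 0:
        raise ValueError("buffer length is not a multiple of id_size")
    return [buffer[i:i + id_size] for i in range(0, len(buffer), id_size)]
```

Each returned chunk could then be wrapped in the target ID type directly, which avoids building intermediate IDs of a different type first.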

if length == ray_constants.ID_SIZE:
    return id_bytes
else:
    return id_bytes[:length]
Collaborator

This won't work if length > ray_constants.ID_SIZE.

Contributor Author

Good catch.

Contributor

we can just use os.urandom(length), which is fork safe as well
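
A minimal sketch of that suggestion. `ID_SIZE` here is a stand-in for `ray_constants.ID_SIZE`, with 20 assumed as its value, and `random_id_bytes` is a hypothetical name, not the function in the PR.

```python
import os

ID_SIZE = 20  # assumption: stands in for ray_constants.ID_SIZE


def random_id_bytes(length: int = ID_SIZE) -> bytes:
    """Return `length` random bytes from the operating system's entropy source."""
    # os.urandom accepts any non-negative length, so this also works when
    # length is larger than ID_SIZE, unlike slicing a fixed-size buffer.
    return os.urandom(length)
```

Since the bytes come from the operating system rather than from generator state held in the process, a forked child does not inherit and replay the parent's random sequence.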

src/ray/id.h Outdated
template <typename T>
T BaseID<T>::from_binary(const std::string &binary) {
  T t = T::nil();
  std::memcpy(reinterpret_cast<uint8_t *>(&t), binary.data(), T::size());
Contributor

There's an unneeded copy here (from T::nil() to the local t). Why not just call the constructor with a string parameter?

Contributor Author

To force the user to use from_binary, there is no public constructor ID(const std::string &binary).

src/ray/id.h Outdated
protected:
TaskID task_id_;
int32_t index_;
mutable size_t hash_ = 0;
Contributor

why not define hash_ in BaseID?

Contributor Author

Same reason as for TaskID. If we nest some IDs inside another ID, we cannot use cache_. Otherwise, we would need some copy-and-paste operations in binary().

}

template <typename T>
bool BaseID<T>::is_nil() const {
Contributor

just return *this == T::nil()?

Contributor Author

Good.


template <typename T>
bool BaseID<T>::operator==(const BaseID &rhs) const {
  return std::memcmp(data(), rhs.data(), T::size()) == 0;
Contributor

compare hash code first

Contributor Author

MurmurHash64A is expensive. It is faster to compare the bytes directly and delay the hash calculation.

Contributor

Is this still the case given that the hash code is computed only once?
Or maybe you could check whether the hash code already exists and, if it does, compare the hash codes first?

Contributor Author

Currently it is calculated only once. MurmurHash64A is expensive, which is why we changed it to lazy evaluation. I think direct comparison is much faster than calculating the hash value. Moreover, direct comparison avoids hash collisions.
FYI, at first the hash cache was not lazily evaluated; the hash value was calculated in the constructor. Many tests timed out. After switching to lazy evaluation, the timeouts went away.
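
The pattern discussed in this thread (compare bytes directly, compute the hash only on first use) can be sketched in Python as follows. This is an illustration, not the PR's code: `LazyHashID` and `expensive_hash` are hypothetical names, BLAKE2b stands in for MurmurHash64A, and the `hash_calls` counter exists only to make the laziness observable.

```python
import hashlib


def expensive_hash(data: bytes) -> int:
    """Stand-in for MurmurHash64A; any 64-bit hash works for this sketch."""
    digest = hashlib.blake2b(data, digest_size=8).digest()
    return int.from_bytes(digest, "little")


class LazyHashID:
    def __init__(self, binary: bytes):
        self._binary = bytes(binary)
        self._hash = None    # not computed at construction time
        self.hash_calls = 0  # instrumentation for this example only

    def __eq__(self, other):
        if not isinstance(other, LazyHashID):
            return NotImplemented
        # Direct byte comparison: never triggers the hash computation.
        return self._binary == other._binary

    def __hash__(self):
        if self._hash is None:
            self.hash_calls += 1
            self._hash = expensive_hash(self._binary)
        return self._hash
```

IDs that are only constructed and compared never pay for hashing; the cost is incurred once, and only for IDs that are actually used as keys in a hash-based container.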

src/ray/id.h Outdated
/// \param task_id The task ID of the task that created the object.
/// \param index What number the object was created by in the task.
/// \return The computed object ID.
static ObjectID build(const TaskID &task_id, bool is_put, int64_t index);
Contributor

I prefer having two separate methods: ObjectID::ForPut and ObjectID::ForTaskReturn.

Contributor Author

Done
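
For illustration, the two-factory shape could look like the Python sketch below. The function names and the index encoding are assumptions, not taken from the merged code: it assumes a 16-byte TaskID, a signed 32-bit little-endian index, positive indices for task returns, and negative indices for puts.

```python
import struct

TASK_ID_SIZE = 16  # assumption based on the PR description


def _build(task_id: bytes, index: int) -> bytes:
    if len(task_id) != TASK_ID_SIZE:
        raise ValueError("task_id must be exactly TASK_ID_SIZE bytes")
    # Object ID = task ID bytes followed by a signed 32-bit index.
    return task_id + struct.pack("<i", index)


def object_id_for_task_return(task_id: bytes, return_index: int) -> bytes:
    """Hypothetical counterpart of ObjectID::ForTaskReturn."""
    if return_index < 1:
        raise ValueError("return_index must be >= 1")
    return _build(task_id, return_index)


def object_id_for_put(task_id: bytes, put_index: int) -> bytes:
    """Hypothetical counterpart of ObjectID::ForPut."""
    if put_index < 1:
        raise ValueError("put_index must be >= 1")
    # Assumption: puts are distinguished by a negated index.
    return _build(task_id, -put_index)


def split_object_id(object_id: bytes):
    """Return (task_id, index) recovered from an object ID built above."""
    (index,) = struct.unpack("<i", object_id[TASK_ID_SIZE:])
    return object_id[:TASK_ID_SIZE], index
```

Two named factories make the caller's intent explicit at the call site, whereas a single `build(task_id, is_put, index)` leaves a bare boolean whose meaning is easy to misread.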

src/ray/id.cc Outdated
const TaskID FinishTaskId(const TaskID &task_id) {
return TaskID(ComputeObjectId(task_id, 0));
}
size_t TaskID::hash() const { return MurmurHash64A(this, TaskID::size(), 0); }
Contributor

Why not cache hash code for TaskID?

Contributor Author

TaskID is now nested in ObjectID. If TaskID carried a cache_ value, ObjectID could not contain it alongside a single index field. In the future, TaskID will have an int32_t index that can be used as a hash.

src/ray/gcs/redis_context.h

std::mt19937 RandomlySeededMersenneTwister();

template <typename T>
Contributor

@jiangzihao2009 jiangzihao2009 May 16, 2019

Why not put the size in the template, like template <typename T, size_t SIZE>? Then we can specify the size when inheriting from BaseID.

Contributor Author

In the future, we may nest different IDs together. In that case, a size template parameter may not be necessary.

@AmplabJenkins
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/14280/

@@ -21,12 +21,15 @@
import ray.ray_constants as ray_constants


-def _random_string():
+def _random_string(length=ray_constants.ID_SIZE):
Contributor

maybe we should deprecate this function, just use ObjectID.random().

Contributor Author

@raulchen I have discussed this with @pcmoritz. Using this in multi-threaded cases can cause problems: it may generate the same ObjectID.

Contributor

I see. But I think we can move the same code to Cython's ObjectID.random to avoid that problem.

@AmplabJenkins
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/14287/

@guoyuhong
Contributor Author

It looks like there are no more comments. I will rebase this PR and merge it when tests pass.

@AmplabJenkins
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-Perf-Integration-PRB/885/

@AmplabJenkins
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-Perf-Integration-PRB/886/

@AmplabJenkins
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/14361/

@guoyuhong guoyuhong merged commit 1a39fee into ray-project:master May 22, 2019
@guoyuhong guoyuhong deleted the baseId branch May 22, 2019 06:46
@AmplabJenkins
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-Perf-Integration-PRB/893/

@AmplabJenkins
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-Perf-Integration-PRB/896/

stefanpantic added a commit to wingman-ai/ray that referenced this pull request May 28, 2019
* [rllib] Remove dependency on TensorFlow (ray-project#4764)

* remove hard tf dep

* add test

* comment fix

* fix test

* Dynamic Custom Resources - create and delete resources (ray-project#3742)

* Update tutorial link in doc (ray-project#4777)

* [rllib] Implement learn_on_batch() in torch policy graph

* Fix `ray stop` by killing raylet before plasma (ray-project#4778)

* Fatal check if object store dies (ray-project#4763)

* [rllib] fix clip by value issue as TF upgraded (ray-project#4697)

*  fix clip_by_value issue

*  fix typo

* [autoscaler] Fix submit (ray-project#4782)

* Queue tasks in the raylet in between async callbacks (ray-project#4766)

* Add a SWAP TaskQueue so that we can keep track of tasks that are temporarily dequeued

* Fix bug where tasks that fail to be forwarded don't appear to be local by adding them to SWAP queue

* cleanups

* updates

* updates

* [Java][Bazel]  Refine auto-generated pom files (ray-project#4780)

* Bump version to 0.7.0 (ray-project#4791)

* [JAVA] setDefaultUncaughtExceptionHandler to log uncaught exception in user thread. (ray-project#4798)

* Add WorkerUncaughtExceptionHandler

* Fix

* revert bazel and pom

* [tune] Fix CLI test (ray-project#4801)

* Fix pom file generation (ray-project#4800)

* [rllib] Support continuous action distributions in IMPALA/APPO (ray-project#4771)

* [rllib] TensorFlow 2 compatibility (ray-project#4802)

* Change tagline in documentation and README. (ray-project#4807)

* Update README.rst, index.rst, tutorial.rst and  _config.yml

* [tune] Support non-arg submit (ray-project#4803)

* [autoscaler] rsync cluster (ray-project#4785)

* [tune] Remove extra parsing functionality (ray-project#4804)

* Fix Java worker log dir (ray-project#4781)

* [tune] Initial track integration (ray-project#4362)

Introduces a minimally invasive utility for logging experiment results. A broad requirement for this tool is that it should integrate seamlessly with Tune execution.

* [rllib] [RFC] Dynamic definition of loss functions and modularization support (ray-project#4795)

* dynamic graph

* wip

* clean up

* fix

* document trainer

* wip

* initialize the graph using a fake batch

* clean up dynamic init

* wip

* spelling

* use builder for ppo pol graph

* add ppo graph

* fix naming

* order

* docs

* set class name correctly

* add torch builder

* add custom model support in builder

* cleanup

* remove underscores

* fix py2 compat

* Update dynamic_tf_policy_graph.py

* Update tracking_dict.py

* wip

* rename

* debug level

* rename policy_graph -> policy in new classes

* fix test

* rename ppo tf policy

* port appo too

* forgot grads

* default policy optimizer

* make default config optional

* add config to optimizer

* use lr by default in optimizer

* update

* comments

* remove optimizer

* fix tuple actions support in dynamic tf graph

* [rllib] Rename PolicyGraph => Policy, move from evaluation/ to policy/ (ray-project#4819)

This implements some of the renames proposed in ray-project#4813
We leave behind backwards-compatibility aliases for *PolicyGraph and SampleBatch.

* [Java] Dynamic resource API in Java (ray-project#4824)

* Add default values for Wgym flags

* Fix import

* Fix issue when starting `raylet_monitor` (ray-project#4829)

* Refactor ID Serial 1: Separate ObjectID and TaskID from UniqueID (ray-project#4776)

* Enable BaseId.

* Change TaskID and make python test pass

* Remove unnecessary functions and fix test failure and change TaskID to
16 bytes.

* Java code change draft

* Refine

* Lint

* Update java/api/src/main/java/org/ray/api/id/TaskId.java

Co-Authored-By: Hao Chen <chenh1024@gmail.com>

* Update java/api/src/main/java/org/ray/api/id/BaseId.java

Co-Authored-By: Hao Chen <chenh1024@gmail.com>

* Update java/api/src/main/java/org/ray/api/id/BaseId.java

Co-Authored-By: Hao Chen <chenh1024@gmail.com>

* Update java/api/src/main/java/org/ray/api/id/ObjectId.java

Co-Authored-By: Hao Chen <chenh1024@gmail.com>

* Address comment

* Lint

* Fix SINGLE_PROCESS

* Fix comments

* Refine code

* Refine test

* Resolve conflict

* Fix bug in which actor classes are not exported multiple times. (ray-project#4838)

* Bump Ray master version to 0.8.0.dev0 (ray-project#4845)

* Add section to bump version of master branch and cleanup release docs (ray-project#4846)

* Fix import

* Export remote functions when first used and also fix bug in which rem… (ray-project#4844)

* Export remote functions when first used and also fix bug in which remote functions and actor classes are not exported from workers during subsequent ray sessions.

* Documentation update

* Fix tests.

* Fix grammar

* Update wheel versions in documentation to 0.8.0.dev0 and 0.7.0. (ray-project#4847)

* [tune] Later expansion of local_dir (ray-project#4806)

* [rllib] [RFC] Deprecate Python 2 / RLlib (ray-project#4832)

* Fix a typo in kubernetes yaml (ray-project#4872)

* Move global state API out of global_state object. (ray-project#4857)

* Install bazel in autoscaler development configs. (ray-project#4874)

* [tune] Fix up Ax Search and Examples (ray-project#4851)

* update Ax for cleaner API

* docs update

* [rllib] Update concepts docs and add "Building Policies in Torch/TensorFlow" section (ray-project#4821)

* wip

* fix index

* fix bugs

* todo

* add imports

* note on get ph

* note on get ph

* rename to building custom algs

* add rnn state info

* [rllib] Fix error getting kl when simple_optimizer: True in multi-agent PPO

* Replace ReturnIds with NumReturns in TaskInfo to reduce the size (ray-project#4854)

* Refine TaskInfo

* Fix

* Add a test to print task info size

* Lint

* Refine

* Update deps commits of opencensus to support building with bzl 0.25.x (ray-project#4862)

* Update deps to support bzl 2.5.x

* Fix
stefanpantic added a commit to wingman-ai/ray that referenced this pull request Jun 6, 2019
* Upgrade arrow to latest master (ray-project#4858)

* [tune] Auto-init Ray + default SearchAlg (ray-project#4815)

* Bump version from 0.8.0.dev0 to 0.7.1. (ray-project#4890)

* [rllib] Allow access to batches prior to postprocessing (ray-project#4871)

* [rllib] Fix Multidiscrete support (ray-project#4869)

* Refactor redis callback handling (ray-project#4841)

* Add CallbackReply

* Fix

* fix linting by format.sh

* Fix linting

* Address comments.

* Fix

* Initial high-level code structure of CoreWorker. (ray-project#4875)

* Drop duplicated string format (ray-project#4897)

This string format is unnecessary. java_worker_options has been appended to the commandline later.

* Refactor ID Serial 2: change all ID functions to `CamelCase` (ray-project#4896)

* Hotfix for change of from_random to FromRandom (ray-project#4909)

* [rllib] Fix documentation on custom policies (ray-project#4910)

* wip

* add docs

* lint

* todo sections

* fix doc

* [rllib] Allow Torch policies access to full action input dict in extra_action_out_fn (ray-project#4894)

* fix torch extra out

* preserve setitem

* fix docs

* [tune] Pretty print params json in logger.py (ray-project#4903)

* [sgd] Distributed Training via PyTorch (ray-project#4797)

Implements distributed SGD using distributed PyTorch.

* [rllib] Rough port of DQN to build_tf_policy() pattern (ray-project#4823)

* fetching objects in parallel in _get_arguments_for_execution (ray-project#4775)

* [tune] Disallow setting resources_per_trial when it is already configured (ray-project#4880)

* disallow it

* import fix

* fix example

* fix test

* fix tests

* Update mock.py

* fix

* make less convoluted

* fix tests

* [rllib] Rename PolicyEvaluator => RolloutWorker (ray-project#4820)

* Fix local cluster yaml (ray-project#4918)

* [tune] Directional metrics for components (ray-project#4120) (ray-project#4915)

* [Core Worker] implement ObjectInterface and add test framework (ray-project#4899)

* [tune] Make PBT Quantile fraction configurable (ray-project#4912)

* Better organize ray_common module (ray-project#4898)

* Fix error

* Fix compute actions return value
stefanpantic added a commit to wingman-ai/ray that referenced this pull request Jun 21, 2019
* [rllib] Remove dependency on TensorFlow (ray-project#4764)

* remove hard tf dep

* add test

* comment fix

* fix test

* Dynamic Custom Resources - create and delete resources (ray-project#3742)

* Update tutorial link in doc (ray-project#4777)

* [rllib] Implement learn_on_batch() in torch policy graph

* Fix `ray stop` by killing raylet before plasma (ray-project#4778)

* Fatal check if object store dies (ray-project#4763)

* [rllib] fix clip by value issue as TF upgraded (ray-project#4697)

*  fix clip_by_value issue

*  fix typo

* [autoscaler] Fix submit (ray-project#4782)

* Queue tasks in the raylet in between async callbacks (ray-project#4766)

* Add a SWAP TaskQueue so that we can keep track of tasks that are temporarily dequeued

* Fix bug where tasks that fail to be forwarded don't appear to be local by adding them to SWAP queue

* cleanups

* updates

* updates

* [Java][Bazel]  Refine auto-generated pom files (ray-project#4780)

* Bump version to 0.7.0 (ray-project#4791)

* [JAVA] setDefaultUncaughtExceptionHandler to log uncaught exception in user thread. (ray-project#4798)

* Add WorkerUncaughtExceptionHandler

* Fix

* revert bazel and pom

* [tune] Fix CLI test (ray-project#4801)

* Fix pom file generation (ray-project#4800)

* [rllib] Support continuous action distributions in IMPALA/APPO (ray-project#4771)

* [rllib] TensorFlow 2 compatibility (ray-project#4802)

* Change tagline in documentation and README. (ray-project#4807)

* Update README.rst, index.rst, tutorial.rst and  _config.yml

* [tune] Support non-arg submit (ray-project#4803)

* [autoscaler] rsync cluster (ray-project#4785)

* [tune] Remove extra parsing functionality (ray-project#4804)

* Fix Java worker log dir (ray-project#4781)

* [tune] Initial track integration (ray-project#4362)

Introduces a minimally invasive utility for logging experiment results. A broad requirement for this tool is that it should integrate seamlessly with Tune execution.

* [rllib] [RFC] Dynamic definition of loss functions and modularization support (ray-project#4795)

* dynamic graph

* wip

* clean up

* fix

* document trainer

* wip

* initialize the graph using a fake batch

* clean up dynamic init

* wip

* spelling

* use builder for ppo pol graph

* add ppo graph

* fix naming

* order

* docs

* set class name correctly

* add torch builder

* add custom model support in builder

* cleanup

* remove underscores

* fix py2 compat

* Update dynamic_tf_policy_graph.py

* Update tracking_dict.py

* wip

* rename

* debug level

* rename policy_graph -> policy in new classes

* fix test

* rename ppo tf policy

* port appo too

* forgot grads

* default policy optimizer

* make default config optional

* add config to optimizer

* use lr by default in optimizer

* update

* comments

* remove optimizer

* fix tuple actions support in dynamic tf graph

* [rllib] Rename PolicyGraph => Policy, move from evaluation/ to policy/ (ray-project#4819)

This implements some of the renames proposed in ray-project#4813.
We leave behind backwards-compatibility aliases for *PolicyGraph and SampleBatch.

* [Java] Dynamic resource API in Java (ray-project#4824)

* Add default values for Wgym flags

* Fix import

* Fix issue when starting `raylet_monitor` (ray-project#4829)

* Refactor ID Serial 1: Separate ObjectID and TaskID from UniqueID (ray-project#4776)

* Enable BaseId.

* Change TaskID and make python test pass

* Remove unnecessary functions and fix test failure and change TaskID to 16 bytes.

* Java code change draft

* Refine

* Lint

* Update java/api/src/main/java/org/ray/api/id/TaskId.java

Co-Authored-By: Hao Chen <chenh1024@gmail.com>

* Update java/api/src/main/java/org/ray/api/id/BaseId.java

Co-Authored-By: Hao Chen <chenh1024@gmail.com>

* Update java/api/src/main/java/org/ray/api/id/BaseId.java

Co-Authored-By: Hao Chen <chenh1024@gmail.com>

* Update java/api/src/main/java/org/ray/api/id/ObjectId.java

Co-Authored-By: Hao Chen <chenh1024@gmail.com>

* Address comment

* Lint

* Fix SINGLE_PROCESS

* Fix comments

* Refine code

* Refine test

* Resolve conflict

* Fix bug in which actor classes are not exported multiple times. (ray-project#4838)

* Bump Ray master version to 0.8.0.dev0 (ray-project#4845)

* Add section to bump version of master branch and cleanup release docs (ray-project#4846)

* Fix import

* Export remote functions when first used and also fix bug in which rem… (ray-project#4844)

* Export remote functions when first used and also fix bug in which remote functions and actor classes are not exported from workers during subsequent ray sessions.

* Documentation update

* Fix tests.

* Fix grammar

* Update wheel versions in documentation to 0.8.0.dev0 and 0.7.0. (ray-project#4847)

* [tune] Later expansion of local_dir (ray-project#4806)

* [rllib] [RFC] Deprecate Python 2 / RLlib (ray-project#4832)

* Fix a typo in kubernetes yaml (ray-project#4872)

* Move global state API out of global_state object. (ray-project#4857)

* Install bazel in autoscaler development configs. (ray-project#4874)

* [tune] Fix up Ax Search and Examples (ray-project#4851)

* update Ax for cleaner API

* docs update

* [rllib] Update concepts docs and add "Building Policies in Torch/TensorFlow" section (ray-project#4821)

* wip

* fix index

* fix bugs

* todo

* add imports

* note on get ph

* note on get ph

* rename to building custom algs

* add rnn state info

* [rllib] Fix error getting kl when simple_optimizer: True in multi-agent PPO

* Replace ReturnIds with NumReturns in TaskInfo to reduce the size (ray-project#4854)

* Refine TaskInfo

* Fix

* Add a test to print task info size

* Lint

* Refine

* Update deps commits of opencensus to support building with bzl 0.25.x (ray-project#4862)

* Update deps to support bzl 0.25.x

* Fix

* Upgrade arrow to latest master (ray-project#4858)

* [tune] Auto-init Ray + default SearchAlg (ray-project#4815)

* Bump version from 0.8.0.dev0 to 0.7.1. (ray-project#4890)

* [rllib] Allow access to batches prior to postprocessing (ray-project#4871)

* [rllib] Fix Multidiscrete support (ray-project#4869)

* Refactor redis callback handling (ray-project#4841)

* Add CallbackReply

* Fix

* fix linting by format.sh

* Fix linting

* Address comments.

* Fix

* Initial high-level code structure of CoreWorker. (ray-project#4875)

* Drop duplicated string format (ray-project#4897)

This string format is unnecessary. java_worker_options is appended to the command line later.

* Refactor ID Serial 2: change all ID functions to `CamelCase` (ray-project#4896)

* Hotfix for change of from_random to FromRandom (ray-project#4909)

* [rllib] Fix documentation on custom policies (ray-project#4910)

* wip

* add docs

* lint

* todo sections

* fix doc

* [rllib] Allow Torch policies access to full action input dict in extra_action_out_fn (ray-project#4894)

* fix torch extra out

* preserve setitem

* fix docs

* [tune] Pretty print params json in logger.py (ray-project#4903)

* [sgd] Distributed Training via PyTorch (ray-project#4797)

Implements distributed SGD using distributed PyTorch.

* [rllib] Rough port of DQN to build_tf_policy() pattern (ray-project#4823)

* fetching objects in parallel in _get_arguments_for_execution (ray-project#4775)

* [tune] Disallow setting resources_per_trial when it is already configured (ray-project#4880)

* disallow it

* import fix

* fix example

* fix test

* fix tests

* Update mock.py

* fix

* make less convoluted

* fix tests

* [rllib] Rename PolicyEvaluator => RolloutWorker (ray-project#4820)

* Fix local cluster yaml (ray-project#4918)

* [tune] Directional metrics for components (ray-project#4120) (ray-project#4915)

* [Core Worker] implement ObjectInterface and add test framework (ray-project#4899)

* [tune] Make PBT Quantile fraction configurable (ray-project#4912)

* Better organize ray_common module (ray-project#4898)

* Fix error

* [tune] Add requirements-dev.txt and update docs for contributing (ray-project#4925)

* Add requirements-dev.txt and update docs.

* Update doc/source/tune-contrib.rst

Co-Authored-By: Richard Liaw <rliaw@berkeley.edu>

* Unpin everything except for yapf.

* Fix compute actions return value

* Bump version from 0.7.1 to 0.8.0.dev1. (ray-project#4937)

* Update version number in documentation after release 0.7.0 -> 0.7.1 and 0.8.0.dev0 -> 0.8.0.dev1. (ray-project#4941)

* [doc] Update developer docs with bazel instructions (ray-project#4944)

* [C++] Add hash table to Redis-Module (ray-project#4911)

* Flush lineage cache on task submission instead of execution (ray-project#4942)

* [rllib] Add docs on how to use TF eager execution (ray-project#4927)

* [rllib] Port remainder of algorithms to build_trainer() pattern (ray-project#4920)

* Fix resource bookkeeping bug with acquiring unknown resource. (ray-project#4945)

* Update aws keys for uploading wheels to s3. (ray-project#4948)

* Upload wheels on Travis to branchname/commit_id. (ray-project#4949)

* [Java] Fix serializing issues of `RaySerializer` (ray-project#4887)

* Fix

* Address comment.

* fix (ray-project#4950)

* [Java] Add inner class `Builder` to build call options. (ray-project#4956)

* Add Builder class

* format

* Refactor by IDE

* Remove unnecessary dependency

* Make release stress tests work and improve them. (ray-project#4955)

* Use proper session directory for debug_string.txt (ray-project#4960)

* [core] Use int64_t instead of int to keep track of fractional resources (ray-project#4959)

* [core worker] add task submission & execution interface (ray-project#4922)

* [sgd] Add non-distributed PyTorch runner (ray-project#4933)

* Add non-distributed PyTorch runner

* use dist.is_available() instead of checking OS

* Nicer exception

* Fix bug in choosing port

* Refactor some code

* Address comments

* Address comments

* Flush all tasks from local lineage cache after a node failure (ray-project#4964)

* Remove typing from setup.py install_requirements. (ray-project#4971)

* [Java] Fix bug of `BaseID` in multi-threading case. (ray-project#4974)

* [rllib] Fix DDPG example (ray-project#4973)

* Upgrade CI clang-format to 6.0 (ray-project#4976)

* [Core worker] add store & task provider (ray-project#4966)

* Fix bugs in the a3c code template. (ray-project#4984)

* Inherit Function Docstrings and other metadata (ray-project#4985)

* Fix a crash when unknown worker registering to raylet (ray-project#4992)

* [gRPC] Use gRPC for inter-node-manager communication (ray-project#4968)

stefanpantic added a commit to wingman-ai/ray that referenced this pull request Jun 26, 2019
* [rllib] Remove dependency on TensorFlow (ray-project#4764)

* remove hard tf dep

* add test

* comment fix

* fix test

* Fix Java CI failure (ray-project#4995)

* fix handling of non-integral timeout values in signal.receive (ray-project#5002)

* temp fix for build (ray-project#5006)

* [tune] Tutorial UX Changes (ray-project#4990)

* add integration, iris, ASHA, recursive changes, set reuse_actors=True, and enable Analysis as a return object

* docstring

* fix up example

* fix

* cleanup tests

* experiment analysis

* Fix valgrind build by installing new version of valgrind (ray-project#5008)

* Fix no cpus test (ray-project#5009)

* Fix tensorflow-1.14 installation in jenkins (ray-project#5007)

* Add dynamic worker options for worker command. (ray-project#4970)

* Add fields for fbs

* WIP

* Fix compilation errors

* Add java part

* Fix

* Fix

* Fix

* Fix lint

* Refine API

* address comments and add test

* Fix

* Address comment.

* Address comments.

* Fix linting

* Refine

* Fix lint

* WIP: address comment.

* Fix java

* Fix py

* Refine

* Fix

* Fix

* Fix linting

* Fix lint

* Address comments

* WIP

* Fix

* Fix

* minor refine

* Fix lint

* Fix raylet test.

* Fix lint

* Update src/ray/raylet/worker_pool.h

Co-Authored-By: Hao Chen <chenh1024@gmail.com>

* Update java/runtime/src/main/java/org/ray/runtime/AbstractRayRuntime.java

Co-Authored-By: Hao Chen <chenh1024@gmail.com>

* Address comments.

* Address comments.

* Fix test.

* Update src/ray/raylet/worker_pool.h

Co-Authored-By: Hao Chen <chenh1024@gmail.com>

* Address comments.

* Address comments.

* Fix

* Fix lint

* Fix lint

* Fix

* Address comments.

* Fix linting

* [docs] docs for running Tensorboard without sudo (ray-project#5015)

* Instructions for running Tensorboard without sudo

When we run Tensorboard to visualize Ray results on multi-user clusters where we don't have sudo access, such as RISE clusters, a few commands must be run first so that Tensorboard can edit the tmp directory. This is a common use case, so it is worth covering in the documentation for Tune.

* Update tune-usage.rst

* [ci] Change Jenkins to py3 (ray-project#5022)

* conda3

* integration

* add nevergrad, remotedata

* pytest 0.3.1

* otherdockers

* setup

* tune

* [gRPC] Migrate gcs data structures to protobuf (ray-project#5024)

* [rllib] Add QMIX mixer parameters to optimizer param list (ray-project#5014)

* add mixer params

* Update qmix_policy.py

* [grpc] refactor rpc server to support multiple io services (ray-project#5023)

* [rllib] Give error if sample_async is used with pytorch for A3C (ray-project#5000)

* give error if sample_async is used with pytorch

* update

* Update a3c.py

* [tune] Update MNIST Example (ray-project#4991)

* Add entropy coeff schedule

* Revert "Merge with ray master"

This reverts commit 108bfa2, reversing changes made to 2e0eec9.

* Revert "Revert "Merge with ray master""

This reverts commit 92c0f88.

* Remove entropy decay stuff
8 participants