[docs] rewrite #5175

Merged: 40 commits, Aug 6, 2019
281aafc
rewrite
richardliaw Jul 11, 2019
590b0d1
Merge branch 'master' into docsrevamp
richardliaw Jul 17, 2019
3847cd9
rst
richardliaw Jul 17, 2019
1b90922
moremodifications
richardliaw Jul 17, 2019
0f081fa
revert
richardliaw Jul 17, 2019
65ff355
revamptune
richardliaw Jul 17, 2019
009ed62
revert
richardliaw Jul 17, 2019
5f17fb9
update
richardliaw Jul 17, 2019
4e69c89
compile
richardliaw Jul 18, 2019
66b0eee
first framework
richardliaw Jul 18, 2019
076fa42
Merge branch 'master' into docsrevamp
richardliaw Jul 23, 2019
5e347d4
resources
richardliaw Jul 24, 2019
6dabb43
sym
richardliaw Jul 24, 2019
257d172
ln
richardliaw Jul 24, 2019
8643e6a
fixwalk
richardliaw Jul 26, 2019
8ef82ae
removal
richardliaw Jul 26, 2019
f37f911
configure
richardliaw Jul 26, 2019
d0a67d8
todos
richardliaw Jul 26, 2019
d8b5f20
todos
richardliaw Jul 26, 2019
784b266
Revampmore
richardliaw Jul 27, 2019
860047b
Get rid of things
richardliaw Jul 27, 2019
d4bf0c5
rm
richardliaw Jul 27, 2019
cdf4e5b
package
richardliaw Jul 27, 2019
31b5e4f
profile
richardliaw Jul 27, 2019
7709c4e
movem
richardliaw Jul 27, 2019
76acf5c
move serialization
richardliaw Jul 27, 2019
1db213b
index
richardliaw Jul 27, 2019
d06d4aa
homepage
richardliaw Jul 28, 2019
e03ecfb
Updates for api
richardliaw Jul 28, 2019
b016a59
changes
ericl Jul 30, 2019
1331fa9
Merge remote-tracking branch 'upstream/master' into docsrevamp
ericl Jul 30, 2019
fc5d396
Some updates to installation and walkthrough.
robertnishihara Aug 1, 2019
4964268
Update actor documentation.
robertnishihara Aug 1, 2019
e202735
Improve GPU documentation.
robertnishihara Aug 1, 2019
95d3faf
Small improvements
robertnishihara Aug 1, 2019
9f521e9
Update doc/source/index.rst
richardliaw Aug 5, 2019
6f8a07c
Update doc/source/index.rst
richardliaw Aug 5, 2019
e0620a7
spacing
ericl Aug 5, 2019
ea8909a
Merge branch 'docsrevamp' of github.com:richardliaw/ray into docsrevamp
ericl Aug 5, 2019
5a67db3
Merge remote-tracking branch 'upstream/master' into docsrevamp
ericl Aug 6, 2019
282 changes: 56 additions & 226 deletions doc/source/actors.rst
@@ -1,28 +1,13 @@
Actors
======
How-to: Using Actors
====================

Remote functions in Ray should be thought of as functional and side-effect free.
Restricting ourselves only to remote functions gives us distributed functional
programming, which is great for many use cases, but in practice is a bit
limited.

Ray extends the dataflow model with **actors**. An actor is essentially a
stateful worker (or a service). When a new actor is instantiated, a new worker
is created, and methods of the actor are scheduled on that specific worker and
An actor is essentially a stateful worker (or a service). When a new actor is instantiated, a new worker is created, and methods of the actor are scheduled on that specific worker and
can access and mutate the state of that worker.

Suppose we've already started Ray.

.. code-block:: python

import ray
ray.init()

Defining and creating an actor
------------------------------
Creating an actor
-----------------

Consider the following simple example. The ``ray.remote`` decorator indicates
that instances of the ``Counter`` class will be actors.
You can convert a standard Python class into a Ray actor class as follows:

.. code-block:: python

@@ -35,226 +20,96 @@ that instances of the ``Counter`` class will be actors.
self.value += 1
return self.value

To actually create an actor, we can instantiate this class by calling
``Counter.remote()``.
Note that the above is equivalent to the following:

.. code-block:: python

a1 = Counter.remote()
a2 = Counter.remote()
class Counter(object):
def __init__(self):
self.value = 0

def increment(self):
self.value += 1
return self.value

RemoteCounter = ray.remote(Counter)

When an actor is instantiated, the following events happen.

1. A node in the cluster is chosen and a worker process is created on that node
(by the raylet on that node) for the purpose of running methods
called on the actor.
for the purpose of running methods called on the actor.
2. A ``Counter`` object is created on that worker and the ``Counter``
constructor is run.

Using an actor
--------------

We can schedule tasks on the actor by calling its methods.
Actor methods can return multiple object IDs when annotated with the ``ray.method`` decorator:

.. code-block:: python

a1.increment.remote() # ray.get returns 1
a2.increment.remote() # ray.get returns 1
@ray.remote
class Foo(object):

When ``a1.increment.remote()`` is called, the following events happens.
@ray.method(num_return_vals=2)
def bar(self):
return 1, 2

1. A task is created.
2. The task is assigned directly to the raylet responsible for the
actor by the driver's raylet.
3. An object ID is returned.
f = Foo.remote()

We can then call ``ray.get`` on the object ID to retrieve the actual value.

Similarly, the call to ``a2.increment.remote()`` generates a task that is
scheduled on the second ``Counter`` actor. Since these two tasks run on
different actors, they can be executed in parallel (note that only actor
methods will be scheduled on actor workers, regular remote functions will not
be).

On the other hand, methods called on the same ``Counter`` actor are executed
serially in the order that they are called. They can thus share state with
one another, as shown below.

.. code-block:: python
obj_id1, obj_id2 = f.bar.remote()
assert ray.get(obj_id1) == 1
assert ray.get(obj_id2) == 2

# Create ten Counter actors.
counters = [Counter.remote() for _ in range(10)]
Resources with Actors
---------------------

# Increment each Counter once and get the results. These tasks all happen in
# parallel.
results = ray.get([c.increment.remote() for c in counters])
print(results) # prints [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
You can specify that an actor requires CPUs or GPUs in the decorator. While Ray has built-in support for CPUs and GPUs, Ray can also handle custom resources.

# Increment the first Counter five times. These tasks are executed serially
# and share state.
results = ray.get([counters[0].increment.remote() for _ in range(5)])
print(results) # prints [2, 3, 4, 5, 6]

A More Interesting Actor Example
--------------------------------

A common pattern is to use actors to encapsulate the mutable state managed by an
external library or service.

`Gym`_ provides an interface to a number of simulated environments for testing
and training reinforcement learning agents. These simulators are stateful, and
tasks that use these simulators must mutate their state. We can use actors to
encapsulate the state of these simulators.

.. _`Gym`: https://gym.openai.com/
When using GPUs, Ray automatically sets the environment variable ``CUDA_VISIBLE_DEVICES`` for the actor when it is instantiated. The actor has access to the list of IDs of the GPUs
that it is allowed to use via ``ray.get_gpu_ids()``. This is a list of integers,
like ``[]``, ``[1]``, or ``[2, 5, 6]``.

.. code-block:: python

import gym

@ray.remote
class GymEnvironment(object):
def __init__(self, name):
self.env = gym.make(name)
self.env.reset()

def step(self, action):
return self.env.step(action)

def reset(self):
self.env.reset()
@ray.remote(num_cpus=2, num_gpus=1)
class GPUActor(object):
pass

We can then instantiate an actor and schedule a task on that actor as follows.
Note that this is equivalent to the following:

.. code-block:: python

pong = GymEnvironment.remote("Pong-v0")
pong.step.remote(0) # Take action 0 in the simulator.
class GPUActor(object):
pass

Using GPUs on actors
--------------------
GPUActor = ray.remote(num_cpus=2, num_gpus=1)(GPUActor)

A common use case is for an actor to contain a neural network. For example,
suppose we have imported Tensorflow and have created a method for constructing
a neural net.
When a ``GPUActor`` instance is created, it is placed on a node with at least one GPU, and that GPU is reserved for the actor for the duration of its lifetime (even while the actor is not executing tasks). The GPU resources are released when the actor terminates.

.. code-block:: python

import tensorflow as tf
If you want to use custom resources, make sure your cluster is configured to have these resources (see `configuration instructions <configure.html#cluster-resources>`__):

def construct_network():
x = tf.placeholder(tf.float32, [None, 784])
y_ = tf.placeholder(tf.float32, [None, 10])
.. code-block:: python

W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x, W) + b)
@ray.remote(resources={'Resource2': 1})
class GPUActor(object):
pass

cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

return x, y_, train_step, accuracy
Terminating Actors
------------------

We can then define an actor for this network as follows.
For any actor, you can call ``__ray_terminate__.remote()`` to terminate it.
This kills the actor process and releases the resources assigned to the actor:

.. code-block:: python

import os

# Define an actor that runs on GPUs. If there are no GPUs, then simply use
# ray.remote without any arguments and no parentheses.
@ray.remote(num_gpus=1)
class NeuralNetOnGPU(object):
def __init__(self):
# Set an environment variable to tell TensorFlow which GPUs to use. Note
# that this must be done before the call to tf.Session.
os.environ["CUDA_VISIBLE_DEVICES"] = ",".join([str(i) for i in ray.get_gpu_ids()])
with tf.Graph().as_default():
with tf.device("/gpu:0"):
self.x, self.y_, self.train_step, self.accuracy = construct_network()
# Allow this to run on CPUs if there aren't any GPUs.
config = tf.ConfigProto(allow_soft_placement=True)
self.sess = tf.Session(config=config)
# Initialize the network.
init = tf.global_variables_initializer()
self.sess.run(init)

To indicate that an actor requires one GPU, we pass in ``num_gpus=1`` to
``ray.remote``. Note that in order for this to work, Ray must have been started
with some GPUs, e.g., via ``ray.init(num_gpus=2)``. Otherwise, when you try to
instantiate the GPU version with ``NeuralNetOnGPU.remote()``, an exception will
be thrown saying that there aren't enough GPUs in the system.

When the actor is created, it will have access to a list of the IDs of the GPUs
that it is allowed to use via ``ray.get_gpu_ids()``. This is a list of integers,
like ``[]``, or ``[1]``, or ``[2, 5, 6]``. Since we passed in
``ray.remote(num_gpus=1)``, this list will have length one.
@ray.remote
class Foo(object):
pass

We can put this all together as follows.
f = Foo.remote()
f.__ray_terminate__.remote()

.. code-block:: python
This is important since actors are not garbage collected.

import os
import ray
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

ray.init(num_gpus=8)

def construct_network():
x = tf.placeholder(tf.float32, [None, 784])
y_ = tf.placeholder(tf.float32, [None, 10])

W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x, W) + b)

cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

return x, y_, train_step, accuracy

@ray.remote(num_gpus=1)
class NeuralNetOnGPU(object):
def __init__(self, mnist_data):
self.mnist = mnist_data
# Set an environment variable to tell TensorFlow which GPUs to use. Note
# that this must be done before the call to tf.Session.
os.environ["CUDA_VISIBLE_DEVICES"] = ",".join([str(i) for i in ray.get_gpu_ids()])
with tf.Graph().as_default():
with tf.device("/gpu:0"):
self.x, self.y_, self.train_step, self.accuracy = construct_network()
# Allow this to run on CPUs if there aren't any GPUs.
config = tf.ConfigProto(allow_soft_placement=True)
self.sess = tf.Session(config=config)
# Initialize the network.
init = tf.global_variables_initializer()
self.sess.run(init)

def train(self, num_steps):
for _ in range(num_steps):
batch_xs, batch_ys = self.mnist.train.next_batch(100)
self.sess.run(self.train_step, feed_dict={self.x: batch_xs, self.y_: batch_ys})

def get_accuracy(self):
return self.sess.run(self.accuracy, feed_dict={self.x: self.mnist.test.images,
self.y_: self.mnist.test.labels})


# Load the MNIST dataset and tell Ray how to serialize the custom classes.
mnist = input_data.read_data_sets("MNIST_data", one_hot=True)

# Create the actor.
nn = NeuralNetOnGPU.remote(mnist)

# Run a few steps of training and print the accuracy.
nn.train.remote(100)
accuracy = ray.get(nn.get_accuracy.remote())
print("Accuracy is {}.".format(accuracy))

Passing Around Actor Handles (Experimental)
-------------------------------------------
@@ -299,29 +154,4 @@ If we instantiate an actor, we can pass the handle around to various tasks.
for _ in range(10):
print(ray.get(counter.get_counter.remote()))

Current Actor Limitations
-------------------------

We are working to address the following issues.

1. **Actor lifetime management:** Currently, when the original actor handle for
an actor goes out of scope, a task is scheduled on that actor that kills the
actor process (this new task will run once all previous tasks have finished
running). This could be an issue if the original actor handle goes out of
scope, but the actor is still being used by tasks that have been passed the
actor handle.
2. **Returning actor handles:** Actor handles currently cannot be returned from
a remote function or actor method. Similarly, ``ray.put`` cannot be called on
an actor handle.
3. **Reconstruction of evicted actor objects:** If ``ray.get`` is called on an
evicted object that was created by an actor method, Ray currently will not
reconstruct the object. For more information, see the documentation on
`fault tolerance`_.
4. **Deterministic reconstruction of lost actors:** If an actor is lost due to
node failure, the actor is reconstructed on a new node, following the order
of initial execution. However, new tasks that are scheduled onto the actor
in the meantime may execute in between re-executed tasks. This could be an
issue if your application has strict requirements for state consistency.

.. _`asynchronous parameter server example`: http://ray.readthedocs.io/en/latest/example-parameter-server.html
.. _`fault tolerance`: http://ray.readthedocs.io/en/latest/fault-tolerance.html