Skip to content

Commit

Permalink
Add info about abuse of type system [ci skip]
Browse files Browse the repository at this point in the history
Also adds numerous tweaks in response to review comments.
  • Loading branch information
timholy committed Apr 20, 2016
1 parent b9935d5 commit f38993c
Show file tree
Hide file tree
Showing 2 changed files with 107 additions and 25 deletions.
83 changes: 72 additions & 11 deletions doc/manual/performance-tips.rst
Original file line number Diff line number Diff line change
Expand Up @@ -589,8 +589,8 @@ loop. There are several possible fixes:

.. _man-performance-kernel-functions:

Separate kernel functions
-------------------------
Separate kernel functions (aka, function barriers)
--------------------------------------------------

Many functions follow a pattern of performing some set-up work, and then
running many iterations to perform a core computation. Where possible,
Expand Down Expand Up @@ -641,8 +641,8 @@ contain either integers, floats, strings, or something else.

.. _man-performance-val:

Using the ``Val{T}`` trick
--------------------------
Types with values-as-parameters
-------------------------------

Let's say you want to create an ``N``-dimensional array that
has size 3 along each axis. Such arrays can be created like this:
Expand All @@ -652,13 +652,14 @@ has size 3 along each axis. Such arrays can be created like this:
A = fill(5.0, (3, 3))

This approach works very well: the compiler can figure out that ``A``
is an ``Array{Float64,2}`` because it know the type of the fill value
is an ``Array{Float64,2}`` because it knows the type of the fill value
(``5.0::Float64``) and the dimensionality (``(3, 3)::NTuple{2,Int}``).
This implies that the compiler can generate very efficient code for
any future usage of ``A`` in the same function.

But now let's say you want to support this for arbitrary dimensions;
you might be tempted to write a function
But now let's say you want to write a function that creates a
3×3×... array in arbitrary dimensions; you might be tempted to write a
function

.. doctest::

Expand All @@ -678,7 +679,8 @@ Now, one very good way to solve such problems is by using the
:ref:`function-barrier technique
<man-performance-kernel-functions>`. However, in some cases you might
want to eliminate the type-instability altogether. In such cases, one
approach is to make use of ``Val{T}`` (see :ref:`man-val-trick`):
approach is to pass the dimensionality as a parameter, for example
through ``Val{T}`` (see :ref:`man-val-trick`):

.. doctest::

Expand All @@ -692,8 +694,9 @@ type-parameter, you make its "value" known to the compiler.
Consequently, this version of ``array3`` allows the compiler to
predict the return type.

However, making use of ``Val`` can be surprisingly subtle: it would be
of no help if you called ``array3`` from a function like this:
However, making use of such techniques can be surprisingly subtle. For
example, it would be of no help if you called ``array3`` from a
function like this:

.. doctest::

Expand All @@ -718,11 +721,69 @@ An example of correct usage of ``Val`` would be:
filter(A, kernel)
end

In this example, ``N`` is specified as a parameter, so its "value" is
In this example, ``N`` is passed as a parameter, so its "value" is
known to the compiler. Essentially, ``Val{T}`` works only when ``T``
is either hard-coded (``Val{3}``) or already specified in the
type-domain.

The dangers of abusing multiple dispatch (aka, more on types with values-as-parameters)
---------------------------------------------------------------------------------------

Once one learns to appreciate multiple dispatch, there's an
understandable tendency to go crazy and try to use it for everything.
For example, you might imagine using it to store information, e.g.
::
immutable Car{Make,Model}
year::Int
...more fields...
end

and then dispatch on objects like ``Car{:Honda,:Accord}(year, args...)``.

This might be worthwhile when the following are true:

- You require CPU-intensive processing on each ``Car``, and it becomes
vastly more efficient if you know the ``Make`` and ``Model`` at
compile time.

- You have huge, homogenous lists of the same type of ``Car`` to
process, so that you can store them all in an
``Array{Car{:Honda,:Accord},N}``.

When the latter holds, a function processing such a homogenous array
can be productively specialized: julia knows the type of each element
in advance (all objects in the container have the same concrete type),
so julia can "look up" the correct method calls when the function is
being compiled (obviating the need to check at run-time) and thereby
emit efficient code for processing the whole list.

When these do not hold, then it's likely that you'll get no benefit;
worse, the resulting "combinatorial explosion of types" will be
counterproductive. If ``items[i+1]`` has a different type than
``item[i]``, julia has to look up the type at run-time, search for the
appropriate method in method tables, decide (via type intersection)
which one matches, determine whether it has been JIT-compiled yet (and
do so if not), and then make the call. In essence, you're asking the
full type- system and JIT-compilation machinery to basically execute
the equivalent of a switch statement or dictionary lookup in your own
code.

Some run-time benchmarks comparing (1) type dispatch, (2) dictionary
lookup, and (3) a "switch" statement can be found `on the mailing list
<https://groups.google.com/d/msg/julia-users/jUMu9A3QKQQ/qjgVWr7vAwAJ>`_.

Perhaps even worse than the run-time impact is the compile-time
impact: julia will compile specialized functions for each different
``Car{Make, Model}``; if you have hundreds or thousands of such types,
then every function that accepts such an object as a parameter (from a
custom ``get_year`` function you might write yourself, to the generic
``push!`` function in the standard library) will have hundreds or
thousands of variants compiled for it. Each of these increases the
size of the cache of compiled code, the length of internal lists of
methods, etc. Excess enthusiasm for values-as-parameters can easily
waste enormous resources.


Access arrays in memory order, along columns
--------------------------------------------

Expand Down
49 changes: 35 additions & 14 deletions doc/manual/types.rst
Original file line number Diff line number Diff line change
Expand Up @@ -1242,31 +1242,52 @@ If you apply :func:`supertype` to other type objects (or non-type objects), a
"Value types"
-------------

As one application of these ideas, Julia includes a parametric type,
``Val{T}``, designated for dispatching on bits-type *values*.
Normally, you can't dispatch on a value such as ``true`` or ``false``,
but the ``Val`` type makes this possible:
In Julia, you can't dispatch on a *value* such as ``true`` or
``false``. However, you can dispatch on parametric types, and Julia
allows you to include "plain bits" values (Types, Symbols,
Integers, floating-point numbers, tuples, etc.) as type parameters. A
common example is the dimensionality parameter in ``Array{T,N}``,
where ``T`` is a type (e.g., ``Float64``) but ``N`` is just an
``Int``.

You can create your own custom types that take values as parameters,
and use them to control dispatch of custom types. By way of
illustration of this idea, let's introduce a parametric type,
``Val{T}``, which serves as a customary way to exploit this technique
for cases where you don't need a more elaborate hierarchy.

``Val`` is defined as::

immutable Val{T}
end

There is no more to the implementation of ``Val`` than this. Some
functions in Julia's standard library accept ``Val`` types as
arguments, and you can also use it to write your own functions. For
example:

.. doctest::

firstlast(::Type{Val{true}}) = "First"
firstlast(::Type{Val{false}}) = "Last"

println(firstlast(Val{true}))
julia> println(firstlast(Val{true}))
First

Any legal type parameter (Types, Symbols, Integers, floating-point
numbers, tuples, etc.) can be passed via ``Val``.
julia> println(firstlast(Val{false}))
Last

For consistency across Julia, the call site should always pass a
``Val`` type rather than creating an instance, i.e., use
``Val`` *type* rather than creating an *instance*, i.e., use
``foo(Val{:bar})`` rather than ``foo(Val{:bar}())``.

It's worth noting that it's extremely easy to mis-use the ``Val``
trick, and you can easily end up making the performance of your code
much *worse*. For example, you would never want to write actual code
as illustrated above. For more information about the proper (and
improper) uses of ``Val``, please read the more extensive discussion
in :ref:`the performance tips <man-performance-val>`.
It's worth noting that it's extremely easy to mis-use parametric
"value" types, including ``Val``; in unfavorable cases, you can easily
end up making the performance of your code much *worse*. In
particular, you would never want to write actual code as illustrated
above. For more information about the proper (and improper) uses of
``Val``, please read the more extensive discussion in :ref:`the
performance tips <man-performance-val>`.

.. _man-nullable-types:

Expand Down

0 comments on commit f38993c

Please sign in to comment.