Merge pull request #48 from avik-pal/ap/relax
Formatting updates and relax parameter type
avik-pal authored Jun 11, 2022
2 parents 95d27d0 + 195041b commit 72a39e7
Showing 36 changed files with 673 additions and 545 deletions.
7 changes: 7 additions & 0 deletions .JuliaFormatter.toml
@@ -1,2 +1,9 @@
style = "sciml"
whitespace_in_kwargs = false
always_use_return = true
margin = 92
indent = 4
format_docstrings = true
join_lines_based_on_source = true
separate_kwargs_with_semicolon = true
always_for_in = true
18 changes: 11 additions & 7 deletions CHANGELOG.md
@@ -1,20 +1,24 @@
# v0.4

## v0.4.5

- Allow Arbitrary Parameter Types

## v0.4.4

* Updated to support julia v1.6 (test time dependency issues)
- Updated to support julia v1.6 (test time dependency issues)

## v0.4.3

* Extending Scale to allow for multiple dimension inputs (https://github.com/avik-pal/Lux.jl/pull/40)
- Extending Scale to allow for multiple dimension inputs (https://github.com/avik-pal/Lux.jl/pull/40)

## v0.4.2

* `SelectDim` is no longer type unstable -- Internal storage for the Layer has been changed
* `Dropout` & `VariationalDropout` return `NoOpLayer` if the probability of dropout is `0`
* Code Formatting -- SciMLStyle (https://github.com/avik-pal/Lux.jl/pull/31)
- `SelectDim` is no longer type unstable -- Internal storage for the Layer has been changed
- `Dropout` & `VariationalDropout` return `NoOpLayer` if the probability of dropout is `0`
- Code Formatting -- SciMLStyle (https://github.com/avik-pal/Lux.jl/pull/31)

## v0.4.1

* Fix math rendering in docs
* Add Setfield compat for v1.0
- Fix math rendering in docs
- Add Setfield compat for v1.0
2 changes: 1 addition & 1 deletion Project.toml
@@ -1,7 +1,7 @@
name = "Lux"
uuid = "b2108857-7c20-44ae-9111-449ecde12c47"
authors = ["Avik Pal <[email protected]> and contributors"]
version = "0.4.4"
version = "0.4.5"

[deps]
Adapt = "79e6a3ab-5dfb-504d-930d-738a2a938a0e"
17 changes: 4 additions & 13 deletions README.md
@@ -1,6 +1,6 @@
# Lux 🔥

[![Project Status: Active – The project has reached a stable, usable state and is being actively developed.](https://www.repostatus.org/badges/latest/active.svg)](https://www.repostatus.org/#active) [![Latest Docs](https://img.shields.io/badge/docs-latest-blue.svg)](http://lux.csail.mit.edu/dev/) [![Stable Docs](https://img.shields.io/badge/docs-stable-blue.svg)](http://lux.csail.mit.edu/stable/) [![CI](https://github.com/avik-pal/Lux.jl/actions/workflows/CI.yml/badge.svg)](https://github.com/avik-pal/Lux.jl/actions/workflows/CI.yml) [![codecov](https://codecov.io/gh/avik-pal/Lux.jl/branch/main/graph/badge.svg?token=IMqBM1e3hz)](https://codecov.io/gh/avik-pal/Lux.jl) [![ColPrac: Contributor's Guide on Collaborative Practices for Community Packages](https://img.shields.io/badge/ColPrac-Contributor's%20Guide-blueviolet)](https://github.com/SciML/ColPrac)
[![Project Status: Active – The project has reached a stable, usable state and is being actively developed.](https://www.repostatus.org/badges/latest/active.svg)](https://www.repostatus.org/#active) [![Latest Docs](https://img.shields.io/badge/docs-latest-blue.svg)](http://lux.csail.mit.edu/dev/) [![Stable Docs](https://img.shields.io/badge/docs-stable-blue.svg)](http://lux.csail.mit.edu/stable/) [![CI](https://github.com/avik-pal/Lux.jl/actions/workflows/CI.yml/badge.svg)](https://github.com/avik-pal/Lux.jl/actions/workflows/CI.yml) [![codecov](https://codecov.io/gh/avik-pal/Lux.jl/branch/main/graph/badge.svg?token=IMqBM1e3hz)](https://codecov.io/gh/avik-pal/Lux.jl) [![ColPrac: Contributor's Guide on Collaborative Practices for Community Packages](https://img.shields.io/badge/ColPrac-Contributor's%20Guide-blueviolet)](https://github.com/SciML/ColPrac) [![SciML Code Style](https://img.shields.io/static/v1?label=code%20style&message=SciML&color=9558b2&labelColor=389826)](https://github.com/SciML/SciMLStyle)


The 🔥 Deep Learning Framework
@@ -21,15 +21,8 @@ rng = Random.default_rng()
Random.seed!(rng, 0)

# Construct the layer
model = Chain(
BatchNorm(128),
Dense(128, 256, tanh),
BatchNorm(256),
Chain(
Dense(256, 1, tanh),
Dense(1, 10)
)
)
model = Chain(BatchNorm(128), Dense(128, 256, tanh), BatchNorm(256),
              Chain(Dense(256, 1, tanh), Dense(1, 10)))

# Parameter and State Variables
ps, st = Lux.setup(rng, model) .|> gpu
@@ -54,9 +47,7 @@ Look in the [examples](/examples/) directory for self-contained usage examples.

## Ecosystem

### Prebuilt Deep Learning Models

See [Boltz](lib/Boltz/) for pre-built deep learning models with pretrained weights for popular datasets.
Check out our [Ecosystem](http://lux.csail.mit.edu/dev/introduction/ecosystem/) page for more details.

## Getting Help

5 changes: 2 additions & 3 deletions docs/make.jl
@@ -72,9 +72,8 @@ makedocs(;
"Utilities" => "api/utilities.md",
],
"Design Docs" => [
"Documentation" => "design/documentation.md",
"Recurrent Neural Networks" => "design/recurrent.md",
"Add new functionality to Lux" => "design/core.md",
"Contribution Guide" => "design/contributing.md",
"Layer Implementation" => "design/layer_implementation.md",
],
])

40 changes: 40 additions & 0 deletions docs/src/design/contributing.md
@@ -0,0 +1,40 @@
# Contribution Guidelines

## Adding New Functionality/Layers

For style, we try to follow [SciMLStyle](https://github.com/SciML/SciMLStyle). The only reason we don't have a badge yet is that we haven't yet updated the package to follow all the guidelines. Here we document some additional guidelines we enforce:

### Mutability

See [SciMLStyle](https://github.com/SciML/SciMLStyle#out-of-place-and-immutability-is-preferred-when-sufficient-performant) for reference. This is strictly enforced, i.e. all layers/functions provided as part of the external API must be pure functions, even if they come with a performance penalty.
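
For illustration, a minimal sketch of the pattern (the `PureScale` layer and the free-standing `initialparameters` method here are hypothetical, not part of Lux):

```julia
using Random

# Hypothetical layer used only for illustration.
struct PureScale end

# Parameters are created once and returned; they are never mutated afterwards.
initialparameters(rng::AbstractRNG, ::PureScale) = (scale=rand(rng, Float32),)

# Out-of-place forward pass: broadcasting allocates a new output, and the
# (possibly updated) state is returned as a value instead of being mutated.
function (l::PureScale)(x, ps, st)
    y = ps.scale .* x
    return y, st
end

rng = Random.default_rng()
ps = initialparameters(rng, PureScale())
y, st = PureScale()(ones(Float32, 3), ps, (;))
```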

### Branching -- Generated Functions

Zygote doesn't like branches in code. Like it or not, we are stuck with it for the near future. Even if julia is able to optimize branches away, Zygote will most certainly throw away those optimizations (these can be tested via `Zygote.@code_ir`).

#### Writing efficient non-branching code to make Zygote happy

* Rely on `@generated` functions to remove **most** runtime branching. Some examples:
    * Layers behaving differently during training and inference -- we know at compile time whether a layer is being run in training/inference mode via `istraining(st)`.
    * Composite layers relying on a variable number of internal layers -- again, we know the number of internal layers at compile time, hence we can manually unroll the loops. See [`Parallel`](@ref), [`Chain`](@ref), etc.
* Pass around `Val` in state. `Flux.jl` sets `training` to be `(:auto, true, false)`. Hence, which branch will be evaluated has to be determined at runtime (*bad*). Instead, if we pass `Val(true)`, we can specialize functions directly on `true`, `false`, etc., ensuring there is no runtime cost for these operations. See [`BatchNorm`](@ref), [`Dropout`](@ref), etc. A minimal sketch is given below.
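
For instance, a standalone sketch (not the Lux implementation) of how dispatching on `Val` removes the runtime branch:

```julia
using Random

# Training branch: apply a dropout mask. Dispatch on Val{true} selects this
# method at compile time, so no `if training` branch survives in the final code.
dropout_branch(rng, x, p, ::Val{true}) = x .* (rand(rng, size(x)...) .> p) ./ (1 - p)

# Inference branch: identity, no mask and no branch.
dropout_branch(rng, x, p, ::Val{false}) = x

rng = Random.default_rng()
x = rand(rng, Float32, 4, 2)

y_train = dropout_branch(rng, x, 0.5, Val(true))   # specializes on Val{true}
y_infer = dropout_branch(rng, x, 0.5, Val(false))  # specializes on Val{false}
```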


## Guide to Documentation for Lux.jl

### Documentation for Layers

The first line must be indented by 4 spaces and should contain the possible ways to construct the layer. This should be followed by a description of what the layer does. If mathematical equations are needed to explain what the layer does, go for it. Oftentimes we fuse parameters to make computation faster; this should be reflected in the equations being used, i.e. the equations and the internal code must be consistent. (See [`LSTMCell`](@ref), [`GRUCell`](@ref) for some examples.)

!!! note
There is no need to document how the layers are being called since they **must** adhere to `layer(x, ps, st)`. Any deviation from that and the PR will not be accepted.

Next, we will have certain subsections (though not all of them are necessary for every layer). A skeleton docstring following this structure is sketched after the list below.

* **Arguments**: This section should be present unless the layer is constructed without any arguments (See [`NoOpLayer`](@ref)). All the arguments and their explicit constraints must be explained.
    * It is recommended to separate out the Keyword Arguments into their own section.
* **Inputs**: This section should always be present. List out the requirements `x` needs to satisfy. (Don't write about `ps` and `st` since those are expected by default.)
* **Returns**: What does the layer return? We know the second element is a state, but note whether that state is updated in any form.
* **Parameters**: What are the properties of the NamedTuple returned from `initialparameters`? Omit if the layer is parameterless.
* **States**: What are the properties of the NamedTuple returned from `initialstates`? Omit if the layer is stateless.
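
As an illustration, a skeleton docstring for a hypothetical `MyScale` layer (not an actual Lux layer) following this structure might look like:

```julia
"""
    MyScale(dims)

Scales the input elementwise by a learnable vector of length `dims`.

## Arguments

  - `dims`: length of the scaling vector

## Inputs

  - `x`: array whose first dimension has size `dims`

## Returns

  - Scaled array of the same size as `x`
  - Updated state `st` (unchanged for this layer)

## Parameters

  - `scale`: vector of length `dims`

## States

  - This layer is stateless; `NamedTuple()` is returned
"""
struct MyScale
    dims::Int
end
```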

19 changes: 0 additions & 19 deletions docs/src/design/core.md

This file was deleted.

17 changes: 0 additions & 17 deletions docs/src/design/documentation.md

This file was deleted.

@@ -1,42 +1,44 @@
# Recurrent Neural Networks
# Layer Implementation

## Cell Implementations
## Recurrent Neural Networks

### Explicit Management on End-User Side
### Cell Implementations

#### Explicit Management on End-User Side

!!! note
We currently use this implementation

The user is responsible for managing the memory and hidden states.

#### Pros
##### Pros

1. Simple Design and Implementation
2. Hard for the user to mess up, i.e. there is no explicit requirement to call things like `Flux.reset!`:
    * In the first call, the user passes the `input`.
    * In the subsequent calls, the user passes a tuple containing the `input`, `hidden_state` and `memory` (if needed). (A sketch of this calling convention is given at the end of this subsection.)

#### Cons
##### Cons

1. Requires more explicit management from the user, which might make it harder to use.
2. Currently the call order convention is not enforced, which could lead to sneaky errors. (Implementing a check is quite trivial if we store a call counter in the model `state`.)
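
To make the calling convention concrete, here is a minimal sketch (a hypothetical `MyRNNCell`, not the Lux implementation) of the two call forms:

```julia
using Random

# Hypothetical cell used only for illustration.
struct MyRNNCell
    in_dims::Int
    out_dims::Int
end

# First call: the user passes only the input, so the cell creates the hidden state.
function (cell::MyRNNCell)(x::AbstractMatrix, ps, st)
    h = zeros(Float32, cell.out_dims, size(x, 2))
    return cell((x, h), ps, st)
end

# Subsequent calls: the user passes a tuple of (input, hidden_state).
function (cell::MyRNNCell)((x, h)::Tuple, ps, st)
    h_new = tanh.(ps.Wx * x .+ ps.Wh * h)
    return (h_new, h_new), st   # (output, carry), state
end

rng = Random.default_rng()
cell = MyRNNCell(4, 8)
ps = (Wx=randn(rng, Float32, 8, 4), Wh=randn(rng, Float32, 8, 8))
st = (;)

x = randn(rng, Float32, 4, 16)
(y, carry), st = cell(x, ps, st)            # first timestep
(y, carry), st = cell((x, carry), ps, st)   # later timesteps
```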


### Store Hidden State and Memory in Model State
#### Store Hidden State and Memory in Model State

Storing the memory and hidden state in `st` would allow the user to just pass `x` without varying how calls are made at different time steps.

#### Pros
##### Pros

1. Easier for the end-user

#### Cons
##### Cons

1. `reset`ting the hidden state and memory is slightly tricky.
    1. One way would be to store an `initial_hidden_state` and `initial_memory` in the state alongside the `hidden_state` and `memory`.


## RNN Blocks
### RNN Blocks

!!! note
This is currently unimplemented
9 changes: 5 additions & 4 deletions docs/src/examples.md
@@ -1,10 +1,11 @@
!!! warning
These were not written as tutorials but as standalone scripts/packages for people to use

## Packages

* [Deep Equilibrium Models](https://github.com/SciML/FastDEQ.jl)

## Scripts

* [ImageNet Classification using Metalhead.jl Models](https://github.com/avik-pal/Lux.jl/tree/main/examples/ImageNet)


## Packages

See [Ecosystem](introduction/ecosystem.md) for more details
47 changes: 32 additions & 15 deletions docs/src/index.md
@@ -1,12 +1,16 @@
# Lux
# Introduction

Welcome to the documentation of Lux!

# What is Lux?

`Lux` is a julia deep learning framework which decouples models and parameterization using deeply nested named tuples.

- Functional Layer API -- Pure Functions and Deterministic Function Calls.
- No more implicit parameterization -- `Zygote.Params`. Everything is a `NamedTuple`.
- Functional Design -- Pure Functions and Deterministic Function Calls.
- No more implicit parameterization.
- Compiler and AD-friendly Neural Networks

# Installation
# Installation Guide

Install [julia v1.6 or above](https://julialang.org/downloads/).

@@ -15,7 +19,16 @@ using Pkg
Pkg.add("Lux")
```

# Quick Example
# Resources to Get Started

* Go through the [Quickstart Example](#quickstart).
* Read the introductory tutorials on [julia](https://jump.dev/JuMP.jl/stable/tutorials/getting_started/getting_started_with_julia/#Getting-started-with-Julia) and [Lux](introduction/overview.md)
* Go through the examples in the documentation, sorted by complexity

!!! tip
For usage-related questions, please use [Github Discussions](https://github.com/avik-pal/Lux.jl/discussions) or the [JuliaLang Discourse (machine learning domain)](https://discourse.julialang.org/c/domain/ml/), which allow questions and answers to be indexed. To report bugs, use [github issues](https://github.com/avik-pal/Lux.jl/issues) or, even better, send in a [pull request](https://github.com/avik-pal/Lux.jl/pulls).

# Quickstart

```julia
using Lux, Random, Optimisers, Zygote
@@ -33,15 +46,8 @@ Build the model

```julia
# Construct the layer
model = Chain(
BatchNorm(128),
Dense(128, 256, tanh),
BatchNorm(256),
Chain(
Dense(256, 1, tanh),
Dense(1, 10)
)
)
model = Chain(BatchNorm(128), Dense(128, 256, tanh), BatchNorm(256),
Chain(Dense(256, 1, tanh), Dense(1, 10)))
```

Models don't hold parameters and states so initialize them. From there on, we just use our standard AD and Optimisers API.
@@ -57,13 +63,24 @@ x = rand(rng, Float32, 128, 2) |> gpu
y, st = Lux.apply(model, x, ps, st)

# Gradients
gs = gradient(p -> sum(Lux.apply(model, x, p, st)[1]), ps)[1]
## Pullback API to capture change in state
(l, st_), pb = pullback(p -> Lux.apply(model, x, p, st), ps)
gs = pb((one.(l), nothing))

# Optimization
st_opt = Optimisers.setup(Optimisers.ADAM(0.0001), ps)
st_opt, ps = Optimisers.update(st_opt, ps, gs)
```

# How the documentation is structured

Having a high-level overview of how this documentation is structured will help you know where to look for certain things.

* `Introduction` -- Talks about why we wrote Lux and has pointers to frameworks in the extended julia ecosystem that might help users get started with deep learning.
* `Examples` -- Contain tutorials of varying complexity. These contain worked examples of solving problems with Lux. Start here if you are new to Lux, or you have a particular problem class you want to model.
* `API` -- Contains a complete list of the functions you can use in Lux. Look here if you want to know how to use a particular function.
* `Design Docs` -- Contains information for people contributing to Lux development or writing Lux extensions. Don't worry about this section if you are using Lux to formulate and solve problems as a user.

# Citation

If you found this library to be useful in academic work, then please cite: