String enum for activations #138
Comments
@cynthia As @anssiko pointed out, there are a lot of activation functions and new ones are discovered from time to time. However, not all activations are suitable for recurrent networks and other categories of networks, so a single enum that defines every possible activation function would create a maintenance burden and could be misused, i.e. an unsupported activation in a network would then need to be handled as a runtime error, which complicates the caller's code somewhat.
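For concreteness, this is roughly what the single-enum shape implies. A hedged sketch in TypeScript-flavored pseudotypes; the names mirror the discussion, not the normative WebIDL:

```ts
// Hypothetical sketch: a single activation enum shared by every op.
// Every value added here is a value that *every* consuming op must
// either support or explicitly reject at runtime.
type MLActivationName = "relu" | "sigmoid" | "tanh" /* | ...grows over time */;

interface MLGruOptions {
  // gru is only well-defined for a subset of activations, so the
  // implementation is forced to validate and throw for the rest.
  activations?: MLActivationName[];
}
```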
@anssiko Can this issue be closed? Is there any other aspect of the original feedback?
Is there any prior art (as in, a neural network API) that restricts this though? I checked, and activating a GRU with relu seemed to work fine in all of the frameworks I tried, even if it's nonsensical.
A problem with defining all the known activations as a single enum that is used everywhere is that every operation that uses it will be forced to either support all of them, or carry ever-changing logic that accepts some values and rejects others. That is very unpredictable, and it makes it hard to maintain the right set of expectations over time from an API longevity standpoint.

In the use case of a recurrent network such as GRU, remember that the main purpose of defining these networks as standalone operations is to provide a pathway for the implementer to hook them up directly with the optimizations already built into many hardware platforms today. For apps, these "big" operations are just shorthands for the common, well-known cases; from the implementer's point of view, being able to tap into the existing silicon blocks or optimized instructions built specifically for them is a huge performance boost. The lesser known use case of other, less common activations can already be expressed by composing the network from individual operations.

So in short, I think helper operations such as GRU should only support the well-known activations.
The fact that hardware-accelerated implementations of common blocks can have limitations is an implementation detail that should not be ossified into the design of the API. Instead of baking the silicon limitations into the API, shouldn't it throw based on the limitations of the context? Given the GRU example, if the underlying silicon does not support a relu activation, then that restriction should be raised as an error. Right now that restriction is baked into the API; since hardware is constantly evolving, this limitation may become invalid in the foreseeable future, at which point we have to figure out how to evolve the API so the new capability can be expressed. Changing an API once it has shipped on the web is incredibly difficult if it was designed without flexibility for evolution in mind.
A hardware limitation should not be the reason a Web API fails. I only raised the example of the API's ability to utilize a specific hardware block in the context of performance, not functionality. The point is that a recurrent network API such as GRU should only support known use cases, because it's actually a complete topology and not just a single atomic operation. An uncommon use case can already be achieved through composition -- you have all the flexibility there. This section in the explainer discusses the topic of level of abstraction reasonably clearly. Maybe that could help explain my rationale.
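To illustrate the composition escape hatch: an uncommon activation can be spliced into a hand-rolled recurrent step built from primitive ops. A rough TypeScript sketch, assuming an MLGraphBuilder-like object with `matmul`, `add`, `mul`, `sub`, `sigmoid`, and `tanh` methods (the typing below is illustrative, not the normative API):

```ts
// Minimal, non-normative typing for the builder used below.
interface Operand {}
interface Builder {
  matmul(a: Operand, b: Operand): Operand;
  add(a: Operand, b: Operand): Operand;
  mul(a: Operand, b: Operand): Operand;
  sub(a: Operand, b: Operand): Operand;
  sigmoid(x: Operand): Operand;
  tanh(x: Operand): Operand;
}

// One GRU-style update step composed from primitives (reset gate omitted
// for brevity), with the gate activation swapped for an arbitrary
// user-supplied function instead of an enum value.
function gruStep(
  b: Builder,
  x: Operand, hPrev: Operand,
  wz: Operand, uz: Operand,          // update-gate weights
  wh: Operand, uh: Operand,          // candidate weights
  one: Operand,                      // constant tensor of ones
  gateActivation: (b: Builder, v: Operand) => Operand  // possibly uncommon
): Operand {
  const z = gateActivation(b, b.add(b.matmul(x, wz), b.matmul(hPrev, uz)));
  const hCand = b.tanh(b.add(b.matmul(x, wh), b.matmul(hPrev, uh)));
  // h = (1 - z) * hPrev + z * hCand
  return b.add(b.mul(b.sub(one, z), hPrev), b.mul(z, hCand));
}
```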
There is precedent for such patterns, such as codec support (although this isn't always because of hardware).
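For reference, the codec-support precedent looks like this on today's web platform; these are shipped APIs (`MediaRecorder.isTypeSupported`, `HTMLMediaElement.canPlayType`), shown here only to illustrate the "query or fail, then fall back" pattern:

```ts
// Feature-detect a codec rather than assuming support, then fall back.
const preferred = 'video/webm;codecs=vp9';
const mimeType = MediaRecorder.isTypeSupported(preferred)
  ? preferred
  : 'video/webm';   // broader fallback, e.g. for new MediaRecorder(stream, { mimeType })

const video = document.createElement('video');
// canPlayType answers "", "maybe" or "probably" -- support is a runtime
// query, not a hard-coded enum of what every browser must accept.
const support = video.canPlayType('video/mp4; codecs="hvc1"');
```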
@cynthia I believe you're referring to the codec support pattern. I think it'd be helpful for the group to discuss and document the pros and cons of adopting such a "failure is an option" pattern before proposing a resolution to this issue. @cynthia's feedback suggests such a pattern would better future-proof the API and that there's precedent for this type of design on the web platform, while @wchao1115's comments suggest error handling would complicate the API caller's code and make performance more unpredictable, on today's hardware in particular. Let's use this issue to come up with any other pros and cons I may have overlooked. Please chime in @cynthia @wchao1115 @huningxin. I'll bring this issue up again on our next call to have a checkpoint.
Yes. There are also numerous places in WebRTC that do similar things (usually related to codecs). The reason I think this pattern might be better is that it discourages preemptively implementing different code branches (e.g. if the accelerator is A at the time of implementation, ossify the model to A's capabilities at that point in time), much like how user-agent-based branching is abused today. Having it throw would encourage developers to try accelerated paths and fall back to a hand-rolled implementation if the accelerator does not support it. This allows graceful upgrades in the event the underlying acceleration runtime later supports the accelerated path. It is worth noting that this is speculative, since there isn't much prior art to refer to for accelerated neural networks.
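In code, the proposed "try the accelerated path, fall back gracefully" pattern might look like the following sketch. The `builder.gru` call shape is an assumption, and `buildGruFromPrimitives` is a hypothetical stand-in for a hand-rolled composition such as the one sketched earlier:

```ts
declare const builder: any;  // an MLGraphBuilder-like object (assumed)
declare function buildGruFromPrimitives(
  builder: any, input: any, weights: any, options: any): any;  // hypothetical

function gruWithFallback(input: any, weights: any, options: any) {
  try {
    // Accelerated path: ask the big op for the (possibly unusual) activation.
    return builder.gru(input, weights, options);
  } catch (e) {
    // If this context/accelerator cannot handle that activation, compose the
    // same topology from primitive ops instead. When the runtime later gains
    // support, the accelerated branch starts succeeding with no app changes.
    return buildGruFromPrimitives(builder, input, weights, options);
  }
}
```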
In practice, an ML graph, once loaded and compiled, is expected to run successfully regardless of the underlying device, although the degree of efficiency may vary depending on the level of optimization supported. The kind of fallback that may happen in some cases, such as when an operator falls back to the CPU, does not alter the topology of the graph, and as such does not break the user's expectation. For context, the topology of the graph is either explicitly created or converted from a pre-trained model; in either case, it is finalized before it is eventually executed. The topology-altering kind of fallback at graph execution time, as suggested here, would be far too late for the users of ML models.

From the API design point of view, it is also a bit odd to allow use cases that are known to be invalid by deferring them to the implementer of the API. The implementer will have no choice but to fail on them, and the users will have no choice but to prepare for failures even though they aren't in a position to handle them well. This is different in nature from the codec failures.
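The lifecycle described above, where the topology is finalized well before execution, is roughly the following. This is a non-normative sketch; the identifiers and method names are assumptions loosely based on the explainer, not quotes from the spec:

```ts
declare const context: any;   // an MLContext-like object (assumed)
declare const builder: any;   // an MLGraphBuilder-like object bound to it (assumed)

async function run(inputBuffer: Float32Array, outputBuffer: Float32Array) {
  // 1. The topology is created (or converted from a pre-trained model) up front.
  const input = builder.input('x', { type: 'float32', dimensions: [1, 4] });
  const output = builder.relu(input);

  // 2. The graph is compiled once; device-specific optimization or a CPU
  //    fallback happens here without altering the topology.
  const graph = await builder.build({ output });

  // 3. Execution only binds data to the already-fixed topology, so a
  //    topology-altering failure at this point would be too late.
  await context.compute(graph, { x: inputBuffer }, { output: outputBuffer });
}
```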
Per https://www.w3.org/2021/05/13-webmachinelearning-minutes.html#t05 @wchao1115 to draft an informative note to be incorporated into the spec to explain current design principles around activations. |
This issue is resolved by PR #188 as linked here: we no longer need a string enum to represent fused activation functions. Please take a look.
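If I read the direction of #188 correctly, fused activations are passed as builder-created operator objects rather than named by a shared string enum, so the call site changes roughly as follows. A hedged sketch, not quoted from the PR; the `gru` argument list and option names are assumptions:

```ts
declare const builder: any;   // an MLGraphBuilder-like object (assumed)
declare const input: any, weight: any, recurrentWeight: any;

// Before: activations named by a shared string enum, e.g.
//   builder.gru(..., { activations: ['sigmoid', 'tanh'] });

// After #188 (as understood here): activations are operator objects
// produced by the builder, so there is no global enum to keep in sync.
const steps = 1, hiddenSize = 64;
const outputs = builder.gru(input, weight, recurrentWeight, steps, hiddenSize, {
  activations: [builder.sigmoid(), builder.tanh()],
});
```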
As discussed in the WebML WG Teleconference – 2 Sep 2021, the group feels this issue has been addressed by #188.
via w3ctag/design-reviews#570 (comment)
Some activations cannot be supported in GRU; we will clarify this in the spec.