String enum for activations #138
Comments
@cynthia As @anssiko pointed out, there are a lot of activation functions and new ones are discovered from time to time. However, not all activations are suitable for recurrent networks and other categories of networks, so a single enum that defines every possible activation function would create a maintenance burden and could be misused, i.e. an unsupported activation in a network would then need to be handled as a runtime error, which complicates the caller's code somewhat.
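For concreteness, this is roughly what the single-enum shape implies. A hedged sketch in TypeScript-flavored pseudotypes; the names mirror the discussion, not the normative WebIDL:

```ts
// Hypothetical sketch: a single activation enum shared by every op.
// Every value added here is a value that *every* consuming op must
// either support or explicitly reject at runtime.
type MLActivationName = "relu" | "sigmoid" | "tanh" /* | ...grows over time */;

interface MLGruOptions {
  // gru is only well-defined for a subset of activations, so the
  // implementation is forced to validate and throw for the rest.
  activations?: MLActivationName[];
}
```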
@anssiko Can this issue be closed? Is there any other aspect of the original feedback?
Is there any prior art (as in, a neural network API) that restricts this though? I checked, and activating a GRU with relu seemed to work fine in all of the frameworks I tried, even if it's nonsensical.
A problem with defining all the known activations as a single enum that is used everywhere is that every operation that uses it will be forced to either support all of them, or carry ever-changing logic that accepts some values and rejects others. That is very unpredictable, and it makes it hard to maintain the right set of expectations over time from an API longevity standpoint.

In the use case of a recurrent network such as GRU, remember that the main purpose of defining these networks as standalone operations is to provide a pathway for the implementer to hook them up directly with the optimizations already built into many hardware platforms today. For apps, these "big" operations are just shorthands for the common, well-known cases; from the implementer's point of view, being able to tap into the existing silicon blocks or optimized instructions built specifically for them is a huge performance boost. The lesser known use case of other, less common activations can already be expressed by composing the network from individual operations.

So in short, I think helper operations such as GRU should only support the well-known activations.
The fact that hardware-accelerated implementations of common blocks can have limitations is an implementation detail that should not be ossified into the design of the API. Instead of baking the silicon limitations into the API, shouldn't it throw based on the limitations of the context? Given the GRU example, if the underlying silicon does not support a relu activation, then that restriction should be raised as an error. Right now that restriction is baked into the API; since hardware is constantly evolving, this limitation may become invalid in the foreseeable future, at which point we have to figure out how to evolve the API so the new capability can be expressed. Changing an API once it has shipped on the web is incredibly difficult if it was designed without flexibility for evolution in mind.
A hardware limitation should not be the reason a Web API fails. I only raised the example of the API's ability to utilize a specific hardware block in the context of performance, not functionality. The point is that a recurrent network API such as GRU should only support known use cases, because it's actually a complete topology and not just a single atomic operation. An uncommon use case can already be achieved through composition -- you have all the flexibility there. This section in the explainer discusses the topic of level of abstraction reasonably clearly. Maybe that could help explain my rationale.
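To illustrate the composition escape hatch: an uncommon activation can be spliced into a hand-rolled recurrent step built from primitive ops. A rough TypeScript sketch, assuming an MLGraphBuilder-like object with `matmul`, `add`, `mul`, `sub`, `sigmoid`, and `tanh` methods (the typing below is illustrative, not the normative API):

```ts
// Minimal, non-normative typing for the builder used below.
interface Operand {}
interface Builder {
  matmul(a: Operand, b: Operand): Operand;
  add(a: Operand, b: Operand): Operand;
  mul(a: Operand, b: Operand): Operand;
  sub(a: Operand, b: Operand): Operand;
  sigmoid(x: Operand): Operand;
  tanh(x: Operand): Operand;
}

// One GRU-style update step composed from primitives (reset gate omitted
// for brevity), with the gate activation swapped for an arbitrary
// user-supplied function instead of an enum value.
function gruStep(
  b: Builder,
  x: Operand, hPrev: Operand,
  wz: Operand, uz: Operand,          // update-gate weights
  wh: Operand, uh: Operand,          // candidate weights
  one: Operand,                      // constant tensor of ones
  gateActivation: (b: Builder, v: Operand) => Operand  // possibly uncommon
): Operand {
  const z = gateActivation(b, b.add(b.matmul(x, wz), b.matmul(hPrev, uz)));
  const hCand = b.tanh(b.add(b.matmul(x, wh), b.matmul(hPrev, uh)));
  // h = (1 - z) * hPrev + z * hCand
  return b.add(b.mul(b.sub(one, z), hPrev), b.mul(z, hCand));
}
```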
There is precedent for such patterns, such as codec support (although this isn't always because of hardware).
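For reference, the codec-support precedent looks like this on today's web platform; these are shipped APIs (`MediaRecorder.isTypeSupported`, `HTMLMediaElement.canPlayType`), shown here only to illustrate the "query or fail, then fall back" pattern:

```ts
// Feature-detect a codec rather than assuming support, then fall back.
const preferred = 'video/webm;codecs=vp9';
const mimeType = MediaRecorder.isTypeSupported(preferred)
  ? preferred
  : 'video/webm';   // broader fallback, e.g. for new MediaRecorder(stream, { mimeType })

const video = document.createElement('video');
// canPlayType answers "", "maybe" or "probably" -- support is a runtime
// query, not a hard-coded enum of what every browser must accept.
const support = video.canPlayType('video/mp4; codecs="hvc1"');
```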
@cynthia I believe you're referring to the codec support pattern. I think it'd be helpful for the group to discuss and document the pros and cons of adopting such a "failure is an option" pattern before proposing a resolution to this issue. @cynthia's feedback suggests such a pattern would better future-proof the API and that there's precedent for this type of design on the web platform, while @wchao1115's comments suggest error handling would complicate the API caller's code and make performance more unpredictable, on today's hardware in particular. Let's use this issue to come up with any other pros and cons I may have overlooked. Please chime in @cynthia @wchao1115 @huningxin. I'll bring this issue up again on our next call to have a checkpoint.
Yes. There are also numerous places in WebRTC that do similar things (usually related to codecs). The reason I think this pattern might be better is that it discourages preemptively implementing different code branches (e.g. if the accelerator is A at the time of implementation, ossify the model to A's capabilities at that point in time), much like how user-agent-based branching is abused today. Having it throw would encourage developers to try accelerated paths and fall back to a hand-rolled implementation if the accelerator does not support it. This allows graceful upgrades in the event the underlying acceleration runtime later supports the accelerated path. It is worth noting that this is speculative, since there isn't much prior art to refer to for accelerated neural networks.
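In code, the proposed "try the accelerated path, fall back gracefully" pattern might look like the following sketch. The `builder.gru` call shape is an assumption, and `buildGruFromPrimitives` is a hypothetical stand-in for a hand-rolled composition such as the one sketched earlier:

```ts
declare const builder: any;  // an MLGraphBuilder-like object (assumed)
declare function buildGruFromPrimitives(
  builder: any, input: any, weights: any, options: any): any;  // hypothetical

function gruWithFallback(input: any, weights: any, options: any) {
  try {
    // Accelerated path: ask the big op for the (possibly unusual) activation.
    return builder.gru(input, weights, options);
  } catch (e) {
    // If this context/accelerator cannot handle that activation, compose the
    // same topology from primitive ops instead. When the runtime later gains
    // support, the accelerated branch starts succeeding with no app changes.
    return buildGruFromPrimitives(builder, input, weights, options);
  }
}
```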
In practice, an ML graph, once loaded and compiled, is expected to run successfully regardless of the underlying device, although the degree of efficiency may vary depending on the level of optimization supported. The kind of fallback that may happen in some cases, such as when an operator falls back to the CPU, does not alter the topology of the graph, and as such does not break the user's expectation. For context, the topology of the graph is either explicitly created or converted from a pre-trained model; in either case, it is finalized before it is eventually executed. The topology-altering kind of fallback at graph execution time, as suggested here, would be far too late for the users of ML models.

From the API design point of view, it is also a bit odd to allow use cases that are known to be invalid by deferring them to the implementer of the API. The implementer will have no choice but to fail on them, and the users will have no choice but to prepare for failures even though they aren't in a position to handle them well. This is different in nature from the codec failures.
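The lifecycle described above, where the topology is finalized well before execution, is roughly the following. This is a non-normative sketch; the identifiers and method names are assumptions loosely based on the explainer, not quotes from the spec:

```ts
declare const context: any;   // an MLContext-like object (assumed)
declare const builder: any;   // an MLGraphBuilder-like object bound to it (assumed)

async function run(inputBuffer: Float32Array, outputBuffer: Float32Array) {
  // 1. The topology is created (or converted from a pre-trained model) up front.
  const input = builder.input('x', { type: 'float32', dimensions: [1, 4] });
  const output = builder.relu(input);

  // 2. The graph is compiled once; device-specific optimization or a CPU
  //    fallback happens here without altering the topology.
  const graph = await builder.build({ output });

  // 3. Execution only binds data to the already-fixed topology, so a
  //    topology-altering failure at this point would be too late.
  await context.compute(graph, { x: inputBuffer }, { output: outputBuffer });
}
```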
Per https://www.w3.org/2021/05/13-webmachinelearning-minutes.html#t05 @wchao1115 to draft an informative note to be incorporated into the spec to explain current design principles around activations. |
This issue is resolved by PR #188 as linked here: we no longer need a string enum to represent fused activation functions. Please take a look.
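If I read the direction of #188 correctly, fused activations are passed as builder-created operator objects rather than named by a shared string enum, so the call site changes roughly as follows. A hedged sketch, not quoted from the PR; the `gru` argument list and option names are assumptions:

```ts
declare const builder: any;   // an MLGraphBuilder-like object (assumed)
declare const input: any, weight: any, recurrentWeight: any;

// Before: activations named by a shared string enum, e.g.
//   builder.gru(..., { activations: ['sigmoid', 'tanh'] });

// After #188 (as understood here): activations are operator objects
// produced by the builder, so there is no global enum to keep in sync.
const steps = 1, hiddenSize = 64;
const outputs = builder.gru(input, weight, recurrentWeight, steps, hiddenSize, {
  activations: [builder.sigmoid(), builder.tanh()],
});
```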
As discussed in the WebML WG Teleconference – 2 Sep 2021, the group feels this issue has been addressed by #188.
via w3ctag/design-reviews#570 (comment)
Some activations cannot be supported in GRU; we will clarify this in the spec.