Commit

add content
ayaka14732 committed Sep 26, 2022
1 parent beb554c commit 08787fc
Showing 3 changed files with 23 additions and 6 deletions.
4 changes: 4 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
@@ -1 +1,5 @@
# TrAVis: Transformer Attention Visualiser

TrAVis is a Transformer Attention Visualiser. It visualises the attention matrices generated by the BERT model.

The idea of visualising the attention matrices is inspired by [_Neural Machine Translation by Jointly Learning to Align and Translate_](https://arxiv.org/abs/1409.0473).
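
The attention matrices referred to above are the outputs of scaled dot-product attention, softmax(QKᵀ/√d): each attention head turns its query and key vectors into one row-stochastic matrix whose entry (i, j) says how strongly token i attends to token j. As a rough illustration (a NumPy sketch with made-up names, not code from the TrAVis source):

```python
import numpy as np

def attention_matrix(q, k):
    """Scaled dot-product attention weights: softmax(Q K^T / sqrt(d)).
    Each row sums to 1; row i shows how token i attends to every token."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # for numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
q = rng.normal(size=(5, 8))  # 5 tokens, head dimension 8
k = rng.normal(size=(5, 8))
A = attention_matrix(q, k)
print(A.shape)  # (5, 5): one matrix of this shape per head
```

In BERT Base Uncased, 12 layers with 12 heads each yield 144 such matrices per input; these are the matrices TrAVis plots.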
13 changes: 13 additions & 0 deletions index.css
@@ -37,6 +37,11 @@ p {
margin: 1em 0;
}

footer {
margin: 30px 0 20px 0;
font-style: italic;
}

.range-slider {
display: flex;
align-items: center;
@@ -92,6 +97,14 @@ p {
display: none;
}

.centered {
text-align: center;
}

.emoji {
font-style: normal;
}

/* Text Input */
input {
border-style: none;
12 changes: 6 additions & 6 deletions index.html
@@ -56,18 +56,18 @@ <h1>Transformer Attention Visualiser</h1>
<output id="result"></output>
<div class="content-container">
<h2>What is this?</h2>
<p>TrAVis is a Transformer Attention Visualiser. The idea of visualising the attention matrices is inspired by <a href="https://arxiv.org/abs/1409.0473"><em>Neural Machine Translation by Jointly Learning to Align and Translate</em></a>.</p>
<p>The original paper of the Transformer model was named <a href="https://arxiv.org/abs/1706.03762"><em>Attention Is All You Need</em></a>, demonstrating the centrality of the attention mechanism to Transformer-based models. These models generate attention matrices during the computation of the attention mechanism, which indicate how the models process the input data, and can therefore be seen as a concrete representation of the mechanism.</p>
<p>TrAVis (source code on <a href="https://github.com/ayaka14732/TrAVis">GitHub</a>) is a Transformer Attention Visualiser. The idea of visualising the attention matrices is inspired by <a href="https://arxiv.org/abs/1409.0473"><em>Neural Machine Translation by Jointly Learning to Align and Translate</em></a>.</p>
<p>The original paper of the Transformer model was named <a href="https://arxiv.org/abs/1706.03762"><em>Attention Is All You Need</em></a>, demonstrating the centrality of the attention mechanism to <a href="https://huggingface.co/docs/transformers/model_summary">Transformer-based models</a>. These models generate attention matrices during the computation of the attention mechanism, which indicate how the models process the input data, and can therefore be seen as a concrete representation of the mechanism.</p>
<p>In the <a href="https://arxiv.org/abs/1810.04805">BERT</a> Base Uncased model, for example, there are 12 transformer layers, each layer contains 12 heads, and each head generates one attention matrix. TrAVis is the tool for visualising these attention matrices.</p>
<h2>Why is it important?</h2>
<p>Despite the popularity of Transformer-based models, we often utilise them by just simply running the training scripts, ignoring what is going on inside the model.</p>
<p>TrAVis helps us to better understand how Transformer-based models work internally, thus enabling us to better exploit them to solve our problems and, furthermore, giving us inspirations to make improvements to the model architecture.</p>
<p>Despite the popularity of Transformer-based models, people often utilise them by simply running the training scripts, ignoring what is going on inside the model. TrAVis helps us better understand how Transformer-based models work internally, enabling us to exploit them more effectively to solve our problems and giving us inspiration for improving the model architecture.</p>
<h2>How does it work?</h2>
<p>The project consists of 4 parts.</p>
<p>Firstly, we implemented the <a href="https://arxiv.org/abs/1910.13461">BART</a> model from scratch using JAX. We chose JAX because it is an amazing deep learning framework that enables us to write clear source code, and it can be easily converted to NumPy, which can be executed in-browser. We chose the BART model because it is a complete encoder-decoder model, so it can be easily adapted to other models, such as BERT, by simply taking a subset of the source code.</p>
<p>Secondly, we <a href="https://github.com/ztjhz/word-piece-tokenizer">re-implemented</a> the HuggingFace BERT Tokeniser using pure Python, as it can be more easily executed in-browser. Moreover, we have optimised the tokenisation algorithm, which is 57% faster than the original HuggingFace implementation.</p>
<p>Firstly, we <a href="https://github.com/ayaka14732/bart-base-jax"><b>implemented</b></a> the <a href="https://arxiv.org/abs/1910.13461">BART</a> model from scratch using <a href="https://github.com/google/jax">JAX</a>. We chose JAX because it is an amazing deep learning framework that enables us to write clear source code, and it can be easily converted to NumPy, which can be executed in-browser. We chose the BART model because it is a complete encoder-decoder model, so it can be easily adapted to other models, such as BERT, by simply taking a subset of the source code.</p>
<p>Secondly, we <a href="https://github.com/ztjhz/word-piece-tokenizer"><b>implemented</b></a> the <a href="https://huggingface.co/docs/transformers/model_doc/bert#transformers.BertTokenizer">HuggingFace BERT Tokeniser</a> in pure Python, as it can be more easily executed in-browser. Moreover, we optimised the tokenisation algorithm, making it 57% faster than the original HuggingFace implementation.</p>
<p>Thirdly, we use <a href="https://pyodide.org/">Pyodide</a> to run our Python code in browser. Pyodide supports all Python libraries implemented in pure Python, with additional support for a number of other libraries such as NumPy and SciPy.</p>
<p>Fourthly, we visualise the attention matrices on the web page using <a href="https://d3js.org/">d3.js</a>.</p>
<footer class="centered">Brought to you with <span class="emoji">❤️</span> by <a href="https://github.com/ayaka14732">Ayaka</a> and <a href="https://github.com/ztjhz">Nixie (Jing Hua)</a></footer>
</div>
</body>
</html>
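
The tokeniser in the second step follows BERT's WordPiece scheme: each word is split by greedily matching the longest vocabulary prefix, with word-internal pieces marked by a "##" prefix. A simplified pure-Python sketch of that matching loop (toy vocabulary and function name are ours, not the actual word-piece-tokenizer code):

```python
def wordpiece_tokenise(word, vocab):
    """Greedy longest-match-first WordPiece split of a single word.
    Continuation pieces carry the '##' prefix, as in BERT's vocabulary."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        while start < end:
            sub = word[start:end]
            if start > 0:
                sub = '##' + sub  # word-internal piece
            if sub in vocab:
                piece = sub
                break
            end -= 1  # shrink the candidate and retry
        if piece is None:
            return ['[UNK]']  # no vocabulary piece matches
        pieces.append(piece)
        start = end
    return pieces

vocab = {'un', '##aff', '##able', 'play', '##ing'}
print(wordpiece_tokenise('unaffable', vocab))  # ['un', '##aff', '##able']
print(wordpiece_tokenise('playing', vocab))    # ['play', '##ing']
```

The real tokeniser also handles lower-casing, punctuation splitting, and special tokens, but the greedy loop above is the part whose speed the 57% figure refers to.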
