Continuous attribute embedding #102
Conversation
…uild continuous attribute embedders
…te embedders and to be more robust
… preventing overfitting. Reflected in the TensorBoard histograms, where learning was stagnant/non-existent in the attribute embedders before introducing dropout
… corresponding reporting metrics. This has made convergence to good performance 4-8 times faster, from 1000-2000 iterations down to 250 iterations
…d performance. Adding more examples (200) and using a useless attribute that always has value zero, the model converges to a good and expected accuracy. Remove dropout for continuous attributes. Add histograms and a better output location for plots
… extra information over the existing type information) works but leads to overfitting
… best combination for stability and minimising overfitting
…ne with the attribute embeddings
…lity and speed of convergence of the model
Now that gradient clipping and type embedding normalisation are in place, using dropout as intended in the continuous attribute MLP works very well to counter overfitting whilst keeping good stability. At least for the first 2000 iterations; it possibly begins to overfit after that
… not specified and therefore summaries should not be written
step_op = optimizer.minimize(loss_op_tr)
gradients, variables = zip(*optimizer.compute_gradients(loss_op_tr))

for grad, var in zip(gradients, variables):
this for debugging?
Yes, and also functional: `optimizer.minimize(loss)` computes the gradients and applies them. To intercept the gradients and do anything with them, you have to first compute them and then apply them manually.
Here we do this to:
- Visualise the gradients in TensorBoard
- Apply gradient clipping as mentioned in the description
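In outline, that looks like the following. This is a sketch only, assuming the TF1-style `optimizer` and `loss_op_tr` from the diff above; the clipping threshold of 5.0 is illustrative, not necessarily the value used in this PR:

```python
import tensorflow as tf

# Compute gradients explicitly instead of optimizer.minimize(...)
gradients, variables = zip(*optimizer.compute_gradients(loss_op_tr))

for grad, var in zip(gradients, variables):
    if grad is not None:
        # Visualise each gradient's distribution in TensorBoard
        # (':' is not a legal summary-name character, hence the replace)
        tf.summary.histogram(var.name.replace(':', '_') + '/gradient', grad)

# Clip by global norm to stabilise training; None entries are ignored
clipped_gradients, _ = tf.clip_by_global_norm(gradients, 5.0)

# Apply the clipped gradients manually
step_op = optimizer.apply_gradients(zip(clipped_gradients, variables))
```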
@@ -136,3 +124,57 @@ def make_blank_embedder():

    _, _, _, _, _, solveds_tr, solveds_ge = tr_info
    return ge_graphs, solveds_tr, solveds_ge


def configure_embedders(node_types, attr_embedding_dim, categorical_attributes, continuous_attributes):
this is a bit awkward to read... maybe it can be split into separate file/class?
Agreed, it's generally awkward and needs more architectural work, including how to expose this configurability to the user. I intend to address this in a separate PR as it's a big task
Added #103
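For readers following this thread, here is a hedged sketch of the kind of dispatch `configure_embedders` might perform, inferred from its signature alone. `CategoricalAttribute` and `BlankAttribute` are assumed names for the other embedder models (only `ContinuousAttribute` is confirmed by this PR), and the import path is an assumption; the actual implementation, and the refactor tracked in #103, may differ:

```python
# Assumed import path for the attribute embedder models
from kglib.kgcn.models.attribute import (
    BlankAttribute, CategoricalAttribute, ContinuousAttribute)


def configure_embedders(node_types, attr_embedding_dim,
                        categorical_attributes, continuous_attributes):
    embedders = {}
    for node_type in node_types:
        if node_type in categorical_attributes:
            # Categorical values are embedded via a lookup table over
            # the known category labels
            embedders[node_type] = CategoricalAttribute(
                len(categorical_attributes[node_type]), attr_embedding_dim)
        elif node_type in continuous_attributes:
            # Continuous values pass through the new MLP + layer-norm model
            embedders[node_type] = ContinuousAttribute(attr_embedding_dim)
        else:
            # Attribute-less types get a blank (zero) embedding
            embedders[node_type] = BlankAttribute(attr_embedding_dim)
    return embedders
```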
kglib/kgcn/pipeline/pipeline_test.py
Outdated
from kglib.kgcn.pipeline.pipeline import configure_embedders


class TestConstructEmbedders(unittest.TestCase):
already being tested separately!
You mean `configure_embedders` is? Whereabouts?
The name of the test is incorrect though; it should be `TestConfigureEmbedders`
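A minimal sketch of what the renamed `TestConfigureEmbedders` case might assert. The return contract (one embedder per node type) and the argument values are assumptions for illustration, not taken from the diff:

```python
import unittest

from kglib.kgcn.pipeline.pipeline import configure_embedders


class TestConfigureEmbedders(unittest.TestCase):

    def test_embedder_produced_for_each_node_type(self):
        node_types = ['person', 'symptom', 'severity']
        embedders = configure_embedders(
            node_types,
            attr_embedding_dim=6,
            categorical_attributes={'symptom': ['fever', 'cough']},
            continuous_attributes={'severity': (0.0, 1.0)})
        # One embedder expected per node type (assumed return contract)
        self.assertEqual(len(embedders), len(node_types))


if __name__ == '__main__':
    unittest.main()
```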
What is the goal of this PR?
Enable ingesting numerical attributes with continuous values.
The aim has been to add a continuous numerical attribute to the diagnosis example which adds no additional information. In this case, the model should be able to stably achieve the same performance as without this attribute. Empirically, this has been achieved, perhaps taking longer to converge (a minimum of about 500 training iterations, compared to a minimum of 250 iterations prior).
Closes #99
What are the changes implemented in this PR?
- Add a continuous attribute (severity) to the diagnosis example
- Add a ContinuousAttribute model, consisting of an MLP followed by layer normalisation
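Based on that description, a minimal sketch of such a model in the Sonnet 1 / TF1 style used by kglib at the time. The class name matches the PR; the layer sizes, module name, and input shape are assumptions:

```python
import sonnet as snt
import tensorflow as tf


class ContinuousAttribute(snt.AbstractModule):
    """Embeds a raw continuous value with an MLP, then layer-normalises."""

    def __init__(self, attr_embedding_dim, name='continuous_attribute'):
        super(ContinuousAttribute, self).__init__(name=name)
        self._attr_embedding_dim = attr_embedding_dim

    def _build(self, attribute_value):
        # attribute_value: [batch, 1] tensor of raw continuous values
        embedding = snt.nets.MLP(
            [self._attr_embedding_dim] * 3,
            activate_final=True)(tf.cast(attribute_value, tf.float32))
        # Layer normalisation keeps the embedding scale comparable to the
        # type embeddings, which the conversation above notes is important
        # for stability
        return snt.LayerNorm()(embedding)
```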