Feature Engineering using Lambda Layers for an end-to-end training pipeline. #812
base: master
Conversation
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). For more information, open the CLA check for this pull request.
Thanks for the PR! Feature engineering for categorical data is a great topic. However, Lambda layers are not safely serializable, so I wouldn't recommend using them in production.
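For illustration, a minimal sketch of the hazard, assuming the Keras 3 API (the safe_mode flag postdates this thread; older tf.keras versions serialize Lambda layers as marshalled Python bytecode, with the same underlying fragility):

import keras
from keras import layers

# A Lambda layer wraps an arbitrary Python closure.
model = keras.Sequential([layers.Lambda(lambda x: x * 2.0)])
model(keras.ops.ones((1, 4)))  # build the model
model.save("lambda_model.keras")

# Reloading is refused under safe deserialization, because the Lambda layer
# must be rebuilt from serialized Python bytecode; you have to opt out:
reloaded = keras.models.load_model("lambda_model.keras", safe_mode=False)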
We also already have an example on structured data feature engineering here: https://keras.io/examples/structured_data/structured_data_classification_from_scratch/
I would recommend turning your example into a tutorial that focuses on something that's absent from the example above. Perhaps we could take the approach of doing the feature engineering in a single Layer subclass that takes in a dict of data. What do you think?
@fchollet That sounds great, yes. I will have a look at changing it to subclass the Layer class.
@fchollet Hi Francois. Thanks for the suggestion. I changed the example to use a single feature layer by subclassing the Layer class, with a dict of Input objects as input. Please let me know if this is what you had in mind.
Thanks for the update. A lot of the complexity here comes from the fact that you use a separate Input layer for each feature in the data, which isn't necessary if you use a Layer subclass. In addition, we should be showcasing Keras preprocessing layers.
I recommend something like:
class FeaturePreprocessing(layers.Layer):
    def __init__(self):
        # Create preprocessing layers that will be needed for feature encoding / normalization / etc.
        ...

    def adapt(self, dataset):
        # Split the dataset into individual feature datasets and use them to adapt the previously created layers
        ...

    def call(self, data):
        # Preprocess the data dict with the previously created layers, then concatenate the features
        ...
Does that make sense? Perhaps a different dataset might be a better fit too, since we're going to want to do things like the following (sketched in code after the list):
- Indexing a set of categorical string values
- Indexing a set of categorical int values
- Normalizing numerical features
- Hashing large categorical feature spaces
- etc.
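For concreteness, here is one hedged sketch of what those cases could look like inside the suggested class, built on the standard Keras preprocessing layers (StringLookup, IntegerLookup, Normalization, Hashing, CategoryEncoding) and assuming a recent tf.keras where they live under keras.layers. The feature names and bin count are hypothetical placeholders, not taken from the PR:

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical feature groups; a real example would derive these from its dataset.
STRING_CATEGORICAL = ["thal"]    # categorical string values
INT_CATEGORICAL = ["cp"]         # categorical int values
NUMERICAL = ["age", "trestbps"]  # numerical features to normalize
HASHED = ["occupation"]          # a large categorical feature space
NUM_BINS = 64

class FeaturePreprocessing(layers.Layer):
    def __init__(self):
        super().__init__()
        # One stateful preprocessing layer per feature, keyed by feature name.
        self.encoders = {}
        for name in STRING_CATEGORICAL:
            self.encoders[name] = layers.StringLookup(output_mode="one_hot")
        for name in INT_CATEGORICAL:
            self.encoders[name] = layers.IntegerLookup(output_mode="one_hot")
        for name in NUMERICAL:
            self.encoders[name] = layers.Normalization(axis=None)
        # Hashing is stateless, so it needs no adapt() call.
        self.hashers = {name: layers.Hashing(num_bins=NUM_BINS) for name in HASHED}
        self.one_hot = layers.CategoryEncoding(num_tokens=NUM_BINS, output_mode="one_hot")

    def adapt(self, dataset):
        # `dataset` yields dicts of raw features; adapt each stateful
        # layer on its own feature column.
        for name, encoder in self.encoders.items():
            encoder.adapt(dataset.map(lambda x, n=name: x[n]))

    def call(self, data):
        # `data` maps feature names to batched scalar tensors of shape (batch,).
        features = [self.encoders[name](data[name])
                    for name in STRING_CATEGORICAL + INT_CATEGORICAL]
        features += [tf.expand_dims(self.encoders[name](data[name]), -1)
                     for name in NUMERICAL]
        features += [self.one_hot(self.hashers[name](data[name])) for name in HASHED]
        return tf.concat(features, axis=-1)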
Hi @fchollet. Thanks for the suggestions. We now have one layer for feature preprocessing that uses Keras preprocessing layers. I have implemented the class you suggested; it contains multiple preprocessing layers and combinations of them. Please let me know what you think. Is the dataset used suitable for showcasing preprocessing layers?
Thanks for the update!
Hi @fernandonieuwveldt, thanks again for this PR. Are you planning to make the requested changes? Let us know if you're still working on this. Otherwise we'll close the request. Thanks!
Hi. Let me give it another go and then we can see if this will be a good addition to the website.
Hi. I made the requested changes. Hope we can still work further on this.
In this example, we look at how to create a full training and inference pipeline using only the Keras library, visualizing the network as we build up the graph. We end up with a single artifact containing the full pipeline: it can be deployed as-is, with no need to engineer features with other libraries before feeding data to the model. Feature engineering is part of the network itself.
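As a hedged sketch of that single-artifact idea, assuming the FeaturePreprocessing layer sketched above and a hypothetical unbatched train_ds that yields (feature_dict, label) pairs; the feature names and input dtypes here are illustrative:

from tensorflow import keras
from tensorflow.keras import layers

# Symbolic dict inputs mirroring the raw feature columns.
inputs = {
    "thal": keras.Input(shape=(), dtype="string", name="thal"),
    "cp": keras.Input(shape=(), dtype="int64", name="cp"),
    "age": keras.Input(shape=(), dtype="float32", name="age"),
    "trestbps": keras.Input(shape=(), dtype="float32", name="trestbps"),
    "occupation": keras.Input(shape=(), dtype="string", name="occupation"),
}

preprocessing = FeaturePreprocessing()
preprocessing.adapt(train_ds.map(lambda x, y: x))  # fit vocabularies / statistics

x = preprocessing(inputs)  # feature engineering happens inside the graph
x = layers.Dense(32, activation="relu")(x)
outputs = layers.Dense(1, activation="sigmoid")(x)
model = keras.Model(inputs=inputs, outputs=outputs)

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(train_ds.batch(32), epochs=5)
keras.utils.plot_model(model, show_shapes=True)  # visualize the network
model.save("end_to_end_pipeline.keras")  # one artifact: raw features in, predictions out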