When calculating direct features use default value if parent missing #682

CJStadler · 2019-07-23T13:28:42Z

For example, if there is a relationship transactions.session_id -> sessions.id and we are calculating a direct feature on transactions, sessions.SUM(transactions.value), any rows for which there is no corresponding session should be given the default value of 0 instead of NaN.

Of course this should not normally occur, but when it does it seems more reasonable to use the default_value.

DirectFeature.default_value is already implemented. We should be able to use the same logic that we do for aggregation features.
https://github.com/Featuretools/featuretools/blob/6f4ffd7ef7ea42f95dbaf3892615717a521299db/featuretools/computational_backends/feature_set_calculator.py#L611-L618
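The proposed behavior can be sketched in plain pandas. This is only an illustration under assumptions, not the Featuretools implementation: the variable names (`session_sum`, `has_parent`, `direct_filled`) are hypothetical, and `default_value = 0` stands in for the default of SUM. It fills the default only for child rows whose parent is missing, so a NaN that a parent feature legitimately produced would be left alone.

```python
import pandas as pd

transactions = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "session_id": ["a", "a", "b", "c"],  # "c" has no row in sessions
    "value": [1.0, 1.0, 1.0, 1.0],
})
sessions = pd.DataFrame({"id": ["a", "b"]})

default_value = 0  # assumed default for SUM in this sketch

# Aggregation step: one SUM per session that actually exists.
known = transactions["session_id"].isin(sessions["id"])
session_sum = transactions[known].groupby("session_id")["value"].sum()

# Direct-feature step: look up the parent's value for each child row.
child_sessions = transactions.set_index("id")["session_id"]
direct = child_sessions.map(session_sum)  # NaN where the parent is missing

# Replace the value only where the parent row is missing.
has_parent = child_sessions.isin(sessions["id"])
direct_filled = direct.mask(~has_parent, default_value)

print(direct_filled)
```

Masking on `has_parent` rather than calling `fillna` on the whole column is a design choice in this sketch: it keeps "parent missing" separate from "parent present but its feature is NaN".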


scorpioluck20 · 2019-07-30T16:16:20Z

Is there any sample code that reproduces this problem? I would like to confirm my understanding of it.

For example, if transactions.value is [1,2,3,4,float('nan')], SUM(transactions.value) should be 10.0 (ignoring nan). Am I correct?
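As a quick check of the NaN-skipping behavior described in that example, here is a pandas-only sketch. It assumes SUM behaves like `pandas.Series.sum`, which skips NaN by default; that link to Featuretools is an assumption here, and this case (a NaN inside the values) is separate from the missing-parent case the issue is about.

```python
import pandas as pd

values = pd.Series([1, 2, 3, 4, float("nan")])

# pandas skips NaN by default (skipna=True).
total = values.sum()

# With skipna=False the NaN propagates into the result.
total_strict = values.sum(skipna=False)

print(total, total_strict)
```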

kmax12 · 2019-07-30T16:51:03Z

Here is code that reproduces the issue:

import pandas as pd
import featuretools as ft
from featuretools.primitives import Sum

transactions = pd.DataFrame({
    "id": [1, 2, 3, 4],
    # session "c" has no matching row in the sessions dataframe below
    "session_id": ["a", "a", "b", "c"],
    "value": [1, 1, 1, 1]
})

sessions = pd.DataFrame({
    "id": ["a", "b"]
})

es = ft.EntitySet()
es.entity_from_dataframe(entity_id="transactions",
                         dataframe=transactions,
                         index="id")
es.entity_from_dataframe(entity_id="sessions",
                         dataframe=sessions,
                         index="id")

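# sessions is the parent entity, transactions is the child
es.add_relationship(ft.Relationship(es["sessions"]["id"],
                                    es["transactions"]["session_id"]))
es

# aggregation feature on sessions: SUM(transactions.value)
sum_features = ft.Feature(es["transactions"]["value"], parent_entity=es["sessions"], primitive=Sum)
# direct feature on transactions: sessions.SUM(transactions.value)
sessions_sum = ft.Feature(sum_features, entity=es["transactions"])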

fm = ft.calculate_feature_matrix(features=[sessions_sum], entityset=es)
fm

The output of fm is:

    sessions.SUM(transactions.value)
id                                  
1                                2.0
2                                2.0
3                                1.0
4                                NaN

The value for id 4 should be 0, because its session "c" does not exist in sessions.

seriallazer · 2020-11-03T07:42:29Z

If no one is working on this, may I take this up?

rwedge · 2020-11-03T17:15:43Z

@seriallazer sure!

seriallazer · 2020-11-07T06:15:41Z

I've created a pull request for the change: #1217.
Can someone please review the changes?
Thanks!

kmax12 added the good first issue label Jul 23, 2019

scorpioluck20 mentioned this issue Jun 4, 2020

Adding logic for the default value when the parent is missing #688

Closed

thehomebrewnerd mentioned this issue Jan 27, 2021

Fill default values when parent is missing #1312

Merged

thehomebrewnerd closed this as completed in #1312 Jan 28, 2021
