Add metric for sequence length similarity #638

npatki · 2024-10-09T00:36:44Z

Problem Description

In this paper, we introduced a new methodology for calculating multi-sequence metrics called MSAS. We should add the MSAS-related metrics to SDMetrics so that users with sequential data can use them for evaluation.

Expected behavior

Add a new metric called SequenceLengthSimilarity to SDMetrics.

Data compatibility: ID columns (representing the sequence key)

Parameters:

(required) real_data: A column (pd.Series) containing the sequence key of the real data
(required) synthetic_data: A column (pd.Series) containing the sequence key of the synthetic data

Output: A score in the range [0, 1], where 0 is the worst and 1 is the best

from sdmetrics.single_column import SequenceLengthSimilarity

score = SequenceLengthSimilarity.compute(
  real_data=real_table['patient_id'],
  synthetic_data=synthetic_table['patient_id']
)

How does it work? The length of a sequence is the number of times its sequence key occurs. For example, if id_09231 appears 150 times in the sequence key column, that sequence has length 150. This metric compares the lengths of all sequences in the real data vs. the synthetic data:

Calculate the length of each sequence in the real data (call this distribution D_r)
Calculate the length of each sequence in the synthetic data (call this distribution D_s)
Now apply the KSComplement metric to compare the two distributions (D_r, D_s). Return this score.
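
The three steps above can be sketched roughly as follows. This is only an illustration, not the SDMetrics implementation: the function name `sequence_length_similarity` is hypothetical, and it assumes that KSComplement amounts to 1 minus the two-sample Kolmogorov-Smirnov statistic, computed here with `scipy.stats.ks_2samp`.

```python
import pandas as pd
from scipy.stats import ks_2samp


def sequence_length_similarity(real_keys: pd.Series, synthetic_keys: pd.Series) -> float:
    """Hypothetical helper illustrating the steps; not the SDMetrics code."""
    # Steps 1 and 2: a sequence's length is the number of rows sharing its key.
    real_lengths = real_keys.value_counts().to_numpy()            # D_r
    synthetic_lengths = synthetic_keys.value_counts().to_numpy()  # D_s

    # Step 3 (assumption): KS complement = 1 - two-sample KS statistic.
    statistic, _ = ks_2samp(real_lengths, synthetic_lengths)
    return float(1 - statistic)


# Toy data: three real sequences with lengths 3, 2 and 1.
real_keys = pd.Series(['a', 'a', 'a', 'b', 'b', 'c'])

# Same length distribution under different key names.
identical = sequence_length_similarity(
    real_keys, pd.Series(['x', 'x', 'x', 'y', 'y', 'z'])
)

# A single synthetic sequence of length 6, longer than any real sequence.
mismatched = sequence_length_similarity(real_keys, pd.Series(['x'] * 6))
```

In this toy example the key names themselves do not matter, only how often each one occurs, so `identical` reaches the best score while `mismatched` falls to the worst.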


npatki added the "feature request" and "data:sequential" labels Oct 9, 2024

fealho mentioned this issue Oct 23, 2024

Add metric for sequence length similarity #643

Merged

fealho self-assigned this Nov 14, 2024

fealho added this to the 0.16.1 milestone Nov 14, 2024

amontanez24 mentioned this issue Nov 14, 2024

Add InterRowMSAS, StatisticMSAS and SequenceLengthSimilarity metrics #662

Merged

fealho closed this as completed in #662 Nov 14, 2024
