Skip to content

Latest commit

 

History

History
15 lines (14 loc) · 422 Bytes

id_sort_leak.md

File metadata and controls

15 lines (14 loc) · 422 Bytes

Effect

Exploits of ranking preditions using information from ids

Rules to avoid leakage

Shuffle dataset related tables with .reset_index() prior tag dataset as modeling ready.

Symptom

id to target ordering is observed during explorative data analyses

Quick fix example

df = df.sample(frac=1).reset_index(drop=True)

Incorporation stage

Dataset preparation.

Locate in code

Dataset saving.