Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for loading data from pandas dataframe #18

Open
agarwl opened this issue Jan 18, 2023 · 2 comments
Open

Add support for loading data from pandas dataframe #18

agarwl opened this issue Jan 18, 2023 · 2 comments
Labels
enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed

Comments

@agarwl
Copy link
Collaborator

agarwl commented Jan 18, 2023

Right now, we only support loading data from numpy arrays. It would be nice if there was a helper function to convert a dataframe of scores to numpy arrays. Some initial code to help what this might look like:

def get_all_return_values(df):
  games = list(df['game'].unique())
  return_vals = {}
  for game in games:
    game_df = df[df['game'] == game]
    arr = game_df.groupby('wid')['normalized_score'].apply(list).values
    return_vals[game] = np.stack(arr, axis=0)
  return return_vals

def convert_to_matrix(x):
  return np.stack([x[k] for k in sorted(x.keys())], axis=1)

## Usage
# Array of shape (num_runs, num_games, num_steps)`
all_normalized_scores = convert_to_matrix(get_all_return_values(score_df))

The above code assumes we have a pandas Dataframe with keys run_number, 'gameandnormalized_score` containing scores for all steps (in a ordered manner).

@agarwl agarwl added enhancement New feature or request help wanted Extra attention is needed labels Jan 18, 2023
@agarwl agarwl pinned this issue Jan 18, 2023
@agarwl agarwl added the good first issue Good for newcomers label Jan 21, 2023
@stefanbschneider
Copy link

Hi, just to better understand the assumed structure of the DataFrame: We have one row, for each step?
Are these all the steps during evaluation (not training) on all the tasks?

And we'd assume separate DataFrames for each approach, which are each read separately by get_all_return_values()?
Eg, to construct the required dict for computing performance profiles.

@agarwl
Copy link
Collaborator Author

agarwl commented Jul 12, 2023

Yeah, for performance profiles, the data frames contain per-step results from evaluation (obtained during the course of training).

For aggregate metrics, we use the final performance, so that corresponds to evaluation results at the final step or a pre-specified step.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

2 participants