
[python-package] factor out positional indexing for eval result tuples #6748

jameslamb opened this issue Dec 12, 2024 · 0 comments

Description

While working on #3756 and #3867, as well as a variety of early-stopping-related things (like #6424), I've found that the heavy use of positional indexing and tuple unpacking in early stopping and related code makes it really difficult to understand and modify.

For example:

```python
for item in env.evaluation_result_list:
    if len(item) == 4:
        data_name, eval_name, result = item[:3]
        self.eval_result[data_name][eval_name].append(result)
    else:
        data_name, eval_name = item[1].split()
        res_mean = item[2]
        res_stdv = item[4]  # type: ignore[misc]
        self.eval_result[data_name][f"{eval_name}-mean"].append(res_mean)
        self.eval_result[data_name][f"{eval_name}-stdv"].append(res_stdv)
```
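To see why that branch is hard to follow, here are the two tuple shapes it distinguishes, illustrated in isolation. These example tuples are inferred from the snippet above, not copied from LightGBM internals: a length-4 tuple from ordinary training, and a length-5 tuple from cross-validation whose second field packs the dataset and metric names into one string.

```python
# Illustrative tuples only, inferred from the branch above -- not taken
# verbatim from LightGBM internals.
train_item = ("valid_0", "auc", 0.81, True)          # len == 4: (data, metric, value, is_higher_better)
cv_item = ("cv_agg", "valid auc", 0.80, True, 0.02)  # len == 5: stdv rides along at index 4

# the len == 4 branch: slice off the first three fields
data_name, eval_name, result = train_item[:3]
assert (data_name, eval_name, result) == ("valid_0", "auc", 0.81)

# the else branch: dataset and metric names must be split back apart,
# and the stdv is fetched by the opaque index 4
data_name, eval_name = cv_item[1].split()
res_mean, res_stdv = cv_item[2], cv_item[4]
assert (data_name, eval_name, res_mean, res_stdv) == ("valid", "auc", 0.80, 0.02)
```

None of those index positions are self-describing; a reader has to already know the layout to follow the code.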

and:

```python
score = env.evaluation_result_list[i][2]
if first_time_updating_best_score_list or self.cmp_op[i](score, self.best_score[i]):
    self.best_score[i] = score
```
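A self-contained sketch (with made-up data and variable names, not LightGBM's actual implementation) of how unpacking into named variables would make that comparison read:

```python
import operator

# entries follow the (data_name, metric_name, value, is_higher_better) layout
evaluation_result_list = [
    ("valid_0", "auc", 0.81, True),              # higher is better
    ("valid_0", "binary_logloss", 0.43, False),  # lower is better
]

best_score = [float("-inf"), float("inf")]
cmp_op = [operator.gt, operator.lt]

for i, (data_name, eval_name, score, _) in enumerate(evaluation_result_list):
    # "score" replaces the opaque env.evaluation_result_list[i][2]
    if cmp_op[i](score, best_score[i]):
        best_score[i] = score
```

The meaning of each field is visible at the unpacking site instead of being encoded in a magic index.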

I'm opening this issue to track some work I'd like to do to simplify that.

Benefits of this work

Reduces the effort required to finish these:

And to add finer-grained control over early stopping and validation, e.g.:

Approach

I'm planning a series of PRs with the following types of changes:

  • unpacking tuples into named variables and using those named variables
  • introducing collections.namedtuple (docs) where appropriate, to allow named-attribute access while preserving backwards compatibility with the significant amount of custom code in the world that relies on lightgbm expecting plain tuples for things like custom metrics:

feval : callable, list of callable, or None, optional (default=None)
Customized evaluation function.
Each evaluation function should accept two parameters: preds, eval_data,
and return (eval_name, eval_result, is_higher_better) or list of such tuples.
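Because collections.namedtuple produces a subclass of tuple, named-attribute access can be added without breaking positional indexing, slicing, unpacking, or len() checks in existing user code. A minimal sketch (the class and field names here are hypothetical, not necessarily the ones the PRs will use):

```python
from collections import namedtuple

# hypothetical names for illustration only
_EvalResultTuple = namedtuple(
    "_EvalResultTuple",
    ["dataset_name", "metric_name", "value", "is_higher_better"],
)

res = _EvalResultTuple("valid_0", "auc", 0.81, True)

# new, self-describing style inside lightgbm's own code
assert res.value == 0.81

# existing tuple-based user code keeps working, because a namedtuple IS a tuple
assert isinstance(res, tuple)
assert res[2] == 0.81
assert len(res) == 4
dataset_name, metric_name, value, is_higher_better = res
```

So custom metrics and callbacks that return or index plain tuples would be unaffected.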

Notes

Looking at the current state of this code, it's important to remember that when the lightgbm Python package was first introduced 8+ years ago:
