
Adding the possibility to load a configuration instead of init parameters #98

Open
NP4567-dev opened this issue Nov 29, 2024 · 3 comments
Labels: feature request

Comments

@NP4567-dev
Collaborator

NP4567-dev commented Nov 29, 2024

Description

Context:
In this first issue on the topic, the aim is to enable the user to provide a configuration for both existing parameters (providers and electricity_mix_zone).
The addition of other configuration parameters, while taken into account, will be addressed in other issues.
It is important to stay compatible with older versions while guaranteeing enough flexibility to implement future configuration features.

Changes:
Add a configuration argument to Ecologits.init(): a path pointing to a locally stored YAML configuration file.
The file could have the following format for now:

providers:
    - mistral
    - openai
electricity_mix_zone: WOR

I see three possible ways to handle it:

  • Ecologits.init() now expects a _Config: this is pretty simple but not backward compatible, and would probably remove some typing help for users.
  • Ecologits.init() can now handle either the current parameters or a _Config parameter. This means the priority between arguments must be set (if both a config and the current parameters are provided, which value is kept?). Also, each additional parameter handled by the configuration will start a discussion on whether it should also be added to the base parameters of the init function.
  • Add a separate 'init_with_config' function to the Ecologits class, expecting a _Config object. This points towards different usages in experimental and production settings, but might also be a bit harder to maintain.

These questions have to be answered before proceeding:

  • Should an invalid configuration (non-existent provider or electricity mix) raise an exception from the get-go?
  • Should a missing parameter (neither key nor value in the YAML) in the configuration be replaced by a default value? (In the future, whether to fall back on default values could itself be a parameter.)
  • Should an empty parameter (key but empty value in the YAML) in the configuration be considered a user error and raise an exception?
  • What should be the specific value for when the user wants all the providers to be loaded? It could be just a null value or a specific word.
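One possible set of answers to these questions is sketched below, under several assumptions: invalid providers or zones raise immediately, a missing key falls back to a default, a key with an empty value is a user error, and the keyword `"all"` (rather than null) means "load all providers", since a YAML key with no value already parses to `None`. The allowed values, the defaults, `ConfigError` and `validate_config` are hypothetical placeholders; the function works on an already-parsed dict.

```python
from typing import Any

# Hypothetical allowed values and defaults, for illustration only.
KNOWN_PROVIDERS = {"anthropic", "mistral", "openai"}
KNOWN_ZONES = {"WOR", "FRA", "USA"}
DEFAULTS: dict[str, Any] = {"providers": "all", "electricity_mix_zone": "WOR"}


class ConfigError(ValueError):
    """Raised as soon as the configuration is found to be invalid."""


def validate_config(raw: dict[str, Any]) -> dict[str, Any]:
    config: dict[str, Any] = {}
    for key, default in DEFAULTS.items():
        if key not in raw:
            # Missing key: fall back to the default value.
            config[key] = default
        elif raw[key] in (None, "", []):
            # Key present but empty: treat it as a user error.
            raise ConfigError(f"{key!r} is set but empty")
        else:
            config[key] = raw[key]

    providers = config["providers"]
    if providers == "all":
        # The keyword "all" loads every known provider.
        config["providers"] = sorted(KNOWN_PROVIDERS)
    elif not isinstance(providers, list):
        raise ConfigError("'providers' must be a list or the keyword 'all'")
    else:
        unknown = set(providers) - KNOWN_PROVIDERS
        if unknown:
            raise ConfigError(f"unknown providers: {sorted(unknown)}")

    zone = config["electricity_mix_zone"]
    if not isinstance(zone, str) or zone not in KNOWN_ZONES:
        raise ConfigError(f"unknown electricity mix zone: {zone!r}")
    return config
```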

Other remarks:
Not in this feature but could be added later:

  • Having a specific electricity_mix for a given provider
  • Providing custom model infos
  • Error handling configuration (should missing info raise an error or just be logged)
  • ...

Feel free to correct or add anything 😃

@NP4567-dev added the feature request label Nov 29, 2024
@NP4567-dev changed the title from "Adding the possibility to load a configuration at Ecologits.init()" to "Adding the possibility to load a configuration instead of init parameters" Nov 29, 2024
@samuelrince
Member

samuelrince commented Nov 30, 2024

Thanks @NP4567-dev for drafting this issue, I'll add some bits on what I imagined as well!

  • We don't necessarily need to keep the _Config dataclass, we can design a new system to handle configuration properly.
  • I think it is okay and practical to have multiple ways of configuring the lib (init() method, config file, ...) as long as we define an order to apply the config. (parameters from init() > config file > default values).
  • I'd rather use a TOML file, as the Python ecosystem is converging on that format rather than YAML. Plus, a TOML parser (tomllib) has been part of the standard library since Python 3.11, which is not the case for YAML.
  • About custom models or aliases, I don't think they should be integrated within this config file. They can take a lot of space, and they could be shared between the projects of a team, for instance. We should consider them as an external/additional "model_repository" loaded from a local file, from a cached file (in the ~/.ecologits or ~/.cache/ecologits directories) or from a URL.
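The proposed ordering (parameters from init() > config file > default values) amounts to a layered lookup. A minimal sketch, assuming each layer has already been parsed into a plain dict and that `None` means "not set at this level"; the `resolve_settings` helper and the default values are illustrative only.

```python
from typing import Any, Optional

# Illustrative defaults, not the library's actual values (None = all providers).
DEFAULTS: dict[str, Any] = {"providers": None, "region": "WOR"}


def resolve_settings(init_params: dict[str, Optional[Any]],
                     file_config: dict[str, Any]) -> dict[str, Any]:
    """Merge settings with priority: init() parameters > config file > defaults."""
    resolved = dict(DEFAULTS)
    # Apply the file first, then init() parameters, so later layers win.
    for layer in (file_config, init_params):
        for key, value in layer.items():
            if value is not None:  # None means "not set at this level"
                resolved[key] = value
    return resolved


settings = resolve_settings({"region": "FRA", "providers": None},
                            {"region": "USA", "providers": ["openai"]})
```

Here `region` comes from init() while `providers` falls through to the config file.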

Example of what it could look like (ecologits.toml):

[ecologits]
region="WOR"		# Optional, as this is the default value; could also be named `zone`, or something else...
providers=[
  "openai",
  "anthropic",		# Active, but no custom config
  "mistralai",
  "scaleway"		# Not supported right now, just an example of a cloud provider
]

[ecologits.openai]
region="USA"		# Location set to the US; we would select the corresponding values for electricity mix, off-site water consumption...

[ecologits.mistralai]
region="FRA"		# Location set to France; we would select the corresponding values for electricity mix, off-site water consumption...

[ecologits.scaleway]
region="fr-par"		# This parameter could also accept cloud regions
pue=1.18			# Known PUE and WUE of the specific data center
wue=0.4
wue_offsite=3.2		# Off-site WUE also changed to a custom value

We can add more specific parameters to precisely customize the methodology regarding electricity mix, embodied impacts of hardware and more. But, let's do it step-by-step with simple and useful things first!

@adrienbanse
Member

adrienbanse commented Dec 2, 2024

Many thanks for the draft @NP4567-dev !

@samuelrince Totally agree about the custom models. Let's avoid any connection to an external network if it isn't needed.

To me, the TOML config file you propose is a nice and feasible solution. However, it could also be useful for users to change a parameter for a given provider within the same script. Should we also add an EcoLogits.set_parameter(parameter:str, value:ParameterValue), or something like that? I would advocate for the most flexible solution.
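A runtime setter along the lines of the suggested `EcoLogits.set_parameter(parameter, value)` could check the parameter name before applying it. This is a hypothetical sketch on a stand-in class; `ParameterValue`, the allowed parameter names, `get_parameter` and the error type are all assumptions.

```python
from typing import Union

# Hypothetical type for parameter values.
ParameterValue = Union[str, float, list[str]]


class EcoLogitsSketch:
    # Hypothetical stand-in for the EcoLogits class and its current settings.
    _settings: dict[str, ParameterValue] = {"region": "WOR", "providers": ["openai"]}

    @classmethod
    def set_parameter(cls, parameter: str, value: ParameterValue) -> None:
        # Reject unknown names early instead of silently storing a typo.
        if parameter not in cls._settings:
            raise KeyError(f"unknown parameter: {parameter!r}")
        cls._settings[parameter] = value

    @classmethod
    def get_parameter(cls, parameter: str) -> ParameterValue:
        return cls._settings[parameter]


EcoLogitsSketch.set_parameter("region", "FRA")
```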

@NP4567-dev
Collaborator Author

The last part, "Other remarks", was about things that are not part of this feature and will be tackled in other issues later; sorry if that was not clear enough. I definitely agree with going step by step. I think the ability to edit parameters, as you suggested @adrienbanse, could be kept for another separate issue later? Or maybe developing the other parts will naturally enable this?

For now I propose the following acceptance criteria for this feature:

  • Providing a path to a .toml parameter file to the Ecologits.init() method ensures my custom config is taken into account. In such a file I can set providers, region, or combinations of both.
  • If I'd rather not use such a file, I can still set providers and electricity intensity directly in the init method
  • If I provide both in the init, the config file takes precedence and I am warned
  • If I provide neither, I still won't get an error, as default values will be used by the package
  • If the parameter file I provide is not found or not compatible, I get an error message so I can correct my config.
  • [to be validated] I can update the provided or default parameters directly in my python script

Please update them as you see fit or just 👍 this and I can get started on this 😄
