"can't convert np.ndarray of type numpy.object_" on first reset(): did something change in the last release? #1534
Can you provide minimal code to reproduce this? Please refer to the custom env issue template. Also, please provide the system info (instructions are also in the issue template).
I cannot release the entire environment yet, but I could publish the reset(), _get_obs() and _get_info() functions, and maybe the init function, provided that is OK with the policies here, since my work touches a sensitive topic (araffin, is it OK if I publish part of the code even if some names hint at the topic I am working on?). In the meantime, does anybody know whether anything changed in the last few days in the part of the code that handles the terminal state of an episode during learning, and the related observation data? Secondly, does anybody know whether, here,
the info array refers to the info returned by the environment's step() and reset() functions, or to something else? Can the returned info be a pandas DataFrame? Thank you again; I await opinions on whether it is OK to publish part of an environment built for finance purposes.
And we certainly don't want you to. We just need a minimal example to reproduce the error.
It is indeed the info returned by … Once again, I advise you to follow the custom env issue template, as a lot of information is still missing (in addition to what was already mentioned): system info, …
Dear Quentin, I apologize for the late answer, but I needed time to cross-check everything, above all because I found that the issue did not occur under every condition. Now I can reliably reproduce it, and I have found which parameter and which value trigger the error, even if I don't know why. But let's proceed in order. I'll try to answer your request for more context here; if there is anything else I should report, please let me know:
Actually, this is a bit cryptic to me. May I ask what it refers to?
This may be related to the fact that I am returning a number of observation rows equal to the buffer size. I thought, though I could be mistaken, that the environment had to collect the right number of observations to fill the buffer. Shouldn't it?
Printing the type of the observations, I get:
And I am using Gymnasium, not Gym.
After this installation, here are the results of printing the version numbers of the main libraries:
The problem appeared in reset() once the first episode was done. The strange thing is that the environment's init function calls the very same reset function that the agent runs at the end of each episode, so the problem is not triggered at init, but only after an episode is done. I first checked the data returned by reset() as info, but it appears to be exactly the same in both cases. My info structure is a dictionary with 4 fields: the current datetime, the current OHLC prices, the current mask_array, and a DataFrame with all the previous and current states (used for debugging and statistics).
Here are some runtime checks on the returned info structure:
The issue was not triggered every time, and now I know why. The episode can end for two reasons:
Well, the issue was not occurring most of the time because it is triggered only when the truncated variable is True. In the first tests, the parameters caused the episode to end because of too high a loss; now the losses are limited (hoping one day to have gains) and the model reaches the end of the training data. So, to reproduce the problem, it should be enough to:
This is with a 2D array passed as the observation, whose second dimension (the number of rows) is equal to the buffer size; that structure may be wrong and may also be part of the problem. I don't know if this is enough to point toward an explanation.
Yep, see #913. The easiest way is probably to create a wrapper to shift the action value.
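A minimal sketch of such a wrapper, assuming an environment whose valid actions are -1, 0 and 1 (for example a space declared as `Discrete(3, start=-1)` in Gymnasium). With Gymnasium installed, one would normally subclass `gymnasium.ActionWrapper`, override its `action()` method and expose a zero-based `Discrete(3)` as the wrapper's action space; the plain-Python version below keeps the same idea without the dependency. `ShiftDiscreteAction` and `ToyEnv` are hypothetical names.

```python
class ShiftDiscreteAction:
    """Hypothetical wrapper: the agent acts in {0, ..., n-1}, while the wrapped
    env expects {start, ..., start + n - 1}."""

    def __init__(self, env, start):
        self.env = env
        self.start = start

    def action(self, agent_action):
        # Map the agent's zero-based action index to the env's shifted action.
        return int(agent_action) + self.start

    def reset(self, **kwargs):
        return self.env.reset(**kwargs)

    def step(self, agent_action):
        return self.env.step(self.action(agent_action))


class ToyEnv:
    """Hypothetical env whose valid actions are -1, 0, 1 (e.g. sell/hold/buy)."""

    valid_actions = (-1, 0, 1)

    def reset(self, **kwargs):
        return 0.0, {}

    def step(self, action):
        assert action in self.valid_actions, f"invalid action {action}"
        # Gymnasium-style return: obs, reward, terminated, truncated, info
        return float(action), 0.0, False, False, {"applied_action": action}


env = ShiftDiscreteAction(ToyEnv(), start=-1)
env.reset()
_, _, _, _, info = env.step(2)  # agent picks index 2, the env receives +1
print(info["applied_action"])   # 1
```

Doing the shift in a wrapper keeps the environment's own action semantics untouched, instead of subtracting an offset by hand inside step().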
I'm not sure I understand. Here, we're dealing with the environment, which is independent of the model, and therefore of the buffer. I assume you're using a 2D (or n-D) array as an observation. If so, doing this with SB3 is highly inadvisable. Once again, the simplest thing to do is to wrap your environment with a wrapper that flattens the observation, for example: https://gymnasium.farama.org/api/wrappers/observation_wrappers/#gymnasium.wrappers.FlattenObservation
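For a `Box` observation, flattening amounts to a reshape. The sketch below mimics that step in plain NumPy, assuming a fixed window of `WINDOW` rows with `N_FEATURES` columns (both hypothetical constants); with Gymnasium, wrapping the env with `FlattenObservation(env)` would do this for you. The key requirement is that the 2D shape is identical at every step and reset; otherwise no fixed flat shape exists.

```python
import numpy as np

WINDOW, N_FEATURES = 4, 5  # hypothetical: 4 rows of 5 features each


def flatten_obs(obs_2d: np.ndarray) -> np.ndarray:
    """(WINDOW, N_FEATURES) -> (WINDOW * N_FEATURES,), cast to float32
    as a Box(dtype=np.float32) observation space would declare."""
    # Refuse anything whose shape drifted: the flat size must never change.
    assert obs_2d.shape == (WINDOW, N_FEATURES), f"unexpected shape {obs_2d.shape}"
    return obs_2d.astype(np.float32).reshape(-1)


obs = np.arange(WINDOW * N_FEATURES, dtype=np.float64).reshape(WINDOW, N_FEATURES)
flat = flatten_obs(obs)
print(flat.shape, flat.dtype)  # (20,) float32
```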
No, you've printed the Python type (with the built-in …). At this point, there are already a lot of things wrong with your environment that are bound to create problems with SB3. So I suggest you work on them and correct your environment. If the problem persists, I invite you to work on minimal code to reproduce the error.
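To see why the Python type is not enough: an array holding ragged rows is still an `np.ndarray`, so `type()` looks fine, while its `dtype` and `shape` reveal the problem. A small illustration with made-up arrays:

```python
import numpy as np

numeric = np.zeros((2, 3), dtype=np.float32)
# dtype=object is passed explicitly because recent NumPy refuses ragged input otherwise.
ragged = np.array([np.zeros(3), np.zeros(2)], dtype=object)

for name, obs in [("numeric", numeric), ("ragged", ragged)]:
    # type() is np.ndarray in both cases, so it cannot reveal the problem;
    # dtype and shape can.
    print(name, type(obs).__name__, obs.dtype, obs.shape)
# numeric ndarray float32 (2, 3)
# ragged ndarray object (2,)
```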
Thank you, Quentin. I agree, although the only remaining problem in my code seems to be the one linked to the dimensions of the returned observations. I had already handled the issue with the start of the discrete space by subtracting the correct value at each step. The only doubt I have now is what the SB3 model expects when the batch_size hyperparameter is set to a value N greater than 1: does it expect a one-dimensional array as the observation (just one "row" of observed values), or does it expect N rows? In the first case, I think the buffer is managed internally by the model; in the second, it is passed from the environment at each step/reset call, which was my first solution and is probably neither needed nor correct.
You have to understand that the model and the environment are really independent. If your environment is correctly built (read: if it passes the env checker tests), then it can be used with SB3, regardless of any hyperparameters, including batch size.
And in SB3, there's no difference between a batch size of 1 or greater.
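A simplified picture of where batch_size comes in, assuming a single environment and PPO-style training: the environment returns one observation per step, the algorithm stores `n_steps` of them itself, then shuffles and slices them into minibatches of `batch_size`. This is an illustration with hypothetical constants, not SB3's actual `RolloutBuffer` code.

```python
import numpy as np

OBS_SHAPE = (20,)  # hypothetical: one flattened observation per step
N_STEPS = 8        # transitions collected per rollout (like PPO's n_steps, single env)
BATCH_SIZE = 4     # transitions per gradient minibatch (like PPO's batch_size)

rng = np.random.default_rng(0)

# The environment only ever returns ONE observation per step()/reset() call;
# the algorithm stores them in its own buffer.
buffer = np.zeros((N_STEPS, *OBS_SHAPE), dtype=np.float32)
for t in range(N_STEPS):
    buffer[t] = rng.normal(size=OBS_SHAPE)  # stand-in for the obs returned by env.step(...)

# At training time, the stored transitions are shuffled and sliced into minibatches.
indices = rng.permutation(N_STEPS)
minibatches = [buffer[indices[i:i + BATCH_SIZE]] for i in range(0, N_STEPS, BATCH_SIZE)]
print([mb.shape for mb in minibatches])  # [(4, 20), (4, 20)]
```

Changing BATCH_SIZE only changes how the stored transitions are sliced; the shape of what the environment returns stays the same.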
The answer is in the output of the env checker:
I'm a bit worried that this issue is losing readability for other users who might face the same problem. So, if you still have the problem initially mentioned, I suggest you share minimal code so that we can explain where it comes from. If you encounter other problems, please open a new dedicated issue.
Dear Quentin, again I agree that it is important to clarify the context by providing part of the code. It is perfectly clear to me that the environment and the model are two independent entities. But buffering experiences is something in between: it stores the outputs of the environment as inputs for the model. A previous small experience of mine with TensorFlow's TF-Agents, where the buffer must be defined externally and passed to the agent at instantiation, combined with not finding anything similar in the SB3 PPO model definition, unconsciously led me to the (probably wrong) conclusion that "somebody", namely the environment, should buffer the last N observations for the model, where N is the batch size (batch_size = N). Consequently, this is the code that returned, at each step and each reset, the last N observations (the current one plus the past N-1), and that triggered the issue originally described in this post, but only when truncated = True occurs; it does not appear at each step, nor does it occur when the episode ends because terminated = True.
In this case, the function _get_obs(), used in both reset() and step(), returns a batch of N observations, the N-th of which is the latest one. This was the version that triggered the issue both during learning and with the checker. Instead, the issue disappears when returning just the latest observation array, coded as follows:
This last code works correctly both in the learning process and with the checker. I still wonder whether the minibatch of observations from the current and past N-1 steps is correctly buffered and managed by the model itself, as I now assume.
Ok, I understand, but there's a vocabulary problem: the batch size is actually the number of interactions sampled by the model at learning time. What you're describing sounds more like the number of previous observations stacked. As I still don't have complete minimal code, it's difficult to help you. I just have the impression that the observation size is not consistent from one step to the next: the size in the second dimension seems to depend on the number of timesteps that have already elapsed. This is a major problem you need to solve. Finally, I implore you not to extend this issue any further until you've converged on minimal code to reproduce your error. This issue should be of use to everyone, and I'm simply assisting you in your debugging.
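If a history of past rows really is wanted in the observation (stacking, in the sense above), one option is to keep it at a constant size. The sketch below assumes a hypothetical `HistoryWindow` helper that zero-pads at reset, so every observation has shape `(n, n_features)` from the first step on; the observation space would then be declared with that same shape. For stacking at the vectorized-env level, SB3 also provides `VecFrameStack`.

```python
from collections import deque

import numpy as np


class HistoryWindow:
    """Hypothetical helper: keeps the last `n` feature rows, zero-padded so the
    stacked observation always has shape (n, n_features), newest row last."""

    def __init__(self, n, n_features):
        self.n = n
        self.n_features = n_features
        self.rows = deque(maxlen=n)  # the oldest row drops out automatically

    def reset(self, first_row):
        self.rows.clear()
        for _ in range(self.n - 1):  # pad with zeros so the shape is full from step 0
            self.rows.append(np.zeros(self.n_features, dtype=np.float32))
        self.rows.append(np.asarray(first_row, dtype=np.float32))
        return self.observation()

    def push(self, row):
        self.rows.append(np.asarray(row, dtype=np.float32))
        return self.observation()

    def observation(self):
        return np.stack(list(self.rows))


win = HistoryWindow(n=3, n_features=2)
shapes = [win.reset([1.0, 1.0]).shape]
for t in range(5):
    shapes.append(win.push([t, t]).shape)
print(shapes)  # six entries, all (3, 2)
```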
Thanks. In my opinion, the issue can be closed, since its origin has been identified and can be summarized as follows:
If further investigation is needed into why the issue occurs only when truncated = True and not in the other cases, it should be possible to reproduce it with any environment by modifying the function that returns the observation along the lines of my first example, which returns the observations of the current and previous N-1 steps.
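A plausible explanation for the truncation-only behaviour, assuming SB3's on-policy rollout collection works as in recent releases: when an episode ends with truncated = True (and terminated = False), the vectorized env keeps the last observation in `info["terminal_observation"]`, and the algorithm converts it to a tensor to bootstrap its value. On terminated = True, that observation is never converted, so a malformed terminal observation goes unnoticed. The simplified imitation below (not SB3 code; `to_numeric_array` and `on_episode_end` are hypothetical) shows the same bad observation failing on one path only, using a ragged history as an example defect.

```python
import numpy as np


def to_numeric_array(obs):
    """Stand-in for the array -> tensor conversion: rejects ragged/object data,
    the way torch.as_tensor rejects numpy.object_ arrays."""
    try:
        arr = np.array(obs)
    except ValueError as exc:  # NumPy >= 1.24 refuses ragged input outright
        raise TypeError(f"ragged observation: {exc}") from None
    if arr.dtype == object:    # older NumPy builds an object array instead
        raise TypeError("can't convert np.ndarray of type numpy.object_")
    return arr


def on_episode_end(info, terminated, truncated):
    """Simplified sketch of the end-of-episode handling described above."""
    if truncated and not terminated:
        # Only on this path is the terminal observation converted (to bootstrap its value).
        return to_numeric_array(info["terminal_observation"])
    return None  # terminated: no bootstrap, the terminal obs is never converted


# Hypothetical malformed terminal observation: rows of different lengths.
info = {"terminal_observation": [[1.0, 2.0, 3.0], [4.0, 5.0]]}

print(on_episode_end(info, terminated=True, truncated=False))  # None, defect stays hidden
try:
    on_episode_end(info, terminated=False, truncated=True)
except TypeError as exc:
    print("truncation path fails:", exc)
```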
❓ Question
I am using this SB3 release, installed with "pip install git+https://github.com/DLR-RM/stable-baselines3", re-installed at every new Kaggle notebook session and recently updated in my PC environment as well. Suddenly I get the following error when the first episode ends and SB3 calls the reset function to start the second one:
Since I have not touched the relevant part of my custom environment's code in the last few weeks, and the data provided is always the same, I suspect that something may have changed in the latest SB3 version, if one was released in the last few days. In fact, before the last two days I had never had a similar issue, and the code ran without errors until the learning task reached the step limit. I have checked for "numpy.object_" in the observation data at every _get_obs call with this code
but this condition is never reached.
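The check used here is not shown. A stricter variant, sketched below as a hypothetical decorator for `_get_obs()`, would also catch the shape drift discussed in this thread: it rejects non-numeric observations and fails as soon as the shape differs from the first one returned. `GrowingHistoryEnv` is a made-up environment that reproduces a growing-history observation.

```python
import functools

import numpy as np


def checked_observation(get_obs):
    """Hypothetical decorator for an env's _get_obs(): fails fast if an
    observation is non-numeric or its shape differs from the first one seen."""

    @functools.wraps(get_obs)
    def wrapper(self, *args, **kwargs):
        obs = get_obs(self, *args, **kwargs)
        if not isinstance(obs, np.ndarray) or not np.issubdtype(obs.dtype, np.number):
            raise TypeError(f"bad observation: {type(obs).__name__}, dtype={getattr(obs, 'dtype', None)}")
        expected = getattr(self, "_first_obs_shape", None)
        if expected is None:
            self._first_obs_shape = obs.shape  # remember the first shape returned
        elif obs.shape != expected:
            raise ValueError(f"observation shape changed: {obs.shape} != {expected}")
        return obs

    return wrapper


class GrowingHistoryEnv:
    """Hypothetical env reproducing the reported bug: the history grows every step."""

    def __init__(self):
        self.history = [np.zeros(4, dtype=np.float32)]

    @checked_observation
    def _get_obs(self):
        return np.stack(self.history)

    def step(self):
        self.history.append(np.ones(4, dtype=np.float32))
        return self._get_obs()


env = GrowingHistoryEnv()
print(env._get_obs().shape)  # (1, 4)
try:
    env.step()
except ValueError as exc:
    print(exc)  # observation shape changed: (2, 4) != (1, 4)
```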
I do not have enough expertise to inspect the SB3 code, but after reading the error log I wonder whether this segment of code
refers to the info returned by the environment along with the observations, or to something else. The doubt arises because I use a numpy array as the data structure for the returned observations, but I allowed myself the freedom (maybe wrong or risky, but certainly convenient for other uses) to return a pandas DataFrame as extra info. Could this be the problem, and if so, are there any constraints or requirements on the info data?