Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Core] Refactor load serialization of the Model to avoid potential memory issues #12959

Merged
merged 1 commit into from
Jan 14, 2025

Conversation

loumalouomega
Copy link
Member

📝 Description

  • Bad news, I found in a production case an error in the Model loading serializer.
  • Good news, I fix it with a minor refactor, loading the model parts after its creation, not in the same stage.
  • Bad news, I couldn't reproduce it in a small case.

(Yes this looks like yogulado conversation)

🆕 Changelog

ModelPart* pmodel_part = new ModelPart(aux_names[i], 1, dummy_list, *this );
rSerializer.load(aux_names[i], pmodel_part);
mRootModelPartMap.insert(std::make_pair(aux_names[i],std::unique_ptr<ModelPart>(pmodel_part)));
const std::string& r_name = aux_names[i];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you explain why the old version was wrong? i simply cannot see it...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Me neither, but this solves the issue...

Looks like lifespan issue, but should not be a problem IMHO.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can try to look for a failing problem, but I cannot find one without sharing an industrial case. The only thing is that it fails when more than one model part, and the model part empty is the last one.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Only in Windows BTW)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the two codes look essentially equivalent to me, so i have no real issue, however:

aux_names has no repetitions correct?

@roigcarlo do you see any reason for which the second version is better than the first?
@loumalouomega could there be an issue in the "load" of the modelpart in case it is empty?

Copy link
Member

@roigcarlo roigcarlo Dec 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was looking to this last night and I don't know what to say. There are to many things happening at the same time that are different from the original to the new code, so I will go change by change:

1 - Using a const std::string & to the best of my knowledge is syntax sugar, the code produced by the compiler if that were the only change (to understand):

const std::string& r_name = aux_names[i];
ModelPart* pmodel_part = new ModelPart(r_name, 1, dummy_list, *this );

rSerializer.load(r_name, pmodel_part);
mRootModelPartMap.insert(std::make_pair(r_name,std::unique_ptr<ModelPart>(pmodel_part)));

should be exactly the same.

2 - Inverting the insert / load order may cause some behavior to change (specially if there are pointers involved in the load call), so I can imagine that may help to solve the problem:

mRootModelPartMap.insert(std::make_pair(r_name,std::unique_ptr<ModelPart>(pmodel_part)));
// Pseudo 
rSerializer.load(r_name, mRootModelPartMap.get(r_name));

3 - Manually loading all the ModelParts in the auxiliary list not following the order in which they were serialized (which we should safely assume is the same in which they appear in the aux_names but at this point who knows...) does not guarantee we are doing the same.

for (name in aux_names) {
    // Create
    // Insert
}

for (name in aux_names) {
    // Load
}

Also, with this change made, I can se a corner case where a ModelPart that has a repeated name for some reason would fail to be loaded in the original code, but work in the second one, just because the load function from the serializer will load it on the top of an already existing Modelpart that were created by the original, so that would also "solve" a possible crash.

TL;DR: I doubt that the alias for the name does anything unless some optimization dark magic from the compiler is changing things, and from the other changes is impossible to tell without having more details. If it works I see no harm in merging but I would like to know what exactly is happening.

Copy link
Contributor

@matekelemen matekelemen Dec 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 - Inverting the insert / load order may cause some behavior to change (specially if there are pointers involved in the load call), so I can imagine that may help to solve the problem:

This is the one I'd bet on. The only difference I see between the old and new versions is that the ModelPart tree is complete before any ModelParts start getting loaded, so any object that would want to store some reference/pointer to a specific ModelPart can safely do so (not a great idea tho).

The thing that worries me about our deserialization system is potentially loading data that is used for keys in a hash/ordered table after the object is constructed. Example:

  1. construct a ModelPart and insert it into the hash table of Model
  2. deserialize ModelPart
    • while deserializing, the ModelPart's name also gets deserialized, which is used for computing its hash in the hash table

Thankfully, step 2 is not a problem right now because we require that the deserialized name is identical to the ModelPart's name we inserted it with. However, if we run into a similar situation somewhere else, and the name/id/etc does get overwritten, that's a disaster.

@loumalouomega can you compile with address sanitizer and run your case with that? I usually compile in RelWithDebInfo mode and pass -fsanitize=address to CMAKE_CXX_FLAGS. Before running your case, you'll need to load the address sanitizer lib (export LD_PRELOAD=/usr/lib/libasan.so or wherever your libasan.so is).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, after epiphany I don't have acess to a computer

@loumalouomega
Copy link
Member Author

I cannot find time to add the proper testing, and we have peiorities due to future releases. For the moment this can stay like this. But the suggested code is not a crazy solution at all at least.

Copy link
Member

@RiccardoRossi RiccardoRossi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i have no particular issues other than i do not understand why the old code is wrong.

ModelPart* pmodel_part = new ModelPart(aux_names[i], 1, dummy_list, *this );
rSerializer.load(aux_names[i], pmodel_part);
mRootModelPartMap.insert(std::make_pair(aux_names[i],std::unique_ptr<ModelPart>(pmodel_part)));
const std::string& r_name = aux_names[i];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the two codes look essentially equivalent to me, so i have no real issue, however:

aux_names has no repetitions correct?

@roigcarlo do you see any reason for which the second version is better than the first?
@loumalouomega could there be an issue in the "load" of the modelpart in case it is empty?

@loumalouomega
Copy link
Member Author

i have no particular issues other than i do not understand why the old code is wrong.

Me neither, I just noticed when using it

@loumalouomega
Copy link
Member Author

loumalouomega commented Dec 19, 2024

@RiccardoRossi aux_names is not repeated, because the origin is from a map (therefore not repeated) that has been serialized.

@RiccardoRossi
Copy link
Member

@roigcarlo on my side i am ok with it. If you agree that the new code is essentially equivalent i would approve

@roigcarlo
Copy link
Member

Aside from what I said in the first conversation, I see nothing wrong, but I also can't tell why exactly the original was failing, so 👍 to merge...

P.s. If i had to beat, probably a repeated modelpart name because a couple of characters in windows make c++ see two different string as the same.

@roigcarlo
Copy link
Member

Could you print the aux_names (censoring the names) and also the Mdpa names?

@loumalouomega
Copy link
Member Author

Could you print the aux_names (censoring the names) and also the Mdpa names?

I am without computer and in vacancies until 6th January. But I already checked that, the names look correct.

@loumalouomega
Copy link
Member Author

Could you print the aux_names (censoring the names) and also the Mdpa names?

I am without computer and in vacancies until 6th January. But I already checked that, the names look correct.

I can verify that the names look correct

@aronnoordam
Copy link
Member

@loumalouomega , you mentioned that your issue only occurs in windows, but does it occur with all supported python versions?

I have a similar case where I get some memory issues when loading, but only with windows py312, and only when using:

KratosMultiphysics.SerializerTraceType.SERIALIZER_NO_TRACE

@loumalouomega
Copy link
Member Author

@loumalouomega , you mentioned that your issue only occurs in windows, but does it occur with all supported python versions?

I have a similar case where I get some memory issues when loading, but only with windows py312, and only when using:

KratosMultiphysics.SerializerTraceType.SERIALIZER_NO_TRACE

I was using python 3.9 and no flag.

@loumalouomega
Copy link
Member Author

loumalouomega commented Jan 9, 2025

Could you print the aux_names (censoring the names) and also the Mdpa names?

I found a simpler case (also failing):

aux_names[i] : convection_model_part
pmodel_part->Name() : convection_model_part
aux_names[i] : main_model_part
pmodel_part->Name() : main_model_part

Doesn't look like bad. Sincerely as this code doesn't look like actually problematic and actually fixes the issue I would simply merge this if you are OK @roigcarlo

@RiccardoRossi
Copy link
Member

No further issues on my side to merge

@loumalouomega loumalouomega merged commit cec2a60 into master Jan 14, 2025
11 checks passed
@loumalouomega loumalouomega deleted the core/fix-model-serialization branch January 14, 2025 08:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants