[RFC] Add tests and rudimentary protections for default-constructed PortableCollections #44844
…eCollections Also add tests for zero-sized PortableCollections
View& view() {
  assert(isValid());
  return view_;
}
Alternatively, the View accessors could be left unchecked, requiring users to explicitly check isValid() and/or that the column/scalar accessors of the View return non-nullptr whenever it is not clearly guaranteed that the PortableCollection was not default-constructed.
(We could also throw an exception instead of using assert(), but maybe using a default-constructed PortableCollection is more of a logic error than something that would depend on, e.g., the data?)
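The two alternatives in this comment (an unchecked accessor where the caller checks isValid() itself, and an accessor that throws instead of asserting) can be sketched on a simplified stand-in. This is a sketch under assumptions, not the CMSSW code: `ToyCollection`, `ToyView`, and `checkedView()` are hypothetical names, and the buffer is modeled as a `std::optional<std::vector<float>>` rather than an Alpaka buffer.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>
#include <vector>

// Hypothetical view: only pointers and fundamental types, like a SoA View.
struct ToyView {
  float* data = nullptr;
  int size = 0;
};

// Hypothetical stand-in for a PortableCollection (not the CMSSW class).
class ToyCollection {
public:
  ToyCollection() = default;  // no buffer, like a default-constructed PortableCollection
  explicit ToyCollection(int n) : buffer_(std::vector<float>(n)), view_{buffer_->data(), n} {}

  // Copies would leave view_ pointing into another object's buffer, so forbid them.
  ToyCollection(const ToyCollection&) = delete;
  ToyCollection& operator=(const ToyCollection&) = delete;

  bool isValid() const { return buffer_.has_value(); }

  // Alternative A: unchecked; callers check isValid() (or view().data != nullptr) themselves.
  ToyView& view() { return view_; }

  // Alternative B: checked; misuse is reported with an exception instead of assert().
  ToyView& checkedView() {
    if (not isValid()) {
      throw std::logic_error("ToyCollection: no buffer (default-constructed?)");
    }
    return view_;
  }

private:
  std::optional<std::vector<float>> buffer_;
  ToyView view_;
};
```

Throwing keeps the check active in optimized builds where assert() is compiled out, at the cost of a branch on every accessor call.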
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-44844/40093
A new Pull Request was created by @makortel for master. It involves the following packages:
@fwyzard, @cmsbuild, @makortel can you please review it and eventually sign? Thanks. cms-bot commands are listed here
//REQUIRE(coll->num() == 42); | ||
|
||
// CopyToDevice<PortableHostCollection<T>> is not defined | ||
#ifndef ALPAKA_ACC_CPU_B_SEQ_T_SEQ_ENABLED |
These #ifdefs could be removed with #43969
enable gpu
@cmsbuild, please test
+1
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-ef15bb/39082/summary.html
Comparison Summary
There are some workflows for which there are errors in the baseline:
Summary:
GPU Comparison Summary
Summary:
Are there any use cases for a default-constructed PortableCollection other than the ROOT dictionary? If that is the only use case, could we make the default constructor private, and somehow declare the ROOT dictionary machinery as a friend?
I think that a 0-size PortableCollection should be a well-defined object, with a device and a 0-size buffer associated with it. The reason we cannot, in general, do this for a default-constructed PortableCollection is that we don't know which device to use.
It's a good question, and I'd be tempted to answer "no". On the other hand, this issue came up with code along the lines of
// in class definition
device::EDPutToken<PortableCollection<...>> putToken_;
...
// in produce() function
if (inputVector.empty()) {
  iEvent.emplace(putToken_);  // leads to default-constructed PortableCollection
}
It's not necessarily a good use case, but it is easy to do.
I don't know how easy it would be to figure out the components in ROOT that need the default constructor (some of them might even be in anonymous namespaces). In addition, there are some code paths where … With @Dr15Jones we were not able to come up with hacks that would allow hiding the default constructor from user code (or even reporting its use with a static analyzer) without blowing up in our faces in some way. (In the long term it would be great to be able to move
I see. Can we prevent at least default-constructed
I agree that a 0-size PortableCollection should be well-defined at least to the extent of having a device, with the recommended way of asking for the size returning 0. What would be the meaning of the 0-size buffer? Whatever happens to be returned by the underlying allocator of the backend? E.g. for … Does Alpaka itself make any assumptions on whether the underlying allocator returns a … Would we want a buffer object, but containing a
Right, I thought about the host case too. The device itself is known, but we would not be able to make the "queue association" (i.e. cached allocation). I'm not sure that would really matter for this corner case, though.
The additional indirection (or wrapping) via
  layout_(std::move(other.layout_)),
  view_(std::move(other.view_))
{
  other.buffer_.reset();
... and I just learned something new 🤷🏻
Just curious, what was that?
{
  auto tmp = std::move(other.view_);
  other.view_ = View();
  view_ = View();
  view_ = std::move(tmp);
}
It should (to be verified) be safe to do just
{
  view_ = other.view_;  // self-assignment is safe
  other.view_ = View();
}
However, what about the layout_?
> it should (to be verified) be safe to do just
This is because View contains only pointers and fundamental types? Right, then I'd also expect the self-assignment to be safe.
> however, what about the layout_?
"Whoops", I guess; thanks for catching that. Then I need to think of a test that would demonstrate it failing.
{
  auto tmp = std::move(other.buffer_);
  other.buffer_.reset();
  buffer_.reset();
Is this reset() necessary?
Maybe?
> Is this reset() necessary?
I'd think (now) that it is not necessary; I was just following the pattern in an example https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#Rc-move-self (which uses raw pointers, where the delete of the assigned-to object's member is necessary). I'd expect the assignment on the following line
buffer_ = std::move(tmp);
to work equally well for both self-assignment and assignment from a different object. Any existing value in buffer_ (be it a valid Buffer or a moved-from Buffer) should get destructed before the content of buffer_ is initialized from tmp.
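The move-assignment pattern discussed in these threads can be sketched for all three members at once, including layout_. This is a minimal sketch under stated assumptions, not the CMSSW implementation: `MoveOnlyCollection`, `SoALayout`, and `SoAView` are hypothetical stand-ins (the layout and view are plain structs of pointers and sizes, the buffer is a `std::optional<std::vector<float>>`). Every member is moved into a temporary before `other` is reset, so self-move-assignment keeps the value, and no separate `buffer_.reset()` is needed because the final assignment already releases whatever `buffer_` held.

```cpp
#include <cassert>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical layout and view: trivially copyable structs of pointers and sizes.
struct SoALayout {
  float* data = nullptr;
  int size = 0;
};
struct SoAView {
  float* data = nullptr;
  int size = 0;
};

class MoveOnlyCollection {
public:
  MoveOnlyCollection() = default;
  explicit MoveOnlyCollection(int n)
      : buffer_(std::vector<float>(n)), layout_{buffer_->data(), n}, view_{buffer_->data(), n} {}

  MoveOnlyCollection(MoveOnlyCollection&& other) noexcept
      : buffer_(std::move(other.buffer_)), layout_(other.layout_), view_(other.view_) {
    // Leave the moved-from object in the same state as a default-constructed one.
    other.buffer_.reset();
    other.layout_ = SoALayout();
    other.view_ = SoAView();
  }

  MoveOnlyCollection& operator=(MoveOnlyCollection&& other) noexcept {
    // Take everything out of `other` first. On self-assignment `other` aliases
    // *this, so the resets below also clear *this; the temporaries restore it.
    auto buffer = std::move(other.buffer_);
    auto layout = other.layout_;
    auto view = other.view_;
    other.buffer_.reset();
    other.layout_ = SoALayout();
    other.view_ = SoAView();
    // No explicit buffer_.reset(): this assignment releases the old contents.
    buffer_ = std::move(buffer);
    layout_ = layout;
    view_ = view;
    return *this;
  }

  bool isValid() const { return buffer_.has_value(); }
  const SoALayout& layout() const { return layout_; }
  const SoAView& view() const { return view_; }

private:
  std::optional<std::vector<float>> buffer_;
  SoALayout layout_;
  SoAView view_;
};
```

Forgetting any one member in either the move constructor or the move assignment leaves a stale pointer behind, which is the kind of bug a test of the moved-from and moved-to states would catch.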
//coll->num() = 42;
//REQUIRE(coll->num() == 42);
Suggested change:
//coll_h->num() = 42;
//REQUIRE(coll_h->num() == 42);
Thanks for catching
Thinking about it now (I can't remember anymore what I was thinking back then), the only case that comes to mind would be catching programming errors along the lines of
class Foo : public SynchronizingEDProducer {
  std::optional<PortableCollection<FooLayout>> coll_;
  void acquire(...) {
    if (not coll_) {
      coll_ = PortableCollection<FooLayout>(...);
    }
    // access coll_
  }
  void produce(...) {
    iEvent.emplace(token_, std::move(*coll_));
  }
};
or where the aforementioned …
But one could also argue that such use-after-move cases should be caught by code review or by static analysis tools rather than by defensive coding. I'm presently leaning towards this direction.
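The pitfall in this example comes from std::optional's move semantics: moving out of `*coll_` moves the contained object, but the optional itself stays engaged, so `if (not coll_)` never re-creates the collection after the first event. A minimal sketch, assuming `std::vector<float>` as a hypothetical stand-in for `PortableCollection<FooLayout>` and plain member functions in place of the framework's acquire()/produce():

```cpp
#include <cassert>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-in for PortableCollection<FooLayout>.
using Collection = std::vector<float>;

struct OptionalProducer {
  std::optional<Collection> coll_;
  int allocations = 0;

  void acquire(int n) {
    if (not coll_) {  // only true the very first time
      coll_ = Collection(n);
      ++allocations;
    }
  }

  Collection produce() {
    // Moves the contained value out; coll_.has_value() remains true,
    // holding a moved-from (valid but unspecified) Collection.
    return std::move(*coll_);
  }
};
```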
If we go with
I think the
I fully agree.
I can only think of programming errors like use-after-move resulting in an invalid PortableCollection object.
@fwyzard Do you know how
Looking at the example at #44844 (comment) (it took me a while to figure out the error), how about adding a check in
We can't generally tell if an object being put in the event is "valid". But we could add a hook there (e.g. a member function or standalone function) that the framework would call at the time of … I don't think we could make such a check with … With
class Foo : public SynchronizingEDProducer {
  std::unique_ptr<PortableCollection<FooLayout>> coll_;
  void acquire(...) {
    if (not coll_) {
      coll_ = std::make_unique<PortableCollection<FooLayout>>(...);
    }
    // access coll_
  }
  void produce(...) {
    iEvent.put(token_, std::move(coll_));
  }
};
and, given the move behavior of
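A moved-from `std::unique_ptr` is guaranteed to be null, so with this version the `if (not coll_)` check does re-create the collection on every event. A minimal sketch, assuming `std::vector<float>` as a hypothetical stand-in for `PortableCollection<FooLayout>` and plain member functions in place of the framework's acquire()/produce():

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for PortableCollection<FooLayout>.
using Collection = std::vector<float>;

struct UniquePtrProducer {
  std::unique_ptr<Collection> coll_;
  int allocations = 0;

  void acquire(int n) {
    if (not coll_) {  // true again after every produce()
      coll_ = std::make_unique<Collection>(n);
      ++allocations;
    }
  }

  std::unique_ptr<Collection> produce() {
    // Moving a unique_ptr transfers ownership and leaves coll_ == nullptr.
    return std::move(coll_);
  }
};
```

The price is an extra heap allocation and indirection per event, which is the trade-off against the std::optional version.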
With
Good point; at least that would be technically possible (it has to be done in
I think we need to make a decision on which way to proceed:
In both cases the moved-from state could be implemented as
I think it would be cleanest to pick option 1 for both the default constructor and the moved-from behavior. I could buy the "performance argument" for preferring option 2 more easily for the moved-from state (where one would clearly avoid a memory allocation) than for the default constructor (where one trades the memory allocation for conditional code). I'd expect us to encounter (many) more moved-from states than default constructions.
Notes from discussion with @fwyzard
Milestone for this pull request has been moved to CMSSW_15_0_X. Please open a backport if it should also go in to CMSSW_14_2_X.
I had tested this approach, and after understanding better from Philippe how exactly ROOT does the construction of the … On this path, we could avoid having the "IO constructor" allocate the buffer (just to be thrown away in the
Does #46877 go in the direction you had in mind?
Yes (thanks!). Given it, I think we can close this PR.
PR description:
The HCAL CUDA->Alpaka porting work by @kakwok demonstrated bad behavior with default-constructed PortableCollection, so I added some tests to demonstrate those (at one point the HCAL work seemed to point to some strange behavior also for zero-sized PortableCollection, but I was not able to replicate that, and the strange behavior also didn't repeat later with the HCAL Alpaka code).
A default-constructed PortableCollection is in a state where it has no buffer. The PortableCollection interface does not provide a way to check the state, and therefore a caller has no way to know whether PortableCollection::buffer() leads to defined or undefined behavior (because std::optional<T>::operator*() leads to undefined behavior if the optional does not contain a value). This PR makes one attempt: it adds a function PortableCollection::isValid() that allows checking the validity, and adds assert()s to all accessors (plus a size() function to be able to access the SoA size without the assert()). I'm not sure if this is the behavior we really want, but at least it is a starting point for a discussion.
The tests for 0-size SoA Layout and PortableCollection are for ensuring valid behavior when the SoA has scalars in addition to columns. (And now I'm wondering what the behavior of a 0-size PortableCollection should be for a SoA that has only columns.)
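A simplified model of the approach described above may help to see which states are distinguished. It is a sketch under assumptions: `SizedCollection` is a hypothetical stand-in whose buffer is a `std::optional<std::vector<float>>`; only `isValid()`, the assert()-ing accessor, and the assert-free `size()` mirror the PR description. A zero-size collection still holds an (empty) buffer and is valid, while a default-constructed one has no buffer at all.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical stand-in for PortableCollection (not the CMSSW class).
class SizedCollection {
public:
  SizedCollection() = default;  // no buffer
  explicit SizedCollection(int elements) : buffer_(std::vector<float>(elements)), size_(elements) {}

  bool isValid() const { return buffer_.has_value(); }

  // The SoA size is available without touching the buffer, so no assert().
  int size() const { return size_; }

  std::vector<float>& buffer() {
    assert(isValid());  // *buffer_ on an empty optional would be undefined behavior
    return *buffer_;
  }

private:
  std::optional<std::vector<float>> buffer_;
  int size_ = 0;
};
```

Note that assert() disappears when NDEBUG is defined, so in optimized builds this only documents the precondition rather than enforcing it.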
The PortableObject and PortableMultiCollection should also be treated consistently, but I wanted to get feedback on the direction we want to go first.
PR validation:
Unit tests pass
If this PR is a backport please specify the original PR and why you need to backport that PR. If this PR will be backported please specify to which release cycle the backport is meant for:
Possibly to be backported to 14_0_X