[RFC] Add tests and rudimentary protections for default-constructed PortableCollections #44844

makortel · 2024-04-24T21:51:29Z

PR description:

The HCAL CUDA->Alpaka porting work by @kakwok demonstrated bad behavior with default-constructed PortableCollection, so I added some tests to demonstrate those (at one point the HCAL work seemed to point to some strange behavior also for zero-sized PortableCollection, but I was not able to replicate that, and the strange behavior also didn't repeat later with the HCAL Alpaka code).

A default-constructed PortableCollection is in a state where it has no buffer. The PortableCollection interface does not provide a way to check the state, and therefore a caller has no way to know if PortableCollection::buffer() leads to defined or undefined behavior (because std::optional<T>::operator*() leads to undefined behavior if the optional does not contain value). This PR takes one attempt to add a function PortableCollection::isValid() that allows checking the validity, and adds assert()s to all accessors (plus size() to be able to access the SoA size without the assert(). I'm not sure if this the behavior we really want, but at least it is a starting point for a discussion.

The tests for 0-size SoA Layout and PortableCollection are for ensuring valid behavior when the SoA has also scalars in addition to columns. (and now I'm wondering what should be the behavior of a 0-size PortableCollection for a SoA that has only columns?)

The PortableObject and PortableMultiCollection should also be treated consistently, but I wanted to get feedback first on the direction we want to go first.

PR validation:

Unit tests pass

If this PR is a backport please specify the original PR and why you need to backport that PR. If this PR will be backported please specify to which release cycle the backport is meant for:

Possibly to be backported to 14_0_X

…eCollections Also add tests for zero-sized PortableCollections

cmsbuild · 2024-04-24T21:51:55Z

cms-bot internal usage

makortel · 2024-04-24T21:56:53Z

DataFormats/Portable/interface/PortableHostCollection.h

+  View& view() {
+    assert(isValid());
+    return view_;
+  }


Alternatively the View accessors could be left unchecked, and require the users to explicitly check isValid() or/and the column/scalar accessors of the View to be non-nullptr when it is not clearly guaranteed that the PortableCollection is non-default constructed.

(we could also throw an exception instead of assert(), but maybe the use of default-constructed PortableCollection could be more of a logic error rather than something that would depend e.g. on the data?)

cmsbuild · 2024-04-24T21:57:13Z

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-44844/40093

This PR adds an extra 24KB to repository

cmsbuild · 2024-04-24T21:57:39Z

A new Pull Request was created by @makortel for master.

It involves the following packages:

DataFormats/Portable (heterogeneous)
DataFormats/SoATemplate (heterogeneous)

@fwyzard, @cmsbuild, @makortel can you please review it and eventually sign? Thanks.
@missirol, @rovere this is something you requested to watch as well.
@antoniovilela, @rappoccio, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

makortel · 2024-04-24T21:57:46Z

DataFormats/Portable/test/alpaka/test_catch2_portableCollection.dev.cc

+    //REQUIRE(coll->num() == 42);
+
+    // CopyToDevice<PortableHostCollection<T>> is not defined
+#ifndef ALPAKA_ACC_CPU_B_SEQ_T_SEQ_ENABLED


These #ifdefs could be removed with #43969

makortel · 2024-04-24T21:58:35Z

enable gpu

makortel · 2024-04-24T21:58:40Z

@cmsbuild, please test

cmsbuild · 2024-04-25T01:16:53Z

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-ef15bb/39082/summary.html
COMMIT: c30702c
CMSSW: CMSSW_14_1_X_2024-04-24-1100/el8_amd64_gcc12
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/44844/39082/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

There are some workflows for which there are errors in the baseline:
24834.78 step 2
25088.203 step 3
The results for the comparisons for these workflows could be incomplete
This means most likely that the IB is having errors in the relvals.The error does NOT come from this pull request

Summary:

You potentially removed 59 lines from the logs
Reco comparison results: 51 differences found in the comparisons
DQMHistoTests: Total files compared: 48
DQMHistoTests: Total histograms compared: 3319852
DQMHistoTests: Total failures: 9
DQMHistoTests: Total nulls: 0
DQMHistoTests: Total successes: 3319823
DQMHistoTests: Total skipped: 20
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 47 files compared)
Checked 202 log files, 165 edm output root files, 48 DQM output files
TriggerResults: no differences found

GPU Comparison Summary

Summary:

No significant changes to the logs found
Reco comparison results: 0 differences found in the comparisons
DQMHistoTests: Total files compared: 3
DQMHistoTests: Total histograms compared: 39740
DQMHistoTests: Total failures: 23
DQMHistoTests: Total nulls: 0
DQMHistoTests: Total successes: 39717
DQMHistoTests: Total skipped: 0
DQMHistoTests: Total Missing objects: 0
DQMHistoSizes: Histogram memory added: 0.0 KiB( 2 files compared)
Checked 8 log files, 10 edm output root files, 3 DQM output files
TriggerResults: no differences found

fwyzard · 2024-04-25T06:36:41Z

Are there any use case for a default-constructed PortableCollection, other than the ROOT dictionary ?

If that is the only use case, could we make the default constructor private, and somehow declare the ROOT dictionary stuff as a friend ?

fwyzard · 2024-04-25T06:44:44Z

what should be the behavior of a 0-size PortableCollection for a SoA that has only columns?

I think that a 0-size PortableCollection should be a well-defined object, with a device and a 0-size buffer associated to it.

The reason that in general we cannot do this for a default constructed PortableCollection is that we don't know what device to use.
Though, for the host case, we do know since there is only one possibility (so far), and we could do the same ?

makortel · 2024-04-25T15:20:22Z

Are there any use case for a default-constructed PortableCollection, other than the ROOT dictionary ?

It's a good question, and I'd be tempted to answer "no". On the other hand, this issue came up with code along

// in class definition
device::EDPutToken<PortableCollection<...>> putToken_;

...

// in produce() function
if (inputVector.empty()) {
  iEvent.emplace(putToken_); // leads to default-constructed PortableCollection
}

. It's not necessarily a good use case, but it is easy to do.

If that is the only use case, could we make the default constructor private, and somehow declare the ROOT dictionary stuff as a friend ?

I don't know how easy it would be to figure out the components in ROOT that need the default constructor (some of them might even be in anonymous namespaces). In addition, there are some code paths where edm::Wrapper<T> calls the default constructor of T. In addition, since there are classes that either inherit from PortableCollection, or use it via composition, at least the default constructors of those classes must be able to call the default constructor of PortableCollection. Having the Portable{Host,Device}Collection to declare all of those as friends doesn't sound very scalable.

With @Dr15Jones we were not able to come up with hacks that would allow hiding the default constructor from user code (or even reporting with static analyzer) would not explode to our face in some way.

(in the long term it would be great to be able to move edm::Wrapper<T> to use std::optional<T> some day, but that won't be easy either)

fwyzard · 2024-04-25T16:04:33Z

I see.

Can we prevent at least default-constructed PortableDeviceCollections ?

makortel · 2024-04-25T18:41:11Z

what should be the behavior of a 0-size PortableCollection for a SoA that has only columns?

I think that a 0-size PortableCollection should be a well-defined object, with a device and a 0-size buffer associated to it.

I agree 0-size PortableCollection should be defined at least to the extent of having a device, and the recommended way of asking the size returns 0.

What would be the meaning of the 0-size buffer? Whatever happens to be returned by the underlying allocator of the backend? E.g. for malloc() that would be implementation-defined (cppreference), and for cudaMalloc() apparently returns NULL (NVIDIA forum).

Does Alpaka itself make any assumptions on whether the underlying allocator returns a nullptr or not?

Would we want a buffer object, but containing a nullptr? Or do we care?

The reason that in general we cannot do this for a default constructed PortableCollection is that we don't know what device to use. Though, for the host case, we do know since there is only one possibility (so far), and we could do the same ?

Right, I thought the host case too. Device itself is known, but the we would not be able to make the "queue association" (i.e. cached allocation). I'm not sure if that would really matter though for this corner case.

makortel · 2024-04-25T18:43:43Z

Can we prevent at least default-constructed PortableDeviceCollections ?

The additional indirection (or wrapping) via edm::DeviceProduct<T> would at least technically make it straightforward. I need to think a bit and make some tests. It would be an asymmetry between PortableHostCollection and PortableDeviceCollection, but it probably would prevent the use of the default constructor in a portable code (i.e. even if such code would compile for the Serial backend, it would not compile for GPU backends).

fwyzard · 2024-09-02T21:52:35Z

DataFormats/Portable/interface/PortableHostCollection.h

+    layout_(std::move(other.layout_)),
+    view_(std::move(other.view_))
+          {
+            other.buffer_.reset();


... and I just learned something new 🤷🏻

... and I just learned something new 🤷🏻

Just curious, what was that?

fwyzard · 2024-09-02T21:59:17Z

DataFormats/Portable/interface/PortableHostCollection.h

+    {
+      auto tmp = std::move(other.view_);
+      other.view_ = View();
+      view_ = View();
+      view_ = std::move(tmp);
+    }


it should (to be verified) safe to do just

Suggested change

{

auto tmp = std::move(other.view_);

other.view_ = View();

view_ = View();

view_ = std::move(tmp);

}

{

view_ = other.view_; // self-assignment is safe

other.view_ = View();

}

however, what about the layout_ ?

it should (to be verified) safe to do just

This is because View contains only pointers and fundamental types? Right, then I'd also expect the self-assignment to be safe.

however, what about the layout_ ?

"Whoops", I guess, thanks for catching. Then I need to think a test that would demonstrate it failing.

fwyzard · 2024-09-02T22:04:11Z

DataFormats/Portable/interface/PortableHostCollection.h

+    {
+      auto tmp = std::move(other.buffer_);
+      other.buffer_.reset();
+      buffer_.reset();


Is this reset() necessary ?

Is this reset() necessary ?

I'd think (now) it would not be necessary, and I was just following the pattern in an example https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#Rc-move-self (that uses raw pointers where the delete in the assigned-to-object's member is necessary). I'd expect the assignment on the following line

buffer_ = std::move(tmp);

to work equally well in both cases of self-assignment and assignment from a different object. Any existing value in buffer_ (be it a valid Buffer or a moved-from Buffer) should get destructed first before initializing the content of buffer_ from tmp.

fwyzard · 2024-09-02T22:07:37Z

DataFormats/Portable/test/alpaka/test_catch2_portableCollection.dev.cc

+    //coll->num() = 42;
+    //REQUIRE(coll->num() == 42);


Suggested change

//coll->num() = 42;

//REQUIRE(coll->num() == 42);

//coll_h->num() = 42;

//REQUIRE(coll_H->num() == 42);

Thanks for catching

makortel · 2024-09-11T20:49:49Z

In e4d2b53 I explored what it would entail to make PortableHostCollection's moved-from state well defined (isValid() == false, and metadata().size() returns 0). That made me wonder how much we really care of that state.

Mhm, what is the use case of being able to check the metadata().size() for an invalid portable collection ?

Thinking now (can't remember anymore what I was thinking back then), the only case coming to mind would be catching a programming errors along

class Foo : public SynchronizingEDProducer {
  std::optional<PortableCollection<FooLayout>> coll_;

  void acquire(...) {
    if (not coll_) {
      coll_ = PortableCollection<FooLayout>(...);
    }
    // access coll_
  }

  void produce(...) {
    iEvent.emplace(token_, std::move(*coll_));
  }
};

or where the aforementioned coll_ has been moved from (possibly in some complicated hard-to-catch way) before iEvent.emplace(). Hopefully most (if not all) these cases would eventually result a crash.

But one could also argue that such use-after-move cases should be caught by code review or by static analysis tools rather than defensive coding. I'm presently leaning towards this direction.

makortel · 2024-09-11T21:03:40Z

Would the interface still make sense if isValid() was private ?

If we go with

PortableDeviceCollection can not be default-constructed
PortableHostCollection can be default-constructed, in which case we can do a 0-element allocation (whose result being implementation defined for column-only SoA)
- Behavior of element access being undefined is acceptable
All use-after-move's of PortableCollections are similar programming errors as use-after-moves of anything else, and we don't need code defensively for that case

I think the isValid() and separate size() methods could be removed.

I'm not fond of the possibility that invalid collections or objects are stored, and that users have to check explicitly for their validity.

I would prefer that the collections are either valid (even if potentially with 0 size) or not produced at all (i.e. iEvent.emplace(putToken_) is not called).

I fully agree.

However, I haven't given too much thought to the possible consequences, and I could be convinced that there may be valid uses cases for storing an invalid collection.

I can only think of a programming errors like use-after-move resulting an invalid PortableCollection object.

makortel · 2024-09-11T21:48:32Z

@fwyzard Do you know how alpaka::memcpy() behaves for 0-sized buffers? (I'm wondering if the CopyTo{Host,Device}::copyAsync() should skip the memcpy() for 0-sized buffers)

fwyzard · 2024-09-12T12:51:34Z

Do you know how alpaka::memcpy() behaves for 0-sized buffers? (I'm wondering if the CopyTo{Host,Device}::copyAsync() should skip the memcpy() for 0-sized buffers)

alpaka::memcpy will not do anything for a 0-sized buffer or 0-sized copy.

fwyzard · 2024-09-12T14:08:02Z

Looking at the example at #44844 (comment) (took me a while to figure out the error) how about adding a check in Event::emplace(token, collection) that checks if the buffer is valid ?

makortel · 2024-09-12T15:17:32Z

Looking at the example at #44844 (comment) (took me a while to figure out the error) how about adding a check in Event::emplace(token, collection) that checks if the buffer is valid ?

We can't generally tell if an object being put in the event is "valid". But we could add a hook there (e.g. member function or standalone function) that the framework would call at the time of put() that would tell if the object would be valid. The hook function would be called only if it is defined (that's what we do already with post_insert()). In absence of the hook function the objects would be considered as valid by default.

I don't think we could make such a check with emplace() though. In emplace() the parameters after the PutToken are forwarded to the constructor (in the case of the example above invoking the move constructor). (well ok, maybe it would be possible to overload the emplace() for the case where the object itself is being passed on)

With put() the example above would be expressed as

class Foo : public SynchronizingEDProducer {
  std::unique_ptr<PortableCollection<FooLayout>> coll_;

  void acquire(...) {
    if (not coll_) {
      coll_ = std::make_unique<PortableCollection<FooLayout>>(...);
    }
    // access coll_
  }

  void produce(...) {
    iEvent.put(token_, std::move(coll_));
  }
};

and, given the move behavior of unique_ptr, that would work correctly, and to me the validity check doesn't look that beneficial.

fwyzard · 2024-09-12T15:51:50Z

I don't think we could make such a check with emplace() though. In emplace() the parameters after the PutToken are forwarded to the constructor (in the case of the example above invoking the move constructor). (well ok, maybe it would be possible to overload the emplace() for the case where the object itself is being passed on)

With emplace(), would it make sense to call isValid() on the object after it has been constructed ?

makortel · 2024-09-13T16:12:59Z

I don't think we could make such a check with emplace() though. In emplace() the parameters after the PutToken are forwarded to the constructor (in the case of the example above invoking the move constructor). (well ok, maybe it would be possible to overload the emplace() for the case where the object itself is being passed on)

With emplace(), would it make sense to call isValid() on the object after it has been constructed ?

Good point, at least that would be technically possible (has to be done in edm::Event, but that would have been probably best anyway). Actually, the whole validity check (of class invariants) could be implemented already today with post_insert() method, which the framework would call from both put() and emplace().

makortel · 2024-09-13T18:44:10Z

I think we need to make a decision which way to proceed:

Make PortableHostCollection default constructor to behave the same as the other constructor called with 0 elements
- The buffer_ member would not have to be std::optional anymore
- No need for buffer existence checks
Keep the present behavior of PortableHostCollection default constructor, i.e. such object does not have the buffer.
- Need to add checks on buffer existence to (at least) buffer() and zeroInitialise() (throwing an exception if buffer_ does not have a value)
- Need to implement post_insert() to ensure buffer_ is set

In both cases the moved-from state could be implemented as

The default-constructed state
- This is what the Core Guidelines recommend, unless there is "an exceptionally good reason not to" link
The same as the behavior of alpaka::Buf or std::optional

I think it would be cleanest to pick option 1 for both default constructor and move-from behavior. I could buy the "performance argument" to prefer option 2 more easily for the moved-from state (where one would clearly avoid a memory allocation) than for the default constructor (when one trades the memory allocation with conditional code). I'd expect us to encounter (much) more moved-from states than default construction.

makortel · 2024-10-30T15:26:52Z

Notes from discussion with @fwyzard

Ideally PortableHostCollection would not have default constructor
We don't have a clear consensus between options 1 and 2 ([RFC] Add tests and rudimentary protections for default-constructed PortableCollections #44844 (comment)). @fwyzard would accept proceeding in either way
We could test if it would be possible to allow not defining default constructor by having Wrapper<T>'s default constructor to call e.g. special tag-argument constructor if such exists. The tag type could be along edm::ConstructorForIO.

cmsbuild · 2024-11-22T13:11:32Z

Milestone for this pull request has been moved to CMSSW_15_0_X. Please open a backport if it should also go in to CMSSW_14_2_X.

makortel · 2024-12-03T21:46:21Z

We could test if it would be possible to allow not defining default constructor by having Wrapper<T>'s default constructor to call e.g. special tag-argument constructor if such exists. The tag type could be along edm::ConstructorForIO.

I had tested this approach, and after understanding better from Philippe how exactly ROOT does the construction of the Wrapper<T> (also in case where a sub-branch is accessed in bare ROOT), this seems like a viable way forward.

On this path, we could avoid the "IO constructor" to allocate the buffer (just to be thrown away in the ROOTReadStreamer() function), at the "cost" of keeping the buffer_ as std::optional. But in contrast to the present state of this PR, I'd be comfortable with removing the validity checks of the optional, because it would be without a value only in a very controlled code path. @fwyzard Would you agree?

fwyzard · 2024-12-05T12:55:13Z

Does #46877 go in the direction you had in mind ?

makortel · 2024-12-06T20:58:30Z

Does #46877 go in the direction you had in mind ?

Yes (thanks!). Given it, I think we can close this PR.

Add tests and rudimentary protections for default-constructed Portabl…

c30702c

…eCollections Also add tests for zero-sized PortableCollections

cmsbuild added this to the CMSSW_14_1_X milestone Apr 24, 2024

cmsbuild added pending-signatures tests-pending orp-pending code-checks-pending heterogeneous-pending labels Apr 24, 2024

makortel commented Apr 24, 2024

View reviewed changes

cmsbuild added code-checks-approved and removed code-checks-pending labels Apr 24, 2024

makortel commented Apr 24, 2024

View reviewed changes

cmsbuild added tests-started and removed tests-pending labels Apr 24, 2024

cmsbuild added tests-approved and removed tests-started labels Apr 25, 2024

This was referenced Apr 26, 2024

Initial investigation and an RFC PR cms-sw/framework-team#904

Closed

Update the RFC PR cms-sw/framework-team#907

Closed

Allow DeviceProduct to hold objects that do not have default constructor

7f2113b

fwyzard reviewed Sep 2, 2024

View reviewed changes

cmsbuild mentioned this pull request Nov 8, 2024

Fix CopyToDevice for PortableHostMultiCollection and New Test #46636

Merged

cmsbuild modified the milestones: CMSSW_14_2_X, CMSSW_15_0_X Nov 22, 2024

fwyzard mentioned this pull request Dec 5, 2024

Constrain the TriggerObject template constructors #46875

Merged

This was referenced Dec 5, 2024

Do not use default-constructed PortableCollections #46876

Merged

Let edm::Wrapper<T> use the constructor T{kUninitialized} #46877

Merged

makortel closed this Dec 6, 2024

makortel mentioned this pull request Dec 6, 2024

Improve behavior of default-constructed PortableCollection cms-sw/framework-team#898

Closed

4 tasks

[RFC] Add tests and rudimentary protections for default-constructed PortableCollections #44844

[RFC] Add tests and rudimentary protections for default-constructed PortableCollections #44844

Conversation

makortel commented Apr 24, 2024 • edited Loading

PR description:

PR validation:

If this PR is a backport please specify the original PR and why you need to backport that PR. If this PR will be backported please specify to which release cycle the backport is meant for:

cmsbuild commented Apr 24, 2024 • edited Loading

Choose a reason for hiding this comment

cmsbuild commented Apr 24, 2024

cmsbuild commented Apr 24, 2024

Choose a reason for hiding this comment

makortel commented Apr 24, 2024

makortel commented Apr 24, 2024

cmsbuild commented Apr 25, 2024

Comparison Summary

GPU Comparison Summary

fwyzard commented Apr 25, 2024

fwyzard commented Apr 25, 2024

makortel commented Apr 25, 2024

fwyzard commented Apr 25, 2024

makortel commented Apr 25, 2024

makortel commented Apr 25, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

fwyzard Sep 2, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

makortel commented Sep 11, 2024

makortel commented Sep 11, 2024

makortel commented Sep 11, 2024

fwyzard commented Sep 12, 2024

fwyzard commented Sep 12, 2024

makortel commented Sep 12, 2024

fwyzard commented Sep 12, 2024

makortel commented Sep 13, 2024

makortel commented Sep 13, 2024

makortel commented Oct 30, 2024

cmsbuild commented Nov 22, 2024

makortel commented Dec 3, 2024

fwyzard commented Dec 5, 2024

makortel commented Dec 6, 2024

makortel commented Apr 24, 2024 •

edited

Loading

cmsbuild commented Apr 24, 2024 •

edited

Loading

fwyzard Sep 2, 2024 •

edited

Loading