Improve the delete early ability of the Framework #16481

Open

Dr15Jones opened this issue Nov 7, 2016 · 23 comments

@Dr15Jones (Contributor)

The framework's present "delete data products early" system,
https://twiki.cern.ch/twiki/bin/view/CMSPublic/SWGuideFrameWorkMemoryOptimization#Deleting_Products_Early

was developed before the advent of the consumes interface. Therefore the system should check that, if deleteEarly is requested for a data product, all modules with consumes of that data product also have mayGet set.

We cannot replace mayGet with consumes because a data product is allowed to "reference" another data product via a pointer, smart pointer, edm::Ref, etc. The consumes interface knows nothing about such references and is therefore insufficient for determining when a data product can be deleted early.
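
As a minimal illustration of such a hidden dependency (Foo and Bar are hypothetical types, not real CMSSW data formats):

#include "DataFormats/Common/interface/Ref.h"

#include <vector>

struct Foo {
  float value;
};

// A module that only consumes<std::vector<Bar>>() declares no dependency on the
// Foo collection, yet dereferencing bar.foo requires the Foo collection to still
// be alive; consumes() alone therefore cannot tell the framework when the Foo
// collection may safely be deleted.
struct Bar {
  edm::Ref<std::vector<Foo>> foo;  // reference into another data product
};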

@cmsbuild (Contributor) commented Nov 7, 2016

A new Issue was created by @Dr15Jones Chris Jones.

@davidlange6, @smuzaffar, @Dr15Jones can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@Dr15Jones (Contributor, Author)

assign core

@cmsbuild (Contributor) commented Nov 7, 2016

New categories assigned: core

@Dr15Jones,@smuzaffar you have been requested to review this Pull request/Issue and eventually sign? Thanks

@Dr15Jones (Contributor, Author)

There are several additional difficulties in dealing with 'references'.

One difficulty is that the data products to which another data product refers can change from event to event. Therefore, if one were to express the dependencies at compile time, it would have to be for the entire possible set of dependencies and not for the actual set in a given event.

A second difficulty is that if one copies data from one data product into another, one might not be aware that one has created a data dependency on a third data product.

And finally, a third difficulty is that there is no practical way of having the framework check for references from one data product to another data product (or part of one). Without such enforcement there is no way to know whether such information is complete or correct.

@Dr15Jones (Contributor, Author)

One possible extension is that the produces() call could take, as an additional argument, a list of EDGetTokens expressing the other data products to which that data product refers. This would also allow the expression of indirect references, such as when one gets an edm::Ref from an AssociationMap (the EDGetToken would be to the AssociationMap, and the produces() call for the AssociationMap would in turn declare the reference to the original data product).

Unfortunately, since the framework cannot check any of this, the information could still be incomplete or just wrong.
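
As a rough sketch (purely hypothetical: the extra argument to produces() does not exist, and Foo, Bar and FooAssociationMap are illustrative types), such a declaration might look like:

// Hypothetical sketch: a producer of Bar declares, at construction time, which
// consumed products its output may refer to, via a list of (type-erased) EDGetTokens.
MyBarProducer::MyBarProducer(edm::ParameterSet const& iConfig)
    : fooToken_(consumes<std::vector<Foo>>(iConfig.getParameter<edm::InputTag>("foos"))),
      assocToken_(consumes<FooAssociationMap>(iConfig.getParameter<edm::InputTag>("assoc"))) {
  // Direct reference via fooToken_, indirect reference via the AssociationMap;
  // the producer of the AssociationMap would, in its own produces() call, declare
  // the reference to the original Foo product.
  produces<std::vector<Bar>>({fooToken_, assocToken_});
}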

@makortel (Contributor) commented Nov 7, 2016

Let me try to argue why I think moving the declarations of data->data dependencies to C++ would be beneficial, even if the statement "information could be incomplete or just wrong" would still stay true (i.e. we can still shoot ourselves in the foot, but we have plenty of opportunities for that already).

  • mightGet needs the branch name, which includes the process name. Therefore these can't be made part of the module configurations in the various cfi/cff files. The only option coming to my mind is a customize function, but that has the potential to be(come) as fragile as the old customize functions for eras.
  • Information is repeated (mightGet is essentially a copy-paste of the InputTags etc. that get consumed by a module).
  • When including the data->data degree of freedom, the dependencies form a tree. With mightGet it is up to a human to craft and maintain that dependency tree. By declaring the dependencies in C++, the job of constructing the dependency tree is elevated to the framework.

But maybe we don't need the most general solution, at least at first. E.g. for the "new offenders" added in #16202 something simple would be sufficient (the inter-product references point to the same products in all events).

@makortel (Contributor) commented Nov 9, 2016

Continuing to think out loud: for the data->data dependencies, would something along the following lines work out? (There is some similarity to the global/stream/one EDModule migration.)

  • By default all data products are considered to "possibly depend on any other product" and hence are not eligible for early delete.
  • In the produces() call, one can declare either that the product does not refer to any other product, or give the tokens of the products to which it may refer.
    • Only in this case does the data product of that EDProducer/EDFilter become eligible for early delete, and it gets deleted after the last module consuming the product (or any other product declared to refer to it).
    • The responsibility for declaring data dependencies correctly would be on the EDModule developers.

I would also include a global configuration parameter (under process.options) to enable/disable the feature.
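
A sketch of what such a produces()-side declaration might look like (noOtherProducts() and dependsOn() are hypothetical names invented here for illustration; no such chained calls exist in the framework, and TrajectorySeedCollection/TrackCandidateCollection merely serve as example product types):

// Hypothetical sketch only: the chained calls on produces() do not exist today.
MyCandidateProducer::MyCandidateProducer(edm::ParameterSet const& iConfig)
    : seedToken_(consumes<TrajectorySeedCollection>(iConfig.getParameter<edm::InputTag>("seeds"))) {
  // Case 1: the product refers to no other product -> eligible for early delete,
  // to be deleted right after its last consumer has run.
  produces<std::vector<float>>("scores").noOtherProducts();

  // Case 2: the product may refer to the seeds -> still eligible, but the seed
  // collection must be kept alive until this product has been deleted as well.
  produces<TrackCandidateCollection>().dependsOn(seedToken_);
}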

@makortel (Contributor) commented Nov 9, 2016

Just to comment on the potential: for the 2023D1 workflow with ttbar+200PU, I was able to decrease the RSS by 30-300 MB (as reported by SimpleMemoryCheck) with respect to 810pre15 out of the box, by early-deleting a bunch of intermediate tracking data products (like the seed, track candidate, and track collections of the individual iterations).

@Dr15Jones (Contributor, Author)

The threading interface change has the strong advantage that we have been developing tools to enforce that the code is actually correct. In contrast, I haven't been able to think of any way to enforce that a request for early delete is actually correct. Therefore I am very much against making this automatic, since we have no mechanism to maintain the code.

I do want to be pragmatic and make this possible IF there is no other reasonable way to achieve our memory goals.

@makortel (Contributor)

I highly appreciate all the work on tools checking correctness, and fully agree that the correctness of early delete is very difficult or impossible to verify.

One possibility to reduce memory (or to avoid an increase of memory in the context of #16202) is to go for larger EDProducers so that the temporary objects get deleted (see a concrete thought in #16202 (comment)). In other words, are the memory needs of temporary data one constraint for deciding how to best divide the necessary work among EDProducers?

@makortel (Contributor)

Taking that line of thought to its extreme: in the absence of "early delete" we should prefer "big EDProducers" to minimize the amount of memory spent on temporary-and-no-longer-needed data products.

@slava77 (Contributor) commented Nov 10, 2016

@makortel
What is the plan for the transient products?
Is it a rigid pattern where only one module B consumes the products made by A, possibly together with additional products that were already available when A ran?
In that case, possibly some formalized "module chain" could be added to the framework, with a more formalized dependency declaration.

If the use pattern can extend to, e.g., using the transient products in DQM, with possible coupling to products that are produced across a longer range of modules/sequences, then early delete is not a good solution, because it will not really help.

@makortel (Contributor)

@slava77
In the context of #16202, the added products are mostly consumed within the tracking iteration in which they were produced. "Mostly" means that in the PR this is fully the case, but we have plans to use some of the to-be-added products in later tracking iterations. Everything would still be confined to iterative tracking; there is no plan to add any of these to e.g. DQM.

@Dr15Jones (Contributor, Author)

I have been thinking about this since opening this issue (which was intended to start a discussion). In the EventSetup system we have an ESTransientHandle, which is used to allow early deletion of "read once and don't access until the next IOV" items. If all ESProducers which read that data use an ESTransientHandle, then the framework is allowed to delete that data product early. A similar idea might work for the EDM system:

One would specify that the code only needs 'transient' access to the data

   edm::TransientGetTokenT<Foo> fooToken = transientConsumes<Foo>(InputTag("foo"));

Then one requests the data using an edm::TransientHandle<>

   edm::TransientHandle<Foo> foos;
   event.getByToken(fooToken, foos);

An edm::TransientHandle<Foo> does not contain the information one needs to make an edm::Ref, and therefore by construction one can't accidentally create one.

Only if all modules declare transientConsumes would the data product be eligible for early deletion. The change of API also makes it easier to do maintenance on the code, since there is a very explicit policy the module has agreed to follow (i.e. not propagating 'pointers' from the data product).
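
Putting the pieces together, a consumer under this proposal might look like the following sketch (TransientGetTokenT, transientConsumes and TransientHandle are the proposed, not yet existing, API; Foo is an illustrative type):

// Sketch of a module restricted to transient access of the "foo" product.
class FooConsumer : public edm::stream::EDAnalyzer<> {
public:
  explicit FooConsumer(edm::ParameterSet const&)
      : fooToken_(transientConsumes<Foo>(edm::InputTag("foo"))) {}

  void analyze(edm::Event const& event, edm::EventSetup const&) override {
    edm::TransientHandle<Foo> foos;
    event.getByToken(fooToken_, foos);
    // Use *foos locally. The handle exposes no ProductID/provenance, so an
    // edm::Ref to the product cannot be built here, and no pointer should be
    // propagated out of this call. Once every consumer of "foo" declares
    // transientConsumes, the framework may delete the product right after the
    // last of them has run.
  }

private:
  edm::TransientGetTokenT<Foo> fooToken_;
};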

@makortel (Contributor)

@Dr15Jones
Nice idea, I think such a mechanism would be sufficient to get rid of all added temporary products in #16202.

But to toy with the idea a bit further: since one would be forbidden to create a reference (edm::Ref explicitly, raw pointer implicitly) to the object held by an edm::TransientHandle<Foo>, the system would not support "early deleting" any product that is referred to by anything else, right?

Would something like edm::TransientRef make any sense? (It should be as close to a bare pointer as possible, with some mechanism to inform the referred-to EDProduct that the last reference into it has been destructed; sounds a bit like shared_ptr.)

I don't have any clear use cases in mind right now for the "data->data" dependency case, but I think it would be interesting to figure out whether such a thing is possible.

@Dr15Jones (Contributor, Author)

So if a TransientRef were used, then every data product which held such a reference would have to be accessed via a TransientHandle (else the data product holding the TransientRef would not be deleted early).

@makortel (Contributor)

Such a "restriction" makes perfect sense to me. But how "heavy" a mechanism would be needed to enforce it? I guess the TransientHandle would basically need to know about all the TransientRefs the data product is holding. (Just thinking that if the necessary machinery becomes too heavy/complex, maybe it's not worth it, at least before there is a real need.)

@makortel (Contributor)

@Dr15Jones
What would happen to data products that are never read? Would they be deleted quickly or kept alive until the "end of an event"?

I'm raising the question because I realized that two producers in #16202 put products into the event that are never read (but which are referred to, via raw pointers, by products that are read).

@makortel (Contributor) commented Jan 28, 2019

An alternative to the TransientHandle/TransientRef etc. would be a traits template along the lines of

#include <set>
#include <vector>

#include "DataFormats/Provenance/interface/ProductID.h"

namespace edm {
  // By default an empty definition
  template <typename T>
  struct FillReferencesToExternalDataProduct {};

  // Specialize for all types that could be deleted early.
  // E.g. for a generic vector we can only loop over it.
  // Can do better with more information (e.g. for edm::RefVector)
  // (std::set is used here only as an example, the actual implementation would likely be different)
  template <typename E>
  struct FillReferencesToExternalDataProduct<std::vector<E>> {
    static void fill(const std::vector<E>& vec, std::set<ProductID>& refsTo) {
      for (const auto& elem : vec) {
        FillReferencesToExternalDataProduct<E>::fill(elem, refsTo);
      }
    }
  };

  // For types that we know won't refer to anything else, the filling function would be empty
  template <>
  struct FillReferencesToExternalDataProduct<int> {
    static void fill(int, std::set<ProductID>&) {}
  };

  // Could add some macro magic to shorten the specializations
}

For types for which the template has been specialized, Event::put() or Event::emplace() would then use the ProductIDs to record the data dependencies. Types without a specialization would be assumed to refer to all input products and to all other output products of the producer.

In addition, to support early deletion for types for which the referred-to ProductIDs cannot be (easily) obtained from the product itself (e.g. raw pointers), an Event::putWithReferences(...)/Event::emplaceWithReferences(...) would be added that also takes the set of ProductIDs as an argument.
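
For concreteness, a sketch of a specialization for a product type holding an edm::Ref, together with the hypothetical putWithReferences() for the raw-pointer case (Foo and Bar are illustrative types, with Bar holding an edm::Ref into a Foo collection):

namespace edm {
  template <>
  struct FillReferencesToExternalDataProduct<Bar> {
    static void fill(Bar const& bar, std::set<ProductID>& refsTo) {
      refsTo.insert(bar.foo.id());  // an edm::Ref knows the ProductID it points into
    }
  };
}

// For a product that refers to the Foo collection only via raw pointers, the
// producer would state the dependency explicitly at put time via the hypothetical
// API mentioned above:
//   event.putWithReferences(std::move(product), {fooHandle.id()});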

@makortel (Contributor) commented Jan 2, 2020

Here is an additional complication that came to my mind in the context of GPUs/accelerators/offloading.

In the current CUDA tooling (#28537, used in patatrack) it is technically possible that a non-ExternalWork EDProducer produces a CUDAProduct<T> such that the asynchronous work has not finished by the time the EDProducer's produce() call finishes (the product thus resembles a std::future). Further work will either be queued on the same CUDA stream, or cudaStreamWaitEvent() is used to synchronize (join) with another CUDA stream. My expectation is that in the end the last operation on every CUDA stream is a cudaStreamAddCallback() from an ExternalWork module (either directly or via another CUDA stream and cudaStreamWaitEvent()), and therefore all asynchronous work of an event would be completed by the time the data products are destroyed (at "the end of the event").

On the other hand, nothing in the framework (or #28537) forces every CUDA stream to be synchronized at event boundaries. Therefore, as a simple protection, #28537 calls cudaEventSynchronize() in the destructor of the CUDA products. But that call blocks the CPU thread until the asynchronous work has finished. The blocking behavior of a data product destructor is barely acceptable for end-of-event destruction, given that "it should never happen in any real workflow", but it would be unacceptable for the early deletion of e.g. an unused product.

A possible solution could be to create a WaitingTaskWithArenaHolder holding the task "for emitting the postEvent signal" (or similar), and pass that to the data product just before its destruction. This way a future-like data product could queue the necessary operation(s) to signal the WaitingTaskWithArenaHolder when all asynchronous work launched for that event has finished.
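
A sketch of what that could look like for a future-like product; the notifyWhenDone() hook and MyAsyncProduct are hypothetical, while cudaStreamAddCallback() and WaitingTaskWithArenaHolder::doneWaiting() are existing APIs:

// Instead of blocking in the destructor with cudaEventSynchronize(), the framework
// would hand the product a holder to be signaled once all work queued on its CUDA
// stream has finished.
class MyAsyncProduct {
public:
  void notifyWhenDone(edm::WaitingTaskWithArenaHolder holder) {
    auto* h = new edm::WaitingTaskWithArenaHolder(std::move(holder));
    cudaStreamAddCallback(
        stream_,
        [](cudaStream_t, cudaError_t, void* data) {
          auto* holderPtr = static_cast<edm::WaitingTaskWithArenaHolder*>(data);
          holderPtr->doneWaiting(std::exception_ptr{});  // all earlier work on the stream is done
          delete holderPtr;
        },
        h,
        0);
  }

private:
  cudaStream_t stream_;  // the stream on which the asynchronous work was queued
};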

@Dr15Jones (Contributor, Author)

One additional piece of information we would need is whether, when the originating EDProducer makes multiple data products, some of those data products refer to each other, either via a Ref or via bare pointers.
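
For example (with Foo and Bar again as illustrative types), a single producer can put two collections where the second holds Refs into the first, so the two products cannot be treated independently for early deletion:

void MyPairProducer::produce(edm::Event& event, edm::EventSetup const&) {
  auto foos = std::make_unique<std::vector<Foo>>(10);
  auto fooHandle = event.put(std::move(foos));  // edm::OrphanHandle<std::vector<Foo>>

  auto bars = std::make_unique<std::vector<Bar>>();
  bars->push_back(Bar{edm::Ref<std::vector<Foo>>(fooHandle, 0)});  // Bar refers to Foo #0
  event.put(std::move(bars));
}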

@fwyzard (Contributor) commented Jun 11, 2020

IIUC, the problem with deleting products before the end of the Event processing is that

  • products may be explicitly consumed by a module - which the framework can track and take into account;
  • products may be accessed via an edm::Ref, edm::Ptr, etc. that are part of other products - which the framework currently cannot know about.

So, a possibly silly suggestion...

Would it help to add reference counting to the product "branch", and make edm::RefProd, edm::Ref, edm::Ptr etc. increment/decrement it as needed?
This would let the framework know when a product is no longer accessible, and thus can safely be deleted.

Unless what we want to do is to also delete products that do have live references to them, if the modules have somehow declared they do not intend to use those references?
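
For illustration, a minimal sketch (not framework code) of what per-branch counting could look like, which also shows where the overheads discussed in the reply below come from:

#include <atomic>

// Each product "branch" would own an atomic counter, and every live Ref/Ptr-like
// object into that branch would increment/decrement it.
struct BranchRefCount {
  std::atomic<unsigned int> liveRefs{0};
};

class CountedRef {  // stand-in for an edm::Ref/edm::Ptr with counting added
public:
  explicit CountedRef(BranchRefCount* count) : count_(count) { ++count_->liveRefs; }
  CountedRef(CountedRef const& other) : count_(other.count_) { ++count_->liveRefs; }
  CountedRef& operator=(CountedRef const&) = delete;  // omitted for brevity
  ~CountedRef() { --count_->liveRefs; }

private:
  // One extra pointer per reference (+8 bytes), touched on every construction,
  // copy and destruction. When liveRefs reaches zero and no module still consumes
  // the branch, the framework could delete the product.
  BranchRefCount* count_;
};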

@makortel (Contributor)

  • products may be accessed via an edm::Ref, edm::Ptr, etc. that are part of other products - which the framework currently cannot know about.

This is indeed the main challenge (where etc. may also include raw pointers in transient products).

Would it help to add reference counting to the product "branch", and make edm::RefProd, edm::Ref, edm::Ptr etc. increment/decrement it as needed?

I'm a bit concerned about the overheads. Every Ref etc. construction/copy/destruction would have to touch the reference count. Each Ref etc. would also have to hold a pointer to the counter (size increases by 8 bytes), or we would have to make the reference counts a singleton (which I'd rather avoid). All RefVector, PtrVector, AssociationMap etc. return newly created Ref/Ptr objects, and the counting is unnecessary unless the Refs etc. are stored in new products. In that sense the per-Ref counting goes into too much detail (it would be sufficient to touch the reference counts per referring product instead of per referring element).

Using Refs as handles to the reference counts would also not solve the cases where raw pointers are used (ok, in principle such EDProducts could be required to hold a RefProd or RefCore to the EDProduct(s) the raw pointers point to; e.g. #16481 (comment) would add a similar requirement).

Considering a possible deployment, being able to enable the early deletion gradually would be nice, in order to discover any "crazy stuff" in small pieces.

Unless what we want to do is to also delete products that do have live references to them, if the modules have somehow declared they do not intend to use those references?

After a quick thought, that sounds a bit complicated and error-prone to me.
