Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SetClusterPrefetch(true) breaks BulkIO with more than one basket #8962

Open
ktf opened this issue Sep 6, 2021 · 12 comments · Fixed by #11501 · May be fixed by #16640
Open

SetClusterPrefetch(true) breaks BulkIO with more than one basket #8962

ktf opened this issue Sep 6, 2021 · 12 comments · Fixed by #11501 · May be fixed by #16640
Assignees
Milestone

Comments

@ktf
Copy link
Contributor

ktf commented Sep 6, 2021

Describe the bug

Doing:

#include <TBufferFile.h>
#include <TFile.h>
#include <TTree.h>
#include <iostream>

void bar() {
  auto f = TFile::Open("/Users/ktf/Downloads/AO2D-2.root");
  char const *dfs[1] = {"DF_2449800039997798139;1"};
  
  for (auto di = 0; di < 1; di++) {
    auto df = dfs[di];
    auto d = (TDirectoryFile*)f->Get(df);
    
    std::cout << "Reading DF " << df  << std::endl;
    char const *treeNames[4] = {"O2bc", "O2collision", "O2fdd", "O2ft0"};

    for (auto ti = 0; ti < 4; ti++) {
      std::cout << ti << std::endl;
      std::cout << treeNames[ti] << std::endl;
      auto t = (TTree*)d->Get(treeNames[ti]);
      t->SetClusterPrefetch(true);
      auto branchList = t->GetListOfBranches();
      auto e = t->GetEntries();
      for (auto bi = 0; bi < branchList->GetEntries(); ++bi) {
        auto b = (TBranch*)branchList->At(bi);
        assert(b);
        int pos = 0;
        while (pos < e) {
          b->Print();
          TBufferFile buf(TBuffer::EMode::kWrite, 32*1024);
          auto &r = b->GetBulkRead();
          auto s = r.GetBulkEntries(0, buf);
          pos += s;
          std::cout << "Read " << s << " elements " << std::endl;
        }
      }
    }
  }
}

crashes with:

[22971:internal-dpl-aod-reader_t0]: Fatal: fExtraBasket == nullptr && "fExtraBasket should have been set to nullptr by GetFreshBasket" violated at line 1523 of `/Users/ktf/src/sw/SOUR
CES/ROOT/ad1ddb8593/0/tree/tree/src/TBranch.cxx'

if one of the branches has more than one basket. Removing t->SetClusterPrefetch(true) fixes the issue. Notice how it still works fine with branches which have only one basket. I can provide a file to reproduce if needed.

Expected behavior

I would expect it to work.

Setup

Vanilla 6.24.02 on linux or macOS.

@Axel-Naumann
Copy link
Member

Axel-Naumann commented Sep 7, 2021

@ktf how critical is this for ALICE?

@ktf
Copy link
Contributor Author

ktf commented Sep 7, 2021

With the current benchmarking we have, turning it off does not seem to alter performance much, so I would say it's not critical to have it fixed by "yesterday". However it would be nice if a fix could make the 6.24.08 timeline.

That said I reserve the right to bump the priority if further benchmarks show that SetClusterPrefech(true) gives a significant gain.

@ktf
Copy link
Contributor Author

ktf commented Sep 19, 2022

Out of curiosity, will this be fixed for 6.26?

@pcanal
Copy link
Member

pcanal commented Sep 20, 2022

It did not make it. I'll try to give a try ....

@ktf
Copy link
Contributor Author

ktf commented Sep 20, 2022

Of course I meant 6.28

@pcanal
Copy link
Member

pcanal commented Sep 20, 2022

Ah ;) so the likelihood increased exponentially :)

@Axel-Naumann Axel-Naumann added this to the 6.28/00 milestone Sep 21, 2022
@Axel-Naumann Axel-Naumann added the experiment Affects an experiment / reported by its software & computimng experts label Sep 21, 2022
@ktf
Copy link
Contributor Author

ktf commented Sep 21, 2022

Read my lips, no new taxes. ;-)

pcanal added a commit to pcanal/root that referenced this issue Oct 5, 2022
This fixes root-project#8962

The code pattern is similar to 0987896

  Add infrastructure for sharing memory in a TBuffer.
@pcanal
Copy link
Member

pcanal commented Oct 5, 2022

@ktf Can you try #11501 ?

pcanal added a commit that referenced this issue Oct 11, 2022
This fixes #8962

The code pattern is similar to 0987896

  Add infrastructure for sharing memory in a TBuffer.
pcanal added a commit that referenced this issue Oct 11, 2022
This fixes #8962

The code pattern is similar to 0987896

  Add infrastructure for sharing memory in a TBuffer.
pcanal added a commit that referenced this issue Oct 11, 2022
This fixes #8962

The code pattern is similar to 0987896

  Add infrastructure for sharing memory in a TBuffer.
pcanal added a commit that referenced this issue Oct 11, 2022
This fixes #8962

The code pattern is similar to 0987896

  Add infrastructure for sharing memory in a TBuffer.
@ktf
Copy link
Contributor Author

ktf commented Oct 4, 2024

This is actually still buggy and we now have a use case where not having it brings down the storage of some site, due to the excessive number of IOPS per server (50 IOPS, 720 concurrent accesses to 3 storage servers for a total of 0.2PB of data being read). Enabling this would allow us to reduce the IOPS by a factor ~5. The actual error that we get by enabling it is:

[1923904:internal-dpl-aod-reader]: Fatal: fExtraBasket == nullptr && "fExtraBasket should have been set to nullptr by GetFreshBasket" violated at line 1679 of `/local/workspace/DailyBuilds/DailyO2Physics-slc9/daily-tags.nRQdop69vk/SOURCES/ROOT/v6-32-06-alice1/v6-32-06-alice1/tree/tree/src/TBranch.cxx'
[1923904:internal-dpl-aod-reader]: aborting

I can provide some file which has the issue, if needed. This becomes rather urgent for us now.

@ktf ktf reopened this Oct 4, 2024
@pcanal
Copy link
Member

pcanal commented Oct 4, 2024

Yes, please point us to one of the file with the unexpected behavior.

@ktf
Copy link
Contributor Author

ktf commented Oct 4, 2024

By mail.

@ktf
Copy link
Contributor Author

ktf commented Oct 8, 2024

The testcase in the description is actually wrong, it only reads starting from the first entry. A corrected / expanded version is:

#include <TBufferFile.h>
#include <TFile.h>
#include <TTree.h>
#include <iostream>
#include <TKey.h>

void test() {
  auto f = TFile::Open("AO2D.root");

  for (auto&& keyAsObj : *f->GetListOfKeys()){
    auto key = (TKey*) keyAsObj;
    std::cout << key->GetName() << " " << key->GetClassName() << std::endl;
    auto d = (TDirectoryFile*)f->Get(key->GetName());

    if (key->GetName() == std::string("metaData")) {
      continue;
    }
    std::cout << "Reading DF " << key->GetName() << std::endl;

    char const *treeNames[] = {
      "O2trackextra_001",
      //"O2ambiguousfwdtr",
      //"O2ambiguousmfttr",
      //"O2ambiguoustrack",
      //"O2bc_001",
      //"O2calo",
      //"O2calotrigger",
      //"O2cascade_001",
      //"O2collision_001",
      //"O2cpvcluster",
      //"O2decay3body",
      //"O2fdd_001",
      //"O2ft0",
      //"O2fv0a",
      //"O2fwdtrack",
      //"O2fwdtrackcov",
      //"O2fwdtrkcl",
      //"O2hmpid_001",
      //"O2mccalolabel_001",
      //"O2mccollision",
      //"O2mccollisionlabel",
      //"O2mcfwdtracklabel",
      //"O2mcmfttracklabel",
      //"O2mcparticle_001",
      //"O2mctracklabel",
      //"O2mfttrack_001",
      //"O2origin",
      //"O2track_iu",
      //"O2trackcov_iu",
      //"O2tracked3body",
      //"O2trackedcascade",
      //"O2trackedv0",
      //"O2trackqa",
      //"O2v0_002",
      //"O2zdc_001",
    };

    for (auto ti = 0; ti < sizeof(treeNames)/sizeof(void*); ti++) {
      std::cout << ti << std::endl;
      std::cout << "  " << treeNames[ti] << std::endl;
      auto t = (TTree*)d->Get(treeNames[ti]);
      std::cout << t << std::endl;
      if (t == nullptr) {
        std::cout << "Missing branch." << treeNames[ti] << std::endl;
      }
      t->SetCacheSize(25000000);
      t->SetClusterPrefetch(true);
      auto branchList = t->GetListOfBranches();
      for (size_t bi = 0; bi < branchList->GetSize(); bi++) {
        t->AddBranchToCache((TBranch*)branchList->At(bi));
      }
      t->StopCacheLearningPhase();
      static TBufferFile buf(TBuffer::EMode::kWrite, 4*1024*1024);
      for (auto bi = 0; bi < branchList->GetEntries(); ++bi) {
        auto b = (TBranch*)branchList->At(bi);
        auto e = t->GetEntries();
        assert(b);
        int pos = 0;
        while (pos < e) {
          //b->Print();
          buf.Reset();
          auto &r = b->GetBulkRead();
          auto s = r.GetBulkEntries(pos, buf);
          pos += s;
          std::cout << "      " << b->GetName() << ": " << s << " elements read."  << std::endl;
          if (s == -1) {
            break;
          }
        }
        std::cout << "      " << b->GetName() << ": " << pos << " total."  << std::endl;
        b->SetStatus(false);
        b->DropBaskets("all");
        b->Reset();
        b->GetTransientBuffer(0)->Expand(0);
      }
    }
    break;
  }
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants