Test RooGradMinimizer #11

Open
egpbos opened this issue Oct 30, 2017 · 39 comments
Comments

@egpbos
Member

egpbos commented Oct 30, 2017

After #10 is done, we should write a test comparing optimization of a RooGaussian pdf via a regular RooMinimizer and maybe also RooGaussMinimizer to doing the same with RooGradMinimizer. We must get exactly the same results given the same dataset and initial parameter values.

Run with a few different tests (1D, multi-dim; small, big workspaces) to be sure it works reliably.

@egpbos
Member Author

egpbos commented Oct 30, 2017

To summarize where we left off in #10, some first runs of the comparison of Minuit2 (via RooMinimizer) and RooGradMinimizer in GradMinimizer.cpp showed:

  • Likelihood values are typically equal in the first 14 digits
  • Parameter of interest mu is usually equal in its first 4 digits
  • Limits on mu are always a factor $\sqrt{2}$ higher in the RooGradMinimizer case
  • EDM is off by sometimes orders of magnitude, but always very small in both cases
  • Number of reported function calls is too low in RooGradMinimizer (in reality it calls the function one time less than Minuit2 does).

One additional observation from a few more test runs is that the RooGradMinimizer sometimes does not converge in the same number of function calls! Sometimes it takes a lot more with some warnings like:

Info in <Minuit2>: DavidonErrorUpdator: delgam < 0 : first derivatives increasing along search line
Info in <Minuit2>: VariableMetricBuilder: matrix not pos.def, gdel > 0
Info in <Minuit2>: gdel = 1.37581e+07
Info in <Minuit2>: gdel = -9.45864e+08
Info in <Minuit2>: DavidonErrorUpdator: delgam < 0 : first derivatives increasing along search line
Info in <Minuit2>: VariableMetricBuilder: matrix not pos.def, gdel > 0
Info in <Minuit2>: gdel = 2052
Info in <Minuit2>: gdel = -153427

or even

Info in <Minuit2>: DavidonErrorUpdator: delgam < 0 : first derivatives increasing along search line
Info in <Minuit2>: VariableMetricBuilder: matrix not pos.def, gdel > 0
Info in <Minuit2>: gdel = 2.59465e+07
Info in <Minuit2>: gdel = -1.96343e+09
Info in <Minuit2>: VariableMetricBuilder: no improvement in line search
Info in <Minuit2>: VariableMetricBuilder: iterations finish without convergence.
Info in <Minuit2>: VariableMetricBuilder : edm = 11495
Info in <Minuit2>:             requested : edmval = 2e-05
Info in <Minuit2>: VariableMetricBuilder: INVALID function minimum - edm is above tolerance, : edm = 2877.43
Info in <Minuit2>: VariableMetricBuilder: Required tolerance  is 10 x edmval  : edmval = 2e-05
Info in <Minuit2>: Minuit2Minimizer::Minimize : Minimization did NOT converge, Edm is above max
Minuit2Minimizer : Invalid Minimum - status = 3

Find out what causes these differences! Mainly:

  • $\sqrt{2}$ difference in POI limits
  • POI difference > 4 digits
  • EDM different order of magnitude
  • DavidonErrorUpdator warning
  • VariableMetricBuilder line search warning

@egpbos
Member Author

egpbos commented Nov 1, 2017

Tried replacing eps2 with 2*sqrt(eps); this didn't change any of the above points. Also tried replacing eps with the MnMachinePrecision version (which differs by a factor 4 from std::numeric_limits<double>::epsilon()), but that doesn't change the above points either.

This was the last remaining difference with the Minuit2 implementation that I found, so there must be something that I missed.

@egpbos
Member Author

egpbos commented Nov 1, 2017

Checked the test script by setting the boundaries on the POI very high, so that the Minuit-internal and -external values should effectively be equal (because of the sin/asin transformation that removes the boundaries in internal parameter space). This uncovered very odd behavior: the POI jumps to large values in the GradMinimizer at the first step. This is possibly caused by a wrong initial gradient.

  • Compare GradMinimizer initial gradient values to those of Minuit initial gradient

@egpbos
Member Author

egpbos commented Nov 1, 2017

Clearly this is an issue:

initial gradient minuit:
fGrd[0] = 1e+06	fG2[0] = 1e+12	fGstep[0] = 1e-07
our initial gradient:
fGrd[0] = 10	fG2[0] = 100	fGstep[0] = 0.01
  • Find out why they are different
  • Make them the same :P

@egpbos
Member Author

egpbos commented Nov 1, 2017

In fact, the above was with mu[0,-100000,100000]. If we set mu[0,-10,10] we get:

fGrd[0] = 95.7004	fG2[0] = 9158.57	fGstep[0] = 0.00104493

for Minuit, whereas ours stays exactly the same.

egpbos added a commit that referenced this issue Nov 1, 2017
This gets rid of the sqrt(2) difference in the POI limits, see #11 (comment)
@egpbos
Member Author

egpbos commented Nov 1, 2017

By the way, the factor sqrt(2) in the POI limits was fixed by adding setErrorLevel to RooGradMinimizer.
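
A plausible reading of this fix, offered as an assumption rather than something stated in the thread: Minuit defines a parameter error as the distance from the minimum at which the function has risen by the error level `up`. For a negative log-likelihood the appropriate value is 0.5, while 1 is the value suited to a chi-square. Applying 1 to a likelihood inflates every error by $\sqrt{2}$. The toy parabola below illustrates this; `toy_nll` and `parabolic_error` are hypothetical helper names.

```cpp
#include <cassert>
#include <cmath>

// Negative log-likelihood of one Gaussian measurement, up to a constant:
// nll(mu) = (mu - mu_hat)^2 / (2 sigma^2). Hypothetical toy function.
double toy_nll(double mu, double mu_hat, double sigma) {
    assert(sigma > 0.0);
    const double pull = (mu - mu_hat) / sigma;
    return 0.5 * pull * pull;
}

// Distance from the minimum at which the toy NLL has risen by `up`.
// Solving 0.5 * (d / sigma)^2 = up for d gives sigma * sqrt(2 * up).
double parabolic_error(double sigma, double up) {
    assert(sigma > 0.0 && up > 0.0);
    return sigma * std::sqrt(2.0 * up);
}
```

With `sigma = 0.1`, `parabolic_error(0.1, 0.5)` returns the expected one-sigma error of 0.1, whereas `parabolic_error(0.1, 1.0)` is larger by a factor $\sqrt{2}$.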

@egpbos
Member Author

egpbos commented Nov 1, 2017

We had set the error on mu to 0.1 manually. When this is not done, it apparently defaults to 2 (for boundaries -10,10) and the initial gradient changes to

fGrd[0] = 0.5	fG2[0] = 0.25	fGstep[0] = 0.2

@egpbos
Member Author

egpbos commented Nov 1, 2017

When we then set mu[0,-1000,1000], it becomes

fGrd[0] = 0.005	fG2[0] = 2.5e-05	fGstep[0] = 0.5

So now it apparently is dependent on the boundaries too.
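
The values quoted in the last few comments are consistent with a simple seeding heuristic that derives everything from the parameter error and the error level. The sketch below is an assumption, checked only against the numbers in this thread; it is not a copy of Minuit2's InitialGradientCalculator. The names `InitialGradient`, `initial_gradient` and `use_transform` are hypothetical, the error level is taken to be 0.5, and the machine-precision constants follow the eps and eps2 conventions mentioned earlier in the thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>

struct InitialGradient {
    double grd;    // seed for the first derivative
    double g2;     // seed for the second derivative
    double gstep;  // seed for the numerical-derivative step size
};

// Sine transformation for a doubly bounded parameter (assumed form).
double ext2int(double ext, double lower, double upper) {
    assert(lower < upper);
    return std::asin(2.0 * (ext - lower) / (upper - lower) - 1.0);
}

// Seed derived from the parameter error only; no function calls are needed.
// With use_transform == false the limits only clamp the step, which is
// what "our" derivator effectively did at this point in the thread.
InitialGradient initial_gradient(double x, double error, double up,
                                 bool has_limits, double lower, double upper,
                                 bool use_transform) {
    const double eps = 4.0 * std::numeric_limits<double>::epsilon();
    const double eps2 = 2.0 * std::sqrt(eps);

    // Move one error up and down in external space, staying inside the limits.
    double var = x;
    double var_plus = has_limits ? std::min(x + error, upper) : x + error;
    double var_minus = has_limits ? std::max(x - error, lower) : x - error;
    if (has_limits && use_transform) {
        var = ext2int(var, lower, upper);
        var_plus = ext2int(var_plus, lower, upper);
        var_minus = ext2int(var_minus, lower, upper);
    }

    const double gsmin = 8.0 * eps2 * (std::fabs(var) + eps2);
    const double dirin = std::max(
        0.5 * (std::fabs(var_plus - var) + std::fabs(var_minus - var)), gsmin);
    const double g2 = 2.0 * up / (dirin * dirin);
    double gstep = std::max(gsmin, 0.1 * dirin);
    if (has_limits) gstep = std::min(gstep, 0.5);
    return {g2 * dirin, g2, gstep};
}
```

Under these assumptions, `initial_gradient(0.0, 0.1, 0.5, true, -10.0, 10.0, false)` gives approximately fGrd = 10, fG2 = 100, fGstep = 0.01, and an error of 2 gives 0.5, 0.25 and 0.2. An error of 200 with limits of ±1000 gives 0.005, 2.5e-05 and a step capped at 0.5 (the error of 200 is inferred from those numbers, not stated in the thread). With `use_transform = true`, an error of 0.1 and limits of ±100000, the result is approximately 1e+06, 1e+12 and 1e-07, matching the Minuit output quoted earlier.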

@egpbos
Member Author

egpbos commented Nov 1, 2017

Perhaps after all I do need to do the int2ext and ext2int conversions. Apparently it influences the initial gradient in Minuit2 significantly, which maybe it should, so that the first step doesn't immediately go out of bounds.

  • Implement int2ext and ext2int conversions in NumericalDerivatorMinuit2
  • Apply them in the initial gradient calculator
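
For reference, a minimal sketch of what these conversions look like for a parameter bounded on both sides, assuming the usual sine transformation. The function names mirror the ones used in this thread but are written here as free functions for illustration, and the boundary handling a real implementation needs near the limits is left out.

```cpp
#include <cassert>
#include <cmath>

// Map an unbounded internal value onto the external interval [lower, upper].
double int2ext(double internal, double lower, double upper) {
    assert(lower < upper);
    return lower + 0.5 * (upper - lower) * (std::sin(internal) + 1.0);
}

// Inverse mapping, valid for external values inside the limits.
double ext2int(double external, double lower, double upper) {
    assert(lower < upper && external >= lower && external <= upper);
    return std::asin(2.0 * (external - lower) / (upper - lower) - 1.0);
}

// Derivative d(external)/d(internal), the Jacobian of the transformation.
double dint2ext(double internal, double lower, double upper) {
    assert(lower < upper);
    return 0.5 * (upper - lower) * std::cos(internal);
}
```

Because the sine is bounded, any internal value maps to an external value inside the limits, so a step taken in internal space cannot leave the allowed range.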

@egpbos
Member Author

egpbos commented Nov 1, 2017

Moving NumericalDerivatorMinuit2 from ROOT::Math to RooFit, since I really need to link with Minuit2 for the int2ext/ext2int conversions (and it also allows me to clean up the precision thing).

egpbos added a commit that referenced this issue Nov 2, 2017
@egpbos
Copy link
Member Author

egpbos commented Nov 2, 2017

Unfortunately, the initial gradient does not fix the differences in EDM and POI and also the number of function calls is still different.

@egpbos
Member Author

egpbos commented Nov 2, 2017

I think the problem is that Minuit2 also transforms the gradient from external to internal coordinates using DInt2Ext in AnalyticalGradientCalculator::operator().

  • Implement an inverse DExt2Int function.
  • Transform our initial gradient (which is now in internal coordinates) to external coordinates using DExt2Int.

@egpbos
Member Author

egpbos commented Nov 6, 2017

Tried dividing out the DInt2Ext factor in the initial gradient, but this doesn't help. In fact, it hardly changes things.

When printing out s0 at the beginning of the do-while loop of Minuit2::VariableMetricBuilder::Minimum(), the initial values are indeed quite far off from what they are in the Minuit2 case. For Minuit2 the first s0 is:

minimum function Value: 16138.46645139
minimum edm: 2069.61683815
minimum internal state vector: LAVector parameters:
    -0.2088278458461

minimum internal Gradient vector: LAVector parameters:
      -18447.1067083

minimum internal covariance matrix: LASymMatrix parameters:
  2.4327273e-05

For our derivator without the DInt2Ext correction the first s0 is:

minimum function Value: 56225.63481053
minimum edm: 1096476.547189
minimum internal state vector: LAVector parameters:
     -1.311874784789

minimum internal Gradient vector: LAVector parameters:
     -21982.38636908

minimum internal covariance matrix: LASymMatrix parameters:
   0.0090763172

For our derivator with the DInt2Ext correction the first s0 is:

minimum function Value: 56049.10221712
minimum edm: 1092248.220251
minimum internal state vector: LAVector parameters:
     -1.311874784789

minimum internal Gradient vector: LAVector parameters:
     -21939.96024556

minimum internal covariance matrix: LASymMatrix parameters:
   0.0090763172

@egpbos
Member Author

egpbos commented Nov 6, 2017

It seems like the initial s0 comes from seed.State() in the top-level VariableMetricBuilder::Minimum(). Why this differs is unclear, but already in this top-level function, at creation time, it is different, as can be seen when setting printLevel to 2.

In Minuit2:

trying nominal calculation
Minuit2Minimizer: Minimize with max-calls 500 convergence for edm < 1 strategy 0
MnSeedGenerator: for initial parameters FCN = 56023.82843258
fGrd[0] = 14.8443	fG2[0] = 220.354	fGstep[0] = 0.00673659
MnSeedGenerator: Initial state:   - FCN =   56023.82843258 Edm =     -3102.62 NCalls =      5
MnSeedGenerator: Negative G2 found - new state:   - FCN =   16164.38458147 Edm =      2053.17 NCalls =     14
VariableMetric: start iterating until Edm is < 0.001
VariableMetric: Initial state   - FCN =   16164.38458147 Edm =      2053.17 NCalls =     14

In our derivator:

trying GradMinimizer
Minuit2Minimizer: Minimize with max-calls 500 convergence for edm < 0.01 strategy 0
fGrd[0] = 14.8443	fG2[0] = 220.354	fGstep[0] = 0.00673659
INTERNAL: fGrd[0] = 14.8443	fG2[0] = 220.354	fGstep[0] = 0.00673659
EXTERNAL: fGrd[0] = 19.3257	fG2[0] = 220.354	fGstep[0] = 0.00673659
VariableMetric: start iterating until Edm is < 1e-05
VariableMetric: Initial state   - FCN =   56023.82843258 Edm =  1.09004e+06 NCalls =      1

Actually, this shows a couple of differences:

  • VariableMetric edm target is different by a factor 100!
  • MnSeedGenerator does a lot of stuff, apparently, without going through the Numerical2PGradientCalculator!

But one important thing to notice: the initial FCN value is equal after all, just not anymore when the process reaches VariableMetricBuilder::Minimum().

@egpbos
Member Author

egpbos commented Nov 6, 2017

There are two MnSeedGenerator::operator() overloads: one for generic GradientCalculators and one for AnalyticalGradientCalculators. Our GradMinimizer uses the latter.

@egpbos
Member Author

egpbos commented Nov 6, 2017

InitialGradientCalculator is called from MnSeedGenerator::operator() for AnalyticalGradientCalculators... Did I even need to initialize it at all in our derivator?!

@egpbos
Member Author

egpbos commented Nov 6, 2017

The two operator() functions do almost the same things, except:

  • There's an if(gc.CheckGradient()) block in the AnalyticalGradientCalculator overload, which does some checking and aborts if the checks don't pan out.

  • The block below if(ng2ls.HasNegativeG2(dgrad, prec)): in the AnalyticalGradientCalculator overload, this creates a new Numerical2PGradientCalculator! What?!

  • Find out why [second bullet above].

@egpbos
Member Author

egpbos commented Nov 6, 2017

The second bullet doesn't change anything... Tried changing it to take the analytical gradient calculator instead of the Numerical 2P one and the answer stayed the same. Probably because the negative G2 is not encountered here. No, the G2 is in fact negative for the nominal run, but not for the GradMinimizer run!

  • Find out why G2 is negative in the nominal run but not in the GradMinimizer run.

@egpbos
Member Author

egpbos commented Nov 8, 2017

As I just mailed Lorenzo, I think I found why I cannot exactly reproduce the Minuit2 results from an external gradient calculator: https://github.com/root-project/root/blob/master/math/minuit2/src/AnalyticalGradientCalculator.cxx#L45

The AnalyticalGradientCalculator only forwards the gradient itself, not the g2 and gstep.

Options to fix this:

  • Build a new class for an external derivator that can also calculate g2 and gstep
  • Modify AnalyticalGradientCalculator so that it can do this as well
  • Wait for Lorenzo's reply, that hopefully contains a tip that will save a lot of effort 😄

@egpbos
Member Author

egpbos commented Nov 8, 2017

Lorenzo agrees that modifying the current class(es) to be able to also provide g2 and gstep is the best way to go.

  • Do this.

@egpbos
Member Author

egpbos commented Nov 8, 2017

These have to be transformed as well, just like the gradient currently is transformed. We may need to implement a second derivative Jacobian for that, similar to DInt2Ext.
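
As a sketch of what such a second-derivative Jacobian involves: write $x$ for the external parameter, $y$ for the internal one, and $x = x(y)$ for the int2ext mapping. The chain rule then gives

$$
\frac{\partial f}{\partial y} = \frac{\partial f}{\partial x}\,\frac{dx}{dy},
\qquad
\frac{\partial^2 f}{\partial y^2} = \frac{\partial^2 f}{\partial x^2}\left(\frac{dx}{dy}\right)^2 + \frac{\partial f}{\partial x}\,\frac{d^2 x}{dy^2}.
$$

Assuming the sine transformation for a parameter with limits $a < b$, $x = a + \frac{b-a}{2}\,(\sin y + 1)$, the two factors are

$$
\frac{dx}{dy} = \frac{b-a}{2}\cos y,
\qquad
\frac{d^2 x}{dy^2} = -\frac{b-a}{2}\sin y.
$$

So the second derivative picks up a term proportional to the first derivative, and it cannot be converted by multiplying with a single factor. How gstep should be converted is a separate question that this derivation does not answer.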

@egpbos
Member Author

egpbos commented Nov 20, 2017

After some more int2ext/ext2int magic, I managed to get the exact same output in the tests. The exact values of the gradients, second derivatives and step sizes still differ during the minimization (inside the derivator), but apparently, this does not matter for the outcome in terms of traveling through the parameter space, since both the parameters x that are visited and the corresponding function values f(x) are identical for Minuit2 and RooGradMinimizer.

egpbos added a commit that referenced this issue Nov 20, 2017
@egpbos
Member Author

egpbos commented Nov 20, 2017

I'm guessing the reason the gradients aren't identical internally is that I'm not transforming the step sizes at all...

egpbos added a commit that referenced this issue Nov 21, 2017
Another important step in #11.

Also removed many debug prints.
@egpbos
Member Author

egpbos commented Nov 21, 2017

The step transformation wasn't the issue. I was doing the conversion from external back to internal wrong: I was using the new parameter to transform, whereas I should have been using the old parameter. This is now fixed and everything is exactly equal; only edm sometimes differs at the order of machine precision, i.e. only in the last one or two digits, so that's fine. So we can finally tick all the above boxes!

Some remaining issues regarding Minuit extension are gathered in #18, since the goal of this issue is to get our gradient working.

For this issue, what remains is to do some additional tests. These can also be used when we move on to a parallelized version of RooGradMinimizer.

  • [ ] 1D (Gaussian)
  • [ ] multi-dimensional
  • [ ] small "realistic" workspace(s)
  • [ ] big "realistic" workspace(s)

@egpbos
Member Author

egpbos commented Nov 21, 2017

In fact, let's do this properly and build real roottest tests and also make rootbench benchmarks out of them. New issue: #19

@egpbos
Member Author

egpbos commented Nov 22, 2017

While trying to set up the Google Test version of GradMinimizer.cpp (#19), I found that edm and the mu error are not actually 100% equal. They are always very close, typically at machine precision level, but sometimes the difference reaches up to about the 6th (edm) or 8th (mu error) digit. Looking again at the test output, it seems like there is one remaining difference in the minimizer runs:

trying nominal calculation
Minuit2Minimizer: Minimize with max-calls 500 convergence for edm < 1 strategy 0
...
trying GradMinimizer
Minuit2Minimizer: Minimize with max-calls 500 convergence for edm < 0.01 strategy 0

The target edm values are different.

Apparently, we forgot to set this value in RooGradMinimizer, i.e. we took out setEps from our copy of RooMinimizer. Added it back in, compiling to run test...

@egpbos
Member Author

egpbos commented Nov 22, 2017

Doesn't change anything in the resulting edm and mu.

@egpbos
Member Author

egpbos commented Nov 22, 2017

The only remaining difference is that the default minimizer type is set to Minuit(1) in the ctor for RooMinimizer, whereas I set it to Minuit2 for RooGradMinimizer. Possibly, the initialization of parameters is different in these cases. Let's try setting it to Minuit 1 in the RooGradMinimizer as well...

@egpbos
Member Author

egpbos commented Nov 22, 2017

This also doesn't change things. Let's leave this for now. I'll leave a reference to this discussion in the code if anyone wants to follow up.

@egpbos
Member Author

egpbos commented Nov 27, 2017

The hierarchical test (see #19) ends up walking a slightly different path at the end. This shows that (for pathological cases, but still) the small differences we noticed above can in fact lead to large differences in the end result. We therefore need to fix this now.

Let's try printing and comparing all the exact values in the derivative calculations of Minuit2 and GradMinimizer to find out where the differences first appear.

egpbos added a commit to roofit-dev/parallel-roofit-scripts that referenced this issue Nov 27, 2017
This is at least one cause of the problem in
roofit-dev/root#11 introducing 1-bit changes
in the floating point numbers that are converted back and forth.
@egpbos
Member Author

egpbos commented Nov 27, 2017

At least one problem that was identified (the first one that appears) is with the conversion of parameters from internal to external parameter space and back. Specifically, in the double bounded case that we were testing, the functions involved are sin(double) and asin(double). Changing the calculations to long double would solve our problem, since then sin(asin(x)) == x in double precision. However, this would probably change results in previous fits...
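
The round-trip effect can be probed in isolation with a small sketch like the one below. It only exercises sin and asin on evenly spaced inputs and does not use the actual Minuit2 transformation code; the helper names are hypothetical. The counts it returns depend on the platform's math library and on whether long double is wider than double, so no particular result is claimed here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Sample point i of n evenly spaced values in [-1, 1].
double sample(std::size_t i, std::size_t n) {
    assert(n > 1 && i < n);
    return -1.0 + 2.0 * static_cast<double>(i) / static_cast<double>(n - 1);
}

// Largest |sin(asin(x)) - x| over the samples, computed in double.
double max_roundtrip_error(std::size_t n) {
    double worst = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double x = sample(i, n);
        worst = std::fmax(worst, std::fabs(std::sin(std::asin(x)) - x));
    }
    return worst;
}

// Number of samples whose double round trip is not bit-identical to x.
std::size_t count_inexact_double(std::size_t n) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i) {
        const double x = sample(i, n);
        if (std::sin(std::asin(x)) != x) ++count;
    }
    return count;
}

// Same count, with the intermediate steps done in long double
// and only the final result rounded back to double.
std::size_t count_inexact_long_double(std::size_t n) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i) {
        const double x = sample(i, n);
        const long double wide = std::sin(std::asin(static_cast<long double>(x)));
        if (static_cast<double>(wide) != x) ++count;
    }
    return count;
}
```

Comparing `count_inexact_double(n)` with `count_inexact_long_double(n)` for some n shows how often the wider intermediate type removes the last-bit mismatches on a given machine.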

@egpbos
Member Author

egpbos commented Nov 27, 2017

I set the tests back to EXPECT_EQ (I had downgraded them to EXPECT_FLOAT_EQ).

Changing the precision of the int2ext / ext2int functions to long double fixes most of the tests, almost all of which did not pass before in EXPECT_EQ mode. This is using 10 for loop iterations, with 4 EXPECTs per loop, so 40 tests. In run 6 (7th run) three numbers were not exactly equal (mu, muerr and edm). In run 9 (10th), two numbers were unequal (mu and edm):

/Users/pbos/projects/apcocsm/root-roofit_gradient_testing-worktree/roofit/roofitcore/test/RooGradMinimizer.cxx:115: Failure
      Expected: mu0
      Which is: 0.00096625
To be equal to: mu1
      Which is: 0.00096625
/Users/pbos/projects/apcocsm/root-roofit_gradient_testing-worktree/roofit/roofitcore/test/RooGradMinimizer.cxx:116: Failure
      Expected: muerr0
      Which is: 0.0100001
To be equal to: muerr1
      Which is: 0.0100001
/Users/pbos/projects/apcocsm/root-roofit_gradient_testing-worktree/roofit/roofitcore/test/RooGradMinimizer.cxx:117: Failure
      Expected: edm0
      Which is: 8.68852e-11
To be equal to: edm1
      Which is: 8.68851e-11

/Users/pbos/projects/apcocsm/root-roofit_gradient_testing-worktree/roofit/roofitcore/test/RooGradMinimizer.cxx:115: Failure
      Expected: mu0
      Which is: 0.00199726
To be equal to: mu1
      Which is: 0.00199726

/Users/pbos/projects/apcocsm/root-roofit_gradient_testing-worktree/roofit/roofitcore/test/RooGradMinimizer.cxx:117: Failure
      Expected: edm0
      Which is: 1.10307e-10
To be equal to: edm1
      Which is: 1.10307e-10
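
Most of the failures above print identical digits for both values. That is expected when two doubles differ only in their last bits and are printed with about 6 significant digits, which is what the output above suggests; 17 significant digits are needed to tell any two distinct doubles apart. The sketch below shows this with two neighbouring doubles. `format_double` and `next_representable` are hypothetical helper names.

```cpp
#include <cassert>
#include <cmath>
#include <iomanip>
#include <limits>
#include <sstream>
#include <string>

// Format a double with the given number of significant digits.
std::string format_double(double value, int significant_digits) {
    assert(significant_digits > 0);
    std::ostringstream out;
    out << std::setprecision(significant_digits) << value;
    return out.str();
}

// The next double above `value`, i.e. a value that differs by one bit.
double next_representable(double value) {
    return std::nextafter(value, std::numeric_limits<double>::infinity());
}
```

Taking `a = 0.00096625` and `b = next_representable(a)`, the comparison `a == b` is false, yet `format_double(a, 6)` and `format_double(b, 6)` are the same string. Printing with `std::numeric_limits<double>::max_digits10` digits makes the difference visible.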

@egpbos
Member Author

egpbos commented Nov 28, 2017

RooFit tests (ctest -R 'roofit') run unchanged, so no side effects there.

@egpbos
Member Author

egpbos commented Nov 28, 2017

Same with RooStats and HistFactory tests (ctest -R 'tutorial-roostats' / '-histfactory') and Minuit tests (ctest -R 'minuit').

@egpbos
Member Author

egpbos commented Nov 28, 2017

Other fitting tests (ctest -R 'fit' -E 'roofit', not sure whether they all use Minuit, but anyway) all pass as well.

@egpbos
Member Author

egpbos commented Nov 28, 2017

Possibly relevant roottest tests (ctest -R 'math-mathcore|fit|meta-unit_unittests|roofit') don't show issues either.

@egpbos
Member Author

egpbos commented Nov 28, 2017

To not have to deal with merging Minuit changes to long double into ROOT core, we may for the time being simply clone the parameter transformation functions into RooFit and make them long double.

  • Try this out.
  • Make sure it still works, i.e. that there are no problems with using both the double Minuit and the long double RooFit versions simultaneously.

egpbos added a commit that referenced this issue Dec 19, 2017
Passing all three RooGradMinimizer tests now! #11

This commit also contains a whole lot of debugging prints and some remaining code from many tries of getting the AnalyticalGradientCalculator to work (i.e. get every floating point bit exactly right). In the end, it turned out that getting that to work would involve rewriting almost all of Minuit2's classes into long double precision versions. The ExternalInternalGradientCalculator approach just assumes that the external gradient calculator does things in Minuit2-internal parameter space, which means many int2ext and ext2int steps are no longer necessary. Since these were the main cause of the precision loss, this problem is now gone.
@egpbos
Member Author

egpbos commented Dec 19, 2017

We went for a different approach, see above commit message (3651eb8). Now works perfectly, up to the bit!

@egpbos
Member Author

egpbos commented Dec 19, 2017

Cheered too early. The 1D Gauss tests are perfect, and so is the multi-layer tree of Gaussians and gamma functions, but the N-D Gaussian sum test is still off by quite a margin.

egpbos pushed a commit that referenced this issue Nov 21, 2018
…transactions.

This fixes https://sft.its.cern.ch/jira/browse/ROOT-9672 by having
cling::Interpreter::DeclareCFunction return the transaction containing the
compiled code.

With the previous code, cling::Interpreter::compileFunction will get confused by
transaction created during the callbacks executed during the
cling::IncrementalParser::commitTransaction of the main transaction.

Reproducer:

With a main composed of 'only':

int main(int argc, char ** argv)
{
  char const * class_string = (argc == 2) ? argv[1] : "std::vector<int>";
  auto const result [[gnu::unused]] = TClass::GetClass(class_string);
  return 0;
}

which is a representation of real use case (in a more complex setup) in ART.
We were getting:

Error in <TClingCallFunc::make_wrapper>: Failed to compile
  ==== SOURCE BEGIN ====
#pragma clang diagnostic push
#pragma clang diagnostic ignored "-Wformat-security"
__attribute__((used)) extern "C" void __cf_0(void* obj, int nargs, void** args, void* ret)
{
   if (ret) {
      (*(TStreamerInfo**)ret) = new TStreamerInfo();
      return;
   }
   else {
      new TStreamerInfo();
      return;
   }
}
#pragma clang diagnostic pop
  ==== SOURCE END ====
Error in <TClingCallFunc::ExecT>: Called with no wrapper, not implemented!
Error in <TVirtualStreamerInfo::Factory>: The plugin handler for TVirtualStreamerInfo was found but failed to create the factory object!

The reason is that during TClingCallFunc::make_wrapper, the call to cling::Interpreter::compileFunction ends with:

    if (const llvm::GlobalValue* GV
        = getLastTransaction()->getModule()->getNamedValue(name))

However in the 'broken' case, the getLastTransaction does not return the transaction for the code being compiled by DeclareCFunction but instead the one used/created at:

#0  cling::IncrementalParser::endTransaction (this=0x4a2980, T=0x8c0fb0) at /local2/pcanal/cint_working/rootcling/v6-14-00-patches/interpreter/cling/lib/Interpreter/IncrementalParser.cpp:345
#1  0x00007fffeebc7899 in cling::Interpreter::PushTransactionRAII::pop (this=0x7fffffffcb00) at /local2/pcanal/cint_working/rootcling/v6-14-00-patches/interpreter/cling/lib/Interpreter/Interpreter.cpp:111
#2  0x00007fffeebc785e in cling::Interpreter::PushTransactionRAII::~PushTransactionRAII (this=0x7fffffffcb00, __in_chrg=<optimized out>)
    at /local2/pcanal/cint_working/rootcling/v6-14-00-patches/interpreter/cling/lib/Interpreter/Interpreter.cpp:106
#3  0x00007fffeebeb659 in cling::LookupHelper::findScope (this=0x4a9dd0, className=..., diagOnOff=cling::LookupHelper::NoDiagnostics, resultType=0x7fffffffcd08, instantiateTemplate=false)
    at /local2/pcanal/cint_working/rootcling/v6-14-00-patches/interpreter/cling/lib/Interpreter/LookupHelper.cpp:466
#4  0x00007fffeeabe0df in TCling::CheckClassInfo (this=0x4a0550, name=<optimized out>, autoload=<optimized out>, isClassOrNamespaceOnly=<optimized out>)
    at /local2/pcanal/cint_working/rootcling/v6-14-00-patches/core/metacling/src/TCling.cxx:3630
#5  0x00007ffff7c3040d in TClass::Init (this=this@entry=0xdafd20, name=name@entry=0x7ffff7cb7638 "TGlobal", cversion=cversion@entry=2, typeinfo=typeinfo@entry=0x7ffff7d8b6d8 <typeinfo for TGlobal>, isa=isa@entry=0x477430,
    dfil=dfil@entry=0x7ffff7cb8cab "TGlobal.h", ifil=<optimized out>, dl=<optimized out>, il=<optimized out>, givenInfo=<optimized out>, silent=<optimized out>)
    at /local2/pcanal/cint_working/rootcling/v6-14-00-patches/core/meta/src/TClass.cxx:1431
#6  0x00007ffff7c3a1b8 in TClass::TClass (this=0xdafd20, name=0x7ffff7cb7638 "TGlobal", cversion=<optimized out>, info=..., isa=0x477430, dfil=0x7ffff7cb8cab "TGlobal.h",
    ifil=0x7ffff7cccf88 "/local2/pcanal/cint_working/rootcling/v6-14-00-patches/core/meta/src/TGlobal.cxx", dl=27, il=25, silent=false) at /local2/pcanal/cint_working/rootcling/v6-14-00-patches/core/meta/src/TClass.cxx:1273
#7  0x00007ffff7c3a72a in ROOT::CreateClass (cname=0x7ffff7cb7638 "TGlobal", id=id@entry=2, info=..., isa=isa@entry=0x477430, dfil=dfil@entry=0x7ffff7cb8cab "TGlobal.h",
    ifil=ifil@entry=0x7ffff7cccf88 "/local2/pcanal/cint_working/rootcling/v6-14-00-patches/core/meta/src/TGlobal.cxx", dl=27, il=25) at /local2/pcanal/cint_working/rootcling/v6-14-00-patches/core/meta/src/TClass.cxx:5607
#8  0x00007ffff7c4b552 in ROOT::Internal::TDefaultInitBehavior::CreateClass (il=25, dl=27, ifil=0x7ffff7cccf88 "/local2/pcanal/cint_working/rootcling/v6-14-00-patches/core/meta/src/TGlobal.cxx", dfil=0x7ffff7cb8cab "TGlobal.h",
    isa=0x477430, info=..., id=2, cname=<optimized out>, this=0x7ffff7da7508 <ROOT::Internal::DefineBehavior(void*, void*)::theDefault>) at /home/pcanal/root_builds/v6-14-00-patches/opt/include/Rtypes.h:176
#9  ROOT::TGenericClassInfo::GetClass (this=0x7ffff7dab660 <ROOT::GenerateInitInstanceLocal(TGlobal const*)::instance>) at /local2/pcanal/cint_working/rootcling/v6-14-00-patches/core/meta/src/TGenericClassInfo.cxx:250
#10 0x00007ffff7b1a2d8 in TGlobal::Class () at /home/pcanal/root_builds/v6-14-00-patches/opt/core/base/G__Core.cxx:17156
#11 0x00007ffff7ac01de in TGlobal::IsA (this=0xee3bc0) at /home/pcanal/root_builds/v6-14-00-patches/opt/include/TGlobal.h:48
#12 TGlobal::CheckTObjectHashConsistency (this=0xee3bc0) at /home/pcanal/root_builds/v6-14-00-patches/opt/include/TGlobal.h:48
#13 0x00007ffff7be9dcd in TObject::CheckedHash (this=0xee3bc0) at /home/pcanal/root_builds/v6-14-00-patches/opt/include/TObject.h:314
#14 THashTable::GetCheckedHashValue (this=0xe65a20, obj=0xee3bc0) at /home/pcanal/root_builds/v6-14-00-patches/opt/include/THashTable.h:94
#15 THashTable::Add (this=0xe65a20, obj=0xee3bc0) at /local2/pcanal/cint_working/rootcling/v6-14-00-patches/core/cont/src/THashTable.cxx:96
#16 0x00007ffff7be6bf1 in THashList::AddLast (this=this@entry=0x5be690, obj=obj@entry=0xee3bc0) at /local2/pcanal/cint_working/rootcling/v6-14-00-patches/core/cont/src/THashList.cxx:100
#17 0x00007ffff7c4e0d1 in TListOfDataMembers::AddLast (this=0x5be690, obj=0xee3bc0) at /local2/pcanal/cint_working/rootcling/v6-14-00-patches/core/meta/src/TListOfDataMembers.cxx:103
#18 0x00007ffff7ab8785 in TList::Add (obj=0xee3bc0, this=0x5be690) at /home/pcanal/root_builds/v6-14-00-patches/opt/include/TList.h:87
#19 TROOT::GetListOfGlobals (this=0x7ffff7da7a60 <ROOT::Internal::GetROOT1()::alloc>, load=load@entry=false) at /local2/pcanal/cint_working/rootcling/v6-14-00-patches/core/base/src/TROOT.cxx:1767
#20 0x00007fffeeab1058 in TCling::HandleNewDecl (this=0x4a0550, DV=0xedf238, isDeserialized=isDeserialized@entry=true, modifiedTClasses=...) at /local2/pcanal/cint_working/rootcling/v6-14-00-patches/core/metacling/src/TCling.cxx:555
#21 0x00007fffeeabb785 in TCling::UpdateListsOnCommitted (this=0x4a0550, T=...) at /local2/pcanal/cint_working/rootcling/v6-14-00-patches/core/metacling/src/TCling.cxx:6115
#22 0x00007fffeebd0103 in cling::MultiplexInterpreterCallbacks::TransactionCommitted (this=0x57fe20, T=...) at /local2/pcanal/cint_working/rootcling/v6-14-00-patches/interpreter/cling/lib/Interpreter/MultiplexInterpreterCallbacks.h:76
#23 0x00007fffeed05d71 in cling::IncrementalParser::commitTransaction (this=0x4a2980, PRT=..., ClearDiagClient=true) at /local2/pcanal/cint_working/rootcling/v6-14-00-patches/interpreter/cling/lib/Interpreter/IncrementalParser.cpp:532
#24 0x00007fffeed06399 in cling::IncrementalParser::Compile (this=0x4a2980, input=..., Opts=...) at /local2/pcanal/cint_working/rootcling/v6-14-00-patches/interpreter/cling/lib/Interpreter/IncrementalParser.cpp:663
#25 0x00007fffeebcbc4e in cling::Interpreter::DeclareInternal (this=0x4a0f30, input=..., CO=..., T=0x7fffffffd680) at /local2/pcanal/cint_working/rootcling/v6-14-00-patches/interpreter/cling/lib/Interpreter/Interpreter.cpp:1195
#26 0x00007fffeebca8e8 in cling::Interpreter::declare (this=0x4a0f30, input=..., T=0x7fffffffd680) at /local2/pcanal/cint_working/rootcling/v6-14-00-patches/interpreter/cling/lib/Interpreter/Interpreter.cpp:823
#27 0x00007fffeebcb560 in cling::Interpreter::DeclareCFunction (this=0x4a0f30, name=..., code=..., withAccessControl=true) at /local2/pcanal/cint_working/rootcling/v6-14-00-patches/interpreter/cling/lib/Interpreter/Interpreter.cpp:1096
#28 0x00007fffeebcb862 in cling::Interpreter::compileFunction (this=0x4a0f30, name=..., code=..., ifUnique=false, withAccessControl=true)
    at /local2/pcanal/cint_working/rootcling/v6-14-00-patches/interpreter/cling/lib/Interpreter/Interpreter.cpp:1140
#29 0x00007fffeeafb83c in TClingCallFunc::compile_wrapper (withAccessControl=true, wrapper=..., wrapper_name=..., this=0xcf3c10) at /local2/pcanal/cint_working/rootcling/v6-14-00-patches/core/metacling/src/TClingCallFunc.cxx:270
#30 TClingCallFunc::make_wrapper (this=this@entry=0xcf3c10) at /local2/pcanal/cint_working/rootcling/v6-14-00-patches/core/metacling/src/TClingCallFunc.cxx:1096
#31 0x00007fffeeafbcb8 in TClingCallFunc::IFacePtr (this=this@entry=0xcf3c10) at /local2/pcanal/cint_working/rootcling/v6-14-00-patches/core/metacling/src/TClingCallFunc.cxx:2233
#32 0x00007fffeeafbe83 in TClingCallFunc::ExecT<long> (address=0x0, this=0xcf3c10) at /local2/pcanal/cint_working/rootcling/v6-14-00-patches/core/metacling/src/TClingCallFunc.cxx:2045
#33 TClingCallFunc::ExecInt (this=0xcf3c10, address=0x0) at /local2/pcanal/cint_working/rootcling/v6-14-00-patches/core/metacling/src/TClingCallFunc.cxx:2065
#34 0x00007ffff7c56e8d in TMethodCall::Execute (this=0xd97710, object=<optimized out>, retLong=@0x7fffffffd958: 0) at /local2/pcanal/cint_working/rootcling/v6-14-00-patches/core/meta/src/TMethodCall.cxx:457
#35 0x0000000000401009 in TMethodCall::Execute(long&) ()
#36 0x00000000004010ea in long TPluginHandler::ExecPluginImpl<>() ()
#37 0x000000000040106d in long TPluginHandler::ExecPlugin<>(int) ()
#38 0x0000000000400e21 in mytest() ()
#39 0x0000000000400e92 in main ()
egpbos pushed a commit that referenced this issue May 29, 2021
Before, MetaParser might have pointed to a StringRef whose storage
was gone, see asan failure in roottest/cling/other/runfileClose.C below.

This was caused by recursive uses of MetaParser; see stack trace below:
the inner recursion returned, but as the same MetaParser object was used
by both frames, the objects cursor now pointed to freed memory.

Instead, create a MetaParser (and MetaLexer) object per input. That way,
their lifetime corresponds to the lifetime of their input.

=================================================================
==529104==ERROR: AddressSanitizer: stack-use-after-return on address 0x7ffff3afd82a at pc 0x7fffea18df6d bp 0x7fffffff8170 sp 0x7fffffff8168
READ of size 1 at 0x7ffff3afd82a thread T0
[Detaching after fork from child process 529183]
    #0 0x7fffea18df6c in cling::MetaLexer::Lex(cling::Token&) src/interpreter/cling/lib/MetaProcessor/MetaLexer.cpp:58:11
    #1 0x7fffea190d7c in cling::MetaParser::lookAhead(unsigned int) src/interpreter/cling/lib/MetaProcessor/MetaParser.cpp:89:15
    #2 0x7fffea190bd5 in cling::MetaParser::consumeToken() src/interpreter/cling/lib/MetaProcessor/MetaParser.cpp:49:5
    #3 0x7fffea191d4d in cling::MetaParser::isLCommand(cling::MetaSema::ActionResult&) src/interpreter/cling/lib/MetaProcessor/MetaParser.cpp:147:9
    #4 0x7fffea1914dd in cling::MetaParser::isCommand(cling::MetaSema::ActionResult&, cling::Value*) src/interpreter/cling/lib/MetaProcessor/MetaParser.cpp:123:12
    #5 0x7fffea191216 in cling::MetaParser::isMetaCommand(cling::MetaSema::ActionResult&, cling::Value*) src/interpreter/cling/lib/MetaProcessor/MetaParser.cpp:101:33
    #6 0x7fffea14e5aa in cling::MetaProcessor::process(llvm::StringRef, cling::Interpreter::CompilationResult&, cling::Value*, bool) src/interpreter/cling/lib/MetaProcessor/MetaProcessor.cpp:317:24
    #7 0x7fffe99b67b7 in HandleInterpreterException(cling::MetaProcessor*, char const*, cling::Interpreter::CompilationResult&, cling::Value*) src/core/metacling/src/TCling.cxx:2431:29
    #8 0x7fffe99bde30 in TCling::Load(char const*, bool) src/core/metacling/src/TCling.cxx:3454:10
    #9 0x7ffff7865f11 in TSystem::Load(char const*, char const*, bool) src/core/base/src/TSystem.cxx:1941:27
    #10 0x7ffff7b8a0e3 in TUnixSystem::Load(char const*, char const*, bool) src/core/unix/src/TUnixSystem.cxx:2789:20
    #11 0x7fffd78dd08b  (<unknown module>)
    #12 0x7fffe9f8a5d9 in cling::IncrementalExecutor::executeWrapper(llvm::StringRef, cling::Value*) const src/interpreter/cling/lib/Interpreter/IncrementalExecutor.cpp:376:3
    #13 0x7fffe9d73dc2 in cling::Interpreter::RunFunction(clang::FunctionDecl const*, cling::Value*) src/interpreter/cling/lib/Interpreter/Interpreter.cpp:1141:20
    #14 0x7fffe9d6e317 in cling::Interpreter::EvaluateInternal(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, cling::CompilationOptions, cling::Value*, cling::Transaction**, unsigned long) src/interpreter/cling/lib/Interpreter/Interpreter.cpp:1391:29
    #15 0x7fffe9d6c1fe in cling::Interpreter::process(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, cling::Value*, cling::Transaction**, bool) src/interpreter/cling/lib/Interpreter/Interpreter.cpp:819:9
    #16 0x7fffea151826 in cling::MetaProcessor::readInputFromFile(llvm::StringRef, cling::Value*, unsigned long, bool) src/interpreter/cling/lib/MetaProcessor/MetaProcessor.cpp:507:22
    #17 0x7fffe99b585b in TCling::ProcessLine(char const*, TInterpreter::EErrorCode*) src/core/metacling/src/TCling.cxx:2570:39
    #18 0x7fffe99bbfee in TCling::ProcessLineSynch(char const*, TInterpreter::EErrorCode*) src/core/metacling/src/TCling.cxx:3496:17
    #19 0x7ffff77203d3 in TApplication::ExecuteFile(char const*, int*, bool) src/core/base/src/TApplication.cxx:1608:30
    #20 0x7ffff771ebdf in TApplication::ProcessFile(char const*, int*, bool) src/core/base/src/TApplication.cxx:1480:11
    #21 0x7ffff771e385 in TApplication::ProcessLine(char const*, bool, int*) src/core/base/src/TApplication.cxx:1453:14
    #22 0x7ffff7f8157a in TRint::ProcessLineNr(char const*, char const*, int*) src/core/rint/src/TRint.cxx:766:11
    #23 0x7ffff7f802f0 in TRint::Run(bool) src/core/rint/src/TRint.cxx:424:22
    #24 0x4ff96d in main src/main/src/rmain.cxx:30:12
    #25 0x7ffff6e040b2 in __libc_start_main /build/glibc-YbNSs7/glibc-2.31/csu/../csu/libc-start.c:308:16
    #26 0x41f35d in _start (asan/bin/root.exe+0x41f35d)

Address 0x7ffff3afd82a is located in stack of thread T0 at offset 42 in frame
    #0 0x7fffe99b3d8f in TCling::ProcessLine(char const*, TInterpreter::EErrorCode*) src/core/metacling/src/TCling.cxx:2456

  This frame has 21 object(s):
    [32, 56) 'sLine' (line 2462) <== Memory access at offset 42 is inside this variable
    [96, 104) 'R__guard2471' (line 2471)
    [128, 136) 'R__guard2488' (line 2488)
    [160, 176) 'interpreterFlagsRAII' (line 2491)
    [192, 240) 'result' (line 2511)
    [272, 276) 'compRes' (line 2512)
    [288, 312) 'mod_line' (line 2517)
    [352, 376) 'aclicMode' (line 2518)
    [416, 440) 'arguments' (line 2519)
    [480, 504) 'io' (line 2520)
    [544, 568) 'fname' (line 2521)
    [608, 632) 'ref.tmp' (line 2547)
    [672, 696) 'ref.tmp145' (line 2547)
    [736, 768) 'code' (line 2555)
    [800, 832) 'codeline' (line 2556)
    [864, 1384) 'in' (line 2559)
    [1520, 1552) 'ref.tmp176' (line 2562)
    [1584, 1600) 'agg.tmp'
    [1616, 1624) 'ref.tmp198' (line 2568)
    [1648, 1664) 'agg.tmp207'
    [1680, 1696) 'autoParseRaii' (line 2588)
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork
      (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-use-after-return src/interpreter/cling/lib/MetaProcessor/MetaLexer.cpp:58:11 in cling::MetaLexer::Lex(cling::Token&)
Shadow bytes around the buggy address:
  0x10007e757ab0: f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5
  0x10007e757ac0: f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5
  0x10007e757ad0: f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5
  0x10007e757ae0: f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5
  0x10007e757af0: f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5
=>0x10007e757b00: f5 f5 f5 f5 f5[f5]f5 f5 f5 f5 f5 f5 f5 f5 f5 f5
  0x10007e757b10: f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5
  0x10007e757b20: f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5
  0x10007e757b30: f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5
  0x10007e757b40: f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5
  0x10007e757b50: f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5 f5
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==529104==ABORTING

    at src/interpreter/cling/lib/MetaProcessor/MetaLexer.cpp:49
    at src/interpreter/cling/lib/MetaProcessor/MetaParser.cpp:41
    compRes=@0x7ffff3afd910: cling::Interpreter::kSuccess, result=0x7ffff3afd8c0, disableValuePrinting=false)
    at src/interpreter/cling/lib/MetaProcessor/MetaProcessor.cpp:314
    input_line=0x7ffff3afd829 "#define XYZ 21", compRes=@0x7ffff3afd910: cling::Interpreter::kSuccess,
    result=0x7ffff3afd8c0) at src/core/metacling/src/TCling.cxx:2431
    error=0x7fffd78cb0f4 <x>) at src/core/metacling/src/TCling.cxx:2591
    sync=false, err=0x7fffd78cb0f4 <x>) at src/core/base/src/TApplication.cxx:1472
    line=0x7fffd78c9000 "#define XYZ 21", error=0x7fffd78cb0f4 <x>)
    at src/core/base/src/TROOT.cxx:2328
   from asan/roottest/cling/other/fileClose_C.so
    filename=0x6070000f0fd0 "asan/roottest/cling/other/fileClose_C.so", flag=257)
    at /home/axel/build/llvm/llvm-project/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:6270
    at src/interpreter/cling/lib/Utils/PlatformPosix.cpp:118
    permanent=false, resolved=true)
    at src/interpreter/cling/lib/Interpreter/DynamicLibraryManager.cpp:184
    at src/interpreter/cling/lib/Interpreter/Interpreter.cpp:1444
    T=0x0) at src/interpreter/cling/lib/Interpreter/Interpreter.cpp:1560
    at src/interpreter/cling/lib/MetaProcessor/MetaSema.cpp:57
    actionResult=@0x7ffff39532b0: cling::MetaSema::AR_Success)
egpbos pushed a commit that referenced this issue Oct 27, 2021
This tutorial crashed when run interactively. Avoiding registering a
canvas with the same name more than once fixes the crash. The stack
trace was:

```
    #8  0x00007f5b7876967d in TCanvas::Resize(char const*) (this=0x55e768e126c0) at ../graf2d/gpad/src/TCanvas.cxx:1740
    #9  0x00007f5b3e90d668 in TRootCanvas::HandleContainerConfigure(Event_t*) (this=0x55e76852b460) at ../gui/gui/src/TRootCanvas.cxx:1789
    #10 0x00007f5b3e8464fd in TGFrame::HandleEvent(Event_t*) (this=0x55e767938e70, event=0x7f5b79adff40) at ../gui/gui/src/TGFrame.cxx:476
    #11 0x00007f5b3e7f4c9a in TGClient::HandleEvent(Event_t*) (this=0x55e768de2290, event=0x7f5b79adff40) at ../gui/gui/src/TGClient.cxx:846
    #12 0x00007f5b3e7f531d in TGClient::ProcessOneEvent() (this=0x55e768de2290) at ../gui/gui/src/TGClient.cxx:656
    #13 TGClient::ProcessOneEvent() (this=0x55e768de2290) at ../gui/gui/src/TGClient.cxx:648
    #14 0x00007f5b3e7f536b in TGClient::HandleInput() (this=0x55e768de2290) at ../gui/gui/src/TGClient.cxx:703
    #15 0x00007f5b8dcb0ff8 in TUnixSystem::DispatchOneEvent(bool) (this=0x55e75ccfd080, pendingOnly=<optimized out>) at ../core/unix/src/TUnixSystem.cxx:1067
    #16 0x00007f5b8dbd0dca in TSystem::ProcessEvents() (this=0x55e75ccfd080) at ../core/base/src/TSystem.cxx:424
    #17 0x00007f5b8130600d in  ()
    #18 0x00007f5b79ae0450 in  ()
    #19 0x00007f5b8de5215f in WrapperCall(Cppyy::TCppMethod_t, size_t, void*, void*, void*) (method=94452242807424, nargs=0, args_=0x7f5b79ae01d7, self=0x55e75ccfd080, result=0x7f5b79ae01d7) at ../bindings/pyroot/cppyy/cppyy-backend/clingwrapper/src/clingwrapper.cxx:778
    #20 0x00007f5b8de527cf in CallT<unsigned char> (args=<optimized out>, nargs=<optimized out>, self=<optimized out>, method=<optimized out>) at ../bindings/pyroot/cppyy/cppyy-backend/clingwrapper/src/clingwrapper.cxx:816
    #21 Cppyy::CallB(long, void*, unsigned long, void*) (method=<optimized out>, self=<optimized out>, nargs=<optimized out>, args=<optimized out>) at ../bindings/pyroot/cppyy/cppyy-backend/clingwrapper/src/clingwrapper.cxx:833
    #22 0x00007f5b8decdc0f in GILCallB (ctxt=0x7f5b79ae0430, self=<optimized out>, method=<optimized out>) at ../bindings/pyroot/cppyy/CPyCppyy/src/Executors.cxx:69
    #23 CPyCppyy::(anonymous namespace)::BoolExecutor::Execute(Cppyy::TCppMethod_t, Cppyy::TCppObject_t, CPyCppyy::CallContext*) (this=<optimized out>, method=<optimized out>, self=<optimized out>, ctxt=0x7f5b79ae0430) at ../bindings/pyroot/cppyy/CPyCppyy/src/Executors.cxx:148
    #24 0x00007f5b8deba4c9 in CPyCppyy::CPPMethod::ExecuteFast(void*, long, CPyCppyy::CallContext*) (self=<optimized out>, offset=<optimized out>, ctxt=<optimized out>, this=<optimized out>, this=<optimized out>) at ../bindings/pyroot/cppyy/CPyCppyy/src/CPPMethod.cxx:74
    #25 0x00007f5b8debd3a8 in CPyCppyy::CPPMethod::ExecuteProtected(void*, long, CPyCppyy::CallContext*) (this=this@entry=0x55e760617f50, self=0x55e75ccfd080, offset=0, ctxt=0x7f5b79ae0430) at ../bindings/pyroot/cppyy/CPyCppyy/src/CPPMethod.cxx:149
    #26 0x00007f5b8debb6fa in CPyCppyy::CPPMethod::Execute(void*, long, CPyCppyy::CallContext*) (this=this@entry=0x55e760617f50, self=self@entry=0x55e75ccfd080, offset=<optimized out>, ctxt=ctxt@entry=0x7f5b79ae0430) at ../bindings/pyroot/cppyy/CPyCppyy/src/CPPMethod.cxx:728
    #27 0x00007f5b8debc46c in CPyCppyy::CPPMethod::Call(CPyCppyy::CPPInstance*&, _object*, _object*, CPyCppyy::CallContext*) (this=0x55e760617f50, self=@0x7f5b8080ef50: 0x7f5b808043c0, args=0x7f5b8e1ab040, kwds=<optimized out>, ctxt=0x7f5b79ae0430) at ../bindings/pyroot/cppyy/CPyCppyy/src/CPPMethod.cxx:783
    #28 0x00007f5b8dec09fe in CPyCppyy::(anonymous namespace)::mp_call(CPyCppyy::CPPOverload*, PyObject*, PyObject*) (pymeth=0x7f5b8080ef40, args=0x7f5b8e1ab040, kwds=0x0) at ../bindings/pyroot/cppyy/CPyCppyy/src/CPPOverload.cxx:566
    #29 0x00007f5b8e941333 in _PyObject_MakeTpCall () at /usr/lib/libpython3.9.so.1.0
    #30 0x00007f5b8e93d218 in _PyEval_EvalFrameDefault () at /usr/lib/libpython3.9.so.1.0
    #31 0x00007f5b8e936fd9 in  () at /usr/lib/libpython3.9.so.1.0
    #32 0x00007f5b8e948b8e in _PyFunction_Vectorcall () at /usr/lib/libpython3.9.so.1.0
    #33 0x00007f5b8e93aec9 in _PyEval_EvalFrameDefault () at /usr/lib/libpython3.9.so.1.0
    #34 0x00007f5b8e94896b in _PyFunction_Vectorcall () at /usr/lib/libpython3.9.so.1.0
    #35 0x00007f5b8e93858e in _PyEval_EvalFrameDefault () at /usr/lib/libpython3.9.so.1.0
    #36 0x00007f5b8e94896b in _PyFunction_Vectorcall () at /usr/lib/libpython3.9.so.1.0
    #37 0x00007f5b8e93858e in _PyEval_EvalFrameDefault () at /usr/lib/libpython3.9.so.1.0
    #38 0x00007f5b8e94896b in _PyFunction_Vectorcall () at /usr/lib/libpython3.9.so.1.0
    #39 0x00007f5b8e95795b in  () at /usr/lib/libpython3.9.so.1.0
    #40 0x00007f5b8ea3cac6 in  () at /usr/lib/libpython3.9.so.1.0
    #41 0x00007f5b8ea17554 in  () at /usr/lib/libpython3.9.so.1.0
    #42 0x00007f5b8e62c259 in start_thread () at /usr/lib/libpthread.so.0
    #43 0x00007f5b8e7425e3 in clone () at /usr/lib/libc.so.6
```
Zeff020 pushed a commit that referenced this issue Jun 1, 2022
Most commonly seen on ppc64le. Backtrace:

===========================================================
The lines below might hint at the cause of the crash.
You may get help by asking at the ROOT forum https://root.cern/forum
Only if you are really convinced it is a bug in ROOT then please submit a
report at https://root.cern/bugs Please post the ENTIRE stack trace
from above as an attachment in addition to anything else
that might help us fixing this issue.
===========================================================
 #11 ROOT::Experimental::RColor::toHex[abi:cxx11](unsigned char) (v=<optimized out>) at /usr/include/c++/11/ext/new_allocator.h:82
 #12 0x00007fff90c220ec in ROOT::Experimental::RColor::SetRGB (this=0x7fffeadf5d10, r=<optimized out>, g=<optimized out>, b=<optimized out>) at /usr/include/c++/11/ext/new_allocator.h:89
egpbos pushed a commit that referenced this issue Oct 30, 2024
The test was dynamically allocating the array data members of the `Data` struct, but never deallocating them. This commit polishes the `Data` struct definition and ensures proper management of the data members.

The previous way of writing data to the TTree led to a bad memory access in the inlined `ReadBasicPointer` function in TStreamerInfoReadBuffer.cxx while reading the `double*` array. In particular, the issue arises when the array at the current index provided by the `TCompInfo` object is accessed and then deallocated.
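
For readers unfamiliar with the pattern, here is a minimal sketch of a struct that owns a raw array member and manages it properly. The real `Data` definition from the test is not shown in this message, so the member names `n` and `values` are assumptions chosen for illustration, and the sketch deliberately leaves out anything ROOT-specific. The two points it shows are that the pointer always starts in a known state (`nullptr` or a valid allocation) and that exactly one owner ever releases it.

```cpp
#include <cassert>
#include <utility>

// Hypothetical struct; member names are assumptions, not the test's real ones.
struct Data {
    int n = 0;                 // number of elements in `values`
    double* values = nullptr;  // owning pointer, never left uninitialized

    Data() = default;

    // Allocates `size` value-initialized (zeroed) doubles.
    explicit Data(int size) : n(size), values(new double[size]()) {}

    // delete[] on nullptr is a no-op, so a default-constructed Data is safe.
    ~Data() { delete[] values; }

    // Copying is disabled so two objects can never free the same array.
    Data(const Data&) = delete;
    Data& operator=(const Data&) = delete;

    // Moving transfers ownership and leaves the source empty.
    Data(Data&& other) noexcept : n(other.n), values(other.values) {
        other.n = 0;
        other.values = nullptr;
    }

    Data& operator=(Data&& other) noexcept {
        if (this != &other) {
            delete[] values;  // release the array currently owned
            n = other.n;
            values = other.values;
            other.n = 0;
            other.values = nullptr;
        }
        return *this;
    }
};
```

Where the I/O layer does not require a raw pointer member, `std::vector<double>` or `std::unique_ptr<double[]>` would give the same guarantees with less code.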

```
Target 0: (repro.out) stopped.
(lldb)
Process 13498 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = step in
    frame #0: 0x00000001044cf140 libRIO.so`int TStreamerInfo::ReadBuffer<char**>(this=<unavailable>, b=<unavailable>, arr=<unavailable>, compinfo=<unavailable>, first=<unavailable>, last=<unavailable>, narr=<unavailable>, eoffset=<unavailable>, arrayMode=0) at TStreamerInfoReadBuffer.cxx:923:65 [opt]
   920 	         case TStreamerInfo::kOffsetP + TStreamerInfo::kLong:   ReadBasicPointer(Long_t);  continue;
   921 	         case TStreamerInfo::kOffsetP + TStreamerInfo::kLong64: ReadBasicPointer(Long64_t);  continue;
   922 	         case TStreamerInfo::kOffsetP + TStreamerInfo::kFloat:  ReadBasicPointer(Float_t);  continue;
-> 923 	         case TStreamerInfo::kOffsetP + TStreamerInfo::kDouble: ReadBasicPointer(Double_t);  continue;
   924 	         case TStreamerInfo::kOffsetP + TStreamerInfo::kUChar:  ReadBasicPointer(UChar_t);  continue;
   925 	         case TStreamerInfo::kOffsetP + TStreamerInfo::kUShort: ReadBasicPointer(UShort_t);  continue;
   926 	         case TStreamerInfo::kOffsetP + TStreamerInfo::kUInt:   ReadBasicPointer(UInt_t);  continue;
Target 0: (repro.out) stopped.
(lldb)
Process 13498 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = step in
    frame #0: 0x00000001044cf184 libRIO.so`int TStreamerInfo::ReadBuffer<char**>(TBuffer&, char** const&, TStreamerInfo::TCompInfo* const*, int, int, int, int, int) [inlined] TBuffer::BufferSize(this=0x000060e00010ef00) const at TBuffer.h:98:41 [opt]
   95  	   TObject *GetParent()  const;
   96  	   char    *Buffer()     const { return fBuffer; }
   97  	   char    *GetCurrent() const { return fBufCur; }
-> 98  	   Int_t    BufferSize() const { return fBufSize; }
   99  	   void     DetachBuffer() { fBuffer = nullptr; }
   100 	   Int_t    Length()     const { return (Int_t)(fBufCur - fBuffer); }
   101 	   void     Expand(Int_t newsize, Bool_t copy = kTRUE);  // expand buffer to newsize
Target 0: (repro.out) stopped.
(lldb) p fBufSize
(Int_t) 32008
(lldb) s
Process 13498 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = step in
    frame #0: 0x00000001044cf194 libRIO.so`int TStreamerInfo::ReadBuffer<char**>(this=<unavailable>, b=<unavailable>, arr=<unavailable>, compinfo=<unavailable>, first=<unavailable>, last=<unavailable>, narr=<unavailable>, eoffset=<unavailable>, arrayMode=0) at TStreamerInfoReadBuffer.cxx:923:65 [opt]
   920 	         case TStreamerInfo::kOffsetP + TStreamerInfo::kLong:   ReadBasicPointer(Long_t);  continue;
   921 	         case TStreamerInfo::kOffsetP + TStreamerInfo::kLong64: ReadBasicPointer(Long64_t);  continue;
   922 	         case TStreamerInfo::kOffsetP + TStreamerInfo::kFloat:  ReadBasicPointer(Float_t);  continue;
-> 923 	         case TStreamerInfo::kOffsetP + TStreamerInfo::kDouble: ReadBasicPointer(Double_t);  continue;
   924 	         case TStreamerInfo::kOffsetP + TStreamerInfo::kUChar:  ReadBasicPointer(UChar_t);  continue;
   925 	         case TStreamerInfo::kOffsetP + TStreamerInfo::kUShort: ReadBasicPointer(UShort_t);  continue;
   926 	         case TStreamerInfo::kOffsetP + TStreamerInfo::kUInt:   ReadBasicPointer(UInt_t);  continue;
Target 0: (repro.out) stopped.
(lldb) s
Process 13498 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0xbebebebebebebeae)
    frame #0: 0x0000000107bac674 libclang_rt.asan_osx_dynamic.dylib`__asan::Allocator::Deallocate(void*, unsigned long, unsigned long, __sanitizer::BufferedStackTrace*, __asan::AllocType) + 76
libclang_rt.asan_osx_dynamic.dylib`__asan::Allocator::Deallocate:
->  0x107bac674 <+76>: casalb w8, w9, [x22]
    0x107bac678 <+80>: cmp    w8, #0x2
    0x107bac67c <+84>: b.ne   0x107bac6f4    ; <+204>
    0x107bac680 <+88>: mov    x8, #-0x100000000 ; =-4294967296
Target 0: (repro.out) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0xbebebebebebebeae)
  * frame #0: 0x0000000107bac674 libclang_rt.asan_osx_dynamic.dylib`__asan::Allocator::Deallocate(void*, unsigned long, unsigned long, __sanitizer::BufferedStackTrace*, __asan::AllocType) + 76
    frame #1: 0x0000000107c0c444 libclang_rt.asan_osx_dynamic.dylib`wrap__ZdaPv + 232
    frame #2: 0x00000001044d4a60 libRIO.so`int TStreamerInfo::ReadBuffer<char**>(this=<unavailable>, b=<unavailable>, arr=<unavailable>, compinfo=<unavailable>, first=<unavailable>, last=<unavailable>, narr=<unavailable>, eoffset=<unavailable>, arrayMode=0) at TStreamerInfoReadBuffer.cxx:923:65 [opt]
    frame #3: 0x0000000103ffc888 libRIO.so`TStreamerInfoActions::GenericReadAction(buf=0x000060e00010ef00, addr=0x0000602000056bd0, config=0x0000604000149910) at TStreamerInfoActions.cxx:195:45
    frame #4: 0x0000000103caa5ec libRIO.so`TStreamerInfoActions::TConfiguredAction::operator()(this=0x00006030001693f0, buffer=0x000060e00010ef00, object=0x0000602000056bd0) const at TStreamerInfoActions.h:123:17
    frame #5: 0x0000000103ca9ef8 libRIO.so`TBufferFile::ApplySequence(this=0x000060e00010ef00, sequence=0x000060600011ac20, obj=0x0000602000056bd0) at TBufferFile.cxx:3702:10
    frame #6: 0x00000001064bc570 libTree.so`TBranchElement::ReadLeavesMemberBranchCount(this=0x0000619000566380, b=0x000060e00010ef00) at TBranchElement.cxx:4603:6
    frame #7: 0x0000000106455ce4 libTree.so`TBranch::GetEntry(this=0x0000619000566380, entry=0, getall=0) at TBranch.cxx:1753:4
    frame #8: 0x00000001064a1764 libTree.so`TBranchElement::GetEntry(this=0x0000619000566380, entry=0, getall=0) at TBranchElement.cxx:2783:27
    frame #9: 0x000000010739915c libTreePlayer.so`ROOT::Detail::TBranchProxy::Read(this=0x00006110000c9580) at TBranchProxy.h:163:42
    frame #10: 0x0000000107649ba8 libTreePlayer.so`(anonymous namespace)::TObjectArrayReader::At(this=0x0000603000169900, proxy=0x00006110000c9580, idx=1) at TTreeReaderArray.cxx:176:22
    frame #11: 0x000000010000c2e4 repro.out`ROOT::Internal::TTreeReaderArrayBase::UntypedAt(this=0x000000016fdfe740, idx=1) const at TTreeReaderArray.h:41:62
    frame #12: 0x000000010000c200 repro.out`TTreeReaderArray<double>::At(this=0x000000016fdfe740, idx=1) at TTreeReaderArray.h:205:54
    frame #13: 0x00000001000065e0 repro.out`TTreeReaderArray<double>::operator[](this=0x000000016fdfe740, idx=1) at TTreeReaderArray.h:207:44
    frame #14: 0x0000000100007b48 repro.out`simpleTest() at repro.cpp:123:26
    frame #15: 0x0000000100007e10 repro.out`main at repro.cpp:128:5
    frame #16: 0x000000018c718274 dyld`start + 2840
```