fix: intro long
jonas-eschle committed Apr 12, 2024
1 parent 198ba2b commit c4ce783
Showing 2 changed files with 28 additions and 310 deletions.
169 changes: 14 additions & 155 deletions _website/tutorials/introduction/Introduction_long.ipynb
@@ -186,146 +186,6 @@
"data_normal.n_obs"
]
},
{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"## Data\n",
"\n",
"This component in general plays a minor role in zfit: it is mostly to provide a unified interface for data.\n",
"\n",
"Preprocessing is therefore not part of zfit and should be done beforehand. Python offers many great possibilities to do so (e.g. Pandas).\n",
"\n",
"zfit `Data` can load data from various sources, most notably from NumPy, Pandas DataFrame, TensorFlow Tensor and ROOT (using uproot). The constructors are named `from_numpy`, `from_root` etc. For convenience, a `Data` can also be converted directly to a Pandas DataFrame with `to_pandas`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"import zfit\n",
"# znp provides a subset of numpy functions with a numpy-like interface, but runs on the zfit backend (currently TF)\n",
"import zfit.z.numpy as znp\n",
"from zfit import z"
]
},
{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"A `Data` needs not only the data itself but also the observables: the human readable string identifiers of the axes (corresponding to \"columns\" of a Pandas DataFrame). It is convenient to define the `Space` not only with the observable but also with a limit: this can directly be re-used as the normalization range in the PDF.\n",
"\n",
"First, let's define our observables"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"obs = zfit.Space('obs1', (-5, 10))"
]
},
{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"This `Space` has limits. Besides handling the observables, we can also work with the limits: multiple `Spaces` can be added to provide disconnected ranges. More importantly, `Space` offers the following functionality:\n",
"- limit1d: return the lower and upper limit in the one-dimensional case (raises an error otherwise)\n",
"- rect_limits: return the n-dimensional limits\n",
"- area(): calculate the area (in one dimension, the distance between the upper and lower limit)\n",
"- inside(): return a boolean Tensor corresponding to whether the value is _inside_ the `Space`\n",
"- filter(): filter the input values to only return the one inside"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"size_normal = 10000\n",
"data_normal_np = np.random.normal(size=size_normal, scale=2)\n",
"\n",
"data_normal = zfit.Data.from_numpy(obs=obs, array=data_normal_np)"
]
},
{
"cell_type": "markdown",
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"The main functionality is\n",
"- nevents: attribute that returns the number of events in the object\n",
"- data_range: a `Space` that defines the limits of the data; if outside, the data will be cut\n",
"- n_obs: defines the number of dimensions in the dataset\n",
"- with_obs: returns a subset of the dataset with only the given obs\n",
"- weights: event based weights\n",
"\n",
"Furthermore, `value` returns a Tensor with shape `(nevents, n_obs)`.\n",
"\n",
"To retrieve values, in general `z.unstack_x(data)` should be used; this returns a single Tensor with shape `(nevents,)` or a list of tensors if `n_obs` is larger than 1."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"print(\n",
" f\"We have {data_normal.nevents} events in our dataset with the minimum of {np.min(data_normal.unstack_x())}\") # remember! The obs cut out some of the data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"pycharm": {
"name": "#%%\n"
}
},
"outputs": [],
"source": [
"data_normal.n_obs"
]
},
{
"cell_type": "markdown",
"metadata": {
@@ -362,7 +222,7 @@
"A `Parameter` (there are different kinds actually, more on that later) takes the following arguments as input:\n",
"`Parameter(human readable name, initial value[, lower limit, upper limit])` where the limits are recommended but not mandatory. Furthermore, a `step_size` can be given; it should be of the order of the expected uncertainty, and setting it can help a lot, e.g. for large yields or small values. Also, a `floating` argument is supported, indicating whether the parameter is allowed to float in the fit or not (just omitting the limits does _not_ make a parameter constant).\n",
"\n",
"Parameters have a unique name, which serves as the identifier in e.g. fit results. However, a parameter _cannot_ be retrieved by its string identifier (its name); the object itself should be used instead. Wherever a parameter maps to something, the object itself is needed, not its name."
"The name of the parameter identifies it; therefore, while multiple parameters with the same name can exist, they cannot exist inside the same model/loss/function, as they would be ambiguous."
]
},
{
@@ -376,6 +236,7 @@
"outputs": [],
"source": [
"mu = zfit.Parameter('mu', 1, -3, 3, step_size=0.2)\n",
"another_mu = zfit.Parameter('mu', 2, -3, 3, step_size=0.2)\n",
"sigma_num = zfit.Parameter('sigma42', 1, 0.1, 10, floating=False)"
]
},
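The naming rule introduced in this hunk (equal names may coexist, but not inside the same model/loss/function) can be illustrated with a minimal sketch. It is plain Python and not zfit's actual implementation; `Param`, `DuplicateNameError` and `check_unique_names` are hypothetical names used only for illustration.

```python
from collections import Counter


class DuplicateNameError(ValueError):
    """Raised when two distinct parameters in one model share a name."""


class Param:
    """Hypothetical stand-in for a fit parameter (not zfit.Parameter)."""

    def __init__(self, name, value):
        self.name = name
        self.value = value


def check_unique_names(params):
    """Return the sorted names of `params`; raise if distinct objects share a name.

    The same object appearing several times is fine, only different
    objects carrying the same name are ambiguous.
    """
    distinct = {id(p): p for p in params}.values()  # drop repeated objects
    counts = Counter(p.name for p in distinct)
    clashes = sorted(name for name, n in counts.items() if n > 1)
    if clashes:
        raise DuplicateNameError(f"ambiguous parameter names: {clashes}")
    return sorted(counts)
```

Under these assumptions, `mu` and `another_mu` from the cell above could both be created, but passing both to one model would be rejected, since a fit result keyed by name could not tell them apart.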
@@ -412,10 +273,7 @@
"name": "#%% md\n"
}
},
"source": [
"*PITFALL NOTEBOOKS: since the parameters have a unique name, a second parameter with the same name cannot be created; the behavior is undefined and therefore it raises an error.\n",
"While this does not pose a problem in a normal Python script, it does in a Jupyter-like notebook, since it is common practice to \"rerun\" a cell in an attempt to \"reset\" things. Bear in mind that this does not make sense from a logical point of view: the parameter already exists. Best practice: write a small wrapper, do not rerun the parameter creation cell, or simply rerun the whole notebook (restart kernel & run all). For further details, have a look at the discussion and arguments [here](https://github.com/zfit/zfit/issues/186)*"
]
"source": []
},
{
"cell_type": "markdown",
@@ -779,7 +637,7 @@
"\n",
" nbins = 50\n",
"\n",
" lower, upper = data.v1.limits\n",
" lower, upper = data.space.v1.limits\n",
" x = znp.linspace(lower, upper, num=1000) # np.linspace also works\n",
" y = model.pdf(x) * size_normal / nbins * data.data_range.area()\n",
" y *= scale\n",
@@ -863,7 +721,7 @@
},
"outputs": [],
"source": [
"mass_obs = zfit.Space('mass', (0, 1000))"
"mass_obs = zfit.Space('mass', 0, 1000)"
]
},
{
@@ -897,7 +755,7 @@
"source": [
"# combinatorial background\n",
"\n",
"lam = zfit.Parameter('lambda', -0.01, -0.05, -0.001)\n",
"lam = zfit.Parameter('lambda', -0.01, -0.05, -0.00001)\n",
"comb_bkg = zfit.pdf.Exponential(lam, obs=mass_obs)"
]
},
@@ -1426,11 +1284,12 @@
"outputs": [],
"source": [
"values = z.unstack_x(data)\n",
"obs_right_tail = zfit.Space('mass', (700, 1000))\n",
"obs_right_tail = zfit.Space('mass', (550, 1000))\n",
"data_tail = zfit.Data.from_tensor(obs=obs_right_tail, tensor=values)\n",
"with comb_bkg.set_norm_range(obs_right_tail):\n",
" nll_tail = zfit.loss.UnbinnedNLL(comb_bkg, data_tail)\n",
" minimizer.minimize(nll_tail)"
"comb_bkg_right = comb_bkg.to_truncated(limits=obs_right_tail) # this gets the normalization right\n",
"nll_tail = zfit.loss.UnbinnedNLL(comb_bkg_right, data_tail)\n",
"result_sideband = minimizer.minimize(nll_tail)\n",
"print(result_sideband)"
]
},
{
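The comment "this gets the normalization right" in the hunk above refers to normalizing the background shape over the sideband only. A minimal sketch of that idea in plain Python follows, assuming an exponential shape `exp(lam * x)`; it is not how zfit implements `to_truncated`, and `truncated_exp_pdf` and `integrate` are hypothetical helpers.

```python
import math


def truncated_exp_pdf(x, lam, lower, upper):
    """Exponential shape exp(lam * x), normalized to 1 on [lower, upper]."""
    if not lower <= x <= upper:
        return 0.0
    # Analytic integral of exp(lam * t) from lower to upper (requires lam != 0).
    norm = (math.exp(lam * upper) - math.exp(lam * lower)) / lam
    return math.exp(lam * x) / norm


def integrate(func, lower, upper, steps=10_000):
    """Midpoint-rule integral, only used here to check the normalization."""
    width = (upper - lower) / steps
    return sum(func(lower + (i + 0.5) * width) for i in range(steps)) * width


lam = -0.01
# PDF normalized over the full mass range integrates to far less than 1 in the tail ...
full_in_tail = integrate(lambda x: truncated_exp_pdf(x, lam, 0, 1000), 550, 1000)
# ... while the truncated PDF integrates to 1 there, as a likelihood fit needs.
tail_in_tail = integrate(lambda x: truncated_exp_pdf(x, lam, 550, 1000), 550, 1000)
```

With the values from the diff (`lam = -0.01`, sideband from 550 to 1000), the full-range PDF covers only a small fraction of its probability in the sideband, which is why a fit to sideband data needs the truncated normalization.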
@@ -1617,7 +1476,7 @@
},
"outputs": [],
"source": [
"result.hesse(method='minuit_hesse', name='hesse')"
"result.hesse(method='minuit_hesse', name='hesse') # these are the default values"
]
},
{
@@ -1654,7 +1513,7 @@
},
"outputs": [],
"source": [
"print(result.params)"
"print(result)"
]
},
{
@@ -1707,7 +1566,7 @@
},
"outputs": [],
"source": [
"print(result.params)"
"print(result)"
]
},
{