Statistical data analysis in a nutshell
The description of measurement processes relies on a basic understanding
of random processes and the statistical interpretation of experimental
data. Hence, in this section we shall briefly discuss some basic concepts
of random processes and statistical data analysis.
PROBABILITY
One of the central concepts of statistical data analysis is probability.
Probability is either interpreted as limiting relative frequency:
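\[ P(A) = \lim_{N \to \infty} \frac{N(A)}{N}, \]
where N(A) is the number of times the outcome A occurs in N repetitions of the experiment.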
This definition is the basis of so-called frequentist or classical statistics
and assumes that an experiment is at least in principle repeatable.
Or it is interpreted in a more general sense as the degree of belief
(subjective or Bayesian probability).
Ex. 1: P(the sun will rise tomorrow) = 1
Ex. 2: P(theory | data)
is understood as the (a-posteriori) probability that a certain theory is true
after a certain set of data has been measured. According to Bayes' theorem it is
given by the (a-priori) probability (degree of belief) that the theory is true
times the probability to observe this set of data if the theory is indeed true,
normalized to the total probability to observe the data.
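Written as a formula, Bayes' theorem reads
\[ P(\mathrm{theory} \mid \mathrm{data}) = \frac{P(\mathrm{data} \mid \mathrm{theory})\, P(\mathrm{theory})}{P(\mathrm{data})}. \]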
In the following we will use classical statistics unless stated otherwise.
PROBABILITY DISTRIBUTIONS
Measurements deal with random processes: either the process under consideration
is itself random (e.g. the number of decays within a sample of unstable particles in a
time interval T) or the measurement procedure is subject to random (or uncontrollable)
errors. The outcome of a measurement is hence considered a random variable
with a corresponding probability distribution.
There are two types of probability distributions, namely for:
a) Discrete random variables:
e.g. the probability to observe N events in a counting experiment is given by
a positive function P(N).
b) Continuous random variables:
For continuous random variables the probability to observe a certain
result x is exactly zero: P(x)=0.
Instead we are using here a probability density function (p.d.f.) f(x)
which quantifies the probability to observe x lying in an interval [x,x+dx]:
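\[ P(x \in [x, x+dx]) = f(x)\,dx, \qquad \int_{-\infty}^{+\infty} f(x)\,dx = 1. \]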
In case of more than one random variable we have to consider joint p.d.f.'s.
E.g. in two dimensions, the probability for x to lie in an interval [x,x+dx]
and for y to lie in an interval [y,y+dy] is given by
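\[ P(x \in [x, x+dx] \ \text{and}\ y \in [y, y+dy]) = f(x,y)\,dx\,dy. \]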
If the two random variables x and y are independent then the joint p.d.f.
factorizes: f(x,y)=g(x)h(y).
CUMULATIVE DISTRIBUTIONS
The integral of a p.d.f. f(x) up to a certain value x is called the
cumulative distribution
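\[ F(x) = \int_{-\infty}^{x} f(x')\,dx'. \]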
Hence F(-∞) = 0 and F(+∞) = 1.
The last condition reflects the fact that any probability distribution is
normalized to 1.
EXPECTATION VALUES
The expectation value E[x] (also called mean value) of a random variable x with corresponding
p.d.f. f(x) is defined as
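\[ E[x] = \int_{-\infty}^{+\infty} x\, f(x)\,dx. \]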
More generally, the n-th algebraic moment of x is defined as the following expectation value
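\[ E[x^n] = \int_{-\infty}^{+\infty} x^n f(x)\,dx. \]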
The second central moment, the variance, measures the spread of the random variable x
around its mean value.
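\[ V[x] = E\!\left[(x - E[x])^2\right] = E[x^2] - (E[x])^2. \]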
The square root of the variance is called the standard deviation.
If we consider e.g. two random variables the generalization of the variance
is the covariance
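\[ \mathrm{cov}(x,y) = E\!\left[(x - E[x])(y - E[y])\right] = E[xy] - E[x]\,E[y]. \]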
The covariance is a measure of the correlation between two random variables.
If two variables are independent then they are also uncorrelated. The converse,
however, does not hold: two variables may be uncorrelated without being independent.
With these definitions in hand we can also give the error propagation formula.
If y is a function of the random variables x=(x1,x2), then the
mean and variance of y can be expressed by the means, variances and covariance of x
as follows:
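To first order in a Taylor expansion of y around the mean values (μ1, μ2):
\[ E[y] \approx y(\mu_1, \mu_2), \]
\[ V[y] \approx \left(\frac{\partial y}{\partial x_1}\right)^{2} V[x_1]
 + \left(\frac{\partial y}{\partial x_2}\right)^{2} V[x_2]
 + 2\, \frac{\partial y}{\partial x_1}\, \frac{\partial y}{\partial x_2}\, \mathrm{cov}(x_1, x_2), \]
with the derivatives evaluated at (μ1, μ2).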
FUNCTIONS OF RANDOM VARIABLES
If x is a random variable the function a(x) is also a random variable.
If the p.d.f. for x is given by f(x) the p.d.f. g(a) for the random variable
a(x) is given by the transformation formula:
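\[ g(a) = f\big(x(a)\big)\, \left|\frac{dx}{da}\right|, \]
where x(a) is the inverse of the (assumed monotonic) mapping a(x).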
In the more general case of a mapping of an n-dimensional random vector onto
an n-dimensional random vector, the last term in the transformation formula
is replaced by the absolute value of the determinant of the Jacobian matrix.
SPECIFIC PROBABILITY DISTRIBUTIONS
a) Binomial distribution
Suppose there are two distinct outcomes of an experiment ('heads or tails')
with probabilities P(heads) = p, P(tails) = 1-p,
and we repeat the experiment N times. The probability to obtain r times
'heads' is then given by:
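\[ P(r; N, p) = \binom{N}{r}\, p^{r} (1-p)^{N-r}, \qquad r = 0, 1, \ldots, N. \]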
The mean value and variance for the binomial distribution read
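\[ E[r] = Np, \qquad V[r] = Np(1-p). \]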
b) Poissonian distribution
If in the binomial distribution the probability of a single event becomes small
and the number of trials becomes large so that μ=Np remains finite then the
binomial distribution approaches a Poisson distribution which is described by
one single parameter μ:
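\[ P(r; \mu) = \frac{\mu^{r}\, e^{-\mu}}{r!}, \qquad r = 0, 1, 2, \ldots \]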
The parameter μ (= Np) is at the same time the mean value and the variance of the
distribution; the standard deviation is hence √μ.
Example:
Radioactive Cs(137) nuclei have a half-life of about 30 years. The decay probability
per unit time for a single nucleus is then λ = ln 2 / (30 years) ≈
7.3 × 10⁻¹⁰ s⁻¹.
In e.g. 1 μg of Cs(137) we have N ≈ 10¹⁵ nuclei (= trials). Therefore
we expect μ = N λ ≈ 7.3 × 10⁵ decays/s and the number
of observed events is distributed according to a Poissonian distribution with
parameter μ. Similar arguments apply to particle scattering.
c) Gaussian distribution and Central Limit Theorem
The Gaussian distribution for a continuous random variable x is characterized
by two parameters, which represent the mean value μ and the variance σ² of the
distribution:
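\[ G(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma}\, \exp\!\left( -\frac{(x-\mu)^2}{2\sigma^2} \right). \]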
The Gaussian distribution plays a central role in statistical data
analysis due to the 'Central Limit Theorem':
If xi (i = 1, ..., n) are independent random variables with p.d.f.'s fi(xi),
mean values μi and finite variances σi²,
then the sum s = Σi xi is, for large n, a random variable with
Gaussian p.d.f. G(s; s0, σ²)
with mean s0 = Σi μi
and variance σ² = Σi σi².
Consequences:
a) In the limit of a large number of trials N (of a large mean μ) the binomial (Poisson)
distribution approaches a Gaussian distribution.
b) If a measurement is influenced by a sum of many random errors of similar size
the result of the measurement is distributed according to a Gaussian distribution.
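As a quick numerical illustration (a minimal Python sketch added here, not part of the original notes), one can sum uniform random numbers and compare the resulting mean and variance with the Central Limit Theorem prediction:

    import numpy as np

    rng = np.random.default_rng(seed=1)
    n = 12                     # number of uniform variables per sum
    trials = 100_000           # number of sums
    # each uniform variable on [0,1) has mean 1/2 and variance 1/12
    s = rng.random((trials, n)).sum(axis=1)
    print(s.mean())            # ~ n * 1/2  = 6.0
    print(s.var())             # ~ n * 1/12 = 1.0
    # a histogram of s lies very close to a Gaussian G(s; 6, 1)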
d) χ² distribution
The χ² distribution derives its importance from the fact that
a sum of squares of independent Gaussian distributed random variables,
each divided by its variance,
is χ²-distributed:
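\[ z = \sum_{i=1}^{n} \frac{(x_i - \mu_i)^2}{\sigma_i^2}
\quad\text{follows}\quad
f(z; n) = \frac{z^{\,n/2-1}\, e^{-z/2}}{2^{n/2}\, \Gamma(n/2)}, \qquad z \ge 0, \]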
where the parameter n is called the 'number of degrees of freedom'.
The function Γ(x) is the generalisation of the factorial:
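\[ \Gamma(x) = \int_{0}^{\infty} t^{\,x-1}\, e^{-t}\, dt, \qquad \Gamma(n) = (n-1)! \ \text{ for integer } n \ge 1. \]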
The mean and the variance of the χ² distribution are n and 2n, respectively.
The χ² distribution can be used in tests of goodness-of-fit
in least squares fits.
e) Breit-Wigner distribution
If a particle is unstable, i.e. its lifetime is finite, its energy (mass) x
does not have one well-defined value but is spread according to a Breit-Wigner
distribution
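\[ f(x; x_0, \Gamma) = \frac{1}{\pi}\, \frac{\Gamma/2}{(x - x_0)^2 + (\Gamma/2)^2}. \]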
Please note that the mean value of the Breit-Wigner distribution is not
defined in the strict sense, and that the variance and higher moments of the
distribution are divergent as well.
Nevertheless, the parameter x0 describes the peak position and
the parameter Γ describes the full width of the peak at half maximum (FWHM).
f) Exponential distribution
The proper decay times t for unstable particles with lifetime τ are distributed
according to the p.d.f.:
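\[ f(t; \tau) = \frac{1}{\tau}\, e^{-t/\tau}, \qquad t \ge 0. \]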
The mean value and the standard deviation are given by the lifetime parameter τ.
g) Uniform distribution
A very important p.d.f. for practical purposes is the uniform p.d.f.:
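\[ f(x; a, b) = \frac{1}{b-a} \ \text{ for } a \le x \le b, \qquad f(x; a, b) = 0 \ \text{ otherwise.} \]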
The mean value and variance for the uniform distribution are
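\[ E[x] = \frac{a+b}{2}, \qquad V[x] = \frac{(b-a)^2}{12}. \]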
A widely used application of the uniform distribution is the generation of
pseudo-random numbers according to arbitrary p.d.f.'s f(x) using Monte Carlo
techniques. One of these methods is called the transformation method and is
based on the following fact:
Starting from a random variable x with p.d.f. f(x) we define a new random
variable y=F(x), given by the cumulative distribution of f(x). Independent
of f(x) the new variable y is uniformly distributed between 0 and 1!
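Inverting this relation, x = F⁻¹(y) with y drawn uniformly in [0,1), therefore yields values x distributed according to f(x). A minimal Python sketch of this inverse-transform method (illustrative only), using the exponential p.d.f. f(t) = (1/τ) e^(-t/τ), for which F⁻¹(y) = -τ ln(1-y):

    import numpy as np

    def sample_exponential(tau, size, rng):
        # inverse-transform method: y uniform in [0,1) -> t = -tau * ln(1 - y)
        y = rng.random(size)
        return -tau * np.log(1.0 - y)

    rng = np.random.default_rng(seed=1)
    t = sample_exponential(tau=2.0, size=100_000, rng=rng)
    print(t.mean())   # close to tau = 2.0, the mean of the exponential p.d.f.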
PARAMETER ESTIMATION FROM DATA
So far we have assumed that we have a model for the random process, that is,
a p.d.f. for a random variable x depending on a parameter θ. For
a given value of the parameter θ we can then calculate the probability
to find the variable x in a given interval.
In the following we consider statistical data analysis, also called statistical
inference, that is, we have measured some data x and our aim is now to estimate
the underlying parameter θ from the measured data x.
a) Maximum Likelihood Method
We consider a p.d.f. f(x|θ) in the random variable x which depends
on the (a-priori unknown) parameter θ. We now record a set of n
measurements x=(x1, ..., xn).
As the n measurements are independent, the probability to observe exactly
this set of measurements, if the true parameter value is θ, is given by
L(θ)=f(x1|θ)...f(xn|θ) dx1...dxn.
In the following considerations the constant volume element dx1...dxn can be dropped.
The so-called likelihood L(θ), evaluated for the measured data x,
is then a function of the unknown parameter θ. Please note that
L is not a p.d.f. in θ!
The best estimate of the parameter θ is then given by the maximum
of the likelihood function (Maximum Likelihood Method), i.e. by the value
of θ for which the observed data are most probable.
To find the maximum of the likelihood we have to solve dL/dθ=0 or, often more
conveniently, d(ln L)/dθ=0, resulting in the estimate θest.
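As a simple worked example (added here for illustration), consider n measured decay times ti following the exponential p.d.f. f(t|τ) = (1/τ) e^(-t/τ). Then
\[ \ln L(\tau) = \sum_{i=1}^{n} \left( -\ln \tau - \frac{t_i}{\tau} \right),
\qquad \frac{d \ln L}{d\tau} = 0 \;\Rightarrow\; \tau_{\mathrm{est}} = \frac{1}{n} \sum_{i=1}^{n} t_i, \]
i.e. the Maximum Likelihood estimate of the lifetime is simply the arithmetic mean of the measured decay times.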
In the large sample limit (n large) the likelihood becomes approximately a Gaussian
function of θ. In this case the interval [θest-σ,θest+σ]
covers the true value of the parameter θ with 68 percent confidence.
In other words: if the experiment were repeated many times the interval
constructed from the likelihood function in this way would cover the true
value in 68 percent of the experiments.
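A minimal Python sketch of this procedure for the exponential example above (the lifetime value, sample size and variable names are illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(seed=1)
    tau_true = 2.0
    t = rng.exponential(scale=tau_true, size=1000)   # simulated decay times

    # Maximum Likelihood estimate for the exponential p.d.f.: the sample mean
    tau_est = t.mean()
    # large-sample (Gaussian) approximation of the standard deviation of the estimate
    sigma = tau_est / np.sqrt(len(t))

    print(tau_est, sigma)   # [tau_est - sigma, tau_est + sigma] covers tau_true
                            # in about 68 percent of repeated experiments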
b) Curve fitting (Least squares fits or χ² fits)
In the limit of large statistics the Maximum Likelihood Method is identical
to the method of least squares.
Suppose we measure n data points yi with errors σi
at points xi, and y is supposed to be a function
of x, y=f(x;θ), depending on m a-priori unknown parameters
θ=(θ1,...,θm). Our aim is to estimate
θ from the data.
For this purpose we build:
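\[ S^2(\theta) = \sum_{i=1}^{n} \frac{\big(y_i - f(x_i; \theta)\big)^2}{\sigma_i^2}. \]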
To find the values for θ we set:
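\[ \frac{\partial S^2}{\partial \theta_j} = 0, \qquad j = 1, \ldots, m, \]
i.e. we minimize S² with respect to the parameters θ.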
If the hypothesis y=f(x;θ) is correct and the errors are Gaussian
distributed and well estimated, the minimum value of S² is distributed according to
a χ² distribution with n-m degrees of freedom.
If the χ² value found in the fit is much larger than its expectation
value this is a hint that either the hypothesis of the fitting model is wrong
or that the errors are underestimated.
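As an illustration, a minimal Python sketch of such a fit (the data values and the straight-line model are invented for this example; scipy.optimize.curve_fit minimizes S² when the measurement errors are passed via sigma):

    import numpy as np
    from scipy.optimize import curve_fit

    def model(x, a, b):
        return a + b * x                      # hypothesis y = f(x; a, b)

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])   # measured values (invented)
    sigma = np.full_like(y, 0.2)              # measurement errors

    # least squares fit: minimizes S^2 = sum((y_i - f(x_i))^2 / sigma_i^2)
    theta, cov = curve_fit(model, x, y, sigma=sigma, absolute_sigma=True)

    chi2 = np.sum(((y - model(x, *theta)) / sigma) ** 2)
    print(theta, np.sqrt(np.diag(cov)), chi2)  # compare chi2 with n - m = 3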
HYPOTHESIS TESTING