Statistical data analysis in a nutshell
The description of measurement processes relies on a basic understanding
of random processes and the statistical interpretation of experimental
data. Hence, in this section we shall briefly discuss some basic concepts
of random processes and statistical data analysis.
PROBABILITY
One of the central concepts of statistical data analysis is probability.
Probability is either interpreted as limiting relative frequency:
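\[ P(A) = \lim_{N \to \infty} \frac{N(A)}{N}, \]
where N(A) is the number of times the outcome A occurs in N repetitions of the experiment.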
This definition is the basis of so-called frequentist or classical statistics
and assumes that an experiment is at least in principle repeatable.
Or it is interpreted in a more general sense as the degree of belief
(subjective or Bayesian probability).
Ex. 1: P(the sun will rise tomorrow) = 1
Ex. 2: P(theory | data)
is understood as the (a-posteriori) probability that a certain theory is true
after a certain set of data has been measured. According to Bayes' theorem it is
given by the (a-priori) probability (degree of belief) that the theory is true
times the probability to observe this set of data if the theory is indeed true,
normalized to the total probability to observe the data.
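Written as a formula, Bayes' theorem reads
\[ P(\mathrm{theory} \mid \mathrm{data}) = \frac{P(\mathrm{data} \mid \mathrm{theory})\, P(\mathrm{theory})}{P(\mathrm{data})}. \]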
In the following we will use classical statistics unless stated otherwise.
PROBABILITY DISTRIBUTIONS
Measurements deal with random processes: either the process under consideration
is itself random (e.g. the number of decays within a sample of unstable particles in a
time interval T) or the measurement procedure is subject to random (or uncontrollable)
errors. The outcome of a measurement is hence considered a random variable
with a corresponding probability distribution.
There are two types of probability distributions, namely for:
a) Discrete random variables:
e.g. the probability to observe N events in a counting experiment is given by
a positive function P(N).
b) Continuous random variables:
For continuous random variables the probability to observe a certain
result x is exactly zero: P(x)=0.
Instead we are using here a probability density function (p.d.f.) f(x)
which quantifies the probability to observe x lying in an interval [x,x+dx]:
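\[ P(x \in [x, x+dx]) = f(x)\,dx, \qquad \int_{-\infty}^{+\infty} f(x)\,dx = 1. \]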
In case of more than one random variable we have to consider joint p.d.f.'s.
E.g. in two dimensions, the probability for x to lie in an interval [x,x+dx]
and for y to lie in an interval [y,y+dy] is given by
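\[ P(x \in [x, x+dx] \ \text{and}\ y \in [y, y+dy]) = f(x,y)\,dx\,dy. \]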
If the two random variables x and y are independent then the joint p.d.f.
factorizes: f(x,y)=g(x)h(y).
CUMULATIVE DISTRIBUTIONS
The integral of a p.d.f. f(x) up to a certain value x is called the
cumulative distribution
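\[ F(x) = \int_{-\infty}^{x} f(x')\,dx'. \]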
Hence F(-∞) = 0 and F(+∞) = 1.
The last condition reflects the fact that any probability distribution is
normalized to 1.
EXPECTATION VALUES
The expectation value E[x] (also called mean value) of a random variable x with corresponding
p.d.f. f(x) is defined as
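\[ E[x] = \int_{-\infty}^{+\infty} x\, f(x)\,dx. \]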
More generally, the n-th algebraic moment of x is defined as the following expectation value
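\[ E[x^n] = \int_{-\infty}^{+\infty} x^n f(x)\,dx. \]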
The second central moment, the variance, measures the spread of the random variable x
around its mean value.
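\[ V[x] = E\!\left[(x - E[x])^2\right] = E[x^2] - (E[x])^2. \]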
The square root of the variance is called the standard deviation.
If we consider e.g. two random variables the generalization of the variance
is the covariance
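\[ \mathrm{cov}(x,y) = E\!\left[(x - E[x])(y - E[y])\right] = E[xy] - E[x]\,E[y]. \]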
The covariance is a measure of the correlation between two random variables.
If two variables are independent then they are also uncorrelated. The converse,
however, does not hold: two variables may be uncorrelated without being independent.
With these definitions in hand we can also give the error propagation formula.
If y is a function of the random variables x=(x1,x2), then the
mean and variance of y can be expressed by the means, variances and covariance of x
as follows:
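To first order in a Taylor expansion of y around the mean values (μ1, μ2):
\[ E[y] \approx y(\mu_1, \mu_2), \]
\[ V[y] \approx \left(\frac{\partial y}{\partial x_1}\right)^{2} V[x_1]
 + \left(\frac{\partial y}{\partial x_2}\right)^{2} V[x_2]
 + 2\, \frac{\partial y}{\partial x_1}\, \frac{\partial y}{\partial x_2}\, \mathrm{cov}(x_1, x_2), \]
with the derivatives evaluated at (μ1, μ2).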
FUNCTIONS OF RANDOM VARIABLES
If x is a random variable the function a(x) is also a random variable.
If the p.d.f. for x is given by f(x) the p.d.f. g(a) for the random variable
a(x) is given by the transformation formula:
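\[ g(a) = f\big(x(a)\big)\, \left|\frac{dx}{da}\right|, \]
where x(a) is the inverse of the (assumed monotonic) mapping a(x).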
In the more general case of a mapping of an n-dimensional random vector onto
an n-dimensional random vector, the last term in the transformation formula
is replaced by the absolute value of the determinant of the Jacobian matrix.
SPECIFIC PROBABILITY DISTRIBUTIONS
a) Binomial distribution
Suppose there are two distinct outcomes of an experiment ('heads or tails')
with probabilities P(heads) = p, P(tails) = 1-p,
and we repeat the experiment N times. The probability to obtain r times
'heads' is then given by:
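\[ P(r; N, p) = \binom{N}{r}\, p^{r} (1-p)^{N-r}, \qquad r = 0, 1, \ldots, N. \]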
The mean value and variance for the binomial distribution read
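\[ E[r] = Np, \qquad V[r] = Np(1-p). \]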
b) Poissonian distribution
If in the binomial distribution the probability of a single event becomes small
and the number of trials becomes large so that μ=Np remains finite then the
binomial distribution approaches a Poisson distribution which is described by
one single parameter μ:
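\[ P(r; \mu) = \frac{\mu^{r}\, e^{-\mu}}{r!}, \qquad r = 0, 1, 2, \ldots \]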
The parameter μ (= Np) is at the same time the mean value and the variance of the
distribution; the standard deviation is hence √μ.
Example:
Radioactive Cs(137) nuclei have a half-life of about 30 years. The decay probability
per unit time for a single nucleus is then λ = ln 2 / (30 years) ≈
7.3 × 10⁻¹⁰ s⁻¹.
In e.g. 1 μg of Cs(137) we have N ≈ 10¹⁵ nuclei (= trials). Therefore
we expect μ = N λ ≈ 7.3 × 10⁵ decays/s and the number
of observed events is distributed according to a Poissonian distribution with
parameter μ. Similar arguments apply to particle scattering.
c) Gaussian distribution and Central Limit Theorem
The Gaussian distribution for a continuous random variable x is characterized
by two parameters, which represent the mean value μ and the variance σ² of the
distribution:
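\[ G(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma}\, \exp\!\left( -\frac{(x-\mu)^2}{2\sigma^2} \right). \]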
The Gaussian distribution plays a central role in statistical data
analysis due to the 'Central Limit Theorem':
If xi (i = 1, ..., n) are independent random variables with p.d.f.'s fi(xi),
mean values μi and finite variances σi²,
then the sum s = Σi xi is, for large n, a random variable with
Gaussian p.d.f. G(s; s0, σ²)
with mean s0 = Σi μi
and variance σ² = Σi σi².
Consequences:
a) In the limit of a large number of trials N (of a large mean μ) the binomial (Poisson)
distribution approaches a Gaussian distribution.
b) If a measurement is influenced by a sum of many random errors of similar size
the result of the measurement is distributed according to a Gaussian distribution.
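As a quick numerical illustration (a minimal Python sketch added here, not part of the original notes), one can sum uniform random numbers and compare the resulting mean and variance with the Central Limit Theorem prediction:

    import numpy as np

    rng = np.random.default_rng(seed=1)
    n = 12                     # number of uniform variables per sum
    trials = 100_000           # number of sums
    # each uniform variable on [0,1) has mean 1/2 and variance 1/12
    s = rng.random((trials, n)).sum(axis=1)
    print(s.mean())            # ~ n * 1/2  = 6.0
    print(s.var())             # ~ n * 1/12 = 1.0
    # a histogram of s lies very close to a Gaussian G(s; 6, 1)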
d) χ² distribution
The χ² distribution derives its importance from the fact that
a sum of squares of independent Gaussian distributed random variables,
each divided by its variance,
is χ²-distributed:
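\[ z = \sum_{i=1}^{n} \frac{(x_i - \mu_i)^2}{\sigma_i^2}
\quad\text{follows}\quad
f(z; n) = \frac{z^{\,n/2-1}\, e^{-z/2}}{2^{n/2}\, \Gamma(n/2)}, \qquad z \ge 0, \]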
where the parameter n is called the 'number of degrees of freedom'.
The function Γ(x) is the generalisation of the factorial:
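\[ \Gamma(x) = \int_{0}^{\infty} t^{\,x-1}\, e^{-t}\, dt, \qquad \Gamma(n) = (n-1)! \ \text{ for integer } n \ge 1. \]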
The mean and the variance of the χ² distribution are n and 2n, respectively.
The χ² distribution can be used in tests of goodness-of-fit
in least squares fits.
e) Breit-Wigner distribution
If a particle is unstable, i.e. its lifetime is finite, its energy (mass) x
does not have one well-defined value but is spread according to a Breit-Wigner
distribution
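\[ f(x; x_0, \Gamma) = \frac{1}{\pi}\, \frac{\Gamma/2}{(x - x_0)^2 + (\Gamma/2)^2}. \]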
Please note that the mean value of the Breit-Wigner distribution is not
defined in the strict sense, and that the variance and higher moments of the
distribution are divergent as well.
Nevertheless, the parameter x0 describes the peak position and
the parameter Γ describes the full width of the peak at half maximum (FWHM).
f) Exponential distribution
The proper decay times t for unstable particles with lifetime τ are distributed
according to the p.d.f.:
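\[ f(t; \tau) = \frac{1}{\tau}\, e^{-t/\tau}, \qquad t \ge 0. \]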
The mean value and the standard deviation are given by the lifetime parameter τ.
g) Uniform distribution
A very important p.d.f. for practical purposes is the uniform p.d.f.:
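\[ f(x; a, b) = \frac{1}{b-a} \ \text{ for } a \le x \le b, \qquad f(x; a, b) = 0 \ \text{ otherwise.} \]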
The mean value and variance for the uniform distribution are
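\[ E[x] = \frac{a+b}{2}, \qquad V[x] = \frac{(b-a)^2}{12}. \]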
A widely used application of the uniform distribution is the generation of
pseudo-random numbers according to arbitrary p.d.f.'s f(x) using Monte Carlo
techniques. One of these methods is called the transformation method and is
based on the following fact:
Starting from a random variable x with p.d.f. f(x) we define a new random
variable y=F(x), given by the cumulative distribution of f(x). Independent
of f(x) the new variable y is uniformly distributed between 0 and 1!
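Inverting this relation, x = F⁻¹(y) with y drawn uniformly in [0,1), therefore yields values x distributed according to f(x). A minimal Python sketch of this inverse-transform method (illustrative only), using the exponential p.d.f. f(t) = (1/τ) e^(-t/τ), for which F⁻¹(y) = -τ ln(1-y):

    import numpy as np

    def sample_exponential(tau, size, rng):
        # inverse-transform method: y uniform in [0,1) -> t = -tau * ln(1 - y)
        y = rng.random(size)
        return -tau * np.log(1.0 - y)

    rng = np.random.default_rng(seed=1)
    t = sample_exponential(tau=2.0, size=100_000, rng=rng)
    print(t.mean())   # close to tau = 2.0, the mean of the exponential p.d.f.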
PARAMETER ESTIMATION FROM DATA
So far we have assumed that we have a model for the random process, that is,
a p.d.f. for a random variable x depending on a parameter θ. For
a given value of the parameter θ we can then calculate the probability
to find the variable x in a given interval.
In the following we consider statistical data analysis, also called statistical
inference, that is, we have measured some data x and our aim is now to estimate
the underlying parameter θ from the measured data x.
a) Maximum Likelihood Method
We consider a p.d.f. f(x|θ) in the random variable x which depends
on the (a-priori unknown) parameter θ. We now record a set of n
measurements x=(x1, ..., xn).
As the n measurements are independent, the probability to observe exactly
this set of measurements, if the true parameter value is θ, is given by
L(θ)=f(x1|θ)...f(xn|θ) dx1...dxn.
In the following considerations the constant volume element dx1...dxn can be dropped.
The so-called likelihood L(θ), evaluated for the measured data x,
is then a function of the unknown parameter θ. Please note that
L is not a p.d.f. in θ!
The best estimate of the parameter θ is then given by the maximum
of the likelihood function (Maximum Likelihood Method), i.e. by the value
of θ for which the observed data are most probable.
To find the maximum of the likelihood we have to solve dL/dθ=0 or, often more
conveniently, d(ln L)/dθ=0, resulting in the estimate θest.
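As a simple worked example (added here for illustration), consider n measured decay times ti following the exponential p.d.f. f(t|τ) = (1/τ) e^(-t/τ). Then
\[ \ln L(\tau) = \sum_{i=1}^{n} \left( -\ln \tau - \frac{t_i}{\tau} \right),
\qquad \frac{d \ln L}{d\tau} = 0 \;\Rightarrow\; \tau_{\mathrm{est}} = \frac{1}{n} \sum_{i=1}^{n} t_i, \]
i.e. the Maximum Likelihood estimate of the lifetime is simply the arithmetic mean of the measured decay times.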
In the large sample limit (n large) the likelihood becomes approximately a Gaussian
function of θ. In this case the interval [θest-σ,θest+σ]
covers the true value of the parameter θ with 68 percent confidence.
In other words: if the experiment were repeated many times the interval
constructed from the likelihood function in this way would cover the true
value in 68 percent of the experiments.
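A minimal Python sketch of this procedure for the exponential example above (the lifetime value, sample size and variable names are illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(seed=1)
    tau_true = 2.0
    t = rng.exponential(scale=tau_true, size=1000)   # simulated decay times

    # Maximum Likelihood estimate for the exponential p.d.f.: the sample mean
    tau_est = t.mean()
    # large-sample (Gaussian) approximation of the standard deviation of the estimate
    sigma = tau_est / np.sqrt(len(t))

    print(tau_est, sigma)   # [tau_est - sigma, tau_est + sigma] covers tau_true
                            # in about 68 percent of repeated experiments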
b) Curve fitting (Least squares fits or χ² fits)
In the limit of large statistics the Maximum Likelihood Method is identical
to the method of least squares.
Suppose we measure n data points yi with errors σi
at points xi, and y is supposed to be a function
of x, y=f(x;θ), depending on m a-priori unknown parameters
θ=(θ1,...,θm). Our aim is to estimate
θ from the data.
For this purpose we build:
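\[ S^2(\theta) = \sum_{i=1}^{n} \frac{\big(y_i - f(x_i; \theta)\big)^2}{\sigma_i^2}. \]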
To find the values for θ we set:
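\[ \frac{\partial S^2}{\partial \theta_j} = 0, \qquad j = 1, \ldots, m, \]
i.e. we minimize S² with respect to the parameters θ.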
If the hypothesis y=f(x;θ) is correct and the errors are Gaussian
distributed and well estimated, the minimum value of S² is distributed according to
a χ² distribution with n-m degrees of freedom.
If the χ² value found in the fit is much larger than its expectation
value this is a hint that either the hypothesis of the fitting model is wrong
or that the errors are underestimated.
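As an illustration, a minimal Python sketch of such a fit (the data values and the straight-line model are invented for this example; scipy.optimize.curve_fit minimizes S² when the measurement errors are passed via sigma):

    import numpy as np
    from scipy.optimize import curve_fit

    def model(x, a, b):
        return a + b * x                      # hypothesis y = f(x; a, b)

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])   # measured values (invented)
    sigma = np.full_like(y, 0.2)              # measurement errors

    # least squares fit: minimizes S^2 = sum((y_i - f(x_i))^2 / sigma_i^2)
    theta, cov = curve_fit(model, x, y, sigma=sigma, absolute_sigma=True)

    chi2 = np.sum(((y - model(x, *theta)) / sigma) ** 2)
    print(theta, np.sqrt(np.diag(cov)), chi2)  # compare chi2 with n - m = 3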
HYPOTHESIS TESTING