Basics

Detector & statistics in a nutshell

  1. Statistical data analysis in a nutshell
    The description of measurement processes relies on a basic understanding of random processes and the statistical interpretation of experimental data. Hence, in this section we shall briefly discuss some basic concepts of random processes and statistical data analysis.

    PROBABILITY

    One of the central concepts of statistical data analysis is probability. Probability is either interpreted as limiting relative frequency:
    P(A) = \lim_{N \to \infty} \frac{n(A)}{N} ,
    where n(A) is the number of times the outcome A occurs in N repetitions of the experiment.
    This definition is the basis of so-called frequentist or classical statistics and assumes that an experiment is at least in principle repeatable.
    Or it is interpreted in a more general sense as the degree of belief (subjective or Bayesian probability).
    Ex. 1: P(the sun will rise tomorrow) = 1
    Ex. 2:
    P(\mathrm{theory}\,|\,\mathrm{data}) \;\propto\; P(\mathrm{data}\,|\,\mathrm{theory})\; P(\mathrm{theory})
    is understood as the (a-posteriori) probability that a certain theory is true after having measured a certain set of data. According to Bayes' theorem it is proportional to the (a-priori) probability (degree of belief) that the theory is true, multiplied by the probability to observe this set of data if the theory is indeed true.
    In the following we will make use only of classical statistics if not stated otherwise.

    PROBABILITY DISTRIBUTIONS

    Measurements deal with random processes: either the process under consideration is itself random (e.g. the number of decays of a sample of unstable particles within a time interval T), or the measurement procedure contains random (or uncontrollable) errors. The outcome of a measurement is hence considered as a random variable with a corresponding probability distribution.
    There are two types of probability distributions, namely for:
    a) Discrete random variables: e.g. the probability to observe N events in a counting experiment is given by a positive function P(N).
    b) Continuous random variables: For a continuous random variable the probability to observe exactly a certain result x is zero: P(x)=0. Instead we use a probability density function (p.d.f.) f(x) which quantifies the probability to observe x lying in an interval [x,x+dx]:
    P(x \in [x, x+dx]) = f(x)\,dx
    In case of more than one random variable we have to consider joint p.d.f.'s. E.g. in two dimensions, the probability for x to lie in an interval [x,x+dx] and for y to lie in an interval [y,y+dy] is given by
    P(x \in [x, x+dx] \ \text{and}\ y \in [y, y+dy]) = f(x, y)\,dx\,dy
    If the two random variables x and y are independent then the joint p.d.f. factorizes: f(x,y)=g(x)h(y).

    CUMULATIVE DISTRIBUTIONS

    The integral of a p.d.f. f(x) up to a certain value x is called the cumulative distribution
    F(x) = \int_{-\infty}^{x} f(x')\,dx'
    Hence
    P(a \le x \le b) = F(b) - F(a), \qquad F(-\infty) = 0, \qquad F(+\infty) = 1
    The last condition reflects the fact that any probability distribution is normalized to 1.

    EXPECTATION VALUES

    The expectation value E[x] (also called mean value) of a random variable x with corresponding p.d.f. f(x) is defined as
    E[x] = \int_{-\infty}^{+\infty} x\, f(x)\,dx \equiv \mu
    More generally, the n-th algebraic moment of x is defined as the following expectation value
    E[x^n] = \int_{-\infty}^{+\infty} x^n f(x)\,dx
    The second central moment, the variance, measures the spread of the random variable x around its mean value.
    V[x] = \sigma^2 = E[(x - E[x])^2] = E[x^2] - (E[x])^2
    The square root of the variance is called the standard deviation.
    If we consider e.g. two random variables the generalization of the variance is the covariance
    \mathrm{cov}[x, y] = E[(x - \mu_x)(y - \mu_y)] = E[x\,y] - \mu_x \mu_y
    The covariance is a measure of the correlation between two random variables. If two variables are independent then they are also uncorrelated. However, two variables may be uncorrelated without being independent.
    With these definitions in hand we can also give the error propagation formula. If y is a function of random variables x=(x1,x2), then the mean and variance of y can be expressed approximately (to first order in a Taylor expansion around the mean values) by the mean values, variances and covariance of x as follows:
    E[y] \approx y(\mu_1, \mu_2)
    \sigma_y^2 \approx \left(\frac{\partial y}{\partial x_1}\right)^{\!2} \sigma_1^2 + \left(\frac{\partial y}{\partial x_2}\right)^{\!2} \sigma_2^2 + 2\,\frac{\partial y}{\partial x_1}\frac{\partial y}{\partial x_2}\,\mathrm{cov}[x_1, x_2]
    (the derivatives are evaluated at the mean values \mu_1, \mu_2; for independent variables the covariance term vanishes)
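    As an illustration (not part of the original notes), the error propagation formula can be checked against a simple Monte Carlo simulation. The choice y = x1/x2 and all numerical values below are arbitrary; numpy is assumed to be available.

        # Sketch: linear error propagation vs. Monte Carlo for y = x1 / x2
        # (x1 and x2 are assumed independent here, so the covariance term vanishes)
        import numpy as np

        mu1, sigma1 = 10.0, 0.5   # mean and standard deviation of x1
        mu2, sigma2 = 4.0, 0.2    # mean and standard deviation of x2

        # first-order error propagation: sigma_y^2 = (dy/dx1)^2 sigma1^2 + (dy/dx2)^2 sigma2^2
        dydx1 = 1.0 / mu2
        dydx2 = -mu1 / mu2**2
        sigma_y_prop = np.sqrt((dydx1 * sigma1) ** 2 + (dydx2 * sigma2) ** 2)

        # Monte Carlo: draw many (x1, x2) pairs and look at the spread of y directly
        rng = np.random.default_rng(42)
        y = rng.normal(mu1, sigma1, 100_000) / rng.normal(mu2, sigma2, 100_000)

        print("propagated sigma_y :", sigma_y_prop)
        print("Monte Carlo sigma_y:", y.std())   # agrees to first order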


    FUNCTIONS OF RANDOM VARIABLES

    If x is a random variable the function a(x) is also a random variable. If the p.d.f. for x is given by f(x) the p.d.f. g(a) for the random variable a(x) is given by the transformation formula:
    g(a) = f\big(x(a)\big)\,\left|\frac{dx}{da}\right|
    In the more general case of a mapping of an n-dimensional random vector onto an n-dimensional random vector, the factor |dx/da| in the transformation formula is replaced by the absolute value of the determinant of the Jacobian matrix.

    SPECIFIC PROBABILITY DISTRIBUTIONS

    a) Binomial distribution
    Suppose there are two distinct outcomes of an experiment ('heads or tails') with probabilities P(heads)=p and P(tails)=1-p, and we repeat the experiment N times. The probability to obtain r times 'heads' is then given by:
    P(r; N, p) = \frac{N!}{r!\,(N-r)!}\; p^r (1-p)^{N-r}
    The mean value and variance for the binomial distribution read
    E[r] = N p, \qquad V[r] = N p\,(1-p)
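    As a small numerical sketch (not part of the original notes; N and p are arbitrary values, scipy is assumed to be available), the binomial probabilities and the formulas for mean and variance can be evaluated directly:

        # Sketch: binomial distribution for N repetitions with single-trial probability p
        from scipy.stats import binom

        N, p = 10, 0.5
        for r in range(N + 1):
            print(f"P(r={r:2d}) = {binom.pmf(r, N, p):.4f}")

        print("mean    :", binom.mean(N, p), "  expected N*p       =", N * p)
        print("variance:", binom.var(N, p),  "  expected N*p*(1-p) =", N * p * (1 - p))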


    b) Poissonian distribution
    If in the binomial distribution the probability of a single event becomes small and the number of trials becomes large so that μ=Np remains finite then the binomial distribution approaches a Poisson distribution which is described by one single parameter μ:
    P(r; \mu) = \frac{\mu^r}{r!}\, e^{-\mu}
    The parameter μ(=Np) is at the same time the mean value and the variance of the distribution; the standard deviation is therefore \sqrt{\mu}.
    Example: Radioactive Cs(137) nuclei have a half-life of about 30 years. The decay probability per unit time for a single nucleus is then \lambda = \ln 2 / (30\,\mathrm{years}) \approx 7.3 \times 10^{-10}\ \mathrm{s}^{-1}. In e.g. 1 μg of Cs(137) we have N \approx 10^{15} nuclei (= trials). Therefore we expect \mu = N\lambda \approx 7.3 \times 10^{5} decays/s, and the number of observed events is distributed according to a Poissonian distribution with parameter μ. Similar arguments apply to particle scattering.
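    The limiting behaviour can also be illustrated numerically (a sketch, not part of the original notes; the values of N and p are arbitrary): for large N and small p the binomial probabilities are practically indistinguishable from the Poisson probabilities with μ = Np.

        # Sketch: binomial(N, p) approaches Poisson(mu = N*p) for large N and small p
        from scipy.stats import binom, poisson

        N, p = 100_000, 5e-5          # many trials, tiny single-trial probability
        mu = N * p                    # expected number of events (here mu = 5)

        for r in range(10):
            print(f"r={r}: binomial = {binom.pmf(r, N, p):.6f}   poisson = {poisson.pmf(r, mu):.6f}")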

    c) Gaussian distribution and Central Limit Theorem
    The Gaussian distribution for a continuous random variable x is characterized by two parameters, μ and σ², which represent the mean value and the variance of the distribution:
    G(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma}\, \exp\!\left( - \frac{(x-\mu)^2}{2\sigma^2} \right)

    The Gaussian distribution plays a central role in statistical data analysis due to the 'Central Limit Theorem': If x_i (i=1,...,n) are random variables with p.d.f.'s f_i(x_i), mean values \mu_i and finite variances \sigma_i^2, then for large n the sum s = \sum_i x_i is a random variable with Gaussian p.d.f. G(s; s_0, \sigma^2) with mean s_0 = \sum_i \mu_i and variance \sigma^2 = \sum_i \sigma_i^2 (a small numerical illustration is sketched after the list of consequences below).
    Consequences:
    a) In the limit of large N (for the Poisson distribution: large μ) the binomial (Poisson) distribution approaches a Gaussian distribution.
    b) If a measurement is influenced by a sum of many random errors of similar size the result of the measurement is distributed according to a Gaussian distribution.
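    The following sketch (not part of the original notes; the numbers are arbitrary) illustrates the Central Limit Theorem numerically: sums of uniformly distributed random numbers quickly approach a Gaussian with the predicted mean and variance.

        # Sketch: Central Limit Theorem with sums of n uniform random numbers on [0,1]
        # Prediction: mean s0 = n * 1/2, variance sigma^2 = n * 1/12
        import numpy as np

        rng = np.random.default_rng(1)
        n, n_experiments = 12, 100_000
        s = rng.uniform(0.0, 1.0, size=(n_experiments, n)).sum(axis=1)

        print("sample mean    :", s.mean(), "  predicted:", n / 2)
        print("sample variance:", s.var(),  "  predicted:", n / 12)
        # a histogram of s already looks very close to a Gaussian for n = 12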

    d) χ² distribution
    The χ² distribution derives its importance from the fact that a sum of squares of independent, Gaussian-distributed random variables, each divided by its variance,
    z = \sum_{i=1}^{n} \frac{(x_i - \mu_i)^2}{\sigma_i^2} ,
    is χ²-distributed:
    f(z; n) = \frac{1}{2^{n/2}\,\Gamma(n/2)}\; z^{n/2 - 1}\, e^{-z/2}, \qquad z \ge 0
    where the parameter n is called the 'number of degrees of freedom'. The function Γ(x) is the generalisation of the factorial:
    \Gamma(x) = \int_0^{\infty} t^{\,x-1} e^{-t}\,dt, \qquad \Gamma(n) = (n-1)! \ \text{for integer } n \ge 1
    The mean and the variance of the χ² distribution are n and 2n, respectively. The χ² distribution can be used in tests of goodness-of-fit in least squares fits.
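    A short sketch of how this is used in practice (not part of the original notes; the χ² value and the number of degrees of freedom below are made-up numbers, scipy is assumed): the probability to obtain a χ² at least as large as the observed one (the 'p-value') follows from the cumulative χ² distribution.

        # Sketch: goodness-of-fit judgement from an observed chi^2 value
        from scipy.stats import chi2

        chi2_obs, ndf = 25.3, 12       # hypothetical fit result: chi^2 value and degrees of freedom

        print("E[chi^2] =", chi2.mean(ndf))          # equals ndf
        print("V[chi^2] =", chi2.var(ndf))           # equals 2*ndf
        print("p-value  =", chi2.sf(chi2_obs, ndf))  # P(chi^2 >= observed) under the hypothesis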

    e) Breit-Wigner distribution
    If a particle is unstable, i.e. its lifetime is finite, its energy (mass) x does not have one well-defined value but is spread according to a Breit-Wigner distribution
    f(x; x_0, \Gamma) = \frac{1}{\pi}\; \frac{\Gamma/2}{(x - x_0)^2 + (\Gamma/2)^2}
    Please note that the mean value of the Breit-Wigner distribution is not defined in the strict sense, and that the variance and higher moments are divergent. Nevertheless, the parameter x0 describes the peak position and the parameter Γ the full width of the peak at half maximum (FWHM).

    f) Exponential distribution
    The proper decay times t for unstable particles with lifetime τ are distributed according to the p.d.f.:
    f(t; \tau) = \frac{1}{\tau}\, e^{-t/\tau}, \qquad t \ge 0
    The mean value and the standard deviation are both equal to the lifetime parameter τ.

    g) Uniform distribution
    A very important p.d.f. for practical purposes is the uniform p.d.f.:
    f(x; a, b) = \frac{1}{b-a} \ \text{for } a \le x \le b, \qquad f(x; a, b) = 0 \ \text{otherwise}

    The mean value and variance for the uniform distribution are
    E[x] = \frac{a+b}{2}, \qquad V[x] = \frac{(b-a)^2}{12}
    A widely used application of the uniform distribution is the generation of pseudo-random numbers according to arbitrary p.d.f.'s f(x) using Monte Carlo techniques. One of these methods is called the transformation method and is based on the following fact:
    Starting from a random variable x with p.d.f. f(x) we define a new random variable y=F(x), given by the cumulative distribution of f(x). Independent of f(x), the new variable y is uniformly distributed between 0 and 1! Conversely, if y is drawn uniformly in [0,1], then x = F^{-1}(y) is distributed according to f(x); this inversion is the essence of the transformation method, as sketched below.
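    A minimal sketch of the transformation method (not part of the original notes) for the exponential distribution f(t) = exp(-t/τ)/τ, whose cumulative distribution F(t) = 1 - exp(-t/τ) can be inverted analytically; the value of τ is an arbitrary choice.

        # Sketch: generate exponentially distributed decay times from uniform random numbers
        # via the transformation (inverse c.d.f.) method: t = -tau * ln(1 - y), y uniform in [0,1)
        import numpy as np

        tau = 2.2                            # lifetime parameter (arbitrary value)
        rng = np.random.default_rng(7)

        y = rng.uniform(0.0, 1.0, 100_000)   # uniform random numbers in [0,1)
        t = -tau * np.log(1.0 - y)           # exponentially distributed decay times

        print("sample mean   :", t.mean(), "  expected tau =", tau)
        print("sample std dev:", t.std(),  "  expected tau =", tau)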

    PARAMETER ESTIMATION FROM DATA

    So far we have assumed that we have a model for a random process, that is, a p.d.f. for a random variable x depending on a parameter θ. For a given parameter value θ we can then calculate the probability to find the variable x in a given interval. In the following we consider statistical data analysis, also called statistical inference: we have measured some data x and our aim is now to estimate the underlying parameter θ from the measured data x.

    a) Maximum Likelihood Method
    We consider a p.d.f. f(x|θ) in the random variable x which depends on the (a-priori unknown) parameter θ. We now perform a set of n independent measurements x=(x1, ..., xn). As the measurements are independent, the probability to observe exactly this set of measurements, if the true parameter value is θ, is given by f(x_1|\theta) \cdots f(x_n|\theta)\, dx_1 \cdots dx_n. In the following considerations we can drop the differentials dx_1 \cdots dx_n. The so-called likelihood function
    L(\theta) = \prod_{i=1}^{n} f(x_i|\theta)
    is then, after having measured x, a function of the unknown parameter θ. Please note that L is not a p.d.f. in θ!

    The best estimate of the parameter θ is then given by the maximum of the likelihood function (Maximum Likelihood Method), i.e. the most likely value of the parameter θ. To find the maximum we have to solve dL/dθ=0 or, often more conveniently, d(ln L)/dθ=0, resulting in the solution θest.

    In the large sample limit (n large) the likelihood is approximately a Gaussian function of θ with some width σ. In this case the interval [θest-σ,θest+σ] covers the true value of the parameter θ with 68 percent confidence. In other words: if the experiment were repeated many times, the interval constructed from the likelihood function in this way would cover the true value in 68 percent of the experiments.
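    As a sketch of the method (not part of the original notes), the lifetime τ of an exponential distribution can be estimated from simulated decay times by numerically minimizing the negative log-likelihood; for this particular p.d.f. the result coincides with the sample mean. All numerical values are arbitrary and scipy is assumed to be available.

        # Sketch: Maximum Likelihood estimate of the lifetime tau from decay times t_i
        # with p.d.f. f(t|tau) = exp(-t/tau)/tau, by minimizing -ln L(tau)
        import numpy as np
        from scipy.optimize import minimize_scalar

        rng = np.random.default_rng(3)
        tau_true = 2.2
        t = rng.exponential(tau_true, size=1000)     # simulated measurements

        def neg_log_likelihood(tau):
            # -ln L(tau) = sum_i [ ln(tau) + t_i / tau ]
            return np.sum(np.log(tau) + t / tau)

        result = minimize_scalar(neg_log_likelihood, bounds=(0.1, 10.0), method="bounded")
        print("tau_est (maximum likelihood):", result.x)
        print("sample mean of t            :", t.mean())   # analytic ML solution for this p.d.f.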


    b) Curve fitting (Least squares fits or χ² fits)
    In the limit of large statistics the Maximum Likelihood Method is identical to the method of least squares. Suppose we measure n data points y_i with errors \sigma_i at the points x_i, and y is supposed to be a function of x, y=f(x;θ), depending on m a-priori unknown parameters θ=(θ1,...,θm). Our aim is to estimate θ from the data. For this purpose we build the sum of squares
    S^2(\theta) = \sum_{i=1}^{n} \frac{\big(y_i - f(x_i;\theta)\big)^2}{\sigma_i^2}
    To find the values for θ we minimize S², i.e. we set
    \frac{\partial S^2}{\partial \theta_j} = 0, \qquad j = 1, \dots, m
    If the hypothesis (y=f(x;θ)) is correct and the errors are Gaussian distributed and well estimated, the minimum value of S² is distributed according to a χ² distribution with n-m degrees of freedom. If the χ² value found in the fit is much larger than its expectation value, this is a hint that either the hypothesis of the fitting model is wrong or that the errors are underestimated.
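    The following sketch (not part of the original notes) performs such a least-squares fit for a straight line y = a + b*x on simulated data with Gaussian errors and compares the resulting χ² with its expectation n-m; scipy.optimize.curve_fit is used for the minimization and all numerical values are arbitrary.

        # Sketch: chi^2 (least squares) fit of a straight line y = a + b*x to n data points
        import numpy as np
        from scipy.optimize import curve_fit

        def model(x, a, b):
            return a + b * x

        rng = np.random.default_rng(5)
        n, m = 20, 2                                      # number of data points, number of parameters
        x = np.linspace(0.0, 10.0, n)
        sigma = np.full(n, 0.5)                           # known measurement errors
        y = model(x, 1.0, 2.0) + rng.normal(0.0, sigma)   # simulated data around a 'true' line

        theta, cov = curve_fit(model, x, y, sigma=sigma, absolute_sigma=True)
        chi2 = np.sum(((y - model(x, *theta)) / sigma) ** 2)

        print("fitted parameters:", theta, " errors:", np.sqrt(np.diag(cov)))
        print("chi^2 =", chi2, " for n - m =", n - m, "degrees of freedom")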

    HYPOTHESIS TESTING




  2. Basic detector concepts
  3. Problems