Essential tools for data analysis.

Probabilities

Sample Space

Def: A sample space \(\Omega\) is the set of all possible outcomes of a random experiment.

\(\Omega\) can be finite or infinite.

Examples:

  1. The set of all possible outcomes of a die roll: \(\Omega = \{1,2,3,4,5,6\}\).
  2. The pages of a book opened at random.
  3. Real numbers for temperature, location, time, etc.: \(\Omega \subseteq \mathbb{R}\).

Events

Def: An event \(A\) is a subset of the sample space \(\Omega\).

An event can be finite or infinite.

Examples:

  1. \( A = \{1,2,3,4\} \)
  2. “The book opens at an odd-numbered page”
  3. \(A = \{x \in \mathbb{R} : a \leq x \leq b\}\), with \(a \in \mathbb{R}, b \in \mathbb{R}\)
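
Because events are subsets of the sample space, they can be sketched with ordinary sets. The snippet below is a minimal illustration that assumes the sample space of a single six-sided die roll; the second event `B` ("the roll is odd") is a hypothetical addition used only to show set operations.

```python
# Sample space of one roll of a six-sided die (assumed for illustration).
omega = frozenset(range(1, 7))

# Event from example 1: the roll is one of 1, 2, 3, 4.
A = frozenset({1, 2, 3, 4})

# Hypothetical second event: the roll is odd.
B = frozenset({1, 3, 5})

# An event must be a subset of the sample space.
assert A <= omega and B <= omega

# Set operations on events produce new events.
union = A | B           # A or B occurs
intersection = A & B    # A and B both occur
complement = omega - A  # A does not occur

print(sorted(union), sorted(intersection), sorted(complement))
```

Using `frozenset` keeps the events immutable, which matches their role as fixed subsets of the sample space.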

Probability

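Def: A probability is the chance that an event happens. \(P\) is a function that maps each event \(A\) to a number \(P(A)\) in the interval \([0, 1]\).

For a finite sample space with equally likely outcomes, it can be viewed as the ratio of the size of the event to the size of the entire sample space: \(P(A) = |A| / |\Omega|\).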
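
The ratio view can be sketched in a few lines. This is only an illustration under two assumptions that the notes do not state explicitly: the sample space is finite, and every outcome is equally likely. The helper name `probability` is hypothetical.

```python
from fractions import Fraction


def probability(event, omega):
    """Return |event| / |omega| for a finite sample space.

    Assumes every outcome in omega is equally likely.
    """
    if not event <= omega:
        raise ValueError("event must be a subset of the sample space")
    return Fraction(len(event), len(omega))


# One roll of a six-sided die (assumed example).
omega = frozenset(range(1, 7))
A = frozenset({1, 2, 3, 4})

p_A = probability(A, omega)
print(p_A)
```

Exact fractions are used instead of floats so that the ratio stays free of rounding error.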

Kolmogorov Axioms

  1. Non-negativity: \(P(A) \geq 0\) for each event \(A\).
  2. Unit measure: \(P(\Omega) = 1 \).
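  3. \(\sigma\)-additivity: For pairwise disjoint sets (events) \(A_1, A_2, \dots\), we have \(P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)\).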

Consequences:

  • \(P(\emptyset) = 0\)
  • \(P(A \cup B) = P(A) + P(B) - P(A \cap B)\)
  • \(P(A^c) = 1 - P(A)\)
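
The consequences listed above follow from the three axioms in a few steps. The derivation below is one possible sketch, not necessarily the order used in the lecture. It first obtains additivity for two disjoint events by padding the countable sequence with empty sets, and it assumes the `amsmath` package for the `align*` environment.

```latex
\begin{align*}
% Empty set: apply additivity to A_i = \emptyset for every i.
% A finite, non-negative number equal to its own infinite sum must be 0.
P(\emptyset) &= \sum_{i=1}^{\infty} P(\emptyset)
  \quad\Longrightarrow\quad P(\emptyset) = 0 \\
% Finite additivity: for disjoint A and B, pad the sequence with empty sets.
P(A \cup B) &= P(A) + P(B) + \sum_{i=3}^{\infty} P(\emptyset) = P(A) + P(B)
  \quad \text{if } A \cap B = \emptyset \\
% Complement: \Omega = A \cup A^c is a disjoint union and P(\Omega) = 1.
1 = P(\Omega) &= P(A) + P(A^c)
  \quad\Longrightarrow\quad P(A^c) = 1 - P(A) \\
% General union: A \cup B = A \cup (B \setminus A) and
% B = (A \cap B) \cup (B \setminus A) are both disjoint unions.
P(A \cup B) &= P(A) + P(B \setminus A)
  = P(A) + P(B) - P(A \cap B)
\end{align*}
```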

Random Variables

Def: A real-valued random variable is a function \(X: \Omega \to \mathbb{R}\) that maps each outcome of a random experiment to a real number.
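
To make the "function of the outcome" idea concrete, the sketch below defines a random variable on an assumed experiment: rolling two fair six-sided dice and taking the sum. The experiment and the names `X` and `pmf` are illustrative choices, not part of the notes.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space: all ordered pairs (d1, d2) from two six-sided dice (assumed).
omega = list(product(range(1, 7), repeat=2))


def X(outcome):
    """Random variable: maps an outcome (d1, d2) to the real number d1 + d2."""
    d1, d2 = outcome
    return d1 + d2


# Distribution of X, assuming all 36 outcomes are equally likely.
counts = Counter(X(w) for w in omega)
pmf = {value: Fraction(n, len(omega)) for value, n in sorted(counts.items())}

for value, prob in pmf.items():
    print(value, prob)
```

The random variable itself is deterministic; all the randomness comes from which outcome of the experiment occurs.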

Discrete Distributions

Bernoulli distribution:

The Bernoulli distribution is the probability distribution of a random variable which takes the value \(1\) with success probability \(p\) and the value \(0\) with failure probability \(q = 1 - p\).
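
A minimal sketch of the Bernoulli distribution, assuming the usual convention that success is coded as 1 with probability `p` and failure as 0 with probability `1 - p`. The function names, the seed, and the value `p = 0.3` are hypothetical choices for illustration.

```python
import random


def bernoulli_pmf(k, p):
    """Probability that a Bernoulli(p) variable equals k."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if k == 1:
        return p
    if k == 0:
        return 1.0 - p
    return 0.0  # any other value has probability zero


def bernoulli_sample(p, rng):
    """Draw one sample: 1 with probability p, otherwise 0."""
    return 1 if rng.random() < p else 0


rng = random.Random(0)  # fixed seed so the run is reproducible
p = 0.3
samples = [bernoulli_sample(p, rng) for _ in range(10_000)]
empirical = sum(samples) / len(samples)
print(f"empirical success rate: {empirical:.3f}")
```

With many samples, the empirical success rate should land close to `p`.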

Binomial distribution:

The Binomial distribution is the discrete probability distribution of the number of successes in a sequence of \(n\) independent Bernoulli experiments, each with success probability \(p\).
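
The probability of exactly \(k\) successes in \(n\) independent trials with success probability \(p\) is \(\binom{n}{k} p^k (1-p)^{n-k}\). The sketch below evaluates this formula directly. The function name and the parameters `n = 10`, `p = 0.5` are illustrative assumptions.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): C(n, k) * p^k * (1 - p)^(n - k)."""
    if not 0 <= k <= n:
        return 0.0
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


n, p = 10, 0.5
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# The probabilities over k = 0..n should sum to 1 (up to rounding).
total = sum(pmf)
# The mean of the distribution should equal n * p.
mean = sum(k * prob for k, prob in enumerate(pmf))
print(total, mean)
```

Setting `n = 1` recovers the Bernoulli distribution as a special case.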

Continuous Distributions

Parameter Estimation:

Def: Evaluating how well the model fits the observed data.
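
One common way to quantify how well a model fits observed data is the likelihood, and one common estimator picks the parameter that maximizes it. The notes do not name a specific method, so the sketch below is an assumption: it applies maximum likelihood to a Bernoulli model, where the estimate has the closed form "fraction of successes". The data are simulated, and `true_p = 0.7` is a hypothetical value.

```python
import math
import random


def log_likelihood(p, data):
    """Log-likelihood of i.i.d. 0/1 data under Bernoulli(p), for 0 < p < 1."""
    successes = sum(data)
    failures = len(data) - successes
    return successes * math.log(p) + failures * math.log(1.0 - p)


def mle_bernoulli(data):
    """Closed-form maximum likelihood estimate: the fraction of successes."""
    return sum(data) / len(data)


rng = random.Random(42)  # fixed seed for reproducibility
true_p = 0.7             # hypothetical parameter used to simulate data
data = [1 if rng.random() < true_p else 0 for _ in range(1000)]

p_hat = mle_bernoulli(data)
print(f"estimate: {p_hat:.3f}")
print(f"log-likelihood at estimate: {log_likelihood(p_hat, data):.1f}")
print(f"log-likelihood at p = 0.5:  {log_likelihood(0.5, data):.1f}")
```

A higher log-likelihood means the model with that parameter explains the observed data better, so the estimate should score at least as well as any other candidate value.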

Decision Theory:

Motivation: Suppose we have an input value x together with a corresponding vector t of target variables, and our goal is to predict t given a new value of x. For regression problems, t will comprise continuous variables, whereas for classification problems t will represent class labels.