Machine Learning - Basic Statistics
Essential tools for data analysis.
Probabilities
Sample Space
Def: A sample space \(\Omega\) is the set of all possible outcomes of a random experiment.
\(\Omega\) can be finite or infinite.
Examples:
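- The set of all possible outcomes of a dice roll: \(\Omega = \{1,2,3,4,5,6\}\).
- Pages of a book opened randomly: \(\Omega = \{1,2,\dots,N\}\) for a book with \(N\) pages.
- Real numbers for temperature, location, time, etc.: \(\Omega \subseteq \mathbb{R}\).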
Events
Def: An event \(A\) is a subset of the sample space \(\Omega\).
\(A\) can be finite or infinite.
Examples:
- \( A = \{1,2,3,4\} \)
- “Book open at an odd number page”
- \(A = \{x \in \mathbb{R} : a \leq x \leq b\}\), with \(a \in \mathbb{R}, b \in \mathbb{R}\)
Probability
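Def: The probability \(P(A)\) is the chance that the event \(A\) happens. \(P\) is a function that maps each event into the interval \([0, 1]\).
When all outcomes are equally likely, it can be viewed as the ratio of the size of the event \(A\) to the size of the entire sample space \(\Omega\): \(P(A) = |A| / |\Omega|\).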
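As a minimal sketch of this ratio view, assume a fair six-sided die, so that every outcome is equally likely. The names `omega`, `prob` and `A` below are illustrative choices, not part of the original notes.

```python
from fractions import Fraction

# Sample space of a six-sided die (assumption: all outcomes equally likely).
omega = frozenset({1, 2, 3, 4, 5, 6})


def prob(event, space=omega):
    """Return P(event) = |event| / |space| for equally likely outcomes."""
    event = frozenset(event)
    if not event <= space:
        raise ValueError("an event must be a subset of the sample space")
    return Fraction(len(event), len(space))


A = {1, 2, 3, 4}   # the event "the roll is at most 4"
p_A = prob(A)
print(p_A)         # 2/3
```

Using `Fraction` keeps the result exact, which makes it easy to compare against hand calculations.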
Kolmogorov Axioms
- Non-negativity: \(P(A) \geq 0\) for each event \(A\).
- Unit measure: \(P(\Omega) = 1 \).
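- \(\sigma\)-additivity: For disjoint sets (events) \(A_1, A_2, \dots\), we have \(P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)\).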
Consequences:
- \(P(\emptyset) = 0\)
- \(P(A \cup B) = P(A) + P(B) - P(A \cap B)\)
- \(P(A^c) = 1 - P(A)\)
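The consequences above can be checked numerically on a small example. The sketch below assumes a fair six-sided die with equally likely outcomes; the events `A` and `B` and the helper `prob` are hypothetical names chosen for illustration.

```python
from fractions import Fraction

# Assumption: a fair six-sided die, all outcomes equally likely.
omega = frozenset(range(1, 7))


def prob(event):
    """P(event) as the fraction of outcomes in omega that belong to the event."""
    return Fraction(len(frozenset(event) & omega), len(omega))


A = frozenset({1, 2, 3, 4})   # "roll is at most 4"
B = frozenset({2, 4, 6})      # "roll is even"

p_empty = prob(frozenset())                        # P(empty set)
p_union = prob(A | B)                              # P(A or B)
p_incl_excl = prob(A) + prob(B) - prob(A & B)      # inclusion-exclusion
p_complement = prob(omega - A)                     # P(A^c)

print(p_empty, p_union, p_incl_excl, p_complement)
```

Here the union and the inclusion-exclusion expression agree, and the complement equals \(1 - P(A)\), matching the three consequences.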
Random Variables
Def: A real-valued random variable \(X: \Omega \to \mathbb{R}\) is a function of the outcome of a randomized experiment.
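To make the "function of the outcome" idea concrete, here is a small sketch. It assumes an experiment of two fair coin flips and defines a random variable that counts heads. The experiment and the names `omega`, `X` and `pmf` are illustrative assumptions.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Assumed experiment: two fair coin flips, so each of the 4 outcomes is equally likely.
omega = list(product("HT", repeat=2))   # [('H','H'), ('H','T'), ('T','H'), ('T','T')]


def X(outcome):
    """Random variable: maps an outcome to the number of heads it contains."""
    return outcome.count("H")


# Distribution of X: count how many outcomes map to each value.
counts = Counter(X(w) for w in omega)
pmf = {x: Fraction(c, len(omega)) for x, c in counts.items()}

print(sorted(pmf.items()))
```

The random variable itself is deterministic; the randomness comes entirely from which outcome of the experiment occurs.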
Discrete Distributions
Bernoulli distribution:
The Bernoulli distribution is the probability distribution of a random variable which takes the value \(1\) with success probability \(p\) and the value \(0\) with failure probability \(q = 1 - p\).
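A minimal sketch of a Bernoulli variable, using the usual convention that success is coded as 1 (with probability `p`) and failure as 0 (with probability `1 - p`). The function names, the value `p = 0.3` and the fixed seed are assumptions made for illustration.

```python
import random


def bernoulli_pmf(x, p):
    """Return P(X = x) for a Bernoulli(p) variable; x must be 0 or 1."""
    if x not in (0, 1):
        raise ValueError("a Bernoulli variable only takes the values 0 and 1")
    return p if x == 1 else 1 - p


def bernoulli_sample(p, rng):
    """Draw one sample: 1 with probability p, otherwise 0."""
    return 1 if rng.random() < p else 0


rng = random.Random(0)   # fixed seed so the run is reproducible
p = 0.3                  # illustrative success probability
samples = [bernoulli_sample(p, rng) for _ in range(10_000)]
empirical = sum(samples) / len(samples)

print(bernoulli_pmf(1, p), bernoulli_pmf(0, p), empirical)
```

With many samples, the fraction of ones should land close to `p`.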
Binomial distribution:
The Binomial distribution is the discrete probability distribution of the number of successes in a sequence of \(n\) independent Bernoulli experiments, each with the same success probability \(p\).
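For \(n\) independent trials with success probability \(p\), the standard probability mass function is \(P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}\). The sketch below evaluates it directly; the function name `binomial_pmf` and the values `n = 10`, `p = 0.5` are illustrative assumptions.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Return P(X = k) for X ~ Binomial(n, p)."""
    if not 0 <= k <= n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 10, 0.5   # illustrative values: 10 fair coin flips
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

print(sum(pmf))                # the probabilities sum to 1 (up to rounding)
print(binomial_pmf(5, n, p))   # probability of exactly 5 successes
```

Setting `n = 1` recovers the Bernoulli distribution as a special case.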
Continuous Distributions
Parameter Estimation:
Def: Evaluating how well the model fits the observed data.
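One common way to do this, not spelled out in these notes, is maximum likelihood estimation: score each candidate parameter by the log-likelihood of the data and keep the best one. The sketch below assumes a Bernoulli model and a small made-up data set (`data` is purely illustrative); for this model the maximizer is known to be the sample mean.

```python
import math


def log_likelihood(p, data):
    """Log-likelihood of Bernoulli(p) for a list of 0/1 observations."""
    k = sum(data)    # number of successes
    n = len(data)    # number of trials
    return k * math.log(p) + (n - k) * math.log(1 - p)


# Made-up observations, for illustration only.
data = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]

# Closed-form maximum likelihood estimate for a Bernoulli model: the sample mean.
p_hat = sum(data) / len(data)

# Sanity check on a grid: no candidate fits the data better than p_hat.
grid = [i / 100 for i in range(1, 100)]
best_on_grid = max(grid, key=lambda p: log_likelihood(p, data))

print(p_hat, best_on_grid)
```

The grid search is only there to show that the closed-form estimate really does maximize the fit measure; in practice the closed form is used directly.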
Decision Theory:
Motivation: Suppose we have an input value x together with a corresponding vector t of target variables, and our goal is to predict t given a new value of x. For regression problems, t will comprise continuous variables, whereas for classification problems t will represent class labels.