In my study intro, I said probability is harder than statistics since there aren't any formulas at your disposal. It turns out there are formulas, but you have to adapt them to each problem; probability isn't as plug-and-play as other branches of math. Of course, if it were, we could practically predict the future, and somebody would have figured that out by now.
It's already hard to talk about math, and even though probability is the most intuitive branch, it's almost impossible to verbalize. I pity those taking this subject in formal settings where all of the worksheets are strictly numbers with little guidance on interpreting them.
I’ve found this site that allows the student to approach probability from an interactive standpoint; funny how a form of teaching that actually engages the student is seen as unorthodox.
So I'll go through this site's path and do the best I can with the concepts. I won't get too deep into details; this is just an overview of probability.
Chance Events
To start off, probability is an attempt to measure the likeliness of an event. I was wondering if there's a difference between this and likelihood. Google says the former is general while the latter is specific. We'll come back to this later.
As we all know, the likelihood of a coin landing on either side is 50%.
What we need to consider is that this number is based on a large trial. Theoretically, if you flip a coin 100 times, each side should win 50 times. Keyword: theoretically. There are few situations where we actually conduct a trial 100 times, so I don't see how 50% is really anything more than an educated guess.
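That "large trial" idea is easy to sanity-check with a quick simulation (a rough sketch in Python; the exact fraction wobbles from run to run):

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Flip a fair coin 100,000 times and see how close we get to 50%.
flips = [random.choice("HT") for _ in range(100_000)]
heads_frac = flips.count("H") / len(flips)
print(f"Fraction of heads: {heads_frac:.3f}")  # hovers near 0.5
```

With only 100 flips, the fraction can stray noticeably from 0.5; it's the huge trial counts that pull it in, which is exactly the "theoretically" caveat above.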
Expectation
I’m assuming this is the same as expected value. The site is defining it as long-run average and probability-weighted sum. I won’t overthink these terms for now, so we’ll go with expected value.
This is our attempt to find the center of a random variable’s distribution. The site uses a 6-sided die, claiming that the expected value should be 3.5.
As of now, this sounds like a more detailed version of chance events. It’s basically saying that if you roll the die a large number of times, 3.5 should be the average of all the numbers you get.
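The die claim can be checked both ways the site describes it: as a probability-weighted sum and as a long-run average (a rough sketch; the simulated average will differ slightly between runs):

```python
import random

faces = [1, 2, 3, 4, 5, 6]

# Probability-weighted sum: each face times its 1/6 chance.
expected = sum(face * (1 / 6) for face in faces)  # 3.5

# Long-run average: roll a lot of times and take the mean.
random.seed(0)
rolls = [random.choice(faces) for _ in range(100_000)]
long_run_average = sum(rolls) / len(rolls)  # lands close to 3.5
```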
They give you a formula, but as beginners we can't really unpack it yet, so we'll come back to it.
Variance
Whereas expectation provides a measure of centrality, the variance of a random variable quantifies the spread of that random variable’s distribution. The variance is the average value of the squared difference between the random variable and its expectation.
As it says, this is basically how we measure the spread of a distribution; trying to determine how off-the-wall it gets.
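Following that definition (the average squared difference from the expectation), the fair die's variance works out like this (a quick sketch reusing the die from above):

```python
faces = [1, 2, 3, 4, 5, 6]

expectation = sum(faces) / len(faces)  # 3.5

# Average of the squared differences from the expectation.
variance = sum((f - expectation) ** 2 for f in faces) / len(faces)
# works out to 35/12, roughly 2.92
```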
Set Theory
This section didn't do the best job of explaining the concept.
All I can say is that a set is pretty much what it sounds like: a collection of distinct elements, which in probability are usually the possible outcomes of an experiment.
Counting
An ordered selection from a set is a permutation.
An unordered selection from a set is a combination.
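Python's itertools makes the distinction concrete (a small sketch with three letters):

```python
from itertools import combinations, permutations

letters = ["A", "B", "C"]

# Order matters: (A, B) and (B, A) count separately.
perms = list(permutations(letters, 2))  # 6 of them

# Order doesn't matter: (A, B) and (B, A) are the same pick.
combs = list(combinations(letters, 2))  # 3 of them
```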
Conditional Probability
We make predictions based on previous information, otherwise it is speculation (a nicer way of saying “pulling it out of your ass”).
Say I have a friend who has 3 pairs of shoes: red, blue, and green. It's one thing if I predict they'll wear red shoes tomorrow, but if I predict they'll wear them given that they're wearing red shoes today, that is conditional probability.
Evaluating a conditional probability problem requires us to shrink our sample space down to the cases where the given condition holds, so unrelated information doesn't affect it. For my shoe example, I would only look at the days that followed a red-shoe day, instead of looking back on all the days I've known the friend.
I know it’s more complicated than this, I’m just playing along.
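The shoe example can be sketched with a made-up log of days (the data here is invented purely for illustration):

```python
# Hypothetical record of which shoes the friend wore each day.
days = ["red", "red", "blue", "red", "green",
        "red", "red", "blue", "red", "red"]

# Unconditional: how often do they wear red at all?
p_red = days.count("red") / len(days)

# Conditional: shrink the sample space to only the days
# that FOLLOW a red-shoe day, then count red among those.
after_red = [days[i + 1] for i in range(len(days) - 1)
             if days[i] == "red"]
p_red_given_red = after_red.count("red") / len(after_red)
```

Same question ("will they wear red?"), but the denominator changes once we condition on what happened the day before.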
There's a 'Random Variable' section, but it's way too vague to take anything away from it.
Discrete & Continuous
A discrete variable takes separate, countable values, like the faces of a die.
A continuous variable can take any value within a range, like a height or a temperature.
Past that I don’t know what this is, I’ll just list the concepts for both:
Discrete
- Bernoulli
- Binomial
- Geometric
- Poisson
- Negative binomial
Continuous
- Uniform
- Normal
- Student T
- Chi squared
- Exponential
- F
- Gamma
- Beta
Next section is on the 'Central Limit Theorem', which, based on the site's explanation, I don't see as any different from the expected value we just went over.
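As best I can tell, the difference is that the CLT is about the shape of the averages, not just their center: take lots of samples, average each one, and those averages pile up in a bell shape around the expected value. A quick sketch with the die (the exact numbers shift between runs):

```python
import random
import statistics

random.seed(0)

# 1,000 samples, each one the average of 50 die rolls.
sample_means = [
    statistics.mean(random.randint(1, 6) for _ in range(50))
    for _ in range(1_000)
]

center = statistics.mean(sample_means)   # close to 3.5
spread = statistics.stdev(sample_means)  # far smaller than one die's spread
```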
The section after that is on 'Point Estimation', which the site illustrates by estimating pi (π). The idea is to use data to produce a single best guess at an unknown parameter. Nothing more to say about it past that.
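If I understood the site's demo right, it drops random points in a square and uses the hits inside a circle to estimate pi. Here's my attempt at a similar sketch (the estimate shifts a bit with each run):

```python
import random

random.seed(0)

# Drop random points in the unit square; the fraction landing inside
# the quarter circle (x^2 + y^2 <= 1) estimates pi / 4.
n = 200_000
inside = sum(
    1 for _ in range(n)
    if random.random() ** 2 + random.random() ** 2 <= 1
)
pi_estimate = 4 * inside / n  # a point estimate of pi
```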
'Confidence Intervals' follows, which flips that around: instead of a single guess, you get a range of values that's likely to contain the parameter.
Bootstrap
Just another method of estimation, done by resampling your own data over and over. Beyond that I'm lost.
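From what I can gather, the trick is to resample your one sample with replacement and watch how the estimate jumps around (a rough sketch on made-up numbers):

```python
import random
import statistics

random.seed(0)

# A small made-up sample of measurements.
sample = [4.1, 5.2, 6.3, 4.8, 5.9, 5.1, 6.7, 4.4, 5.5, 5.0]

# Resample WITH replacement many times, recording the mean each time.
boot_means = sorted(
    statistics.mean(random.choices(sample, k=len(sample)))
    for _ in range(5_000)
)

# The middle 95% of those means gives a rough confidence interval.
low, high = boot_means[125], boot_means[4874]
```

The appeal seems to be that you get an uncertainty range without needing any formula for the estimator's distribution.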
Introduced to another distribution called Fisher-Snedecor.
Next 3 sections are based in Bayesian Inference, including 'Bayes Theorem', 'Likelihood Function', and 'Prior to Posterior'.
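Bayes' theorem at least I can sketch numerically, using the classic medical-test setup (all the numbers below are made up for illustration):

```python
# Made-up numbers: 1% of people have a condition; the test catches it
# 95% of the time, but also false-alarms on 5% of healthy people.
p_sick = 0.01
p_pos_given_sick = 0.95
p_pos_given_healthy = 0.05

# Total probability of testing positive (law of total probability).
p_pos = (p_pos_given_sick * p_sick
         + p_pos_given_healthy * (1 - p_sick))

# Bayes' theorem: probability of being sick GIVEN a positive test.
p_sick_given_pos = p_pos_given_sick * p_sick / p_pos
# Surprisingly low: only about 16%.
```

The prior (1%) gets updated by the test result into a posterior (about 16%), which I assume is what the 'Prior to Posterior' section is about.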
Regression Analysis
Ordinary least squares – no idea what this means
Correlation – finding the linear relationship between two variables
Analysis of Variance – testing whether two or more groups have the same mean
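I looked up ordinary least squares afterward: it fits the line that minimizes the squared vertical errors. Here's a hand-rolled sketch on made-up numbers, which also computes the correlation:

```python
import math

# Tiny made-up data set: y is roughly 2x plus some noise.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 4.0, 6.2, 7.9, 10.1]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)

# Ordinary least squares: the line minimizing squared vertical errors.
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Correlation: how tightly the points hug a straight line (-1 to 1).
corr = sxy / math.sqrt(sxx * syy)
```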
What I Learned Today
Got off on a veeeeerrrrrryyyyy rough start with this. Once you pass basic coin flips and rolls of the die, this subject goes straight into Greek letters and algebraic equations. This is a true 0-100 subject.
My next post will just be me starting over on this subject. I’ll be going over the fundamental concepts in depth:
- Sets
- Random Variables
- Conditional
- Expected Value
- Bayes Theorem
- Variance
This should be enough to get me started. I've got a feeling this will be the most intense subject I go through on my data science journey, after programming.