Only two days until exam 1! Tomorrow is the last topic before it, and the day after I'll be writing about tips for your CAS calculator - so if anybody wants a particular topic discussed, now's your chance to ask!! (and I promise tomorrow and Wednesday will actually be before 6...)
Okay, so sampling. Yesterday we discussed probability in terms of trying to assign numbers to things, and we also discussed that if you try to repeat a probabilistic experiment, you won't get the same thing each time. This should be common sense, but if you're confused by it, just spend an afternoon flipping a coin.
So, what does this have to do with statistics? Well, as with all maths, the answer to more complicated questions (such as who voted for which political group) can often be found by first understanding much easier questions.
Let's say I wanted to know the proportion of times I bought chocolate milk in my coin example from yesterday. Over a period of 10 days, I got 7 cartons of chocolate milk. This would mean that the proportion (think percentage, but out of 1) of milk that was chocolate is 7/10. This is easy for a human to do, but what about a computer? We know that if we were to get a computer to do this, it wouldn't be able to do much harder questions unless we personally figured out all the proportions ourselves and fed them to the computer. So, instead, we'll approach the problem this way:
Just like yesterday, where we said chocolate was 1 and vanilla was 0, let's do that again. We can actually now model this as a random variable - even better, we can model it as a binomial random variable, where n is the number of times we buy milk. Here's the thing, though - we don't actually know what the probability of a success is! If our coin was fair, it would be 0.5 - but we're no longer talking about a made-up, probabilistic scenario. We're talking about a real, statistical scenario. The difference is this:
In probability, you start with a distribution, and use that to see what might actually occur in real life.
In statistics, you see what actually occurred in real life, and you try to find the distribution it came from.
Since we only have two possible outcomes (either I get chocolate or vanilla milk), we know the type of distribution - binomial. To find out what the distribution is, we need to know its two parameters - n (number of trials) and p (probability of success). In probability, this is known - and we'll often refer to the made-up, probability scenario as the population. The statistical experiment is then referred to as the sample. The aim of sampling is simple - can we predict the population parameters from the sample statistics?
Okay, so back to our example - we know the type of distribution, we know the first parameter is n=10. So, can we predict the probability of success? One guess you might make is that the probability of success is the same as the proportion, which we could then calculate like so:
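$$\hat{P} = \frac{X}{n}$$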
Now, there's a lot in this equation, believe it or not. So, some points: firstly, P-hat is capital because it's ALSO random. We call this a sampling distribution, because P-hat's value can change every time you do an experiment. For example, let's say while watching myself buy milk, you also buy milk yourself using the same method (I even give you my own coin when I'm done, to make things equally fair). You might buy chocolate milk 4/10 times instead of 7/10 - in this case, when we both calculate P-hat, even though we've used the same formula, we get a different result. The next point - the hat means that P-hat is an estimator of the population parameter p. This is how we differentiate between a population parameter and a sample statistic. Finally, note that X is a random variable, and corresponds to the number of successes we have. This goes back to remembering that the capital X, after an experiment, will itself be a number - which is a point that's often forgotten.
Now, I mentioned that P-hat is a sampling distribution - which should mean that it can also be modeled with a probability distribution, right? And it can - in fact, using some scary maths from specialist, I can tell you that P-hat has (approximately) a normal distribution, with the following parameters:
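$$\hat{P} \approx N\!\left(p,\ \frac{p(1-p)}{n}\right)$$

That is, a mean of p and a standard deviation of $\sqrt{\frac{p(1-p)}{n}}$.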
You can prove this yourself if you want - it's not actually that hard, you just gotta remember that X is binomial. If you do specialist, I recommend giving the proof a shot - it's not hard, and very much within the realm of things you could be asked.
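If you want a hint to get started: the mean and variance fall straight out of the binomial facts E(X) = np and Var(X) = np(1-p), since

$$E(\hat{P}) = E\!\left(\frac{X}{n}\right) = \frac{np}{n} = p, \qquad \text{Var}(\hat{P}) = \text{Var}\!\left(\frac{X}{n}\right) = \frac{np(1-p)}{n^2} = \frac{p(1-p)}{n}$$

(the normal shape itself is the part that needs the scary specialist maths).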
Okay, so now that we know this, let's get back to trying to predict the population parameter, p. Before, we got two estimates: 0.7 (from my experiment) and 0.4 (from your experiment). These are wildly different from each other - which is annoying. We can make them get closer by doing more experiments - let's say we do 100. This time, you might get 0.48 and I might get 0.61. Still different, but much closer. We could up this to 1000 experiments and get 0.499 and 0.564, but as you can see, we very quickly get to a point where we'd have to do way too many experiments to get these values to agree. However, they are still useful to us - we often call them point estimates, and give them the notation of little p-hat. We'll come back to them in a second.
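If you like computers, here's a rough simulation sketch in Python (purely illustrative, and definitely not examinable). Note that I've had to assume a true p of 0.5 - a fair coin - which is exactly the thing we never get to know in a real experiment:

```python
import random

def little_p_hat(p, n):
    # Simulate n milk purchases, where each purchase is chocolate
    # with probability p, and return the sample proportion.
    successes = sum(1 for _ in range(n) if random.random() < p)
    return successes / n

# Two "people" each repeat the experiment at increasing sample sizes.
# The estimates tend to cluster closer to p (and to each other) as n grows.
for n in (10, 100, 1000):
    print(n, little_p_hat(0.5, n), little_p_hat(0.5, n))
```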
So, to get some useful numbers, what we do is consider the interval of points the parameter is likely to be in. The reason we might want to do this depends on the application - for example, let's say you're digging for gold, and due to budgetary reasons, you should only dig in an area if at least 20% of it consists of gold. Now, let's say we do an experiment (that doesn't involve digging, but more complicated things), and it gives us a point estimate of little p-hat=0.4 after 25 replications. This point estimate looks like it'll be enough, but what if another experiment gave little p-hat=0.1? We need to know with more certainty that the first experiment is reliable, and we've already established that we can't trust point estimates alone.
Well, let's think of it this way instead - these values came from a continuous random variable (big P-hat). So, what if we instead consider a probability - in particular:
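$$\Pr(a \le \hat{P} \le b) = \alpha$$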
What this equation tells us is the probability (alpha) of big P-hat being in the range of a to b. So, if big P-hat is between 0.2 and 1 95% of the time, this might be enough for us to decide to dig in that area. Yes, this IS what people do in real life - countless experiments have demonstrated that this is actually not a bad way of making these kinds of decisions. However, normally, we don't put in values for a and b and then find alpha - as for why, I'm not entirely sure, probably a historical thing - although there are definitely situations where starting from alpha is more beneficial.
Okay, so, let's think about this - we know that big P-hat is normally distributed. So, to figure out what those bounds could be (and find this magical interval that I keep talking about), let's try starting with the standard normal distribution and seeing what I can get to with some algebraic changes:
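$$\Pr(-z \le Z \le z) = \alpha$$

$$\Pr\!\left(p - z\sqrt{\frac{p(1-p)}{n}} \le p + Z\sqrt{\frac{p(1-p)}{n}} \le p + z\sqrt{\frac{p(1-p)}{n}}\right) = \alpha$$

$$\Pr\!\left(p - z\sqrt{\frac{p(1-p)}{n}} \le \hat{P} \le p + z\sqrt{\frac{p(1-p)}{n}}\right) = \alpha$$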
An explanation: in the second line, I multiplied by that square-root thing to make the standard deviation the same as big P-hat's, and then added p to make the mean the same as big P-hat's, and the last line just replaces the middle bit with big P-hat. THIS IS BEYOND METHODS. DON'T WORRY IF YOU DON'T UNDERSTAND IT.
I just want to get to the point that we now have an interval in a probability - that interval is:
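$$\left(p - z\sqrt{\frac{p(1-p)}{n}},\ p + z\sqrt{\frac{p(1-p)}{n}}\right)$$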
This, mathematically, is the interval of big P-hat that gives a probability of alpha - and we call this a confidence interval. Normally, all we care about is the 95% confidence interval, which comes from when z=1.96. If you wanted to do another interval, say a 68% confidence interval, you'd have to pick the z-values from the standard normal distribution that would give this percentage. Now, we don't have a lot of precedent to go off of, but it looks like VCAA will always just tell you the z-value straight up. If you want to cover your bases, you can use the following formula in your CAS calculator to find the z-value:
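$$z = \text{invnorm}\!\left(\frac{1+\alpha}{2}\right)$$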
where invnorm is the inverse normal function (with mean 0 and standard deviation 1), and alpha is your confidence level as a decimal - for example, a 95% interval gives invnorm(0.975)=1.96.
So, back to our confidence interval. We're almost done, but there's one small problem with it - it uses the population parameter p, which is what we've been trying to estimate this whole time! This is where the point estimate comes in - substituting little p-hat for p does introduce some error, but because we're now dealing with a whole interval rather than a single number, that error is no longer as significant. So, what we do is substitute our population parameter with our sample statistic. So, if we were to calculate the confidence interval from our milk-buying situation from earlier, we'd get:
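$$\left(0.7 - 1.96\sqrt{\frac{0.7 \times 0.3}{10}},\ 0.7 + 1.96\sqrt{\frac{0.7 \times 0.3}{10}}\right) \approx (0.416,\ 0.984)$$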
Now, you should be told how many decimal places to round to - if not, just pick something reasonable. I've gone with three.
Now, this result makes a lot more sense all of a sudden - and it very obviously explains why our coin gave such a wide variety of results - from only 10 tosses, 95% of the time our proportion is going to be somewhere in this range. The range gets smaller if we do more tosses - feel free to calculate it and get a feel for it all.
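(To check your answer against: keeping little p-hat at 0.7 but upping the tosses to n=100 gives roughly (0.610, 0.790) - a much tighter range.)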
And let's also consider our mining example earlier, just so we can see how to make a conclusion from this interval. We ran our experiment 25 times and got a little p-hat of 0.4, and we want it to be above 0.2 95% of the time. So, doing our calculation gives the 95% confidence interval to be:
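$$\left(0.4 - 1.96\sqrt{\frac{0.4 \times 0.6}{25}},\ 0.4 + 1.96\sqrt{\frac{0.4 \times 0.6}{25}}\right) \approx (0.208,\ 0.592)$$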
Since numbers below 0.2 are not in this interval, we know that it's safe to dig for gold in this area. HOWEVER, if we had instead gotten 0.1 as our point estimate:
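$$\left(0.1 - 1.96\sqrt{\frac{0.1 \times 0.9}{25}},\ 0.1 + 1.96\sqrt{\frac{0.1 \times 0.9}{25}}\right) \approx (-0.018,\ 0.218)$$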
Now, we did get a negative number - leave it in there, otherwise the interval doesn't correspond to 95% probability. The reason we got a negative number is because we're basing it off a normal distribution - even though it's nonsensical for p-hat to take on a value smaller than 0 or greater than 1. It seems weird, but trust me - the maths is sound.
This time, numbers less than 0.2 are in our interval - so, it's not safe to dig for gold.
There's only one final point I want to make - you'll notice that the first gold example has a smaller range than the milk example. Hopefully this can help you appreciate just how big of a difference a few more experiments can make. That said, since n is under a square-root, you get diminishing returns as n gets massive - as an example, going from n=1 to n=4 halves the range, but to halve it again you'd have to go from n=4 all the way to n=16. The other thing you'll notice is that the second gold example has an even smaller range than the first gold example. This isn't because our point estimate is smaller - in fact, the range for p-hat=0.1 is the same as the range for p-hat=0.9. The key here is that point estimates further from 0.5 give smaller ranges than point estimates closer to 0.5.
---
Reminder, tomorrow's the last lesson before exam 1, so if you want something covered, get it in quick!