Double-post, but it makes sense to put this thread back on the radar with an update this big, so I'ma do it and let the mods tear me a new one if they don't think it's okay.
Thanks to another generous user, I now have another 6 subjects to play around and test with! They also did Japanese, which hasn't been requested yet but which I still wanted to test, so I added Japanese SL to the calculator. The results were... interesting. The good news is, the calculator is essentially operating within acceptable precision - or at least to a level I'm satisfied with. So if you're not interested in the lengthy technical discussion, you can stop here.
So, without revealing who this person is and what scores they got, they did quite well. As in, subjects with only A+ grades, quite well, though they unfortunately did not have their exact grades to test with - but this isn't that big a deal, and I'm more interested in just the letter grades for testing ANYWAY. And what I discovered is that the calculator is actually OVERestimating your score at the top end a little bit.
In my previous test, I found that even if I took the extreme marks for each letter grade, I still got the person's study score within a 20% variation of the z-score - currently, the calculator uses a 25% variation buffer for the z-score. In this case, on the lower side of the marks, they were within 20% - but on the higher side (which essentially equated to all 100%s), the calculator was over-estimating the score, and I'd have to adjust the variation by up to 50% in some cases (this was for Japanese. Sidenote: Japanese, and presumably the other LOTEs as well, is really difficult for the calculator to predict!).
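For anyone who wants to picture what a "variation buffer" on a z-score means, here's a rough sketch. It is NOT the calculator's actual code: the function names are made up, and I'm assuming the z-score maps to a study score via mean 30, standard deviation 7, capped at 50, and that the buffer is applied as a straight percentage either side of the estimated z-score.

```python
def study_score_band(z, buffer=0.25, mean=30.0, sd=7.0, cap=50.0):
    """Return the (low, high) study scores covered by a z-score buffer.

    Assumption: score = mean + sd * z, capped at `cap`. The buffer is a
    fraction of the estimated z-score (0.25 means plus or minus 25%).
    """
    # sorted() keeps low <= high even when z is negative
    low_z, high_z = sorted((z * (1 - buffer), z * (1 + buffer)))

    def to_score(value):
        return min(cap, mean + sd * value)

    return to_score(low_z), to_score(high_z)


def within_buffer(actual_score, z, buffer=0.25):
    """True if the real study score falls inside the buffered band."""
    low, high = study_score_band(z, buffer)
    return low <= actual_score <= high


# Hypothetical numbers, purely for illustration
print(study_score_band(2.0))          # band at the default 25% buffer
print(within_buffer(45, 2.0))         # a 45 sits inside that band
print(within_buffer(49, 2.0))         # a 49 does not...
print(within_buffer(49, 2.0, 0.5))    # ...unless the buffer is widened to 50%
```

The last two lines are the kind of thing I mean by having to adjust the variation at the top end: a score that falls outside the normal band only fits once the buffer is made wider.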
My hypothesis (but again, this would require more scores and more testing - I don't think I'll ever have enough to be truly certain of what's going on, but just having as many scores as I can get is honestly a real boon) is that the calculator is actually a lot more robust than I gave it credit for, and that estimating scores up to about 44 is something it can do quite well in general. It's just the true extremes where it's going to struggle, and I think depending on the subject we'll also see some major issues. In fact, I'd even go as far as to say that most subjects are fine with estimating scores up to even 48, and the only ones where we'll see some issues are subjects like Further which are particularly top-heavy in their distributions.
So, how do you tell if a subject is top-heavy or not? Well, what you want to do is look at the shape of the graph. We'll use the chemistry grade distribution as an example. Notice how GA2 (unit 4 SACs) has a median that's quite high (a B+), and a mode that's very close to the top (the mode simply being the highest dot), and the graph in general is very asymmetrical? That's top-heavy. However, in GA1 (unit 3 SACs), the median is in the middle (a C+), the mode is also right in the middle, and the curve looks nice and symmetrical - this one isn't top-heavy. So, you'd expect the calculator to under-estimate for GA2, but calculate GA1 and GA3 fine (although the extreme end of these will be overestimated, as discussed above, but only the extreme end). Since GA1 and GA3 are worth 80% of your score, versus 20% for GA2, this ends up meaning that the calculation for chemistry is actually fine and quite good, even if overestimating at the very extremes.
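If you'd rather check this with numbers than by eyeballing the graph, here's a minimal sketch of the same idea. The grade counts below are made up (they are NOT the real chemistry distribution), the function names are mine, and the "top-heavy" rule (median at least two grade steps above the middle grade, with the mode at or above the median) is an arbitrary threshold I picked for illustration.

```python
# Letter grades ordered from lowest to highest
GRADES = ["UG", "E", "E+", "D", "D+", "C", "C+", "B", "B+", "A", "A+"]


def median_index(counts):
    """Index of the grade containing the median student."""
    total = sum(counts)
    if total == 0:
        raise ValueError("no students in the distribution")
    running = 0
    for i, count in enumerate(counts):
        running += count
        if running * 2 >= total:
            return i


def mode_index(counts):
    """Index of the most common grade (the highest dot on the graph)."""
    return max(range(len(counts)), key=lambda i: counts[i])


def describe(counts, min_steps_above_middle=2):
    """Summarise a distribution and flag it as top-heavy or not."""
    if len(counts) != len(GRADES):
        raise ValueError("need one count per letter grade")
    med = median_index(counts)
    mode = mode_index(counts)
    middle = (len(GRADES) - 1) // 2  # index of the central grade, "C"
    # Assumed rule of thumb, not an official definition
    top_heavy = (med - middle) >= min_steps_above_middle and mode >= med
    return {"median": GRADES[med], "mode": GRADES[mode], "top_heavy": top_heavy}


# Made-up distributions: one skewed towards the top, one roughly symmetrical
skewed = [1, 1, 2, 3, 4, 6, 8, 12, 20, 24, 19]
symmetrical = [1, 3, 6, 10, 14, 16, 18, 14, 10, 6, 2]

print(describe(skewed))
print(describe(symmetrical))
```

With the made-up numbers, the first distribution comes out with a median of B+ and a mode near the top, so it gets flagged, while the second has its median and mode in the middle and doesn't.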