From what I understand (and have heard), though I could be completely wrong:
Your raw score is, in effect, a ranking of how you went compared to the rest of the cohort. Regardless of how hard or easy the exam was, you'll probably end up with the same raw score. E.g. if it was really hard, you'd still have been, say, the average student, and hence would have got a 30 raw.
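To make the "ranking, not marks" idea concrete, here's a rough sketch in Python. It is not the official calculation: `raw_scores` is a made-up helper, the marks are invented, and the spread of 7 around the average of 30 is just an assumption for illustration. The point is only that making the exam 25 marks harder for everyone leaves the ranking, and so the raw scores, unchanged.

```python
from statistics import NormalDist

def raw_scores(marks, mean=30.0, sd=7.0):
    """Hypothetical helper: turn exam marks into rank-based scores.

    mean=30 matches the 'average student gets 30' idea; sd=7 is an
    assumed spread, used only for illustration.
    """
    n = len(marks)
    dist = NormalDist(mean, sd)
    scores = []
    for m in marks:
        below = sum(1 for x in marks if x < m)
        equal = sum(1 for x in marks if x == m)
        # Mid-rank percentile, always strictly between 0 and 1.
        p = (below + 0.5 * equal) / n
        scores.append(round(dist.inv_cdf(p)))
    return scores

easy_exam = [55, 62, 70, 78, 90]          # invented marks
hard_exam = [m - 25 for m in easy_exam]   # same students, harder paper

print(raw_scores(easy_exam))
print(raw_scores(hard_exam))
```

Both lists come out identical, and the middle student lands on 30 either way, because only the ordering of the marks is used.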
The actual scaling of the raw score depends on how hard the subject is in general compared to other subjects. The underlying assumption is that students are just as 'intelligent' at Specialist as they are at, say, English. You may get a 40 raw for English but only a 30 raw for spesh. This indicates that spesh should be scaled up by 10 (if you were the only one doing the subject).
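As a back-of-the-envelope version of that, here's a small Python sketch. `scaling_shift` is a hypothetical helper and the cohort numbers are invented; the real process is more involved than a simple difference of averages, so treat this as the idea only.

```python
from statistics import mean

def scaling_shift(subject_raw, other_raw):
    """Hypothetical helper: how far the same students' raw scores in
    their other subjects sit above their raw scores in this subject."""
    return mean(other_raw) - mean(subject_raw)

# The one-student case from above: 30 raw in spesh, 40 raw in English.
print(scaling_shift([30], [40]))

# A tiny invented cohort: each position is the same student in both lists.
spesh_raw = [30, 28, 35, 25]
other_raw = [40, 37, 44, 36]
print(scaling_shift(spesh_raw, other_raw))
```

The one-student case gives a shift of 10, and the small cohort gives 9.75, i.e. the subject would be nudged up by roughly that much under this simplified view.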
Basically, how much a subject gets scaled is calculated from how the Specialist cohort goes in its other subjects. What they can see from this is, for example, that 80% of the spesh students are actually in the top 30% of Methods (if they do both subjects). So Specialist is 'harder' than Methods and needs to be scaled up, so that (for example) 80% of the scaled scores in spesh are 40+ while 30% of the Methods scaled scores are 40+. It then just gets more complicated when you add multiple subjects and multiple combinations. Of course, because they do this on a large scale, all the averaging reduces the error from individuals.
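Here's a quick Python sketch of the "80% of spesh students are in the top 30% of Methods" kind of observation. Everything in it is made up for illustration: `share_in_top` is a hypothetical helper, and the scores are invented so the numbers come out the same as in the example.

```python
def share_in_top(all_scores, cohort_scores, top_fraction):
    """Hypothetical helper: fraction of cohort_scores at or above the
    cutoff that marks the top `top_fraction` of all_scores."""
    ranked = sorted(all_scores, reverse=True)
    k = max(1, round(len(ranked) * top_fraction))
    cutoff = ranked[k - 1]  # lowest score still inside the top group
    in_top = sum(1 for s in cohort_scores if s >= cutoff)
    return in_top / len(cohort_scores)

# Invented Methods raw scores for 20 students (21 through 40).
methods_all = list(range(21, 41))
# Methods raw scores of the 5 students who also do spesh.
methods_spesh = [32, 35, 37, 39, 40]

print(share_in_top(methods_all, methods_spesh, 0.30))
```

With these numbers the top 30% of Methods starts at 35, and 4 of the 5 spesh students clear it, so the share is 0.8. That gap between 80% and 30% is the sort of signal that would push spesh scores up relative to Methods.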