Take this with a grain of salt, as it is derived from reading and hearsay...
From what I can gather, scaling works by comparing the relative raw scores across all of the subjects that every student within a subject's cohort (e.g. Spesh) undertakes, which allows them to come to a conclusion about the difficulty of that subject.
For example, within Spesh they notice that the cohort is considered 'smart' because many of the students who get a 30+ raw in Spesh get low-to-mid 40s in their other subjects. From this it can be inferred that Spesh must be 'difficult', and using modelling they can estimate this 'difficulty' from each individual student's data and scale the subject accordingly.
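As a toy sketch of that idea (not VCAA's actual statistical model, which is more sophisticated), you could estimate a subject's 'difficulty' by looking at the gap between each student's raw score in it and the mean of their raw scores elsewhere; the function name and numbers here are made up for illustration:

```python
def estimate_adjustment(cohort):
    """cohort: list of (spesh_raw, [raw scores in other subjects]).

    Returns the mean gap between a student's other-subject average
    and their Spesh raw score. A positive gap suggests the cohort
    scores lower in Spesh than elsewhere, i.e. the subject looks
    'harder', so raw scores would be adjusted upwards.
    """
    gaps = []
    for spesh_raw, others in cohort:
        other_mean = sum(others) / len(others)
        gaps.append(other_mean - spesh_raw)
    return sum(gaps) / len(gaps)

# e.g. students scoring ~30 raw in Spesh but low-to-mid 40s elsewhere:
cohort = [(30, [44, 42]), (32, [45, 41]), (29, [43, 40])]
print(estimate_adjustment(cohort))  # a large positive gap -> scale up
```

The point is just that the adjustment is driven by the cohort's performance in their *other* subjects, which is why who counts as 'in the cohort' matters so much below.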
(This is also why scaling differs slightly each year: the mean 'intelligence' of the cohort differs slightly from year to year.)
For the three maths subjects, however, they have never included the subjects a student undertook in Year 11 (most often Methods) in this calculation of 'difficulty'. This is crucial because most Specialist students do quite well in Methods in Year 11, so by excluding that high score they are in effect lowering the 'mean intelligence' of the Spesh cohort.
Hence, since this 'mean intelligence' was slightly lower without the Methods scores included, Spesh was scaled slightly lower than it would have been had those scores been counted.
Thus, now that they are including Methods, their modelling predicts that the 'mean intelligence' of the cohort will rise, making the subject appear more difficult (the apparent intelligence of the cohort suddenly rises while the average raw scores stay the same from one year to the next). Theoretically, based on their modelling, the mean should therefore rise from 38 scaled to 41 scaled.
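The mechanism claimed above can be sketched in the same toy terms (again, purely illustrative numbers and function names, not the real model): adding a high Year 11 Methods score to each student's list of other subjects raises the cohort's apparent strength, even though the Spesh raw scores themselves are unchanged.

```python
def cohort_strength(cohort):
    """Mean, over students, of each student's other-subject average.
    cohort: list of (spesh_raw, [raw scores in other subjects])."""
    means = [sum(others) / len(others) for _, others in cohort]
    return sum(means) / len(means)

# Same students, same Spesh raw scores, with and without a high
# Year 11 Methods score counted among their 'other subjects':
without_methods = [(30, [40, 38]), (31, [42, 39])]
with_methods = [(30, [40, 38, 45]), (31, [42, 39, 46])]

print(cohort_strength(without_methods))  # lower apparent strength
print(cohort_strength(with_methods))     # higher apparent strength
```

Since the apparent strength of the cohort goes up while the Spesh raw scores stay fixed, the subject looks 'harder' under the comparison, which is the basis for the predicted jump in the scaled mean.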