Topic: Question!! Help!

xleannenguyen

  • Victorian
  • Adventurer
  • Posts: 8
  • Respect: 0
  • School: Killester College
Question!! Help!
« on: April 22, 2015, 04:06:42 pm »
0
When doing data transformation (Core) and you need to pick the transformation that best fits the data, is it better to pick the one with the highest r^2? What if the residual plot shows more values closer to 0 even though it doesn't have the highest coefficient of determination?

Thank you!!  :)
2015: English, Methods, Further, Psychology, Global Politics
Goal ATAR: 90+ (^:

AngelWings

  • Victorian Moderator
  • ATAR Notes Superstar
  • Posts: 2456
  • "Angel wings, please guide me..."
  • Respect: +1425
Re: Question!! Help!
« Reply #1 on: April 22, 2015, 10:52:49 pm »
+2
Hi there,
   First off, there's actually a Further help thread that is a little more active than a standalone post, so you may get quicker answers there in future. Data transformation in Core is, arguably, one of the more difficult areas of the module.

 
[When] you need to pick the best transformation that fits the data, is it better to pick the one with the highest r^2?
Generally, yes. A higher r^2 value indicates that the transformed data is more linear and 'conforms' more closely to the trend line.
e.g. Data with an r^2 value of 0.9 is less spread out and less variable than data with an r^2 value of 0.5.
To see this, picture the scatterplot for each (I'd draw them out for you, but I don't have much time on my hands). The one with an r^2 value of 0.9 will show a strong trend, where the data points (the dots) form a nearly straight line; for a positive relationship on the usual Cartesian plane, that line heads towards the top right corner. The one with an r^2 value of 0.5, on the other hand, would only form a vague line.
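
If you'd like to see the "highest r^2" comparison done by machine rather than on the CAS, here is a rough sketch in plain Python. The data values and the list of transformations are made up purely for illustration (they are not from any exam or textbook question), and the helper name `r_squared` is my own.

```python
import math

def r_squared(xs, ys):
    """Coefficient of determination for the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)

# Hypothetical data, invented for this example only.
xs = [1, 2, 5, 10, 20, 50, 100]
ys = [1.1, 1.8, 3.2, 3.9, 5.0, 6.0, 7.1]

# Candidate transformations of x (an illustrative selection, not a full list).
transformations = {
    "none": lambda x: x,
    "log10(x)": math.log10,
    "1/x": lambda x: 1 / x,
    "x^2": lambda x: x ** 2,
}

# r^2 of y against each transformed version of x.
scores = {
    name: r_squared([f(x) for x in xs], ys)
    for name, f in transformations.items()
}
best = max(scores, key=scores.get)

for name, value in scores.items():
    print(f"{name:10s} r^2 = {value:.3f}")
print("highest r^2:", best)
```

For this particular made-up data set the log transformation comes out on top, which is the one you would shortlist before checking its residual plot.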

What if the residual plot shows more values closer to 0 even though it doesn't have the highest coefficient of determination?
The residuals will always sit somewhere in the vicinity of 0. Why? Think back to how you get the residual plot in the first place... each residual is the difference between the actual value and the value predicted by the regression line (the line of best fit), right? The line is fitted to run through the middle of all the points, even if it doesn't pass through them, so some points end up above it (positive residuals) and some below it (negative residuals). Therefore, nearly every scatterplot will produce a residual plot scattered around 0. Of course, with a higher coefficient of determination (r^2 value) the residuals as a whole sit closer and closer to 0, but a fit with a lower r^2 value can still have plenty of residuals near 0; it just happens less often. For example, a single outlier can drag the regression line and pull r^2 down: the residual plot will show the outlier furthest away from 0 while the rest of the points stay close to it. So when comparing transformations, also check whether the residuals look randomly scattered around 0 rather than following a curve; a clear pattern suggests the relationship still isn't linear.
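
To make the residual idea concrete, here is a small sketch in plain Python that fits a least-squares line to some made-up data, once with raw x and once with log10(x), and prints the residuals for each. The numbers and the helper names `fit_line` and `residuals` are my own assumptions for illustration, not anything from the course materials.

```python
import math

def fit_line(xs, ys):
    """Least-squares slope and intercept for ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def residuals(xs, ys):
    """Residual = actual y minus the y predicted by the fitted line."""
    slope, intercept = fit_line(xs, ys)
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Hypothetical data, invented for this example only.
xs = [1, 2, 5, 10, 20, 50, 100]
ys = [1.1, 1.8, 3.2, 3.9, 5.0, 6.0, 7.1]

raw_res = residuals(xs, ys)                            # no transformation
log_res = residuals([math.log10(x) for x in xs], ys)   # log10(x) transformation

for label, res in (("raw x", raw_res), ("log10(x)", log_res)):
    signs = "".join("+" if r > 0 else "-" for r in res)
    print(label, [round(r, 2) for r in res],
          "| sum:", round(sum(res), 6), "| signs:", signs)
```

Both sets of residuals add up to (essentially) zero, which is why every residual plot hovers around 0. The difference is in the size and the pattern: with this made-up data the raw-x residuals should be much larger and run in blocks of the same sign (a curve), while the log10(x) residuals should stay small.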

Not sure if I made a lot of sense there. Tell me if anything there didn't make sense, because either I'm just super tired or I didn't explain it well enough in your terms.
VCE: Psych | Eng Lang | LOTE | Methods | Further | Chem                 
Uni: Bachelor of Science (Hons) - genetics
Current: working (sporadically on AN)
VTAC Info Thread