yeah linear approximation can be quite tough to get your head around at first, but once you understand the underlying principles, the topic becomes quite manageable.
consider the following diagram (it's not a particularly good one, but it's the only one i could find).

suppose the equation of the curve is y=f(x). you are given f(a) (the y-value at x=a) and f'(a) (the gradient at x=a). now say you want to find the value of f(a+h) (the y-value at x=a+h, where h is small). if you have the equation y=f(x), you can simply substitute x=a+h into it and obtain an exact value. however, it is not always possible to obtain the equation y=f(x) (you'd appreciate this if you do spesh; for example, if you only know the derivative, i.e. dy/dx=f'(x), you can't always integrate both sides to recover y=f(x), since only a limited number of expressions can be integrated in closed form). so sometimes, we can only find an APPROXIMATION for f(a+h).
so, what is the best approximation given the data we have at hand? (remember, we only know f(a) and f'(a)). well, how about if we draw in the tangent at x=a, and then locate the point on the tangent at x=a+h and find the y-value of THAT instead? if h is small, say 0.01, then we would be able to obtain a pretty decent approximation (the y-value of the point on the tangent at x=a+h - the approximation - would pretty much be equal to the y-value of the point on the curve at x=a+h - the real value). so the question is: what is the y-value of the point on the tangent at x=a+h?
well, we know from year 9 maths how to find the equation of the tangent at x=a:
y-f(a) = f'(a) (x-a)
[notice that we CAN find the equation of the tangent because we know f(a) and f'(a), and these are the only two pieces of information required.]
so at x=a+h:
y - f(a) = f'(a) ((a+h)-a)
y - f(a) = h*f'(a)
y = f(a) + h*f'(a)
look familiar? yes, this is the formula for linear approximation. but don't just memorise this formula without understanding what it actually represents.
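if you want to see the formula in action, here's a quick python sketch. the choice of f(x)=sqrt(x), a=4 and h=0.01 is just my own example (and linear_approx is a made-up helper name); the point is only to compare the tangent value with the real value.

```python
import math

def linear_approx(f_a, fprime_a, h):
    # y-value on the tangent at x=a+h, i.e. f(a) + h*f'(a)
    return f_a + h * fprime_a

a, h = 4.0, 0.01
f_a = math.sqrt(a)                  # f(a) = 2
fprime_a = 1 / (2 * math.sqrt(a))   # f'(x) = 1/(2*sqrt(x)), so f'(4) = 0.25

approx = linear_approx(f_a, fprime_a, h)
exact = math.sqrt(a + h)

print("approximation:", approx)
print("real value:   ", exact)
print("error:        ", abs(approx - exact))
```

the approximation comes out at about 2.0025, and the error is tiny because h is small. try making h bigger (say 1 or 2) and watch the tangent drift away from the curve.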
now, suppose we knew f'(a+h/3), f'(a+2h/3), as well as f'(a+h). can we not obtain an even better approximation? have a think about this...not really needed in methods but it is in the spesh course. (search up euler's method if interested or better yet just sit down and think about how the new data can be exploited yourself.)
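if you've had a think and want something to check against, here's one way to use the extra gradients, roughly in the spirit of euler's method: instead of one big step of size h along a single tangent, take several small steps and re-read the gradient at the start of each step. this is only a sketch under my own assumptions: i've picked f'(x)=1/x with f(1)=0 (so the real curve is y=ln(x)), h=0.3 so the difference is easy to see, and euler_steps is a made-up name.

```python
import math

def euler_steps(f_a, fprime, a, h, n):
    # walk from x=a to x=a+h in n equal steps,
    # following the tangent direction at the start of each step
    x, y = a, f_a
    step = h / n
    for _ in range(n):
        y += step * fprime(x)
        x += step
    return y

def fprime(x):
    # assumed example: dy/dx = 1/x, so the real curve is y = ln(x)
    return 1 / x

a, h = 1.0, 0.3
one_step = euler_steps(0.0, fprime, a, h, 1)     # plain linear approximation
three_steps = euler_steps(0.0, fprime, a, h, 3)  # uses f'(a), f'(a+h/3), f'(a+2h/3)
exact = math.log(a + h)

print("one step:   ", one_step, "error", abs(one_step - exact))
print("three steps:", three_steps, "error", abs(three_steps - exact))
```

with n=1 this is exactly y = f(a) + h*f'(a) from before. with three steps the error shrinks, and it keeps shrinking as you take more steps.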
new problem. suppose we knew, not only f(a) and f'(a), but also f''(a), f'''(a), f''''(a) (fourth derivative) and so on. can we exploit this new data to obtain an even better approximation? how about if we knew ALL the derivatives? the fifth derivative at x=a, the sixth, the seventh, ...the millionth! perhaps we can obtain SO accurate an approximation that the approx actually COINCIDES with the actual value. (search up taylor series if interested.)
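here's a small sketch of that idea, using the standard taylor formula f(a+h) ≈ f(a) + h*f'(a) + h^2*f''(a)/2! + h^3*f'''(a)/3! + ... the example function is my own pick: f(x)=e^x at a=0, because every derivative there equals 1, which makes the list of derivatives easy to write down. taylor_approx is a made-up helper name.

```python
import math

def taylor_approx(derivs_at_a, h):
    # derivs_at_a[k] is the k-th derivative at x=a (k=0 is just f(a))
    return sum(d * h**k / math.factorial(k) for k, d in enumerate(derivs_at_a))

h = 0.5
exact = math.exp(h)  # real value of e^x at x = 0 + h

for n in (2, 3, 5, 10):
    # using the first n entries: f(a), f'(a), ..., up to the (n-1)-th derivative
    approx = taylor_approx([1.0] * n, h)
    print(n, "terms:", approx, "error", abs(approx - exact))
```

with two terms you get the ordinary linear approximation back. each extra derivative pushes the error down, and for e^x the approximation really does close in on the actual value. (this doesn't happen for every function, which is part of why the "perhaps" above is worth thinking about.)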