Linear vs nonlinear regression: some things to think about, part 2.
In Part 1, I mentioned my early confusion about what goes on behind the scenes in nonlinear versus linear regression modeling. How are those parameter estimates calculated in the first place?
Of all the models you might apply to a given dataset, linear models have the most historical mileage. They have been used since the dawn of statistics and employ rather basic mathematical/statistical assumptions about how variables relate to one another. As such, there is a wealth of statistical theory pertaining to linear modeling, and the parameter estimates for linear models typically have closed-form solutions (in other words, linear models are mathematically tractable). When JMP or any other statistical software package fits a linear model to your data, it is using matrix algebra (a.k.a. linear algebra) to come up with a solution.
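To see what that tractability looks like, here is a minimal sketch in Python with NumPy (tools this post does not otherwise use; JMP handles all of this internally): the least-squares estimates of a simple linear model, solved in closed form via the normal equations on simulated data.

```python
# Minimal sketch: closed-form least-squares estimates for a linear model,
# obtained from the normal equations (X'X) b = X'y. Data are simulated
# purely for illustration.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 1.5 * x + rng.normal(scale=1.0, size=50)   # true intercept 2.0, slope 1.5

X = np.column_stack([np.ones_like(x), x])            # design matrix with an intercept column
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)         # one matrix-algebra step, no iteration
print(beta_hat)                                      # roughly [2.0, 1.5]
```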
The price you pay for using a linear model is that it may not adequately capture the relationships between your variables and, as such, may be of limited use in the real world.
For nonlinear models, on the other hand, it will almost always be necessary to rely on numerical methods that use computational algorithms to come up with model parameter estimates. Very rarely will a closed-form solution exist for your model. Assuming you have selected an appropriate nonlinear model for your data, you will often have more choices to make in terms of how the parameters for this nonlinear model are estimated. And, as in life in general, you could make a strong argument that too many choices often lead to confusion.
When a statistical software package fits a nonlinear model to your data (or if you are writing custom code), one of two methods will usually be used to come up with parameter estimates:

A. Minimizing the error sum of squares (least squares).

B. Maximizing the likelihood function (maximum likelihood).
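As a concrete (and hedged) illustration of A., here is a small Python/SciPy sketch that fits a hypothetical exponential model by iteratively minimizing the sum of squared residuals. The model and data are made up for illustration, and this is not a claim about how JMP is implemented internally.

```python
# Sketch of nonlinear least squares: no closed-form solution exists for this
# exponential model, so curve_fit iteratively minimizes the sum of squared
# residuals starting from an initial guess p0.
import numpy as np
from scipy.optimize import curve_fit

def model(x, a, b):
    return a * np.exp(b * x)                          # hypothetical nonlinear model

rng = np.random.default_rng(1)
x = np.linspace(0, 4, 40)
y = model(x, 2.5, 0.6) + rng.normal(scale=0.5, size=x.size)

params, _ = curve_fit(model, x, y, p0=[1.0, 0.1])     # p0 = initial parameter estimates
print(params)                                         # roughly [2.5, 0.6]
```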
To complicate matters further, it is often not a simple matter of merely minimizing A. or maximizing B. to come up with parameter estimates. Depending on the functional form of your model, it often makes computational sense to work with partial derivatives (the normal equations) or to apply a logarithmic transformation (for example, maximizing the log-likelihood rather than the likelihood itself). If you are relying on software, these decisions are made for you. If you are writing custom code, this is where experience and a basic understanding of statistical computation are your friends.
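For B., the logarithmic transformation is easiest to see in code. The sketch below (again Python/SciPy, purely illustrative) estimates the mean and standard deviation of normally distributed data by minimizing the negative log-likelihood: summing log-densities is numerically far better behaved than multiplying hundreds of tiny probabilities.

```python
# Sketch of maximum likelihood via the log transformation: minimize the
# negative log-likelihood rather than maximizing the raw likelihood.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(2)
data = rng.normal(loc=5.0, scale=2.0, size=200)

def neg_log_likelihood(params):
    mu, log_sigma = params
    sigma = np.exp(log_sigma)                          # log-parameterize sigma so it stays positive
    return -np.sum(norm.logpdf(data, loc=mu, scale=sigma))

result = minimize(neg_log_likelihood, x0=[0.0, 0.0])
print(result.x[0], np.exp(result.x[1]))               # roughly 5.0 and 2.0
```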
And, unfortunately, there are still more choices to make!
Nonlinear optimization based on either A. or B. always poses the risk of coming up with suboptimal parameter estimates. To avoid this, it is often necessary to make several choices about how you will search the parameter space and, specifically, which parameter values you will use as initial estimates. Doing so helps you avoid settling on a minimum/maximum that is local rather than global.
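One common guard, sketched below with a made-up objective function standing in for a nonlinear fitting surface, is simply to run the optimizer from several starting values and keep the best result.

```python
# Sketch of a multi-start strategy: optimize from several initial estimates
# and keep the fit with the lowest objective value, reducing the risk of
# stopping at a local rather than global minimum.
import numpy as np
from scipy.optimize import minimize

def objective(theta):
    return np.sin(3 * theta[0]) + (theta[0] - 2.0) ** 2    # toy surface with several local minima

starts = np.linspace(-4, 8, 7)                             # spread of starting values
fits = [minimize(objective, x0=[s]) for s in starts]
best = min(fits, key=lambda fit: fit.fun)                  # keep the lowest objective value found
print(best.x, best.fun)
```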
And, to cap it all off, you still have to decide which optimization algorithm you will use to actually minimize A. or maximize B. Whew!
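In code, that last choice often amounts to a single argument. The sketch below (illustrative only) minimizes the same toy objective with three different algorithms exposed by SciPy's minimize; a derivative-free method such as Nelder-Mead and a gradient-based method such as BFGS can differ in speed and robustness.

```python
# Sketch of choosing the optimization algorithm: the same objective, three
# different methods. They agree here, but convergence behavior can differ
# on harder problems.
import numpy as np
from scipy.optimize import minimize

def objective(theta):
    return (theta[0] - 1.0) ** 2 + 10.0 * (theta[1] - theta[0] ** 2) ** 2   # Rosenbrock-style surface

for method in ["Nelder-Mead", "BFGS", "Powell"]:
    result = minimize(objective, x0=[-1.0, 2.0], method=method)
    print(method, result.x, result.fun)
```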
If you are using fairly standard models and rely on out-of-the-box software, then you probably don’t need to spend too much time thinking about all of the above. If you’re like me, however, it’s intellectually satisfying to have at least a rudimentary understanding of what is going on below the surface. And as you progress in the field of data science, this understanding becomes absolutely crucial. You can probably carve out a career simply using software to crunch some numbers, but understanding the “why” is what makes the difference between a job that pays the bills and one that is truly satisfying.