The Linear Regression module offers unusually rich commented text output and graphical diagnostics for many types of linear regression models, including polynomials, multivariate Taylor models and general transformed regression. The module goes beyond least squares: it offers several robust methods such as Lp-norm regression, M-estimates, quantile regression, bounded influence regression, all possible subsets regression, and much more. Diagnostics cover the model, the data and the regression method itself, making the results as reliable as possible.
Linear regression - PDF manual
- Regression model coefficients
- Curve fitting
- Variable prediction
- Find the most suitable model (stepwise, all possible subsets)
- Calibration, validation
- Response surface optimization function
- Robust methods for "real world" data
- Extensive diagnostic tools to find problems and peculiarities in the data or model
Models:
- Plain linear model
- Polynomial models
- Taylor quadratic hypersurfaces
- User-defined models (illustrated below)
- Weighted models
- Quasilinear response correction
- Implicit models
- All possible subsets (find the best model)
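For illustration, any of the models above that is linear in its parameters can be fitted by ordinary least squares on transformed regressors. Below is a minimal Python sketch of a user-defined model; numpy, statsmodels and the synthetic data are assumptions here, since the module performs this computation internally:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(1.0, 10.0, size=50)  # synthetic predictor
y = 2.0 + 1.5 * x - 0.1 * x**2 + 0.8 * np.log(x) + rng.normal(0, 0.3, size=50)

# User-defined model y ~ x + x^2 + ln(x): the design matrix holds the
# transformed regressors, so the model stays linear in its parameters.
X = sm.add_constant(np.column_stack([x, x**2, np.log(x)]))
res = sm.OLS(y, X).fit()
print(res.params)      # estimated coefficients
print(res.conf_int())  # confidence interval for each coefficient
```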
Methods:
- Least squares
- Rank correction/regularization
- Quantile regression
- Lp-norm regression
- Least median of squares (LMS)
- IRWLS
- M-estimates
- Robust/resistant BIR method
- Stepwise / all possible subsets model selection
Text output:
- Basic analysis
- Correlations in X
- Multicollinearity
- Eigenvalues analysis
- Analysis of variance (ANOVA)
- Regression coefficients statistics
- Confidence intervals
- Statistical residual analysis
- R, RSS, AIC, MEP, etc.
- Classical residuals
- Dependences in residuals
- Model/Data testing
- Tests for data and residuals
- Tests of model
- Predicted statistics
- Influential points analysis (sketched below)
- Jackknife residuals
- Hat Matrix
- Cook distance
- Atkinson distance
- Andrews-Pregibon statistics
- Likelihood distances
- Prediction
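As an illustration of the influential points analysis listed above, the sketch below computes the hat matrix diagonal, jackknife (externally studentized) residuals and Cook distances for an OLS fit. It uses Python with statsmodels as a stand-in for the module's internal computations; the data and the cut-off rule are assumptions:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.normal(size=40)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=40)
y[3] += 5.0  # plant one gross error to be detected

res = sm.OLS(y, sm.add_constant(x)).fit()
infl = res.get_influence()

h = infl.hat_matrix_diag             # leverages (hat matrix diagonal)
t = infl.resid_studentized_external  # jackknife residuals
d, _ = infl.cooks_distance           # Cook distances
# Flag points by a common rule of thumb, D_i > 4/n (an assumed threshold).
print("suspect points:", np.where(d > 4 / len(y))[0])
```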
Graphical output:
- Regression curve
- Y-prediction
- Residuals vs. prediction
- Absolute residuals
- Squared residuals
- Residual QQ-plot
- Autocorrelation plot
- Heteroscedasticity plot
- Jackknife residuals
- Predicted residuals
- Partial regression plots
- Partial residual plots
- Hat matrix diagonal plot
- Predicted residuals QQ-plot
- Pregibon statistic plot
- Williams statistic plot
- McCulloch statistic plot
- L-R plot
- Cook distance
- Atkinson distance
- Studentized residuals
- Andrews plot
- Jackknife residual QQ-plot
Main panel of Linear regression
Output specification panel
User-defined model panel
Example outputs:
Regression line with identified cases (numbers instead of points) and confidence band (red):
Comparison of two models - a model building example (showing how important regression diagnostics are):
The first model considered was derived theoretically; it has 5 parameters and fits the data well.
Model: [Rate] ~ Abs + [pH-value] + [pH-value]^2 + Ln([pH-value]) + Exp([pH-value])
However, its prediction capability is low, and three of the five parameters proved insignificant, which makes this model unusable.
Variable | Estimate | Std dev | Conclusion | P-value | Lower C.I. | Upper C.I.
Abs | 2.1789 | 0.47815 | Significant | 2.00E-005 | 1.226 | 3.131
[pH-value] | -0.251 | 0.64328 | Insignificant | 0.69727 | -1.532 | 1.03
[pH-value]^2 | 0.00641 | 0.03973 | Insignificant | 0.872108 | -0.072 | 0.085
Ln([pH-value]) | 1.5023 | 1.1092 | Insignificant | 0.17974 | -0.707 | 3.712
Exp([pH-value]) | 0.000261 | 3.1980627E-005 | Significant | 6.29E-012 | 0.00019 | 0.0003
Residual variability | 12.21360278
F-statistic | 359.6264707
Multiple correlation R | 0.9752305217
Determination coefficient R^2 | 0.9510745705
Predicted correlation coef. Rp | 0.9418693616
Mean error of prediction MEP | 0.1836906899
Akaike information criterion | -137.4849057
In the second model we dropped one of the insignificant parameters. The model still fits the data very well.
Model: [Rate] ~ Abs + [pH-value] + [pH-value]^2 + Exp([pH-value])
Its prediction capability outside the interval of measured data has improved. Moreover, all the model parameters are now significant, so physical constants can be computed from the data. Note the much higher F-statistic, which indicates higher overall significance of the model.
Variable | Estimate | Std dev | Conclusion | P-value | Lower C.I. | Upper C.I.
Abs | 1.61910 | 0.241700 | Significant | 3.3881E-009 | 1.13761 | 2.1005
[pH-value] | 0.605230 | 0.118673 | Significant | 2.4763E-006 | 0.36881 | 0.8416
[pH-value]^2 | -0.04453 | 0.0128498 | Significant | 0.000877 | -0.07013 | -0.018
Exp([pH-value]) | 0.000290 | 2.356E-005 | Significant | 0 | 0.000243 | 0.00033
Residual variability | 12.51634119
F-statistic | 473.622371
Multiple correlation R | 0.9746085658
Determination coefficient R^2 | 0.9498618565
Predicted correlation coef. Rp | 0.9417364687
Mean error of prediction MEP | 0.1841106266
Akaike information criterion | -137.5506086
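A comparison of this kind can be reproduced in outline with any statistics package. The sketch below (synthetic data; statsmodels and the data-generating process are assumptions, not the data from the tables above) fits both candidate designs without an intercept, matching the example models, and reports the F-statistic, AIC, the leave-one-out MEP and the worst coefficient p-value:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 60
abs_, ph = rng.uniform(0.5, 2.0, n), rng.uniform(2.0, 9.0, n)
rate = (1.6 * abs_ + 0.6 * ph - 0.045 * ph**2 + 3e-4 * np.exp(ph)
        + rng.normal(0, 0.3, n))

def fit(columns):
    X = np.column_stack(columns)          # no intercept, as in the models above
    res = sm.OLS(rate, X).fit()
    h = res.get_influence().hat_matrix_diag
    mep = np.mean((res.resid / (1 - h)) ** 2)  # leave-one-out mean error of prediction
    return res, mep

# Model 1: five parameters, including the Ln and Exp terms
m1, mep1 = fit([abs_, ph, ph**2, np.log(ph), np.exp(ph)])
# Model 2: the insignificant Ln term dropped
m2, mep2 = fit([abs_, ph, ph**2, np.exp(ph)])

for name, (m, mep) in {"model 1": (m1, mep1), "model 2": (m2, mep2)}.items():
    print(name, "F =", round(m.fvalue, 1), "AIC =", round(m.aic, 1),
          "MEP =", round(mep, 4), "max p =", round(m.pvalues.max(), 4))
```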
All possible subsets regression
This method can search up to 8000 regression submodels to select the one that best describes the given data. Criteria for the selection are the F-statistic, the mean error of prediction (MEP) and the Akaike information criterion (AIC).
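Such a search is conceptually simple, as the sketch below shows: enumerate every subset of the candidate regressors, fit each by OLS and rank the fits by a criterion such as AIC (Python with statsmodels assumed; the module additionally ranks by F-statistic and MEP):

```python
import itertools
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 80
X_all = rng.normal(size=(n, 5))  # five candidate regressors
y = 2.0 + X_all[:, 0] - 0.5 * X_all[:, 2] + rng.normal(scale=0.4, size=n)

best = None
for k in range(1, X_all.shape[1] + 1):
    for cols in itertools.combinations(range(X_all.shape[1]), k):
        res = sm.OLS(y, sm.add_constant(X_all[:, cols])).fit()
        if best is None or res.aic < best[0]:
            best = (res.aic, cols)  # keep the submodel with the lowest AIC

print("best subset by AIC:", best[1], "AIC =", round(best[0], 2))
```

With five candidates this search fits 31 submodels; the count doubles with every added regressor, which is why the module caps the search.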
Quantile regression
This method finds the regression quantile curve with a given probability of the data lying below it. This is very important, for example, when modelling reliability.
Quantile 15%: Y = 1.950 + 1.961*X - 0.121*X^2
Quantile 50%: Y = 1.074 + 2.106*X - 0.136*X^2
Quantile 90%: Y = -0.382 + 2.079*X - 0.126*X^2
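Curves of this form can be illustrated with quantile regression on a quadratic design. A minimal sketch, assuming statsmodels' QuantReg and synthetic data (the coefficients it prints will not match the example above):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
x = rng.uniform(0.0, 10.0, 200)
# Quadratic trend with spread that grows in x, so the quantile curves differ.
y = 1.0 + 2.1 * x - 0.13 * x**2 + rng.normal(scale=1.0 + 0.2 * x, size=200)

X = sm.add_constant(np.column_stack([x, x**2]))
for q in (0.15, 0.50, 0.90):
    fit = sm.QuantReg(y, X).fit(q=q)  # q is the probability of data below the curve
    b0, b1, b2 = fit.params
    print(f"Quantile {q:.0%}: Y = {b0:.3f} + {b1:.3f}*X + {b2:.3f}*X^2")
```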
Robust methods
Robust methods are useful when the data may contain gross errors, bad measurements and similar contamination.
Ordinary Least Squares regression (wrong)
Robust M-estimate regression (correct)
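The contrast between the two plots can be reproduced with a Huber M-estimate, which is fitted by iteratively reweighted least squares (IRWLS). A sketch assuming statsmodels and planted gross errors:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
x = rng.uniform(0, 10, 50)
y = 1.0 + 0.8 * x + rng.normal(scale=0.3, size=50)
y[:5] += 8.0  # gross errors that pull the OLS line away from the bulk of the data

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()                              # distorted by the outliers
rob = sm.RLM(y, X, M=sm.robust.norms.HuberT()).fit()  # Huber M-estimate via IRWLS
print("OLS slope:  ", round(ols.params[1], 3))        # biased toward the outliers
print("Huber slope:", round(rob.params[1], 3))        # close to the true 0.8
```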
Rich diagnostic plots and statistics