Multiple Linear Regression

Categories: Prediction

Exposed methods:

Multiple Linear Regression
Input:	Instances, feature vectors, real-numbered target values
Output:	Regression model
Input format:	Dependent on implementation, e.g., Weka's ARFF format
Output format:	Dependent on implementation, e.g., Weka: plain text; binary models
User-specified parameters:	None
Reporting information:	Apart from the model coefficients, several other statistical results are reported by the MLR method concerning the training data: coefficient of determination, adjusted coefficient of determination, F-statistic, t-statistic for each individual independent variable, confidence intervals, residuals and variance inflation factor.
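
As an illustration of the reported goodness-of-fit figures, the following minimal Python sketch computes the coefficient of determination, its adjusted form and the overall F-statistic from observed and predicted values. The function name fit_statistics and the toy numbers are hypothetical and not taken from any particular implementation; the F formula assumes the predictions come from a least-squares fit with an intercept and p predictors.

```python
def fit_statistics(y, y_hat, p):
    """Return (R^2, adjusted R^2, F) for n observations and p predictors."""
    n = len(y)
    mean_y = sum(y) / n
    sse = sum((a - b) ** 2 for a, b in zip(y, y_hat))  # residual sum of squares
    sst = sum((a - mean_y) ** 2 for a in y)            # total sum of squares
    r2 = 1.0 - sse / sst
    # The adjustment penalises R^2 for the number of predictors used.
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    # Overall F-statistic with (p, n - p - 1) degrees of freedom.
    f_stat = (r2 / p) / ((1.0 - r2) / (n - p - 1))
    return r2, adj_r2, f_stat


# Hypothetical observed and predicted values, one predictor assumed.
y = [1.0, 2.0, 3.0, 4.0, 5.0]
y_hat = [1.1, 1.9, 3.2, 3.8, 5.0]
r2, adj_r2, f_stat = fit_statistics(y, y_hat, p=1)
```

The adjusted value is always at most the plain coefficient of determination, which makes it the safer figure when comparing models with different numbers of predictors.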

Description:

MLR (Multiple Linear Regression) is a simple and popular statistical technique that uses several explanatory
(independent) variables to predict the outcome of a response (dependent) variable. The model expresses the
response as a linear combination of the explanatory variables plus an intercept, i.e. it fits the hyperplane
(a straight line in the single-variable case) that best approximates the individual data points, usually in the least-squares sense.
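
A minimal sketch of such a fit, assuming ordinary least squares solved through the normal equations, is given below. It is not the Weka implementation; the names fit_mlr and predict and the toy data are hypothetical and serve only to show the idea in plain Python.

```python
def fit_mlr(X, y):
    """Fit y = b0 + b1*x1 + ... + bp*xp by ordinary least squares.

    X is a list of feature rows, y a list of targets.
    Returns the coefficients [b0, b1, ..., bp].
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]  # prepend intercept column
    k = len(rows[0])
    # Normal equations (A^T A) b = A^T y, stored as an augmented matrix.
    M = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(M[r][c]))
        if abs(M[p][c]) < 1e-12:
            raise ValueError("predictors are linearly dependent")
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(k):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][k] for i in range(k)]


def predict(coefs, x):
    """Evaluate the fitted linear model for one feature vector x."""
    return coefs[0] + sum(b * v for b, v in zip(coefs[1:], x))


# Toy data generated from y = 1 + 2*x1 + 3*x2 without noise.
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1, 3, 4, 6, 8]
coefs = fit_mlr(X, y)
```

Solving the normal equations directly is the shortest route to the coefficients, but it is numerically fragile for nearly collinear predictors; production implementations typically rely on a QR or similar decomposition instead.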

Background (publication date, popularity/level of familiarity, rationale of approach, further comments)
Multiple linear regression (MLR) is one of the most widely used mathematical techniques in QSAR analysis.

Bias (instance-selection bias, feature-selection bias, combined instance-selection/feature-selection bias, independence assumptions?, ...)
Feature-selection bias
The error is assumed to be a random variable with a mean of zero conditional on the explanatory variables.
The independent variables are error-free.
The predictors must be linearly independent, i.e. it must not be possible to express any predictor as a linear combination of the others.
The errors are uncorrelated, that is, the variance-covariance matrix of the errors is diagonal and each non-zero element is the variance of the error.
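
The linear-independence requirement can be checked before fitting. The sketch below is one simple way to do so, under the assumption that the model includes an intercept: it computes the rank of the design matrix by Gaussian elimination and compares it with the number of columns. The helper names matrix_rank and has_independent_predictors and the sample data are hypothetical.

```python
def matrix_rank(rows, tol=1e-9):
    """Rank of a matrix (list of rows) via Gaussian elimination."""
    M = [[float(v) for v in r] for r in rows]
    n_rows, n_cols = len(M), len(M[0])
    rank = 0
    for c in range(n_cols):
        # Choose the largest pivot in column c among the rows not yet used.
        p = max(range(rank, n_rows), key=lambda r: abs(M[r][c]), default=None)
        if p is None or abs(M[p][c]) < tol:
            continue  # no usable pivot: this column depends on earlier ones
        M[rank], M[p] = M[p], M[rank]
        for r in range(rank + 1, n_rows):
            f = M[r][c] / M[rank][c]
            M[r] = [a - f * b for a, b in zip(M[r], M[rank])]
        rank += 1
    return rank


def has_independent_predictors(X):
    """True if the intercept column plus all predictors have full column rank."""
    design = [[1.0] + list(r) for r in X]
    return matrix_rank(design) == len(design[0])


X_ok = [[1, 2], [2, 1], [3, 5], [4, 3]]
# Third predictor equals x1 + 2*x2, so it carries no new information.
X_bad = [[1, 2, 5], [2, 1, 4], [3, 5, 13], [4, 3, 10]]
ok = has_independent_predictors(X_ok)
bad = has_independent_predictors(X_bad)
```

An exact rank test only catches perfect dependence; near-collinearity is better diagnosed with the variance inflation factor mentioned under the reporting information.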

Lazy learning/eager learning
Eager learning

Interpretability of models (black box model?, ...)
Good (linear model, i.e., produces a simple linear weighting of the given features). If the variables are standardized to have a mean of zero and a standard deviation of one, the resulting regression coefficients (beta coefficients) allow the relative contribution of each independent variable to the prediction of the dependent variable to be compared.
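
To illustrate the role of standardization, the sketch below converts raw coefficients into beta coefficients by using the identity beta_j = b_j * sd(x_j) / sd(y), which yields the same values as refitting on standardized variables. The function name standardized_coefficients, the data and the raw coefficients are hypothetical.

```python
from statistics import stdev


def standardized_coefficients(X, y, coefs):
    """Convert raw slopes (intercept excluded) into beta coefficients."""
    sd_y = stdev(y)
    columns = list(zip(*X))  # one tuple of values per predictor
    return [b * stdev(col) / sd_y for b, col in zip(coefs, columns)]


# Toy data generated from y = 2*x1 + 0.01*x2; x2 is on a much larger scale.
X = [[1, 100], [2, 300], [3, 200], [4, 500], [5, 400]]
y = [3, 7, 8, 13, 14]
raw = [2.0, 0.01]  # assumed raw slopes for x1 and x2
beta = standardized_coefficients(X, y, raw)
```

The raw slopes differ by a factor of 200 here, yet the beta coefficients differ only by a factor of two, which shows why raw coefficients should not be compared across variables measured on different scales.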

Type of Descriptor:

Interfaces:

Priority: High

Development status:

Homepage:

Dependencies:

Technical details

Data: No

Software: Yes

Programming language(s): Java

Operating system(s): Linux, Win, Mac OS

Input format: Dependent on implementation, e.g., Weka's ARFF format

Output format: Dependent on implementation, e.g., Weka: plain text; binary models

License: GPL
