GNU Octave: Correlation and Regression Analysis

26.4 Correlation and Regression Analysis

Function File: cov (x)

Function File: cov (x, opt)

Function File: cov (x, y)

Function File: cov (x, y, opt)

Compute the covariance matrix.

If each row of x and y is an observation, and each column is a variable, then the (i, j)-th entry of cov (x, y) is the covariance between the i-th variable in x and the j-th variable in y.

cov (x, y) = 1/(N-1) * SUM_i (x(i) - mean(x)) * (y(i) - mean(y))

If called with one argument, compute cov (x, x), the covariance between the columns of x.

The argument opt determines the type of normalization to use. Valid values are

0: normalize with N-1; this provides the best unbiased estimator of the covariance [default]
1: normalize with N; this provides the second moment around the mean
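The formula and the two normalizations can be illustrated with a minimal sketch in plain Python. This is not Octave's implementation; the names cov_vectors, column, and cov_matrix are hypothetical helpers introduced only for this illustration, and the data are assumed to be lists of numbers (or lists of rows, one row per observation).

```python
def cov_vectors(x, y, opt=0):
    """Covariance of two equal-length data vectors (hypothetical helper).

    opt = 0 normalizes with N-1, opt = 1 normalizes with N.
    """
    n = len(x)
    if n != len(y):
        raise ValueError("x and y must have the same number of observations")
    if opt not in (0, 1):
        raise ValueError("opt must be 0 or 1")
    if opt == 0 and n < 2:
        raise ValueError("normalizing with N-1 needs at least two observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations from the respective means.
    s = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return s / (n - 1 if opt == 0 else n)


def column(m, j):
    """Extract column j (one variable) from a list of observation rows."""
    return [row[j] for row in m]


def cov_matrix(x, y, opt=0):
    """Entry (i, j) is the covariance of column i of x with column j of y."""
    return [[cov_vectors(column(x, i), column(y, j), opt)
             for j in range(len(y[0]))]
            for i in range(len(x[0]))]


data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]  # 3 observations, 2 variables
c = cov_matrix(data, data)                   # covariance between the columns
```

With one observation per row, the diagonal of c holds the variances of the individual columns and the off-diagonal entries hold the covariances between them.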

Compatibility Note: Octave always computes the covariance matrix. For two inputs, however, MATLAB will calculate cov (x(:), y(:)) whenever x and y have the same number of elements. This will result in a scalar rather than a matrix output. Code relying on this odd definition will need to be changed when running in Octave.

See also: corr.

Function File: corr (x)

Function File: corr (x, y)

Compute matrix of correlation coefficients.

If each row of x and y is an observation and each column is a variable, then the (i, j)-th entry of corr (x, y) is the correlation between the i-th variable in x and the j-th variable in y.

corr (x,y) = cov (x,y) / (std (x) * std (y))

If called with one argument, compute corr (x, x), the correlation between the columns of x.
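For two data vectors, the relation between correlation, covariance, and standard deviation can be sketched in plain Python as follows. This is an illustration of the formula only, not Octave's code; corr_vectors is a hypothetical name. Because the same normalization factor appears in the covariance and in both standard deviations, it cancels and is omitted here.

```python
import math


def corr_vectors(x, y):
    """Correlation coefficient of two equal-length data vectors (hypothetical helper)."""
    n = len(x)
    if n != len(y):
        raise ValueError("x and y must have the same number of observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Unnormalized covariance and variances; the 1/(N-1) factors cancel
    # in cov (x, y) / (std (x) * std (y)).
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return sxy / math.sqrt(sxx * syy)


r_up = corr_vectors([1, 2, 3, 4], [2, 4, 6, 8])    # perfectly increasing relation
r_down = corr_vectors([1, 2, 3, 4], [8, 6, 4, 2])  # perfectly decreasing relation
```

The result always lies between -1 and 1, up to floating-point rounding.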

See also: cov.

Function File: spearman (x)

Function File: spearman (x, y)

Compute Spearman’s rank correlation coefficient rho.

For two data vectors x and y, Spearman’s rho is the correlation coefficient of the ranks of x and y.

If x and y are drawn from independent distributions, rho has zero mean and variance 1 / (n - 1), and is asymptotically normally distributed.
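The definition can be sketched in plain Python by ranking both vectors and then taking the ordinary correlation coefficient of the ranks. The helper names average_ranks and spearman_rho are hypothetical, and the sketch assumes that tied values share the average of the ranks they occupy; it is an illustration, not Octave's implementation.

```python
import math


def average_ranks(v):
    """1-based ranks of v; tied values receive the average of their ranks (assumption)."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    ranks = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to v[order[i]].
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Correlation coefficient of the ranks of x and y."""
    q, r = average_ranks(x), average_ranks(y)
    n = len(q)
    mean_q, mean_r = sum(q) / n, sum(r) / n
    sqr = sum((a - mean_q) * (b - mean_r) for a, b in zip(q, r))
    sqq = sum((a - mean_q) ** 2 for a in q)
    srr = sum((b - mean_r) ** 2 for b in r)
    return sqr / math.sqrt(sqq * srr)


rho = spearman_rho([1, 2, 3, 4, 5], [1, 4, 9, 16, 25])  # monotone but nonlinear
```

Because only the ranks enter the computation, any strictly increasing transformation of x or y leaves rho unchanged.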

spearman (x) is equivalent to spearman (x, x).

See also: ranks, kendall.

Function File: kendall (x)

Function File: kendall (x, y)

Compute Kendall’s tau.

For two data vectors x, y of common length n, Kendall’s tau is the correlation of the signs of all rank differences of x and y; i.e., if both x and y have distinct entries, then

         1
tau = -------   SUM sign (q(i) - q(j)) * sign (r(i) - r(j))
      n (n-1)   i,j

in which the q(i) and r(i) are the ranks of x and y, respectively.

If x and y are drawn from independent distributions, Kendall’s tau is asymptotically normal with mean 0 and variance (2 * (2n+5)) / (9 * n * (n-1)).
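A direct transcription of the double sum into plain Python, together with the variance under independence, might look like the sketch below. The names sign, kendall_tau, and tau_null_variance are hypothetical. The sketch assumes that x and y each have distinct entries, in which case the sign of a rank difference equals the sign of the corresponding value difference, so the ranks need not be computed explicitly.

```python
def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def kendall_tau(x, y):
    """Kendall's tau for vectors with distinct entries (hypothetical helper)."""
    n = len(x)
    if n != len(y):
        raise ValueError("x and y must have a common length")
    if n < 2:
        raise ValueError("need at least two observations")
    # Sum over all ordered pairs (i, j); terms with i == j contribute zero.
    total = sum(sign(x[i] - x[j]) * sign(y[i] - y[j])
                for i in range(n) for j in range(n))
    return total / (n * (n - 1))


def tau_null_variance(n):
    """Variance of tau for independent x and y, as given in the text."""
    return (2 * (2 * n + 5)) / (9 * n * (n - 1))


tau = kendall_tau([1, 2, 3, 4], [1, 3, 2, 4])  # 5 concordant pairs, 1 discordant
```

Each unordered pair appears twice in the double sum, which is why the normalizing constant is n (n-1) rather than n (n-1) / 2.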

kendall (x) is equivalent to kendall (x, x).

See also: ranks, spearman.

Function File: [theta, beta, dev, dl, d2l, p] = logistic_regression (y, x, print, theta, beta)

Perform ordinal logistic regression.

Suppose y takes values in k ordered categories, and let gamma_i (x) be the cumulative probability that y falls in one of the first i categories given the covariate x. Then

[theta, beta] = logistic_regression (y, x)

fits the model

logit (gamma_i (x)) = theta_i - beta' * x,   i = 1 … k-1

The number of ordinal categories, k, is taken to be the number of distinct values of round (y). If k equals 2, y is binary and the model is ordinary logistic regression. The matrix x is assumed to have full column rank.
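To make the model concrete, the following plain-Python sketch evaluates the cumulative probabilities gamma_i (x) for given parameter values and converts them into probabilities for the k individual categories. It does not fit anything and is not Octave's implementation; cumulative_probs and category_probs are hypothetical names, and the parameter values are made up for illustration. The sketch assumes theta is increasing, so that the cumulative probabilities are increasing as well.

```python
import math


def cumulative_probs(theta, beta, x):
    """gamma_i(x) for i = 1 .. k-1, from logit(gamma_i(x)) = theta_i - beta' * x."""
    eta = sum(b * xi for b, xi in zip(beta, x))  # beta' * x
    # Inverse logit of theta_i - beta' * x.
    return [1.0 / (1.0 + math.exp(-(t - eta))) for t in theta]


def category_probs(theta, beta, x):
    """Probability of each of the k ordered categories given the covariate x."""
    # Pad with gamma_0 = 0 and gamma_k = 1, then take successive differences.
    g = [0.0] + cumulative_probs(theta, beta, x) + [1.0]
    return [g[i + 1] - g[i] for i in range(len(g) - 1)]


theta = [-1.0, 1.0]  # k - 1 = 2 cut points, so k = 3 categories (illustrative values)
beta = [0.5]         # one covariate (illustrative value)
probs = category_probs(theta, beta, [0.0])
```

With k = 2 there is a single cut point, and the model reduces to ordinary logistic regression for the probability of the first category.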

Given y only, theta = logistic_regression (y) fits the model with baseline logit odds only.

The full form is

[theta, beta, dev, dl, d2l, p]
   = logistic_regression (y, x, print, theta, beta)

in which all output arguments and all input arguments except y are optional.

Setting print to 1 displays summary information about the fitted model. Setting print to 2 displays information about convergence at each iteration. Any other value suppresses the display of information. The input arguments theta and beta give initial estimates for theta and beta.

The returned value dev holds minus twice the log-likelihood.
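The quantity "minus twice the log-likelihood" can be sketched in plain Python as below. The function name deviance and the layout of its inputs are assumptions made for this illustration: probs[n][c] is taken to be the fitted probability that observation n falls in category c, and y holds the observed categories as 0-based integers. This is not necessarily how logistic_regression stores or computes these values.

```python
import math


def deviance(probs, y):
    """Minus twice the log-likelihood of the observed categories.

    probs[n][c]: fitted probability of category c for observation n (assumed layout).
    y[n]:        observed category of observation n, 0-based (assumption).
    """
    log_likelihood = sum(math.log(probs[n][c]) for n, c in enumerate(y))
    return -2.0 * log_likelihood


# Two observations, two categories, with made-up fitted probabilities.
dev = deviance([[0.5, 0.5], [0.25, 0.75]], [0, 1])
```

Smaller values indicate that the fitted probabilities assign more weight to the categories that were actually observed.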

The returned value dl is the vector of first derivatives, and d2l the matrix of second derivatives, of the log-likelihood with respect to theta and beta.

The returned value p holds estimates for the conditional distribution of y given x.