
### 26.4 Correlation and Regression Analysis

cov (x)
cov (x, opt)
cov (x, y)
cov (x, y, opt)

Compute the covariance matrix.

If each row of x and y is an observation, and each column is a variable, then the (ij)-th entry of `cov (x, y)` is the covariance between the i-th variable in x and the j-th variable in y.

```
cov (x, y) = 1/(N-1) * SUM_i (x(i) - mean(x)) * (y(i) - mean(y))
```

where N is the length of the x and y vectors.

If called with one argument, compute `cov (x, x)`, the covariance between the columns of x.

The argument opt determines the type of normalization to use. Valid values are

0:

normalize with N-1, provides the best unbiased estimator of the covariance [default]

1:

normalize with N, this provides the second moment around the mean
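The effect of the two normalizations can be illustrated with a minimal sketch in plain Python. This is not Octave's implementation; the helper name `cov_vec` is hypothetical, and it handles only two data vectors of equal length rather than full matrices.

```python
def cov_vec(x, y, opt=0):
    """Covariance of two equal-length sequences (illustrative helper).

    opt=0 divides by N-1 (unbiased estimator); opt=1 divides by N
    (second moment around the mean).
    """
    n = len(x)
    if n != len(y):
        raise ValueError("x and y must have the same length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations from the respective means.
    total = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    divisor = n - 1 if opt == 0 else n
    return total / divisor


x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 9.0]
print(cov_vec(x, y))         # normalized with N-1
print(cov_vec(x, y, opt=1))  # normalized with N, prints 2.875
```

For these four observations the sum of deviation products is 11.5, so the two results differ only by the divisor (3 versus 4). The difference shrinks as N grows.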

Compatibility Note: Octave always treats rows of x and y as observations of multivariate random variables. For two inputs, however, MATLAB treats x and y as two univariate distributions regardless of their shapes, and calculates `cov ([x(:), y(:)])` whenever x and y have the same number of elements. This results in a 2x2 matrix. Code relying on MATLAB’s definition will need to be changed when running in Octave.

corr (x)
corr (x, y)

Compute matrix of correlation coefficients.

If each row of x and y is an observation and each column is a variable, then the (ij)-th entry of `corr (x, y)` is the correlation between the i-th variable in x and the j-th variable in y.

```
corr (x, y) = cov (x, y) / (std (x) * std (y))
```

If called with one argument, compute `corr (x, x)`, the correlation between the columns of x.
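As a rough illustration of the formula above for two data vectors, the following plain-Python sketch divides the covariance by the product of the standard deviations. The helper name `corr_vec` is hypothetical and the code is not Octave's implementation; it assumes both vectors have the same length and non-zero variance.

```python
import math


def corr_vec(x, y):
    """Pearson correlation of two equal-length sequences (illustrative)."""
    n = len(x)
    if n != len(y):
        raise ValueError("x and y must have the same length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance and standard deviations, all normalized with N-1.
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    return cov_xy / (std_x * std_y)


print(corr_vec([1, 2, 3, 4], [1, 3, 2, 4]))
```

Because the same normalization appears in the numerator and the denominator, the result does not depend on whether N-1 or N is used.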

r = corrcoef (x)
r = corrcoef (x, y)
r = corrcoef (…, param, value, …)
[r, p] = corrcoef (…)
[r, p, lci, hci] = corrcoef (…)

Compute a matrix of correlation coefficients.

x is an array where each column contains a variable and each row is an observation.

If a second input y (of the same size as x) is given then calculate the correlation coefficients between x and y.

param, value are optional pairs of parameters and values which modify the calculation. Valid options are:

`"alpha"`

Significance level used for the bounds of the confidence interval, lci and hci. Default is 0.05, i.e., a 95% confidence interval.

`"rows"`

Determine processing of NaN values. Acceptable values are `"all"`, `"complete"`, and `"pairwise"`. Default is `"all"`. With `"complete"`, only the rows without NaN values are considered. With `"pairwise"`, the selection of NaN-free rows is made for each pair of variables.

Output r is a matrix of Pearson’s product moment correlation coefficients for each pair of variables.

Output p is a matrix of pair-wise p-values testing for the null hypothesis of a correlation coefficient of zero.

Outputs lci and hci are matrices containing, respectively, the lower and upper bounds of the confidence interval of each correlation coefficient (95% with the default `"alpha"` of 0.05).
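To give a feel for how `"alpha"` relates to the width of the interval, here is a sketch of one common way to build a confidence interval for a correlation coefficient, the Fisher z-transformation. Whether `corrcoef` computes lci and hci exactly this way is an assumption, not something stated in this section; the function name `fisher_ci` and its arguments are hypothetical.

```python
import math
from statistics import NormalDist


def fisher_ci(r, n, alpha=0.05):
    """Approximate (1 - alpha) confidence interval for a correlation r
    estimated from n observations, via the Fisher z-transformation.
    Assumed method for illustration only.
    """
    if n <= 3:
        raise ValueError("need more than 3 observations")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    z = math.atanh(r)  # transform r to an approximately normal scale
    half_width = NormalDist().inv_cdf(1 - alpha / 2) / math.sqrt(n - 3)
    # Transform the interval bounds back to the correlation scale.
    return math.tanh(z - half_width), math.tanh(z + half_width)


lo, hi = fisher_ci(0.5, 30)               # 95% interval
lo99, hi99 = fisher_ci(0.5, 30, 0.01)     # 99% interval, wider
print(lo, hi)
print(lo99, hi99)
```

A smaller alpha gives a wider interval, and the bounds always stay inside (-1, 1).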

spearman (x)
spearman (x, y)

Compute Spearman’s rank correlation coefficient rho.

For two data vectors x and y, Spearman’s rho is the correlation coefficient of the ranks of x and y.

If x and y are drawn from independent distributions, rho has zero mean and variance `1 / (N - 1)`, where N is the length of the x and y vectors, and is asymptotically normally distributed.

`spearman (x)` is equivalent to `spearman (x, x)`.
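The definition above (correlation of the ranks) can be sketched in plain Python as follows. The helper names are hypothetical, and assigning tied values the average of their ranks is an assumption made for this example rather than a statement about Octave's implementation.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the ranks of x and y."""
    return pearson(average_ranks(x), average_ranks(y))


# A monotone but non-linear relationship still gives rho close to 1.
print(spearman_rho([1, 2, 3, 4, 5], [1, 4, 9, 16, 25]))
```

Because only the ranks enter the computation, any strictly increasing transformation of x or y leaves rho unchanged.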

kendall (x)
kendall (x, y)

Compute Kendall’s tau.

For two data vectors x, y of common length N, Kendall’s tau is the correlation of the signs of all rank differences of x and y; i.e., if both x and y have distinct entries, then

```
            1
tau = ---------   SUM sign (q(i) - q(j)) * sign (r(i) - r(j))
       N (N-1)    i,j
```

in which the q(i) and r(i) are the ranks of x and y, respectively.

If x and y are drawn from independent distributions, Kendall’s tau is asymptotically normal with mean 0 and variance `(2 * (2N+5)) / (9 * N * (N-1))`.

`kendall (x)` is equivalent to `kendall (x, x)`.
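The double sum in the definition of tau can be written out directly. The following plain-Python sketch is illustrative only and is not Octave's implementation; the names `kendall_tau` and `tau_null_variance` are hypothetical. It assumes x and y each have distinct entries, in which case the sign of a difference of values equals the sign of the corresponding difference of ranks, so the ranks need not be computed explicitly.

```python
def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def kendall_tau(x, y):
    """Kendall's tau for two equal-length sequences with distinct entries."""
    n = len(x)
    if n != len(y):
        raise ValueError("x and y must have the same length")
    total = 0
    # Sum over all ordered pairs (i, j); pairs with i == j contribute 0.
    for i in range(n):
        for j in range(n):
            total += sign(x[i] - x[j]) * sign(y[i] - y[j])
    return total / (n * (n - 1))


def tau_null_variance(n):
    """Asymptotic variance of tau under independence."""
    return (2 * (2 * n + 5)) / (9 * n * (n - 1))


print(kendall_tau([1, 2, 3, 4], [1, 3, 2, 4]))
print(tau_null_variance(10))
```

In the first call, five of the six unordered pairs are concordant and one is discordant, so tau is (5 - 1) / 6. The double loop is quadratic in N, which is fine for a sketch but slow for long vectors.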