Predicting Values Using Critical Values Correlation Coefficients

How to compute the critical value of the Pearson correlation coefficient r WITHOUT the use of a PPMC table?

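Compute t = r sqrt(n-2)/sqrt(1-r^2), where r is the sample Pearson correlation coefficient and n is the number of data pairs. Then find the p-value from the t statistic:

Use the tcdf function with |t| as the lower bound, a very large number as the upper bound, and n-2 degrees of freedom (on a TI-83/84: tcdf(|t|, 1E99, n-2)). This gives the area in one tail.
Multiply that area by 2 to get the two-tailed p-value.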

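If the p-value < 0.05 (assuming 0.05 is the level of significance), reject the null hypothesis. Otherwise, fail to reject the null hypothesis.
The null hypothesis is H0: ρ = 0, where ρ (rho) is the population correlation coefficient.

To get the critical value of r itself, solve the t formula for r: r_crit = t_crit / sqrt(t_crit^2 + n - 2), where t_crit is the two-tailed critical t for n-2 degrees of freedom. Reject H0 if |r| > r_crit.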
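Without a calculator's tcdf or a PPMC table, the same steps can be sketched in Python using only the standard library. This is a minimal sketch under stated assumptions: the helper names t_two_tailed_p, t_critical and r_critical are hypothetical, and instead of tcdf the t tail area is computed by numerical integration (Simpson's rule after the substitution x = sqrt(df)·tan φ), so the results are close approximations rather than exact table values. r_critical inverts t = r sqrt(n-2)/sqrt(1-r^2) at the critical t.

```python
import math


def t_two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value P(|T| >= |t|) for Student's t with df degrees of freedom.

    Hypothetical helper (stands in for tcdf). Substituting x = sqrt(df) * tan(phi)
    turns the t density into K * cos(phi) ** (df - 1) on the bounded interval
    [0, atan(|t| / sqrt(df))], integrated here with Simpson's rule (steps even).
    """
    theta = math.atan(abs(t) / math.sqrt(df))
    k = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(math.pi)
    h = theta / steps
    total = 1.0 + math.cos(theta) ** (df - 1)  # endpoints: f(0) = 1 and f(theta)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * math.cos(i * h) ** (df - 1)
    one_side = k * total * h / 3  # P(0 <= T <= |t|)
    return max(0.0, 1.0 - 2.0 * one_side)


def t_critical(alpha, df):
    """Two-tailed critical t, found by bisection on t_two_tailed_p(t, df) = alpha."""
    lo, hi = 0.0, 1.0
    while t_two_tailed_p(hi, df) > alpha:  # widen until the root is bracketed
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_two_tailed_p(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def r_critical(alpha, n):
    """Critical |r| for H0: rho = 0, from t = r sqrt(n-2) / sqrt(1 - r^2) solved for r."""
    df = n - 2
    t_c = t_critical(alpha, df)
    return t_c / math.sqrt(t_c ** 2 + df)


n, r = 9, 0.9263  # example values
t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
p = t_two_tailed_p(t, n - 2)
print(f"t = {t:.4f}, two-tailed p = {p:.6f}, reject H0 at 0.05: {p < 0.05}")
print(f"critical t = {t_critical(0.05, n - 2):.4f}, critical |r| = {r_critical(0.05, n):.4f}")
```

For n = 9 (7 degrees of freedom) this should give a critical t close to 2.365 and a critical |r| of about 0.666, in line with standard tables.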

Use the linear correlation coefficients given to determine the coefficient of determination, R^2?

All you need to do is square each given correlation coefficient r to compute R^2. So,

a) R^2 = .1024

b) R^2 = .0169

c) R^2 = .16

d) R^2 = .8649

The value of R^2 can be interpreted as the proportion of the variation in the dependent variable that is explained by the model.

So let's say for the first problem, you built a simple linear regression model with one dependent variable (y) and one independent variable (x). You could say that 10.24% of the variation in y is explained by using the independent variable (x) to predict the dependent variable (y).
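For simple linear regression with one predictor, squaring r gives the same number as R^2 = 1 - SS_res/SS_tot from the least-squares line. A minimal sketch that checks this on made-up data (the x and y lists and both helper names are hypothetical):

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_squared_of_line(x, y):
    """R^2 = 1 - SS_res / SS_tot for the least-squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1 - ss_res / ss_tot


x = [1, 2, 3, 4, 5, 6, 7, 8]  # hypothetical data
y = [3, 1, 4, 1, 5, 9, 2, 6]
r = pearson_r(x, y)
print(f"r^2 = {r ** 2:.6f}, R^2 from the fitted line = {r_squared_of_line(x, y):.6f}")
print(f"0.32 ** 2 = {0.32 ** 2:.4f}")  # r = 0.32 or -0.32 both give part a)'s R^2
```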

Hope that helps!

Meaning of r value or correlation coefficient in this context?

The correlation coefficient between predicted cholesterol level in mg/dl and weight is 0.4. What does this mean?

A. 40% of the variability in cholesterol level can be explained by weight
B. 16% of the variability in cholesterol level can be explained by weight
C. For each added kg of weight, cholesterol level increases by 0.5 mg/dl
D. For each added kg of weight, cholesterol level increases by 0.16 mg/dl
E. None of the above

Is an 85% correlation sufficient to make a prediction using a regression line equation?

Take a look at Anscombe's quartet: four datasets with nearly identical means, variances, correlation (r ≈ 0.816) and fitted regression line, but which look completely different when plotted. A high correlation alone does not tell you whether a linear model is appropriate, so I suggest you graph the data if you can. Cross-validation, p-values and error ranges are fine suggestions. You can also use bootstrapping to resample the data and see how robust the model is.
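To see the point numerically, here is a small sketch that computes the usual summary statistics for the four Anscombe datasets (values as commonly reproduced from Anscombe's paper; the summary helper and the ANSCOMBE dictionary are just illustrative names):

```python
import math

X_123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # shared x for datasets I-III
ANSCOMBE = {
    "I": (X_123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summary(x, y):
    """Means, sample variances, Pearson r and least-squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return {
        "mean_x": mx, "mean_y": my,
        "var_x": sxx / (n - 1), "var_y": syy / (n - 1),
        "r": sxy / math.sqrt(sxx * syy),
        "slope": slope, "intercept": my - slope * mx,
    }


stats = {name: summary(x, y) for name, (x, y) in ANSCOMBE.items()}
for name, s in stats.items():
    print(name, {key: round(value, 3) for key, value in s.items()})
```

All four rows print almost the same numbers (r ≈ 0.816, slope ≈ 0.50, intercept ≈ 3.00), which is exactly why plotting the data is worth the effort.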

What is the relationship between statistical power and the p-value?

If by p-value you mean the critical p-value (alpha), then the relationship is this: with everything else held constant (sample size, effect size, etc.), increasing alpha (from 0.05 to 0.1, for example) gives you more power, because the vertical line moves to the left. You can see this by looking at the picture above.

In this picture there are two distributions: the one we expect if the null hypothesis is true (left) and the one we expect if the alternative hypothesis is true (right). The vertical line marks the critical value corresponding to alpha. We reject the null if our sample has a p-value less than alpha (to the right of the line), and we fail to reject the null if our sample has a p-value greater than alpha (to the left of the line). The reasoning is that a result to the left of the line is more consistent with the null distribution than with the alternative (right-hand) distribution.

The power is represented by the shaded portion of the right (alternative) distribution. Anything to the right of the line causes us to reject the null. If the true population distribution is the alternative one, this is good: the null isn't really true and we correctly rejected it. That is our power, the ability to correctly detect an effect if there is one. If you move the line to the left (by increasing alpha), then MORE of the alternative distribution lies to the right of the line, so you have a better chance of correctly rejecting the null when the true distribution is the alternative one. However, if the true population distribution is the null one, moving the line gives you less of a chance of correctly FAILING to reject the null. So there is usually a trade-off between power and the Type I error rate.

Bonus: Since the scientific community has generally accepted alpha = 0.05 (a completely arbitrary convention, by the way), we usually control power by increasing the sample size. This makes both sampling distributions narrower so they overlap less, which means more of the alternative distribution lies to the right of the alpha line.
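The two levers described above (raising alpha, raising the sample size) can be sketched with the simplest possible case. This assumes a one-sided z-test with known sigma and a hypothetical standardized effect size of 0.3; the function name one_sided_z_power is illustrative, and real studies using t-tests would need a slightly different calculation.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: the null distribution in z-units


def one_sided_z_power(effect_size, n, alpha):
    """Power of a one-sided z-test of H0: mu = mu0 against H1: mu > mu0.

    effect_size is the hypothetical standardized effect (mu1 - mu0) / sigma,
    with sigma assumed known.
    """
    critical = STD_NORMAL.inv_cdf(1 - alpha)  # the vertical line, in null z-units
    shift = effect_size * n ** 0.5            # centre of the alternative distribution
    return 1 - STD_NORMAL.cdf(critical - shift)  # alternative area right of the line


for alpha in (0.05, 0.10):
    for n in (25, 100):
        print(f"alpha={alpha:.2f}, n={n:3d}: power={one_sided_z_power(0.3, n, alpha):.3f}")
```

With an effect size of 0, the "power" reduces to alpha itself, which is the Type I error rate side of the trade-off.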

How can we calculate critical temperature, volume and pressure in terms of a and b?

This can be done from the Van der Waals equation of state for real gases. Consider that equation for one mole of the gas in question:

[math](P+\frac{a}{V^2})(V-b)=RT[/math]

A bit of rearrangement gives us:

[math]P=\frac{RT}{V-b}-\frac{a}{V^2}[/math]

If P is plotted as a function of V for several temperatures, the isotherms look like this:

(Figure: P-V isotherms of a Van der Waals gas. Image source: Bright Hub Engineering.)

As the figure shows, the critical isotherm [math](T = T_c)[/math] has a horizontal point of inflection at the critical point. So at that point, holding T fixed:

[math]\left(\frac{\partial P}{\partial V}\right)_T = 0[/math]

[math]\left(\frac{\partial^2 P}{\partial V^2}\right)_T = 0[/math]

where P is a function of V as given by the Van der Waals equation. Making use of that, we get:

[math]\left(\frac{\partial P}{\partial V}\right)_T = -\frac{RT}{(V-b)^2} + \frac{2a}{V^3} = 0[/math]

[math]\Rightarrow \frac{2a}{V^3} = \frac{RT}{(V-b)^2} \qquad (A_1)[/math]

[math]\Rightarrow \frac{a}{V^4} = \frac{RT}{2V(V-b)^2} \qquad (A_2)[/math]

Differentiating the first derivative again with respect to V, we get:

[math]\left(\frac{\partial^2 P}{\partial V^2}\right)_T = \frac{2RT}{(V-b)^3} - \frac{6a}{V^4} = 0[/math]

Dividing both sides by 2 and moving one term to the other side:

[math]\frac{RT}{(V-b)^3} = \frac{3a}{V^4}[/math]

Substituting for [math]a/V^4[/math] from Equation [math]A_2[/math]:

[math]\frac{RT}{(V-b)^3} = \frac{3RT}{2V(V-b)^2}[/math]

Dividing both sides by [math]\frac{RT}{(V-b)^2}[/math] gives [math]\frac{1}{V-b} = \frac{3}{2V}[/math], so

[math]3V - 3b = 2V[/math]

[math]\Rightarrow V_c = 3b,[/math]

which is the critical volume. Plugging this value into Equation [math]A_1[/math]:

[math]\frac{RT}{4b^2} = \frac{2a}{27b^3}[/math]

[math]\Rightarrow T_c = \frac{8a}{27Rb},[/math]

which is the critical temperature. To calculate the critical pressure, return to the rearranged equation:

[math]P=\frac{RT}{V-b}-\frac{a}{V^2}[/math]

Plugging in the values of [math]T_c[/math] and [math]V_c[/math] just calculated gives:

[math]P_c = \frac{RT_c}{2b} - \frac{a}{9b^2} = \frac{4a}{27b^2} - \frac{3a}{27b^2} = \frac{a}{27b^2}[/math]
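A quick numerical sanity check of the result: at (V_c, T_c) the first and second derivatives of P with respect to V should both vanish, P should equal P_c, and the critical compressibility factor P_c V_c / (R T_c) works out to exactly 3/8. The constants a and b below are hypothetical illustrative values (not tabulated data for any particular gas), and the derivatives are approximated with central finite differences.

```python
R = 0.08314  # gas constant in L·bar/(mol·K)


def vdw_pressure(v, t, a, b):
    """Van der Waals pressure for one mole: P = RT/(V - b) - a/V^2."""
    return R * t / (v - b) - a / v ** 2


def critical_constants(a, b):
    """Closed-form critical constants from the derivation above."""
    vc = 3 * b
    tc = 8 * a / (27 * R * b)
    pc = a / (27 * b ** 2)
    return vc, tc, pc


a, b = 3.6, 0.043  # hypothetical constants in L^2·bar/mol^2 and L/mol
vc, tc, pc = critical_constants(a, b)

h = 1e-4 * vc  # finite-difference step along V, with T held at Tc
p_minus, p_0, p_plus = (vdw_pressure(v, tc, a, b) for v in (vc - h, vc, vc + h))
dp_dv = (p_plus - p_minus) / (2 * h)               # should be ~0
d2p_dv2 = (p_plus - 2 * p_0 + p_minus) / h ** 2    # should be ~0
z_c = pc * vc / (R * tc)                           # critical compressibility factor

print(f"Vc = {vc:.4f} L/mol, Tc = {tc:.1f} K, Pc = {pc:.2f} bar")
print(f"dP/dV ~ {dp_dv:.2e}, d2P/dV2 ~ {d2p_dv2:.2e}, Zc = {z_c:.4f}")
```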

What could be the reasons for an unusually high correlation coefficient?

LASSO is in the family of regularized regression methods. It uses the L1 penalty, which penalizes the sum of the absolute values of the predictor coefficients and, as a result, drives some of them exactly to zero. LASSO is usually used as a model-fitting or prediction method, and the fitted model is linear.

You might mean the correlation as the variance explained by the LASSO fit on the training dataset, i.e., the R squared metric. You might mean that it is lower than 90% with all predictor variables, but reaches 90% or greater with a subset of the predictor variables. This can happen when predictor variables are correlated with each other as well as with the predicted response; when you reduce the predictor space, you remove these correlations. You can also reduce the predictor space to a few instrumental variables that explain the predicted response, which requires familiarity with the dataset and domain knowledge. The issue with high-accuracy LASSO models that use many predictors is a loss of simplicity and of direct relevance of the critical predictors.

The example I have in mind is tens of thousands of genetic markers, which LASSO can fit and reduce to only a few highly influential markers. With a few tens of genetic markers, the geneticist or medical researcher can then focus on them in another regression model, look at them with alternative learning methods, and inspect them in their domain science for relevance and explanatory power.
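To illustrate how the L1 penalty produces exact zeros, here is a minimal pure-Python sketch of LASSO fitted by cyclic coordinate descent with soft-thresholding. It is an illustrative implementation, not any particular library's; the data are simulated (10 predictors, only the first two matter), there is no intercept, and the penalty lam = 0.3 is an arbitrary choice.

```python
import random


def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values inside [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, sweeps=100):
    """Minimize (1/(2n)) * sum((y - X beta)^2) + lam * sum(|beta_j|), no intercept."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # y - X @ beta, with beta starting at 0
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(sweeps):
        for j in range(p):
            # Correlation of x_j with the residual that still includes x_j's own term.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta:
                for i in range(n):
                    residual[i] -= X[i][j] * delta  # keep residual = y - X @ beta
                beta[j] = new_bj
    return beta


# Simulated data: 10 predictors, only the first two affect y.
rng = random.Random(0)
n, p = 200, 10
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [3 * row[0] - 2 * row[1] + rng.gauss(0, 0.5) for row in X]

beta = lasso_coordinate_descent(X, y, lam=0.3)
print([round(b, 3) for b in beta])
print("non-zero predictors:", [j for j, b in enumerate(beta) if b != 0.0])
```

The surviving coefficients are shrunk toward zero by roughly lam (about 2.7 and -1.7 instead of 3 and -2), which is the price paid for the sparsity.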

Perform the required correlation test. Assume assumptions for regression are met.?

I have assumed 1164 = 11.64 (see your x-data set)
Number of cases 9
∑ X = 41.56
∑ Y = 1416.1
∑ X^2 = 289.4222
∑ Y^2 = 232498.97
∑ XY = 7439.37
∑ X ∑ Y = 58853.116

r = ( ∑ XY - ∑X ∑Y / n ) / SQRT { [∑ X^2 - (∑X)^2 / n] [∑ Y^2 - (∑Y)^2 / n]}
Numerator of r = (7439.37) - (41.56)(1416.1) / 9 = 900.134889
Denominator of r = SQRT[289.4222 - (41.56)^2 / 9] * SQRT[232498.97 - (1416.1)^2 / 9]
= SQRT[97.507356] * SQRT[9683.502222]
r = 900.134889 / ([9.87458] * [98.40479]) = 900.134889 / 971.706
Correlation coefficient r = 0.9263 (I get the same value based on that assumption)
Degrees of freedom = n - 2 = 7

t= r sqrt(n-2)/sqrt(1-r^2) = (.9263)sqrt(7) /sqrt(1-(.9263)^2) = 6.5044
Using a two-tailed test at the .05 significance level, t-critical (7 df) = 2.365
Calculated t exceeds critical t, so reject H0.

The correlation coefficient is significantly different from 0.
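The arithmetic above can be reproduced from the summary sums alone. A short sketch, taking the sums exactly as listed in the answer (including the assumption that 1164 is 11.64) and the critical value 2.365 quoted there:

```python
import math

# Summary sums as listed in the answer (n = 9, assuming 1164 -> 11.64).
n = 9
sum_x, sum_y = 41.56, 1416.1
sum_x2, sum_y2 = 289.4222, 232498.97
sum_xy = 7439.37

numerator = sum_xy - sum_x * sum_y / n  # note: uses the product sum_x * sum_y
sxx = sum_x2 - sum_x ** 2 / n
syy = sum_y2 - sum_y ** 2 / n
r = numerator / math.sqrt(sxx * syy)

df = n - 2
t = r * math.sqrt(df) / math.sqrt(1 - r ** 2)
T_CRITICAL = 2.365  # two-tailed, alpha = 0.05, 7 df (value quoted in the answer)

print(f"numerator = {numerator:.6f}, r = {r:.4f}, t = {t:.3f}")
print("reject H0:", abs(t) > T_CRITICAL)
```

Carrying the unrounded r through gives t of about 6.51 rather than 6.5044, because the answer rounds r to 0.9263 before computing t; the conclusion is unchanged.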
