Regression And Correlation Analysis Assignment
Course Project Final Part: Regression & Correlation Analysis 1
1. Scatter Plot of Y and X1
The scatter plot of sales against calls suggests a linear trend between the two variables. The trendline indicates that the higher the number of calls, the higher the sales tend to be.
[Figure: Sales (Y) across Calls (X1); x-axis: Calls]
2. Best fit line
Using the Regression option in Excel's Data Analysis menu, we obtain the following output:
              Coefficients   Standard Error   t Stat        P-value
Intercept     22.52055848    6.069248905      3.710600576   0.000343207
Calls (X1)    0.12373018     0.
The best-fit line equation is Sales = Intercept + (Coefficient of Calls) * Calls, i.e., Sales = 22.52 + 0.1237 * Calls.
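As a rough illustration of what the regression tool computes for a single predictor, the least-squares intercept and slope can be sketched in Python. The function name fit_line and the sample data are hypothetical; they are not the assignment's data set.

```python
def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = sum of cross-deviations divided by sum of squared x-deviations
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data for illustration only (not the assignment's data)
calls = [100, 120, 150, 180, 200]
sales = [35, 36, 42, 44, 48]
intercept, slope = fit_line(calls, sales)
print(f"Sales = {intercept:.2f} + {slope:.4f} * Calls")
```

Applied to the actual Calls and Sales columns, this calculation should give the same intercept and slope as the Excel output.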
3. Coefficient of Correlation
It denotes the strength of association between two variables. The sign denotes the direction of association.
Calculate the correlation coefficient in Excel as CORREL(X1 array, Y array).
We get a value of 0.318.
This means that calls and sales are weakly positively associated: as one quantity increases, the other also tends to increase.
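For reference, the Pearson correlation coefficient that CORREL returns can be sketched in a few lines of Python. The function name pearson_r and the sample data are hypothetical and serve only to show the calculation.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    # r is the covariance scaled by both standard deviations, so -1 <= r <= 1
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data for illustration only (not the assignment's data)
calls = [100, 120, 150, 180, 200]
sales = [35, 36, 42, 44, 48]
r = pearson_r(calls, sales)
print(f"r = {r:.3f}")
```

The sign of r gives the direction of the association and its magnitude gives the strength, as described above.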
4. Coefficient of Determination
It is more commonly known as the R-squared value. It measures how close the data points are to the best-fit line. In other words, it gives the proportion of variability in the dependent variable that can be explained by the independent variable. The higher the R-squared value, the better the model fits the data.
From the Excel regression output, the R-squared value (coefficient of determination) is 0.101.
About 10% of the variability in sales is explained by calls.
SUMMARY OUTPUT
Regression Statistics
Multiple R          0.31807481
R Square            0.101171585
Adjusted R Square
Standard Error
Observations
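With a single predictor, R squared is simply the square of the correlation coefficient, so the two reported values can be checked against each other. A small sketch using the figures from the output above (the variable names are illustrative):

```python
# Values as reported in the Excel summary output
multiple_r = 0.31807481
r_squared_reported = 0.101171585

# In simple linear regression, R squared equals the squared correlation
r_squared = multiple_r ** 2

print(f"R squared from r: {r_squared:.6f}")
print(f"Share of variability explained: {r_squared:.1%}")
```

The squared correlation agrees with the reported R Square to within rounding.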
5. Utility of Regression model
An F test can be used to test the utility of the model.
Null Hypothesis: Beta coefficient of Calls = 0; i.e., Calls is NOT linearly associated with Sales
Alternate Hypothesis: Beta coefficient of Calls \neq 0; i.e., Calls is linearly associated with Sales
Choose a significance level, \alpha = 0.05.
From the regression ANOVA output, the p-value of the F test is 0.0013 (< 0.05) for 1 and 98 degrees of freedom.
ANOVA
             df    SS             MS             F             Significance F
Regression   1     515.0392467    515.0392467    11.03082098   0.001259972
Residual     98    4575.710753    46.69
Total
Since the p-value < \alpha, we reject the null hypothesis and conclude that, for the given data, calls are linearly associated with sales.
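The F statistic can be rebuilt from the ANOVA figures as a consistency check. The sketch below assumes that 515.0392467 and 4575.710753 are the regression and residual sums of squares; that reading is an assumption, since the flattened output carries no explicit SS label.

```python
# Figures from the ANOVA output (reading them as sums of squares is an assumption)
ss_regression = 515.0392467
ss_residual = 4575.710753
df_regression = 1
df_residual = 98

# Mean square = sum of squares / degrees of freedom
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

# F statistic = explained mean square / unexplained mean square
f_stat = ms_regression / ms_residual

# R squared is the explained share of the total sum of squares
r_squared = ss_regression / (ss_regression + ss_residual)

print(f"MS residual = {ms_residual:.4f}")
print(f"F = {f_stat:.4f}")
print(f"R squared = {r_squared:.6f}")
```

Under that reading, the computed F and R squared match the reported 11.03082098 and 0.101171585 to within rounding, which supports the interpretation of the columns.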
6. Based on the above findings, calls are a statistically significant predictor of sales volume, although they explain only about 10% of its variability.
The analysis shows a statistically significant positive linear association between calls and sales. From the best-fit line (Sales = 22.52 + 0.1237 * Calls), each additional call is associated with an average increase in sales of 0.1237 units (interpretation of the coefficient of Calls).
7. 95% Confidence Interval
            Coefficients   Standard Error   t Stat        P-value       Lower 95%     Upper 95%
Intercept   22.52055848    6.069248905      3.710600576   0.000343207   10.47633186   34.5647851
The 95% confidence interval for the coefficient of Calls (\beta1) is [0.0498, 0.1976].
Interpretation: a 95% confidence interval means that if this regression analysis were repeated on other samples from the population, about 95% of the resulting intervals would contain the true value of \beta1. In simpler terms, we can be 95% confident that the true value of \beta1 lies within this interval.
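A confidence interval for a regression coefficient has the form estimate ± t critical value × standard error. The sketch below reproduces the reported interval for the intercept. The critical value 1.98447 (two-sided 95%, 98 residual degrees of freedom) is taken from a t table and is an assumption, as is the helper name confidence_interval.

```python
def confidence_interval(coef, std_err, t_crit):
    """Return (lower, upper) limits: coef +/- t_crit * std_err."""
    margin = t_crit * std_err
    return coef - margin, coef + margin


# Assumed two-sided 95% critical value of Student's t with 98 degrees of freedom
T_CRIT_98 = 1.98447

# Intercept estimate and standard error from the Excel output
lower, upper = confidence_interval(22.52055848, 6.069248905, T_CRIT_98)
print(f"Intercept 95% CI: [{lower:.4f}, {upper:.4f}]")

# The reported slope interval is centred on the slope estimate
slope_mid = (0.0498 + 0.1976) / 2
print(f"Midpoint of slope CI: {slope_mid:.4f}")
```

The computed limits agree with the reported Lower 95% and Upper 95% values for the intercept, and the midpoint of the slope interval matches the slope estimate of 0.1237.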
8. Sales = 22.52 + 0.1237 * Calls
Say Calls = 100.
95% confidence interval for \beta1 = [0.0498, 0.1976]
Lower limit of Sales value, Y low = 22.52 + 0.0498 * 100 = 27.50
Upper limit of Sales value, Y high = 22.52 + 0.1976 * 100 = 42.28
For Calls = 100, Sales can be expected to be in the range [27.50, 42.28]. This range reflects only the uncertainty in the slope \beta1; the intercept is held at its point estimate of 22.52.
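The same arithmetic can be wrapped in a small function so that other values of Calls are easy to try. The function name sales_range is hypothetical. It follows the approach used above, varying only the slope across its 95% interval and keeping the intercept fixed.

```python
INTERCEPT = 22.52                       # fitted intercept
SLOPE = 0.1237                          # fitted coefficient of Calls
SLOPE_LOW, SLOPE_HIGH = 0.0498, 0.1976  # 95% confidence limits for the slope


def sales_range(calls):
    """Return (low, point, high) sales estimates for a given number of calls."""
    low = INTERCEPT + SLOPE_LOW * calls
    point = INTERCEPT + SLOPE * calls
    high = INTERCEPT + SLOPE_HIGH * calls
    return low, point, high


low, point, high = sales_range(100)
print(f"{low:.2f} {point:.2f} {high:.2f}")  # prints 27.50 34.89 42.28
```

The middle value, 34.89, is the point estimate from the best-fit line for Calls = 100.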
Multiple linear regression:
Regression Statistics
Multiple R          0.693
R Square            0.4
Adjusted R Square
Standard Error
Observations