# 1, Mathematical principle analysis

## linear regression

When two variables have an exact, strictly linear relationship, the function Y = a + bX can be used to represent it.
Here X is the independent variable and Y is the dependent variable.
In real life, however, other factors interfere, so the relationship between many pairs of variables is not a strict functional one and cannot be described exactly by a functional equation. To distinguish it from a functional equation between two variables, we call it a regression relationship. It is expressed by a linear equation; the fitted line is called the regression line, and the method is called linear regression.
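To make the distinction concrete, the following minimal sketch contrasts a strict functional relationship with a regression relationship. The coefficients `a` and `b`, the noise level, and the helper names are illustrative assumptions, not values taken from the data set used later.

```python
import random


def strict_linear(xs, a, b):
    """Exact functional relationship: every point lies on Y = a + bX."""
    return [a + b * x for x in xs]


def noisy_linear(xs, a, b, noise_sd, seed=0):
    """Regression relationship: Y = a + bX plus random interference."""
    rng = random.Random(seed)
    return [a + b * x + rng.gauss(0, noise_sd) for x in xs]


xs = [float(i) for i in range(10)]
a, b = 2.0, 0.5  # hypothetical intercept and slope
y_exact = strict_linear(xs, a, b)
y_noisy = noisy_linear(xs, a, b, noise_sd=1.0)

# Deviation of each observation from the line Y = a + bX
dev_exact = [y - (a + b * x) for x, y in zip(xs, y_exact)]
dev_noisy = [y - (a + b * x) for x, y in zip(xs, y_noisy)]

print(max(abs(d) for d in dev_exact))  # 0.0: the points lie exactly on the line
print(max(abs(d) for d in dev_noisy))  # nonzero: the points scatter around the line
```

In the noisy case no single line passes through every point, which is why a fitting criterion such as least squares is needed.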

## least squares method

Calculation principle: the least squares method chooses the line that minimizes the sum of the squared vertical distances from the measured points to the regression line, so that the resulting regression equation best represents the straight-line trend in the measured data.
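The principle above can be sketched on a tiny, made-up data set. The sketch assumes the standard closed-form solution for the slope and intercept, written in mean-centred form (algebraically equivalent to the raw-sum form used in the code in section 3). The function names and the sample points are hypothetical; they only serve to show that shifting the fitted line increases the sum of squared vertical distances.

```python
def sum_squared_distances(xs, ys, a, b):
    """Sum of squared vertical distances from the points to Y = a + bX."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def least_squares_fit(xs, ys):
    """Closed-form least squares estimates (a, b) for Y = a + bX."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b = sxy / sxx
    a = y_mean - b * x_mean
    return a, b


# Made-up sample points that roughly follow an upward trend
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = least_squares_fit(xs, ys)
best = sum_squared_distances(xs, ys, a, b)

# Each perturbed line tried here gives a larger sum of squares than the fit
for da, db in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.05), (0.0, -0.05)]:
    assert sum_squared_distances(xs, ys, a + da, b + db) > best

print('a =', round(a, 4), 'b =', round(b, 4), 'sum of squares =', round(best, 4))
```

The loop checks only a handful of nearby lines, so it illustrates the minimum rather than proving it.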
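Relevant formulas, written to match the code in section 3 (with $n$ data points and sample means $\bar{x}$, $\bar{y}$):

$$b=\frac{\sum_{i=1}^{n}x_iy_i-n\bar{x}\bar{y}}{\sum_{i=1}^{n}x_i^2-n\bar{x}^2},\qquad a=\bar{y}-b\bar{x}$$

Total variation decomposition of Y, where $\hat{y}_i=a+bx_i$ is the fitted value:

$$\sum_{i=1}^{n}(y_i-\bar{y})^2=\sum_{i=1}^{n}(\hat{y}_i-\bar{y})^2+\sum_{i=1}^{n}(y_i-\hat{y}_i)^2,\qquad R^2=1-\frac{\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}{\sum_{i=1}^{n}(y_i-\bar{y})^2}$$

The value of $R^2$ lies between 0 and 1 and reflects the relative contribution of the regression to the total variation of Y.

# 2, EXCEL simple processing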

## 20 sets of data

Pictures drawn using Excel:

## 200 sets of data

Pictures drawn using Excel:

## 2000 sets of data

Pictures drawn using Excel:

## 20000 sets of data

Pictures drawn using Excel:

# 3, Python least squares calculation (using Anaconda's JupyterLab)

## Introduction to using tools

This section uses JupyterLab, a tool included with Anaconda, which you can download from the official Anaconda website.
Open JupyterLab from Anaconda and it opens as a web page. Click Python to create a file.

## Python calculation without calling a package

```python
# Univariate linear regression without calling a package
import pandas as pd  # reading .xls files also requires the xlrd package


# NOTE: the function header is missing in the source; the name get_data is assumed.
def get_data(raw):
    """Read the first `raw` rows of the height and weight columns."""
    df = pd.read_excel('..\\source\\weights_heights(height-Weight data set).xls', sheet_name='weights_heights')
    height = df.iloc[0:raw, 1:2].values
    weight = df.iloc[0:raw, 2:3].values
    return height, weight


def array_to_list(array):
    """Convert an (n, 1) array into a flat list of numbers."""
    array = array.tolist()
    for i in range(0, len(array)):
        array[i] = array[i][0]  # unwrap the single-element row
    return array


def unary_linear_regression(x, y):
    """Univariate linear regression; x and y are lists of equal length."""
    xi_multiply_yi = 0
    xi_square = 0
    x_average = 0
    y_average = 0
    for i in range(0, len(x)):
        xi_multiply_yi += x[i] * y[i]
        x_average += x[i]
        y_average += y[i]
        xi_square += x[i] * x[i]
    x_average = x_average / len(x)
    y_average = y_average / len(x)
    # Slope b and intercept a of the regression line y = a + b*x
    b = (xi_multiply_yi - len(x) * x_average * y_average) / (xi_square - len(x) * x_average * x_average)
    a = y_average - b * x_average
    # Fitted values, stored in a new list so that x is not overwritten
    f = [b * xi + a for xi in x]
    R_square = get_coefficient_of_determination(f, y, y_average)
    print('R_square=' + str(R_square) + '\n' + 'a=' + str(a) + '  b=' + str(b))


def get_coefficient_of_determination(f, y, y_average):
    """Return R² from the fitted values f, the true values y, and the mean y_average."""
    res = 0  # residual sum of squares
    tot = 0  # total sum of squares
    for i in range(0, len(y)):
        res += (y[i] - f[i]) * (y[i] - f[i])
        tot += (y[i] - y_average) * (y[i] - y_average)
    R_square = 1 - res / tot
    return R_square


raw = [20, 200, 2000, 20000]
for i in raw:
    print('The number of data groups is ' + str(i) + ':')
    height, weight = get_data(i)
    height = array_to_list(height)
    weight = array_to_list(weight)
    unary_linear_regression(height, weight)
```

Click Run. Results obtained (can be compared with the Excel results):

## Python calculation calling a package

The process is the same as above; only the code changes. Calling scikit-learn's `LinearRegression` makes the code simpler, because the algorithm does not have to be written by hand:

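```python
# Univariate linear regression by calling a package
from sklearn import linear_model
from sklearn.metrics import r2_score
import pandas as pd  # reading .xls files also requires the xlrd package


# NOTE: the function header is missing in the source; the name get_data is assumed.
def get_data(raw):
    """Read the first `raw` rows of the height and weight columns."""
    # Raw string so that the backslash in the Windows path is not treated as an escape
    df = pd.read_excel(r'D:\weights_heights(height-Weight data set).xls', sheet_name='weights_heights')
    height = df.iloc[0:raw, 1:2].values
    weight = df.iloc[0:raw, 2:3].values
    return height, weight


raw = [20, 200, 2000, 20000]
for i in raw:
    print('The number of data groups is ' + str(i) + ':')
    height, weight = get_data(i)
    lm = linear_model.LinearRegression()
    lm.fit(height, weight)
    b = lm.coef_       # slope
    a = lm.intercept_  # intercept
    weight_predict = lm.predict(height)  # values predicted by the fitted equation
    R_square = r2_score(weight, weight_predict)  # coefficient of determination, R²
    print('b=' + str(b) + ' a=' + str(a))
    print('R_square=' + str(R_square))
```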

Results obtained:

# 4, Reference materials

Posted on Fri, 01 Oct 2021 15:59:47 -0400 by cryp7