1, Summary of learning points

1. Information learned from the competition description
2. Lessons learned from the task code
3. Evaluation and calculation of classification metrics
4. Evaluation and calculation of regression metrics
5. Understanding of some terms

2, Learning content:

1. New knowledge learned from the competition

a. Desensitization: masking private information, for example displaying a mobile phone number as 186****7392.
b. Label encoding: converting category labels into numeric form.
c. Anonymization: the meaning of the data columns is not disclosed.
d. Evaluation metric: a measure of the gap between the model's predictions and the actual values (the specific metrics are covered later).
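As a rough illustration of desensitization and label encoding, the sketch below masks the middle digits of a phone number and maps category strings to integers. The phone number, the color values, and the variable names are made up for illustration and do not come from the competition data. The sorted-order mapping is one simple convention, not the only possible one.

```python
# Hypothetical example values; not taken from the competition data.
phone = "18612347392"

# Desensitization: keep the first 3 and last 4 digits, hide the rest.
masked = phone[:3] + "****" + phone[-4:]
print(masked)  # 186****7392

# Label encoding: give each distinct category an integer code (sorted order).
colors = ["red", "green", "blue", "green"]
mapping = {label: code for code, label in enumerate(sorted(set(colors)))}
encoded = [mapping[c] for c in colors]
print(mapping)
print(encoded)  # [2, 1, 0, 1]
```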

2. Lessons learned from the task code:

(1)

From the code, I learned:
a. The head function in pandas displays the first rows of the data (five by default).
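A minimal sketch of this behavior, assuming a small made-up DataFrame (the column names `id` and `value` are hypothetical):

```python
import pandas as pd

# Hypothetical toy DataFrame with ten rows.
df = pd.DataFrame({"id": range(10), "value": [x * x for x in range(10)]})

first_five = df.head()    # no argument: the first 5 rows
first_three = df.head(3)  # an explicit row count can be passed instead

print(first_five)
print(first_three)
```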

3. Evaluation and calculation of classification metrics

1. Accuracy

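```
# Accuracy
from sklearn.metrics import accuracy_score

y_pred = [0, 1, 3, 4]
y_true = [0, 1, 4, 4]
# normalize=False returns the number of correctly classified samples
print('Correct count:', accuracy_score(y_true, y_pred, normalize=False))
# The default normalize=True returns the fraction of correct predictions
print('ACC:', accuracy_score(y_true, y_pred))
```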

Here we use the accuracy_score function. The classification accuracy score is the fraction of samples that are classified correctly.
sklearn.metrics.accuracy_score(y_true, y_pred, *, normalize=True, sample_weight=None)
If normalize is False, the function returns the number of elements in y_pred that match y_true. The default is True, which returns the fraction of correct predictions.
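To make the two return values concrete, here is a small sketch that reproduces them by hand with NumPy on the same toy labels. It only illustrates the definition; it is not how scikit-learn implements the function internally.

```python
import numpy as np

y_true = np.array([0, 1, 4, 4])
y_pred = np.array([0, 1, 3, 4])

# Element-wise comparison: True where the prediction matches the label.
matches = (y_true == y_pred)

correct_count = int(matches.sum())  # corresponds to normalize=False
accuracy = matches.mean()           # corresponds to normalize=True

print(correct_count, accuracy)  # 3 0.75
```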

• Prerequisite knowledge:
T means the prediction is correct and F means it is wrong.
P means the positive class and N means the negative class.
TP: predicted positive, and the prediction is correct
FP: predicted positive, and the prediction is wrong
FN: predicted negative, and the prediction is wrong
TN: predicted negative, and the prediction is correct
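The four counts can be tallied directly from a pair of label lists. The sketch below uses the same toy labels that this note uses for precision and assumes 1 is the positive class.

```python
# Toy binary labels; 1 is treated as the positive class.
y_true = [1, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # predicted 1, actually 1
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # predicted 1, actually 0
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # predicted 0, actually 1
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # predicted 0, actually 0

print(tp, fp, fn, tn)  # 1 1 1 2
```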
1. Precision
```
# Precision
from sklearn import metrics

y_pred = [1, 1, 0, 0, 0]
y_true = [1, 0, 0, 1, 0]
print('Precision:', metrics.precision_score(y_true, y_pred))
# average is not specified here, so the default 'binary' is used, with 1 as the
# default positive class. TP (actually 1 and predicted as 1) is 1, and TP + FP
# (all samples predicted as 1) is 2, so P = TP / (TP + FP) = 0.5.
```

Here we use the precision_score function. Its average parameter defaults to 'binary', so by default y_true and y_pred may only contain two classes (for example 0 and 1). A related parameter is pos_label, which specifies the value treated as the positive class. It defaults to 1, so 1 is the positive class by default.

• Other values of average, such as the commonly used 'macro' and 'weighted', are also built on the per-class precision P.
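The relationship between the per-class precision and the 'macro' and 'weighted' averages can be checked on a small example. The labels below are made up and deliberately imbalanced so that the two averages differ. The per-class values work out to 0.75 and 0.5, giving a macro average of 0.625 and a weighted average of about 0.667.

```python
import numpy as np
from sklearn import metrics

# Made-up imbalanced labels: four samples of class 0, two of class 1.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]

# average=None returns one precision value per class, ordered by label.
per_class = metrics.precision_score(y_true, y_pred, average=None)

# Number of true samples per class, used as weights by 'weighted'.
support = np.bincount(y_true)

macro = metrics.precision_score(y_true, y_pred, average='macro')
weighted = metrics.precision_score(y_true, y_pred, average='weighted')

print('per class:', per_class)
print('macro:', macro, '== unweighted mean:', per_class.mean())
print('weighted:', weighted, '== support-weighted mean:',
      np.average(per_class, weights=support))
```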
1. Recall
As above, average takes its default value, binary.
```
# Recall
from sklearn import metrics
y_pred = [1, 1, 0, 0, 0]
y_true = [1, 0, 0, 1, 0]
print('Recall:', metrics.recall_score(y_true, y_pred))
# First find TP, which is 1. Then find TP + FN, where FN counts the samples
# predicted as 0 whose true value is 1 (here FN = 1), so TP + FN = 2 and
# recall = TP / (TP + FN) = 0.5.
```
1. F1-score
```
# F1-score
from sklearn import metrics
y_pred = [1, 1, 0, 0, 0]
y_true = [1, 0, 0, 1, 0]
print('F1-score:', metrics.f1_score(y_true, y_pred))
# Computed as (2 * P * R) / (P + R), where P and R are the precision and recall above
```
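As a quick check of the formula, the sketch below computes P and R with scikit-learn, combines them by hand, and compares the result with f1_score on the same toy labels.

```python
from sklearn import metrics

y_pred = [1, 1, 0, 0, 0]
y_true = [1, 0, 0, 1, 0]

p = metrics.precision_score(y_true, y_pred)  # TP / (TP + FP) = 1 / 2
r = metrics.recall_score(y_true, y_pred)     # TP / (TP + FN) = 1 / 2

# Harmonic mean of precision and recall.
f1_manual = 2 * p * r / (p + r)
f1_sklearn = metrics.f1_score(y_true, y_pred)

print(f1_manual, f1_sklearn)
```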

4. Evaluation and calculation of regression metrics

(The relevant information and explanations are written in the notes.)

```
# coding=utf-8
import numpy as np
from sklearn import metrics


# MAPE is implemented by hand here. Older scikit-learn releases do not provide it
# (newer ones include metrics.mean_absolute_percentage_error).
def mape(y_true, y_pred):
    # Not defined when y_true contains zeros
    return np.mean(np.abs((y_pred - y_true) / y_true))


y_true = np.array([1.0, 5.0, 4.0, 3.0, 2.0, 5.0, -3.0])
y_pred = np.array([1.0, 4.5, 3.8, 3.2, 3.0, 4.8, -2.2])

# MSE: mean squared error
print('MSE:', metrics.mean_squared_error(y_true, y_pred))
# RMSE: root mean squared error
print('RMSE:', np.sqrt(metrics.mean_squared_error(y_true, y_pred)))
# MAE: mean absolute error
print('MAE:', metrics.mean_absolute_error(y_true, y_pred))
# MAPE: mean absolute percentage error
print('MAPE:', mape(y_true, y_pred))
```
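For reference, the same four metrics can be written out from their definitions with plain NumPy. This is only a sketch on the same toy arrays; the variable names are chosen here for illustration.

```python
import numpy as np

y_true = np.array([1.0, 5.0, 4.0, 3.0, 2.0, 5.0, -3.0])
y_pred = np.array([1.0, 4.5, 3.8, 3.2, 3.0, 4.8, -2.2])

errors = y_pred - y_true

mse = np.mean(errors ** 2)                    # mean of squared errors
rmse = np.sqrt(mse)                           # square root of MSE
mae = np.mean(np.abs(errors))                 # mean of absolute errors
mape_value = np.mean(np.abs(errors / y_true)) # mean of absolute relative errors

print('MSE:', mse)
print('RMSE:', rmse)
print('MAE:', mae)
print('MAPE:', mape_value)
```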

```
# R2 score: coefficient of determination (goodness of fit); the closer to 1, the better
from sklearn.metrics import r2_score

y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]
print('R2-score:', r2_score(y_true, y_pred))
```

The closer R2 is to 1, the better the model fits the data!
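The R2 score can also be computed from its usual definition, 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below does this by hand on the same toy values; the names ss_res and ss_tot are chosen here for illustration.

```python
import numpy as np

y_true = np.array([3, -0.5, 2, 7])
y_pred = np.array([2.5, 0.0, 2, 8])

# Residual sum of squares: how far the predictions are from the true values.
ss_res = np.sum((y_true - y_pred) ** 2)
# Total sum of squares: how far the true values are from their own mean.
ss_tot = np.sum((y_true - y_true.mean()) ** 2)

r2_manual = 1 - ss_res / ss_tot
print('R2 (manual):', r2_manual)
```

A model that always predicts the mean of y_true gets R2 = 0 under this definition, and a perfect model gets 1.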