1, ID3 algorithm
1. Pseudo code
```
ID3(Examples, Target_Attribute, Attributes)
    Create a root node Root for the tree
    If all examples are positive, return the single-node tree Root with label = +
    If all examples are negative, return the single-node tree Root with label = -
    If the set of predicting attributes is empty, return the single-node tree Root
        with label = most common value of Target_Attribute in Examples
    Otherwise:
        A ← the attribute that best classifies Examples
        Decision tree attribute for Root = A
        For each possible value vi of A:
            Add a new branch below Root corresponding to the test A = vi
            Let Examples(vi) be the subset of Examples that have value vi for A
            If Examples(vi) is empty:
                Below this branch add a leaf node with label = most common
                target value in Examples
            Else:
                Below this branch add the subtree
                ID3(Examples(vi), Target_Attribute, Attributes – {A})
    Return Root
```
We will not implement this pseudocode by hand in what follows; instead we will use the sklearn library.
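For reference, here is a minimal plain-Python sketch of the pseudocode above. The helper names `entropy`, `information_gain` and `id3` are our own, and the sketch only branches on attribute values that actually occur in the data:

```python
from collections import Counter
import math

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(rows, labels, attr):
    """Information gain of splitting (rows, labels) on attribute index `attr`."""
    total = len(labels)
    gain = entropy(labels)
    for value in set(row[attr] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        gain -= len(subset) / total * entropy(subset)
    return gain

def id3(rows, labels, attrs):
    """Return a nested-dict decision tree: {attr_index: {value: subtree_or_label}}."""
    if len(set(labels)) == 1:          # all examples share one label
        return labels[0]
    if not attrs:                      # no attributes left: majority label
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    tree = {best: {}}
    for value in set(row[best] for row in rows):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        sub_rows = [rows[i] for i in idx]
        sub_labels = [labels[i] for i in idx]
        tree[best][value] = id3(sub_rows, sub_labels, [a for a in attrs if a != best])
    return tree
```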
2. Disadvantages
- ID3 is very sensitive to attributes with many distinct values. For example, if an attribute takes a different value for almost every sample, or in the extreme case a unique value per sample (such as an ID), splitting on it yields a very large information gain, even though the split is useless for prediction (see the sketch after this list).
- The ID3 algorithm cannot handle attributes with continuous values.
- The ID3 algorithm cannot handle samples with missing values for attributes.
- Because the algorithm above tends to grow deep trees, it is prone to overfitting.
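To see the first disadvantage concretely, suppose one attribute is a unique ID for each sample. Its information gain equals the full entropy of the labels, the largest value possible, so ID3 would always prefer it. A tiny demonstration, reusing the `entropy` and `information_gain` helpers from the sketch above:

```python
# Toy data: column 0 is a unique ID per sample, column 1 is an ordinary attribute.
rows = [[0, 'a'], [1, 'a'], [2, 'b'], [3, 'b']]
labels = ['yes', 'no', 'no', 'no']

print(entropy(labels))                    # ~0.811, entropy of the labels
print(information_gain(rows, labels, 0))  # ~0.811, the ID attribute "explains" everything
print(information_gain(rows, labels, 1))  # ~0.311, the ordinary attribute gains far less
```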
3. Implementation code
1. Import modules
```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier
```
2. Read data
```python
data = pd.read_csv('./Watermelon dataset.csv')
data
```
| | color and lustre | Root | Knock | texture | Umbilicus | Tactile sensation | Good melon |
|---|---|---|---|---|---|---|---|
0 | dark green | Curl up | Turbid sound | clear | sunken | Hard slip | yes |
1 | Black | Curl up | Dull | clear | sunken | Hard slip | yes |
2 | Black | Curl up | Turbid sound | clear | sunken | Hard slip | yes |
3 | dark green | Curl up | Dull | clear | sunken | Hard slip | yes |
4 | plain | Curl up | Turbid sound | clear | sunken | Hard slip | yes |
5 | dark green | Slightly curled | Turbid sound | clear | Slightly concave | Soft sticky | yes |
6 | Black | Slightly curled | Turbid sound | Slightly paste | Slightly concave | Soft sticky | yes |
7 | Black | Slightly curled | Turbid sound | clear | Slightly concave | Hard slip | yes |
8 | Black | Slightly curled | Dull | Slightly paste | Slightly concave | Hard slip | no |
9 | dark green | Stiff | Crisp | clear | flat | Soft sticky | no |
10 | plain | Stiff | Crisp | vague | flat | Hard slip | no |
11 | plain | Curl up | Turbid sound | vague | flat | Soft sticky | no |
12 | dark green | Slightly curled | Turbid sound | Slightly paste | sunken | Hard slip | no |
13 | plain | Slightly curled | Dull | Slightly paste | sunken | Hard slip | no |
14 | Black | Slightly curled | Turbid sound | clear | Slightly concave | Soft sticky | no |
15 | plain | Curl up | Turbid sound | vague | flat | Hard slip | no |
16 | dark green | Curl up | Dull | Slightly paste | Slightly concave | Hard slip | no |
3. Data encoding
```python
# Create a LabelEncoder object for encoding the categorical columns
label = LabelEncoder()
# Encode each feature column (all columns except the target)
for col in data.columns[:-1]:
    data[col] = label.fit_transform(data[col])
data
```
| | color and lustre | Root | Knock | texture | Umbilicus | Tactile sensation | Good melon |
|---|---|---|---|---|---|---|---|
0 | 2 | 2 | 1 | 1 | 0 | 0 | yes |
1 | 0 | 2 | 0 | 1 | 0 | 0 | yes |
2 | 0 | 2 | 1 | 1 | 0 | 0 | yes |
3 | 2 | 2 | 0 | 1 | 0 | 0 | yes |
4 | 1 | 2 | 1 | 1 | 0 | 0 | yes |
5 | 2 | 1 | 1 | 1 | 2 | 1 | yes |
6 | 0 | 1 | 1 | 2 | 2 | 1 | yes |
7 | 0 | 1 | 1 | 1 | 2 | 0 | yes |
8 | 0 | 1 | 0 | 2 | 2 | 0 | no |
9 | 2 | 0 | 2 | 1 | 1 | 1 | no |
10 | 1 | 0 | 2 | 0 | 1 | 0 | no |
11 | 1 | 2 | 1 | 0 | 1 | 1 | no |
12 | 2 | 1 | 1 | 2 | 0 | 0 | no |
13 | 1 | 1 | 0 | 2 | 0 | 0 | no |
14 | 0 | 1 | 1 | 1 | 2 | 1 | no |
15 | 1 | 2 | 1 | 0 | 1 | 0 | no |
16 | 2 | 2 | 0 | 2 | 2 | 0 | no |
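Note that the same LabelEncoder object is refit on each column above, so the mapping from integer codes back to the original category names is overwritten at every step. If you need to decode the columns later, one possible sketch (assuming the same CSV file) keeps a separate encoder per column:

```python
# Re-read the raw data so the encoders are fit on the original category names.
raw = pd.read_csv('./Watermelon dataset.csv')

# One fitted LabelEncoder per feature column, so the codes can be inverted later.
encoders = {col: LabelEncoder() for col in raw.columns[:-1]}
for col, enc in encoders.items():
    raw[col] = enc.fit_transform(raw[col])

# Example: recover the original category names of the first feature column.
first_col = raw.columns[0]
print(encoders[first_col].inverse_transform(raw[first_col].values))
```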
4. sklearn fitting
```python
# Fit with ID3 (information entropy as the splitting criterion)
dtc = DecisionTreeClassifier(criterion='entropy')
# Fit on the encoded features and the target column
dtc.fit(data.iloc[:, :-1].values.tolist(), data.iloc[:, -1].values)
# Predict the label for an encoded sample
result = dtc.predict([[1, 1, 1, 1, 0, 0]])
# Prediction result
result
```
array(['yes'], dtype=object)
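To see more than a single prediction, sklearn's export_text can print the learned split structure; here the column names from the DataFrame are passed as feature names:

```python
from sklearn.tree import export_text

# Print the splits of the fitted tree using the original column names.
print(export_text(dtc, feature_names=list(data.columns[:-1])))
```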
2, C4.5 algorithm
The general idea of the C4.5 algorithm is similar to that of ID3: it classifies by constructing a decision tree. The difference lies in how branch attributes are selected: ID3 uses information gain as the measure, while C4.5 uses the information gain ratio.
$$\mathrm{Gain\_ratio}(D, a) = \frac{\mathrm{Gain}(D, a)}{\mathrm{IV}(a)}, \qquad \mathrm{IV}(a) = -\sum_{v=1}^{V} \frac{|D^v|}{|D|} \log_2 \frac{|D^v|}{|D|}$$

where Gain(D, a) is the ID3 information gain, V is the number of values of attribute a, and D^v is the subset of D taking value v on a.
It can be seen from the formula that when attribute a has many values (V is large), IV(a) becomes large and the information gain ratio drops accordingly, which alleviates, to some extent, ID3's bias toward attributes with many values.
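As a concrete check, the information gain ratio of one attribute of the encoded watermelon data can be computed directly with pandas. This is only a sketch; the helper names are our own, and the column names 'Root' and 'Good melon' are taken from the table above:

```python
import numpy as np

def entropy_of(series):
    """Shannon entropy of a pandas Series of labels."""
    p = series.value_counts(normalize=True)
    return -(p * np.log2(p)).sum()

def gain_ratio(df, attr, target):
    """Information gain of `attr` divided by its split information IV(attr)."""
    total = len(df)
    gain = entropy_of(df[target])
    iv = 0.0
    for _, subset in df.groupby(attr):
        w = len(subset) / total
        gain -= w * entropy_of(subset[target])
        iv -= w * np.log2(w)
    return gain / iv

print(gain_ratio(data, 'Root', 'Good melon'))
```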
3, CART algorithm
The CART algorithm constructs a binary decision tree. After the tree is built, it also needs to be pruned so that it generalizes better to unseen data. When growing the tree, CART uses the Gini index to select features.
1. Gini index
$$\mathrm{Gini}(D) = 1 - \sum_{k=1}^{K} p_k^2, \qquad \mathrm{Gini\_index}(D, a) = \sum_{v=1}^{V} \frac{|D^v|}{|D|}\,\mathrm{Gini}(D^v)$$

where p_k is the proportion of samples in D belonging to class k, and D^v is the subset of D taking value v on attribute a.
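The Gini index can likewise be computed by hand for one attribute of the encoded data (a small sketch with our own helper names, using the column names from the table above):

```python
def gini(series):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    p = series.value_counts(normalize=True)
    return 1.0 - (p ** 2).sum()

def gini_index(df, attr, target):
    """Weighted Gini impurity of the partition induced by `attr`."""
    total = len(df)
    return sum(len(sub) / total * gini(sub[target]) for _, sub in df.groupby(attr))

print(gini_index(data, 'Root', 'Good melon'))
```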
2. CART fitting
```python
# Fit with CART (the Gini index is the default splitting criterion)
dtc = DecisionTreeClassifier()
# Fit on the encoded features and the target column
dtc.fit(data.iloc[:, :-1].values.tolist(), data.iloc[:, -1].values)
# Predict the label for an encoded sample
result = dtc.predict([[1, 1, 1, 1, 0, 0]])
# Prediction result
result
```
array(['yes'], dtype=object)
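As mentioned above, a CART tree usually needs pruning. sklearn implements cost-complexity pruning through the ccp_alpha parameter; here is a rough sketch of choosing an alpha from the pruning path (on a dataset this small the effect is only illustrative):

```python
X = data.iloc[:, :-1].values
y = data.iloc[:, -1].values

# Effective alphas of the cost-complexity pruning path for this data.
path = DecisionTreeClassifier().cost_complexity_pruning_path(X, y)
print(path.ccp_alphas)

# Refit with a non-zero alpha (assuming the path has more than one entry)
# to obtain a smaller, pruned tree.
pruned = DecisionTreeClassifier(ccp_alpha=path.ccp_alphas[1])
pruned.fit(X, y)
print(pruned.get_n_leaves())
```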