Machine learning in practice: decision trees (part 2)

In the previous post, I described how to build a decision tree from training data. In this post, let's check how well the algorithm actually works. All the code from this post and the previous one is in the file chosetree.py.

Testing the algorithm: classifying with a decision tree

First, note that the inputTree argument of the classify function below is the mytree built in the previous post:
{'no sufacing': {0: 'no', 1: {'flippers': {0: 'no', 1: 'yes'}}}}
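Each internal node of this dictionary has a single key (the feature being tested) whose value maps every feature value to either a subtree or a class label. To make that encoding concrete, here is a minimal sketch that turns the nested dict into readable if/then rules. The helper name tree_rules is hypothetical and not part of chosetree.py; it assumes exactly the format shown above, where anything that is not a dict is a leaf label.

```python
def tree_rules(tree, path=()):
    """Yield (conditions, label) pairs, one per leaf of a nested-dict tree.

    Hypothetical helper: assumes each internal node is a one-key dict
    {feature_name: {feature_value: subtree_or_label, ...}} and that any
    non-dict value is a leaf class label.
    """
    if not isinstance(tree, dict):
        yield path, tree                  # leaf: the accumulated path ends in this label
        return
    feature = next(iter(tree))            # the single key is the feature tested here
    for value, subtree in tree[feature].items():
        # extend the path with this branch's (feature, value) test and descend
        yield from tree_rules(subtree, path + ((feature, value),))


mytree = {'no sufacing': {0: 'no', 1: {'flippers': {0: 'no', 1: 'yes'}}}}
for conditions, label in tree_rules(mytree):
    rule = ' and '.join(f'{f} == {v}' for f, v in conditions)
    print(f'if {rule}: {label}')
# if no sufacing == 0: no
# if no sufacing == 1 and flippers == 0: no
# if no sufacing == 1 and flippers == 1: yes
```

Each printed rule is one root-to-leaf path, which is exactly the route classify follows for a matching test vector.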


def classify(inputTree, featLabels, testVec):
    firstStr = next(iter(inputTree))        # root node: the first feature tested, e.g. 'no surfacing'
    secondDict = inputTree[firstStr]        # its branches, e.g. {0: 'no', 1: {'flippers': {0: 'no', 1: 'yes'}}}
    featIndex = featLabels.index(firstStr)  # position of that feature in featLabels, and hence in testVec
    classLabel = None                       # returned if testVec holds a value with no matching branch
    for key in secondDict.keys():
        if testVec[featIndex] == key:       # follow the branch whose key equals the test value
            if isinstance(secondDict[key], dict):  # internal node: recurse on the subtree
                classLabel = classify(secondDict[key], featLabels, testVec)
            else:                           # leaf node: the value is the class label
                classLabel = secondDict[key]
    return classLabel


The classify function is recursive. The main problem it has to solve is mapping a feature name in the tree to that feature's position in the data set. For example, the first feature mytree splits on is 'no surfacing', but which position in the feature vector holds that attribute? The first three lines of the function answer this question: the list's index method returns the position of the feature name in featLabels, and that position is also the feature's position in testVec. The code then walks down the tree recursively, comparing the value in testVec with the keys of the current node, and returns the class label once it reaches a leaf node. Now let's check that the code works:

>>> import chosetree as ct
>>> dataset,label = ct.createDataSet()
>>> mytree = ct.createTree(dataset,label)
>>> mytree
{'no sufacing': {0: 'no', 1: {'flippers': {0: 'no', 1: 'yes'}}}}
>>> classlabel = ct.classify(mytree,label,[1,0])
>>> classlabel
'no'
>>> ct.classify(mytree,label,[1,1])
'yes'
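The same lookup can also be written as a loop instead of recursion. The sketch below is a hypothetical alternative (classify_iterative is not part of chosetree.py) that uses the same tree format and the same featLabels.index() lookup, and returns a caller-supplied default when a test value has no branch in the tree. The label list ['no sufacing', 'flippers'] is an assumption chosen to match the keys in mytree; in practice it should be whatever createDataSet returns. If your own createTree deletes used features from the label list in place, index() will raise ValueError for the root feature, so passing createTree a copy (for example label[:]) is a safe habit.

```python
def classify_iterative(tree, feat_labels, test_vec, default=None):
    """Walk a nested-dict decision tree with a loop instead of recursion.

    Hypothetical alternative to classify(): returns `default` when test_vec
    holds a value that has no branch at some node.
    """
    node = tree
    while isinstance(node, dict):             # internal node: keep descending
        feature = next(iter(node))            # feature tested at this node
        branches = node[feature]
        value = test_vec[feat_labels.index(feature)]  # look up that feature's value
        if value not in branches:             # unseen value: no branch to follow
            return default
        node = branches[value]                # follow the matching branch
    return node                               # leaf reached: node is the class label


mytree = {'no sufacing': {0: 'no', 1: {'flippers': {0: 'no', 1: 'yes'}}}}
labels = ['no sufacing', 'flippers']          # assumed label list, matching the tree's keys
for vec in ([0, 0], [0, 1], [1, 0], [1, 1], [2, 1]):
    print(vec, classify_iterative(mytree, labels, vec))
# [0, 0] no
# [0, 1] no
# [1, 0] no
# [1, 1] yes
# [2, 1] None
```

The loop keeps a single reference to the current node, so it avoids one stack frame per tree level; for shallow trees like this one the recursive version is just as practical.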

We now have a working decision tree classifier, but as written we would have to rebuild the tree every time we use it, which can take a lot of computation on larger data sets. It is better to build the tree once and reuse it for every classification. To do that, we can serialize the tree object with Python's pickle module and save it to disk.

Using the algorithm: storing the decision tree

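import pickle

def storeTree(inputTree, filename):
    # pickle writes bytes, so the file must be opened in binary mode ('wb')
    with open(filename, 'wb') as fw:
        pickle.dump(inputTree, fw)

def grabTree(filename):
    # read the pickled tree back in binary mode ('rb')
    with open(filename, 'rb') as fr:
        return pickle.load(fr)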

Verification:

>>> ct.storeTree(mytree,'mytree.txt')
>>> ct.grabTree('mytree.txt')
{'no sufacing': {0: 'no', 1: {'flippers': {0: 'no', 1: 'yes'}}}}
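A self-contained way to check the round trip, and to see why pickle suits this tree better than a text format such as JSON, is sketched below. It writes to a temporary directory (the file name mytree.pkl is just an illustrative choice) and compares the result with a JSON round trip: JSON object keys are always strings, so the integer branch keys 0 and 1 would come back as '0' and '1' and no longer match integer test vectors like [1, 0]. Keep in mind that pickle.load can execute arbitrary code, so only load tree files you created or otherwise trust.

```python
import json
import os
import pickle
import tempfile

mytree = {'no sufacing': {0: 'no', 1: {'flippers': {0: 'no', 1: 'yes'}}}}

# Round trip through a pickle file, opened in binary mode ('wb' / 'rb').
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'mytree.pkl')   # illustrative file name
    with open(path, 'wb') as fw:
        pickle.dump(mytree, fw)
    with open(path, 'rb') as fr:
        restored = pickle.load(fr)

# The same round trip through JSON text, for comparison.
via_json = json.loads(json.dumps(mytree))

print(restored == mytree)   # True: integer branch keys 0 and 1 survive
print(via_json == mytree)   # False: JSON turned the keys into '0' and '1'
```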






Posted on Sun, 03 May 2020 17:05:37 -0400 by peterg0123