Everyone's programming now: a finance guy asks me for help with his interest rate forecast

It's like this.

In the middle of the night, I got a programming question from someone at a big brokerage.

Bogor, are you still up?

Inner monologue: why would a finance guy be asking about programming this late at night?

Later I learned that his company was holding an internal competition: interest rate forecasting.

That's not so simple, is it? Then again, it's just a linear regression model, no different from the "Hello World" of AI: house price prediction. So I explained it to him, blah blah blah.

"Hmm, I think I get it a little. But I still don't know how to actually do it~" said the finance guy.

--Here's the dividing line--

Import PaddlePaddle and the data processing packages

# Load PaddlePaddle, NumPy and related libraries
import paddle
import paddle.fluid as fluid
import paddle.fluid.dygraph as dygraph
from paddle.fluid.dygraph import Linear
import numpy as np
import os
import random

Data processing

The data processing code does not depend on the framework; it is the same as in the plain-Python house price prediction example, so it is not explained again here.

def load_data():
    # Read the tab-separated, GBK-encoded data file into a 2-D float array
    datafile = './national debt2.txt'
    data = np.loadtxt(datafile, delimiter='\t', encoding='gbk', dtype=np.float64)
    print(data)
    print(">>>>>>")
    # Each row has 8 columns: the first 7 (X1..X7) are the influencing factors,
    # and the 8th (Y) is the value to predict
    feature_names = [ 'X1', 'X2', 'X3', 'X4', 'X5', 'X6', 'X7','Y' ]
    feature_num = len(feature_names)
    print(data.shape[0])
    # Reshape the original data to [N, 8]
    data = data.reshape([data.shape[0] , feature_num])

    # Split the original dataset into training and test sets:
    # the first 80% of rows for training, the remaining 20% for testing.
    # The test set and training set must not overlap.
    ratio = 0.8
    offset = int(data.shape[0] * ratio)
    training_data = data[:offset]
    # Compute the per-column maximum, minimum and mean of the training set
    maximums, minimums, avgs = training_data.max(axis=0), training_data.min(axis=0), \
                                 training_data.sum(axis=0) / training_data.shape[0]
    
    # Record the normalization parameters globally so that data can be
    # normalized (and predictions denormalized) the same way at prediction time
    global max_values
    global min_values
    global avg_values
    max_values = maximums
    min_values = minimums
    avg_values = avgs

    # Normalize every column, including the label Y
    for i in range(feature_num):
        data[:, i] = (data[:, i] - avgs[i]) / (maximums[i] - minimums[i])

    # Re-slice the normalized data into training and test sets
    training_data = data[:offset]
    test_data = data[offset:]
    return training_data, test_data

training_data, test_data = load_data()
print(training_data)
print(test_data)
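load_data normalizes every column x as (x - mean) / (max - min), using statistics taken from the training rows only, so the test rows never influence them. A minimal NumPy sketch of that split-then-normalize step on a small made-up array (split_and_normalize is a hypothetical helper, not part of the original post; it assumes no column is constant, otherwise max - min would be zero):

```python
import numpy as np


def split_and_normalize(data, ratio=0.8):
    """Split rows into train/test, then normalize every column with training-set statistics."""
    data = np.array(data, dtype=np.float64)  # work on a copy
    offset = int(data.shape[0] * ratio)
    train = data[:offset]
    maximums = train.max(axis=0)
    minimums = train.min(axis=0)
    avgs = train.mean(axis=0)
    # Broadcasting applies the per-column formula to all rows at once
    normalized = (data - avgs) / (maximums - minimums)
    return normalized[:offset], normalized[offset:], (maximums, minimums, avgs)


# 10 rows, 3 columns of made-up numbers
raw = np.arange(30, dtype=np.float64).reshape(10, 3)
train, test, (mx, mn, avg) = split_and_normalize(raw)
print(train.shape, test.shape)  # (8, 3) (2, 3)
print(train.mean(axis=0))       # approximately 0 for every column
```

Computing the statistics before the split would leak information from the test rows into training, which is why the split happens first.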

Model design

Defining the model essentially means defining the network structure of the linear regression. PaddlePaddle recommends defining the network as a Python class with an __init__ method and a forward method. forward is the method the framework uses for the forward computation: calling a model instance runs forward automatically. Any network layer used in forward must be declared in __init__.

The implementation process consists of two steps:

  1. Define the __init__ function: declare each layer of the network in the class's initialization function. As with the house price prediction model, only one fully connected (FC) layer is needed, so the model structure is the same as in Sections 1-2.
  2. Define the forward function: build the network's forward computation and return the prediction, which in this task is the predicted value of Y.

Note:

The name_scope variable is used to track variables of multiple models when debugging; it can be ignored here. PaddlePaddle 1.7 and later does not require users to set name_scope.

class Regressor(fluid.dygraph.Layer):
    def __init__(self, name_scope):
        super(Regressor, self).__init__(name_scope)
        name_scope = self.full_name()
        # Define a layer of full connection, output dimension is 1, activation function is None, that is, no activation function is used
        self.fc = Linear(input_dim=7, output_dim=1, act=None)
    
    # Forward calculation function of network
    def forward(self, inputs):
        x = self.fc(inputs)
        return x
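With act=None, Linear(input_dim=7, output_dim=1) computes y = xW + b for a batch x of shape [N, 7], with W of shape [7, 1] and b of shape [1]. As a rough illustration only (a NumPy stand-in, not PaddlePaddle's implementation; the class name NumpyRegressor is made up), the same __init__/forward structure looks like this:

```python
import numpy as np


class NumpyRegressor:
    """Hypothetical NumPy stand-in for Regressor: one fully connected layer, no activation."""

    def __init__(self, input_dim=7, seed=0):
        rng = np.random.default_rng(seed)
        # Weight [input_dim, 1] and bias [1], the same shapes as Linear(input_dim, 1)
        self.weight = rng.normal(scale=0.1, size=(input_dim, 1)).astype(np.float32)
        self.bias = np.zeros(1, dtype=np.float32)

    def forward(self, inputs):
        # [batch_size, input_dim] @ [input_dim, 1] + [1] -> [batch_size, 1]
        return inputs @ self.weight + self.bias

    def __call__(self, inputs):
        # Calling the instance runs forward, mirroring how a dygraph Layer behaves
        return self.forward(inputs)


np_model = NumpyRegressor()
demo_x = np.ones((4, 7), dtype=np.float32)
print(np_model(demo_x).shape)  # (4, 1)
```

The names np_model and demo_x are chosen so the sketch does not overwrite the Paddle model variable used in the rest of the post.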

Training Configuration

  1. fluid.dygraph.guard specifies the machine resources for training; passing fluid.CPUPlace() runs the program inside the with block on the local CPU. guard also makes that program execute in PaddlePaddle's dynamic graph mode (eager execution).
  2. Create an instance of the Regressor model defined above and set it to training state.
  3. Load the training and test data with the load_data function.
  4. Set the optimizer and learning rate: stochastic gradient descent (SGD) with a learning rate of 0.01.

The training configuration code is as follows:

# Create the PaddlePaddle dynamic graph working environment
with fluid.dygraph.guard():
    # Instantiate the linear regression model defined above
    model = Regressor("Regressor")
    # Switch the model to training state
    model.train()
    # Load the data
    training_data, test_data = load_data()
    # Define the optimizer: stochastic gradient descent (SGD)
    # with the learning rate set to 0.01
    opt = fluid.optimizer.SGD(learning_rate=0.01, parameter_list=model.parameters())

Notes:

  1. By default this example runs on the reader's laptop, so the model is trained on the CPU.
  2. A model instance has two states: training (.train()) and prediction (.eval()). Training runs both the forward computation and gradient backpropagation, while prediction only needs the forward computation. There are two reasons to set a model's state explicitly:

(1) Some advanced operators, such as Dropout and Batch Normalization (covered in detail in the computer vision section), behave differently in the two states.

(2) For performance and memory: prediction state uses less memory and runs faster.

  3. In the code above, the model declaration, the optimizer definition and so on are all created inside the fluid.dygraph.guard() context. You can think of with fluid.dygraph.guard() as creating a PaddlePaddle dynamic graph working environment in which model declaration, data conversion and model training are all carried out.
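To make the training/prediction difference concrete: dropout randomly zeroes activations only in training state and passes them through unchanged in prediction state. The sketch below is a generic "inverted dropout" written in NumPy for illustration; PaddlePaddle's own dropout operator supports other scaling conventions, so this is not Paddle's code, and the dropout helper here is hypothetical.

```python
import numpy as np


def dropout(x, p, training, rng):
    """Inverted dropout: in training, zero each element with probability p and
    rescale the survivors by 1 / (1 - p); in prediction, return x unchanged."""
    if not training or p == 0.0:
        return x
    keep_mask = rng.random(x.shape) >= p
    return x * keep_mask / (1.0 - p)


demo_rng = np.random.default_rng(42)
acts = np.ones((2, 5))
print(dropout(acts, 0.5, training=True, rng=demo_rng))   # a mix of 0.0 and 2.0
print(dropout(acts, 0.5, training=False, rng=demo_rng))  # all ones
```

Because the output depends on the state flag, calling model.eval() before prediction matters for layers like this one, even though this post's single Linear layer behaves the same in both states.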

When we implemented the neural network in plain Python, we wrote a lot of code for gradient descent. With PaddlePaddle this is greatly simplified: setting up the optimizer only takes defining SGD.
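For comparison, here is roughly the kind of update that opt.minimize(avg_loss) performs for this model, assuming a linear model and the mean squared error loss used in training. This is a NumPy sketch for illustration (sgd_step is a hypothetical helper), not PaddlePaddle's implementation. Applied to each mini-batch it is the SGD update; the demo feeds the whole small dataset at every step for brevity.

```python
import numpy as np


def sgd_step(w, b, x, y, lr=0.01):
    """One gradient step for y_hat = x @ w + b with loss mean((y_hat - y) ** 2)."""
    n = x.shape[0]
    err = x @ w + b - y               # residuals, shape [n, 1]
    grad_w = (2.0 / n) * (x.T @ err)  # dL/dw, shape [d, 1]
    grad_b = 2.0 * err.mean(axis=0)   # dL/db, shape [1]
    loss = float((err ** 2).mean())
    return w - lr * grad_w, b - lr * grad_b, loss


# Fit a made-up, noise-free target y = 3*x1 - 2*x2 + 0.5
gen = np.random.default_rng(0)
x_demo = gen.normal(size=(200, 2))
y_demo = x_demo @ np.array([[3.0], [-2.0]]) + 0.5
w_demo, b_demo = np.zeros((2, 1)), np.zeros(1)
for _ in range(2000):
    w_demo, b_demo, last_loss = sgd_step(w_demo, b_demo, x_demo, y_demo, lr=0.05)
print(w_demo.ravel(), b_demo)  # close to [3, -2] and [0.5]
```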

with dygraph.guard(fluid.CPUPlace()):
    EPOCH_NUM = 10   # Number of epochs (passes over the training data)
    BATCH_SIZE = 10  # Batch size
    
    # Outer loop: one iteration per epoch
    for epoch_id in range(EPOCH_NUM):
        # Shuffle the training data before each epoch
        np.random.shuffle(training_data)
        # Split the training data into mini-batches of BATCH_SIZE rows
        mini_batches = [training_data[k:k+BATCH_SIZE] for k in range(0, len(training_data), BATCH_SIZE)]
        # Inner loop: one iteration per mini-batch
        for iter_id, mini_batch in enumerate(mini_batches):
            x = np.array(mini_batch[:, :-1]).astype('float32') # Features X1..X7 of the current batch
            y = np.array(mini_batch[:, -1:]).astype('float32') # Labels (true values of Y) of the current batch
            # Convert the NumPy arrays to PaddlePaddle dynamic graph variables
            features = dygraph.to_variable(x)
            labels = dygraph.to_variable(y)
            
            # Forward computation
            predicts = model(features)
            
            # Compute the loss: mean squared error over the batch
            loss = fluid.layers.square_error_cost(predicts, label=labels)
            avg_loss = fluid.layers.mean(loss)
            if iter_id%20==0:
                print("epoch: {}, iter: {}, loss is: {}".format(epoch_id, iter_id, avg_loss.numpy()))
            
            # Backpropagation
            avg_loss.backward()
            # Minimize the loss and update the parameters
            opt.minimize(avg_loss)
            # Clear the gradients before the next batch
            model.clear_gradients()
    # Save the model parameters
    fluid.save_dygraph(model.state_dict(), 'LR_model')
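The list comprehension that builds mini_batches slices the shuffled training array every BATCH_SIZE rows; the last batch is simply shorter when the row count is not a multiple of the batch size. A quick check on a made-up array of 25 rows with the same 8-column layout (the demo variable names are hypothetical):

```python
import numpy as np

BATCH = 10
rows = np.arange(25 * 8, dtype=np.float64).reshape(25, 8)  # 25 rows: X1..X7 plus Y
np.random.shuffle(rows)  # shuffles rows in place, like training_data above
batches = [rows[k:k + BATCH] for k in range(0, len(rows), BATCH)]
print([len(b) for b in batches])  # [10, 10, 5]

features_b = batches[0][:, :-1]  # first 7 columns -> model input
labels_b = batches[0][:, -1:]    # last column, kept 2-D -> label
print(features_b.shape, labels_b.shape)  # (10, 7) (10, 1)
```

Slicing the label with -1: rather than -1 keeps it 2-D, which matches the [batch_size, 1] shape of the model's output.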

Save and test the model

Save the model

The model's current parameters, model.state_dict(), are saved to a file (whose name is passed as the second argument) so that a program can load them later for prediction or validation, as shown below.

# Create the PaddlePaddle dynamic graph working environment
with fluid.dygraph.guard():
    # Save the model parameters under the file name national_debt
    fluid.save_dygraph(model.state_dict(), 'national_debt')
    print("The model was saved successfully; the model parameters were saved in national_debt")

The model was saved successfully; the model parameters were saved in national_debt
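model.state_dict() is essentially a mapping from parameter names (such as fc.weight and fc.bias in the model output later in this post) to arrays. Purely as an analogy, and not PaddlePaddle's file format, the same save-then-reload round trip can be done for a plain dict of NumPy arrays with np.savez; the parameter values and the file name demo_params.npz are hypothetical.

```python
import os
import tempfile

import numpy as np

# Hypothetical parameter dict shaped like this model's: 7 weights and 1 bias
params = {
    "fc.weight": np.zeros((7, 1), dtype=np.float32),
    "fc.bias": np.zeros(1, dtype=np.float32),
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo_params.npz")
    np.savez(path, **params)        # save: one named array per parameter
    with np.load(path) as archive:  # load: the archive behaves like a read-only dict
        restored = {name: archive[name] for name in archive.files}

print(sorted(restored))  # ['fc.bias', 'fc.weight']
```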

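# Load a single record from the data file to use as a test example
def load_one_example(data_dir):
    # Read all lines; the file is GBK-encoded, as in load_data
    with open(data_dir, 'r', encoding='gbk') as f:
        datas = f.readlines()
    # Take the 10th row from the end as the test example
    tmp = datas[-10]
    tmp = tmp.strip().split()
    one_data = [float(v) for v in tmp]

    # Normalize the 7 features with the training-set statistics saved by load_data
    for i in range(len(one_data)-1):
        one_data[i] = (one_data[i] - avg_values[i]) / (max_values[i] - min_values[i])

    # Shape the features as a [1, 7] float32 batch; the label stays un-normalized
    data = np.reshape(np.array(one_data[:-1]), [1, -1]).astype(np.float32)
    label = one_data[-1]
    return data, label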
    
# Load the whole data file as a test set and normalize its 7 feature columns.
# Note: this function is not called below, and it recomputes the normalization
# statistics from the full file, overwriting the global values set by load_data.
def load_test_data(data_dir):
    one_data = np.loadtxt(data_dir, delimiter='\t', encoding='gbk', dtype=np.double)
    print(one_data)

    maximums, minimums, avgs = one_data.max(axis=0), one_data.min(axis=0), \
                                 one_data.sum(axis=0) / one_data.shape[0]
    
    # Record the normalization parameters globally
    global max_values
    global min_values
    global avg_values
    max_values = maximums
    min_values = minimums
    avg_values = avgs

    # Normalize the feature columns X1..X7 (the label column Y is left unchanged)
    for i in range(7):
        one_data[:, i] = (one_data[:, i] - avgs[i]) / (maximums[i] - minimums[i])

    return one_data
    

with dygraph.guard():
    # The argument is the path of the saved model parameters
    model_dict, _ = fluid.load_dygraph('national_debt')
    print(model_dict)
    model.load_dict(model_dict)
    model.eval()

    # The argument is the path of the data file
    test_data, label = load_one_example('./national debt2.txt')
    # Convert the data to a dynamic graph variable
    test_data = dygraph.to_variable(test_data)
    results = model(test_data)
    print(test_data)
    # Denormalize the prediction back to the original scale of Y
    results = results * (max_values[-1] - min_values[-1]) + avg_values[-1]
    print("Inference result is {}, the corresponding label is {}".format(results.numpy(), label))
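Because Y was normalized as (y - avg) / (max - min) in load_data, the model's output has to be mapped back with y = y_norm * (max - min) + avg, which is what the results = ... line does using the last entries of max_values, min_values and avg_values. A small round-trip check with made-up label values (the helpers normalize and denormalize are hypothetical):

```python
import numpy as np


def normalize(y, avg, max_v, min_v):
    return (y - avg) / (max_v - min_v)


def denormalize(y_norm, avg, max_v, min_v):
    # Inverse of normalize: scale back by the range, then add the mean back
    return y_norm * (max_v - min_v) + avg


rates = np.array([2.1, 2.9, 3.4, 3.8])  # made-up label values
avg_y, max_y, min_y = rates.mean(), rates.max(), rates.min()
scaled = normalize(rates, avg_y, max_y, min_y)
print(np.allclose(denormalize(scaled, avg_y, max_y, min_y), rates))  # True
```

The denormalization only recovers the original scale if it uses the same statistics that were used for normalization, here the training-set ones.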

Model Output

 {'fc.weight': array([[ 0.26267445],
       [ 0.3111655 ],
       [-0.07909104],
       [ 0.14917243],
       [-0.7034063 ],
       [ 0.6225266 ],
       [-0.56594455]], dtype=float32), 'fc.bias': array([0.01486984], dtype=float32)}
name generated_var_0, dtype: VarType.FP32 shape: [1, 7] 	lod: {}
	dim: 1, 7
	layout: NCHW
	dtype: float
	data: [-2.80606 -1.25 4.26667 0.671242 0.688889 0.733556 0.251534]
    Inference result is [[21.144272]], the corresponding label is 2.1907

Final model (all variables in normalized units):

Y = 0.26267445 * X1 + 0.3111655 * X2 - 0.07909104 * X3 + 0.14917243 * X4 - 0.7034063 * X5 + 0.6225266 * X6 - 0.56594455 * X7 + 0.01486984
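As a sanity check, the final equation can be evaluated directly on the normalized input printed in the model output above (data: [-2.80606 -1.25 ...]), using the logged fc.weight and fc.bias. The result is in normalized units; turning it into a value of Y on the original scale would need the training-set max, min and mean of Y, which the post does not print.

```python
import numpy as np

# Weights and bias copied from the printed state dict above
weights = np.array([0.26267445, 0.3111655, -0.07909104, 0.14917243,
                    -0.7034063, 0.6225266, -0.56594455])
bias = 0.01486984
# Normalized input row copied from the printed test_data above
x_row = np.array([-2.80606, -1.25, 4.26667, 0.671242, 0.688889, 0.733556, 0.251534])

# Y = sum(w_i * X_i) + b, in normalized units
y_norm = float(x_row @ weights + bias)
print(y_norm)
```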


Posted on Tue, 02 Jun 2020 04:09:52 -0400 by Anco