Everyone programs now: a finance guy asks me how to forecast interest rates

Import PaddlePaddle and data processing packages
data processing
model design
Training Configuration
Save and test the model

It's like this.

In the middle of the night, I got a programming question from someone at a big brokerage.

Bogor, have you slept?

Inner monologue: why is a finance guy asking me programming questions this late?

Later, I learned that they had organized a competition within their company - interest rate forecasts.

Isn't that simple? Isn't it just a linear regression model, the same kind as the Hello-world-level house price prediction model in artificial intelligence? So I walked him through it, blah blah blah.

"Hmm, I think I understand a little bit. But I still don't know what to do ~" said the finance guy.

--Here's the dividing line--

Import PaddlePaddle and data processing packages

# Load PaddlePaddle, NumPy, and related libraries
import paddle
import paddle.fluid as fluid
import paddle.fluid.dygraph as dygraph
from paddle.fluid.dygraph import Linear
import numpy as np
import os
import random

data processing

The code for data processing does not depend on the framework and is the same as the code for building a house price prediction task using Python, which is not covered here.

def load_data():
    # Import data from a file
    # datafile = './housing.data'
    datafile = './national debt2.txt'
    data = np.loadtxt(datafile, delimiter='\t', encoding='gbk', dtype=np.float64)
    print(data)
    print(">>>>>>")

    # Each record has 8 items: the first 7 are the influencing factors
    # and the 8th is the target value Y.
    # (The house price dataset had 14 items:
    #  'CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE',
    #  'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT', 'MEDV')
    feature_names = ['X1', 'X2', 'X3', 'X4', 'X5', 'X6', 'X7', 'Y']
    feature_num = len(feature_names)
    print(data.shape[0])

    # Reshape the original data to [N, 8]
    data = data.reshape([data.shape[0], feature_num])

    # Split the original dataset into training and test sets.
    # Here, 80% of the data is used for training and 20% for testing.
    # The test set and the training set must not overlap.
    ratio = 0.8
    offset = int(data.shape[0] * ratio)
    training_data = data[:offset]

    # Calculate the maximum, minimum, and mean of the training dataset
    maximums = training_data.max(axis=0)
    minimums = training_data.min(axis=0)
    avgs = training_data.sum(axis=0) / training_data.shape[0]

    # Record the normalization parameters so that data can be
    # normalized the same way at prediction time
    global max_values
    global min_values
    global avg_values
    max_values = maximums
    min_values = minimums
    avg_values = avgs

    # Normalize data
    for i in range(feature_num):
        # print(maximums[i], minimums[i], avgs[i])
        data[:, i] = (data[:, i] - avgs[i]) / (maximums[i] - minimums[i])

    # Split into training and test sets
    training_data = data[:offset]
    test_data = data[offset:]
    return training_data, test_data


training_data, test_data = load_data()
print(training_data)
print(test_data)
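To make the normalization step easier to follow, here is a minimal sketch in plain Python (no NumPy) of the same arithmetic: statistics are taken from the training rows only, each value becomes (x - mean) / (max - min), and a prediction is mapped back with the inverse formula. The function names and the toy numbers are made up for illustration and are not part of the original code.

```python
def fit_normalizer(rows):
    """Compute per-column max, min, and mean from the training rows only."""
    columns = list(zip(*rows))
    maximums = [max(col) for col in columns]
    minimums = [min(col) for col in columns]
    averages = [sum(col) / len(col) for col in columns]
    return maximums, minimums, averages


def normalize(row, maximums, minimums, averages):
    """Apply (x - mean) / (max - min) to every column of one row."""
    return [(x - avg) / (hi - lo)
            for x, hi, lo, avg in zip(row, maximums, minimums, averages)]


def denormalize_value(value, hi, lo, avg):
    """Invert the normalization for a single column value."""
    return value * (hi - lo) + avg


# Toy training rows (made-up numbers): two features followed by the target.
train_rows = [[1.0, 10.0, 2.0], [3.0, 30.0, 4.0], [5.0, 20.0, 6.0]]
maximums, minimums, averages = fit_normalizer(train_rows)
normalized = normalize(train_rows[0], maximums, minimums, averages)
restored = denormalize_value(normalized[-1], maximums[-1], minimums[-1], averages[-1])
```

Note that a column whose maximum equals its minimum would cause a division by zero in this scheme, so constant columns need separate handling.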

model design

Defining the model essentially means defining the network structure of the linear regression. PaddlePaddle recommends defining the model network by creating a Python class, that is, by defining the __init__ and forward functions. The forward function is where the framework expects the forward computation logic, and the program automatically executes the forward method when a model instance is called. The network layers used in the forward function need to be declared in the __init__ function.

The implementation process consists of two steps:

  1. Define the __init__ function: declare each layer of the network in the class's initialization function. In the house price prediction model, only one fully connected (FC) layer needs to be defined, and the model structure is consistent with the model in Section 1-2.
  2. Define the forward function: build the neural network structure, implement the forward computation, and return the prediction result. In the house price task, this is the predicted house price.

Note:

The name_scope variable is used when debugging a model to trace variables across multiple models. It can be ignored here. PaddlePaddle 1.7 and later no longer requires the user to set name_scope.

class Regressor(fluid.dygraph.Layer):
    def __init__(self, name_scope):
        super(Regressor, self).__init__(name_scope)
        name_scope = self.full_name()
        # Define one fully connected layer: output dimension is 1,
        # activation is None, that is, no activation function is used
        self.fc = Linear(input_dim=7, output_dim=1, act=None)

    # Forward computation of the network
    def forward(self, inputs):
        x = self.fc(inputs)
        return x
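As a rough illustration of the __init__ / forward split, the following plain-Python sketch mimics what a single fully connected layer with no activation computes (a weighted sum plus a bias) and how calling an instance dispatches to forward. TinyLinear is a hypothetical stand-in written for this example; it is not PaddlePaddle's Linear class and does not reproduce its internals.

```python
class TinyLinear:
    """Hypothetical stand-in for one fully connected layer: y = w . x + b."""

    def __init__(self, weights, bias):
        # Parameters are declared once, in __init__.
        self.weights = weights
        self.bias = bias

    def forward(self, inputs):
        # Forward computation: weighted sum of the inputs plus the bias.
        return sum(w * x for w, x in zip(self.weights, inputs)) + self.bias

    def __call__(self, inputs):
        # Calling the instance dispatches to forward().
        return self.forward(inputs)


layer = TinyLinear(weights=[0.5, -1.0, 2.0], bias=0.25)
prediction = layer([2.0, 1.0, 0.5])
```

With seven inputs and one output, as in the Regressor above, the layer would hold seven weights and one bias.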

Training Configuration

  1. The guard function specifies the machine resources used for training: programs inside the with scope run on the local CPU. dygraph.guard also indicates that programs inside the with scope run in PaddlePaddle's dynamic graph mode (executed immediately).
  2. Declare an instance of the Regressor regression model defined above and set the model's state to training.
  3. Use the load_data function to load the training and test data.
  4. Set the optimization algorithm and learning rate: the optimizer is stochastic gradient descent (SGD), with the learning rate set to 0.01.

The training configuration code is as follows:

# Define the working environment of the PaddlePaddle dynamic graph
with fluid.dygraph.guard():
    # Declare the linear regression model defined above
    model = Regressor("Regressor")
    # Switch the model to training mode
    model.train()
    # Load data
    training_data, test_data = load_data()
    # Define the optimization algorithm: stochastic gradient descent (SGD)
    # with the learning rate set to 0.01
    opt = fluid.optimizer.SGD(learning_rate=0.01, parameter_list=model.parameters())
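For readers wondering what the SGD optimizer does with the learning rate, here is a minimal plain-Python sketch of one update step for a linear model with squared-error loss. It is an illustration of the general technique under simple assumptions (one sample per step, made-up data, a hypothetical sgd_step helper), not a description of how the framework implements its optimizer.

```python
def sgd_step(weights, bias, x, y, learning_rate=0.01):
    """One SGD update for y_hat = w . x + b with loss (y_hat - y) ** 2."""
    y_hat = sum(w * xi for w, xi in zip(weights, x)) + bias
    error = y_hat - y
    # Gradients of the squared error with respect to each parameter.
    grad_w = [2.0 * error * xi for xi in x]
    grad_b = 2.0 * error
    # Move each parameter a small step against its gradient.
    new_weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
    new_bias = bias - learning_rate * grad_b
    return new_weights, new_bias, error ** 2


weights, bias = [0.0, 0.0], 0.0
sample_x, sample_y = [1.0, 2.0], 1.0  # made-up sample
losses = []
for _ in range(50):
    weights, bias, loss = sgd_step(weights, bias, sample_x, sample_y)
    losses.append(loss)
```

On this toy sample the loss shrinks at every step; a larger learning rate would converge faster up to a point, and beyond that the updates would overshoot.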

Note:

  1. By default, this case runs on the reader's laptop, so the machine resource used for model training is the CPU.
  2. The model instance has two states: training (.train()) and prediction (.eval()). Training performs both forward computation and backward gradient propagation, while prediction only needs forward computation. There are two reasons to specify the running state of a model:

(1) Some advanced operators, such as Dropout and Batch Normalization (covered in detail in the computer vision section), behave differently in the two states.

(2) For performance and storage reasons, the prediction state uses less memory and runs faster.

  3. In the code above, declaring the model, defining the optimizer, and so on all happen inside the with fluid.dygraph.guard() context. You can think of with fluid.dygraph.guard() as creating the working environment for the PaddlePaddle dynamic graph, in which model declaration, data conversion, and model training are completed.
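To show why the training and prediction states matter for an operator such as Dropout, here is a small plain-Python sketch. TinyDropout is a hypothetical class that uses the "inverted dropout" convention (survivors are rescaled during training, nothing happens during prediction); this is one common convention and an assumption of the example, and a real framework's default may scale values differently.

```python
import random


class TinyDropout:
    """Hypothetical dropout layer with a training flag (inverted dropout)."""

    def __init__(self, drop_prob, seed=0):
        self.drop_prob = drop_prob
        self.training = True
        self._rng = random.Random(seed)

    def train(self):
        self.training = True

    def eval(self):
        self.training = False

    def __call__(self, inputs):
        if not self.training:
            # Prediction state: pass values through unchanged.
            return list(inputs)
        keep_prob = 1.0 - self.drop_prob
        # Training state: zero each value with probability drop_prob and
        # rescale the survivors so the expected value stays the same.
        return [x / keep_prob if self._rng.random() < keep_prob else 0.0
                for x in inputs]


layer = TinyDropout(drop_prob=0.5)
values = [1.0, 2.0, 3.0, 4.0]
layer.train()
train_output = layer(values)
layer.eval()
eval_output = layer(values)
```

The linear regression model in this article has no such operator, so switching states does not change its output, but setting the state explicitly is still a good habit.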

In the pure-Python implementation of a neural network model, we wrote a lot of code to implement gradient descent. With the PaddlePaddle framework this is greatly simplified: defining the SGD optimizer is all it takes to set up the optimization.

with dygraph.guard(fluid.CPUPlace()):
    EPOCH_NUM = 10   # Number of outer loops (epochs)
    BATCH_SIZE = 10  # Batch size

    # Outer loop
    for epoch_id in range(EPOCH_NUM):
        # Shuffle the training data before each epoch starts
        np.random.shuffle(training_data)
        # Split the training data; each batch contains 10 records
        mini_batches = [training_data[k:k+BATCH_SIZE] for k in range(0, len(training_data), BATCH_SIZE)]
        # Inner loop
        for iter_id, mini_batch in enumerate(mini_batches):
            x = np.array(mini_batch[:, :-1]).astype('float32')  # Features of the current batch
            y = np.array(mini_batch[:, -1:]).astype('float32')  # Labels of the current batch (actual target values)
            # Convert the numpy data to PaddlePaddle dynamic graph variables
            house_features = dygraph.to_variable(x)
            prices = dygraph.to_variable(y)

            # Forward computation
            predicts = model(house_features)

            # Calculate loss
            loss = fluid.layers.square_error_cost(predicts, label=prices)
            avg_loss = fluid.layers.mean(loss)
            if iter_id % 20 == 0:
                print("epoch: {}, iter: {}, loss is: {}".format(epoch_id, iter_id, avg_loss.numpy()))

            # Backpropagation
            avg_loss.backward()
            # Minimize loss, update parameters
            opt.minimize(avg_loss)
            # Clear gradients
            model.clear_gradients()

    # Save the model
    fluid.save_dygraph(model.state_dict(), 'LR_model')
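The batching logic in the loop above can be tried on its own. This plain-Python sketch uses nested lists instead of NumPy arrays and a hypothetical make_mini_batches helper with made-up records; it shows that slicing in steps of the batch size leaves a shorter final batch when the record count is not a multiple of the batch size.

```python
import random


def make_mini_batches(rows, batch_size, rng):
    """Shuffle rows in place, then cut them into consecutive batches."""
    rng.shuffle(rows)
    return [rows[k:k + batch_size] for k in range(0, len(rows), batch_size)]


rows = [[float(i), float(i) * 2.0] for i in range(25)]  # made-up records
batches = make_mini_batches(rows, batch_size=10, rng=random.Random(0))
batch_sizes = [len(batch) for batch in batches]

# Split one batch into features (all but the last column) and labels
# (the last column), mirroring mini_batch[:, :-1] and mini_batch[:, -1:].
features = [row[:-1] for row in batches[0]]
labels = [row[-1:] for row in batches[0]]
```

One side effect worth knowing: because the loop only prints when iter_id % 20 == 0, a dataset that yields fewer than 21 batches per epoch logs just the first batch of each epoch.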

Save and test the model

Save Model

The model's current parameter data, model.state_dict(), is saved to a file (the file name is passed as an argument, for example LR_model) so that programs can load it later for prediction or verification, as shown below.

# Define the PaddlePaddle dynamic graph working environment
with fluid.dygraph.guard():
    # Save the model parameters; the file name used here is 'national_debt'
    fluid.save_dygraph(model.state_dict(), 'national_debt')
    print("The model was saved successfully, and the model parameters were saved in LR_model")

The model was saved successfully, and the model parameters were saved in LR_model

# Select a single record for testing
def load_one_example(data_dir):
    with open(data_dir, 'r') as f:
        datas = f.readlines()
    # print(datas)
    # Select the 10th record from the end for testing
    tmp = datas[-10]
    tmp = tmp.strip().split()
    one_data = [float(v) for v in tmp]

    # Normalize the features with the parameters recorded by load_data()
    for i in range(len(one_data) - 1):
        one_data[i] = (one_data[i] - avg_values[i]) / (max_values[i] - min_values[i])

    data = np.reshape(np.array(one_data[:-1]), [1, -1]).astype(np.float32)
    label = one_data[-1]
    return data, label


# Load the test set
def load_test_data(data_dir):
    one_data = np.loadtxt(data_dir, delimiter='\t', encoding='gbk', dtype=np.double)
    print(one_data)
    maximums = one_data.max(axis=0)
    minimums = one_data.min(axis=0)
    avgs = one_data.sum(axis=0) / one_data.shape[0]

    # Record the normalization parameters so that data can be
    # normalized the same way at prediction time
    global max_values
    global min_values
    global avg_values
    max_values = maximums
    min_values = minimums
    avg_values = avgs

    # Normalize data
    for i in range(7):
        # print(maximums[i], minimums[i], avgs[i])
        one_data[:, i] = (one_data[:, i] - avgs[i]) / (maximums[i] - minimums[i])
    return one_data


with dygraph.guard():
    # The argument is the path of the saved model parameters
    model_dict, _ = fluid.load_dygraph('national_debt')
    print(model_dict)
    model.load_dict(model_dict)
    model.eval()

    # The argument is the path of the dataset file
    test_data, label = load_one_example('./national debt2.txt')
    # Convert the data to the dynamic graph variable format
    test_data = dygraph.to_variable(test_data)
    results = model(test_data)
    print(test_data)

    # Denormalize the results
    results = results * (max_values[-1] - min_values[-1]) + avg_values[-1]
    print("Inference result is {}, the corresponding label is {}".format(results.numpy(), label))
    # print("Inference result is {}".format(results.numpy()))

Model Output

{'fc.weight': array([[ 0.26267445],
       [ 0.3111655 ],
       [-0.07909104],
       [ 0.14917243],
       [-0.7034063 ],
       [ 0.6225266 ],
       [-0.56594455]], dtype=float32), 'fc.bias': array([0.01486984], dtype=float32)}
name generated_var_0, dtype: VarType.FP32 shape: [1, 7] lod: {}
    dim: 1, 7
    layout: NCHW
    dtype: float
    data: [-2.80606 -1.25 4.26667 0.671242 0.688889 0.733556 0.251534]

Inference result is [[21.144272]], the corresponding label is 2.1907

Final model:

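Y = 0.26267445 * X1 + 0.3111655 * X2 - 0.07909104 * X3 + 0.14917243 * X4 - 0.7034063 * X5 + 0.6225266 * X6 - 0.56594455 * X7 + 0.01486984

Here X1 to X7 and Y are the normalized values, and 0.01486984 is the fc.bias term from the model output.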
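As a sanity check, the final model can be evaluated by hand. The sketch below plugs the weights, the bias, and the normalized test record printed in the model output into the linear formula. The helper name predict_normalized is made up for this example, and the result is in normalized units: turning it back into a rate needs the max, min, and mean of Y from load_data(), which the output does not show.

```python
# Weights and bias as printed in the model output.
weights = [0.26267445, 0.3111655, -0.07909104, 0.14917243,
           -0.7034063, 0.6225266, -0.56594455]
bias = 0.01486984

# Normalized features X1..X7 of the test record, also from the model output.
features = [-2.80606, -1.25, 4.26667, 0.671242, 0.688889, 0.733556, 0.251534]


def predict_normalized(weights, bias, features):
    """Evaluate Y = sum(w_i * X_i) + b in normalized units."""
    return sum(w * x for w, x in zip(weights, features)) + bias


normalized_prediction = predict_normalized(weights, bias, features)
print(normalized_prediction)
```

The value comes out at roughly -1.52, which the script then denormalizes into the printed inference result.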

2 June 2020, 04:09 | Views: 6488
