# Everyone can code: a finance guy asks me about forecasting interest rates

Here's what happened.

In the middle of the night, someone from a big brokerage sent me a question about programming.

"Bogor, are you asleep yet?" My inner monologue: why is a finance guy asking me about programming at this hour?

Later, I learned that his company had organized an internal competition: interest rate forecasting.

That's simple enough, isn't it? Isn't it just a linear regression model, the same as the Hello-world-level house price prediction model in AI tutorials? So I explained it to him at length.

"Hmm, I think I sort of get it, but I still don't know how to actually do it~" said the finance guy.

--Here's the dividing line--

``````
# Load PaddlePaddle, NumPy, and related libraries
import paddle.fluid as fluid
import paddle.fluid.dygraph as dygraph
from paddle.fluid.dygraph import Linear
import numpy as np
import os
import random
``````

## Data Processing

The data-processing code does not depend on the framework. Apart from the data file and the column names, it is the same as the code used in the plain-Python house price prediction task, so it is not explained in detail here.

``````
def load_data():
    # Import data from a file (tab-separated, GBK-encoded)
    datafile = './national debt2.txt'
    data = np.loadtxt(datafile, delimiter='\t', encoding='gbk', dtype=np.float64)
    print(data)
    print(">>>>>>")

    # Each record has 8 items: the first 7 are the influencing factors (X1-X7)
    # and the 8th is the target value (Y)
    feature_names = ['X1', 'X2', 'X3', 'X4', 'X5', 'X6', 'X7', 'Y']
    feature_num = len(feature_names)
    print(data.shape)
    # Reshape the original data to [N, 8]
    data = data.reshape([data.shape[0], feature_num])

    # Split the original dataset into training and test sets:
    # 80% of the data is used for training and 20% for testing.
    # The two sets must not overlap.
    ratio = 0.8
    offset = int(data.shape[0] * ratio)
    training_data = data[:offset]

    # Calculate the maximum, minimum, and mean of the training set
    maximums = training_data.max(axis=0)
    minimums = training_data.min(axis=0)
    avgs = training_data.sum(axis=0) / training_data.shape[0]

    # Record the normalization parameters so predictions can reuse them
    global max_values
    global min_values
    global avg_values
    max_values = maximums
    min_values = minimums
    avg_values = avgs

    # Normalize every column (features and Y)
    for i in range(feature_num):
        data[:, i] = (data[:, i] - avgs[i]) / (maximums[i] - minimums[i])

    # Split into training and test sets
    training_data = data[:offset]
    test_data = data[offset:]
    return training_data, test_data


# Load the data once so the training code below can use it
training_data, test_data = load_data()
print(training_data)
print(test_data)
``````
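To make the normalization step concrete, here is a minimal sketch in plain Python (no NumPy) on a made-up three-column table. The function name `normalize_with_train_stats` and the toy rows are assumptions for illustration only. The logic mirrors `load_data`: the statistics come from the first 80% of the rows, and every column is rescaled as (value - mean) / (max - min).

```python
def normalize_with_train_stats(rows, ratio=0.8):
    """Split rows into train/test and scale each column with training-set statistics."""
    offset = int(len(rows) * ratio)
    train = rows[:offset]
    num_cols = len(rows[0])

    # Column-wise max, min and mean computed on the training rows only
    maxs = [max(r[j] for r in train) for j in range(num_cols)]
    mins = [min(r[j] for r in train) for j in range(num_cols)]
    avgs = [sum(r[j] for r in train) / len(train) for j in range(num_cols)]

    # Apply the same statistics to every row, including the test rows
    scaled = [
        [(r[j] - avgs[j]) / (maxs[j] - mins[j]) for j in range(num_cols)]
        for r in rows
    ]
    return scaled[:offset], scaled[offset:], (maxs, mins, avgs)


# Toy data: two feature columns and one label column (values are invented)
toy_rows = [
    [1.0, 10.0, 2.0],
    [2.0, 20.0, 2.5],
    [3.0, 30.0, 3.0],
    [4.0, 40.0, 3.5],
    [5.0, 50.0, 4.0],
]
train_rows, test_rows, stats = normalize_with_train_stats(toy_rows)
print(len(train_rows), len(test_rows))
print(train_rows[0])
print(test_rows[0])
```

Because the statistics come from the training rows only, normalized test values can fall outside the range seen in training.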

## Model Design

Defining the model means defining the network structure of the linear regression. PaddlePaddle recommends defining a network as a Python class with two methods, `__init__` and `forward`. `forward` is the method the framework designates for the forward computation, and it runs automatically when the model instance is called. Every layer used in `forward` must be declared in `__init__`.

The implementation consists of two steps:

1. Define `__init__`: declare each layer of the network in the class's initializer. Only one fully connected (FC) layer is needed here, the same structure as the house price prediction model in Section 1-2 of the tutorial this example is adapted from.
2. Define `forward`: build the network, implement the forward computation, and return the prediction. In the original task this is the predicted house price; here it is the predicted Y.
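The init/forward pattern can be sketched in plain Python without any framework. The classes `TinyLayer` and `TinyRegressor` below are hypothetical stand-ins, not PaddlePaddle classes. They only illustrate why calling a model instance runs `forward`, and what a single fully connected layer with one output and no activation computes.

```python
class TinyLayer:
    """Hypothetical base class: calling an instance dispatches to forward()."""

    def __call__(self, *args):
        return self.forward(*args)

    def forward(self, *args):
        raise NotImplementedError


class TinyRegressor(TinyLayer):
    """One fully connected layer, output dimension 1, no activation."""

    def __init__(self, weights, bias):
        # The layer's parameters are declared in __init__ ...
        self.weights = list(weights)
        self.bias = bias

    def forward(self, inputs):
        # ... and used in forward: y = w1*x1 + ... + wn*xn + b
        return sum(w * x for w, x in zip(self.weights, inputs)) + self.bias


model_sketch = TinyRegressor(weights=[0.5, -1.0], bias=0.1)
prediction = model_sketch([2.0, 3.0])  # same as model_sketch.forward([2.0, 3.0])
print(prediction)
```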

Note:

The `name_scope` argument is used to tell apart the variables of different models when debugging. It can be ignored here. PaddlePaddle 1.7 and later do not require the user to set `name_scope`.

``````
class Regressor(fluid.dygraph.Layer):
    def __init__(self, name_scope):
        super(Regressor, self).__init__(name_scope)
        name_scope = self.full_name()
        # Define one fully connected layer: 7 inputs, 1 output, act=None (no activation function)
        self.fc = Linear(input_dim=7, output_dim=1, act=None)

    # Forward computation of the network
    def forward(self, inputs):
        x = self.fc(inputs)
        return x
``````

## Training Configuration

1. The `guard` function specifies the machine resources used for training; here, code inside the `with` scope runs on the local CPU. `dygraph.guard` also means that code inside the `with` scope runs in PaddlePaddle's dynamic graph (eager execution) mode.
2. Create an instance of the `Regressor` model defined above and set it to training mode.
3. Set the optimization algorithm and learning rate: stochastic gradient descent (SGD) with a learning rate of 0.01.

The training configuration code is as follows:

``````
# Define the PaddlePaddle dynamic graph working environment
with fluid.dygraph.guard():
    # Create the linear regression model defined above
    model = Regressor("Regressor")
    # Switch the model to training mode
    model.train()
    # Define the optimizer: stochastic gradient descent (SGD)
    # with the learning rate set to 0.01
    opt = fluid.optimizer.SGD(learning_rate=0.01, parameter_list=model.parameters())
``````

Note:

1. By default this example runs on the reader's laptop, so the model is trained on the CPU.
2. A model instance has two states: training (`.train()`) and prediction (`.eval()`). Training runs both the forward computation and gradient backpropagation, while prediction only needs the forward computation. There are two reasons to set the state explicitly:

   (1) Some advanced operators, such as Dropout and Batch Normalization (covered in detail in the computer vision chapter), behave differently in the two states.

   (2) For performance and storage reasons, the prediction state uses less memory and runs faster.

3. In the code above, the model declaration and the optimizer definition both sit inside the `with fluid.dygraph.guard()` context. You can think of `with fluid.dygraph.guard()` as creating the PaddlePaddle dynamic graph working environment, in which model declaration, data conversion, and model training all take place.
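Why the running state matters can be shown with a toy dropout layer. `ToyDropout` is a hypothetical plain-Python class written for this note, not the framework's operator, and it assumes one common variant (inverted dropout). In training mode it randomly zeroes inputs and rescales the survivors; in prediction mode it passes the inputs through unchanged.

```python
import random


class ToyDropout:
    """Hypothetical dropout layer illustrating train vs. eval behaviour."""

    def __init__(self, drop_prob=0.5, seed=0):
        self.drop_prob = drop_prob
        self.training = True
        self.rng = random.Random(seed)

    def train(self):
        self.training = True

    def eval(self):
        self.training = False

    def __call__(self, inputs):
        if not self.training:
            # Prediction state: deterministic, inputs pass through unchanged
            return list(inputs)
        # Training state: drop each value with probability drop_prob and
        # rescale the survivors so the expected value stays the same
        keep = 1.0 - self.drop_prob
        return [
            x / keep if self.rng.random() >= self.drop_prob else 0.0
            for x in inputs
        ]


layer = ToyDropout(drop_prob=0.5)
values = [1.0, 2.0, 3.0, 4.0]

layer.train()
train_out = layer(values)

layer.eval()
eval_out = layer(values)
print(train_out)
print(eval_out)
```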

When we implemented the neural network model in plain Python, we had to write a lot of code for gradient descent. With the PaddlePaddle framework this is greatly simplified: setting up the optimizer only requires defining SGD.
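For comparison, this is roughly what an SGD optimizer does for a linear model with a mean squared error loss, written out by hand in plain Python. The function names `predict`, `mse`, and `sgd_step` and the toy data are assumptions for illustration. The framework's optimizer performs the equivalent update on the model's parameters.

```python
def predict(weights, bias, x):
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def mse(weights, bias, xs, ys):
    return sum((predict(weights, bias, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def sgd_step(weights, bias, xs, ys, learning_rate=0.01):
    """One gradient-descent update on a batch for loss = mean((pred - y)^2)."""
    n = len(xs)
    grad_w = [0.0] * len(weights)
    grad_b = 0.0
    for x, y in zip(xs, ys):
        err = predict(weights, bias, x) - y
        for j, xj in enumerate(x):
            grad_w[j] += 2.0 * err * xj / n
        grad_b += 2.0 * err / n
    new_weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
    new_bias = bias - learning_rate * grad_b
    return new_weights, new_bias


# Toy batch generated from y = 2*x1 - x2 (invented data)
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
ys = [2.0, -1.0, 1.0, 3.0]

weights, bias = [0.0, 0.0], 0.0
loss_before = mse(weights, bias, xs, ys)
for _ in range(500):
    weights, bias = sgd_step(weights, bias, xs, ys, learning_rate=0.1)
loss_after = mse(weights, bias, xs, ys)
print(loss_before, loss_after)
print(weights, bias)
```

After enough updates the weights approach the values used to generate the toy data, and the loss approaches zero.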

``````
with dygraph.guard(fluid.CPUPlace()):
    EPOCH_NUM = 10   # Number of outer loops (epochs)
    BATCH_SIZE = 10  # Batch size

    # Outer loop: one pass over the training data per epoch
    for epoch_id in range(EPOCH_NUM):
        # Shuffle the training data before each epoch starts
        np.random.shuffle(training_data)
        # Split the training data into batches of 10 records each
        mini_batches = [training_data[k:k+BATCH_SIZE] for k in range(0, len(training_data), BATCH_SIZE)]
        # Inner loop: one parameter update per batch
        for iter_id, mini_batch in enumerate(mini_batches):
            x = np.array(mini_batch[:, :-1]).astype('float32')  # Features of the current batch
            y = np.array(mini_batch[:, -1:]).astype('float32')  # Labels of the current batch (Y; the house price in the original tutorial)
            # Convert the NumPy data to PaddlePaddle dynamic graph variables
            house_features = dygraph.to_variable(x)
            prices = dygraph.to_variable(y)

            # Forward computation
            predicts = model(house_features)

            # Calculate the loss
            loss = fluid.layers.square_error_cost(predicts, label=prices)
            avg_loss = fluid.layers.mean(loss)
            if iter_id % 20 == 0:
                print("epoch: {}, iter: {}, loss is: {}".format(epoch_id, iter_id, avg_loss.numpy()))

            # Backpropagation
            avg_loss.backward()
            # Minimize the loss and update the parameters
            opt.minimize(avg_loss)
            # Clear the gradients so they do not accumulate across batches
            model.clear_gradients()
    # Save the model
    fluid.save_dygraph(model.state_dict(), 'LR_model')
``````
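The batching logic in the training loop can be checked in isolation. This sketch uses plain Python lists in place of the NumPy array, and `make_mini_batches` is a hypothetical helper name. It shuffles the rows, slices them into batches of `batch_size`, and leaves a shorter final batch when the row count is not a multiple of the batch size.

```python
import random


def make_mini_batches(rows, batch_size, seed=None):
    """Shuffle a copy of rows and cut it into consecutive batches."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[k:k + batch_size] for k in range(0, len(shuffled), batch_size)]


# 25 toy rows: each row is [feature, label] (invented values)
rows = [[float(i), float(2 * i)] for i in range(25)]
batches = make_mini_batches(rows, batch_size=10, seed=42)

for batch in batches:
    features = [row[:-1] for row in batch]  # all columns except the last
    labels = [row[-1:] for row in batch]    # last column, kept as a 1-element list
    print(len(features), len(labels))
```

Note that the training loop prints the loss only when the batch index is divisible by 20, so with fewer than 21 batches per epoch only the first batch of each epoch is reported.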

## Save and test the model

### Save Model

The model's current parameters, `model.state_dict()`, are saved to a file so that other programs can load them for prediction or validation. The file name is passed as the second argument (`national_debt` here), as shown below.

``````
# Define the PaddlePaddle dynamic graph working environment
with fluid.dygraph.guard():
    # Save the model parameters under the file name national_debt
    fluid.save_dygraph(model.state_dict(), 'national_debt')
    print("The model was saved successfully; the parameters are stored in national_debt")
``````

The model was saved successfully; the parameters are stored in national_debt

``````
def load_one_example(data_dir):
    # Read the raw records, one per line
    with open(data_dir, 'r') as f:
        datas = f.readlines()
    # Select the 10th-from-last record for testing
    tmp = datas[-10]
    tmp = tmp.strip().split()
    one_data = [float(v) for v in tmp]

    # Normalize the features with the statistics recorded by load_data()
    for i in range(len(one_data) - 1):
        one_data[i] = (one_data[i] - avg_values[i]) / (max_values[i] - min_values[i])

    data = np.reshape(np.array(one_data[:-1]), [1, -1]).astype(np.float32)
    label = one_data[-1]
    return data, label


# Alternative loader: reads every record and normalizes it with that file's own statistics.
# The original def line is missing; the name load_all_examples is assumed.
def load_all_examples(data_dir):
    one_data = np.loadtxt(data_dir, delimiter='\t', encoding='gbk', dtype=np.double)
    print(one_data)

    maximums = one_data.max(axis=0)
    minimums = one_data.min(axis=0)
    avgs = one_data.sum(axis=0) / one_data.shape[0]

    # Record the normalization parameters (this overwrites the values set by load_data())
    global max_values
    global min_values
    global avg_values
    max_values = maximums
    min_values = minimums
    avg_values = avgs

    # Normalize the 7 feature columns
    for i in range(7):
        one_data[:, i] = (one_data[:, i] - avgs[i]) / (maximums[i] - minimums[i])
    return one_data


with dygraph.guard():
    # The argument is the path of the saved model parameters
    model_dict, _ = fluid.load_dygraph('national_debt')
    model.load_dict(model_dict)
    print(model_dict)
    model.eval()

    # The argument is the path of the dataset file
    # (assumed here to be the same file that load_data() reads)
    test_data, label = load_one_example('./national debt2.txt')
    # Convert the data to a dynamic graph variable
    test_data = dygraph.to_variable(test_data)
    results = model(test_data)
    print(test_data)
    # Denormalize the result
    results = results * (max_values[-1] - min_values[-1]) + avg_values[-1]
    print("Inference result is {}, the corresponding label is {}".format(results.numpy(), label))
``````

Model Output

``````
{'fc.weight': array([[ 0.26267445],
       [ 0.3111655 ],
       [-0.07909104],
       [ 0.14917243],
       [-0.7034063 ],
       [ 0.6225266 ],
       [-0.56594455]], dtype=float32), 'fc.bias': array([0.01486984], dtype=float32)}
name generated_var_0, dtype: VarType.FP32 shape: [1, 7] 	lod: {}
dim: 1, 7
layout: NCHW
dtype: float
data: [-2.80606 -1.25 4.26667 0.671242 0.688889 0.733556 0.251534]
Inference result is [[21.144272]], the corresponding label is 2.1907
``````

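### Final model:

In terms of the normalized inputs and output, and including the bias term from `fc.bias`:

``````
Y = 0.26267445 * X1 + 0.3111655 * X2 - 0.07909104 * X3 + 0.14917243 * X4 - 0.7034063 * X5 + 0.6225266 * X6 - 0.56594455 * X7 + 0.01486984
``````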
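As a sanity check, the formula can be evaluated by hand on the normalized input printed in the model output. This is a plain-Python sketch: `predict_normalized` and `denormalize` are hypothetical helper names, and the label statistics passed to `denormalize` are placeholders, because the real maximum, minimum, and mean of Y are not shown in this post.

```python
# Weights and bias copied from the printed model parameters
WEIGHTS = [0.26267445, 0.3111655, -0.07909104, 0.14917243,
           -0.7034063, 0.6225266, -0.56594455]
BIAS = 0.01486984


def predict_normalized(features):
    """Linear model output in normalized units: sum(w_i * x_i) + bias."""
    return sum(w * x for w, x in zip(WEIGHTS, features)) + BIAS


def denormalize(value, y_max, y_min, y_avg):
    """Invert (y - avg) / (max - min) to get back to the original scale."""
    return value * (y_max - y_min) + y_avg


# Normalized test input copied from the printed model output
x = [-2.80606, -1.25, 4.26667, 0.671242, 0.688889, 0.733556, 0.251534]
y_norm = predict_normalized(x)
print(y_norm)

# Placeholder label statistics, used only to show the call
print(denormalize(y_norm, y_max=4.0, y_min=2.0, y_avg=3.0))
```

The value in normalized units still has to be mapped back with the statistics of the Y column, which is what the denormalization line in the test code does.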

Posted on Tue, 02 Jun 2020 04:09:52 -0400 by Anco