Getting Started with tf.estimator

Rimeng Society

AI: Keras, PyTorch, MXNet, TensorFlow, PaddlePaddle Deep Learning in Practice (updated from time to time)

6.6 Getting Started with tf.estimator

Learning Objectives

  • Goals
    • Understand the tf.estimator workflow
    • Learn what a premade Estimator is
  • Application
    • Use tf.estimator to perform binary classification on U.S. census data

6.6.1 Introduction to tf.estimator

In TensorFlow, the tf.estimator API encapsulates basic machine learning models. Estimators are the most scalable and production-oriented TensorFlow model type.

This section describes the Estimator, a high-level TensorFlow API that greatly simplifies machine learning programming. An Estimator encapsulates the following operations:

  • Training
  • Evaluation
  • Prediction
  • Export for serving

A TensorFlow program that relies on a premade Estimator typically consists of the following four steps:

  1. Write one or more dataset import functions. For example, you can create one function to import the training set and another to import the test set. Each dataset import function must return two objects:

    • A dictionary in which the key is the feature name and the value is the tensor (or SparseTensor) containing the corresponding feature data
    • A tensor containing one or more labels

    For example, the following code shows the basic framework of the input function:

    def input_fn(dataset):
       ...  # manipulate dataset, extracting the feature dict and the label
       return feature_dict, label
    

    (For complete details, see Import data.)

  2. Define the feature columns. Each tf.feature_column identifies a feature name, its type, and any input preprocessing. For example, the following snippet creates three feature columns that hold integer or floating-point data. The first two simply identify the feature's name and type. The third also specifies a lambda that the program will call to adjust the raw data:

    # Define three numeric feature columns.
    population = tf.feature_column.numeric_column('population')
    crime_rate = tf.feature_column.numeric_column('crime_rate')
    median_education = tf.feature_column.numeric_column('median_education',
                        normalizer_fn=lambda x: x - global_education_mean)
    
  3. Instantiate the relevant premade Estimator. For example, the following code instantiates a premade Estimator named LinearClassifier:

    # Instantiate an estimator, passing the feature columns.
    estimator = tf.estimator.LinearClassifier(
        feature_columns=[population, crime_rate, median_education],
        )
    
  4. Call a training, evaluation, or inference method. For example, all Estimators provide a train method, which trains a model:

    # my_training_set is the function created in Step 1
    estimator.train(input_fn=my_training_set, steps=2000)
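
To make the return contract from Step 1 concrete, here is a minimal pure-Python sketch in which plain lists stand in for tensors. The ROWS data and its values are hypothetical, made up for illustration only. The sketch also applies the same "subtract the mean" adjustment that the normalizer_fn lambda in Step 2 describes.

```python
from statistics import mean

# Hypothetical in-memory rows; a real input function would read a file
# and return tensors instead of Python lists.
ROWS = [
    {"population": 1200.0, "crime_rate": 0.03, "median_education": 12.0, "label": 0},
    {"population": 560.0, "crime_rate": 0.01, "median_education": 16.0, "label": 1},
    {"population": 8900.0, "crime_rate": 0.07, "median_education": 11.0, "label": 0},
]


def input_fn(rows):
    """Return (feature_dict, label): one list per feature name, plus the labels."""
    feature_names = [name for name in rows[0] if name != "label"]
    feature_dict = {name: [row[name] for row in rows] for name in feature_names}
    label = [row["label"] for row in rows]
    return feature_dict, label


feature_dict, label = input_fn(ROWS)

# Same adjustment as the normalizer_fn lambda in Step 2: subtract the mean.
global_education_mean = mean(feature_dict["median_education"])
normalizer_fn = lambda x: x - global_education_mean
centered = [normalizer_fn(x) for x in feature_dict["median_education"]]

print(sorted(feature_dict))  # feature names are the dictionary keys
print(label)
print(centered)
```

The point to take away is the shape of the return value: a dictionary keyed by feature name, and a separate collection of labels in the same row order.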

6.6.1.1 Premade Estimators

Premade Estimators are subclasses of the base class tf.estimator.Estimator, whereas custom Estimators are instances of tf.estimator.Estimator.

Premade Estimators are ready to use. Sometimes, however, you need more control over an Estimator's behavior. That is where custom Estimators come in. You can create a custom Estimator to do just about anything. If you want hidden layers connected in some unusual way, write a custom Estimator. If you want to calculate a unique metric for your model, write a custom Estimator. Basically, if you want an Estimator optimized for your specific problem, write a custom Estimator.

6.6.2 Case Study: Classifying U.S. Census Data

The data set is the U.S. Census income data from 1994 and 1995. The task is a binary classification problem on the target label: the label is 1 if income exceeds $50,000 and 0 otherwise.

Number of examples:

  • 'train': 32561
  • 'validation': 16281

First five rows:

   age  workclass         fnlwgt  education  education_num  marital_status      occupation         relationship   race   gender  capital_gain  capital_loss  hours_per_week  native_country  income_bracket
0  39   State-gov         77516   Bachelors  13             Never-married       Adm-clerical       Not-in-family  White  Male    2174          0             40              United-States   <=50K
1  50   Self-emp-not-inc  83311   Bachelors  13             Married-civ-spouse  Exec-managerial    Husband        White  Male    0             0             13              United-States   <=50K
2  38   Private           215646  HS-grad    9              Divorced            Handlers-cleaners  Not-in-family  White  Male    0             0             40              United-States   <=50K
3  53   Private           234721  11th       7              Married-civ-spouse  Handlers-cleaners  Husband        Black  Male    0             0             40              United-States   <=50K
4  28   Private           338409  Bachelors  13             Married-civ-spouse  Prof-specialty     Wife           Black  Female  0             0             40              Cuba            <=50K

These columns fall into two types, categorical columns and continuous columns:

  • A column is called categorical if its value can only be one of the categories in a finite set. For example, a person's relationship status (wife, husband, unmarried, etc.) or education level (high school, college, etc.) is categorical.
  • A column is called continuous if its value can be any number in a continuous range. For example, a person's capital gain, such as $14,084, is a continuous column.

6.6.2.1 Case Implementation

  • Objective: make a binary prediction on the census income data
  • Steps:
    • 1. Read the U.S. Census income data
    • 2. Feature selection and feature engineering
    • 3. Model training and evaluation

1. Read the U.S. Census Income Data

The tf.data API makes it easy to handle large amounts of data, different data formats, and complex transformations.

  • Interface for reading CSV files: tf.data.TextLineDataset()
    • Argument: a file path or a list of file paths
    • Returns: a Dataset

Local data files: adult.data and adult.test

Settings for reading the data:

_CSV_COLUMNS = [
    'age', 'workclass', 'fnlwgt', 'education', 'education_num',
    'marital_status', 'occupation', 'relationship', 'race', 'gender',
    'capital_gain', 'capital_loss', 'hours_per_week', 'native_country',
    'income_bracket'
]

_CSV_COLUMN_DEFAULTS = [[0], [''], [0], [''], [0], [''], [''], [''], [''], [''],
                        [0], [0], [0], [''], ['']]


train_file = "/root/toutiao_project/reco_sys/server/models/data/adult.data"
test_file = "/root/toutiao_project/reco_sys/server/models/data/adult.test"
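
The record defaults do double duty: they fill in missing fields, and the type of each default determines the type its column is parsed as. The following pure-Python check (no TensorFlow involved) pairs each column name with its default to show which columns will be read as integers and which as strings. The int_columns and str_columns names are introduced here for illustration only.

```python
_CSV_COLUMNS = [
    'age', 'workclass', 'fnlwgt', 'education', 'education_num',
    'marital_status', 'occupation', 'relationship', 'race', 'gender',
    'capital_gain', 'capital_loss', 'hours_per_week', 'native_country',
    'income_bracket'
]

_CSV_COLUMN_DEFAULTS = [[0], [''], [0], [''], [0], [''], [''], [''], [''], [''],
                        [0], [0], [0], [''], ['']]

# Every column needs exactly one default.
assert len(_CSV_COLUMNS) == len(_CSV_COLUMN_DEFAULTS)

pairs = list(zip(_CSV_COLUMNS, _CSV_COLUMN_DEFAULTS))

# Columns whose default is an int are parsed as integers ...
int_columns = [name for name, default in pairs if isinstance(default[0], int)]
# ... and columns whose default is a string are parsed as strings.
str_columns = [name for name, default in pairs if isinstance(default[0], str)]

print("integer columns:", int_columns)
print("string columns: ", str_columns)
```

Note that the parsed type is not the same thing as the modelling role: income_bracket is a string column but serves as the label, and an integer column is not necessarily a useful continuous feature.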

Input function code:

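import tensorflow as tf

def input_fn(data_file, num_epochs, shuffle, batch_size):
    """
    Build a Dataset of (feature_dict, label) batches from a census CSV file.
    :param data_file: path to the CSV file
    :param num_epochs: number of passes over the data
    :param shuffle: whether to shuffle the examples
    :param batch_size: number of examples per batch
    :return: tf.data.Dataset
    """
    def deal_with_csv(value):
        # tf.io.decode_csv is the current name of the older tf.decode_csv
        data = tf.io.decode_csv(value, record_defaults=_CSV_COLUMN_DEFAULTS)

        # Build a dictionary mapping column names to this row's values
        feature_dict = dict(zip(_CSV_COLUMNS, data))
        labels = feature_dict.pop('income_bracket')
        # True where income exceeds $50,000
        classes = tf.equal(labels, '>50K')
        return feature_dict, classes

    # 1. Read the U.S. Census income data
    # Each dataset element is one line of the file, e.g.
    # 39,State-gov,77516,Bachelors,13,Never-married,Adm-clerical,...
    dataset = tf.data.TextLineDataset(data_file)
    if shuffle:
        # The buffer size here is an arbitrary choice
        dataset = dataset.shuffle(buffer_size=10000)
    dataset = dataset.map(deal_with_csv)
    # Each element is now a (feature_dict, classes) pair
    dataset = dataset.repeat(num_epochs)
    dataset = dataset.batch(batch_size)
    return dataset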
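
To see what the mapping function produces for a single row, here is a pure-Python stand-in that uses the csv module rather than TensorFlow ops, so the values are plain Python objects instead of tensors. The parse_line name is hypothetical; the sample line is row 0 of the table shown earlier.

```python
import csv

_CSV_COLUMNS = [
    'age', 'workclass', 'fnlwgt', 'education', 'education_num',
    'marital_status', 'occupation', 'relationship', 'race', 'gender',
    'capital_gain', 'capital_loss', 'hours_per_week', 'native_country',
    'income_bracket'
]

_CSV_COLUMN_DEFAULTS = [[0], [''], [0], [''], [0], [''], [''], [''], [''], [''],
                        [0], [0], [0], [''], ['']]


def parse_line(line):
    """Pure-Python stand-in for deal_with_csv: one CSV line -> (features, label)."""
    fields = next(csv.reader([line]))
    values = []
    for field, default in zip(fields, _CSV_COLUMN_DEFAULTS):
        if field == "":
            # Missing field: fall back to the column's default value.
            values.append(default[0])
        else:
            # Cast the text to the same type as the column's default.
            values.append(type(default[0])(field))
    feature_dict = dict(zip(_CSV_COLUMNS, values))
    label = feature_dict.pop("income_bracket")
    return feature_dict, label == ">50K"


line = ("39,State-gov,77516,Bachelors,13,Never-married,Adm-clerical,"
        "Not-in-family,White,Male,2174,0,40,United-States,<=50K")
features, is_high_income = parse_line(line)

print(features["age"], features["workclass"], features["capital_gain"])
print(is_high_income)
```

The label column is removed from the feature dictionary before it is returned, so the model never sees the answer as an input.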

2. Feature Selection and Feature Engineering

An Estimator uses a mechanism called feature columns to describe how the model should interpret each raw input feature. An Estimator expects a vector of numeric inputs, and feature columns describe how the model should convert each feature.

Selecting and crafting the right set of feature columns is key to learning an effective model. A feature column can be either one of the raw inputs in the original feature dict (a base feature column) or a new column created by transforming one or more base columns (a derived feature column).

A feature column is an abstract concept representing any raw or derived variable that can be used to predict the target label.
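
As one illustration of a derived feature column, a numeric column such as age can be turned into a categorical one by bucketing it into ranges (TensorFlow provides tf.feature_column.bucketized_column for this). The model in this section does not use bucketing. The sketch below is plain Python, and the boundary values are an arbitrary choice made for illustration.

```python
import bisect

# Arbitrary, illustrative boundaries (an assumption, not taken from this section).
AGE_BOUNDARIES = [18, 25, 30, 35, 40, 45, 50, 55, 60, 65]


def bucketize(value, boundaries):
    """Return the index of the bucket containing value.

    Bucket 0 is (-inf, b0), bucket i is [b(i-1), b(i)), and the last
    bucket is [b_last, +inf), so there are len(boundaries) + 1 buckets.
    """
    return bisect.bisect_right(boundaries, value)


ages = [39, 50, 38, 53, 28]  # the "age" values from the sample rows shown earlier
buckets = [bucketize(age, AGE_BOUNDARIES) for age in ages]
print(buckets)
```

Here age is the base column and the bucket index is the derived one: the model can then learn a separate weight per age range instead of a single weight for age.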

  • Numeric Column

The simplest feature_column is numeric_column. It indicates that a feature is a numeric value that should be fed into the model directly. For example:

# Numeric features
age = tf.feature_column.numeric_column('age')
education_num = tf.feature_column.numeric_column('education_num')
capital_gain = tf.feature_column.numeric_column('capital_gain')
capital_loss = tf.feature_column.numeric_column('capital_loss')
hours_per_week = tf.feature_column.numeric_column('hours_per_week')

numeric_columns = [age, education_num, capital_gain, capital_loss, hours_per_week]

  • Categorical Column

To define a feature column for a categorical feature, create a CategoricalColumn using one of the tf.feature_column.categorical_column* functions. If you know the set of all possible feature values for a column and there are only a few of them, use categorical_column_with_vocabulary_list. Each key in the list is assigned an auto-incrementing ID starting from 0. For example, for the relationship column we can assign the feature string 'Husband' the integer ID 0, 'Not-in-family' the ID 1, and so on. If you do not know the set of possible values in advance, use categorical_column_with_hash_bucket instead, which hashes each value into one of hash_bucket_size buckets (as done for occupation).

relationship = tf.feature_column.categorical_column_with_vocabulary_list(
    'relationship',
    ['Husband', 'Not-in-family', 'Wife', 'Own-child', 'Unmarried', 'Other-relative'])

occupation = tf.feature_column.categorical_column_with_hash_bucket(
    'occupation', hash_bucket_size=1000)

education = tf.feature_column.categorical_column_with_vocabulary_list(
    'education', [
        'Bachelors', 'HS-grad', '11th', 'Masters', '9th', 'Some-college',
        'Assoc-acdm', 'Assoc-voc', '7th-8th', 'Doctorate', 'Prof-school',
        '5th-6th', '10th', '1st-4th', 'Preschool', '12th'])

marital_status = tf.feature_column.categorical_column_with_vocabulary_list(
    'marital_status', [
        'Married-civ-spouse', 'Divorced', 'Married-spouse-absent',
        'Never-married', 'Separated', 'Married-AF-spouse', 'Widowed'])

workclass = tf.feature_column.categorical_column_with_vocabulary_list(
    'workclass', [
        'Self-emp-not-inc', 'Private', 'State-gov', 'Federal-gov',
        'Local-gov', '?', 'Self-emp-inc', 'Without-pay', 'Never-worked'])

categorical_columns = [relationship, occupation, education, marital_status, workclass]
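
The two ways of turning strings into integer IDs can be sketched in plain Python. This is a conceptual illustration, not TensorFlow's implementation: the vocabulary_id and hash_bucket_id helpers are hypothetical, md5 is a stand-in hash chosen for this sketch (TensorFlow uses its own fingerprint function, so real bucket numbers will differ), and returning -1 for unknown values reflects what we understand to be the vocabulary column's default.

```python
import hashlib

RELATIONSHIP_VOCAB = ['Husband', 'Not-in-family', 'Wife', 'Own-child',
                      'Unmarried', 'Other-relative']


def vocabulary_id(value, vocabulary, default_value=-1):
    """ID = position in the vocabulary list; unknown values get default_value."""
    try:
        return vocabulary.index(value)
    except ValueError:
        return default_value


def hash_bucket_id(value, hash_bucket_size):
    """Map any string to a bucket in [0, hash_bucket_size).

    No vocabulary is needed, but two different strings can collide
    in the same bucket.
    """
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % hash_bucket_size


print(vocabulary_id('Husband', RELATIONSHIP_VOCAB))
print(vocabulary_id('Not-in-family', RELATIONSHIP_VOCAB))
print(vocabulary_id('Roommate', RELATIONSHIP_VOCAB))  # not in the vocabulary

occupation_bucket = hash_bucket_id('Adm-clerical', hash_bucket_size=1000)
print(0 <= occupation_bucket < 1000)
```

The trade-off is that a vocabulary list gives every value its own ID but must be known up front, while hashing handles unseen values at the cost of possible collisions. A larger hash_bucket_size makes collisions less likely.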

3. Model Training and Evaluation

The input function passed to train (here train_inpf) is passed as a function object and is called without arguments. To bind the parameters of the original input_fn ahead of time, you can use functools.partial.

import functools

def add(a, b):
    return a + b

add(4, 2)   # 6

plus3 = functools.partial(add, 3)
plus5 = functools.partial(add, 5)

plus3(4)    # 7
plus3(7)    # 10

plus5(10)   # 15

Applying functools.partial to the data set input function:

import functools

train_inpf = functools.partial(input_fn, train_file, num_epochs=2, shuffle=True, batch_size=64)
test_inpf = functools.partial(input_fn, test_file, num_epochs=1, shuffle=False, batch_size=64)
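
The following sketch shows what such a partial object holds and that it can be called with no arguments. The input_fn here is a hypothetical stand-in that just echoes its arguments instead of building a Dataset, and the "adult.data" path is a placeholder.

```python
import functools


def input_fn(data_file, num_epochs, shuffle, batch_size):
    """Hypothetical stand-in: returns its arguments instead of a Dataset."""
    return {
        "data_file": data_file,
        "num_epochs": num_epochs,
        "shuffle": shuffle,
        "batch_size": batch_size,
    }


train_inpf = functools.partial(
    input_fn, "adult.data", num_epochs=2, shuffle=True, batch_size=64)

# The bound arguments are stored on the partial object ...
print(train_inpf.func is input_fn)
print(train_inpf.args)       # bound positional arguments
print(train_inpf.keywords)   # bound keyword arguments

# ... so the result can be called with no arguments at all.
config = train_inpf()
print(config)
```

This is why train_inpf can be handed to the Estimator as-is: all four parameters are already bound, and nothing is evaluated until the function is actually called.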

Initial training and evaluation with tf.estimator:

classifier = tf.estimator.LinearClassifier(feature_columns=numeric_columns + categorical_columns)
classifier.train(train_inpf)
result = classifier.evaluate(test_inpf)
# result is a dictionary of evaluation metrics
for key, value in sorted(result.items()):
    print('%s: %s' % (key, value))
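
Because classifier.train is called here without a steps argument, training should run until the input function is exhausted. The following back-of-the-envelope calculation estimates how many batches (and hence training steps) that is. It assumes the 32561 training examples quoted earlier, the num_epochs=2 and batch_size=64 bound above, and that repeat is applied before batch, as in the input function, so that only the very last batch can be partial.

```python
import math

num_train_examples = 32561  # the 'train' count quoted earlier in this section
num_epochs = 2
batch_size = 64

# repeat() comes before batch(), so batches are cut from one long stream
# of num_train_examples * num_epochs elements.
total_examples = num_train_examples * num_epochs
num_batches = math.ceil(total_examples / batch_size)
last_batch_size = total_examples - (num_batches - 1) * batch_size

print(total_examples)
print(num_batches)
print(last_batch_size)
```

If batch were applied before repeat instead, each epoch would end with its own partial batch and the count would differ slightly.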

 


Posted on Fri, 19 Jun 2020 21:28:33 -0400 by Mikkki