fastText step by step

This article refers to the original: http://bjbsair.com/2020-03-25/tech-info/6300/

**Preface**

Today's tutorial is based on FAIR's paper Bag of Tricks for Efficient Text Classification [1], which describes the model we usually call fastText.

The most gratifying part of this work is the fastText toolkit itself. The code quality is very high, and the paper's results can be reproduced essentially with one click. The project is packaged very professionally: there is an official fastText website and a GitHub repository, plus a Python interface that can be installed directly through pip. A model that is this accurate and this fast is a genuinely powerful tool.
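
For readers who just want to use the library, here is a minimal sketch of the pip-installed Python interface. The file name and hyperparameters below are illustrative assumptions, not values from the paper; the expected training format is one example per line, prefixed with __label__<class>.

# Install with: pip install fasttext
# Minimal usage sketch of the official fastText Python bindings.
import fasttext

# each line of train.txt looks like: "__label__positive this movie was great"
model = fasttext.train_supervised(input="train.txt", epoch=25, lr=0.5, wordNgrams=2)

labels, probs = model.predict("this movie was surprisingly good", k=2)
print(labels, probs)

model.save_model("fasttext_demo.bin")  # reload later with fasttext.load_model(...)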

To better understand how fastText works, we will now reimplement it ourselves. Our code only implements the simplest word-level embedding averaging and does not use bigram (n-gram) embeddings, so the classification accuracy of our implementation will be lower than that of Facebook's open-source library.

Overview of the paper

First of all, here is a passage from the paper that shows how the authors summarize fastText's performance:

> We can train fastText on more than one billion words in less than ten minutes using a standard multicore CPU, and classify half a million sentences among 312K classes in less than a minute.

The model in this paper is very simple. Readers who already know word2vec will notice that it is very similar to the CBOW architecture.

Concretely, the input is a sentence, represented as its words or n-grams. Each token is mapped to a vector, these vectors are averaged to obtain a text vector, and the averaged vector is fed to a linear classifier to predict the label. When there are only a few categories, a plain softmax is enough; when the number of labels is huge, hierarchical softmax is needed.
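
To make the pipeline concrete, here is a toy numpy sketch of that forward pass (all sizes and token ids below are made up for illustration): look up the token vectors, average them, apply a linear layer, and take a softmax.

# Toy sketch of the fastText forward pass: average embeddings -> linear -> softmax.
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim, num_classes = 1000, 16, 4   # toy sizes for illustration
E = rng.normal(size=(vocab_size, embed_dim))       # token embedding table
W = rng.normal(size=(embed_dim, num_classes))      # classifier weights
b = np.zeros(num_classes)

token_ids = np.array([5, 42, 7, 123])              # a "sentence" of word/n-gram ids
text_vec = E[token_ids].mean(axis=0)               # average of the token vectors
logits = text_vec @ W + b
probs = np.exp(logits - logits.max())
probs /= probs.sum()                               # softmax over classes
print(probs.argmax(), probs)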

The model itself is really simple, so there is not much more to say about it. The paper adds two tricks:

  • 「hierarchical softmax」
    When the number of categories is large, the softmax layer is accelerated by building a Huffman coding tree over the labels, the same trick used in word2vec
  • 「N-gram features」
    Using only unigrams throws away word-order information, so n-gram features are added to recover local order, and the hashing trick keeps the memory footprint of the n-gram table bounded (see the sketch right after this list)
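
The hashing trick can be illustrated in a few lines of Python. This is a simplified sketch, not the hash function used in the official fastText code, and the bucket and vocabulary sizes are arbitrary.

# Simplified sketch of hashing bigram features into a fixed number of buckets
# so the embedding table stays bounded. Not the official fastText hash function.
NUM_BUCKETS = 2_000_000   # arbitrary bucket count for illustration
VOCAB_SIZE = 50_000       # arbitrary word-id range for illustration

def add_hashed_bigrams(token_ids):
    """Return the unigram ids plus one hashed bucket id per adjacent word pair."""
    features = list(token_ids)
    for first, second in zip(token_ids, token_ids[1:]):
        bucket = (first * 1_000_003 + second) % NUM_BUCKETS
        features.append(VOCAB_SIZE + bucket)   # bigram ids live after the word ids
    return features

print(add_hashed_bigrams([3, 17, 42]))  # -> [3, 17, 42, <bucket>, <bucket>]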

Looking at the experimental section of the paper, it is impressive that such a simple model achieves such good results.

However, it has also been pointed out that the datasets chosen in the paper are not very sensitive to word order, so the reported results are perhaps not that surprising.

Code implementation

After reading this stripped-down version, remember to also go through the official source code. As in the previous posts in this series, we define a fastTextModel class and then build the network: the input and output placeholders, the loss, the training step, and so on.

# Assumed imports for this script (the original post omits them). `data_process`
# and `BaseModel` are helpers from the author's accompanying repository, and the
# tf.app.flags definitions are sketched after the code listing below.
import os
import time
import datetime

import numpy as np
import tensorflow as tf
from tensorflow.contrib import learn

import data_process                # assumption: provides load_data_and_labels / batch_iter
from base_model import BaseModel   # assumption: module name in the author's repo

FLAGS = tf.app.flags.FLAGS

class fastTextModel(BaseModel):  
    """  
    A simple implementation of fasttext for text classification  
    """  
    def __init__(self, sequence_length, num_classes, vocab_size,  
                 embedding_size, learning_rate, decay_steps, decay_rate,  
                 l2_reg_lambda, is_training=True,  
                 initializer=tf.random_normal_initializer(stddev=0.1)):  
        self.vocab_size = vocab_size  
        self.embedding_size = embedding_size  
        self.num_classes = num_classes  
        self.sequence_length = sequence_length  
        self.learning_rate = learning_rate  
        self.decay_steps = decay_steps  
        self.decay_rate = decay_rate  
        self.is_training = is_training  
        self.l2_reg_lambda = l2_reg_lambda  
        self.initializer = initializer  
        self.input_x = tf.placeholder(tf.int32, [None, self.sequence_length], name='input_x')  
        self.input_y = tf.placeholder(tf.float32, [None, self.num_classes], name='input_y')  # one-hot labels (float, as required by softmax_cross_entropy_with_logits)
        self.global_step = tf.Variable(0, trainable=False, name='global_step')  
        self.instantiate_weight()  
        self.logits = self.inference()  
        self.loss_val = self.loss()  
        self.train_op = self.train()  
        self.predictions = tf.argmax(self.logits, axis=1, name='predictions')  
        correct_prediction = tf.equal(self.predictions, tf.argmax(self.input_y, 1))  
        self.accuracy = tf.reduce_mean(tf.cast(correct_prediction, 'float'), name='accuracy')  
    def instantiate_weight(self):  
        with tf.name_scope('weights'):  
            self.Embedding = tf.get_variable('Embedding', shape=[self.vocab_size, self.embedding_size],  
                                             initializer=self.initializer)  
            self.W_projection = tf.get_variable('W_projection', shape=[self.embedding_size, self.num_classes],  
                                                initializer=self.initializer)  
            self.b_projection = tf.get_variable('b_projection', shape=[self.num_classes])  
    def inference(self):  
        """  
        1. word embedding  
        2. average embedding  
        3. linear classifier  
        :return:  
        """  
        # embedding layer  
        with tf.name_scope('embedding'):  
            words_embedding = tf.nn.embedding_lookup(self.Embedding, self.input_x)  
            self.average_embedding = tf.reduce_mean(words_embedding, axis=1)  
        logits = tf.matmul(self.average_embedding, self.W_projection) + self.b_projection
        return logits  
    def loss(self):  
        # loss  
        with tf.name_scope('loss'):  
            losses = tf.nn.softmax_cross_entropy_with_logits(labels=self.input_y, logits=self.logits)  
            data_loss = tf.reduce_mean(losses)  
            l2_loss = tf.add_n([tf.nn.l2_loss(cand_var) for cand_var in tf.trainable_variables()
                                if 'bias' not in cand_var.name])
            data_loss += l2_loss * self.l2_reg_lambda  # apply the regularization weight once
            return data_loss  
    def train(self):  
        with tf.name_scope('train'):  
            learning_rate = tf.train.exponential_decay(self.learning_rate, self.global_step,  
                                                       self.decay_steps, self.decay_rate,  
                                                       staircase=True)  
            train_op = tf.contrib.layers.optimize_loss(self.loss_val, global_step=self.global_step,  
                                                       learning_rate=learning_rate, optimizer='Adam')
        return train_op  

def preprocess():  
    """  
    Load and preprocess the data.  
    :return:  
    """  
    print("Loading data...")  
    x_text, y = data_process.load_data_and_labels(FLAGS.positive_data_file, FLAGS.negative_data_file)  
    # build vocabulary  
    max_document_length = max(len(x.split(' ')) for x in x_text)  
    vocab_processor = learn.preprocessing.VocabularyProcessor(max_document_length)  
    x = np.array(list(vocab_processor.fit_transform(x_text)))  
    # shuffle  
    np.random.seed(10)  
    shuffle_indices = np.random.permutation(np.arange(len(y)))  
    x_shuffled = x[shuffle_indices]  
    y_shuffled = y[shuffle_indices]  
    # split train/test dataset  
    dev_sample_index = -1 * int(FLAGS.dev_sample_percentage * float(len(y)))  
    x_train, x_dev = x_shuffled[:dev_sample_index], x_shuffled[dev_sample_index:]  
    y_train, y_dev = y_shuffled[:dev_sample_index], y_shuffled[dev_sample_index:]  
    del x, y, x_shuffled, y_shuffled  
    print('Vocabulary Size: {:d}'.format(len(vocab_processor.vocabulary_)))  
    print('Train/Dev split: {:d}/{:d}'.format(len(y_train), len(y_dev)))  
    return x_train, y_train, vocab_processor, x_dev, y_dev  
def train(x_train, y_train, vocab_processor, x_dev, y_dev):  
    with tf.Graph().as_default():  
        session_conf = tf.ConfigProto(
            # allow TensorFlow to fall back to a supported device when an op has no kernel on the requested one
            allow_soft_placement=FLAGS.allow_soft_placement,
            # log which devices (CPU or GPU) operations are placed on
            log_device_placement=FLAGS.log_device_placement
        )
        sess = tf.Session(config=session_conf)  
        with sess.as_default():  
            # initialize the fastText model  
            fasttext = fastTextModel(sequence_length=x_train.shape[1],  
                      num_classes=y_train.shape[1],  
                      vocab_size=len(vocab_processor.vocabulary_),  
                      embedding_size=FLAGS.embedding_size,  
                      l2_reg_lambda=FLAGS.l2_reg_lambda,  
                      is_training=True,  
                      learning_rate=FLAGS.learning_rate,  
                      decay_steps=FLAGS.decay_steps,  
                      decay_rate=FLAGS.decay_rate  
                    )  
            # output dir for models and summaries  
            timestamp = str(time.time())  
            out_dir = os.path.abspath(os.path.join(os.path.curdir, 'run', timestamp))  
            if not os.path.exists(out_dir):  
                os.makedirs(out_dir)  
            print('Writing to {} \n'.format(out_dir))  
            # checkpoint dir: checkpointing saves the model parameters so they can be restored later  
            checkpoint_dir = os.path.abspath(os.path.join(out_dir, FLAGS.ckpt_dir))  
            checkpoint_prefix = os.path.join(checkpoint_dir, 'model')  
            if not os.path.exists(checkpoint_dir):  
                os.makedirs(checkpoint_dir)  
            saver = tf.train.Saver(tf.global_variables(), max_to_keep=FLAGS.num_checkpoints)  
            # Write vocabulary  
            vocab_processor.save(os.path.join(out_dir, 'vocab'))  
            # Initialize all  
            sess.run(tf.global_variables_initializer())  
            def train_step(x_batch, y_batch):  
                """  
                A single training step  
                :param x_batch:  
                :param y_batch:  
                :return:  
                """  
                feed_dict = {  
                    fasttext.input_x: x_batch,  
                    fasttext.input_y: y_batch,  
                }  
                _, step, loss, accuracy = sess.run(  
                    [fasttext.train_op, fasttext.global_step, fasttext.loss_val, fasttext.accuracy],  
                    feed_dict=feed_dict  
                )  
                time_str = datetime.datetime.now().isoformat()  
                print("{}: step {}, loss {:g}, acc {:g}".format(time_str, step, loss, accuracy))  
            def dev_step(x_batch, y_batch):  
                """  
                Evaluate model on a dev set  
                Disable dropout  
                :param x_batch:  
                :param y_batch:  
                :param writer:  
                :return:  
                """  
                feed_dict = {  
                    fasttext.input_x: x_batch,  
                    fasttext.input_y: y_batch,  
                }  
                step, loss, accuracy = sess.run(  
                    [fasttext.global_step, fasttext.loss_val, fasttext.accuracy],  
                    feed_dict=feed_dict  
                )  
                time_str = datetime.datetime.now().isoformat()  
                print("dev results:{}: step {}, loss {:g}, acc {:g}".format(time_str, step, loss, accuracy))  
            # generate batches  
            batches = data_process.batch_iter(list(zip(x_train, y_train)), FLAGS.batch_size, FLAGS.num_epochs)  
            # training loop  
            for batch in batches:  
                x_batch, y_batch = zip(*batch)  
                train_step(x_batch, y_batch)  
                current_step = tf.train.global_step(sess, fasttext.global_step)  
                if current_step % FLAGS.validate_every == 0:  
                    print('\n Evaluation:')  
                    dev_step(x_dev, y_dev)  
                    print('')  
            path = saver.save(sess, checkpoint_prefix, global_step=current_step)  
            print('Save model checkpoint to {} \n'.format(path))  
def main(argv=None):  
    x_train, y_train, vocab_processor, x_dev, y_dev = preprocess()  
    train(x_train, y_train, vocab_processor, x_dev, y_dev)  
if __name__ == '__main__':  
    tf.app.run()
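
The script above reads its hyperparameters from tf.app.flags. The original post does not show the flag definitions, so here is a hedged sketch of what they might look like; the default values and data paths are illustrative assumptions, not taken from the author's repository.

# Sketch of the tf.app.flags definitions this script expects (defaults are illustrative).
tf.app.flags.DEFINE_string('positive_data_file', './data/rt-polarity.pos', 'Path to positive examples')
tf.app.flags.DEFINE_string('negative_data_file', './data/rt-polarity.neg', 'Path to negative examples')
tf.app.flags.DEFINE_float('dev_sample_percentage', 0.1, 'Fraction of data used for validation')
tf.app.flags.DEFINE_integer('embedding_size', 128, 'Dimensionality of the word embeddings')
tf.app.flags.DEFINE_float('learning_rate', 1e-3, 'Initial learning rate')
tf.app.flags.DEFINE_integer('decay_steps', 1000, 'Steps between learning-rate decays')
tf.app.flags.DEFINE_float('decay_rate', 0.9, 'Exponential decay rate')
tf.app.flags.DEFINE_float('l2_reg_lambda', 0.0, 'L2 regularization strength')
tf.app.flags.DEFINE_integer('batch_size', 64, 'Batch size')
tf.app.flags.DEFINE_integer('num_epochs', 10, 'Number of training epochs')
tf.app.flags.DEFINE_integer('validate_every', 100, 'Evaluate on the dev set every N steps')
tf.app.flags.DEFINE_string('ckpt_dir', 'checkpoints', 'Checkpoint directory name')
tf.app.flags.DEFINE_integer('num_checkpoints', 5, 'Max checkpoints to keep')
tf.app.flags.DEFINE_boolean('allow_soft_placement', True, 'Allow soft device placement')
tf.app.flags.DEFINE_boolean('log_device_placement', False, 'Log device placement')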

References

[1] Bag of Tricks for Efficient Text Classification: https://arxiv.org/abs/1607.01759

The End- http://bjbsair.com/2020-03-25/tech-info/6300/ **Write in front
**

Today's tutorial is bag of tricks for efficient text classification based on FAIR [1]. This is what we often call fastText.

The most gratifying part of this paper is the fasttext toolkit. The code quality of this toolkit is very high. The result of this paper can be restored with one click. At present, it has been packaged in a very professional way. This is fasttext official website and its github code base, as well as providing python interface, which can be installed directly through pip. Such a model with high accuracy and fast speed is definitely a real weapon.

In order to better understand the principle of fasttext, we will repeat it directly now. However, the code only realizes the simplest word vector averaging based on words, and does not use the word vector of b-gram, so the text classification effect of our own implementation will be lower than that of facebook's open source library.

Overview of papers

We can train fastText on more than one billion words in less than ten minutes using a standard multicore CPU, and classify half a million sentences among 312K classes in less than a minute.

First of all, I quote a passage in the paper to see how the authors evaluate the performance of the fasttext model.

The model of this paper is very simple. Students who have known word2vec before can find that it is very similar to CBOW's model framework.

Corresponding to the above model, for example, input is a sentence, to the word of the sentence or n-gram. Each one corresponds to a vector, and then the average of these vectors is used to get the text vector, and then the average vector is used to get the prediction label. When there are not many categories, it is the simplest softmax; when the number of tags is huge, it is necessary to use "hierarchical softmax".

The model is really simple, and there is nothing to say. Here are two tricks in the paper:

  • 「hierarchical softmax」
    When the number of categories is large, a Huffman coding tree is constructed to speed up the calculation of softmax layer, which is the same as the previous trip in word2vec
  • 「N-gram features」
    If only unigram is used, the word order information will be lost, so the storage of N-gram will be reduced by adding N-gram features to supplement hashing

Looking at the experimental part of the paper, such a simple model can achieve such good results!

However, it is also pointed out that the data sets selected in this paper are not very sensitive to sentence word order, so it is not surprising to get the test results in this paper.

code implementation

After reading the castration code, we remember to look at the source code Oh ~ as in the previous series, define a fastTextModel class, and then write the network framework, input and output placeholder, loss, training steps, etc.

class fastTextModel(BaseModel):  
    """  
    A simple implementation of fasttext for text classification  
    """  
    def __init__(self, sequence_length, num_classes, vocab_size,  
                 embedding_size, learning_rate, decay_steps, decay_rate,  
                 l2_reg_lambda, is_training=True,  
                 initializer=tf.random_normal_initializer(stddev=0.1)):  
        self.vocab_size = vocab_size  
        self.embedding_size = embedding_size  
        self.num_classes = num_classes  
        self.sequence_length = sequence_length  
        self.learning_rate = learning_rate  
        self.decay_steps = decay_steps  
        self.decay_rate = decay_rate  
        self.is_training = is_training  
        self.l2_reg_lambda = l2_reg_lambda  
        self.initializer = initializer  
        self.input_x = tf.placeholder(tf.int32, [None, self.sequence_length], name='input_x')  
        self.input_y = tf.placeholder(tf.int32, [None, self.num_classes], name='input_y')  
        self.global_step = tf.Variable(0, trainable=False, name='global_step')  
        self.instantiate_weight()  
        self.logits = self.inference()  
        self.loss_val = self.loss()  
        self.train_op = self.train()  
        self.predictions = tf.argmax(self.logits, axis=1, name='predictions')  
        correct_prediction = tf.equal(self.predictions, tf.argmax(self.input_y, 1))  
        self.accuracy = tf.reduce_mean(tf.cast(correct_prediction, 'float'), name='accuracy')  
    def instantiate_weight(self):  
        with tf.name_scope('weights'):  
            self.Embedding = tf.get_variable('Embedding', shape=[self.vocab_size, self.embedding_size],  
                                             initializer=self.initializer)  
            self.W_projection = tf.get_variable('W_projection', shape=[self.embedding_size, self.num_classes],  
                                                initializer=self.initializer)  
            self.b_projection = tf.get_variable('b_projection', shape=[self.num_classes])  
    def inference(self):  
        """  
        1. word embedding  
        2. average embedding  
        3. linear classifier  
        :return:  
        """  
        # embedding layer  
        with tf.name_scope('embedding'):  
            words_embedding = tf.nn.embedding_lookup(self.Embedding, self.input_x)  
            self.average_embedding = tf.reduce_mean(words_embedding, axis=1)  
        logits = tf.matmul(self.average_embedding, self.W_projection) +self.b_projection  
        return logits  
    def loss(self):  
        # loss  
        with tf.name_scope('loss'):  
            losses = tf.nn.softmax_cross_entropy_with_logits(labels=self.input_y, logits=self.logits)  
            data_loss = tf.reduce_mean(losses)  
            l2_loss = tf.add_n([tf.nn.l2_loss(cand_var) for cand_var in tf.trainable_variables()  
                                if 'bias' not in cand_var.name]) * self.l2_reg_lambda  
            data_loss += l2_loss * self.l2_reg_lambda  
            return data_loss  
    def train(self):  
        with tf.name_scope('train'):  
            learning_rate = tf.train.exponential_decay(self.learning_rate, self.global_step,  
                                                       self.decay_steps, self.decay_rate,  
                                                       staircase=True)  
            train_op = tf.contrib.layers.optimize_loss(self.loss_val, global_step=self.global_step,  
                                                      learning_rate=learning_rate, optimizer='Adam')  
        return train_op  

def prepocess():  
    """  
    For load and process data  
    :return:  
    """  
    print("Loading data...")  
    x_text, y = data_process.load_data_and_labels(FLAGS.positive_data_file, FLAGS.negative_data_file)  
    # bulid vocabulary  
    max_document_length = max(len(x.split(' ')) for x in x_text)  
    vocab_processor = learn.preprocessing.VocabularyProcessor(max_document_length)  
    x = np.array(list(vocab_processor.fit_transform(x_text)))  
    # shuffle  
    np.random.seed(10)  
    shuffle_indices = np.random.permutation(np.arange(len(y)))  
    x_shuffled = x[shuffle_indices]  
    y_shuffled = y[shuffle_indices]  
    # split train/test dataset  
    dev_sample_index = -1 * int(FLAGS.dev_sample_percentage * float(len(y)))  
    x_train, x_dev = x_shuffled[:dev_sample_index], x_shuffled[dev_sample_index:]  
    y_train, y_dev = y_shuffled[:dev_sample_index], y_shuffled[dev_sample_index:]  
    del x, y, x_shuffled, y_shuffled  
    print('Vocabulary Size: {:d}'.format(len(vocab_processor.vocabulary_)))  
    print('Train/Dev split: {:d}/{:d}'.format(len(y_train), len(y_dev)))  
    return x_train, y_train, vocab_processor, x_dev, y_dev  
def train(x_train, y_train, vocab_processor, x_dev, y_dev):  
    with tf.Graph().as_default():  
        session_conf = tf.ConfigProto(  
            # allows TensorFlow to fall back on a device with a certain operation implemented  
            allow_soft_placement= FLAGS.allow_soft_placement,  
            # allows TensorFlow log on which devices (CPU or GPU) it places operations  
            log_device_placement=FLAGS.log_device_placement  
        )  
        sess = tf.Session(config=session_conf)  
        with sess.as_default():  
            # initialize cnn  
            fasttext = fastTextModel(sequence_length=x_train.shape[1],  
                      num_classes=y_train.shape[1],  
                      vocab_size=len(vocab_processor.vocabulary_),  
                      embedding_size=FLAGS.embedding_size,  
                      l2_reg_lambda=FLAGS.l2_reg_lambda,  
                      is_training=True,  
                      learning_rate=FLAGS.learning_rate,  
                      decay_steps=FLAGS.decay_steps,  
                      decay_rate=FLAGS.decay_rate  
                    )  
            # output dir for models and summaries  
            timestamp = str(time.time())  
            out_dir = os.path.abspath(os.path.join(os.path.curdir, 'run', timestamp))  
            if not os.path.exists(out_dir):  
                os.makedirs(out_dir)  
            print('Writing to {} \n'.format(out_dir))  
            # checkpoint dir. checkpointing – saving the parameters of your model to restore them later on.  
            checkpoint_dir = os.path.abspath(os.path.join(out_dir, FLAGS.ckpt_dir))  
            checkpoint_prefix = os.path.join(checkpoint_dir, 'model')  
            if not os.path.exists(checkpoint_dir):  
                os.makedirs(checkpoint_dir)  
            saver = tf.train.Saver(tf.global_variables(), max_to_keep=FLAGS.num_checkpoints)  
            # Write vocabulary  
            vocab_processor.save(os.path.join(out_dir, 'vocab'))  
            # Initialize all  
            sess.run(tf.global_variables_initializer())  
            def train_step(x_batch, y_batch):  
                """  
                A single training step  
                :param x_batch:  
                :param y_batch:  
                :return:  
                """  
                feed_dict = {  
                    fasttext.input_x: x_batch,  
                    fasttext.input_y: y_batch,  
                }  
                _, step, loss, accuracy = sess.run(  
                    [fasttext.train_op, fasttext.global_step, fasttext.loss_val, fasttext.accuracy],  
                    feed_dict=feed_dict  
                )  
                time_str = datetime.datetime.now().isoformat()  
                print("{}: step {}, loss {:g}, acc {:g}".format(time_str, step, loss, accuracy))  
            def dev_step(x_batch, y_batch):  
                """  
                Evaluate model on a dev set  
                Disable dropout  
                :param x_batch:  
                :param y_batch:  
                :param writer:  
                :return:  
                """  
                feed_dict = {  
                    fasttext.input_x: x_batch,  
                    fasttext.input_y: y_batch,  
                }  
                step, loss, accuracy = sess.run(  
                    [fasttext.global_step, fasttext.loss_val, fasttext.accuracy],  
                    feed_dict=feed_dict  
                )  
                time_str = datetime.datetime.now().isoformat()  
                print("dev results:{}: step {}, loss {:g}, acc {:g}".format(time_str, step, loss, accuracy))  
            # generate batches  
            batches = data_process.batch_iter(list(zip(x_train, y_train)), FLAGS.batch_size, FLAGS.num_epochs)  
            # training loop  
            for batch in batches:  
                x_batch, y_batch = zip(*batch)  
                train_step(x_batch, y_batch)  
                current_step = tf.train.global_step(sess, fasttext.global_step)  
                if current_step % FLAGS.validate_every == 0:  
                    print('\n Evaluation:')  
                    dev_step(x_dev, y_dev)  
                    print('')  
            path = saver.save(sess, checkpoint_prefix, global_step=current_step)  
            print('Save model checkpoint to {} \n'.format(path))  
def main(argv=None):  
    x_train, y_train, vocab_processor, x_dev, y_dev = prepocess()  
    train(x_train, y_train, vocab_processor, x_dev, y_dev)  
if __name__ == '__main__':  
    tf.app.run()

References

[1] Bag of Tricks for Efficient Text Classification: https://arxiv.org/abs/1607.01759

The End- http://bjbsair.com/2020-03-25/tech-info/6300/ **Write in front
**

Today's tutorial is bag of tricks for efficient text classification based on FAIR [1]. This is what we often call fastText.

The most gratifying part of this paper is the fasttext toolkit. The code quality of this toolkit is very high. The result of this paper can be restored with one click. At present, it is a very professional packaging place. This is fasttext official website and its github code base, as well as providing python interface, which can be installed directly through pip. Such a model with high accuracy and fast speed is definitely a real weapon.

In order to better understand the principle of fasttext, we will repeat it directly now. However, the code only realizes the simplest word vector averaging based on words, and does not use the word vector of b-gram, so the text classification effect of our own implementation will be lower than that of facebook's open source library.

Overview of papers

We can train fastText on more than one billion words in less than ten minutes using a standard multicore CPU, and classify half a million sentences among 312K classes in less than a minute.

First of all, I quote a passage in the paper to see how the authors evaluate the performance of the fasttext model.

The model of this paper is very simple. Students who have known word2vec before can find that it is very similar to CBOW's model framework.

Corresponding to the above model, for example, input is a sentence, to the word of the sentence or n-gram. Each one corresponds to a vector, and then the average of these vectors is used to get the text vector, and then the average vector is used to get the prediction label. When there are not many categories, it is the simplest softmax; when the number of tags is huge, it is necessary to use "hierarchical softmax".

The model is really simple, and there is nothing to say. Here are two tricks in the paper:

  • 「hierarchical softmax」
    When the number of categories is large, a Huffman coding tree is constructed to speed up the calculation of softmax layer, which is the same as the previous trip in word2vec
  • 「N-gram features」
    If only unigram is used, the word order information will be lost, so the storage of N-gram will be reduced by adding N-gram features to supplement hashing

Looking at the experimental part of the paper, such a simple model can achieve such good results!

However, it is also pointed out that the data sets selected in this paper are not very sensitive to sentence word order, so it is not surprising to get the test results in this paper.

code implementation

After reading the castration code, we remember to look at the source code Oh ~ as in the previous series, define a fastTextModel class, and then write the network framework, input and output placeholder, loss, training steps, etc.

class fastTextModel(BaseModel):  
    """  
    A simple implementation of fasttext for text classification  
    """  
    def __init__(self, sequence_length, num_classes, vocab_size,  
                 embedding_size, learning_rate, decay_steps, decay_rate,  
                 l2_reg_lambda, is_training=True,  
                 initializer=tf.random_normal_initializer(stddev=0.1)):  
        self.vocab_size = vocab_size  
        self.embedding_size = embedding_size  
        self.num_classes = num_classes  
        self.sequence_length = sequence_length  
        self.learning_rate = learning_rate  
        self.decay_steps = decay_steps  
        self.decay_rate = decay_rate  
        self.is_training = is_training  
        self.l2_reg_lambda = l2_reg_lambda  
        self.initializer = initializer  
        self.input_x = tf.placeholder(tf.int32, [None, self.sequence_length], name='input_x')  
        self.input_y = tf.placeholder(tf.int32, [None, self.num_classes], name='input_y')  
        self.global_step = tf.Variable(0, trainable=False, name='global_step')  
        self.instantiate_weight()  
        self.logits = self.inference()  
        self.loss_val = self.loss()  
        self.train_op = self.train()  
        self.predictions = tf.argmax(self.logits, axis=1, name='predictions')  
        correct_prediction = tf.equal(self.predictions, tf.argmax(self.input_y, 1))  
        self.accuracy = tf.reduce_mean(tf.cast(correct_prediction, 'float'), name='accuracy')  
    def instantiate_weight(self):  
        with tf.name_scope('weights'):  
            self.Embedding = tf.get_variable('Embedding', shape=[self.vocab_size, self.embedding_size],  
                                             initializer=self.initializer)  
            self.W_projection = tf.get_variable('W_projection', shape=[self.embedding_size, self.num_classes],  
                                                initializer=self.initializer)  
            self.b_projection = tf.get_variable('b_projection', shape=[self.num_classes])  
    def inference(self):  
        """  
        1. word embedding  
        2. average embedding  
        3. linear classifier  
        :return:  
        """  
        # embedding layer  
        with tf.name_scope('embedding'):  
            words_embedding = tf.nn.embedding_lookup(self.Embedding, self.input_x)  
            self.average_embedding = tf.reduce_mean(words_embedding, axis=1)  
        logits = tf.matmul(self.average_embedding, self.W_projection) +self.b_projection  
        return logits  
    def loss(self):  
        # loss  
        with tf.name_scope('loss'):  
            losses = tf.nn.softmax_cross_entropy_with_logits(labels=self.input_y, logits=self.logits)  
            data_loss = tf.reduce_mean(losses)  
            l2_loss = tf.add_n([tf.nn.l2_loss(cand_var) for cand_var in tf.trainable_variables()  
                                if 'bias' not in cand_var.name]) * self.l2_reg_lambda  
            data_loss += l2_loss * self.l2_reg_lambda  
            return data_loss  
    def train(self):  
        with tf.name_scope('train'):  
            learning_rate = tf.train.exponential_decay(self.learning_rate, self.global_step,  
                                                       self.decay_steps, self.decay_rate,  
                                                       staircase=True)  
            train_op = tf.contrib.layers.optimize_loss(self.loss_val, global_step=self.global_step,  
                                                      learning_rate=learning_rate, optimizer='Adam')  
        return train_op  

def prepocess():  
    """  
    For load and process data  
    :return:  
    """  
    print("Loading data...")  
    x_text, y = data_process.load_data_and_labels(FLAGS.positive_data_file, FLAGS.negative_data_file)  
    # bulid vocabulary  
    max_document_length = max(len(x.split(' ')) for x in x_text)  
    vocab_processor = learn.preprocessing.VocabularyProcessor(max_document_length)  
    x = np.array(list(vocab_processor.fit_transform(x_text)))  
    # shuffle  
    np.random.seed(10)  
    shuffle_indices = np.random.permutation(np.arange(len(y)))  
    x_shuffled = x[shuffle_indices]  
    y_shuffled = y[shuffle_indices]  
    # split train/test dataset  
    dev_sample_index = -1 * int(FLAGS.dev_sample_percentage * float(len(y)))  
    x_train, x_dev = x_shuffled[:dev_sample_index], x_shuffled[dev_sample_index:]  
    y_train, y_dev = y_shuffled[:dev_sample_index], y_shuffled[dev_sample_index:]  
    del x, y, x_shuffled, y_shuffled  
    print('Vocabulary Size: {:d}'.format(len(vocab_processor.vocabulary_)))  
    print('Train/Dev split: {:d}/{:d}'.format(len(y_train), len(y_dev)))  
    return x_train, y_train, vocab_processor, x_dev, y_dev  
def train(x_train, y_train, vocab_processor, x_dev, y_dev):  
    with tf.Graph().as_default():  
        session_conf = tf.ConfigProto(  
            # allows TensorFlow to fall back on a device with a certain operation implemented  
            allow_soft_placement= FLAGS.allow_soft_placement,  
            # allows TensorFlow log on which devices (CPU or GPU) it places operations  
            log_device_placement=FLAGS.log_device_placement  
        )  
        sess = tf.Session(config=session_conf)  
        with sess.as_default():  
            # initialize cnn  
            fasttext = fastTextModel(sequence_length=x_train.shape[1],  
                      num_classes=y_train.shape[1],  
                      vocab_size=len(vocab_processor.vocabulary_),  
                      embedding_size=FLAGS.embedding_size,  
                      l2_reg_lambda=FLAGS.l2_reg_lambda,  
                      is_training=True,  
                      learning_rate=FLAGS.learning_rate,  
                      decay_steps=FLAGS.decay_steps,  
                      decay_rate=FLAGS.decay_rate  
                    )  
            # output dir for models and summaries  
            timestamp = str(time.time())  
            out_dir = os.path.abspath(os.path.join(os.path.curdir, 'run', timestamp))  
            if not os.path.exists(out_dir):  
                os.makedirs(out_dir)  
            print('Writing to {} \n'.format(out_dir))  
            # checkpoint dir. checkpointing – saving the parameters of your model to restore them later on.  
            checkpoint_dir = os.path.abspath(os.path.join(out_dir, FLAGS.ckpt_dir))  
            checkpoint_prefix = os.path.join(checkpoint_dir, 'model')  
            if not os.path.exists(checkpoint_dir):  
                os.makedirs(checkpoint_dir)  
            saver = tf.train.Saver(tf.global_variables(), max_to_keep=FLAGS.num_checkpoints)  
            # Write vocabulary  
            vocab_processor.save(os.path.join(out_dir, 'vocab'))  
            # Initialize all  
            sess.run(tf.global_variables_initializer())  
            def train_step(x_batch, y_batch):  
                """  
                A single training step  
                :param x_batch:  
                :param y_batch:  
                :return:  
                """  
                feed_dict = {  
                    fasttext.input_x: x_batch,  
                    fasttext.input_y: y_batch,  
                }  
                _, step, loss, accuracy = sess.run(  
                    [fasttext.train_op, fasttext.global_step, fasttext.loss_val, fasttext.accuracy],  
                    feed_dict=feed_dict  
                )  
                time_str = datetime.datetime.now().isoformat()  
                print("{}: step {}, loss {:g}, acc {:g}".format(time_str, step, loss, accuracy))  
            def dev_step(x_batch, y_batch):  
                """  
                Evaluate model on a dev set  
                Disable dropout  
                :param x_batch:  
                :param y_batch:  
                :param writer:  
                :return:  
                """  
                feed_dict = {  
                    fasttext.input_x: x_batch,  
                    fasttext.input_y: y_batch,  
                }  
                step, loss, accuracy = sess.run(  
                    [fasttext.global_step, fasttext.loss_val, fasttext.accuracy],  
                    feed_dict=feed_dict  
                )  
                time_str = datetime.datetime.now().isoformat()  
                print("dev results:{}: step {}, loss {:g}, acc {:g}".format(time_str, step, loss, accuracy))  
            # generate batches  
            batches = data_process.batch_iter(list(zip(x_train, y_train)), FLAGS.batch_size, FLAGS.num_epochs)  
            # training loop  
            for batch in batches:  
                x_batch, y_batch = zip(*batch)  
                train_step(x_batch, y_batch)  
                current_step = tf.train.global_step(sess, fasttext.global_step)  
                if current_step % FLAGS.validate_every == 0:  
                    print('\n Evaluation:')  
                    dev_step(x_dev, y_dev)  
                    print('')  
            path = saver.save(sess, checkpoint_prefix, global_step=current_step)  
            print('Save model checkpoint to {} \n'.format(path))  
def main(argv=None):  
    x_train, y_train, vocab_processor, x_dev, y_dev = prepocess()  
    train(x_train, y_train, vocab_processor, x_dev, y_dev)  
if __name__ == '__main__':  
    tf.app.run()

References

[1] Bag of Tricks for Efficient Text Classification: https://arxiv.org/abs/1607.01759

The End- http://bjbsair.com/2020-03-25/tech-info/6300/ **Write in front
**

Today's tutorial is bag of tricks for efficient text classification based on FAIR [1]. This is what we often call fastText.

The most gratifying part of this paper is the fasttext toolkit. The code quality of this toolkit is very high. The result of this paper can be restored with one click. At present, it has been packaged in a very professional way. This is fasttext official website and its github code base, as well as providing python interface, which can be installed directly through pip. Such a model with high accuracy and fast speed is definitely a real weapon.

In order to better understand the principle of fasttext, we will repeat it directly now. However, the code only realizes the simplest word vector averaging based on words, and does not use the word vector of b-gram, so the text classification effect of our own implementation will be lower than that of facebook's open source library.

Overview of papers

We can train fastText on more than one billion words in less than ten minutes using a standard multicore CPU, and classify half a million sentences among 312K classes in less than a minute.

First of all, I quote a passage in the paper to see how the authors evaluate the performance of the fasttext model.

The model of this paper is very simple. Students who have known word2vec before can find that it is very similar to CBOW's model framework.

Corresponding to the above model, for example, input is a sentence, to the word of the sentence or n-gram. Each one corresponds to a vector, and then the average of these vectors is used to get the text vector, and then the average vector is used to get the prediction label. When there are not many categories, it is the simplest softmax; when the number of tags is huge, it is necessary to use "hierarchical softmax".

The model is really simple, and there is nothing to say. Here are two tricks in the paper:

  • 「hierarchical softmax」
    When the number of categories is large, a Huffman coding tree is constructed to speed up the calculation of softmax layer, which is the same as the previous trip in word2vec
  • 「N-gram features」
    If only unigram is used, the word order information will be lost, so the storage of N-gram will be reduced by adding N-gram features to supplement hashing

Looking at the experimental part of the paper, such a simple model can achieve such good results!

However, it is also pointed out that the data sets selected in this paper are not very sensitive to sentence word order, so it is not surprising to get the test results in this paper.

code implementation

After reading the castration code, we remember to look at the source code Oh ~ as in the previous series, define a fastTextModel class, and then write the network framework, input and output placeholder, loss, training steps, etc.

class fastTextModel(BaseModel):  
    """  
    A simple implementation of fasttext for text classification  
    """  
    def __init__(self, sequence_length, num_classes, vocab_size,  
                 embedding_size, learning_rate, decay_steps, decay_rate,  
                 l2_reg_lambda, is_training=True,  
                 initializer=tf.random_normal_initializer(stddev=0.1)):  
        self.vocab_size = vocab_size  
        self.embedding_size = embedding_size  
        self.num_classes = num_classes  
        self.sequence_length = sequence_length  
        self.learning_rate = learning_rate  
        self.decay_steps = decay_steps  
        self.decay_rate = decay_rate  
        self.is_training = is_training  
        self.l2_reg_lambda = l2_reg_lambda  
        self.initializer = initializer  
        self.input_x = tf.placeholder(tf.int32, [None, self.sequence_length], name='input_x')  
        self.input_y = tf.placeholder(tf.int32, [None, self.num_classes], name='input_y')  
        self.global_step = tf.Variable(0, trainable=False, name='global_step')  
        self.instantiate_weight()  
        self.logits = self.inference()  
        self.loss_val = self.loss()  
        self.train_op = self.train()  
        self.predictions = tf.argmax(self.logits, axis=1, name='predictions')  
        correct_prediction = tf.equal(self.predictions, tf.argmax(self.input_y, 1))  
        self.accuracy = tf.reduce_mean(tf.cast(correct_prediction, 'float'), name='accuracy')  
    def instantiate_weight(self):  
        with tf.name_scope('weights'):  
            self.Embedding = tf.get_variable('Embedding', shape=[self.vocab_size, self.embedding_size],  
                                             initializer=self.initializer)  
            self.W_projection = tf.get_variable('W_projection', shape=[self.embedding_size, self.num_classes],  
                                                initializer=self.initializer)  
            self.b_projection = tf.get_variable('b_projection', shape=[self.num_classes])  
    def inference(self):  
        """  
        1. word embedding  
        2. average embedding  
        3. linear classifier  
        :return:  
        """  
        # embedding layer  
        with tf.name_scope('embedding'):  
            words_embedding = tf.nn.embedding_lookup(self.Embedding, self.input_x)  
            self.average_embedding = tf.reduce_mean(words_embedding, axis=1)  
        logits = tf.matmul(self.average_embedding, self.W_projection) +self.b_projection  
        return logits  
    def loss(self):  
        # loss  
        with tf.name_scope('loss'):  
            losses = tf.nn.softmax_cross_entropy_with_logits(labels=self.input_y, logits=self.logits)  
            data_loss = tf.reduce_mean(losses)  
            l2_loss = tf.add_n([tf.nn.l2_loss(cand_var) for cand_var in tf.trainable_variables()  
                                if 'bias' not in cand_var.name]) * self.l2_reg_lambda  
            data_loss += l2_loss * self.l2_reg_lambda  
            return data_loss  
    def train(self):  
        with tf.name_scope('train'):  
            learning_rate = tf.train.exponential_decay(self.learning_rate, self.global_step,  
                                                       self.decay_steps, self.decay_rate,  
                                                       staircase=True)  
            train_op = tf.contrib.layers.optimize_loss(self.loss_val, global_step=self.global_step,  
                                                      learning_rate=learning_rate, optimizer='Adam')  
        return train_op  

def prepocess():  
    """  
    For load and process data  
    :return:  
    """  
    print("Loading data...")  
    x_text, y = data_process.load_data_and_labels(FLAGS.positive_data_file, FLAGS.negative_data_file)  
    # bulid vocabulary  
    max_document_length = max(len(x.split(' ')) for x in x_text)  
    vocab_processor = learn.preprocessing.VocabularyProcessor(max_document_length)  
    x = np.array(list(vocab_processor.fit_transform(x_text)))  
    # shuffle  
    np.random.seed(10)  
    shuffle_indices = np.random.permutation(np.arange(len(y)))  
    x_shuffled = x[shuffle_indices]  
    y_shuffled = y[shuffle_indices]  
    # split train/test dataset  
    dev_sample_index = -1 * int(FLAGS.dev_sample_percentage * float(len(y)))  
    x_train, x_dev = x_shuffled[:dev_sample_index], x_shuffled[dev_sample_index:]  
    y_train, y_dev = y_shuffled[:dev_sample_index], y_shuffled[dev_sample_index:]  
    del x, y, x_shuffled, y_shuffled  
    print('Vocabulary Size: {:d}'.format(len(vocab_processor.vocabulary_)))  
    print('Train/Dev split: {:d}/{:d}'.format(len(y_train), len(y_dev)))  
    return x_train, y_train, vocab_processor, x_dev, y_dev  
def train(x_train, y_train, vocab_processor, x_dev, y_dev):  
    with tf.Graph().as_default():  
        session_conf = tf.ConfigProto(  
            # allows TensorFlow to fall back on a device with a certain operation implemented  
            allow_soft_placement= FLAGS.allow_soft_placement,  
            # allows TensorFlow log on which devices (CPU or GPU) it places operations  
            log_device_placement=FLAGS.log_device_placement  
        )  
        sess = tf.Session(config=session_conf)  
        with sess.as_default():  
            # initialize cnn  
            fasttext = fastTextModel(sequence_length=x_train.shape[1],  
                      num_classes=y_train.shape[1],  
                      vocab_size=len(vocab_processor.vocabulary_),  
                      embedding_size=FLAGS.embedding_size,  
                      l2_reg_lambda=FLAGS.l2_reg_lambda,  
                      is_training=True,  
                      learning_rate=FLAGS.learning_rate,  
                      decay_steps=FLAGS.decay_steps,  
                      decay_rate=FLAGS.decay_rate  
                    )  
            # output dir for models and summaries  
            timestamp = str(time.time())  
            out_dir = os.path.abspath(os.path.join(os.path.curdir, 'run', timestamp))  
            if not os.path.exists(out_dir):  
                os.makedirs(out_dir)  
            print('Writing to {} \n'.format(out_dir))  
            # checkpoint dir. checkpointing – saving the parameters of your model to restore them later on.  
            checkpoint_dir = os.path.abspath(os.path.join(out_dir, FLAGS.ckpt_dir))  
            checkpoint_prefix = os.path.join(checkpoint_dir, 'model')  
            if not os.path.exists(checkpoint_dir):  
                os.makedirs(checkpoint_dir)  
            saver = tf.train.Saver(tf.global_variables(), max_to_keep=FLAGS.num_checkpoints)  
            # Write vocabulary  
            vocab_processor.save(os.path.join(out_dir, 'vocab'))  
            # Initialize all  
            sess.run(tf.global_variables_initializer())  
            def train_step(x_batch, y_batch):  
                """  
                A single training step  
                :param x_batch:  
                :param y_batch:  
                :return:  
                """  
                feed_dict = {  
                    fasttext.input_x: x_batch,  
                    fasttext.input_y: y_batch,  
                }  
                _, step, loss, accuracy = sess.run(  
                    [fasttext.train_op, fasttext.global_step, fasttext.loss_val, fasttext.accuracy],  
                    feed_dict=feed_dict  
                )  
                time_str = datetime.datetime.now().isoformat()  
                print("{}: step {}, loss {:g}, acc {:g}".format(time_str, step, loss, accuracy))  
            def dev_step(x_batch, y_batch):  
                """  
                Evaluate model on a dev set  
                Disable dropout  
                :param x_batch:  
                :param y_batch:  
                :param writer:  
                :return:  
                """  
                feed_dict = {  
                    fasttext.input_x: x_batch,  
                    fasttext.input_y: y_batch,  
                }  
                step, loss, accuracy = sess.run(  
                    [fasttext.global_step, fasttext.loss_val, fasttext.accuracy],  
                    feed_dict=feed_dict  
                )  
                time_str = datetime.datetime.now().isoformat()  
                print("dev results:{}: step {}, loss {:g}, acc {:g}".format(time_str, step, loss, accuracy))  
            # generate batches  
            batches = data_process.batch_iter(list(zip(x_train, y_train)), FLAGS.batch_size, FLAGS.num_epochs)  
            # training loop  
            for batch in batches:  
                x_batch, y_batch = zip(*batch)  
                train_step(x_batch, y_batch)  
                current_step = tf.train.global_step(sess, fasttext.global_step)  
                if current_step % FLAGS.validate_every == 0:  
                    print('\n Evaluation:')  
                    dev_step(x_dev, y_dev)  
                    print('')  
            path = saver.save(sess, checkpoint_prefix, global_step=current_step)  
            print('Save model checkpoint to {} \n'.format(path))  
def main(argv=None):  
    x_train, y_train, vocab_processor, x_dev, y_dev = prepocess()  
    train(x_train, y_train, vocab_processor, x_dev, y_dev)  
if __name__ == '__main__':  
    tf.app.run()

References

[1] Bag of Tricks for Efficient Text Classification: https://arxiv.org/abs/1607.01759

The End- http://bjbsair.com/2020-03-25/tech-info/6300/ **Write in front
**

Today's tutorial is bag of tricks for efficient text classification based on FAIR [1]. This is what we often call fastText.

The most gratifying part of this paper is the fasttext toolkit. The code quality of this toolkit is very high. The result of this paper can be restored with one click. At present, it has been packaged in a very professional way. This is fasttext official website and its github code base, as well as providing python interface, which can be installed directly through pip. Such a model with high accuracy and fast speed is definitely a real weapon.

In order to better understand the principle of fasttext, we will repeat it directly now. However, the code only realizes the simplest word vector averaging based on words, and does not use the word vector of b-gram, so the text classification effect of our own implementation will be lower than that of facebook's open source library.

Overview of papers

We can train fastText on more than one billion words in less than ten minutes using a standard multicore CPU, and classify half a million sentences among 312K classes in less than a minute.

First of all, I quote a passage in the paper to see how the authors evaluate the performance of the fasttext model.

The model of this paper is very simple. Students who have known word2vec before can find that it is very similar to CBOW's model framework.

Corresponding to the above model, for example, input is a sentence, to the word of the sentence or n-gram. Each one corresponds to a vector, and then the average of these vectors is used to get the text vector, and then the average vector is used to get the prediction label. When there are not many categories, it is the simplest softmax; when the number of tags is huge, it is necessary to use "hierarchical softmax".

The model is really simple, and there is nothing to say. Here are two tricks in the paper:

  • 「hierarchical softmax」
    When the number of categories is large, a Huffman coding tree is constructed to speed up the calculation of softmax layer, which is the same as the previous trip in word2vec
  • 「N-gram features」
    If only unigram is used, the word order information will be lost, so the storage of N-gram will be reduced by adding N-gram features to supplement hashing

Looking at the experimental part of the paper, such a simple model can achieve such good results!

However, it is also pointed out that the data sets selected in this paper are not very sensitive to sentence word order, so it is not surprising to get the test results in this paper.

code implementation

After reading the castration code, we remember to look at the source code Oh ~ as in the previous series, define a fastTextModel class, and then write the network framework, input and output placeholder, loss, training steps, etc.

class fastTextModel(BaseModel):  
    """  
    A simple implementation of fasttext for text classification  
    """  
    def __init__(self, sequence_length, num_classes, vocab_size,  
                 embedding_size, learning_rate, decay_steps, decay_rate,  
                 l2_reg_lambda, is_training=True,  
                 initializer=tf.random_normal_initializer(stddev=0.1)):  
        self.vocab_size = vocab_size  
        self.embedding_size = embedding_size  
        self.num_classes = num_classes  
        self.sequence_length = sequence_length  
        self.learning_rate = learning_rate  
        self.decay_steps = decay_steps  
        self.decay_rate = decay_rate  
        self.is_training = is_training  
        self.l2_reg_lambda = l2_reg_lambda  
        self.initializer = initializer  
        self.input_x = tf.placeholder(tf.int32, [None, self.sequence_length], name='input_x')  
        self.input_y = tf.placeholder(tf.int32, [None, self.num_classes], name='input_y')  
        self.global_step = tf.Variable(0, trainable=False, name='global_step')  
        self.instantiate_weight()  
        self.logits = self.inference()  
        self.loss_val = self.loss()  
        self.train_op = self.train()  
        self.predictions = tf.argmax(self.logits, axis=1, name='predictions')  
        correct_prediction = tf.equal(self.predictions, tf.argmax(self.input_y, 1))  
        self.accuracy = tf.reduce_mean(tf.cast(correct_prediction, 'float'), name='accuracy')  
    def instantiate_weight(self):  
        with tf.name_scope('weights'):  
            self.Embedding = tf.get_variable('Embedding', shape=[self.vocab_size, self.embedding_size],  
                                             initializer=self.initializer)  
            self.W_projection = tf.get_variable('W_projection', shape=[self.embedding_size, self.num_classes],  
                                                initializer=self.initializer)  
            self.b_projection = tf.get_variable('b_projection', shape=[self.num_classes])  
    def inference(self):  
        """  
        1. word embedding  
        2. average embedding  
        3. linear classifier  
        :return:  
        """  
        # embedding layer  
        with tf.name_scope('embedding'):  
            words_embedding = tf.nn.embedding_lookup(self.Embedding, self.input_x)  
            self.average_embedding = tf.reduce_mean(words_embedding, axis=1)  
        logits = tf.matmul(self.average_embedding, self.W_projection) +self.b_projection  
        return logits  
    def loss(self):  
        # loss  
        with tf.name_scope('loss'):  
            losses = tf.nn.softmax_cross_entropy_with_logits(labels=self.input_y, logits=self.logits)  
            data_loss = tf.reduce_mean(losses)  
            l2_loss = tf.add_n([tf.nn.l2_loss(cand_var) for cand_var in tf.trainable_variables()  
                                if 'bias' not in cand_var.name]) * self.l2_reg_lambda  
            data_loss += l2_loss * self.l2_reg_lambda  
            return data_loss  
    def train(self):  
        with tf.name_scope('train'):  
            learning_rate = tf.train.exponential_decay(self.learning_rate, self.global_step,  
                                                       self.decay_steps, self.decay_rate,  
                                                       staircase=True)  
            train_op = tf.contrib.layers.optimize_loss(self.loss_val, global_step=self.global_step,  
                                                      learning_rate=learning_rate, optimizer='Adam')  
        return train_op  

def prepocess():  
    """  
    For load and process data  
    :return:  
    """  
    print("Loading data...")  
    x_text, y = data_process.load_data_and_labels(FLAGS.positive_data_file, FLAGS.negative_data_file)  
    # bulid vocabulary  
    max_document_length = max(len(x.split(' ')) for x in x_text)  
    vocab_processor = learn.preprocessing.VocabularyProcessor(max_document_length)  
    x = np.array(list(vocab_processor.fit_transform(x_text)))  
    # shuffle  
    np.random.seed(10)  
    shuffle_indices = np.random.permutation(np.arange(len(y)))  
    x_shuffled = x[shuffle_indices]  
    y_shuffled = y[shuffle_indices]  
    # split train/test dataset  
    dev_sample_index = -1 * int(FLAGS.dev_sample_percentage * float(len(y)))  
    x_train, x_dev = x_shuffled[:dev_sample_index], x_shuffled[dev_sample_index:]  
    y_train, y_dev = y_shuffled[:dev_sample_index], y_shuffled[dev_sample_index:]  
    del x, y, x_shuffled, y_shuffled  
    print('Vocabulary Size: {:d}'.format(len(vocab_processor.vocabulary_)))  
    print('Train/Dev split: {:d}/{:d}'.format(len(y_train), len(y_dev)))  
    return x_train, y_train, vocab_processor, x_dev, y_dev  
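
The preprocessing function above and the training loop below both lean on a small data_process helper module that isn't reproduced in this post. Purely as a hedged sketch of what such a module might provide (the author's actual helpers may differ), load_data_and_labels could read the two polarity files and build one-hot labels, and batch_iter could yield shuffled mini-batches for a given number of epochs:

import numpy as np

def load_data_and_labels(positive_data_file, negative_data_file):
    # one example per line; drop blank lines
    positive = [line.strip() for line in open(positive_data_file, encoding='utf-8') if line.strip()]
    negative = [line.strip() for line in open(negative_data_file, encoding='utf-8') if line.strip()]
    x_text = positive + negative
    # one-hot labels: [0, 1] for positive examples, [1, 0] for negative ones
    y = np.concatenate([[[0, 1]] * len(positive), [[1, 0]] * len(negative)], axis=0)
    return x_text, y

def batch_iter(data, batch_size, num_epochs, shuffle=True):
    # data is a list of (x, y) pairs; yield it in mini-batches, reshuffled every epoch
    data = list(data)
    num_batches = (len(data) - 1) // batch_size + 1
    for _ in range(num_epochs):
        order = np.random.permutation(len(data)) if shuffle else np.arange(len(data))
        shuffled = [data[i] for i in order]
        for b in range(num_batches):
            yield shuffled[b * batch_size:(b + 1) * batch_size]
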
def train(x_train, y_train, vocab_processor, x_dev, y_dev):  
    with tf.Graph().as_default():  
        session_conf = tf.ConfigProto(  
            # let TensorFlow fall back to another device when an op has no kernel on the requested one  
            allow_soft_placement=FLAGS.allow_soft_placement,  
            # log which device (CPU or GPU) each operation is placed on  
            log_device_placement=FLAGS.log_device_placement  
        )  
        sess = tf.Session(config=session_conf)  
        with sess.as_default():  
            # initialize the fastText model  
            fasttext = fastTextModel(sequence_length=x_train.shape[1],  
                      num_classes=y_train.shape[1],  
                      vocab_size=len(vocab_processor.vocabulary_),  
                      embedding_size=FLAGS.embedding_size,  
                      l2_reg_lambda=FLAGS.l2_reg_lambda,  
                      is_training=True,  
                      learning_rate=FLAGS.learning_rate,  
                      decay_steps=FLAGS.decay_steps,  
                      decay_rate=FLAGS.decay_rate  
                    )  
            # output dir for models and summaries  
            timestamp = str(time.time())  
            out_dir = os.path.abspath(os.path.join(os.path.curdir, 'run', timestamp))  
            if not os.path.exists(out_dir):  
                os.makedirs(out_dir)  
            print('Writing to {} \n'.format(out_dir))  
            # checkpoint dir: checkpoints save the model parameters so they can be restored later on  
            checkpoint_dir = os.path.abspath(os.path.join(out_dir, FLAGS.ckpt_dir))  
            checkpoint_prefix = os.path.join(checkpoint_dir, 'model')  
            if not os.path.exists(checkpoint_dir):  
                os.makedirs(checkpoint_dir)  
            saver = tf.train.Saver(tf.global_variables(), max_to_keep=FLAGS.num_checkpoints)  
            # Write vocabulary  
            vocab_processor.save(os.path.join(out_dir, 'vocab'))  
            # Initialize all  
            sess.run(tf.global_variables_initializer())  
            def train_step(x_batch, y_batch):  
                """  
                A single training step  
                :param x_batch:  
                :param y_batch:  
                :return:  
                """  
                feed_dict = {  
                    fasttext.input_x: x_batch,  
                    fasttext.input_y: y_batch,  
                }  
                _, step, loss, accuracy = sess.run(  
                    [fasttext.train_op, fasttext.global_step, fasttext.loss_val, fasttext.accuracy],  
                    feed_dict=feed_dict  
                )  
                time_str = datetime.datetime.now().isoformat()  
                print("{}: step {}, loss {:g}, acc {:g}".format(time_str, step, loss, accuracy))  
            def dev_step(x_batch, y_batch):  
                """  
                Evaluate the model on the dev set (no parameter updates)  
                :param x_batch:  
                :param y_batch:  
                :return:  
                """  
                feed_dict = {  
                    fasttext.input_x: x_batch,  
                    fasttext.input_y: y_batch,  
                }  
                step, loss, accuracy = sess.run(  
                    [fasttext.global_step, fasttext.loss_val, fasttext.accuracy],  
                    feed_dict=feed_dict  
                )  
                time_str = datetime.datetime.now().isoformat()  
                print("dev results:{}: step {}, loss {:g}, acc {:g}".format(time_str, step, loss, accuracy))  
            # generate batches  
            batches = data_process.batch_iter(list(zip(x_train, y_train)), FLAGS.batch_size, FLAGS.num_epochs)  
            # training loop  
            for batch in batches:  
                x_batch, y_batch = zip(*batch)  
                train_step(x_batch, y_batch)  
                current_step = tf.train.global_step(sess, fasttext.global_step)  
                if current_step % FLAGS.validate_every == 0:  
                    print('\n Evaluation:')  
                    dev_step(x_dev, y_dev)  
                    print('')  
            path = saver.save(sess, checkpoint_prefix, global_step=current_step)  
            print('Save model checkpoint to {} \n'.format(path))  
def main(argv=None):  
    x_train, y_train, vocab_processor, x_dev, y_dev = preprocess()  
    train(x_train, y_train, vocab_processor, x_dev, y_dev)  
if __name__ == '__main__':  
    tf.app.run()
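
A final practical note: the script reads its paths and hyperparameters from FLAGS, and it also relies on a handful of imports (os, time, datetime, numpy, tensorflow, and tf.contrib.learn for the VocabularyProcessor) that sit above the code shown here. As a hedged sketch, the top of the file would need something along these lines; the flag names are taken from their usages above, but every default value and path is just a placeholder:

import os
import time
import datetime
import numpy as np
import tensorflow as tf
from tensorflow.contrib import learn
import data_process  # the helper module sketched earlier

flags = tf.app.flags
flags.DEFINE_string('positive_data_file', './data/positive_examples.txt', 'path to the positive examples (placeholder)')
flags.DEFINE_string('negative_data_file', './data/negative_examples.txt', 'path to the negative examples (placeholder)')
flags.DEFINE_float('dev_sample_percentage', 0.1, 'fraction of the data held out as the dev set')
flags.DEFINE_integer('embedding_size', 128, 'dimension of the word embeddings')
flags.DEFINE_float('learning_rate', 1e-3, 'initial learning rate')
flags.DEFINE_integer('decay_steps', 1000, 'steps between learning-rate decays')
flags.DEFINE_float('decay_rate', 0.9, 'exponential learning-rate decay factor')
flags.DEFINE_float('l2_reg_lambda', 1e-4, 'L2 regularization coefficient')
flags.DEFINE_integer('batch_size', 64, 'mini-batch size')
flags.DEFINE_integer('num_epochs', 10, 'number of training epochs')
flags.DEFINE_integer('validate_every', 100, 'evaluate on the dev set every this many steps')
flags.DEFINE_string('ckpt_dir', 'checkpoints', 'sub-directory for model checkpoints')
flags.DEFINE_integer('num_checkpoints', 5, 'maximum number of checkpoints to keep')
flags.DEFINE_boolean('allow_soft_placement', True, 'allow soft device placement')
flags.DEFINE_boolean('log_device_placement', False, 'log device placement')
FLAGS = flags.FLAGS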

References

[1] Bag of Tricks for Efficient Text Classification: https://arxiv.org/abs/1607.01759

The End- http://bjbsair.com/2020-03-25/tech-info/6300/ **Write in front
**

Today's tutorial is bag of tricks for efficient text classification based on FAIR [1]. This is what we often call fastText.

The most gratifying part of this paper is the fasttext toolkit. The code quality of this toolkit is very high. The result of this paper can be restored with one click. At present, it has been packaged in a very professional way. This is fasttext official website and its github code base, as well as providing python interface, which can be installed directly through pip. Such a model with high accuracy and fast speed is definitely a real weapon.

In order to better understand the principle of fasttext, we will repeat it directly now. However, the code only realizes the simplest word vector averaging based on words, and does not use the word vector of b-gram, so the text classification effect of our own implementation will be lower than that of facebook's open source library.

Overview of papers

We can train fastText on more than one billion words in less than ten minutes using a standard multicore CPU, and classify half a million sentences among 312K classes in less than a minute.

First of all, I quote a passage in the paper to see how the authors evaluate the performance of the fasttext model.

The model of this paper is very simple. Students who have known word2vec before can find that it is very similar to CBOW's model framework.

Corresponding to the above model, for example, input is a sentence, to the word of the sentence or n-gram. Each one corresponds to a vector, and then the average of these vectors is used to get the text vector, and then the average vector is used to get the prediction label. When there are not many categories, it is the simplest softmax; when the number of tags is huge, it is necessary to use "hierarchical softmax".

The model is really simple, and there is nothing to say. Here are two tricks in the paper:

  • 「hierarchical softmax」
    When the number of categories is large, a Huffman coding tree is constructed to speed up the calculation of softmax layer, which is the same as the previous trip in word2vec
  • 「N-gram features」
    If only unigram is used, the word order information will be lost, so the storage of N-gram will be reduced by adding N-gram features to supplement hashing

Looking at the experimental part of the paper, such a simple model can achieve such good results!

However, it is also pointed out that the data sets selected in this paper are not very sensitive to sentence word order, so it is not surprising to get the test results in this paper.

code implementation

After reading the castration code, we remember to look at the source code Oh ~ as in the previous series, define a fastTextModel class, and then write the network framework, input and output placeholder, loss, training steps, etc.

class fastTextModel(BaseModel):  
    """  
    A simple implementation of fasttext for text classification  
    """  
    def __init__(self, sequence_length, num_classes, vocab_size,  
                 embedding_size, learning_rate, decay_steps, decay_rate,  
                 l2_reg_lambda, is_training=True,  
                 initializer=tf.random_normal_initializer(stddev=0.1)):  
        self.vocab_size = vocab_size  
        self.embedding_size = embedding_size  
        self.num_classes = num_classes  
        self.sequence_length = sequence_length  
        self.learning_rate = learning_rate  
        self.decay_steps = decay_steps  
        self.decay_rate = decay_rate  
        self.is_training = is_training  
        self.l2_reg_lambda = l2_reg_lambda  
        self.initializer = initializer  
        self.input_x = tf.placeholder(tf.int32, [None, self.sequence_length], name='input_x')  
        self.input_y = tf.placeholder(tf.int32, [None, self.num_classes], name='input_y')  
        self.global_step = tf.Variable(0, trainable=False, name='global_step')  
        self.instantiate_weight()  
        self.logits = self.inference()  
        self.loss_val = self.loss()  
        self.train_op = self.train()  
        self.predictions = tf.argmax(self.logits, axis=1, name='predictions')  
        correct_prediction = tf.equal(self.predictions, tf.argmax(self.input_y, 1))  
        self.accuracy = tf.reduce_mean(tf.cast(correct_prediction, 'float'), name='accuracy')  
    def instantiate_weight(self):  
        with tf.name_scope('weights'):  
            self.Embedding = tf.get_variable('Embedding', shape=[self.vocab_size, self.embedding_size],  
                                             initializer=self.initializer)  
            self.W_projection = tf.get_variable('W_projection', shape=[self.embedding_size, self.num_classes],  
                                                initializer=self.initializer)  
            self.b_projection = tf.get_variable('b_projection', shape=[self.num_classes])  
    def inference(self):  
        """  
        1. word embedding  
        2. average embedding  
        3. linear classifier  
        :return:  
        """  
        # embedding layer  
        with tf.name_scope('embedding'):  
            words_embedding = tf.nn.embedding_lookup(self.Embedding, self.input_x)  
            self.average_embedding = tf.reduce_mean(words_embedding, axis=1)  
        logits = tf.matmul(self.average_embedding, self.W_projection) +self.b_projection  
        return logits  
    def loss(self):  
        # loss  
        with tf.name_scope('loss'):  
            losses = tf.nn.softmax_cross_entropy_with_logits(labels=self.input_y, logits=self.logits)  
            data_loss = tf.reduce_mean(losses)  
            l2_loss = tf.add_n([tf.nn.l2_loss(cand_var) for cand_var in tf.trainable_variables()  
                                if 'bias' not in cand_var.name]) * self.l2_reg_lambda  
            data_loss += l2_loss * self.l2_reg_lambda  
            return data_loss  
    def train(self):  
        with tf.name_scope('train'):  
            learning_rate = tf.train.exponential_decay(self.learning_rate, self.global_step,  
                                                       self.decay_steps, self.decay_rate,  
                                                       staircase=True)  
            train_op = tf.contrib.layers.optimize_loss(self.loss_val, global_step=self.global_step,  
                                                      learning_rate=learning_rate, optimizer='Adam')  
        return train_op  

def prepocess():  
    """  
    For load and process data  
    :return:  
    """  
    print("Loading data...")  
    x_text, y = data_process.load_data_and_labels(FLAGS.positive_data_file, FLAGS.negative_data_file)  
    # bulid vocabulary  
    max_document_length = max(len(x.split(' ')) for x in x_text)  
    vocab_processor = learn.preprocessing.VocabularyProcessor(max_document_length)  
    x = np.array(list(vocab_processor.fit_transform(x_text)))  
    # shuffle  
    np.random.seed(10)  
    shuffle_indices = np.random.permutation(np.arange(len(y)))  
    x_shuffled = x[shuffle_indices]  
    y_shuffled = y[shuffle_indices]  
    # split train/test dataset  
    dev_sample_index = -1 * int(FLAGS.dev_sample_percentage * float(len(y)))  
    x_train, x_dev = x_shuffled[:dev_sample_index], x_shuffled[dev_sample_index:]  
    y_train, y_dev = y_shuffled[:dev_sample_index], y_shuffled[dev_sample_index:]  
    del x, y, x_shuffled, y_shuffled  
    print('Vocabulary Size: {:d}'.format(len(vocab_processor.vocabulary_)))  
    print('Train/Dev split: {:d}/{:d}'.format(len(y_train), len(y_dev)))  
    return x_train, y_train, vocab_processor, x_dev, y_dev  
def train(x_train, y_train, vocab_processor, x_dev, y_dev):  
    with tf.Graph().as_default():  
        session_conf = tf.ConfigProto(  
            # allows TensorFlow to fall back on a device with a certain operation implemented  
            allow_soft_placement= FLAGS.allow_soft_placement,  
            # allows TensorFlow log on which devices (CPU or GPU) it places operations  
            log_device_placement=FLAGS.log_device_placement  
        )  
        sess = tf.Session(config=session_conf)  
        with sess.as_default():  
            # initialize cnn  
            fasttext = fastTextModel(sequence_length=x_train.shape[1],  
                      num_classes=y_train.shape[1],  
                      vocab_size=len(vocab_processor.vocabulary_),  
                      embedding_size=FLAGS.embedding_size,  
                      l2_reg_lambda=FLAGS.l2_reg_lambda,  
                      is_training=True,  
                      learning_rate=FLAGS.learning_rate,  
                      decay_steps=FLAGS.decay_steps,  
                      decay_rate=FLAGS.decay_rate  
                    )  
            # output dir for models and summaries  
            timestamp = str(time.time())  
            out_dir = os.path.abspath(os.path.join(os.path.curdir, 'run', timestamp))  
            if not os.path.exists(out_dir):  
                os.makedirs(out_dir)  
            print('Writing to {} \n'.format(out_dir))  
            # checkpoint dir. checkpointing – saving the parameters of your model to restore them later on.  
            checkpoint_dir = os.path.abspath(os.path.join(out_dir, FLAGS.ckpt_dir))  
            checkpoint_prefix = os.path.join(checkpoint_dir, 'model')  
            if not os.path.exists(checkpoint_dir):  
                os.makedirs(checkpoint_dir)  
            saver = tf.train.Saver(tf.global_variables(), max_to_keep=FLAGS.num_checkpoints)  
            # Write vocabulary  
            vocab_processor.save(os.path.join(out_dir, 'vocab'))  
            # Initialize all  
            sess.run(tf.global_variables_initializer())  
            def train_step(x_batch, y_batch):  
                """  
                A single training step  
                :param x_batch:  
                :param y_batch:  
                :return:  
                """  
                feed_dict = {  
                    fasttext.input_x: x_batch,  
                    fasttext.input_y: y_batch,  
                }  
                _, step, loss, accuracy = sess.run(  
                    [fasttext.train_op, fasttext.global_step, fasttext.loss_val, fasttext.accuracy],  
                    feed_dict=feed_dict  
                )  
                time_str = datetime.datetime.now().isoformat()  
                print("{}: step {}, loss {:g}, acc {:g}".format(time_str, step, loss, accuracy))  
            def dev_step(x_batch, y_batch):  
                """  
                Evaluate model on a dev set  
                Disable dropout  
                :param x_batch:  
                :param y_batch:  
                :param writer:  
                :return:  
                """  
                feed_dict = {  
                    fasttext.input_x: x_batch,  
                    fasttext.input_y: y_batch,  
                }  
                step, loss, accuracy = sess.run(  
                    [fasttext.global_step, fasttext.loss_val, fasttext.accuracy],  
                    feed_dict=feed_dict  
                )  
                time_str = datetime.datetime.now().isoformat()  
                print("dev results:{}: step {}, loss {:g}, acc {:g}".format(time_str, step, loss, accuracy))  
            # generate batches  
            batches = data_process.batch_iter(list(zip(x_train, y_train)), FLAGS.batch_size, FLAGS.num_epochs)  
            # training loop  
            for batch in batches:  
                x_batch, y_batch = zip(*batch)  
                train_step(x_batch, y_batch)  
                current_step = tf.train.global_step(sess, fasttext.global_step)  
                if current_step % FLAGS.validate_every == 0:  
                    print('\n Evaluation:')  
                    dev_step(x_dev, y_dev)  
                    print('')  
            path = saver.save(sess, checkpoint_prefix, global_step=current_step)  
            print('Save model checkpoint to {} \n'.format(path))  
def main(argv=None):  
    x_train, y_train, vocab_processor, x_dev, y_dev = prepocess()  
    train(x_train, y_train, vocab_processor, x_dev, y_dev)  
if __name__ == '__main__':  
    tf.app.run()

References

[1] Bag of Tricks for Efficient Text Classification: https://arxiv.org/abs/1607.01759

The End- http://bjbsair.com/2020-03-25/tech-info/6300/ **Write in front
**

Today's tutorial is bag of tricks for efficient text classification based on FAIR [1]. This is what we often call fastText.

The most gratifying part of this paper is the fasttext toolkit. The code quality of this toolkit is very high. The result of this paper can be restored with one click. At present, it has been packaged in a very professional way. This is fasttext official website and its github code base, as well as providing python interface, which can be installed directly through pip. Such a model with high accuracy and fast speed is definitely a real weapon.

In order to better understand the principle of fasttext, we will repeat it directly now. However, the code only realizes the simplest word vector averaging based on words, and does not use the word vector of b-gram, so the text classification effect of our own implementation will be lower than that of facebook's open source library.

Overview of papers

We can train fastText on more than one billion words in less than ten minutes using a standard multicore CPU, and classify half a million sentences among 312K classes in less than a minute.

First of all, I quote a passage in the paper to see how the authors evaluate the performance of the fasttext model.

The model of this paper is very simple. Students who have known word2vec before can find that it is very similar to CBOW's model framework.

Corresponding to the above model, for example, input is a sentence, to the word of the sentence or n-gram. Each one corresponds to a vector, and then the average of these vectors is used to get the text vector, and then the average vector is used to get the prediction label. When there are not many categories, it is the simplest softmax; when the number of tags is huge, it is necessary to use "hierarchical softmax".

The model is really simple, and there is nothing to say. Here are two tricks in the paper:

  • 「hierarchical softmax」
    When the number of categories is large, a Huffman coding tree is constructed to speed up the calculation of softmax layer, which is the same as the previous trip in word2vec
  • 「N-gram features」
    If only unigram is used, the word order information will be lost, so the storage of N-gram will be reduced by adding N-gram features to supplement hashing

Looking at the experimental part of the paper, such a simple model can achieve such good results!

However, it is also pointed out that the data sets selected in this paper are not very sensitive to sentence word order, so it is not surprising to get the test results in this paper.

code implementation

After reading the castration code, we remember to look at the source code Oh ~ as in the previous series, define a fastTextModel class, and then write the network framework, input and output placeholder, loss, training steps, etc.

class fastTextModel(BaseModel):  
    """  
    A simple implementation of fasttext for text classification  
    """  
    def __init__(self, sequence_length, num_classes, vocab_size,  
                 embedding_size, learning_rate, decay_steps, decay_rate,  
                 l2_reg_lambda, is_training=True,  
                 initializer=tf.random_normal_initializer(stddev=0.1)):  
        self.vocab_size = vocab_size  
        self.embedding_size = embedding_size  
        self.num_classes = num_classes  
        self.sequence_length = sequence_length  
        self.learning_rate = learning_rate  
        self.decay_steps = decay_steps  
        self.decay_rate = decay_rate  
        self.is_training = is_training  
        self.l2_reg_lambda = l2_reg_lambda  
        self.initializer = initializer  
        self.input_x = tf.placeholder(tf.int32, [None, self.sequence_length], name='input_x')  
        self.input_y = tf.placeholder(tf.int32, [None, self.num_classes], name='input_y')  
        self.global_step = tf.Variable(0, trainable=False, name='global_step')  
        self.instantiate_weight()  
        self.logits = self.inference()  
        self.loss_val = self.loss()  
        self.train_op = self.train()  
        self.predictions = tf.argmax(self.logits, axis=1, name='predictions')  
        correct_prediction = tf.equal(self.predictions, tf.argmax(self.input_y, 1))  
        self.accuracy = tf.reduce_mean(tf.cast(correct_prediction, 'float'), name='accuracy')  
    def instantiate_weight(self):  
        with tf.name_scope('weights'):  
            self.Embedding = tf.get_variable('Embedding', shape=[self.vocab_size, self.embedding_size],  
                                             initializer=self.initializer)  
            self.W_projection = tf.get_variable('W_projection', shape=[self.embedding_size, self.num_classes],  
                                                initializer=self.initializer)  
            self.b_projection = tf.get_variable('b_projection', shape=[self.num_classes])  
    def inference(self):  
        """  
        1. word embedding  
        2. average embedding  
        3. linear classifier  
        :return:  
        """  
        # embedding layer  
        with tf.name_scope('embedding'):  
            words_embedding = tf.nn.embedding_lookup(self.Embedding, self.input_x)  
            self.average_embedding = tf.reduce_mean(words_embedding, axis=1)  
        logits = tf.matmul(self.average_embedding, self.W_projection) +self.b_projection  
        return logits  
    def loss(self):  
        # loss  
        with tf.name_scope('loss'):  
            losses = tf.nn.softmax_cross_entropy_with_logits(labels=self.input_y, logits=self.logits)  
            data_loss = tf.reduce_mean(losses)  
            l2_loss = tf.add_n([tf.nn.l2_loss(cand_var) for cand_var in tf.trainable_variables()  
                                if 'bias' not in cand_var.name]) * self.l2_reg_lambda  
            data_loss += l2_loss * self.l2_reg_lambda  
            return data_loss  
    def train(self):  
        with tf.name_scope('train'):  
            learning_rate = tf.train.exponential_decay(self.learning_rate, self.global_step,  
                                                       self.decay_steps, self.decay_rate,  
                                                       staircase=True)  
            train_op = tf.contrib.layers.optimize_loss(self.loss_val, global_step=self.global_step,  
                                                      learning_rate=learning_rate, optimizer='Adam')  
        return train_op  

def prepocess():  
    """  
    For load and process data  
    :return:  
    """  
    print("Loading data...")  
    x_text, y = data_process.load_data_and_labels(FLAGS.positive_data_file, FLAGS.negative_data_file)  
    # bulid vocabulary  
    max_document_length = max(len(x.split(' ')) for x in x_text)  
    vocab_processor = learn.preprocessing.VocabularyProcessor(max_document_length)  
    x = np.array(list(vocab_processor.fit_transform(x_text)))  
    # shuffle  
    np.random.seed(10)  
    shuffle_indices = np.random.permutation(np.arange(len(y)))  
    x_shuffled = x[shuffle_indices]  
    y_shuffled = y[shuffle_indices]  
    # split train/test dataset  
    dev_sample_index = -1 * int(FLAGS.dev_sample_percentage * float(len(y)))  
    x_train, x_dev = x_shuffled[:dev_sample_index], x_shuffled[dev_sample_index:]  
    y_train, y_dev = y_shuffled[:dev_sample_index], y_shuffled[dev_sample_index:]  
    del x, y, x_shuffled, y_shuffled  
    print('Vocabulary Size: {:d}'.format(len(vocab_processor.vocabulary_)))  
    print('Train/Dev split: {:d}/{:d}'.format(len(y_train), len(y_dev)))  
    return x_train, y_train, vocab_processor, x_dev, y_dev  
def train(x_train, y_train, vocab_processor, x_dev, y_dev):  
    with tf.Graph().as_default():  
        session_conf = tf.ConfigProto(  
            # allows TensorFlow to fall back on a device with a certain operation implemented  
            allow_soft_placement= FLAGS.allow_soft_placement,  
            # allows TensorFlow log on which devices (CPU or GPU) it places operations  
            log_device_placement=FLAGS.log_device_placement  
        )  
        sess = tf.Session(config=session_conf)  
        with sess.as_default():  
            # initialize cnn  
            fasttext = fastTextModel(sequence_length=x_train.shape[1],  
                      num_classes=y_train.shape[1],  
                      vocab_size=len(vocab_processor.vocabulary_),  
                      embedding_size=FLAGS.embedding_size,  
                      l2_reg_lambda=FLAGS.l2_reg_lambda,  
                      is_training=True,  
                      learning_rate=FLAGS.learning_rate,  
                      decay_steps=FLAGS.decay_steps,  
                      decay_rate=FLAGS.decay_rate  
                    )  
            # output dir for models and summaries  
            timestamp = str(time.time())  
            out_dir = os.path.abspath(os.path.join(os.path.curdir, 'run', timestamp))  
            if not os.path.exists(out_dir):  
                os.makedirs(out_dir)  
            print('Writing to {} \n'.format(out_dir))  
            # checkpoint dir. checkpointing – saving the parameters of your model to restore them later on.  
            checkpoint_dir = os.path.abspath(os.path.join(out_dir, FLAGS.ckpt_dir))  
            checkpoint_prefix = os.path.join(checkpoint_dir, 'model')  
            if not os.path.exists(checkpoint_dir):  
                os.makedirs(checkpoint_dir)  
            saver = tf.train.Saver(tf.global_variables(), max_to_keep=FLAGS.num_checkpoints)  
            # Write vocabulary  
            vocab_processor.save(os.path.join(out_dir, 'vocab'))  
            # Initialize all  
            sess.run(tf.global_variables_initializer())  
            def train_step(x_batch, y_batch):  
                """  
                A single training step  
                :param x_batch:  
                :param y_batch:  
                :return:  
                """  
                feed_dict = {  
                    fasttext.input_x: x_batch,  
                    fasttext.input_y: y_batch,  
                }  
                _, step, loss, accuracy = sess.run(  
                    [fasttext.train_op, fasttext.global_step, fasttext.loss_val, fasttext.accuracy],  
                    feed_dict=feed_dict  
                )  
                time_str = datetime.datetime.now().isoformat()  
                print("{}: step {}, loss {:g}, acc {:g}".format(time_str, step, loss, accuracy))  
            def dev_step(x_batch, y_batch):  
                """  
                Evaluate model on a dev set  
                Disable dropout  
                :param x_batch:  
                :param y_batch:  
                :param writer:  
                :return:  
                """  
                feed_dict = {  
                    fasttext.input_x: x_batch,  
                    fasttext.input_y: y_batch,  
                }  
                step, loss, accuracy = sess.run(  
                    [fasttext.global_step, fasttext.loss_val, fasttext.accuracy],  
                    feed_dict=feed_dict  
                )  
                time_str = datetime.datetime.now().isoformat()  
                print("dev results:{}: step {}, loss {:g}, acc {:g}".format(time_str, step, loss, accuracy))  
            # generate batches  
            batches = data_process.batch_iter(list(zip(x_train, y_train)), FLAGS.batch_size, FLAGS.num_epochs)  
            # training loop  
            for batch in batches:  
                x_batch, y_batch = zip(*batch)  
                train_step(x_batch, y_batch)  
                current_step = tf.train.global_step(sess, fasttext.global_step)  
                if current_step % FLAGS.validate_every == 0:  
                    print('\n Evaluation:')  
                    dev_step(x_dev, y_dev)  
                    print('')  
            path = saver.save(sess, checkpoint_prefix, global_step=current_step)  
            print('Save model checkpoint to {} \n'.format(path))  
def main(argv=None):  
    x_train, y_train, vocab_processor, x_dev, y_dev = prepocess()  
    train(x_train, y_train, vocab_processor, x_dev, y_dev)  
if __name__ == '__main__':  
    tf.app.run()

References

[1] Bag of Tricks for Efficient Text Classification: https://arxiv.org/abs/1607.01759

The End- http://bjbsair.com/2020-03-25/tech-info/6300/ **Write in front
**

Today's tutorial is bag of tricks for efficient text classification based on FAIR [1]. This is what we often call fastText.

The most gratifying part of this paper is the fasttext toolkit. The code quality of this toolkit is very high. The result of this paper can be restored with one click. At present, it has been packaged in a very professional way. This is fasttext official website and its github code base, as well as providing python interface, which can be installed directly through pip. Such a model with high accuracy and fast speed is definitely a real weapon.

In order to better understand the principle of fasttext, we will repeat it directly now. However, the code only realizes the simplest word vector averaging based on words, and does not use the word vector of b-gram, so the text classification effect of our own implementation will be lower than that of facebook's open source library.

Overview of papers

We can train fastText on more than one billion words in less than ten minutes using a standard multicore CPU, and classify half a million sentences among 312K classes in less than a minute.

First of all, I quote a passage in the paper to see how the authors evaluate the performance of the fasttext model.

The model of this paper is very simple. Students who have known word2vec before can find that it is very similar to CBOW's model framework.

Corresponding to the above model, for example, input is a sentence, to the word of the sentence or n-gram. Each one corresponds to a vector, and then the average of these vectors is used to get the text vector, and then the average vector is used to get the prediction label. When there are not many categories, it is the simplest softmax; when the number of tags is huge, it is necessary to use "hierarchical softmax".

The model is really simple, and there is nothing to say. Here are two tricks in the paper:

  • 「hierarchical softmax」
    When the number of categories is large, a Huffman coding tree is constructed to speed up the calculation of softmax layer, which is the same as the previous trip in word2vec
  • 「N-gram features」
    If only unigram is used, the word order information will be lost, so the storage of N-gram will be reduced by adding N-gram features to supplement hashing

Looking at the experimental part of the paper, such a simple model can achieve such good results!

However, it is also pointed out that the data sets selected in this paper are not very sensitive to sentence word order, so it is not surprising to get the test results in this paper.

code implementation

After reading the castration code, we remember to look at the source code Oh ~ as in the previous series, define a fastTextModel class, and then write the network framework, input and output placeholder, loss, training steps, etc.

class fastTextModel(BaseModel):  
    """  
    A simple implementation of fasttext for text classification  
    """  
    def __init__(self, sequence_length, num_classes, vocab_size,  
                 embedding_size, learning_rate, decay_steps, decay_rate,  
                 l2_reg_lambda, is_training=True,  
                 initializer=tf.random_normal_initializer(stddev=0.1)):  
        self.vocab_size = vocab_size  
        self.embedding_size = embedding_size  
        self.num_classes = num_classes  
        self.sequence_length = sequence_length  
        self.learning_rate = learning_rate  
        self.decay_steps = decay_steps  
        self.decay_rate = decay_rate  
        self.is_training = is_training  
        self.l2_reg_lambda = l2_reg_lambda  
        self.initializer = initializer  
        self.input_x = tf.placeholder(tf.int32, [None, self.sequence_length], name='input_x')  
        self.input_y = tf.placeholder(tf.int32, [None, self.num_classes], name='input_y')  
        self.global_step = tf.Variable(0, trainable=False, name='global_step')  
        self.instantiate_weight()  
        self.logits = self.inference()  
        self.loss_val = self.loss()  
        self.train_op = self.train()  
        self.predictions = tf.argmax(self.logits, axis=1, name='predictions')  
        correct_prediction = tf.equal(self.predictions, tf.argmax(self.input_y, 1))  
        self.accuracy = tf.reduce_mean(tf.cast(correct_prediction, 'float'), name='accuracy')  
    def instantiate_weight(self):  
        with tf.name_scope('weights'):  
            self.Embedding = tf.get_variable('Embedding', shape=[self.vocab_size, self.embedding_size],  
                                             initializer=self.initializer)  
            self.W_projection = tf.get_variable('W_projection', shape=[self.embedding_size, self.num_classes],  
                                                initializer=self.initializer)  
            self.b_projection = tf.get_variable('b_projection', shape=[self.num_classes])  
    def inference(self):  
        """  
        1. word embedding  
        2. average embedding  
        3. linear classifier  
        :return:  
        """  
        # embedding layer  
        with tf.name_scope('embedding'):  
            words_embedding = tf.nn.embedding_lookup(self.Embedding, self.input_x)  
            self.average_embedding = tf.reduce_mean(words_embedding, axis=1)  
        logits = tf.matmul(self.average_embedding, self.W_projection) +self.b_projection  
        return logits  
    def loss(self):  
        # loss  
        with tf.name_scope('loss'):  
            losses = tf.nn.softmax_cross_entropy_with_logits(labels=self.input_y, logits=self.logits)  
            data_loss = tf.reduce_mean(losses)  
            l2_loss = tf.add_n([tf.nn.l2_loss(cand_var) for cand_var in tf.trainable_variables()  
                                if 'bias' not in cand_var.name]) * self.l2_reg_lambda  
            data_loss += l2_loss * self.l2_reg_lambda  
            return data_loss  
    def train(self):  
        with tf.name_scope('train'):  
            learning_rate = tf.train.exponential_decay(self.learning_rate, self.global_step,  
                                                       self.decay_steps, self.decay_rate,  
                                                       staircase=True)  
            train_op = tf.contrib.layers.optimize_loss(self.loss_val, global_step=self.global_step,  
                                                      learning_rate=learning_rate, optimizer='Adam')  
        return train_op  

def prepocess():  
    """  
    For load and process data  
    :return:  
    """  
    print("Loading data...")  
    x_text, y = data_process.load_data_and_labels(FLAGS.positive_data_file, FLAGS.negative_data_file)  
    # bulid vocabulary  
    max_document_length = max(len(x.split(' ')) for x in x_text)  
    vocab_processor = learn.preprocessing.VocabularyProcessor(max_document_length)  
    x = np.array(list(vocab_processor.fit_transform(x_text)))  
    # shuffle  
    np.random.seed(10)  
    shuffle_indices = np.random.permutation(np.arange(len(y)))  
    x_shuffled = x[shuffle_indices]  
    y_shuffled = y[shuffle_indices]  
    # split train/test dataset  
    dev_sample_index = -1 * int(FLAGS.dev_sample_percentage * float(len(y)))  
    x_train, x_dev = x_shuffled[:dev_sample_index], x_shuffled[dev_sample_index:]  
    y_train, y_dev = y_shuffled[:dev_sample_index], y_shuffled[dev_sample_index:]  
    del x, y, x_shuffled, y_shuffled  
    print('Vocabulary Size: {:d}'.format(len(vocab_processor.vocabulary_)))  
    print('Train/Dev split: {:d}/{:d}'.format(len(y_train), len(y_dev)))  
    return x_train, y_train, vocab_processor, x_dev, y_dev  
def train(x_train, y_train, vocab_processor, x_dev, y_dev):  
    with tf.Graph().as_default():  
        session_conf = tf.ConfigProto(  
            # allows TensorFlow to fall back on a device with a certain operation implemented  
            allow_soft_placement= FLAGS.allow_soft_placement,  
            # allows TensorFlow log on which devices (CPU or GPU) it places operations  
            log_device_placement=FLAGS.log_device_placement  
        )  
        sess = tf.Session(config=session_conf)  
        with sess.as_default():  
            # initialize cnn  
            fasttext = fastTextModel(sequence_length=x_train.shape[1],  
                      num_classes=y_train.shape[1],  
                      vocab_size=len(vocab_processor.vocabulary_),  
                      embedding_size=FLAGS.embedding_size,  
                      l2_reg_lambda=FLAGS.l2_reg_lambda,  
                      is_training=True,  
                      learning_rate=FLAGS.learning_rate,  
                      decay_steps=FLAGS.decay_steps,  
                      decay_rate=FLAGS.decay_rate  
                    )  
            # output dir for models and summaries  
            timestamp = str(time.time())  
            out_dir = os.path.abspath(os.path.join(os.path.curdir, 'run', timestamp))  
            if not os.path.exists(out_dir):  
                os.makedirs(out_dir)  
            print('Writing to {} \n'.format(out_dir))  
            # checkpoint dir. checkpointing – saving the parameters of your model to restore them later on.  
            checkpoint_dir = os.path.abspath(os.path.join(out_dir, FLAGS.ckpt_dir))  
            checkpoint_prefix = os.path.join(checkpoint_dir, 'model')  
            if not os.path.exists(checkpoint_dir):  
                os.makedirs(checkpoint_dir)  
            saver = tf.train.Saver(tf.global_variables(), max_to_keep=FLAGS.num_checkpoints)  
            # Write vocabulary  
            vocab_processor.save(os.path.join(out_dir, 'vocab'))  
            # Initialize all  
            sess.run(tf.global_variables_initializer())  
            def train_step(x_batch, y_batch):  
                """  
                A single training step  
                :param x_batch:  
                :param y_batch:  
                :return:  
                """  
                feed_dict = {  
                    fasttext.input_x: x_batch,  
                    fasttext.input_y: y_batch,  
                }  
                _, step, loss, accuracy = sess.run(  
                    [fasttext.train_op, fasttext.global_step, fasttext.loss_val, fasttext.accuracy],  
                    feed_dict=feed_dict  
                )  
                time_str = datetime.datetime.now().isoformat()  
                print("{}: step {}, loss {:g}, acc {:g}".format(time_str, step, loss, accuracy))  
            def dev_step(x_batch, y_batch):  
                """  
                Evaluate model on a dev set  
                Disable dropout  
                :param x_batch:  
                :param y_batch:  
                :param writer:  
                :return:  
                """  
                feed_dict = {  
                    fasttext.input_x: x_batch,  
                    fasttext.input_y: y_batch,  
                }  
                step, loss, accuracy = sess.run(  
                    [fasttext.global_step, fasttext.loss_val, fasttext.accuracy],  
                    feed_dict=feed_dict  
                )  
                time_str = datetime.datetime.now().isoformat()  
                print("dev results:{}: step {}, loss {:g}, acc {:g}".format(time_str, step, loss, accuracy))  
            # generate batches  
            batches = data_process.batch_iter(list(zip(x_train, y_train)), FLAGS.batch_size, FLAGS.num_epochs)  
            # training loop  
            for batch in batches:  
                x_batch, y_batch = zip(*batch)  
                train_step(x_batch, y_batch)  
                current_step = tf.train.global_step(sess, fasttext.global_step)  
                if current_step % FLAGS.validate_every == 0:  
                    print('\n Evaluation:')  
                    dev_step(x_dev, y_dev)  
                    print('')  
            path = saver.save(sess, checkpoint_prefix, global_step=current_step)  
            print('Save model checkpoint to {} \n'.format(path))  
def main(argv=None):  
    x_train, y_train, vocab_processor, x_dev, y_dev = prepocess()  
    train(x_train, y_train, vocab_processor, x_dev, y_dev)  
if __name__ == '__main__':  
    tf.app.run()

References

[1] Bag of Tricks for Efficient Text Classification: https://arxiv.org/abs/1607.01759

The End- http://bjbsair.com/2020-03-25/tech-info/6300/ **Write in front
**

Today's tutorial is bag of tricks for efficient text classification based on FAIR [1]. This is what we often call fastText.

The most gratifying part of this paper is the fasttext toolkit. The code quality of this toolkit is very high. The result of this paper can be restored with one click. At present, it has been packaged in a very professional way. This is fasttext official website and its github code base, as well as providing python interface, which can be installed directly through pip. Such a model with high accuracy and fast speed is definitely a real weapon.

In order to better understand the principle of fasttext, we will repeat it directly now. However, the code only realizes the simplest word vector averaging based on words, and does not use the word vector of b-gram, so the text classification effect of our own implementation will be lower than that of facebook's open source library.

Overview of papers

We can train fastText on more than one billion words in less than ten minutes using a standard multicore CPU, and classify half a million sentences among 312K classes in less than a minute.

First of all, I quote a passage in the paper to see how the authors evaluate the performance of the fasttext model.

The model of this paper is very simple. Students who have known word2vec before can find that it is very similar to CBOW's model framework.

Corresponding to the above model, for example, input is a sentence, to the word of the sentence or n-gram. Each one corresponds to a vector, and then the average of these vectors is used to get the text vector, and then the average vector is used to get the prediction label. When there are not many categories, it is the simplest softmax; when the number of tags is huge, it is necessary to use "hierarchical softmax".

The model is really simple, and there is nothing to say. Here are two tricks in the paper:

  • 「hierarchical softmax」
    When the number of categories is large, a Huffman coding tree is constructed to speed up the calculation of softmax layer, which is the same as the previous trip in word2vec
  • 「N-gram features」
    If only unigram is used, the word order information will be lost, so the storage of N-gram will be reduced by adding N-gram features to supplement hashing

Looking at the experimental part of the paper, such a simple model can achieve such good results!

However, it is also pointed out that the data sets selected in this paper are not very sensitive to sentence word order, so it is not surprising to get the test results in this paper.

code implementation

After reading the castration code, we remember to look at the source code Oh ~ as in the previous series, define a fastTextModel class, and then write the network framework, input and output placeholder, loss, training steps, etc.

class fastTextModel(BaseModel):  
    """  
    A simple implementation of fasttext for text classification  
    """  
    def __init__(self, sequence_length, num_classes, vocab_size,  
                 embedding_size, learning_rate, decay_steps, decay_rate,  
                 l2_reg_lambda, is_training=True,  
                 initializer=tf.random_normal_initializer(stddev=0.1)):  
        self.vocab_size = vocab_size  
        self.embedding_size = embedding_size  
        self.num_classes = num_classes  
        self.sequence_length = sequence_length  
        self.learning_rate = learning_rate  
        self.decay_steps = decay_steps  
        self.decay_rate = decay_rate  
        self.is_training = is_training  
        self.l2_reg_lambda = l2_reg_lambda  
        self.initializer = initializer  
        self.input_x = tf.placeholder(tf.int32, [None, self.sequence_length], name='input_x')  
        self.input_y = tf.placeholder(tf.int32, [None, self.num_classes], name='input_y')  
        self.global_step = tf.Variable(0, trainable=False, name='global_step')  
        self.instantiate_weight()  
        self.logits = self.inference()  
        self.loss_val = self.loss()  
        self.train_op = self.train()  
        self.predictions = tf.argmax(self.logits, axis=1, name='predictions')  
        correct_prediction = tf.equal(self.predictions, tf.argmax(self.input_y, 1))  
        self.accuracy = tf.reduce_mean(tf.cast(correct_prediction, 'float'), name='accuracy')  
    def instantiate_weight(self):  
        with tf.name_scope('weights'):  
            self.Embedding = tf.get_variable('Embedding', shape=[self.vocab_size, self.embedding_size],  
                                             initializer=self.initializer)  
            self.W_projection = tf.get_variable('W_projection', shape=[self.embedding_size, self.num_classes],  
                                                initializer=self.initializer)  
            self.b_projection = tf.get_variable('b_projection', shape=[self.num_classes])  
    def inference(self):  
        """  
        1. word embedding  
        2. average embedding  
        3. linear classifier  
        :return:  
        """  
        # embedding layer  
        with tf.name_scope('embedding'):  
            words_embedding = tf.nn.embedding_lookup(self.Embedding, self.input_x)  
            self.average_embedding = tf.reduce_mean(words_embedding, axis=1)  
        logits = tf.matmul(self.average_embedding, self.W_projection) +self.b_projection  
        return logits  
    def loss(self):  
        # loss  
        with tf.name_scope('loss'):  
            losses = tf.nn.softmax_cross_entropy_with_logits(labels=self.input_y, logits=self.logits)  
            data_loss = tf.reduce_mean(losses)  
            l2_loss = tf.add_n([tf.nn.l2_loss(cand_var) for cand_var in tf.trainable_variables()  
                                if 'bias' not in cand_var.name]) * self.l2_reg_lambda  
            data_loss += l2_loss * self.l2_reg_lambda  
            return data_loss  
    def train(self):  
        with tf.name_scope('train'):  
            learning_rate = tf.train.exponential_decay(self.learning_rate, self.global_step,  
                                                       self.decay_steps, self.decay_rate,  
                                                       staircase=True)  
            train_op = tf.contrib.layers.optimize_loss(self.loss_val, global_step=self.global_step,  
                                                      learning_rate=learning_rate, optimizer='Adam')  
        return train_op  

def prepocess():  
    """  
    For load and process data  
    :return:  
    """  
    print("Loading data...")  
    x_text, y = data_process.load_data_and_labels(FLAGS.positive_data_file, FLAGS.negative_data_file)  
    # bulid vocabulary  
    max_document_length = max(len(x.split(' ')) for x in x_text)  
    vocab_processor = learn.preprocessing.VocabularyProcessor(max_document_length)  
    x = np.array(list(vocab_processor.fit_transform(x_text)))  
    # shuffle  
    np.random.seed(10)  
    shuffle_indices = np.random.permutation(np.arange(len(y)))  
    x_shuffled = x[shuffle_indices]  
    y_shuffled = y[shuffle_indices]  
    # split train/test dataset  
    dev_sample_index = -1 * int(FLAGS.dev_sample_percentage * float(len(y)))  
    x_train, x_dev = x_shuffled[:dev_sample_index], x_shuffled[dev_sample_index:]  
    y_train, y_dev = y_shuffled[:dev_sample_index], y_shuffled[dev_sample_index:]  
    del x, y, x_shuffled, y_shuffled  
    print('Vocabulary Size: {:d}'.format(len(vocab_processor.vocabulary_)))  
    print('Train/Dev split: {:d}/{:d}'.format(len(y_train), len(y_dev)))  
    return x_train, y_train, vocab_processor, x_dev, y_dev  
def train(x_train, y_train, vocab_processor, x_dev, y_dev):  
    with tf.Graph().as_default():  
        session_conf = tf.ConfigProto(  
            # allows TensorFlow to fall back on a device with a certain operation implemented  
            allow_soft_placement= FLAGS.allow_soft_placement,  
            # allows TensorFlow log on which devices (CPU or GPU) it places operations  
            log_device_placement=FLAGS.log_device_placement  
        )  
        sess = tf.Session(config=session_conf)  
        with sess.as_default():  
            # initialize cnn  
            fasttext = fastTextModel(sequence_length=x_train.shape[1],  
                      num_classes=y_train.shape[1],  
                      vocab_size=len(vocab_processor.vocabulary_),  
                      embedding_size=FLAGS.embedding_size,  
                      l2_reg_lambda=FLAGS.l2_reg_lambda,  
                      is_training=True,  
                      learning_rate=FLAGS.learning_rate,  
                      decay_steps=FLAGS.decay_steps,  
                      decay_rate=FLAGS.decay_rate  
                    )  
            # output dir for models and summaries  
            timestamp = str(time.time())  
            out_dir = os.path.abspath(os.path.join(os.path.curdir, 'run', timestamp))  
            if not os.path.exists(out_dir):  
                os.makedirs(out_dir)  
            print('Writing to {} \n'.format(out_dir))  
            # checkpoint dir. checkpointing – saving the parameters of your model to restore them later on.  
            checkpoint_dir = os.path.abspath(os.path.join(out_dir, FLAGS.ckpt_dir))  
            checkpoint_prefix = os.path.join(checkpoint_dir, 'model')  
            if not os.path.exists(checkpoint_dir):  
                os.makedirs(checkpoint_dir)  
            saver = tf.train.Saver(tf.global_variables(), max_to_keep=FLAGS.num_checkpoints)  
            # Write vocabulary  
            vocab_processor.save(os.path.join(out_dir, 'vocab'))  
            # Initialize all  
            sess.run(tf.global_variables_initializer())  
            def train_step(x_batch, y_batch):  
                """  
                A single training step  
                :param x_batch:  
                :param y_batch:  
                :return:  
                """  
                feed_dict = {  
                    fasttext.input_x: x_batch,  
                    fasttext.input_y: y_batch,  
                }  
                _, step, loss, accuracy = sess.run(  
                    [fasttext.train_op, fasttext.global_step, fasttext.loss_val, fasttext.accuracy],  
                    feed_dict=feed_dict  
                )  
                time_str = datetime.datetime.now().isoformat()  
                print("{}: step {}, loss {:g}, acc {:g}".format(time_str, step, loss, accuracy))  
            def dev_step(x_batch, y_batch):  
                """  
                Evaluate model on a dev set  
                Disable dropout  
                :param x_batch:  
                :param y_batch:  
                :param writer:  
                :return:  
                """  
                feed_dict = {  
                    fasttext.input_x: x_batch,  
                    fasttext.input_y: y_batch,  
                }  
                step, loss, accuracy = sess.run(  
                    [fasttext.global_step, fasttext.loss_val, fasttext.accuracy],  
                    feed_dict=feed_dict  
                )  
                time_str = datetime.datetime.now().isoformat()  
                print("dev results:{}: step {}, loss {:g}, acc {:g}".format(time_str, step, loss, accuracy))  
            # generate batches  
            batches = data_process.batch_iter(list(zip(x_train, y_train)), FLAGS.batch_size, FLAGS.num_epochs)  
            # training loop  
            for batch in batches:  
                x_batch, y_batch = zip(*batch)  
                train_step(x_batch, y_batch)  
                current_step = tf.train.global_step(sess, fasttext.global_step)  
                if current_step % FLAGS.validate_every == 0:  
                    print('\n Evaluation:')  
                    dev_step(x_dev, y_dev)  
                    print('')  
            path = saver.save(sess, checkpoint_prefix, global_step=current_step)  
            print('Save model checkpoint to {} \n'.format(path))  
def main(argv=None):  
    x_train, y_train, vocab_processor, x_dev, y_dev = prepocess()  
    train(x_train, y_train, vocab_processor, x_dev, y_dev)  
if __name__ == '__main__':  
    tf.app.run()

References

[1] Bag of Tricks for Efficient Text Classification: https://arxiv.org/abs/1607.01759

The End
