[AI Creation Camp] Zodiac superpowers in a science fiction crossover

What superpowers would the 12 zodiac signs have if they crossed over into science fiction dramas? Come and claim your exclusive superpower!

  • Many young people today enjoy science fiction films and dramas such as the Avengers series, whose heroes and superpowers are part of our youth. What superpowers would the 12 zodiac signs have if they crossed over into those stories?
  • The project combines the text matching and chat models provided by Paddle with Wechaty.
  • Besides receiving an exclusive superpower, you can check today's fortune and chat with a "fortune teller" when you are lonely or bored.

Demo

Bilibili video link

https://www.bilibili.com/video/BV1PL4y1v7nf

GitHub link

https://github.com/27182812/paddle-wechaty-Zodiac

Implementation process of the project

ECS part

  • Reference: https://aistudio.baidu.com/aistudio/projectdetail/1836012

  • I use Alibaba Cloud ECS; any other cloud service or server reachable from the Internet also works.

  • Open a terminal on the server and run the following commands (note: make sure the chosen port is open to the outside world, and fill in your own token for WECHATY_TOKEN)

$ apt update

$ apt install docker.io

$ docker pull wechaty/wechaty:latest

$ export WECHATY_LOG="verbose"

$ export WECHATY_PUPPET="wechaty-puppet-wechat"

$ export WECHATY_PUPPET_SERVER_PORT="8080"

$ export WECHATY_TOKEN="puppet_padlocal_xxxxxx" # Enter your own token here

$ docker run -ti --name wechaty_puppet_service_token_gateway --rm -e WECHATY_LOG -e WECHATY_PUPPET -e WECHATY_TOKEN -e WECHATY_PUPPET_SERVER_PORT -p "$WECHATY_PUPPET_SERVER_PORT:$WECHATY_PUPPET_SERVER_PORT" wechaty/wechaty:latest
  • Open https://api.chatie.io/v0/hosties/xxxxxx in a browser (replace xxxxxx with your own token). If the server's IP address and port number are returned, the gateway is running correctly.

  • The container prints a lot of output once it is running. Find the address after "Online QR Code:" and open it; a QR code appears. Scan it with WeChat to log in. When it succeeds, the phone shows that desktop WeChat is logged in.
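
The browser check described above can also be scripted. The following is only a minimal sketch, not part of the original project: the helper names `hostie_url`, `gateway_is_registered`, and `check_gateway` are made up for illustration, and the sketch assumes that the endpoint answers with a JSON object containing `ip` and `port` fields, where an address of `0.0.0.0` means no gateway has registered under the token.

```python
import json
import urllib.request

API_ROOT = "https://api.chatie.io/v0/hosties/"


def hostie_url(token: str) -> str:
    """Build the lookup URL for a puppet service token."""
    return API_ROOT + token


def gateway_is_registered(payload: dict) -> bool:
    """Return True when the payload carries a usable ip and port.

    Assumption: an unregistered token is reported as ip 0.0.0.0 / port 0.
    """
    ip = payload.get("ip", "0.0.0.0")
    port = int(payload.get("port", 0))
    return ip != "0.0.0.0" and port > 0


def check_gateway(token: str, timeout: float = 10.0) -> bool:
    """Query the endpoint (needs network access) and report the result."""
    with urllib.request.urlopen(hostie_url(token), timeout=timeout) as resp:
        payload = json.loads(resp.read().decode("utf-8"))
    return gateway_is_registered(payload)
```

Calling `check_gateway("puppet_padlocal_xxxxxx")` with your own token should return True once the Docker container has started and registered itself.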

Environment installation

!pip install -U paddlepaddle -i https://mirror.baidu.com/pypi/simple
!python -m pip install --upgrade paddlenlp -i https://pypi.org/simple
!pip install --upgrade pip
!pip install --upgrade sentencepiece 
!pip install wechaty
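
After installation it can help to confirm that the packages are importable before running the scripts. This is a small sketch under the assumption that the import names are `paddle`, `paddlenlp`, `sentencepiece`, and `wechaty`; the helper `missing_packages` is a hypothetical name, not something the project provides.

```python
import importlib.util

# Import names of the packages installed above (paddlepaddle is imported as "paddle")
REQUIRED = ["paddle", "paddlenlp", "sentencepiece", "wechaty"]


def missing_packages(names):
    """Return the subset of names that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    print("All packages found" if not missing else f"Missing: {missing}")
```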

Text matching part

  • Semantic text matching is one of the most basic NLP tasks: it judges how similar two texts are in meaning. It has many applications, such as search engines, intelligent Q&A, knowledge retrieval, and feed recommendation.

  • Why use it here? Judging the user's intent by keyword matching alone leads to misunderstandings. If the user types "I hate constellations", a keyword-based bot would still show the constellation superpowers. If we instead require an exact match on the keyword "constellation", a single extra word or punctuation mark stops the feature from working, which is unfriendly. This project therefore uses text matching to judge whether the user really wants to see their constellation's superpower.

  • Built on PaddleNLP, the project uses Baidu's open-source pre-trained model ERNIE 1.0 to build a semantic matching model that judges whether two texts have the same meaning.

  • Training a model from scratch involves data loading, data preprocessing, model building, training, and evaluation. For details, see https://aistudio.baidu.com/aistudio/projectdetail/1972174
    Here we directly load an already trained semantic matching model.

  • Download the trained semantic matching model and unpack it

! wget https://paddlenlp.bj.bcebos.com/models/text_matching/pointwise_matching_model.tar
! tar -xvf pointwise_matching_model.tar
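
To make the motivation for semantic matching concrete, here is a tiny sketch of the two keyword rules discussed in this section. The function names and the sample messages are illustrative only; the project itself does not contain this code.

```python
def contains_keyword(text: str, keyword: str = "constellation") -> bool:
    """Loose rule: fire whenever the keyword appears anywhere in the text."""
    return keyword in text.lower()


def equals_keyword(text: str, keyword: str = "constellation") -> bool:
    """Strict rule: fire only when the text is exactly the keyword."""
    return text == keyword


messages = ["constellation", "constellation!", "I hate constellations"]
for message in messages:
    print(message, contains_keyword(message), equals_keyword(message))
```

The loose rule fires on "I hate constellations" even though the user does not want the feature, while the strict rule rejects "constellation!" because of one extra character. A semantic matcher is meant to avoid both failure modes.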
  • Code (match.py)
import numpy as np
import os
import time
import paddle
import paddle.nn.functional as F
from paddlenlp.datasets import load_dataset
import paddlenlp
# functools.partial lets us pre-fill some default arguments of convert_example for later use
from functools import partial
from paddlenlp.data import Stack, Pad, Tuple

os.environ["CUDA_VISIBLE_DEVICES"] = "1"

tokenizer = paddlenlp.transformers.ErnieTokenizer.from_pretrained('ernie-1.0')

import paddle.nn as nn

# We build a pointwise semantic matching network on top of the ERNIE 1.0 model structure,
# so the ERNIE 1.0 pretrained_model is defined here first
pretrained_model = paddlenlp.transformers.ErnieModel.from_pretrained('ernie-1.0')


class PointwiseMatching(nn.Layer):

    # pretrained_model is the ERNIE 1.0 pre-trained model used to initialize this network
    def __init__(self, pretrained_model, dropout=None):
        super().__init__()
        self.ptm = pretrained_model
        self.dropout = nn.Dropout(dropout if dropout is not None else 0.1)

        # Semantic matching is a binary classification task: similar vs. dissimilar
        self.classifier = nn.Linear(self.ptm.config["hidden_size"], 2)

    def forward(self,
                input_ids,
                token_type_ids=None,
                position_ids=None,
                attention_mask=None):
        # input_ids is the concatenated token ids of the two texts
        # token_type_ids marks which of the two texts each token belongs to
        # The returned cls_embedding is the semantic representation vector computed by the model
        _, cls_embedding = self.ptm(input_ids, token_type_ids, position_ids,
                                    attention_mask)

        cls_embedding = self.dropout(cls_embedding)

        # The semantic representation vector of the text pair feeds the binary classifier
        logits = self.classifier(cls_embedding)
        probs = F.softmax(logits)

        return probs

def convert_example(example, tokenizer, max_seq_length=512, is_test=False):
    query, title = example["query"], example["title"]

    encoded_inputs = tokenizer(
        text=query, text_pair=title, max_seq_len=max_seq_length)

    input_ids = encoded_inputs["input_ids"]
    token_type_ids = encoded_inputs["token_type_ids"]

    if not is_test:
        label = np.array([example["label"]], dtype="int64")
        return input_ids, token_type_ids, label
    # In the prediction or evaluation phase, the label field is not returned
    else:
        return input_ids, token_type_ids

def read_text_pair(data_path):
    """Reads data."""
    with open(data_path, 'r', encoding='utf-8') as f:
        for line in f:

            data = line.rstrip().split(" ")
            # print(data)
            # print(len(data))
            if len(data) != 2:
                continue
            yield {'query': data[0], 'title': data[1]}

def predict(model, data_loader):
    batch_probs = []

    # In the prediction phase, the eval mode is turned on, and the dropout and other operations in the model will be turned off
    model.eval()

    with paddle.no_grad():
        for batch_data in data_loader:
            input_ids, token_type_ids = batch_data
            input_ids = paddle.to_tensor(input_ids)
            token_type_ids = paddle.to_tensor(token_type_ids)

            # Obtain the matrix of prediction probability of each sample: [batch_size, 2]
            batch_prob = model(
                input_ids=input_ids, token_type_ids=token_type_ids).numpy()
            # print("111",batch_prob)
            batch_probs.append(batch_prob)
        batch_probs = np.concatenate(batch_probs, axis=0)

        return batch_probs
# Conversion function for prediction data
# Prediction data has no label, so convert_example's is_test parameter is set to True
trans_func = partial(
    convert_example,
    tokenizer=tokenizer,
    max_seq_length=512,
    is_test=True)

# Batching function for prediction data
# Prediction data only yields input_ids and token_type_ids, so batchify_fn needs only two Pad objects
batchify_fn = lambda samples, fn=Tuple(
    Pad(axis=0, pad_val=tokenizer.pad_token_id),  # input_ids
    Pad(axis=0, pad_val=tokenizer.pad_token_type_id),  # segment_ids
): [data for data in fn(samples)]

pretrained_model = paddlenlp.transformers.ErnieModel.from_pretrained("ernie-1.0")

model = PointwiseMatching(pretrained_model)

# After unpacking the downloaded archive, the weights are at ./pointwise_matching_model/ernie1.0_base_pointwise_matching.pdparams
state_dict = paddle.load("pointwise_matching_model/ernie1.0_base_pointwise_matching.pdparams")
model.set_dict(state_dict)

def start():
    # Load the prediction data
    predict_ds = load_dataset(
        read_text_pair, data_path="./predict.txt", lazy=False)
    # for i in predict_ds:
    #     print(i)
    batch_sampler = paddle.io.BatchSampler(predict_ds, batch_size=32, shuffle=False)

    # Build the prediction data_loader
    predict_data_loader = paddle.io.DataLoader(
        dataset=predict_ds.map(trans_func),
        batch_sampler=batch_sampler,
        collate_fn=batchify_fn,
        return_list=True)

    # Execute prediction function
    y_probs = predict(model, predict_data_loader)

    # Obtain the prediction label according to the prediction probability
    y_preds = np.argmax(y_probs, axis=1)
    print(y_preds)
    return y_preds[-1]

    # predict_ds = load_dataset(
    #     read_text_pair, data_path="./predict.txt", lazy=False)
    #
    # for idx, y_pred in enumerate(y_preds):
    #     text_pair = predict_ds[idx]
    #     text_pair["pred_label"] = y_pred
    #     print(text_pair)


if __name__ == '__main__':
    start()
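
The last step of match.py turns the two-column output of the classifier into a 0/1 label. The sketch below reproduces just that step with NumPy on made-up logits, so it runs without Paddle or the downloaded weights. It assumes, as main.py does when it checks for a result of 1, that column 1 stands for "similar".

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


# Made-up logits for three text pairs; column 0 = dissimilar, column 1 = similar
logits = np.array([[2.0, -1.0], [-0.5, 1.5], [0.1, 0.3]])
probs = softmax(logits)            # shape [batch_size, 2], each row sums to 1
labels = np.argmax(probs, axis=1)  # index of the more probable class per pair
print(labels)
```

Because softmax preserves the ordering of the logits, taking argmax on the probabilities gives the same label as taking it on the raw logits; the probabilities are only needed if a confidence threshold is wanted.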

Conversation and chat

  • In recent years, human-machine dialogue systems have attracted wide attention in academia and industry. An open-domain dialogue system aims to let a machine interact with people smoothly and naturally: it can handle everyday small talk as well as complete specific tasks.

  • As deep learning has advanced, chatbots have become increasingly capable. They can take over mechanical Q&A work, and we can also chat with them in our spare time, which makes life more colorful.

  • This project includes a chat function so that people have a chat partner when they are lonely or bored. It may not always know what to say, but it is always there waiting for you.

  • Code (chat.py)

from paddlenlp.transformers import GPTChineseTokenizer

# Set the name of the model you want to use
model_name = 'gpt-cpm-small-cn-distill'
tokenizer = GPTChineseTokenizer.from_pretrained(model_name)

import paddle
from paddlenlp.transformers import GPTForPretraining

# Load the Chinese GPT model by name
model = GPTForPretraining.from_pretrained(model_name)

def chat(user_input):
    #user_input = "there is a pot of wine among the flowers. Drink alone without blind date. Raise your glass to invite the moon,"
    # Convert text to ids
    input_ids = tokenizer(user_input)['input_ids']
    #print(input_ids)
    # Convert the ids to a tensor and add a batch dimension
    input_ids = paddle.to_tensor(input_ids, dtype='int64').unsqueeze(0)
    #print(input_ids)
    # Call the generate API to produce text
    ids, scores = model.generate(
        input_ids=input_ids,
        max_length=36,
        min_length=1,
        decode_strategy='sampling',
        top_k=5,
        num_return_sequences=3)
    # print(ids)
    # print(scores)
    generated_ids = ids[0].numpy().tolist()
    # Use tokenizer to convert the generated id to text
    generated_text = tokenizer.convert_ids_to_string(generated_ids)
    print(generated_text)
    return generated_text.rstrip(',')

if __name__ == '__main__':
    chat("How do you do,baby")

  • PaddleNLP provides a generate() function for generation tasks, built into all of its generative models. It supports the Greedy Search, Beam Search, and Sampling decoding strategies. You only need to specify the decoding strategy and its parameters to run decoding and obtain the token ids and probability scores of the generated sequences.
  • PaddleNLP ships a matching tokenizer for each pre-trained model; specify the model name to load the corresponding tokenizer.
  • PaddleNLP provides Chinese pre-trained models such as GPT and UnifiedTransformer, which can be loaded with one call by model name. This project uses a small Chinese GPT pre-trained model. For other pre-trained models, see the PaddleNLP model list.
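
To illustrate the difference between two of the decoding strategies named above, here is a NumPy sketch of a single decoding step. It is not PaddleNLP's implementation: the functions `greedy_pick` and `top_k_sample` are hypothetical, the probability vector is invented, and beam search is left out for brevity.

```python
import numpy as np


def greedy_pick(probs: np.ndarray) -> int:
    """Greedy search step: always take the most probable token."""
    return int(np.argmax(probs))


def top_k_sample(probs: np.ndarray, k: int, rng: np.random.Generator) -> int:
    """Sampling step with top-k filtering: keep the k most probable tokens,
    renormalize their probabilities, and draw one of them at random."""
    top_ids = np.argsort(probs)[-k:]
    top_probs = probs[top_ids] / probs[top_ids].sum()
    return int(rng.choice(top_ids, p=top_probs))


# Made-up next-token distribution over a six-token vocabulary
probs = np.array([0.05, 0.40, 0.10, 0.25, 0.15, 0.05])
rng = np.random.default_rng(0)
print(greedy_pick(probs))
print([top_k_sample(probs, k=3, rng=rng) for _ in range(5)])
```

Greedy search is deterministic and tends to repeat itself, while sampling with a small `top_k` (chat.py uses 5) gives varied replies while still excluding very unlikely tokens.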

Main function part

  • main.py
  • Today's horoscope requires applying for an API key. Website: Constellation fortune. Fill the API key you applied for into values['key'].
  • Before running this script, remember to start the ECS gateway so that your WeChat account becomes the fortune teller; otherwise you can only try it locally.
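
main.py reads the fortune entries from the `newslist` field of the API response and labels them by position. The sketch below shows that parsing step in isolation. It assumes the response body is JSON and that the entries arrive in the same order main.py expects; the sample payload is invented and shortened, and `format_fortune` is a hypothetical helper name. It uses `json.loads` because JSON literals such as `true` or `null` are not valid Python expressions.

```python
import json

# Labels in the order main.py reads them from "newslist"
LABELS = [
    "Composite index", "Love index", "Work index", "Fortune index",
    "Health index", "Lucky color", "Lucky number", "Noble constellation",
    "Today's overview",
]


def format_fortune(body: str) -> str:
    """Turn the JSON response body into one 'label: content' line per entry."""
    items = json.loads(body)["newslist"]
    lines = [f"{label}: {item['content']}" for label, item in zip(LABELS, items)]
    return "\n".join(lines)


# Hypothetical, shortened response used only to exercise the function
sample = json.dumps({"newslist": [{"content": "80%"}, {"content": "70%"}]})
print(format_fortune(sample))
```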
import os
import cv2
import asyncio
import numpy as np
import paddlehub as hub
import json
import urllib.parse
import urllib.request
import match
import chat

from wechaty import (
    Contact,
    FileBox,
    Message,
    Wechaty,
    ScanStatus,
)

os.environ['WECHATY_PUPPET'] = "wechaty-puppet-service"
os.environ['WECHATY_PUPPET_SERVICE_TOKEN'] = "puppet_padlocal_XXXXXXXX" ## Your own token

def chinese_shuxiang(year):
    shuxiang_map = {
        u'rat': 1900,
        u'ox': 1901,
        u'tiger': 1902,
        u'rabbit': 1903,
        u'dragon': 1904,
        u'snake': 1905,
        u'horse': 1906,
        u'sheep': 1907,
        u'monkey': 1908,
        u'rooster': 1909,
        u'dog': 1910,
        u'pig': 1911}

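    for k, v in shuxiang_map.items():

        # Each animal repeats every 12 years counted from its base year;
        # the difference form also works for years before 1900
        if (year - v) % 12 == 0:
            return k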

def xingzuo(month, day):
    xingzuo_map = {
        u'Aries': [(3, 21), (4, 20)],

        u'Taurus': [(4, 21), (5, 20)],

        u'Gemini': [(5, 21), (6, 21)],

        u'Cancer': [(6, 22), (7, 22)],

        u'leo': [(7, 23), (8, 22)],

        u'Virgo': [(8, 23), (9, 22)],

        u'libra': [(9, 23), (10, 22)],

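        # Ranges must be contiguous: Scorpio ends on Nov 22 and Sagittarius on Dec 21,
        # so Nov 22 is not left unassigned and Dec 22 falls through to Capricorn below
        u'scorpio': [(10, 23), (11, 22)],

        u'sagittarius': [(11, 23), (12, 21)],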

        u'aquarius': [(1, 20), (2, 18)],

        u'Pisces': [(2, 19), (3, 20)]

    }

    for k, v in xingzuo_map.items():

        if v[0] <= (month, day) <= v[1]:
            return k

    if (month, day) >= (12, 22) or (month, day) <= (1, 19):
        return u'Capricornus'

def super(xingzuo):
    xingzuosuper_map = {
        u'Aries': "The Incredible Hulk\n Aries people are impulsive and straightforward, like children. If they crossed into a science fiction drama, they would have a superpower like the Hulk's: they transform when angry, and they grow stronger as their anger rises.",

        u'Taurus': "Growing and shrinking\n Taurus people are cute and have a big appetite; they often eat until their bellies are round. If they crossed into a science fiction drama, they would have Ant-Man's ability to grow and shrink at will. Don't underestimate this superpower!",

        u'Gemini': "Star-Lord\n Geminis are very eloquent: chatting with them makes you laugh, and arguing with them makes you cry. If they crossed into a science fiction drama, they would have a superpower like Star-Lord's. A mortal facing beings with divine bodies, yet invincible with words, and very cute too!",

        u'Cancer': "Never missing a shot\n Most Cancer people have sharp eyesight and a strong sense for color. If they crossed into a science fiction drama, they would have a superpower like Hawkeye's and hit the target with every shot!",

        u'leo': "Whistle\n Leos have a sharp tongue but a soft heart. They are very kind, but to save face they put on a tough front. If they crossed into a science fiction drama, they would have a superpower like Yondu's: one whistle brings the enemy down.",

        u'Virgo': "Close combat\n Virgo people pursue perfection and are especially strict with themselves; everything they do must be flawless. If they crossed into a science fiction drama, they would have superpowers like Black Panther's: not only intelligent, but also strong in close combat, with movements that flow like clouds and water~",

        u'libra': "Infinite resurrection\n If Libra people crossed into a science fiction drama, they would have the power to revive indefinitely as long as a single cell is left, because Libras bounce back from desperate situations better than anyone. They are never crushed, and they always believe in hope.",

        u'scorpio': "Magic\n When Scorpios are really interested in something, they will stay up all night studying it. If they crossed into a science fiction drama, they would have superpowers like Doctor Strange's: devoted to learning magic, they eventually become powerful sorcerers.",

        u'sagittarius': "Superb intellect\n Sagittarius people are very smart. Their logical and divergent thinking are among the best, so they are particularly good at mathematics. If they crossed into a science fiction drama, they would rely on their superb intellect, like Iron Man.",

        u'Capricornus': "Thor\n Capricorn people have very strong inner convictions; once they believe in something, they hold on to it. If they crossed into a science fiction drama, they would have superpowers like Thor's: able to summon lightning, both righteous and gentle. It is a fine superpower.",

        u'aquarius': "Cyclops\n Aquarius people generally have a pair of sharp eyes. If they crossed into a science fiction drama, they would have a superpower like Cyclops's laser eyes: whatever they look at, they can destroy.",

        u'Pisces': "Telepathy\n Pisces people have a very strong sixth sense and are attentive when communicating with others, so they naturally see into other people's minds. If they crossed into a science fiction drama, they would have telepathy, and the strongest could even control other people's thoughts!"
    }
    return xingzuosuper_map[xingzuo]


def img(xingzuo):
    xingzuofig_map = {
        u'Aries': "1",

        u'Taurus': "2",

        u'Gemini': "3",

        u'Cancer': "4",

        u'leo': "5",

        u'Virgo': "6",

        u'libra': "7",

        u'scorpio': "8",

        u'sagittarius': "9",

        u'Capricornus': "10",

        u'aquarius': "11",

        u'Pisces': "12"

    }
    # Path to save pictures
    img_path = './imgs/' + xingzuofig_map[xingzuo] +'.png'

    return img_path


def xzyunshi(xingzuo):
    xingzuoen_map = {
        u'Aries': "aries",

        u'Taurus': "taurus",

        u'Gemini': "gemini",

        u'Cancer': "cancer",

        u'leo': "leo",

        u'Virgo': "virgo",

        u'libra': "libra",

        u'scorpio': "scorpio",

        u'sagittarius': "sagittarius",

        u'Capricornus': "capricorn",

        u'aquarius': "aquarius",

        u'Pisces': "pisces"

    }

    url = "http://api.tianapi.com/txapi/star/index"

    # Define the request data and assign values to the data
    values = {}
    values['key'] = 'XXXX' ## APIKEY you applied for yourself
    values['astro'] = xingzuoen_map[xingzuo]

    # URL-encode the request data. For a GET request the query string stays a str;
    # only POST data would have to be bytes, e.g. urlencode(values).encode('utf-8')
    data = urllib.parse.urlencode(values)

    # Append the query string to the URL
    req = url + '?' + data
    # Send the request and get the response object
    response = urllib.request.urlopen(req)
    # Print the HTTP status code
    print(response.status)
    if response.status == 200:
        the_page = response.read()
        # Parse the JSON body; eval() on remote data is unsafe and fails on true/false/null
        rsts = json.loads(the_page.decode("utf-8"))
        #print(rsts["newslist"])
        yunshi = []
        yunshi.append('Composite index:' + rsts["newslist"][0]["content"])
        yunshi.append('Love index:' + rsts["newslist"][1]["content"])
        yunshi.append('Work index:' + rsts["newslist"][2]["content"])
        yunshi.append('Fortune index:' + rsts["newslist"][3]["content"])
        yunshi.append('Health index:' + rsts["newslist"][4]["content"])
        yunshi.append('Lucky color:' + rsts["newslist"][5]["content"])
        yunshi.append('Lucky number:' + rsts["newslist"][6]["content"])
        yunshi.append('Noble Constellation:' + rsts["newslist"][7]["content"])
        yunshi.append("Today's overview:" + rsts["newslist"][8]["content"])
        finalstr = ""
        for i in yunshi:
            finalstr += i+'\n'
        return finalstr


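def match_input(input):
    print(input)
    # Caution: match.read_text_pair splits each line on single spaces and skips
    # lines that do not yield exactly two fields, so the query and the reference
    # phrase must each be free of spaces
    with open("./predict.txt", "w", encoding="utf-8") as f:
        f.write(input + " Check constellation\n")
    rst = match.start()
    return int(rst)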

userstate = '0'


async def on_message(msg: Message):
    global userstate

    print(msg.talker().name)
    if msg.talker().name == '271828':
        # print(msg.talker().name)
        print("11",userstate)
        if userstate == '1-1':
            text = msg.text()
            print(text)
            # Parse "month.day" (e.g. 5.7 or 12.25) instead of indexing single characters
            month, day = text.split('.')[:2]
            rst = xingzuo(int(month), int(day))
            await msg.talker().say('Your sign is ' + rst)
            selfsuper = super(rst)
            await msg.talker().say("Your sign's superpower: " + selfsuper)
            imgpath = img(rst)
            file_box_xz = FileBox.from_file(imgpath)
            await msg.talker().say(file_box_xz)
            yunshi = xzyunshi(rst)
            print(yunshi)
            userstate = '0'
            await msg.say("Your luck today:\n" + yunshi)
            # The birthday message is fully handled; do not pass it on to the matcher or chat model
            return


        rst = match_input(msg.text())
        if rst == 1:
            userstate = '1-1'
            await msg.talker().say('Please tell me your birthday in the format month.day, for example 5.7 for May 7')
            await msg.say("You don't need to add a year")

        else:

            # Handle a pending year-of-birth answer first, so that the 'Zodiac'
            # command itself is never parsed as a year
            if userstate == '2-1':
                year = msg.text()
                print(year)
                rst = chinese_shuxiang(int(year))
                await msg.say('Your Chinese zodiac animal: ' + rst)

                userstate = '0'

            elif msg.text() == 'ding':
                await msg.say('This is an automatic reply: dong dong dong')

            elif msg.text() == 'hi' or msg.text() == 'Hello':
                await msg.say(
                    "This is an automatic reply: Many young people today enjoy science fiction dramas such as the Avengers, whose heroes and superpowers are part of our youth. What superpowers would the 12 zodiac signs have if they crossed over into those stories?\nThe bot currently supports:\n- Send \"Zodiac\" and follow the prompts to get your Chinese zodiac animal\n- Send \"constellation\" and follow the prompts to get your sign, today's fortune, and your superpower in the science fiction world")

            elif msg.text() == 'Zodiac':
                userstate = '2-1'
                await msg.say('Please enter your year of birth as digits only, such as 1998')

            else:
                rst = chat.chat(msg.text())
                await msg.say(rst)


async def on_scan(
        qrcode: str,
        status: ScanStatus,
        _data,
):
    print('Status: ' + str(status))
    print('View QR Code Online: https://wechaty.js.org/qrcode/' + qrcode)


async def on_login(user: Contact):
    print(user)


async def main():
    # Make sure WECHATY_PUPPET_SERVICE_TOKEN is set in the environment variables
    if 'WECHATY_PUPPET_SERVICE_TOKEN' not in os.environ:
        print('''
            Error: WECHATY_PUPPET_SERVICE_TOKEN is not found in the environment variables
            You need a TOKEN to run the Python Wechaty. Please goto our README for details
            https://github.com/wechaty/python-wechaty-getting-started/#wechaty_puppet_service_token
        ''')

    bot = Wechaty()

    bot.on('scan', on_scan)
    bot.on('login', on_login)
    bot.on('message', on_message)

    await bot.start()

    print('[Python Wechaty] Ding Dong Bot started.')



asyncio.run(main())
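
The two lookups in main.py (year to zodiac animal, birthday to sign) can also be written without range tables that need both a start and an end date. The following is an alternative sketch, not the author's code: the names `ANIMALS`, `SIGN_STARTS`, `animal_for_year`, and `sign_for_date` are illustrative, the start dates are taken from the table in main.py, and, like main.py, it uses the Gregorian year and ignores the lunar new year boundary.

```python
ANIMALS = ["rat", "ox", "tiger", "rabbit", "dragon", "snake",
           "horse", "sheep", "monkey", "rooster", "dog", "pig"]

# (sign, first day) in calendar order; each sign runs until the next entry starts
SIGN_STARTS = [
    ("Capricorn", (1, 1)), ("Aquarius", (1, 20)), ("Pisces", (2, 19)),
    ("Aries", (3, 21)), ("Taurus", (4, 21)), ("Gemini", (5, 21)),
    ("Cancer", (6, 22)), ("Leo", (7, 23)), ("Virgo", (8, 23)),
    ("Libra", (9, 23)), ("Scorpio", (10, 23)), ("Sagittarius", (11, 23)),
    ("Capricorn", (12, 22)),
]


def animal_for_year(year: int) -> str:
    """1900 was a rat year and the cycle repeats every 12 years."""
    return ANIMALS[(year - 1900) % 12]


def sign_for_date(month: int, day: int) -> str:
    """Return the last sign whose start date is not after (month, day)."""
    current = SIGN_STARTS[0][0]
    for name, start in SIGN_STARTS:
        if (month, day) >= start:
            current = name
    return current


print(animal_for_year(1998), sign_for_date(5, 7))
```

Storing only start dates means the ranges cannot overlap or leave a day unassigned, which is easy to get wrong when every entry carries its own end date. The sketch does not validate that the date itself exists.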

Postscript

Parts of the project that can be improved

  • The chat model is not strong enough: it can produce confused replies and tends to continue the text rather than hold a dialogue. It may even produce offensive words, probably because the pre-training corpus was not cleaned. A future improvement is to fine-tune on a clean dialogue corpus better suited to the project.
  • The way semantic matching is invoked is too simplistic; I hope to improve it in the future.
  • There are not enough features. With the excellent resources Paddle provides, many more fun features could be built.

Mutual communication and progress

  • My research area is NLP. Everyone is welcome to get in touch so we can learn and improve together. This is the collection of NLP datasets I have gathered: https://github.com/27182812/NLP-dataset

  • If you find it useful, please give it a Star *_*

Finally, if you like the project, remember to fork and like it. Thank you!!!

Tags: NLP paddlepaddle

Posted on Mon, 08 Nov 2021 21:42:56 -0500 by liquidchild