The last article was mainly about installation. Two problems actually came up afterwards: once TensorFlow was installed, Anaconda Prompt reported an error, and Anaconda would not open.
Problem solving
1. Anaconda Prompt reports an error
You may run into this problem: the error appears as soon as you open the prompt, and you can't type any commands. The only fix I found was to uninstall and reinstall Anaconda, which amounts to starting the whole process over.
I did find the cause, though. The following command was used during installation:
pip install --ignore-installed --upgrade tensorflow
Writing just the following is enough. That said, the error seems to be a matter of chance and the first command may not always trigger it. I may simply have been unlucky.
pip install tensorflow
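After either command, it can help to confirm that the package is visible to the Python environment you actually run. The snippet below is a minimal sketch, not part of the original setup: is_installed is a hypothetical helper name, and it only checks that the package can be located, not that it imports cleanly.

```python
import importlib.util


def is_installed(package_name):
    """Return True if the package can be located in the current environment.

    Hypothetical helper for illustration; it does not import the package,
    so it says nothing about whether the installation is healthy.
    """
    return importlib.util.find_spec(package_name) is not None


print("tensorflow found:", is_installed("tensorflow"))
```

If this prints False inside the environment where pip reported success, pip and Python are probably pointing at different environments.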
2. Anaconda opens without the main page
After clicking the icon, the green loading circle appears and then nothing happens; the main page never opens. Clicking the green circle makes it disappear and brings up an error, which roughly says that an instance of Anaconda is already running. I found solutions to this problem on CSDN, which offers several, such as:
Modifying a file and upgrading
Uninstalling completely, clearing the leftover files, and reinstalling
In the end I fixed it by changing the version number, but I have forgotten the exact steps, and the post describing that method is buried somewhere among the search results. If you come across it, give it a try. I'll update this next time.
Using BERT in practice
I ran the .csv file the teacher gave us through BERT and generated the representation vectors for each column. The teacher only gave us a piece of source code and asked us to tidy it up and rewrite it to suit the project. This is the version I ended up with:
from bert_serving.client import BertClient
import numpy as np
from pandas import read_csv

bc = BertClient(ip="localhost", check_length=False)


def write_txt(root_dir, content):
    # root_dir is the path of the file to write to (string);
    # content is the text to be written (string).
    # 'a+' appends after the existing content; 'utf-8' is the encoding
    # used for writing and can be replaced with 'utf-16'.
    with open(root_dir, 'a+', encoding='utf-8') as f:
        f.write(content)


def generate_text(data_path):
    items = read_csv(data_path)
    # Write only the column named 'routeName', one value per line,
    # without the index or the column header. The encoding is utf-8.
    items.to_csv('routeName.txt', sep='\t', index=False, header=False,
                 columns=['routeName'], encoding='utf-8')


generate_text('data/Travel Package Information.csv')


def read_txt(data_path):
    # Read the text file and return its lines as a list.
    with open(data_path, 'r', encoding='utf-8') as f:
        lines = f.readlines()
    return lines


def embedding_item(data_path):
    lines = read_txt(data_path)
    content_list = []
    for line in lines:
        content_list.append(line.strip("\n"))
    # Encode every line into a vector and save the result as a .npy file.
    vec = bc.encode(content_list)
    print("vec shape:", vec.shape)
    np.save("data/ic routeName.npy", vec)
    print("end")


embedding_item(data_path='routeName.txt')
This is only a first pass. It handles one column at a time, and the column has more than 60,000 rows. It ran for more than 20 minutes on my machine, while the teacher's computer finished in one second... I'm really envious.
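One possible way to cut the running time on a slow machine, assuming the column contains many repeated values (which I have not checked for this data), is to encode each distinct string only once and copy the vector back to every row that uses it. The sketch below is not the teacher's method. encode_unique is a hypothetical helper, and fake_encode stands in for bc.encode so the example runs without a BERT server.

```python
import numpy as np


def encode_unique(lines, encode_fn):
    """Encode each distinct string once, then expand back to the row order.

    encode_fn is any callable that takes a list of strings and returns one
    vector per string, for example bc.encode (assumption: it returns a 2-D
    array with rows in the same order as the input list).
    """
    unique = list(dict.fromkeys(lines))  # distinct values, first-seen order
    index = {text: i for i, text in enumerate(unique)}
    vecs = np.asarray(encode_fn(unique))
    return vecs[[index[text] for text in lines]]


def fake_encode(texts):
    # Stand-in encoder for the demo: a 1-dimensional "vector" per string.
    return np.array([[float(len(t))] for t in texts])


demo_lines = ["a", "bb", "a", "ccc", "bb"]
demo_vecs = encode_unique(demo_lines, fake_encode)
print("demo shape:", demo_vecs.shape)
```

With a real client the call would be encode_unique(content_list, bc.encode). If the column has few repeats, this saves nothing.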
The following is the code given by the teacher. It outputs four columns in a single run and reports the elapsed time at the end.
from bert_serving.client import BertClient
import numpy as np
from pandas import read_csv
import time


def write_txt(root_dir, content):
    # Append content to the file at root_dir.
    with open(root_dir, 'a+', encoding='utf-8') as f:
        f.write(content)


def generate_text(data_path):
    # Write each of the four columns to its own text file,
    # one value per line, without the index or the column header.
    items = read_csv(data_path)
    items.to_csv('data/routeName.txt', sep='\t', index=False,
                 columns=['routeName'], encoding="utf_8", header=False)
    items.to_csv('data/destination.txt', sep='\t', index=False,
                 columns=['destination'], encoding="utf_8", header=False)
    items.to_csv('data/destinationLarge.txt', sep='\t', index=False,
                 columns=['destinationLarge'], encoding="utf_8", header=False)
    items.to_csv('data/type.txt', sep='\t', index=False,
                 columns=['type'], encoding="utf_8", header=False)


def read_txt(data_path):
    with open(data_path, 'r', encoding='utf-8') as f:
        lines = f.readlines()
    return lines


def embedding_item(feature):
    # Encode every line of data/<feature>.txt and save the vectors
    # to data/<feature>.npy. Uses the global client bc created below.
    lines = read_txt('data/' + feature + '.txt')
    content_list = []
    for line in lines:
        content_list.append(line.strip("\n"))
    vec = bc.encode(content_list)
    print(feature + " vec shape:", vec.shape)
    np.save("data/" + feature + '.npy', vec)
    print(feature + " Embedding end!")


if __name__ == "__main__":
    starttime = time.time()
    bc = BertClient(ip="localhost", check_length=False)
    generate_text('data/Travel Package Information.csv')
    embedding_item(feature='routeName')
    embedding_item(feature='destination')
    embedding_item(feature='destinationLarge')
    embedding_item(feature='type')
    endtime = time.time()
    running_time = endtime - starttime
    print('Running Time:', running_time / 60.0, 'minutes')
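Once the .npy files are written, a quick sanity check is to load one back and confirm it has one vector per input line. The following is a minimal sketch: check_embeddings is a hypothetical helper, and the demo saves a stand-in array of zeros with 768 columns (an assumption; the real width depends on the model the server is running) so that it runs without the real data.

```python
import os
import tempfile

import numpy as np


def check_embeddings(npy_path, expected_rows):
    """Load saved vectors and verify there is one row per input line."""
    vec = np.load(npy_path)
    if vec.ndim != 2 or vec.shape[0] != expected_rows:
        raise ValueError(
            "expected %d rows, got array of shape %s" % (expected_rows, vec.shape)
        )
    return vec.shape


# Demo with a stand-in array instead of real BERT output.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "routeName.npy")
    np.save(demo_path, np.zeros((3, 768), dtype=np.float32))
    demo_shape = check_embeddings(demo_path, expected_rows=3)

print("demo shape:", demo_shape)
```

For the real files, expected_rows would be the number of lines in the matching .txt file, for example len(read_txt('data/routeName.txt')).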
Rewriting code really is enough to make your hair fall out.