A detailed guide to training models on a server

Here are some problems I ran into, and how I solved them, while trying to use my school's server.

Software download and installation

Required software (this is what I use; other tools work just as well)
1. Xshell
Xshell is used to connect to the server and run commands (other software, such as ssh, also works).
Download and installation: free copies can be found online (supporting the licensed version is recommended). After downloading, follow the installer's prompts.
2. Xmanager
Download and installation: if you do not need graphical features, you can skip this one. As with Xshell, free copies can be found online (supporting the licensed version is recommended). After downloading, follow the installer's prompts.
3. PyCharm
Download and installation: there is a Community edition and a Professional edition. The Community edition is free and covers the general features. The Professional edition has a one-month trial, after which a license is required. Here PyCharm is only used to upload code and download trained weights, so the Community edition is enough.

Configure Xshell

Right-click All Sessions in the Session Manager and choose New -> Session
Note: the "New Session 1" entry shown here is a session I created earlier. It will not be there the first time you open Xshell

Enter the session name (optional), the host, and the port number

Under SSH -> Tunneling, set X11 forwarding to Xmanager

Connecting to the server with Xshell

If the server is on an intranet, such as a school network, log in to the intranet first

Enter your user name

Enter your password

Connection succeeded
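
For readers who use a plain ssh client instead of Xshell, the same connection can be sketched as a small shell function. This is only a sketch: connect_server is a made-up helper name, and the host, user, and port in the example are placeholders, not values from my server. The -X option asks ssh for X11 forwarding, which corresponds to the X11 setting configured in Xshell.

```shell
# Sketch: connect with OpenSSH instead of Xshell.
# connect_server is a hypothetical helper name.
connect_server() {
    host="$1"          # server address
    user="$2"          # your user name on the server
    port="${3:-22}"    # SSH port; 22 is the default
    # -X enables X11 forwarding, -p sets the port.
    ssh -X -p "$port" "$user@$host"
}

# Example with placeholder values:
#   connect_server gpu.example.edu alice 2222
```

ssh then asks for the password, just as Xshell does.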

Configure PyCharm

Both the Community edition and the Professional edition can be configured, though the menu locations may differ. The Professional edition is used here

First, select Tools in the menu bar

Select Deployment -> Configuration

Click the "+" sign -> SFTP

Click '...' next to SSH configuration

Enter the Host address, User name, and Password. If the port is not the default port 22, change the port number as well

Select Save password and click Test Connection. If the prompt shown in the figure appears, the connection succeeded

Choose Tools -> Deployment -> Browse Remote Host to open the remote host

The Remote Host panel is displayed

Click the down arrow and select the SFTP configuration you just created

Your own user folder is visible in this directory. It is best to upload code into your own folder

To upload or download files, just select and drag them. For example, to upload code to the server, select the code on the left side of PyCharm and drag it to the target folder on the right. To download trained weights, select the weight file on the server on the right and drag it to a folder on the left
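
The same upload and download can also be done from a terminal with scp, which may be handy when PyCharm is not available. The sketch below is an assumption about how one might wrap it: upload_code and download_weights are made-up helper names, and every path and address in the usage comments is a placeholder.

```shell
# Sketch: copy files with scp instead of dragging them in PyCharm.
# upload_code and download_weights are hypothetical helper names.
upload_code() {
    # usage: upload_code LOCAL_DIR USER@HOST REMOTE_DIR [PORT]
    # -r copies a directory recursively, -P sets the SSH port.
    scp -r -P "${4:-22}" "$1" "$2:$3"
}

download_weights() {
    # usage: download_weights USER@HOST REMOTE_FILE LOCAL_DIR [PORT]
    scp -P "${4:-22}" "$1:$2" "$3"
}

# Examples with placeholder values:
#   upload_code ./project alice@gpu.example.edu /home/alice/project 2222
#   download_weights alice@gpu.example.edu /home/alice/project/model.h5 ./weights 2222
```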

Install dependencies

A freshly configured server environment usually lacks some of the packages the code needs, so install the dependencies before training.

pip timeout problem: some servers are not normally connected to the Internet, so commands run on the server behave as if the machine were offline.
Solution: run this command before using pip.
Note: this command cannot be used as is. Several parts must be replaced with values for your own environment: replace every <server address> with your own server address, and change (X11; Ubuntu; Linux x86_64; rv:81.0) to match the server's Linux version. The account and password fields near the end (DDDDD and upass) must be your own as well. Most of this information can be found with commands. If you cannot find it, ask the server administrator.

curl 'http://<server address>/0.htm' \
  -H 'User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:81.0) Gecko/20100101 Firefox/81.0' \
  -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8' \
  -H 'Accept-Language: zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2' \
  --compressed \
  -H 'Content-Type: application/x-www-form-urlencoded' \
  -H 'Origin: http://<server address>' \
  -H 'DNT: 1' \
  -H 'Connection: keep-alive' \
  -H 'Referer: http://<server address>/0.htm' \
  -H 'Upgrade-Insecure-Requests: 1' \
  --data-raw 'R3=1&v6ip=&DDDDD=<your account>&upass=<your password>&save_me=1&0MKKey=123'

Once the server is online, pip works normally. However, many packages are hosted abroad, so downloads can be very slow. Use a mirror, and pay attention to the exact package name when installing.
Take OpenCV as an example:

pip install opencv-python -i https://pypi.tuna.tsinghua.edu.cn/simple

The -i https://pypi.tuna.tsinghua.edu.cn/simple after the package name tells pip to use the Tsinghua mirror.
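
To avoid typing the mirror address on every install, the mirror can be made the default for the current shell session. The sketch below relies on pip reading its index URL from the PIP_INDEX_URL environment variable; whether this is preferable to passing -i each time is a matter of taste.

```shell
# Sketch: make the Tsinghua mirror the default index for this shell session,
# so "-i ..." does not have to be repeated on every install.
export PIP_INDEX_URL="https://pypi.tuna.tsinghua.edu.cn/simple"

# Later installs in the same session then use the mirror, for example:
#   pip install opencv-python
```

For a setting that survives logout, pip config set global.index-url followed by the same address writes it to pip's configuration file instead.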

Problems with CUDA installation

There are plenty of guides on installing the Linux version of CUDA and TensorFlow, so here I focus on cuDNN.
When using GPU acceleration, some cuDNN libraries cannot be loaded because of how the environment variables are configured. The program then reports an error saying that a certain file cannot be found at a certain path. My fix is to search for that file and copy it to the path named in the error.
Command for searching for a file:

locate filename
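
An alternative to copying the file, sketched here as an assumption rather than something I tested on this server, is to find the directory that holds the library and add it to LD_LIBRARY_PATH. find_lib_dir is a made-up helper name, and the search root and library file name in the usage comment are placeholders; use the file name from your own error message.

```shell
# Sketch: locate a library file and print the directory that contains it.
# find_lib_dir is a hypothetical helper name.
find_lib_dir() {
    # usage: find_lib_dir SEARCH_ROOT FILE_NAME
    # Prints the directory of the first match, or nothing if not found.
    match=$(find "$1" -name "$2" 2>/dev/null | head -n 1)
    if [ -n "$match" ]; then
        dirname "$match"
    fi
}

# Example with placeholder values:
#   dir=$(find_lib_dir /usr/local libcudnn.so.8)
#   export LD_LIBRARY_PATH="$dir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```

Unlike locate, find does not depend on a prebuilt index, so it also sees files installed a moment ago, at the cost of a slower search.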

Training time

Each server has an idle time limit and disconnects automatically after about 30 minutes without activity. That is not long enough to train many models, so we need a way to extend the time or prevent disconnection altogether.
1. Script
Right-click the web page -> Inspect -> Console, and enter a script in the console that prevents disconnection (it works by interacting with the page at regular intervals).
Take Colab and Kaggle as examples.
Script for Colab:

// Click Colab's Connect button once a minute.
function ConnectButton(){
    console.log("Connect pushed");
    document.querySelector("#top-toolbar > colab-connect-button").shadowRoot.querySelector("#connect").click();
}
setInterval(ConnectButton,60000);

// Click the dismiss button of the sessions dialog once a minute.
function closeButton(){
    console.log("close");
    document.querySelector("body > colab-dialog > paper-dialog > colab-sessions-dialog").shadowRoot.querySelector("#footer > div > paper-button.dismiss").click();
}
setInterval(closeButton,60000);

Script for Kaggle:

// Click the toolbar button once a minute.
function clickToolbarButton(){
    console.log("close");
    document.querySelector("#root > div > div > div.AppView-sc-16eb2j.kZXkZl > div.App_Body-sc-16c8j4p.hxOBfv > div.Layout_Body-sc-6piylv.bXAYPy > div > div > div > div.ToolbarContainer_Body-sc-2h8iu7.fhvgBU > button").click();
}
setInterval(clickToolbarButton,60000);

// Click the icon inside the detailed-status button once a minute.
function clickStatusButton(){
    console.log("close");
    document.querySelector("#root > div > div > div.AppView-sc-16eb2j.kZXkZl > div.App_Body-sc-16c8j4p.hxOBfv > div.Layout_Body-sc-6piylv.bXAYPy > div > div > div > div.ToolbarContainer_Body-sc-2h8iu7.fhvgBU > div.DetailedStatus_Body-sc-zfwb95.fMzpPO > button > i").click();
}
setInterval(clickStatusButton,60000);

Disadvantage: as the scripts above show, every website needs different code, which is tedious. So next I introduce the nohup command, which keeps a job running after you disconnect.

2. The nohup command
Run nohup on the command line in Xshell or ssh. Put nohup in front of the command you want to execute, and the command keeps running after you disconnect. This works for commands that run Python files as well. While the command runs, nothing is printed to the screen. The output that would appear on screen without nohup is written to a nohup.out file created in the current folder.

Once the job is running, you can close Xshell, or even shut down your own computer, and the server will keep running it on its own.
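
A typical invocation can be sketched as follows. The sh -c command is only a stand-in so the example finishes immediately, and train.py and train.log are placeholder names. Redirecting the output to a named log replaces the default nohup.out, and the trailing & puts the job in the background so the terminal stays usable.

```shell
# Sketch: start a job with nohup so it survives logout.
# The sh -c command is a stand-in for a real training command such as
#   nohup python train.py > train.log 2>&1 &
# (train.py and train.log are placeholder names).
log=train.log
nohup sh -c 'echo "training started"' > "$log" 2>&1 &
job_pid=$!          # PID of the background job, handy for kill later
wait "$job_pid"     # demo only: a real job would be left running
cat "$log"          # the output went to the log, not to the screen
```

For a job that is still running, tail -f on the log file shows new output as it is written.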

nohup.out takes up too much space

When the model is trained for many epochs, nohup.out accumulates a lot of text and takes up a lot of space. My approach is to delete nohup.out once I am sure the job has started. The job keeps running normally (I'm not sure why this works, so I can't promise every server supports it), and I no longer have to worry about nohup.out growing too large.
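
On Linux, deleting a file that a process still has open only removes its name. The process keeps writing to it, and the disk space is released only when the process exits. If the goal is to free space while the job runs, emptying the file in place may work better. The sketch below assumes the log was opened in append mode, which is how nohup opens nohup.out; truncate_log is a made-up helper name.

```shell
# Sketch: empty a growing log in place instead of deleting it.
# truncate_log is a hypothetical helper name.
truncate_log() {
    # usage: truncate_log FILE
    # ":" produces no output, so the redirection cuts FILE to zero bytes
    # while the file itself, and any process writing to it, stays in place.
    : > "$1"
}

# Example: truncate_log nohup.out
# If the output is not needed at all, it can be discarded from the start
# (train.py is a placeholder name):
#   nohup python train.py > /dev/null 2>&1 &
```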

Terminate the nohup process

First, find the process.
Enter the command:

ps aux | less

Output:

Note that the second column is the process ID (PID) and the last column is the COMMAND. For a Python program, the COMMAND looks like: python program_name arguments
Then kill the process:

kill -9 PID
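
Searching the ps output can be avoided by saving the PID when the job is started. The sketch below is one way to do that, not the only one: sleep 300 stands in for the training command, and train.pid is a placeholder file name. It sends the default SIGTERM first, which lets the program exit cleanly; kill -9 is the fallback when that is ignored.

```shell
# Sketch: record the PID at launch, then stop the job with it later.
nohup sleep 300 > /dev/null 2>&1 &    # stand-in for the training command
echo $! > train.pid                   # save the PID of the background job

# Later, possibly from a new login session:
pid=$(cat train.pid)
kill "$pid"                           # SIGTERM; try this before kill -9
```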

Tags: Python Pycharm ssh AI server

Posted on Mon, 06 Sep 2021 20:04:48 -0400 by Ravrflavr