First experience with pyautogui

1. Foreword

As a student majoring in automation, I associate the word "automation" by default with the automatic control of production lines. Since I have never visited an automated factory in person, my sense of what "automation" means has come only from books.

When I first picked up Python, I kept hearing about its automation capabilities, which almost every Python tutorial mentions. Until recently, though, Python had not shown me any "automation" beyond a very simple syntax and the ability to write and call machine learning and deep learning tools.

That is, until I came across pyautogui, a real gem of a library...

2. pyautogui

In short, pyautogui lets a Python script control the device's mouse and keyboard, so scripts can interact with applications automatically, without any manual typing or clicking.

At present, I mainly use these operations:

  • mouse

These two functions, pyautogui.moveTo() and pyautogui.move() (also called moveRel()), move the mouse: the first is an absolute move and the second a relative move. For an absolute move you pass a position measured from the upper-left corner of the whole screen, with x growing to the right and y growing downward. A relative move takes the current mouse position as the origin, with x and y behaving the same way. Absolute coordinates are therefore always non-negative, while relative offsets can be negative (moving left or up).

How do you determine the target position for a mouse movement? There are two methods:


The first method, pyautogui.position(), takes no arguments and returns the mouse's current absolute position. To use it, move the mouse to the target by hand, call position() to read the coordinates, and then pass those coordinates to the movement functions.

The second method is quite striking: it works visually. pyautogui.locateOnScreen() takes an image as input (usually a screenshot of the target) and returns the screen coordinates of the image's upper-left corner along with its width and height. This matches how people actually work and is insensitive to position. For example, if you want to click something on a web page automatically and the page layout later shifts, a stored absolute position becomes stale, but the vision-based method can still find the element in its new place. The drawbacks are that the recognition rate is sometimes low, and the image may match at more than one position.

Besides movement, the other two mouse operations are clicking and scrolling:

The click function, pyautogui.click(), defaults to a single left click, and its input is the number of clicks (ps. the official documentation shows it can also take a position directly, or locate the click target from a picture; so far I simply move the mouse to the target and then click). The scroll function, pyautogui.scroll(), takes the distance to move the page: a positive number scrolls up and a negative number scrolls down.

  • keyboard
    Keyboard operations include typing and shortcut keys:

The first two, pyautogui.write() and pyautogui.typewrite(), can only type English (ASCII) characters. Chinese characters have to be copied and pasted instead. For example, to input 'China' (中国), you can do the following:


The pyperclip package is installed along with pyautogui when you install it with pip. The paste step can also be done with pyautogui.hotkey('ctrl', 'v'), which is introduced below.

The third, pyautogui.press(), is the general key-press method; to press several keys you pass a list of key names, and the function presses them in turn. As the name suggests, the last method, pyautogui.hotkey(), presses shortcut keys, that is, two or three keys held down at the same time. The same effect can be achieved with pyautogui.keyDown() and pyautogui.keyUp(), though that is more verbose. For example:

pyautogui.hotkey('ctrl', 'shift', 'esc')

is equivalent to:

pyautogui.keyDown('ctrl')
pyautogui.keyDown('shift')
pyautogui.keyDown('esc')
pyautogui.keyUp('esc')
pyautogui.keyUp('shift')
pyautogui.keyUp('ctrl')
All the key names listed in the official documentation are shown below; you can probably guess what each one represents.

['\t', '\n', '\r', ' ', '!', '"', '#', '$', '%', '&', "'", '(',
')', '*', '+', ',', '-', '.', '/', '0', '1', '2', '3', '4', '5', '6', '7',
'8', '9', ':', ';', '<', '=', '>', '?', '@', '[', '\\', ']', '^', '_', '`',
'a', 'b', 'c', 'd', 'e','f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o',
'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '{', '|', '}', '~',
'accept', 'add', 'alt', 'altleft', 'altright', 'apps', 'backspace',
'browserback', 'browserfavorites', 'browserforward', 'browserhome',
'browserrefresh', 'browsersearch', 'browserstop', 'capslock', 'clear',
'convert', 'ctrl', 'ctrlleft', 'ctrlright', 'decimal', 'del', 'delete',
'divide', 'down', 'end', 'enter', 'esc', 'escape', 'execute', 'f1', 'f10',
'f11', 'f12', 'f13', 'f14', 'f15', 'f16', 'f17', 'f18', 'f19', 'f2', 'f20',
'f21', 'f22', 'f23', 'f24', 'f3', 'f4', 'f5', 'f6', 'f7', 'f8', 'f9',
'final', 'fn', 'hanguel', 'hangul', 'hanja', 'help', 'home', 'insert', 'junja',
'kana', 'kanji', 'launchapp1', 'launchapp2', 'launchmail',
'launchmediaselect', 'left', 'modechange', 'multiply', 'nexttrack',
'nonconvert', 'num0', 'num1', 'num2', 'num3', 'num4', 'num5', 'num6',
'num7', 'num8', 'num9', 'numlock', 'pagedown', 'pageup', 'pause', 'pgdn',
'pgup', 'playpause', 'prevtrack', 'print', 'printscreen', 'prntscrn',
'prtsc', 'prtscr', 'return', 'right', 'scrolllock', 'select', 'separator',
'shift', 'shiftleft', 'shiftright', 'sleep', 'space', 'stop', 'subtract', 'tab',
'up', 'volumedown', 'volumemute', 'volumeup', 'win', 'winleft', 'winright', 'yen',
'command', 'option', 'optionleft', 'optionright']

3. Postscript

I think pyautogui, as an automation power tool, could also be combined with web crawlers and even natural-language-understanding tools to build more intelligent automation. The library itself is still improving: according to the official roadmap, planned features include the following.

  • A tool for determining why an image can't be found in a particular screenshot. (This is a common source of questions for users.)
  • Full compatibility on Raspberry Pis.
  • "Wave" function, which is used just to see where the mouse is by shaking the mouse cursor a bit. A small helper function.
  • locateNear() function, which is like the other locate-related screen reading functions except it finds the first instance near an xy point on the screen.
  • Find a list of all windows and their captions.
  • Click coordinates relative to a window, instead of the entire screen.
  • Make it easier to work on systems with multiple monitors.
  • GetKeyState() type of function
  • Ability to set global hotkey on all platforms so that there can be an easy "kill switch" for GUI automation programs.
  • Optional nonblocking pyautogui calls.
  • "strict" mode for keyboard - passing an invalid keyboard key causes an exception instead of silently skipping it.
  • rename keyboardMapping to KEYBOARD_MAPPING
  • Ability to convert png and other image files into a string that can be copy/pasted directly in the source code, so that they don't have to be shared separately with people's pyautogui scripts.
  • Test to make sure pyautogui works in Windows/mac/linux VMs.
  • A way to compare two images and highlight differences between them (good for pointing out when a UI changes, etc.)

Python automation: the future looks bright!

Tags: Python

Posted on Tue, 30 Nov 2021 22:43:29 -0500 by pythian