Introduction to Hexagon GDB Debugger (25)

2.12.13 Character Sets

If the debugger and the debugged program use different character sets to represent characters and strings, the debugger can automatically convert between character sets for you. The character set used by the debugger is called the host character set, and the character set used by inferior programs is called the target character set.

For example, suppose you run the debugger on a GNU/Linux system that uses the ISO Latin 1 character set, and you use the debugger's remote protocol (see Section 3.3) to debug a program running on an IBM mainframe that uses the EBCDIC character set. Then the host character set is Latin 1 and the target character set is EBCDIC. If you give the debugger the command set target-charset EBCDIC-US, the debugger will translate between EBCDIC and Latin 1 when you print character or string values, or use character and string literals in expressions.

The debugger cannot automatically recognize the character set used by inferior programs; you must tell it, using the set target-charset command described below.
The following commands are used to control debugger character set support:

set target-charset charset
Sets the current target character set to charset. The character set names the debugger recognizes are listed below, but if you type set target-charset followed by <TAB><TAB>, the debugger will list the target character sets it supports.

set host-charset charset
Sets the current host character set to charset.
By default, the debugger uses a host character set appropriate to the system it is running on; you can override this default with the set host-charset command.
The debugger can only use certain character sets as its host character set. The character set names the debugger recognizes are listed below, with an indication of which ones can serve as the host character set, but if you type set host-charset followed by <TAB><TAB>, the debugger will list the host character sets it supports.

set charset charset
Sets both the current host and target character sets to charset. As above, if you type set charset followed by <TAB><TAB>, the debugger will list the names of the character sets that can be used for both host and target.

show charset
Displays the names of the current host and target character sets.

show host-charset
Displays the name of the current host character set.

show target-charset
Displays the name of the current target character set.

The debugger currently includes support for the following character sets:
ASCII
Seven-bit U.S. ASCII. The debugger can use it as its host character set.
ISO-8859-1
ISO Latin 1 character set. This extends ASCII with the accented characters required for French, German, and Spanish. The debugger can use it as its host character set.

EBCDIC-US
IBM1047
Variants of the EBCDIC character set, used on some of IBM's mainframe operating systems. (GNU/Linux on the S/390 uses U.S. ASCII.) The debugger cannot use these as its host character set.

Note:
    These are all single-byte character sets. More work inside the debugger is needed to support multi-byte or variable-width character encodings, such as the UTF-8 and UCS-2 encodings of Unicode.

The following is an example of debugger character set support. Suppose the following source code is placed in the file charset-test.c:

#include <stdio.h>

char ascii_hello[]
  = {72, 101, 108, 108, 111, 44, 32, 119,
     111, 114, 108, 100, 33, 10, 0};

char ibm1047_hello[]
  = {200, 133, 147, 147, 150, 107, 64, 166,
     150, 153, 147, 132, 90, 37, 0};

int
main (void)
{
  printf ("Hello, world!\n");
  return 0;
}

In this program, ascii_hello and ibm1047_hello are arrays containing the string Hello, world! followed by a newline, encoded in the ASCII and IBM1047 character sets, respectively.

We compile the program and invoke the debugger on it:

$ hexagon-gcc -g charset-test.c -o charset-test
$ hexagon-gdb -nw charset-test
GNU gdb 2001-12-19-cvs
Copyright 2001 Free Software Foundation, Inc.
...
(hexagon-gdb)

We can use the show charset command to see which character sets the debugger is currently using to interpret and display characters and strings:

(hexagon-gdb) show charset
The current host and target character set is `ISO-8859-1'.
(hexagon-gdb)

For the purposes of this example, let's use ASCII as our initial character set:

(hexagon-gdb) set charset ASCII
(hexagon-gdb) show charset
The current host and target character set is `ASCII'.
(hexagon-gdb)

Let's assume that ASCII is indeed the correct character set for our host system; in other words, let's assume that if the debugger prints characters using the ASCII character set, our terminal will display them properly. Since our current target character set is also ASCII, the contents of ascii_hello print legibly:

(hexagon-gdb) print ascii_hello
$1 = 0x401698 "Hello, world!\n"
(hexagon-gdb) print *(ascii_hello+0)
$2 = 72 'H'
(hexagon-gdb)

The debugger uses the target character set for character and string literals you use in expressions:

(hexagon-gdb) print '+'
$3 = 43 '+'
(hexagon-gdb)

The ASCII character set uses the number 43 to encode the + character.

The debugger relies on the user to tell it which character set the target program uses. If we print ibm1047_hello while our target character set is still ASCII, we get gibberish:

(hexagon-gdb) print ibm1047_hello
$4 = 0x4016a8 "\310\205\223\223\226k@\246\226\231\223\204Z%"
(hexagon-gdb) print *(ibm1047_hello+0)
$5 = 200 '\310'
(hexagon-gdb)

If we type the set target-charset command followed by <TAB><TAB>, the debugger tells us the character sets it supports:

(hexagon-gdb) set target-charset
ASCII       EBCDIC-US   IBM1047     ISO-8859-1
(hexagon-gdb) set target-charset

We can select IBM1047 as our target character set and examine the program's strings again. Now the ASCII string is wrong, but the debugger translates the contents of ibm1047_hello from the target character set, IBM1047, to the host character set, ASCII, and they display correctly:

(hexagon-gdb) set target-charset IBM1047
(hexagon-gdb) show charset
The current host character set is `ASCII'.
The current target character set is `IBM1047'.
(hexagon-gdb) print ascii_hello
$6 = 0x401698 "\110\145%%?\054\040\167?\162%\144\041\012"
(hexagon-gdb) print *(ascii_hello+0)
$7 = 72 '\110'
(hexagon-gdb) print ibm1047_hello
$8 = 0x4016a8 "Hello, world!\n"
(hexagon-gdb) print *(ibm1047_hello+0)
$9 = 200 'H'
(hexagon-gdb)

As mentioned above, the debugger uses the target character set for character and string literals you use in expressions:

(hexagon-gdb) print '+'
$10 = 78 '+'
(hexagon-gdb)

The IBM1047 character set encodes the + character with the number 78.


Posted on Thu, 18 Nov 2021 01:02:20 -0500 by donnierivera