A detailed look at low-level programming with Go's unsafe package [Go Language Bible notes]

Low-level programming

Go's design includes many safety features that restrict usages likely to cause runtime errors. Compile-time type checking catches most operations with mismatched types, such as subtracting one string from another. All built-in types, such as string, map, slice and chan, have strict type conversion rules.

For errors that cannot be detected statically, such as out-of-bounds array accesses or nil pointer dereferences, dynamic checks ensure that the program terminates immediately with an informative error message. Automatic memory management (garbage collection) eliminates most problems related to dangling pointers and memory leaks.

The Go implementation deliberately hides many low-level details. We cannot know the actual memory layout of a struct, obtain the machine code of a function, or know which operating system thread the current goroutine is running on. In fact, the Go scheduler is free to move a goroutine from one operating system thread to another. A pointer identifies a variable without revealing its numeric address, because the garbage collector may move the variable in memory as needed; when it does, pointers to the variable are updated automatically.

Overall, these properties make Go programs easier to predict and understand than programs written in low-level C, and less prone to crashes. By hiding the underlying details, they also make Go programs highly portable, since the language's semantics are largely independent of any particular compiler, operating system, or CPU architecture. (Not entirely independent: for example, the size of int depends on the machine word size, the order of evaluation of some expressions is unspecified, and compilers may impose additional implementation-specific restrictions.)

Occasionally, we may choose to give up some of these guarantees to achieve the highest possible performance, to interoperate with libraries written in other languages, or to implement functions that cannot be written in pure Go.

In this chapter, we'll see how to use the unsafe package to step outside Go's usual rules, how to create bindings for C libraries, and how to make system calls.

The techniques in this chapter should not be used lightly. If the details are not handled carefully, they can cause unpredictable, obscure failures that are hard to diagnose even for experienced C programmers. Using the unsafe package also forfeits Go's promise of compatibility with future releases, because, intentionally or not, such code easily comes to depend on unexported implementation details that may change in future versions of Go.

Note that the unsafe package is implemented in a special way. Although it can be imported and used like an ordinary package, it is actually implemented by the compiler. It provides access to a number of language internals that are not ordinarily available, especially details of memory layout. Presenting these features as a separate package makes the rare occasions on which they are needed more conspicuous. In addition, some environments may restrict the use of this package for security reasons.

The unsafe package is used extensively in low-level packages such as runtime, os, syscall and net, which must work closely with the operating system, but ordinary programs almost never need it.

unsafe.Sizeof, Alignof and Offsetof

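Originally, the unsafe package provided only these three functions (plus the Pointer type). Later releases added more: unsafe.Add and unsafe.Slice in Go 1.17, and unsafe.String, unsafe.StringData and unsafe.SliceData in Go 1.20.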

The unsafe.Sizeof function reports the size in bytes of the in-memory representation of its operand, which may be an expression of any type; the expression is not evaluated. A call to Sizeof is a constant expression of type uintptr, so the result can be used as the length of an array type or to compute other constants.

import "fmt"
import "unsafe"
fmt.Println(unsafe.Sizeof(float64(0)))  // 8

The size reported by Sizeof covers only the fixed-size part of each data structure, such as the pointer and length of a string, but not indirect parts such as the contents of the string. In Go, non-aggregate types usually have a fixed size, although the exact sizes may vary between toolchains. For portability, the table below gives the sizes of reference types (and types containing references) in machine words, where a word is 4 bytes on a 32-bit platform and 8 bytes on a 64-bit platform.

Computers load and store values most efficiently when their memory addresses are properly aligned. For example, the address of a 2-byte int16 should be even, the address of a 4-byte rune should be a multiple of 4, and the address of an 8-byte float64, uint64 or 64-bit pointer should be a multiple of 8. Larger alignment multiples are unnecessary: even large types such as complex128 need at most 8-byte alignment.

Because of alignment, the size of a value of an aggregate type (a struct or array) is at least the sum of the sizes of its fields or elements, and may be greater because of holes. A hole is unused space that the compiler inserts so that each following field or element is properly aligned relative to the start of the struct or array.

Type                              Size
bool                              1 byte
intN, uintN, floatN, complexN     N/8 bytes (for example, float64 is 8 bytes)
int, uint, uintptr                1 machine word
*T                                1 machine word
string                            2 machine words (data, len)
[]T                               3 machine words (data, len, cap)
map                               1 machine word
func                              1 machine word
chan                              1 machine word
interface                         2 machine words (type, value)

The Go specification does not require fields to be laid out in memory in the order they are declared, so in theory a compiler is free to rearrange them, although as of this writing none do. The following three structs have the same fields, but the first requires up to 50% more memory than the other two:

                                // 64-bit  32-bit
struct{ bool; float64; int16 }  // 3 words 4 words
struct{ float64; int16; bool }  // 2 words 3 words
struct{ bool; int16; float64 }  // 2 words 3 words

(Note: the point is that the specification would allow a compiler to lay out the first struct like the second or third to save space, whatever order the fields are written in. The gc compiler does not currently reorder fields, so the declaration order still determines the layout.)

The details of the alignment algorithm are beyond the scope of this book, and not every struct needs this attention, but efficient packing can make frequently allocated data structures more compact (see Go issue 10014), which may benefit both memory usage and performance.

The unsafe.Alignof function reports the required alignment of its argument's type. Like Sizeof, it may be applied to an expression of any type, and it yields a constant. Typically, boolean and numeric types are aligned to their own size (up to a maximum of 8 bytes), and all other types are aligned to the machine word size.

The argument of unsafe.Offsetof must be a field selector x.f; the function returns the offset of field f relative to the start of x, accounting for any holes.

Figure 13.1 shows a struct variable x and its typical memory layout on 32-bit and 64-bit machines. The gray regions are holes.

var x struct {
    a bool
    b int16
    c []int
}

Applying the three unsafe functions to x and to each of its three fields gives the following results:

32-bit system:

Sizeof(x)   = 16  Alignof(x)   = 4
Sizeof(x.a) = 1   Alignof(x.a) = 1 Offsetof(x.a) = 0
Sizeof(x.b) = 2   Alignof(x.b) = 2 Offsetof(x.b) = 2
Sizeof(x.c) = 12  Alignof(x.c) = 4 Offsetof(x.c) = 4

Despite the package name, these functions are not actually unsafe. They can be very helpful for understanding the raw memory layout of a program, especially when optimizing for space.

unsafe.Pointer

unsafe.Pointer is a special pointer type that can hold the address of any variable. Of course, we cannot indirect through an unsafe.Pointer with *p, because we do not know what type the resulting expression should have. Like ordinary pointers, unsafe.Pointer values are comparable and may be compared with nil to test for a null pointer.

An ordinary *T pointer can be converted to an unsafe.Pointer, and an unsafe.Pointer can be converted back to an ordinary pointer, which need not be of the same type *T.

By converting a *float64 pointer to a *uint64 pointer, we can view the bit pattern of a floating-point variable:

package math

func Float64bits(f float64) uint64 { return *(*uint64)(unsafe.Pointer(&f)) }

fmt.Printf("%#016x\n", Float64bits(1.0))  // 0x3ff0000000000000

(Note: here unsafe.Pointer serves as a bridge between the two pointer types.)

Through a pointer of the new type we can also update the bit pattern of the floating-point variable. Manipulating a float64 through its bits is harmless, since every bit pattern is a legal float64, but more generally, pointer conversions through unsafe.Pointer let us write arbitrary values to memory and thus subvert the type system.

An unsafe.Pointer can also be converted to a uintptr, which holds the pointer's numeric value and can be used for address arithmetic. (Note: the uintptr holds only the same number as the pointer; it is not itself a pointer. Recall from Chapter 3 that uintptr is an unsigned integer type wide enough to hold an address.) This conversion is also reversible, but converting a uintptr back to an unsafe.Pointer may subvert the type system, because not every number is a valid memory address.

Many unsafe.Pointer values are thus intermediaries for converting ordinary pointers to raw numeric addresses and back again, and such round trips are themselves unsafe. The following example takes the address of variable x, adds the offset of its field b, converts the resulting address to a *int16, and updates x.b through that pointer:

// gopl.io/ch13/unsafeptr

var x struct {
    a bool
    b int16
    c []int
}

// equivalent to pb := &x.b
pb := (*int16)(unsafe.Pointer(
    uintptr(unsafe.Pointer(&x)) + unsafe.Offsetof(x.b)))

*pb = 42
fmt.Println(x.b)  // 42

(Note: pointer arithmetic is possible only after the pointer has been converted, via unsafe.Pointer, to a uintptr. The address of x.b equals the starting address of x plus the offset of field b. The sum must then be converted back to unsafe.Pointer before it can become a *int16, because unsafe.Pointer is the required bridge for conversions between pointer types and between pointers and uintptr.)

Although this syntax is cumbersome, that is perhaps no bad thing, since these features should be used sparingly. Do not be tempted to introduce a temporary variable of type uintptr to break up the expression, because that can undermine the safety of the code. The following code is incorrect:

// note: subtly incorrect
tmp := uintptr(unsafe.Pointer(&x)) + unsafe.Offsetof(x.b)
pb := (*int16)(unsafe.Pointer(tmp))
*pb = 42

The reason is very subtle. Some garbage collectors move variables around in memory to reduce fragmentation; collectors of this kind are called moving GCs. When a variable is moved, every pointer that holds its old address must be updated to the new one. From the garbage collector's point of view, an unsafe.Pointer is a pointer to a variable, so its value must be updated when the variable moves; a uintptr, however, is just a number, so its value must not change. The incorrect code above hides a pointer from the garbage collector in the non-pointer variable tmp. By the time the second statement executes, x may already have moved, and the number in tmp would no longer be the address &x.b. The third statement would then write 42 to an arbitrary memory location, corrupting the program!

Similar mistakes take many forms. For example, consider this statement:

pT := uintptr(unsafe.Pointer(new(T)))  // NOTE: wrong!

No pointer refers to the variable created by new, so the garbage collector is entitled to reclaim its storage as soon as the statement completes, after which pT holds the address of a variable that no longer exists.

(Note: garbage collection algorithms broadly fall into two families. Reference counting is used by Python, one of the few mainstream languages to rely on it, which adds a separate cycle detector to handle reference cycles. Reachability analysis, or tracing, is used by Java and Go: a variable that cannot be reached from any root through a chain of references is considered garbage and may be reclaimed. A value referenced only by a temporary expression, like new(T) above, becomes unreachable as soon as that expression has been evaluated.)

Although current Go implementations do not use a moving garbage collector, that is no excuse for writing incorrect code: they already move some variables. As mentioned in Section 5.2, goroutine stacks grow dynamically as needed. When a stack grows, all variables on the old stack may be moved to a new, larger stack, so we cannot assume that a variable's address stays the same throughout its lifetime.

At the time of writing, there was little clear guidance on which conversions between unsafe.Pointer and uintptr are safe (see Issue7192; that issue has since been closed), so we strongly recommend assuming the worst. Treat every uintptr variable that holds the address of a variable as a bug, and minimize conversions from unsafe.Pointer to uintptr. In the first example above, all three operations happen within a single expression: converting the address to a uintptr, adding the field offset, and converting the result back to an unsafe.Pointer. (Today, the documentation of unsafe.Pointer spells out the conversion patterns that are valid.)

When calling a library function that returns an address as a uintptr, such as the reflect package methods below, convert the result to an unsafe.Pointer immediately so that it continues to point to the same variable.

package reflect

func (Value) Pointer() uintptr
func (Value) UnsafeAddr() uintptr
func (Value) InterfaceData() [2]uintptr  // index 1

Deep equality

The DeepEqual function from the reflect package reports whether two values are deeply equal. DeepEqual compares basic values using the built-in == operator; for composite values, it traverses them recursively and compares corresponding elements. Because it works on values of any type, even types that do not support ==, it is widely used in tests. For example, the following test uses DeepEqual to check whether two []string values are equal:

func TestSplit(t *testing.T) {
    got := strings.Split("a:b:c", ":")
    want := []string{"a", "b", "c"}
    if !reflect.DeepEqual(got, want) { /* ... */ }
}

Although DeepEqual is convenient and works on any type, it has drawbacks. For example, it does not consider a nil map equal to a non-nil empty map, nor a nil slice equal to a non-nil empty slice. (Note: as for why, I think this behavior is right, for the reasons given in the section on interface values.)

var a, b []string = nil, []string{}
fmt.Println(reflect.DeepEqual(a, b))  // false

var c, d map[string]int = nil, make(map[string]int)
fmt.Println(reflect.DeepEqual(c, d))  // false

Here we'll implement our own Equal function for comparing arbitrary values. Like DeepEqual, it compares slices and maps element by element; unlike DeepEqual, it treats a nil slice (or map) as equal to a non-nil empty one. The basic recursion over the arguments can be done with the reflect package, using an approach similar to the Display function in Section 12.3. As before, we define an unexported function, equal, for the recursion. Don't worry about the seen parameter yet. For each pair of values x and y to be compared, equal first checks that both (or neither) are valid, then checks that they have the same type. The rest of the function is a large switch whose cases compare two values of the same type. For reasons of space, we have omitted several cases that follow the same pattern.

// gopl.io/ch13/equal

func equal(x, y reflect.Value, seen map[comparison]bool) bool {  // Note: the map key type comparison is declared below
    if !x.IsValid() || !y.IsValid() {
        // Note: two invalid values are equal; a valid and an invalid value are not
        return x.IsValid() == y.IsValid()
    }
    if x.Type() != y.Type() {
        return false
    }

    // ...cycle check omitted (shown later)...

    switch x.Kind() {
    case reflect.Bool:
        return x.Bool() == y.Bool()
    case reflect.String:
        return x.String() == y.String()

    // ...numeric cases omitted for brevity... (Note: they follow the same pattern as Bool and String)

    case reflect.Chan, reflect.UnsafePointer, reflect.Func:
        return x.Pointer() == y.Pointer()
    case reflect.Ptr, reflect.Interface:  // Note: compare the values the pointer or interface refers to
        return equal(x.Elem(), y.Elem(), seen)
    case reflect.Array, reflect.Slice:
        if x.Len() != y.Len() {
            return false
        }
        for i := 0; i < x.Len(); i++ {
            if !equal(x.Index(i), y.Index(i), seen) {
                return false
            }
        }
        return true

    // ...struct and map cases omitted for brevity...
    }
    panic("unreachable")
}

As usual, we don't expose reflection in the API, so the exported function Equal must call reflect.ValueOf on its arguments itself:

// Equal reports whether x and y are deeply equal
func Equal(x, y interface{}) bool {
    seen := make(map[comparison]bool)
    return equal(reflect.ValueOf(x), reflect.ValueOf(y), seen)
}

type comparison struct {
    x, y unsafe.Pointer
    t reflect.Type
}

To ensure that the algorithm terminates even for cyclic data structures, it must record which pairs of variables it has already compared and avoid comparing them a second time. Equal allocates a set of comparison structs, each holding the addresses of two variables (as unsafe.Pointer values) and the type of the comparison. We must record the type as well as the addresses because different variables can share an address. For example, if x and y are arrays, x and x[0] have the same address, as do y and y[0], and the type distinguishes a comparison of x with y from a comparison of x[0] with y[0].

// cycle check
if x.CanAddr() && y.CanAddr() {
    xptr := unsafe.Pointer(x.UnsafeAddr())
    yptr := unsafe.Pointer(y.UnsafeAddr())
    if xptr == yptr {
        return true // identical references
    }
    c := comparison{xptr, yptr, x.Type()}
    if seen[c] {
        return true // already seen
    }
    seen[c] = true
}

Here is Equal in action:

fmt.Println(Equal([]int{1, 2, 3}, []int{1, 2, 3}))        // "true"
fmt.Println(Equal([]string{"foo"}, []string{"bar"}))      // "false"
fmt.Println(Equal([]string(nil), []string{}))             // "true"
fmt.Println(Equal(map[string]int(nil), map[string]int{})) // "true"

Equal even handles cyclic data, like the inputs that sent the Display function of Section 12.3 into an endless loop:

// Circular linked lists a->b->a and c->c
type link struct {
    value string
    tail  *link
}
a, b, c := &link{value: "a"}, &link{value: "b"}, &link{value: "c"}
a.tail, b.tail, c.tail = b, a, c
fmt.Println(Equal(a, a)) // "true"
fmt.Println(Equal(b, b)) // "true"
fmt.Println(Equal(c, c)) // "true"
fmt.Println(Equal(a, b)) // "false"
fmt.Println(Equal(a, c)) // "false"
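For readers who want to run the code, here is a single-file sketch that assembles equal, Equal, comparison and the cycle check, and fills in the omitted numeric, struct and map cases in the obvious way. Those filled-in cases are our own completion, not the book's listing.

```go
package main

import (
	"fmt"
	"reflect"
	"unsafe"
)

// comparison identifies a pair of addresses (plus type) already being compared.
type comparison struct {
	x, y unsafe.Pointer
	t    reflect.Type
}

func equal(x, y reflect.Value, seen map[comparison]bool) bool {
	if !x.IsValid() || !y.IsValid() {
		return x.IsValid() == y.IsValid() // two invalid values count as equal
	}
	if x.Type() != y.Type() {
		return false
	}
	// Cycle check: only addressable values can take part in a cycle.
	if x.CanAddr() && y.CanAddr() {
		xptr := unsafe.Pointer(x.UnsafeAddr())
		yptr := unsafe.Pointer(y.UnsafeAddr())
		if xptr == yptr {
			return true // identical references
		}
		c := comparison{xptr, yptr, x.Type()}
		if seen[c] {
			return true // already seen
		}
		seen[c] = true
	}
	switch x.Kind() {
	case reflect.Bool:
		return x.Bool() == y.Bool()
	case reflect.String:
		return x.String() == y.String()
	case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64:
		return x.Int() == y.Int()
	case reflect.Uint, reflect.Uint8, reflect.Uint16, reflect.Uint32,
		reflect.Uint64, reflect.Uintptr:
		return x.Uint() == y.Uint()
	case reflect.Float32, reflect.Float64:
		return x.Float() == y.Float()
	case reflect.Complex64, reflect.Complex128:
		return x.Complex() == y.Complex()
	case reflect.Chan, reflect.UnsafePointer, reflect.Func:
		return x.Pointer() == y.Pointer()
	case reflect.Ptr, reflect.Interface:
		return equal(x.Elem(), y.Elem(), seen)
	case reflect.Array, reflect.Slice:
		if x.Len() != y.Len() { // nil and empty slices both have length 0
			return false
		}
		for i := 0; i < x.Len(); i++ {
			if !equal(x.Index(i), y.Index(i), seen) {
				return false
			}
		}
		return true
	case reflect.Struct:
		for i := 0; i < x.NumField(); i++ {
			if !equal(x.Field(i), y.Field(i), seen) {
				return false
			}
		}
		return true
	case reflect.Map:
		if x.Len() != y.Len() {
			return false
		}
		for _, k := range x.MapKeys() {
			// A key missing from y yields an invalid Value, which is unequal.
			if !equal(x.MapIndex(k), y.MapIndex(k), seen) {
				return false
			}
		}
		return true
	}
	panic("unreachable")
}

// Equal reports whether x and y are deeply equal.
func Equal(x, y interface{}) bool {
	return equal(reflect.ValueOf(x), reflect.ValueOf(y), make(map[comparison]bool))
}

func main() {
	fmt.Println(Equal([]string(nil), []string{}))             // "true"
	fmt.Println(Equal(map[string]int(nil), map[string]int{})) // "true"
	fmt.Println(Equal([]int{1, 2, 3}, []int{1, 2, 4}))        // "false"
}
```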

Calling C code with cgo

A Go program might need to use a hardware driver written in C, query an embedded database implemented in C++, or use linear algebra routines implemented in Fortran. C has long served as a common denominator between languages, so many libraries intended for wide use export a C-compatible API, whatever language they are implemented in.

In this section, we'll build a simple data compression program using cgo, a tool bundled with Go that creates Go bindings for C functions. Such tools are called foreign function interfaces (FFIs), and cgo is not the only one. SWIG (http://swig.org) is another widely used example; it provides more complex features for integrating with C++, but we won't discuss it here.

The compress/... subtree of the standard library provides compressors and decompressors for popular algorithms, including LZW (used by the Unix compress command) and DEFLATE (used by the GNU gzip command). Although the APIs of these packages differ in detail, they generally provide a wrapper for an io.Writer that compresses the data written to it and a wrapper for an io.Reader that decompresses the data read from it. For example:

package gzip  // compress/gzip

func NewWriter(w io.Writer) *Writer           // *Writer implements io.WriteCloser
func NewReader(r io.Reader) (*Reader, error)  // *Reader implements io.ReadCloser

The bzip2 compressor, based on the elegant Burrows-Wheeler transform, runs slower than gzip but achieves a higher compression ratio. The standard library's compress/bzip2 package provides a bzip2 decompressor but no compressor. Implementing a compressor from scratch is tedious work, but a well-documented, high-performance open-source C implementation, libbzip2, is available from http://bzip.org.

If the C library were small, we could simply port it to pure Go, and if performance were not critical, we could run a C program as a helper subprocess using the os/exec package. cgo makes sense when you need a complex, performance-critical library with a C interface. The rest of this section illustrates cgo through an example.
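The subprocess alternative can be sketched with os/exec. The helper name filter is our own; main assumes a bzip2 command is installed and on the PATH (it reports an error otherwise), and any program that reads stdin and writes stdout would work the same way.

```go
package main

import (
	"bytes"
	"fmt"
	"os/exec"
)

// filter runs an external program, feeds input to its standard input,
// and returns whatever the program writes to standard output.
func filter(input []byte, name string, args ...string) ([]byte, error) {
	cmd := exec.Command(name, args...)
	cmd.Stdin = bytes.NewReader(input)
	return cmd.Output() // waits for the process and collects stdout
}

func main() {
	data := bytes.Repeat([]byte("compress me "), 1000)
	// Assumes bzip2 is installed; -c writes the compressed data to stdout.
	out, err := filter(data, "bzip2", "-c")
	if err != nil {
		fmt.Println("bzip2 failed:", err)
		return
	}
	fmt.Println(len(data), "->", len(out), "bytes")
}
```

The trade-off is a process launch and a data copy per call, which is why this approach suits cases where performance is not critical.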

The code in this section is the up-to-date version. The code in earlier printings of the book works only with Go 1.5 and earlier, because starting with Go 1.6 the language explicitly specifies which Go pointers may be passed directly to C functions. The main change in the new code is the addition of two C functions, bz2alloc and bz2free, which allocate and free the bz_stream object. The new code carries the following comment explaining the issue:

// The version of this program that appeared in the first and second printings did not comply with the proposed rules for passing pointers between Go and C, 
// described here: https://github.com/golang/proposal/blob/master/design/12416-cgo-pointers.md
//
// The rules forbid a C function like bz2compress from storing 'in' and 'out' (pointers to variables allocated by Go) into the Go variable 's', even temporarily.
//
// The version below, which appears in the third printing, has been corrected. 
// To comply with the rules, the bz_stream variable must be allocated by C code.  We have introduced two C functions, bz2alloc and bz2free, to allocate and free instances of the bz_stream type.
// Also, we have changed bz2compress so that before it returns, it clears the fields of the bz_stream that contain pointers to Go variables.

To use libbzip2, we need to allocate a bz_stream struct, which holds the input and output buffers, and call three functions: BZ2_bzCompressInit to initialize the stream, BZ2_bzCompress to compress data from the input buffer into the output buffer, and BZ2_bzCompressEnd to release the stream. (Don't worry about the details of the library; the purpose of this example is to show how the parts fit together.)

We'll call the C functions BZ2_bzCompressInit and BZ2_bzCompressEnd directly from Go, but for BZ2_bzCompress we'll define a wrapper function in C to do the real work. The following C code lives in its own file:

gopl.io/ch13/bzip

/* This file is gopl.io/ch13/bzip/bzip2.c,         */
/* a simple wrapper for libbzip2 suitable for cgo. */
#include <bzlib.h>

int bz2compress(bz_stream *s, int action,
                char *in, unsigned *inlen, char *out, unsigned *outlen) {
    s->next_in = in;
    s->avail_in = *inlen;
    s->next_out = out;
    s->avail_out = *outlen;
    int r = BZ2_bzCompress(s, action);  // Note: the wrapper just forwards to the library function
    *inlen -= s->avail_in;    // Note: number of input bytes consumed
    *outlen -= s->avail_out;  // Note: number of output bytes produced
    s->next_in = s->next_out = NULL;  // Note: drop pointers to Go memory; assignment is right-associative, so both become NULL
    return r;
}

Now let's turn to the Go code, the first part of which is shown below. The import "C" declaration is special: there is no package C, but this import causes go build to run the cgo tool on the file before the Go compiler sees it.

// Package bzip provides a writer that uses bzip2 compression (bzip.org).
package bzip

/*
#cgo CFLAGS: -I/usr/include
#cgo LDFLAGS: -L/usr/lib -lbz2
#include <bzlib.h>
#include <stdlib.h>
bz_stream* bz2alloc() { return calloc(1, sizeof(bz_stream)); }
int bz2compress(bz_stream *s, int action,
                char *in, unsigned *inlen, char *out, unsigned *outlen);
void bz2free(bz_stream* s) { free(s); }
*/
import "C"

import (
    "io"
    "unsafe"
)

type writer struct {
    w      io.Writer     // underlying output stream
    stream *C.bz_stream  // Note: the bz_stream type from the C code
    outbuf [64 * 1024]byte
}

// NewWriter returns a writer for bzip2-compressed streams.
func NewWriter(out io.Writer) io.WriteCloser {
    const blockSize = 9
    const verbosity = 0
    const workFactor = 30
    w := &writer{w: out, stream: C.bz2alloc()}  // Note: calls bz2alloc from the C preamble
    C.BZ2_bzCompressInit(w.stream, blockSize, verbosity, workFactor)
    return w
}

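(Note: the comment immediately before import "C" is the C preamble. It declares bz2compress, whose definition lives in the separate bzip2.c file shown earlier; there must be no blank line between the preamble and the import. How cgo uses the preamble is explained below.)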

During preprocessing, the cgo tool generates a temporary package containing Go declarations for all the C functions and types that the file uses, such as C.bz_stream (a type) and C.BZ2_bzCompressInit (a function). The cgo tool discovers these by invoking the local C compiler in a special way on the contents of the comment that precedes the import "C" declaration, including any header files it includes.

The preamble may also contain #cgo directives that pass extra options to the C toolchain. The CFLAGS and LDFLAGS values supply extra arguments to the compiler and linker so that they can find the bzlib.h header file and the libbz2.a library in specific directories. This example assumes that the bzip2 library is installed under /usr; if it is installed elsewhere, adjust these flags accordingly. (https://github.com/chai2010/bzip2 provides a version that builds the bzip2 C sources itself, so it does not depend on an installed library.)

The NewWriter function calls the C function BZ2_bzCompressInit to initialize the stream's buffers. The writer struct includes another buffer, outbuf, that receives the compressor's output.

The Write method, shown below, feeds the uncompressed data to the compressor and returns the number of uncompressed bytes consumed. Its body calls the C function bz2compress in a loop. As the code shows, the Go program can access C types such as bz_stream, char and uint, C functions such as bz2compress, and even C preprocessor macros such as BZ_RUN, all through the C.x notation. The C.uint type is distinct from Go's uint type, even if both have the same width.

func (w *writer) Write(data []byte) (int, error) {
    if w.stream == nil {
        panic("closed")
    }
    var total int  // uncompressed bytes written

    for len(data) > 0 {
        inlen, outlen := C.uint(len(data)), C.uint(cap(w.outbuf))
        C.bz2compress(w.stream, C.BZ_RUN,
            (*C.char)(unsafe.Pointer(&data[0])), &inlen,
            (*C.char)(unsafe.Pointer(&w.outbuf)), &outlen)
        total += int(inlen)
        data = data[inlen:]
        if _, err := w.w.Write(w.outbuf[:outlen]); err != nil {
            return total, err
        }
    }
    return total, nil
}

In each iteration of the loop, bz2compress receives the address and length of the remaining portion of data, and the address and capacity of w.outbuf. The two lengths are passed by address rather than by value so that the C function can update them to report how much uncompressed data was consumed and how much compressed data was produced. Each chunk of compressed data is then written to the underlying io.Writer.

The Close method has a similar structure to Write: it uses a loop to flush any remaining compressed data from the stream into the output buffer and on to the underlying writer.

// Close flushes the compressed data and closes the stream.
// It does not close the underlying io.Writer.
func (w *writer) Close() error {
    if w.stream == nil {
        panic("closed")
    }
    defer func() {
        C.BZ2_bzCompressEnd(w.stream)
        C.bz2free(w.stream)
        w.stream = nil
    }()
    for {
        inlen, outlen := C.uint(0), C.uint(cap(w.outbuf))
        r := C.bz2compress(w.stream, C.BZ_FINISH, nil, &inlen,
            (*C.char)(unsafe.Pointer(&w.outbuf)), &outlen)
        if _, err := w.w.Write(w.outbuf[:outlen]); err != nil {
            return err
        }
        if r == C.BZ_STREAM_END {
            return nil
        }
    }
}

In the implementation above, writer is not safe for concurrent use, and calling Close and Write concurrently may even crash the program. (Note: there are two remedies: (1) a lock, or (2) a channel.)

The bzipper program below is a bzip2 compressor command built on our new package. It behaves like the bzip2 command found on many Unix systems.

gopl.io/ch13/bzipper

// Bzipper reads input, bzip2-compresses it, and writes it out.
package main

import (
    "io"
    "log"
    "os"

    "gopl.io/ch13/bzip"
)

func main() {
    w := bzip.NewWriter(os.Stdout)
    if _, err := io.Copy(w, os.Stdin); err != nil {
        log.Fatalf("bzipper: %v\n", err)
    }
    if err := w.Close(); err != nil {
        log.Fatalf("bzipper: close: %v\n", err)
    }
}

In the session below, we use bzipper to compress the system dictionary, /usr/share/dict/words, from 938,848 bytes to 335,405 bytes, about a third of its original size, and then decompress it with the system's bunzip2 command. The SHA256 hash is the same before and after, which gives us confidence that our compressor works correctly.

$ go build gopl.io/ch13/bzipper
$ wc -c < /usr/share/dict/words
938848
$ sha256sum < /usr/share/dict/words
126a4ef38493313edc50b86f90dfdaf7c59ec6c948451eac228f2f3a8ab1a6ed -
$ ./bzipper < /usr/share/dict/words | wc -c
335405
$ ./bzipper < /usr/share/dict/words | bunzip2 | sha256sum
126a4ef38493313edc50b86f90dfdaf7c59ec6c948451eac228f2f3a8ab1a6ed -

We have shown how to link a C library into a Go program. Going the other direction, it is also possible to compile Go into a static library that is linked into a C program, or into a shared library that a C program loads dynamically. (Note: in Go 1.5, the Windows implementation of Go did not support building Go code as a C static or shared library. The good news is that people are working on it; see Go issue 11058 for details.)

We have only touched on a few aspects of cgo here. There is much more to say about memory management, pointers, callbacks, signal handling, strings, errno, finalizers, and the relationship between goroutines and operating system threads. In particular, the rules for correctly passing Go pointers to C functions are complex (Note: in short, the Go memory that the pointer refers to must not itself contain Go pointers, which rules out most reference types; the C function must not retain the Go pointer after it returns; and while the C call is in progress, the pointed-to memory must not be moved, whether by the garbage collector or by stack growth), for reasons similar to those discussed in Section 13.2, and as of Go 1.5 they had not yet been authoritatively specified. For further reading, start with https://golang.org/cmd/cgo.

Some advice

At the end of the previous chapter, we warned against careless use of the reflect package. Those warnings apply with even more force to the unsafe package described in this chapter.

High-level languages free programmers from the details of the instructions a machine actually executes, and from many implementation details such as memory layout. Because of this insulating layer of abstraction, we can write programs that are safe, robust, and portable, running on different operating systems without change.

The unsafe package lets programmers reach through this insulation to use some crucial feature that is otherwise inaccessible, or perhaps to achieve better performance. The cost is usually portability and safety, so using unsafe is risky. Our advice on when and how to use unsafe parallels Knuth's comments on premature optimization, quoted in Section 11.5. Most Go programmers will never need to use the unsafe package directly. Still, there will occasionally be situations where some piece of code is best written with unsafe. If you are convinced that unsafe really is the best approach, confine it to as small an area as possible, so that the rest of the program can ignore its use.
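One way to follow that advice is to hide the unsafe code behind a single small function with an ordinary, safe-looking signature, and to document the invariant callers must uphold. The sketch below assumes Go 1.20 or later (for unsafe.String and unsafe.SliceData); bytesToString is a hypothetical name, and the zero-copy conversion is only an example of where unsafe might be justified after measurement.

```go
package main

import (
	"fmt"
	"unsafe"
)

// bytesToString returns a string that shares b's underlying memory instead
// of copying it. It is the only function here that uses unsafe, so the rest
// of the program sees an ordinary func([]byte) string.
//
// Invariant (not checked by the compiler): b must never be modified after
// this call, because Go assumes strings are immutable.
func bytesToString(b []byte) string {
	// unsafe.String and unsafe.SliceData require Go 1.20 or later.
	return unsafe.String(unsafe.SliceData(b), len(b))
}

func main() {
	b := []byte("gopher")
	s := bytesToString(b)
	fmt.Println(s, len(s)) // gopher 6
}
```

If the invariant cannot be guaranteed, the safe conversion string(b), which copies, is the right choice.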

Now, put the last two chapters out of your mind and go write some real applications. Stay away from reflect and unsafe unless you really need them.

Finally, happy Go programming. We hope you enjoy Go as much as we do.

Posted on Mon, 06 Dec 2021 19:11:56 -0500 by Rayn