Linux operating system: the virtual file system. A file management system is needed when there are many files

So far, the layout of the library's bookshelves, that is, the file system on the hard disk, has been built. Now we also need a system for managing the library and lending books, that is, the file management module.

If a process wants to write data into the file system, it needs the cooperation of many layers of components:

  • In the application layer, a process reads and writes files through system calls such as sys_open, sys_read, and sys_write.
  • In the kernel, each process maintains a data structure for the files it has opened.
  • The kernel also maintains a data structure for all the files opened across the whole system.
  • Linux supports dozens of different file systems, each implemented differently. The Linux kernel therefore provides a unified interface, the virtual file system (VFS), for operating on file systems. It provides common file system object models, such as inode, directory entry, and mount, along with the methods that operate on these objects, such as inode operations, directory operations, and file operations.
  • Below the VFS sits the real file system, such as ext4.
  • To read and write an ext4 file system, requests pass through the block device I/O layer, that is, the BIO layer. This is the interface between the file system layer and the block device driver.
  • To speed up reads and writes on block devices, there is also a cache layer.
  • At the bottom is the block device driver.

Next, we analyze layer by layer.

Analyzing system calls is the most effective way to understand the kernel architecture. Here we only need to focus on the most important ones:

  • The mount system call mounts a file system;
  • The open system call opens or creates a file. To create a file, set O_CREAT in flags; to open it for reading and writing, set O_RDWR in flags;
  • The read system call reads the contents of a file;
  • The write system call writes contents to a file.
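
Seen from user space, the open, write, and read calls listed above can be exercised with a few lines of C. The following is only a minimal sketch: the helper name write_then_read and the 0644 permission bits are choices made for this illustration, and error handling is reduced to returning -1.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write `msg` to `path`, then read it back into `buf` (capacity `cap`).
 * Returns the number of bytes read back, or -1 on any error.
 * (Hypothetical helper written for this example.)
 */
ssize_t write_then_read(const char *path, const char *msg,
                        char *buf, size_t cap)
{
    assert(path && msg && buf);

    /* O_CREAT creates the file if it is missing; O_RDWR opens it read/write. */
    int fd = open(path, O_CREAT | O_RDWR | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    size_t len = strlen(msg);
    if (write(fd, msg, len) != (ssize_t)len) {
        close(fd);
        return -1;
    }

    /* Writing advanced the file offset, so rewind before reading. */
    if (lseek(fd, 0, SEEK_SET) < 0) {
        close(fd);
        return -1;
    }

    ssize_t n = read(fd, buf, cap);
    close(fd);
    return n;
}
```

Each of these library calls enters the kernel through the corresponding system call, which is where the layers described above take over.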

Mount file system

To operate a file system, the first thing is to mount the file system.

  • For the kernel to support a given type of file system, that type must be registered. For example, the ext4 file system calls register_filesystem. The parameter passed in is ext4_fs_type, indicating that the file system being registered is of type ext4. One of its most important members is ext4_mount
static struct file_system_type ext4_fs_type = {
	.owner		= THIS_MODULE,
	.name		= "ext4",
	.mount		= ext4_mount,
	.kill_sb	= kill_block_super,
	.fs_flags	= FS_REQUIRES_DEV,
};
  • Once a file system type has been registered in the kernel, file systems of that type can be mounted and used
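
The register-then-mount idea can be modelled in user space. This is a toy sketch, not kernel code: every toy_ name is invented for the illustration, and the mount callback signature is simplified compared with the real .mount member.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for the kernel's struct file_system_type (illustrative only). */
struct toy_fs_type {
    const char *name;
    int (*mount)(const char *dev_name);   /* simplified, hypothetical signature */
    struct toy_fs_type *next;
};

static struct toy_fs_type *toy_file_systems;  /* head of the registered list */

/* Look a type up by name; returns NULL if it was never registered. */
struct toy_fs_type *toy_get_fs_type(const char *name)
{
    for (struct toy_fs_type *p = toy_file_systems; p; p = p->next)
        if (strcmp(p->name, name) == 0)
            return p;
    return NULL;
}

/* Register a type; returns 0 on success, -1 if the name is already taken. */
int toy_register_filesystem(struct toy_fs_type *fs)
{
    assert(fs && fs->name && fs->mount);
    if (toy_get_fs_type(fs->name))
        return -1;
    fs->next = toy_file_systems;
    toy_file_systems = fs;
    return 0;
}

/* Mounting works only for a registered type and dispatches through ->mount. */
int toy_mount(const char *type, const char *dev_name)
{
    struct toy_fs_type *fs = toy_get_fs_type(type);
    if (!fs)
        return -1;                 /* unknown file system type */
    return fs->mount(dev_name);
}

/* Example callback standing in for ext4_mount. */
int toy_ext4_mount(const char *dev_name)
{
    return dev_name ? 0 : -1;
}
```

In this model, calling toy_mount("ext4", ...) fails until a toy_fs_type named "ext4" has been passed to toy_register_filesystem, which mirrors the registration requirement described above.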

The mount system call is defined as follows:

SYSCALL_DEFINE5(mount, char __user *, dev_name, char __user *, dir_name, char __user *, type, unsigned long, flags, void __user *, data)
	ret = do_mount(kernel_dev, dir_name, kernel_type, flags, options);

The next call chain is: do_mount -> do_new_mount -> vfs_kern_mount.

struct vfsmount *
vfs_kern_mount(struct file_system_type *type, int flags, const char *name, void *data)
	mnt = alloc_vfsmnt(name);
	root = mount_fs(type, flags, name, data);
	mnt->mnt.mnt_root = root;
	mnt->mnt.mnt_sb = root->d_sb;
	mnt->mnt_mountpoint = mnt->mnt.mnt_root;
	mnt->mnt_parent = mnt;
	list_add_tail(&mnt->mnt_instance, &root->d_sb->s_mounts);
	return &mnt->mnt;

vfs_kern_mount first creates a struct mount structure. Every mounted file system has one such structure.

struct mount {
	struct hlist_node mnt_hash;
	struct mount *mnt_parent;
	struct dentry *mnt_mountpoint;
	struct vfsmount mnt;
	union {
		struct rcu_head mnt_rcu;
		struct llist_node mnt_llist;
	};
	struct list_head mnt_mounts;	/* list of children, anchored here */
	struct list_head mnt_child;	/* and going through their mnt_child */
	struct list_head mnt_instance;	/* mount instance on sb->s_mounts */
	const char *mnt_devname;	/* Name of device e.g. /dev/dsk/hda1 */
	struct list_head mnt_list;
} __randomize_layout;
struct vfsmount {
	struct dentry *mnt_root;	/* root of the mounted tree */
	struct super_block *mnt_sb;	/* pointer to superblock */
	int mnt_flags;
} __randomize_layout;
  • mnt_parent is the parent file system of the mount point
  • mnt_mountpoint is the dentry of the mount point in the parent file system
  • struct dentry represents a directory and is associated with the inode of the directory
  • mnt_root is the dentry of the root directory of the current file system, mnt_sb is a pointer to the superblock
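
To make the roles of mnt_parent, mnt_mountpoint, and mnt_root concrete, here is a small user-space model. It is a sketch under assumptions: the toy_ types and functions are invented for this example and keep only the fields discussed above. toy_init_mount mirrors the initial state set in vfs_kern_mount, where a fresh mount is its own parent.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the kernel structures; only the fields discussed above. */
struct toy_dentry {
    const char *name;
};

struct toy_mount {
    struct toy_mount  *mnt_parent;      /* mount this one is attached to */
    struct toy_dentry *mnt_mountpoint;  /* dentry in the parent where it attaches */
    struct toy_dentry *mnt_root;        /* root dentry of this file system */
};

/* Initial state, as in vfs_kern_mount: the mount is its own parent. */
void toy_init_mount(struct toy_mount *m, struct toy_dentry *root)
{
    assert(m && root);
    m->mnt_root = root;
    m->mnt_mountpoint = root;
    m->mnt_parent = m;
}

/* Attach `child` on top of `mountpoint`, a dentry inside `parent`'s tree. */
void toy_attach_mount(struct toy_mount *child, struct toy_mount *parent,
                      struct toy_dentry *mountpoint)
{
    assert(child && parent && mountpoint);
    child->mnt_parent = parent;
    child->mnt_mountpoint = mountpoint;
}

/* Number of mounts between `m` and the top of the tree (top-level mount: 0). */
int toy_mount_depth(const struct toy_mount *m)
{
    int depth = 0;
    while (m->mnt_parent != m) {
        m = m->mnt_parent;
        depth++;
    }
    return depth;
}
```

For example, mounting a second file system on the "home" dentry of the root file system gives that mount a depth of 1, with mnt_mountpoint pointing into the parent and mnt_root pointing at its own root directory.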

Next, let's look at how mount_fs is called to mount the file system.

struct dentry *
mount_fs(struct file_system_type *type, int flags, const char *name, void *data)
	struct dentry *root;
	struct super_block *sb;
	root = type->mount(type, flags, name, data);
	sb = root->d_sb;

The key call here is to the mount function of ext4_fs_type, that is, the ext4_mount mentioned above, which reads the superblock from the file system. In a file system implementation, each structure on the hard disk has a corresponding structure of the same format in memory. Once these data structures have been read into memory, the kernel can operate on the file system by operating on them.

Open file

Next, let's analyze the open system call.

SYSCALL_DEFINE3(open, const char __user *, filename, int, flags, umode_t, mode)
	return do_sys_open(AT_FDCWD, filename, flags, mode);
long do_sys_open(int dfd, const char __user *filename, int flags, umode_t mode)
	fd = get_unused_fd_flags(flags);
	if (fd >= 0) {
		struct file *f = do_filp_open(dfd, tmp, &op);
		if (IS_ERR(f)) {
			fd = PTR_ERR(f);
		} else {
			fd_install(fd, f);
	return fd;

To open a file, we first call get_unused_fd_flags to obtain an unused file descriptor. How is this file descriptor obtained?

In the task_struct of each process there is a pointer named files, whose type is files_struct.

struct files_struct		*files;

The most important thing in files_struct is a list of file descriptors. Every time a file is opened, an entry is allocated in this list, and its index is the file descriptor.

struct files_struct {
	struct file __rcu * fd_array[NR_OPEN_DEFAULT];

For any process, by default, file descriptor 0 represents stdin (standard input), file descriptor 1 represents stdout (standard output), and file descriptor 2 represents stderr (standard error). When you open another file, a free slot is found in this list and assigned to it.
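
The descriptor table can be pictured as an array indexed by file descriptor. The sketch below is a simplified user-space model, not the kernel implementation: the toy_ names and the table size of 8 are assumptions, and it picks the lowest free slot, which matches what POSIX requires of open. The real get_unused_fd_flags also reserves the slot it returns, which this model skips.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NR_OPEN 8            /* hypothetical table size for the sketch */

struct toy_file {
    const char *name;
};

struct toy_files_struct {
    struct toy_file *fd_array[TOY_NR_OPEN];   /* index == file descriptor */
};

/* Return the lowest free descriptor, or -1 if the table is full. */
int toy_get_unused_fd(const struct toy_files_struct *files)
{
    for (int fd = 0; fd < TOY_NR_OPEN; fd++)
        if (files->fd_array[fd] == NULL)
            return fd;
    return -1;
}

/* Associate a descriptor with an open file, like fd_install(fd, f). */
void toy_fd_install(struct toy_files_struct *files, int fd, struct toy_file *f)
{
    assert(fd >= 0 && fd < TOY_NR_OPEN && files->fd_array[fd] == NULL);
    files->fd_array[fd] = f;
}

/* Closing a file frees its slot for reuse. */
void toy_close(struct toy_files_struct *files, int fd)
{
    assert(fd >= 0 && fd < TOY_NR_OPEN);
    files->fd_array[fd] = NULL;
}
```

With stdin, stdout, and stderr installed in slots 0 to 2, the next file opened receives descriptor 3; if descriptor 1 is closed, the following open reuses 1.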

Each item in the file descriptor list is a pointer to a struct file, that is, every open file has a corresponding struct file.

Inside do_sys_open, the call to do_filp_open creates the struct file structure, and fd_install(fd, f) then associates the file descriptor with this structure.

struct file *do_filp_open(int dfd, struct filename *pathname,
		const struct open_flags *op)
	set_nameidata(&nd, dfd, pathname);
	filp = path_openat(&nd, op, flags | LOOKUP_RCU);
	return filp;

do_filp_open first initializes the struct nameidata structure. As we know, a file is identified by a path name that has to be resolved component by component. This structure assists in parsing and looking up the path.

There is a key member variable struct path in struct nameidata

struct path {
	struct vfsmount *mnt;
	struct dentry *dentry;
} __randomize_layout;

The struct vfsmount is related to the mounting of the file system. Another struct dentry, in addition to identifying the directory mentioned above, can also represent the file name and establish the association between the file name and inode.

The next step is to call path_openat, which mainly does the following things:

  • get_empty_filp generates a struct file structure
  • path_init initializes nameidata and is ready to start node path lookup
  • link_path_walk searches the node path layer by layer for the path name. There is a large loop, which is separated by "/" and processed layer by layer
  • do_last gets the inode object corresponding to the file and initializes the file object
static struct file *path_openat(struct nameidata *nd,
			const struct open_flags *op, unsigned flags)
	file = get_empty_filp();
	s = path_init(nd, flags);
	while (!(error = link_path_walk(s, nd)) &&
		(error = do_last(nd, file, op, &opened)) > 0) {
	return file;

For example, for the file "/root/hello/world/data", link_path_walk parses the leading part of the path, "/root/hello/world". When parsing is complete, the dentry in nameidata is the parent directory "/root/hello/world" of the last component, and nameidata records the last component of the path name, "data" (the nd->last seen later in lookup_open).
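
The "/"-separated walk can be sketched in user space. This is only an illustration of the splitting behaviour described above, not the kernel's link_path_walk: toy_path_walk is a hypothetical helper that works on plain strings, whereas the kernel resolves each component to a dentry as it goes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Walk `path` one component at a time and stop before the final one.
 * On success `parent` holds everything before the last component,
 * `last` holds the last component, and the return value is the number
 * of parent components walked. Returns -1 if a buffer is too small.
 */
int toy_path_walk(const char *path, char *parent, size_t parent_cap,
                  char *last, size_t last_cap)
{
    assert(path && parent && last);

    int walked = 0;
    const char *p = path;
    const char *last_start = path;
    size_t last_len = 0;

    for (;;) {
        while (*p == '/')               /* skip separators */
            p++;
        const char *start = p;
        while (*p && *p != '/')         /* scan one component */
            p++;
        const char *q = p;
        while (*q == '/')               /* peek past any trailing slashes */
            q++;
        if (*q == '\0') {               /* nothing follows: final component */
            last_start = start;
            last_len = (size_t)(p - start);
            break;
        }
        walked++;                       /* an intermediate directory */
    }

    /* The parent is everything before the last component, minus the slash. */
    size_t parent_len = (size_t)(last_start - path);
    while (parent_len > 1 && path[parent_len - 1] == '/')
        parent_len--;

    if (parent_len >= parent_cap || last_len >= last_cap)
        return -1;
    memcpy(parent, path, parent_len);
    parent[parent_len] = '\0';
    memcpy(last, last_start, last_len);
    last[last_len] = '\0';
    return walked;
}
```

For "/root/hello/world/data" this walks three directories, leaving "/root/hello/world" as the parent and "data" as the last component, which is the part handed to do_last.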

Parsing and handling the last component is left to do_last.

static int do_last(struct nameidata *nd,
		   struct file *file, const struct open_flags *op,
		   int *opened)
	error = lookup_fast(nd, &path, &inode, &seq);
    error = lookup_open(nd, &path, file, op, got_write, opened);
	error = vfs_open(&nd->path, file, current_cred());

Here we need to find the dentry corresponding to the last component of the file path. How is it found?

To make handling directory entry objects more efficient, Linux implements a directory entry cache, the dentry cache, called dcache for short. It mainly consists of two data structures:

  • Hash table dentry_hashtable: every dentry object in the dcache is linked through its d_hash pointer into the corresponding dentry hash list
  • Unused dentry list s_dentry_lru: unreferenced dentry objects are linked into the LRU list through their d_lru pointer. LRU means least recently used. A dentry that stays on this list has not been used for a long time and is a candidate for release.

There is a complex relationship between the two lists:

  • Reference count drops to 0: if a dentry in the hash table becomes unreferenced, it is added to the LRU list
  • Referenced again: if a dentry on the LRU list is referenced again, it is removed from the LRU list
  • Allocation: when a dentry is not found in the hash table, one is allocated from the slub allocator
  • Overdue return: the dentries on the LRU list that have gone unused the longest are released back to the slub allocator
  • File deletion: if a file is deleted, the corresponding dentry is released back to the slub allocator
  • Structure reuse: when a dentry is needed but a new one cannot be allocated, one is taken from the LRU list and reused
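
The interplay of reference counts and the LRU list can be modelled in a few dozen lines. The sketch below is a deliberately simplified user-space model: the toy_ names, the fixed array of four slots in place of the hash table and the slub allocator, and the timestamp used instead of a linked LRU list are all assumptions made for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_DCACHE_SIZE 4          /* hypothetical capacity */

struct toy_dentry {
    char name[32];
    int in_use;                    /* slot currently holds a cached dentry */
    int refcount;                  /* 0 means "on the LRU list" in this model */
    unsigned long lru_stamp;       /* when it last became unreferenced */
};

struct toy_dcache {
    struct toy_dentry slot[TOY_DCACHE_SIZE];
    unsigned long clock;
};

/* Cache lookup: a hit takes a reference, which takes the dentry off the LRU. */
struct toy_dentry *toy_dcache_lookup(struct toy_dcache *dc, const char *name)
{
    for (int i = 0; i < TOY_DCACHE_SIZE; i++) {
        struct toy_dentry *d = &dc->slot[i];
        if (d->in_use && strcmp(d->name, name) == 0) {
            d->refcount++;
            return d;
        }
    }
    return NULL;
}

/* Drop a reference; at zero the dentry stays cached but becomes reclaimable. */
void toy_dput(struct toy_dcache *dc, struct toy_dentry *d)
{
    assert(d->refcount > 0);
    if (--d->refcount == 0)
        d->lru_stamp = ++dc->clock;
}

/* Allocate: prefer a free slot, else reuse the least recently used dentry. */
struct toy_dentry *toy_d_alloc(struct toy_dcache *dc, const char *name)
{
    struct toy_dentry *victim = NULL;

    for (int i = 0; i < TOY_DCACHE_SIZE; i++) {
        struct toy_dentry *d = &dc->slot[i];
        if (!d->in_use) {
            victim = d;
            break;
        }
        if (d->refcount == 0 &&
            (!victim || d->lru_stamp < victim->lru_stamp))
            victim = d;
    }
    if (!victim)
        return NULL;               /* every dentry is still referenced */

    strncpy(victim->name, name, sizeof(victim->name) - 1);
    victim->name[sizeof(victim->name) - 1] = '\0';
    victim->in_use = 1;
    victim->refcount = 1;
    victim->lru_stamp = 0;
    return victim;
}
```

In this model a dentry that is still referenced is never reused, and a dentry that was dropped and then looked up again is protected from reuse until its count returns to zero.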

So when do_last() looks for the dentry, it naturally looks in the cache first by calling lookup_fast.

If the dentry is not found in the cache, we really have to look in the file system. lookup_open creates a new dentry and calls the lookup function in the inode_operations of the parent directory's inode. For ext4, this calls ext4_lookup, which finds the inode in the physical file system. Once it is found, the new dentry is assigned to the path variable.

static int lookup_open(struct nameidata *nd, struct path *path,
			struct file *file,
			const struct open_flags *op,
			bool got_write, int *opened)
    dentry = d_alloc_parallel(dir, &nd->last, &wq);
    struct dentry *res = dir_inode->i_op->lookup(dir_inode, dentry,
    path->dentry = dentry;
	path->mnt = nd->path.mnt;
const struct inode_operations ext4_dir_inode_operations = {
	.create		= ext4_create,
	.lookup		= ext4_lookup,

The last step of do_last() is to call vfs_open, which really opens the file.

int vfs_open(const struct path *path, struct file *file,
	     const struct cred *cred)
	struct dentry *dentry = d_real(path->dentry, NULL, file->f_flags, 0);
	file->f_path = *path;
	return do_dentry_open(file, d_backing_inode(dentry), NULL, cred);
static int do_dentry_open(struct file *f,
			  struct inode *inode,
			  int (*open)(struct inode *, struct file *),
			  const struct cred *cred)
	f->f_mode = OPEN_FMODE(f->f_flags) | FMODE_LSEEK |
	f->f_inode = inode;
	f->f_mapping = inode->i_mapping;
	f->f_op = fops_get(inode->i_fop);
	open = f->f_op->open;
	error = open(inode, f);
	f->f_flags &= ~(O_CREAT | O_EXCL | O_NOCTTY | O_TRUNC);
	file_ra_state_init(&f->f_ra, f->f_mapping->host->i_mapping);
	return 0;
const struct file_operations ext4_file_operations = {
	.open		= ext4_file_open,

The final thing vfs_open does is call f_op->open, that is, ext4_file_open. Another important job is to fill all the information about the open file into the struct file structure.

struct file {
	union {
		struct llist_node	fu_llist;
		struct rcu_head 	fu_rcuhead;
	} f_u;
	struct path		f_path;
	struct inode		*f_inode;	/* cached value */
	const struct file_operations	*f_op;
	spinlock_t		f_lock;
	enum rw_hint		f_write_hint;
	atomic_long_t		f_count;
	unsigned int 		f_flags;
	fmode_t			f_mode;
	struct mutex		f_pos_lock;
	loff_t			f_pos;
	struct fown_struct	f_owner;
	const struct cred	*f_cred;
	struct address_space	*f_mapping;
	errseq_t		f_wb_err;


  • For each process, every open file has a file descriptor. files_struct contains an array of file descriptors: each file descriptor is an index into the array, and the entry points to a struct file representing the open file. This structure holds the inode corresponding to the file and the file_operations for the file. To operate on the file, look at the definitions in its file_operations
  • For each open file, there is a corresponding dentry. Although it is called a directory entry, it can represent a file as well as a folder. Its most important function is to point to the inode corresponding to the file
  • Whereas the struct file is created when a file is opened, the dentry lives in the dentry cache and still exists after the file is closed. It can therefore maintain the relationship between the in-memory representation of a file and its representation on the hard disk for a longer time
  • The inode structure represents the inode on the hard disk, including the block device number, etc.
  • Almost every structure has its own corresponding operations structure, which contains some methods
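
The "operations structure" pattern from the last point can be shown with function pointers in ordinary C. This is a user-space sketch under assumptions: the toy_ and memfs_ names are invented, and the in-memory "file system" stands in for ext4. toy_vfs_open copies the operations table from the inode into the file and then calls ->open, in the spirit of do_dentry_open above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_file;

/* Per-file-system method table, in the spirit of struct file_operations. */
struct toy_file_operations {
    int  (*open)(struct toy_file *f);
    long (*read)(struct toy_file *f, char *buf, size_t len);
};

struct toy_inode {
    const char *data;                          /* file contents (in memory) */
    size_t size;
    const struct toy_file_operations *i_fop;   /* methods for files of this inode */
};

struct toy_file {
    struct toy_inode *f_inode;
    const struct toy_file_operations *f_op;
    size_t f_pos;
    int opened;
};

/* A hypothetical in-memory file system providing the methods. */
static int memfs_open(struct toy_file *f)
{
    f->opened = 1;
    return 0;
}

static long memfs_read(struct toy_file *f, char *buf, size_t len)
{
    size_t left = f->f_inode->size - f->f_pos;
    if (len > left)
        len = left;
    memcpy(buf, f->f_inode->data + f->f_pos, len);
    f->f_pos += len;
    return (long)len;
}

const struct toy_file_operations memfs_file_operations = {
    .open = memfs_open,
    .read = memfs_read,
};

/* Generic layer: fill in the file from the inode, then call ->open. */
int toy_vfs_open(struct toy_inode *inode, struct toy_file *f)
{
    assert(inode && f && inode->i_fop);
    f->f_inode = inode;
    f->f_op = inode->i_fop;
    f->f_pos = 0;
    f->opened = 0;
    return f->f_op->open ? f->f_op->open(f) : 0;
}

/* Generic layer: dispatch a read through the file's method table. */
long toy_vfs_read(struct toy_file *f, char *buf, size_t len)
{
    if (!f->f_op || !f->f_op->read)
        return -1;
    return f->f_op->read(f, buf, len);
}
```

The generic functions never mention memfs by name. Supporting another file system in this model only requires another toy_file_operations table, which is the point of the unified VFS interface.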

Tags: Operating System

Posted on Sun, 28 Nov 2021 01:17:35 -0500 by trinitywave