Two days ago, in an online interview for a development position at Telecom Tianyi Cloud, an HXD was asked what kinds of file systems there are... On the one hand, it made me feel that developers are expected to know everything under the sun. On the other hand, I feel that rote memorization does little to improve real ability. Therefore, this article attempts to explore the underlying structure of the Ext4 file system on Linux.
The only drawback is that it has no illustrations.
An excellent tutorial that combines theory with practice, although at present it does not cover the Ext4 file system.
A block is a group of sectors between 1KiB and 64KiB, and the number of sectors must be an integral power of 2.
Here, sector refers to the disk sector, which is the smallest storage unit of a disk at the physical level.
Blocks are in turn grouped into larger units called block groups.
Quoted from https://blog.51cto.com/u_15265005:
The Ext4 file system divides disk space into several groups and manages the space group by group. Such a group is called a block group (Block Group), and each one contains the metadata needed to manage the disk space in its area.
Inode refers to fields in an inode table entry.
In the Linux operating system, files are identified by inodes, and each file has an inode node on the disk. For Ext2 file systems, these inode nodes are usually placed in a relatively centralized area, which is called the inode table.
View all mounted devices
# View the mounted file systems and their types
df -T
# View the disk device names
ll /dev/*
View file system superblocks
sudo dumpe2fs -h /dev/sda5
Inode count: 1277952
Block count: 5111040
Reserved block count: 255552
Free blocks: 2173751
Free inodes: 1005093
First block: 0
Block size: 4096
    # One block holds 4096 bytes; the block is the basic unit of file storage in the file system.
Fragment size: 4096
Group descriptor size: 64
Reserved GDT blocks: 1024
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 8192
    # One inode corresponds to one file, so a group can hold at most 8192 files. A file system contains many groups, so Inode count is much larger than Inodes per group.
Inode blocks per group: 512
    # The inode table is large, so it occupies more than one block. My understanding of this attribute is:
    # Inode blocks per group * (Block size / Inode size) = Inodes per group
    # 512 * (4096 / 256) = 8192
Flex block group size: 16
Filesystem created: Mon Oct 5 13:24:59 2020
Last mount time: Thu Sep 16 00:01:46 2021
Last write time: Thu Sep 16 00:01:42 2021
Mount count: 40
Maximum mount count: -1
Last checked: Mon Oct 5 13:24:59 2020
Check interval: 0 (<none>)
Lifetime writes: 88 GB
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 256
    # Inode size here is the size of a single inode, in bytes.
Required extra isize: 32
Desired extra isize: 32
Journal inode: 8
First orphan inode: 655386
Default directory hash: half_md4
Directory Hash Seed: 11778f68-41b9-4d07-9b77-714815ee7721
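The relationships between these superblock fields can be checked with a few lines of arithmetic. The following is only a sketch: the constants are copied from the dumpe2fs output above, and the function names are made up for illustration.

```python
import math

# Values copied from the dumpe2fs output above.
BLOCK_COUNT = 5111040
BLOCK_SIZE = 4096          # bytes
BLOCKS_PER_GROUP = 32768
INODES_PER_GROUP = 8192
INODE_SIZE = 256           # bytes


def inode_table_blocks(inodes_per_group: int, inode_size: int, block_size: int) -> int:
    """Blocks needed to hold the inode table of one group."""
    return inodes_per_group * inode_size // block_size


def group_count(block_count: int, blocks_per_group: int) -> int:
    """Number of block groups; the last group may be only partly filled."""
    return math.ceil(block_count / blocks_per_group)


groups = group_count(BLOCK_COUNT, BLOCKS_PER_GROUP)
print(inode_table_blocks(INODES_PER_GROUP, INODE_SIZE, BLOCK_SIZE))  # Inode blocks per group
print(groups)                                                        # number of block groups
print(groups * INODES_PER_GROUP)                                     # should equal Inode count
```

With the values above, the group count times Inodes per group reproduces the reported Inode count of 1277952.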
Check the stat output of a small file. Why is Blocks 8?
linux@ubuntu:~/temp$ cat 1.txt
123456
linux@ubuntu:~/temp$ stat 1.txt
  File: 1.txt
  Size: 7   Blocks: 8   IO Block: 4096   regular file
Device: 805h/2053d   Inode: 398584   Links: 2
Access: (0664/-rw-rw-r--)   Uid: ( 1000/ linux)   Gid: ( 1000/ linux)
Access: 2021-09-16 04:05:42.486923173 -0700
Modify: 2021-09-11 19:36:52.889907621 -0700
Change: 2021-09-11 19:36:52.889907621 -0700
 Birth: -
In the output of stat, a Block is just a unit equal to 512 bytes.
The Block size of the file system shown above, on the other hand, is 4096 bytes. However small they are, two files never occupy the same block (in other words, the block is the basic storage unit of the file system and cannot be subdivided), so even this 7-byte file takes up one full 4096-byte block. Hence 4096 / 512 = 8.
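The same calculation as a small sketch. The function name is hypothetical, and it counts data blocks only: metadata blocks that are also charged to the file, such as extent tree blocks, are ignored, as are sparse files.

```python
import math

STAT_UNIT = 512  # bytes per "Block" as reported by stat


def stat_blocks_for_data(size: int, fs_block_size: int = 4096) -> int:
    """512-byte units occupied by the data of a file of `size` bytes."""
    fs_blocks = math.ceil(size / fs_block_size)   # whole file system blocks needed
    return fs_blocks * fs_block_size // STAT_UNIT


print(stat_blocks_for_data(7))     # the 7-byte 1.txt above
print(stat_blocks_for_data(4097))  # one byte over a block needs a second block
```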
View Group information
Group information (the dumpe2fs command used above to query the superblock also returns the information for every group; note that the -h option restricts the output to the superblock, so run it without -h):
Group 4: (Blocks 131072-163839) csum 0xfa75 [INODE_UNINIT, ITABLE_ZEROED]
    # 163839 - 131072 + 1 = 32768, matching Blocks per group above.
  Block bitmap at 1032 (bg #0 + 1032), csum 0xf68bc9f4
  Inode bitmap at 1048 (bg #0 + 1048), csum 0x00000000
  Inode table at 3108-3619 (bg #0 + 3108)
    # 3619 - 3108 + 1 = 512 blocks, matching Inode blocks per group (512) above, which hold the 8192 inodes of Inodes per group.
  0 free blocks, 8192 free inodes, 0 directories, 8192 unused inodes
  Free blocks:
  Free inodes: 32769-40960
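The block and inode ranges of a group follow directly from the superblock values. A minimal sketch, assuming First block is 0 and inode numbers start at 1 as in the output above; the function names are illustrative only, and the last group of a file system may be shorter than a full group.

```python
BLOCKS_PER_GROUP = 32768
INODES_PER_GROUP = 8192
FIRST_BLOCK = 0


def group_block_range(group: int) -> tuple:
    """First and last block number of a (full) block group."""
    first = FIRST_BLOCK + group * BLOCKS_PER_GROUP
    return first, first + BLOCKS_PER_GROUP - 1


def group_inode_range(group: int) -> tuple:
    """First and last inode number stored in a block group."""
    first = group * INODES_PER_GROUP + 1   # inode numbers start at 1
    return first, first + INODES_PER_GROUP - 1


def group_of_inode(inode_number: int) -> int:
    """Block group whose inode table holds the given inode."""
    return (inode_number - 1) // INODES_PER_GROUP


print(group_block_range(4))    # compare with "Blocks 131072-163839"
print(group_inode_range(4))    # compare with "Free inodes: 32769-40960"
print(group_of_inode(398584))  # group holding the inode of 1.txt
```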
Analyze INODE structure
About little-endian byte order:
All fields in ext4 are written to disk in little endian order.
HOWEVER, all fields in jbd2 (the journal) are written to disk in big-endian order.
The journal module is related to recovery from disk failures and will not be discussed in depth here. In short, we will need little-endian byte order when interpreting the inode structure in the following part.
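As a quick illustration of what little-endian means in practice, here is how the two bytes 0c 00 (a 16-bit field that shows up later in the directory block dump) are decoded. This is just a sketch using the Python standard library.

```python
import struct

raw = bytes([0x0C, 0x00])   # two bytes as they appear on disk

little = int.from_bytes(raw, "little")  # least significant byte first (ext4)
big = int.from_bytes(raw, "big")        # most significant byte first (jbd2)

# struct does the same thing: "<" selects little-endian, "H" an unsigned 16-bit value.
(value,) = struct.unpack("<H", raw)

print(little, big, value)
```

Read as little-endian the field is 12; read as big-endian the same bytes would give 3072.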
(This requires debugfs; reference: https://blog.csdn.net/xingkong_678/article/details/40687209)
debugfs: stat ./1.txt
-- Output results --
Inode: 398584 Type: regular Mode: 0664 Flags: 0x80000
    # 0x80000: the inode uses extents (EXT4_EXTENTS_FL). In Ext4, an inode uses an extent tree to store the numbers of the physical blocks occupied by the file. (The contents of a file are stored in blocks in order, but the positions of those blocks within the group are not necessarily contiguous; the former are called "logical blocks" and the latter "physical blocks".)
Generation: 1037448172 Version: 0x00000000:00000001
User: 1000 Group: 1000 Project: 0 Size: 7
File ACL: 0
Links: 2 Blockcount: 8
    # The Block in Blockcount is again a unit equal to 512 bytes.
Fragment: Address: 0 Number: 0 Size: 0
ctime: 0x613d67c4:d42ba694 -- Sat Sep 11 19:36:52 2021
atime: 0x61432506:74176e94 -- Thu Sep 16 04:05:42 2021
mtime: 0x613d67c4:d42ba694 -- Sat Sep 11 19:36:52 2021
crtime: 0x613d620c:592e44c4 -- Sat Sep 11 19:12:28 2021
Size of extra inode fields: 32
Inode checksum: 0xc67f4100
EXTENTS:
(0):2670562
-- Output results --
Alternatively, the following command directly displays the numbers of the physical blocks occupied by the file:
debugfs: blocks ./1.txt
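The two hexadecimal numbers that debugfs prints for each timestamp can be decoded by hand. The sketch below assumes the layout described in the ext4 documentation: the first number is the seconds field, and in the second ("extra") number the low 2 bits extend the seconds while the upper 30 bits hold nanoseconds. The function name and the fixed -0700 offset (taken from the stat output earlier) are choices made for this example, and the seconds field is treated as non-negative.

```python
from datetime import datetime, timedelta, timezone


def decode_timestamp(seconds_field: int, extra_field: int) -> tuple:
    """Return (seconds since the Unix epoch, nanoseconds)."""
    epoch_bits = extra_field & 0x3    # low 2 bits extend the 32-bit seconds field
    nanoseconds = extra_field >> 2    # upper 30 bits
    seconds = seconds_field + (epoch_bits << 32)
    return seconds, nanoseconds


# ctime of 1.txt as printed by debugfs: 0x613d67c4:d42ba694
seconds, nanos = decode_timestamp(0x613D67C4, 0xD42BA694)
local = datetime.fromtimestamp(seconds, timezone(timedelta(hours=-7)))
print(seconds, nanos, local.isoformat())
```

The decoded value agrees with the Change line of the earlier stat output (19:36:52.889907621 -0700).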
Reads the data of the specified block number
sudo dd if=/dev/sda5 bs=4096 count=1 skip=2670562
If you accidentally enter the wrong bs, the command outputs unexpected content. The reason is that dd counts skip in units of bs: it treats the device as a sequence of bs-sized blocks and skips that many of them, so bs has to equal the block size of the file system for the block number to line up.
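What dd does here can be sketched as a seek followed by a read. The helper name read_block is hypothetical, and the demonstration runs on a small in-memory image rather than a real device, so it is only an illustration of the offset arithmetic.

```python
import io


def read_block(dev, block_number: int, block_size: int = 4096) -> bytes:
    """Read one block: seek to block_number * block_size, then read block_size bytes."""
    dev.seek(block_number * block_size)
    return dev.read(block_size)


# A tiny fake "device": two empty 4096-byte blocks, then a block holding the file data.
image = io.BytesIO(bytes(4096 * 2) + b"123456\n" + bytes(4096 - 7))

print(read_block(image, 2)[:7])        # correct block size: the file contents
print(read_block(image, 2, 2048)[:7])  # wrong block size: lands in the wrong place
```

On a real system the same function could be called on a device opened with open("/dev/sda5", "rb"), which needs root privileges.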
Try Inode for a slightly larger file!
-- Output results --
Inode: 404013 Type: regular Mode: 0600 Flags: 0x80000
Generation: 3563925921 Version: 0x00000000:00000001
User: 1000 Group: 1000 Project: 0 Size: 27781
File ACL: 0
Links: 1 Blockcount: 64
Fragment: Address: 0 Number: 0 Size: 0
ctime: 0x6142ec58:ac656cf4 -- Thu Sep 16 00:03:52 2021
atime: 0x6143f98d:553064f8 -- Thu Sep 16 19:12:29 2021
mtime: 0x613d9245:44243738 -- Sat Sep 11 22:38:13 2021
crtime: 0x5f7b2021:4a0c5a08 -- Mon Oct 5 06:31:13 2020
Size of extra inode fields: 32
Inode checksum: 0xcaf276a7
EXTENTS:
(ETB0):2111400, (0):1626117, (1):1618417, (2):1618715, (3):1618548, (4):2107063, (5):2105480, (6):2109895
    # Here ETB0 is an extent tree block, a data block used to maintain the mapping between logical blocks and physical blocks. The numbers in parentheses (0, 1, 2, ...) can be regarded as logical block numbers, and the numbers after the colons (1626117, 1618417, ...) as physical block numbers.
-- Output results --
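To make the logical/physical distinction concrete, the sketch below maps a byte offset inside this file to the physical block that holds it. The mapping table is copied from the EXTENTS line above; the function names are invented for the example.

```python
BLOCK_SIZE = 4096

# logical block number -> physical block number, from the EXTENTS line above
EXTENT_MAP = {
    0: 1626117, 1: 1618417, 2: 1618715, 3: 1618548,
    4: 2107063, 5: 2105480, 6: 2109895,
}


def physical_block_for_offset(offset: int) -> int:
    """Physical block that stores the byte at `offset` within the file."""
    logical = offset // BLOCK_SIZE
    return EXTENT_MAP[logical]


def device_offset(offset: int) -> int:
    """Byte position on the partition of the byte at `offset` within the file."""
    return physical_block_for_offset(offset) * BLOCK_SIZE + offset % BLOCK_SIZE


print(physical_block_for_offset(10000))  # byte 10000 lies in logical block 2
print(physical_block_for_offset(27780))  # last byte of the 27781-byte file
print(device_offset(10000))
```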
Try a larger file (~ 900MB)
Inode: 12 Type: regular Mode: 0600 Flags: 0x80000
Generation: 2105891747 Version: 0x00000000:00000001
User: 0 Group: 0 Project: 0 Size: 968110080
File ACL: 0
Links: 1 Blockcount: 1890848
Fragment: Address: 0 Number: 0 Size: 0
ctime: 0x5f7b109c:a60a2c64 -- Mon Oct 5 05:25:00 2020
atime: 0x6142ebda:b44661b8 -- Thu Sep 16 00:01:46 2021
mtime: 0x5f7b109c:a60a2c64 -- Mon Oct 5 05:25:00 2020
crtime: 0x5f7b811b:2df0e80c -- Mon Oct 5 13:24:59 2020
Size of extra inode fields: 32
Inode checksum: 0x6e263862
EXTENTS:
(ETB0):33796, (0-32767):34816-67583, (32768-63487):67584-98303, (63488-96255):100352-133119, (96256-126975):133120-163839, (126976-159743):165888-198655, (159744-190463):198656-229375, (190464-223231):231424-264191, (223232-236354):264192-277314
Unlike the single- and double-indirect index blocks described in textbooks, Ext4 uses a different method to record the blocks a file occupies. It is not hard to see that the intervals in parentheses together cover logical blocks 0-236354, that is, 236355 blocks. One 4096-byte block equals eight 512-byte units, and if the extent tree block ETB0 is counted as well, (236355 + 1) * 8 = 1890848, which is exactly Blockcount. These blocks are stored in several runs of contiguous blocks, each run corresponding to one of the intervals outside the parentheses.
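The block accounting can be verified by parsing the EXTENTS line. This is a rough sketch that only understands the text format shown above; the parser name and the tuple layout are choices made for the example, and it assumes that the extent tree block is charged to the file's Blockcount.

```python
import re

EXTENTS = (
    "(ETB0):33796, (0-32767):34816-67583, (32768-63487):67584-98303, "
    "(63488-96255):100352-133119, (96256-126975):133120-163839, "
    "(126976-159743):165888-198655, (159744-190463):198656-229375, "
    "(190464-223231):231424-264191, (223232-236354):264192-277314"
)


def parse_extents(text: str):
    """Return (tree_blocks, runs); each run is
    (logical_first, logical_last, physical_first, physical_last)."""
    tree_blocks, runs = [], []
    for key, value in re.findall(r"\(([^)]+)\):([\d-]+)", text):
        if key.startswith("ETB"):            # extent tree block, not file data
            tree_blocks.append(int(value))
            continue
        lo, _, hi = key.partition("-")
        plo, _, phi = value.partition("-")
        runs.append((int(lo), int(hi or lo), int(plo), int(phi or plo)))
    return tree_blocks, runs


tree_blocks, runs = parse_extents(EXTENTS)
data_blocks = sum(last - first + 1 for first, last, _, _ in runs)
units = (data_blocks + len(tree_blocks)) * 8   # 4096-byte blocks -> 512-byte units
print(data_blocks, units)
```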
Extents are arranged as a tree. Each node of the tree begins with a struct ext4_extent_header.
A node in the extent tree occupies one block, and its extent header indicates whether it is a leaf node or a non-leaf node.
If the node is an interior node (eh.eh_depth > 0), the header is followed by eh.eh_entries instances of struct ext4_extent_idx; each of these index entries points to a block containing more nodes in the extent tree.
A non-leaf node contains eh_entries entries of type struct ext4_extent_idx, each pointing to a node in the next level.
If the node is a leaf node (eh.eh_depth == 0), then the header is followed by eh.eh_entries instances of struct ext4_extent; these instances point to the file's data blocks.
A leaf node contains eh_entries entries of type struct ext4_extent, each pointing to a specific run of file blocks (e.g. 34816-67583, 67584-98303, ... above).
The root node of the extent tree is stored in inode.i_block, which allows for the first four extents to be recorded without the use of extra metadata blocks.
When a file occupies only a few blocks, its extent entries are stored directly in inode.i_block, which saves space.
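The limit of four extents follows from the sizes involved. The sketch below assumes the sizes given in the ext4 documentation: a 12-byte header, 12-byte entries, and a 60-byte i_block (15 fields of 4 bytes). The function name is illustrative, and any checksum stored at the end of a tree block is ignored.

```python
HEADER_SIZE = 12    # struct ext4_extent_header
ENTRY_SIZE = 12     # struct ext4_extent and struct ext4_extent_idx
I_BLOCK_SIZE = 60   # inode.i_block: 15 fields of 4 bytes
BLOCK_SIZE = 4096


def max_entries(node_bytes: int) -> int:
    """Entries that fit in a node of the given size after the header."""
    return (node_bytes - HEADER_SIZE) // ENTRY_SIZE


print(max_entries(I_BLOCK_SIZE))  # root node stored inside the inode
print(max_entries(BLOCK_SIZE))    # node stored in its own block, such as ETB0
```

This matches the earlier outputs: the file with 7 extents no longer fits in the inode and therefore needs the separate block ETB0.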
Next, using the structure diagram of the extent tree (address) and the hexadecimal dump of ETB0, we look at the structure of an extent node:
eh_depth (purple): the depth of this node in the extent tree. A value of zero means that the node is a leaf node.
ee_block (red): the first logical (file) block number covered by the extent
ee_len (green): the number of blocks covered by the extent. The documentation states that a value > 32768 indicates that the extent is uninitialized (to be studied further).
Because the blocks covered by an extent are contiguous, only the number of the first physical block needs to be recorded:
ee_start_hi (blue): the upper 16 bits of the physical block number
ee_start_lo (black): the lower 32 bits of the physical block number (corresponding to 34816, 67584, 100352, ... in the output above)
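The fields above can be decoded with the struct module. The following is a sketch under the layout given in the ext4 documentation (12-byte header with magic number 0xF30A, followed by 12-byte extents, all little-endian). The sample node is synthetic: it is built here from the first two extents of the ~900MB file rather than read from disk, and the uninitialized-extent case (ee_len > 32768) is not handled.

```python
import struct

EXT4_EXT_MAGIC = 0xF30A
HEADER = struct.Struct("<HHHHI")  # eh_magic, eh_entries, eh_max, eh_depth, eh_generation
EXTENT = struct.Struct("<IHHI")   # ee_block, ee_len, ee_start_hi, ee_start_lo


def parse_leaf(node: bytes) -> list:
    """Return [(ee_block, ee_len, physical_start), ...] for a leaf node."""
    magic, entries, _max, depth, _generation = HEADER.unpack_from(node, 0)
    if magic != EXT4_EXT_MAGIC:
        raise ValueError("not an extent tree node")
    if depth != 0:
        raise ValueError("interior node: entries are ext4_extent_idx, not ext4_extent")
    extents = []
    for i in range(entries):
        ee_block, ee_len, start_hi, start_lo = EXTENT.unpack_from(
            node, HEADER.size + i * EXTENT.size)
        extents.append((ee_block, ee_len, (start_hi << 32) | start_lo))
    return extents


# Synthetic leaf node holding the first two extents of the large file above.
node = (HEADER.pack(EXT4_EXT_MAGIC, 2, 340, 0, 0)
        + EXTENT.pack(0, 32768, 0, 34816)
        + EXTENT.pack(32768, 30720, 0, 67584))

for ee_block, ee_len, start in parse_leaf(node):
    print(f"({ee_block}-{ee_block + ee_len - 1}):{start}-{start + ee_len - 1}")
```

The loop prints the extents in the same form that debugfs uses in its EXTENTS line.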
Analyze folder structure
-- Output results --
Inode: 414868 Type: directory Mode: 0775 Flags: 0x80000
Generation: 1424482529 Version: 0x00000000:00000030
User: 1000 Group: 1000 Project: 0 Size: 4096
File ACL: 0
Links: 2 Blockcount: 8
Fragment: Address: 0 Number: 0 Size: 0
ctime: 0x6143e6b3:16f62484 -- Thu Sep 16 17:52:03 2021
atime: 0x6143f20c:9b797f04 -- Thu Sep 16 18:40:28 2021
mtime: 0x6143e6b3:16f62484 -- Thu Sep 16 17:52:03 2021
crtime: 0x604cb298:c1345480 -- Sat Mar 13 04:39:52 2021
Size of extra inode fields: 32
Inode checksum: 0x89eda6de
EXTENTS:
(0):1584381
-- Output results --
Use the following command to view the contents of the block:
sudo dd if=/dev/sda5 bs=4096 count=1 skip=1584381 | hexdump -C
In Linux, the essence of a folder is the mapping of the file name to the Inode node:
Directory is more or less a flat file that maps an arbitrary byte string (usually ASCII) to an inode number on the filesystem.
Note also that the Ext4 file system has two kinds of directory structure: (1) linear (classic) directories and (2) hash tree directories. For the latter, the inode of the folder has a flag with the value 0x1000 set.
Next, following the references, we analyze the contents of block 1584381. The Flags value above is 0x80000, in which the 0x1000 bit is not set, indicating that this is the first (linear) type of folder structure.
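Checking these flags is a simple bit test. A small sketch: EXT4_EXTENTS_FL is the name shown in the debugfs output earlier, while EXT4_INDEX_FL is the kernel's name for the 0x1000 flag; the helper function itself is made up for this example.

```python
EXT4_INDEX_FL = 0x1000      # directory uses a hash tree
EXT4_EXTENTS_FL = 0x80000   # inode uses extents


def describe_flags(flags: int) -> dict:
    """Decode the two inode flags discussed in this article."""
    return {
        "uses_extents": bool(flags & EXT4_EXTENTS_FL),
        "hash_tree_directory": bool(flags & EXT4_INDEX_FL),
    }


print(describe_flags(0x80000))  # the folder inspected above
print(describe_flags(0x81000))  # what a hash tree directory would look like
```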
inode (red): the inode number of the file
rec_len (green): the length of this directory entry (for example, the first entry occupies 12 bytes in total, which is stored as 0c 00).
name_len (blue): file name length
file_type (yellow): file type
??? (pink): unknown content. The part in front of it is the file name "1.txt". Since name_len is 5, the pink bytes are not part of the file name; and since rec_len is 16, they still belong to the directory entry of "1.txt": the 8 bytes of the fields above plus the 5-byte name make 13 bytes, and the entry is padded to 16. I speculate that these bytes are simply leftover padding that was never zeroed.
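A linear directory block can be walked entry by entry using rec_len. The sketch below assumes the entry layout described above (4-byte inode, 2-byte rec_len, 1-byte name_len, 1-byte file_type, then the name, all little-endian). The sample block is synthetic: the inode number used for ".." and the 0xAA filler bytes are invented for illustration, and placing 1.txt (inode 398584) in this folder is an assumption. In a real block the last entry's rec_len would extend to the end of the block.

```python
import struct

ENTRY_HEAD = struct.Struct("<IHBB")  # inode, rec_len, name_len, file_type (8 bytes)


def min_rec_len(name_len: int) -> int:
    """Smallest valid rec_len: header plus name, rounded up to a multiple of 4."""
    return (ENTRY_HEAD.size + name_len + 3) & ~3


def pack_entry(inode: int, name: bytes, file_type: int, filler: bytes = b"\x00") -> bytes:
    """Build one entry, padding up to rec_len with a filler byte."""
    rec_len = min_rec_len(len(name))
    body = ENTRY_HEAD.pack(inode, rec_len, len(name), file_type) + name
    return body + filler * (rec_len - len(body))


def parse_entries(block: bytes) -> list:
    """Return [(inode, rec_len, name, file_type), ...] for a linear directory block."""
    entries, offset = [], 0
    while offset + ENTRY_HEAD.size <= len(block):
        inode, rec_len, name_len, file_type = ENTRY_HEAD.unpack_from(block, offset)
        if rec_len < ENTRY_HEAD.size:   # corrupt or empty data: stop walking
            break
        if inode != 0:                  # inode 0 marks an unused entry
            start = offset + ENTRY_HEAD.size
            name = block[start:start + name_len]   # padding bytes are never read
            entries.append((inode, rec_len, name.decode("ascii", "replace"), file_type))
        offset += rec_len
    return entries


# Synthetic block: file_type 2 = directory, 1 = regular file.
block = (pack_entry(414868, b".", 2)
         + pack_entry(414000, b"..", 2)               # hypothetical parent inode
         + pack_entry(398584, b"1.txt", 1, b"\xaa"))  # non-zero padding, like the pink bytes

for entry in parse_entries(block):
    print(entry)
```

Because only name_len bytes are read, the non-zero padding after "1.txt" has no effect on the parsed name.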
Contents to be supplemented
- Changes to the parent folder of a file after it is deleted
- Structural analysis of HashTreeDirectories
- Analysis of the journal structure