How to find duplicate files on your system and quickly free up disk space?

Whether you use a Windows computer or a Linux computer, duplicate files accumulate as the system is used. These files not only take up disk space but can also slow the system down, so it is worth hunting them down and removing them.

This article introduces six ways to find duplicate files on the system, so that you can quickly free up hard disk space.

1. Use the diff command to compare files

In day-to-day work, the easiest way to compare two files is probably the diff command. Its output uses the < and > symbols to mark the differences between the two files, and with this feature we can also tell when two files are identical.

When the two files differ, the diff command prints the differences:

$ diff index.html backup.html
2438a2439,2441
> <pre>
> That's all there is to report.
> </pre>

If diff produces no output, the two files are identical:

$ diff home.html index.html
$

However, the disadvantage of diff is that it can only compare two files at a time. If we need to check many files, comparing them two by two quickly becomes very inefficient. A small loop can at least automate the pairwise comparisons, as sketched below.
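Below is a minimal sketch (my addition, not a command from the article) that automates the pairwise comparisons with a bash loop. It uses diff -q, which only reports whether two files differ, and it assumes the *.html pattern and filenames without spaces:

$ files=(*.html)                      # example pattern; adjust to the files you want to check
$ for ((i = 0; i < ${#files[@]}; i++)); do
>   for ((j = i + 1; j < ${#files[@]}; j++)); do
>     # diff -q exits 0 when the files are identical; discard its "differ" messages
>     diff -q "${files[i]}" "${files[j]}" > /dev/null &&
>       echo "${files[i]} and ${files[j]} are identical"
>   done
> done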

2. Use checksums

The checksum command, cksum, reduces the content of a file to numbers according to a fixed algorithm: it prints a CRC checksum followed by the file's size in bytes (for example, 2819078353 228029). The checksum is not absolutely unique, but the chance of two different files producing the same checksum is about as likely as China's men's football team reaching the World Cup.

$ cksum *.html
2819078353 228029 backup.html
4073570409 227985 home.html
4073570409 227985 index.html

In the output above, the checksums (and sizes) of the second and third files are identical, so we can conclude that the two files have the same content.
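If a directory contains many files, sorting the cksum output numerically brings identical checksums onto adjacent lines, which makes duplicates much easier to spot. This is just a convenience of my own, not part of the article's method:

$ cksum * | sort -n        # identical checksums now appear on adjacent lines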

3. Use the find command

Although the find command has no built-in option for finding duplicate files, it can search for files by name or type and run the cksum command on each match. The specific operation is as follows.

$ find . -name "*.html" -exec cksum {} \;
4073570409 227985 ./home.html
2819078353 228029 ./backup.html
4073570409 227985 ./index.html
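Building on the previous section, the checksum output can also be filtered so that duplicates are reported automatically rather than compared by eye. The pipeline below is a sketch of mine (not a command from the article), and the awk one-liner assumes filenames without spaces:

$ # after sorting, report any file whose checksum has already been seen
$ find . -name "*.html" -exec cksum {} + | sort -n |
>     awk 'seen[$1]++ { print $3, "has the same checksum as", first[$1] } { if (!first[$1]) first[$1] = $3 }'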

4. Use the fslint command

The fslint toolkit can be used specifically to find duplicate files, but note one caveat: we need to give it a directory to start from. If that directory contains a large number of files, the command may take quite a while to finish.

$ fslint .
-----------------------------------file name lint
-------------------------------Invalid utf8 names
-----------------------------------file case lint
----------------------------------DUPlicate files	<==
home.html
index.html
-----------------------------------Dangling links
--------------------redundant characters in links
------------------------------------suspect links
--------------------------------Empty Directories
./.gnupg
----------------------------------Temporary Files
----------------------duplicate/conflicting Names
------------------------------------------Bad ids
-------------------------Non Stripped executables

Tip: fslint must be installed on the system, and its directory added to the search path:

$ export PATH=$PATH:/usr/share/fslint/fslint
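fslint is really a collection of scripts, and its duplicate finder, findup, can be run on its own if duplicates are all you care about. A sketch, assuming a standard fslint installation (the package was available via apt on older Debian/Ubuntu releases; newer releases no longer ship it):

$ sudo apt-get install fslint      # older Debian/Ubuntu releases only
$ findup /home/alvin               # findup is fslint's stand-alone duplicate finder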

5. Use the rdfind command

The rdfind command also looks for duplicate (identical-content) files. Its name stands for "redundant data find". The command works out which copy it considers the original, which matters when we later tell it to delete duplicates: the original is kept and the other copies are removed.

$ rdfind ~
Now scanning "/home/alvin", found 12 files.
Now have 12 files in total.
Removed 1 files due to nonunique device and inode.
Total size is 699498 bytes or 683 KiB
Removed 9 files due to unique sizes from list.2 files left.
Now eliminating candidates based on first bytes:removed 0 files from list.2 files left.
Now eliminating candidates based on last bytes:removed 0 files from list.2 files left.
Now eliminating candidates based on sha1 checksum:removed 0 files from list.2 files left.
It seems like you have 2 files that are not unique
Totally, 223 KiB can be reduced.
Now making results file results.txt

We can also run it in dry-run mode, which reports what would be done without changing anything:

$ rdfind -dryrun true ~
(DRYRUN MODE) Now scanning "/home/alvin", found 12 files.
(DRYRUN MODE) Now have 12 files in total.
(DRYRUN MODE) Removed 1 files due to nonunique device and inode.
(DRYRUN MODE) Total size is 699352 bytes or 683 KiB
Removed 9 files due to unique sizes from list.2 files left.
(DRYRUN MODE) Now eliminating candidates based on first bytes:removed 0 files from list.2 files left.
(DRYRUN MODE) Now eliminating candidates based on last bytes:removed 0 files from list.2 files left.
(DRYRUN MODE) Now eliminating candidates based on sha1 checksum:removed 0 files from list.2 files left.
(DRYRUN MODE) It seems like you have 2 files that are not unique
(DRYRUN MODE) Totally, 223 KiB can be reduced.
(DRYRUN MODE) Now making results file results.txt

The rdfind command also provides options such as -ignoreempty (ignore empty files) and -followsymlinks (follow symbolic links). Its common options are explained in the table below.

Option               Significance
-ignoreempty         Ignore empty files
-minsize             Ignore files smaller than a given size
-followsymlinks      Follow symbolic links
-removeidentinode    Exclude files that refer to the same inode (hard links to the same data)
-checksum            Choose which checksum type to use
-deterministic       Decide how files are sorted
-makesymlinks        Replace duplicate files with symbolic links
-makehardlinks       Replace duplicate files with hard links
-makeresultsfile     Create a results file in the current directory
-outputname          Set the name of the results file
-deleteduplicates    Delete (unlink) duplicate files
-sleep               Set the sleep time between reading files (milliseconds)
-n, -dryrun          Show what would be done without actually doing it

Note that rdfind provides the -deleteduplicates true option to delete duplicate files. As the name implies, using this option removes the duplicates automatically.

$ rdfind -deleteduplicates true .
...
Deleted 1 files.	<==

Of course, the premise is that rdfind must be installed on the system first.
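If the goal is to reclaim space without deleting anything, the -makehardlinks option from the table above replaces each duplicate with a hard link to the surviving original. A sketch, using a hypothetical ~/Documents directory; running with -dryrun true first previews the changes:

$ rdfind -dryrun true -makehardlinks true ~/Documents    # preview only
$ rdfind -makehardlinks true ~/Documents                 # replace duplicates with hard links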

6. Use the fdupes command

The fdupes command can also identify duplicate files easily and provides a number of useful options. In its simplest form, it groups duplicate files together, as follows:

$ fdupes ~
/home/alvin/UPGRADE
/home/alvin/mytwin

/home/alvin/lp.txt
/home/alvin/lp.man

/home/alvin/penguin.png
/home/alvin/penguin0.png
/home/alvin/hideme.png

The -r option stands for recursion, meaning fdupes will descend into every subdirectory looking for duplicate files. Be careful, though: a Linux system legitimately contains many duplicate files (such as users' .bashrc and .profile files), and deleting those can break users' environments, so review the results before removing anything. (A summary-only run is shown after the listing below.)

# fdupes -r /home
/home/shark/home.html
/home/shark/index.html

/home/dory/.bashrc
/home/eel/.bashrc

/home/nemo/.profile
/home/dory/.profile
/home/shark/.profile

/home/nemo/tryme
/home/shs/tryme

/home/shs/arrow.png
/home/shs/PNGs/arrow.png

/home/shs/11/files_11.zip
/home/shs/ERIC/file_11.zip

/home/shs/penguin0.jpg
/home/shs/PNGs/penguin.jpg
/home/shs/PNGs/penguin0.jpg

/home/shs/Sandra_rotated.png
/home/shs/PNGs/Sandra_rotated.png
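If you only want to know how many duplicates there are and how much space they occupy, rather than list them all, the -m / --summarize option (see the table below) prints a short summary. A sketch using the same /home directory:

# fdupes -r -m /home        # summary only: counts and total size of duplicates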

The common options for the fdupes command are shown in the following table:

Option               Significance
-r --recurse         Search subdirectories recursively
-R --recurse:        Recurse only into the directories given after this option
-s --symlinks        Follow symlinked directories
-H --hardlinks       Treat hard-linked files as duplicates
-n --noempty         Ignore empty files
-f --omitfirst       Omit the first file in each set of matches
-A --nohidden        Ignore hidden files
-1 --sameline        List each set of matches on a single line
-S --size            Show the size of duplicate files
-m --summarize       Summarize duplicate file information
-q --quiet           Hide the progress indicator
-d --delete          Prompt the user for which files to keep, deleting the others
-N --noprompt        Used with --delete: keep the first file in each set and delete the rest without prompting
-I --immediate       Delete duplicates as they are encountered
-p --permissions     Do not treat files with different owner/group or permission bits as duplicates
-o --order=WORD      Order files according to the specification WORD
-i --reverse         Reverse the sort order
-v --version         Show the fdupes version
-h --help            Display help
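Putting a few of these options together, the sketches below (the ~/Pictures directory is just an example of mine) first list duplicates recursively with their sizes, then rerun the search with an interactive prompt before anything is deleted:

$ fdupes -r -S ~/Pictures        # list duplicates with sizes
$ fdupes -r -d ~/Pictures        # prompt for which copy to keep in each set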

Summary

Linux provides us with many tools for locating and deleting duplicate files. With them, we can quickly find duplicates on disk and remove them to free up space. I hope this article helps you.

-----------------

I am a development engineer and Linux evangelist at a Fortune Global 500 company. You are welcome to follow my official account, "good Linux", for technical articles and exclusive resources.

If you are interested in my content, you can also follow my blog: lxlinux.net
