Corrupted USB files… a new hope.

Star Wars USBs – not for sale (because I don’t sell them)

So the corrupted USB files in question have been found and restored. Yes, deleted USB files can be recovered. Awesome! But, the question is: how?

The last post raised a few questions, such as:

  • Didn’t I delete the files? How come they’re still there?
  • How did these programs recover deleted data?
  • How do programs know just from the CHK files which files are which?
  • Or simply, what is a file system?

Let’s explore three topics that will aid us in answering these intriguing questions:

  1. Logical Structure of a Disk
  2. Windows File Systems
  3. Encoding Standards and Hex Editors

1. The anatomy of a disk

A disk is hardware, but the computer can only use it if a logical structure is present. That structure organizes data into files and directories arranged as a tree, and it also helps the computer resolve disk-related problems.

This logical structure is known as the file system. A file system divides the disk into clusters, the smallest unit of storage it can allocate to a file (each cluster is made up of one or more sectors, the smallest unit the hardware itself can address). The disk is divided this way so that it can be used efficiently. Files that we access, such as documents, pictures, and audio, are each assigned a number of clusters. The cluster size can be chosen manually when formatting, but the default depends on the file system and the size of the volume. Cluster sizes typically range from 512 bytes up to 64 KB, with 4096 bytes being a common default.
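To make the cluster idea concrete, here is a small sketch of the arithmetic. The function names and the 10,000-byte example file are made up for illustration; the point is only that a file always occupies a whole number of clusters, so the last one is usually partly empty.

```python
def clusters_needed(file_size: int, cluster_size: int) -> int:
    """Number of whole clusters a file of file_size bytes occupies."""
    # Integer ceiling division: a partly filled cluster still counts as one.
    return (file_size + cluster_size - 1) // cluster_size


def slack_bytes(file_size: int, cluster_size: int) -> int:
    """Bytes allocated to the file but not used by its contents."""
    return clusters_needed(file_size, cluster_size) * cluster_size - file_size


# Hypothetical example: a 10,000-byte document on a volume with 4096-byte clusters.
print(clusters_needed(10_000, 4096))  # 3
print(slack_bytes(10_000, 4096))      # 2288
```

The unused bytes at the end of the last cluster are what the "slack space" topic at the end of this post refers to.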

Sometimes, these clusters can get lost. This is usually a FAT error (FAT stands for File Allocation Table, a family of file systems from Microsoft, and no, it is not a fat joke): the file allocation table marks clusters as being in use, but no file actually refers to them. To find these lost clusters, you can run the CHKDSK command.
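Here is a rough sketch of what "looking for lost clusters" could mean, using a toy allocation table held in a Python list. This is not how CHKDSK is implemented (I don't know its internals); the table contents and function names are invented, and the special values are the FAT16 ones.

```python
FREE = 0x0000
BAD = 0xFFF7
END_MIN = 0xFFF8  # FAT16 values from 0xFFF8 to 0xFFFF mean "last cluster of a file"


def follow_chain(fat, start):
    """Return the clusters of the chain that begins at cluster `start`."""
    chain = []
    cluster = start
    # Clusters 0 and 1 are reserved; stop on anything out of range or on a loop.
    while 2 <= cluster < len(fat) and cluster not in chain:
        chain.append(cluster)
        cluster = fat[cluster]
    return chain


def find_lost_clusters(fat, file_start_clusters):
    """Clusters marked as in use that no known file chain reaches."""
    referenced = set()
    for start in file_start_clusters:
        referenced.update(follow_chain(fat, start))
    return [c for c in range(2, len(fat))
            if fat[c] not in (FREE, BAD) and c not in referenced]


# Toy table: one real file occupies clusters 2 -> 3 -> 4.
# Clusters 5 -> 6 are marked in use, but no directory entry points at them.
fat = [0xFFF8, 0xFFFF, 3, 4, 0xFFFF, 6, 0xFFFF, FREE, FREE]
print(find_lost_clusters(fat, file_start_clusters=[2]))  # [5, 6]
```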

The above screenshot displays the results of the CHKDSK command. We can see that no lost clusters were found. If there were any, CHKDSK could repair them when run with the /F switch. Awesome! The three most popular originators of file systems are Windows, Linux, and Mac OS X. Each of these has its own “line” of file systems, but they all follow the principles that we have highlighted so far.

Now, let’s delve deeper into a specific file system created by Microsoft and commonly used in flash drives: FAT.

2. Windows File Systems

Windows has three main file systems:

  • FAT
  • FAT32
  • NTFS

NTFS is the more capable successor to FAT, and it is widely used. However, because of its simple design and folder structure, FAT is ideal for thumb drives.

To understand how the corrupted files were recovered, we have to understand the structure of the FAT file system.

The structure of a FAT volume

Great, new terms. The FAT file system is pretty straightforward. Its purpose is just to store data and know where to find it later. A FAT volume starts with the boot sector, then two copies of the file allocation table (FAT), then the root directory, and finally any other directories and all file data.
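As a sketch of how that layout can be worked out, the snippet below reads a few fields from a boot sector and computes where each region starts. It assumes a FAT12/FAT16 volume (FAT32 keeps its root directory in the data area instead), and the boot sector here is synthetic, filled with made-up values rather than read from a real drive.

```python
import struct

# BIOS Parameter Block fields stored at byte offset 11 of the boot sector
# (little-endian): bytes per sector, sectors per cluster, reserved sectors,
# number of FATs, root directory entries, total sectors, media type, sectors per FAT.
BPB_FORMAT = "<HBHBHHBH"


def fat16_layout(boot_sector: bytes) -> dict:
    """Return the starting sector of each region of a FAT12/FAT16 volume."""
    (bytes_per_sector, _sectors_per_cluster, reserved_sectors, num_fats,
     root_entries, _total_sectors, _media, sectors_per_fat) = struct.unpack_from(
        BPB_FORMAT, boot_sector, 11)
    fat_sectors = [reserved_sectors + i * sectors_per_fat for i in range(num_fats)]
    root_dir_sector = reserved_sectors + num_fats * sectors_per_fat
    # Each directory entry is 32 bytes; round up to whole sectors.
    root_dir_sectors = (root_entries * 32 + bytes_per_sector - 1) // bytes_per_sector
    return {
        "fat_sectors": fat_sectors,
        "root_dir_sector": root_dir_sector,
        "data_sector": root_dir_sector + root_dir_sectors,
    }


# Synthetic boot sector with invented values (not taken from a real drive).
boot = bytearray(512)
struct.pack_into(BPB_FORMAT, boot, 11, 512, 4, 1, 2, 512, 40000, 0xF8, 40)
print(fat16_layout(bytes(boot)))
# {'fat_sectors': [1, 41], 'root_dir_sector': 81, 'data_sector': 113}
```

To try this on a real thumb drive you would read the first 512 bytes of the volume instead of building `boot` by hand.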

Boot Sector: This initial part of the volume is very important to the computer. It contains executable code and the data required by said code, and also includes information about the file system and the volume itself.

File Allocation Table (1 + 2): This table has one entry for each cluster in the volume and labels it, in a sense. The labels are:

◉ Unused (0x0000)
◉ Cluster in use by a file (the entry holds the number of the file’s next cluster)
◉ Bad cluster (0xFFF7)
◉ Last cluster in a file (0xFF8-0xFFF for FAT12; 0xFFF8-0xFFFF for FAT16; 0x0FFFFFF8-0x0FFFFFFF for FAT32)

The unused and bad-cluster values shown are the FAT16 forms. This table is so important that two copies are made of it.

NOTE: Don’t worry about those values following each label. It will all make sense in a minute.

Root Directory: A directory at the start of the volume that stores directory entries.

Other Directories and All Files: The directory entries are 32 bytes and store a file’s name, size, starting cluster and time stamp (last-accessed, created and so on) information.
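To show what fits in those 32 bytes, here is a minimal sketch that decodes one short (8.3) directory entry. The sample entry is built by hand with invented values, and the parser ignores long file names and only decodes the last-modified timestamp.

```python
import struct
from datetime import datetime

# 32-byte entry: name(8) ext(3) attributes, reserved, creation tenths (3 x 1 byte),
# then creation time, creation date, last-access date, high cluster word,
# write time, write date, low cluster word (7 x 2 bytes), file size (4 bytes).
ENTRY_FORMAT = "<8s3s3B7HI"


def parse_entry(raw: bytes) -> dict:
    (name, ext, _attr, _reserved, _tenths, _ctime, _cdate, _adate,
     cluster_hi, wtime, wdate, cluster_lo, size) = struct.unpack(ENTRY_FORMAT, raw)
    name = name.decode("ascii").rstrip()
    ext = ext.decode("ascii").rstrip()
    # Dates pack year-1980, month, day; times pack hour, minute, seconds/2.
    modified = datetime(1980 + (wdate >> 9), (wdate >> 5) & 0x0F, wdate & 0x1F,
                        wtime >> 11, (wtime >> 5) & 0x3F, (wtime & 0x1F) * 2)
    return {
        "name": f"{name}.{ext}" if ext else name,
        "size": size,
        "start_cluster": (cluster_hi << 16) | cluster_lo,  # high word is 0 on FAT12/16
        "modified": modified,
    }


# Hand-built sample entry (hypothetical file, invented values).
write_date = ((2020 - 1980) << 9) | (3 << 5) | 15   # 2020-03-15
write_time = (10 << 11) | (30 << 5) | 0              # 10:30:00
sample = struct.pack(ENTRY_FORMAT, b"REPORT  ", b"PDF", 0x20, 0, 0,
                     0, 0, 0, 0, write_time, write_date, 5, 10000)
print(parse_entry(sample))
```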

The FAT file system does not plan where files land on the volume, so files are given clusters on a first-come, first-served basis. Because of this, the entries in the file allocation table can link many clusters scattered throughout the volume. It works like a chain: a file’s directory entry points to its first cluster, that cluster’s table entry points to the next cluster somewhere else, and so on. These linked entries are known as file-allocation chains.
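A toy simulation makes the scattering easier to see. The allocator below simply hands out the lowest-numbered free clusters, which is a simplification and not a claim about how any real FAT driver picks clusters; all names and sizes are invented.

```python
FREE = 0x0000
END_OF_CHAIN = 0xFFFF  # one of the FAT16 "last cluster" values


def allocate(fat, n_clusters):
    """Claim the first n free clusters, link them into a chain, return the chain."""
    if n_clusters < 1:
        raise ValueError("need at least one cluster")
    free = [c for c in range(2, len(fat)) if fat[c] == FREE][:n_clusters]
    if len(free) < n_clusters:
        raise ValueError("not enough free clusters")
    for current, following in zip(free, free[1:]):
        fat[current] = following          # each entry points to the next cluster
    fat[free[-1]] = END_OF_CHAIN
    return free


def release(fat, chain):
    """Mark every cluster of a chain as free again."""
    for cluster in chain:
        fat[cluster] = FREE


def walk(fat, start):
    """Follow a chain from its first cluster to its end marker."""
    chain = [start]
    while fat[chain[-1]] < 0xFFF8:
        chain.append(fat[chain[-1]])
    return chain


fat = [0xFFF8, 0xFFFF] + [FREE] * 10     # clusters 0 and 1 are reserved
file_a = allocate(fat, 2)                # gets clusters 2, 3
file_b = allocate(fat, 3)                # gets clusters 4, 5, 6
release(fat, file_a)                     # file A is deleted
file_c = allocate(fat, 4)                # fills the gap, then jumps past file B
print(walk(fat, file_c[0]))              # [2, 3, 7, 8]
```

File C ends up split around file B, which is the fragmentation the chain has to stitch back together.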

This design is like filling the seats at a theater without prior reservation. Those who come early get to sit in the front row, followed by many more. Some people sit alone, and others sit in a group. The group may be split; it all depends on when they came in.

There is much more to the architecture of FAT (low-level processes, interactions, etc.), but this will suffice 🙂 Now let’s see what this has to do with recovering our lost USB files.

Our friends at Microsoft helped us out with the pictures and provide a comprehensive look at the FAT file system. I highly recommend reading their articles! Check out these links:

https://social.technet.microsoft.com/wiki/contents/articles/6771.the-fat-file-system.aspx

https://learn.microsoft.com/en-us/previous-versions/windows/it-pro/windows-server-2003/cc776720(v=ws.10)

3. Encoding Standards and Hex Editors

Great! We have deconstructed the FAT file system. But what does it look like in hex? Probably something like this:

The beginning of a FAT partition in hex.

This is just a snippet of what an actual FAT partition looks like. Hex editors such as HxD let us inspect the contents of files and partitions in their rawest form, byte by byte. At first glance, though, it’s not much help. It’s overwhelming! Although it may seem like a whole lot of nothing, these hexadecimal values make up each part of the FAT file system we’ve talked about: the boot sector, both file allocation tables, the root directory and all other directories and files. Remember the values next to the cluster “labels” in the table above? Those values are here, inside the file allocation tables! They mark the last cluster of a file (0x0FFFFFF8-0x0FFFFFFF on FAT32), or that a cluster holds nothing at all (0x0000).
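If you don't have a hex editor handy, a few lines of Python can produce a similar view. This is only a sketch of the usual offset / hex / text layout, not a reimplementation of HxD, and the sample bytes are made up to resemble the start of a FAT boot sector.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Format bytes as 'offset  hex bytes  printable text', one row per `width` bytes."""
    pad = width * 3 - 1                     # width of a full row of "XX XX ..." text
    rows = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        # Show printable ASCII as-is and everything else as a dot.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        rows.append(f"{offset:08X}  {hex_part:<{pad}}  {text_part}")
    return "\n".join(rows)


# Made-up sample: a jump instruction, an OEM name, and a few parameter bytes.
sample = b"\xEB\x3C\x90MSDOS5.0\x00\x02\x04\x01\x00"
print(hexdump(sample))
# 00000000  EB 3C 90 4D 53 44 4F 53 35 2E 30 00 02 04 01 00  .<.MSDOS5.0.....
```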

Want to learn how to read hex? There’s a great post on the blog of Asanka P. Sayakkara, Lecturer at the University of Colombo School of Computing (UCSC), Sri Lanka. And no, his is not a rival blog. He explores a FAT32 partition in its entirety, and even includes a copy of the disk image you can download to follow along in your hex editor of choice. Find his post here!

Alright, we’re this close to unlocking the mystery of how our deleted files were recovered. First, a recap. We understand that all drives have a file system, and in the case of thumb drives, it’s FAT. We have taken a semi-in-depth look at the architecture of FAT. And we understand that the hex values have special meanings. In fact, they are what distinguish the four main parts of a FAT volume.

Now here’s the truth. When you delete a file, whether it be on your phone, on a thumb drive, in the computer, or even in the cloud, that data is usually not erased right away. The space it occupies has simply been flagged as available. Until something new is written over it, the information is still there.
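On a FAT volume specifically, "flagged" means two small changes: the first byte of the file's directory entry is replaced with 0xE5, and the file's entries in the allocation table are set back to free. The sketch below mimics that on a toy volume (a hand-built entry, a list for the table, a dict for the cluster contents, all invented) to show that the data clusters themselves are never touched.

```python
DELETED_MARKER = 0xE5
FREE = 0x0000


def delete_file(entry: bytearray, fat: list, start_cluster: int) -> None:
    """Mimic a FAT delete: flag the directory entry and free the cluster chain."""
    entry[0] = DELETED_MARKER             # first character of the name is overwritten
    cluster = start_cluster
    while 2 <= cluster < len(fat):        # end-of-chain values fall outside this range
        following = fat[cluster]
        fat[cluster] = FREE
        cluster = following


# Toy volume: NOTES.TXT lives in clusters 2 -> 3.
entry = bytearray(b"NOTES   TXT" + bytes(21))     # 32-byte directory entry
fat = [0xFFF8, 0xFFFF, 3, 0xFFFF, FREE]
clusters = {2: b"Hello, ", 3: b"world!"}          # stands in for the data area

delete_file(entry, fat, start_cluster=2)

print(hex(entry[0]))                  # 0xe5
print(fat[2], fat[3])                 # 0 0
print(clusters[2] + clusters[3])      # b'Hello, world!'
```

The rest of the directory entry (remaining name characters, size, starting cluster) is also left in place, which gives recovery tools a lot to work with.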

A SIDE NOTE… in the field of digital forensics, this is the fundamental truth that gives experts hope that they will find incriminating evidence on a device 🙂

So, as long as we didn’t write anything new on the flash drive, then yes! There is a chance to recover ALL of our lost files.

My theory on the deCHK tool is that it scans through the whole drive, looking for specific markers that identify the start of a file and even its type: pdf, doc, jpg, mp3, mp4, exe and many more. Once it identifies a file, it brings all the clusters related to that file together and saves them under the correct file type.
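Here is a rough sketch of that theory in code. To be clear, this is my guess at the general technique (often called file carving), not the actual deCHK implementation. It makes two big assumptions: every file starts on a cluster boundary, and every file runs in one unbroken stretch until the next recognized start. The three signatures and the tiny 16-byte "clusters" are just for the demo.

```python
SIGNATURES = {
    b"\xFF\xD8": "jpg",
    b"%PDF": "pdf",
    b"\x89PNG": "png",
}


def find_file_starts(image: bytes, cluster_size: int):
    """Return (offset, extension) for every cluster that begins with a known marker."""
    starts = []
    for offset in range(0, len(image), cluster_size):
        for magic, ext in SIGNATURES.items():
            if image.startswith(magic, offset):
                starts.append((offset, ext))
                break
    return starts


def carve(image: bytes, cluster_size: int):
    """Cut the image into (extension, data) pieces, one per detected file start."""
    starts = find_file_starts(image, cluster_size)
    carved = []
    for i, (offset, ext) in enumerate(starts):
        # Assumption: the file continues until the next detected start.
        end = starts[i + 1][0] if i + 1 < len(starts) else len(image)
        carved.append((ext, image[offset:end]))
    return carved


CLUSTER = 16  # unrealistically small, to keep the demo readable
image = (b"%PDF-1.4 pg1".ljust(CLUSTER, b"\x00")
         + b"...pg2 %%EOF".ljust(CLUSTER, b"\x00")
         + b"\xFF\xD8\xFF\xE0 photo".ljust(CLUSTER, b"\x00"))

for ext, data in carve(image, CLUSTER):
    print(ext, len(data))
# pdf 32
# jpg 16
```

A short marker like FF D8 can also show up by chance in the middle of other data, so a real tool would need extra checks before trusting a match.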

Each file type has its own special marker, or first bytes (often called a file signature or magic number). Below is a table of some file types and their first bytes.

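File type   First bytes (hex)
jpeg        FF D8
bmp         42 4D
gif         47 49 46
png         89 50 4E
pdf         25 50 44 46
doc         D0 CF 11 E0 A1 B1 1A E1
pptx        50 4B 03 04 14 00 06 00
jnt         4E 42 2A 00
epub        50 4B 03 04
zip         50 4B 03 04
rar         52 61 72 21 1A 07
wmv         30 26 B2 75 8E 66 CF 11
flv         46 4C 56 01
mp4         66 74 79 70
avi         41 56 49 20
mp3         49 44 33 03
aiff        41 49 46 46
wav         57 41 56 45

For mp4, avi, aiff, and wav, the marker shown sits a few bytes into the file (offset 4 for mp4; offset 8 for avi, aiff, and wav) rather than at the very start.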
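As a small illustration of how a tool might name a recovered blob (say, one of those CHK files), the sketch below checks the first bytes against a handful of the signatures above. The function name and the sample inputs are invented. Some markers sit a few bytes into the file rather than at the very start, so each signature carries an offset.

```python
# (extension, offset of the marker within the file, marker bytes)
SIGNATURES = [
    ("jpg", 0, bytes.fromhex("FFD8")),
    ("png", 0, bytes.fromhex("89504E")),
    ("pdf", 0, bytes.fromhex("25504446")),      # "%PDF"
    ("zip", 0, bytes.fromhex("504B0304")),      # also the start of pptx and epub
    ("mp4", 4, b"ftyp"),
    ("avi", 8, b"AVI "),
    ("wav", 8, b"WAVE"),
]


def guess_extension(data: bytes):
    """Return the first extension whose marker matches, or None if nothing matches."""
    for ext, offset, magic in SIGNATURES:
        if data[offset:offset + len(magic)] == magic:
            return ext
    return None


print(guess_extension(b"%PDF-1.7\n"))                       # pdf
print(guess_extension(b"RIFF\x24\x00\x00\x00WAVEfmt "))     # wav
print(guess_extension(b"just some text"))                   # None
```

Because pptx, epub, and zip all begin with the same four bytes, a check like this can only say "zip-like"; telling them apart means looking inside the archive.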

Once the program runs and does its thing, you will have recovered all of your supposedly lost files in no time. Awesome!

Some videos that helped me understand this concept further are on YouTube. Check these out for additional content.

Aftertaste

We know the theory, and we put it into practice. Thanks to our study of the technicalities of file systems, we have arrived at sound conclusions on how to safely recover lost files and why it works. Wicked. With a bit of theory, commands, and free tools, we have achieved the same results as a tool that costs MONEY, so it’s worth it.

If you enjoyed this brief read, I suggest exploring other topics to further deepen your knowledge on hard disks and file systems, such as:

  • HDD tracks and sectors
  • Data density
  • SSD (NAND flash memory, controller, DRAM, and host interface)
  • Master Boot Record (MBR) and GUID Partition Table (GPT)
  • BIOS Parameter Block (BPB)
  • Slack space
  • Windows, Mac, and Linux Booting Process
  • Linux file systems (ext, ext2, ext3, Xia, MS-DOS, VFAT, NFS, HPFS, SMB)
  • Mac file systems (UFS, HFS, HFS+, APFS)
  • RAID, JBOD, NAS, and SAN

Maybe these are topics we can address in future posts. Let me know your complaints, comments, and concerns below!
