Quick Btrfs Tutorial

Btrfs, pronounced “Butter F S” among other ways, is one of the many open source filesystems available on Linux. Its features include multi-device spanning, cloning, snapshots, subvolumes, and RAID capabilities, which make it attractive for building multi-drive storage arrays. Btrfs offers data protection and the convenience to grow or resize its storage pool as and when needed.

Btrfs is similar to Oracle’s ZFS in several ways. Interestingly, Oracle is also supporting the development of Btrfs. It’s still considered experimental, but has been around since 2009. Seeing that Btrfs is included in Red Hat Enterprise Linux (RHEL) 7, and offered by the installer alongside XFS as a disk format choice, says quite a lot about how ready some people think it is for production. Officially, Red Hat describes Btrfs’ inclusion in RHEL 7 as a technology preview.

My use case for Btrfs is for my home file server. I’ve a bunch of disks. I’d like to pool all of them together to form a single filesystem. I would like to do RAID (ideally RAID5 but otherwise RAID1 will also suffice). I’d like to be able to use a mixture of disk sizes, and be able to maximise their utilisation. As time goes by, I want to be able to replace older disks with newer, larger disks, seamlessly, without having to manually copy data back and forth. The filesystem must provide robust recovery from data corruption and faulty disks, with no data loss. Btrfs seems to do all that.

Let me demonstrate a number of Btrfs features by way of a quick tutorial below. I have three disks for this demonstration, /dev/sdb, /dev/sdc, and /dev/sdd.

First, create a filesystem with two disks.

[root@lzsh0 mnt]# mkfs.btrfs -f /dev/sdb /dev/sdc

WARNING! - Btrfs v3.12 IS EXPERIMENTAL
WARNING! - see http://btrfs.wiki.kernel.org before using

Turning ON incompat feature 'extref': increased hardlink limit per file to 65536
adding device /dev/sdc id 2
fs created label (null) on /dev/sdb
 nodesize 16384 leafsize 16384 sectorsize 4096 size 3.64TiB
Btrfs v3.12

Check the filesystem information. There are two disks, each with 1.82TiB of usable capacity.

[root@lzsh0 mnt]# btrfs fi sh
Label: none uuid: befcfef9-54c0-4c72-9283-23096ab5f909
 Total devices 2 FS bytes used 112.00KiB
 devid 1 size 1.82TiB used 2.03GiB path /dev/sdb
 devid 2 size 1.82TiB used 2.01GiB path /dev/sdc

Btrfs v3.12

Now, mount the filesystem and check its space usage. Notice that data blocks are currently stored without redundancy (the “single” profile), while system and metadata blocks are in RAID1. This is the default when the filesystem is created with two disks.

[root@lzsh0 mnt]# mount /dev/sdb /mnt/test
[root@lzsh0 mnt]# btrfs fi df test
Data, single: total=1.00GiB, used=512.00KiB
System, RAID1: total=32.00MiB, used=16.00KiB
Metadata, RAID1: total=1.00GiB, used=112.00KiB
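
As an aside, if you know from the start that you want RAID1 for data as well, mkfs.btrfs accepts -d and -m options to set the data and metadata profiles at creation time. A minimal sketch, not from the session above:

# create the filesystem with data and metadata both mirrored across the two disks
mkfs.btrfs -f -d raid1 -m raid1 /dev/sdb /dev/sdc

Since I used the defaults here, I’ll convert after the fact instead.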

Convert the data blocks to RAID1 and check the space usage again.

[root@lzsh0 mnt]# btrfs balance start -dconvert=raid1 /mnt/test
Done, had to relocate 1 out of 3 chunks
[root@lzsh0 mnt]# btrfs fi df /mnt/test
Data, RAID1: total=1.00GiB, used=768.00KiB
System, RAID1: total=32.00MiB, used=16.00KiB
Metadata, RAID1: total=1.00GiB, used=112.00KiB

Now, I’ll copy some data into the filesystem so I can do some testing. Verify that our copy is indeed good.

[root@lzsh0 mnt]# cp -a /media/nas/data /mnt/test/
[root@lzsh0 mnt]# diff -qr /mnt/test/data /media/nas/data

Now, I’m going to add a new device, /dev/sdd, and check filesystem information.

[root@lzsh0 mnt]# btrfs device add /dev/sdd /mnt/test
[root@lzsh0 mnt]# btrfs fi sh
Label: none uuid: befcfef9-54c0-4c72-9283-23096ab5f909
 Total devices 3 FS bytes used 42.14GiB
 devid 1 size 1.82TiB used 44.03GiB path /dev/sdb
 devid 2 size 1.82TiB used 44.03GiB path /dev/sdc
 devid 3 size 2.73TiB used 0.00 path /dev/sdd

Btrfs v3.12

Notice that the new disk, which has 2.73TB usable capacity, is empty. It remains empty until some new data is written. However, it’s possible to manually rebalance the disks. I’ll do that now, and check the filesystem information again. For good measure, I’ll verify the data too.

[root@lzsh0 mnt]# btrfs balance start /mnt/test
Done, had to relocate 45 out of 45 chunks
[root@lzsh0 mnt]# btrfs fi sh
Label: none uuid: befcfef9-54c0-4c72-9283-23096ab5f909
 Total devices 3 FS bytes used 42.14GiB
 devid 1 size 1.82TiB used 22.03GiB path /dev/sdb
 devid 2 size 1.82TiB used 22.00GiB path /dev/sdc
 devid 3 size 2.73TiB used 44.03GiB path /dev/sdd

Btrfs v3.12
[root@lzsh0 mnt]# diff -qr /mnt/test/data /media/nas/data

I’m going to remove a drive now. After that, check filesystem information and verify the data.

[root@lzsh0 mnt]# btrfs device delete /dev/sdc /mnt/test
[root@lzsh0 mnt]# btrfs fi sh
Label: none uuid: befcfef9-54c0-4c72-9283-23096ab5f909
 Total devices 2 FS bytes used 42.14GiB
 devid 1 size 1.82TiB used 44.03GiB path /dev/sdb
 devid 3 size 2.73TiB used 44.03GiB path /dev/sdd

Btrfs v3.12
[root@lzsh0 mnt]# diff -qr /mnt/test/data /media/nas/data

Now, here’s the interesting bit. I’ve two drives left. With RAID1, the filesystem should still be able to survive a corrupted disk. I’ll simulate a corrupted disk by using dd to write zeros over parts of one of the drives.

[root@lzsh0 mnt]# dd if=/dev/zero of=/dev/sdb bs=1M count=1K seek=1K
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 0.377679 s, 2.8 GB/s
[root@lzsh0 mnt]# dd if=/dev/zero of=/dev/sdb bs=1M count=1K seek=10K
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 0.378561 s, 2.8 GB/s
[root@lzsh0 mnt]# dd if=/dev/zero of=/dev/sdb bs=1M count=1K seek=100K
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 1.24496 s, 862 MB/s
[root@lzsh0 mnt]# dd if=/dev/zero of=/dev/sdb bs=1M count=1K seek=200K
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 1.17177 s, 916 MB/s

Now, I’ll run a scrub to check the data blocks and repair corrupted ones. The scrub starts in the background. I run a status check once to see that the scrubbing is still in progress, then verify the data. This happens before the scrub has fixed up all the bad blocks, but Btrfs will in any case silently detect and repair corrupted blocks as it finds them. Finally, I’ll run the status command again to check that the scrub is complete and get the repair statistics.

[root@lzsh0 mnt]# btrfs scrub start /mnt/test
scrub started on /mnt/test, fsid befcfef9-54c0-4c72-9283-23096ab5f909 (pid=21488)
[root@lzsh0 mnt]# btrfs scrub status /mnt/test
scrub status for befcfef9-54c0-4c72-9283-23096ab5f909
 scrub started at Tue Aug 26 13:19:25 2014, running for 110 seconds
 total bytes scrubbed: 2.83GiB with 169424 errors
 error details: csum=169424
 corrected errors: 169424, uncorrectable errors: 0, unverified errors: 0
[root@lzsh0 mnt]# diff -qr /mnt/test/data /media/nas/data

I’ll check the scrub status again, when it’s finished, to see the completed repair statistics.

[root@lzsh0 mnt]# btrfs scrub status /mnt/test
scrub status for befcfef9-54c0-4c72-9283-23096ab5f909
 scrub started at Tue Aug 26 13:19:25 2014 and finished after 1647 seconds
 total bytes scrubbed: 84.27GiB with 238575 errors
 error details: csum=238575
 corrected errors: 238575, uncorrectable errors: 0, unverified errors: 0

Cool, isn’t it?

To summarise, I’ve created a Btrfs filesystem, added a device, deleted a device, written rubbish to the raw disk to corrupt it, and then recovered from the corruption. There was no data loss, and the test data copied into the filesystem is intact. This demonstration shows several features:

  • The ability to add disks online.
  • The ability to mix disks of different sizes, and balance usage across them.
  • The ability to change RAID types while the filesystem is in use.
  • The ability to remove disks from the storage pool while the filesystem is in use, which, together with adding disks, is how I would replace an aging disk (see the sketch after this list).
  • Despite forcibly writing rubbish to the drive, no data was lost.
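
As a sketch of how these pieces combine to meet my use case of replacing an aging disk with a larger one, here is roughly what that would look like using only the commands demonstrated above. The device names are hypothetical, and this is not from the recorded session:

# add the new, larger disk to the pool
btrfs device add /dev/sde /mnt/test
# delete the old disk; Btrfs migrates its data onto the remaining devices
btrfs device delete /dev/sdb /mnt/test
# optionally rebalance afterwards to spread data evenly across the new layout
btrfs balance start /mnt/test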

There are many more useful features of Btrfs. Subvolumes, for example, let you partition your storage pool much like you would partition a hard disk drive, except that a subvolume isn’t a fixed-size block device. In fact, Btrfs subvolumes behave almost like subdirectories. You can still set quotas on subvolumes to constrain their space usage.
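
To illustrate, here is a minimal sketch of creating a subvolume and capping its space usage with quotas. The subvolume name and the 100GiB limit are made up for illustration:

# create a subvolume inside the mounted filesystem
btrfs subvolume create /mnt/test/projects
# enable quota tracking, then limit the subvolume’s usage to 100GiB
btrfs quota enable /mnt/test
btrfs qgroup limit 100G /mnt/test/projects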

Btrfs is supposed to support RAID5 and RAID6 too. However, the changelog states (typo in the original):

Basic support for RAD5/6 profiles, no crash resiliency and scrub support

Doesn’t sound good to me. I would love to use RAID5. But I suppose with online rebalancing, it’s easy to convert when the time comes. A more interesting future feature is per-subvolume RAID level configuration.
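
When that time comes, the conversion should look much like the RAID1 conversion earlier, just with a different target profile. A sketch, assuming the RAID5 code has matured by then:

# convert data blocks to RAID5 (metadata could likewise be converted with -mconvert)
btrfs balance start -dconvert=raid5 /mnt/test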

I hope this quick tutorial has been helpful in demonstrating some of the key features that make Btrfs a good fit for a multi-disk NAS.
