
Howto Resize a Xen DRBD LVM VBD

I decided to put together this howto because I spent practically an entire day at work trying to get this to work without needing a full resync after it was all said and done. So a quick run through of the setup and then off we go.

Note: This HOWTO describes shrinking a volume, but the same procedure works … no resync necessary … for growing a DRBD volume backed by LVM 😀

I have 3 CentOS 5.1 domains (1x dom0 and 2x domU). The dom0 is running kernel 2.6.18-53.1.21.el5xen and Xen 3.0.3 and the 2 domUs are running kernel 2.6.18-53.1.19.el5xen and DRBD 8.2.5-1.el5. All of this came from the standard CentOS 5 yum repos. Each domU has two VBDs, namely /dev/xvda and /dev/xvdb. The OS is on /dev/xvda and the backing for DRBD is /dev/xvdb. Both of these are mapped to individual LVs on the dom0, namely /dev/vg0/node1 -> /dev/xvda and /dev/vg0/node1-drbd -> /dev/xvdb on node1 and /dev/vg0/node2 -> /dev/xvda and /dev/vg0/node2-drbd -> /dev/xvdb on node2. The filesystem on /dev/drbd0 is ext3 and is currently at 512MB for the duration of this howto. The current size of /dev/drbd0 is 3G and we want to shrink it to 2G.

Here’s my drbd.conf for reference:

resource res {
        protocol C;
        startup { wfc-timeout 0; degr-wfc-timeout 120; }
        disk { on-io-error detach; }
        net { }
        syncer { rate 100M; } # we're using 1Gbps crossover link
        on node1.domain.com {
                device /dev/drbd0;
                disk /dev/xvdb;
                address 10.0.0.1:7788;
                meta-disk internal;
        }
        on node2.domain.com {
                device /dev/drbd0;
                disk /dev/xvdb;
                address 10.0.0.2:7788;
                meta-disk internal;
        }
}

Whew … ready, steady go!

!!! READ ME — YOU ARE PROCEEDING BEYOND THIS POINT AT YOUR OWN RISK !!! THE IT DEPARTMENT CAN NOT AND WILL NOT BE HELD RESPONSIBLE / LIABLE / ACCOUNTABLE FOR THE RESULTS OF ANY OF YOUR ACTIONS. THIS IS PURELY AN EDUCATIONAL WRITING AND SHOULD NOT BE USED IN PRODUCTION — READ ME !!!

Now that I’ve scared the $h!t out of you … please let us continue.

0. Ensure your filesystem is smaller than the target size of the resource

In this example, our filesystem is 512MB, which is already smaller than our target resource size of 2GB. Whenever you attempt this, first shrink your filesystem to below your target size (say, 1GB or so below it); when everything is said and done, you can grow the filesystem back up to its maximum possible size.
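Here is a minimal sketch of that size check. The helper names fs_size_kb and fits_target are made up for illustration, and the block count and block size are example values chosen to match a 512MB filesystem; on a real system they can be read from the "Block count" and "Block size" fields of tune2fs -l /dev/drbd0.

```shell
#!/bin/bash
# Sketch: decide whether an ext3 filesystem already fits under a target size.
# fs_size_kb and fits_target are hypothetical helper names, not part of e2fsprogs.

# Filesystem size in kB from its block count and block size (in bytes).
fs_size_kb() {
    local block_count=$1 block_size=$2
    echo $(( block_count * block_size / 1024 ))
}

# Succeeds (exit 0) if the filesystem is strictly smaller than the target, both in kB.
fits_target() {
    local fs_kb=$1 target_kb=$2
    [ "$fs_kb" -lt "$target_kb" ]
}

# Example values (assumed): 131072 blocks of 4096 bytes is 512MB; 2GB is 2097152 kB.
fs_kb=$(fs_size_kb 131072 4096)
if fits_target "$fs_kb" 2097152; then
    echo "filesystem (${fs_kb} kB) fits, no shrink needed"
else
    echo "shrink the filesystem first"
fi
```

If the check fails, the usual route for ext3 is to unmount the filesystem on the primary, run e2fsck -f on it, and then shrink it with resize2fs to a size comfortably below the target. DRBD's internal metadata takes a little space of its own, so leave some margin.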

1. Take down the resource on each node

[root@node1 ~]# drbdadm down res
[root@node2 ~]# drbdadm down res

2. Dump the resource metadata to disk on each node

[root@node1 ~]# drbdadm dump-md res > /tmp/metadata
[root@node2 ~]# drbdadm dump-md res > /tmp/metadata

Do not dump the metadata on one node and simply copy the dump file to the peer. This will not work. You must dump the resource metadata on each node!

3. Detach the VBDs from the domUs

To resize the LV without restarting the domU, you have to detach the VBD from the domU, resize the LV in the dom0, and then reattach the VBD to the domU. To detach a VBD, you need its dev-id, which can be found using xm block-list:

[root@dom0 ~]# xm block-list node1
Vdev  BE handle state evt-ch ring-ref BE-path
51713    0    0     4      6      8     /local/domain/0/backend/vbd/57/51713
51728    0    0     4      8      1377  /local/domain/0/backend/vbd/57/51728
[root@dom0 ~]# cat /etc/xen/node1 | grep disk
disk = [ "phy:/dev/vg0/node1,xvda,w", "phy:/dev/vg0/node1-drbd,xvdb,w" ]

From what I can tell, the output follows the order of your domU configuration. The first device in the xm block-list output corresponds to the first device in your configuration, the second corresponds to the second device, and so forth. Therefore, in my configuration, my domU node1’s VBD /dev/xvda has dev-id 51713 and its VBD /dev/xvdb has dev-id 51728.
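As a cross-check on the positional reading, the Vdev number itself can be decoded. This is only a sketch, and it assumes the usual Xen numbering for xvd devices (major 202 in the high byte, 16 minor numbers per disk); vdev_to_name is a hypothetical helper, not an xm subcommand.

```shell
#!/bin/bash
# Sketch: decode a Xen xvd device number into a device name.
# Assumes the xvd encoding: major 202, 16 minors per disk.
# vdev_to_name is a hypothetical helper name.
vdev_to_name() {
    local id=$1
    local major=$(( id >> 8 ))
    local minor=$(( id & 255 ))
    if [ "$major" -ne 202 ]; then
        echo "unknown"
        return 1
    fi
    local disk=$(( minor >> 4 ))   # 0 = xvda, 1 = xvdb, ...
    local part=$(( minor & 15 ))   # 0 = whole disk
    local letters=abcdefghijklmnopqrstuvwxyz
    local name="xvd${letters:$disk:1}"
    if [ "$part" -ne 0 ]; then
        name="${name}${part}"
    fi
    echo "$name"
}

vdev_to_name 51728   # prints xvdb
```

Under that assumption, 51728 decodes to xvdb, which agrees with the ordering above. The first entry, 51713, decodes to xvda1 under the same scheme, so treat the decoded name as a sanity check and compare it against your own configuration before detaching anything.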

Same thing goes for domU node2:

[root@dom0 ~]# xm block-list node2
Vdev  BE handle state evt-ch ring-ref BE-path
51713    0    0     4      6      8     /local/domain/0/backend/vbd/57/51713
51728    0    0     4      8      1377  /local/domain/0/backend/vbd/57/51728
[root@dom0 ~]# cat /etc/xen/node2 | grep disk
disk = [ "phy:/dev/vg0/node2,xvda,w", "phy:/dev/vg0/node2-drbd,xvdb,w" ]

Now you can detach…

[root@dom0 ~]# xm block-detach node1 51728
[root@dom0 ~]# xm block-detach node2 51728

4. Resize the LVs which are the backing for the VBDs

In my case, I have /dev/vg0/node1-drbd and /dev/vg0/node2-drbd. In this example, I am shrinking my DRBD device down to 2GB:

[root@dom0 ~]# lvresize -L2G /dev/vg0/node1-drbd
  WARNING: Reducing active logical volume to 2.00 GB
  THIS MAY DESTROY YOUR DATA (filesystem etc.)
Do you really want to reduce node1-drbd? [y/n]: y
  Reducing logical volume node1-drbd to 2.00 GB
  Logical volume node1-drbd successfully resized
[root@dom0 ~]# lvresize -L2G /dev/vg0/node2-drbd
  WARNING: Reducing active logical volume to 2.00 GB
  THIS MAY DESTROY YOUR DATA (filesystem etc.)
Do you really want to reduce node2-drbd? [y/n]: y
  Reducing logical volume node2-drbd to 2.00 GB
  Logical volume node2-drbd successfully resized

Enter y when it asks you if “you really want to reduce”, even though you “MAY DESTROY YOUR DATA” … the force (er, “The IT Department”) is with you.

5. Attach the VBDs back to the domUs

This step is simple enough:

[root@dom0 ~]# xm block-attach node1 phy:/dev/vg0/node1-drbd xvdb w
[root@dom0 ~]# xm block-attach node2 phy:/dev/vg0/node2-drbd xvdb w

6. Restore the resource metadata

This is probably the most complicated step. The gist of it is that you have to create new metadata, then overwrite it with the old metadata you dumped earlier, with a slight modification. You’ll see …

First create new metadata on each node

[root@node1 ~]# drbdadm create-md res
md_offset 2147479552
al_offset 2147446784
bm_offset 2147381248

Found ext3 filesystem which uses 524288 kB
current configuration leaves usable 2097052 kB

 ==> This might destroy existing data! <==
 
Do you want to proceed?
[need to type 'yes' to confirm] yes

[root@node2 ~]# drbdadm create-md res
md_offset 2147479552
al_offset 2147446784
bm_offset 2147381248

Found ext3 filesystem which uses 524288 kB
current configuration leaves usable 2097052 kB

 ==> This might destroy existing data! <==
 
Do you want to proceed?
[need to type 'yes' to confirm] yes

If you get a warning about there already being a v08 style flexible-size internal meta data block …

v07 Magic number not found
v07 Magic number not found
You want me to create a v08 style flexible-size internal meta data block.
There apears to be a v08 flexible-size internal meta data block
already in place on /dev/xvdb at byte offset 2147479552
Do you really want to overwrite the existing v08 meta-data?
[need to type 'yes' to confirm] yes

Go ahead and type ‘yes’ to overwrite it … again “The IT Department” is with you

Now, do you see the line where DRBD says “current configuration leaves usable 2097052 kB”? Well, we need that number. It is not the answer to life, but it is the space left for data once the internal metadata DRBD needs has been subtracted from the device. The only problem is that it’s in kB and we need it in 512-byte sectors. You’re right … simple conversion: x kB / 1024 = y MB, then y MB x 2048 = z sectors (in other words, z = 2x).
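The conversion can also be left to shell arithmetic, which avoids the fractional MB step entirely. A minimal sketch, assuming 512-byte sectors; kb_to_sectors is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: convert the "usable" size reported by drbdadm create-md (in kB)
# into 512-byte sectors. kb_to_sectors is a hypothetical helper name.
kb_to_sectors() {
    local kb=$1
    # 1 kB = 1024 bytes = 2 sectors of 512 bytes each
    echo $(( kb * 2 ))
}

kb_to_sectors 2097052   # prints 4194104
```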

Therefore: 2097052 kB / 1024 = 2047.90234375 MB, and 2047.90234375 MB x 2048 = 4194104 sectors. Now, before we can restore the original metadata dump, we have to update it with the new size. Then we can restore it:

[root@node1 ~]# sed -i -e 's/la-size-sect.*/la-size-sect 4194104;/g' /tmp/metadata
[root@node1 ~]# drbdmeta_cmd=$(drbdadm -d dump-md res)
[root@node1 ~]# ${drbdmeta_cmd/dump-md/restore-md} /tmp/metadata

Valid meta-data in place, overwrite?
[need to type 'yes' to confirm] yes

reinitialising
Successfully restored meta data
[root@node2 ~]# sed -i -e 's/la-size-sect.*/la-size-sect 4194104;/g' /tmp/metadata
[root@node2 ~]# drbdmeta_cmd=$(drbdadm -d dump-md res)
[root@node2 ~]# ${drbdmeta_cmd/dump-md/restore-md} /tmp/metadata

Valid meta-data in place, overwrite?
[need to type 'yes' to confirm] yes

reinitialising
Successfully restored meta data
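A typo in the sed expression would silently leave the old size in the dump, so it may be worth wrapping the edit in a small check before restoring. The sketch below is one way to do that; set_la_size is a hypothetical helper, and the demonstration runs against a throwaway file with a made-up old value rather than a real dump.

```shell
#!/bin/bash
# Sketch: update la-size-sect in a metadata dump and confirm the change took.
# set_la_size is a hypothetical helper; it keeps a .bak copy of the dump.
set_la_size() {
    local file=$1 sectors=$2
    cp "$file" "$file.bak" || return 1
    sed -i -e "s/la-size-sect.*/la-size-sect ${sectors};/" "$file"
    # Fail if the expected line is not present after the edit.
    grep -q "^[[:space:]]*la-size-sect ${sectors};" "$file"
}

# Demonstration on a throwaway file; the old value here is made up.
demo=$(mktemp)
echo "la-size-sect 6291000;" > "$demo"
set_la_size "$demo" 4194104 && cat "$demo"
rm -f "$demo" "$demo.bak"
```

On a real node the call would be set_la_size /tmp/metadata 4194104, run before the restore-md command.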

7. Bring up the resource on each node

If all goes well … you should simply be able to issue a drbdadm up on each node

[root@node1 ~]# drbdadm up res
[root@node2 ~]# drbdadm up res

And then verify everything is connected and UpToDate…

[root@node1 ~]# service drbd status
version: 8.2.5 ( api:88/proto:86-88 )
GIT-hash: 9faf052fdae5ef0c61b4d03890e2d2eab550610c build by buildsvn@c5-i386-build, 2008-05-11 03:43:50
m:res  cs         st                   ds                 p  mounted  fstype
0:res  Connected  Secondary/Secondary  UpToDate/UpToDate  C
[root@node2 ~]# service drbd status
version: 8.2.5 ( api:88/proto:86-88 )
GIT-hash: 9faf052fdae5ef0c61b4d03890e2d2eab550610c build by buildsvn@c5-i386-build, 2008-05-11 03:43:50
m:res  cs         st                   ds                 p  mounted  fstype
0:res  Connected  Secondary/Secondary  UpToDate/UpToDate  C

Everything is a-ok and no need for a full-resync!!!
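If you would rather script that last check than read it by eye, something along these lines could work. It is only a sketch: resource_in_sync is a hypothetical helper that reads status lines on stdin and assumes the column layout shown above.

```shell
#!/bin/bash
# Sketch: check status text for a connected, fully synced resource.
# resource_in_sync is a hypothetical helper; it reads status lines on stdin.
resource_in_sync() {
    local res=$1
    # Keep the line for this resource, then require both markers on it.
    grep -E "^[0-9]+:${res}[[:space:]]" | grep "Connected" | grep -q "UpToDate/UpToDate"
}

# Demonstration using the status line shown above.
echo "0:res  Connected  Secondary/Secondary  UpToDate/UpToDate  C" \
    | resource_in_sync res && echo "res is in sync"
```

On a node this would be fed from the real output, for example service drbd status | resource_in_sync res.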

8. Conclusion

This was a simple example with a filesystem of 512MB and we only shrank the resource from 3GB to 2GB but this will work for larger configurations (I personally did it from 50GB down to 25GB earlier today). Maybe one day this won’t be as complicated but for now this works and I have no complaints.