How to enable Ceph RBD mirroring

Lots of talk about this but not a lot of concrete info online.
This is what worked for me.

I’m enabling replication at pool level here. Note that only images that have the ‘journaling’ feature enabled will be mirrored, irrespective of the state of your pool. So to clarify, you need to enable replication on the pool AND enable the journaling feature on the image itself.

1 – Make sure the pool exists on the local AND remote clusters.

In this case our local cluster is ceph (or default) and our remote cluster is adleast.

On the ADLEast cluster run ceph osd pool create ADLWEST-vms 128

2 – Enable mirroring on the pool on both clusters

rbd mirror pool enable ADLWEST-vms pool

rbd mirror pool enable ADLWEST-vms pool --cluster adleast

3 – Add peers to the pool

rbd --cluster adleast mirror pool peer add ADLWEST-vms client.admin@ceph

rbd mirror pool peer add ADLWEST-vms client.admin@adleast
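Before moving on to the images it can be worth checking that both clusters agree on the pool setup. A minimal sketch, assuming your rbd client has the `mirror pool info` subcommand (it prints the mirroring mode and the configured peers); `show_mirror_config` is a made-up helper name, not part of Ceph:

```shell
#!/bin/bash
# Hypothetical helper: print the mirroring mode and peers of a pool
# on each of the given clusters.
# Usage: show_mirror_config <pool> <cluster> [<cluster> ...]
show_mirror_config() {
  pool="$1"
  shift
  for cluster in "$@"; do
    echo "== cluster: $cluster =="
    rbd --cluster "$cluster" mirror pool info "$pool"
  done
}

# Example (cluster names as used in this post):
#   show_mirror_config ADLWEST-vms ceph adleast
```

Both clusters should report the pool mode and list the other cluster as a peer.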

4 – Enable replication on the desired images in the pool

rbd feature enable ADLWEST-vms/VM-Cacti.raw journaling --journal-pool ADLWEST-journal

Note the journal-pool argument. This allows you to send all the journal data for that VM to a different pool, which might help you reduce the performance impact of journaling/mirroring on your cluster. Your journal pool will need to be as fast as, if not faster than, the actual pool the image resides in, else it will become a bottleneck. Also a really important gotcha: if you are using KVM (or anything with cephx authentication, I guess) the user account you are using to access the cluster (Cinder for example!?) MUST have access to this journal pool, otherwise your IO access will just hang inexplicably! Trust me, I learnt this one the hard way!
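To double check step 4, you can look at the features line of `rbd info` for the image. A rough sketch, assuming `rbd info` prints a `features:` line listing the enabled features; `has_journaling` is a made-up helper name:

```shell
#!/bin/bash
# Hypothetical helper: succeeds if 'rbd info' lists journaling among
# the features of the given image (pool/image).
has_journaling() {
  rbd info "$1" | grep -E '^[[:space:]]*features:' | grep -qw journaling
}

# Example (image name from this post):
#   has_journaling ADLWEST-vms/VM-Cacti.raw && echo "journaling is on"
#
# To inspect the cephx caps of the client that opens the image
# (client.cinder is only an example user), something like:
#   ceph auth get client.cinder
```

If the caps of that client are restricted to specific pools, the journal pool needs to be among them.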

 

 

Useful script

List the info on all images in a pool

#!/bin/bash
# Print the mirror status of every image in the pool given as $1
rbd ls -p "$1" |
  while IFS= read -r line
  do
    rbd mirror image status "$1/$line"
  done

Should yield a result like

 bash checkMirrorStatus.sh ADLWEST-vms
ADLWest-RGW-LB02.raw:
  global_id:   4196f19b-3ddb-4dce-a15d-0a281898298d
  state:       up+stopped
  description: remote image is non-primary or local image is primary
  last_update: 2017-06-25 19:25:47
ADLWest-RGW02.raw:
  global_id:   859a6377-9872-4f0f-9c5f-4cb69bcf101d
  state:       up+stopped
  description: remote image is non-primary or local image is primary
  last_update: 2017-06-25 19:25:47
ADLWest-Tunnel1.raw:
  global_id:   0e36b8bd-cf07-42e7-8875-ea4e63f9dcfa
  state:       up+stopped
  description: remote image is non-primary or local image is primary
  last_update: 2017-06-25 19:25:47
VM-ADLWest-PRTG.raw:
  global_id:   473c6d0a-4e6b-492b-a143-b240e4b6194d
  state:       up+stopped
  description: remote image is non-primary or local image is primary
  last_update: 2017-06-25 19:25:47
VM-Cacti.raw:
  global_id:   1aac9fbd-0eb4-47bd-a7f7-c7e0adebb5a9
  state:       up+stopped
  description: remote image is non-primary or local image is primary
  last_update: 2017-06-25 19:25:47
VM-OS-Net02.raw:
  global_id:   ee6532e1-c11f-4728-a327-559e91eee39e
  state:       up+stopped
  description: remote image is non-primary or local image is primary
  last_update: 2017-06-25 19:25:47
VM-SMTP01.raw:
  global_id:   4fb8a975-54e4-486a-a119-ae741c4163af
  state:       up+stopped
  description: remote image is non-primary or local image is primary
  last_update: 2017-06-25 19:25:47
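With a lot of images that output gets long. A small sketch that boils it down to one line per image, assuming the status output keeps the layout shown above (image name on its own line ending in a colon, then an indented `state:` line); `summarize_states` is a made-up name:

```shell
#!/bin/bash
# Hypothetical helper: read 'rbd mirror image status' output on stdin
# and print "<image> <state>" for each image.
summarize_states() {
  awk '
    # An unindented line ending in ":" is the image name
    /^[^[:space:]].*:$/ { name = substr($0, 1, length($0) - 1) }
    # The state line carries the mirror state as its second field
    $1 == "state:"      { print name, $2 }
  '
}

# Example:
#   bash checkMirrorStatus.sh ADLWEST-vms | summarize_states
```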

Useful command

Show the status of your image replication

rbd mirror image status ADLWEST-vms/VM-Cacti.raw
VM-Cacti.raw:
  global_id:   1aac9fbd-0eb4-47bd-a7f7-c7e0adebb5a9
  state:       up+stopped
  description: remote image is non-primary or local image is primary
  last_update: 2017-06-25 18:58:16
rbd mirror image status ADLWEST-vms/VM-Cacti.raw --cluster=adleast
VM-Cacti.raw:
  global_id:   1aac9fbd-0eb4-47bd-a7f7-c7e0adebb5a9
  state:       up+syncing
  description: bootstrapping, IMAGE_COPY/COPY_OBJECT 21%
  last_update: 2017-06-25 18:58:50

 

Then when the replication is done you’ll see something like this

rbd mirror image status ADLWEST-vms/VM-Cacti.raw
VM-Cacti.raw:
  global_id:   1aac9fbd-0eb4-47bd-a7f7-c7e0adebb5a9
  state:       up+stopped
  description: remote image is non-primary or local image is primary
  last_update: 2017-06-25 19:23:18

rbd mirror image status ADLWEST-vms/VM-Cacti.raw --cluster=adleast
VM-Cacti.raw:
  global_id:   1aac9fbd-0eb4-47bd-a7f7-c7e0adebb5a9
  state:       up+replaying
  description: replaying, master_position=[object_number=21, tag_tid=0, entry_tid=57097], mirror_position=[object_number=6, tag_tid=0, entry_tid=10886], entries_behind_master=46211
  last_update: 2017-06-25 19:22:57

I believe that ‘entries_behind_master’ is the number of journal entries the replica still has to replay to catch up with the master. So if you have a write-heavy VM it might fall quite far behind the master, but an idle VM should show zero.
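In the sample output above, entries_behind_master matches the difference between the two entry_tid values (57097 - 10886 = 46211). A minimal sketch that pulls those numbers out of a status line, assuming the description keeps exactly this format; `entries_behind` and `tid_gap` are made-up helper names:

```shell
#!/bin/bash
# Hypothetical helper: print the entries_behind_master value found in
# 'rbd mirror image status' output read from stdin.
entries_behind() {
  sed -n 's/.*entries_behind_master=\([0-9][0-9]*\).*/\1/p'
}

# Hypothetical helper: print master entry_tid minus mirror entry_tid,
# taken from the master_position / mirror_position fields.
tid_gap() {
  sed -n 's/.*master_position=\[[^]]*entry_tid=\([0-9][0-9]*\)].*mirror_position=\[[^]]*entry_tid=\([0-9][0-9]*\)].*/\1 \2/p' |
    while read -r master mirror; do
      echo $((master - mirror))
    done
}

# Example:
#   rbd mirror image status ADLWEST-vms/VM-Cacti.raw --cluster=adleast | entries_behind
```

Whether the two numbers always agree is an assumption based on this one sample, not something I have confirmed.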

Throttle RBD Export

RBD export works great, sometimes too great, and can overload the disks it’s writing to. If those disks happen to be serving, say, VMs, then goodnight VM.

Solution – Use https://github.com/Phredward/throttle

Example – Export at 50MB/s

rbd export volumes/volume-371b6662-1bc2-4dad-8545-c30ec1f0bfce - | python /root/throttle.py --bandwidth=52428800 > /data/MyExportedImage.img
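The 52428800 in that command is 50 x 1024 x 1024. A tiny sketch for working out other limits, assuming (as the example suggests) that the bandwidth value is given in bytes per second; `mib_per_sec_to_bytes` is a made-up helper name:

```shell
#!/bin/bash
# Hypothetical helper: convert a rate in MiB/s to bytes per second.
mib_per_sec_to_bytes() {
  echo $(( $1 * 1024 * 1024 ))
}

mib_per_sec_to_bytes 50   # prints 52428800
```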