Nasty VMware snapshot problem

This is a more technical post this time, but maybe it helps somebody who has a similar problem. So…

I was called late last week by a friend. She runs a local family business and asked whether I knew anything about VMware.

When I looked at their server (it was mainly the ERP system that was affected; none of their registers was working anymore), it quickly became evident that they were running VMware Workstation to host the system in a virtual machine, and that their part-time admin had used snapshots within the VM as a sort of backup. It was also evident that the system was not booting anymore, failing with the error message

“The parent virtual disk has been modified since the child was created”

After some research it became clear that the chain of snapshots was broken (explained for example here). But how to fix it?

This article explains the problem and solution fairly well and also explains the tools needed.

So I extracted the descriptor files of all 18 existing snapshots this way, some of them really small, and found the following dependency between the snapshots:

1 => 2 => 6 => 4 => 5 => 8 => 3 => (7, 9, 10, 11, 12, 14, 1)
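For readers who want to rebuild such a dependency list themselves: the links live in a few text fields of each descriptor (CID, parentCID and parentFileNameHint). Here is a minimal Python sketch for pulling them out of descriptor text that has already been extracted. The sample descriptor and all of its values are made up for illustration, and the function name is just a hypothetical helper.

```python
import re

# Matches lines such as:  CID=a1b2c3d4   or   parentFileNameHint="base.vmdk"
_FIELD = re.compile(
    r'^[ \t]*(CID|parentCID|parentFileNameHint)[ \t]*=[ \t]*"?([^"\r\n]*)"?[ \t]*\r?$',
    re.MULTILINE,
)

def parse_descriptor(text):
    """Return the chain-related fields found in a VMDK descriptor text."""
    return {key: value.strip() for key, value in _FIELD.findall(text)}

# Made-up sample descriptor, only for illustration.
sample = '''# Disk DescriptorFile
version=1
CID=a1b2c3d4
parentCID=0f0f0f0f
createType="monolithicSparse"
parentFileNameHint="base.vmdk"
'''

fields = parse_descriptor(sample)
print(fields)
```

A snapshot fits onto its parent only if its parentCID equals the CID of the file named in parentFileNameHint, so comparing these values across all descriptors gives the chain.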

So snapshot 3 was pointing to a lot of snapshots, including 1. Besides the fact that this is a cycle and therefore cannot work, it also led us to the root cause of the problem: the base image had been overwritten!

Reading more about this issue, I found some hints that 7-Zip can look into a vmdk file. My friend found an earlier vmdk of the VM, and I was able to list its content easily with 7-Zip. So, full of hope, I thought: if I can get access to the latest snapshot this way, I should be able to extract the latest version of the database and config and use them to restore the system.
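With 18 snapshots it is easy to miss a loop like this by eye. As a rough sketch (not the tooling I used), a small depth-first walk in Python finds it. The dictionary simply encodes the arrows exactly as written above; the function and variable names are made up for the example.

```python
def find_cycle(points_to, start):
    """Depth-first walk from start; return the first cycle found as a list, or None."""
    path = []        # nodes on the current walk, in order
    on_path = set()  # same nodes, for fast membership tests
    done = set()     # nodes fully explored without finding a cycle

    def visit(node):
        if node in on_path:
            # We came back to a node of the current walk: that is the loop.
            return path[path.index(node):] + [node]
        if node in done:
            return None
        path.append(node)
        on_path.add(node)
        for nxt in points_to.get(node, []):
            cycle = visit(nxt)
            if cycle:
                return cycle
        path.pop()
        on_path.discard(node)
        done.add(node)
        return None

    return visit(start)

# The arrows from the dependency list above, taken as written.
chain = {1: [2], 2: [6], 6: [4], 4: [5], 5: [8], 8: [3],
         3: [7, 9, 10, 11, 12, 14, 1]}

print(find_cycle(chain, 1))  # prints [1, 2, 6, 4, 5, 8, 3, 1]
```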

BUT – nope, 7-Zip does not allow you to access snapshots, just does not work….

So I started to look for other ways, and being somewhat of a Linux guy came in handy now. There is a tool on Linux (described here in detail) called guestmount, part of libguestfs (which uses qemu under the hood), that allows you to mount guest file systems in your native OS. I was hoping that this tool would allow me to mount the snapshot, but that did not work either….

Now, I found a solution – which is the combination of all these things:

Back to the chain: the problem was the first snapshot, or base file, that was missing. Once I had extracted the CID from that file and updated the reference to it in the second file, I was able to mount snapshot 2 – YEAH!!!
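The CID fix itself is a one-line change in the child's descriptor. Below is a hedged Python sketch of that edit on descriptor text that has already been extracted; writing the descriptor back into the vmdk is a separate step with the tools from the article. The function name and sample values are made up, and I am assuming the usual 8-digit hex form of a CID. Work on copies only.

```python
import re

def set_parent_cid(child_descriptor, parent_cid):
    """Return the child descriptor text with its parentCID set to parent_cid."""
    # Assumption: a CID is written as 8 hex digits.
    if not re.fullmatch(r"[0-9a-fA-F]{8}", parent_cid):
        raise ValueError("expected an 8-digit hex CID")
    new_text, count = re.subn(
        r"(?m)^parentCID=[^\r\n]*",
        "parentCID=" + parent_cid,
        child_descriptor,
    )
    if count != 1:
        raise ValueError("expected exactly one parentCID line, found %d" % count)
    return new_text

# Made-up values: the child still points at a CID the parent no longer has.
child = 'CID=22222222\nparentCID=deadbeef\nparentFileNameHint="base.vmdk"\n'
fixed = set_parent_cid(child, "11111111")
print(fixed)
```

The function refuses to touch a descriptor that has no parentCID line or more than one, because in that case something else is wrong and a blind replace would only hide it.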

With this I was then also able to mount snapshot 3, the last working one before things went crazy. With it finally mounted (the server was Windows, so it was a fairly safe bet to mount sda1 out of the image), I had access to the content. Voilà.

Then it was just a matter of copying the ERP DB data and the required config onto a USB stick (not even 2 GB) and handing it back.

Now we need to keep our fingers crossed that the data is not corrupt (which I hope, as this is the directory that is constantly overwritten in the VM); then we should be able to restore the system. But that is a different story.
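For reference, the mount step boils down to a single guestmount call. This small Python sketch only builds and prints the command line so you can check it before running it. The image name and mount point are placeholders, not the real files from this case, and mounting read-only is my suggestion for a rescue job like this.

```python
import shlex

def guestmount_command(image, partition, mountpoint):
    """Build a read-only guestmount call for one partition of a disk image."""
    # -a adds the disk image, -m picks the guest partition, --ro mounts read-only.
    return ["guestmount", "-a", image, "-m", partition, "--ro", mountpoint]

# Placeholder file name and mount point, only for illustration.
cmd = guestmount_command("snapshot-000003.vmdk", "/dev/sda1", "/mnt/vm")
print(shlex.join(cmd))
```

To actually run it, pass the list to subprocess.run(cmd, check=True); when you are done copying, guestunmount /mnt/vm releases the mount again.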

 

Anyway, hopefully this helps somebody who runs into the same problem. Maybe even a base VM with OS would allow this process … but I have not tested it…

 

Anyway, there is one thing this teaches us all: BACKUPS ARE NEEDED. NOT ON THE SAME COMPUTER OR EVEN THE SAME HARD DISK, BUT ON SAFE STORAGE THAT IS INDEPENDENT.

Backups are unpleasant to do, but they help you a lot when you need them!

