Hello,
I'm starting this thread hoping to attract the attention of the VMware developers.
The subject has been discussed over and over again, and the answers from the VMware people have been about the same every time.
Yes, we are talking about the infamous .vmem files. According to the VMware people, these files are not only harmless (they are supposedly only touched when actually needed), but they even bring some benefits which we will not discuss here.
Unfortunately, those claims are not quite true, and LOTS of people have noticed the hard way that those .vmem files actually have quite a big impact on performance. I will present my findings here, hoping that someone will listen.
- It seems that the .vmem files are not written only when the hypervisor needs to evacuate a VM from memory to disk due to a shortage of RAM; instead they get written whenever a guest OS writes data into its own memory. It almost looks as if the .vmem file is not just a last-resort swap, but is actually kept in sync with the guest's memory at all times.
- While this behaviour may not be easy to notice with Windows guests, it is more noticeable with Linux guests. By default the Linux kernel tries to maximize the benefit of the RAM allocated to it by using it for cache (when no other process needs the memory, of course). This policy has the effect that on a Linux system the memory is almost always fully occupied: a small part by application text and data, and the rest by various buffers and caches. Basically, if you read a file it will remain in the cache (so it can be served from memory if requested again) until some process needs more memory, at which point the kernel will sacrifice cached pages and give them to the requesting process. You can see this for yourself with the quick check right below.
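If you want to check this cache behaviour on any Linux box, something like the following is enough (the file path is just an example, use any large file you have at hand):

free -m                                  # note the "cached" figure
cat /path/to/some_big_file > /dev/null   # read a large file once
free -m                                  # "cached" has grown by roughly the file size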
- Due to this mode of operation, the memory of a Linux system is constantly used and reused for different data. Now, if VMware's behaviour is to keep the guest memory and the .vmem file in sync, you can deduce what happens when all the memory inside the guests is continuously churned (by the applications running inside them or by their operating systems): VMware will try to keep all of it in sync with the .vmem files. It does not matter that swapping would not be necessary; the .vmem files must stay in sync, so we probably get lots of I/O operations rushing to the storage just to keep them up to date.
- To add insult to injury, the default placement of the .vmem files is the same directory as the guest's image files, so all this churn hits the very storage that is used by the guests, dramatically affecting the throughput they are able to get. This is why people who used mainMem.useNamedFile = "FALSE" seem to have improved their results: this option causes the .vmem files to be created in the host's temporary directory (/tmp) on Linux hosts, and in the host's swap in the case of Windows hosts. In many cases /tmp points to another filesystem. Keeping it on a ramdisk (tmpfs/shmfs, for example) brings dramatic benefits, but at the cost of using twice as much real RAM: the RAM directly allocated to the guest, plus its copy in the form of the .vmem file. A rough sketch of this setup follows below.
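For those who want to try this workaround, this is roughly what is involved; the tmpfs size below is only an example and must fit in whatever RAM you can actually spare on the host:

# in each guest's .vmx file (with the guest powered off):
mainMem.useNamedFile = "FALSE"

# on the host, if you want /tmp (and therefore the vmem backing) to live in RAM:
mount -t tmpfs -o size=9g tmpfs /tmp

Just keep in mind the double-RAM cost mentioned above when sizing the tmpfs.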
- Now let's try to reproduce the observations above hands-on. You will need:
A Linux host (for example RHEL 5.3 or CentOS 5.3, 64-bit) with enough RAM, for example 16 GB.
VMware Server 2.0 installed on that host. Configure VMware to swap as little as possible, so we won't be accused of shooting ourselves in the foot (see the config note after this list).
dstat installed on the host. If you use RHEL or CentOS you can get dstat from EPEL.
Two Linux guests; give them as much memory as possible, so that the total memory of the guests comes close to the server's physical memory minus some reserved for the host OS and the VMware processes themselves. You can simply give 8 GB to each guest: VMware will accept the first 8 GB and will ask you to reduce the memory size for the second, which you can gladly accept. You could even create one VM first and just make a copy of it later.
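Regarding the "swap as little as possible" part: if I remember correctly this is controlled host-wide in /etc/vmware/config (it corresponds to the "Fit all virtual machine memory into reserved host RAM" option in the UI). Something along these lines, but please double-check against your version's documentation before relying on it:

# /etc/vmware/config on the host
prefvmx.minVmMemPct = "100"
prefvmx.useRecommendedLockedMemSize = "TRUE"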
Steps to reproduce:
1. After you have installed the Linux guests, log on to each one and create a very big file in each of them, bigger than the RAM of the host system; I created a 24 GB file in each guest (see the command transcript after these steps). Wait until all data is written to disk, and then power-cycle your guests.
2. After the guests have rebooted, log on to each of them (or even just one of them) and issue a simple command that reads the big file created in the previous step; for example, cat bigfile >>/dev/null will suffice.
3. Go back to your host system and launch dstat. Given that you are only READING data from the physical storage, you would expect to see high values for disk reads. However, you will probably also see very, very high values for "disk writes total", even higher than the values for "disk reads total". Remember, all we do is READ data from the storage; the only thing that gets written by our guest activity is the guest's RAM. So where are those writes coming from? I'm quite certain it's VMware trying to mirror the guest's changing memory into the .vmem file. Also remember the earlier point: when reading a file, the Linux kernel will try to cache it in memory, so the hypervisor will actually see a lot of memory being changed by the guest, and will promptly start copying it to disk. If you open a second console to the reading guest you can watch the buffers expand as the big file is read (use top, free or vmstat for that).
4. Now for the final test, power off your guests and add mainMem.useNamedFile = "FALSE" to each guest's .vmx file. Make sure /tmp is on a different filesystem than the VMware data store(s) and has enough free space there, then start your guests again.
5. Run the reading command in the guests again (cat bigfile >>/dev/null). Also open a new console on the host and leave a watch running on df /tmp.
6. Notice again with dstat on the host how much write activity the disks get. Now, in the second console, you will see that at first the /tmp filesystem is almost unused, but as the reading progresses the space occupied on it grows and grows, until it reaches roughly the sum of your guests' allocated memory (about 13 GB in my case).
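For reference, these are roughly the commands behind the steps above; the sizes and intervals are simply the ones I happened to use, and dd is just one convenient way to create the big file:

# step 1, inside each guest: create a file bigger than the host's RAM (~24 GB here)
dd if=/dev/zero of=bigfile bs=1M count=24576
sync

# steps 2 and 5, inside the guest(s): read the file back
cat bigfile >>/dev/null

# steps 3 and 6, on the host: total disk reads/writes, refreshed every 5 seconds
dstat 5

# step 5, on the host: watch /tmp fill up while the guests read
watch -n 5 df -h /tmp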
Well, I don't know VMware's internals and I don't pretend to be an expert in VMware, but IMHO VMware has a problem here. Be it a bug or a (bad) design decision, the reality is that the current mode of operation is flawed. On VERY fast storage subsystems the problems are not so visible, because the storage keeps pace both with the guests' disk I/O AND with VMware's threads chasing the guests' memory. But on more common storage, and with enough guests running (without overcommitting memory!), the impact is huge. You end up with a huge load on the host system that is generated not by the guests' CPU activity, but by VMware's writing threads. This memory mirroring takes such a toll on performance that it is impossible to run more than a few guests on each VMware server, and not because of a CPU shortage but because of a storage shortage caused by bad design.
Now don't get me wrong, my interpretation of what's going on may be wrong, but in the end it doesn't matter what exactly happens under the hood if the system is sluggish and inefficient, and a lot of user reports confirm that it is.
VMware Server would be a very nice product, and could be very successful in its niche, because there are not many products in this category (relatively simple, independent, hosted, easy to administer, capable of COW storage, etc.); other products are either too complex and try to over-manage, or too simple and incomplete. So it's quite a shame to let this product be plagued by such big issues.
I myself have migrated to another solution (not from VMware) because of these performance issues, and others may have done the same. I hope, however, that someone will listen, investigate and correct them.