Bug#668594: qemu-kvm: Suboptimal virtio/vhost-net performance on Debian KVM hosts compared to others

April 13th, 2012 - 05:50 am ET by Hans-Kristian Bakke
Package: qemu-kvm
Version: 1.0+dfsg-9
Severity: normal

I cannot get optimal network throughput on KVM guests using Debian Wheezy (and stable) as the KVM host.
It is not horribly bad, just not good compared to relevant alternatives.

I have tried Ubuntu Server 11.10, Proxmox 1.9, Proxmox 2.0 and Fedora 17 Alpha as 1:1 replacements for the Debian KVM host using the same guests (just preserving the LVM volumes between installs), and they all manage about 20 gbit/s guest to guest in a simple iperf test, while Debian only manages about 2.3 gbit/s with high CPU usage. On Debian, CPU usage is generally much higher for all guest network activity, and in some cases a guest cannot even saturate a gigabit link (when the traffic comes from another subnet) without maxing out a CPU core, whereas the other KVM hosts barely break a sweat.

Disk I/O is very good and the guests feel snappy, so nothing seems seriously wrong; something is just suboptimal in the networking.
The issue follows only the host OS, as the guests have been the same (Debian Wheezy) in all comparisons.


To reproduce:

Install Debian Wheezy in guests (minimal with SSH and ntp)
Install iperf via apt-get
Configure network

Run test:
guest1: iperf -s
guest2: iperf -c <iperf-server> -i 2 -t 33333
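
For repeatability, the test can be wrapped in two small shell functions so each host OS is measured the same way. This is a minimal sketch rather than part of the original procedure: `run_server` and `run_client` are hypothetical names, it assumes iperf 2 is installed in both guests, and the 30-second default duration is an assumption chosen so the client exits on its own (the run above used `-t 33333`).

```shell
#!/bin/sh
# Minimal sketch of the guest-to-guest iperf test described above.
# Hypothetical helper names; assumes iperf (version 2) is installed in both
# guests, e.g. via "apt-get install iperf".

# On guest1: start the iperf server (runs until interrupted).
run_server() {
    iperf -s
}

# On guest2: run the client against the server.
#   $1 = address of guest1 (the iperf server)
#   $2 = test duration in seconds (assumed default: 30)
# -i 2 prints an interim bandwidth report every 2 seconds, as in the report.
run_client() {
    server="$1"
    seconds="${2:-30}"
    iperf -c "$server" -i 2 -t "$seconds"
}
```

For example, run `run_server` on guest1 and `run_client 192.0.2.10` on guest2 (192.0.2.10 is a placeholder for guest1's address), while watching the `vhost-$PID` and `kvm` processes in `top` on the host.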

My results:
- Guest to guest performance via local bridge: ~2.3 gbit/s, very high CPU usage on vhost-$PID and kvm process on host
- Physical server to guest on same subnet: ~940 mbit/s but with very high CPU usage on vhost-$PID and kvm process on host
- Physical server to guest via router: ~850 mbit/s with very high CPU usage on vhost-$PID and kvm process on host (why is routed traffic slower than switched traffic to the guest?)
- Physical server to kvm host via router (just to verify that the router is not the issue): ~940 mbit/s with almost no CPU usage

Expected results, based on the other KVM hosts with everything else the same:
- Guest to guest performance via local bridge: ~20 gbit/s, high CPU usage
- Physical server to guest on same subnet: ~940 mbit/s with low CPU usage on vhost-$PID and a bit higher on kvm process on host
- Physical server to guest via router: ~940 mbit/s with low CPU usage on vhost-$PID and a bit higher on kvm process on host
- Physical server to kvm host via router (just to verify that the router is not the issue): ~940 mbit/s with almost no CPU usage (the same as my current results)

Results with other host OSes on the same machine (guest to guest via bridge):
Ubuntu Server 11.10 (virtualization host): ~19 gbit/s
Proxmox VE 2.0: ~20 gbit/s
Fedora 17 alpha: ~20 gbit/s

VMWare ESXi 5 with VMXNET3: ~22 gbit/s


Host details:

OS: Debian Wheezy (testing), kernel 3.2.0-2-amd64, currently based on 3.2.12

virsh qemu-monitor-command --hmp mail 'info version'
1.0.0 (Debian qemu-kvm 1.0+dfsg-9)

virsh qemu-monitor-command --hmp mail 'info kvm':
kvm support: enabled

lsmod | grep kvm:
kvm_intel 121968 9
kvm 287572 1 kvm_intel

lsmod | grep vhost:
vhost_net 27436 3
tun 18337 7 vhost_net
macvtap 17598 1 vhost_net

Enabling or disabling KSM makes no difference to the results, but here are my parameters with it enabled:
echo "1" > /sys/kernel/mm/ksm/run
echo "200" > /sys/kernel/mm/ksm/sleep_millisecs
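
To repeat the comparison with KSM on and off, the two settings above can be toggled with a small function. This is a hedged sketch, not the reporter's script: `set_ksm` and the `KSM_DIR` override are hypothetical, and writing to `/sys/kernel/mm/ksm` requires root. For the off case it writes 2 to `run`, which per the kernel's KSM documentation stops ksmd and unmerges already-merged pages (0 would stop ksmd but leave merged pages shared).

```shell
#!/bin/sh
# Minimal sketch: toggle KSM so the iperf test can be repeated with it on and off.
# KSM_DIR is a hypothetical override (default: the real sysfs path) so the
# function can be exercised without root; writing to /sys requires root.
KSM_DIR="${KSM_DIR:-/sys/kernel/mm/ksm}"

# $1 = "on" or "off"; $2 = ksmd sleep interval in ms (default 200, as above)
set_ksm() {
    case "$1" in
        on)
            echo 1 > "$KSM_DIR/run"
            echo "${2:-200}" > "$KSM_DIR/sleep_millisecs"
            ;;
        off)
            # 2 stops ksmd and unmerges all currently merged pages
            # (0 would stop ksmd but leave merged pages in place).
            echo 2 > "$KSM_DIR/run"
            ;;
        *)
            echo "usage: set_ksm on|off [sleep_ms]" >&2
            return 1
            ;;
    esac
}
```

As root: `set_ksm off`, rerun the iperf test, then `set_ksm on 200` to restore the parameters shown above.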


Output from ps -ef of running guest:
/usr/bin/kvm -S -M pc-0.15 -cpu
core2duo,+lahf_lm,+rdtscp,+avx,+osxsave,+xsave,+aes,+popcnt,+x2apic,+sse4.2,+sse4.1,+pdcm,+xtpr,+cx16,+tm2,+est,+smx,+vmx,+ds_cpl,+dtes64,+pclmuldq,+pbe,+tm,+ht,+ss,+acpi,+ds
-enable-kvm -m 512 -smp 1,sockets=1,cores=1,threads=1 -name mail -uuid
ccace357-783d-ce9f-444a-419445ee601d -nodefconfig -nodefaults -chardev
socket,id=charmonitor,path=/var/lib/libvirt/qemu/mail.monitor,server,nowait
-mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc
-no-shutdown -drive
file=/dev/raid10/mail,if=none,id=drive-virtio-disk0,format=raw,cache=none
-device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2
-drive if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw
-device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0,bootindex=1
-netdev tap,fd=20,id=hostnet0,vhost=on,vhostfd=23 -device
virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:f7:25:33,bus=pci.0,addr=0x3
-chardev pty,id=charserial0 -device
isa-serial,chardev=charserial0,id=serial0 -usb -device
usb-tablet,id=input0 -vnc 127.0.0.1:2 -vga cirrus -device
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5
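
Since the whole report hinges on vhost-net being active, it may be worth confirming that for each running guest before comparing numbers. The sketch below is hypothetical (`uses_vhost` and `vhost_threads` are illustrative names): it checks the kvm command line for `vhost=on` and looks for the `vhost-<PID>` kernel worker thread that vhost-net creates for the owning QEMU process, the same `vhost-$PID` seen in the results above.

```shell
#!/bin/sh
# Minimal sketch: check that a running guest actually uses vhost-net.
# Hypothetical helpers; assumes the guest runs as /usr/bin/kvm as in the
# ps output above.

# $1 = full kvm command line; prints "yes" if a -netdev requests vhost=on.
uses_vhost() {
    case "$1" in
        *vhost=on*) echo yes ;;
        *)          echo no ;;
    esac
}

# $1 = PID of the kvm process; vhost-net names its kernel worker thread
# "vhost-<PID>", so print any process with exactly that name.
vhost_threads() {
    ps -eo pid=,comm= | awk -v name="vhost-$1" '$2 == name'
}

# Example for the guest named "mail" above:
#   pid=$(pgrep -f 'kvm .*-name mail' | head -n 1)
#   uses_vhost "$(tr '\0' ' ' < /proc/$pid/cmdline)"   # expect: yes
#   vhost_threads "$pid"                               # expect at least one vhost-<PID> line
```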


Server hardware (the issue appears to be the same regardless of which server is used):
- Intel(R) Xeon(R) CPU E31220 @ 3.10GHz Quad Core
- 16 GB ECC RAM
- Supermicro X9SCI-LN4F (Quad Intel Server NICs using e1000e)
- System disk: Corsair SSD Force Series 3 60GB
- Storage for guests: LVM images on directly attached RAID10


Guest details:

OS: Debian Wheezy (testing), kernel 3.2.0-2-amd64, currently based on 3.2.12

root@mail:~# lsmod | grep virtio:
virtio_balloon 12832 0
virtio_blk 12874 3
virtio_net 17808 0
virtio_pci 13207 0
virtio_ring 12969 4 virtio_pci,virtio_net,virtio_blk,virtio_balloon
virtio 13093 5 virtio_ring,virtio_pci,virtio_net,virtio_blk,virtio_balloon


I have tried:
- Replacing Debian Wheezy with Debian Squeeze (stable, kernel 2.6.32-xx) - even worse results
- Replacing kernel 3.2.0-2-amd64 with vanilla kernel 3.4-rc2 and a config based on Debian's included config - no apparent change
- Extracting the kernel config file from the Fedora 17 Alpha kernel and using it to compile a new kernel from the Debian Wheezy kernel source - slightly worse results
- Installing the Proxmox VE 2.0 kernel in Debian - same results
- ...in addition to replacing Debian with Ubuntu Server 11.10, Fedora 17 Alpha, Proxmox 1.9 and 2.0, and ESXi 5, which all have the expected network performance using virtio.


Please optimize KVM/vhost in Debian so it performs like the other alternatives.





/proc/cpuinfo:

processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 42
model name : Intel(R) Xeon(R) CPU E31220 @ 3.10GHz
stepping : 7
microcode : 0x1b
cpu MHz : 1600.000
cache size : 8192 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 4
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm ida arat epb xsaveopt pln pts dts tpr_shadow vnmi flexpriority ept vpid
bogomips : 6186.08
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 42
model name : Intel(R) Xeon(R) CPU E31220 @ 3.10GHz
stepping : 7
microcode : 0x1b
cpu MHz : 1600.000
cache size : 8192 KB
physical id : 0
siblings : 4
core id : 1
cpu cores : 4
apicid : 2
initial apicid : 2
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm ida arat epb xsaveopt pln pts dts tpr_shadow vnmi flexpriority ept vpid
bogomips : 6185.89
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

processor : 2
vendor_id : GenuineIntel
cpu family : 6
model : 42
model name : Intel(R) Xeon(R) CPU E31220 @ 3.10GHz
stepping : 7
microcode : 0x1b
cpu MHz : 1600.000
cache size : 8192 KB
physical id : 0
siblings : 4
core id : 2
cpu cores : 4
apicid : 4
initial apicid : 4
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm ida arat epb xsaveopt pln pts dts tpr_shadow vnmi flexpriority ept vpid
bogomips : 6185.90
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

processor : 3
vendor_id : GenuineIntel
cpu family : 6
model : 42
model name : Intel(R) Xeon(R) CPU E31220 @ 3.10GHz
stepping : 7
microcode : 0x1b
cpu MHz : 1600.000
cache size : 8192 KB
physical id : 0
siblings : 4
core id : 3
cpu cores : 4
apicid : 6
initial apicid : 6
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm ida arat epb xsaveopt pln pts dts tpr_shadow vnmi flexpriority ept vpid
bogomips : 6185.90
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:




Debian Release: wheezy/sid
APT prefers testing
APT policy: (500, 'testing')
Architecture: amd64 (x86_64)

Kernel: Linux 3.2.0-2-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages qemu-kvm depends on:
ii adduser 3.113+nmu1
ii ipxe-qemu 1.0.0+git-20120202.f6840ba-3
ii libaio1 0.3.109-2
ii libasound2 1.0.25-2
ii libbluetooth3 4.99-2
ii libbrlapi0.5 4.3-2
ii libc6 2.13-27
ii libcurl3-gnutls 7.25.0-1
ii libglib2.0-0 2.30.2-6
ii libgnutls26 2.12.18-1
ii libiscsi1 1.0.1-1
ii libjpeg8 8d-1
ii libncurses5 5.9-4
ii libpng12-0 1.2.49-1
ii libpulse0 1.1-3+b1
ii librados2 0.43-1
ii librbd1 0.43-1
ii libsasl2-2 2.1.25.dfsg1-4
ii libsdl1.2debian 1.2.15-2
ii libspice-server1 0.10.1-2
ii libtinfo5 5.9-4
ii libuuid1 2.20.1-4
ii libvdeplug2 2.3.2-4
ii libx11-6 2:1.4.4-4
ii python 2.7.2-10
ii qemu-keymaps 1.0.1+dfsg-1
ii qemu-utils 1.0.1+dfsg-1
ii seabios 1.6.3-2
ii vgabios 0.7a-2
ii zlib1g 1:1.2.6.dfsg-2

Versions of packages qemu-kvm recommends:
ii bridge-utils 1.5-2
ii iproute 20120319-1

Versions of packages qemu-kvm suggests:
pn debootstrap <none>
pn samba <none>
pn vde2 <none>




To UNSUBSCRIBE, email to debian-bugs-dist-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org

Replies

#1 Michael Tokarev
April 14th, 2012 - 11:50 am ET
On 13.04.2012 13:46, Hans-Kristian Bakke wrote:
> Compare results with other OSes on same machine (guest to guest via bridge):
> Ubuntu Server 11.10 (virtualization host): ~19 gbit
> Proxmox VE 2.0: ~20 gbit/s
> Fedora 17 alpha: ~20 gbit/s



Which versions of qemu[-kvm] are used on these?
Did you try older versions of qemu-kvm from snapshot.debian.org?

It really looks qemu-kvm-specific, not kernel-specific,
and qemu-kvm in Debian is very close to the upstream 1.0.1 version,
so I'm not sure where to look.

I have another bug report at hand claiming that vhost-net, in
that case with macvtap rather than a bridge, has a speed regression
between 1.0+dfsg-8 and 1.0+dfsg-9 (bridge mode unaffected).
1.0+dfsg-9 is the (Debian) revision where I added a 1.0.1
diff (I didn't re-upload the new source). Maybe this is
related somehow; please try the -8 release too, for comparison.

Unfortunately I can't help here at all, as I don't reach any
speeds comparable to yours, and mine don't change
much. But I haven't tried installing any other OS (from the
list you mentioned); I don't want to repartition my
only HDD to do so.


> I have tried:
> - Replacing Debian Wheezy with Debian Squeeze (stable, kernel 2.6.32-xx) - even worse results

This kernel does not support vhost-net;
it was a 2.6.35 or .38 addition.

> Please optimize KVM/vhost in Debian so it performs like the other alternatives.



With pleasure, but I need some help ;)

Thank you!

/mjt


