Load average on AWS does not divide by number of cores #97

Closed
rafaeldff opened this issue Jan 8, 2015 · 4 comments

@rafaeldff

All "processors" in the output of `cat /proc/cpuinfo` have the same "core id". It seems each of them is part of a hyperthreaded core.

Currently, riemann-health behaves as if the machine has a single processor, always dividing the load average by 1. I don't know if it's possible to get the actual number of cores assigned to the VM, but there is probably a better estimate than 1.

E.g., here is the output from a 4 vCPU instance, which reports all "processors" as having core id = 2:

processor    : 0
vendor_id    : GenuineIntel
cpu family    : 6
model        : 45
model name    : Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
stepping    : 7
microcode    : 0x70b
cpu MHz        : 2599.998
cache size    : 20480 KB
physical id    : 0
siblings    : 4
core id        : 2
cpu cores    : 1
apicid        : 5
initial apicid    : 5
fpu        : yes
fpu_exception    : yes
cpuid level    : 13
wp        : yes
flags        : fpu de tsc msr pae cx8 sep cmov pat clflush mmx fxsr sse sse2 ss ht syscall nx lm constant_tsc rep_good nopl nonstop_tsc pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes hypervisor lahf_lm ida arat epb xsaveopt pln pts dtherm
bogomips    : 5199.99
clflush size    : 64
cache_alignment    : 64
address sizes    : 46 bits physical, 48 bits virtual
power management:

processor    : 1
vendor_id    : GenuineIntel
cpu family    : 6
model        : 45
model name    : Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
stepping    : 7
microcode    : 0x70b
cpu MHz        : 2599.998
cache size    : 20480 KB
physical id    : 0
siblings    : 4
core id        : 2
cpu cores    : 1
apicid        : 5
initial apicid    : 5
fpu        : yes
fpu_exception    : yes
cpuid level    : 13
wp        : yes
flags        : fpu de tsc msr pae cx8 sep cmov pat clflush mmx fxsr sse sse2 ss ht syscall nx lm constant_tsc rep_good nopl nonstop_tsc pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes hypervisor lahf_lm ida arat epb xsaveopt pln pts dtherm
bogomips    : 5199.99
clflush size    : 64
cache_alignment    : 64
address sizes    : 46 bits physical, 48 bits virtual
power management:

processor    : 2
vendor_id    : GenuineIntel
cpu family    : 6
model        : 45
model name    : Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
stepping    : 7
microcode    : 0x70b
cpu MHz        : 2599.998
cache size    : 20480 KB
physical id    : 0
siblings    : 4
core id        : 2
cpu cores    : 1
apicid        : 5
initial apicid    : 5
fpu        : yes
fpu_exception    : yes
cpuid level    : 13
wp        : yes
flags        : fpu de tsc msr pae cx8 sep cmov pat clflush mmx fxsr sse sse2 ss ht syscall nx lm constant_tsc rep_good nopl nonstop_tsc pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes hypervisor lahf_lm ida arat epb xsaveopt pln pts dtherm
bogomips    : 5199.99
clflush size    : 64
cache_alignment    : 64
address sizes    : 46 bits physical, 48 bits virtual
power management:

processor    : 3
vendor_id    : GenuineIntel
cpu family    : 6
model        : 45
model name    : Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
stepping    : 7
microcode    : 0x70b
cpu MHz        : 2599.998
cache size    : 20480 KB
physical id    : 0
siblings    : 4
core id        : 2
cpu cores    : 1
apicid        : 5
initial apicid    : 5
fpu        : yes
fpu_exception    : yes
cpuid level    : 13
wp        : yes
flags        : fpu de tsc msr pae cx8 sep cmov pat clflush mmx fxsr sse sse2 ss ht syscall nx lm constant_tsc rep_good nopl nonstop_tsc pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes hypervisor lahf_lm ida arat epb xsaveopt pln pts dtherm
bogomips    : 5199.99
clflush size    : 64
cache_alignment    : 64
address sizes    : 46 bits physical, 48 bits virtual
power management:
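
To illustrate why the reported count collapses to 1, here is a minimal Ruby sketch. It assumes the core count is derived from distinct "physical id" / "core id" pairs, which is a guess consistent with the behaviour described above rather than a copy of the riemann-health source. Both helper names are hypothetical, and the sample text only mimics the relevant fields of the output above.

```ruby
# Hypothetical helpers for illustration; not taken from riemann-health.

# Counts distinct (physical id, core id) pairs. When every entry reports
# the same pair, as in the output above, this collapses to 1.
def count_unique_core_ids(cpuinfo)
  cpuinfo.split(/\n\n+/).map { |entry|
    [entry[/^physical id\s*:\s*(\d+)/, 1], entry[/^core id\s*:\s*(\d+)/, 1]]
  }.uniq.size
end

# Counts "processor" entries, i.e. the logical CPUs visible to the guest.
def count_processor_entries(cpuinfo)
  cpuinfo.scan(/^processor\s*:\s*\d+/).size
end

# Four entries that share physical id 0 and core id 2, like the 4 vCPU instance.
sample = (0..3).map { |n|
  "processor\t: #{n}\nphysical id\t: 0\ncore id\t\t: 2\n"
}.join("\n")

puts count_unique_core_ids(sample)    # distinct core ids
puts count_processor_entries(sample)  # logical processors
```

On this sample the first count is 1 and the second is 4, so counting "processor" entries looks like the better estimate for a VM like this one.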
@jamtur01
Member

@rafaeldff - I think the branch I've just pushed should address this issue. Could you test it? Here's a gem build with the patch applied:

https://www.dropbox.com/sh/bm8g5e8s94eu66n/AACTNDFX8Vb7SwUNyU4jqmKIa?dl=0

jamtur01 added a commit that referenced this issue Feb 29, 2016
The nproc command should be available on most Linux distros and
can replace the `cores` method in riemann-health.

This should address issues where a host has multiple cores with the
same ID.
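
A rough sketch of the approach the commit message describes, assuming `nproc` (GNU coreutils) is on the PATH. The method names `parse_nproc`, `core_count` and `normalized_load` are hypothetical, and the fallback to 1 is an assumption, not necessarily what the patch does.

```ruby
# Hypothetical sketch; names are illustrative, not riemann-health's own.

# Parses the output of `nproc`; returns nil unless it is a positive integer.
def parse_nproc(output)
  n = Integer(output.to_s.strip, 10)
  n > 0 ? n : nil
rescue ArgumentError
  nil
end

# Asks `nproc` for the number of processing units, falling back to 1
# (an assumed default) when the command is missing or prints nothing usable.
def core_count
  out = begin
    `nproc 2>/dev/null`
  rescue SystemCallError
    ""
  end
  parse_nproc(out) || 1
end

# Load average expressed per core, which is what thresholds are set against.
def normalized_load(load_average, cores)
  load_average / cores.to_f
end

puts normalized_load(2.0, core_count)
```

Note that `nproc` reports the processing units available to the current process, so CPU affinity settings can make it lower than the number of online processors.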
@adamdyga

adamdyga commented Nov 25, 2016

I've run into this issue too. Any chance the fix makes it into a regular release? It's difficult to set thresholds for CPU load if the load is not reported on a per-core basis.

@jamtur01
Member

jamtur01 commented Nov 25, 2016

@adamdyga Does the fix address the issue? I'm still waiting for feedback.

@adamdyga

@jamtur01 for me it works like a charm ;)
