VFX companies have software that turns off render farm machines in periods of low load. I helped write software that did this back in 2006.
Basically, we found it was possible to shut down machines in periods of low load and then use "Wake On Lan" to start them up once load picked up again.
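The Wake-on-LAN part is surprisingly simple: the sleeping NIC just listens for a "magic packet" broadcast on the LAN. A minimal sketch (the MAC address is made up):

```python
import socket

def build_magic_packet(mac: str) -> bytes:
    """A WoL magic packet is 6 bytes of 0xFF followed by the
    target NIC's MAC address repeated 16 times (102 bytes total)."""
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(mac_bytes) != 6:
        raise ValueError("expected a 6-byte MAC address")
    return b"\xff" * 6 + mac_bytes * 16

def wake(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Broadcast the magic packet on the LAN; the target machine's
    NIC stays powered while the rest of the box is off and wakes
    the host when it sees its own MAC in the payload."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        s.sendto(build_magic_packet(mac), (broadcast, port))

# wake("aa:bb:cc:dd:ee:ff")  # requires WoL enabled in the BIOS/NIC
```

The catch is that WoL only works on the broadcast domain, so the scheduler has to sit on (or relay into) the same LAN segment as the render nodes.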
I am unsure if the on-off power cycling reduces machine longevity.
> I am unsure if the on-off power cycling reduces machine longevity.
Possibly the disks are the most vulnerable components. I wonder if it would be possible in software to shut down the CPUs and have only disks + network running...
A bit late to the party on this one. Check out CPU "C-states" and things like Intel's "SpeedStep." Modern CPUs can reduce or shut off power to individual packages and cores, cutting power draw from on the order of a hundred watts (TDP) to tens of watts.
The downside is the latency associated with changing state. Depending on the transition, it can take hundreds or thousands of microseconds to move through these states. On a server workload this can introduce huge latency outliers as a request blocks on a core waking up.
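On Linux you can see exactly which idle states the kernel knows about, and their documented exit latencies, under sysfs. A small sketch that parses that tree (the sysfs layout is standard; the path argument exists only so it can be pointed at a test fixture):

```python
from pathlib import Path

def cpuidle_states(cpu: int = 0, sysfs: str = "/sys/devices/system/cpu"):
    """Return (name, wakeup latency in microseconds) for each idle
    ("C") state the kernel exposes for one CPU. Deeper states save
    more power but cost more latency to exit, which is exactly the
    tail-latency tradeoff described above."""
    states = []
    base = Path(sysfs) / f"cpu{cpu}" / "cpuidle"
    for state in sorted(base.glob("state*")):
        name = (state / "name").read_text().strip()
        latency_us = int((state / "latency").read_text())
        states.append((name, latency_us))
    return states
```

On a typical Intel box this ranges from POLL (0 µs) through deep package C-states with triple-digit microsecond exit latencies.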
I was more after "shut down everything except the disks": have the absolute minimum running, with the disks kept spinning to reduce wear on the spindle motors.
If you're shutting down, you have massive latency anyway, but if it's possible to at least save all the power not required for keeping the disks up, that'd be great.
In a diurnal cycle like Facebook's, you'd have one start-stop per day, which should be well within the rated specs of hard disks. A few years back I looked at the idea of treating disk lifetime as a resource and explicitly managing it: http://dx.doi.org/10.1109/MSST.2011.5937221
That's assuming that the servers have disks at all, which they probably shouldn't.
This is one of these things where virtualisation can help even more. For example, VMware can dynamically put servers in standby mode when demand is low and power them up again when needed: http://www.vmware.com/products/vsphere/features/drs-dpm
You know that would be trivial to do with bare metal and out of band management cards like Dell Dracs, IBM RSA cards, HP ILOs, or generic IPMI BMCs, right?
Virtualization doesn't really add much of anything for that specific problem other than increased context switching and slightly lower performance.
Disclaimer: building this type of thing (on bare metal) is a chunk of my day job. I see it as unbelievably trivial. In fact, the same ideas are behind Rackspace's "OnMetal" initiative:
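For anyone unfamiliar with the out-of-band approach mentioned above: the BMCs are usually scripted with `ipmitool`. A minimal sketch of a wrapper that builds the command line (host, user, and password here are made up; actually running it requires `ipmitool` and a reachable BMC):

```python
import subprocess

VALID_ACTIONS = {"on", "off", "soft", "cycle", "status"}

def ipmi_power(host: str, user: str, password: str, action: str) -> list[str]:
    """Build an ipmitool chassis-power command for a BMC reachable
    over the network ('lanplus' = IPMI v2.0 over RMCP+)."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    return ["ipmitool", "-I", "lanplus", "-H", host,
            "-U", user, "-P", password, "chassis", "power", action]

# To actually power a node on:
# subprocess.run(ipmi_power("10.0.0.42", "admin", "secret", "on"), check=True)
```

The flip side, as the replies below note, is that BMC firmware quality varies wildly, and the BMC itself draws a few watts around the clock.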
Seriously. A person who would advocate using IPMI at scale has either never owned an IPMI card, never worked at scale, or both. They just don't work, and they erase whatever power savings you're trying to achieve.
Although... if you have the engineering resources of Facebook, you can write your own IPMI software and probably get it working pretty well. They're all just embedded ARM systems, after all.
Real servers can't be bought without IPMI and AFAIK the BMC cannot be turned off, so it's probably not worth worrying about BMC power if there's nothing you can do about it.
Sure, but as you're aware facebook, google, et al don't buy "real servers", they buy servers that actually meet their requirements. That's why "real vendors" like HP have missed the boat on selling millions of servers into the cloud.
Speaking of Facebook specifically, the evolution is interesting. They replaced BMCs with the reboot-on-LAN hack but then their next motherboard version had BMCs again. It would be interesting to hear the story behind that.
> Disclaimer: building this type of thing (on bare metal) is a chunk of my day job. I see it as unbelievably trivial.
You do, I'm sure, because you've invested a lot of time learning it and building tooling around it. For anyone who isn't a full-time sysadmin, debugging all of the many and various quirks in management hardware is a major time sink versus scripting a VM server's API.
If they use virtualization, the assumption that zero requests means low power no longer holds. With Linux containers, for example, other containers on the same host may still be accepting requests, so load would have to be dispatched across whole servers rather than individual containers.
The Achilles heel of virtualization is networking. All of the hypervisors out there (VMware, Xen, KVM) have user-space software switch implementations that dramatically reduce the throughput of TCP session creation. As a consequence you lose a significant amount of hardware potential to serve HTTP connections.
That's not virtualization; it's namespace isolation. There's a small performance impact if you're using NAT, but otherwise the kernel networking stack is used, so there's no performance penalty.
Yes, indeed. What I mean is that the energy-saving optimization doesn't fit well with namespace isolation, since you can't control the other containers' requests. If we want to do it, we need to dispatch requests at the server level, not the container level.
"Virtualization doesn't really add much of anything for that specific problem other than increased context switching and slightly lower performance."
This is BS. What if you have 3 physical servers at 30% utilization? DRS can _seamlessly_ consolidate _arbitrary_ application VMs onto one server and shut down the rest. With bare metal, only certain specifically designed workloads (stateless web farms, some distributed systems, etc.) can be moved easily.
It seems that dismissing virtualization out of sheer ignorance is a fad these days. Virtualization provides important hardware abstraction to a much wider variety of workloads.
You could probably get a similar effect by using HAProxy's "balance first" algorithm, which chooses the first available server with an available connection slot (as defined by maxconn). If you did this, you'd want to set maxconn pretty conservatively.
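A minimal sketch of what that might look like in an HAProxy backend (server names and addresses are made up; `balance first` fills servers in declaration order up to their `maxconn`, so under low load the later servers receive no traffic and become candidates for powering down):

```
backend app
    balance first
    server web1 10.0.0.1:80 maxconn 50 check
    server web2 10.0.0.2:80 maxconn 50 check
    server web3 10.0.0.3:80 maxconn 50 check
```

The conservative `maxconn` matters because once a server is saturated, overflow traffic spills to the next one, which is exactly the signal you'd use to wake it.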
> Basically, we found it was possible to shut down machines in periods of low load and then use "Wake On Lan" to start them up once load picked up again.
> I am unsure if the on-off power cycling reduces machine longevity.
Might be something worth exploring at Facebook.