Historic Issues

Machine creates fail

01/27/2016 6:00 PM CT

A software upgrade on our hypervisor caused several critical issues that are preventing VM creates on this hypervisor. An error on our part is preventing us from rolling back the upgrade, so we are working to patch the software to function properly. We do not have an ETA for resolution at this time, but we will update this page when we have more information. Thank you for your patience and understanding.

01/27/2016 10:00 PM CT

A patch has been applied that allows machines to be created, but further research determined that an underlying communication problem between our panel and our hypervisor is preventing creates in some situations. If you attempt to create or manage a machine and see an error, please try again in a moment, as the communication problem does resolve itself intermittently. We are continuing to investigate and will provide updates as the issue evolves.

01/29/2016 11:00 AM CT

After 24 hours of monitoring, we have confirmed that the communication issue is completely resolved. The root cause was a software configuration problem that set timeouts extremely low. The very low timeouts caused the connection between our panel and our hypervisors to time out prematurely over and over, preventing reliable communication between these machines. The configuration has been updated and we do not expect this issue to occur again. Thank you for your patience and understanding.
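
For readers curious how a timeout that is set too low produces this kind of failure, here is a minimal sketch. It is an illustration only and not the panel's actual code: the function name `request_with_timeout` and the delay values are assumptions chosen for the demonstration. A peer that needs a little time to answer looks unreachable whenever the caller gives up before the reply can arrive.

```python
import socket
import threading
import time


def request_with_timeout(reply_delay: float, timeout: float) -> bool:
    """Return True if a reply arrives within `timeout` seconds.

    Hypothetical helper: `reply_delay` stands in for the time the remote
    side needs to answer, `timeout` for the configured client timeout.
    """
    client, server = socket.socketpair()

    def slow_peer() -> None:
        time.sleep(reply_delay)
        try:
            server.sendall(b"ok")
        except OSError:
            pass  # the client already gave up and closed its end

    worker = threading.Thread(target=slow_peer)
    worker.start()
    client.settimeout(timeout)
    try:
        return client.recv(2) == b"ok"
    except socket.timeout:
        # The configured timeout expired before the peer answered.
        return False
    finally:
        client.close()
        worker.join()
        server.close()
```

With a peer that answers after 0.2 seconds, a 0.05 second timeout fails on every attempt, while a 1 second timeout succeeds. Raising the timeout above the peer's normal response time is the kind of configuration change described above.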

Users not able to create VMs in the Panel - Post Mortem

12/21/2015 15:30 EST

We have identified an issue where users are not able to create new VMs using the Bitkumo panel. We are working on implementing a fix now and will provide an update within an hour.

12/21/2015 16:30 EST

We have implemented a fix, and users should now be able to create VMs using the Bitkumo panel. A full postmortem will follow.

12/21/2015 22:00 EST

This issue is now fully corrected. Our investigation identified three causes:

1) We had a customer machine on our primary host that had been compromised and subsequently used to launch a large DoS attack.

2) An application error was identified that surfaced when the network connection between the Bitkumo panel and our hypervisors was stressed. The stress caused many reconnection attempts, which is as designed, but these connections were not being closed properly. This caused the application on our hypervisor to slowly build up hundreds of connections, eventually causing it to be killed by the OOM killer. When this application fails, new creates on this hypervisor are impossible.

3) Our hot spare host was not configured properly. Normally new creates would fail over to this machine, but because its network configuration was incorrect, QEMU could not start new machines and new create requests were rejected.

Ultimately we solved this problem by powering down the compromised machine, patching the panel to properly close connections, and configuring our hot spare to properly accept new creates in the event of a similar failure in the future.
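
The connection fix described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the actual panel patch: the class name `PanelLink`, its methods, and the injectable `connect` parameter are hypothetical. The point is that every reconnect closes the previous connection before opening a new one, so retries under network stress cannot pile up on the remote side.

```python
import socket


class PanelLink:
    """Hypothetical client holding one connection to a hypervisor agent."""

    def __init__(self, address, connect=socket.create_connection):
        self.address = address
        # `connect` is injectable so the sketch can be exercised offline.
        self._connect = connect
        self.sock = None

    def reconnect(self):
        # Close the previous connection first. Skipping this step leaks
        # one connection per retry, which is the failure mode described
        # in cause 2.
        self.close()
        self.sock = self._connect(self.address)
        return self.sock

    def close(self):
        if self.sock is not None:
            try:
                self.sock.close()
            finally:
                self.sock = None
```

With this shape, a hundred reconnect attempts in a row leave exactly one connection open instead of a hundred.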

We apologize for any inconvenience this problem may have caused you.