Quantifying failure risk of version switch for rolling upgrade on clouds


Len Bass, Alan Fekete, Vincent Gramoli, Sherry Xu, Liming Zhu and Daniel Sun



Rolling upgrade is an industry technique for online dynamic software update. A rolling upgrade moves a small number of instances from the old version to the new version at a time, and repeats the operation in waves until all instances have been upgraded. In many cases, the software needs to avoid interactions between different versions. One common, simple approach is to make instances version-aware, so that a version switch point can be chosen at which the old service is deactivated and the new service is activated. On a cloud platform, upgrades can be implemented simply by replacing old virtual machine instances with instances running the new version, and various failures may occur during the rolling upgrade. If an instance fails, a replacement has to be launched from the backup images. In most software systems these images contain the old version and, for the sake of reliability, cannot simply be replaced with new-version images until the new software and service have proven stable. Thus the progress of the rolling upgrade is not guaranteed, and the number of upgraded instances can sometimes decrease. We aim to determine the probability that, after the versions are switched at a selected point, the number of working instances at some time falls below the number needed for the desired Quality of Service. In this paper, we quantify this risk stochastically with a family of discrete-time Markov chains (DTMCs). Evaluation on Amazon Web Services (AWS) and in simulation shows that our technique predicts the risk after a given version switch point well.
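The risk question in the abstract can be made concrete with a very small chain. The sketch below is an illustration under assumed simplifications, not the family of DTMCs developed in the paper. It assumes that after the switch only new-version instances serve requests, that each time step either upgrades one more instance (probability `p_upgrade`), loses one upgraded instance to a failure that is relaunched from an old-version image (probability `p_fail`), or changes nothing, and that the upgrade ends once every instance runs the new version. The function name, the parameters, and the numbers in the usage line are all hypothetical.

```python
def switch_risk(n_total, min_new, switch_point, p_upgrade, p_fail, steps=10_000):
    """Probability that the count of new-version instances drops below
    min_new before all n_total instances are upgraded, given that the
    version switch happens with switch_point instances already upgraded.

    Assumed model (not taken from the paper): a birth-death DTMC over the
    number of upgraded instances, with fixed per-step probabilities.
    """
    if not (1 <= min_new <= switch_point <= n_total):
        raise ValueError("need 1 <= min_new <= switch_point <= n_total")
    if p_upgrade < 0 or p_fail < 0 or p_upgrade + p_fail > 1:
        raise ValueError("p_upgrade and p_fail must be probabilities summing to <= 1")

    # dist[k] = probability that exactly k instances run the new version
    # and the threshold has not been violated so far.
    dist = [0.0] * (n_total + 1)
    dist[switch_point] = 1.0
    risk = 0.0  # probability mass absorbed below the threshold

    for _ in range(steps):
        nxt = [0.0] * (n_total + 1)
        nxt[n_total] = dist[n_total]  # absorbing state: upgrade complete
        for k in range(min_new, n_total):
            mass = dist[k]
            if mass == 0.0:
                continue
            nxt[k + 1] += mass * p_upgrade                # one more instance upgraded
            nxt[k] += mass * (1.0 - p_upgrade - p_fail)   # nothing changes this step
            if k - 1 < min_new:
                risk += mass * p_fail                     # fell below the QoS threshold
            else:
                nxt[k - 1] += mass * p_fail               # failed instance relaunched as old
        dist = nxt
    return risk


# Hypothetical usage: 10 instances, at least 4 new-version instances needed,
# switch once 6 instances have been upgraded.
print(switch_risk(n_total=10, min_new=4, switch_point=6, p_upgrade=0.3, p_fail=0.1))
```

Under these assumptions the chain is a gambler's-ruin process, so the result can be checked against the closed-form ruin probability, and a later switch point yields a lower risk at the cost of a longer period before the new service is active.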

BibTeX Entry

    author           = {Bass, Len and Fekete, Alan and Gramoli, Vincent and Xu, Xiwei (Sherry) and Zhu, Liming and Sun, Wei},
    month            = nov,
    year             = {2014},
    keywords         = {rolling upgrade, failure, version switch, risk, model, markov},
    title            = {Quantifying Failure Risk of Version Switch for Rolling Upgrade on Clouds},
    booktitle        = {IEEE International Conference on Big Data and Cloud Computing},
    pages            = {8},
    address          = {Sydney, Australia}

