
Scalable rollback for cloud operations using AI planning


Suhrid Satyal, Ingo Weber, Len Bass and Min Fu



Human-induced faults play a large role in systems reliability. In cloud platforms, system administrators may inadvertently make catastrophic mistakes, such as deleting a virtual disk that holds important data. Providing rollback for cloud operations can reduce the severity and impact of such mistakes by allowing the system to be reverted to a known good state.

In this paper, we present a scalable approach to rolling back operations that change the state of a system on proprietary cloud platforms. In our previous work, we provided a system that augments cloud APIs and provides a rollback operation using an AI planner. However, the previous system eventually suffers from the exponential complexity inherent to AI planning tasks. In this paper, we divide and parallelize rollback plan generation, based on characteristics unique to the rollback scenario. Through experimental evaluation, we show that this approach scales better than the previous, naïve approach, and effectively avoids the exponential behavior.
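The abstract does not say how the planning problem is divided. As an illustration only, the sketch below assumes one plausible reading: resources that share no dependencies can be planned separately, so the resource graph is split into connected components and each component is planned in parallel. All names here (`independent_groups`, `plan_group`, `rollback_plan`) and the state dictionaries are hypothetical, and `plan_group` is a toy stand-in for a call to an AI planner, not the system described in the paper.

```python
from concurrent.futures import ThreadPoolExecutor


def independent_groups(resources, dependencies):
    """Split resources into connected components of the dependency graph."""
    parent = {r: r for r in resources}

    def find(r):
        # Union-find lookup with path halving.
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    for a, b in dependencies:
        parent[find(a)] = find(b)

    groups = {}
    for r in resources:
        groups.setdefault(find(r), []).append(r)
    # Sort for a deterministic order of groups and of members within a group.
    return sorted(sorted(g) for g in groups.values())


def plan_group(group, current, target):
    """Toy stand-in for a planner call (assumption): one step per changed resource."""
    return [(r, current[r], target[r]) for r in group if current[r] != target[r]]


def rollback_plan(resources, dependencies, current, target):
    """Plan each independent group in parallel and concatenate the sub-plans."""
    groups = independent_groups(resources, dependencies)
    with ThreadPoolExecutor() as pool:
        # pool.map returns results in the same order as the input groups.
        sub_plans = list(pool.map(lambda g: plan_group(g, current, target), groups))
    return [step for plan in sub_plans for step in plan]
```

Under this assumption, each planner call sees only the resources in its own group, so the size of any single planning problem is bounded by the largest group rather than by the whole system.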

BibTeX Entry

    author           = {Satyal, Suhrid and Weber, Ingo and Bass, Len and Fu, Min},
    month            = sep,
    year             = {2015},
    keywords         = {reliability, ai planning, cloud computing, web service},
    address          = {Adelaide, Australia},
    title            = {Scalable Rollback for Cloud Operations using {AI} Planning},
    booktitle        = {Australasian Software Engineering Conference}

