Taste of Research (ToR) 2017-2018 Summer Project Proposals
Applications for summer research in Trustworthy Systems for the topics below can be made through
- UNSW Faculty of Engineering Taste of Research Scholarships program (closes August 2016)
- University of Sydney Summer Scholarships program (closes September 2016)
Automating Formal Proofs (+USyd)
Abstract: Isabelle is an interactive theorem prover which combines automated and human reasoning through the use of proof methods (or tactics). These methods allow users to make high-level decisions on how to progress in a proof without worrying about the formal details. Powerful proof methods are required for large proof undertakings, such as the L4.verified project at Data61.
Eisbach is an in-development prototype of a high-level language for writing proof methods in Isabelle, which is historically done in Standard ML. It is influenced by Coq’s Ltac, but distinguishes itself by leveraging Isabelle’s existing automation and backtracking infrastructure.
The aim of this project is to investigate useful applications for Eisbach, with several possible domains to explore: An existing suite of Data61-developed separation logic tactics could be re-implemented and extended, or a similar investigation can be done against the verification-condition generator used in the L4.verified proof; Or a novel proof method can be developed solve, for example, word arithmetic proofs that appear frequently in the verification of C programs.
Novelty: This will be one of the first
larger-scale uses of the new Eisbach proof language and the
resulting feedback will contribute to its further
Outcome: Automated proof tactics for previously manual proofs in one or more of the specified application domains, ideally reducing proof size and time for those applications.
Machine learning for proof script generation.
Abstract: Isabelle is an interactive theorem prover. At Data61, we are developing a domain specific language, Isarring, to specify high level proof strategies for Isabelle. Given a proof strategy, Isarring's runtime system tries to generate human-readable fast-to-replay proof scripts using backtracking proof-search.
The goal of this project is to improve the proof script generation of Isarring's runtime system with machine learning. Isabelle already uses a machine learning tool, MaSh, when it invokes external first-order theorem provers through the command, Sledgehammer. In this project, we improve Isarring's runtime system using MaSh.
Novelty: Upon successful completion of the project, this work will combine the power of backtracking search and machine learning to produce human-readable fast-to-replay proof scripts in Isabelle.
Outcome: Better proof automation for Isabelle.
Reference Material Links:
Improving automation in concurrent software verification
Abstract: Formal verification of concurrent OS code is one of the main research grand challenges of the Trustworthy Systems group. We have done initial work in modelling and verifying a small real-time operating system, eChronos. In this work, the reasoning about interleaved execution between tasks' code and interrupt code is done using a classical concurrency reasoning method, known as Owicki-Gries, empowered by the automation of a modern interactive theorem prover, Isabelle. We have further developed a framework to reason at the implementation level, either with Owicki-Gries, or with the more compositional Rely-Guarantee.
We are now exploring the verification of the multicore version of seL4, our verified microkernel, a landmark in software verification. A few approaches are being investigated, all currently involving a level of manual work. In this project you will investigate increasing the automation of practical concurrency verification, by designing suitable rules, allowing reuse of annotations, etc.
Novelty: Your work will contribute to the general feasibility and scalability of practical concurrent software verification.
Outcome: Your work will directly impact the efficiency of the framework and proofs developed for the verification of concurrent OS code.
Reference Material Links:
Implement and Verify a CakeML Compiler Optimisation (+USyd)
CakeML is a functional programming language (similar to OCaml or Standard ML) with a formal specification (a definition in logic) and a mechanically verified (proven correct) compiler and runtime system. The CakeML compiler is at the forefront of research on verified compilers for functional programming languages, aiming for end-to-end correctness with good performance and reasonable cost. It is the basis for an approach to building high-assurance computer software using interactive theorem provers.
The design of the CakeML compiler allows for various optimisation passes to be implemented as local source-to-source transformations over one of the compiler's intermediate languages. Implementing and verifying one such optimisation is a good way to improve the CakeML compiler while gaining experience with theorem proving and compiler writing.
There are many specific examples of optimisations to add (including: pointer equality testing, pattern-match exhaustiveness checking, splitting large functions, simplifying redundant if-expressions, improving instruction selection for conditional jumps, and encoding stack-layout bitmap indices in return addresses). Do one or more of those, or design and verify your own optimisation!
Novelty: Following the success of CompCert for C, interest in optimising verified compilers is taking off. CakeML is the most realistic verified compiler for a functional programming language. Verified optimisations are an opportunity to improve the state of the art in verified compilation.
Outcome: A new compiler optimisation added to the verified CakeML compiler. Benchmarks demonstrating an improvement in the compiled code.
Implement and Verify Enhancements to CakeML (+USyd)
CakeML is a functional programming language (similar to OCaml or Standard ML) with a formal specification (a definition in logic) and a mechanically verified (proven correct) compiler and runtime system. CakeML is the basis for an approach to building high-assurance computer software using interactive theorem provers. The aim of the project is end-to-end correctness with good performance and reasonable cost, and ultimately scaling formal methods up to large-scale applications.
Formally specifying a programming language is a serious endeavour, and the CakeML language currently lacks many features of full-blown (but underspecified) programming languages. There are opportunities for enhancements at many levels, including the concrete syntax, the type system, the built-in primitive operations, and the standard library. Examples of features we would like to add include: explicit type annotations, expression-level let-polymorphism, user-defined infix operators, nested modules, named module interfaces (signatures), and records. Specify and verify one of these features, or design your own enhancement to CakeML!
Novelty: Scaling formal semantics up to realistic programming languages is an exciting area of research, making theorem proving relevant to a larger class of applications.
Outcome: An extension to CakeML's formal specification, and, ideally, a corresponding extension to CakeML's verified compiler.
Proof Engineering on CakeML proofs (+USyd)
CakeML is a functional programming language (similar to OCaml or Standard ML) with a formal specification (a definition in logic) and a mechanically verified (proven correct) compiler and runtime system. The CakeML proofs are a substantial formal development (over 100,000 lines of proof script) that is starting to reach the size where scalability issues become relevant. The discipline of proof engineering, which is to proof development as software engineering is to large-scale software development, can be applied; this project is to do so.
Although some of the work to be done here is simply tweaking and refactoring, there are also many opportunties for creative ideas about how to structure and manage large formal developments, or for particular approaches to generating certain definitions or proofs automatically.
Novelty: Large scale proofs are an exciting combination of software engineering methodology with formal methods.
Outcome: Improvements to the speed or size of CakeML proofs, and, ideally, new approaches to dealing with large formal developments.
Open Theory Import for Isabelle/HOL (+USyd)
The goal of the OpenTheory project is to allow specifications and proofs to be shared between different theorem prover implementations of higher order logic. Currently, this proof exchange format is supported by the provers HOL Light, HOL4, and ProofPower. The aim of this project is to extend the interactive theorem prover Isabelle/HOL with import facilities for the OpenTheory format, so that Isabelle users can access and re-use the large libraries of proofs written in any of these three provers.
There exists an initial attempt at OpenTheory import for a previous version of OpenTheory and of Isabelle/HOL from 2011. The task of this project would be to design and implement a new mechanism for modern versions of the exchange format and the prover.
The project requires knowledge of functional programming in a language such as ML, OCaml, or Haskell. The implementation language for this project is standard ML.
Novelty: Proof exchange formats are an exciting new research and engineering direction for interactive provers. The contribution of this project would be to extend the family of provers that can communicate with each other.
standard ML implementation of OpenTheory import for the prover
Reference Material Links:
Verification of native C functions in Cogent. (+USyd)
Abstract: The goal of this project is to verify the correctness of C functions that are called by Cogent programs.
Cogent is a purely functional domain-specific language with linear types. From a well-typed Cogent program, the Cogent compiler generates efficient C code, a formal specification of the program’s purely functional semantics and a machine-checked proof of compilation correctness against that specification. Hence, we can automatically guarantee the absence of several well-known errors statically by writing programs in Cogent.
TS has implemented multiple file systems in Cogent. In the development of these file systems, the Cogent foreign function interface allowed us to call C functions from Cogent code where they can be best implemented in C. The goal of this project is to verify that these C functions fit into the Cogent framework using Isabelle/HOL.
Novelty: Upon successful completion of the project, this work will fill the last missing piece of the verification of fully verified file systems. To the best of our knowledge, these will be the world's first implementations of such systems.
Outcome: Machine-checked fully verified file systems.
Modelling Routing Protocols
Abstract: Wireless Mesh Networks (WMNs) are a promising technology that is currently being used in a wide range of application areas, including Public Safety, Transportation, Mining, etc. Typically, these networks do not have a central component (router), but each node in the network acts as an independent router, regardless of whether it is connected to another node or not. They allow reconfiguration around broken or blocked paths by "hopping" from node to node until the destination is reached. Unfortunately, the performance of current systems often does not live up to the expectations of end users in terms of performance and reliability, as well as ease of deployment and management.
We explore and develop adaptive network protocols and mechanisms for Wireless Mesh Networks that can overcome the major performance and reliability limitations of current systems. To support the development of these new protocols, the project also aims at new Formal Methods based techniques, which can provide powerful new tools for the design and evaluation of protocols and can provide critical assurance about protocol correctness and performance. Close collaboration with industry partners ensures the use-inspired nature of the project.
The ideal applicant should be interested in applying Formal Methods and logic-based calculi in general; previous knowledge about process algebra is appreciated, but not necessary.
Novelty: Classical routing protocol specifications are usually written in plain English. Often this yields ambiguities, inaccuracies or even contradictions. The use of Formal Methods like process algebra avoids these problems and leads to a precise description of protocols. To compare and evaluate different protocols, we aim at a compendium of standard routing protocol specifications in a unified language.
Outcome: So far we have modelled one of the standard protocols using process algebra, namely AODV, as well as a draft successor protocol that is currently being discussed by the Internet Engineering Task Force (IETF).
The project's work should include the formalisation of a second
standard protocol such as OSLR
or HWMP (http://en.wikipedia.org/wiki/IEEE_802.11s).
After a faithful specification has been given, the work could include the verification of basic properties of the routing protocol: packet delivery for example guarantees that a packet, which is injected into a network, is finally delivered at the destination (if the destination can be reached).
What's a good routing protocol? Measurements for Comparing Routing Protocols
Abstract: Routing protocols specify how routers communicate with each other, disseminate information to select routes between any two nodes on a network, and provide the basis for sending data (packets) through a network. They find applications in all types of (communication) networks, such as metropolitan area networks (MAN), local area networks (LAN), virtual private networks (VPN) or wireless mesh networks (WMN). Due to the diversity in applications dozens of (classes of) different protocols have been developed. Often there are several protocols for the same type of networks. In the case of WMNs there are for example reactive protocols, such as AODV or DYMO, and link-state protocols, such as B.A.T.M.A.N. or OLSR.
Using protocol implementations and test-bed experiments, it has
been shown that protocols behave differently on different
For example in a star topology the centre has to forward (nearly) all messages and might be too busy to react in an appropriate time to some requests.
But what does this mean for the protocols involved? Is AODV better than OLSR, if there is only few network traffic? Should B.A.T.M.A.N be used in highly-connected networks only? ...
Novelty: To answer these questions systematically, network topologies have to been classified, using different metrics.
In order to solve such a problem, both the model and the specification are formulated in some precise mathematical language. But even if a precise description of a protocol is given e.g., in process algebra, this description is usually not accepted by a model checker. This is due to incompatibility of different languages.
Outcome: The main part of the project should be the creation of a topology zoo and its classification. Classification metrics used should include the average connectivity of a node, the network diameter, the density of the topology, and the beta index. After the classification metrics have been chosen, sample topologies for each category should be generated. While generating topologies it should also be determined if some topologies are members of different classes.
The second part of the project should demonstrate the usefulness of the classification. To achieve this, one particular routing protocol, e.g. AODV, should be analysed. In particular, it should be determined if the routing protocol behaves differently on different classes of topologies.
Reference Material Links:
Model Checking of Mesh Network Routing Protocols
Abstract: As utilisation of wireless networks becomes increasingly multimedia in nature (i.e., consisting of data and video), issues around bandwidth availability and Quality of Service (QoS) in general become increasingly important. Current wireless mesh solutions, however, do not consistently meet those requirements. It is a common belief that the failure of existing wireless mesh network systems are to a large extent due to the limitations of current network protocols.
As an effort to improve the performance of wireless networks, we model, analyse, verify routing protocols for wireless networks. If errors or shortcomings are found we also try to fix them and report back to the developers of the protocols.
Novelty: Model checking is a technique for automatically verifying correctness properties of finite-state systems in general and WMNs in particular: given a model of a routing protocol, a model checker like UPPAAL (http://www.uppaal.org/) can test automatically whether this model satisfies a given specification.
In order to solve such a problem, both the model and the specification are formulated in some precise mathematical formalism, such as the UPPAAL input language.
This project aims at setting up a model checking environment for UPPAAL, suitable for streamlining and simplifying the model checking of routing protocols on certain network topologies.
Outcome: The expected outcome will be a user-friendly model checking environment; the environment should should be adaptable for the specification of classes of network topologies as well routing protocol formalisations.
Reference Material Links:
Capability-oriented programming interface for seL4
Abstract: The formally verified seL4 microkernel is arguably the world's most secure operating system. Capability-based access control is a core enabler of security, as it provides fine-grained control over access rights.
Existing middleware for seL4 provides a programming model similar to traditional (Posix-like) systems, effectively reverting to ambient authority and thus abandoning most of the security advantages of capabilities.
Novelty: This work demonstrates that seL4 indeed benefits from the nice properties of classical object capabilities. A good job can produce publishable results.
Virtualising the seL4 ABI
Abstract: Capabilities are opaque object references with implied access rights. A capability system should allow transparent interposition, eg for security monitoring.
In a capability-based operating system, this should allow complete virtualisation of the system-call interface. This project is to put seL4, the world's most trustworthy operating system, to the test: Is seL4 truly virtualisable? I.e. can an arbitrary program written to run natively on seL4 run correctly in a virtualised envorinment where all system calls are intercepted by a security monitor? If so, what is the inherent cost of virtualisation?
The simplese proof of virtualisability is to construct a minimal wrapper around a user process that intercepts all capabiltiy transfers in and out of the process and replaces them by endpoint capabilitie. Those endpoints are invoked by the process as if they represented kernel objects, but instead send a message to the wrapper which then invokes the actual operation on behalf of the kernel. The virtualisation is complete if the wrapper can proxy all such invokations without any knowledge of the operation the client process' intended. Validate this by running the seL4 regression test suite as the client process.
Novelty: This work demonstrates that seL4 indeed benefits from the nice properties of classical object capabilities. A good job can produce publishable results.
Interrupt-Related Covert Channels on seL4
Description of this topic is in the Security section.
eChronos Art Project (+USyd)
Peter Chubb,, Sebastian Holzapfel
Abstract: eChronos is a formally-verified RTOS designed for deeply-embedded systems with no memory protection and single-mode execution. It is fairly simple to drive sensors, actuators and relays from very cheap boards, and get interesting results. This project is to take simple M4-based development boards such as the Stellaris Launchpad, hook them up to ultrasonic sensors, motors and LEDs to produce an artistic installation that shows off the real-time and embedded capabilities of the platform, while being aesthetically interesting.
Novelty: In addition to the novelty of the resulting art installation, it is expected that the candidate will be involved in eChronos development and contribute to the open source kernel.
Outcome: eChronos-based art installation suitable for open days and demonstrations.
Prerequisites: Students must have a strong background in Operating Systems and have at least completed an Operating Systems course with excellent marks.
Performance limits of real-time operating systems
Abstract: eChronos is a formally-verified RTOS designed for deeply-embedded systems with no memory protection and single-mode execution. Sloth is a system for a similar application domain, which takes the unusual approach of leaving all scheduling to hardware, by running everything in an interrupt context. This limits the use of Sloth to processors where interrupts mode can be entered by software. This project is to evaluate and quantify the performance advantage of Sloth over eChronos.
Novelty: Sloth is presently the world's fastest RTOS. eChronos, which has the advantage of formal verification and less dependence on hardware features, is a more traditionally-designed RTOS. This project will determine whether the performance advantage of Sloth is significant enough to justify the different (and more limiting) design. The results are eminently publishable.
Outcome: A better understanding of RTOS design tradeoffs.
Standard C libraries - which one?
Abstract: The project aims to survey existing C library implementations for use in seL4 microkernel. The standard C library is anything but standard, and differing implementations have differing properties. The goal of this project develop criteria for evaluating the goodness of a standard C library (such a performance, completeness, size, modularity), and use the criteria to understand the landscape of C libraries available for open source systems. The goal of the project is to port the most appropriate C library to seL4.
Novelty: An evaluation of existing C libraries in the context of microkernels.
Outcome: A C-library port and evaluation on seL4.
From RefOS to Phoenix.
Abstract: RefOS is an immature multiserver OS on seL4. The OS is somewhat fragile and a little neglected. This project is aimed at resurrecting RefOS from the ashes into something usable for future development. The project provides the opportunity for a student to make their mark on the only multiserver OS environment running on seL4. While the project is quite open ended, it can also be tackled in a quite focused and manageable way.
Novelty: Novelty: Tackles issues in building multiserver systems on the worlds only formally verified microkernel.
Outcome: RefOS running in a simple form.
Rump kernels on Camkes
Abstract: Rump kernels are a cut-down variant of the BSD operating system design to provide OS services in other contexts. We have a port of Rump kernels to native seL4. This project aims to build on existing work to run Rump kernels in the CAmkES environment on seL4. CAmkES is an embedded component framework for seL4 which would benefit from the increased functionality Rump kernels would provide.
Novelty: Be the first to examine the feasibility of using Rump kernel with a component framework on the world's only verified microkernel.
Outcome: A simple Rump component running in the CAmkES environment.
Using untrusted code in trusted environments.
Abstract: Modern computer software systems are large and complex systems. The effort required to develop such a system can be reduced by using existing opensource code bases to avoid the cost of reimplementation. However, including large opensource code bases in high security environments provides a large attack surface that is difficult to both evaluate and ultimately assure is secure.
This topic's goal is to take some baby steps towards being able to utilise open source libraries while minimising exposure to bugs within the libraries using protection boundaries provided by seL4.
Novelty: The is a open research area the student can actively contribute to initial understanding of the area.
Outcome: An initial understanding of what might be achieved, and a simple proof of concept.
ROS native on seL4
Abstract: ROS (robotics operating system) is a communication middleware that is widely used for programming robots. It typically runs on a fully-fledged OS, such as Linux, using sockets for communication. This makes it readily accessible, but from the security and safety point of view is a nightmare. The purpose of this project is to produce a native ROS on the seL4 microkernel, depending on a minimal trusted computing base. It involved an assessment of the OS services required by ROS, and design, implementation and evaluation of ROS/seL4.
Novelty: A minimal ROS can enable a security and safety analysis of robotics software, dramatically increasing the trustworthiness of the robots, and opening the way for deployment in critical systems.
Outcome: An seL4-based ROS implementation that can support the high-assurance autonomous trucks developed under the DARPA HACMS program. Performance evaluation against a Linux-based implementation.
Build the world's first secure network stack (+USyd)
Abstract: Build the world's first secure network stack by writing it in our special language. At Data61, we developed a new language called Cogent for writing verified by construction software. The Cogent compiler generates C code, a formal specification describing what this code does, and a mathematical machine checked proof that the generated C code corresponds to the generated formal specification. We implemented verified by construction file systems by writing them in this language. The goal of this project is to test the feasibility of implementing a simplified network stack in Cogent. This may contribute to enriching Cogent and you would be encouraged to ask for feature requests to make implementing in this language easier.
Novelty: This will be the first use of the Cogent language in the domain of network stacks.
Outcome: The outcome of this work is testing whether the Cogent language has sufficient expressivity to be used to implement network stacks. This work will be used to extend the Cogent language.
CAmkES on Linux (+USyd)
Abstract: The Trustworthy Systems Research group at Data61 has developed a component platform (CAmkES) for developing microkernel-based systems on seL4 (our formally verified microkernel). While CAmkES helps to ease the difficulty of developing systems on seL4, developing significantly complex systems is still hard, due to the need to develop almost everything (e.g. device drivers, network stacks, display systems, etc.) from scratch. The goal of this project is to develop a version of CAmkES for Linux. This will provide a much richer environment on which to prototype and test CAmkES systems, before porting them to run on seL4.
Novelty: This work will aid in the development of secure systems running on seL4.
Outcome: A version of CAmkES targeted to Linux, and sample systems to test it.
Prerequisites: Students must have a strong background in Operating Systems, have at least completed an Operating Systems course with excellent marks, and have experience with Linux systems programming in C.
Java Script on sel4
This project involves porting the v8 (or similar) node.JS code to run natively on seL4 on ARM, and porting enough of the nodebot libraries to control some simple things.
Operating System Components
Abstract: Towards developing a full OS on the seL4 microkernel. In DATA61's Trustworthy Systems team we've developed technology (CAmkES) for building componentised operating systems. Now we want to build up a repository of reusable operating systems components, so that we can easily build new, novel, and customised operating systems. The goal of this project is to develop new operating systems components, and develop some systems using these components as a means to test them.
NoveltyThis work will contribute to a platform for developing secure and safe operating system software.
Outcome:A collection of reusable operating system components.
seL4-based Distributed Systems
Abstract: In Data61's Trustworthy Systems research group we are developing and verifying seL4-based software systems. These systems are limited to run on a single computer, however, real-world systems are largely distributed systems, consisting of multiple networked computers. The time-triggered architecture (TTA) (developed by Hermann Kopetz) provides a computing infrastructure for the design and implementation of dependable distributed embedded systems. Most importantly key algorithms of TTA have been formally verified. The goal of this project is to investigate whether the combination of seL4 and TTA can be used to develop verified seL4-based distributed systems.
Novelty: This will be a first attempt at developing verified seL4-based distributed systems.
Outcome: A prototype seL4-based distributed system combining seL4 and TTA. The results of this project can form the basis of several other research projects further exploring this combination.
Graphical Editor for Building Componentised Operating Systems (+USyd)
Abstract: Data61's CAmkES (a component-based platform for developing microkernel-based systems on seL4) uses an Architecture Description Language (ADL) to describe the software architecture of an operating system. While the ADL helps to ease the difficulty of designing and building such a system, ADL documents quickly become too complicated to read and manipulate (in a text format) when the operating system becomes non-trivial. The goal of this project is to develop a graphical editor to design such componentised operating systems: allowing users to draw new components and connections and manipulate existing ones, then generate the code that represents their drawn system.
Novelty: This work will contribute to the CAmkES platform and our overall project for developing trustworthy software systems.
Outcome: A graphical editor for designing and developing component-based operating systems.
Fuzz testing a new language and compiler (+USyd)
Abstract: When developing software, the compiler that translates source code to machine instructions is generally assumed to be correct. Incorrect software is more likely to be a product of bugs in the source code than bugs in the translation. However, when high assurance techniques like formal verification are applied to the source code, this relationship is reversed. The compiler becomes the weaker link. Subtle compiler bugs may lurk undiscovered, and may only be triggered when compiling particular code.
Novelty:This will be the first use of the Cogent language outside the domain of file systems. There will also be the opportunity to explore new techniques for program generation as part of this project.
Outcome: The result of this work will be a reusable tool for generating valid Cogent programs. It is expected that this will in turn lead to a more robust Cogent compiler.
Secure Systems: Can You Hack an Unhackable System?
Abstract: We claim that we can build truly secure software systems, but are we right? In the past years DATA61's Trustworthy Systems team has done much work developing and verifying seL4, a secure, formally verified microkernel. Now we are using seL4 as the basis for developing truly secure software systems. But we want to be sure that these software systems provide the security we desire. We can gain assurance of security in many ways: by testing, attacking (hacking), analysing, and verifying the systems. The goal of this project is to take (or build) example systems and then show (by any of the above means) that they are secure (or not).
Novelty: This will be one of the first practical evaluations of the security of software built on seL4.
Outcome: Insight into how to make seL4-based systems that are secure in theory also really secure in practice.
seL4-based Distributed Systems
Description of this topic is in the Middleware section.
Linear type inference in Cogent language
Abstract: Cogent is a language we designed and developed for writing provably correct file systems. From Cogent compiler, given a type-correct source program, it spits out an efficient C implementation, a high-level language specification in Isabelle/HOL, a series of intermediate language descriptions and proofs to show that the C code is correct with respect to the high-level spec. Cogent, a purely functional language, features linear type system, which is the key which enables this approach. On the downside, however, our linear type system requires a lot of user annotations in the source programs, which makes them quite verbose. The goal of this project is to research on how linear types, together with effectful computations like allocation and free of memory, can be inferred by the compiler, probably with the aids of model checking.
Novelty: Although there is some preliminary work in this area, it remains an open question. It will be of a good theoretical and practical contribution to the entire language community.
Outcome: The result of this work will be a formal type inference system (possibly with compromises) and its prototypical implementation in Cogent compiler.