Haris Smajlovic
- MSc (University of Sarajevo, 2017)
- BSc (University of Sarajevo, 2015)
Topic
Secure Computational Genomics
Department of Computer Science
Date & location
- Wednesday, November 27, 2024
- 10:00 A.M.
- Engineering & Computer Science Building
- Room 467
Reviewers
Supervisory Committee
- Dr. Ibrahim Numanagić, Department of Computer Science, University of Victoria (Supervisor)
- Dr. Sean Chester, Department of Computer Science, UVic (Member)
- Dr. Riham AlTawy, Department of Electrical and Computer Engineering, UVic (Outside Member)
External Examiner
- Dr. Yun William Yu, School of Computer Science, Carnegie Mellon University
Chair of Oral Examination
- Dr. Jay Cullen, School of Earth and Ocean Sciences, UVic
Abstract
Biomedical data, scattered across biobanks and healthcare providers in multiple countries, is used extensively for research. Collaboration and data sharing between institutions often provide access to more diverse datasets and a chance to conduct more comprehensive studies. However, these collaboration efforts are usually hindered by privacy concerns that make it impossible to pool such data in a centralized database. To enable collaborative studies on such datasets, we present an easy-to-use programming framework with two domain-specific languages, Sequre and Shechi, for secure high-performance computing on private, distributed datasets. Our framework automatically converts Pythonic code into a secure distributed equivalent, using secure multiparty computation (SMC) in Sequre and, for the first time, multiparty homomorphic encryption (MHE) in Shechi to enable efficient distributed computation. It hides the private and distributed nature of the input data from end users through a familiar Pythonic syntax, new data types for the efficient handling of distributed data, and systematic compiler optimizations for cryptographic and distributed computation. We evaluate our framework on a wide range of applications, including complex genomic analysis tasks and statistical analysis of private electronic health records (EHRs). Our results demonstrate Sequre’s and Shechi’s ability to uncover optimizations missed even by expert developers, achieving up to 15× runtime improvements over the prior state-of-the-art solutions and a 40-fold improvement in code expressiveness compared to code manually optimized by experts.
Finally, our solution allows distributed datasets to be used as a whole in collective studies among non-trusting private data owners and, as a result, facilitates data sharing and collaboration in privacy-sensitive fields such as biomedicine.