M. Mustafa Rafique

Associate Professor

Department of Computer Science

Rochester Institute of Technology

Link to High Performance Distributed Systems Laboratory (HPDSL)

Biography

M. Mustafa Rafique is an Associate Professor in the Department of Computer Science at Rochester Institute of Technology (RIT). Mustafa has more than fifteen years of professional and research experience developing practical solutions for large-scale enterprise applications and for massively parallel, distributed, and high-performance computing systems across a variety of application domains. Before joining RIT, Mustafa was a staff member in the High Performance Systems Group at IBM Research in Dublin (Ireland). He has also worked at NEC Labs (Princeton) and the Qatar Computing Research Institute (QCRI) on adaptive and efficient resource management for such systems. He is a Senior Member of the IEEE. At RIT, he leads the High Performance Distributed Systems Laboratory (HPDSL).

Recent Publications

  • Moiz Arif, Avinash Maurya, M. Mustafa Rafique, Dimitrios S. Nikolopoulos, and Ali R. Butt. Application-Attuned Memory Management for Containerized HPC Workflows. In Proceedings of the 38th IEEE International Parallel & Distributed Processing Symposium (IPDPS), San Francisco, California, USA, May 2024.
  • Kevin Assogba, Bogdan Nicolae, and M. Mustafa Rafique. Optimizing the Training of Co-Located Deep Learning Models Using Cache-Aware Staggering. In Proceedings of the 30th IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC), Goa, India, December 2023. (Best Paper Nominee)
  • Kevin Assogba, Eduardo Lima, M. Mustafa Rafique, and Minseok Kwon. PredictDDL: Reusable Workload Performance Prediction for Distributed Deep Learning. In Proceedings of the 25th IEEE International Conference on Cluster Computing (Cluster), Santa Fe, New Mexico, USA, October 2023.
  • Avinash Maurya, Bogdan Nicolae, M. Mustafa Rafique, Thierry Tonellot, Franck Cappello, and Hussain J. AlSalem. GPU-Enabled Asynchronous Multi-level Checkpoint Caching and Prefetching. In Proceedings of the 32nd ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC), Orlando, Florida, USA, June 2023.
  • Avinash Maurya, Bogdan Nicolae, M. Mustafa Rafique, Amr M. Elsayed, Thierry Tonellot, and Franck Cappello. Towards Efficient Cache Allocation for High-Frequency Checkpointing. In Proceedings of the 29th IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC), Bangalore, India, December 2022. (Best Paper Award)
  • Moiz Arif, Kevin Assogba, and M. Mustafa Rafique. Canary: Fault-tolerant FaaS for Stateful Time-sensitive Applications. In Proceedings of the 35th ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC), Dallas, TX, USA, November 2022.
  • Moiz Arif, Kevin Assogba, M. Mustafa Rafique, and Sudharshan S. Vazhkudai. Exploiting CXL-based Memory for Distributed Deep Learning. In Proceedings of the 51st International Conference on Parallel Processing (ICPP), Bordeaux, France, August 2022.
  • Kevin Assogba, Moiz Arif, M. Mustafa Rafique, and Dimitrios S. Nikolopoulos. On Realizing Efficient Deep Learning Using Serverless Computing. In Proceedings of the 22nd IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGrid), Taormina (Messina), Italy, May 2022.

The complete list of publications is available on DBLP, Google Scholar, and the HPDSL homepage.

Research Interests

  • Distributed and parallel systems
  • High-Performance Computing (HPC)
  • System architecture and resource disaggregation
  • Memory subsystem management
  • Data analytics frameworks and middleware
  • Resource management in heterogeneous and many-core clusters
  • Cloud computing
  • Internet of things (IoT)
  • Edge computing

Teaching

  • CSCI-759 Topics in Systems: Advanced Cloud Computing (Spring 2022, Fall 2020, Fall 2019)
  • CSCI-652 Distributed Systems (Spring 2022, Spring 2019)
  • CSCI-251 Concepts of Parallel and Distributed Systems (Fall 2020, Fall 2019, Fall 2018)

Contact Information

20 Lomb Memorial Drive, 70-3635
Rochester, NY 14623

mrafique AT cs DOT rit DOT edu

+1 585 475 4528

Mon & Wed
12:30pm – 2:30pm