M. Mustafa Rafique

Associate Professor

Department of Computer Science

Rochester Institute of Technology

High Performance Distributed Systems Laboratory (HPDSL)

Biography

M. Mustafa Rafique is an Associate Professor in the Department of Computer Science at Rochester Institute of Technology (RIT), where he leads the High Performance Distributed Systems Laboratory (HPDSL). He has more than fifteen years of professional and research experience developing practical solutions for large-scale enterprise applications and creating solutions for massively parallel, distributed, and high-performance computing systems across a variety of application domains. Before joining RIT, he was a staff member in the High Performance Systems Group at IBM Research in Dublin, Ireland. He has also worked at NEC Labs (Princeton) and the Qatar Computing Research Institute (QCRI) on adaptive and efficient resource management for such systems. He is a Senior Member of the IEEE.

Recent Publications

  • Avinash Maurya, Jie Ye, M. Mustafa Rafique, Franck Cappello, and Bogdan Nicolae. Deep Optimizer States: Towards Scalable Training of Transformer Models using Interleaved Offloading. In Proceedings of the 25th ACM/IFIP International Middleware Conference (MIDDLEWARE), Hong Kong, China, December 2024.
  • Nigel Tan, Kevin Assogba, Jay Ashworth, Befikir Bogale, M. Mustafa Rafique, Franck Cappello, Bogdan Nicolae, and Michela Taufer. Towards Affordable Reproducibility Using Scalable Capture and Comparison of Intermediate Multi-Run Results. In Proceedings of the 25th ACM/IFIP International Middleware Conference (MIDDLEWARE), Hong Kong, China, December 2024.
  • Avinash Maurya, Robert Underwood, M. Mustafa Rafique, Franck Cappello, and Bogdan Nicolae. DataStates-LLM: Lazy Asynchronous Checkpointing for Large Language Models. In Proceedings of the 33rd ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC), Pisa, Italy, June 2024. Best Paper Award
  • Moiz Arif, Avinash Maurya, M. Mustafa Rafique, Dimitrios S. Nikolopoulos, and Ali R. Butt. Application-Attuned Memory Management for Containerized HPC Workflows. In Proceedings of the 38th IEEE International Parallel & Distributed Processing Symposium (IPDPS), San Francisco, California, USA, May 2024.
  • Kevin Assogba, Bogdan Nicolae, and M. Mustafa Rafique. Optimizing the Training of Co-Located Deep Learning Models Using Cache-Aware Staggering. In Proceedings of the 30th IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC), Goa, India, December 2023. Best Paper Nominee
  • Kevin Assogba, Eduardo Lima, M. Mustafa Rafique, and Minseok Kwon. PredictDDL: Reusable Workload Performance Prediction for Distributed Deep Learning. In Proceedings of the 25th IEEE International Conference on Cluster Computing (Cluster), Santa Fe, New Mexico, USA, October 2023.
  • Avinash Maurya, Bogdan Nicolae, M. Mustafa Rafique, Thierry Tonellot, Franck Cappello, and Hussain J. AlSalem. GPU-Enabled Asynchronous Multi-level Checkpoint Caching and Prefetching. In Proceedings of the 32nd ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC), Orlando, Florida, USA, June 2023.
  • Avinash Maurya, Bogdan Nicolae, M. Mustafa Rafique, Amr M. Elsayed, Thierry Tonellot, and Franck Cappello. Towards Efficient Cache Allocation for High-Frequency Checkpointing. In Proceedings of the 29th IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC), Bangalore, India, December 2022. Best Paper Award

The complete list of publications is available on DBLP, Google Scholar, and the HPDSL homepage.

Research Interests

  • Distributed and parallel systems
  • High-Performance Computing (HPC)
  • System architecture and resource disaggregation
  • Memory subsystem management
  • Data analytics frameworks and middleware
  • Resource management in heterogeneous and many-core clusters
  • Cloud computing
  • Internet of Things (IoT)
  • Edge computing

Teaching

  • CSCI-759 Topics in Systems: Advanced Cloud Computing (Spring 2022, Fall 2020, Fall 2019)
  • CSCI-652 Distributed Systems (Spring 2022, Spring 2019)
  • CSCI-251 Concepts of Parallel and Distributed Systems (Fall 2020, Fall 2019, Fall 2018)

Contact Information

20 Lomb Memorial Drive, 70-3635
Rochester, NY 14623

mrafique AT cs DOT rit DOT edu

+1 585 475 4528

Mon & Wed
12:30pm – 2:30pm