Shujie Han is a tenure-track Associate Professor in the School of Computer Science at Northwestern Polytechnical University (NWPU), Xi’an, Shaanxi Province. She was a Boya postdoctoral fellow at Peking University from 2021 to 2024, advised by Prof. Qun Huang. Before that, she received her Ph.D. degree in July 2021 from the Department of Computer Science and Engineering at the Chinese University of Hong Kong under the supervision of Prof. Patrick P. C. Lee, as a member of the Applied Distributed System Lab (ADSLab). She received her B.Eng. degree in Information Security from NWPU in 2017.

My research interests lie at the intersection of systems and artificial intelligence (AI), including AI for systems and systems for AI.

  • Applying AI for system dependability: failure prediction for hard disk drives (HDDs), solid-state drives (SSDs), and memory errors.
  • System optimizations for AI applications: scaling disk failure prediction via multi-source stream mining.

We are looking for self-motivated students who are interested in the intersection of systems and AI to join our research projects. We also welcome senior undergraduate students (e.g., year-3 and year-4) who wish to pursue a master's degree in our group. Please feel free to contact me if you are interested in our research.

🔥 News

  • 2025.01:  🎉🎉 One paper accepted to ICSE’25.
  • 2024.10:  🎉🎉 One paper accepted to HPCC’24.
  • 2024.09:  🎉🎉 One paper accepted to ICDM’24.

📝 Publications

Conferences

  1. Tao Duan, Runqing Chen, Pinghui Wang, Junzhou Zhao, Jiongzhou Liu, Shujie Han, Yi Liu, and Fan Xu.
    “BSODiag: A Global Diagnosis Framework for Batch Servers Outage in Large-scale Cloud Infrastructure Systems.”
    Proceedings of the 47th IEEE/ACM International Conference on Software Engineering (ICSE), Ottawa, Ontario, Canada, April 2025.
    [pdf]

  2. Cheng Li, Jiahe Wei, Huiru Xie, Jinjiang Wang, Xiaonan Zhao, Shujie Han, and Xiao Zhang.
    “TraceGen: A Block-Level Storage System Performance Evaluation Tool for Analyzing and Generating I/O Traces.”
    Proceedings of the 26th IEEE International Conference on High Performance Computing and Communications (HPCC) (Short paper), Wuhan, China, December 2024.
    [pdf]

  3. Shujie Han, Zirui Ou, Qun Huang, and Patrick P. C. Lee.
    “Scaling Disk Failure Prediction via Multi-Source Stream Mining.”
    Proceedings of the IEEE International Conference on Data Mining (ICDM) (Regular paper), Abu Dhabi, UAE, December 2024.
    (AR: 66/604 = 10.9%)
    [pdf] [software]

  4. Zirui Ou, Shujie Han, Qihuan Zeng, and Qun Huang.
    “FedSSA: Reducing Overhead of Additive Cryptographic Methods in Federated Learning with Sketch.”
    Proceedings of the 32nd IEEE International Conference on Network Protocols (ICNP), Charleroi, Belgium, October 2024.
    (AR: 50/205 = 24.4%)
    [pdf]

  5. Xiao Zhang, Huiru Xie, Zhe Wang, Shujie Han, Leijie Zeng, and Wendi Cheng.
    “FastStore: Optimization of Distributed Block Storage Services for Cloud Computing.”
    Proceedings of the 38th International Conference on Massive Storage Systems and Technology (MSST), Santa Clara, CA, USA, June 2024.
    (AR: 27/66 = 40.9%)
    [pdf]

  6. Jinhong Li, Yanjing Ren, Shujie Han, and Patrick P. C. Lee.
    “Enhancing LSM-tree Key-Value Stores for Read-Modify-Writes via Key-Delta Separation.”
    Proceedings of the 40th IEEE International Conference on Data Engineering (ICDE 2024), Utrecht, Netherlands, May 2024.
    [pdf]

  7. Qingxiu Liu, Qun Huang, Xiang Chen, Sa Wang, Wenhao Wang, Shujie Han, and Patrick P. C. Lee.
    “PP-Stream: Toward High-Performance Privacy-Preserving Neural Network Inference via Distributed Stream Processing.”
    Proceedings of the 40th IEEE International Conference on Data Engineering (ICDE 2024), Utrecht, Netherlands, May 2024.
    [pdf]

  8. Zhinan Cheng, Shujie Han (corresponding), Patrick P. C. Lee, Xin Li, Jiongzhou Liu, and Zhan Li.
    “An In-Depth Correlative Study Between DRAM Errors and Server Failures in Production Data Centers.”
    Proceedings of the 41st International Symposium on Reliable Distributed Systems (SRDS 2022), Vienna, Austria, September 2022.
    (AR: 24/105 = 22.9%)
    [pdf]

  9. Fan Xu, Shujie Han (corresponding), Patrick P. C. Lee, Yi Liu, Cheng He, and Jiongzhou Liu.
    “General Feature Selection for Failure Prediction in Large-scale SSD Deployment.”
    Proceedings of the 51st IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2021), June 2021.
    (AR: 48/295 = 16.3%)
    [pdf]

  10. Shujie Han, Patrick P. C. Lee, Fan Xu, Yi Liu, Cheng He, and Jiongzhou Liu.
    “An In-Depth Study of Correlated Failures in Production SSD-Based Data Centers.”
    Proceedings of the 19th USENIX Conference on File and Storage Technologies (FAST 2021), February 2021.
    (AR: 28/130 = 21.5%)
    [pdf] [software] [corrections]

  11. Shujie Han, Patrick P. C. Lee, Zhirong Shen, Cheng He, Yi Liu, and Tao Huang.
    “Toward Adaptive Disk Failure Prediction via Stream Mining.”
    Proceedings of the 40th IEEE International Conference on Distributed Computing Systems (ICDCS 2020), Singapore, November 2020.
    (AR: 105/584 = 18.5%)
    (An extended version appeared in TC 2023)
    [pdf] [software]

  12. Mi Zhang, Shujie Han, and Patrick P. C. Lee.
    “A Simulation Analysis of Reliability in Erasure-Coded Data Centers.”
    Proceedings of the 36th IEEE International Symposium on Reliable Distributed Systems (SRDS 2017), Hong Kong, September 2017.
    (AR: 24/72 = 33.3%)
    [pdf] [software]

Journals

  1. Shujie Han, Patrick P. C. Lee, Zhirong Shen, Cheng He, Yi Liu, and Tao Huang.
    “StreamDFP: A General Stream Mining Framework for Adaptive Disk Failure Prediction.”
    IEEE Transactions on Computers (TC), 72(2), pp. 520-534, February 2023.
    (An earlier version appeared in ICDCS 2020)
    [main pdf] [supplementary pdf] [software]

  2. Mi Zhang, Shujie Han, and Patrick P. C. Lee.
    “SimEDC: A Simulator for the Reliability Analysis of Erasure-Coded Data Centers.”
    IEEE Transactions on Parallel and Distributed Systems (TPDS), 30(12), pp. 2836-2848, December 2019.
    (An earlier version appeared in SRDS 2017)
    [main pdf] [supplementary pdf] [software]

  3. Min Fu, Shujie Han, Patrick P. C. Lee, Dan Feng, Zuoning Chen, and Yu Xiao.
    “A Simulation Analysis of Redundancy and Reliability in Primary Storage Deduplication.”
    IEEE Transactions on Computers (TC), 67(9), pp. 1259-1272, September 2018.
    (An earlier version appeared in IISWC 2016)
    [main pdf] [software]

Books

  1. Cheng He, Mengling Feng, Patrick P. C. Lee, Pinghui Wang, Shujie Han, and Yi Liu (Eds.).
    “Large-Scale Disk Failure Prediction.”
    Springer, June 2020 (ISBN: 978-981-15-7749-9).
    [doi]

Preprints

  1. Shujie Han, Jun Wu, Erci Xu, Cheng He, Patrick P. C. Lee, Yi Qiang, Qixing Zheng, Tao Huang, Zixi Huang, and Rui Li.
    “Robust Data Preprocessing for Machine-Learning-Based Disk Failure Prediction in Cloud Production Environments.”
    arXiv:1912.09722, December 2019.
    [arXiv]

📖 Teaching

  1. Information Storage and Management, Spring, 2024/2025.
  2. Database Systems Lab, Fall, 2024.

💬 Activities

  1. PC member for ICA3PP’24 and ICA3PP’23.
  2. Journal reviewer for TON, TOS, and TCAD.

🎖 Awards

  1. Alibaba Group Outstanding Science Research Intern for the project “Research on online failure prediction for HDDs and field studies for SSDs in large-scale data centers” in 2022.