Research
Rajeev Balasubramonian
My research focuses on many aspects of computer architecture. I am
especially interested in studying how future technology trends
influence the design of microprocessors and memory systems. In recent
years, we have focused on designing memory systems that can cater to
the bandwidth, latency, power, cost, security, and reliability demands of
data-intensive workloads. We are also exploring accelerators for a range
of demanding kernels, including deep neural networks, genomic analysis, and
security primitives.
Current projects include:
- Accelerators: designing accelerators for machine learning applications, genomic analysis, and security primitives.
- Memory Systems: optimizing DRAM/NVM chips and security for data-intensive workloads.
Past projects include:
- Reliability: efficient mechanisms to support chipkill reliability in the memory system.
- Large cache design: methodologies to model large NUCA cache hierarchies, policies to organize data, and organizations in 3D.
- Interconnect design: leveraging different wires in a heterogeneous network for architectural innovation.
- Transactional memory: innovations for scalable hardware transactional memory.
As a graduate student, I delved into problems involving memory hierarchy bottlenecks (MICRO'00), pre-execution threads (ISCA'01), register file complexity (MICRO'01), and scalability of clustered microprocessors (ISCA'03).
Ph.D. Thesis.
Students
Graduated:
- Naveen Muralimanohar, Ph.D. September 2008, Wire-Aware Cache Architectures, First employment: HP Labs, Current employment: Google.
- Niti Madan, Ph.D. January 2009, Leveraging Mixed-process 3D Die Stacking Technology for Cache Hierarchies and Reliability, First employment: Computing Innovation Fellow at IBM T.J. Watson, Current employment: AMD Research.
- Manu Awasthi, Ph.D. September 2011, Managing Data Locality in Future Memory Hierarchies Using a Hardware Software Co-Design Approach, First employment: Micron, Current employment: Ashoka University.
- Aniruddha Udipi, Ph.D. March 2012, Designing Efficient Memory for Future Computing Systems, First employment: ARM, Current employment: Google.
- Kshitij Sudan, Ph.D. October 2012, Data Placement for Efficient Main Memory Access, First employment: Samsung, Current employment: Meta.
- Niladrish Chatterjee, Ph.D. September 2013, Designing Efficient Memory Schedulers for Future Systems, First employment: NVIDIA.
- Seth Pugsley, Ph.D. May 2014, Opportunities for Near Data Computing in MapReduce Workloads, First employment: Intel.
- Manju Shevgoor, Ph.D. October 2015, Enabling Big Memory with Emerging Technologies, First employment: Intel, Current employment: Apple.
- Ali Shafiee, Ph.D. August 2017, Hardware Accelerators for Deep Learning, First employment: Samsung, Current employment: Meta.
- Karl Taht, Ph.D. April 2020, Introspective Computing, First employment: Meta.
- Anirban Nag, Ph.D. October 2020, Enabling Near Data Processing for Emerging Workloads, First employment: Uppsala University, Current employment: Huawei.
- Meysam Taassori, Ph.D. October 2020, Low Overhead Secure Systems, First employment: AMD Research, Current employment: NVIDIA.
- Sumanth Gudaparthi, Ph.D. April 2022, Tensor Acceleration for Non-conventional applications using Versatile Integrants, First employment: AMD Research.
- Surya Narayanan, Ph.D. May 2022, Exploring Avenues to Efficiently Trained Deep Neural Networks, First employment: Imagination Technologies.
- Vivek Venkatesan, M.S. December 2007, Criticality of On-Chip Wires, First employment: Oracle, Current employment: Meta.
- Byong Wu Chong, M.S. December 2012, Transactional Memory, First employment: Broadcom, Current employment: Amazon.
- Gita Sreekumar, M.S. December 2014, First employment: Qualtrics.
- Sahil Koladiya, M.S. May 2015, First employment: Cisco, Current employment: Amazon.
- Akhila Gundu, M.S. May 2015, First employment: Micron.
- Arjun Deb, M.S. May 2016, First employment: Xilinx, Current employment: Apple.
- Chandru Nagarajan, M.S. May 2017, First employment: Micron, Current employment: Apple.
- Shirley Hon, M.S. December 2018.
- Sharad Bhat, M.S. December 2023.
Current:
- Ananth Krishna Prasad, Ph.D. student, Acceleration with Resistive Memories
- Sarabjeet Singh, Ph.D. student, Acceleration of Security Primitives
- Lin Jia, Ph.D. student, Machine Learning Systems
- Shreyas Singh, Ph.D. student, Near Data Processing
- Jarrett Minton, Ph.D. student, Memory Security
Software Release
HEPack Simulator
CACTI 7
USIMM
CACTI 6.0
Book and Book Chapters
- Innovations in the Memory System, Rajeev Balasubramonian, Synthesis Lectures on Computer Architecture, Morgan and Claypool Publishers, 2019.
- Multi-Core Cache Hierarchies, Rajeev Balasubramonian, Norman P. Jouppi, Naveen Muralimanohar, Synthesis Lectures on Computer Architecture, Morgan and Claypool Publishers, 2011.
- Buses and Crossbars, Rajeev Balasubramonian, Timothy Pinkston, Encyclopedia of Parallel Computing, D. Padua, editor. Springer Science+Business Media, 2011.
Accelerators
Novel accelerators can push the limits of performance and energy efficiency
in a number of important application domains. These domains include
machine learning, genomic analysis, and emerging security primitives for
post-quantum crypto and fully homomorphic encryption.
- Hyena: Balancing Packing, Reuse, and Rotations for Encrypted Inference, Sarabjeet Singh, Shreyas Singh, Sumanth Gudaparthi, Xiong Fan, Rajeev Balasubramonian, 45th IEEE Symposium on Security and Privacy (S&P), San Francisco, May 2024.
- XCRYPT: Accelerating Lattice-Based Cryptography With Memristor Crossbar Arrays, S. Singh, X. Fan, A.K. Prasad, L. Jia, A. Nag, R. Balasubramonian, M. Bojnordi, E. Shi, IEEE Micro, Vol 43(5), Sept 2023. [Expanded arxiv version]
- CANDLES: Channel-Aware Novel Dataflow-Microarchitecture Co-Design for Low Energy Sparse Neural Network Acceleration, Sumanth Gudaparthi, Sarabjeet Singh, Surya Narayanan, Rajeev Balasubramonian, Visvesh Sathe, 28th International Symposium on High-Performance Computer Architecture (HPCA-28), Seoul, South Korea, April 2022.
- A Multiply-And-Accumulate Array for Machine Learning Applications Based on a 3D Nanofabric Flow, E. Giacomin, S. Gudaparthi, J. Boemmels, R. Balasubramonian, F. Catthoor, P.-E. Gaillardon, IEEE Transactions on Nanotechnology, December 2021.
- SpinalFlow: An Architecture and Dataflow Tailored for Spiking Neural Networks, Surya Narayanan, Karl Taht, Rajeev Balasubramonian, Edouard Giacomin, Pierre-Emmanuel Gaillardon, 47th International Symposium on Computer Architecture (ISCA-47), June 2020.
- Wire-Aware Architecture and Dataflow for CNN Accelerators, Sumanth Gudaparthi, Surya Narayanan, Rajeev Balasubramonian, Edouard Giacomin, Hari Kambalasubramanyam, Pierre-Emmanuel Gaillardon, 52nd International Symposium on Microarchitecture (MICRO-52), Columbus OH, October 2019. Lightning talk video
- GenCache: Leveraging In-Cache Operators for Efficient Sequence Alignment, Anirban Nag, C.N. Ramachandra, Rajeev Balasubramonian, Ryan Stutsman, Edouard Giacomin, Hari Kambalasubramanyam, Pierre-Emmanuel Gaillardon, 52nd International Symposium on Microarchitecture (MICRO-52), Columbus OH, October 2019. Lightning talk video
- Newton: Gravitating Towards the Physical Limits of Crossbar Acceleration, Anirban Nag, Rajeev Balasubramonian, Vivek Srikumar, Ross Walker, Ali Shafiee, John Paul Strachan, Naveen Muralimanohar, IEEE Micro Special Issue on Memristor-Based Computing, September/October 2018. [Expanded arxiv version]
- A Case for Dynamic Activation Quantization in CNNs, Karl Taht, Surya Narayanan, Rajeev Balasubramonian, 1st Workshop on Energy Efficient Machine Learning and Cognitive Computing for Embedded Applications (EMC2), held in conjunction with ASPLOS, Williamsburg, March 2018.
- Moving CNN Accelerator Computations Closer to Data, Sumanth Gudaparthi, Surya Narayanan, Rajeev Balasubramonian, 1st Workshop on Energy Efficient Machine Learning and Cognitive Computing for Embedded Applications (EMC2), held in conjunction with ASPLOS, Williamsburg, March 2018.
- Efficient ADC Utilization in Crossbar Acceleration, Anirban Nag, Ali Shafiee, Rajeev Balasubramonian, Vivek Srikumar, Naveen Muralimanohar, 9th Non-Volatile Memories Workshop (NVMW), San Diego, March 2018.
- Memory: The Dominant Bottleneck in Genomic Workloads, Meysam Taassori, Anirban Nag, Keeton Hodgson, Ali Shafiee, Rajeev Balasubramonian, Workshop on Accelerator Architecture in Computational Biology and Bioinformatics, held in conjunction with HPCA-24, Vienna, February 2018.
- INXS: Bridging the Throughput and Energy Gap for Spiking Neural Networks, Surya Narayanan, Ali Shafiee, Rajeev Balasubramonian, 30th International Joint Conference on Neural Networks (IJCNN-30), Anchorage, May 2017.
- Deep Network Acceleration with Memristor Crossbars, Ali Shafiee, Anirban Nag, Naveen Muralimanohar, Rajeev Balasubramonian, John Paul Strachan, Miao Hu, R. Stanley Williams, Vivek Srikumar, 8th Non-Volatile Memories Workshop (NVMW), San Diego, March 2017.
- ISAAC: A Convolutional Neural Network Accelerator with In-Situ Analog Arithmetic in Crossbars, Ali Shafiee, Anirban Nag, Naveen Muralimanohar, Rajeev Balasubramonian, John Paul Strachan, Miao Hu, R. Stanley Williams, Vivek Srikumar, 43rd International Symposium on Computer Architecture (ISCA-43), Seoul, June 2016. Videos: Part I and Part II. Top Picks Honorable Mention
Memory Systems
Modern multi-core processors and accelerators are placing severe pressure
on main memory systems. Several opportunities for innovation exist:
low-overhead security, lower energy per DRAM access, memory compression,
near-data processing, non-volatile memory systems, and 3D-stacked devices.
- PATHFINDER: Practical Real-Time Learning for Data Prefetching, Lin Jia, James McMahon, Sumanth Gudaparthi, Shreyas Singh, Rajeev Balasubramonian, 29th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-29), San Diego, April 2024.
- Interconnects for DNA, Quantum, In-Memory, and Optical Computing: Insights From a Panel Discussion, A. Ganguly, S. Abadal, I. Thakkar, N.E. Jerger, M. Riedel, M. Babaie, R. Balasubramonian, A. Sebastian, S. Pasricha, B. Taskin, IEEE Micro, Vol 42(3), February 2022.
- OrderLight: Lightweight Memory-Ordering Primitive for Efficient Fine-Grained PIM Computations, Anirban Nag, Rajeev Balasubramonian, 54th International Symposium on Microarchitecture (MICRO-54), October 2021.
- Dvé: Improving DRAM Reliability and Performance On-Demand via Coherent Replication, Adarsh Patil, Vijay Nagarajan, Rajeev Balasubramonian, Nicolai Oswald, 48th International Symposium on Computer Architecture (ISCA-48), June 2021.
- Efficient Oblivious Query Processing for Range and kNN Queries, Z. Chang, D. Xie, F. Li, J.M. Phillips, R. Balasubramonian, IEEE Transactions on Knowledge and Data Engineering, February 2021.
- Compact Leakage-Free Support for Integrity and Reliability, Meysam Taassori, Rajeev Balasubramonian, Siddhartha Chhabra, Alaa R. Alameldeen, Manjula Peddireddy, Rajat Agarwal, Ryan Stutsman, 47th International Symposium on Computer Architecture (ISCA-47), June 2020.
- Innovations in the Memory System, Rajeev Balasubramonian, Synthesis Lectures on Computer Architecture, Morgan and Claypool Publishers, 2019.
- ρ: Relaxed Hierarchical ORAM, Chandrasekhar Nagarajan, Ali Shafiee, Rajeev Balasubramonian, Mohit Tiwari, 24th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-24), Providence, April 2019. Lightning talk video
- An MLP-Aware Leakage-Free Memory Controller, Andrew Vuong, Ali Shafiee, Meysam Taassori, Rajeev Balasubramonian, Workshop on Hardware and Architectural Support for Security and Privacy (HASP), held in conjunction with ISCA-45, Los Angeles, June 2018.
- VAULT: Reducing Paging Overheads in SGX with Efficient Integrity Verification Structures, Meysam Taassori, Ali Shafiee, Rajeev Balasubramonian, 23rd International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-23), Williamsburg, March 2018.
- Secure DIMM: Moving ORAM Primitives Closer to Memory, Ali Shafiee, Rajeev Balasubramonian, Mohit Tiwari, Feifei Li, 24th International Symposium on High-Performance Computer Architecture (HPCA-24), Vienna, Austria, February 2018. Lightning talk video
- CACTI 7: New Tools for Interconnect Exploration in Innovative Off-Chip Memories, Rajeev Balasubramonian, Andrew B. Kahng, Naveen Muralimanohar, Ali Shafiee, Vaishnav Srinivas, ACM TACO, 2017 (invited for presentation at HiPEAC 2018).
- Deep Network Acceleration with Memristor Crossbars, Ali Shafiee, Anirban Nag, Naveen Muralimanohar, Rajeev Balasubramonian, John Paul Strachan, Miao Hu, R. Stanley Williams, Vivek Srikumar, 8th Non-Volatile Memories Workshop (NVMW), San Diego, March 2017.
- Enabling Technologies for Memory Compression: Metadata, Mapping, and Prediction, Arjun Deb, Ali Shafiee, Rajeev Balasubramonian, Paolo Faraboschi, Naveen Muralimanohar, Robert Schreiber, 34th International Conference on Computer Design (ICCD-34), Phoenix, October 2016.
- Understanding and Alleviating Intra-Die and Intra-DIMM Parameter Variation in the Memory System, Meysam Taassori, Ali Shafiee, Rajeev Balasubramonian, 34th International Conference on Computer Design (ICCD-34), Phoenix, October 2016.
- Memory Interface, Aniruddha Udipi, Naveen Muralimanohar, Norm Jouppi, Rajeev Balasubramonian, Al Davis, US Patent No. 9,411,757, issued Aug 9 2016.
- Making the Case for Feature-Rich Memory Systems: The March Toward Specialized Systems, Rajeev Balasubramonian, IEEE Solid-State Circuits Magazine, Vol 8(2), June 2016.
- Addressing Service Interruptions in Memory with Thread-to-Rank Assignment, Manjunath Shevgoor, Rajeev Balasubramonian, Niladrish Chatterjee, Jung-Sik Kim, International Symposium on Performance Analysis of Systems and Software (ISPASS), Uppsala, Sweden, April 2016. Best Paper Award
- Efficiently Prefetching Complex Address Patterns, Manjunath Shevgoor, Sahil Koladiya, Rajeev Balasubramonian, Seth Pugsley, Chris Wilkerson, Zeshan Chishti, 48th International Symposium on Microarchitecture (MICRO-48), Hawaii, December 2015.
- Avoiding Information Leakage in the Memory Controller with Fixed Service Policies, Ali Shafiee, Akhila Gundu, Manjunath Shevgoor, Rajeev Balasubramonian, Mohit Tiwari, 48th International Symposium on Microarchitecture (MICRO-48), Hawaii, December 2015.
- Improving Memristor Memory with Sneak Current Sharing, Manjunath Shevgoor, Naveen Muralimanohar, Rajeev Balasubramonian, Yoocharn Jeon, 33rd International Conference on Computer Design (ICCD-33), New York, October 2015.
- Fixed-Function Hardware Sorting Accelerators for Near Data MapReduce Execution, Seth Pugsley, Arjun Deb, Rajeev Balasubramonian, Feifei Li, 33rd International Conference on Computer Design (ICCD-33) (short poster paper), New York, October 2015.
- Efficiently Prefetching Complex Address Patterns, Manjunath Shevgoor, Sahil Koladiya, Zeshan Chishti, Rajeev Balasubramonian, 2nd Data Prefetching Championship (DPC2), held in conjunction with ISCA-42, Portland, June 2015.
- Designing a Fast and Reliable Main Memory with Memristor Technology, Manjunath Shevgoor, Naveen Muralimanohar, Rajeev Balasubramonian, 6th Non-Volatile Memories Workshop (NVMW), San Diego, March 2015.
- Designing a High-Performance Main Memory by Overcoming the Challenges of Crossbar Resistive Memory Architectures, Cong Xu, Dimin Niu, Naveen Muralimanohar, Rajeev Balasubramonian, Tao Zhang, Shimeng Yu, Yuan Xie, 6th Non-Volatile Memories Workshop (NVMW), San Diego, March 2015.
- Overcoming the Challenges of Crossbar Resistive Memory Architectures, Cong Xu, Dimin Niu, Naveen Muralimanohar, Rajeev Balasubramonian, Tao Zhang, Shimeng Yu, Yuan Xie, 21st International Symposium on High-Performance Computer Architecture (HPCA-21), San Francisco, February 2015.
- A Case for Near Data Security, Akhila Gundu, Ali Shafiee, Manjunath Shevgoor, Rajeev Balasubramonian, 2nd Workshop on Near-Data Processing, held in conjunction with MICRO-47, Cambridge, UK, December 2014.
- Managing DRAM Latency Divergence in Irregular GPGPU Applications, Niladrish Chatterjee, Mike O'Connor, Gabriel H. Loh, Nuwan Jayasena, Rajeev Balasubramonian, SC'14: The International Conference for High Performance Computing, Networking, Storage and Analysis, New Orleans, November 2014.
- Comparing Implementations of Near Data Computing with In-Memory MapReduce Workloads, Seth Pugsley, Jeffrey Jestes, Rajeev Balasubramonian, Vijayalakshmi Srinivasan, Alper Buyuktosunoglu, Al Davis, Feifei Li, IEEE Micro Special Issue on Big Data, July/August 2014.
- Near-Data Processing: Insight from a Workshop at MICRO-46, Rajeev Balasubramonian, Jichuan Chang, Troy Manning, Jaime Moreno, Richard Murphy, Ravi Nair, Steve Swanson, IEEE Micro Special Issue on Big Data, July/August 2014.
- Memory Bandwidth Reservation in the Cloud to Avoid Information Leakage in the Memory Controller, Akhila Gundu, Gita Sreekumar, Ali Shafiee, Seth Pugsley, Hardik Jain, Rajeev Balasubramonian, Mohit Tiwari, 3rd Workshop on Hardware and Architectural Support for Security and Privacy (HASP), held in conjunction with ISCA-41, Minneapolis, June 2014.
- NDC: Analyzing the Impact of 3D-Stacked Memory+Logic Devices on MapReduce Workloads, Seth Pugsley, Jeffrey Jestes, Huihui Zhang, Rajeev Balasubramonian, Vijayalakshmi Srinivasan, Alper Buyuktosunoglu, Al Davis, Feifei Li, International Symposium on Performance Analysis of Systems and Software (ISPASS), Monterey, March 2014.
- Exploring a Brink-of-Failure Memory Controller to Design an Approximate Memory System, Meysam Taassori, Niladrish Chatterjee, Ali Shafiee, Rajeev Balasubramonian, 1st Workshop on Approximate Computing Across the System Stack (WACAS), held in conjunction with ASPLOS-19, Salt Lake City, March 2014.
- MemZip: Exploiting Unconventional Benefits from Memory Compression, Ali Shafiee, Meysam Taassori, Rajeev Balasubramonian, Al Davis, 20th International Symposium on High-Performance Computer Architecture (HPCA-20), Orlando, February 2014.
- Quantifying the Relationship between the Power Delivery Network and Architectural Policies in a 3D-Stacked Memory Device, Manjunath Shevgoor, Jung-Sik Kim, Niladrish Chatterjee, Rajeev Balasubramonian, Al Davis, Aniruddha Udipi, 46th International Symposium on Microarchitecture (MICRO-46), Davis, December 2013.
- Understanding the Role of the Power Delivery Network in 3D-Stacked Memory Devices, Manjunath Shevgoor, Jung-Sik Kim, Niladrish Chatterjee, Rajeev Balasubramonian, Al Davis, Aniruddha Udipi, 5th Workshop on Energy Efficient Design (WEED), held in conjunction with ISCA-40, Tel Aviv, June 2013.
- Leveraging Heterogeneity in DRAM Main Memories to Accelerate Critical Word Access, Niladrish Chatterjee, Manjunath Shevgoor, Rajeev Balasubramonian, Al Davis, Zhen Fang, Ramesh Illikkal, Ravi Iyer, 45th International Symposium on Microarchitecture (MICRO-45), Vancouver, December 2012.
- Optimizing Datacenter Power with Memory System Levers for Guaranteed Quality-of-Service, Kshitij Sudan, Sadagopan Srinivasan, Rajeev Balasubramonian, Ravi Iyer, 21st International Conference on Parallel Architectures and Compilation Techniques (PACT-21), Minneapolis, September 2012.
- LOT-ECC: LOcalized and Tiered Reliability Mechanisms for Commodity Memory Systems, Aniruddha Udipi, Naveen Muralimanohar, Rajeev Balasubramonian, Al Davis, Norm Jouppi, 39th International Symposium on Computer Architecture (ISCA-39), Portland, June 2012.
- USIMM: the Utah SImulated Memory Module, Niladrish Chatterjee, Rajeev Balasubramonian, Manjunath Shevgoor, Seth H. Pugsley, Aniruddha N. Udipi, Ali Shafiee, Kshitij Sudan, Manu Awasthi, Zeshan Chishti, Technical Report UUCS-12-002, February 2012.
- Efficient Scrub Mechanisms for Error-Prone Emerging Memories, M. Awasthi, M. Shevgoor, K. Sudan, B. Rajendran, R. Balasubramonian, V. Srinivasan, 18th International Symposium on High-Performance Computer Architecture (HPCA-18), New Orleans, February 2012.
- Staged Reads: Mitigating the Impact of DRAM Writes on DRAM Reads, N. Chatterjee, N. Muralimanohar, R. Balasubramonian, A. Davis, N. Jouppi, 18th International Symposium on High-Performance Computer Architecture (HPCA-18), New Orleans, February 2012.
- Managing Data Placement in Memory Systems with Multiple Memory Controllers, Manu Awasthi, Dave Nellans, Kshitij Sudan, Rajeev Balasubramonian, Al Davis, International Journal of Parallel Programming (IJPP), Vol 40(1), February 2012.
- Prediction Based DRAM Row-Buffer Management in the Many-Core Era, Manu Awasthi, David Nellans, Rajeev Balasubramonian, Al Davis, Proceedings of PACT-20 (poster session, second prize), Galveston Island, October 2011.
- Combining Memory and a Controller with Photonics through 3D-Stacking to Enable Scalable and Energy-Efficient Systems, Aniruddha Udipi, Naveen Muralimanohar, Rajeev Balasubramonian, Al Davis, Norm Jouppi, 38th International Symposium on Computer Architecture (ISCA-38), San Jose, June 2011. CRA Research Highlight.
- Handling PCM Resistance Drift with Device, Circuit, Architecture, and System Solutions, Manu Awasthi, Manju Shevgoor, Kshitij Sudan, Rajeev Balasubramonian, Bipin Rajendran, Viji Srinivasan, 2nd Non-Volatile Memories Workshop (NVMW), San Diego, March 2011.
- Handling the Problems and Opportunities Posed by Multiple On-Chip Memory Controllers, Manu Awasthi, David Nellans, Kshitij Sudan, Rajeev Balasubramonian, Al Davis, 19th International Conference on Parallel Architectures and Compilation Techniques (PACT-19), Vienna, September 2010. Best paper award
- A summary of our ASPLOS'10 and ISCA'10 papers.
- Rethinking DRAM Design and Organization for Energy-Constrained Multi-Cores, Aniruddha Udipi, Naveen Muralimanohar, Niladrish Chatterjee, Rajeev Balasubramonian, Al Davis, Norm Jouppi, 37th International Symposium on Computer Architecture (ISCA-37), St. Malo, France, June 2010.
- Micro-Pages: Increasing DRAM Efficiency with Locality-Aware Data Placement, Kshitij Sudan, Niladrish Chatterjee, David Nellans, Manu Awasthi, Rajeev Balasubramonian, Al Davis, 15th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-XV), Pittsburgh, March 2010.
- Rethinking DRAM Design for Low-Power Datacenters, Aniruddha N. Udipi, Naveen Muralimanohar, Niladrish Chatterjee, Rajeev Balasubramonian, 16th International Conference on High Performance Computing (HiPC) (poster session, best poster presentation award), India, December 2009.
Large cache design
More than half the area of future chips will be occupied by large cache
hierarchies. Large caches will be partitioned into numerous banks connected
by an on-chip network. Because each bank's access latency depends on its
distance from the requesting core, this organization is called a
non-uniform cache architecture (NUCA). Ongoing work is developing tools to
estimate an optimal cache organization and exploring mechanisms
(reconfiguration, data mapping, on-chip network design) to improve a core's
access to its data.
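To make the "non-uniform" in NUCA concrete, here is a minimal sketch of a hypothetical 16-bank cache laid out as a 4x4 mesh. Every parameter (mesh size, block size, bank and per-hop latencies) and every name (`NucaConfig`, `bank_of`, `access_latency`) is an illustrative assumption, not a value from CACTI or from any paper listed here. Blocks are mapped to banks by simple static interleaving, and routing delay is assumed to grow with Manhattan distance, as with dimension-ordered routing on a mesh.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NucaConfig:
    # Hypothetical parameters chosen only for illustration.
    mesh_dim: int = 4      # banks arranged in a mesh_dim x mesh_dim grid
    block_bytes: int = 64  # cache block size in bytes
    bank_cycles: int = 3   # access time of a single bank
    hop_cycles: int = 2    # router + link delay per network hop


def bank_of(addr: int, cfg: NucaConfig) -> int:
    """Static mapping: interleave consecutive cache blocks across banks."""
    num_banks = cfg.mesh_dim * cfg.mesh_dim
    return (addr // cfg.block_bytes) % num_banks


def access_latency(addr: int, core_xy: tuple, cfg: NucaConfig) -> int:
    """Bank access time plus request and reply traversal of the mesh."""
    bank = bank_of(addr, cfg)
    bank_x, bank_y = bank % cfg.mesh_dim, bank // cfg.mesh_dim
    hops = abs(bank_x - core_xy[0]) + abs(bank_y - core_xy[1])
    return cfg.bank_cycles + 2 * hops * cfg.hop_cycles


cfg = NucaConfig()
for addr in (0, 5 * 64, 15 * 64):
    # A core in the corner at (0, 0) sees 3, 11, and 27 cycles respectively.
    print(hex(addr), bank_of(addr, cfg), access_latency(addr, (0, 0), cfg))
```

Under these assumed numbers, one core sees anywhere from 3 to 27 cycles depending only on which bank holds a block, which is why data mapping, reconfiguration, and network design matter for NUCA performance.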
- Sandbox Prefetching: Safe, Run-Time Evaluation of Aggressive Prefetchers, Seth Pugsley, Zeshan Chishti, Chris Wilkerson, Troy Chuang, Robert Scott, Aamer Jaleel, Shih-Lien Lu, Kingsum Chow, Rajeev Balasubramonian, 20th International Symposium on High-Performance Computer Architecture (HPCA-20), Orlando, February 2014. Top Picks Honorable Mention
- Multi-Core Cache Hierarchies, Rajeev Balasubramonian, Norman P. Jouppi, Naveen Muralimanohar, Synthesis Lectures on Computer Architecture, Morgan and Claypool Publishers, 2011.
- Understanding the Behavior of Pthread Applications on Non-Uniform Cache Architectures, Gagandeep S. Sachdev, Kshitij Sudan, Mary W. Hall, Rajeev Balasubramonian, Proceedings of PACT-20 (poster session), Galveston Island, October 2011.
- Refining the Utility Metric for Utility-Based Cache Partitioning, Xing Lin, Rajeev Balasubramonian, 9th Workshop on Duplicating, Deconstructing, and Debunking (WDDD), held in conjunction with ISCA-38, San Jose, June 2011.
- CHOP: Integrating DRAM Caches for CMP Server Platforms, Xiaowei Jiang, Niti Madan, Li Zhao, Mike Upton, Ravi Iyer, Srihari Makineni, Donald Newell, Yan Solihin, Rajeev Balasubramonian, IEEE Micro's Special issue on Top Picks from 2010 Computer Architecture Conferences, January/February 2011.
- SWEL: Hardware Cache Coherence Protocols to Map Shared Data onto Shared Caches, Seth H. Pugsley, Josef Spjut, David Nellans, Rajeev Balasubramonian, 19th International Conference on Parallel Architectures and Compilation Techniques (PACT-19), Vienna, September 2010.
- Towards Scalable, Energy-Efficient, Bus-Based On-Chip Networks, Aniruddha Udipi, Naveen Muralimanohar, Rajeev Balasubramonian, 16th International Symposium on High-Performance Computer Architecture (HPCA-16), Bangalore, January 2010.
- CHOP: Adaptive Filter-based DRAM Caching for CMP Server Platforms, Xiaowei Jiang, Niti Madan, Li Zhao, Mike Upton, Ravi Iyer, Srihari Makineni, Donald Newell, Yan Solihin, Rajeev Balasubramonian, 16th International Symposium on High-Performance Computer Architecture (HPCA-16), Bangalore, January 2010. Top Picks
- Non-Uniform Power Access in Large Caches with Low-Swing Wires, Aniruddha Udipi, Naveen Muralimanohar, Rajeev Balasubramonian, 16th International Conference on High Performance Computing (HiPC), Kochi, December 2009. Best paper award
- Optimizing a Multi-Core Processor for Message-Passing Workloads, Niladrish Chatterjee, Seth H. Pugsley, Josef Spjut, Rajeev Balasubramonian, 5th Workshop on Unique Chips and Systems (UCAS-5), held in conjunction with ISPASS, Boston, April 2009.
- Dynamic Hardware-Assisted Software-Controlled Page Placement to Manage Capacity Allocation and Sharing within Large Caches, Manu Awasthi, Kshitij Sudan, Rajeev Balasubramonian, John Carter, 15th International Symposium on High-Performance Computer Architecture (HPCA-15), Raleigh, February 2009.
- Optimizing Communication and Capacity in a 3D Stacked Reconfigurable Cache Hierarchy, Niti Madan, Li Zhao (Intel), Naveen Muralimanohar, Aniruddha Udipi, Rajeev Balasubramonian, Ravishankar Iyer (Intel), Srihari Makineni (Intel), Donald Newell (Intel), 15th International Symposium on High-Performance Computer Architecture (HPCA-15), Raleigh, February 2009.
- Architecting Efficient Interconnects for Large Caches with CACTI 6.0, Naveen Muralimanohar, Rajeev Balasubramonian, Norm Jouppi (HP Labs), selected to appear in IEEE Micro's Special issue on Top Picks from 2007 Computer Architecture Conferences, Jan/Feb 2008.
- Optimizing NUCA Organizations and Wiring Alternatives for Large Caches With CACTI 6.0, Naveen Muralimanohar, Rajeev Balasubramonian, Norm Jouppi (HP Labs), 40th International Symposium on Microarchitecture (MICRO-40), Chicago, December 2007. Top Picks
- Interconnect Design Considerations for Large NUCA Caches, Naveen Muralimanohar, Rajeev Balasubramonian, 34th International Symposium on Computer Architecture (ISCA-34), San Diego, June 2007.
- The Effect of Interconnect Design on the Performance of Large L2 Caches, Naveen Muralimanohar, Rajeev Balasubramonian, 3rd IBM Watson Conference on Interaction between Architecture, Circuits, and Compilers (P=ac2), Yorktown Heights, October 2006.
- Leveraging Bloom Filters for Smart Search Within NUCA Caches, Robert Ricci, Steve Barrus, Dan Gebhardt, Rajeev Balasubramonian, 7th Workshop on Complexity-Effective Design (WCED), held in conjunction with ISCA-33, Boston, June 2006.
- A Dynamically Tunable Memory Hierarchy, Rajeev Balasubramonian, David Albonesi, Alper Buyuktosunoglu, and Sandhya Dwarkadas, IEEE Transactions on Computers, Vol. 52, No. 10, October 2003.
Interconnect design
In future architectures, computation will be relatively cheap, while
communication will be expensive. Performance can be degraded by wires that
take tens of cycles to send signals across the length of the chip.
Interconnects also contribute a significant fraction of the power dissipated
in modern processors. VLSI techniques enable a variety of wire
implementations with different latency, power, and bandwidth properties.
This project is examining whether power and performance bottlenecks can be
alleviated by exposing wire properties to the architecture level.
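As a hypothetical illustration of exposing wire properties to the architecture, the sketch below lets a sender choose one of three wire classes for each message. The classes (`FAST`, `BASE`, `LOW_POWER`), their widths, latencies, and relative energies, and the selection rule in `choose_wire` are all assumptions made for this example; they are not the parameters or policies of the papers listed here. Latency is modeled as link traversal time plus one cycle per additional pipelined flit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WireClass:
    name: str
    width_bits: int        # wires available on the link for this class
    cycles_per_flit: int   # cycles to traverse the link once
    energy_per_bit: float  # relative energy units, not measured values


# Hypothetical classes: fast but narrow, baseline, and slow but frugal.
FAST = WireClass("fast", 32, 1, 2.0)
BASE = WireClass("base", 128, 2, 1.0)
LOW_POWER = WireClass("low-power", 128, 4, 0.4)


def transfer_cycles(bits: int, wire: WireClass) -> int:
    """Link latency plus one extra cycle for each additional flit."""
    flits = -(-bits // wire.width_bits)  # ceiling division
    return wire.cycles_per_flit + (flits - 1)


def transfer_energy(bits: int, wire: WireClass) -> float:
    return bits * wire.energy_per_bit


def choose_wire(bits: int, critical: bool) -> WireClass:
    """Toy policy: spend energy only where latency matters."""
    if critical and bits <= FAST.width_bits:
        return FAST       # short, latency-critical message
    if critical:
        return BASE       # critical, but too wide for the fast wires
    return LOW_POWER      # off the critical path: save energy instead


for bits, critical in [(24, True), (512, True), (512, False)]:
    wire = choose_wire(bits, critical)
    print(wire.name, transfer_cycles(bits, wire), transfer_energy(bits, wire))
```

The point of the sketch is the design choice, not the numbers: when one link carries physically different wire sets, the architecture can decide per transfer which latency/energy trade-off to pay, instead of every message paying the cost of a single global wire design.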
- Buses and Crossbars, Rajeev Balasubramonian, Timothy Pinkston, Encyclopedia of Parallel Computing, D. Padua, editor. Springer Science+Business Media, 2011.
- Towards Scalable, Energy-Efficient, Bus-Based On-Chip Networks, Aniruddha Udipi, Naveen Muralimanohar, Rajeev Balasubramonian, 16th International Symposium on High-Performance Computer Architecture (HPCA-16), Bangalore, January 2010.
- Non-Uniform Power Access in Large Caches with Low-Swing Wires, Aniruddha Udipi, Naveen Muralimanohar, Rajeev Balasubramonian, 16th International Conference on High Performance Computing (HiPC), Kochi, December 2009 (Best paper award).
- Optimizing Communication and Capacity in a 3D Stacked Reconfigurable Cache Hierarchy, Niti Madan, Li Zhao (Intel), Naveen Muralimanohar, Aniruddha Udipi, Rajeev Balasubramonian, Ravishankar Iyer (Intel), Srihari Makineni (Intel), Donald Newell (Intel), 15th International Symposium on High-Performance Computer Architecture (HPCA-15), Raleigh, February 2009.
- Architecting Efficient Interconnects for Large Caches with CACTI 6.0, Naveen Muralimanohar, Rajeev Balasubramonian, Norm Jouppi (HP Labs), selected to appear in IEEE Micro's Special issue on Top Picks from 2007 Computer Architecture Conferences, Jan/Feb 2008.
- Optimizing NUCA Organizations and Wiring Alternatives for Large Caches With CACTI 6.0, Naveen Muralimanohar, Rajeev Balasubramonian, Norm Jouppi (HP Labs), 40th International Symposium on Microarchitecture (MICRO-40), Chicago, December 2007.
- Understanding the Impact of 3D Stacked Layouts on ILP, Manu Awasthi, Vivek Venkatesan, Rajeev Balasubramonian, The Journal of Instruction-Level Parallelism (JILP), Volume 9, June 2007.
- Interconnect Design Considerations for Large NUCA Caches, Naveen Muralimanohar, Rajeev Balasubramonian, 34th International Symposium on Computer Architecture (ISCA-34), San Diego, June 2007.
- Leveraging Wire Properties at the Microarchitecture Level, Rajeev Balasubramonian, Naveen Muralimanohar, Karthik Ramani, Liqun Cheng, and John Carter, IEEE Micro, Vol. 26, No. 6, November/December 2006.
- Exploring the Design Space for 3D Clustered Architectures, Manu Awasthi, Rajeev Balasubramonian, 3rd IBM Watson Conference on Interaction between Architecture, Circuits, and Compilers (P=ac2), Yorktown Heights, October 2006.
- The Effect of Interconnect Design on the Performance of Large L2 Caches, Naveen Muralimanohar, Rajeev Balasubramonian, 3rd IBM Watson Conference on Interaction between Architecture, Circuits, and Compilers (P=ac2), Yorktown Heights, October 2006.
- Interconnect-Aware Coherence Protocols for Chip Multiprocessors, Liqun Cheng, Naveen Muralimanohar, Karthik Ramani, Rajeev Balasubramonian, and John Carter, 33rd International Symposium on Computer Architecture (ISCA-33), Boston, June 2006.
- Re-Visiting the Performance Impact of Microarchitectural Floorplanning, Anupam Chakravorty, Abhishek Ranjan, Rajeev Balasubramonian, 3rd Workshop on Temperature Aware Computer Systems (TACS), held in conjunction with ISCA-33, Boston, June 2006.
- Wire Management for Coherence Traffic in Chip Multiprocessors, Liqun Cheng, Naveen Muralimanohar, Karthik Ramani, Rajeev Balasubramonian, and John Carter, 6th Workshop on Complexity-Effective Design (WCED), held in conjunction with ISCA-32, Madison, June 2005.
- Microarchitectural Wire Management for Performance and Power in Partitioned Architectures, Rajeev Balasubramonian, Naveen Muralimanohar, Karthik Ramani, and Venkatanand Venkatachalapathy, 11th International Symposium on High-Performance Computer Architecture (HPCA-11), San Francisco, February 2005.
- Microarchitectural Techniques to Reduce Interconnect Power in Clustered Processors, Karthik Ramani, Naveen Muralimanohar, and Rajeev Balasubramonian, 5th Workshop on Complexity-Effective Design (WCED), held in conjunction with ISCA-31, Munich, June 2004.
- Microarchitectural Wire Management for Performance and Power in Partitioned Architectures, R. Balasubramonian, L. Cheng, J. Carter, N. Muralimanohar, K. Ramani, US Patent No. 7,478,190, issued Jan 13 2009.
-
Radiation- and noise-induced transient errors in computer systems are on
the rise. Such errors can be detected at the architecture level by
executing two copies of every instruction and periodically comparing
their results. This technique is commonly referred to as redundant
multi-threading (RMT). However, it can impose significant power overheads.
This project attempts a comprehensive analysis of RMT techniques to better
understand how we can simultaneously optimize reliability, power, and
performance. In recent years, our focus has been on designing efficient
techniques to detect and correct errors in DRAM main memory and PCM systems.
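The basic detection loop can be pictured with a toy model. In this Python sketch, a "program" is a hypothetical list of integer operations, both copies run in lockstep, and a single-bit flip is injected into the leading copy; the function name `run_redundant` and the fault model are assumptions for illustration, not the mechanisms studied in the papers below.

```python
from typing import Callable, List, Optional, Tuple

Op = Callable[[int], int]


def run_redundant(ops: List[Op], x0: int, check_interval: int,
                  fault_at: Optional[int] = None) -> Tuple[int, Optional[int]]:
    """Execute every operation twice (a leading and a trailing copy) and compare
    the copies every `check_interval` operations and at program end.

    Returns (leading copy's value, index of the operation whose check detected
    a mismatch, or None if no error was detected)."""
    lead = trail = x0
    for i, op in enumerate(ops):
        lead = op(lead)
        trail = op(trail)
        if i == fault_at:
            lead ^= 1  # inject a single-bit transient error into the leading copy
        is_check = (i + 1) % check_interval == 0 or i == len(ops) - 1
        if is_check and lead != trail:
            return lead, i  # error detected; recovery (e.g. rollback) would start here
    return lead, None
```

With ten increment operations and a check every four, an error injected at operation 5 is only caught at the check after operation 7. A longer check interval means fewer comparisons but a longer window in which an error goes unnoticed, one small face of the reliability/power/performance trade-off described above.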
- LOT-ECC: LOcalized and Tiered Reliability Mechanisms for Commodity Memory Systems, Aniruddha Udipi, Naveen Muralimanohar, Rajeev Balasubramonian, Al Davis, Norm Jouppi, 39th International Symposium on Computer Architecture (ISCA-39), Portland, June 2012.
- Efficient Scrub Mechanisms for Error-Prone Emerging Memories, M. Awasthi, M. Shevgoor, K. Sudan, B. Rajendran, R. Balasubramonian, V. Srinivasan, 18th International Symposium on High-Performance Computer Architecture (HPCA-18), New Orleans, February 2012.
- Handling PCM Resistance Drift with Device, Circuit, Architecture, and System Solutions, Manu Awasthi, Manju Shevgoor, Kshitij Sudan, Rajeev Balasubramonian, Bipin Rajendran, Viji Srinivasan, 2nd Non-Volatile Memories Workshop (NVMW), San Diego, March 2011.
- Rethinking DRAM Design and Organization for Energy-Constrained Multi-Cores, Aniruddha Udipi, Naveen Muralimanohar, Niladrish Chatterjee, Rajeev Balasubramonian, Al Davis, Norm Jouppi, 37th International Symposium on Computer Architecture (ISCA-37), Saint-Malo, France, June 2010.
- Leveraging 3D Technology for Improved Reliability, Niti Madan, Rajeev Balasubramonian, 40th International Symposium on Microarchitecture (MICRO-40), Chicago, December 2007.
- Power Efficient Approaches to Redundant Multithreading, Niti Madan, Rajeev Balasubramonian, IEEE Transactions on Parallel and Distributed Systems (Special Issue on CMP Architectures), Vol. 18, No. 8, pp. 1066-1079, August 2007.
- Exploiting Eager Register Release in a Redundantly Multi-Threaded Processor, Niti Madan, Rajeev Balasubramonian, 2nd Workshop on Architectural Reliability (WAR-2), held in conjunction with MICRO-39, Orlando, December 2006.
- A First-Order Analysis of Power Overheads of Redundant Multi-Threading, Niti Madan, Rajeev Balasubramonian, 2nd Workshop on the System Effects of Logic Soft Errors (SELSE-2), Urbana, April 2006.
- Power-Efficient Approaches to Reliability, Niti Madan, Rajeev Balasubramonian, Technical Report UUCS-05-010, December 2005.
-
Transactional memory (TM) is viewed as a promising approach to simplify
the task of parallel programming. It is widely believed that TM can be
supported in hardware, and many implementation flavors have recently
been proposed. We are currently exploring the scalability, power
efficiency, and verification aspects of these implementations.
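To make TM semantics concrete, here is a small single-threaded Python model of one common flavor: lazy versioning (writes are buffered until commit) with commit-time conflict detection through per-location version numbers. The class names and the versioning scheme are assumptions for illustration; hardware TM implementations track read and write sets in caches and differ in many details.

```python
from typing import Dict


class Memory:
    """Shared memory in which every location carries a version number."""

    def __init__(self) -> None:
        self.data: Dict[str, int] = {}
        self.version: Dict[str, int] = {}


class Transaction:
    def __init__(self, mem: Memory) -> None:
        self.mem = mem
        self.read_set: Dict[str, int] = {}   # address -> version first observed
        self.write_buf: Dict[str, int] = {}  # speculative writes, invisible to others

    def read(self, addr: str) -> int:
        if addr in self.write_buf:          # read our own speculative write
            return self.write_buf[addr]
        self.read_set.setdefault(addr, self.mem.version.get(addr, 0))
        return self.mem.data.get(addr, 0)

    def write(self, addr: str, value: int) -> None:
        self.write_buf[addr] = value

    def commit(self) -> bool:
        # Validate: abort if another transaction committed to anything we read.
        for addr, seen in self.read_set.items():
            if self.mem.version.get(addr, 0) != seen:
                return False
        # Publish the buffered writes and bump their versions.
        for addr, value in self.write_buf.items():
            self.mem.data[addr] = value
            self.mem.version[addr] = self.mem.version.get(addr, 0) + 1
        return True
```

If two transactions both increment `x`, the first to commit succeeds and the second fails validation and must retry. Because this model is single-threaded, `commit` is trivially atomic; making commits atomic and scalable across many cores is exactly the hard part in hardware.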
- Scalable and Reliable Communication for Hardware Transactional Memory, Seth H. Pugsley, Manu Awasthi, Niti Madan, Naveen Muralimanohar, Rajeev Balasubramonian, 17th International Conference on Parallel Architectures and Compilation Techniques (PACT-17), Toronto, October 2008.
- Scalable, Reliable, Power-Efficient Communication for Hardware Transactional Memory, Seth H. Pugsley, Manu Awasthi, Niti Madan, Naveen Muralimanohar, Rajeev Balasubramonian, Technical Report UUCS-08-001, January 2008.
- Commit Algorithms for Scalable Hardware Transactional Memory, Seth H. Pugsley, Rajeev Balasubramonian, Technical Report UUCS-07-016, August 2007.
-
Scalability of Clustered Microarchitectures.
Clustered microarchitectures distribute the processor's resources
across multiple clusters. Because each cluster is small, its complexity
is lower and a faster clock is possible. However, the instructions and
data of a single program are distributed across the clusters, incurring
frequent inter-cluster communication and an IPC penalty. The growing
cost of wire delays in future technology generations makes
microprocessors increasingly communication-bound. This project examines
many performance and power issues in the design of highly clustered
microarchitectures at future technology points.
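One way to manage the communication-parallelism trade-off is to activate only as many clusters as the current program phase benefits from. The sketch below assumes an exploration approach in which each candidate cluster count runs for one interval and the best-performing count is kept. `toy_ipc` is a made-up analytic model (parallelism capped by the program's ILP, communication cost growing with the cluster count), not measured data.

```python
from typing import Callable, Sequence


def choose_cluster_count(measure_ipc: Callable[[int], float],
                         candidates: Sequence[int]) -> int:
    """Exploration: run one interval with each candidate cluster count and
    keep the count that delivered the highest IPC for this phase."""
    return max(candidates, key=measure_ipc)


def toy_ipc(clusters: int, ilp: float = 6.0, per_cluster_width: float = 2.0,
            comm_penalty: float = 0.15) -> float:
    """Hypothetical IPC model: more clusters expose more issue bandwidth
    (capped by the program's ILP), but each extra cluster adds a fixed
    fractional communication cost."""
    issue = min(clusters * per_cluster_width, ilp)
    return issue / (1.0 + comm_penalty * (clusters - 1))
```

Under these made-up parameters, a phase with plenty of ILP settles on four clusters, while a phase with little ILP runs best on a single cluster, since extra clusters only add communication cost.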
- Understanding the Impact of 3D Stacked Layouts on ILP, Manu Awasthi, Vivek Venkatesan, Rajeev Balasubramonian, The Journal of Instruction-Level Parallelism (JILP), Volume 9, June 2007.
- Exploring the Design Space for 3D Clustered Architectures, Manu Awasthi, Rajeev Balasubramonian, 3rd IBM Watson Conference on Interaction between Architecture, Circuits, and Compilers (P=ac2), Yorktown Heights, October 2006.
- Power Efficient Resource Scaling in Partitioned Architectures through Dynamic Heterogeneity, Naveen Muralimanohar, Karthik Ramani, and Rajeev Balasubramonian, IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), Austin, March 2006.
- Cluster Prefetch: Tolerating On-Chip Wire Delays in Clustered Microarchitectures, Rajeev Balasubramonian, 18th International Conference on Supercomputing (ICS-18), Saint-Malo, June 2004.
- Dynamically Managing the Communication-Parallelism Trade-Off in Future Clustered Processors, Rajeev Balasubramonian, Sandhya Dwarkadas, and David Albonesi, 30th International Symposium on Computer Architecture (ISCA-30), San Diego, June 2003.
- Microarchitectural Trade-offs in the Design of a Scalable Clustered Microprocessor, Rajeev Balasubramonian, Sandhya Dwarkadas, and David Albonesi, URCS Technical Report #771, January 2002.
- Multi-Cluster Processor Operating only Select Number of Clusters during each Phase Based on Program Statistic Monitored at Predetermined Intervals, Rajeev Balasubramonian, Sandhya Dwarkadas, David H. Albonesi, US Patent No. 7,490,220, issued Feb 10 2009.
- Performance Monitoring for New Phase Dynamic Optimization of Instruction Dispatch Cluster Configuration, Rajeev Balasubramonian, Sandhya Dwarkadas, David H. Albonesi, US Patent No. 8,103,856, issued Jan 24 2012.
-
Exploiting Criticality for Low Power Cache Design.
Most low-power techniques for the data cache introduce access time
penalties. We propose the use of two cache banks, one optimized for high
performance and the other optimized for low power. Instructions and data
are split into two streams, and instructions that lie on the program's
critical path are steered to the high-performance cache so that the
performance impact of the low-power cache is kept to a minimum.
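The steering idea can be sketched in a few lines. This Python toy assumes a per-PC saturating counter as the criticality predictor and made-up latency and energy numbers for the two banks; the actual designs use their own criticality heuristics. It reports the total latency seen by critical loads (a crude proxy for performance cost) and the total access energy.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Bank:
    latency: int   # cycles (hypothetical)
    energy: float  # energy per access, arbitrary units (hypothetical)


FAST = Bank(latency=2, energy=1.0)
LOW_POWER = Bank(latency=4, energy=0.4)


@dataclass
class CriticalitySteering:
    threshold: int = 2
    counters: Dict[int, int] = field(default_factory=dict)  # load PC -> 2-bit counter

    def bank_for(self, pc: int) -> Bank:
        return FAST if self.counters.get(pc, 0) >= self.threshold else LOW_POWER

    def train(self, pc: int, was_critical: bool) -> None:
        c = self.counters.get(pc, 0)
        self.counters[pc] = min(c + 1, 3) if was_critical else max(c - 1, 0)


def simulate(trace: List[Tuple[int, bool]]) -> Tuple[int, float]:
    """trace: (load PC, whether that load turned out to be on the critical path)."""
    steer = CriticalitySteering()
    critical_latency, energy = 0, 0.0
    for pc, critical in trace:
        bank = steer.bank_for(pc)
        energy += bank.energy
        if critical:
            critical_latency += bank.latency  # only critical loads delay the program
        steer.train(pc, critical)
    return critical_latency, energy
```

For a trace in which one load PC is always critical and another never is, the predictor needs two accesses to learn the critical PC; after that only that PC pays for the fast bank, while the non-critical loads stay in the low-power bank.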
- Non-Uniform Power Access in Large Caches with Low-Swing Wires, Aniruddha Udipi, Naveen Muralimanohar, Rajeev Balasubramonian, 16th International Conference on High Performance Computing (HiPC), Kochi, December 2009 (Best paper award).
- Hot-and-Cold: Using Criticality in the Design of Energy-Efficient Caches, Rajeev Balasubramonian, Viji Srinivasan, Sandhya Dwarkadas, Alper Buyuktosunoglu, 3rd Workshop on Power-Aware Computer Systems (PACS), held in conjunction with MICRO-36, San Diego, December 2003.
-
Dynamic Voltage and Frequency Scaling.
Problems with clock distribution and wire delays motivate the use of
multiple domains within the processor, each with its own clock. This
makes it possible to scale down the frequency and voltage of each
domain independently in order to reduce energy consumption. By
targeting only those domains that do not lie on the program's critical
path, the impact on performance is minimized.
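The energy argument rests on the standard first-order model of dynamic CMOS energy: each cycle costs roughly C·V² (with switching activity folded into C), while execution time scales as 1/f. Because a lower frequency usually permits a lower supply voltage, slowing a non-critical domain cuts its energy quadratically in V. The function below is a sketch of that model only; leakage is ignored, and the capacitance and the 0.6 V operating point in the usage note are hypothetical.

```python
from typing import Tuple


def domain_energy_and_time(cycles: float, freq_ghz: float, vdd: float,
                           switched_cap_nf: float = 1.0) -> Tuple[float, float]:
    """Dynamic energy (nJ) and execution time (ns) of one clock domain.

    Energy per cycle is C * V^2 (nF * V^2 = nJ), with the activity factor
    folded into C; leakage is ignored. Time is cycles / frequency."""
    energy_nj = cycles * switched_cap_nf * vdd ** 2
    time_ns = cycles / freq_ghz
    return energy_nj, time_ns
```

For 1000 cycles, running at 1 GHz and 1.0 V gives 1000 nJ in 1000 ns, while a hypothetical 0.5 GHz, 0.6 V setting gives 360 nJ in 2000 ns. The slowdown is harmless only if that domain is off the critical path, which is why the choice of which domains to scale matters.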
- Energy Efficient Processor Design Using Multiple Clock Domains with Dynamic Voltage and Frequency Scaling, Greg Semeraro, Grigoris Magklis, Rajeev Balasubramonian, David Albonesi, Sandhya Dwarkadas, and Michael Scott, 8th International Symposium on High-Performance Computer Architecture (HPCA-8), pp. 29-40, Cambridge, February 2002.
- Multiple Clock Domain Microprocessor, David H. Albonesi, Greg Semeraro, Grigoris Magklis, Michael L. Scott, Rajeev Balasubramonian, and Sandhya Dwarkadas, US Patent No. 7,089,443, issued Aug 8 2006.
-
Register File Complexity.
The number of registers and register ports dictates the amount of ILP
that can be extracted. At the same time, the access time and the energy
consumption of the register file impose serious design constraints. We
evaluate the effect of two orthogonal techniques on performance, energy,
and cycle time. The first splits the register file into two levels to
reduce the number of entries in the first level. The second introduces
scheduling complexity in the issue queue and at the functional units in
order to reduce the porting requirements on the register file.
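The two-level organization can be illustrated with a small Python model. It assumes least-recently-used eviction from the first level and a fixed extra latency for second-level reads; both are assumptions for illustration rather than the policies evaluated in the papers below. Reads return the value together with the latency paid.

```python
from collections import OrderedDict
from typing import Dict, Tuple


class TwoLevelRegFile:
    """Toy two-level register file: a small first level (L1) backed by a
    larger second level (L2). LRU eviction and latencies are assumptions."""

    def __init__(self, l1_entries: int, l1_latency: int = 1, l2_latency: int = 3) -> None:
        self.l1: "OrderedDict[int, int]" = OrderedDict()  # physical reg -> value
        self.l2: Dict[int, int] = {}
        self.l1_entries = l1_entries
        self.l1_latency = l1_latency
        self.l2_latency = l2_latency

    def write(self, preg: int, value: int) -> None:
        self.l2.pop(preg, None)      # drop any stale copy in the second level
        self.l1[preg] = value
        self.l1.move_to_end(preg)    # mark as most recently used
        if len(self.l1) > self.l1_entries:
            victim, v = self.l1.popitem(last=False)  # evict least recently used
            self.l2[victim] = v

    def read(self, preg: int) -> Tuple[int, int]:
        """Return (value, access latency in cycles)."""
        if preg in self.l1:
            self.l1.move_to_end(preg)
            return self.l1[preg], self.l1_latency
        value = self.l2[preg]        # KeyError if the register was never written
        self.write(preg, value)      # promote back into the first level
        return value, self.l2_latency
```

The first level stays small (and therefore fast and cheap to port) no matter how many physical registers exist; the cost is the extra latency whenever a value has to be fetched from the second level.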
- Exploiting Eager Register Release in a Redundantly Multi-Threaded Processor, Niti Madan, Rajeev Balasubramonian, 2nd Workshop on Architectural Reliability (WAR-2), held in conjunction with MICRO-39, Orlando, December 2006.
- Reducing the Complexity of the Register File in Dynamic Superscalar Processors, Rajeev Balasubramonian, Sandhya Dwarkadas, and David Albonesi, 34th International Symposium on Microarchitecture (MICRO-34), pp. 237-248, Austin, December 2001.
- A High-Performance Two-Level Register File Organization, Rajeev Balasubramonian, Sandhya Dwarkadas, and David Albonesi, URCS Technical Report #745, April 2001.
-
Efficient Pre-Execution Threads.
The ability to mine distant ILP is hampered by the limited size of
the register file, a key determinant of cycle time. This work proposes
a helper thread that uses part of the register file to run ahead of
the main program thread, prefetching data and resolving branches
early. The helper thread can search a much larger window for ILP
because of relaxed conditions for register deallocation. A runtime
scheme dynamically determines the partition of resources between the
main program thread and the pre-execution thread.
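The benefit of running ahead can be shown with a toy timing model. In this Python sketch, the run-ahead distance, prefetch latency, and miss penalty are all hypothetical parameters, and a prefetch that arrives late is simply counted as a full miss.

```python
from typing import Dict, List, Set


def stall_cycles(addresses: List[int], run_ahead: int,
                 prefetch_latency: int = 4, miss_penalty: int = 100) -> int:
    """Toy model of a pre-execution (helper) thread.

    In iteration i the main thread loads addresses[i], while the helper,
    `run_ahead` iterations ahead, prefetches addresses[i + run_ahead]; that
    prefetch completes `prefetch_latency` iterations later. A load that finds
    neither a cached line nor a completed prefetch pays `miss_penalty`."""
    cached: Set[int] = set()
    ready_at: Dict[int, int] = {}  # address -> iteration when its prefetch completes
    stalls = 0
    for i, addr in enumerate(addresses):
        ahead = i + run_ahead
        if run_ahead > 0 and ahead < len(addresses):
            ready_at.setdefault(addresses[ahead], i + prefetch_latency)
        # Miss unless the line is cached or its prefetch has already arrived.
        if addr not in cached and ready_at.get(addr, i + 1) > i:
            stalls += miss_penalty
        cached.add(addr)
    return stalls
```

For a stream of 100 distinct cache lines, no helper costs 100 misses; a helper running 8 iterations ahead hides all but the first 8, while one running only 2 ahead (less than the prefetch latency) hides nothing. In this toy nothing limits `run_ahead`; in a real core, how far the helper can get ahead depends on the resources it is given.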
-
Reconfigurable Memory Hierarchies.
This work explores the potential of a novel cache and TLB layout that
allows for low-cost configurability. The boundary between the L1 and L2
can be dynamically changed so that each program phase is given the
amount of L1 that has the best capacity/access time trade-off. In
addition to improving performance, this also helps reduce energy
consumption by reducing the number of transfers between the various
levels of the hierarchy.
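The per-phase choice of L1 size can be framed as minimizing average memory access time, AMAT = L1 latency + L1 miss rate × L2 latency. In the sketch below, the candidate sizes, latencies, and miss rates are hypothetical, and L2 misses are ignored even though, when the L1/L2 boundary moves, enlarging the L1 also shrinks the L2.

```python
from typing import Dict, Tuple


def amat(l1_latency: float, l1_miss_rate: float, l2_latency: float) -> float:
    """Average access time seen by the processor (L2 misses ignored for simplicity)."""
    return l1_latency + l1_miss_rate * l2_latency


def pick_l1_size(phase_stats: Dict[int, Tuple[float, float]],
                 l2_latency: float = 20.0) -> int:
    """Choose the L1 size (in KB) with the lowest AMAT for the current phase.

    phase_stats maps each candidate L1 size to (L1 hit latency in cycles,
    L1 miss rate observed for that size during the phase)."""
    return min(phase_stats, key=lambda kb: amat(*phase_stats[kb], l2_latency))
```

With made-up statistics, a phase with a large working set ({32: (2, 0.10), 64: (3, 0.04), 128: (4, 0.03)}) picks a 64 KB L1, while a phase whose data already fits ({32: (2, 0.01), 64: (3, 0.005), 128: (4, 0.004)}) keeps the faster 32 KB L1.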
- The POP Detector: A Lightweight Online Program Phase Detection Framework, K. Taht, J. Greensky, R. Balasubramonian, International Symposium on Performance Analysis of Systems and Software (ISPASS), Madison, March 2019.
- Memory Hierarchy Reconfiguration for Energy and Performance in General-Purpose Processor Architectures, Rajeev Balasubramonian, David H. Albonesi, Alper Buyuktosunoglu, and Sandhya Dwarkadas, US Patent No. 6,834,328, issued Dec 21 2004.
- Dynamically Reconfigurable Memory Hierarchy, Rajeev Balasubramonian, David H. Albonesi, Alper Buyuktosunoglu, and Sandhya Dwarkadas, US Patent No. 6,684,298, issued Jan 27 2004.
- Dynamically Tuning Processor Resources with Adaptive Processing, D.H. Albonesi, Rajeev Balasubramonian, S.G. Dropsho, S. Dwarkadas, E.G. Friedman, M.C. Huang, V. Kursun, G. Magklis, M.L. Scott, G. Semeraro, P. Bose, A. Buyuktosunoglu, P.W. Cook, S.E. Schuster, IEEE Computer, Special Issue on Power-Aware Computing, Vol. 36, No. 12, December 2003.
- A Dynamically Tunable Memory Hierarchy, Rajeev Balasubramonian, David Albonesi, Alper Buyuktosunoglu, and Sandhya Dwarkadas, IEEE Transactions on Computers, Vol. 52, No. 10, October 2003.
- Integrating Adaptive On-Chip Storage Structures for Reduced Dynamic Power, Steve Dropsho, Alper Buyuktosunoglu, Rajeev Balasubramonian, David Albonesi, Sandhya Dwarkadas, Greg Semeraro, Grigoris Magklis, and Michael Scott, 11th International Conference on Parallel Architectures and Compilation Techniques (PACT), pp. 141-152, Charlottesville, September 2002.
- Memory Hierarchy Reconfiguration for Energy and Performance in General-Purpose Processor Architectures, Rajeev Balasubramonian, David Albonesi, Alper Buyuktosunoglu, and Sandhya Dwarkadas, 33rd International Symposium on Microarchitecture (MICRO-33), pp. 245-257, Monterey, December 2000.
- Dynamic Memory Hierarchy Performance Optimization, Rajeev Balasubramonian, David Albonesi, Alper Buyuktosunoglu, and Sandhya Dwarkadas, Workshop on Solving the Memory Wall Problem, held in conjunction with the 27th ISCA, Vancouver, June 2000.
Most publications are copyrighted by IEEE or ACM. Please respect these
copyrights. Typically, personal or classroom use is granted; papers cannot be
duplicated for commercial purposes. In recent years, the research group
has been funded by NSF grant CCF-0430063, NSF CAREER award CCF-0545959,
NSF grant CCF-0811249, NSF grant CCF-0916436, NSF grant CNS-1302663, NSF grant CNS-1423583, NSF grant CNS-1718834, NSF grant CCF-2119677, SRC Contract 2008-TJ-1847, Intel, Google, HP Labs, IBM, Samsung,
and the University of Utah.
Any opinions, findings, and conclusions or recommendations expressed in
this material are those of the author(s) and do not necessarily reflect
the views of the National Science Foundation or any other sponsor.
Some of our simulation results were obtained with Simics, which is supported by
Virtutech.
Calvin's take on research