Conference Program



At a glance:


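Time          | Monday 25         | Tuesday 26        | Wednesday 27              | Thursday 28       | Friday 29
08:05 – 08:15 |                   |                   | Session 5                 |                   |
08:15 – 08:30 |                   | Welcome           | Session 5 (cont.)         |                   |
08:30 – 08:45 | WT 1              | Welcome (cont.)   | Session 5 (cont.)         |                   | WT 5
08:45 – 09:45 | WT 1 (cont.)      | Keynote Address 1 | Session 5 (cont.)         | Keynote Address 2 | WT 5 (cont.)
09:45 – 10:10 | WT 1 (cont.)      | Session 1         | Session 5 (cont.)         | Session 7         | WT 5 (cont.)
10:10 – 10:50 | Coffee Break      | Coffee Break      | Coffee Break              | Coffee Break      | Coffee Break
10:50 – 12:30 | WT 2              | Session 2         | Session 6                 | Session 8         | WT 6
12:30 – 12:55 | Lunch             | Lunch             | Session 6 (cont.)         | Lunch             | Lunch
12:55 – 14:00 | Lunch (cont.)     | Lunch (cont.)     | Lunch                     | Lunch (cont.)     | Lunch (cont.)
14:00 – 15:40 | WT 3              | Session 3         | Excursion & Social Dinner | Session 9         | WT 7
15:40 – 16:20 | Coffee Break      | Coffee Break      | Excursion (cont.)         | Coffee Break      | Coffee Break
16:20 – 18:00 | WT 4              | Session 4         | Excursion (cont.)         | Session 10        | WT 8
18:00 – 19:00 | Welcome Reception |                   |                           |                   |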



Detailed Program


Note: some workshops and tutorials (see the Workshops and Tutorials page) have been cancelled.

Monday June 25th

08:30 − 10:10 WT 1
10:10 − 10:50 Coffee Break
10:50 − 12:30 WT 2
12:30 − 14:00 Lunch
14:00 − 15:40 WT 3
15:40 − 16:20 Coffee Break
16:20 − 18:00 WT 4
18:00 − 19:00 Welcome Reception. The reception will be held near the conference rooms and will include appetizers and drinks.

Tuesday June 26th

08:15 − 08:45 Welcome
08:45 − 09:45 Keynote Address 1
Yale N. Patt. High Performance Supercomputers: should the individual processor be more than a brick?
09:45 − 10:10 Session 1: Micro-Architecture 1
Chair: Manolis Katevenis, FORTH & Univ. of Crete, Greece
Mengjie Mao. Distributed Replay Protocol for Distributed Uniprocessors
10:10 − 10:50 Coffee Break
10:50 − 12:30 Session 2: GPU, Compilers
Chair: David Padua, Univ. of Illinois at Urbana-Champaign, USA
Wenhao Jia, Kelly A. Shaw and Margaret Martonosi. Characterizing and Improving the Use of Demand-Fetched Caches in GPUs
Ziyu Guo, Bo Wu and Xipeng Shen. One Stone Two Birds: Synchronization Relaxation and Redundancy Removal in GPU-CPU Translations
Hongtao Yu and Zhiyuan Li. Fast Loop-level Data Dependence Profiling
Nishkam Ravi, Yi Yang, Tao Bao and Srimat Chakradhar. Apricot: A Compiler-based Productivity and Performance Tool for x86-compatible Many-core Coprocessors
12:30 − 14:00 Lunch
14:00 − 15:40 Session 3: Fault Tolerance
Chair: Eli Upfal, Brown University, USA
Somayeh Sardashti and David Wood. UniFI: Leveraging Non-Volatile Memories for a Unified Fault Tolerance and Idle Power Management Technique
Manu Shantharam, Sowmyalatha Srinivasmurthy and Padma Raghavan. Fault Tolerant Preconditioned Conjugate Gradient for Sparse Linear System Solution
Wenjing Ma and Sriram Krishnamoorthy. Data-driven Fault Tolerance for Work Stealing Computations
Marc Casas Guix, Bronis R. de Supinski, Greg Bronevetsky and Martin Schulz. Fault Resilience of the Algebraic Multi-Grid Solver
15:40 − 16:20 Coffee Break
16:20 − 18:00 Session 4: Micro-Architecture 2, Interconnection Networks
Chair: Kei Hiraki, University of Tokyo, Japan
Janani Mukundan, Saugata Ghose, Robert Karmazin, Engin İpek and José F. Martínez. Overcoming Single-Thread Performance Hurdles in the Core Fusion Reconfigurable Multicore Architecture
Mingxing Tan, Xianhua Liu, Dong Tong and Xu Cheng. CVP: An Energy-Efficient Indirect Branch Prediction with Compiler-Guided Value Pattern
Miao Luo, Dhabaleswar Panda, Costin Iancu and Khaled Ibrahim. Congestion Avoidance on Manycore High Performance Computing Systems
Yi Xu, Jun Yang and Rami Melhem. Channel Borrowing: An Energy-Efficient Nanophotonic Crossbar Architecture with Light-Weight Arbitration

Wednesday June 27th

08:05 − 10:10 Session 5: Runtime, Dependences, Load Balancing
Chair: Jose Moreira, IBM Research, USA
Liang Han, Xiaowei Jiang, Wei Liu, Youfeng Wu and James Tuck. HiRe: Using Hint & Release to Improve Synchronization between Speculative Threads
Gokcen Kestor, Roberto Gioiosa, Osman Unsal, Adrian Cristal and Mateo Valero. Enhancing the Performance of Assisted Execution Runtime Systems through Hardware/Software Techniques
Quan Chen, Minyi Guo and Zhiyi Huang. CATS: Cache Aware Task-Stealing based on Online Profiling in Multi-socket Multi-core Architectures
Tao Sun, Tao Wang, Haibo Zhang and Xiufeng Sui. CRQ-based Fair Scheduling on Composable Multicore Architectures
Olga Pearce, Todd Gamblin, Bronis de Supinski, Martin Schulz and Nancy Amato. Quantifying the Effectiveness of Load Balance Algorithms
10:10 − 10:50 Coffee Break
10:50 − 12:55 Session 6: Communication, HPC Applications
Chair: Nancy Amato, Texas A&M University, USA
John Stevenson, Alex Solomatnikov, Amin Firoozshahian, Mark Horowitz and David Cheriton. Sparse Matrix-Vector Multiply on the HICAMP Architecture
Kenneth Czechowski, Casey Battaglino, Chris McClanahan and Richard Vuduc. On the communication complexity of 3D FFTs and its implications for exascale
Ilie Tanase, George Almasi, Charles Archer and Hanhong Xue. Hybrid Collective Operations on Power7 IH
Anshul Mittal, Nikhil Jain, Thomas George, Sameer Kumar and Yogish Sabharwal. Collective Algorithms for Sub-communicators
Andrea Pietracaprina, Geppino Pucci, Matteo Riondato, Francesco Silvestri and Eli Upfal. Space-Round Tradeoffs for MapReduce Computations
12:55 − 14:00 Lunch
15:00 − Excursion and Social Dinner

Thursday June 28th

08:45 − 09:45 Keynote Address 2
Michael Gschwind. Blue Gene/Q: Design for Sustained Multi-Petaflop Computing
09:45 − 10:10 Session 7: Workloads
Chair: Gianfranco Bilardi, University of Padova, Italy
Wayne Joubert and Shiquan Su. An Analysis of Computational Workloads for the ORNL Jaguar System
10:10 − 10:50 Coffee Break
10:50 − 12:30 Session 8: Memory Hierarchies and Interconnects
Chair: Dhabaleswar (DK) Panda, Ohio State University, USA
Nagendra Gulur, Manikantan R, Mahesh Mehendale and Govindarajan R. Multiple Sub-Row Buffers in DRAM: Unlocking Performance and Energy Improvement Opportunities
Yasuo Ishii, Mary Inaba and Kei Hiraki. Unified Memory Optimizing Architecture: Memory Subsystem Control with a Unified Predictor
Dongyuan Zhan, Hong Jiang and Sharad C. Seth. Locality & Utility Co-optimization for Practical Capacity Management of Shared Last Level Caches
Keith Underwood and Eric Borch. Exploiting Communication and Packaging Locality for Cost-effective Large Scale Networks
12:30 − 14:00 Lunch
14:00 − 15:40 Session 9: GPUs and Parallel Programming
Chair: Keshav Pingali, University of Texas at Austin, USA
Paruj Ratanaworabhan, Martin Burtscher, Darko Kirovski and Benjamin Zorn. Hardware Support for Enforcing Isolation in Lock-Based Parallel Programs
Justin Holewinski, Louis-Noel Pouchet and P. Sadayappan. High-Performance Code Generation for Stencil Computations on GPU Architectures
John W. Romein. An Efficient Work-Distribution Strategy for Gridding Radio-Telescope Data on GPUs
Oded Green, Robert McColl and David Bader. GPU Merge Path - A GPU Merging Algorithm
15:40 − 16:20 Coffee Break
16:20 − 18:00 Session 10: GPUs, CPUs, and Linear Algebra
Chair: Hironori Nakajo, Tokyo University A&T, Japan
Jungwon Kim, Sangmin Seo, Jun Lee, Jeongho Nah, Gangwon Jo and Jaejin Lee. SnuCL: An OpenCL Framework for Heterogeneous CPU/GPU Clusters
Bor-Yiing Su and Kurt Keutzer. clSpMV: A Cross-Platform OpenCL SpMV Framework on GPUs
Fengguang Song, Stanimire Tomov and Jack Dongarra. Enabling and Scaling Matrix Computations on Heterogeneous Multi-Core and Multi-GPU Systems
Jiajia Li and Guangming Tan. Experience of Optimizing DGEMM on a Heterogeneous Architecture with CPU and ATI-GPU

Friday June 29th

08:30 − 10:10 WT 5
10:10 − 10:50 Coffee Break
10:50 − 12:30 WT 6
12:30 − 14:00 Lunch
14:00 − 15:40 WT 7
15:40 − 16:20 Coffee Break
16:20 − 18:00 WT 8