Publications

Accelerating Sparse Deep Neural Networks on FPGAs

Deep neural networks (DNNs) have been widely adopted in many domains, including computer vision, natural language processing, and …

Update on Triangle Counting on GPU

This work presents an update to the triangle-counting portion of the subgraph isomorphism static graph challenge. This work is …

Update on k-truss Decomposition on GPU

In this paper, we present an update to our previous submission on k-truss decomposition from Graph Challenge 2018. For single GPU …

Evaluating Characteristics of CUDA Communication Primitives on High-Bandwidth Interconnects

Data-intensive applications such as machine learning and analytics have created a demand for faster interconnects to avert the memory …

Collaborative (CPU + GPU) Algorithms for Triangle Counting and Truss Decomposition

In this paper, we present an update to our previous submission from Graph Challenge 2017. This work describes and evaluates new …

SCOPE: C3SR Systems Characterization and Benchmarking Framework

This report presents the design of the Scope infrastructure for extensible and portable benchmarking. Improvements in high-performance …

NUMA-Aware Data-Transfer Measurements for Power/NVLink Multi-GPU Systems

High-performance computing increasingly relies on heterogeneous systems with specialized hardware accelerators to improve application …

Heterogeneous Application and System Modeling

With the end of Dennard scaling, high-performance computing increasingly relies on heterogeneous systems with specialized hardware to …

A Fast and Massively-Parallel Solver for Multiple-Scattering Tomographic Image Reconstruction

We present a massively-parallel solver for large Helmholtz-type inverse scattering problems. The solver employs the distorted Born …

Rebooting the Data Access Hierarchy of Computing Systems

In this paper, we present our view of massively-parallel heterogeneous computing for solving large scientific problems. We start by …

Thoughts on Massively-Parallel Heterogeneous Computing for Solving Large Problems

In this paper, we present our view of massively-parallel heterogeneous computing for solving large scientific problems. We start by …

Scalable Parallel DBIM Solutions of Inverse-Scattering Problems

We report scalable solutions of inverse-scattering problems with the distorted Born iterative method (DBIM) on a large number of …

Comparative Performance Evaluation of Multi-GPU MLFMM Implementation for 2-D VIE Problems

We compare multi-GPU performance of the multilevel fast multipole method (MLFMM) on two different systems: A shared-memory IBM S822LC …

RAI: A Scalable Project Submission System for Parallel Programming Courses

A major component of many advanced programming courses is an open-ended “end-of-term project” assignment. Delivering and evaluating …

Large Inverse-Scattering Solutions with DBIM on GPU-Enabled Supercomputers

We report inverse-scattering solutions on supercomputers involving large numbers of graphics processing units (GPUs). The …

WebGPU: A Scalable Online Development Platform for GPU Programming Courses

The popularity of computer science classes offered through Massive Open On-line Courses (MOOCs) creates both opportunities and …

Adaptive Cache Bypass and Insertion for Many-Core Accelerators

Many-core accelerators, e.g. GPUs, are widely used for accelerating general-purpose compute kernels. With the SIMT execution model, …