SSRC Retreat Highlights

Back in May, I had the pleasure of attending the SSRC Retreat. SSRC is short for Storage Systems Research Center. Sponsored by the UC Santa Cruz Computer Engineering department, the SSRC was formed to help Computer Engineering graduate students build working relationships with companies and other entities in the computer industry, both private and government. I got involved with the SSRC because my PhD thesis advisor (Dr. Ahmed Amer) received his PhD from UC Santa Cruz and is currently an advisor to the SSRC.

The purpose of the SSRC Retreat is to provide a venue for graduate students to present their current work. I was very impressed by the students’ presentations, and each one was interesting. However, I’d like to spotlight just a few that related to my own perspective and experience.

Shingled Disk Storage

The first subject that piqued my interest was “Shingled Disk Storage Systems” by Dr. Ahmed Amer and “Emulating a Shingled Disk” by Rekha Pitchumani.

The term “Shingled Disks” is an analogy to the shingles on the roof of a house: tracks are laid down so that each one partially overlaps its neighbor. This work is driven by the desire to dramatically shrink the width of disk tracks. In fact, the tracks are so narrow that a write to a single track bleeds over into the neighboring two tracks. This is an artifact of the magnetic properties of disk drives. It seems that those pesky little electrons do not want to behave themselves.

This is all well and good if data is written once and never modified, since successive writes will only overlay unwritten disk tracks. But what about data updates? This is a big problem, and Dr. Amer and Ms. Pitchumani are investigating it. What software and algorithmic changes are required to take advantage of the Shingled Disk design? What are the benefits? I look forward to hearing from these two in the future.
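To make the update problem concrete, here is a minimal Python sketch of a toy shingled band. It is not the emulator presented at the retreat. The class name `ShingledBand`, the `overlap` parameter, and the assumption that a write clobbers the next two tracks in a single direction are all mine, chosen only for illustration.

```python
class ShingledBand:
    """Toy model of one band of shingled tracks (hypothetical, not a real drive API)."""

    def __init__(self, num_tracks, overlap=2):
        self.overlap = overlap              # assumed: tracks damaged by each write
        self.tracks = [None] * num_tracks   # None means "no valid data"

    def raw_write(self, index, data):
        """Write one track; the write bleeds over the next `overlap` tracks."""
        self.tracks[index] = data
        end = min(index + 1 + self.overlap, len(self.tracks))
        for i in range(index + 1, end):
            self.tracks[i] = None

    def update(self, index, data):
        """Update a track safely: save everything downstream, then rewrite it in order."""
        saved = self.tracks[index + 1:]
        self.raw_write(index, data)
        rewrites = 1
        for position, old in enumerate(saved, start=index + 1):
            if old is not None:
                self.raw_write(position, old)
                rewrites += 1
        return rewrites


# Sequential writes only ever damage tracks that have not been written yet.
band = ShingledBand(num_tracks=6)
for i in range(6):
    band.raw_write(i, f"d{i}")
sequential_result = list(band.tracks)

# Updating track 2 in place forces tracks 3, 4 and 5 to be rewritten as well.
rewrites = band.update(2, "X")
```

In this toy model a single logical update in the middle of the band costs four physical track writes, which is the kind of write amplification that makes in-place updates hard on a shingled design.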

Cloud Storage Reliability

The next subject that caught my eye was one that is near and dear to my heart: Cloud Storage, specifically “Improved Reliability for Cloud Storage System” by Ignacio Corderi.

I have spent some time in the OpenStack Swift cloud storage project, so I have some familiarity with this subject. Essentially, Mr. Corderi is addressing the issue of how to optimally connect disk devices in a Storage Cloud, that is, how to make adding and removing disk devices as cheap as possible. In OpenStack Swift, the disks are addressed via Consistent Hashing. Mr. Corderi has a novel approach that is more efficient. I am wondering if I can assist Mr. Corderi in implementing and modeling his research on OpenStack Swift.
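For readers who have not met Consistent Hashing before, here is a minimal Python sketch of the general technique. It is neither Swift's actual ring code nor Mr. Corderi's approach. The `HashRing` class, its `replicas` parameter, and the disk and object names are hypothetical and exist only for this example.

```python
import bisect
import hashlib


def _hash(key):
    """Map a string to a large integer position on the ring."""
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Textbook consistent-hash ring with virtual points (illustrative only)."""

    def __init__(self, disks=(), replicas=100):
        self.replicas = replicas   # virtual points per disk
        self._points = []          # sorted ring positions
        self._owner = {}           # ring position -> disk name
        for disk in disks:
            self.add_disk(disk)

    def add_disk(self, disk):
        for r in range(self.replicas):
            point = _hash(f"{disk}#{r}")
            bisect.insort(self._points, point)
            self._owner[point] = disk

    def remove_disk(self, disk):
        self._points = [p for p in self._points if self._owner[p] != disk]
        self._owner = {p: d for p, d in self._owner.items() if d != disk}

    def lookup(self, key):
        """Return the disk owning the first ring point after the key's hash."""
        idx = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


ring = HashRing(["disk1", "disk2", "disk3", "disk4"])
keys = [f"object-{i}" for i in range(1000)]
before = {k: ring.lookup(k) for k in keys}

ring.add_disk("disk5")
after = {k: ring.lookup(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

The point of the scheme is that adding `disk5` only moves the objects that now belong to `disk5`. Everything else stays where it was, so adding or removing a device does not reshuffle the whole cluster.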

Data Provenance

The next presentation of interest was “Fundamental Properties of Data Provenance” by Stephanie Jones. Ms. Jones is focusing on the long-term storage of data provenance information.

Data Provenance essentially consists of keeping track of how data is manipulated and accessed. This greatly increases data storage requirements. This work somewhat overlaps my thesis area of interest, Streaming OLAP, that is, high-speed database ingest.

ExoKernels

And finally, I would like to introduce Mr. James Larkby-Laket. His research is in the area of “ExoKernels”. The goal of an ExoKernel is to expose as much functionality as possible in user space. This goal is highly laudable, since such a kernel greatly facilitates porting software to new hardware. It is achieved mainly by mapping everything via page tables. This is similar to memory mapping, which allows files to be mapped directly into process memory. The approach is made practical by 64-bit addresses. Since 64 bits can address 16 exabytes, it will probably be a while before we have any kind of common (or reasonable) access to a system with 16 exabytes of direct-attached disk space. Thus, the ExoKernel can map all devices through page tables.
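As a small illustration of the memory-mapping analogy, the Python sketch below checks the 16 exabyte figure and then edits a file through an ordinary memory map. This is plain `mmap` on a stock operating system, not an ExoKernel. The helper `patch_file_via_mmap` and the temporary demo file are my own inventions for the example.

```python
import mmap
import os
import tempfile

ADDRESS_BITS = 64
EXBIBYTE = 2 ** 60
# A 64-bit address can name 2**64 distinct bytes, i.e. 16 binary exabytes.
addressable_eib = 2 ** ADDRESS_BITS // EXBIBYTE


def patch_file_via_mmap(path, offset, payload):
    """Overwrite bytes in a file by writing to memory instead of calling write()."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as view:   # length 0 maps the whole file
            view[offset:offset + len(payload)] = payload
            view.flush()


# Create a scratch file, patch it through the mapping, and read it back.
fd, demo_path = tempfile.mkstemp()
os.write(fd, b"hello disk")
os.close(fd)

patch_file_via_mmap(demo_path, 6, b"page")
with open(demo_path, "rb") as f:
    result = f.read()
os.remove(demo_path)
```

Once the file is mapped, the program touches it with the same slice syntax it would use for any in-memory buffer, and the page tables do the work of tying those addresses to the underlying storage.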

I have spent some time examining the hypervisors used with OpenStack Nova, namely KVM and Xen. I was not impressed with either. They were difficult to use and very hard to debug. I am interested to see if Mr. Larkby-Laket can use the ExoKernel to replace the hypervisor. This would be a huge improvement in Cloud Computing.