Debugging Code

I would like to take some time and pontificate about a subject that is near and dear to my heart – Debugging Code.  So here are some basic principles:

1)  Developers should be responsible for debugging – not Quality Assurance.

Developers know the code – they wrote it.
The same is not true for QA.
So there is no way that QA can debug code as well as Development.

An Application Locking Facility (part 3)

In my last two blog posts, I presented the idea of an Application Locking Mechanism and I also presented a prototype.  I would now like to present the full implementation of this facility.  This is the final post in this series.

//
//  The magic number is used to help detect data corruption in the lock
//  structure.
//

#define LOCK_MAGIC              314157
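Here is a minimal sketch of how such a magic number can detect corruption. The lock structure's fields and function names below are hypothetical placeholders, not the facility's actual layout:

```c
#include <assert.h>

#define LOCK_MAGIC 314157

/* Hypothetical lock header; the real lock structure has more fields. */
typedef struct {
    int magic;          /* stamped with LOCK_MAGIC at initialization */
    int reader_count;
    int writer_held;
} app_lock_t;

/* Initialize the lock and stamp it with the magic number. */
void app_lock_init(app_lock_t *lk) {
    lk->magic = LOCK_MAGIC;
    lk->reader_count = 0;
    lk->writer_held = 0;
}

/* Return 1 if the lock structure looks intact, 0 if it was corrupted. */
int app_lock_valid(const app_lock_t *lk) {
    return lk->magic == LOCK_MAGIC;
}
```

Any code path that is handed a lock pointer can cheaply validate it before use; an overwritten magic field is a strong hint that something scribbled on the shared lock memory.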

An Application Locking Facility

In previous blog posts, I have described how the operating system solves locking problems.  Nowadays, Linux is popular, and it is a good idea to try to solve problems in user code.  This makes application code much easier to port and goes a long way toward making it platform independent.  So I would now like to describe a Locking Mechanism that is implemented entirely in user space.  So what features do we want to support?

1)  Multiple readers.
If there is no writer, then there can be multiple readers.
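The multiple-readers rule can be sketched in user-space C as follows. Only the reader side is shown; the writer side (which waits for readers to drain and signals `readable` on release) is omitted, and all names are illustrative rather than the facility's actual API:

```c
#include <assert.h>
#include <pthread.h>

typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t  readable;
    int readers;    /* number of active readers */
    int writer;     /* 1 while a writer holds the lock */
} app_rwlock_t;

void app_read_lock(app_rwlock_t *rw) {
    pthread_mutex_lock(&rw->mutex);
    while (rw->writer)                       /* wait until no writer */
        pthread_cond_wait(&rw->readable, &rw->mutex);
    rw->readers++;                           /* many readers may hold it */
    pthread_mutex_unlock(&rw->mutex);
}

void app_read_unlock(app_rwlock_t *rw) {
    pthread_mutex_lock(&rw->mutex);
    rw->readers--;
    pthread_mutex_unlock(&rw->mutex);
}
```

The key point is the `readers` counter: as long as no writer holds the lock, any number of readers can increment it and proceed concurrently.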

OLTP Database & OLAP: PhD Thesis Overview Part 3

Now that I’ve presented an overview of OLTP Database & OLAP and addressed the problems found during my research, I’ll now list the projects and papers that currently comprise my PhD Thesis.

1. A new type of OLAP hypercube that links cells to the underlying OLTP database.

This work was written in C. I used the Network Data Stream as an implementation example. The Network Data Stream is defined to be: {content, time stamp, destination ip, destination location, destination port, mail bcc, mail cc, mail file name, mail recipient, mail sender, mail subject, protocol, size, source ip, source location, source port}.
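The stream tuple above might be declared in C along these lines; the field types are assumptions, since the post does not specify the actual representation:

```c
#include <assert.h>

/* One Network Data Stream record.  Field types are illustrative guesses;
 * the original work defines the exact representation. */
typedef struct {
    char *content;
    long  time_stamp;            /* e.g. seconds since the epoch */
    char  destination_ip[46];    /* fits an IPv6 address in text form */
    char *destination_location;
    int   destination_port;
    char *mail_bcc;
    char *mail_cc;
    char *mail_file_name;
    char *mail_recipient;
    char *mail_sender;
    char *mail_subject;
    int   protocol;
    long  size;
    char  source_ip[46];
    char *source_location;
    int   source_port;
} net_stream_record_t;
```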

OLTP Database & OLAP: PhD Thesis Overview Pt 2

In my last blog post, I presented an overview of OLTP and OLAP. I would now like to address two problems with OLTP database and OLAP that I have found in my research.

Two Problems with OLTP Database & OLAP

Problem 1: The OLTP database is not linked to the OLAP hypercube. The OLAP hypercube is unable to provide query details (by design). In general, this is fine for queries that are focused on trend analysis. But what about anomaly detection, outliers, forensics, or security? In these cases, a count of 1 is significant, and we want to obtain the underlying details quickly.

PhD Thesis Overview

A PhD Thesis is not a single paper or project but rather a body of work. It is a progression of projects, papers, and research which is hopefully under the umbrella of a single unifying theme or subject.

With that said, the subject of my thesis is “Streaming OLAP”, where OLAP is OnLine Analytical Processing. OLAP is a contrast to OLTP, which is OnLine Transaction Processing. So how do OLTP, OLAP, and Streaming OLAP relate to each other?

OLTP can be thought of as traditional processing performed on relational databases. Back in the 1970s, these databases typically consisted of megabytes of data. Such database queries were highly detailed and required fast response times.

A Shared Memory File System (part 3)

In previous blog posts, I have presented the idea of a shared memory file system that is FAT based. The FAT consists of a Boot Block and a Disk Block Table. I will finish up this discussion by describing:

  • Directory Blocks.
  • Extending a FAT in place.
  • File Locking.

Shared Memory File System Directory Blocks

If a disk block is used, then it holds either a directory or a file. So how do we know which one it is? We use the second most significant bit of the disk block number as a directory flag.
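A sketch of that flag test, assuming 32-bit disk block numbers (the post does not state the width, so the bit position below is an assumption):

```c
#include <assert.h>
#include <stdint.h>

/* The second most significant bit of a 32-bit block number marks a
 * directory.  The 32-bit width and the exact bit position are guesses. */
#define DIR_FLAG ((uint32_t)1 << 30)

uint32_t mark_directory(uint32_t block) { return block | DIR_FLAG; }
int      is_directory(uint32_t block)   { return (block & DIR_FLAG) != 0; }
uint32_t raw_block(uint32_t block)      { return block & ~DIR_FLAG; }
```

Stealing a high-order bit like this is cheap: the flag travels with the block number itself, and masking it off recovers the real block index.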

A Shared Memory File System (part 2)

Last time I presented the idea of a memory file system that is FAT based. I would now like to describe the design of our FAT.

Memory File System & The Boot Block

Any decent memory file system starts with a Boot Block. This is the first disk block on the file system partition. The Boot Block contains the file system data structure. And here is what it looks like:
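The original structure listing did not survive in this copy of the post. As a stand-in, a FAT-style Boot Block might look roughly like this; every field here is an assumption based on the surrounding description, not the post's actual definition:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical Boot Block layout.  Each field is a guess based on the
 * description of a FAT with a Boot Block and a Disk Block Table. */
typedef struct {
    uint32_t magic;           /* identifies a valid file system */
    uint32_t block_size;      /* size of one disk block, in bytes */
    uint32_t total_blocks;    /* blocks in the file system partition */
    uint32_t table_start;     /* first block of the Disk Block Table */
    uint32_t table_blocks;    /* blocks used by the Disk Block Table */
    uint32_t root_dir_block;  /* block number of the root directory */
} boot_block_t;
```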

A High Performance Write File System

I would like to take some time to describe how to write an optimized, high performance file system. For simplicity's sake, I call it the Write File System, and it has three goals:

Three Goals of A Write File System

1) USER APPLICATION

We do not want any kernel code. Why? Because user code is much easier to port to a new platform, for both hardware and operating system reasons. In addition, user code is more reliable: errors don't crash the system; they just result in a core dump.

GUI Interface to the Linux Engine

I would like to spend some time talking about a subject that is near and dear to my heart – the GUI Interface to the Linux Engine. In today’s computing environment, Linux is usually used as a platform. That is, most Linux implementations primarily consist of user applications. They tend to stay out of the kernel. Why, you ask? Because hardware is rapidly changing, and it is much easier to port user applications than kernel code. Also, it is much easier to debug user applications. And lastly, user applications usually do not crash the kernel. Thus such systems are inherently more reliable.

The Non Maskable Interrupt

I would like to take some time to discuss a wonderful coding tool that is provided by a number of modern chip makers. I am (of course) talking about the Non Maskable Interrupt (NMI).

In general, there are two System Registers that are used to manage system interrupts – the Interrupt Mask Register and the Interrupt Cause Register. The Interrupt Mask Register allows the root user to disable/enable specific interrupts. This register contains a bit for each interrupt type. The Interrupt Cause Register indicates when the interrupts are ready for service. This register also contains a bit for each interrupt type.
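As a rough sketch, the service decision implied by these two registers is a simple bit test. The register layout, the bit position, and the convention that a set mask bit disables an interrupt are all assumptions for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative interrupt bit; real hardware defines the positions. */
#define TIMER_IRQ (1u << 3)

/* A maskable interrupt is serviced only when its bit is set in the
 * Interrupt Cause Register and clear in the Interrupt Mask Register.
 * The NMI is the exception: it bypasses the mask check entirely. */
int irq_should_service(uint32_t cause, uint32_t mask, uint32_t irq_bit) {
    return (cause & irq_bit) && !(mask & irq_bit);
}
```

This is exactly what makes the NMI special: it is serviced no matter what the Mask register says, which is why it is so useful for debugging a wedged system.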

Swift MySQL RESAR Part 8

This is the final installment of the eight-part Swift MySQL RESAR saga. In Part 7, we presented the performance results for Swift RESAR and the Stream Star Schema. In this post, we will complete the series by summarizing the project and attempting to draw some conclusions.

Powerful Extension to Swift Cloud Storage

In this project, we have described and demonstrated a powerful extension to Swift cloud storage: Swift RESAR. This facility greatly empowers Swift Administrators in managing large numbers of cloud devices. It also fully enables said administrators to employ mathematical models so that device reliability can be optimized.

Swift MySQL RESAR Part 7

The Swift MySQL RESAR project had less than stellar results in its first iteration. Last time I presented a new approach to Swift RESAR: the Stream Star Schema. In this post we will present results for this new approach.

Implementing Stream Star Schema to MySQL RESAR

Once the Stream Star Schema was defined, the Swift RESAR database was also implemented as a Stream Star Schema. The following is the Stream Star Schema for RESAR:

MetaData Fact Table

    1. CreateTime – string
    2. DiskletSize – int32

Swift MySQL RESAR Part 6

Last time we theorized why the RESAR Swift MySQL approach has such dismal performance results. In this post we will discuss a new approach.

B-Trees & Hashing

Specifically, we will define the stream star schema that will result in a database that is optimized for insertion performance. This performance improvement will primarily be the result of the improved attribute indexing mechanism. Many relational databases like MySQL use B-trees to implement attribute indexes. Index table insertion time thus depends on the table size and is O(log n) where n is the number of entries in a table. For the stream star schema, we proposed using Hash Tables instead of B-Trees.
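To make the contrast concrete, here is a minimal chained hash index in C. It is a sketch under assumed names, not the project's actual code; the point is that each insertion hashes the key and prepends to a bucket in O(1) expected time, independent of table size, whereas a B-tree insert costs O(log n):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 1024

/* One entry in a chained hash index: attribute value -> row id. */
typedef struct entry {
    char key[64];
    long row_id;
    struct entry *next;
} entry_t;

typedef struct {
    entry_t *buckets[NBUCKETS];
} hash_index_t;

static unsigned hash_str(const char *s) {
    unsigned h = 5381;                        /* djb2 string hash */
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h % NBUCKETS;
}

/* O(1) expected insert: hash the key, prepend to its bucket chain. */
void index_insert(hash_index_t *idx, const char *key, long row_id) {
    unsigned b = hash_str(key);
    entry_t *e = malloc(sizeof *e);
    strncpy(e->key, key, sizeof e->key - 1);
    e->key[sizeof e->key - 1] = '\0';
    e->row_id = row_id;
    e->next = idx->buckets[b];
    idx->buckets[b] = e;
}

/* Return the row id stored for key, or -1 if the key is absent. */
long index_lookup(const hash_index_t *idx, const char *key) {
    for (const entry_t *e = idx->buckets[hash_str(key)]; e; e = e->next)
        if (strcmp(e->key, key) == 0)
            return e->row_id;
    return -1;
}
```

The trade-off is that a hash index gives up ordered traversal and range queries, which is acceptable for a stream schema whose dominant operation is insertion.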

Swift RESAR Project Part 5

My last blog post presented the performance results for RESAR Swift using MySQL. I will now analyze why this approach has such dismal performance results.

Poor RESAR Swift Performance Explained

You will remember that for a cloud cluster of 1 million devices, database construction time was (on average) 0.12 seconds for a single device and reliability group. It required over 34 hours to create the entire database of 1 million devices and 1 million reliability groups.  In addition, the minimum query time was 0.000272622203333 seconds and the maximum was 0.00045933403 seconds. This database creation time was clearly excessive, and we sincerely hoped that the second RESAR Swift approach would greatly improve database performance.

Swift RESAR Project Part 4

This post is a continuation of the Swift RESAR saga. Last time I presented the design of RESAR Swift using MySQL. I would now like to present performance results for this implementation.

Swift MySQL Database Construction

The following shows MySQL database construction times for a given number of disk devices. Each disk device was partitioned into 3 disklets. A Reliability Group was also created for each disk device. Each Reliability Group consisted of 3 disklets.

Swift RESAR Project Part 3

This continues the RESAR saga. Last time I explained the project design; now I’d like to focus on how RESAR was implemented in Swift.

Two Available Swift Approaches

During our research, we realized that there were essentially two approaches available. The first approach emphasized leveraging existing code in the Python community. Swift is implemented in the Python programming language, so the Swift RESAR project is also implemented in Python. For this first approach, the primary goal was to minimize the amount of new code, thus producing timely results. The second approach, on the other hand, emphasized performance: we were willing to write new code as long as it resulted in better RESAR performance. That approach will be presented in a future blog post.