Find Storage Acronyms Irritating? Me Too.

Are you confused by the many acronyms used in the storage industry? Do you want to understand what they mean? Here is an opportunity to dig into the mysteries of storage technology. This first article focuses on file and object protocols such as SMB, NFS, and S3. Support for all three seems to be the nirvana many storage vendors seek to reach, but does supporting these protocols mean every product delivers the same thing?

Let’s start with my first pet peeve: CIFS and SMB. When Microsoft first released its standard file access protocol, it was known as SMB1, or CIFS. Every version after the first is referred to as SMB vX (SMB v2, SMB v3, and so on). Saying that a product supports CIFS therefore implies that only SMB v1 is supported.

Next up: scale-out versus parallel file systems. File systems started out local to a single server. As data sets grew and the demand for larger file systems increased, scale-out file systems came on the market. In a scale-out file system, multiple system nodes (servers) are aggregated into a single file system, and files may be read from any of the nodes in the cluster. The confusion comes when scale-out is treated as a synonym for parallel. Scale-out aggregates nodes into a single cluster with a single file system spanning all the nodes, but this doesn’t mean that data can be streamed across multiple nodes in the cluster simultaneously. A parallel file system implies that an operation can be carried out across multiple nodes at the same time. You may have a scale-out system that is not parallel.
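As a rough illustration of that difference, here is a toy Python sketch. It assumes a file is cut into fixed-size stripes spread over the nodes of a cluster. The function names, the stripe size, and the round-robin placement are all made up for this example and do not describe any particular product. The data layout is identical in both cases; what differs is whether the client gets its stream through one node or pulls from several nodes at once.

```python
from concurrent.futures import ThreadPoolExecutor

STRIPE_SIZE = 4  # bytes per stripe; tiny so the example is easy to follow


def place_stripes(data, num_nodes):
    """Spread fixed-size stripes round-robin over the nodes of a toy cluster."""
    nodes = [{} for _ in range(num_nodes)]
    for stripe_no, offset in enumerate(range(0, len(data), STRIPE_SIZE)):
        nodes[stripe_no % num_nodes][stripe_no] = data[offset:offset + STRIPE_SIZE]
    return nodes


def assemble(stripes):
    """Put stripes back in order to rebuild the file."""
    return b"".join(stripes[i] for i in sorted(stripes))


def read_through_one_node(nodes, entry_node=0):
    """Scale-out style access: one node serves the whole stream to the client."""
    stripes = {}
    for node in nodes:  # the entry node gathers stripes on the client's behalf
        stripes.update(node)
    return assemble(stripes), [entry_node]


def read_in_parallel(nodes):
    """Parallel style access: the client pulls stripes from every node at once."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        parts = list(pool.map(dict, nodes))  # one concurrent fetch per node
    stripes = {}
    for part in parts:
        stripes.update(part)
    contacted = [i for i, node in enumerate(nodes) if node]
    return assemble(stripes), contacted


cluster = place_stripes(b"abcdefghijkl", num_nodes=3)
print(read_through_one_node(cluster))  # same bytes, client talked to one node
print(read_in_parallel(cluster))       # same bytes, client talked to all nodes
```

Both reads return the same file. The second return value lists which nodes the client had to talk to, which is the part a data sheet rarely spells out.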

In the past two decades we have seen object storage gain market traction, especially with the AWS S3 storage service. Object storage lacks the hierarchy of a file system. In a file system, files are organized into folders or directories within the file system, and there is metadata at each level. In object storage, each object typically includes the data itself, a variable amount of metadata, and a globally unique identifier. Object storage also allows individual objects to be addressed and identified by more than just file name and file path. It adds a unique identifier within a bucket, or across the entire system, to support much larger namespaces and eliminate name collisions (in a file system, two files with the same name but different data cannot exist in the same folder or directory). Though many file systems and NAS products say they support the S3 API, that doesn’t mean they are object storage with all of its characteristics. S3 is a network protocol, like SMB and NFS; it gives access, but it doesn’t define the underlying architecture of the system. A local, scale-out, or parallel file system can each export NFS and SMB, yet they are very different in their capabilities. The same is true with S3; a NAS exporting S3 is very different from an object storage system. When evaluating the options, don’t forget that cucumbers and zucchinis may look alike but are very different vegetables.
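To make the object model a little more concrete, here is a minimal Python sketch of a flat bucket. Everything in it (the `ToyBucket` and `StoredObject` names, the `find` method, the sample keys) is hypothetical and exists only for illustration; it is not the S3 API and it is not how any real object store is implemented. It shows the three parts of an object described above (data, metadata, unique identifier) and lookup by something other than a path.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    data: bytes
    metadata: dict
    # Assumption: a random UUID stands in for the globally unique identifier.
    object_id: str = field(default_factory=lambda: uuid.uuid4().hex)


class ToyBucket:
    """Flat namespace: keys are opaque strings, so '/' has no special meaning."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data, **metadata):
        """Store data plus free-form metadata under a key; return the object's ID."""
        obj = StoredObject(data=data, metadata=metadata)
        self._objects[key] = obj
        return obj.object_id

    def get(self, key):
        return self._objects[key]

    def find(self, **wanted):
        """Return the keys whose metadata matches every requested name/value pair."""
        return sorted(
            key
            for key, obj in self._objects.items()
            if all(obj.metadata.get(name) == value for name, value in wanted.items())
        )


bucket = ToyBucket()
bucket.put("reports/2023/q1.csv", b"a,b", owner="finance", kind="report")
bucket.put("logo.png", b"\x89PNG", owner="marketing")
print(bucket.find(owner="finance"))  # lookup by metadata, not by path
```

Note that "reports/2023/q1.csv" only looks like a path. In this sketch there are no directories to create or walk; the key is just a label, which is the lack of hierarchy the paragraph above refers to.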

The last one I will mention today is snapshots. We know that snapshots are an efficient way to protect data: they provide a point-in-time state of that data. Beyond that, few distinguish between a file snapshot, a file system snapshot, and a block snapshot. So, let’s be clear:

  • A file snapshot is space-efficient versioning. It captures the block changes that were made to a file and stores them with a reference to the original file. It is a version of a file made without creating a full copy.
  • A file system snapshot is a space-efficient capture of changes at the file system level. This means that you can view the file system as a whole at a point in time. To extract individual files that have changed, the file system snapshot would be mounted and the files extracted.
  • A block snapshot is the capture of changed ones and zeros at the LUN level. Block storage systems don’t have higher-level file system visibility, so they can only capture what has changed at the block level. To extract a file system state or a file version, the snapshot would be mounted to the file system and the data then extracted.
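For the block case, the idea of capturing only what changed can be sketched in a few lines of Python. This toy assumes a copy-on-write approach, which is one common way to build snapshots but not the only one, and the `ToyVolume` class and its methods are invented for this example. The volume knows nothing about files; it only remembers the old contents of blocks that were overwritten after a snapshot was taken.

```python
class ToyVolume:
    """A toy block device with copy-on-write snapshots (illustrative only)."""

    def __init__(self, num_blocks, block_size=4):
        self.block_size = block_size
        self.blocks = [bytes(block_size)] * num_blocks  # all zeros to start
        self.snapshots = []  # one dict per snapshot: {block index: preserved bytes}

    def snapshot(self):
        """Taking a snapshot copies nothing; it only starts tracking changes."""
        self.snapshots.append({})
        return len(self.snapshots) - 1

    def write(self, index, data):
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block long")
        # Before overwriting, keep the old block for every snapshot that has
        # not yet preserved its own copy of this block.
        for preserved in self.snapshots:
            preserved.setdefault(index, self.blocks[index])
        self.blocks[index] = data

    def read_snapshot(self, snap_id, index):
        """Unchanged blocks are read from the live volume; changed ones from the snapshot."""
        return self.snapshots[snap_id].get(index, self.blocks[index])

    def preserved_blocks(self, snap_id):
        """How many blocks this snapshot actually had to keep."""
        return len(self.snapshots[snap_id])


vol = ToyVolume(num_blocks=8)
snap = vol.snapshot()
vol.write(1, b"AAAA")
print(vol.read_snapshot(snap, 1))  # the block as it was when the snapshot was taken
print(vol.preserved_blocks(snap))  # 1: only the changed block takes up space
```

Nothing in this model can say which file block 1 belonged to. That is why, as described above, a block snapshot has to be mounted to a file system before a file version can be pulled out of it.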

Why does this matter? It matters because different products implement capabilities differently but use common terminology. Without understanding the meaning behind the terms, it is too easy to fall prey to checking the box without getting the desired end result.
