Collection and analysis of large data sets is perennially hot.  Remember Data Warehouses?  ‘Big Data’ is just the latest buzzword for this trend.  Admit it – it’s an alluring vision.  Supposedly just save enough data and apply the right tools, and insight (and money) will rain from the clouds.  Though frequently clothed in breathless hype, there is a kernel of truth here.  You can find insight in rivers of data if you have the right tools.  Organizations across a range of industries are successfully capturing and analyzing oceans of machine- and sensor-generated data with Splunk.

If you want to pan for gold in the big data river you’ll need a river, and a pan.  The river is running right at your feet.  It’s in the logs and alert streams from your servers and network gear, or your vending machines, warehouse robots, sensor networks, whatever.  And the best-known pan is Splunk.

Understanding what is happening now is only a first step.  For data to be useful it must have context.  You can’t understand today without comparing it to times past.  To see trends and anomalies you have to save a bunch of that data flowing by.  You need a reservoir.

Splunk uses a “bucket” metaphor for storage.  Incoming data lands in a “hot” bucket, and eventually rolls to “warm” and “cold” buckets as it ages.  But all three – even cold buckets – require high-performance storage such as flash or disk.  (Quantum QXS hybrid storage is perfect for this use, by the way.)  If you want to move data out of cold buckets onto cheaper storage, you can “freeze” it.  But frozen buckets can’t be used for queries or analysis.  To bring them back you have to locate the right buckets in your frozen archive, copy them into the index’s thaweddb directory, and rebuild them with the CLI.
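
If you do need to thaw, the manual steps can be scripted.  Here is a minimal sketch of the documented copy-then-rebuild workflow; the $SPLUNK_HOME location, the frozen archive path, and the bucket name are assumptions, so adjust them for your environment.

```python
#!/usr/bin/env python3
"""Minimal sketch of the manual thaw workflow; all paths below are assumptions."""
import shutil
import subprocess
from pathlib import Path

SPLUNK_HOME = Path("/opt/splunk")                    # assumed install location
FROZEN_ARCHIVE = Path("/archive/splunk_frozen")      # assumed frozen-bucket archive location

def thaw_bucket(index: str, bucket_name: str) -> None:
    """Copy a frozen bucket into the index's thaweddb directory and rebuild it."""
    src = FROZEN_ARCHIVE / index / bucket_name
    dest = SPLUNK_HOME / "var" / "lib" / "splunk" / index / "thaweddb" / bucket_name
    shutil.copytree(src, dest)                       # step 1: copy the frozen data back
    subprocess.run(                                  # step 2: rebuild the bucket so it is searchable
        [str(SPLUNK_HOME / "bin" / "splunk"), "rebuild", str(dest)],
        check=True,
    )
    # Splunk typically needs a restart before the thawed bucket shows up in searches.

if __name__ == "__main__":
    thaw_bucket("main", "db_1388534400_1388530800_16")   # hypothetical bucket name
```

Even scripted, this is still a manual, per-bucket chore – which is exactly the pain the rest of this post is about.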

High-performance storage costs money, and there is never enough of it.  Time is valuable, and manually locating and thawing buckets is extremely inconvenient.  The expedient solution is to delete the older data when you run out of space, and that is what many Splunk users do.  But this truncates your context, limiting your ability to see long-term trends like seasonal business cycles and annual growth.

Quantum’s StorNext appliances, Artico and Xcellis, are fantastic reservoirs for Splunk.  Their transparent tiering enables what we call “cold bucket extension.”  StorNext allows your cold buckets to grow beyond disk onto cheaper media, like object storage, public cloud, or even tape.  When a query needs data that has been tiered, Splunk displays a polite progress bar while StorNext retrieves the required indexes back to performance disk.  It’s automatic: no freeze/thaw needed!
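
In practice, extension mostly comes down to pointing each index’s cold bucket location at the StorNext-managed file system in indexes.conf (homePath, coldPath, and thawedPath are standard attributes there).  Here is a minimal sketch that generates such a stanza; the /stornext/splunk_cold mount point and the index name are assumptions.

```python
#!/usr/bin/env python3
"""Generate an indexes.conf stanza whose cold buckets live on a tiered StorNext mount.
The mount point and index name are assumptions."""
import configparser

STORNEXT_MOUNT = "/stornext/splunk_cold"    # hypothetical StorNext-managed file system

def cold_extended_stanza(index: str) -> configparser.ConfigParser:
    """Build a stanza: hot/warm buckets on fast local disk, cold buckets on the tiered mount."""
    conf = configparser.ConfigParser()
    conf.optionxform = str                  # keep the camelCase attribute names Splunk expects
    conf[index] = {
        "homePath": f"$SPLUNK_DB/{index}/db",               # hot/warm stay on performance disk
        "coldPath": f"{STORNEXT_MOUNT}/{index}/colddb",     # cold buckets grow onto tiered storage
        "thawedPath": f"$SPLUNK_DB/{index}/thaweddb",       # required attribute, mostly idle here
    }
    return conf

if __name__ == "__main__":
    with open("indexes.conf", "w") as f:
        cold_extended_stanza("main").write(f)
```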

This magic is possible because of StorNext’s “stub” capability.  Normally when a file is purged from the appliance disk cache (we say truncated), the file metadata remains so the file is still visible to users.  When a truncated file is accessed, StorNext copies the file data from an archive tier back to the disk cache.

But when stubbing is enabled and files are truncated, a user-defined amount of file data is retained on the appliance disk in addition to the metadata.  Files smaller than the defined thresholds are retained on disk in their entirety.  For larger files the selected amount of data from the head of the file stays on disk, while the rest is truncated.  When the cached file data is accessed, StorNext initiates a copy of the rest of the file back to the appliance disk from the archive.
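
To make that behavior concrete, here is a toy model of stub-style truncation and recall – plain Python, not StorNext code, and the 10 MiB stub size is just an assumption.  The real system kicks off a background copy of the remainder when the cached data is accessed; the sketch simplifies that to a synchronous copy when a read reaches past the resident stub.

```python
#!/usr/bin/env python3
"""Toy model of stub-based truncation and recall; an illustration of the idea, not StorNext code."""
import shutil
from pathlib import Path

STUB_SIZE = 10 * 1024 * 1024     # hypothetical threshold: keep the first 10 MiB on disk

def truncate_with_stub(cached: Path, archived: Path) -> None:
    """Archive a file, then keep only its head (the stub) in the disk cache."""
    shutil.copy2(cached, archived)               # the full copy lives on the archive tier
    if cached.stat().st_size > STUB_SIZE:
        with open(cached, "r+b") as f:
            f.truncate(STUB_SIZE)                # larger files keep only the head on disk
    # files at or below STUB_SIZE stay on disk in their entirety

def read_with_recall(cached: Path, archived: Path, offset: int, length: int) -> bytes:
    """Serve reads from the stub; recall the full file when a read goes past it."""
    if offset + length > cached.stat().st_size:  # request reaches beyond the resident data
        shutil.copy2(archived, cached)           # simplified, synchronous stand-in for the recall
    with open(cached, "rb") as f:
        f.seek(offset)
        return f.read(length)
```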

Splunk indexes consist of large files containing compressed data, and smaller control files.  The control files and initial portion of the data files must be available at disk speed or queries will fail.  But after the initial access, Splunk will patiently wait for additional data to be delivered, even if it must be retrieved from tape.  By configuring StorNext stubbing appropriately, Splunk queries run successfully even when the majority of the cold bucket index data is archived.
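
Choosing a sensible stub size is easier if you know how big the files in a typical cold bucket actually are.  Here is a small sketch that surveys one bucket and flags which files would stay fully resident under a candidate threshold; the bucket path and the 10 MiB candidate are assumptions.

```python
#!/usr/bin/env python3
"""Survey a cold bucket to help pick a stub threshold; the path below is an assumption."""
from pathlib import Path

COLD_BUCKET = Path("/stornext/splunk_cold/main/colddb/db_1388534400_1388530800_16")

def survey(bucket: Path, candidate_stub_bytes: int) -> None:
    """List every file in the bucket and flag which would stay fully resident on disk."""
    for path in sorted(bucket.rglob("*")):
        if path.is_file():
            size = path.stat().st_size
            resident = "fully resident" if size <= candidate_stub_bytes else "head only"
            print(f"{size:>12,}  {resident:<14}  {path.relative_to(bucket)}")

if __name__ == "__main__":
    survey(COLD_BUCKET, candidate_stub_bytes=10 * 1024 * 1024)   # hypothetical 10 MiB stub
```

The small control files should land in the “fully resident” column; if they don’t, raise the threshold.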

Between the cost-effective high performance of Quantum QXS hybrid storage and cold bucket extension with Artico or Xcellis, now you can afford to save a lot more Splunk index data for analysis.  That alone won’t make you a Splunk Ninja, but it can’t hurt.  And I don’t know about you, but all these water analogies are making me thirsty.  I’m going to get a drink.
