Tyler Muth’s Blog

Technology with a focus on Oracle, Application Express and Linux

A Little Hard Drive History and the Big Data Problem

Posted by Tyler Muth on November 2, 2011

Most of the “Big Data Problems” I see are related to performance. Most often these are long-running reports or batch processes, but they also include data loads. I see a lot of these because one of my primary roles is leading Exadata “Proof of Value” (POV) engagements with customers. It’s my observation that over 90% of the time, the I/O throughput of the existing system is grossly under-sized. To be fair, the drives themselves in a SAN or NAS are not the bottleneck; more often than not it’s the 2x 2 Gbps or 2x 4 Gbps Fibre Channel connection to the SAN. However, the metrics for hard drive performance and capacity over the last 30 or so years were a lot easier to find, so that’s what this post is about.

In short, our capacity to store data has far outpaced our ability to process that data. Here’s an example using “personal” class drives to illustrate the point. In the year 2000 I could buy a Maxtor UDMA 20 GB hard drive connected locally via (P)ATA, which provided roughly 11 MB/s of throughput (the drive itself might be limited to a bit less, but not much). Today, I can buy a 3 TB Seagate Barracuda XT ST32000641AS that can sustain 123 MB/s of throughput. So, let’s say I need to scan (search) my whole drive. 11 years ago that would take 31 minutes. Today it would take 7 hours! What?!?! The time it takes to search the data I can store has increased by a factor of 14x. How can this be? What about Moore’s Law? Sorry, that only works for processors. Oh right, I meant Kryder’s Law. Kryder’s Law states that magnetic disk areal storage density doubles annually, and it says nothing about performance. Actually, that phenomenon is half of the problem: capacity keeps doubling while throughput does not.
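To put numbers on that, here’s a quick back-of-the-envelope calculation using the two drives above (a sketch of my own arithmetic, not a benchmark; capacities are in binary MB):

    # Rough full-drive scan time = capacity / sustained throughput
    scan_minutes () {
      local capacity_mb=$1 throughput_mbs=$2
      awk -v c="$capacity_mb" -v t="$throughput_mbs" 'BEGIN { printf "%.0f minutes\n", c / t / 60 }'
    }

    scan_minutes 20480   11    # 2000: 20 GB Maxtor at ~11 MB/s    -> ~31 minutes
    scan_minutes 3145728 123   # 2011: 3 TB Barracuda at ~123 MB/s -> ~426 minutes (~7 hours)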

I did a little research and found some relevant data on seagate.com, tomshardware.com, and of course The Wayback Machine. Now on to the data and some graphs:

Drive                     Year  Capacity              Throughput  Capacity Change         Throughput Change  Time to Read Full Drive (s)
Seagate ST506             1979  5 MB (0.005 GB)       5 MB/s      1x (0%)                 1x (0%)            1
Rodime R032               1983  10 MB (0.010 GB)      0.6 MB/s    2x (100%)               0.1x (-88%)        16
Seagate ST3550A           1993  452 MB (0.441 GB)     11.1 MB/s   90x (8,940%)            2x (122%)          41
Quantum Fireball ST3 2A   1998  3,276 MB (3.2 GB)     7.6 MB/s    655x (65,436%)          2x (52%)           431
IBM DTTA-351010           1999  10,240 MB (10 GB)     9.5 MB/s    2,048x (204,700%)       2x (90%)           1,078
Seagate Cheetah X15       2001  18,432 MB (18 GB)     29 MB/s     3,686x (368,540%)       6x (480%)          636
Seagate Cheetah X15.3     2003  36,864 MB (36 GB)     63.6 MB/s   7,373x (737,180%)       13x (1,172%)       580
Seagate Cheetah 15k.6     2008  460,800 MB (450 GB)   164 MB/s    92,160x (9,215,900%)    33x (3,180%)       2,810
Seagate Cheetah 15k.7     2011  614,400 MB (600 GB)   204 MB/s    122,880x (12,287,900%)  41x (3,980%)       3,012
Seagate Pulsar XT.2 SSD   2011  409,600 MB (400 GB)   360 MB/s    81,920x (8,191,900%)    72x (7,100%)       1,138

[Chart: Time in Seconds to Read Full Drive]

[Chart: Capacity vs Throughput]

So, what does all of this mean? Well, to me it means that if you’re architecting a data warehouse or even a small data mart, make sure you focus on the storage. Over and over again I get drawn into discussions about the minutiae of chip speeds or whether Linux is faster than Solaris, yet when I ask about the storage the answer is almost universally “we’ll throw it on the SAN”. OK, how many HBAs and at what speed? Is the SAN over-utilized already? etc, etc, etc.

So, how does this relate to Exadata? The central focus of Exadata is I/O throughput. By including processing power in the storage servers, Exadata has a fixed relationship between capacity and throughput. As you add capacity, you add throughput.

Posted in Exadata, Oracle, Uncategorized | 11 Comments »

Dropbox for Servers

Posted by Tyler Muth on September 29, 2011

I love dropbox. Let me say that again: I love dropbox! I’ve been using it for well over a year now and it’s the best file synchronizing / sharing service I’ve found (and I’m certainly not alone in that opinion). I use it on my laptop, my home desktop, my home server, my iPhone, my iPad, and, with selective sync, on my wife’s laptop. All of my important files (up to the 50 GB I pay for) are synced between all of those places all of the time. I don’t send 20 MB email attachments (which completely annoy me); I send links to download from dropbox. The list goes on and I’m now on a bit of a tangent.

Server Use Case

I work on a lot of shared servers, such as for POVs (proof of value, i.e. benchmark) engagements, and most of them are Linux-based (I also sometimes use Amazon EC2 instances to teach APEX classes). I found myself spending a lot of time transferring files to and from the server, such as SQL scripts, AWR reports, and SQL Monitoring reports, via scp (sometimes rsync). I didn’t want to set up my dropbox account on the server as well since it has all of my personal files too. Then I had a bit of an epiphany (and I’m sure I’m not the first one to think of this):

  1. Create a 2nd, free dropbox account (2 GB of free space)
  2. Set up that account on the Linux server using the CLI version of dropbox located here. Dropbox does have a formal, supported Linux client, but there were a number of package dependencies and specific versions that were more challenging to work around than it was worth (a rough sketch of the headless setup follows this list).
  3. Share a folder between my primary dropbox account and my 2nd dropbox account.
Since the secondary dropbox account only sees the folders I share with it, anyone else who happens to be on the server won’t have access to my personal files. Whatever files I add to that folder on my laptop will almost instantly show up on the server and vice versa.
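For reference, the server side of this looks roughly like the following (a sketch based on Dropbox’s published headless-Linux instructions, which may have changed since this was written; double-check the download URL and the dropbox.py commands before relying on them):

    # Download and unpack the headless Dropbox daemon into ~/.dropbox-dist
    cd ~ && wget -O - "https://www.dropbox.com/download?plat=lnx.x86_64" | tar xzf -

    # Start the daemon; on first run it prints a URL you visit to link this
    # machine to the secondary (free) Dropbox account
    ~/.dropbox-dist/dropboxd &

    # The separate dropbox.py helper script can then report sync status
    ./dropbox.py status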

Security Caveat

Having co-authored a book on security (“Applied Oracle Security”), it’s clearly one of my primary interests. While dropbox does encrypt your files at rest on its servers, does use TLS to encrypt them on the network, and has policies and internal controls in place to prevent its own employees from decrypting them without authorization, it’s still possible for them to access your data: they store, and can access, the encryption keys. If the government compels them to decrypt your files, they can and will. So, if you’re storing your tax records or corporate intellectual property, I strongly encourage you to use some form of client-side encryption. Personally, I use a truecrypt file (which I mount as a disk) for super-sensitive stuff within dropbox. Actually, my whole dropbox folder sits on a 50 GB truecrypt volume, which prevents someone who steals my laptop from getting to all of my files, but does nothing to stop dropbox employees from accessing them. I then create small (1-30 MB) truecrypt files / volumes within dropbox to encrypt anything that is super-sensitive before it leaves my laptop. So, my layers are: Big Truecrypt Volume > Dropbox Folder > Little truecrypt volumes for any sensitive files.

Posted in Linux, Oracle | 4 Comments »

Off-Topic: Avoid Levana Baby Monitors

Posted by Tyler Muth on September 23, 2011

I know this is completely off topic and I haven’t had time to blog in a while, but there’s been a new addition to our family. I know there are a lot of parents out there, and after the experience I’ve had with Levana, I’ll do everything in my power to make sure others avoid their products. Here’s a link to my review on amazon: http://sn.im/levana-review. I’ll also try to post some positive product reviews soon to offset this negativity…

Posted in Uncategorized | 8 Comments »

Static Version of rlwrap for sqlplus

Posted by Tyler Muth on August 31, 2011

Quick Summary

If you don’t have root access to a machine, it can be challenging to compile rlwrap (a nice companion to sqlplus). You can compile it on another machine / VM, then use a utility called statifier to produce one binary that includes all of the required dependencies. If you want to just cut to the chase, you can download the version I compiled for OEL 5.6 x64 here.

How to Compile

If you don’t trust what I’ve compiled (and you probably shouldn’t if you care about security), here are the steps to compile your own version.

On a machine or VM that you control running a comparable Linux distribution:

  1. Download and compile rlwrap.
  2. Download and compile statifier.
  3. Compile your static version of rlwrap (a consolidated sketch of the whole build follows this list):
    # statifier /usr/local/bin/rlwrap /tmp/rlwrap_static_x64
Copy the file to your locked-down server (where you don’t have root), then:
  1. Put it in some logical directory such as ~/local/bin
  2. chmod +x ~/local/bin/*
  3. I create a symlink for it (but you could just rename it if you want):
    ln -s ~/local/bin/rlwrap_static_x64 ~/local/bin/rlwrap
  4. Edit ~/.bashrc and add it to your path:
    export PATH=$PATH:~/local/bin
    alias sqlplus='rlwrap sqlplus'
  5. source ~/.bashrc
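Putting the build side together, the flow on the machine where you do have root looks roughly like this (a sketch; version numbers are placeholders and both projects’ READMEs are the authority on the exact build steps):

    # On the build machine (root available); x.y are placeholder versions
    tar xzf rlwrap-x.y.tar.gz
    cd rlwrap-x.y && ./configure && make && sudo make install   # -> /usr/local/bin/rlwrap
    cd ..

    tar xzf statifier-x.y.tar.gz
    cd statifier-x.y && make && sudo make install
    cd ..

    # Bundle rlwrap plus the shared libraries it needs (readline, ncurses, ...)
    # into one self-contained binary you can copy to the locked-down server
    sudo statifier /usr/local/bin/rlwrap /tmp/rlwrap_static_x64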

More Details

You might be saying, “but Tyler, surely your sys admins trust you with root?”. No, they don’t, and I’m fine with that. I work on a lot of Exadata POVs where we have absurdly tight timeframes. The machines are managed by another group (a group which I have a tremendous amount of respect for). So, I never get root, but that also limits my liability to some degree if something should go wrong.
Yes, I know you can compile the required libraries (gnu readline and ncurses) and set compiler flags to point to different directories for these libraries, but something always goes wrong and I’m tired of hacking around with it.

Posted in Linux, Oracle | 6 Comments »

My Brief Review of “Expert Oracle Exadata”

Posted by Tyler Muth on August 12, 2011

I purchased the Alpha version of Expert Oracle Exadata (amazon link) several months ago, written by Kerry Osborne, Tanel Poder and Randy Johnson. The print copy is still in pre-order (shipping very soon), but you can order the ebook from apress here. My primary role these days is talking about and working with Exadata in customer presentations, classes, and POVs. While I’m not in the same league as the real Exadata experts such as the authors, I’ve worked with Exadata enough over the past year or so to have a good sense about what you need to know.

Expert Oracle Exadata (Image from apress.com)

This book is simply outstanding. The concepts are clear and concise, and ordered in a nice logical progression. The language has a very approachable, conversational tone. Most importantly, the book is filled with examples that serve to either reinforce or prove the concept of that section. In case you are concerned that there is a lot of marketing “fluff”, forget it. None of the authors work for Oracle, so they are completely free to speak their minds (so are Oracle employees; I’m just emphasizing the point here). Their primary motivation is to educate, and they clearly highlight some of the “issues” you can encounter in the real world with Exadata (yes, they exist).

Additionally, the comments from Kevin Closson that appear throughout the production release of the book as “Kevin Says” are alone worth the cost of the book. Kevin was one of the architects of Exadata and there are very few people (if any) who know more about Exadata, from the high-level concepts down to the code itself. He also tends to tell it like it is, keeping the facts paramount and all other considerations secondary. On a related note, Kevin was a great mentor to me over the last year or so when I got thrown into the Exadata world. I had no clue about hardware (“what’s an HBA?”), nor much experience with data warehousing. I can’t thank him enough for his time and effort.

In my opinion, this is by far the best source of information out there on Exadata. Actually, I usually recommend that people first watch Kevin’s webcast entitled “Oracle Exadata Storage Server Technical Deep Dive. Part I” (I also made an iPhone-friendly version of it here with chapter markers), as it gives you a one-hour intro to the concepts. After watching that, buy (and read) the book.

To the authors, thank you for thanking me! They added a nice thank you section to the “Unofficial Editors” in which I was mentioned by name. This was a very nice touch and I sincerely appreciate it. They helped me a whole lot more than I helped them.

Posted in Exadata, Oracle | 5 Comments »

Logger 1.4.0 Released

Posted by Tyler Muth on August 11, 2011

This is just a minor update to logger (project page here) to fix a bug in detecting DB Version 11.2 on RAC.

One important point to note is that in APEX 4.0 (or possibly 4.1), the APEX team removed the “create any context” priv when creating a schema associated with provisioning a workspace. Logger needs this priv. So, this means you won’t be able to simply install logger through the SQL Workshop without granting this priv to the schema owning logger. The biggest impact will be on hosted instances such as apex.oracle.com as you can’t simply ask someone to grant this priv to you (and please don’t try). I will try to see if it’s easy to work around this issue and what the implications are in the next few weeks.
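If you’re on an instance you control (i.e. not a hosted one), the workaround is simply to grant the privilege to the owning schema before installing logger. A minimal sketch, where LOGGER_OWNER is a placeholder for your schema name:

    sqlplus / as sysdba <<'EOF'
    -- LOGGER_OWNER is a placeholder for the schema that will own logger
    grant create any context to logger_owner;
    exit
    EOF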

Posted in APEX, Oracle, PLSQL | 3 Comments »

AWR Formatter 1.6 Released

Posted by Tyler Muth on July 12, 2011

I made some minor updates to my previously released Google Chrome Extension called AWR Formatter. Enhancements include:

  • Performance increased by 3x in rendering large AWR reports.
  • Added new tabs to SQL Popup for Query Plan, Color SQL, and AWR SQL Report

The new options in the SQL Popup expose features of the AWR repository that I don’t think many people are aware of, but I think are really useful.

Installation

If you already have it installed, it should update automatically and the version will say 1.6. If you do not have it installed:

I hope to add support for RAC reports soon…

Posted in AWR, Oracle | Tagged: , , , | 42 Comments »

awrload.sql > ORA-12899: value too large for column….

Posted by Tyler Muth on June 15, 2011

Today I tried to load exported AWR data from a recent Exadata POV via $ORACLE_HOME/rdbms/admin/awrload.sql and received the following error:

ORA-12899: value too large for column "SYS"."WRH$_ACTIVE_SESSION_HISTORY"."MODULE"

Searched all over and didn’t find an answer (including MOS). The issue was that the data was exported from an 11.2.0.2 database and I was trying to import into 11.2.0.1. I don’t really expect that to work flawlessly, but a better error would have been nice. Patched my local DB up to 11.2.0.2 and everything worked great. Hopefully google will lead someone here and save them some time…
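A quick sanity check before running awrload.sql is to confirm the target database is at the same version (or higher) as the one the dump was exported from; a minimal sketch, run on the target (and on the source before exporting):

    sqlplus / as sysdba <<'EOF'
    select banner from v$version;
    exit
    EOF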

Posted in Oracle | 2 Comments »

AWR Formatter

Posted by Tyler Muth on April 20, 2011

I’ve found myself looking at a lot of AWR Reports lately and doing the same calculations on them over and over: converting gets / reads / bytes to KB / MB / GB / TB, converting seconds to hours, minutes, and seconds, etc. So, I wrote a few lines (~25) of JavaScript (using jQuery) to do some of this math for me. Like many projects, it quickly grew and grew. So, 2,000+ lines of JavaScript later, I present to you a Google Chrome extension I’ve creatively entitled “AWR Formatter”.

It will not tune your database nor end world hunger, it just makes AWRs a lot easier to read. I did attempt to make some “observations”, but I certainly want to be careful with these. Currently it works with 10.2+ (maybe 10.1) single-node reports. It does not work on RAC / Global reports or on compare period reports but I’m going to start on those soon.

Thanks to Tom Kyte, Graham Wood, Robert Freeman and Tom Roach for your contributions.

Installation

Features (screencast at the end)

  • “Smart Text” conversion of gets / reads / bytes to KB / MB / GB / TB. You can click the orange text to cycle through these units
  • Medium size tables are sortable
  • Large tables are searchable
  • All documented wait events are “hot” and local definitions are included
  • Nicely formatted SQL Full text popup
  • Combined view of key Top SQL sections
  • “Observations” including:
    • Time period of AWR > 60 minutes
    • Hidden / undocumented parameter use
    • Highlight memory recommendations from AWR
  • Link to MOS note for “PX” events
  • Link to documentation for “gc” events
  • I/O Graphs of tablespace activity
  • Updates to the extension are delivered automatically as I release new versions
  • I am not collecting any data or sending any data to myself or anywhere else. You can review the code anytime via the instructions in the last paragraph
You can watch a screencast of it in action here. If you need a “demo” AWR Report, you can download this report from an Exadata 1/4 rack. As always, feedback or enhancement requests are welcome. If you want to view the source, just change the file extension from .crx to .zip and unzip it.
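Concretely, viewing the source is just (the filename is a placeholder for whatever your browser saved):

    cp awr_formatter.crx awr_formatter.zip
    unzip awr_formatter.zip -d awr_formatter_src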

Posted in Exadata, Oracle | 58 Comments »

APEX is 57.4x Faster on Exadata

Posted by Tyler Muth on April 12, 2011

…is just about as useful (and accurate) as saying APEX is just as fast on my laptop as it is on Exadata. However, it makes for a great title. Now that I hopefully have your attention, let’s talk about the reasons for this post. As many of you know, APEX used to be my primary specialty, but I’ve changed roles a bit over the last year at Oracle and my primary focus is now Exadata. I’ve received an increasing number of questions about APEX on Exadata lately, so I thought I’d put together some thoughts on them. Additionally, I’ve had a lot of Exadata questions recently that lead me to the conclusion that it’s not as well understood as I thought, at least within the groups of people I regularly interact with. Contrary to popular belief, it’s not a mythological beast powered by fairy dust and unicorn blood. It’s a very well engineered solution that solves a number of exceptionally challenging problems. In my opinion, the more you understand about it, the more you’ll appreciate it.

Does APEX Work on Exadata?

Yep, Exadata is just Oracle. Let me say that again: it’s just Oracle. Exadata runs an 11.2 Database on Linux x64. It’s the exact same binary install as the “generic” Linux x64 11.2 binaries you download from OTN. So, if your code / app runs on 11.2, it runs on Exadata. There are a few exceptions to this rule for apps that don’t play well with ASM. So, more accurately, if it runs on 11.2 on Linux x64 with ASM and RAC (optional, but desirable), it runs on Exadata. The APEX Dev Team (my old friends and colleagues) did absolutely nothing to port their code to Exadata. I’ve run a number of customer benchmarks with customers’ data and queries and have yet to make a single change to their queries or structures to make them work on Exadata. The only thing I’ve done is to set indexes invisible in Data Warehouse benchmarks, but that was more for fun…more on that later.

What Problems Does Exadata Solve?

The most obvious problem it solves is the I/O bottleneck. While I’m not in the same league as many of the well-known performance architects in the Oracle community, I spend a fair amount of time talking to customers about performance problems. In my estimation, the vast majority of the time, the performance problems I encounter come down to I/O. Most DBAs (myself included) don’t know nearly enough about storage. Compounding that problem is the fact that there are often communication “issues” between storage groups and the DBAs.

Let’s talk about how we get to data for a second. I’ll try to use a Tom Kyte-style example. Take a phone book and assume it has an index (paper style) in the back that tells you which names are on which pages. If you wanted to count all of the names in the phone book, it would be much quicker to just scan through all of the names from beginning to end than it would be to go to the index to find each and every name. This is an example of why full table and partition scans are often faster for data warehouse queries than index-based access paths. Flip that around and ask me for the phone number of the “Muths”, and it would be much faster to look at the index, find out what page the “Mu”s start on, and go to that page. Think of this as an OLTP-style query.
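In SQL terms, the two access patterns in the phone book analogy look something like this (the phone_book table, its columns, and the connect string are made up purely for illustration):

    sqlplus scott/tiger <<'EOF'
    -- Count every name: reading the whole table front to back (full scan) wins
    select count(*) from phone_book;

    -- Find one number: hop through the index on last_name, then fetch one row
    select phone_number from phone_book where last_name = 'Muth';
    exit
    EOF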

So, I/O throughput is particularly important in DSS and Data Warehousing environments because you’re scanning through whole tables and partitions. However, the vast majority of the customers I talk to are running a SAN attached via 2 or 4 Gbps (gigabits per second) Fibre Channel. That connection provides 200 or 400 MB/s (megabytes per second) respectively. If you want to run a full scan on a 200 GB table and you’re connected via 4 Gb FC, some simple math will tell you it will never run faster than 512 seconds, or about 8.5 minutes. Now, compare that to the disk-only 25 GB/s (gigabytes per second) scan rate of an X2-2 full rack with High Performance disks, and we can now scan that table in 8 seconds. Our scan rate jumps to 75 GB/s with Flash Cache, but I’ll stick to disk-only numbers for scan operations. It’s easy to see how a query in this example could run 64x faster on Exadata, which supports my claim in the title of this post. The results I’ve obtained in POVs I’ve personally worked on have been incredible. On several occasions I’ve had to prove to customers that the results were real, especially in the case of I/O intensive operations.
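The math behind that 512-second floor is a handy rule of thumb: table size divided by pipe throughput. A quick sketch (my own arithmetic; plug in your own numbers):

    # Lower bound on a full scan in seconds = table size / scan throughput
    scan_floor_seconds () {
      local table_gb=$1 throughput_mbs=$2
      awk -v g="$table_gb" -v t="$throughput_mbs" 'BEGIN { printf "%.0f seconds\n", g * 1024 / t }'
    }

    scan_floor_seconds 200 400     # 200 GB over 4 Gb FC (~400 MB/s)               -> 512 seconds
    scan_floor_seconds 200 25600   # 200 GB at 25 GB/s (X2-2 full rack, disk only) -> 8 seconds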

You don’t need to have a 10 TB database to have I/O problems either. The last POV I ran was on a 500 GB data set (including indexes). The database is a shared resource across many organizations and applications. As they’ve built more applications on this database, their indexing strategy hasn’t exactly kept pace. Their top wait events from statspack are mostly I/O related. Additionally, they copy this database nightly to a mirror server that they use for ad-hoc reporting, which as you might expect has even more I/O issues. It’s hard to index for ad-hoc…

What does this have to do with APEX? I’ve seen (and written) plenty of APEX apps that are front-ends to a data warehouse. If you are working with large data sets, doing a lot of aggregation, or allowing a lot of ad-hoc queries, traditional index strategies are often difficult to develop and of marginal value. In many cases like these, the best strategy for performance is to improve the I/O of your system.

So We Don’t Need Indexes Anymore?

I’ve heard this concept from far too many people not to address it. For Data Warehouse / DSS style queries, it is often the case with Exadata that the optimizer will choose a full table or partition scan access path, and consequently not use indexes. In several Data Warehouse POVs I’ve simply set all indexes invisible without negatively impacting performance, since the optimizer wasn’t using them anyway. You still need indexes for primary keys, uniqueness, etc., but I think it’s fair to say that implementing a Data Warehouse on Exadata will allow you to eliminate many (dare I say “most”) of your indexes. This allows you to reclaim a lot of disk space and certainly makes data loads faster.
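For what it’s worth, flipping indexes invisible for that kind of test is a one-liner per index. Here’s a sketch that just generates the statements for a schema (SALES_DW is a placeholder; review the output, and obviously don’t try this on a busy OLTP system):

    sqlplus / as sysdba <<'EOF'
    -- SALES_DW is a placeholder schema; spool and review the output before running it
    select 'alter index ' || owner || '.' || index_name || ' invisible;'
      from dba_indexes
     where owner = 'SALES_DW';
    exit
    EOF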

However, this concept has nothing to do with OLTP style queries! If you want to retrieve a row based on its primary key, which do you think is faster: 1) traversing a few blocks in a b-tree index to find the rowid, then plucking the single block that contains your row from disk (or RAM), or 2) scanning the whole multi-gigabyte table? We still need / want indexes for OLTP. Dropping all of the indexes on a busy Exadata-hosted OLTP application would be disastrous. This also applies to the APEX builder itself. Those tables are heavily indexed because, aside from some of the administrative reports, all of the queries go after a few specific rows to render your page.

But I Thought Storage Indexes Replaced Traditional Indexes

Uh, no. Storage indexes are often misunderstood. They are unique to Exadata and implemented automatically by the storage cells. I actually like to think of them as anti-indexes. They store the high and low values of number and date columns for a storage unit (~1 MB). As we are scanning through the table or partition, and the query has a predicate that filters on one of these columns, we know we can skip all of the blocks in the storage unit if the value we are looking for falls outside of those high and low values. The key word there is “scanning”. They are used in data warehouse style queries, but not OLTP style queries. Clearly, they are no replacement for traditional indexes in OLTP applications.
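If you want to see whether storage indexes actually kicked in for your session, the cell statistics expose it. A sketch (the statistic name comes from the Exadata documentation; the query itself is just boilerplate):

    sqlplus / as sysdba <<'EOF'
    -- MB of disk I/O this session avoided thanks to storage indexes
    select s.name, round(m.value/1024/1024) as mb_saved
      from v$mystat m
      join v$statname s on s.statistic# = m.statistic#
     where s.name = 'cell physical IO bytes saved by storage index';
    exit
    EOF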

Many of My APEX Apps are OLTP, Will They Run Faster on Exadata?

Potentially. Even though those apps shouldn’t be doing a lot of table scans, they will often still incur some physical I/O in the form of single-block reads. If the size of your “hot data and indexes” for all of your applications is larger than the RAM you have allocated to your SGA, you’re going to incur physical I/O. The time difference between reading a block from the buffer cache and reading it from disk is dramatic. Buffer cache reads should be in the microsecond range, whereas reads from disk can take tens of milliseconds. Your single-block read times essentially fall off a cliff when you go to disk.

Exadata added Flash Cache in the second (V2) and third (X2-2, X2-8) versions. It’s essentially solid state storage (as cards, not disks) in the storage cells that is configured as a write-through (read) cache. So, if you ask for a block and we don’t find it in the SGA, we go to the storage cells. If it’s in Flash, we can return it from there. If not, we grab the block off disk, then optionally copy it to Flash so the next time we need to read that block from “disk” we can return it from Flash instead. The single-block read times from flash are often under a millisecond. It provides a nice step down in access time between RAM and disk. A full rack has 5.3 TB of Flash Cache, so we can cache A LOT of blocks there. I’ve skipped a lot of details here, but I can tell you from the results of several OLTP POVs that I’ve worked on that it plays a very significant role in OLTP performance. There’s a nice white paper on Flash Cache here (PDF).
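If you’re curious how much of your workload is actually being served from Flash Cache, the database exposes that as well. A sketch (the statistic names are from the Exadata and database documentation; the query is just boilerplate):

    sqlplus / as sysdba <<'EOF'
    -- Reads satisfied from flash vs. total physical read requests
    select name, value
      from v$sysstat
     where name in ('cell flash cache read hits', 'physical read total IO requests');
    exit
    EOF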

APEX is Only for Departmental Applications, Why Would I Run it on Exadata?

Anybody connected to the APEX community knows that this has a lot more to do with product positioning than it does with actual capabilities. There are plenty of enterprise-class APEX apps out there. One great example is the new Oracle Store. APEX is a great front-end to database-centric applications. Exadata is a great platform for databases. Enough said.

Posted in APEX, Exadata | Tagged: , , | 5 Comments »

 