Tyler Muth’s Blog

Technology with a focus on Oracle, Application Express and Linux

Archive for the ‘Exadata’ Category

Scripted Collection of OS Watcher Files on Exadata

Posted by Tyler Muth on November 2, 2012

I’ve been working a lot with graphing DB and OS metrics in R. I find it especially useful in Exadata POVs (proof of value) to gather and graph the oswatcher vmstat files for the compute nodes and iostat for the cells. For an example, take a look at this graph (PDF, 168 KB) of what happens when you have a connection pooling issue and keep creating connections throughout the test (14,930 to be precise). It’s a nice picture of what happened to the OS as it ran out of memory, started swapping, then fell over. To quote the Real World Performance Team: “When Databases swap, Databases stop”.

Anyway, back to the point of this post. OS Watcher files are automatically collected on each db node and storage cell. It’s not too bad to get them manually on a 1/4 rack as that’s only 5 components, but what about a full X3-2 with 22 components? Or the previous POV I did with 3 X2-8’s which had 42 storage cells and 6 DB nodes? Note that support will often ask for these files when you have an issue and log an SR.

Here’s a script I wrote to collect the files, with various find options shown in the comments at the top. It also optionally takes one parameter, a name to include in the tar.gz file names so you can identify them more easily. It will create one tar.gz file for each component, so in the case of a 1/4 rack it will create 5 files.

#!/bin/bash
file_name=""
if [ $# -gt 0 ]
then
	file_name="_$1"
fi

# A few find examples
# \( -name "*.*" \) \
# \( -name "*vmstat*.*" -o -name "*iostat*.*" \) \
# \( -name "*vmstat*.*" -o -name "*iostat*.*" \) -mmin -60 \
# \( -name "*vmstat*.*" -o -name "*iostat*.*" -o -name "*netstat*.*" \) -mtime -8 \

while read line; do
	(ssh -n -q root@$line 'find /opt/oracle.oswatcher/osw/archive  \
	\( -name "*vmstat*.*" -o -name "*iostat*.*" \) \
	-prune -print0 2>/dev/null | \
	xargs -0 tar --no-recursion -czf - 2>/dev/null ' | cat >  osw${file_name}_${line}.tar.gz
)
done < /home/oracle/tyler/osw/allnodes

Note the file “allnodes” on the last line is just a file with the names of the db nodes and cells, each on its own line. It’s the same format you use for dcli. This is of course a lot easier if you’ve set up root equivalency for each component…
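
For illustration, an allnodes file for a 1/4 rack might look like the following (these hostnames are made up; substitute your own db node and cell names):

dm01db01
dm01db02
dm01cel01
dm01cel02
dm01cel03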


Posted in Exadata, Oracle | 10 Comments »

Speaking at Enkitec Extreme Exadata Expo

Posted by Tyler Muth on August 9, 2012

I’ll be speaking at the Enkitec Extreme Exadata Expo (E4), August 13-14 in Dallas, Texas (you can also attend virtually). They’ve recruited some of the top names from the community, including keynote speaker Andrew Mendelsohn, Arup Nanda, Cary Millsap, Jonathan Lewis, Karen Morton, Maria Colgan, Kerry Osborne and Tanel Põder. I left a lot of names off the list, many of whom you probably know, so take a look at the full list of speakers here. Having implemented over 80 Exadata environments, the Enkitec speakers will clearly have a lot to offer as well.

Many of the speakers are not employed by Oracle so I expect the tone and perspective to be significantly different than an Oracle hosted event.

I realize I never published this draft until this morning and this is probably pretty late notice if you want to attend in person. However, you can attend virtually and the price is exceptionally low for that option…

Posted in Exadata, Oracle | 3 Comments »

A Little Hard Drive History and the Big Data Problem

Posted by Tyler Muth on November 2, 2011

Most of the “Big Data Problems” I see are related to performance. Most often these are long-running reports or batch processes, but they also include data loads. I see a lot of these because one of my primary roles is leading Exadata “Proof of Value” (POV) engagements with customers. It’s my observation that over 90% of the time, the I/O throughput of the existing system is grossly under-sized. To be fair, the drives themselves in a SAN or NAS are usually not the bottleneck; more often than not it’s the 2x 2 Gbps or 2x 4 Gbps Fibre Channel connection to the SAN. However, the metrics for hard drive performance and capacity over the last 30 or so years were a lot easier to find, so that’s what this post is about.

In short, our capacity to store data has far outpaced our ability to process that data. Here’s an example using “personal” class drives to illustrate the point. In the year 2000 I could buy a Maxtor UDMA 20 GB hard drive connected locally via (P)ATA which provided roughly 11 MB/s of throughput (the drive might be limited to a bit less, but not much). Today, I can buy a 3 TB Seagate Barracuda XT ST32000641AS that can sustain 123 MB/s of throughput. So, let’s say I need to scan (search) my whole drive. 11 years ago that would take 31 minutes. Today it would take 7 hours! What?!?! The time it takes to search the data I can store has increased by a factor of 14x. How can this be? What about Moore’s Law? Sorry, that only works for processors. Oh right, I meant Kryder’s Law. Kryder’s Law states that magnetic disk areal storage density doubles annually and has nothing to do with performance. Well, actually this phenomenon is half the problem, since throughput isn’t doubling annually.
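
If you want to check those scan times, the arithmetic works out like this (using only the figures quoted in the paragraph above, nothing new):

-- 20 GB at ~11 MB/s vs. 3 TB at ~123 MB/s
select round((20 * 1024) / 11 / 60)             scan_time_2000_minutes,
       round((3 * 1024 * 1024) / 123 / 3600, 1) scan_time_2011_hours
  from dual;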

I did a little research and found some relevant data on seagate.com, tomshardware.com, and of course The Wayback Machine. Now on to the data and some graphs (click to enlarge graphs):

Drive                    Year  Cap (MB)  Cap (GB)  Tput (MB/s)  Cap Change  Tput Change  Cap % Change  Tput % Change  Full Read (s)
Seagate ST506            1979         5     0.005            5          1x           1x            0%             0%              1
Rodime R032              1983        10     0.010          0.6          2x         0.1x          100%           -88%             16
Seagate ST3550A          1993       452     0.441         11.1         90x           2x        8,940%           122%             41
Quantum Fireball ST3 2A  1998      3276       3.2          7.6        655x           2x       65,436%            52%            431
IBM DTTA-351010          1999     10240        10          9.5      2,048x           2x      204,700%            90%           1078
Seagate Cheetah X15      2001     18432        18           29      3,686x           6x      368,540%           480%            636
Seagate Cheetah X15.3    2003     36864        36         63.6      7,373x          13x      737,180%         1,172%            580
Seagate Cheetah 15k.6    2008    460800       450          164     92,160x          33x    9,215,900%         3,180%           2810
Seagate Cheetah 15k.7    2011    614400       600          204    122,880x          41x   12,287,900%         3,980%           3012
Seagate Pulsar XT.2 SSD  2011    409600       400          360     81,920x          72x    8,191,900%         7,100%           1138

[Graph: Time in Seconds to Read Full Drive]

[Graph: Capacity vs. Throughput]

So, what does all of this mean? Well, to me it means if you’re architecting a data warehouse or even small data mart, make sure you focus on the storage. Over and over again I get drawn into discussions about the minutiae of chip speeds or whether Linux is faster than Solaris, yet when I ask about the storage the answer is almost universally “we’ll throw it on the SAN”. OK, how many HBAs and what speed? Is the SAN over-utilized already? etc, etc, etc.

So, how does this relate to Exadata? The central focus of Exadata is I/O throughput. By including processing power in the storage servers, Exadata has a fixed relationship between capacity and throughput. As you add capacity, you add throughput.

Posted in Exadata, Oracle, Uncategorized | 11 Comments »

My Brief Review of “Expert Oracle Exadata”

Posted by Tyler Muth on August 12, 2011

I purchased the Alpha version of Expert Oracle Exadata (amazon link) several months ago, written by Kerry Osborne, Tanel Poder and Randy Johnson. The print copy is still in pre-order (shipping very soon), but you can order the ebook from apress here. My primary role these days is talking about and working with Exadata in customer presentations, classes, and POVs. While I’m not in the same league as the real Exadata experts such as the authors, I’ve worked with Exadata enough over the past year or so to have a good sense about what you need to know.

Expert Oracle Exadata (Image from apress.com)

This book is simply outstanding. The concepts are clear and concise, and ordered in a nice logical progression. The language has a very approachable, conversational tone. Most importantly, the book is filled with examples that serve to either reinforce or prove the concept of that section. In case you are concerned that there is a lot of marketing “fluff”, forget it. None of the authors works for Oracle, so they are completely free to speak their minds (so are Oracle employees, I’m just emphasizing the point here). Their primary motivation is to educate, and they clearly highlight some of the “issues” you can encounter in the real world with Exadata (yes, they exist).

Additionally, the comments from Kevin Closson that appear throughout the production release of the book as “Kevin Says” are worth the cost of the book alone. Kevin was one of the architects of Exadata and there are very few people (if any) who know more about Exadata, from the high-level concepts down to the code itself. He also tends to tell it like it is, keeping the facts paramount and all other considerations secondary. On a related note, Kevin was a great mentor to me over the last year or so when I got thrown into the Exadata world. I had no clue about hardware (“what’s an HBA?”), nor much experience with data warehousing. I can’t thank him enough for his time and effort.

In my opinion, this is by far the best source of information out there on Exadata. Actually, I usually recommend that people first watch Kevin’s webcast entitled “Oracle Exadata Storage Server Technical Deep Dive. Part I” (I also made an iPhone-friendly version of it here with chapter markers), as it gives you a one-hour intro to the concepts. After watching that, buy (and read) the book.

To the authors, thank you for thanking me! They added a nice thank you section to the “Unofficial Editors” in which I was mentioned by name. This was a very nice touch and I sincerely appreciate it. They helped me a whole lot more than I helped them.

Posted in Exadata, Oracle | 5 Comments »

AWR Formatter

Posted by Tyler Muth on April 20, 2011

I’ve found myself looking at a lot of AWR Reports lately and doing the same calculations on them over and over: converting gets / reads / bytes to KB / MB / GB / TB, converting seconds to hours, minutes, and seconds, etc. So, I wrote a few lines (~25) of JavaScript (using jQuery) to do some of this math for me. Like many projects, it quickly grew and grew. So, 2,000+ lines of JavaScript later, I present to you a Google Chrome extension I’ve creatively entitled “AWR Formatter”.

It will not tune your database nor end world hunger, it just makes AWRs a lot easier to read. I did attempt to make some “observations”, but I certainly want to be careful with these. Currently it works with 10.2+ (maybe 10.1) single-node reports. It does not work on RAC / Global reports or on compare period reports but I’m going to start on those soon.

Thanks to Tom Kyte, Graham Wood, Robert Freeman and Tom Roach for your contributions.

Installation

Features (screencast at the end)

  • “Smart Text” conversion of gets / reads / bytes to KB / MB / GB / TB. You can click the orange text to cycle through these units
  • Medium size tables are sortable
  • Large tables are searchable
  • All documented wait events are “hot” and local definitions are included
  • Nicely formatted SQL Full text popup
  • Combined view of key Top SQL sections
  • “Observations” including:
    • Time period of AWR > 60 minutes
    • Hidden / undocumented parameter use
    • Highlight memory recommendations from AWR
  • Link to MOS note for “PX” events
  • Link to documentation for  “gc” events
  • I/O Graphs of tablespace activity
  • Updates to the extension are delivered automatically as I release new versions
  • I am not collecting any data or sending any data to myself or anywhere else. You can review the code anytime via the instructions in the last paragraph

You can watch a screencast of it in action here. If you need a “demo” AWR Report, you can download this report from an Exadata 1/4 rack. As always, feedback or enhancement requests are welcome. If you want to view the source, just change the file extension from .crx to .zip and unzip it.

Posted in Exadata, Oracle | 96 Comments »

APEX is 57.4x Faster on Exadata

Posted by Tyler Muth on April 12, 2011

…is just about as useful (and accurate) as saying APEX is just as fast on my laptop as it is on Exadata. However, it makes for a great title. Now that I hopefully have your attention, let’s talk about the reasons for this post. As many of you know, APEX used to be my primary specialty, but I’ve changed roles a bit over the last year at Oracle and my primary focus is now Exadata. I’ve received an increasing number of questions about APEX on Exadata lately, so I thought I’d put together some thoughts on them. Additionally, I’ve had a lot of Exadata questions recently that led me to the conclusion that it’s not as well understood as I thought, at least within the groups of people I regularly interact with. Contrary to popular belief, it’s not a mythological beast powered by fairy dust and unicorn blood. It’s a very well engineered solution that solves a number of exceptionally challenging problems. In my opinion, the more you understand about it, the more you’ll appreciate it.

Does APEX Work on Exadata?

Yep, Exadata is just Oracle. Let me say that again: it’s just Oracle. Exadata runs an 11.2 Database on Linux x64. It’s the exact same binary install you’d download for “generic” Linux x64 11.2 from OTN. So, if your code / app runs on 11.2, it runs on Exadata. There are a few exceptions to this rule for apps that don’t play well with ASM. So, more accurately, if it runs on 11.2 on Linux x64 with ASM and RAC (optional, but desirable), it runs on Exadata. The APEX Dev Team (my old friends and colleagues) did absolutely nothing to port their code to Exadata. I’ve run a number of customer benchmarks with customers’ data and queries and have yet to make a single change to their queries or structures to make them work on Exadata. The only thing I’ve done is to set indexes invisible in Data Warehouse benchmarks, but that was more for fun…more on that later.

What Problems Does Exadata Solve?

The most obvious problem it solves is the I/O bottleneck. While I’m not in the same league as many of the well-known performance architects in the Oracle community, I spend a fair amount of time talking to customers about performance problems. In my estimation, the vast majority of the performance problems I encounter come down to I/O. Most DBAs (myself included) don’t know nearly enough about storage. Compounding that problem is the fact that there are often communication “issues” between storage groups and the DBAs.

Let’s talk about how we get to data for a second. I’ll try and use a Tom Kyte example. Let’s take the example of reading a phone book and assume it has an index (paper style) in the back that tells you which names are on which pages. If you wanted to count all of the names in the phone book, it would be much quicker to just scan through all of the names from beginning to end than to go to the index to find each and every name. This is an example of why full table and partition scans are often faster for data warehouse queries than index-based access paths. Flip that around and tell me the phone number for the “Muths”, and it would be much faster to look at the index, find out what page the “Mu”s start on, and go to that page. Think of this as an OLTP style query.

So, I/O throughput is particularly important in DSS and Data Warehousing environments because you’re scanning through whole tables and partitions. However, the vast majority of the customers I talk to are running a SAN attached via 2 or 4 Gbps (gigabits per second) Fibre Channel. Those connections provide 200 or 400 MB/s (megabytes per second) respectively. If you want to run a full scan on a 200 GB table and you’re connected via 4 Gb FC, some simple math will tell you it will never run faster than 512 seconds, or about 8.5 minutes. Now, compare that to the disk-only 25 GB/s (gigabytes per second) scan rate of an X2-2 full rack with High Performance disks and we can now scan that table in 8 seconds. Our scan rate jumps to 75 GB/s with Flash Cache, but I’ll stick to disk-only numbers for scan operations. It’s easy to see how a query in this example could run 64x faster on Exadata, which supports my claim in the title of this post. The results I’ve obtained in POVs I’ve personally worked on have been incredible. On several occasions I’ve had to prove to customers that the results were real, especially in the case of I/O intensive operations.
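
Here’s that back-of-the-envelope math as a quick query, using only the figures above (4 Gb FC ≈ 400 MB/s, X2-2 full rack ≈ 25 GB/s from disk):

-- seconds to full scan a 200 GB table at each throughput
select round((200 * 1024) / 400)         fc_4gb_seconds,
       round((200 * 1024) / (25 * 1024)) exadata_disk_seconds
  from dual;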

You don’t need to have a 10 TB database to have I/O problems either. The last POV I ran was on a 500 GB data set (including indexes). The database is a shared resource across many organizations and applications. As they’ve built more applications on this database, their indexing strategy hasn’t exactly kept pace. Their top wait events from statspack are mostly I/O related. Additionally, they copy this database nightly to a mirror server that they use for ad-hoc reporting, which as you might expect has even more I/O issues. It’s hard to index for ad-hoc…

What does this have to do with APEX? I’ve seen (and written) plenty of APEX apps that are front-ends to a data warehouse. If you are working with large data sets, doing a lot of aggregation, or allowing a lot of ad-hoc queries, traditional index strategies are often difficult to develop and of marginal value. In many cases like these, the best strategy for performance is to improve the I/O of your system.

So We Don’t Need Indexes Anymore?

I’ve heard this concept from far too many people not to address it. For Data Warehouse / DSS style queries, it is often the case with Exadata that the optimizer will choose a full table or partition scan access path, and consequently not use indexes. In several Data Warehouse POVs I’ve simply set all indexes invisible without negatively impacting performance, since the optimizer wasn’t using them anyway. You still need indexes for primary keys, uniqueness, etc, but I think it’s fair to say that implementing a Data Warehouse on Exadata will allow you to eliminate many (dare I say “most”) of your indexes. This allows you to reclaim a lot of disk space and certainly makes data loads faster.
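
For reference, this is the invisible-index syntax I’m referring to; the index name below is purely hypothetical:

-- hide the index from the optimizer without dropping it
alter index sales_fact_cust_ix invisible;
-- and bring it back if it turns out the optimizer really did need it
alter index sales_fact_cust_ix visible;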

However, this concept has nothing to do with OLTP style queries! If you want to retrieve a row based on its primary key, which do you think is faster: 1) traversing a few blocks in a b-tree index to find the rowid, then plucking the single block that contains your row from disk (or RAM), or 2) scanning the whole multi-gigabyte table? We still need / want indexes for OLTP. Dropping all of the indexes on a busy Exadata-hosted OLTP application would be disastrous. This also applies to the APEX builder itself. Those tables are heavily indexed because, other than some of the administrative reports, all of the queries are going after a few specific rows to render your page.

But I Thought Storage Indexes Replaced Traditional Indexes

Uh, no. Storage indexes are often misunderstood. They are unique to Exadata and implemented automatically by the storage cells. I actually like to think of them as anti-indexes. They store the high and low values of number and date columns for a storage unit (~1 MB). As we scan through the table or partition, if the query has a predicate that filters on one of these columns, we know we can skip all of the blocks in the storage unit if the value we are looking for falls outside of those high and low values. The key word there is “scanning”. They are used in data warehouse style queries, but not OLTP style queries. Clearly, they are no replacement for traditional indexes in OLTP applications.
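
If you want to see whether storage indexes are actually helping a given session, there is a session statistic for exactly that. A minimal check, using the same v$mystat pattern that shows up later in this blog (statistic name as I know it on 11.2):

select s.name, round(m.value / 1024 / 1024) mb_saved
  from v$mystat m, v$statname s
 where s.statistic# = m.statistic#
   and s.name = 'cell physical IO bytes saved by storage index';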

Many of My APEX Apps are OLTP, Will They Run Faster on Exadata?

Potentially. Even though those apps shouldn’t be doing a lot of table scans, they will often still incur some physical I/O in the form of single-block reads. If the size of your “hot” data and indexes for all of your applications is larger than the RAM you have allocated to your SGA, you’re going to incur physical I/O. The time difference between reading a block from the buffer cache and reading it from disk is dramatic. Buffer cache reads should be in the microsecond range, whereas reads from disk can take tens of milliseconds. Your single block read times essentially fall off a cliff when you go to disk.

Exadata added Flash Cache in the second (V2) and third (X2-2, X2-8) versions. It’s essentially solid state storage (as cards, not disks) in the storage cells that is configured as a write-through (read) cache. So, if you ask for a block and we don’t find it in the SGA, we go to the storage cells. If it’s in Flash, we can return it from there. If not, we grab the block off disk, then optionally copy it to Flash so the next time we need to read that block from “disk” we can return it from Flash instead. The single-block read times from flash are often under a millisecond. It provides a nice step down in access time between RAM and disk. A full rack has 5.3 TB of Flash Cache, so we can cache A LOT of blocks there. I’ve skipped a lot of details here, but I can tell you from the results of several OLTP POVs that I’ve worked on that it plays a very significant role in OLTP performance. There’s a nice white paper on Flash Cache here (PDF).
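
A rough way to gauge how much Flash Cache is helping a given session is to compare the flash read hits against the total read requests, again via v$mystat (a sketch, not a precise hit-ratio calculation):

select s.name, m.value
  from v$mystat m, v$statname s
 where s.statistic# = m.statistic#
   and s.name in ('cell flash cache read hits', 'physical read total IO requests');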

APEX is Only for Departmental Applications, Why Would I Run it on Exadata?

Anybody connected to the APEX community knows that this has a lot more to do with product positioning than it does actual capabilities. There are plenty of enterprise-class APEX apps out there. One great example is the new Oracle Store. APEX is a great front-end to database-centric applications. Exadata is a great platform for databases. Enough said.

Posted in APEX, Exadata | 5 Comments »

Exadata Smart Scan – Just The Columns I Want

Posted by Tyler Muth on June 23, 2010

In my last post I talked about the Smart Scan component of cell offload processing. I showed how much network traffic this can save between storage and the database. That savings comes from the fact that the storage filters the results down to just the rows and columns you asked for. This is just a quick post to demonstrate the difference in bytes shipped across the wire between selecting one column and selecting all columns.

-- query v$mystat

select *
 from all_objs
 where object_type = 'TABLE'
 and object_name like '%RESOURCE%';

-- query v$mystat

select object_name
 from all_objs
 where object_type = 'TABLE'
 and object_name like '%RESOURCE%';

-- query v$mystat

 ALL_COLUMNS_MB    ONE_COLUMN_MB      FACTOR_OF_IMPROVEMENT
------------------ ------------------ ---------------------
 5.4               1.8                3.0

Posted in Exadata, Oracle, Uncategorized | 2 Comments »

Exadata Cell Offload Processing

Posted by Tyler Muth on June 18, 2010

In my previous post I talked about Exadata Intelligent Storage. One of my favorite aspects of Intelligent Storage is the concept of Cell Offload Processing. “The term Offload Processing encompasses things that also have nothing to do with a query. Offload Processing is comprised of Smart Scan, fast file creation, RMAN block scanning, etc.” – Kevin Closson. This post is specifically about Smart Scan. The idea that the storage cells can understand and process most of the common query predicates and only send back the data we actually need is just brilliant. It also can have a huge impact on performance when working with large data sets. What predicates do the cells understand?

=, !=, <, >, <=, >=, IS [NOT] NULL, LIKE, [NOT] BETWEEN, [NOT] IN, EXISTS, IS OF type, NOT, AND, OR
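
As an aside, you can also confirm from the session statistics whether a scan was actually offloaded. A minimal sketch (statistic names as I know them on 11.2):

select s.name, round(m.value / 1024 / 1024) mb
  from v$mystat m, v$statname s
 where s.statistic# = m.statistic#
   and s.name in ('cell physical IO bytes eligible for predicate offload',
                  'cell physical IO interconnect bytes returned by smart scan');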

The following is the explain plan output of the same query, first without cell offload processing (traditional), then with cell offload processing:

| Id  | Operation           | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |          |   816 | 33456 |  5559   (1)| 00:01:07 |
|   1 |  SORT ORDER BY      |          |   816 | 33456 |  5559   (1)| 00:01:07 |
|   2 |   HASH GROUP BY     |          |   816 | 33456 |  5559   (1)| 00:01:07 |
|*  3 |    TABLE ACCESS FULL| ALL_OBJS |   816 | 33456 |  5557   (1)| 00:01:07 |
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
 3 - filter(("OBJECT_TYPE"='INDEX' OR "OBJECT_TYPE"='TABLE' OR
 "OBJECT_TYPE"='VIEW') AND "OBJECT_NAME" LIKE 'DBMS_RESOURCE%')

Now with cell offload processing:

| Id  | Operation                   | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |          |   816 | 33456 |  5559   (1)| 00:01:07 |
|   1 |  SORT ORDER BY              |          |   816 | 33456 |  5559   (1)| 00:01:07 |
|   2 |   HASH GROUP BY             |          |   816 | 33456 |  5559   (1)| 00:01:07 |
|*  3 |    TABLE ACCESS STORAGE FULL| ALL_OBJS |   816 | 33456 |  5557   (1)| 00:01:07 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
 3 - storage(("OBJECT_TYPE"='INDEX' OR "OBJECT_TYPE"='TABLE' OR
 "OBJECT_TYPE"='VIEW') AND "OBJECT_NAME" LIKE 'DBMS_RESOURCE%')
 filter(("OBJECT_TYPE"='INDEX' OR "OBJECT_TYPE"='TABLE' OR
 "OBJECT_TYPE"='VIEW') AND "OBJECT_NAME" LIKE 'DBMS_RESOURCE%')

Notice the TABLE ACCESS STORAGE FULL operation (Id 3) in the second plan, as well as the additional storage() entry in the predicate details.
Now let’s take a look at what impact that has on the amount of data we send from storage to the database. To illustrate this, I’ll run the same query with and without cell offload processing enabled and query v$mystat for “cell physical IO interconnect bytes” before and after each query. The following output is just a summary to make it more clear; the full script is at the end of this post. Also, I’d like to thank Kevin Closson for steering me toward v$mystat, as I was headed down a Wireshark packet-sniffing rat hole for this demo that was leading nowhere. Also keep in mind that this is on a simulator, not a real Exadata box, so the timing is not accurate, but the relative times are still interesting.

-- query v$mystat
select /*+PARALLEL(all_objs, 4)*/ count(*)
 from all_objs
 where object_type = 'TABLE'
 and object_name like '%RESOURCE%';
-- Elapsed: 00:00:00.96

-- query v$mystat

alter session set "cell_offload_processing" = FALSE;
select /*+PARALLEL(all_objs, 4)*/ count(*)
 from all_objs
 where object_type = 'TABLE'
 and object_name like '%RESOURCE%';
-- Elapsed: 00:00:10.00
 -- query v$mystat

SMART_SCAN_MB NON_SMART_SCAN_MB
------------- -----------------
 4.3427887         160.71875

That means that in this case there was 37x more network traffic with traditional storage than with Exadata Cell Offload Processing. Also keep in mind the 40 Gb InfiniBand network offers 10x the throughput of a traditional 4 Gb Fibre Channel connection. Forget Flash Cache, Compression, and Storage Indexes for a minute. The performance gains alone of 37x less data over a 10x bigger pipe are pretty remarkable.

The following is the full demo script. You’ll notice some odd alter sessions and the parallel hint in the query. These are to get cell offload processing to kick in on a relatively small table (1.4 million rows). “The prime ingredient in Smart Scan is access method full and PGA buffering. So you forced the plan and set serial_direct_reads. Without this you’d likely get a different access method buffered in the SGA and thus no Smart Scan” – Kevin Closson. There’s also some sqlplus magic to store the values that I had to dig up from my days working on the APEX installer.

ALTER TABLE all_objs STORAGE( CELL_FLASH_CACHE none);

alter session set "_serial_direct_read"=true;

column NON_SMART_SCAN_MB format 999,999,999,999.0
column SMART_SCAN_MB format 999,999,999,999.0
set define '^'
set verify off
column cell_initial_bytes new_val INITIAL_BYTES
column cell_offload_bytes new_val OFFLOAD_BYTES
column cell_non_offload_bytes new_val NON_OFFLOAD_BYTES

set linesize 200
set wrap off
column name format a50

alter session set "_small_table_threshold"=1;

ALTER SYSTEM FLUSH BUFFER_CACHE;
alter session set "cell_offload_processing" = TRUE;

select s.name, m.value cell_initial_bytes
 from v$mystat m, v$statname s
 where s.statistic# = m.statistic#
 and name = 'cell physical IO interconnect bytes';

set timing on
-- set termout off

select /*+PARALLEL(all_objs, 4)*/ count(*)
 from all_objs
 where object_type = 'TABLE'
 and object_name like '%RESOURCE%';

set termout on
set timing off
select s.name, m.value cell_offload_bytes
 from v$mystat m, v$statname s
 where s.statistic# = m.statistic#
 and name = 'cell physical IO interconnect bytes';

alter session set "cell_offload_processing" = FALSE;
ALTER SYSTEM FLUSH BUFFER_CACHE;
-- set termout off

set timing on
select /*+PARALLEL(all_objs, 4)*/ count(*)
 from all_objs
 where object_type = 'TABLE'
 and object_name like '%RESOURCE%';

 set timing off
set termout on

select s.name, m.value cell_non_offload_bytes
 from v$mystat m, v$statname s
 where s.statistic# = m.statistic#
 and name = 'cell physical IO interconnect bytes';

SELECT (^OFFLOAD_BYTES - ^INITIAL_BYTES)/1024/1024 smart_scan_mb, (^NON_OFFLOAD_BYTES - ^OFFLOAD_BYTES)/1024/1024 non_smart_scan_mb FROM DUAL;

Posted in Exadata, Oracle | 8 Comments »

Exadata Intelligent Storage

Posted by Tyler Muth on June 15, 2010

One of the key differentiators of Exadata is the concept that processing occurs in the storage. For example, if you issue a query such as:

select first_name, last_name
  from employees
 where department = 'Accounting'
 order by last_name;

The storage cell will actually process the “where” predicate.  This means the database doesn’t have to do this work, but more importantly it means we don’t have to ship the entire employees table over the network from the storage to the database like we do with traditional storage.  The storage cell will filter the rows AND columns down to just what we asked for then pass those results back to the database.  The database will then further process the results, such as sorts, aggregates, etc.

There are a lot more details to the concept of Intelligent Storage that I’ll cover in subsequent posts, but I wanted to start with it as a broad concept. I was “dancing” around the concept in a presentation I was working on and I’d like to thank Tom Kyte for pointing out the importance of this concept.

What does Intelligent Storage mean for performance?  It not only gives us dramatically better baseline performance, but it also means that as our capacity to store data grows, our ability to process that data grows.  In contrast, with a traditional SAN or NAS solution, adding more storage only increases capacity, it does not increase performance.  Sure, you have to add enough spindles to spread the load, but at a certain point adding more spindles will do nothing to improve your performance.  The bottleneck will simply shift from the storage to the network.

Let’s use an example of scanning a 10 TB table, similar to our earlier query against the employees table. When we query this table in a traditional storage model, we’ll have to ship all 10 TB across the wire. Yes, the database will likely cache portions of this table in the buffer cache, but you simply don’t have enough RAM to cache 10 TB. Now when we run the same query against Exadata, we only need to ship the rows and columns we asked for across the network. Now imagine that table grows to 100 TB and we need to add storage to handle this growth. In a traditional model, as our table grows our query will get continuously slower. With Exadata, adding storage also adds processing power, which allows us to maintain consistent performance even as our table continues to grow.


Posted in Exadata, Oracle | 3 Comments »

Physical I/O Saved With Exadata Hybrid Columnar Compression

Posted by Tyler Muth on June 8, 2010

In my previous post I highlighted the space savings of Exadata HCC. This post is a simple example of the physical I/O savings of Exadata HCC:

alter system flush buffer_cache;
alter system flush shared_pool;
select /* not_compressed */ count(*) from all_objs where object_type = 'TABLE';
select /* compressed     */ count(*) from all_objs_compressed_query where object_type = 'TABLE';

select substr(sql_text,1,27)||'...' sql_text,disk_reads,
	   (physical_read_bytes /1024 / 1024) physical_io_mb 
  from v$sql 
 where sql_text like '%compressed%' 
   and sql_text not like '%v$sql%';

SQL_TEXT                       DISK_READS  PHYSICAL_IO_MB
------------------------------ ---------- ---------------
select /* compressed     */...        124             .97
select /* not_compressed */...      20520          160.31

Obviously your results will vary based on the nature of the data, but reducing physical I/O from 160 MB to 1 MB for the same query could have a dramatic impact on performance.
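
For completeness, I didn’t show how the compressed copy was built. Assuming it was created the obvious way, the HCC syntax looks something like this (QUERY HIGH is just one of the four HCC levels; the table names match the demo above):

create table all_objs_compressed_query
  compress for query high
  as select * from all_objs;

-- confirm the compression setting
select table_name, compression, compress_for
  from user_tables
 where table_name like 'ALL_OBJS%';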

Posted in Exadata, Oracle | 2 Comments »