Tyler Muth’s Blog

Technology with a focus on Oracle, Application Express and Linux

Archive for the ‘Oracle’ Category

Add Code w/ Syntax Highlighting to PowerPoint

Posted by Tyler Muth on February 6, 2013

…or any media that supports rich text or HTML. I often include code snippets or queries in technical presentations. I find them much more visually appealing and easier to read if they include syntax highlighting. If you’re using Notepad++ on Windows, it’s trivial: simply go to the “Plugins” menu > “NppExport” > “Copy all formats to clipboard”. Check out the screenshot below as well as an example slide from my talk last year at E4. On a related note, ZoomIt is a nice tool for presentations involving code or other detail that’s hard to read, since it lets you zoom in on it.

Notepad++ Syntax Highlighting Export

 

Notepad++ Syntax Highlight Slide

Posted in Oracle | 8 Comments »

“Refresh” a CSV File

Posted by Tyler Muth on February 5, 2013

In a previous post I discussed a PDF reader that allows you to refresh the PDF after changes are made. This is the same concept, except it’s for CSV files. I’ve been using Perl to convert sqlplus spool output to CSV, which is much easier to read into R. As I’m updating my code and regenerating sample CSV files, it’s really annoying that Excel gets an exclusive lock on the file. So I would open it in Notepad++ or vim, but in those tools it’s much harder to pick out the columns and rows to find errors.

CSVed to the rescue!!! Here’s the website for CSVed. It’s a great, simple CSV viewer / editor and it does not lock the file. You simply press the “reload” button or ctrl+r and it reloads the file. Again, so simple but so useful for my use-case.

Posted in Oracle, R | 1 Comment »

Scripted Collection of OS Watcher Files on Exadata

Posted by Tyler Muth on November 2, 2012

I’ve been working a lot with graphing DB and OS metrics in R. I find it especially useful in Exadata POVs (proofs of value) to gather and graph the oswatcher vmstat files for the compute nodes and iostat for the cells. For an example, take a look at this graph (PDF, 168 KB) of what happens when you have a connection pooling issue and keep creating connections throughout the test (14,930 to be precise). It’s a nice picture of what happened to the OS as it ran out of memory, started swapping, then fell over. To quote the Real World Performance Team: “When Databases swap, Databases stop”.

Anyway, back to the point of this post. OS Watcher files are automatically collected on each db node and storage cell. It’s not too bad to get them manually on a 1/4 rack as that’s only 5 components, but what about a full X3-2 with 22 components? Or the previous POV I did with 3 X2-8s, which had 42 storage cells and 6 DB nodes? Note that support will often ask for these files when you have an issue and log an SR.

Here’s a script I wrote to collect the files, with various find options in the comments at the top. It also optionally takes one parameter, a name to include in the tar.gz file names so you can identify them more easily. It will create one tar.gz file for each component, so in the case of a 1/4 rack it will create 5 files.

#!/bin/bash
file_name=""
if [ $# -gt 0 ]
then
	file_name="_$1"
fi

# A few find examples
# \( -name "*.*" \) \
# \( -name "*vmstat*.*" -o -name "*iostat*.*" \) \
# \( -name "*vmstat*.*" -o -name "*iostat*.*" \) -mmin -60 \
# \( -name "*vmstat*.*" -o -name "*iostat*.*" -o -name "*netstat*.*" \) -mtime -8 \

while read line; do
	(ssh -n -q root@$line 'find /opt/oracle.oswatcher/osw/archive  \
	\( -name "*vmstat*.*" -o -name "*iostat*.*" \) \
	-prune -print0 2>/dev/null | \
	xargs -0 tar --no-recursion -czf - 2>/dev/null ' | cat >  osw${file_name}_${line}.tar.gz
)
done < /home/oracle/tyler/osw/allnodes

Note that the file “allnodes” on the last line is just a file with the names of the db nodes and cells, each on a new line. It’s the same format you use for dcli. This is of course a lot easier if you’ve set up root equivalencies for each component…
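
For example (the host names here are made up), an allnodes file for a 1/4 rack might look like this:

dbnode01
dbnode02
cel01
cel02
cel03

If you saved the script as, say, collect_osw.sh (the name is whatever you choose) and ran “./collect_osw.sh pov1”, you’d end up with osw_pov1_dbnode01.tar.gz, osw_pov1_cel01.tar.gz and so on, one file per host listed in allnodes.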

Posted in Exadata, Oracle | 10 Comments »

Speaking at Enkitec Extreme Exadata Expo

Posted by Tyler Muth on August 9, 2012

I’ll be speaking at the Enkitec Extreme Exadata Expo (E4), August 13-14 in Dallas, Texas (you can also attend virtually). They’ve recruited some of the top names from the community, including keynote speaker Andrew Mendelsohn, Arup Nanda, Cary Millsap, Jonathan Lewis, Karen Morton, Maria Colgan, Kerry Osborne and Tanel Põder. I left a lot of names off the list, many of whom you probably know, so take a look at the full list of speakers here. Enkitec has implemented over 80 Exadata environments, so their speakers will clearly have a lot to offer as well.

Many of the speakers are not employed by Oracle, so I expect the tone and perspective to be significantly different from an Oracle-hosted event.

I realize I never published this draft until this morning and this is probably pretty late notice if you want to attend in person. However, you can attend virtually and the price is exceptionally low for that option…

Posted in Exadata, Oracle | 3 Comments »

Convert sqlplus spool output to CSV

Posted by Tyler Muth on April 11, 2012

I’ve been doing some work lately with sizing / capacity planning where the source data comes from a few of Karl Arao’s excellent AWR Mining scripts. Had I been reading his blog more, I would have known he was going down a parallel path using Tableau as I was moving in the direction of R. Anyway, I wanted to get the output of these sqlplus scripts, which spool to a file, into CSV format to import into R as data frames. Yes, R can deal with fixed-width data, but not nearly as well as CSV. If it were just a few files, wrangling the data into CSV wouldn’t be that hard. However, I had ~40 files and I plan on doing this a lot more, so it had to be easy and fast. I looked at a couple of existing solutions, such as the sqlplus colsep directive combined with sed, or XMLAGG, but each had its drawbacks.

For me, perl seemed like the best fit as it’s easy, cross-platform, and made for text files. I only used one module, and it’s one that’s installed by default on most platforms. Anyway, what started as about 10 lines grew to about 100 lines as I had to handle edge cases. It could be a lot less, but it’s intentionally verbose as I’ll trade compactness for readability every time. By default it automatically uses the column headings as the headings in the CSV output.

OK, enough rambling. Here’s the script on Dropbox. I’ll post the whole script at the end of this post for viewing and for people who can’t get to Dropbox-hosted files. Here are my test case files used in the following examples.
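
In case it helps, this is roughly the kind of sqlplus setup that produces the fixed-width spool output the script expects (just a sketch; the pagesize/linesize values are arbitrary, and the table and file name simply match the countries test case below):

set pagesize 50000
set linesize 250
set trimspool on
spool a-countries.out
select * from countries;
spool off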

If you don’t supply parameters it returns the usage instructions:

$ perl sqlplus_spool_to_csv.pl
usage: program [--infiles "file.txt" | "*.out"] [--headers [AUTO] | FILE | NONE] [--header_file header.txt] [--help|-?]

This will generate a CSV file for every file matching “a-*.out”:

$ perl sqlplus_spool_to_csv.pl --infiles "a-*.out"
a-countries.out
a-departments.out
a-employees.out
a-employees_repeat_headers.out

$ ls *.csv
a-countries.out.csv  a-departments.out.csv  a-employees.out.csv  a-employees_repeat_headers.out.csv  a-locations.out.csv

$ head -7 a-countries.out
hr@orcl> select * from countries;

CO COUNTRY_NAME                              REGION_ID
-- ---------------------------------------- ----------
AR Argentina                                         2
AU Australia                                         3
BE Belgium                                           1

$ head -4 a-countries.out.csv
"CO","COUNTRY_NAME","REGION_ID"
"AR","Argentina","2"
"AU","Australia","3"
"BE","Belgium","1"

This example will glue on the headers from a custom text file:

$ perl sqlplus_spool_to_csv.pl --infiles "b-*.out" --headers FILE --header_file employees_headers.txt
b-employees.out
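
For reference, whatever is on the first line of the header file is written as-is as the first line of the CSV, so employees_headers.txt would typically hold a single, already-formatted header line along these lines (the column names here are just an illustration):

"EMPLOYEE_ID","FIRST_NAME","LAST_NAME","EMAIL","HIRE_DATE","SALARY"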

On my laptop it only took 25 seconds to convert a 100M file to CSV (a directory of 40 smaller files took under 5 seconds). Here’s the 100M file compressed to 17K in case you want it:

$ ls -lh employees_big.out
-rw-r--r--+ 1 tmuth None 100M Apr 11 15:31 employees_big.out

$ wc -l employees_big.out
556200 employees_big.out

$ time perl sqlplus_spool_to_csv.pl --infiles employees_big.out
employees_big.out

real    0m25.047s
user    0m24.289s
sys     0m0.467s

OK, here’s the perl source in case you just want to take a peek or can’t download from Dropbox:

#! /usr/bin/perl
# Author: Tyler D Muth - https://tylermuth.wordpress.com
# Version: 1.1

use warnings;
use strict;
use Getopt::Long;
my (@infiles, $headers, $header_file);
usage() if ( @ARGV < 1 or           ! GetOptions('infiles=s@' => \@infiles, 'headers=s' => \$headers, 'header_file=s' => \$header_file));

$headers = defined($headers) ? $headers : 'AUTO'; $headers=uc($headers);

file_loop();

sub usage
{
  print "Unknown option: @_\n" if ( @_ );
  print qq/usage: program [--infiles "file.txt" | "*.out"] [--headers [AUTO] | FILE | NONE] [--header_file header.txt] [--help|-?]\n/;
  exit;
}

sub file_loop{
	for my $in_file ( map {glob($_)} @infiles ) {
		print $in_file, "\n";
		my $unpack_layout =''; my $header_titles=''; my $headers_mapped='';my $dashes_line='';
		my $header_line_new='';
		open ( INFILE, "<$in_file" ) or die "$!";
		open ( OUTFILE, ">$in_file.csv" ) or die "$!";
		get_unpack_layout(*INFILE,*OUTFILE,$unpack_layout,$dashes_line,$header_line_new);
		chomp $header_line_new;
		get_header_titles($header_line_new,$header_titles);

		if($headers eq 'AUTO'){
			$headers_mapped=text_line_to_csv($header_titles,$unpack_layout);
		}else{
			$headers_mapped=$header_titles;
		}
		print OUTFILE "$headers_mapped\n" if $headers ne "NONE";
		while( my $line = <INFILE> ){
			chomp $line;
			last if ($line =~ /^[[:digit:]]+[[:space:]]rows[[:space:]]selected.*/);	# this will be an issue for non-english sqlplus
			if($line ne $dashes_line && $line ne $header_line_new && length($line) > 2){
				print OUTFILE text_line_to_csv($line ,$unpack_layout) . "\n";
			}
		}
		close INFILE;
		close ( OUTFILE );
	};
}

sub get_unpack_layout{
	my $infile_handle=$_[0];my $outfile_handle=$_[1];my $unpack_layout_int='';
	my $header_line='';
	while( my $line = <$infile_handle> ){
		if ($line =~ /-{3,}.*/) { # if the line starts with 3 or more dashes (may cause problems for < 3 character leading columns)
			chomp $line;
			$_[3]=$line; #return the line of dashes so we can skip them if repeated
			my @columns = split(' ', $line);
			foreach my $col (@columns) {
				my $col_length = length($col)+1;
				$unpack_layout_int .= "A$col_length ";
			}
			$_[2]=$unpack_layout_int;
			$_[4]=$header_line;
			last;
		}
		$header_line=$line;
	}
}
#----------------------------------------------------------
sub get_header_titles{
	my $header_titles_in=$_[0]; my $header_titles_return='';my $dummy_line='';
	$header_titles_return=$header_titles_in if $headers eq 'AUTO';
	if($headers eq 'FILE'){
		open ( HEADERS_FILE, "<$header_file" ) or die "$!";
		$header_titles_return = <HEADERS_FILE>;
		close ( HEADERS_FILE);
	}
	chomp $header_titles_return; $_[1]=$header_titles_return;
}
#----------------------------------------------------------
sub text_line_to_csv{
	my $in_line=$_[0]; my $in_unpack_layout=$_[1];

	my @f = unpack("$in_unpack_layout", $in_line);
	my $out_text = join ',', map {
		s/^\s+//;  		 # strip leading spaces
			  s/\s+$//;  # strip trailing spaces
			  s/#+/ /g;  # replace any pound signs (#) in the line with a space
		qq/"$_"/ } @f;

	print "$out_text \n" if $out_text =~ m/\#+/g;
	return $out_text;
}

Posted in Oracle | 14 Comments »

Create Bigfile Tablespace – Oracle Managed Files (OMF)

Posted by Tyler Muth on February 10, 2012

Working with Exadata a lot (almost exclusively), I create almost all my tablespaces as bigfile using OMF. For some reason I can never find the right syntax for this when I need it, so that’s the main reason for this post.

CREATE BIGFILE TABLESPACE data1  NOLOGGING datafile '+DATA_DM01' SIZE 50G AUTOEXTEND ON NEXT 5G;
CREATE BIGFILE TABLESPACE dbfs_ts  NOLOGGING ONLINE PERMANENT EXTENT MANAGEMENT LOCAL AUTOALLOCATE  SEGMENT SPACE MANAGEMENT AUTO datafile '+DBFS_DM01' SIZE 500G AUTOEXTEND ON NEXT 10G;
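
If you want to double-check the result afterwards, here’s a quick sanity-check query (not from the original post, just a handy verification):

SELECT tablespace_name, bigfile, logging FROM dba_tablespaces WHERE tablespace_name IN ('DATA1', 'DBFS_TS');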

On a related note, wow I use “column mode” in Notepad++ (link) a lot for this stuff. Many other editors support column mode, including an old favorite on Windows, UltraEdit (link).

  

Posted in Oracle | 4 Comments »

I Need More Hardware!

Posted by Tyler Muth on February 8, 2012

Actually, this is one of the few times I can remember in my career when I can honestly say I have enough horsepower at my disposal.

Want to guess what I’m running right now?

> select count(*) from v$asm_disk where name like 'DATA_XXX%';

  COUNT(*)
----------
       504

> select sum(total_mb)/1024/1024 tb from v$asm_disk;

        TB
----------
 272.15918

> select count(*) from gv$instance;

  COUNT(*)
----------
         6
[oracle@somehost: ~]$ cat /proc/meminfo | grep MemTotal
MemTotal:       1058429820 kB

[oracle@somehost: ~]$ cat /proc/cpuinfo | grep -m 1 Xeon
model name      : Intel(R) Xeon(R) CPU           X7560  @ 2.27GHz

[oracle@somehost: ~]$ cat /proc/cpuinfo | grep -c Xeon
128

[root@somehost ~]# find /sys/class/infiniband/ -name 'rate' -exec cat {} \;
40 Gb/sec (4X QDR)
40 Gb/sec (4X QDR)
40 Gb/sec (4X QDR)
40 Gb/sec (4X QDR)
40 Gb/sec (4X QDR)
40 Gb/sec (4X QDR)
40 Gb/sec (4X QDR)
40 Gb/sec (4X QDR)

This POV will probably generate a number of follow-up posts. One concept: there are always bottlenecks. When you remove the hardware bottlenecks, bottlenecks in your code quickly surface.

Posted in Oracle | 8 Comments »

Oracle Database 10.2 De-Supported

Posted by Tyler Muth on January 11, 2012

OK, that’s an alarmist title and is not really true. However, I feel this topic is important enough to warrant the title. I’ve talked to a lot of people lately who were not aware of the actual support status of Database 10.2 and below (it changed in July 2010 and the 1-year grace period ended in July 2011). My goal with this post is certainly not to be critical of anyone, just to spread the word and allow people to make informed decisions.

The document “Oracle Lifetime Support Policy” (PDF, 474KB) discusses most of the details. There’s also a FAQ on MOS entitled “Oracle Database 10.2 End of Premier Support – Frequently Asked Questions [ID 1130327.1]” (HTML Version or Full Flash Version (for now… details on that here)).

Here’s my quick summary of some key points:

  • 10.2 general availability was July 2005
  • “Premier Support” is what you probably think of as “supported”.
    • It extends for 5 years from the date of general availability.
    • It ended for 10.2 in July of 2010.
    • Oracle offered a 1-year, no-cost grace period for 10.2 Premier Support customers, which ended in July of 2011. This was basically a year of free Extended Support.
  • “Extended Support” is a for-cost option that allows you to extend Premier Support for an additional 3 years.
  • “Sustaining Support” is the level of support after Premier Support ends, assuming you are not paying for Extended Support.
    • This is probably where the large majority of 10.2 and earlier customers are today, though many may not be aware of it.
    • This does NOT include a number of key services such as “New updates, fixes, security alerts, data fixes, and critical patch updates”.

A number of customers that I’ve talked to realized this situation when they went to download the latest Critical Patch Update and could not, as they were on Sustaining Support.

The PDF also contains the following chart which is a nice overview of the dates for planning purposes:

Database Support Dates Chart

Posted in Oracle | 10 Comments »

Logger Project Moved Temporarily

Posted by Tyler Muth on November 9, 2011

The site samplecode.oracle.com was decommissioned recently. I was hosting a number of projects there including “Logger”, my PL/SQL instrumentation package. Until I find a new home, here’s a temporary link to the latest release (1.4) or if you just want to view the readme use this link. I’ll update this post when I decide on a new home.

This project is still very much alive and I will continue to update it based on your suggestions. I know there are a number of them in my “queue”; I just haven’t had time. My plan is to find a public place to host it, such as github, where everyone can contribute.

Posted in APEX, Oracle, PLSQL | 25 Comments »

A Little Hard Drive History and the Big Data Problem

Posted by Tyler Muth on November 2, 2011

Most of the “Big Data Problems” I see are related to performance. Most often these are long-running reports or batch processes, but they also include data loads. I see a lot of these as one of my primary roles is leading Exadata “Proof of Value” (POV) engagements with customers. It’s my observation that over 90% of the time, the I/O throughput of the existing system is grossly under-sized. To be fair, the drives themselves in a SAN or NAS are usually not the bottleneck; more often than not it’s the 2x 2 Gbps or 2x 4 Gbps Fibre Channel connection to the SAN. However, the metrics for hard drive performance and capacity over the last 30 or so years were a lot easier to find, so that’s what this post is about.

In short, our capacity to store data has far outpaced our ability to process that data. Here’s an example using “personal” class drives to illustrate the point. In the year 2000 I could buy a Maxtor UDMA 20 GB hard drive connected locally via (P)ATA, which provided roughly 11 MB/s of throughput (the drive might be limited to a bit less, but not much). Today, I can buy a 3 TB Seagate Barracuda XT ST32000641AS that can sustain 123 MB/s of throughput. So, let’s say I need to scan (search) my whole drive. 11 years ago that would take 31 minutes. Today it would take 7 hours! What?!?! The time it takes to search the data I can store has increased by a factor of 14x. How can this be? What about Moore’s Law? Sorry, that only works for processors. Oh right, I meant Kryder’s Law. Kryder’s Law states that magnetic disk areal storage density doubles annually and has nothing to do with performance. Well, actually this phenomenon is half the problem, since throughput isn’t doubling annually.
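
Here’s the back-of-envelope math behind those scan times (using 1 GB = 1,024 MB and rounding):

20 GB = 20,480 MB; 20,480 MB / 11 MB/s ≈ 1,862 seconds ≈ 31 minutes
3 TB = 3,145,728 MB; 3,145,728 MB / 123 MB/s ≈ 25,575 seconds ≈ 7 hours
25,575 / 1,862 ≈ 14, which is where the factor of 14x comes from.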

I did a little research and found some relevant data on seagate.com, tomshardware.com, and of course The Wayback Machine. Now on to the data and some graphs (click to enlarge graphs):

The capacity and throughput changes below are relative to the first drive, the 1979 Seagate ST506, shown both as a factor and as a percent:

Seagate ST506 (1979): 5 MB (0.005 GB) at 5 MB/s; capacity 1x / 0%, throughput 1x / 0%; 1 second to read the full drive
Rodime R032 (1983): 10 MB (0.010 GB) at 0.6 MB/s; capacity 2x / 100%, throughput 0.1x / -88%; 16 seconds to read the full drive
Seagate ST3550A (1993): 452 MB (0.441 GB) at 11.1 MB/s; capacity 90x / 8,940%, throughput 2x / 122%; 41 seconds to read the full drive
Quantum Fireball ST3 2A (1998): 3276 MB (3.2 GB) at 7.6 MB/s; capacity 655x / 65,436%, throughput 2x / 52%; 431 seconds to read the full drive
IBM DTTA-351010 (1999): 10240 MB (10 GB) at 9.5 MB/s; capacity 2,048x / 204,700%, throughput 2x / 90%; 1078 seconds to read the full drive
Seagate Cheetah X15 (2001): 18432 MB (18 GB) at 29 MB/s; capacity 3,686x / 368,540%, throughput 6x / 480%; 636 seconds to read the full drive
Seagate Cheetah X15.3 (2003): 36864 MB (36 GB) at 63.6 MB/s; capacity 7,373x / 737,180%, throughput 13x / 1,172%; 580 seconds to read the full drive
Seagate Cheetah 15k.6 (2008): 460800 MB (450 GB) at 164 MB/s; capacity 92,160x / 9,215,900%, throughput 33x / 3,180%; 2810 seconds to read the full drive
Seagate Cheetah 15k.7 (2011): 614400 MB (600 GB) at 204 MB/s; capacity 122,880x / 12,287,900%, throughput 41x / 3,980%; 3012 seconds to read the full drive
Seagate Pulsar XT.2 SSD (2011): 409600 MB (400 GB) at 360 MB/s; capacity 81,920x / 8,191,900%, throughput 72x / 7,100%; 1138 seconds to read the full drive

Graph: Time in Seconds to Read Full Drive

Graph: Capacity vs. Throughput

So, what does all of this mean? Well, to me it means that if you’re architecting a data warehouse or even a small data mart, make sure you focus on the storage. Over and over again I get drawn into discussions about the minutiae of chip speeds or whether Linux is faster than Solaris, yet when I ask about the storage the answer is almost universally “we’ll throw it on the SAN”. OK, how many HBAs and what speed? Is the SAN over-utilized already? Etc., etc., etc.

So, how does this relate to Exadata? The central focus of Exadata is I/O throughput. By including processing power in the storage servers, Exadata has a fixed relationship between capacity and throughput. As you add capacity, you add throughput.

Posted in Exadata, Oracle, Uncategorized | 11 Comments »