Tyler Muth’s Blog

Technology with a focus on Oracle, Application Express and Linux

Convert sqlplus spool output to CSV

Posted by Tyler Muth on April 11, 2012

I’ve been doing some work lately with sizing / capacity planning where the source are a few of Karl Arao’s excellent AWR Mining scripts. Had I been reading his blog more I would have known he was going down a parallel path using Tableau as I was moving in the direction of R.  Anyway, I wanted to get the output of these sqlplus scripts which spool to a file into CSV format to import into R as data frames. Yes, R can deal with fixed-width data, but not nearly as well as CSV. If it were just a few files, wrangling the data into CSV isn’t that hard. However, I had ~40 files and I plan on doing this a lot more so it had to be easy and fast. I looked at a couple of existing solutions such as the sqlplus colsep directive combined with sed or XMLAGG. But each had it’s drawbacks.

For me, perl seemed like the best fit as it’s easy, cross platform, and made for text files. I only used one module and it’s one that’s installed by default on most platforms. Anyway, what started as about 10 lines grew to about a 100 lines as I had to handle edge cases. I could be a lot less, but it’s intentionally verbose as I’ll trade compactness for readability every time. By default it automatically uses the column headings as the headings in the CSV output.

OK, enough rambling. Here’s the script on dropbox. I’ll post the whole script at the end of this post for viewing and for people that can’t get to dropbox hosted files. Here are my test case files used in the following examples.

If you don’t supply parameters it returns the usage instructions:

$ perl sqlplus_spool_to_csv.pl
usage: program [--infiles "file.txt" | "*.out"] [--headers [AUTO] | FILE | NONE] [--header_file header.txt] [--help|-?]

This will generate a csv file for every file like “a-*.out”:

$ perl sqlplus_spool_to_csv.pl --infiles "a-*.out"
a-countries.out
a-departments.out
a-employees.out
a-employees_repeat_headers.out

$ ls *.csv
a-countries.out.csv  a-departments.out.csv  a-employees.out.csv  a-employees_repeat_headers.out.csv  a-locations.out.csv

$ head -7 a-countries.out
hr@orcl> select * from countries;

CO COUNTRY_NAME                              REGION_ID
-- ---------------------------------------- ----------
AR Argentina                                         2
AU Australia                                         3
BE Belgium                                           1

$ head -4 a-countries.out.csv
"CO","COUNTRY_NAME","REGION_ID"
"AR","Argentina","2"
"AU","Australia","3"
"BE","Belgium","1"

This example will glue on the headers from a custom text file:

$ perl sqlplus_spool_to_csv.pl --infiles "b-*.out" --headers FILE --header_file employees_headers.txt
b-employees.out

On my laptop it only took 25 seconds to convert a 100M file to CSV (a directory of 40 smaller files took under 5 seconds). Here’s the 100M filecompressed to 17K in case you want it.:

$ ls -lh employees_big.out
-rw-r--r--+ 1 tmuth None 100M Apr 11 15:31 employees_big.out

$ wc -l employees_big.out
556200 employees_big.out

$ time perl sqlplus_spool_to_csv.pl --infiles employees_big.out
employees_big.out

real    0m25.047s
user    0m24.289s
sys     0m0.467s

OK, here’s the perl source in case you just want to take a peak or can’t download from dropbox:

#! /usr/bin/perl
# Author: Tyler D Muth - http://tylermuth.wordpress.com
# Version: 1.1

use warnings;
use strict;
use Getopt::Long;
my (@infiles, $headers, $header_file);
usage() if ( @ARGV < 1 or           ! GetOptions('infiles=s@' => \@infiles, 'headers=s' => \$headers, 'header_file=s' => \$header_file));

$headers = defined($headers) ? $headers : 'AUTO'; $headers=uc($headers);

file_loop();

sub usage
{
  print "Unknown option: @_\n" if ( @_ );
  print qq/usage: program [--infiles "file.txt" | "*.out"] [--headers [AUTO] | FILE | NONE] [--header_file header.txt] [--help|-?]\n/;
  exit;
}

sub file_loop{
	for my $in_file ( map {glob($_)} @infiles ) {
		print $in_file, "\n";
		my $unpack_layout =''; my $header_titles=''; my $headers_mapped='';my $dashes_line='';
		my $header_line_new='';
		open ( INFILE, "$in_file.csv" ) or die "$!";
		get_unpack_layout(*INFILE,*OUTFILE,$unpack_layout,$dashes_line,$header_line_new);
		chomp $header_line_new;
		get_header_titles($header_line_new,$header_titles);

		if($headers eq 'AUTO'){
			$headers_mapped=text_line_to_csv($header_titles,$unpack_layout);
		}else{
			$headers_mapped=$header_titles;
		}
		print OUTFILE "$headers_mapped\n" if $headers ne "NONE";
		while( my $line =  ){
			chomp $line;
			last if ($line =~ /^[[:digit:]]+[[:space:]]rows[[:space:]]selected.*/);	# this will be an issue for non-english sqlplus
			if($line ne $dashes_line && $line ne $header_line_new && length($line) > 2){
				print OUTFILE text_line_to_csv($line ,$unpack_layout) . "\n";
			}
		}
		close INFILE;
		close ( OUTFILE );
	};
}

sub get_unpack_layout{
	my $infile_handle=$_[0];my $outfile_handle=$_[1];my $unpack_layout_int='';
	my $header_line='';
	while( my $line =  ){
		if ($line =~ /-{3,}.*/) { #if the line starts with 3 or more dashes (may cause problems for < 3 character leading columns
			chomp $line;
			$_[3]=$line; #return the line of dashes so we can skip them if repeated
			my @columns = split(' ', $line);
			foreach my $col (@columns) {
				my $col_length = length($col)+1;
				$unpack_layout_int .= "A$col_length ";
			}
			$_[2]=$unpack_layout_int;
			$_[4]=$header_line;
			last;
		}
		$header_line=$line;
	}
}
#----------------------------------------------------------
sub get_header_titles{
	my $header_titles_in=$_[0]; my $header_titles_return='';my $dummy_line='';
	$header_titles_return=$header_titles_in if $headers eq 'AUTO';
	if($headers eq 'FILE'){
		open ( HEADERS_FILE, "		$header_titles_return = ;
		close ( HEADERS_FILE);
	}
	chomp $header_titles_return; $_[1]=$header_titles_return;
}
#----------------------------------------------------------
sub text_line_to_csv{
	my $in_line=$_[0]; my $in_unpack_layout=$_[1];

	my @f = unpack("$in_unpack_layout", $in_line);
	my $out_text = join ',', map {
		s/^\s+//;  		 # strip leading spaces
			  s/\s+$//;  # strip trailing spaces
			  s/#+/ /g;  # replace any pound signs (#) in the line
		qq/"$_"/ } @f;

	print "$out_text \n" if $out_text =~ m/\#+/g;
	return $out_text;
}

Posted in Oracle | Tagged: , , | 7 Comments »

Create Bigfile Tablespace – Oracle Managed Files (OMF)

Posted by Tyler Muth on February 10, 2012

Working with Exadata a lot (almost exclusively) I create almost all my tablespaces as bigfile using OMF. For some reason I can never find the right syntax for this when I need it. So, that’s the main reason for this post.

CREATE BIGFILE TABLESPACE data1  NOLOGGING datafile '+DATA_DM01' SIZE 50G AUTOEXTEND ON NEXT 5G;
CREATE BIGFILE TABLESPACE dbfs_ts  NOLOGGING ONLINE PERMANENT EXTENT MANAGEMENT LOCAL AUTOALLOCATE  SEGMENT SPACE MANAGEMENT AUTO datafile '+DBFS_DM01' SIZE 500G AUTOEXTEND ON NEXT 10G;

On a related note, wow I use “column mode” in Notepad++ (link) a lot for this stuff. Many other editors support column mode, including an old favorite on Windows, UltraEdit (link).

  

Posted in Oracle | Leave a Comment »

I Need More Hardware!

Posted by Tyler Muth on February 8, 2012

Actually, this is one of the few times I can remember in my career when I can honestly say I have enough horsepower at my disposal.

Want to guess what I’m running right now?

> select count(*) from v$asm_disk where name like 'DATA_XXX%';

  COUNT(*)
----------
       504

> select sum(total_mb)/1024/1024 tb from v$asm_disk;

        TB
----------
 272.15918

> select count(*) from gv$instance;

  COUNT(*)
----------
         6

[oracle@somehost: ~]$ cat /proc/meminfo | grep MemTotal
MemTotal:       1058429820 kB

[oracle@somehost: ~]$ cat /proc/cpuinfo | grep -m 1 Xeon
model name      : Intel(R) Xeon(R) CPU           X7560  @ 2.27GHz

[oracle@somehost: ~]$ cat /proc/cpuinfo | grep -c Xeon
128

[root@somehost ~]# find /sys/class/infiniband/ -name 'rate' -exec cat {} \;
40 Gb/sec (4X QDR)
40 Gb/sec (4X QDR)
40 Gb/sec (4X QDR)
40 Gb/sec (4X QDR)
40 Gb/sec (4X QDR)
40 Gb/sec (4X QDR)
40 Gb/sec (4X QDR)
40 Gb/sec (4X QDR)

This POV will probably generate a number of follow-up posts. One concept: there are always bottlenecks. When you remove the hardware bottlenecks, bottlenecks in your code quickly surface.

Posted in Oracle | 8 Comments »

Oracle Database 10.2 De-Supported

Posted by Tyler Muth on January 11, 2012

OK, that’s an alarmist title and is not really true. However, I feel this topic is important enough to warrant the title. I’ve talked to a lot of people lately that were not aware of the actual support status of Database 10.2 and below (it changed on July 2010 and the 1 year grace period ended July 2011). My goal of this post is certainly not to be critical of anyone, just to spread the word and allow people to make informed decisions.

The document “Oracle Lifetime Support Policy” (PDF, 474KB) discusses most of the details. There’s also a FAQ on MOS entitled “Oracle Database 10.2 End of Premier Support – Frequently Asked Questions [ID 1130327.1]” (HTML Version or Full Flash Version (for now…(details on that here)).

Here’s my quick summary of some key points:

  • 10.2 general availability was July 2005
  • “Premiere Support” is what you probably think of as “supported”.
    • It extends for 5 years from the date of general availability.
    • It ended for 10.2 in July of 2010.
    • Oracle offered a 1 year, no-cost grace period for 10.2 premiere support customers which ended in July of 2011. This was basically a year of free Extended Support.
  • “Extended Support” is a for-cost option that allows you to extend Premiere Support for an additional 3 years.
  • “Sustaining Support” is the level of support after Premiere Support ends, assuming you are not paying for Extended Support.
    • This is probably where a large majority of the 10.2 and earlier customers are today, but may not be aware of this.
    • This does NOT include a number of key services such as “New updates, fixes, security alerts, data fixes, and critical patch updates”.

A number of customers that I’ve talked to realized this situation when they went to download the latest Critical Patch Update and could not as they were on Sustaining Support.

The PDF also contains the following chart which is a nice overview of the dates for planning purposes:

Database Support Dates Chart

Posted in Oracle | Tagged: | 3 Comments »

Logger Project Moved Temporarilly

Posted by Tyler Muth on November 9, 2011

The site samplecode.oracle.com was decommissioned recently. I was hosting a number of projects there including “Logger”, my PL/SQL instrumentation package. Until I find a new home, here’s a temporary link to the latest release (1.4) or if you just want to view the readme use this link. I’ll update this post when I decide on a new home.

This project is still very much alive and I will continue to update it based on your suggestions. I know there are a number of them in my “queue”, I just haven’t had time. My plan is to find a public place to host it such as github where everyone can contribute.

Posted in APEX, Oracle, PLSQL | 8 Comments »

A Little Hard Drive History and the Big Data Problem

Posted by Tyler Muth on November 2, 2011

Most of the “Big Data Problems” I see are related to performance. Most often these are long running reports or batch processes, but also include data loads. I see a lot of these as one of my primary roles is in leading Exadata “Proof of Value” (POV) engagements with customers. It’s my observation that over 90% of the time, the I/O throughput of the existing system is grossly under-sized. To be fair, the drives themselves in a SAN or NAS are not the bottleneck, it’s the 2x 2 Gbps or 2x 4 Gbps Fibre channel connection to the SAN more often than not. However, the metrics for hard drive performance and capacity over the last 30 or so years were a lot easier to find so that’s what this post is about.

In short, our capacity to store data has far outpaced our ability to process that data. Here’s an example using “personal” class drives to illustrate the point. In the year 2000 I could by a Maxtor UDMA 20 GB hard drive connected locally via (P) ATA which provided roughly 11 MB/s of throughput (the drive might be limited to a bit less, but not much). Today, I can buy a 3 TB Seagate Barracuda XT ST32000641AS that can sustain 123 MB/s of throughput. So, lets say I need to scan (search) my whole drive. 21 years ago that would take 31 minutes. Today it would take 7 hours! What?!?! The time it takes to search the data I can store has increased by a factor of 14x. How can this be? What about Moore’s Law? Sorry, that only works for processors. Oh right, I meant Kryder’s Law. Kryder’s Law states that magnetic disk areal storage density doubles annually and has nothing to do with performance. Well, actually this phenomenon is half the problem since throughput isn’t doubling annually.

I did a little research and found some relevant data on seagate.com, tomshardware.com, and of course The Wayback Machine. Now on to the data and some graphs (click to enlarge graphs):

Seagate ST506

Rodime R032

Seagate ST3550A

Quantum Fireball ST3 2A

IBM DTTA-351010

Seagate Cheetah X15

Seagate Cheetah X15.3

Seagate Cheetah 15k.6

Seagate Cheetah 15k.7

Seagate Pulsar XT.2 SSD

Year 1979 1983 1993 1998 1999 2001 2003 2008 2011 2011
Capacity (MB) 5 10 452 3276 10240 18432 36864 460800 614400 409600
Capacity (GB) 0.005 0.010 0.441 3.2 10 18 36 450 600 400
Throughput (MB/s) 5 0.6 11.1 7.6 9.5 29 63.6 164 204 360
Capacity Factor of Change 1x 2x 90x 655x 2,048x 3,686x 7,373x 92,160x 122,880x 81,920x
Throughput Factor of Change 1x 0.1x 2x 2x 2x 6x 13x 33x 41x 72x
Capacity Percent Change 0% 100% 8,940% 65,436% 204,700% 368,540% 737,180% 9,215,900% 12,287,900% 8,191,900%
Throughput Percent Change 0% -88% 122% 52% 90% 480% 1,172% 3,180% 3,980% 7,100%
Time in Seconds to Read Full Drive 1 16 41 431 1078 636 580 2810 3012 1138

Time-in-Seconds-to-Read-Ful

Capacity-vs-Throughput---Lo

So, what does all of this mean? Well, to me it means if you’re architecting a data warehouse or even small data mart, make sure you focus on the storage. Over and over again I get drawn into discussions about the minutiae of chip speeds or whether Linux is faster than Solaris, yet when I ask about the storage the answer is almost universally “we’ll throw it on the SAN”. OK, how many HBAs and what speed? Is the SAN over-utilized already? etc, etc, etc.

So, how does this relate to Exadata? The central focus of Exadata is I/O throughput. By including processing power in the storage servers, Exadata has a fixed relationship between capacity and throughput. As you add capacity, you add throughput.

Posted in Exadata, Oracle, Uncategorized | 9 Comments »

Dropbox for Servers

Posted by Tyler Muth on September 29, 2011

I love dropbox. Let me say that again: I love dropbox! I’ve been using it for well over a year now and it’s the best file synchronizing / sharing service I’ve found (and I’m certainly not alone in that opinion). I use it on my laptop, my home desktop, my home server, my iPhone, my iPad, and with selective-sync, on my wife’s laptop. All of my important files (up to 50 GB worth which is what I pay for) are synced between all of those places all of the time. I don’t send 20 MB email attachments (which completely annoys me), I send links to download from dropbox. The list goes on and I’m now on a bit of a tangent.

Sever Use Case

I work on a lot of shared servers, such as for POVs (proof of value, benchmark) and most of them are Linux-based (I also sometimes use Amazon EC2 instances to teach APEX Classes). I found myself spending a lot of time transferring files to and from the server such as SQL scripts, AWR reports, SQL Monitoring reports, etc via scp (sometimes rsync). I didn’t want to setup my dropbox account on the server as well since it has all of my personal files too. Then I had a bit of an epiphany(and I’m sure I’m not the first one to think of this):

  1. Create a 2nd, free dropbox account (2 GB of free space)
  2. Setup that account on the linux server using the CLI version of dropbox located here. Dropbox has a formal, supported Linux client, but there were a number of package dependencies and specific versions that were more challenging to work around than it was worth.
  3. Share a folder between my primary dropbox account and my 2nd dropbox account.
Since all my secondary dropbox account will see are the folders I share with it, anyone else that happens to be on the server won’t have access to my personal files. Whatever files I add to that folder on my laptop will almost instantly show up on the server and vice versa.

Security Caveat

Having co-authored a book on security (“Applied Oracle Security“), it’s clearly one of my primary interests. While dropbox does encrypt your files when stored on it’s server and does use TLS to encrypt them on the network, and they have policies and internal controls in place to prevent their own employees from decrypting them without authorization, it’s still possible. They store and can access the encryption keys. If the government compels them to decrypt your files, they can and will. So, if you’re storing your tax records or corporate intellectual property, I strongly encourage you to use some form of client-side encryption. Personally, I use a truecrypt file (which I mount as a disk) for super-sensitive stuff within dropbox. Actually, my whole dropbox folder sits on a truecrypt volume (50 GB) which prevents someone that steals my laptop from getting to all of my files, but this does nothing to stop dropbox employees from accessing them. I then create small (1-30 MB) truecrypt files / volumes within dropbox to encrypt anything that is super-sensitive before it leaves my laptop. So, my layers are Big Truecrypt Volume > Dropbox Folder > Little truecrypt volumes for any sensitive files.

Posted in Linux, Oracle | 3 Comments »

Off-Topic: Avoid Levana Baby Monitors

Posted by Tyler Muth on September 23, 2011

I know this is completely off topic and I haven’t had time to blog in a while but there’s been a new addition in our family. I know there are a lot of parents out there and after the experience I’ve had with Levana, I’ll do everything in my power to make sure others avoid their products. Here’s a link to my review on amazon : http://sn.im/levana-review. I’ll also try and post some positive product reviews soon to offset this negativity…

Posted in Uncategorized | 4 Comments »

Static Version of rlwrap for sqlplus

Posted by Tyler Muth on August 31, 2011

Quick Summary

If you don’t have root access to a machine, it can be challenging to compile rlwrap (a nice companion to sqlplus). You can compile it on another machine / VM, then use a utility called statifier to produce one binary that includes all of the required dependencies. If you want to just cut to the chase, you can download the version I compiled for OEL 5.6 x64 here.

How to Compile

If you don’t trust what I’ve compiled (and you probably shouldn’t if you care about security), here are the steps to compile your own version.

On a machine or VM that you control running a comparable Linux distribution:

  1. Download and compile rlwrap.
  2. Download and compile statifier.
  3. Compile your static version of rlwrap:
    # statifier /usr/local/bin/rlwrap /tmp/rlwrap_static_x64
Copy the file to your locked-down server (where you don’t have root), then:
  1. Put it in some logical directory such as ~/local/bin
  2. chmod +x ~/local/bin/*
  3. I create a symlink for it (but you could just rename it if you want):
    ln -s ~/local/bin/rlwrap_static_x64 ~/local/bin/rlwrap
  4. Edit ~/.bashrc and add it to your path:
    export PATH=$PATH:~/local/bin
    alias sqlplus='rlwrap sqlplus'
  5. source ~/.bashrc

More Details

You might be saying, “but Tyler, surely your sys admins trust you with root?”. No, they don’t, and I’m fine with that. I work on a lot of Exadata POVs where we have absurdly tight timeframes. The machines are managed by another group (a group which I have a tremendous amount of respect for). So, I never get root, but that also limits my liability to some degree if something should go wrong.
Yes, I know you can compile the required libraries (gnu readline and ncurses) and set compiler flags to point to different directories for these libraries, but something always goes wrong and I’m tired of hacking around with it.

Posted in Linux, Oracle | 3 Comments »

My Brief Review of “Expert Oracle Exadata”

Posted by Tyler Muth on August 12, 2011

I purchased the Alpha version of Expert Oracle Exadata (amazon link) several months ago, written by Kerry Osborne, Tanel Poder and Randy Johnson. The print copy is still in pre-order (shipping very soon), but you can order the ebook from apress here. My primary role these days is talking about and working with Exadata in customer presentations, classes, and POVs. While I’m not in the same league as the real Exadata experts such as the authors, I’ve worked with Exadata enough over the past year or so to have a good sense about what you need to know.

Expert Oracle Exadata (Image from apress.com)

This book is simply outstanding. The concepts are clear and concise, and ordered in a nice logical progression.  The language has a very approachable, conversational tone. Most importantly, the book is filled with examples that serve to either reinforce or prove the concept of that section. In case you are concerned that there is a lot of marketing “fluff”, forget it. None of the authors work for Oracle and are completely free to speak their minds (so are Oracle employees, just emphasizing the point here). Their primary motivation is to educate and they clearly highlight some of the “issues” you can encounter in the real world with Exadata (yes, they exist).

Additionally, the comments from Kevin Closson that appear throughout the production release of the book as “Kevin Says” are worth the cost of the book alone. Kevin was one of the architects of Exadata and there are very few people (if any) that know more about Exadata from the high level concepts down to the code itself. He also tends to tell it like it is, keeping the facts paramount, and all other considerations secondary. On a related note, Kevin was a great mentor to me over the last year or so when I got thrown into the Exadata world. I had no clue about hardware (“what’s an HBA?”), nor much experience with data warehousing. I can’t thank him enough for his time and  efforts.

In my opinion, this is by far the best source of information out there on Exadata. Actually, I usually reccomend that people watch Kevin’s webcast entitled “Oracle Exadata Storage Server Technical Deep Dive. Part I” (I also made an iPhone friendly version of it here with chapter markers) first, as it gives you a 1 hour intro to the concepts. After watching that, buy (and read) the book.

To the authors, thank you for thanking me! They added a nice thank you section to the “Unofficial Editors” in which I was mentioned by name. This was a very nice touch and I sincerely appreciate it. They helped me a whole lot more than I helped them.

Posted in Exadata, Oracle | 3 Comments »

 
Follow

Get every new post delivered to your Inbox.

Join 44 other followers