Tyler Muth’s Blog

Technology with a focus on Oracle, Application Express and Linux

Convert sqlplus spool output to CSV

Posted by Tyler Muth on April 11, 2012

I’ve been doing some work lately with sizing / capacity planning where the source data comes from a few of Karl Arao’s excellent AWR Mining scripts. Had I been reading his blog more, I would have known he was going down a parallel path using Tableau while I was moving in the direction of R. Anyway, I wanted to get the output of these sqlplus scripts, which spool to a file, into CSV format so I could import them into R as data frames. Yes, R can deal with fixed-width data, but not nearly as well as CSV. If it were just a few files, wrangling the data into CSV wouldn’t be that hard. However, I had ~40 files and I plan on doing this a lot more, so it had to be easy and fast. I looked at a couple of existing solutions, such as the sqlplus colsep directive combined with sed, and XMLAGG, but each had its drawbacks.

For me, perl seemed like the best fit, as it’s easy, cross-platform, and made for text files. I only used one module, and it’s one that’s installed by default on most platforms. Anyway, what started as about 10 lines grew to about 100 lines as I had to handle edge cases. It could be a lot shorter, but it’s intentionally verbose, as I’ll trade compactness for readability every time. By default it automatically uses the column headings as the headings in the CSV output.

OK, enough rambling. Here’s the script on dropbox. I’ll also post the whole script at the end of this post, for viewing and for people who can’t get to dropbox-hosted files. Here are my test case files used in the following examples.

If you don’t supply parameters it returns the usage instructions:

$ perl sqlplus_spool_to_csv.pl
usage: program [--infiles "file.txt" | "*.out"] [--headers [AUTO] | FILE | NONE] [--header_file header.txt] [--help|-?]

This will generate a CSV file for every file matching “a-*.out”:

$ perl sqlplus_spool_to_csv.pl --infiles "a-*.out"
a-countries.out
a-departments.out
a-employees.out
a-employees_repeat_headers.out
a-locations.out

$ ls *.csv
a-countries.out.csv  a-departments.out.csv  a-employees.out.csv  a-employees_repeat_headers.out.csv  a-locations.out.csv

$ head -7 a-countries.out
hr@orcl> select * from countries;

CO COUNTRY_NAME                              REGION_ID
-- ---------------------------------------- ----------
AR Argentina                                         2
AU Australia                                         3
BE Belgium                                           1

$ head -4 a-countries.out.csv
"CO","COUNTRY_NAME","REGION_ID"
"AR","Argentina","2"
"AU","Australia","3"
"BE","Belgium","1"

This example will glue on the headers from a custom text file:

$ perl sqlplus_spool_to_csv.pl --infiles "b-*.out" --headers FILE --header_file employees_headers.txt
b-employees.out
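
The post doesn’t show the header file itself, but since the FILE option writes the file’s first line to the CSV verbatim, a hypothetical employees_headers.txt would simply hold the finished header row, something like:

"EMPLOYEE_ID","FIRST_NAME","LAST_NAME","EMAIL","HIRE_DATE","JOB_ID","SALARY"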

On my laptop it only took 25 seconds to convert a 100M file to CSV (a directory of 40 smaller files took under 5 seconds). Here’s the 100M file compressed to 17K in case you want it:

$ ls -lh employees_big.out
-rw-r--r--+ 1 tmuth None 100M Apr 11 15:31 employees_big.out

$ wc -l employees_big.out
556200 employees_big.out

$ time perl sqlplus_spool_to_csv.pl --infiles employees_big.out
employees_big.out

real    0m25.047s
user    0m24.289s
sys     0m0.467s
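
If you can’t grab the compressed file and want to reproduce the benchmark, something like the hypothetical helper below (not part of the original post) inflates one of the small test files by repeating its contents. Repeated headers are fine, since the converter skips them, but strip any trailing “rows selected” footer from the source first, because the converter stops reading at that line:

#!/usr/bin/perl
# Hypothetical helper (not from the original post): grow a small spool file
# into a ~100MB benchmark file by repeating its contents.
use warnings;
use strict;

my ($src, $dst, $target_mb) = ('a-employees.out', 'employees_big.out', 100);

open my $in, '<', $src or die "$!";
my $chunk = do { local $/; <$in> };   # slurp the whole source file
close $in;

open my $out, '>', $dst or die "$!";
my $written = 0;
while ( $written < $target_mb * 1024 * 1024 ) {
	print $out $chunk;
	$written += length $chunk;
}
close $out;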

OK, here’s the perl source in case you just want to take a peek or can’t download from dropbox:

#! /usr/bin/perl
# Author: Tyler D Muth - http://tylermuth.wordpress.com
# Version: 1.1

use warnings;
use strict;
use Getopt::Long;
my (@infiles, $headers, $header_file);
usage() if ( @ARGV < 1 or ! GetOptions('infiles=s@' => \@infiles, 'headers=s' => \$headers, 'header_file=s' => \$header_file) );

$headers = defined($headers) ? $headers : 'AUTO'; $headers=uc($headers);

file_loop();

sub usage
{
  print "Unknown option: @_\n" if ( @_ );
  print qq/usage: program [--infiles "file.txt" | "*.out"] [--headers [AUTO] | FILE | NONE] [--header_file header.txt] [--help|-?]\n/;
  exit;
}

sub file_loop{
	for my $in_file ( map {glob($_)} @infiles ) {
		print $in_file, "\n";
		my $unpack_layout =''; my $header_titles=''; my $headers_mapped='';my $dashes_line='';
		my $header_line_new='';
		open ( INFILE, "<$in_file" ) or die "$!";
		open ( OUTFILE, ">$in_file.csv" ) or die "$!";
		get_unpack_layout(*INFILE,*OUTFILE,$unpack_layout,$dashes_line,$header_line_new);
		chomp $header_line_new;
		get_header_titles($header_line_new,$header_titles);

		if($headers eq 'AUTO'){
			$headers_mapped=text_line_to_csv($header_titles,$unpack_layout);
		}else{
			$headers_mapped=$header_titles;
		}
		print OUTFILE "$headers_mapped\n" if $headers ne "NONE";
		while( my $line = <INFILE> ){
			chomp $line;
			last if ($line =~ /^[[:digit:]]+[[:space:]]rows[[:space:]]selected.*/);	# this will be an issue for non-english sqlplus
			if($line ne $dashes_line && $line ne $header_line_new && length($line) > 2){
				print OUTFILE text_line_to_csv($line ,$unpack_layout) . "\n";
			}
		}
		close INFILE;
		close ( OUTFILE );
	};
}

sub get_unpack_layout{
	my $infile_handle=$_[0];my $outfile_handle=$_[1];my $unpack_layout_int='';
	my $header_line='';
	while( my $line = <$infile_handle> ){
		if ($line =~ /-{3,}.*/) { # if the line contains 3 or more consecutive dashes (may cause problems for < 3 character leading columns)
			chomp $line;
			$_[3]=$line; #return the line of dashes so we can skip them if repeated
			my @columns = split(' ', $line);
			foreach my $col (@columns) {
				my $col_length = length($col)+1;
				$unpack_layout_int .= "A$col_length ";
			}
			$_[2]=$unpack_layout_int;
			$_[4]=$header_line;
			last;
		}
		$header_line=$line;
	}
}
#----------------------------------------------------------
sub get_header_titles{
	my $header_titles_in=$_[0]; my $header_titles_return='';my $dummy_line='';
	$header_titles_return=$header_titles_in if $headers eq 'AUTO';
	if($headers eq 'FILE'){
		open ( HEADERS_FILE, "<$header_file" ) or die "$!";
		$header_titles_return = <HEADERS_FILE>;
		close ( HEADERS_FILE);
	}
	chomp $header_titles_return; $_[1]=$header_titles_return;
}
#----------------------------------------------------------
sub text_line_to_csv{
	my $in_line=$_[0]; my $in_unpack_layout=$_[1];

	my @f = unpack("$in_unpack_layout", $in_line);
	my $out_text = join ',', map {
		s/^\s+//;   # strip leading spaces
		s/\s+$//;   # strip trailing spaces
		s/#+/ /g;   # replace runs of pound signs (#) with a space
		qq/"$_"/ } @f;

	print "$out_text \n" if $out_text =~ m/\#+/g;
	return $out_text;
}

12 Responses to “Convert sqlplus spool output to CSV”

  1. Greg Rahn said

    Any reason not to embed the query in R and use RJDBC, then save the data frame to a file like such:
    https://gist.github.com/2382469

    • Tyler Muth said

      Greg,

      I actually got started with R using your ASH example. My use case involves databases of customers where I can’t connect directly. Will send details via email.

      Thanks,
      Tyler

  2. Eric Pement said

    Using the test case input files provided in test_files.zip, I do not get these results. The input filenames are echoed to the console as each file is processed, but all of the output files have zero bytes in length. Not a single warning or error message is sent to the console. I used the identical command sequence:

    c:\temp>perl sqlplus_spool_to_csv.pl --infiles a*.out
    a-countries.out
    a-departments.out
    a-employees.out
    a-employees_repeat_headers.out
    a-locations.out

    c:\temp>dir a*
    Volume in drive C is OS
    Volume Serial Number is 3486-61FA

    Directory of c:\temp

    2012-04-11 02:48 PM 1,592 a-countries.out
    2012-04-26 05:09 PM 0 a-countries.out.csv
    2012-04-11 02:48 PM 2,083 a-departments.out
    2012-04-26 05:09 PM 0 a-departments.out.csv
    2012-04-11 02:47 PM 20,341 a-employees.out
    2012-04-26 05:09 PM 0 a-employees.out.csv
    2012-04-10 05:20 PM 40,173 a-employees_repeat_headers.out
    2012-04-26 05:09 PM 0 a-employees_repeat_headers.out.csv
    2012-04-11 02:49 PM 3,255 a-locations.out
    2012-04-26 05:09 PM 0 a-locations.out.csv
    10 File(s) 67,444 bytes
    0 Dir(s) 66,177,171,456 bytes free

    I got these results with Strawberry Perl version 5, subversion 12.

    I also copied these files to a Cygwin environment (Perl 5, subversion 14), and received the same results: output files created with a zero-length filesize and no warnings. What could be missing here? Strange that it fails both under Win7 CMD and under Cygwin . . . Feedback welcome.

    • Tyler Muth said

      Can you try putting quotes on the file wild cards?

      Thanks, Tyler

      • Eric Pement said

        I tried double-quoting the filename arguments, and using a single filename also (perl script --infiles foo.txt) or (perl script --infiles “foo.txt”) and alternately appending “--headers AUTO” in both Cygwin/bash and from a Win7/CMD shell prompt, and neither one worked for me. The thing that did work was not using “get_extra_data” and restoring the commented code block.

        I must say, I am glad for all the work you did. And thanks a bunch for the super-fast response!!

    • Eric Pement said

      I found the answer to my own question, eventually.

      The solution is to comment OUT the function “get_extra_data” on line #34, and to un-comment lines #36 through #52, restoring the Perl script to sorta what it looks like on the display above these comments.

      Yup, that’s “sorta”, because both the colorized “perl source” display and the “view source” display are corrupted: they can’t grok the presence of left angle brackets (less-than signs) in the perl code, trying to interpret them as HTML directives instead. As a result, the visible, colorized Perl code on the main page cannot be trusted either.

      Finally, the compressed 100MB test case .7z archive is corrupted, beginning on line #109 and recurring every 108 or 109 lines. The smaller test file archive files are fine.

      • Tyler Muth said

        Eric,

        So, I added the get_extra stuff to what I thought was a copy of this script. I’ll try to restore the original tomorrow. Nice catch. Also, nice debugging on the other issues. I’ll look into them tomorrow as well and update the post. I was hoping someone with more perl skills than I possess would critique and contribute to this. I’m just sorry you had to waste time even getting it to work.

        Thanks, Tyler

  3. We have never tried to put this into any program, but this sounds interesting. We will try to get this into our Java application. Now there’s no need for a third-party tool to convert data from a query to a file.

  4. Arr Gee said

    Very helpful. I needed to do similar and used this code as the basis for a filter. Couldn’t access the drop box, so the perl source as shown needed a little editing.

    while( my $line = ){
    should be replaced with
    while( my $line = <INFILE> ){

    or similar

    As I made it a filter I simplified to while( my $line = <> ){

    The line
    open ( HEADERS_FILE, "<$header_file" ) or die "$!"; $header_titles_return = <HEADERS_FILE>;

    needed correcting from above, but as I didn’t have a header file I removed it.

    Thanks

  5. Just for an example, I have created similar stuff, but in pure SQL: “SQL based export data to file” (http://damir-vadas.blogspot.com/2010/01/export-table-to-filecsv.html), so someone might find it interesting.

    Tested on an export of 100 million rows, with 28 GB of exported text, no problems. :-)

