Posts Tagged ‘csv’

“Refresh” a CSV File

Posted by Tyler Muth on February 5, 2013

In a previous post I discussed a PDF reader that allows you to refresh the PDF after changes are made. This is the same concept except it’s for CSV files. I’ve been using Perl to convert sqlplus spool output to CSV which is much easier to read into R. As I’m updating my code and regenerating sample CSV files, it’s really annoying that Excel gets an exclusive lock on the file. So, I would open it in Notepad++ or vim, but it’s much harder in those tools to pick out the columns and rows to find errors.

CSVed to the rescue!!! Here’s the website for CSVed. It’s a great, simple CSV viewer / editor and it does not lock the file. You simply press the “reload” button or ctrl+r and it reloads the file. Again, so simple but so useful for my use-case.

Posted in Oracle, R | Tagged: csv, Oracle, perl, r | 1 Comment »

Convert sqlplus spool output to CSV

Posted by Tyler Muth on April 11, 2012

I’ve been doing some work lately with sizing / capacity planning where the source are a few of Karl Arao’s excellent AWR Mining scripts. Had I been reading his blog more I would have known he was going down a parallel path using Tableau as I was moving in the direction of R. Anyway, I wanted to get the output of these sqlplus scripts which spool to a file into CSV format to import into R as data frames. Yes, R can deal with fixed-width data, but not nearly as well as CSV. If it were just a few files, wrangling the data into CSV isn’t that hard. However, I had ~40 files and I plan on doing this a lot more so it had to be easy and fast. I looked at a couple of existing solutions such as the sqlplus colsep directive combined with sed or XMLAGG. But each had it’s drawbacks.

For me, perl seemed like the best fit as it’s easy, cross platform, and made for text files. I only used one module and it’s one that’s installed by default on most platforms. Anyway, what started as about 10 lines grew to about a 100 lines as I had to handle edge cases. I could be a lot less, but it’s intentionally verbose as I’ll trade compactness for readability every time. By default it automatically uses the column headings as the headings in the CSV output.

OK, enough rambling. Here’s the script on dropbox. I’ll post the whole script at the end of this post for viewing and for people that can’t get to dropbox hosted files. Here are my test case files used in the following examples.

If you don’t supply parameters it returns the usage instructions:

$ perl sqlplus_spool_to_csv.pl
usage: program [--infiles "file.txt" | "*.out"] [--headers [AUTO] | FILE | NONE] [--header_file header.txt] [--help|-?]

This will generate a csv file for every file like “a-*.out”:

$ perl sqlplus_spool_to_csv.pl --infiles "a-*.out"
a-countries.out
a-departments.out
a-employees.out
a-employees_repeat_headers.out

$ ls *.csv
a-countries.out.csv  a-departments.out.csv  a-employees.out.csv  a-employees_repeat_headers.out.csv  a-locations.out.csv

$ head -7 a-countries.out
hr@orcl> select * from countries;

CO COUNTRY_NAME                              REGION_ID
-- ---------------------------------------- ----------
AR Argentina                                         2
AU Australia                                         3
BE Belgium                                           1

$ head -4 a-countries.out.csv
"CO","COUNTRY_NAME","REGION_ID"
"AR","Argentina","2"
"AU","Australia","3"
"BE","Belgium","1"

This example will glue on the headers from a custom text file:

$ perl sqlplus_spool_to_csv.pl --infiles "b-*.out" --headers FILE --header_file employees_headers.txt
b-employees.out

On my laptop it only took 25 seconds to convert a 100M file to CSV (a directory of 40 smaller files took under 5 seconds). Here’s the 100M filecompressed to 17K in case you want it.:

$ ls -lh employees_big.out
-rw-r--r--+ 1 tmuth None 100M Apr 11 15:31 employees_big.out

$ wc -l employees_big.out
556200 employees_big.out

$ time perl sqlplus_spool_to_csv.pl --infiles employees_big.out
employees_big.out

real    0m25.047s
user    0m24.289s
sys     0m0.467s

OK, here’s the perl source in case you just want to take a peak or can’t download from dropbox:

#! /usr/bin/perl
# Author: Tyler D Muth - https://tylermuth.wordpress.com
# Version: 1.1

use warnings;
use strict;
use Getopt::Long;
my (@infiles, $headers, $header_file);
usage() if ( @ARGV < 1 or           ! GetOptions('infiles=s@' => \@infiles, 'headers=s' => \$headers, 'header_file=s' => \$header_file));

$headers = defined($headers) ? $headers : 'AUTO'; $headers=uc($headers);

file_loop();

sub usage
{
  print "Unknown option: @_\n" if ( @_ );
  print qq/usage: program [--infiles "file.txt" | "*.out"] [--headers [AUTO] | FILE | NONE] [--header_file header.txt] [--help|-?]\n/;
  exit;
}

sub file_loop{
	for my $in_file ( map {glob($_)} @infiles ) {
		print $in_file, "\n";
		my $unpack_layout =''; my $header_titles=''; my $headers_mapped='';my $dashes_line='';
		my $header_line_new='';
		open ( INFILE, "$in_file.csv" ) or die "$!";
		get_unpack_layout(*INFILE,*OUTFILE,$unpack_layout,$dashes_line,$header_line_new);
		chomp $header_line_new;
		get_header_titles($header_line_new,$header_titles);

		if($headers eq 'AUTO'){
			$headers_mapped=text_line_to_csv($header_titles,$unpack_layout);
		}else{
			$headers_mapped=$header_titles;
		}
		print OUTFILE "$headers_mapped\n" if $headers ne "NONE";
		while( my $line =  ){
			chomp $line;
			last if ($line =~ /^[[:digit:]]+[[:space:]]rows[[:space:]]selected.*/);	# this will be an issue for non-english sqlplus
			if($line ne $dashes_line && $line ne $header_line_new && length($line) > 2){
				print OUTFILE text_line_to_csv($line ,$unpack_layout) . "\n";
			}
		}
		close INFILE;
		close ( OUTFILE );
	};
}

sub get_unpack_layout{
	my $infile_handle=$_[0];my $outfile_handle=$_[1];my $unpack_layout_int='';
	my $header_line='';
	while( my $line =  ){
		if ($line =~ /-{3,}.*/) { #if the line starts with 3 or more dashes (may cause problems for < 3 character leading columns
			chomp $line;
			$_[3]=$line; #return the line of dashes so we can skip them if repeated
			my @columns = split(' ', $line);
			foreach my $col (@columns) {
				my $col_length = length($col)+1;
				$unpack_layout_int .= "A$col_length ";
			}
			$_[2]=$unpack_layout_int;
			$_[4]=$header_line;
			last;
		}
		$header_line=$line;
	}
}
#----------------------------------------------------------
sub get_header_titles{
	my $header_titles_in=$_[0]; my $header_titles_return='';my $dummy_line='';
	$header_titles_return=$header_titles_in if $headers eq 'AUTO';
	if($headers eq 'FILE'){
		open ( HEADERS_FILE, "		$header_titles_return = ;
		close ( HEADERS_FILE);
	}
	chomp $header_titles_return; $_[1]=$header_titles_return;
}
#----------------------------------------------------------
sub text_line_to_csv{
	my $in_line=$_[0]; my $in_unpack_layout=$_[1];

	my @f = unpack("$in_unpack_layout", $in_line);
	my $out_text = join ',', map {
		s/^\s+//;  		 # strip leading spaces
			  s/\s+$//;  # strip trailing spaces
			  s/#+/ /g;  # replace any pound signs (#) in the line
		qq/"$_"/ } @f;

	print "$out_text \n" if $out_text =~ m/\#+/g;
	return $out_text;
}

Posted in Oracle | Tagged: csv, Oracle, sqlplus | 14 Comments »

	Veasna MOK on AWR Formatter 1.6 Release…
	lohit on Synonym Switching Technique an…
	Praful on Oracle Wallet w/ Self-Signed…
	Praful on Oracle Wallet w/ Self-Signed…
	RichardLam on PL/SQL Associative Arrays

Tyler Muth’s Blog

Technology with a focus on Oracle, Application Express and Linux

Recent Posts

Categories

Recent Comments

Archives

Email Subscription

Blogroll

Meta

Subscribe