From @drax.isi.edu:rogers@isi.edu Tue May 17 23:47:34 1994 To: Tim Bunce Subject: Re: Extracting data from spreadsheet (.wk3 or .xls) files? In-Reply-To: Your message of "Sat, 14 May 1994 15:14:17 BST." <9405141521.aa08369@post.demon.co.uk> Date: Tue, 17 May 94 14:52:14 PDT From: Craig Milo Rogers #!/home/oraboff/bin/coraperl 'di'; 'ig00'; # # sql 24 Jun 93 Craig Milo Rogers at USC/ISI # 1) The -m flag restricts the process' umask, so we can create # confidential files in confidence. # 2) Corrected some typos in comments. # 3) Save and restore umask. # sql 23 Jun 93 Craig Milo Rogers at USC/ISI # 1) Use the custom Perl in /home/oraboff/bin. # 2) The -s option produces comma-separated quoted strings. # 3) Make the -f formatting use the -d ($,) field delimiter. # forgo the pleasure of "+" between fields on the underline line. # 4) Right justify numeric fields (determined by &ora_types()). # 5) The default field delimiter is now a space, like sqlplus. # 6) The -F switch specifies part of a perl format string. # 7) Make formatted data (but no headers) be the default: # remove the original -f option, rename -F to -f, add -u # for unformated data. # 8) Added "-o file" to name the output file. # 9) -r reports the number of records processed. # 10) -w is an initial hack at writing Lotus 1-2-3 .WK1 files. # Most of this code belongs in a seperate package. Note # that the output data has to be buffered because we need to # know how many rows there are before we can write the data # (assuming we want to be pipe-compatable). # sql # # Script to run an Oracle statement from the command line. # # Parameters (* = mandatory) # # -#debug debugging control string (must be first argument) # -b base database to use (default $ENV{'ORACLE_SID'}) # -c cache SQL fetch cache size # -d delim specifies the field delimiter (default SPACE) # -f format format data using a perl @<<<... string. # -h add headers # -l page_len lines per page, only used by -f (default 60) # -m umask restrict access to the output file. # -n string replace NULL fields by string # -o file output data to specified file. # -r report the number of records selected. # -s produce comma-separated quoted strings # -u produce unformatted data. # -w produce Lotus 1-2-3 .WK1 data. # name/pass * Oracle username and password # stmt Oracle statement to be executed # read from stdin if not given on command line # # Data formatting is assumed, unless the -s, -u, or -w flags are specified. # The -w flag is compatible with -f, but -s and -u are not. # # TODO: # 1) Seperate the .WK1-generation code into a separate package. # 2) Consider generating pseudo-ROWNUMs (real ROWNUMs don't work # well in GROUPs) to make Lotus Improv happier when importing. # 3) Consider supporting date values, etc. Perhaps the -w flag # should take a special format string as an argument? # 4) Rewrite the attached documentation! # 5) How about multi-statement execution? COMMIT_OR_ROLLBACK? # 6) Since this version is so radically different from the original, # and has incompatible flags, perhaps I should rename it before # distributing it? "nsql"? # 7) I'd like an enhancement to the underlying oraperl code so I # can tell how many records were modified by, say, an update. # 8) Oracle's sqlplus product does a lot of stuff that this script # doesn't do: # 1) page footers # 2) page headers separate from column headers # 3) breaks, computes # 4) more column formatting options # 5) user-controlled macro variables # 9) Are there things I could do to make it easier to interface # Oracle output with other programs? # 1) tex or latex # 2) troff, tbl, etc. # 3) gnuplot # 4) FrameMaker # 10) Should I support more general translation in the field delimiter # strings? # # Original attribution: # # Written in response to in alt.sources.wanted. # # Author: Kevin Stock # Date: 18th November 1991 # Last change: 9th June 1993 # $ora_debug = shift if $ARGV[0] =~ /^-#/; # Who uses the following value, and can it be reformatted to # avoid line wrap? $USAGE = <<; [-bbase] [-ccache] [-ddelim] [-Fformat] [-h] [-lpage_len] [-mumask][-nstring] [-o file] [-r] [-s] [-u] [-w] name/pass [stmt] require 'getopts.pl'; # option parsing do Getopts('b:c:d:f:hl:mn:o:rsuw'); die "$0: only one of -f -s -u may be specified\n" if ((defined($opt_f) + defined($opt_s) + defined($opt_u)) > 1); die "$0: only one of -s -u -w may be specified\n" if ((defined($opt_s) + defined($opt_u) + defined($opt_w)) > 1); # Special hack: "-d \t" (literal slash and t) is translated into a # real tab. $opt_d = "\t" if defined($opt_d) && $opt_d eq "\\t"; die "$0: -d may not be \\, @ or ^\n" if defined($opt_d) && ($opt_d =~ m/\\|\@|\^/); $formatted = 1 if (! defined($opt_s) && ! defined($opt_u)); $USER = shift || die "user/password not specified\n"; if ($#ARGV >= 0) { @stmt = @ARGV; } else { print "Enter the statement to execute (^D to end):\n"; @stmt = ; } $, = " "; # The default delimiter is a space, $, = "," if defined($opt_s); # but it's comma if -s is specified. $, = $opt_d if defined($opt_d); # Explicitly set the column delimiter. $\ = "\n"; # each record terminated with newline $db = $opt_b if defined($opt_b); # set database $ora_cache = $opt_c if defined($opt_c); # set fetch cache # Select the destination for header and data output. It can be # STDOUT or a specified file. We'll store the desired filehandle # into a string variable, which we'll use in the output statements # to select output. All this ensures that error messages, etc. # and the data stream are routed properly. if ($opt_o) { # Send header and data output to the specified file: if ($opt_m) { $saveumask = umask; # Save current umask. umask(umask & oct($opt_m)); # Restrict access further. } open (OUTFILE, ">" . $opt_o) || die "Can't open $opt_o: $!\n"; $OUTHANDLE = "OUTFILE"; umask($saveumask) if $opt_m; # Restore umask value. } else { # Send header and data output to STDOUT: $OUTHANDLE = "STDOUT"; } select((select($OUTHANDLE), $= = $opt_l)[0]) if defined($opt_l); # set page length. # log into the database and execute the statement $lda = &ora_login($db, $USER, '') || die "$ora_errstr\n"; $csr = &ora_open($lda, "@stmt") || die "$ora_errstr\n"; # print out any information which comes back if (($nfields = &ora_fetch($csr)) > 0) # does the statement return data? { # Will we format the data? if ($formatted) { if ($opt_f) { # Analyze the supplied format string. The components # must be separated by white space. Assume right # justification if we see a ">" or "#". Setup # dummy values for @types: a format with "#" # implies numbers, all others imply text: @fstring = split(/[ \t\n]+/, $opt_f); @lengths = @just = @types =(); foreach $i (0 .. $nfields - 1) { $lengths[$i] = length $fstring[$i]; $just[$i] = "<"; $just[$i] = ">" if $fstring[$i] =~ /\>|\#/; $types[$i] = 1; $types[$i] = 2 if $fstring[$i] =~ /\#/; } } else { # Determine format from the database. Numeric # fields (types 2, 3, 4, 7) are right justified, # all others are left justified: @lengths = &ora_lengths($csr); @types = &ora_types($csr); @just = @fstring = (); foreach $i (0 .. $nfields - 1) { $just[$i] = "<"; $just[$i] = ">" if ($types[$i] == 2) || ($types[$i] == 3) || ($types[$i] == 4) || ($types[$i] == 7); $fstring[$i] = "@" . ($just[$i] x ($lengths[$i] - 1)); } } } # Do we need a header and/or perl formats? if ($formatted & ! $opt_w) { # Start building a header format string (may be null): if ($opt_h) { # Build the list of field names, # in columns of the appropriate width. # Note: this code will fail if ${,} eq "\\". $fmt = ''; foreach $i (0 .. $nfields - 1) { $fmt .= "%"; $fmt .= "-" if $just[$i] eq "<"; $fmt .= $lengths[$i] . "." . $lengths[$i] . "s"; $fmt .= "%" if ${,} eq "%"; $fmt .= ${,}; } chop $fmt; chop $fmt if ${,} eq "%"; $format = "format ${OUTHANDLE}_TOP = \n" . sprintf($fmt, &ora_titles($csr, 0)) . "\n"; # Then underline the field names. $fmt = ''; foreach $i (0 .. $nfields - 1) { $fmt .= ("-" x $lengths[$i]) . ${,}; } chop $fmt; $format .= $fmt . "\n.\n"; } else { # No header format: $format = "format ${OUTHANDLE}_TOP = \n.\n"; } # Next for the data format, a @<<... or @>>.. field per # column. # Note: this code may fail if ${,} is "@" or "^". $format .= "format ${OUTHANDLE} =\n" . join("${,}", @fstring) . "\n"; # Finally the variable associated with each column # Why doesn't Perl let us specify an array here? foreach $i (0 .. $nfields - 1) { $format .= "\$result[$i],"; } chop $format; # remove extraneous comma $format .= "\n.\n"; eval($format); } elsif ($opt_s && $opt_h) { # Field names enclosed in quotes, separated by commas: @titles = &ora_titles($csr, 0); grep(s/ *$//, @titles); print($OUTHANDLE '"' . join('"' . ${,} . '"', @titles) . '"'); } elsif ($opt_u && $opt_h) { # Simple headers with underlines: @titles = &ora_titles($csr, 0); grep(s/ *$//, @titles); print($OUTHANDLE @titles); grep(tr//-/c, @titles); print($OUTHANDLE @titles); } elsif ($opt_w) { # Prepare to save the .WK1 records: &wk1_init(); } # Now we'll fetch the data, one record at a time: $nrecords = 0; while (@result = &ora_fetch($csr)) { # Fetched a record: $nrecords++; # Override the NULL value (when selected): grep(defined $_ || ($_ = $opt_n), @result) if $opt_n; # Write, using the appropriate formatting: if ($opt_u) { print($OUTHANDLE @result); } elsif ($opt_s) { print($OUTHANDLE '"' . join('"' . ${,} . '"', @result) . '"'); } elsif ($opt_w) { &wk1_write(); } else { write($OUTHANDLE); } } warn "$ora_errstr\n" if ($ora_errno != 0); } printf($OUTHANDLE "\n$nrecords records selected.\n\n") if ($opt_r); # finish off neatly &wk1_finish() if $opt_w; do ora_close($csr); do ora_logoff($lda); close OUTFILE if $opt_o; # *************************************************************************** # # This following reoutines don't belong here. They should be in a # seperate package, and they shouldn't access global variables for things # such as the output filehandle; no, siree. Nonetheless, I'm running out # of time, so here's the routines for writing a Lotus 1-2-3 .WK1 file. # # One record type in the .WK1 file identifies the range of data # the file. I believe that this record must appear ner the top of the # .WK1 file. This is unfortunate, because we won't know how many data # records there are until we're done fetching them. So we'll buffer up # up the output in memory until were' done, then we'll patch up the # data range declaration record before we write to the output filehandle. # # Most of the information show here came from the document "Worksheet # File Format from Lotus", Dec/. 1984, available here and there on the # Internet. Some usage patterns, however, were deduced from .WK1 files # produced by Lotus Improv running on a NeXTstation. Your mileage may # vary... sub wk1_init { # Prepare to buffer .WK1 records: @wk1_buf = (); $#wk1_buf = 1000; $wk1_nrec = 0; &wk1_BOF(); # Start the .WK1 file. &wk1_RANGE($nfields - 1, 0); &wk1_CALCMODE(0xff); &wk1_CALCORDER(0); &wk1_CALCCOUNT(1); &wk1_SPLIT(0); &wk1_WINDOW1(); foreach $i (0 .. $nfields - 1) { # Adding 1 to the length made Improv happier... &wk1_COLW1($i, $lengths[$i] + 1); } if ($opt_h) { @titles = &ora_titles($csr, 0); foreach $i (0 .. $nfields - 1) { &wk1_TITLE($i, $titles[$i], $just[$i]); } } # The offset between the record counter in the main code and # the .WK1 value depends upon whether or not we output a # set of header labels: $wk1_loff = defined($opt_h) ? 0 : 1; } sub wk1_write { # Store the data values: foreach $i (0 .. $nfields - 1) { if ($types[$i] == 1) { &wk1_TEXT($nrecords - $wk1_loff, $i, $result[$i], $just[$i]); } else { &wk1_NUMBER($nrecords - $wk1_loff, $i, $result[$i]); } } } sub wk1_finish { &wk1_EOF(); # Overwrite the RANGE record with a correct one: &wk1_reRANGE(1, $nfields - 1, $nrecords - 1); # Write the buffered records: binmode($OUTHANDLE); undef $\; for ($i = 0; $i < $wk1_nrec; $i++) { print($OUTHANDLE $wk1_buf[$i]); } } # The following routines are in record type (I called it "opcode") order. # With a couple of exceptions, there's one routine for each record type # that we might choose to output. sub wk1_BOF # () 0: Beginning of file. { $#wk1_buf += 1000 if ($wk1_nrec == ($#wk1_buf + 1)); $wk1_buf[$wk1_nrec++] = pack("CCCCCC", 0, 0, # opcode 0 2, 0, # 2 body octets 4, 4 # Lotus 1-2-3 identifier. ); } sub wk1_EOF # () 1: End of file. { $#wk1_buf += 1000 if ($wk1_nrec == ($#wk1_buf + 1)); $wk1_buf[$wk1_nrec++] = pack("CCCC", 1, 0, # opcode 1 0, 0 # no body ); } sub wk1_CALCMODE # (modeflag) 2: Calculation mode. { $#wk1_buf += 1000 if ($wk1_nrec == ($#wk1_buf + 1)); $wk1_buf[$wk1_nrec++] = pack("CCCCC", 2, 0, # opcode 2 1, 0, # 1 body octet $_[0] # calculation mode flag ); } sub wk1_CALCORDER # (orderflag) 3: Calculation order. { $#wk1_buf += 1000 if ($wk1_nrec == ($#wk1_buf + 1)); $wk1_buf[$wk1_nrec++] = pack("CCCCC", 3, 0, # opcode 3 1, 0, # 1 body octet $_[0] # calculation order flag ); } sub wk1_SPLIT # (splitmode) 4: Split into 2 windows? { $#wk1_buf += 1000 if ($wk1_nrec == ($#wk1_buf + 1)); $wk1_buf[$wk1_nrec++] = pack("CCCCC", 4, 0, # opcode 4 1, 0, # 1 body octet $_[0] # split screen flag ); } sub wk1_RANGE # (endcol, endrow) 6: Range of data in file. { $#wk1_buf += 1000 if ($wk1_nrec == ($#wk1_buf + 1)); $wk1_buf[$wk1_nrec++] = pack("CCCCCCCCCCCC", 6, 0, # opcode 6 8, 0, # 8 body octets 0, 0, # starting column 0, 0, # starting row $_[0] % 256, $_[0] >> 8, # ending column $_[1] % 256, $_[1] >> 8 # ending row ); } sub wk1_reRANGE # (bufidx, endcol, endrow) 6: Range of data in file. { $wk1_buf[$_[0]] = pack("CCCCCCCCCCCC", 6, 0, # opcode 6 8, 0, # 8 body octets 0, 0, # starting column 0, 0, # starting row $_[1] % 256, $_[1] >> 8, # ending column $_[2] % 256, $_[2] >> 8 # ending row ); } sub wk1_WINDOW1 # () 7: Window1 parameters. { # I confess: I don't know what some of this means. I lifted # the values directly from a Lotus Improv .WK1 file. $#wk1_buf += 1000 if ($wk1_nrec == ($#wk1_buf + 1)); $wk1_buf[$wk1_nrec++] = pack("CCCC" . ("C" x 31), 7, 0, # Opcode 7 31, 0, # 31 body octets 0, 0, # cursor column position 0, 0, # cursor row position 0x10, 0, # Format, unused 9, 0, # column width 6, 0, # columns on screen 20, 0, # rows on screen 0, 0, # left column 0, 0, # top row 0, 0, # number of title columns 0, 0, # number of title rows 0, 0, # left title column 0, 0, # top title row 4, 0, # border width column 4, 0, # border width row 80, 0, # window width 0 # unused ); } sub wk1_COLW1 # (colnum, width) 8: Column width. { $#wk1_buf += 1000 if ($wk1_nrec == ($#wk1_buf + 1)); $wk1_buf[$wk1_nrec++] = pack("CCCCCCC", 8, 0, # opcode 8 3, 0, # 3 body octets $_[0] % 256, $_[0] >> 8, # Column number $_[1] % 256 # column width ); } sub wk1_NUMBER # (rownum, colnum, val) 14: 64-bit Float number. { # This routine is a dreadfull hack. It works on a Sun 4; it # may or may not work for you. If it doesn't work for you, try # replacing the joun/reverse/split with just $flt. $#wk1_buf += 1000 if ($wk1_nrec == ($#wk1_buf + 1)); $flt = pack("d", ($_[2] + 0.00)); # Convert the floating point number. $wk1_buf[$wk1_nrec++] = pack("CCCCCCCCC" . "a8", 14, 0, # opcode 14 13, 0, # 13 body octets 0xf1, # format code $_[1] % 256, $_[1] >> 8, # Column number $_[0] % 256, $_[0] >> 8, # Row number join('', reverse(split(//, $flt))) ); } sub wk1_TITLE # (colnum, label, just) 15: Header label. { # Note: the values for the "just" parameter were selected to # harmonize with perl FORMAT strings. $#wk1_buf += 1000 if ($wk1_nrec == ($#wk1_buf + 1)); $len = length($_[1]) + 2; # leading prime and trailing null $wk1_buf[$wk1_nrec++] = pack("CCCCCCCCCa${len}", 15, 0, # opcode 15 ($len + 5) % 256, ($len + 5) >> 8, # body octets 0xf1, # format code $_[0] % 256, $_[0] >> 8, # Column number 0, 0, # Row number ($_[2] eq "<" ? "'" : '"') . $_[1] # actual title, preceeded # by justification flag. ); } sub wk1_TEXT # (rownum, colnum, label, just) 15: Text in data. { $#wk1_buf += 1000 if ($wk1_nrec == ($#wk1_buf + 1)); $len = length($_[2]) + 2; # leading prime and trailing null $wk1_buf[$wk1_nrec++] = pack("CCCCCCCCCa${len}", 15, 0, # opcode 15 ($len + 5) % 256, ($len + 5) >> 8, # body octets 0xf1, # format code $_[1] % 256, $_[1] >> 8, # Column number $_[0] % 256, $_[0] >> 8, # Row number ($_[3] eq "<" ? "'" : '"') . $_[2] # actual title, preceeded # by justification flag. ); } sub wk1_LABELFMT # (fmtcode) 41: Label format. { $#wk1_buf += 1000 if ($wk1_nrec == ($#wk1_buf + 1)); $wk1_buf[$wk1_nrec++] = pack("CCCCC", 41, 0, # opcode 41 1, 0, # 1 body octet $_[0] # label format flag. ); } sub wk1_CALCCOUNT # (iternum) 47: Calculation count. { $#wk1_buf += 1000 if ($wk1_nrec == ($#wk1_buf + 1)); $wk1_buf[$wk1_nrec++] = pack("CCCCC", 47, 0, # opcode 47 1, 0, # 1 body octet $_[0] # calculation count ); } __END__ # no need for perl even to scan the rest ############################################################################## # These next few lines are legal in both Perl and nroff. .00; # finish .ig 'di \" finish diversion--previous line must be blank .nr nl 0-1 \" fake up transition to first page again .nr % 0 \" start at page 1 ';<<'.ex'; ############## From here on it's a standard manual page ############ .TH SQL L "18th November 1992" .ad .nh .SH NAME sql \- execute an Oracle SQL statement from the command line .SH SYNOPSIS \fBsql\fP [\fB\-b\fP\fIbase\fP] [\fB\-c\fP\fIcache\fP] [\fB\-d\fP\fIdelim\fP] [\fB\-f\fP|\fB\-h\fP] [\fB\-l\fP\fIpage_len\fP] [\fB\-n\fP\fIstring\fP] \fIname\fP\fB/\fP\fIpassword\fP \fIstatement\fP .SH DESCRIPTION .I Sql connects to an Oracle database using the \fIname/password\fP supplied and executes the given SQL \fIstatement\fP displaying the result on its standard output. The \fB\-b\fP\fIbase\fP flag may be supplied to specify the database to be used. If it is not given, the database specified by the environment variable \fBORACLE_SID\fP or \fBTWO_TASK\fP is used. The \fB\-c\fP\fIcache\fP flag may be supplied to set the size of fetch cache to be used. If it is not given, the system default is used. If the \fB\-n\fP\fIstring\fP flag is supplied, \fBNULL\fP fields (in the \fIOracle\fP sense) will replaced in the output by \fIstring\fP. Normally, they are left blank. The \fB\-f\fP and \fB\-h\fP flags may be used to modify the form of the output. Without either flag, no field headers are printed and fields are not padded. With the \fB\-h\fP flag, field headers are added to the top of the output, but the format is otherwise unchanged. With the \fB\-f\fP flag, the output is formatted in a tabular form similar to that used by \fIsqlplus\fP, except that all fields are left\-justified, regardless of their data type. Column headers are printed at the top of each page; a page is assumed to be 60 lines long, but this may be overridden with the \fB\-l\fP\fIpage_len\fP flag. Without the \fB\-f\fP flag, fields are separated with tabs; this may be changed to any desired string (\fIdelim\fP) using the \fB\-d\fP flag. .SH ENVIRONMENT The environment variable \fBORACLE_SID\fP or \fBTWO_TASK\fP determines the Oracle database to be used if the \fB\-b\fP\fIbase\fP flag is not supplied. .SH DIAGNOSTICS .in +5 .ti -5 \fBonly one of \-f and \-h may be specified\fP .br the \fB\-f\fP and \fB\-h\fP options are mutually exclusive, but both were specified .in -5 The only other diagnostics generated by \fIsql\fP are usage messages, which should be self\-explanatory. However, you may also encounter error messages from Oraperl (unlikely) or from Oracle (more common). See the \fIOracle Error Messages and Codes Manual\fP for details. .SH NOTES This program is only intended for use from the command line. If you use it within a shell script then you should consider rewriting your script in Oraperl to use Perl's text manipulation and formatting commands. .SH "SEE ALSO" \fISQL Language Reference Manual\fP .br perl(1), oraperl(1) .SH AUTHOR Kevin Stock, .if t .ft C .if t .ft P .ex