Most of the credit for this blog post goes to user "agama" over at https://www.unix.com/shell-programming-and-scripting/162459-awk-script-file-command-line-options.html - that may be worth reading too as it's probably better explained. I have however built out the example script (below) a fair bit.
Awk is a text processing and pattern matching programming language. It's Turing-complete, and therefore surprisingly powerful while simultaneously being rather limited in its intended domain. Essentially it's meant to slice and dice text content, and it can do that incredibly well. But most Linux/Bash users know it as a provider of one-liners they barely understand embedded in the midst of a stream of pipes.
Having used it that way myself for roughly twenty-five years, I thought I ought to understand it a bit better. So I borrowed O'Reilly's Effective awk Programming (Fourth edition) by Arnold Robbins from the library. There aren't a lot of books written about awk because of its limited usefulness: this one is the language's bible.
I've been writing Bash scripts as long as I've been using Linux so the first thing I wanted to know was, can I use awk with a shebang to write scripts? The answer is yes - but unlike Bash scripts, awk scripts need file or text input to do anything useful (yes, yes - awk can do things without file input ... but in practical terms you can't achieve much). Which led me to the next question: is awk aware of its own command line so I can validate and react to command line choices? The answer is again yes, with a CAVEAT: the book says command line parsing is inconsistent across platforms, meaning the script I'm showing below works great in Linux ... but might have a different argument count on Solaris because their version of awk doesn't count awk as the zeroth parameter, or something similar. These days you could probably avoid this entirely by using recent builds of gawk (preferable not only for consistency, but also features), but it should definitely be kept in mind.
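A quick way to see what a particular awk puts in ARGC and ARGV is to ask it directly from the shell. This is only a minimal sketch: show_args is a made-up helper name, and the arguments are never opened as files because the program consists solely of a BEGIN block.

```shell
# show_args: print the argument count and arguments as awk sees them.
# (Hypothetical helper for illustration only.)
show_args() {
    awk 'BEGIN {
        print "ARGC is " ARGC
        # ARGV[0] is the command name; the real arguments start at 1.
        for (i = 1; i < ARGC; i++)
            printf "ARGV[%d] = (%s)\n", i, ARGV[i]
    }' "$@"
}

show_args one "two words"
```

With gawk this reports an ARGC of 3 for two arguments, because ARGV[0] is counted as well. Running the same line on another platform is a cheap way to check whether its awk agrees.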
#!/bin/awk --exec
#
# https://www.unix.com/shell-programming-and-scripting/162459-awk-script-file-command-line-options.html
#
# Test with their example: <scriptname> -v -f "some value" input-file1 input-file2
# (fails on any dash-options other than "v" and "f")
#
BEGIN {
    # Look for a gawk-specific variable:
    if ( PROCINFO["pid"] == "" )
        print "This is not gawk";
    else
        print "This is gawk";

    # Print our variables before processing (valid indexes stop at ARGC-1):
    print "Before: ARGC is " ARGC;
    for ( k = 1; k < ARGC; k++ ) {
        printf( "ARGV[%d] = (%s)\n", k, ARGV[k] );
    }

    # Process our command line:
    for ( i = 1; i < ARGC; i++ ) {
        if ( substr( ARGV[i], 1, 1 ) != "-" )   # assume first non -x is a file name
            break;
        if ( ARGV[i] == "-v" ) {                # example option with no trailing data
            verbose = 1;
            continue;                           # loop to avoid error trap
        }
        if ( ARGV[i] == "-f" ) {                # example option with trailing data
            if ( i + 1 >= ARGC ) {              # make sure i+1 isn't out of range
                print "option -f requires a value" > "/dev/stderr";
                exit( 1 );
            }
            foo = ARGV[++i];                    # consume the option's value as well
            continue;                           # loop to avoid error trap
        }
        # suss out other desired options like above
        printf( "unrecognised option: %s\n", ARGV[i] ) > "/dev/stderr";
        exit( 1 );
    }

    j = 1;
    c = 1;
    for ( ; i < ARGC; i++ ) {   # copy input file names down in ARGV
        ARGV[j++] = ARGV[i];
        c++;                    # new setting for ARGC
    }
    ARGC = c;                   # number of file names shifted + 1 for ARGV[0]

    # Show us remaining command line and variables:
    print "After: ARGC is " ARGC;
    for ( k = 1; k < ARGC; k++ ) {
        printf( "ARGV[%d] = (%s)\n", k, ARGV[k] );
    }
    print "verbose ('-v') is " verbose;
}
This script does nothing useful; it's only an educational tool. It prints all the command line arguments before and after processing to show how awk command line processing can be made to work.
As well as doing a before-and-after printout of the variables, I've also added a test to determine if you're using gawk (as opposed to some other version of awk) by testing for the availability of a gawk-specific variable (tested against mawk with a "not gawk" result). (I would have preferred to use a built-in variable like AWK_VERSION, but it appears no such value exists.) This test should allow termination or a warning when "not gawk" is found.
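The same check can be run from the shell against whichever awk is on the path. The sketch below is a variation on the script's test, not the original: it looks for the "version" element of gawk's PROCINFO array rather than "pid", and awk_flavour is a made-up helper name.

```shell
# awk_flavour: report whether the given awk command (default: awk) is gawk.
# (Hypothetical helper; relies on PROCINFO being gawk-specific.)
awk_flavour() {
    "${1:-awk}" 'BEGIN {
        # In gawk, PROCINFO["version"] holds the gawk version string.
        # In other awks PROCINFO is an ordinary, empty array.
        if ("version" in PROCINFO)
            print "gawk " PROCINFO["version"]
        else
            print "not gawk"
    }'
}

awk_flavour        # checks plain "awk"
```

Using the `in` operator avoids creating an empty PROCINFO element as a side effect, and under gawk the version string comes along for free.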
The author of the answer this is based on says "Personally, I prefer to wrap my awk with a shell script and let it do all of the command line parsing and other error checking. The script then invokes awk with one or more -v var=value options to pass in the desired data." This is probably a good idea although my "gawk detection" might make this more useful?
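A minimal sketch of that wrapper approach, assuming a POSIX shell: getopts handles the same two example options (-v and -f) and the results are handed to awk as -v var=value assignments. The function name count_lines and the awk body are placeholders invented for illustration.

```shell
# count_lines: parse options in the shell, then pass them to awk as variables.
# (Hypothetical example; the awk program is only a stand-in.)
count_lines() {
    verbose=0
    foo=""
    OPTIND=1                       # reset in case getopts was used before
    while getopts "vf:" opt; do
        case "$opt" in
            v) verbose=1 ;;        # option with no trailing data
            f) foo="$OPTARG" ;;    # option with trailing data
            *) echo "usage: count_lines [-v] [-f value] [file ...]" >&2
               return 1 ;;
        esac
    done
    shift $((OPTIND - 1))          # what remains are the input files

    # Note: awk interprets backslash escapes in -v values.
    awk -v verbose="$verbose" -v foo="$foo" '
        END {
            if (verbose) printf "foo=(%s) ", foo
            print NR " lines"
        }' "$@"
}

printf 'a\nb\n' | count_lines -v -f "some value"
```

The awk program itself never has to look at ARGV here, so the platform differences in argument handling stop mattering.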
I have no idea if I'll use this, I may go no further in learning awk. But it was interesting and I hope it proves useful and/or educational to someone else.
Update / Partial Fix
2019-06-03: Multiple problems have been found with this idea.
- on the Mac the system version is located at /usr/bin/awk (not the /bin/awk my shebang above calls)
- Mac's default awk dates from 2007, and won't run this script at all: it barfs on the shebang and never gets to my oh-so-clever "is it gawk" test. My limited attempts to make it work failed.
- if you've installed gawk with Homebrew (you really, really should), it will be first on the path as /usr/local/bin/awk, thus solving this problem ... if you use Homebrew
- on Fedora, both /bin/awk and /usr/bin/awk are links to /usr/bin/gawk, but ...
- I had assumed that Debian used gawk by default, but no! Their version is based on mawk and so behaves differently ... it will run the script above if you change --exec to -f. (gawk is optionally available for Debian.)
- #!/usr/bin/env awk -f solves the path problem, but still won't work with Apple's default awk
I've decided the best way to deal with this (and I admit it's not a good solution) is to use #!/usr/bin/env gawk --exec so the script fails immediately if gawk isn't available. This will work for an individual if you're willing to install gawk when needed, but obviously won't work for mass distribution of scripts.
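For scripts that have to run on machines you don't control, one possible alternative is to choose the interpreter at run time from a small shell wrapper instead of hard-coding it in the shebang. The sketch below assumes a POSIX shell; pick_awk is a hypothetical helper, not part of any awk distribution. (A related caveat, as far as I know: Linux hands everything after the interpreter path to it as a single argument, so an env shebang that carries an extra option may need env's -S flag there, which GNU coreutils added in 8.30.)

```shell
# pick_awk: print the name of the awk to use, preferring gawk.
# (Hypothetical helper for illustration only.)
pick_awk() {
    if command -v gawk >/dev/null 2>&1; then
        echo gawk
    elif command -v awk >/dev/null 2>&1; then
        echo "warning: gawk not found, falling back to awk" >&2
        echo awk
    else
        echo "error: no awk found on PATH" >&2
        return 1
    fi
}

AWK=$(pick_awk) || exit 1
"$AWK" 'BEGIN { print "running under " ARGV[0] }'
```

Whether falling back is acceptable depends on the script: anything relying on gawk-only features should treat the warning branch as an error instead.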