    AWK is a general-purpose computer language that is designed for processing text-based data, either in files or data streams. The name AWK is derived from the surnames of its authors: Alfred Aho, Peter Weinberger, and Brian Kernighan. However, it is commonly pronounced "awk" and not as a string of separate letters. awk, when written in all lowercase letters, refers to the Unix or Plan 9 program that runs other programs written in the AWK programming language.

    AWK is an example of a programming language that extensively uses the string datatype, associative arrays (that is, arrays indexed by key strings), and regular expressions. The power, terseness, and limitations of AWK programs and sed scripts inspired Larry Wall to write Perl. Because of their dense notation, all these languages are often used for writing one-liner programs.

    AWK is one of the early tools that appeared in Version 7 Unix, and it gained popularity as a way to add computational features to a Unix pipeline.
    A version of the AWK language is a standard feature of nearly every modern Unix-like operating system. AWK is mentioned in the Single UNIX Specification as one of the mandatory utilities of a Unix operating system. Besides the Bourne shell, AWK is the only other scripting language available in a standard Unix environment. Implementations of AWK exist as installed software for almost all other operating systems.


        AWK (programming language)
            Structure of AWK programs
            AWK commands
                The print command
                Variables, et cetera
                User-defined functions
                Hello World
                Print lines longer than 80 characters
                Print a count of words
                Sum first column
                Calculate word frequencies
            Self-contained AWK scripts
            AWK versions and implementations
            Digression
            Books
            See also
    Name: AWK
    Paradigm: scripting language, procedural programming
    Year: 1977, last revised 1985, current POSIX editio...
    Designers: Alfred V. Aho, Peter Weinberger, Brian Kernighan
    Typing: none; can handle strings, integers and floati...
    Implementations: awk, GNU Awk, mawk, nawk, MKS AWK, Thompson A...
    Dialects: old awk (oawk) 1977, new awk (nawk) 1985, GNU Awk
    Influenced by: C programming language
    Influenced: Perl
    Operating system: Cross-platform


    Structure of AWK programs
    Generally speaking, two pieces of data are given to AWK: a command file and a primary input file. A command file (which can be an actual file, or can be included in the command line invocation of awk) contains a series of commands which tell AWK how to process the input file. The primary input file is typically text that is formatted in some way; it can be an actual file, or it can be read by awk from the standard input. A typical AWK program consists of a series of lines, each of the form:

    /pattern/ { action }

    where pattern is a regular expression and action is a command. Most implementations of AWK use extended regular expressions by default. AWK looks through the input file; when it finds a line that matches pattern, it executes the command(s) specified in action. Alternate line forms include:

    BEGIN { action }

    Executes action commands at the beginning of the script execution, i.e. before any of the lines are processed.

    END { action }

    Similar to the previous form, but executes action after the end of input.

    /pattern/

    Prints any lines matching pattern.

    { action }

    Executes action for each line in the input.


    Each of these forms can be included multiple times in the command file. Lines in the command file are executed in order, so if there are two "BEGIN" statements, the first is executed, then the second, and then the rest of the lines. BEGIN and END statements do not have to be located before and after (respectively) the other lines in the command file.
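    As a hedged illustration of the forms above, the following sketch runs a small program on made-up input. The sample lines and the names input, out, and n are assumptions for this example only. It shows that two BEGIN rules run first, in the order written, and that the END rule runs last even though it is written before the pattern rule.

```shell
# Hypothetical sample input: three lines, two of which contain "error".
input='error: disk full
ok: backup done
error: network down'

out=$(printf '%s\n' "$input" | awk '
BEGIN   { print "start" }              # first BEGIN rule
BEGIN   { n = 0 }                      # second BEGIN rule, runs after the first
END     { print n " matching lines" }  # runs after all input has been read
/error/ { n++; print }                 # runs for each line matching the pattern
')
printf '%s\n' "$out"
```

    With this input the program prints "start", the two lines containing "error", and then "2 matching lines".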

    AWK was created as a broad-based alternative to the C-style algorithmic approaches previously used for text parsing.


    AWK commands
    AWK commands are the statements substituted for action in the examples above. They can include function calls, variable assignments, calculations, or any combination thereof. AWK has built-in support for many functions; many more are provided by the various flavors of AWK. Some flavors also support the inclusion of dynamically linked libraries, which can provide further functions.

    For brevity, the enclosing curly braces ({ }) will be omitted from these examples.


    The print command
    The print command is used to output text. The simplest form of this command is

    print

    This displays the contents of the current line. In AWK, lines are broken down into fields, and these can be displayed separately:

    print $1

    Displays the first field of the current line

    print $1, $3

    Displays the first and third fields of the current line, separated by a predefined string called the output field separator (OFS) whose default value is a single space character


    Although these fields ($X) may resemble variables (the $ symbol indicates variables in Perl), they actually refer to the fields of the current line. A special case, $0, refers to the entire line. In fact, the commands "print" and "print $0" are identical in functionality.
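    The difference between separating fields with a comma and simply placing them side by side can be seen in a short sketch. The input line and the choice of "-" as separator are assumptions made for the demonstration.

```shell
out=$(printf 'alpha beta gamma\n' | awk '
BEGIN { OFS = "-" }     # change the output field separator from its default, a space
{ print $1, $3 }        # comma: the two fields are joined by OFS
{ print $1 $3 }         # no comma: plain concatenation, OFS is not used
{ print NF ": " $0 }    # NF is the number of fields, $0 the whole line
')
printf '%s\n' "$out"
```

    This prints "alpha-gamma", then "alphagamma", then "3: alpha beta gamma". The last line still uses spaces because $0 is left unchanged when no field has been assigned to.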

    The print command can also display the results of calculations and/or function calls:

    print 3+2
    print foobar(3)
    print foobar(variable)
    print sin(3-2)

    Output may be sent to a file:

    print "expression" > "file name"
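    A minimal sketch of file output follows, assuming the mktemp utility is available to create a scratch file. The names tmp, out, and result are illustrative. Within a single run, awk truncates the file when it is first opened, and later print statements to the same name append to it.

```shell
tmp=$(mktemp)     # hypothetical scratch file for the demonstration

# The awk variable "out" holds the file name; both input lines write to it.
printf 'a 1\nb 2\n' | awk -v out="$tmp" '{ print $2 > out }'

result=$(cat "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

    After the run the file contains the second field of each input line, one per line.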


    Variables, et cetera
    Variable names can use any of the characters [A-Za-z0-9_], with the exception of language keywords, and may not begin with a digit. The operators + - * / are addition, subtraction, multiplication, and division, respectively. For string concatenation, simply place two variables (or string constants) next to each other, optionally with a space in between. String constants are delimited by double quotes. Statements need not end with semicolons. Finally, comments can be added to programs by using # as the first character on a line.
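    These rules can be sketched in a few lines. The variable names width, height, area, label, and mean are made up for illustration.

```shell
out=$(awk 'BEGIN {
    # a comment starts with the # character
    width = 4
    height = 2.5
    area = width * height          # arithmetic with * (also + - /)
    label = "area" "=" area        # adjacent values are concatenated
    print label
    mean = (width + height) / 2    # no semicolon needed at end of line
    print mean
}')
printf '%s\n' "$out"
```

    The program prints "area=10" followed by "3.25".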


    User-defined functions
    In a format similar to C, function definitions consist of the keyword function, the function name, argument names and the function body. Here is an example of a function.

    function add_three(number,    temp) {
        temp = number + 3
        return temp
    }

    This function can be invoked as follows:

    print add_three(36)     # Outputs 39

    Functions can have variables that are in the local scope. The names of these are added to the end of the argument list, though values for these should be omitted when calling the function. It is convention to add some whitespace in the argument list before the local variables, in order to indicate where the parameters end and the local variables begin.
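    The local-variable convention can be checked with a small sketch. The function sum_to and its variables are hypothetical names chosen for this example. The extra spaces in the parameter list mark i and total as locals, and the global i is left untouched by the call.

```shell
out=$(awk '
# sum_to(n) adds the integers 1..n; i and total are local to the function
function sum_to(n,    i, total) {
    for (i = 1; i <= n; i++)
        total += i
    return total
}
BEGIN {
    i = 99              # a global variable that happens to share the name i
    print sum_to(4)     # called with only the real parameter
    print i             # the global i is unchanged by the call
}')
printf '%s\n' "$out"
```

    This prints 10 and then 99.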


    Hello World
    Here is the ubiquitous "Hello, world!" program written in AWK:

    BEGIN { print "Hello, world!" }


    Print lines longer than 80 characters
    Print all lines longer than 80 characters. Note that the default action is to print the current line.

    length > 80


    Print a count of words
    Count words in the input, and print the number of lines, words, and characters (like wc):

    { w += NF; c += length + 1 }
    END { print NR, w, c }


    Sum first column
    Sum the first column of the input:

    { s += $1 }
    END { print s }


    Calculate word frequencies
    Word frequency (uses associative arrays):

    BEGIN { RS = "[^a-zA-Z]+" }

    { words[tolower($0)]++ }

    END { for (i in words) print i, words[i] }
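    As a hedged variant, a word count can also be written by looping over the whitespace-separated fields of each line, which avoids relying on a regular expression as record separator. The sample input and the names freq, w, and out are assumptions. Because a for-in loop visits array keys in an unspecified order, the output is piped through sort here.

```shell
out=$(printf 'the cat the dog\nThe end\n' | awk '
{ for (i = 1; i <= NF; i++) freq[tolower($i)]++ }   # one array slot per distinct word
END { for (w in freq) print w, freq[w] }            # key order is unspecified
' | LC_ALL=C sort)
printf '%s\n' "$out"
```

    For this input the sorted result is "cat 1", "dog 1", "end 1", and "the 3".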


    Self-contained AWK scripts
    As with many other programming languages, a self-contained AWK script can be constructed using the so-called "shebang" syntax.

    For example, a UNIX command called hello.awk that prints the string "Hello, world!" may be built by creating a file named hello.awk containing the following lines:

    #!/usr/bin/awk -f
    BEGIN { print "Hello, world!" }


    AWK versions and implementations
    AWK was originally written in 1977, and distributed with Version 7 Unix.

    In 1985 its authors started expanding the language, most significantly by adding user-defined functions. The language is described in the book The AWK Programming Language, published in 1988, and its implementation was made available in releases of UNIX System V. To avoid confusion with the incompatible older version, this version was sometimes known as "new awk" or nawk. This implementation was released under a free software license in 1996, and is still maintained by Brian Kernighan.

    BWK awk refers to the version by Brian W. Kernighan. It has been dubbed the "One True AWK" because of the use of the term in association with the book that originally described the language, and the fact that Kernighan was one of the original authors of awk. FreeBSD refers to this version as one-true-awk.

    gawk (GNU awk) is another free software implementation. It was written before the original implementation became freely available, and is still widely used. Many Linux distributions come with a recent version of gawk, which is widely recognized as the de facto standard implementation in the Linux world. gawk version 3.0 was included as awk in FreeBSD prior to version 5.0. Subsequent versions of FreeBSD use BWK awk in order to avoid the GPL, a license more restrictive than the BSD license in the sense that GPL-licensed code cannot be modified to become proprietary software.


    xgawk is a SourceForge project based on gawk. It extends gawk with dynamically loadable libraries.

    mawk is a very fast AWK implementation by Mike Brennan based on a byte code interpreter. This is the default AWK that comes with Debian and Ubuntu.

    Downloads and further information about these versions are available from the sites listed below.

    Thompson AWK or TAWK is an AWK compiler for DOS and Windows, previously sold by Thompson Automation Software (which has ceased its activities).

    Jawk is a SourceForge project to implement AWK in Java. Extensions to the language are added to provide access to Java features within AWK scripts (e.g., Java threads, sockets, and Collections).

    BusyBox includes a sparsely documented Awk implementation that appears to be complete, written by Dmitry Zakharov. This implementation is the smallest Awk implementation out there, suitable for embedded systems.


    Digression
      The bird emblematic of AWK (appearing, among other places, on the cover of The AWK Programming Language) is the auk.


    Books
      The book's webpage includes downloads of the current implementation of Awk and links to others.
      Arnold Robbins has maintained the GNU Awk implementation of AWK for more than 10 years. The free GNU Awk manual was also published by O'Reilly in May 2001. Free download of this manual is possible through the following book references.


    See also
     
    This article is licensed under the GNU Free Documentation License. It uses material from the Wikipedia article "AWK (programming language)".