    A string literal is the representation of a string value within the source code of a computer program. There exist numerous alternate notations for specifying string literals, and the exact notation depends on the individual programming language in question. Nevertheless, there are some general guidelines that most modern programming languages follow.

    Specifically, most string literals can be specified using:

      declarative notation;
      whitespace delimiters (indentation);
      bracketed delimiters (quoting);
      escape characters; or
      a combination of some or all of the above


        String literal
            Declarative notation
            Whitespace delimiters
            Bracketed delimiters
            Delimiter collision
                Dual quoting style
                Escape character
                Escape sequence
                Double-up and Triple-up escape sequence
                Multiple quoting style
                Here documents
            Metacharacters
                Sigils
                Raw strings
                Handling newline characters
            Variable interpolation
            Embedding source code in string literals


    Declarative notation

    In the original FORTRAN programming language, string literals were written in so-called Hollerith notation, where a decimal count of the number of characters was followed by the letter H, and then the characters of the string:

    27HAn example Hollerith string

    This declarative notation contrasts with bracketed delimiter quoting in that it does
    not require balanced "bracketed" characters on either side of the string.

    Advantages:
      enables the inclusion of metacharacters that might otherwise be mistaken for commands

    Drawbacks:
      this type of notation is error-prone for manual entry by programmers

    Because of the drawbacks, most programming languages do not use this style of declarative
    notation.
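    To make the counting concrete, the following Python sketch reads one Hollerith constant. The function name `read_hollerith` is hypothetical, and it is a simplification: it accepts only an upper-case `H` and ignores FORTRAN's other lexical rules.

```python
DIGITS = "0123456789"


def read_hollerith(src, pos=0):
    """Read one Hollerith constant such as '27HAn example Hollerith string'.

    Returns (text, next_pos). The decimal count says how many characters
    follow the 'H', so no closing delimiter is needed and quotes or other
    special characters may appear in the text unchanged.
    """
    start = pos
    while pos < len(src) and src[pos] in DIGITS:
        pos += 1
    if pos == start or pos >= len(src) or src[pos] != "H":
        raise ValueError("expected <count>H<characters>")
    count = int(src[start:pos])
    begin = pos + 1                 # first character after the 'H'
    end = begin + count
    if end > len(src):
        raise ValueError("count runs past the end of the input")
    return src[begin:end], end


print(read_hollerith("27HAn example Hollerith string"))
# ('An example Hollerith string', 30)
```

    If the programmer miscounts, the function silently returns the wrong characters rather than reporting an error, which is why the notation is error-prone for manual entry.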


    Whitespace delimiters

    In YAML, string literals may be specified by the relative positioning of whitespace and
    indentation.

    - title: An example multi-line string in YAML
      body: |
        This is a multi-line string.
        "special" metacharacters may
        appear here. The content of this string is
        indicated by indentation.
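    A minimal Python sketch of how an indentation-delimited string can be read: the hypothetical `read_indented_block` takes the lines that follow the `|` indicator and stops at the first non-blank line indented less than the first content line. It assumes YAML's default behaviour of keeping one trailing newline and ignores explicit indentation and chomping indicators.

```python
def read_indented_block(lines):
    """Return the string whose extent is marked only by indentation.

    Simplified sketch of a YAML literal block scalar ('|'): the first
    non-blank line fixes the indentation, and the string ends at the first
    non-blank line indented less than that. No closing quote is needed.
    """
    indent = None
    body = []
    for line in lines:
        stripped = line.lstrip(" ")
        if not stripped:                # blank lines belong to the block
            body.append("")
            continue
        width = len(line) - len(stripped)
        if indent is None:
            indent = width              # first content line sets the indentation
        elif width < indent:
            break                       # a dedent ends the string
        body.append(line[indent:])      # keep any extra indentation as content
    while body and body[-1] == "":      # drop trailing blank lines
        body.pop()
    return "\n".join(body) + "\n" if body else ""


example = [
    "    This is a multi-line string.",
    '    "special" metacharacters may',
    "    appear here.",
    "next_key: value",
]
print(read_indented_block(example), end="")
```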


    Bracketed delimiters

    Most modern programming languages use bracket delimiters or quoting
    to specify string literals. Double quotes are the most common quoting delimiters used:

    "Hi There!"

    Some languages also allow the use of single quotes as an alternative to double quotes (though the string must begin and end with the same kind of quotation mark):

    'Hi There!'

    Note that these quotation marks are unpaired (the same character is used as an opener and a closer), which is a hangover from the typewriter technology which was the precursor of the earliest computer input and output devices. The Unicode character set includes paired (separate opening and closing) versions of both single and double quotes:

    “Hi There!”
    ‘Hi There!’

    The paired double quotes can be used in Visual Basic .NET.

    The PostScript programming language uses parentheses, with embedded newlines allowed,
    and also embedded unescaped parentheses provided they are properly paired:

    (The quick
    (brown
    fox))
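    The nesting rule can be sketched in Python with a depth counter. `read_paren_literal` is a hypothetical helper; it does not handle PostScript's backslash escapes, which a real scanner would also need.

```python
def read_paren_literal(src, start=0):
    """Read a PostScript-style (...) literal beginning at src[start].

    Returns (content, index just past the closing ')'). Inner parentheses
    are kept as content as long as they are balanced; embedded newlines
    are kept too.
    """
    if start >= len(src) or src[start] != "(":
        raise ValueError("literal must start with '('")
    depth = 0
    for i in range(start, len(src)):
        ch = src[i]
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:              # this ')' matches the opening '('
                return src[start + 1:i], i + 1
    raise ValueError("unbalanced parentheses: literal never closed")


content, end = read_paren_literal("(The quick\n(brown\nfox)) show")
print(content)
```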



    Delimiter collision

    Delimiter collision is a common problem for string literal notations that use
    quoting. The problem occurs when a programmer attempts to use a quoting character as part of the string literal itself. Because this is a very common problem, a number of methods for avoiding delimiter collision have been invented.


    Dual quoting style

    Some languages (e.g. Modula-2, JavaScript) attempt to avoid the delimiter collision problem by allowing a dual quoting
    style. Typically, this consists of allowing the programmer to use either single quotes
    or double quotes interchangeably.

    "This is John's apple."
    'I said, "Can you hear me?"'

    Some programming languages allow subtle variations on dual quoting, treating single quotes
    and double quotes slightly differently (e.g. Perl).

    One problem with dual quoting is that it doesn't allow for the inclusion of both styles
    of quotes at once within the same literal.
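    Python is one language that accepts either quote character. The sketch below shows both styles, and the escape that becomes necessary once a literal must contain both kinds of quote.

```python
# Either quote character may delimit a literal, so the other one can
# appear inside it without escaping.
apostrophe = "This is John's apple."
quotation = 'I said, "Can you hear me?"'

# Dual quoting alone cannot hold both kinds at once: one must be escaped.
both = 'John\'s friend said, "Can you hear me?"'

print(apostrophe)
print(quotation)
print(both)
```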


    Escape character

    One method for avoiding delimiter collision is to use escape characters:

    "I said, \"Can you hear me?\""

    The most commonly used escape character for this purpose is the backslash "\",
    the tradition for which originated on Unix. From a language design standpoint, this
    approach is adequate, but there are drawbacks:

      text can be rendered unreadable when littered with numerous escape characters
      escape characters are required to be escaped, when not intended as escape characters
      although easy to type, they can be cryptic to someone unfamiliar with the language

    "I said, \"The Windows path is C:\\Foo\\Bar\\Baz\\\""

    The confusing presence of too many escape and slash characters in a string is commonly
    disparaged as Leaning Toothpick Syndrome.
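    The same two examples written as Python literals, which use C-style backslash escapes, show how quickly the backslashes pile up.

```python
# Backslash escapes let the delimiter (") appear inside the literal.
quoted = "I said, \"Can you hear me?\""

# A backslash meant literally must itself be escaped, producing
# "leaning toothpicks".
path_quote = "I said, \"The Windows path is C:\\Foo\\Bar\\Baz\\\""

print(quoted)      # I said, "Can you hear me?"
print(path_quote)  # I said, "The Windows path is C:\Foo\Bar\Baz\"
```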


    Escape sequence

    An extended concept of the escape character, an escape sequence is also a means of avoiding
    delimiter collision. An escape sequence consists of two or more consecutive characters that can have
    special meaning when used in the context of a string literal.

    "I said, \042Can you hear me?\042"

    Escape sequences can also be used for purposes other than avoiding delimiter collision, and
    can also include metacharacters (see Metacharacters below).
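    In Python, as in C, an escape sequence can name a character by its code. Both literals below produce the same string, because 0x22 (octal 042) is the code of the double quote character.

```python
# Octal escape: up to three octal digits follow the backslash.
octal_form = "I said, \042Can you hear me?\042"

# Hexadecimal escape: in Python, \x takes exactly two hex digits.
hex_form = "I said, \x22Can you hear me?\x22"

print(octal_form)  # I said, "Can you hear me?"
print(hex_form)    # I said, "Can you hear me?"
```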


    Double-up and Triple-up escape sequence

    Some languages (such as BASIC and DCL) avoid delimiter collision
    by doubling up on the quotation marks that are intended to be part of the string literal
    itself:

    "I said, ""Can you hear me?"""

    Some languages also use triple quoting, which originated in Python:

    '''This is John's apple.'''
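    A short Python illustration: inside a triple-quoted literal, a lone quote character of either kind does not end the string, so both kinds can appear unescaped. Triple-quoted literals may also span lines.

```python
apple = '''This is John's apple.'''
both = '''I said, "This is John's apple."'''
multi = """Triple-quoted literals may also
span several lines."""

print(both)  # I said, "This is John's apple."
```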



    Multiple quoting style
    In contrast to dual quoting style, multiple quoting style is an even more
    flexible notation for avoiding delimiter collision.

    For example in Perl:

    qq^I said, "Can you hear me?"^

    qq@I said, "Can you hear me?"@

    qq§I said, "Can you hear me?"§


    all produce the desired result through use of the quote-like operator, which
    allows numerous different characters to act as delimiters for string literals.
    Although this notation is more flexible, few languages support it. Perl
    and Ruby are two that do.
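    Because the delimiter can be chosen per literal, code that generates Perl source can always pick one that the text does not contain. The Python function `perl_qq` and its candidate list are hypothetical. It assumes the text has no `$`, `@` or backslash, which `qq` would still interpolate or treat as escapes.

```python
CANDIDATE_DELIMITERS = "^@|!~#%"


def perl_qq(text, candidates=CANDIDATE_DELIMITERS):
    """Return a Perl qq literal for `text`, using a delimiter it does not contain."""
    if any(ch in text for ch in "$@\\"):
        raise ValueError("text would need escaping even inside qq")
    for delim in candidates:
        if delim not in text:           # first free delimiter avoids collision
            return f"qq{delim}{text}{delim}"
    raise ValueError("every candidate delimiter occurs in the text")


print(perl_qq('I said, "Can you hear me?"'))  # qq^I said, "Can you hear me?"^
```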


    Here documents

    A Here document is an alternate quoting notation that allows the programmer
    to specify an arbitrary unique identifier as a content boundary for a string literal.
    This avoids delimiter collision, and also preserves newlines in the source code
    as newlines in the string literal itself.
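    In shell or Perl, a here document starts with a marker such as `<<END` and ends at a line consisting only of `END`. The Python sketch below, with the hypothetical name `read_heredoc`, shows only that scanning step. It ignores variants such as indented terminators or suppressed interpolation.

```python
def read_heredoc(lines, terminator):
    """Return the text between a here-document start and its terminator line.

    `lines` are the source lines that follow the line introducing the here
    document. Each line is kept verbatim, quotes included, with its newline.
    """
    body = []
    for line in lines:
        if line == terminator:          # only an exact match ends the string
            return "".join(text_line + "\n" for text_line in body)
        body.append(line)
    raise ValueError(f"here document not terminated by {terminator!r}")


source = ['I said, "Can you hear me?"', "It's fine.", "END", "echo done"]
print(read_heredoc(source, "END"), end="")
```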


    Metacharacters

    Many languages support the use of metacharacters inside string literals. Metacharacters
    have varying interpretations depending on the context and language, but are generally a kind
    of 'processing command' for representing printing or nonprinting characters.

    For instance, in a C string literal, if the backslash is followed
    by a letter such as "b", "n" or "t", then this represents a nonprinting backspace, newline
    or tab character respectively. If the backslash is followed by one to three octal digits,
    then this sequence is interpreted as representing the arbitrary character with the specified
    character code (typically ASCII). This was later extended to allow more modern hexadecimal character code notation:

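    "I said, \x22" "Can you hear me?\x22\n"

    The literal is split in two because a C hexadecimal escape absorbs every hex digit that follows it, including the "C" of "Can"; the compiler joins adjacent literals.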
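    Python's backslash metacharacters largely follow C's, so the behaviour described above can be checked directly. One difference, noted in the comments, is that Python's `\x` takes exactly two hex digits, so the quote example needs no splitting there.

```python
tab_separated = "Name\tAge"        # \t: horizontal tab (code 9)
two_lines = "first\nsecond"        # \n: newline (code 10)
backspace = "\b"                   # \b: backspace (code 8)
octal_a = "\101"                   # octal 101 == 65 == 'A'
hex_a = "\x41"                     # hexadecimal 41 == 65 == 'A'

# Exactly two hex digits, so "C" is not absorbed into the escape.
quote_line = "I said, \x22Can you hear me?\x22\n"

print([ord(c) for c in "\t\n\b"])  # [9, 10, 8]
print(octal_a, hex_a)              # A A
```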




    Raw strings
    A few languages follow a convention where a leading character marks a string as being "raw":
    r"The Windows path is C:\Foo\Bar\Baz"

    Other languages achieve the same effect using alternate quoting delimiters:
    <![CDATA[The Windows path is C:\Foo\Bar\Baz]]>

    or:
    q'The Windows path is C:\Foo\Bar\Baz';

    A raw string is simply one in which none of the characters are interpreted as metacharacters,
    and no special interpretation or processing is applied in representing the string literal.
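    In Python, the `r` prefix makes backslashes ordinary characters, so the raw and escaped spellings below denote the same string. Python's raw strings are not entirely raw, however: a backslash still stops a following quote from closing the literal (the backslash stays in the string), so a raw literal cannot end in an odd number of backslashes.

```python
raw = r"The Windows path is C:\Foo\Bar\Baz"
cooked = "The Windows path is C:\\Foo\\Bar\\Baz"

print(raw == cooked)          # True
print(len(r"\n"), len("\n"))  # 2 1
```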




    Variable interpolation
    Languages differ on whether and how to interpret string literals as either
    'raw' or 'variable interpolated'. Variable interpolation is the process
    of evaluating an expression containing one or more variables, and returning
    output where the variables are replaced with their corresponding values in
    memory.

    For example, the following Perl code:

    $sName = "Nancy";
    $sGreet = "Hello World";
    print "$sName said $sGreet to the crowd of people.";

    produces the output:

    Nancy said Hello World to the crowd of people.

    The sigil character ($) is interpreted to indicate variable
    interpolation.

    Similarly, the printf function produces the same output
    using notation such as:

    printf "%s said %s to the crowd of people.", ($sName,$sGreet);

    The metacharacters (%s) indicate variable interpolation.

    This is contrasted with "raw" strings:

    print '$sName said $sGreet to the crowd of people.';

    which produces the output:

    $sName said $sGreet to the crowd of people.

    Here the $ characters are not sigils and have no meaning other
    than plain text.
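    For comparison, a Python sketch of the same cases: an f-string (Python 3.6 and later) for interpolation, `%` formatting in the style of `printf`, and `string.Template`, whose `$name` placeholders resemble Perl sigils. An ordinary Python literal is never interpolated, so it behaves like the Perl single-quoted string.

```python
from string import Template

name = "Nancy"
greeting = "Hello World"

# f-string: expressions in braces are interpolated.
via_fstring = f"{name} said {greeting} to the crowd of people."

# printf-style formatting: each %s is replaced by the next value.
via_percent = "%s said %s to the crowd of people." % (name, greeting)

# string.Template: $-prefixed placeholders, similar to Perl sigils.
via_template = Template("$sName said $sGreet to the crowd of people.").substitute(
    sName=name, sGreet=greeting
)

# An ordinary literal is not interpolated; "$" is plain text.
plain = "$sName said $sGreet to the crowd of people."

for text in (via_fstring, via_percent, via_template, plain):
    print(text)
```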


    Embedding source code in string literals

    Languages that lack flexibility in specifying string literals make
    it particularly cumbersome to write programming code that generates
    other programming code. This is particularly true when the generation
    language is the same as or similar to the output language.

    For example:
      writing code to produce quines
      using XSLT to generate XSLT, or SQL to generate more SQL
      generating a PostScript representation of a document for printing purposes, from within a document-processing application written in C or some other language.

    Nevertheless, some languages are particularly well-adapted to produce
    this sort of self-similar output, especially those that support multiple options
    for avoiding delimiter collision.
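    A classic illustration is a quine, a program that prints its own source. In the two-line Python quine below, the string literal holds a template of the whole program and `%r` re-inserts the literal with whatever quoting and escaping Python needs, so delimiter collision is handled automatically. Comments are omitted because they would not be reproduced in the output.

```python
s = 's = %r\nprint(s %% s)'
print(s % s)
```

    Running it prints exactly the two lines of the program.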

    Apart from the mechanics of specifying string
    literals, however, one must consider security implications of code that generates
    other code, especially if the output is based at least partially on untrusted
    user input. This is potentially a serious security weakness.
    This is particularly acute in the case of Web-based applications, where malicious users can take advantage of such weaknesses to subvert the operation of the application, for example by mounting an SQL injection attack.
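    The risk can be shown with Python's built-in `sqlite3` module. The table, data and hostile input below are invented for illustration. Splicing the input into the SQL text lets its quote character end the SQL string literal early, whereas a `?` placeholder passes the value separately so it is never parsed as SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", "a1"), ("bob", "b2")])

untrusted = "nobody' OR '1'='1"   # hostile input from a user

# UNSAFE: the input becomes part of the SQL source text. Its quote closes the
# SQL string literal and the rest is parsed as SQL:
#   SELECT secret FROM users WHERE name = 'nobody' OR '1'='1'
unsafe_sql = f"SELECT secret FROM users WHERE name = '{untrusted}'"
leaked = conn.execute(unsafe_sql).fetchall()

# SAFER: the ? placeholder sends the value separately from the SQL text.
safe = conn.execute("SELECT secret FROM users WHERE name = ?", (untrusted,)).fetchall()

print(len(leaked), "rows leaked;", len(safe), "rows with the placeholder")
conn.close()
```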
    Scientus.org Dictionary (Yet Another Wiki) RC : 1.39
    This article is licensed under the GNU Free Documentation License. It uses material from the Wikipedia article "String literal".