Navigation
  • Home
  • Recent
  • Most Active
  • Popular
  • Blog
  • Credits
  • RSS
  •   Interaction
  • Register
  • Statistics
  •   Help
  • Suggestions
  • Contact Us
  • How to Edit
  • Help



  • [Edit]


    Automatic summarization is the creation of a shortened version of a text by a computer program. The product of this procedure still contains the most important points of the original text.
    The phenomenon of information overload has meant that access to coherent and correctly-developed summaries is vital. As access to data has increased so has interest in automatic summarization. An example of the use of summarization technology is search engines such as Google.

    Technologies that can make a coherent summary, of any kind of text, need to take into account several variables such as length, writing-style and syntax to make a useful summary.


        Automatic summarization
            Extraction and abstraction
            Types of summaries
            Aided summarization
            Evaluation
            Further reading
            See also

    top

    Extraction and abstraction
    Broadly, one distinguishes two approaches: extraction and abstraction.

    Extraction techniques merely copy the information deemed most important by the system to the summary (for example, key clauses, sentences or paragraphs), while abstraction involves paraphrasing sections of the source document. In general, abstraction can condense a text more strongly than extraction, but the programs that can do this are harder to develop as they require the use of natural language generation technology, which itself is a growing field.

    top

    Types of summaries

    There are different types of summaries depending what the summarization program focuses on to make the summary of the text, for example generic summaries or query relevant summaries (sometimes called query-biased summaries).

    Summarization systems are able to create both query relevant text summaries and generic machine-generated summaries depending on what the user needs. Summarization of multimedia documents, e.g. pictures or movies are also possible.

    Some systems will generate a summary based on a single source document, while others can use multiple source documents (for example, a cluster of news stories on the same topic). These systems are known as multi-document summarization systems.


    top

    Aided summarization
    Machine learning techniques from closely related fields such as information retrieval or text mining have been successfully adapted to help automatic summarization.

    Apart from Fully Automated Summarizers (FAS), there are systems that aid users with the task of summarization (MAHS = Machine Aided Human Summarization), for example by highlighting candidate passages to be included in the summary, and there are systems that depend on post-processing by a human (HAMS = Human Aided Machine Summarization).

    top

    Evaluation
    An ongoing issue in this field is that of evaluation. Human judgement often has wide variance on what is considered a "good" summary, which means that making the evaluation process automatic is particularly difficult. Manual evaluation can be used, but this is both time and labor intensive as it requires humans to read not only the summaries but also the source documents. Other issues are those concerning coherence and coverage.

    One metric used in NIST's annual Document Understanding Conferences, in which research groups submit their systems for both summarization and translation tasks, is the ROUGE metric (Recall-Oriented Understudy for Gisting Evaluation *). It essentially calculates n-gram overlaps between automatically generated summaries and previously-written human summaries. A high level of overlap should indicate a high level of shared concepts between the two summaries. Note that overlap metrics like this are unable to provide any feedback on a summary's coherence. Anaphor resolution remains another problem yet to be fully solved.

    top

    Further reading
      Endres-Niggemeyer, Brigitte (1998): Summarizing Information (ISBN 3-540-63735-4)
      Marcu, Daniel (2000): The Theory and Practice of Discourse Parsing and Summarization (ISBN 0-262-13372-5)
      Mani, Inderjeet (2001): Automatic Summarization (ISBN 1-58811-060-5)

    top

    See also
     
    Search more:
     

       
    Source Privacy License Download Contact Us Atlas
    Scientus.org Dictionary (Yet Another Wiki) RC : 1.39
    This article is licensed under the GNU Free Documentation License [copyleft]. It uses material from the Wikipedia article "Automatic summarization". link