    Cyc is an artificial intelligence project that attempts to assemble a comprehensive ontology and database of everyday common sense knowledge, with the goal of enabling AI applications to perform human-like reasoning.

        Cyc
            Overview
            Description of the Knowledge Base, terminology
            OpenCyc
            ResearchCyc
            Criticisms of the Cyc Project
            See also


    Overview

    The project was started in 1984 by Doug Lenat at the Microelectronics and Computer Technology Corporation. The name "Cyc" (from "encyclopedia", pronounced like "psych") is a registered trademark owned by Cycorp, Inc. of Austin, Texas, a company run by Lenat and devoted to the development of Cyc. The original knowledge base is proprietary, but a smaller version, intended to establish a common vocabulary for automatic reasoning, was released as OpenCyc under an open source license. More recently, Cyc has been made available to AI researchers under a research-purposes license as ResearchCyc.

    Typical pieces of knowledge represented in the database are "Every tree is a plant" and "Plants die eventually". When asked whether trees die, the inference engine can draw the obvious conclusion and answer the question correctly. The Knowledge Base (KB) contains over a million human-defined assertions, rules or common sense ideas. These are formulated in the language CycL, which is based on predicate calculus and has a syntax similar to that of the Lisp programming language. CycL users pun that they are "cyclists".
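
    The tree example can be illustrated with a minimal Python sketch. It is not Cyc's inference engine: the dictionary layout and the names genls, properties, supercollections and has_property are assumptions made only for this illustration.

```python
# Toy knowledge base: a minimal sketch, not Cyc's actual inference engine.
# genls maps a collection to its direct supercollections;
# properties maps a collection to statements true of all its members.
genls = {"Tree": {"Plant"}}
properties = {"Plant": {"dies eventually"}}


def supercollections(collection):
    """Return the collection together with every collection it specializes."""
    seen = set()
    stack = [collection]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(genls.get(current, ()))
    return seen


def has_property(collection, prop):
    """True if prop is asserted of the collection or of any supercollection."""
    return any(prop in properties.get(c, ()) for c in supercollections(collection))


print(has_property("Tree", "dies eventually"))  # True
```

    The answer for trees is never stored directly; it follows from walking up from "Tree" to "Plant" and finding the property there.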

    Much of the current work on the Cyc project continues to be knowledge engineering, representing facts about the world by hand, and implementing efficient inference mechanisms on that knowledge. Increasingly, however, work at Cycorp involves giving the Cyc system the ability to communicate with end users in natural language, and to assist with the knowledge formation process via machine learning.


    Description of the Knowledge Base, terminology
    The concept names in Cyc are known as constants. Constants start with an optional "#$" and are case-sensitive. There are constants for:
    * Individual items known as individuals, such as #$BillClinton or #$France.
    * Collections, such as #$Tree-ThePlant (containing all trees) or #$EquivalenceRelation (containing all equivalence relations). A member of a collection is called an instance of that collection.
    * Truth functions, which can be applied to one or more other concepts and return either true or false. For example, #$siblings is the sibling relationship, true if the two arguments are siblings. By convention, truth function constants start with a lower-case letter. Truth functions may be broken down into logical connectives (such as #$and, #$or, #$not, #$implies), quantifiers (#$forAll, #$thereExists, etc.) and predicates.
    * Functions, which produce new terms from given ones. For example, #$FruitFn, when provided with an argument describing a type (or collection) of plants, will return the collection of its fruits. By convention, function constants start with an upper-case letter and end with the string "Fn".
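
    The naming conventions in the list above can be sketched as a small Python heuristic. The function name classify_constant and its return labels are hypothetical, and the spelling alone cannot tell an individual from a collection, so the sketch reports those two together.

```python
def classify_constant(name):
    """Guess the kind of a Cyc constant from its spelling alone (heuristic)."""
    # The "#$" prefix is optional, so strip it when present.
    bare = name[2:] if name.startswith("#$") else name
    if not bare:
        raise ValueError("empty constant name")
    if bare[0].islower():
        # Convention: truth functions start with a lower-case letter.
        return "truth function"
    if bare[0].isupper() and bare.endswith("Fn"):
        # Convention: functions start upper-case and end with "Fn".
        return "function"
    # Individuals and collections are not distinguishable by name.
    return "individual or collection"


for constant in ["#$siblings", "#$FruitFn", "#$BillClinton", "Tree-ThePlant"]:
    print(constant, "->", classify_constant(constant))
```

    Because these are conventions rather than rules enforced by the language, a classifier like this can only suggest a kind; the knowledge base itself is the authority.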
    The most important predicates are #$isa and #$genls. The first states that an item is an instance of some collection; the second states that one collection is a subcollection of another. Facts about concepts are asserted using CycL sentences. Predicates are written before their arguments, in parentheses:
    (#$isa #$BillClinton #$UnitedStatesPresident)
    "Bill Clinton belongs to the collection of U.S. presidents",
    (#$genls #$Tree-ThePlant #$Plant)
    "All trees are plants", and
    (#$capitalCity #$France #$Paris)
    "Paris is the capital of France."
    Sentences can also contain variables, strings starting with "?". These sentences are called "rules". One important rule asserted about the #$isa predicate reads
    (#$implies
      (#$and
        (#$isa ?OBJ ?SUBSET)
        (#$genls ?SUBSET ?SUPERSET))
      (#$isa ?OBJ ?SUPERSET))
    with the interpretation "if OBJ is an instance of the collection SUBSET and SUBSET is a subcollection of SUPERSET, then OBJ is an instance of the collection SUPERSET". Another typical example is
    (#$relationAllExists #$biologicalMother #$ChordataPhylum #$FemaleAnimal)
    which means that for every instance of the collection #$ChordataPhylum (i.e. for every chordate), there exists a female animal (an instance of #$FemaleAnimal) which is its mother (described by the predicate #$biologicalMother).
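
    Both sentences can be illustrated over a toy fact set in Python. This is a sketch under stated assumptions, not how Cyc evaluates them: the constants Rex, Bella and Dog are hypothetical, facts are stored as plain tuples, and the second function only tests whether a finite set of facts supplies a witness for each instance, whereas Cyc asserts the #$relationAllExists sentence as knowledge.

```python
# Toy facts; Rex, Bella and Dog are hypothetical illustration constants.
isa = {("Rex", "Dog"), ("Bella", "Dog"), ("Bella", "FemaleAnimal")}
genls = {("Dog", "ChordataPhylum")}
biological_mother = {("Rex", "Bella")}  # (child, mother)


def apply_isa_genls_rule(isa_facts, genls_facts):
    """Forward-chain the rule:
    (isa ?OBJ ?SUBSET) and (genls ?SUBSET ?SUPERSET) => (isa ?OBJ ?SUPERSET).
    """
    derived = set(isa_facts)
    changed = True
    while changed:  # repeat until no new isa fact can be added
        changed = False
        for obj, subset in list(derived):
            for sub, superset in genls_facts:
                if sub == subset and (obj, superset) not in derived:
                    derived.add((obj, superset))
                    changed = True
    return derived


def missing_witnesses(pred_facts, col1, col2, isa_facts):
    """List instances of col1 with no pred-related instance of col2 on record."""
    instances = {obj for obj, col in isa_facts if col == col1}
    missing = []
    for x in instances:
        has_witness = any(
            a == x and (y, col2) in isa_facts for a, y in pred_facts
        )
        if not has_witness:
            missing.append(x)
    return sorted(missing)


all_isa = apply_isa_genls_rule(isa, genls)
print(("Rex", "ChordataPhylum") in all_isa)  # True
print(missing_witnesses(biological_mother, "ChordataPhylum", "FemaleAnimal", all_isa))
# ['Bella']
```

    Rex is never stated to be a chordate, yet the rule derives it. Bella is reported as lacking a recorded mother, which shows why such a sentence has to be asserted as a general statement: no finite list of facts can name a mother for every chordate.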
    The knowledge base is divided into microtheories (Mt), collections of concepts and facts typically pertaining to one particular realm of knowledge. Unlike the knowledge base as a whole, each microtheory is required to be free from contradictions. Each microtheory has a name which is a regular constant; microtheory constants contain the string "Mt" by convention. An example is #$MathMt, the microtheory containing mathematical knowledge. The microtheories can inherit from each other and are organized in a hierarchy: one specialization of #$MathMt is #$GeometryGMt, the microtheory about geometry.
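
    Inheritance between microtheories can be pictured with a short Python sketch. It assumes that a specialized microtheory sees its own assertions plus those of the microtheories it inherits from. The parents table, the placeholder assertion strings and the function visible_assertions are all hypothetical and do not reflect Cyc's actual storage or API.

```python
# Each microtheory lists the microtheories it specializes (inherits from).
parents = {"GeometryGMt": ["MathMt"], "MathMt": []}

# Assertions stored directly in each microtheory (hypothetical placeholders).
local_assertions = {
    "MathMt": {"(hypothetical) addition is commutative"},
    "GeometryGMt": {"(hypothetical) a triangle has three sides"},
}


def visible_assertions(mt):
    """Collect assertions from mt and from every microtheory it inherits from."""
    result = set()
    seen = set()
    stack = [mt]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guard against visiting a microtheory twice
        seen.add(current)
        result |= local_assertions.get(current, set())
        stack.extend(parents.get(current, []))
    return result


print(len(visible_assertions("GeometryGMt")))  # 2
print(len(visible_assertions("MathMt")))       # 1
```

    Keeping assertions local to a microtheory and resolving inheritance at lookup time is one way to keep each microtheory internally consistent without requiring consistency across the whole knowledge base.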



    OpenCyc

    The latest version of OpenCyc, 1.0, was released in July 2006. OpenCyc 1.0 includes the entire Cyc ontology containing hundreds of thousands of terms, along with millions of assertions relating the terms to each other. The knowledge base contains 47,000 concepts and 306,000 facts and can be browsed on the OpenCyc website. The first version of OpenCyc was released in May 2001 and contained only 6,000 concepts and 60,000 facts. The knowledge base is released under the Apache License. Cycorp has stated its intention to release OpenCyc under parallel, unrestricted licenses to meet the needs of its users. The CycL and SubL interpreter (the program used to browse and edit the database and to draw inferences) is released free of charge, but only as a binary, without source code. It is available for GNU/Linux and for Windows.

    top

    ResearchCyc

    In July 2006, Cycorp released ResearchCyc 1.0, a gratis (but not open source) version of Cyc aimed at the research community. (ResearchCyc was in beta throughout 2004; a beta version was released in February 2005.) In addition to the taxonomic information contained in OpenCyc, ResearchCyc includes significantly more semantic knowledge (i.e., additional facts) about the concepts in its knowledge base, as well as a large lexicon, English parsing and generation tools, and Java-based interfaces for knowledge editing and querying.

    Cycorp has publicly stated its intention of releasing all of the terms and taxonomic relationships contained in ResearchCyc as part of OpenCyc, and this was achieved with the release of OpenCyc 1.0. One stated goal is to provide a completely free and unrestricted semantic vocabulary for use in the Semantic Web. The OpenCyc taxonomy is available in OWL on the OpenCyc web site.


    Criticisms of the Cyc Project

    The Cyc project has been described as "one of the most controversial endeavours of the artificial intelligence history" (Bertino et al., p. 275), and it has garnered its share of criticism. Criticisms include:

      * The complexity of the system, arguably necessitated by its encyclopædic ambitions, and the consequent difficulty of adding to the system by hand
      * Scalability problems from widespread reification, especially as constants
      * Unsatisfactory treatment of the concept of substance and the related distinction between intrinsic and extrinsic properties
      * The lack of any meaningful benchmark or comparison for the efficiency of Cyc's inference engine
      * The current incompleteness of the system in both breadth and depth, and the related difficulty of measuring its completeness
      * Limited documentation
      * The lack of up-to-date online training material, which makes it difficult for newcomers to learn the system

    These issues have been debated in various places since the inception of the project; Doug Lenat and others have published many arguments in its defense.


    See also
     
    This article is licensed under the GNU Free Documentation License. It uses material from the Wikipedia article "Cyc".