    The vector space model (or term vector model) is an algebraic model used for information filtering, information retrieval, indexing and relevance ranking. It represents natural language documents formally as vectors in a multi-dimensional space, all starting from a common origin. It was first used in the SMART Information Retrieval System.

    A vector has both a length (magnitude) and a direction. The basic idea of the vector space model is to represent the keyword query (Q) and each document (D(n)) containing the query terms as vectors. Documents in a keyword search can then be ranked by relevance according to how far the angle of each document vector deviates from the query vector. In practice the cosine of that angle is used: it is the scalar product of the query vector and the document vector divided by the product of their lengths, interpreted under the assumptions of document similarity theory. A cosine value of zero means the query and document vectors are orthogonal, so there is no match, typically because none of the query terms occurs in the document being considered.
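    The cosine measure described above can be sketched in a few lines of Python. This is a minimal illustration, not the SMART system's implementation: the function name cosine_similarity and the three-term vocabulary are hypothetical, and vectors are plain lists of term weights.

```python
import math

def cosine_similarity(q: list[float], d: list[float]) -> float:
    """Cosine of the angle between query vector q and document vector d."""
    if len(q) != len(d):
        raise ValueError("vectors must have the same dimension")
    dot = sum(qi * di for qi, di in zip(q, d))    # scalar product q . d
    norm_q = math.sqrt(sum(qi * qi for qi in q))  # length |q|
    norm_d = math.sqrt(sum(di * di for di in d))  # length |d|
    if norm_q == 0 or norm_d == 0:
        return 0.0  # an all-zero vector has no direction; treat as no match
    return dot / (norm_q * norm_d)

# Hypothetical vocabulary: ["vector", "space", "model"]
query = [1.0, 1.0, 0.0]
doc_a = [2.0, 1.0, 0.0]  # shares both query terms
doc_b = [0.0, 0.0, 3.0]  # shares no query term, so it is orthogonal to the query

print(round(cosine_similarity(query, doc_a), 3))  # 0.949
print(cosine_similarity(query, doc_b))            # 0.0
```

    Here doc_b has no term in common with the query, so its vector is orthogonal to the query vector and it scores zero.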


    The classic vector space model, as proposed by Salton, Wong and Yang, incorporates both local and global parameters in the term weight w(n). The weighting equation is known as tf-idf:

    w(n) = f(n) × log(D / d(n))

    where:
      w(n) is the weight of term n,
      f(n) is the number of times term n occurs in the document (the local parameter),
      d(n) is the number of documents containing term n, and
      D is the total number of documents in the set.

    Note that the quotient d(n)/D is essentially the probability that a document drawn from the set contains term n. Its inverse, D/d(n), appears inside the logarithm and represents the global parameter: for a given f(n), the rarer a term is across the set, the larger its weight. Compare this with the Term Count Model below, which considers only local parameters.
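    The tf-idf weight can be computed over a small collection as follows. This sketch assumes the natural logarithm (the formula does not fix the base) and assumes documents are already tokenized into lists of terms; the function name tfidf_weights and the sample documents are hypothetical.

```python
import math
from collections import Counter

def tfidf_weights(docs: list[list[str]]) -> list[dict[str, float]]:
    """Compute w(n) = f(n) * log(D / d(n)) for every term n in every document."""
    D = len(docs)                 # total number of documents in the set
    df = Counter()                # d(n): number of documents containing term n
    for doc in docs:
        df.update(set(doc))       # count each term at most once per document
    weights = []
    for doc in docs:
        f = Counter(doc)          # f(n): occurrences of term n in this document
        weights.append({n: f[n] * math.log(D / df[n]) for n in f})
    return weights

docs = [
    ["vector", "space", "model", "model"],
    ["vector", "retrieval"],
    ["boolean", "retrieval"],
]
w = tfidf_weights(docs)
print(round(w[0]["model"], 3))   # 2 * ln(3/1) -> 2.197
print(round(w[0]["vector"], 3))  # 1 * ln(3/2) -> 0.405
```

    A term that occurs in every document has d(n) = D, so log(D / d(n)) = 0 and its weight vanishes regardless of f(n).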





    Assumptions and Limitations of the Vector Space Model

    The vector space model has the following limitations:

      Long documents are poorly represented because they have poor similarity values (a small scalar product and a large dimensionality).
      Documents with similar context but different term vocabulary are not matched ("false negative match").
      Search keywords must match document terms exactly; partial matches on word substrings (e.g. key + ing, para + meter) produce spurious hits and poorer results ("false positive match").
      Semantic limitation: the model compares terms, not meanings.
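    The false negative match can be reproduced with raw term counts and the cosine measure. In this hypothetical example a query about buying a car scores zero against a document that says "automobile" and "purchase" instead, even though both describe the same thing; whitespace tokenization and unweighted counts are simplifying assumptions.

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

query = Counter("buy used car".split())
doc_same_words = Counter("car for sale buy now".split())
doc_synonym = Counter("automobile for sale purchase now".split())

print(cosine(query, doc_synonym))               # 0.0 (false negative: no shared terms)
print(round(cosine(query, doc_same_words), 3))  # 0.516
```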


    Comparison with the Term Count Model

    The Term Count Model, an earlier alternative, considers only local parameters (how often a term occurs within a document) and does not account for global parameters. See the separate Wikipedia article on the Term Count Model.
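    The difference between the two weighting schemes shows up for a term that appears in every document. In this sketch (hypothetical documents and function names, natural logarithm assumed), the Term Count Model weights such a term by its frequency alone, while tf-idf reduces it to zero because log(D / d(n)) = log(1) = 0.

```python
import math
from collections import Counter

docs = [
    "the model ranks the documents".split(),
    "the query is a vector".split(),
    "the cosine measures the angle".split(),
]

D = len(docs)                                        # total number of documents
df = Counter(t for doc in docs for t in set(doc))    # d(n) for every term

def term_count_weight(term: str, doc: list[str]) -> float:
    """Term Count Model: local parameter only, w(n) = f(n)."""
    return float(doc.count(term))

def tfidf_weight(term: str, doc: list[str]) -> float:
    """tf-idf: w(n) = f(n) * log(D / d(n)), natural log assumed."""
    return doc.count(term) * math.log(D / df[term])

# "the" occurs in every document, so its global factor log(D / d(n)) is 0.
print(term_count_weight("the", docs[0]), tfidf_weight("the", docs[0]))  # 2.0 0.0
print(term_count_weight("model", docs[0]), round(tfidf_weight("model", docs[0]), 3))  # 1.0 1.099
```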


    Models based on and extending the vector space model

    Models based on and extending the vector space model include:
      Generalized vector space model
      Topic-based vector space model (TVSM): extends the vector space model by removing the constraint that the term vectors be orthogonal. In contrast to the generalized vector space model, the topic-based vector space model does not depend on co-occurrence-based similarities between terms.
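    Dropping the orthogonality constraint means the inner product of two different term vectors need not be zero. One way to sketch this, as an assumption rather than the exact GVSM or TVSM procedure, is to store those term-term inner products in a matrix G and compute the cosine as q^T G d / (sqrt(q^T G q) * sqrt(d^T G d)). With G equal to the identity matrix this reduces to the classic model. The 0.8 similarity between "car" and "automobile" below is a hypothetical value, and how each model actually derives G is not shown.

```python
import math

def gram_similarity(q, d, G):
    """Cosine of q and d when term vectors are not assumed orthogonal.

    G[i][j] is the (hypothetical) inner product of term vectors i and j.
    With G equal to the identity matrix this is the classic cosine similarity.
    """
    def inner(x, y):
        # x^T G y, summing over all term pairs (i, j)
        return sum(x[i] * G[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))

    denom = math.sqrt(inner(q, q)) * math.sqrt(inner(d, d))
    return inner(q, d) / denom if denom else 0.0

# Hypothetical vocabulary: ["car", "automobile"]
query = [1.0, 0.0]  # mentions "car"
doc = [0.0, 1.0]    # mentions "automobile"

identity = [[1.0, 0.0], [0.0, 1.0]]    # classic model: orthogonal term vectors
correlated = [[1.0, 0.8], [0.8, 1.0]]  # hypothetical term-term similarity of 0.8

print(gram_similarity(query, doc, identity))    # 0.0
print(gram_similarity(query, doc, correlated))  # 0.8
```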




    Scientus.org Dictionary (Yet Another Wiki) RC : 1.39
    This article is licensed under the GNU Free Documentation License. It uses material from the Wikipedia article "Vector space model".