Navigation
  • Home
  • Recent
  • Most Active
  • Popular
  • Blog
  • Credits
  • RSS
  •   Interaction
  • Register
  • Statistics
  •   Help
  • Suggestions
  • Contact Us
  • How to Edit
  • Help



  • [Edit]


    Preprocessing is the act of processing data before it is parsed. There are numerous situations where it makes sense to do the parsing in several stages. One is where humans are the parsers, and another is in the context of computer programming. Preprocessing is also used in data mining and neural network software where it refers to the stage before a model is constructed, when data is analyzed and transformed and filtered.

        Preprocessing
            Computer programming
            Data mining
                Variable selection
                Sample selection
                Variable construction
                Variable transformation
            See also

    top

    Computer programming

    As the name suggests, preprocessing is performed by a preprocessor. The preprocessor modifies the data according to
    preprocessing directives that are usually placed in the input data itself.
    For instance, in the C programming language, where preprocessing directives are marked with a '
      ' at the beginning of the line, the preprocessor can implement macros, be used for including external files at different points in the file or to select blocks of code are to be sent to the compiler. The criteria can be several things, such as processor type (e.g. to resolve integer representation problems), availability of function calls (so you can provide a work-around if one is missing) and user preferences.

    When preprocessors support macros, calls to macro functions within the code will expand to the whole implementation of the macro before it is sent to the compiler. This can be quite useful where speed is more important than the size of the binary code, and when you need expressions that expand to more than just a function; for instance a case-block.

    Preprocessing is very useful to solve portability issues: depending on the target platform (that is specified to the
    preprocessor by some command-line argument) the application will contain specific code. For instance, when compiled for
    Linux, the application would read its configuation options from a file named .conf, whereas when compiled for
    Microsoft Windows, it would read the configuration from the registry.

    top

    Data mining
    In data mining preprocessing refers to data preparation where you select variables samples, construct new variables and transform existing ones.

    top

    Variable selection
    Ideally, you would take all the variables/features you have and use them as inputs. In practice, this doesn’t work very well. One reason is that the time it takes to build a model increases with the number of variables. Another reason is that blindly including columns can lead to incorrect models. Often your knowledge of the problem domain can let you make many of these selections correctly. For example, including ID numbers as predictor variables will at best have no benefit and at worst may reduce the weight of other important variables.

    top

    Sample selection
    As in the case of the variables, ideally you would want to make use of all the samples you have. In practice however samples may have to be removed to produce optimal results. You usually want to throw away data that are clearly outliers. In some cases you might want to include them in the model, but often they can be ignored based on your understanding of the problem. In other cases you may want to emphasize certain aspects of your data by duplicating samples.

    top

    Variable construction
    It is often desirable to create additional inputs based on raw data. For instance forecasting demographics using a GDP per capita ratio rather than just GDP and capita can yield better results. While in theory many adaptive systems can handle this autonomously, in practice helping the system out by incorporating external knowledge can make a big difference.

    top

    Variable transformation
    Often data needs to be transformed in one way or another before it can be put to use in a system. Typical transformations are scaling and normalization which puts the variables in a restricted range – a necessity for efficient and precise numerical computation.
    ----

    top

    See also
      YALE (Yet Another Learning Environment): free open-source software for knowledge discovery, data mining, and machine learning providing many preprocessing operators including operators for feature (variable) construction, transformation, and selection, for text document, web page, audio data, and time series data preprocessing, and many other preprocessing tasks




     
    Search more:
     

       
    Source Privacy License Download Contact Us Atlas
    Scientus.org Dictionary (Yet Another Wiki) RC : 1.39
    This article is licensed under the GNU Free Documentation License [copyleft]. It uses material from the Wikipedia article "Preprocessing". link