opt2bin {rWMBAT}    R Documentation

Finds The Best Single Boundary For Each Variable To Maximize MI

Description

This function takes an array of continuous data, with cases in rows and variables in columns, along with a vector "class" that holds the known class of each case, and returns an array "binneddata" that holds the two-bin discretized data.

Usage

opt2bin(rawdata, class, steps, typesearch, minint = NA, maxint = NA)

Arguments

rawdata     double array of continuous values, cases in rows and variables in columns. The distribution is unknown.
class       double column vector, values 1:c representing the classification of each case
steps       integer, number of steps to test while finding the maximum MI
typesearch  selects how the starting search boundaries are set:
              = 0   based on the data's actual max/min values
              = 1   use the value passed in maxint as the maximum (right) value
              = -1  use the value passed in minint as the minimum (left) value
              = 2   use the values passed via both maxint and minint
minint      vector whose values limit the lower end of the search range for each variable's boundary
maxint      vector whose values limit the upper end of the search range for each variable's boundary
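
The typesearch codes above can be read as a rule for choosing the search interval of one variable. The sketch below is only an illustration of that reading: search_range is a hypothetical helper, not part of rWMBAT, and it assumes minint and maxint have already been reduced to the single value belonging to the variable in question.

```r
# Hypothetical helper (not an rWMBAT function): pick the search interval
# for one variable `v` according to the typesearch code.
search_range <- function(v, typesearch, minint = NA, maxint = NA) {
  lo <- min(v)                                 # default: data minimum
  hi <- max(v)                                 # default: data maximum
  if (typesearch %in% c(-1, 2)) lo <- minint   # caller supplies the left limit
  if (typesearch %in% c(1, 2))  hi <- maxint   # caller supplies the right limit
  c(lo = lo, hi = hi)
}

v <- c(1, 2, 3)
range_data  <- search_range(v, 0)                           # limits from the data
range_right <- search_range(v, 1, maxint = 2.5)             # right limit supplied
range_both  <- search_range(v, 2, minint = 1.5, maxint = 2.5)
```

Under this reading, typesearch = 2 without minint and maxint would leave both limits as NA, so callers using that mode would normally pass both vectors.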

Details

The discretization bin boundary is found by maximizing the mutual information (MI) with the class; the resulting MI and boundary are also returned. The starting boundaries for the search can be given in the vectors minint and maxint, in either one alone, or in neither, in which case the data values determine the search boundaries.
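
To make the idea of an MI-maximizing boundary concrete, here is a minimal sketch for a single variable. It is not the package's implementation: mutual_info and best_boundary are hypothetical names, the evenly spaced grid of candidate boundaries is an assumed search strategy, and MI is measured in bits.

```r
# Mutual information (in bits) between two discrete vectors of equal length.
mutual_info <- function(x, y) {
  joint <- table(x, y) / length(x)                 # joint probabilities p(x, y)
  indep <- outer(rowSums(joint), colSums(joint))   # product of marginals p(x) p(y)
  nz <- joint > 0                                  # skip empty cells (0 * log 0 = 0)
  sum(joint[nz] * log2(joint[nz] / indep[nz]))
}

# Assumed strategy: test `steps` evenly spaced interior boundaries between
# lo and hi and keep the one whose two-bin split shares the most MI with class.
best_boundary <- function(v, class, steps, lo = min(v), hi = max(v)) {
  candidates <- seq(lo, hi, length.out = steps + 2)[-c(1, steps + 2)]
  mi <- vapply(candidates,
               function(b) mutual_info(ifelse(v > b, 2, 1), class),
               numeric(1))
  k <- which.max(mi)                               # first maximum wins ties
  list(mi = mi[k], boundary = candidates[k])
}

# Toy data: class 1 clusters near 0, class 2 near 5.
v     <- c(0.1, 0.4, 0.3, 0.2, 5.1, 4.8, 5.3, 4.9)
class <- c(1, 1, 1, 1, 2, 2, 2, 2)
res   <- best_boundary(v, class, steps = 20)
```

On this toy input any boundary between 0.4 and 4.8 separates the classes perfectly, so the reported MI is 1 bit; on real data the maximum is usually lower and a finer grid (larger steps) locates the boundary more precisely.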

Value

mi row vector holding the maximum values of MI(CVi) found
boundary double vector, the location used to bin the data to get max MI
binneddata resulting data binned into "1" (low) or "2" (high)
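
The relationship between boundary and binneddata can be illustrated with a small sketch. It assumes one boundary per column and that values strictly above the boundary fall in bin 2; how opt2bin treats values exactly on the boundary is not stated here, so that detail is an assumption.

```r
# Two variables (columns), four cases (rows); values are made up.
rawdata <- matrix(c(0.2, 0.9, 0.4, 0.7,
                    10,  30,  20,  40), ncol = 2)
boundary <- c(0.5, 25)                 # one boundary per variable

# sweep() compares every column with its own boundary; adding 1 turns
# FALSE/TRUE into the bin labels 1 (low) and 2 (high).
binneddata <- sweep(rawdata, 2, boundary, FUN = ">") + 1
```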

Author(s)

Karl Kuschner, Qian Si and William Cooke, College of William and Mary, Dept. of Physics, 2009

References

http://kwkusc.people.wm.edu/dissertation/dissertation.htm

Examples

data(traingrp, traingrpclass)  # load example input data from the package
result <- opt2bin(traingrp, traingrpclass, 150, 2)

[Package rWMBAT version 2.0]