Effect of Discretization, Continuization, Normalization, and Randomization on the Data

Pooja Lo
Sep 13, 2021

Using Orange, I learned about four data preprocessing techniques: discretization, continuization, normalization, and randomization.

Discretization

Discretization replaces continuous features with the corresponding categorical features:

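import Orange

# Load the iris data; its four features are continuous, so there is something to discretize.
iris = Orange.data.Table("iris.tab")

# Split every continuous feature into 4 intervals holding roughly equal numbers of rows.
disc = Orange.preprocess.Discretize()
disc.method = Orange.preprocess.discretize.EqualFreq(n=4)
d_iris = disc(iris)

print("Original dataset:")
for e in iris[:4]:
    print(e)

print("Discretized dataset:")
for e in d_iris[:4]:
    print(e)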
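For intuition about what EqualFreq(n=4) does, here is a minimal pure-Python sketch of equal-frequency binning. It is an illustration of the idea, not Orange's implementation: the helpers equal_freq_cuts and discretize are hypothetical, the values are toy data, and Orange may place its cut points slightly differently.

```python
def equal_freq_cuts(values, n):
    """Return n - 1 cut points so each bin holds roughly len(values) / n items."""
    ordered = sorted(values)
    size = len(ordered)
    # The k-th cut sits at the first element of the k-th chunk of the sorted data.
    return [ordered[(k * size) // n] for k in range(1, n)]


def discretize(value, cuts):
    """Map a continuous value to the label of the interval it falls into."""
    for i, cut in enumerate(cuts):
        if value < cut:
            return f"bin {i}"
    return f"bin {len(cuts)}"


# Eight made-up petal lengths (not real iris rows).
petal_length = [1.4, 1.3, 4.7, 4.5, 6.0, 5.1, 1.5, 3.9]
cuts = equal_freq_cuts(petal_length, 4)
labels = [discretize(v, cuts) for v in petal_length]
print(cuts)
print(labels)
```

Because the cut points follow the data's quantiles rather than its range, every bin here ends up with two of the eight values, which is what distinguishes equal-frequency from equal-width binning.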

Continuization

Given a data table, Continuize returns a new table in which discrete attributes are replaced with continuous ones or removed:

  • binary variables are transformed into 0.0/1.0 or -1.0/1.0 indicator variables, depending on the argument zero_based;
  • multinomial variables are treated according to the argument multinomial_treatment;
  • discrete attributes with only one possible value are removed.
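The first and last rules in this list can be sketched in plain Python. This is a hypothetical illustration of the idea, not Orange's code: continuize_binary and drop_constant are made-up helpers, the columns are toy data, and the choice of which value maps to 1 is an explicit argument here, whereas Orange picks it by its own rule.

```python
def continuize_binary(values, positive, zero_based=True):
    """Turn a two-valued column into a numeric indicator.

    zero_based=True  -> 0.0 / 1.0
    zero_based=False -> -1.0 / 1.0
    """
    low = 0.0 if zero_based else -1.0
    return [1.0 if v == positive else low for v in values]


def drop_constant(columns):
    """Remove columns that contain only one distinct value."""
    return {name: col for name, col in columns.items() if len(set(col)) > 1}


sex = ["male", "female", "female", "male"]
print(continuize_binary(sex, positive="male"))
print(continuize_binary(sex, positive="male", zero_based=False))

# "ship" has a single value, so it carries no information and is dropped.
table = {"sex": sex, "ship": ["Titanic"] * 4}
print(list(drop_constant(table)))
```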
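import Orange

titanic = Orange.data.Table("titanic.tab")

# Default treatment (Indicators): one 0/1 column per value of each multinomial variable.
continuizer = Orange.preprocess.Continuize()
titanic1 = continuizer(titanic)
print(titanic.domain)
print(titanic1.domain)
print(titanic[15])
print(titanic1[15])

# FirstAsBase: like Indicators, but the first value gets no column of its own.
continuizer.multinomial_treatment = continuizer.FirstAsBase
print(continuizer(titanic).domain)

# AsOrdinal: each multinomial variable becomes one column holding the value's index.
continuizer.multinomial_treatment = continuizer.AsOrdinal
titanic1 = continuizer(titanic)
print(titanic[700])
print(titanic1[700])

# AsNormalizedOrdinal: the same index, scaled into [0, 1].
continuizer.multinomial_treatment = continuizer.AsNormalizedOrdinal
titanic1 = continuizer(titanic)
print(titanic1[700])
print(titanic1[15])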
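To compare the multinomial_treatment options side by side, here is a pure-Python sketch of what each one does to a single four-valued variable such as the titanic status (crew, first, second, third). It follows the behaviour described in Orange's documentation as I understand it; the function encode is hypothetical, and Orange's real output also names and orders the new columns in its own way.

```python
def encode(value, values, treatment):
    """Encode one discrete value under a named multinomial treatment."""
    index = values.index(value)
    if treatment == "Indicators":
        # One 0/1 column per possible value.
        return [1.0 if i == index else 0.0 for i in range(len(values))]
    if treatment == "FirstAsBase":
        # No column for the first value; it becomes the all-zero baseline.
        return [1.0 if i == index else 0.0 for i in range(1, len(values))]
    if treatment == "AsOrdinal":
        # A single column holding the value's position: 0, 1, 2, ...
        return [float(index)]
    if treatment == "AsNormalizedOrdinal":
        # The same position scaled into [0, 1]; max() guards a one-value variable.
        return [index / max(len(values) - 1, 1)]
    raise ValueError(f"unknown treatment: {treatment}")


status = ["crew", "first", "second", "third"]
for treatment in ("Indicators", "FirstAsBase", "AsOrdinal", "AsNormalizedOrdinal"):
    print(treatment, encode("second", status, treatment))
```

FirstAsBase avoids the redundant column that Indicators produces (the indicators always sum to 1), while the two ordinal options impose an order on the values, which is only meaningful when the values have a natural order.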

Normalization

Normalization is a technique often applied as part of data preparation for machine learning. Its goal is to bring the values of numeric columns in the dataset to a common scale without distorting differences in the ranges of values. Not every dataset needs normalization; it matters mainly when features are measured on very different ranges.
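The two scalings most often meant by normalization can be written as follows, where x is a value in a numeric column and min, max, the mean \mu and the standard deviation \sigma are computed over that column. As far as I know, Normalize.NormalizeBySpan corresponds to the first (with its default zero_based=True) and Normalize.NormalizeBySD to the second.

```latex
x' = \frac{x - \min(x)}{\max(x) - \min(x)} \in [0, 1]
\qquad\qquad
z = \frac{x - \mu}{\sigma}
```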

from Orange.data import Table
from Orange.preprocess import Normalize

data = Table("iris")
# NormalizeBySpan rescales each numeric feature using its minimum and range (max - min).
normalizer = Normalize(norm_type=Normalize.NormalizeBySpan)
normalized_data = normalizer(data)
print(normalized_data)
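The same two scalings in plain Python, applied to a toy column, show what happens to the values. This is a sketch for intuition only: min_max and z_score are hypothetical helpers, and z_score uses the population standard deviation, which may differ from the estimator Orange uses.

```python
from statistics import mean, pstdev


def min_max(column):
    """Rescale values linearly so the smallest becomes 0 and the largest 1."""
    lo, hi = min(column), max(column)
    span = (hi - lo) or 1.0  # guard against a constant column
    return [(x - lo) / span for x in column]


def z_score(column):
    """Center values on the mean and divide by the population standard deviation."""
    mu, sigma = mean(column), pstdev(column)
    sigma = sigma or 1.0  # guard against a constant column
    return [(x - mu) / sigma for x in column]


sepal_length = [4.0, 5.0, 6.0, 7.0, 8.0]  # toy values, not real iris rows
print(min_max(sepal_length))
print(z_score(sepal_length))
```

Both are linear maps, so the order and relative spacing of the values are preserved; only the offset and scale change, which is what "without distorting differences" means in practice.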

Randomization

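Randomization in an experiment means choosing your participants at random. For example, with simple random sampling, participants' names are drawn from a pool in which everyone has an equal probability of being chosen. In Orange, the Randomize preprocessor applies a related idea to a data table: it shuffles the values of the class variable (or, optionally, of the attributes or meta attributes) across rows, breaking any link between them and the rest of each row.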
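Simple random sampling, as described above, takes one call with Python's standard random module. The participant names are made up, and the fixed seed is there only so the draw can be reproduced.

```python
import random

pool = ["Ana", "Ben", "Chloe", "Dev", "Eli", "Fay", "Gus", "Hana"]
rng = random.Random(42)  # seeded so the same draw can be reproduced
# sample() draws without replacement; every subset of size 3 is equally likely.
chosen = rng.sample(pool, k=3)
print(chosen)
```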

from Orange.data import Table
from Orange.preprocess import Randomize

data = Table("iris")
# RandomizeClasses shuffles the class values across rows; the features are left untouched.
randomizer = Randomize(Randomize.RandomizeClasses)
randomized_data = randomizer(data)
print(randomized_data)
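Randomize(Randomize.RandomizeClasses) keeps every feature column as it is and permutes the class column. A rough pure-Python equivalent is below; shuffle_classes is a hypothetical helper, the rows are toy data, and the seed argument is my own addition for reproducibility. A table shuffled this way is a common baseline: a model trained on it should do no better than chance.

```python
import random


def shuffle_classes(rows, seed=None):
    """Return new rows whose features are unchanged but whose class labels are permuted."""
    features = [r[:-1] for r in rows]
    labels = [r[-1] for r in rows]
    random.Random(seed).shuffle(labels)  # in-place permutation of the label list
    return [f + (label,) for f, label in zip(features, labels)]


rows = [
    (5.1, 3.5, "setosa"),
    (7.0, 3.2, "versicolor"),
    (6.3, 3.3, "virginica"),
    (4.9, 3.0, "setosa"),
]
shuffled = shuffle_classes(rows, seed=0)
for row in shuffled:
    print(row)
```

The label counts stay the same, so class balance is preserved, but any association between a row's features and its label is destroyed.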

Thank you.
