\item Datasets : \insertcontinuationtext
\item $\text{{Multi Domain Sentiment}}^\text{{[5]}}$ Dataset contains product reviews taken from Amazon.com.
\item BOOKS : 1000 Positive reviews and 1000 Negative reviews.
\item $\text{{Movie Reviews}}^\text{{[6]}}$ : \\
\item All HTML files were collected from the $\text{IMDb archive}^{[7]}$. \\
\item 770 Positive reviews and 703 Negative reviews. \\
\item Preprocessing :
\item \textcolor{violet}{Upper to lower conversion}: All reviews are converted to lower case.
\item \textcolor{violet}{Normalization} : Every word containing an apostrophe is replaced with its original (expanded) form.
\\ e.g.: $\textcolor{red!70}{\text{don't} \rightarrow \text{do not}}$
\item \textcolor{violet}{Non-ASCII removal} : All non-ASCII characters are removed from the reviews. \\ e.g.: $\textcolor{red!70}{\bigstar \spadesuit \clubsuit \blacklozenge}$
\item \textcolor{violet}{Remove new lines} : Blank lines are removed from the reviews.
\item \textcolor{violet}{Stopword removal} : English stopwords include \\ \textcolor{red}{an, are, the, a}, etc. All such words are removed using the Natural Language Toolkit $\text{(NLTK)}^{[8]}$. \\
\item \textcolor{violet}{Stemming} : The process of removing morphological affixes from words. e.g.: $\text{\textcolor{red!70}{beauty, beautiness, beautiful}} \Rightarrow \text{\textcolor{red!70}{beauti}}$
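The preprocessing steps above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline actually used for the experiments: the function name \texttt{preprocess}, the small contraction table and the four-word stopword set are assumptions made here so that the example runs without downloads. In practice the stopword list would come from NLTK, and a stemmer such as \texttt{nltk.stem.PorterStemmer().stem} could be passed as the optional \texttt{stem} argument.

```python
import re

# Hypothetical stand-ins so the sketch runs without NLTK data downloads.
CONTRACTIONS = {"don't": "do not", "can't": "cannot", "isn't": "is not"}
STOPWORDS = {"a", "an", "are", "the"}  # the examples listed on the slide


def preprocess(review, stem=None):
    """Apply the slide's preprocessing steps and return a list of tokens."""
    text = review.lower()                                  # upper to lower conversion
    for short, full in CONTRACTIONS.items():               # normalization of apostrophe forms
        text = text.replace(short, full)
    text = text.encode("ascii", errors="ignore").decode("ascii")  # non-ASCII removal
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    text = " ".join(lines)                                 # drop blank lines
    tokens = re.findall(r"[a-z0-9]+", text)                # simple tokenization (assumption)
    tokens = [t for t in tokens if t not in STOPWORDS]     # stopword removal
    if stem is not None:                                   # optional stemming step
        tokens = [stem(t) for t in tokens]
    return tokens


if __name__ == "__main__":
    print(preprocess("I DON'T like\n\nthe \u2605 plot"))
```

The stemmer is passed in as a callable so that the sketch stays independent of any particular library; any function mapping a word to its stem can be used.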
\item Dataset Partitioning :
\item {\scriptsize {\textbf{MOVIES}}}:
\caption{\scriptsize {Fig 1: Dataset Partitioning of MOVIE reviews}}
\item {\scriptsize {\textbf{BOOKS}}}:
\caption{\scriptsize {Fig 2: Dataset Partitioning of BOOK reviews}}
\item Feature Selection:
\item Mutual Information :
Selects features that are not uniformly distributed among the classes.
\textit{F} denotes the presence of feature \textit{F} \\
\textit{$\bar{F}$} denotes the absence of feature \textit{F} \\
\textit{$C_{k}$} is the Positive class \\
\textit{$\bar{C_{k}}$} is the Negative class \\
\textit{N} is the total number of samples
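\par One common way to write the score from these quantities is sketched below. The joint and marginal counts $N_{f,c}$, $N_{f}$ and $N_{c}$ are notation assumed here for illustration, and the original slides may use a different but equivalent form:
\[
MI(F, C_{k}) = \sum_{f \in \{F, \bar{F}\}} \; \sum_{c \in \{C_{k}, \bar{C_{k}}\}} \frac{N_{f,c}}{N} \log_{2} \frac{N \, N_{f,c}}{N_{f} \, N_{c}}
\]
Here $N_{f,c}$ is the number of samples with feature status $f$ and class $c$, $N_{f}$ is the number of samples with feature status $f$, and $N_{c}$ is the number of samples in class $c$. A feature spread evenly across both classes has $N_{f,c} \approx N_{f} N_{c} / N$, so each logarithm, and hence the score, is close to zero.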
