Binary Multinomial Naive Bayes applied from scratch for sentiment analysis. This is the original Datalore notebook where I built the project; I exported it as an .ipynb.
This is a Bayesian classifier which makes a simplifying (naive) assumption about how the features interact.
We represent each sentence as a bag of words: an unordered collection of words in which position does not matter, but the frequency of each word is kept.
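For instance, a bag of words can be built in Python with `collections.Counter` (the sentence below is just a made-up illustration):

```python
from collections import Counter

# Word order is discarded; only each word's frequency is kept.
sentence = "the movie was good the acting was good"
bag_of_words = Counter(sentence.split())
print(bag_of_words["good"])   # 2
print(bag_of_words["movie"])  # 1
```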
Now our main task is to find the class of a given document, where the class represents the sentiment:
C_NB = argmax_{c ∈ C} P(c|d)
Here we are using Bayes' theorem to predict the class. According to Bayes' theorem:
P(x|y) = P(y|x) P(x) / P(y)
Applying this to the Naive Bayes equation:
C_NB = argmax_{c ∈ C} P(d|c) P(c) / P(d)
Since P(d) remains constant across all classes, we can drop it from the equation, which then becomes:
C_NB = argmax_{c ∈ C} P(d|c) P(c)
Here P(c) is called the prior probability, while P(d|c) is called the likelihood.
Now, to analyse the document, we concentrate on its words w_i.
We make two important assumptions:
- The bag-of-words assumption: the position of a word in the sentence does not matter.
- Conditional independence: the probabilities of the features (here, words) are independent of each other given the class, so P(w_1, w_2, …, w_n | c) = P(w_1|c) · P(w_2|c) · … · P(w_n|c)
So finally we get:
C_NB = argmax_{c ∈ C} P(c) ∏_{i ∈ positions} P(w_i|c)
where P(w_i|c) = (count(w_i, c) + α) / (x + V·α), α being the Laplace smoothing constant, x the total number of words in class c, and V the size of the vocabulary (the total length of the bag of words).
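As a minimal from-scratch sketch of these formulas (the function names and the whitespace tokeniser are illustrative, not the notebook's actual code), working in log space so the product over many words does not underflow:

```python
import math
from collections import Counter

def train_nb(docs, labels, alpha=1.0):
    """Estimate log priors P(c) and Laplace-smoothed log likelihoods P(w|c)."""
    classes = set(labels)
    log_priors = {c: math.log(labels.count(c) / len(labels)) for c in classes}
    word_counts = {c: Counter() for c in classes}
    for doc, c in zip(docs, labels):
        word_counts[c].update(doc.split())
    vocab = {w for counts in word_counts.values() for w in counts}
    log_likelihoods = {}
    for c in classes:
        total = sum(word_counts[c].values())  # x: total words in class c
        denom = total + alpha * len(vocab)    # x + V * alpha
        log_likelihoods[c] = {w: math.log((word_counts[c][w] + alpha) / denom)
                              for w in vocab}
    return log_priors, log_likelihoods, vocab

def predict_nb(doc, log_priors, log_likelihoods, vocab):
    """Return argmax_c [ log P(c) + sum_i log P(w_i|c) ]; unseen words are skipped."""
    scores = {c: log_priors[c] + sum(log_likelihoods[c][w]
                                     for w in doc.split() if w in vocab)
              for c in log_priors}
    return max(scores, key=scores.get)
```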
Here's an example:
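Running the sketch above on a made-up four-document corpus:

```python
docs = ["good great movie", "great acting", "boring bad movie", "bad plot"]
labels = ["pos", "pos", "neg", "neg"]

log_priors, log_likelihoods, vocab = train_nb(docs, labels)
# Class "pos" has 5 word tokens and the vocabulary has 7 words, so
# P(good|pos) = (1 + 1) / (5 + 7*1) = 1/6.
print(predict_nb("good movie", log_priors, log_likelihoods, vocab))  # pos
```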
This was Multinomial Naive Bayes....
We are going to implement Binary Multinomial Naive Bayes for sentiment analysis, which differs from Multinomial NB in that repetitions of a word within a document are not counted: word counts are clipped at 1 per document, because the presence of specific words matters more than their frequency.
For example:
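A minimal sketch of that change, reusing the helpers above; clipping the test document's counts as well follows the usual textbook presentation and is an assumption here:

```python
def binarize(doc):
    """Keep only the unique words of a document, clipping each count at 1."""
    return " ".join(set(doc.split()))

# "good good movie" now contributes 'good' only once.
binary_docs = [binarize(d) for d in docs]
log_priors, log_likelihoods, vocab = train_nb(binary_docs, labels)
print(predict_nb(binarize("good good movie"),
                 log_priors, log_likelihoods, vocab))  # pos
```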