IMDB movie reviews - sentiment analysis
This project covers sentiment analysis, a text mining task. You can expect:
- data preparation (nulls and balance check, regex cleaning, label decoding, review length constraint, tokenization, vocabulary building)
- working with pretrained word embeddings (GloVe)
- data analysis (embedding coverage, detailed cleaning with regex, stop words, word clouds and count plots)
- modelling (CNNs, RNNs, VADER) with cross-validation and error analysis (confusion matrix, analysis of misclassified examples)
- experiments (review length and GloVe dimension sensitivity)
- 86% accuracy on the test set
- Jupyter Notebook report in English (a lot of Python code)
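
To give a feel for the data preparation and embedding coverage steps listed above, here is a minimal sketch using only the Python standard library. It is not the notebook's actual code: the sample reviews, the function names (`clean`, `tokenize`, `build_vocab`, `embedding_coverage`), the `<pad>`/`<unk>` tokens, the `max_len` value and the tiny stand-in for the GloVe word list are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical toy reviews; the project itself works on the IMDB dataset.
reviews = [
    "A <br />wonderful film, I LOVED it!",
    "Terrible plot... boring & way too long.",
]


def clean(text):
    """Lowercase, drop HTML tags, keep only letters, collapse whitespace."""
    text = re.sub(r"<[^>]+>", " ", text.lower())
    text = re.sub(r"[^a-z\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text, max_len=5):
    """Whitespace tokenization, truncated to max_len tokens
    (an assumed form of the review length constraint)."""
    return clean(text).split()[:max_len]


def build_vocab(token_lists, min_count=1):
    """Map each word to an integer id; ids 0 and 1 are reserved
    for the assumed padding and unknown-word tokens."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    vocab = {"<pad>": 0, "<unk>": 1}
    for word, n in counts.most_common():
        if n >= min_count:
            vocab[word] = len(vocab)
    return vocab


def embedding_coverage(vocab, embedding_words):
    """Fraction of vocabulary words that have a pretrained vector."""
    words = [w for w in vocab if w not in ("<pad>", "<unk>")]
    if not words:
        return 0.0
    return sum(w in embedding_words for w in words) / len(words)


tokens = [tokenize(r) for r in reviews]
vocab = build_vocab(tokens)

# Stand-in for the set of words present in a GloVe file (assumption).
glove_words = {"wonderful", "film", "terrible", "plot", "boring"}
coverage = embedding_coverage(vocab, glove_words)
print(f"vocabulary size: {len(vocab)}, coverage: {coverage:.0%}")
```

With real GloVe vectors, `glove_words` would be the set of words read from the embedding file, and a low coverage value would suggest that the cleaning rules need another pass.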