NLP: Opinion classification

Let’s apply some classification methods to the same TripAdvisor data used in the post https://www.alldatascience.com/nlp/nlp-target-and-aspect-detection-with-python. In this case we read and preprocess the data again, then vectorize it in different ways: 1. With a TF-IDF vectorizer, which creates vectors that take into account the frequency of words in a document and the frequency of words across all documents, decreasing the weight of words that appear too often (they can bee

Continue reading
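
The TF-IDF weighting mentioned in the excerpt above can be illustrated with a minimal sketch. It assumes three made-up reviews (not the TripAdvisor data) and uses the plain textbook formula, term frequency times log(N / document frequency); the vectorizer used in the post may apply a smoothed or normalized variant.

```python
import math
from collections import Counter

# Hypothetical toy reviews, for illustration only.
docs = [
    "the hotel room was clean",
    "the hotel staff was friendly",
    "the breakfast was cold",
]

tokenized = [doc.split() for doc in docs]
n_docs = len(tokenized)

# Document frequency: in how many documents each word appears.
doc_freq = Counter(word for tokens in tokenized for word in set(tokens))


def tfidf(tokens):
    """Return a {word: weight} dict for one tokenized document."""
    counts = Counter(tokens)
    total = len(tokens)
    return {
        word: (count / total) * math.log(n_docs / doc_freq[word])
        for word, count in counts.items()
    }


weights = [tfidf(tokens) for tokens in tokenized]

# Words found in every review ("the", "was") get weight 0,
# while words found in a single review get a positive weight.
for doc, w in zip(docs, weights):
    print(doc, "->", {k: round(v, 3) for k, v in w.items()})
```

Words that occur in every document carry no information for telling documents apart, which is why their weight drops to zero under this formula.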

NLP: Target and aspect detection with Python.

In this post we perform target and aspect detection on a dataset of TripAdvisor opinions. Targets, or topics, are the things the opinions are about. Aspects are parts or features of the target. Here we explore target detection using word embeddings (Word2Vec), which find similar words by context, and try to extract aspects of the target by searching for close words using the WordNet synsets. First, we perform data preprocessing by removing stopwords

Continue reading

Wine dataset analysis with Python

In this post we explore the wine dataset. First, we perform descriptive and exploratory data analysis. Next, we run dimensionality reduction with the PCA and TSNE algorithms in order to check how they work. Finally, a random forest classifier is implemented, comparing different parameter values to check how they affect the classifier results.
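
As a rough sketch of that workflow, the snippet below assumes the wine data is the copy bundled with scikit-learn (`load_wine`), which may differ from the dataset used in the post. It projects the standardized features onto two principal components and compares a random forest across a few illustrative values of `n_estimators`; the values tried, the 5-fold cross-validation, and the omission of TSNE are choices made here for brevity, not taken from the post.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler

# Assumption: scikit-learn's bundled wine dataset stands in for the post's data.
X, y = load_wine(return_X_y=True)

# PCA is sensitive to feature scale, so standardize before projecting to 2D.
X_scaled = StandardScaler().fit_transform(X)
X_2d = PCA(n_components=2).fit_transform(X_scaled)

# Compare mean cross-validated accuracy for a few forest sizes.
scores = {}
for n_trees in (10, 50, 100):
    clf = RandomForestClassifier(n_estimators=n_trees, random_state=0)
    scores[n_trees] = cross_val_score(clf, X, y, cv=5).mean()

for n_trees, score in scores.items():
    print(f"n_estimators={n_trees}: mean accuracy {score:.3f}")
```

The 2D projection in `X_2d` can be passed to a scatter plot colored by `y` to see how well the classes separate.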

Continue reading