In this post we show a data visualization made with Tableau using data from the Peace Agreements Database and the World Bank DataBank. The visualization explores how different groups are affected when they are included in peace agreements worldwide, and shows some examples of how these measures can improve people’s lives.
NLP: Opinion classification
Let’s apply some classification methods to the same TripAdvisor data used in the post https://www.alldatascience.com/nlp/nlp-target-and-aspect-detection-with-python. In this case we read and preprocess the data again, then vectorize it in different ways: 1. With a TF-IDF vectorizer, which creates vectors that take into account the frequency of words in a document and the frequency of words across all documents, decreasing the weight of words that appear too often (they can be…
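As a rough illustration of the TF-IDF weighting mentioned in this excerpt, here is a minimal sketch in plain Python. It uses the textbook formula (term frequency times the log of inverse document frequency), which is not necessarily the variant used in the post: library vectorizers typically add smoothing and normalization. The function name `tf_idf` and the sample reviews are made up for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document (textbook TF-IDF)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # Term frequency in this document times log inverse document frequency.
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical reviews, not taken from the TripAdvisor dataset.
reviews = [
    "the room was clean",
    "the staff was friendly",
    "the breakfast was cold",
]
weights = tf_idf(reviews)
```

Words that occur in every review ("the", "was") get a weight of zero here, while words specific to one review keep a positive weight, which is the down-weighting effect the excerpt describes.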
Aggregation methods in R.
In this post we use three clustering methods (k-means, hierarchical clustering and model-based clustering) and evaluate their accuracy. We see how to select the optimal number of clusters for each method and obtain metrics to choose the best of them.
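One common way to pick the number of clusters for k-means is to compare the within-cluster sum of squares (inertia) across several values of k and look for the point where it stops dropping sharply. The sketch below shows that idea in plain Python rather than R. It is not the code from the post: the toy points, the deterministic farthest-point initialisation and all function names are assumptions made for illustration.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def init_centers(points, k):
    # Deterministic farthest-point initialisation (an assumption of this sketch).
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centers))
        )
    return centers


def kmeans(points, k, iterations=100):
    """Lloyd's algorithm; returns the final centers and the inertia."""
    centers = init_centers(points, k)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        new_centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    inertia = sum(min(squared_distance(p, c) for c in centers) for p in points)
    return centers, inertia


# Toy data with three visually separated groups (made up for this example).
points = [
    (1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
    (5.0, 5.0), (5.1, 4.9), (4.9, 5.2),
    (9.0, 1.0), (9.1, 0.9), (8.8, 1.2),
]
inertias = {k: kmeans(points, k)[1] for k in range(1, 5)}
for k, value in inertias.items():
    print(k, round(value, 3))
```

On this toy data the inertia falls steeply up to k = 3 and changes little afterwards, which is the "elbow" one would read as the suggested number of clusters.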
Data Mining in R
This post describes an analysis performed on an online news dataset. Data cleaning, data transformation, and dimensionality reduction are performed. Next, we try some supervised and unsupervised models, such as decision trees, clustering and logistic models, to check their accuracy in predicting the popularity of the news.
NLP: Sentiment Analysis with Pytorch.
In this work we build a sentiment analysis model based on a BERT-GRU architecture on TripAdvisor data, in order to predict whether an opinion is positive or negative. BERT (Bidirectional Encoder Representations from Transformers) is a pretrained transformer-based model that takes into account the context of the words. A GRU layer is used instead of an LSTM in this case.
NLP: Target and aspect detection with Python.
In this post we perform target and aspect detection on a dataset of TripAdvisor opinions. Targets or topics are the words or subjects the opinions are about. Aspects are parts or features of the target. Here we explore target detection using word embeddings (Word2Vec), which extract similar words by context, and try to extract aspects of the target by searching for close words using the WordNet synsets. First, we perform data preprocessing by removing stopwords
Wine dataset analysis with Python
In this post we explore the wine dataset. First, we perform descriptive and exploratory data analysis. Next, we run dimensionality reduction with the PCA and t-SNE algorithms in order to check how they work. Finally, a random forest classifier is implemented, comparing different parameter values to see how they affect the classifier's results.
Analysis of Variance (ANOVA) with R
In this post we perform an analysis of variance (ANOVA) with R in order to analyze the influence of different variables, such as race, education level or job class, on wage. The data is the same as in the post Descriptive Analysis with R, so you can visit that post for more detail about the data used. Let’s start the analysis. Discussion: By means of ANOVA we have
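For readers unfamiliar with what a one-way ANOVA computes, the following is a minimal sketch of the F statistic in plain Python (the post itself uses R). The wage figures and group labels are invented for illustration and are not taken from the dataset used in the post; the function name `one_way_anova_f` is likewise hypothetical.

```python
def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA for a list of numeric groups."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical wages (in thousands) for three education levels.
wages_by_education = [
    [81, 82, 83],
    [82, 83, 84],
    [85, 86, 87],
]
f_statistic = one_way_anova_f(wages_by_education)
```

A large F value means the group means differ by more than the spread inside each group would suggest; turning it into a p-value requires the F distribution with the two degrees of freedom computed above.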
Linear and logistic regression with R
This post applies linear and logistic regression to data containing health parameters of 2353 patients who underwent surgery. We try to discover the relationship among some of the parameters and predict the probability of suffering an infection during the surgery. Discussion: According to the results obtained, we can see that, when studied separately, all the variables have an influence on the probability of suffering a post-surgical infection (diabetes, malnutrition,
Inferential analysis in R
This post continues the analysis of the Mid-Atlantic Wage dataset by performing some inferential statistics in R. The data used is the same as in the post about Descriptive Analysis with R. We start by reading the dataset and performing the transformations described in that article: