Forecasting Time Series with Auto-ARIMA

In this article, I compare the results of the auto-ARIMA function with those of the ARIMA model developed in the article Forecasting Time Series with ARIMA (https://www.alldatascience.com/time-series/forecasting-time-series-with-arima/), to see how it works and how the two differ. The parameters selected by auto-ARIMA are slightly different from the ones I selected in the other article. Auto-ARIMA has the advantage of attempting to find the best ARIMA parameters by comparing the
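
Auto-ARIMA tools automate the order search: they fit a set of candidate models and keep the one with the best value of an information criterion such as AIC or AICc (lower is better). The sketch below illustrates that idea under simplifying assumptions: the differencing order d is fixed by hand rather than chosen with a unit-root test, only pure AR(p) terms are searched (no MA or seasonal terms), and the series is synthetic. All function names are hypothetical; this is not the API of any auto-ARIMA library.

```python
import math
import random


def difference(series, d=1):
    """Apply first-order differencing d times (the "I" in ARIMA)."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [val] for row, val in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ar_aic(y, p, start):
    """Fit AR(p) with an intercept by least squares on y[start:] and return its AIC."""
    rows = [[1.0] + [y[t - i] for i in range(1, p + 1)] for t in range(start, len(y))]
    target = y[start:]
    k = p + 1  # intercept + p AR coefficients
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(k)]
    beta = solve(xtx, xty)
    rss = sum((t - sum(bc * v for bc, v in zip(beta, r))) ** 2 for r, t in zip(rows, target))
    n = len(target)
    # Gaussian AIC up to a constant; +1 parameter for the noise variance.
    return n * math.log(rss / n) + 2 * (k + 1)


def select_ar_order(series, d=1, max_p=5):
    """Difference d times, score AR(0)..AR(max_p), and keep the lowest AIC."""
    y = difference(series, d)
    # Score every candidate on the same sample (from index max_p) so AICs are comparable.
    scores = {p: ar_aic(y, p, start=max_p) for p in range(max_p + 1)}
    return min(scores, key=scores.get), scores


# Hypothetical data: a random walk whose steps follow an AR(2) process.
rng = random.Random(42)
steps = [0.0, 0.0]
for _ in range(500):
    steps.append(0.6 * steps[-1] - 0.3 * steps[-2] + rng.gauss(0, 1))
series = [0.0]
for s in steps:
    series.append(series[-1] + s)

best_p, scores = select_ar_order(series, d=1, max_p=5)
print("chosen AR order:", best_p)
```

Because the true step process is AR(2), orders below 2 leave clearly larger residuals, so the search should settle on p = 2 or a slightly larger order; a real auto-ARIMA run would also weigh MA terms before choosing.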

Forecasting Time Series with ARIMA

In this post, I’ll show how to forecast time series data using ARIMA (autoregressive integrated moving average). As usual, I practice with “real-world” data, which can be obtained easily by downloading open data from government websites. I chose the unemployment rate in the 27 member countries of the European Union. The data were obtained from the OECD data portal (http://dataportal.oecd.org/). First of all, I’m going to try to clean up the data, in this
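
The “integrated” part of ARIMA means the model is fitted to a differenced series and its forecasts are added back onto the last observed level. A minimal sketch of that round trip uses the simplest such model, ARIMA(0,1,0) with drift: a random walk whose forecast step is the mean historical change. A full ARIMA model would also fit AR and MA terms to the differenced series. The rates below are hypothetical, not the OECD figures, and the function names are illustrative only.

```python
def difference(series):
    """First-order differencing: the "I" (integrated) step of ARIMA."""
    return [b - a for a, b in zip(series, series[1:])]


def integrate(last_value, diffs):
    """Undo differencing by cumulatively adding forecast changes to the last level."""
    levels, level = [], last_value
    for step in diffs:
        level += step
        levels.append(level)
    return levels


def forecast_rw_drift(series, horizon):
    """ARIMA(0,1,0) with drift: forecast each change as the mean historical change."""
    diffs = difference(series)
    drift = sum(diffs) / len(diffs)
    return integrate(series[-1], [drift] * horizon)


# Hypothetical unemployment-rate values (%), not OECD data.
rates = [7.0, 7.2, 7.1, 7.4, 7.6]
print(forecast_rw_drift(rates, horizon=3))  # approximately [7.75, 7.9, 8.05]
```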

Dealing with a real-world imbalanced dataset.

Predicting NO2 levels in Madrid

While looking for data to develop my data science skills, I came up with the idea of searching open data portals, because I wanted to work with actual datasets and find out what they were like. For this purpose, I chose open data from the Madrid Open Data Portal (https://datos.madrid.es/portal/site/egob). I will try to predict NO2 concentration using weather and traffic data. This is not meant to be a definitive prediction
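
The excerpt does not say how the post handles the class imbalance. As one common illustration, not necessarily the post's method, classes can be weighted inversely to their frequency so that each class contributes equally to the training loss. The formula below matches the heuristic scikit-learn uses for `class_weight="balanced"`, n_samples / (n_classes × class_count); the “high”/“low” NO2 labels and their counts are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical labels: 90 hours with "low" NO2, 10 hours with "high" NO2.
labels = ["low"] * 90 + ["high"] * 10
weights = balanced_class_weights(labels)
print(weights)  # the rare "high" class gets 5.0, "low" gets about 0.556
```

After weighting, each class carries the same total weight (90 × 0.556 ≈ 50 and 10 × 5.0 = 50), so a classifier can no longer minimize its loss simply by always predicting the majority class.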

Deep Learning: COVID-19 detection in X-Ray with CNN

In this project, we develop a deep learning detector of COVID-19 in radiographs. For this purpose, we use images from the “Covid-chestxray-dataset” [3], generated by researchers from the Mila research group and the University of Montreal [4]. We also use radiographs of healthy patients and of patients with bacterial pneumonia, extracted from Kaggle’s “Chest X-Ray Images (Pneumonia)” competition [5]. In total, we have 426 images, divided into training (339 images), validation (42 images)
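
A CNN is built from convolutional layers that slide small learned kernels over the image. As a minimal, library-free sketch of that core operation (not the post's network), the code below applies one hand-picked 2×2 edge-detecting kernel to a tiny hypothetical 4×4 “image”, followed by ReLU and 2×2 max pooling. Like the convolution layers in common deep learning libraries, it computes a cross-correlation (the kernel is not flipped), here with no padding and stride 1.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
         for j in range(out_w)]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Zero out negative activations."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool_2x2(feature_map):
    """Non-overlapping 2x2 max pooling; an odd trailing row/column is dropped."""
    return [
        [max(feature_map[i][j], feature_map[i][j + 1],
             feature_map[i + 1][j], feature_map[i + 1][j + 1])
         for j in range(0, len(feature_map[0]) - 1, 2)]
        for i in range(0, len(feature_map) - 1, 2)
    ]


# Hypothetical 4x4 grayscale patch with a vertical dark-to-bright edge.
image = [[0, 0, 1, 1]] * 4
vertical_edge = [[-1, 1], [-1, 1]]
features = relu(conv2d_valid(image, vertical_edge))
print(features)                # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
print(max_pool_2x2(features))  # [[2]]
```

In a trained CNN the kernel values are learned rather than hand-picked, and many kernels run in parallel to produce several feature maps per layer.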

NLP: Opinion classification

Let’s apply several classification methods to the same TripAdvisor data as in the post https://www.alldatascience.com/nlp/nlp-target-and-aspect-detection-with-python. In this case, we are going to read and preprocess the data again and then vectorize it in different ways: 1. With a TF-IDF vectorizer, which creates vectors taking into account the frequency of words in a document and their frequency across all documents, decreasing the weight of words that appear too often (they can be
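
The TF-IDF weighting described above can be sketched in a few lines: the term frequency within a document times the log inverse document frequency, ln(N / df), so a word that occurs in every document gets weight zero. This is the textbook form; scikit-learn’s `TfidfVectorizer`, for example, applies a smoothed idf and L2-normalizes each vector by default, so its numbers differ. The three reviews are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document, using tf * ln(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


reviews = ["great hotel great staff", "dirty hotel", "great food hotel"]
for w in tf_idf(reviews):
    print({term: round(v, 3) for term, v in w.items()})
# first line: {'great': 0.203, 'hotel': 0.0, 'staff': 0.275}
```

In the first review, “staff” outweighs “great” even though “great” occurs twice, because “great” also appears in another review; “hotel”, present in every review, gets zero weight.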

Data Mining in R

This post describes an analysis performed on an online news dataset. Data cleaning, data transformation, and dimensionality reduction are performed. Next, we try several supervised and unsupervised models, such as decision trees, clustering and logistic models, and check their accuracy in predicting the popularity of the news.

NLP: Sentiment Analysis with PyTorch.

In this work, we build a sentiment analysis model based on BERT-GRU on TripAdvisor data, in order to predict whether an opinion is positive or negative. BERT (Bidirectional Encoder Representations from Transformers) is a pretrained transformer-based model that takes into account the context of words. In this case, a GRU layer is used instead of an LSTM.
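
A GRU keeps a hidden state that is updated token by token through two gates: a reset gate r, which controls how much of the previous state feeds the candidate state, and an update gate z, which blends the previous state with that candidate. An LSTM, by contrast, uses three gates plus a separate cell state, so a GRU has fewer parameters. The minimal sketch below implements one GRU step with plain Python lists, following the gate equations in the PyTorch `torch.nn.GRU` documentation. The weights are random and the input vectors are hypothetical stand-ins for BERT token embeddings; this is not the post's trained model.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def add(*vecs):
    return [sum(vals) for vals in zip(*vecs)]


def gru_step(x, h, p):
    """One GRU step: reset gate r, update gate z, candidate n, new hidden state."""
    r = [sigmoid(v) for v in add(matvec(p["W_ir"], x), matvec(p["W_hr"], h), p["b_r"])]
    z = [sigmoid(v) for v in add(matvec(p["W_iz"], x), matvec(p["W_hz"], h), p["b_z"])]
    hn = add(matvec(p["W_hn"], h), p["b_hn"])
    # The reset gate scales the hidden-state contribution to the candidate.
    n = [math.tanh(a + ri * b) for a, ri, b in zip(add(matvec(p["W_in"], x), p["b_in"]), r, hn)]
    # The update gate blends the candidate with the previous hidden state.
    return [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]


def random_params(input_size, hidden_size, seed=0):
    """Hypothetical random weights; the r and z biases are merged into one vector each."""
    rng = random.Random(seed)
    mat = lambda rows, cols: [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    vec = lambda size: [rng.uniform(-0.5, 0.5) for _ in range(size)]
    p = {}
    for g in ("r", "z", "n"):
        p[f"W_i{g}"] = mat(hidden_size, input_size)
        p[f"W_h{g}"] = mat(hidden_size, hidden_size)
    p["b_r"], p["b_z"] = vec(hidden_size), vec(hidden_size)
    p["b_in"], p["b_hn"] = vec(hidden_size), vec(hidden_size)
    return p


# Hypothetical stand-ins for BERT token embeddings (input_size=4).
sequence = [[0.1, -0.2, 0.3, 0.0], [0.5, 0.1, -0.4, 0.2], [-0.3, 0.2, 0.1, 0.4]]
params = random_params(input_size=4, hidden_size=3)
h = [0.0, 0.0, 0.0]
for x in sequence:
    h = gru_step(x, h, params)
print(h)  # final hidden state
```

A common BERT-GRU setup feeds the final hidden state to a linear layer with a sigmoid to score the review as positive or negative; in practice `torch.nn.GRU` runs these steps over whole batches.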

NLP: Target and aspect detection with Python.

In this post, we perform target and aspect detection on a dataset of TripAdvisor opinions. The target (or topic) is what an opinion is about; aspects are parts or features of the target. Here we explore target detection using word embeddings (Word2Vec), which capture word similarity from context, and try to extract aspects of the target by searching for semantically close words using WordNet synsets. First, we perform data preprocessing by removing stopwords
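
Word2Vec places words that occur in similar contexts close together in vector space, and “similar words” are usually ranked by the cosine similarity between their vectors (gensim’s `KeyedVectors.most_similar`, for instance, does this on trained embeddings). A minimal sketch of that ranking with hand-made, hypothetical 3-dimensional vectors rather than trained embeddings:

```python
import math


def cosine(u, v):
    """Cosine similarity: dot product divided by the product of vector lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def most_similar(word, vectors, topn=3):
    """Rank every other word by cosine similarity to `word`."""
    target = vectors[word]
    scores = [(other, cosine(target, vec)) for other, vec in vectors.items() if other != word]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:topn]


# Hypothetical 3-d embeddings; real Word2Vec vectors typically have 100-300 dimensions.
vectors = {
    "hotel": [0.9, 0.1, 0.0],
    "hostel": [0.8, 0.2, 0.1],
    "room": [0.6, 0.5, 0.1],
    "breakfast": [0.1, 0.9, 0.2],
    "pool": [0.3, 0.2, 0.9],
}
print(most_similar("hotel", vectors, topn=2))  # "hostel" first, then "room"
```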
