R – Data Science Portfolio

February 12, 2021 Clustering

Aggregation methods in R.

In this post we compare the accuracy of three clustering methods: k-means, hierarchical clustering and model-based clustering. We see how to select the optimal number of clusters for each method and obtain metrics to choose the best of them.

Continue reading

February 10, 2021 Clustering

Data Mining in R

This post describes an analysis performed on an online news dataset. Data cleaning, data transformation, and dimensionality reduction are performed. Next, we try some supervised and unsupervised models, such as decision trees, clustering and logistic models, to check how accurately they predict the popularity of the news.

Continue reading

January 20, 2021 statistics

Analysis of Variance (ANOVA) with R

In this post we perform an analysis of variance (ANOVA) with R to analyze the influence of variables such as race, education level or job class on wage. The data is the same as in the post Descriptive Analysis with R, so you can visit that post for more detail about the data used. Let’s start the analysis. Discussion By means of ANOVA we have

Continue reading

January 18, 2021 statistics

Linear and logistic regression with R

This post applies linear and logistic regression to a dataset of health parameters for 2353 patients who underwent surgery. We try to discover the relation among some of the parameters and predict the probability of suffering an infection during the surgery. Discussion According to the results obtained, we can see that when studied separately, all the variables have an influence on the probability of suffering a post-surgical infection (diabetes, malnutrition,

Continue reading

January 16, 2021 statistics

Inferential analysis in R

This post continues the analysis of the Mid-Atlantic Wage dataset by performing some inferential statistics in R. The data used is the same as in the post about Descriptive Analysis with R. We start by reading the dataset and performing the transformations described in that article:

Continue reading

January 15, 2021 statistics

Descriptive analysis in R

This post presents a simple descriptive statistical analysis of the Mid-Atlantic Wage Data, with some boxplots and a check for data normality. The dataset can be found here: https://github.com/selva86/datasets/blob/master/Wage.csv The fields in the data are the following: year: Year when the data was collected. maritl: Marital status: 1. Never Married, 2. Married, 3. Widowed, 4. Divorced, and 5. Separated. age: Worker’s age. race: 1. White, 2. Black, 3. Asian, and 4. Other. education: Education level:

Continue reading