Projects

TMDb

TMDb Analysis

This project explores The Movie Database (TMDb). The analysis covers about 10,000 movies and looks at which genres were most popular from year to year, and at the relationship between a film's popularity and its vote average.
The project goes through data wrangling and exploratory data analysis.
It answers two research questions:
1. Which genres were most popular over the years?
2. Does the popularity of a movie correlate with its vote average?
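
The two research questions can be sketched in plain Python as follows. This is a minimal illustration, not the notebook's actual code: the sample rows, the field names (release_year, genres, popularity, vote_average), the pipe-separated genre format, and the choice of mean popularity as the measure of "most popular" are all assumptions.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical sample rows; field names and values are assumptions for illustration.
movies = [
    {"release_year": 2014, "genres": "Action|Adventure", "popularity": 8.0, "vote_average": 7.0},
    {"release_year": 2014, "genres": "Adventure|Drama", "popularity": 2.0, "vote_average": 6.0},
    {"release_year": 2015, "genres": "Drama|Thriller", "popularity": 6.0, "vote_average": 8.0},
    {"release_year": 2015, "genres": "Drama", "popularity": 8.0, "vote_average": 7.5},
    {"release_year": 2015, "genres": "Action", "popularity": 4.0, "vote_average": 5.0},
]


def top_genre_by_year(rows):
    """Return {year: genre with the highest mean popularity in that year}."""
    totals = defaultdict(lambda: [0.0, 0])  # (year, genre) -> [popularity sum, count]
    for row in rows:
        # A movie with several genres counts once for each of them.
        for genre in row["genres"].split("|"):
            entry = totals[(row["release_year"], genre)]
            entry[0] += row["popularity"]
            entry[1] += 1
    best = {}
    for (year, genre), (total, count) in totals.items():
        mean = total / count
        if year not in best or mean > best[year][1]:
            best[year] = (genre, mean)
    return {year: genre for year, (genre, _) in best.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


top_genres = top_genre_by_year(movies)
r = pearson([m["popularity"] for m in movies], [m["vote_average"] for m in movies])
print(top_genres, round(r, 3))
```

A coefficient near 0 would suggest little linear relationship between popularity and vote average, while a value near 1 or -1 would suggest a strong one.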

A/B Test

Website Conversion Optimization: A/B Test

This project is an analysis done in Python in a Jupyter notebook.
I work to understand the results of an A/B test run by an e-commerce website. My goal is to work through this notebook to help the company decide whether to implement the new page, keep the old page, or perhaps run the experiment longer before deciding.
There are several parts to this project:

I- Probability
II- A/B Test
III- A Regression Approach
IV- Results and Findings
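
One common way to frame the A/B test part is a one-sided two-proportion z-test on the conversion rates of the old and new pages. The sketch below is an illustration under that assumption, not necessarily the test used in the notebook; the function name and the conversion counts in the usage lines are made up.

```python
from math import erf, sqrt


def two_proportion_z_test(conv_old, n_old, conv_new, n_new):
    """One-sided z-test of H1: the new page converts better than the old page.

    Returns (z statistic, p-value). Uses the pooled conversion rate under
    H0 (both pages convert equally) for the standard error.
    """
    p_old = conv_old / n_old
    p_new = conv_new / n_new
    p_pool = (conv_old + conv_new) / (n_old + n_new)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_old + 1 / n_new))
    z = (p_new - p_old) / se
    # Standard normal CDF via the error function; p-value is P(Z >= z).
    p_value = 1 - 0.5 * (1 + erf(z / sqrt(2)))
    return z, p_value


# Hypothetical counts: 100 of 1000 users convert on the old page, 150 of 1000 on the new one.
z, p_value = two_proportion_z_test(100, 1000, 150, 1000)
print(round(z, 2), p_value < 0.05)
```

A large p-value would mean the data do not show the new page converting better, which supports keeping the old page or running the experiment longer.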

WeRateDogs

@WeRateDogs Twitter API: Sentiment Analysis

This project was part of the data wrangling section of the Udacity Data Analyst Nanodegree program and is primarily focused on wrangling data from the WeRateDogs Twitter account using Python, documented in a Jupyter Notebook (wrangle_act.ipynb). This Twitter account rates dogs with humorous commentary. The rating denominator is usually 10; the numerators, however, are usually greater than 10. This aspect was not cleaned, as it is part of the humor and popularity of WeRateDogs.

For this project, we only wanted original ratings (no retweets) that have images. Not all of the original tweets in the dataset are dog ratings, and some are retweets. Fully assessing and cleaning the entire dataset would require exceptional effort, so only a subset of its issues (eight quality issues and two tidiness issues at minimum) needed to be assessed and cleaned.
The tasks for this project were:

- Data wrangling, which consisted of:
  - Gathering data
  - Assessing data
  - Cleaning data
- Storing, analyzing, and visualizing the wrangled data
- Reporting on my data analyses and visualizations (act_report.pdf)
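
The "original ratings with images" rule described above can be sketched as a simple filter plus a rating parser. This is only an illustration: the record layout, the field names (retweeted_status_id, image_url) and the sample tweets are hypothetical, and the real notebook may identify retweets and images differently.

```python
import re

# Hypothetical tweet records; field names and contents are assumptions.
tweets = [
    {"tweet_id": 1, "retweeted_status_id": None,
     "image_url": "https://example.com/a.jpg", "text": "This is Bella. 13/10"},
    {"tweet_id": 2, "retweeted_status_id": 1,
     "image_url": "https://example.com/a.jpg", "text": "RT: This is Bella. 13/10"},
    {"tweet_id": 3, "retweeted_status_id": None,
     "image_url": None, "text": "No picture today. 11/10"},
]

RATING_PATTERN = re.compile(r"(\d+)/(\d+)")


def keep_original_with_image(records):
    """Keep tweets that are not retweets and that have an image."""
    return [
        r for r in records
        if r["retweeted_status_id"] is None and r["image_url"]
    ]


def extract_rating(text):
    """Return (numerator, denominator) of the first 'n/d' pattern, or None."""
    match = RATING_PATTERN.search(text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


kept = keep_original_with_image(tweets)
ratings = {t["tweet_id"]: extract_rating(t["text"]) for t in kept}
print(ratings)
```

Numerators above 10 are kept as parsed rather than capped, in line with the decision not to clean that aspect of the ratings.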

FordGoBikes

Ford GoBike Data Analysis

Dataset:
Ford GoBike data from 2018/01 through 2019/07. The dataset was downloaded from: https://www.fordgobike.com/system-data. The data covers bike ride start and end dates, station information and location, and the riders' member type, gender, and age.


Focus of Analysis:
- Ride duration in minutes (duration_min)
- User type (user_type)
- Gender (member_gender)
- Age (member_age)

Process:
1. Univariate Exploration
We will look at the bike ride trends in terms of:
- age groups (member_age_bins)
- genders (member_gender)
- weekdays (st_weekday, et_weekday)
- hours of the day (st_hour, et_hour)
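
The derived columns used in the univariate step (member_age_bins, st_weekday, st_hour) could be built roughly as shown below. This is a sketch under assumptions: the start_time field name, its timestamp format, the 10-year bin width, and the sample rides are hypothetical.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

# Hypothetical rides; field names and values are assumptions.
rides = [
    {"start_time": "2018-01-01 08:15:00", "member_age": 27},
    {"start_time": "2018-01-01 08:50:00", "member_age": 34},
    {"start_time": "2018-01-06 17:40:00", "member_age": 41},
]


def age_bin(age, width=10):
    """Label an age with its bin, e.g. 27 -> '20-29' (bin width is an assumption)."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def add_features(ride):
    """Return a copy of the ride with age bin, start weekday, and start hour added."""
    start = datetime.strptime(ride["start_time"], "%Y-%m-%d %H:%M:%S")
    return {
        **ride,
        "member_age_bins": age_bin(ride["member_age"]),
        "st_weekday": WEEKDAYS[start.weekday()],  # datetime.weekday(): Monday is 0
        "st_hour": start.hour,
    }


enriched = [add_features(r) for r in rides]
rides_per_hour = Counter(r["st_hour"] for r in enriched)
print(rides_per_hour)
```

The same counting step applies to the other univariate views, for example Counter over st_weekday or member_age_bins.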

2. Bivariate Exploration
We will look at the bike ride trends in terms of:
- user_type and member_age_bins
- user_type and duration_min
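
As one example of a bivariate comparison, the sketch below summarizes duration_min per user_type with the median. The sample rides, the user type labels, and the choice of the median as the summary statistic are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import median

# Hypothetical rides; labels and durations are assumptions.
rides = [
    {"user_type": "Subscriber", "duration_min": 8.0},
    {"user_type": "Subscriber", "duration_min": 10.0},
    {"user_type": "Subscriber", "duration_min": 12.0},
    {"user_type": "Customer", "duration_min": 20.0},
    {"user_type": "Customer", "duration_min": 30.0},
]


def median_duration_by_user_type(rows):
    """Group ride durations by user type and return the median of each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["user_type"]].append(row["duration_min"])
    return {user_type: median(values) for user_type, values in groups.items()}


medians = median_duration_by_user_type(rides)
print(medians)
```

The median is used here because ride durations tend to have a long right tail, which would pull a mean upward.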

3. Multivariate Exploration
a. Make new DataFrames for the different age groups of subscribers
b. Create 2 new columns:
- percentage column
- rank column
c. Create a pivot for our visual, which will display:
- st_hour
- weekday
- rank
d. Create our visuals: 4 heatmaps, one for each of the age groups created in step a.
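
Steps b and c can be sketched for a single age group as follows. This is an illustration only: the sample rides are made up, and the definitions used here (percentage as the share of the group's rides in each weekday and hour slot, rank 1 for the busiest slot, tied slots sharing a rank) are assumptions about how the columns are computed.

```python
from collections import Counter

# Hypothetical (weekday, st_hour) pairs for the rides of one age group.
rides = (
    [("Mon", 8)] * 3
    + [("Mon", 17)] * 2
    + [("Tue", 8)] * 4
    + [("Tue", 17)] * 1
)


def rank_pivot(pairs):
    """Return (percentage per slot, pivot of rank indexed as pivot[hour][weekday])."""
    counts = Counter(pairs)
    total = sum(counts.values())
    # Percentage column: share of this group's rides in each (weekday, hour) slot.
    percentage = {slot: 100 * count / total for slot, count in counts.items()}
    # Rank column: 1 is the busiest slot; slots with equal counts share a rank.
    distinct_counts = sorted(set(counts.values()), reverse=True)
    rank = {slot: distinct_counts.index(count) + 1 for slot, count in counts.items()}
    # Pivot: rows are hours, columns are weekdays, values are ranks.
    pivot = {}
    for (weekday, hour), slot_rank in rank.items():
        pivot.setdefault(hour, {})[weekday] = slot_rank
    return percentage, pivot


percentage, pivot = rank_pivot(rides)
print(pivot)
```

A table shaped like pivot, one per age group, is the kind of input a heatmap plotting function would take in step d.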