Data Analytics Training Courses — SigmaZone

Introduction to R

1 or 2 Days

R and Python are the dominant programming languages in data science. This class is aimed at the non-programmer who has a statistics background. You will learn how to install and configure R, read data into R, use R packages, and write, debug, and profile R code. Statistical topics serve as the working examples as you get to know R. This class can be taught in a two-day format for classes without a programming background, or in one day for classes with a programming background.

Course Outline
- Introduction to R
- Control Structures (If, For, While)
- R Functions
- Scoping Rules
- Dates and Times
- Loop Functions
- Debugging
- Profiling
- Programming Exercise

Data Munging - Getting and Preparing Data

1 or 2 Days

While most people associate data science with model building, much of an analyst's time is spent getting the data and preparing it for analysis. This class covers the basics of getting data from the internet, from files in various formats, and from databases. It also covers the basics of cleaning the data in preparation for statistical analysis. This class can be taught in a two-day format for classes without a programming background, or in one day for classes with a programming background.

Course Outline
- Reading Data From Files (CSV, Tab, Excel, XML, JSON)
- Obtaining Data from Databases (MySQL, SQL Server, AWS, Azure)
- Organizing Data using dplyr
- Date Manipulation
- Exercise

Exploratory Data Analysis - Graphing and Summarizing

1 or 2 Days

Exploratory data analysis falls between data munging and model building. In this class we cover the different plotting systems in R along with summarization techniques. This class can be taught in a two-day format for classes without a programming background, or one day for classes with a programming background.

Course Outline
- Plotting Systems in R
- Base Plotting System
- Graphics Devices
- Lattice Plotting System
- ggplot2
- Summarizing Data
- Hierarchical Clustering
- K-Means Clustering

Basic Statistical Analysis in R

1 or 2 Days

This class focuses on inferential statistics in R. Once the data has been munged, it is ready for basic statistical analysis such as hypothesis testing. If the class has a background in both statistics and programming, this class can be taught in one day. Allow an additional half day for those without a programming background and another half day for those without a statistics background.

Course Outline
- Distributions in R
- Confidence Intervals
- Hypothesis Testing
- Power and Sample Size
- Introduction to Bootstrapping

Regression Modelling in R

1 or 2 Days

Regression models are typically the first step into what statistics calls 'models' and data science calls 'classifiers'. Despite the media attention paid to more complex methods such as Deep Learning, regression models are more parsimonious (think Occam's razor for models) and often provide excellent predictive capability. In our R sequence, this is the course where the focus shifts from programming to statistics and analytics. Allow an additional day for those without a statistics background.

Course Outline
- Univariate Least Squares Regression
- Coding in R
- Residual Analysis
- Prediction
- Multivariate Regression
- Multivariate Residuals and Diagnostics
- Logistic Regression
- Introduction to Poisson Regression

Machine Learning in R

1 Day

Machine learning is a statistical technique that gives computer software the ability to improve its performance on a task (to 'learn') from data. In this course, we cover the basics of machine learning, including training and test datasets, overfitting, underfitting, and error rates. The models (classifiers) used include regression, classification trees, and random forests.

Course Outline
- Prediction, Cross Validation, and ROC Curves
- Using R's caret Package
- Predicting with Trees
- Introduction to Random Forest

Creating Data Products

1 Day

Data products bridge the gap between the person who created the analysis and the person who needs to consume the information. The ability to present your findings in a way that the receiver easily understands is key to being a good data scientist. This course adds Shiny, GoogleVis, Plotly, R Markdown, and Leaflet to your toolbox for presenting your results.

Example: Hypergeometric Sample Size Calculator

Course Outline
- Introduction to Data Products
- Shiny
- GoogleVis
- Plotly
- R Markdown
- Leaflet
