DATA SCIENCE

In 2013, research by the McKinsey Global Institute (MGI) and McKinsey’s Business Technology Office predicted a shortage of the talent organizations need to take advantage of big data. By 2018, the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills, as well as 1.5 million managers and analysts with the know-how to use the analysis of big data to make effective decisions.

That prediction has come true, and the shortage of skilled data science professionals is expected to keep growing in the years to come. So, as future data scientists, you have a great future ahead!

Introduction to the Data Science in Python Bootcamp

- What is Data Science?
- Introduction to the Python Data Science Tool
- For Mac Users
- Introduction to the Python Data Science Environment
- Some Miscellaneous IPython Usage Facts
- Online IPython Interpreter

Introduction to Python Pre-Requisites for Data Science

- Rationale Behind This Section
- Different Types of Data Used in Statistical & ML Analysis
- Different Types of Data Used Programmatically
- Python Data Science Packages To Be Used

Introduction to Numpy

- Numpy: Introduction
- Create Numpy Arrays
- Numpy Operations
- Matrix Arithmetic and Linear Systems
- Numpy for Basic Vector Arithmetic
- Numpy for Basic Matrix Arithmetic
- Broadcasting with Numpy
- Solve Equations with Numpy
- Numpy for Statistical Operation
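
As a rough taste of the topics listed above, the following sketch touches on array creation, broadcasting, solving a linear system, and a basic statistical operation. The arrays and variable names are made up for illustration and are not taken from the course material.

```python
import numpy as np

# Create arrays (illustrative values only).
matrix = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])
row = np.array([10.0, 20.0, 30.0])

# Broadcasting: the 1-D row is added to every row of the 2-D matrix.
shifted = matrix + row  # shape (2, 3)

# Solve the linear system A @ x = b, i.e.
#   2x +  y = 5
#    x + 3y = 10
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([5.0, 10.0])
solution = np.linalg.solve(A, b)

# Basic statistics: mean of each column.
column_means = matrix.mean(axis=0)

print(shifted)
print(solution)
print(column_means)
```

Here `np.linalg.solve` is used instead of explicitly inverting `A`, which is generally the more numerically stable choice for a square, non-singular system.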

Introduction to Pandas

- Data Structures in Python
- Read in Data
- Read in CSV Data Using Pandas
- Read in Excel Data Using Pandas
- Reading in JSON Data
- Read in HTML Data

Data Pre-Processing/Wrangling

- Rationale Behind This Section
- Removing NAs/No Values From Our Data
- Basic Data Handling: Starting with Conditional Data Selection
- Drop Column/Row
- Subset and Index Data
- Basic Data Grouping Based on Qualitative Attributes
- Crosstabulation
- Reshaping
- Pivoting
- Rank and Sort Data
- Concatenate
- Merging and Joining Data Frames
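
To give a sense of what these wrangling steps can look like in pandas, here is a minimal sketch covering NA removal, conditional selection, grouping, and merging. The two data frames and their column names are hypothetical and exist only for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, invented for illustration only.
sales = pd.DataFrame({
    "store": ["A", "A", "B", "B", "C"],
    "units": [10.0, np.nan, 7.0, 3.0, 5.0],
})
regions = pd.DataFrame({
    "store": ["A", "B", "C"],
    "region": ["North", "South", "North"],
})

# Remove rows with a missing value in the "units" column.
clean = sales.dropna(subset=["units"])

# Conditional selection: keep rows with at least 5 units.
large = clean[clean["units"] >= 5]

# Group by a qualitative attribute and summarize.
units_per_store = clean.groupby("store")["units"].sum()

# Merge with a second data frame on a shared key, then group again.
merged = clean.merge(regions, on="store", how="left")
units_per_region = merged.groupby("region")["units"].sum()

print(units_per_store)
print(units_per_region)
```

A left merge keeps every row of `clean` even if a store had no matching region; whether that is the right join type depends on the data at hand.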

Introduction to Data Visualizations

- What is Data Visualization?
- Some Theoretical Principles Behind Data Visualization
- Histograms-Visualize the Distribution of Continuous Numerical Variables
- Boxplots-Visualize the Distribution of Continuous Numerical Variables
- Scatter Plot-Visualize the Relationship Between 2 Continuous Variables
- Barplot
- Pie Chart
- Line Chart

Statistical Data Analysis-Basic

- What is Statistical Data Analysis?
- Some Pointers on Collecting Data for Statistical Studies
- Some Pointers on Exploring Quantitative Data
- Explore the Quantitative Data: Descriptive Statistics
- Grouping & Summarizing Data by Categories
- Visualize Descriptive Statistics-Boxplots
- Common Terms Relating to Descriptive Statistics
- Data Distribution-Normal Distribution
- Standard Normal Distribution and Z-scores
- Confidence Interval-Theory
- Confidence Interval-Calculation
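
The z-score and confidence interval topics above can be sketched with the Python standard library alone. The sample values below are invented for illustration, and the interval uses the normal approximation; for a sample this small, a t-based interval would usually be the more appropriate choice.

```python
import math
from statistics import NormalDist, mean, stdev

# Hypothetical sample measurements, invented for illustration only.
sample = [4.8, 5.1, 5.0, 4.9, 5.3, 5.2, 4.7, 5.0]

sample_mean = mean(sample)
sample_sd = stdev(sample)  # sample standard deviation (n - 1 denominator)
standard_error = sample_sd / math.sqrt(len(sample))

# Two-sided 95% critical value from the standard normal distribution.
z_critical = NormalDist().inv_cdf(0.975)

# Approximate 95% confidence interval for the mean.
lower = sample_mean - z_critical * standard_error
upper = sample_mean + z_critical * standard_error

# Z-score of the first observation relative to the sample.
z_score = (sample[0] - sample_mean) / sample_sd

print(f"mean={sample_mean:.3f}, CI=({lower:.3f}, {upper:.3f}), z={z_score:.3f}")
```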

Statistical Inference & Relationship Between Variables

- What is Hypothesis Testing?
- Test the Difference Between Two Groups
- Test the Difference Between More Than Two Groups
- Explore the Relationship Between Two Quantitative Variables
- Correlation Analysis
- Linear Regression-Theory
- Linear Regression-Implementation in Python
- Conditions of Linear Regression
- Conditions of Linear Regression-Check in Python
- Polynomial Regression
- GLM: Generalized Linear Model
- Logistic Regression

Machine Learning for Data Science

- How is Machine Learning Different from Statistical Data Analysis?
- What is Machine Learning (ML) About? Some Theoretical Pointers

Unsupervised Learning in Python

- Unsupervised Classification-Some Basic Ideas
- KMeans-Theory
- KMeans-Implementation on the Iris Data
- Quantifying KMeans Clustering Performance
- KMeans Clustering with Real Data
- How Do We Select the Number of Clusters?
- Hierarchical Clustering-Theory
- Hierarchical Clustering-Practical
- Principal Component Analysis (PCA)-Theory
- Principal Component Analysis (PCA)-Practical Implementation

Supervised Learning

- What is This Section About?
- Data Preparation for Supervised Learning
- Pointers on Evaluating the Accuracy of Classification and Regression Modelling
- Using Logistic Regression as a Classification Model
- RF-Classification
- RF-Regression
- SVM-Linear Classification
- SVM-Non-Linear Classification
- Support Vector Regression
- KNN-Classification
- KNN-Regression
- Gradient Boosting-Classification
- Gradient Boosting-Regression
- Voting Classifier

Artificial Neural Networks (ANN) and Deep Learning (DL)

- Theory Behind ANN and DNN
- Perceptrons for Binary Classification
- Getting Started with ANN-Binary Classification
- Multi-Label Classification with MLP
- Regression with MLP
- MLP with PCA on a Large Dataset
- Start With Deep Neural Network (DNN)
- Start with H2O
- Default H2O Deep Learning Algorithm
- Specify the Activation Function
- H2O Deep Learning For Predictions

Miscellaneous Lectures & Information

- Data For This Section
- Read in Data from Online CSV
- Read Data from a Database
- Naive Bayes Classification
- Data Imputation

- Online, Classroom and On-Premise
- Knowledge of core Java is a must
- Suitable for students and programmers

- Normal: One Month (online and classroom)
- Accelerated: One Week (online and classroom)
- On Premise: 3-5 days (accelerated)