Data Science and AI
Data Science and AI Course Curriculum (Syllabus)
Course Introduction:
Welcome to our comprehensive Data Science and Artificial Intelligence (AI) course! This program is designed to provide you with a strong foundation in data science, machine learning, and AI techniques. You'll learn essential skills in Python programming, machine learning algorithms, data visualization, and more. Real-world projects and career guidance are integrated to give you hands-on experience and help you embark on a successful career in data science and AI.
Job Roles in Data Science and AI:
Upon completing this course, you'll be prepared for a diverse range of job roles in the data science and AI domain, including:
· Data Scientists
· Machine Learning Engineers
· AI Researchers
· Business Intelligence Analysts
· Data Engineers
· Python Developers
These roles encompass tasks such as data analysis, machine learning model development, AI research, business intelligence, data engineering, and Python development, providing you with a wide range of opportunities in the data-driven industry.
Module 1: Introduction to Data Science and AI
· Introduction to Data Science (DS) and Artificial Intelligence (AI)
· What is Data Science (DS)?
· What is Artificial Intelligence (AI)?
· What is Machine Learning (ML)?
· What is Deep Learning (DL)?
· Relationship among AI, ML, DL, and DS
· DS and AI Applications in Healthcare
· DS and AI Applications in the Entertainment Industry
· DS and AI Applications in IoT
· DS and AI Applications in Retail
· DS and AI Applications in Agriculture
· DS and AI Applications in Cyber Security
· Different Tools or Technologies involved in DS and AI
o SQL
o Python
o Maths and Statistics
o Jupyter Notebook
o Visual Studio Code
o Scikit-learn
o NumPy
o Pandas
o Matplotlib
o Seaborn
o Keras
o TensorFlow
o Cloud: Azure/AWS
· Different Job Roles in DS and AI
· Why is Data Science in such demand today?
· What’s the future of Data Science and AI?
Module 2: SQL for Data Science
· Introduction to SQL
· Importance of SQL in data science
· Understanding databases and their structure
· Installing and setting up SQL environments (e.g., MySQL, MS SQL Server)
· Basic SQL Operations
· Creating databases and tables
· Inserting, updating, and deleting data
· Retrieving data using SELECT statements
· Filtering and Sorting Data
· Using WHERE clause for data filtering
· Comparison operators
· Logical operators
· Sorting retrieved data using ORDER BY
· Aggregate Functions and Grouping
· Understanding aggregate functions (COUNT, SUM, AVG, MIN, MAX)
· Performing calculations on grouped data
· Using GROUP BY clause to aggregate data
· Working with Strings and Dates
· Manipulating strings with functions like CONCAT, SUBSTRING
· Handling date and time data using date functions
· Joining Tables
· Understanding the concept of table joins (Venn diagrams)
· Implementing INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL OUTER JOIN
· Handling NULL values in joins
· Subqueries and Derived Tables
· Understanding subqueries and their role in SQL
· Using subqueries for complex queries and aggregations
· Utilizing derived tables for intermediate results
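As a taste of the operations listed in this module, here is a minimal sketch of filtering, sorting, a LEFT JOIN with aggregation, and a subquery. It assumes SQLite through Python's built-in sqlite3 module rather than the MySQL or MS SQL Server setups named above, and the `customers` and `orders` tables and their contents are hypothetical.

```python
import sqlite3

# In-memory SQLite database; table names, columns, and rows are hypothetical.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)

cur.executemany("INSERT INTO customers VALUES (?, ?)",
                [(1, "Asha"), (2, "Ben"), (3, "Chen")])
cur.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(1, 1, 50.0), (2, 1, 30.0), (3, 2, 20.0)])

# Filtering and sorting: WHERE with a comparison operator, then ORDER BY.
large_orders = cur.execute(
    "SELECT id, amount FROM orders WHERE amount >= 30 ORDER BY amount DESC"
).fetchall()

# LEFT JOIN keeps customers without orders; for them COUNT(o.id) is 0 and
# SUM(o.amount) is NULL (None in Python).
totals = cur.execute(
    """
    SELECT c.name, COUNT(o.id) AS n_orders, SUM(o.amount) AS total
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id, c.name
    ORDER BY c.id
    """
).fetchall()

# Subquery: orders whose amount is above the average order amount.
above_avg = cur.execute(
    "SELECT id FROM orders "
    "WHERE amount > (SELECT AVG(amount) FROM orders) ORDER BY id"
).fetchall()

conn.close()
```

The customer with no orders still appears in `totals`, which is the NULL handling in joins that this module discusses.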
Module 3: Python Programming Language
· Introduction to Python
· Role of Python in data science
· Explanation of Python's simplicity and readability.
· Python Syntax
· Data types (integers, floats, strings, booleans)
· Variable declaration
· Variable assignment
· Variable naming conventions
· Lists in Python
· List creation, indexing, slicing, and modifying
· Sets in Python:
· Set operations (union, intersection, difference).
· Sets: data deduplication
· Sets: unique value extraction.
· List Comprehensions
· Control Structures:
o for loops
o while loops
· Conditional Statements (if and elif)
· Logical operators (and, or, not)
· Lambda Functions
· Map, Filter
· Create, Read, Write Files
· File Operations & Errors
· Introduction to Classes and Objects (OOPS)
· Classes & Objects
· Create Class & Methods
· Working with Objects
· The __init__() Method
· Modify Properties & Methods
· 'self' Parameter
· Delete Objects
· 'pass' Statements
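To give a feel for several of the topics above, here is a small sketch that touches sets, list comprehensions, lambda with map and filter, and a class with an `__init__()` method. All names and values (`raw_scores`, `Student`, and so on) are hypothetical and chosen only for illustration.

```python
# All names and values below are hypothetical.
raw_scores = [72, 85, 85, 90, 72, 64]

# Sets remove duplicates; sorted() returns the unique values in order.
unique_scores = sorted(set(raw_scores))

# Set operations: union, intersection, difference.
a, b = {1, 2, 3}, {3, 4}
union, common, only_a = a | b, a & b, a - b

# List comprehension with a condition.
passed = [s for s in raw_scores if s >= 70]

# The same filter written with filter() and a lambda, then map() to rescale.
passed_alt = list(filter(lambda s: s >= 70, raw_scores))
as_fraction = list(map(lambda s: s / 100, unique_scores))


class Student:
    """Minimal class showing __init__, self, and a method."""

    def __init__(self, name, scores):
        # self refers to the instance being created.
        self.name = name
        self.scores = list(scores)

    def average(self):
        return sum(self.scores) / len(self.scores)


student = Student("Asha", [80, 90])
student.scores.append(100)  # modify a property of the object
avg = student.average()
```

The list comprehension and the `filter()` call produce the same list; which one to use is mostly a matter of readability.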
Module 4: Python for Data Analytics and Data Science
· Real-world data science scenarios using Python.
· NumPy
· Pandas
· Matplotlib
· Types of Data: Structured, Unstructured, Semi-Structured
· Structured (tabular)
· Unstructured (text, images)
· Semi-structured (XML, JSON)
· NumPy
· Introduction to Array
· Creation and Printing of an array
· Basic Operations in NumPy
· Indexing
· NumPy: where, count, and arg functions (such as argmax and argmin)
· Pandas
· What is a Pandas DataFrame?
· Tabular data structure with rows and columns
· Series, Index
· read_csv, head, tail
· shape, columns
· iloc, loc, drop
· GroupBy: Grouping and aggregation operations.
· Reshaping: Dataframe manipulation
· Plotting: Data visualization tools.
· Missing Data: Handling missing values.
· Merge and Join: Combining DataFrames.
· Matplotlib
· Figure: Top-level container.
· Axes: Individual plots.
· Line Plot: Connects data points.
· Scatter Plot: Displays individual points.
· Bar Plot: Uses rectangular bars.
· Histogram: Shows data distribution.
· Pie Chart: Displays composition.
· Annotations: Adds text/arrows.
· Subplots: Divides figure.
· Styles: Customizes appearance.
· Case Study on Exploratory Data Analysis (EDA) and Visualizations
· What is EDA?
· Univariate Analysis
· Bivariate Analysis
· More Seaborn-based plotting, including pair plots, cat plots, heat maps, and count plots, alongside Matplotlib plots.
Module 5: Stats and Maths
· Statistics in Data Science:
o What is Statistics?
o Role in Data Science
o Population vs. Sample
o Parameter vs. Statistic
o Types of Variables
· Data Gathering Techniques
o Collecting Data
o Sampling Techniques: Convenience, Simple Random, Stratified, Systematic, and Cluster Sampling
· Descriptive Statistics
o Univariate and Bivariate Analysis
o Central Tendencies
o Measures of Dispersion
o Skewness and Kurtosis
o Box Plots and Outliers
o Covariance and Correlation
· Probability Distribution
o Basics of Probability
o Discrete Distributions: Bernoulli, Binomial, Poisson
o Continuous Distributions: Normal, Standard Normal
· Inferential Statistics
o Central Limit Theorem
o Confidence Intervals, p-value
o Hypothesis Testing: Z-test, t-test, Chi-Square Test
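To illustrate how the inferential statistics topics above fit together, here is a minimal sketch of a one-sample Z-test with a two-sided p-value and a 95% confidence interval, using only Python's standard statistics module. The sample values, the null-hypothesis mean `mu0`, and the assumed known population standard deviation `sigma` are all hypothetical.

```python
from statistics import NormalDist, mean, stdev

# Hypothetical sample of 10 measurements. The null-hypothesis mean (mu0) and
# the "known" population standard deviation (sigma) are assumed values.
sample = [5.1, 4.9, 5.3, 5.0, 5.2, 5.4, 4.8, 5.1, 5.0, 5.2]
mu0 = 5.0
sigma = 0.2

n = len(sample)
x_bar = mean(sample)   # sample mean (a statistic estimating the parameter)
s = stdev(sample)      # sample standard deviation (n - 1 denominator)

# One-sample Z-test: z = (x_bar - mu0) / (sigma / sqrt(n)).
standard_error = sigma / n ** 0.5
z = (x_bar - mu0) / standard_error

# Two-sided p-value from the standard normal distribution.
std_normal = NormalDist()  # mean 0, standard deviation 1
p_value = 2 * (1 - std_normal.cdf(abs(z)))

# 95% confidence interval for the mean, using the known sigma.
z_crit = std_normal.inv_cdf(0.975)  # about 1.96
ci = (x_bar - z_crit * standard_error, x_bar + z_crit * standard_error)
```

When the population standard deviation is unknown and the sample is small, a t-test based on `s` is the usual alternative; that variant is not shown here.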
Module 6: Machine Learning: Supervised Learning
· What is supervised learning? Regression and Classification
· Linear Regression:
o Concept: Models linear relationships between independent and dependent variables.
o Data Preparation
o Model Representation
o Gradient Descent
o Concept and Purpose
o Steps in Gradient Descent
o Cost Function (MSE)
o Definition and Purpose
o Solving Linear Regression
o Normal Equation Method
o Metrics: Mean Absolute Error (MAE), R-squared (R²) Score
o Example: House price prediction
· K-Nearest Neighbors (KNN):
o Concept: Classifies data points based on their nearest neighbors in feature space.
o Introduction to k-NN Classifier
o Data Preparation
o How k-NN Works
o Utilizing Nearest Neighbors
o Distance Metrics
o Optimal Value of k
o Making Predictions
o Evaluating Performance
o k-NN: Strengths and Weaknesses
o Real-world Applications of k-NN
o Classification on Iris Dataset
o Metrics: Classification Accuracy
· Support Vector Machines (SVM):
o Concept: Finds a hyperplane that best separates classes in feature space.
o Introduction to Support Vector Machines (SVM)
o Data Preparation for SVM
o Understanding the SVM Algorithm
o Concept and Purpose
o Kernel Trick for Non-Linear Data
o Hyperparameter Tuning in SVM
o Making Predictions with SVM
o Evaluating SVM Performance
o Classification Accuracy
o Confusion Matrix
o F1-score (for binary classification)
o Pros and Cons of SVM
o Applications of SVM
· Logistic Regression:
o Concept: Models the probability of a binary target variable given predictors.
o Introduction to Logistic Regression
o Data Preparation for LR
o Understanding the LR Model (Sigmoid Function)
o Interpreting LR Coefficients
o Training the LR Model
o Cost Function (Log Loss)
o Evaluating LR Performance
o Accuracy, Precision, Recall, F1-score
o Confusion Matrix
o ROC-AUC Score
o Regularization in LR (L1 and L2)
o Pros and Cons of LR
o Applications of LR
o Disease prediction using LR
· Random Forest:
o Concept: Ensemble method combining multiple decision trees for improved accuracy.
o Data Preparation
o How it Works
o Ensemble of Trees
o Feature Importance
o Training
o Evaluation
o Accuracy (Classification)
o MAE (Regression)
o Handling Overfitting
o Pros and Cons
o Applications: Breast Cancer Wisconsin (Diagnostic) dataset
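The linear regression topics in this module (MSE cost function, gradient descent, the normal equation, and the MAE and R-squared metrics) can be sketched in plain Python as follows. The tiny house-price dataset, the learning rate, and the iteration count are hypothetical choices for illustration, not values prescribed by the course.

```python
# Hypothetical data: house size and price in arbitrary units, generated from
# price = 1 + 2 * size, so the best-fit line is known in advance.
sizes = [0.5, 1.0, 1.5, 2.0, 2.5]
prices = [2.0, 3.0, 4.0, 5.0, 6.0]
n = len(sizes)


def mse(w, b):
    """Cost function: mean squared error of the line y = w * x + b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(sizes, prices)) / n


# Gradient descent: repeatedly step against the gradient of the MSE.
w, b = 0.0, 0.0
learning_rate = 0.1  # assumed value; too large a rate would diverge
for _ in range(5000):
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(sizes, prices)) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(sizes, prices)) / n
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

# Closed-form solution (normal equation for a single feature) for comparison.
x_mean = sum(sizes) / n
y_mean = sum(prices) / n
w_exact = (sum((x - x_mean) * (y - y_mean) for x, y in zip(sizes, prices))
           / sum((x - x_mean) ** 2 for x in sizes))
b_exact = y_mean - w_exact * x_mean

# Evaluation metrics on the training data.
preds = [w * x + b for x in sizes]
mae = sum(abs(p - y) for p, y in zip(preds, prices)) / n
ss_res = sum((y - p) ** 2 for p, y in zip(preds, prices))
ss_tot = sum((y - y_mean) ** 2 for y in prices)
r2 = 1 - ss_res / ss_tot
```

On this noise-free data, gradient descent and the closed-form solution should agree closely; on real data the metrics would normally be computed on a held-out test set instead.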
Module 7: Machine Learning: Unsupervised Learning
· Different types of techniques in Unsupervised Learning
· Dimensionality Reduction:
o Why Dimensionality Reduction is Important
o Principal Component Analysis (PCA)
o Concept and Purpose
o Steps in PCA
o t-Distributed Stochastic Neighbor Embedding (t-SNE)
o Concept and Use Cases
o Applications: Image Compression
· Introduction to Clustering
o Types of Clustering Algorithms (Focusing on KMeans)
o How KMeans Works
o Concept and Purpose
o Choosing the Optimal Number of Clusters (k)
o Applying KMeans for Clustering
o Evaluating Clustering Performance
o Silhouette Score
o Inertia
o Demo using the Digits dataset
· Introduction to Recommender Systems
o Types: Content-Based, Collaborative Filtering
o Collaborative Filtering: User-Based, Item-Based
o Matrix Factorization: SVD
o Evaluation Metrics: RMSE, MAE
o Cold Start Problem & Solutions
o Applications: Product recommendations
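For the clustering topics in this module, here is a minimal sketch of how KMeans alternates between assigning points and updating centroids, followed by the inertia calculation. The six 2-D points are hypothetical, and initialising the centroids from the first k points is a simplification made here so the run is deterministic.

```python
# Hypothetical 2-D points forming two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
k = 2


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Index of the nearest centroid for each point."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points]


# Initialisation: the first k points (a simplification; random or k-means++
# initialisation is more common in practice).
centroids = list(points[:k])

for _ in range(100):
    labels = assign(points, centroids)
    new_centroids = []
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if members:
            # Update step: move the centroid to the mean of its members.
            new_centroids.append(
                tuple(sum(c) / len(members) for c in zip(*members)))
        else:
            new_centroids.append(centroids[j])  # keep an empty cluster's centroid
    if new_centroids == centroids:  # converged: no centroid moved
        break
    centroids = new_centroids

labels = assign(points, centroids)
# Inertia: sum of squared distances from each point to its assigned centroid.
inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
```

Running the same loop for several values of k and comparing the inertia is the idea behind the elbow method for choosing the number of clusters.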
Module 8: Deep Learning
· Perceptron & Neural Network History
· Activation Functions
· Sigmoid, ReLU, Softmax, Leaky ReLU, Tanh
· Gradient Descent
· Learning Rate Tuning
· Optimization Functions
· TensorFlow Introduction
· Keras Introduction
· Backpropagation & Chain Rule
· Fully Connected Layer
· Cross Entropy
· Weight Initialization
· Regularization
· TensorFlow 2.0
· TensorFlow basic syntax
· TensorFlow Graphs
· TensorBoard
· Artificial Neural Network with TensorFlow
· Regression
· Classification
· Evaluating the ANN
· Improving and tuning the ANN
· Saving and Restoring Graphs
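As a framework-free illustration of the activation functions and cross-entropy loss listed in this module, here is a minimal sketch in plain Python. It does not use TensorFlow or Keras, the `logits` values are hypothetical, and the default `alpha` for Leaky ReLU is an assumed value.

```python
import math


def sigmoid(x):
    """Squashes a real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def relu(x):
    """Passes positive values through and zeroes out negatives."""
    return max(0.0, x)


def leaky_relu(x, alpha=0.01):
    """Like ReLU, but keeps a small slope (alpha) for negative inputs."""
    return x if x > 0 else alpha * x


def softmax(logits):
    """Turns a list of scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, true_index):
    """Cross-entropy loss for one example with a single correct class."""
    return -math.log(probs[true_index])


logits = [2.0, 1.0, 0.1]  # hypothetical outputs of a network's last layer
probs = softmax(logits)
loss = cross_entropy(probs, true_index=0)
tanh_value = math.tanh(0.5)  # Tanh is available directly in the math module
```

The loss shrinks toward zero as the probability assigned to the correct class approaches 1, which is what training by gradient descent tries to achieve.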
Module 9: Natural Language Processing (NLP)
· Statistical NLP Basics
· Intro to NLP
· Text Prep (Cleaning, Simplifying)
· Word Importance (Bag of Words, TF-IDF)
· Language Patterns (N-grams, Channel Model)
· Word Representation
· Word Meanings (Word2vec, GloVe)
· Tagging Words (POS Tagger)
· Spotting Names (NER)
· Identifying Words (POS with NLTK, TF-IDF with NLTK)
· Sequential Models
· Remembering Sequences (RNN, LSTM)
· LSTM in Detail (Forward & Backward)
· Practical LSTM (Hands-on)
· Practical Applications
· Judging Feelings (Sentiment Analysis)
· Creating Sentences
· Changing Languages (Machine Translation)
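The Bag of Words, TF-IDF, and N-gram ideas from this module can be sketched without NLTK as follows. The three-sentence corpus is hypothetical, the tokenizer is deliberately simple, and the TF-IDF formula used (term count divided by document length, times log(N / df)) is one of several common variants; libraries may apply different smoothing.

```python
import math
from collections import Counter

# Hypothetical mini-corpus of three short "documents".
docs = [
    "the movie was great",
    "the movie was boring",
    "great acting and great story",
]

# Text preparation: lowercase and split on whitespace (a very simple tokenizer).
tokenized = [doc.lower().split() for doc in docs]

# Bag of Words: per-document word counts.
bags = [Counter(tokens) for tokens in tokenized]

# Document frequency: in how many documents each word appears.
df = Counter(word for tokens in tokenized for word in set(tokens))
n_docs = len(docs)


def tf_idf(word, doc_index):
    """TF-IDF with tf = count / document length and idf = log(N / df).

    Only defined here for words that occur somewhere in the corpus.
    """
    tf = bags[doc_index][word] / len(tokenized[doc_index])
    idf = math.log(n_docs / df[word])
    return tf * idf


# Bigrams (n-grams with n = 2) for the first document.
bigrams = list(zip(tokenized[0], tokenized[0][1:]))

score_boring = tf_idf("boring", 1)  # rare word: higher weight
score_the = tf_idf("the", 1)        # common word: lower weight
```

A word that appears in only one document, such as "boring", receives a higher weight than a word shared across documents, such as "the".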
Module 10: Computer Vision Basics
· Introduction to Computer Vision
· Image Representation
· Color Channels
· Introduction to Convolutional Neural Networks (CNNs)
· Motivation for CNNs
· Building Blocks of CNNs
· Convolutional Layers
· Pooling Layers
· Fully Connected Layers
· Advanced CNN Concepts
· Activation Functions (ReLU, etc.)
· Batch Normalization
· Dropout
· Training CNNs
· Loss Functions
· Optimization Methods (SGD, Adam, etc.)
· Backpropagation in CNNs
· Popular CNN Architectures
o LeNet
o AlexNet
o VGG
o ResNet
· Image Classification and Recognition
· Understanding Labels and Classes
· Top-k Accuracy
· Visualizing CNN Activations
· Practical Projects:
o Implementing CNN for Image Classification
o Building an Object Detector
o Image Understanding with Semantic Segmentation
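To show what the convolutional and pooling layers in this module compute, here is a minimal pure-Python sketch of a single 2-D convolution (implemented as cross-correlation, as is usual in CNNs), a ReLU, and max pooling. The 5x5 image and the hand-picked edge-detecting kernel are hypothetical; in a trained CNN the kernel values are learned.

```python
def conv2d(image, kernel):
    """Valid (no padding), stride-1 2-D cross-correlation."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def relu2d(feature_map):
    """Element-wise ReLU over a 2-D feature map."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling with a size x size window."""
    return [[max(feature_map[i + di][j + dj]
                 for di in range(size) for dj in range(size))
             for j in range(0, len(feature_map[0]) - size + 1, size)]
            for i in range(0, len(feature_map) - size + 1, size)]


# Hypothetical 5x5 grayscale image: dark columns (0) on the left,
# bright columns (1) on the right.
image = [
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]
# Hand-picked 2x2 kernel that responds to a dark-to-bright vertical edge.
kernel = [
    [-1, 1],
    [-1, 1],
]

features = relu2d(conv2d(image, kernel))  # 4x4 feature map
pooled = max_pool2d(features)             # 2x2 map after pooling
```

The feature map is non-zero only where the kernel straddles the edge between the dark and bright columns, and pooling keeps that response while halving each spatial dimension.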
Module 11: Data Science Project in the Cloud
- Overview:
- Introduction to cloud computing and its role in data science projects.
- Key Topics:
- Setting up cloud environments for data science projects.
- Deploying an end-to-end data science project in the cloud.
- Practical steps for model training and deployment using cloud resources.
- Practical Applications:
o Real-world examples of data science projects hosted in the cloud.
o Demonstrations of cloud-based data processing and model deployment.
- Hands-On Experience:
- Setting up cloud environments (e.g., AWS, Azure) for data science work.
- Deploying and managing data science projects in cloud platforms.
Module 12: Real-world Projects and Case Studies
- Application of concepts learned throughout the course in practical projects.
- Real-world data science projects across various domains (3-4).
- Credit Card Fraud Detection
- Movie Recommender System (IMDb)
- Image Classification on CIFAR-10
- Sentiment Analysis on product page reviews
- Crop Disease Detection using AI and Computer Vision
- Speech-to-Text Recognition for Digits
- Insights from industry experts and their experiences in data science.
- Completion of hands-on data science projects.
- Presentation of findings and insights from real-world datasets.
- Assessment:
o Evaluation of final data science projects.
o Peer assessment and presentation evaluation.
Module 13: Career and Industry Insights
- Overview:
o Exploration of job roles in data science, AI, and ML.
o Guidance on resume building and interview preparation.
o Discussion on continuing education and lifelong learning in the field.
- Key Topics:
o Job roles in data science, AI, and ML.
o Resume building and interview preparation for data science positions.
o Strategies for ongoing learning and professional development.
- Practical Applications:
o Real-world insights into career paths and opportunities in data science.
o Tips and best practices for job application and interviews.
- Hands-On Experience:
o Resume building exercises and mock interview practice.
o Development of a personalized career development plan.
- Assessment:
o Evaluation of resumes and interview preparation progress.
o Completion of a career development plan.