Learning Resources
Textbooks :
- Data Mining: Concepts and Techniques, 3E by J. Han et al. Download
- Data Mining: Practical Machine Learning Tools and Techniques, 3E by Ian H. Witten et al. View
- Pattern Recognition and Machine Learning, by Christopher Bishop View
Learn WEKA :
- GUI based learning :
- Implementation based learning :
- Implemented in Java by Dr. Noureddin Sadawi Go YouTube, Go GitHub
- Implemented in Java by Rafsanjani Muhammod
- GUI & Core Java by Rafsanjani Muhammod Go YouTube
Learn scikit-learn :
- Implemented in Python (scikit-learn) by Rafsanjani Muhammod Go GitHub
- LazyProgrammer Go GitHub
- Hands-on Machine Learning with Scikit-Learn and TensorFlow Go GitHub
Blogs :
- An Introduction to Data Mining by Dr. Saed Sayad Go
- Analytics Vidhya Go
- Soft Computing and Intelligent Information Systems Go
- Comparison of classifiers Go, Go
Coming soon … :)
Public datasets for Analytics
- UCI Machine Learning Repository Go
- KEEL Go
- AnalyticsVidhya Go
- Kaggle (mainly a competition site) Go
- Public datasets for Machine Learning Go
- Algorithmia Go
- Springboard Go
Syllabus
Key Terms :
- Features / Attributes
- Feature-values & Attribute-values
- Class & Class-Attributes
- Instances / Records / Vectors / Tuples
- Two-class (binary) dataset & Multi-class dataset (when the number of class values is greater than 2) / Multi-label dataset (when an instance can carry more than one class label)
- High-dimensional dataset (when the number of features is greater than 10)
- Univariate, Bivariate & Multivariate dataset Go, Go
- Balanced dataset vs Imbalanced dataset
- Overfitting & Underfitting of a model
- Supervised learning vs. Unsupervised learning Go
- Classification, Regression, Clustering
- Bias–variance tradeoff Go
- Noisy Datasets & how to remove noise ?
- Anomaly Detection Go
Preprocessing Datasets :
- Data cleaning Go
- Remove duplicate records
- Handle missing values (Can you calculate : Mean, Median, Mode, Standard Deviation etc.?)
- Feature Scaling or Feature Normalization (Can you calculate distance using : Euclidean, Manhattan, Minkowski etc.?)
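The preprocessing steps above can be practised by hand before reaching for a library. The following is a minimal plain-Python sketch (standard library only); the function names and the `ages` column are hypothetical, and in the scikit-learn weeks you would typically use `SimpleImputer`, `MinMaxScaler` and `StandardScaler` instead.

```python
from statistics import mean, median, mode, pstdev


def drop_duplicates(rows):
    """Remove repeated records (tuples), keeping the first occurrence."""
    return list(dict.fromkeys(rows))


def impute(values, strategy="mean"):
    """Replace missing entries (None) with the mean, median or mode of the observed values."""
    observed = [v for v in values if v is not None]
    fill = {"mean": mean, "median": median, "mode": mode}[strategy](observed)
    return [fill if v is None else v for v in values]


def min_max_scale(values):
    """Linearly rescale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Standardize to mean 0 and population standard deviation 1 (assumes a non-constant column)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def minkowski(a, b, p=2):
    """Minkowski distance of order p: p=1 is Manhattan, p=2 is Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


ages = [25, None, 31, 25, None, 40]   # hypothetical column with two missing values
print(impute(ages, "mean"))           # [25, 30.25, 31, 25, 30.25, 40]
print(impute(ages, "median"))         # [25, 28.0, 31, 25, 28.0, 40]
print(minkowski((0, 0), (3, 4), p=1), minkowski((0, 0), (3, 4), p=2))  # 7.0 5.0
```

Scaling matters for distance-based methods such as KNN: without it, the feature with the largest numeric range dominates the distance.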
Classification :
- Rule Classifiers
- ZeroR Classifier
- OneR Classifier Go & Go
- Logistic Regression
- KNN Classifier
- Support Vector Classifier ( Kernels : Linear, Polynomial, Gaussian, Sigmoid, etc. )
- Naive Bayes Classifier
- Decision Tree Classifier
- Gini index (CART)
- ID3
- C4.5 / C5.0 / J48
- Ensemble Learning
- Bagging Classifier
- Boosting Classifier (AdaBoost, Gradient Boosting)
- Random Forest Classifier
- Introduction to Deep Learning (ANN, RNN, CNN, SOM, Autoencoders)
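ZeroR and OneR are the usual baselines to compare every other classifier against. Below is a minimal plain-Python sketch of both, assuming nominal (categorical) features only; WEKA's OneR additionally discretizes numeric attributes, which this sketch does not do. The toy weather-style data and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def zero_r(labels):
    """ZeroR: ignore every feature and always predict the most frequent class."""
    return Counter(labels).most_common(1)[0][0]


def one_r(rows, labels):
    """OneR for nominal features.

    For each feature, map every feature value to its majority class and count
    the training errors of that one-feature rule; return the best rule as
    (feature_index, {value: class}, errors). Ties keep the earlier feature.
    """
    best = None
    for j in range(len(rows[0])):
        counts = defaultdict(Counter)            # feature value -> class counts
        for row, label in zip(rows, labels):
            counts[row[j]][label] += 1
        rule = {v: c.most_common(1)[0][0] for v, c in counts.items()}
        errors = sum(sum(c.values()) - c[rule[v]] for v, c in counts.items())
        if best is None or errors < best[2]:
            best = (j, rule, errors)
    return best


# Hypothetical toy data: (outlook, windy) -> play
rows = [("sunny", "false"), ("sunny", "true"), ("overcast", "false"),
        ("overcast", "true"), ("rainy", "false"), ("rainy", "true"),
        ("rainy", "false")]
labels = ["no", "no", "yes", "yes", "yes", "no", "yes"]

print(zero_r(labels))        # yes
print(one_r(rows, labels))   # (0, {'sunny': 'no', 'overcast': 'yes', 'rainy': 'yes'}, 1)
```

Here ZeroR reaches 4/7 training accuracy, while OneR picks `outlook` with only one training error; a useful classifier should clearly beat both.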
Regression :
- Linear Regression (Simple & Multiple)
- Polynomial Regression
- Support Vector Regression (SVR)
- Decision Tree Regression
- Random Forest Regression
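Simple linear regression has a closed-form least-squares solution, which is worth computing once by hand. A minimal sketch, assuming a single numeric feature; the data points and function names are hypothetical (scikit-learn's `LinearRegression` does the same fit for multiple features).

```python
def fit_simple_linear_regression(x, y):
    """Ordinary least squares for y = b0 + b1 * x (one feature)."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b1 = sxy / sxx                  # slope
    b0 = y_mean - b1 * x_mean       # intercept
    return b0, b1


def mean_squared_error(y_true, y_pred):
    """Average squared difference between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


x = [1, 2, 3, 4]
y = [3, 5, 7, 9]                    # hypothetical data lying exactly on y = 1 + 2x
b0, b1 = fit_simple_linear_regression(x, y)
print(b0, b1)                       # 1.0 2.0
print(mean_squared_error(y, [b0 + b1 * xi for xi in x]))  # 0.0
```

Polynomial regression reuses the same least-squares idea, applied to the expanded features x, x², x³, and so on.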
Clustering :
- KMeans
- Hierarchical (Agglomerative, Divisive)
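KMeans alternates two steps until the centroids stop moving. A minimal plain-Python sketch of this (Lloyd's algorithm), assuming, purely for reproducibility, that the first k points serve as the initial centroids; scikit-learn's `KMeans` uses k-means++ initialisation by default instead. The points are hypothetical.

```python
import math


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points."""
    centroids = [tuple(p) for p in points[:k]]      # simple deterministic start (assumption)
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        new_centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:               # converged: centroids stopped moving
            break
        centroids = new_centroids
    labels = [min(range(k), key=lambda i: math.dist(p, centroids[i])) for p in points]
    return centroids, labels


points = [(1, 1), (8, 8), (1, 2), (2, 1), (9, 8), (8, 9)]   # hypothetical 2-D data
centroids, labels = kmeans(points, k=2)
print(labels)   # [0, 1, 0, 0, 1, 1]
```

Because the result depends on the starting centroids, in practice k-means is usually run several times from different starts and the best run is kept.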
Imbalanced Learning : Go
- Majority class vs Minority class
- Re-sampling : Over-sampling, Under-sampling
- Over-sampling algorithms : ADASYN, SMOTE, Random Over-sampling
- Under-sampling algorithms : Random Under-sampling
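The re-sampling ideas above fit in a few lines each. Below is a minimal plain-Python sketch of random over-sampling, random under-sampling and a simplified SMOTE (ADASYN is not shown). The function names and data are hypothetical; in Week #9 the `imbalanced-learn` classes `RandomOverSampler`, `RandomUnderSampler`, `SMOTE` and `ADASYN` do this for you.

```python
import math
import random
from collections import Counter


def random_over_sample(X, y, seed=0):
    """Duplicate random samples of every smaller class until each class
    has as many samples as the majority class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for cls, n in counts.items():
        idx = [i for i, label in enumerate(y) if label == cls]
        for _ in range(target - n):
            i = rng.choice(idx)
            X_out.append(X[i])
            y_out.append(cls)
    return X_out, y_out


def random_under_sample(X, y, seed=0):
    """Randomly keep only as many samples of each class as the minority class has."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    keep = []
    for cls in counts:
        idx = [i for i, label in enumerate(y) if label == cls]
        keep.extend(rng.sample(idx, target))
    keep.sort()                                  # preserve the original order
    return [X[i] for i in keep], [y[i] for i in keep]


def smote(minority, n_new, k=3, seed=0):
    """Simplified SMOTE: each synthetic point lies on the segment between a random
    minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        a = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(a, p))[:k]
        b = rng.choice(neighbours)
        gap = rng.random()                       # uniform in [0, 1)
        synthetic.append(tuple(ai + gap * (bi - ai) for ai, bi in zip(a, b)))
    return synthetic


X = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2), (1.0, 1.0), (0.9, 1.1)]
y = [0, 0, 0, 0, 1, 1]                           # hypothetical 4:2 imbalance
print(sorted(Counter(random_over_sample(X, y)[1]).items()))    # [(0, 4), (1, 4)]
print(sorted(Counter(random_under_sample(X, y)[1]).items()))   # [(0, 2), (1, 2)]
```

Re-sampling should only be applied to the training part of the data; the test set must keep its real class distribution.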
Feature Selection :
- Filter Approach (LDA)
- Wrapper Approach
- Embedded Approach
Feature Generation :
- Principal Component Analysis (PCA)
Evaluation :
- Understand the Confusion Matrix
- Calculate : Accuracy, Error, Sensitivity, Specificity, Precision, Recall
- ROC Curve & Precision-Recall Curve (AUC, AUPR)
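The confusion-matrix measures listed above all come from four counts. A minimal plain-Python sketch for a two-class problem follows; the labels, predictions and scores are hypothetical, and the code assumes no denominator is zero. The ROC AUC is computed through its rank interpretation: the probability that a random positive is scored higher than a random negative. `sklearn.metrics` offers `confusion_matrix`, `roc_auc_score` and `average_precision_score` for the same job.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for a two-class problem."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def scores(tp, fp, fn, tn):
    """Common measures derived from the confusion matrix."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    recall = tp / (tp + fn)                # = sensitivity = TPR
    specificity = tn / (tn + fp)           # = 1 - FPR
    precision = tp / (tp + fp)
    return {
        "accuracy": accuracy,
        "error": 1 - accuracy,
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "f1": 2 * precision * recall / (precision + recall),
    }


def roc_auc(y_true, y_score, positive=1):
    """Area under the ROC curve as the probability that a random positive gets a
    higher score than a random negative (ties count 1/2)."""
    pos = [s for s, t in zip(y_score, y_true) if t == positive]
    neg = [s for s, t in zip(y_score, y_true) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 1, 0, 0, 0, 0, 1]      # hypothetical labels
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]      # hypothetical predictions
print(confusion_counts(y_true, y_pred))                 # (3, 2, 1, 2)
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))       # 0.75
```

For the example, accuracy is 5/8, recall 3/4, specificity 2/4 and precision 3/5.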
Course Materials
Course Schedule
- Week #1 :
- Introduction to Pattern Recognition
- Current Research Trends
- Introduction to WEKA
- Week #2 :
- Hands-on practice on WEKA GUI
- Understand the .CSV & .ARFF file formats
- Data Visualization
- Classifier design
- Use different machine learning algorithms
- Evaluation options
- Use training dataset
- Supplied test dataset
- Cross-validation (k = 10 folds)
- Split dataset (2/3 train & 1/3 test)
- Confusion Matrix
- TP, FP, FN, TN
- Performance Measures : Accuracy, Error, TPR, FPR, F-Score etc.
- Weighted mean
- Assignment #1 : Use all (25) datasets that come with WEKA & submit a report on them (Week #3).
- Week #3 :
- Introduction to WEKA implementation in Java.
- Assignment #2 : Two large datasets will be provided; submit a report on them (Week #4).
- Week #4 :
- Details on WEKA implementation in Java.
- Actual vs. Predicted values
- Evaluation options
- Feature Reduction
- Week #5 : Ensemble Learning
- Week #6 : Midterm Exam (Classifiers, based on both your Lab & Theory courses)
- Week #7 : Clustering
- Week #8 : Data Analysis with scikit-learn (Python)-I
- Loading the datasets (using pandas, numpy)
- Feature scaling
- Machine Learning Classifiers
- Evaluation Metrics
- Assignment #3 : Datasets will be provided.
- Week #9 : Data Analysis with scikit-learn (Python)-II
- Problem solving
- Plot curves with scikit-learn (e.g., ROC Curve, Precision-Recall Curve) and compute the AUC
- More tricks (imbalanced-learn / imblearn)
- Week #10 : Data Analysis with scikit-learn (Python)-III
- Introduction to Kaggle competitions Go
- Problem solving
- Assignment #4 : A dataset will be provided.
- Week #11 : Individual presentations based on datasets.
- Week #12 : Final Exam.
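The Week #2 evaluation options (10-fold cross-validation, a 2/3 : 1/3 split) and the weighted mean of per-class measures can be sketched in plain Python as below. This is a minimal illustration with hypothetical names: it does not shuffle or stratify the data, which real tools usually do (scikit-learn offers `KFold`, `StratifiedKFold` and `train_test_split`). The weighted mean here weights each class by its number of instances, as in the weighted-average rows reported by tools such as WEKA.

```python
def k_fold_indices(n, k=10):
    """Yield (train_indices, test_indices) for k contiguous folds of n instances.
    Each instance is tested exactly once; fold sizes differ by at most one.
    No shuffling or stratification is done here."""
    start = 0
    for fold in range(k):
        size = n // k + (1 if fold < n % k else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


def holdout_split(n, train_fraction=2 / 3):
    """Percentage split: the first part for training, the remainder for testing."""
    cut = round(n * train_fraction)
    return list(range(cut)), list(range(cut, n))


def weighted_average(per_class, class_counts):
    """Average a per-class measure, weighting each class by its number of instances."""
    total = sum(class_counts.values())
    return sum(per_class[c] * class_counts[c] / total for c in class_counts)


folds = list(k_fold_indices(25, k=10))
print([len(test) for _, test in folds])   # [3, 3, 3, 3, 3, 2, 2, 2, 2, 2]
print(holdout_split(9))                   # ([0, 1, 2, 3, 4, 5], [6, 7, 8])
print(weighted_average({"yes": 0.9, "no": 0.6}, {"yes": 9, "no": 5}))  # approximately 0.793
```

In cross-validation the model is trained k times, once per fold, and the k test results are combined into one estimate.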