100DaysofMLCode
In this repository I will be posting everything I do each day for the #100DaysofMLCode challenge proposed by Siraj Raval.
DAY 1 :
So this is the first day of #100DaysofMLCode. I am so excited for the next 100 days that I am going to spend learning and using machine learning. Today was a great day. I attended the Chennai School of AI meetup, where we had a very fruitful discussion on applications of ML and AI, and I got to know a lot of interesting people. I also got to hear about interesting projects that some really experienced people had worked on: an automatic quiz-creation app, chatbots, detecting elephants near railway tracks to stop them from being hit by trains, an automatic music-creation web app (by the Dean himself) and many other interesting things. Also got to know about many machine learning libraries like spaCy, scikit-learn, TensorFlow, etc. Had a great day and I was inspired to learn ML and go deep into the field.
DAY 2:
Started with the Python for Data Science series by @sirajraval on YouTube. Made a simple gender classifier using weight, height and shoe size as the only inputs. Built it with scikit-learn's DecisionTreeClassifier, guided by Siraj, and then did the same using the KNeighborsClassifier and QuadraticDiscriminantAnalysis models. It is awesome just how a few lines of Python can classify gender. It felt really awesome writing my first machine learning code. #100DaysofMLCode
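A minimal sketch of that classifier, with made-up height/weight/shoe-size values and labels (these numbers are assumptions, not the tutorial's data):

```python
# Gender classification from [height (cm), weight (kg), shoe size (EU)].
# The data below is invented purely for illustration.
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

X = [[181, 80, 44], [177, 70, 43], [160, 60, 38], [154, 54, 37],
     [166, 65, 40], [190, 90, 47], [175, 64, 39], [177, 70, 40],
     [159, 55, 37], [171, 75, 42], [181, 85, 43]]
y = ['male', 'male', 'female', 'female', 'male', 'male', 'female',
     'female', 'female', 'male', 'male']

tree = DecisionTreeClassifier().fit(X, y)
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)

print(tree.predict([[190, 85, 45]]))
print(knn.predict([[158, 52, 37]]))
```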
DAY 3 :
Made a Twitter sentiment analysis code using libraries like tweepy and textblob. Followed the tutorial by @sirajraval , which is the second video in his Python for Data Science Series.
Completed the assignment and successfully created a csv file using the pandas library. You will find it soon on my github.
Learnt a bit about natural language processing from this tutorial and found it really interesting.
DAY 4 :
Learnt about vector spaces from MIT OpenCourseWare. Maths plays a very important role in ML: it is necessary to have a good knowledge of linear algebra, probability and statistics, and calculus.
All these can be valuable while developing your own machine learning algorithms.
DAY 5 :
Made a movie recommendation system by following the third video of @sirajraval's Python for Data Science series. I used three dependencies: numpy, scipy and lightfm.
lightfm is a library with built-in recommendation-system algorithms. I used the fetch_movielens function to gather the movie data and WARP as my loss function to improve the model and reduce the loss over time.
Learnt about two types of recommendation systems: collaborative (based on similar users' data) and content-based (based only on the user's own data). It was awesome learning this and I was happy when I finally understood the whole code.
#100DaysofMLCode
DAY 6 :
Do you want to get rich by investing in stock markets? Well then, machine learning can help you understand where to invest so as to get the most profit.
I developed a stock price prediction model by following @sirajraval's fourth video in his Python for Data Science series.
I used the following dependencies:
- csv
- numpy
- matplotlib
- sklearn.svm
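A rough sketch of the approach with sklearn.svm, using a synthetic price series in place of the CSV from the tutorial (the prices are made up):

```python
# Fit SVR models with different kernels on invented daily prices,
# then predict the next day's price.
import numpy as np
from sklearn.svm import SVR

days = np.arange(1, 31).reshape(-1, 1)                      # day number as the feature
prices = 100 + 0.5 * days.ravel() + np.sin(days.ravel())    # made-up prices

svr_lin = SVR(kernel='linear', C=1e3).fit(days, prices)
svr_rbf = SVR(kernel='rbf', C=1e3, gamma=0.1).fit(days, prices)

next_day = np.array([[31]])
print(svr_lin.predict(next_day), svr_rbf.predict(next_day))
```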
#100DaysofMLCode
DAY 7 :
Today I revised Decision Tree Classifiers and made a simple model for predicting the type of iris flower.
I got a better understanding of training and testing data.
I used scikit-learn and numpy as the major libraries.
#100DaysofMLCode
DAY 8 :
Read a paper on the development of Speech Recognition systems while on my journey from Bangalore to Chennai.
Attended a coding competition in Bangalore and spent the whole day coding.
#100DaysofMLCode
DAY 9 :
Learnt how to find the accuracies of machine learning algorithms (basically classifiers) using the sklearn.metrics module.
I found the accuracy of the DecisionTreeClassifier and the KNeighborsClassifier using the iris dataset built into scikit-learn.
For me the KNeighborsClassifier turned out to be more accurate than the decision tree.
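A small sketch of that comparison on the built-in iris dataset (the 50/50 split is an assumption):

```python
# Compare decision tree and KNN accuracy on iris with accuracy_score.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=42)

tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
knn = KNeighborsClassifier().fit(X_train, y_train)

tree_acc = accuracy_score(y_test, tree.predict(X_test))
knn_acc = accuracy_score(y_test, knn.predict(X_test))
print(tree_acc, knn_acc)  # both typically above 0.9 on iris
```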
#100DaysofMLCode .
DAY 10 :
Made a small version of the KNearestNeighbors classifier. KNN basically works on the principle of Euclidean distance.
I got an accuracy of 96% using this classifier.
Though it is simple to make, this classifier can be slow if you have a huge amount of data. It also doesn't represent relationships between features and therefore doesn't take everything into consideration.
Learnt this by following Josh Gordon on the @googledevs YouTube channel.
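A bare-bones version of the idea, predicting the label of the single nearest training point by Euclidean distance (k = 1; the toy points are made up):

```python
# Nearest-neighbour classification: pick the label of the closest point.
import numpy as np

def knn_predict(X_train, y_train, x):
    # Euclidean distance from x to every training point
    distances = np.linalg.norm(np.asarray(X_train) - np.asarray(x), axis=1)
    return y_train[int(np.argmin(distances))]

X_train = [[1.0, 1.0], [1.2, 0.8], [8.0, 8.0], [7.5, 8.5]]
y_train = ['a', 'a', 'b', 'b']
print(knn_predict(X_train, y_train, [7.9, 8.1]))  # 'b'
```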
#100DaysofMLCode
DAY 11 :
Revised what I have learned from the past 10 days.
Got a refresher and overview of the past 10 days.
Moving ahead from tomorrow… #100DaysofMLCode .
DAY 12 :
Learnt how to use linear regression with scikit-learn.
Applied it to predict body weight using just the brain weight.
Made this by following @sirajraval's first video in his Intro to Deep Learning series.
#100DaysofMLCode .
DAY 13 :
I set up my github and uploaded all my work done till Day 12.
All the ML code is available as .ipynb files.
Please do contribute.
DAY 14 :
Started to read an online book named Neural Networks and Deep Learning by Michael Nielsen.
Learnt about the perceptron, a type of artificial neuron.
#100DaysofMLCode .
DAY 15 :
Learnt about the sigmoid neuron and how it is similar to a perceptron.
A sigmoid neuron, also known as a logistic neuron, is another type of artificial neuron; unlike a perceptron, its output changes smoothly with its inputs, which is what makes learning by small weight adjustments possible.
#100DaysofMLCode .
DAY 16 :
Continuing with the book – Neural Networks and Deep Learning by Michael Nielsen.
#100DaysofMLCode .
DAY 17 :
Tried to make my first neural network by following @sirajraval's deep learning series.
There were some errors. Will check them out tomorrow.
#100DaysofMLCode .
DAY 18 :
Got my first neural network working by following @sirajraval's second video in his Intro to Deep Learning series.
Corrected the errors from yesterday’s code.
Find the code to my first Neural Net on my GitHub (link in bio).
#100DaysofMLCode .
DAY 19 :
Started learning how to classify handwritten digits using a feedforward neural network by following the book 'Neural Networks and Deep Learning' by Michael Nielsen.
Learnt how feedforward neural nets differ from recurrent neural nets.
Understood the basic neural network for classifying handwritten digits.
#100DaysofMLCode .
DAY 20 :
Started learning how to make a simple neural network to classify handwritten digits. The network will have 3 layers: 1 input, 1 hidden and 1 output.
The weights and biases of the layers will be found using the gradient descent algorithm.
The cost function will measure how far the network's predictions are from the correct outputs.
#100DaysofMLCode .
DAY 21 :
Moving ahead with gradient descent… #100DaysofMLCode
DAY 22 :
Finally starting with the code to recognise handwritten digits. I will be using the MNIST dataset, with 60,000 images as training data and 10,000 images as testing data.
I will use Gradient descent to get the weights and biases for my Neural Network.
Hope to complete the code by tomorrow.
#100DaysofMLCode .
DAY 23 :
The Handwritten Digits Recognition code does not seem to work so easily.
But I hope to make it work soon.
#100DaysofMLCode
DAY 24 :
Got the handwritten digits recognition code working by following the book Neural Networks and Deep Learning by Michael Nielsen.
Did the same using TensorFlow by following @sirajraval's tutorial.
Got more than 90% accuracy with both versions. Unfortunately I couldn't get it to accept user input and predict the digit, but I will keep working on it.
For now it trains on the training data, compares the predicted results with the test data and reports the accuracy.
#100DaysofMLCode
DAY 25 :
Started with the Intro to Machine Learning Course by @udacity as suggested by @sirajraval in his Machine Learning in 3 months video.
Learnt the basics; the first two lessons were revision for me. They were all about supervised and unsupervised learning.
Learnt about Naive Bayes implementation using the scikit-learn library.
Hope to learn a lot from this course.
#100DaysofMLCode
DAY 26 :
Continuing with @udacity's course… #100DaysofMLCode
DAY 27 :
Started learning about Support Vector Machines from @udacity. Made my first SVM using scikit-learn.
#100DaysofMLCode
DAY 28 :
Learnt about non-linear SVMs. Also learnt about an awesome trick known as the kernel trick, which magically transforms a linear separator into a non-linear one.
@udacity's Intro to ML course is really great and helpful. I recommend other wizards take this course if you are a beginner.
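A quick way to see the kernel trick in action, assuming the make_circles toy dataset: a linear SVM can't separate ring-shaped data, but an RBF kernel can.

```python
# Linear vs RBF kernel on concentric circles of points.
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear_score = SVC(kernel='linear').fit(X, y).score(X, y)
rbf_score = SVC(kernel='rbf').fit(X, y).score(X, y)
print(linear_score, rbf_score)  # the RBF kernel separates the rings almost perfectly
```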
#100DaysofMLCode
DAY 29 :
Went a little deeper into decision trees by following @udacity's Intro to ML course.
#100DaysofMLCode
DAY 30 :
I came across a great article on Medium which collects the best machine learning cheat sheets.
Read the Scikit-Learn cheat sheet.
It is very well formatted and detailed.
I recommend all fellow wizards read it and share. CLICK HERE FOR THE LINK
It is the best thing that I have come across in my Machine Learning Journey.
DAY 31 :
Revised the T-Shirt Recommendation System that I had made as an assignment for a workshop by @appliedaicourse during my vacation.
I completed the assignment successfully and earned a certificate for it too.
I used libraries like pandas, scikit-learn, keras, tensorflow, matplotlib, etc.
I used techniques like TF-IDF, bag of words, word2vec for text processing and the VGG16 Neural Network for image processing.
#100DaysofMLCode
DAY 32 :
Again a bit of decision tree coding from @udacity's course… #100DaysofMLCode
DAY 33 :
Got a revision of the handwritten digit recognition system that I made by following the book Neural Networks and Deep Learning by Michael Nielsen.
The system only measures accuracy from the given set of training data and test data.
I wasn't able to give my own input to the system. I will soon be updating it on my GitHub; any help with it would be great.
I saw an awesome clip on YouTube depicting the handwritten digits neural network in an animated format. See it here
#100DaysofMLCode
DAY 34-35 :
Applied Google's Deep Dream code by following @sirajraval's 5th video in his Python for Data Science series.
I first downloaded Google's Inception model, then created a TensorFlow session where I imported the Inception model graph.
Then I picked one of the 59 layers, applied gradient ascent on it iteratively to enhance the image, and got the Deep Dreamed image.
#100DaysofMLCode
DAY 36 :
Learnt about genetic programming by following @sirajraval's 6th video in his Python for Data Science series. Genetic programming basically evolves the best ML model to solve a problem and find an optimal solution.
There are three basic steps in genetic programming: 1) Selection 2) Crossover 3) Mutation
This genetic algorithm helps classify whether a given burst of energy is gamma radiation or not.
The program uses tpot (a genetic programming library), scikit-learn, numpy and pandas.
#100DaysofMLCode
DAY 37 :
Not much today. Continuing with @udacity's Intro to ML course.
#100DaysofMLCode
DAY 38 :
Back after a 2-day break… Started with the Math of Intelligence series by @sirajraval.
Learnt how we can train a model on labelled data using just y = mx + b, decreasing the prediction error with gradient descent.
The whole series will not use any ML library and will involve building ML algorithms from scratch.
This is a great way to get into the math of Machine Learning.
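A from-scratch sketch of that idea: fit y = mx + b by repeatedly nudging m and b down the gradient of the mean squared error (the data and learning rate here are made up):

```python
# Gradient descent on MSE for a single-feature linear model, no ML library.
import numpy as np

x = np.linspace(0, 5, 50)
y = 2.0 * x + 1.0 + np.random.RandomState(0).normal(0, 0.1, 50)  # noisy line

m, b, lr = 0.0, 0.0, 0.01
for _ in range(2000):
    pred = m * x + b
    m -= lr * (-2 * (y - pred) * x).mean()   # dMSE/dm
    b -= lr * (-2 * (y - pred)).mean()       # dMSE/db

print(m, b)  # close to the true 2.0 and 1.0
```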
#100DaysofMLCode
DAY 39 :
Sometimes if you are lucky you get a lot of great data, and sometimes you don't. What do you do when you have only a small amount of labelled data? You apply Support Vector Machines (SVMs). 😎
SVMs are great when you want to solve supervised classification problems.
What SVMs basically do is create the best separating hyperplane, the one with the maximum margin from the nearest points of each class.
If your data has two features, your hyperplane is a line, but if your data has 400 features, your hyperplane is a 399-dimensional surface.
SVMs are normally applied to linearly separable data but can also be applied to non-linear data by using the kernel trick.
SVMs are great for classification but can also be used for regression, finding outliers, etc.
I learnt all of this from @sirajraval's 2nd video in his The Math of Intelligence series.
#100DaysofMLCode
DAY 40 :
Done with SVMs on Udacity.
@udacity's Intro to ML course has really great assignments with lots of challenging questions.
Udacity gives all the assignments in Python 2, so I had to do some research to get everything running in Python 3.
The course didn't go deep into SVMs, but I hope to learn more from the Math of Intelligence series by @sirajraval.
#100DaysofMLCode
DAY 41 :
The best way to learn something is to express it in an understandable way.
Working on a blog that will explain the T-Shirt Recommendation System. I hope to complete it as soon as possible. But till then you can check the code I have uploaded on my GitHub. You will find the link to my github here.
DAY 42 :
Did research for my blog and updated it further.
DAY 43 :
Read about the Markdown syntax used for GitHub README files. Made small changes to the blog.
DAY 44 :
Found out that Lesson 15 of @udacity's Intro to ML course covers Bag of Words and TF-IDF, which are word-frequency-based text features. These concepts will help me with the blog. Though I have learnt them before, the lesson will give me the revision I need to explain them clearly.
I am now thinking about completing udacity’s course and then working on the blog. So the blog might take time.
Going a little slow this week due to exams.
DAY 45 :
Read about the concept of TF-IDF and how to apply it using the scikit-learn library in Python.
DAY 46 :
Read a paper named “A Few useful things to know about Machine Learning” by Pedro Domingos. The paper focuses on some very useful things that can be used to produce better Machine Learning Algorithms.
Some main points to remember from the paper:
- Learning = Representation + Evaluation + Optimisation
- Generalization is important.
- Data alone is not enough.
- More data beats a cleverer algorithm.
- Intuition fails in high dimensions.
- Feature engineering is the key.
- Learn many models.
- Simplicity does not imply accuracy.
- Correlation is not the same as causation.
Find the paper here :
https://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf
A brief summary of the paper is on this medium article :
https://medium.com/@rupak.thakur/23-deep-learning-papers-to-get-you-started-part-1-308f80d7bba2
DAY 47 :
Did a project on my own today. I got a dataset of wine contents from Kaggle. I divided the data into features and labels, then split it into training and testing sets.
Then I applied a Support Vector Machine, SVC to be specific, using scikit-learn. I got an accuracy of 60%. That wasn't good enough, but I think it's great for my first self-driven project. I will have to try methods other than SVM to get a higher accuracy.
I did this whole thing on my own, without following any tutorial or video and I think it was really great.
Find the code on my GitHub (link in Bio). Any suggestions as to how to improve the accuracy will be really great.
DAY 48 :
Today I made a human activity prediction system, which predicts if a person is standing, walking, sleeping, etc., based on data from a mobile phone's sensors.
I got the data from @kaggleofficial. The dataset contains all the mobile sensor readings as its features and the activity as its label.
I used support vector machines to predict the activity and got an accuracy of 93%, which I think is pretty cool.
Go check out the code on my GitHub.(Link above).
DAY 49 :
Revision. Couldn’t find much time due to exams, so I did a bit of revision on the concept of Naive Bayes and Decision Trees.
DAY 50 :
Made a BMI prediction system from scratch. Got a dataset online containing the height and weight of a person as features and the BMI as the label. Made my model using an SVM with a linear kernel. Got an accuracy of 93 to 94% 😀😀😎. The past 50 days were fun and I got to learn a lot of new things. Hope to do the next 50 with the same enthusiasm.
DAY 51 :
Attended the Chennai School of AI's second meet-up. Had a great time at Crayon Data discussing LSTMs, multi-label classification and conversational AI. The speakers were inspirational and I got to learn a lot from them.
Learnt how LSTMs are capable of storing state so that they can learn from past outputs. Learnt how we can predict multiple labels for a given problem.
Also learnt the flow of conversational AI and how to implement it using Rasa NLU.
I hope that this community goes on inspiring everyone so that we can make this world a better place to live in.
Thanks to @sirajraval & @schoolofaioffic .
DAY 52 :
Today I read a blog named "A Visual Introduction to Machine Learning". The blog describes how to solve the problem of distinguishing homes in New York from homes in San Francisco.
The first step is to collect the data points that are needed, like the elevation of the house, the price, etc. Then boundaries are drawn to create a model using decision trees 🌲. Decision trees use if-else statements to define patterns in data. These if-else statements are called forks, and each fork splits the data into two branches. The value between these branches is called the split point.
Split points basically create a boundary, but not every split point creates a perfect one. So we need to find the best split, the one that predicts the answer correctly most often, and we use recursion to do so. Finally we find the accuracy: on the training data it can be 100%, but on the testing data it may be lower due to overfitting.
Link to the blog is :
http://www.r2d3.us/visual-intro-to-machine-learning-part-1/
DAY 53 :
Machine learning models can make mistakes if the patterns they learn are overly simple or overly complex. These models can be adjusted to fit the data.
The simplest version of a decision tree is called a stump. This version is prone to errors due to bias: a model with too much bias ignores relevant details. To decrease these errors we increase the number of split points, that is, use more features. This allows the decision tree to account for more complexity.
But too much complexity can cause overfitting, which is another problem. One way to address errors from overfitting is to impose limits, such as a minimum node size. The accuracy of the tree improves as errors due to variance decrease, but as the minimum node size threshold continues to increase, accuracy begins to deteriorate again due to bias.
Therefore understanding the trade-off between bias and variance is important while making your Machine Learning Model.
Link to the blog :
http://www.r2d3.us/visual-intro-to-machine-learning-part-2/
DAY 54 :
Still working on the blog. Now that the vacations are over, I aim to complete it by the 30th.
Also I hope to make much more progress than the last half of the Challenge.
DAY 55 :
Finally the blog is up. Click here to go to the blog. It will be great to hear from all the #wizards out there. I cover topics like Bag of Words and TF-IDF in the blog; I used these concepts when I was learning to make the T-shirt Recommendation System.
You can find the whole code of the T-shirt Recommendation on my GitHub.
I hope this is useful.
DAY 56 and 57 :
Worked for 6 hrs today to make up for the time that I have lost. Completed all of decision trees on @udacity. Also learnt about the ensemble algorithms AdaBoost and Random Forests. Another addition to my supervised learning toolbox.
In decision trees I went a little deeper into terms like entropy and information gain. Basically, information gain and entropy help you find the best feature to split on at every node of a decision tree.
Ensemble methods like AdaBoost and Random Forests combine many weak classifiers into one strong classifier. That is why they are known as ensemble methods.
I implemented decision trees on a dataset of emails, and AdaBoost and Random Forests on terrain data provided by @udacity.
DAY 58 :
Completed the Datasets and Questions lesson on Udacity’s Intro to ML course.
Played with the Enron dataset and answered lots of questions about it. It was great because it taught me what data might be useful and what might not. It also showed how more data often matters more than a finely tuned algorithm.
DAY 59 :
Started with the next chapter of Udacity's Intro to ML course. The chapter is all about regression and I am halfway through it.
Regression is a continuous supervised learning technique, unlike classification, which is a discrete supervised learning technique.
Regression is used when the output is not a set of discrete classes but a continuous value. It can be used for predicting values like salary, net worth, height and weight. Classification can tell you whether a speed is slow or fast; to know the exact speed in a given unit like mph, you use regression.
Linear regression is based on the simple formula y = mx + c, where y is the output to be predicted, m is a coefficient, c is the intercept and x is the input.
The error in regression, unlike classification, is measured as the sum of the squared errors, where an error is (actual value - predicted value). To minimise this error we can use two techniques: OLS (built into scikit-learn) and gradient descent.
DAY 60 :
Done with Regression and its mini project on @udacity ‘s Intro to ML Course.
Learnt how to compute R-squared using sklearn. Also learnt about multivariate regression, where there is more than one input variable for a target output and the value of m is different for every input x. Finally did the mini project on the Enron dataset.
DAY 61 :
Halfway through the lesson on outliers. @udacity's Intro to ML course teaches outliers in a very interesting way, explaining how to find them and remove them.
They have a very interesting and challenging mini project on outliers.
DAY 62 :
Still stuck on outliers…. #100DaysofMLCode
DAY 63 :
Finally started with unsupervised learning. Unlike supervised learning, unsupervised learning does not have labels for its data.
Clustering is a type of unsupervised learning where the algorithm groups data points into clusters and divides them into different classes.
A famous clustering algorithm is the K-Means algorithm. K-Means works with two very important functions:
1) The assign function 2) The optimise function
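A minimal from-scratch sketch of those two functions, assuming k = 2 and toy 2-D points:

```python
# K-Means as two alternating steps: assign points, then move centroids.
import numpy as np

def assign(X, centroids):
    # assign each point to its nearest centroid
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def optimise(X, labels, k):
    # move each centroid to the mean of its assigned points
    return np.array([X[labels == j].mean(axis=0) for j in range(k)])

X = np.array([[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [9.0, 9.5]])
centroids = X[:2].copy()
for _ in range(10):
    labels = assign(X, centroids)
    centroids = optimise(X, labels, 2)
print(labels)  # the two tight groups end up in separate clusters
```

(This toy loop assumes no cluster ever becomes empty; scikit-learn's KMeans handles that and much more.)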
DAY 64 :
Done with clustering on @udacity's Intro to ML course. Completed the mini project on clustering and got to know about this visual blog on clustering:
https://www.naftaliharris.com/blog/visualizing-k-means-clustering/
Got a hands-on experience with using sklearn for K-Means.
DAY 65-66 :
Started with feature scaling on Day 65. Learnt why feature scaling is important, the simple formula for it, and where it is necessary. The picture above gives a brief idea about feature scaling. 🔝🔝🔝 On Day 66, I completed the mini project for feature scaling on @udacity's Intro to ML course.
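A sketch of the min-max formula, x' = (x - min) / (max - min), which rescales a feature to [0, 1] (the weights are made-up numbers):

```python
# Min-max feature scaling of a single made-up feature column.
import numpy as np

weights = np.array([115.0, 140.0, 175.0])   # e.g. pounds
scaled = (weights - weights.min()) / (weights.max() - weights.min())
print(scaled)  # -> [0.0, ~0.4167, 1.0]
```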
Outliers mainly occur due to :
1) Sensor Malfunction 2) Data entry errors 3) Freak event
How to remove them :
1) Train 2) Remove points with largest residual error. 3) Re-Train
DAY 67 :
Trying to get a sneak peek of the free online book "Machine Learning Yearning" by Andrew Ng. The book focuses on strategy for machine learning projects, with chapters that help you work through a difficult machine learning problem.
You can subscribe to the book for free on :
mlyearning.org
DAY 68 :
Revising what I have learnt till now on @udacity .
DAY 69 :
The Math of Neural Nets…
DAY 70-71 :
Completed the lesson on @udacity about PCA (Principal Component Analysis), which is basically a dimensionality reduction technique.
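A short sketch of PCA on the built-in iris data, reducing 4 features to 2:

```python
# Dimensionality reduction with PCA: 4 iris features down to 2 components.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)                 # (150, 2)
print(pca.explained_variance_ratio_)   # the first component carries most of the variance
```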
DAY 72 :
Analysed a Kaggle kernel for beginners. The kernel is based on a house price prediction system and takes you from data collection to prediction.
It is very well formatted and highlights important points that machine learning beginners can follow. The link is:
https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data
I recommend #wizards who are beginners and have completed a proper Machine Learning course to analyse this Kernel.
DAY 73 :
One of the most important questions to ask as a Machine learning engineer when evaluating a model is how to judge the model that you have created. Each Machine Learning Model is trying to solve a problem with a different objective using a different dataset and hence, it is important to understand the context before choosing a metric.
The above diagram will help you understand what metric to choose when evaluating your model.
DAY 74 :
Did a revision of SVMs. Read a blog about it …. #100DaysofMLCode
DAY 75 :
Again going through another Kaggle Kernel.
I was curious how the StandardScaler class in sklearn does feature scaling, so I did a bit of research on it.
Swipe the images to know more about StandardScaler.
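What StandardScaler does under the hood, sketched on toy numbers: it subtracts each column's mean and divides by its standard deviation, so every feature ends up with mean 0 and unit variance.

```python
# Standardisation with sklearn's StandardScaler on made-up data.
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])
X_scaled = StandardScaler().fit_transform(X)
print(X_scaled.mean(axis=0))  # ~[0, 0]
print(X_scaled.std(axis=0))   # ~[1, 1]
```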
DAY 76 :
Learnt about how the skew function works in the scipy.stats library.
Did this before going ahead with the Kaggle Kernel.
Swipe the pictures to know more about the library function.
DAY 77 :
Got a revision of Naive Bayes. Read a blog which explains how Naive Bayes can be used to solve the famous Titanic dataset, where the information of every passenger is given and we have to predict whether a particular passenger survived the disaster or not.
The blog describes it in a very nice format.
The Naive Bayes algorithm works on the principle of conditional probability.
There are three models available in the sklearn library for Naive Bayes. You can see them in the pics above 🔝🔝🔝. There are some pros and cons of the Naive Bayes algorithm:
PROS:
- Computationally fast
- Simple to implement
- Works well with small datasets
- Works well with high dimensions
- Performs well even if the naive assumption is not perfectly met; in many cases the approximation is enough to build a good classifier
CONS:
- Correlated features need to be removed, because they are effectively voted twice in the model, which over-inflates their importance
- If a categorical variable has a category in the test set that was not observed in the training set, the model will assign it zero probability and will not be able to make a prediction. This is often known as "zero frequency". To solve this we can use a smoothing technique; one of the simplest is Laplace estimation, and sklearn applies Laplace smoothing by default when you train a Naive Bayes classifier
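A small sketch with MultinomialNB, one of the three sklearn models, whose alpha=1.0 default is the Laplace smoothing mentioned above (the word-count data is made up):

```python
# Naive Bayes on invented word counts; alpha=1.0 -> Laplace smoothing by default.
from sklearn.naive_bayes import MultinomialNB

# rows = documents, columns = counts of three hypothetical words
X = [[3, 0, 1], [2, 0, 0], [0, 4, 1], [0, 3, 2]]
y = ['sports', 'sports', 'politics', 'politics']

model = MultinomialNB()
model.fit(X, y)
print(model.predict([[1, 0, 0], [0, 2, 0]]))  # ['sports' 'politics']
```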
DAY 78-79 :
Did some research on how speech recognition systems work.
To make a speech recognition system from scratch you need lots and lots of data, which is very hard to collect.
Also did some revision on clustering.
DAY 80 :
Learning how to make a linear regression algorithm from scratch.
Learnt some new terms :
Mean :
mean(x) = sum(x) / count(x)
Variance :
The variance is the sum of the squared differences of each value from the mean.
variance = sum( (x - mean(x))^2 )
Co-Variance :
Covariance is a generalization of correlation. Correlation describes the relationship between two groups of numbers, whereas covariance can describe the relationship between two or more groups of numbers.
covariance = sum((x(i) - mean(x)) * (y(i) - mean(y)))
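The three formulas above, written out in plain Python on a made-up dataset:

```python
# Mean, (unnormalised) variance and covariance, exactly as in the formulas above.
x = [1, 2, 4, 3, 5]
y = [1, 3, 3, 2, 5]

def mean(values):
    return sum(values) / len(values)

def variance(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def covariance(x, y):
    mx, my = mean(x), mean(y)
    return sum((x[i] - mx) * (y[i] - my) for i in range(len(x)))

print(mean(x), variance(x), covariance(x, y))  # mean 3.0, variance 10.0, covariance ~8.0
```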
DAY 81 :
Coded a linear regression algorithm from scratch.
These are the steps involved in making a linear regression algorithm:
1) Load the dataset 2) Create a function for root mean squared error 3) Create the linear regression algorithm function 4) Create functions to calculate mean, variance and covariance 5) Create a function to calculate the coefficients 6) Finally, run the algorithm
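A sketch of those steps on a made-up dataset, using b1 = covariance / variance and b0 = mean(y) - b1 * mean(x) for the coefficients:

```python
# Simple linear regression from scratch: coefficients from the
# covariance/variance formulas, then predictions and RMSE.
x = [1, 2, 4, 3, 5]
y = [1, 3, 3, 2, 5]

mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
var_x = sum((v - mean_x) ** 2 for v in x)
cov_xy = sum((x[i] - mean_x) * (y[i] - mean_y) for i in range(len(x)))

b1 = cov_xy / var_x              # slope
b0 = mean_y - b1 * mean_x        # intercept
predictions = [b0 + b1 * v for v in x]
rmse = (sum((predictions[i] - y[i]) ** 2 for i in range(len(y))) / len(y)) ** 0.5

print(b1, b0)   # slope ~0.8, intercept ~0.4
print(rmse)     # ~0.693
```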
DAY 82 :
Today was Day 1 of the AI Summit at @techfest_iitbombay
First there were two keynotes:
1) On AI in asset management
2) On voice intelligence
Learnt how AI can be used for asset management. What matters most in a field such as asset management is training speed: the data changes continuously and the new data needs to be trained on again.
Then I learnt about voice intelligence and what problems can occur while making a voice assistant.
Then there was a panel session on digital marketing and AI.
After that there was a workshop by @samsungindia on how its personal assistant Bixby works in the background; not exactly how it works, but a brief overview.
Got to know about the fundamental equation of Automatic Speech Recognition (ASR) Systems.
There are three components of ASR :
1) Acoustic Model 2) Language Model 3) Decoder
DAY 83 :
Attended Day 2 of the AI Summit at @techfest_iitbombay.
Got an idea of how Amazon’s Alexa works inside. Also got to know how AI can be applied to health care systems.
Learnt what things to take care of before deploying your AI system.
These are the following points :
1) The difference between consumer AI and enterprise AI: consumer AI has cleaner data and a lower penalty for failure, while enterprise AI is more sophisticated and has higher stakes.
2) UI can be used to camouflage ML shortcomings; it can also convey the confidence level of an AI system.
3) Cost of data:
Obvious: storage, processing and maintenance. Institutional: regulatory compliance and customer trust.
4) Model maintenance: quick fixes, cascading effects, systematic involvement.
DAY 84-86 :
Started analysing another Kaggle kernel, which focuses on a step-by-step method to classify diabetes using LNN.
Then read about a very OSEMN method to create a Machine Learning pipeline.
The method is OSEMN, meaning:
O –> Obtaining data
S –> Scrubbing / cleaning data
E –> Exploring and visualising data
M –> Modelling the data
N –> iNterpreting the data
Also learnt about pair plots in seaborn.
DAY 87-90 :
Learnt about:
- Pearson's correlation coefficient
- Skewed distributions
- Log transformation
Also started analysing another Kaggle challenge.