In this article, I will take you through 52 Machine Learning (ML) interview questions and answers for freshers. Machine learning is currently one of the most in-demand skill sets across organizations worldwide, so before going for an interview it is worth going through the frequently asked questions below. They will help you crack machine learning based interviews.
52 Machine Learning(ML) Interview Questions and Answers
1. What is Machine Learning ?
Ans. Machine Learning (ML) is a branch of Artificial Intelligence (AI) that deals with developing computer applications which learn from data so that they can make predictions and perform complex tasks in a self-sufficient manner.
2. What are different Machine Learning(ML) algorithms supported by Google's BigQuery ?
Ans. The ML algorithms supported by BigQuery ML are as follows:-
- Linear regression: It is used to forecast numerical values with a linear model
- Binary logistic regression: It is used for classification use cases when the choice is between only two different options (Yes or No, 1 or 0, True or False)
- Multiclass logistic regression: It is used for classification scenarios when the choice is between multiple options
- Matrix factorization: It is used for developing recommendation engines based on past information
- Time series: It is used for forecasting business KPIs leveraging timeseries data from the past
- Boosted tree: It is used for classification and regression use cases with XGBoost
- AutoML Tables: It is used to leverage AutoML capabilities from the BigQuery SQL interface
- Deep Neural Network (DNN): It is used for developing TensorFlow models for classification or regression scenarios without writing any lines of code
3. What are Clustering Algorithms ?
Ans. Clustering algorithms are unsupervised algorithms that group items by similarity. They are a good fit for problems where the goal is to find the closest matches among related items.
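As an illustration of grouping by closeness, here is a minimal k-means sketch in plain Python. k-means is only one of many clustering algorithms, and the helper names, the sample points, and the starting centroids are all made up for this example.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(points, centroids, iterations=10):
    """Group points around the given starting centroids (hypothetical helper)."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [mean_point(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters


# Illustrative data: two visibly separate groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids)
print(clusters)
```

No labels are supplied anywhere, which is what makes this unsupervised: the groups emerge only from the distances between points.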
4. What is Matrix Factorization Algorithm ?
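Ans. Matrix factorization is a powerful algorithm which provides recommendations based on the historical data available. It decomposes a user-item interaction matrix (for example, ratings) into lower-dimensional user and item factor matrices.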
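A rough sketch of the idea, assuming a simple stochastic gradient descent variant with L2 regularization. The function names, hyperparameters, and ratings below are invented for illustration and are not taken from any particular library.

```python
import random


def factorize(ratings, n_users, n_items, n_factors=2,
              epochs=2000, lr=0.01, reg=0.02, seed=0):
    """Learn user factors P and item factors Q from (user, item, rating) triples."""
    rng = random.Random(seed)
    P = [[rng.random() for _ in range(n_factors)] for _ in range(n_users)]
    Q = [[rng.random() for _ in range(n_factors)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(P[u][k] * Q[i][k] for k in range(n_factors))
            for k in range(n_factors):
                pu, qi = P[u][k], Q[i][k]
                # Gradient step on the squared error, with L2 regularization.
                P[u][k] += lr * (err * qi - reg * pu)
                Q[i][k] += lr * (err * pu - reg * qi)
    return P, Q


def predict_rating(P, Q, user, item):
    """Predicted rating is the dot product of the user and item factors."""
    return sum(pu * qi for pu, qi in zip(P[user], Q[item]))


# Hypothetical historical ratings: (user index, item index, rating).
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
           (1, 2, 1.0), (2, 1, 1.0), (2, 2, 5.0)]
P, Q = factorize(ratings, n_users=3, n_items=3)

# Score a pair that has no historical rating (user 0, item 2).
print(predict_rating(P, Q, user=0, item=2))
```

Once the factors are learned, any user-item pair can be scored, including pairs the user never rated. That is what makes the technique useful for recommendations.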
5. What are different machine learning(ML) algorithms supported by ML.NET ?
Ans. Currently ML.NET has a catalogue of the algorithms below:-
- Anomaly detection
- Binary classification
- Time series
6. What is Anomaly detection algorithm ?
Ans. As the name suggests, it is an algorithm which looks for unexpected events in the data(taken over a period of time) submitted to the model.
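One very simple way to flag unexpected values is a z-score check, sketched below. This is an assumption chosen for illustration and not how any specific framework implements anomaly detection. The function name, threshold, and readings are hypothetical.

```python
from statistics import mean, pstdev


def find_anomalies(values, threshold=2.0):
    """Return (index, value) pairs lying more than `threshold` standard
    deviations away from the mean of the series."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [(i, v) for i, v in enumerate(values)
            if abs(v - mu) / sigma > threshold]


# Hypothetical sensor readings taken over time, with one obvious spike.
readings = [10, 11, 9, 10, 12, 10, 11, 50, 10, 9]
print(find_anomalies(readings))  # [(7, 50)]
```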
7. What is Regression algorithm ?
Ans. It is another very powerful machine learning algorithm. It returns a real (continuous) value, as opposed to a binary algorithm or algorithms that return one of a set of specific values.
8. What is Binary Classification algorithm ?
Ans. It is a supervised machine learning algorithm which always returns an output of true or false(0 or 1).
9. What is ML.NET ?
Ans. ML.NET is Microsoft's premier machine learning framework to create, train and run models in the .NET ecosystem.
10. What is Mean Squared Error(MSE) ?
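Ans. It is defined as the average of the squares of the errors, that is, of the differences between the predicted and the actual values. Because squaring magnifies large errors, this metric is mainly used to evaluate models when outliers are critical to the prediction output.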
11. What is R-Squared ?
Ans. R-squared, also called the coefficient of determination, is another way to represent the accuracy of the predictions compared to the test set. It is calculated as 1 minus the ratio of two sums: the sum of squared differences between every data point and its prediction, divided by the sum of squared differences between every data point and the mean. A value close to 1 indicates a better fit.
12. What is Mean Absolute Error(MAE) ?
Ans. This metric is very similar to the Mean Squared Error (MSE), with the difference that it averages the absolute distances between the points and the prediction line instead of the squared distances.
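The three regression metrics from questions 10 to 12 can be sketched in a few lines of plain Python. The function names and the sample values are illustrative only.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mean_absolute_error(actual, predicted):
    """Average of the absolute differences between actual and predicted."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Made-up test-set values and model predictions.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 8.0, 9.0]

print(mean_squared_error(actual, predicted))   # 0.5
print(mean_absolute_error(actual, predicted))  # 0.5
print(r_squared(actual, predicted))            # 0.9
```

MSE and MAE coincide here only because every non-zero error is exactly 1. With larger errors, MSE grows faster than MAE.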
13. What is Accuracy ?
Ans. Accuracy is the proportion of correct predictions out of all predictions made on the test dataset. Values closer to 100% are more accurate.
14. What is Precision ?
Ans. It is the ratio of correctly predicted positive observations to the total predicted positive observations in the test data set.
15. What is Recall ?
Ans. It is the ratio of correctly predicted positive observations to all actual positive observations in the test data set.
16. What is F1 Score ?
Ans. F1 Score is the harmonic mean of precision and recall. A value close to or equal to 100% indicates that both precision and recall are high.
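Accuracy, precision, recall and F1 score can all be derived from the same four counts: true and false positives and negatives. The sketch below uses made-up labels, and its function name is hypothetical.

```python
def classification_metrics(actual, predicted):
    """Compute accuracy, precision, recall and F1 for 0/1 labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # true positives
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # true negatives
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false positives
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Illustrative test set: 4 actual positives and 6 actual negatives.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
print(classification_metrics(actual, predicted))
# {'accuracy': 0.8, 'precision': 0.75, 'recall': 0.75, 'f1': 0.75}
```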
17. What is Log-Loss Reduction ?
Ans. It is basically an evaluation metric describing the accuracy of the classifier as compared to a random prediction. A value close to or equal to 1 is preferred, as the model's relative accuracy improves as the value approaches 1.
18. What is Log Loss ?
Ans. It is an evaluation metric describing the accuracy of the classifier. Log Loss takes into account the difference between the model's prediction and the actual classification. A value close to 0 is preferred, as a value of 0 indicates the model's prediction on the test set is perfect.
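A small sketch of both metrics for a binary classifier follows. The log-loss formula is the standard one. The log-loss reduction formula rests on an assumption: the baseline is a classifier that always predicts the share of positives in the data, and the reduction is reported relative to that baseline's log loss. Function names and data are illustrative.

```python
import math


def log_loss(actual, probabilities, eps=1e-15):
    """Average negative log-likelihood of the true 0/1 labels."""
    total = 0.0
    for y, p in zip(actual, probabilities):
        p = min(max(p, eps), 1 - eps)  # keep log() away from 0
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(actual)


def log_loss_reduction(actual, probabilities):
    """Relative improvement over always predicting the positive-class rate.

    Assumes both classes occur in `actual`, so the baseline loss is non-zero.
    """
    prior = sum(actual) / len(actual)
    baseline = log_loss(actual, [prior] * len(actual))
    return (baseline - log_loss(actual, probabilities)) / baseline


# Made-up labels and predicted probabilities of the positive class.
actual = [1, 0, 1, 0]
probabilities = [0.9, 0.1, 0.8, 0.2]
print(log_loss(actual, probabilities))
print(log_loss_reduction(actual, probabilities))
```

A confident and correct model drives the log loss toward 0 and the reduction toward 1. A model that is no better than the baseline has a reduction of about 0.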
19. What is Multiclass Logistic Regression ?
Ans. Multiclass logistic regression is a classification technique that can be used to categorize events, objects, customers, or other entities into multiple classes.
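At prediction time, a multiclass logistic regression model typically turns one score per class into probabilities with the softmax function and picks the most likely class. The sketch below shows only that last step. The scores and class names are hypothetical, and no training is performed.

```python
import math


def softmax(scores):
    """Convert raw per-class scores into probabilities that sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_class(scores, class_names):
    """Return the most probable class and its probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs[best]


# Hypothetical scores produced by a trained model for one input.
label, probability = predict_class([2.0, 1.0, 0.1], ["cat", "dog", "bird"])
print(label, probability)
```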
20. What is Binary Logistic Regression ?
Ans. Binary Logistic Regression is a classification ML technique that can be used to predict a categorical variable which assumes only two values - true or false.
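The prediction step of a binary logistic regression model can be sketched as a weighted sum passed through the sigmoid function and compared with a threshold. The weights, bias and feature values below are invented for illustration. In practice they would be learned from training data.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(features, weights, bias):
    """Probability that the example belongs to the positive class."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)


def predict(features, weights, bias, threshold=0.5):
    """True if the positive-class probability reaches the threshold."""
    return predict_probability(features, weights, bias) >= threshold


# Hypothetical, already-trained parameters for a two-feature model.
weights, bias = [1.5, -0.5], -1.0
print(predict_probability([2.0, 1.0], weights, bias))
print(predict([2.0, 1.0], weights, bias))
```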
21. What is Area Under Precision-Recall Curve ?
Ans. The area under the precision-recall curve (AUC-PR) is a model performance metric for binary responses that is appropriate for rare events and is not dependent on model specificity.
22. Where can we apply binary logistic regression ?
Ans. It can be applied to a case when the variable to predict is binary and can assume only two values, such as true or false, yes or no, or 1 or 0.
23. What are the use cases of Clustering Model ?
Ans. Some of the potential use cases could be:-
- Natural disaster tracking such as earthquakes or hurricanes and creating clusters of high-danger zones
- Book or document grouping based on the authors, subject matter, and sources
- Grouping customer data into targeted marketing predictions
- Search result grouping of similar results that other users found useful
24. What are the use cases of matrix factorization ?
Ans. Some of the potential use cases could be:-
- Music recommendations
- Product recommendations
- Movie recommendations
- Book recommendations
25. What is Autoregressive Integrated Moving Average (ARIMA) ?
Ans. It is a statistical analysis model that uses time series data to either better understand the data set or to predict future trends. A statistical model is autoregressive if it predicts future values based on past values.
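To make "predicts future values based on past values" concrete, the sketch below fits only the autoregressive part with a single lag, an AR(1) model, by least squares. It is not a full ARIMA implementation: there is no differencing and no moving-average term. The function names and the series are illustrative.

```python
def fit_ar1(series):
    """Fit y[t] = intercept + slope * y[t-1] by ordinary least squares."""
    previous, current = series[:-1], series[1:]
    mean_prev = sum(previous) / len(previous)
    mean_curr = sum(current) / len(current)
    covariance = sum((p - mean_prev) * (c - mean_curr)
                     for p, c in zip(previous, current))
    variance = sum((p - mean_prev) ** 2 for p in previous)
    slope = covariance / variance
    intercept = mean_curr - slope * mean_prev
    return intercept, slope


def forecast(series, steps, intercept, slope):
    """Roll the fitted model forward, feeding each prediction back in."""
    last = series[-1]
    predictions = []
    for _ in range(steps):
        last = intercept + slope * last
        predictions.append(last)
    return predictions


# Made-up series that follows y[t] = 2 + 0.5 * y[t-1] exactly.
series = [10.0, 7.0, 5.5, 4.75, 4.375]
intercept, slope = fit_ar1(series)
print(intercept, slope)
print(forecast(series, 2, intercept, slope))
```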
26. What is Stochastic Dual Coordinate Ascent (SDCA) algorithm ?
Ans. Stochastic Dual Coordinate Ascent (SDCA) has recently emerged as a state-of-the-art method for solving large-scale supervised learning problems formulated as minimization of convex loss functions. It performs iterative, random-coordinate updates to maximize the dual objective.
27. What is Entropy ?
Ans. Entropy is a measure of the randomness or disorder in the information being processed in a machine learning project. For a two-class problem with base-2 logarithms it lies between 0 and 1. With more than two classes it can exceed 1.
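A short sketch of Shannon entropy over a list of class labels, using base-2 logarithms. The function name and labels are illustrative. With two classes the result stays between 0 and 1, while three or more balanced classes give a value above 1.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of the class distribution in `labels`."""
    total = len(labels)
    proportions = [count / total for count in Counter(labels).values()]
    return sum(-p * math.log2(p) for p in proportions)


print(entropy(["yes", "yes", "yes", "yes"]))  # 0.0 -> perfectly ordered
print(entropy(["yes", "no", "yes", "no"]))    # 1.0 -> maximum disorder for two classes
print(entropy(["a", "b", "c"]))               # above 1 with three balanced classes
```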
28. What is Support Vector Learning ?
Ans. A “Support Vector Machine” (SVM) is a supervised machine learning algorithm that can be used for both classification and regression challenges. However, it is mostly used in classification problems.
29. What is training in ML Model ?
Ans. The creation phase of the ML Model is called training. In this phase, a machine learning algorithm is fed with training data from which it can learn.
30. What is classifier in ML Model ?
Ans. A classifier in an ML model is an algorithm which categorizes data into one of a set of classes, each identified by a label.
31. What is Supervised Learning ?
Ans. It is a machine learning technique in which the model is trained on a dataset of inputs paired with their correct outputs (labels). The aim of a supervised learning algorithm is to find a mapping function to map the input variable (x) to the output variable (y).
32. What is Collaborative Filtering(CF) algorithm ?
Ans. Collaborative Filtering is a machine learning algorithm which is used for identifying relationships between pieces of data. This technique is frequently used in recommender systems to identify similarities between user data and items.
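One common way to identify similar users is to compare their rating vectors with cosine similarity, as sketched below. This is a simplified user-based variant. The user names and ratings are made up, and treating 0 as "not rated" is an assumption made purely to keep the example short.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two rating vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar_user(target, ratings):
    """Name of the other user whose ratings are closest to the target's."""
    others = [user for user in ratings if user != target]
    return max(others,
               key=lambda user: cosine_similarity(ratings[target], ratings[user]))


# Hypothetical ratings of four items per user (0 means "not rated").
ratings = {
    "alice": [5, 4, 0, 1],
    "bob": [4, 5, 0, 1],
    "carol": [1, 0, 5, 4],
}
print(most_similar_user("alice", ratings))  # bob
```

Items that the most similar user rated highly, and that the target user has not seen yet, become candidate recommendations.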
33. What is eXtreme Gradient Boosting (XGBoost) ?
Ans. It is known to be one of the most powerful machine learning (ML) libraries that data scientists use to solve complex use cases in an efficient and flexible way. It provides a portable gradient boosting framework for different languages.
34. What is Artificial Neural Network(ANN) ?
Ans. An Artificial Neural Network is a computational model composed of multiple interconnected nodes, much like the neurons connected in our brain.
35. What is Capsule Neural Networks(CapsNet) ?
Ans. It is a type of artificial neural network which uses the idea of adding structures called capsules to a convolutional neural network (CNN). A capsule is a group of neurons that learn to detect different features of an object like position, size, orientation, deformation, velocity, albedo, hue, texture, and so on.
36. What are different types of DNNs allowed to be created by BigQuery ML ?
Ans. BigQuery ML allows you to create two different types of DNNs:-
- DNNClassifier: This algorithm can be used to classify events, objects, or entities into a finite number of discrete classes.
- DNNRegressor: This kind of model is similar to the previous one with the difference that it returns a continuous result. For this reason, it can be used to predict numerical values.
37. What is the difference between training and inference ?
Ans. In the training phase, a developer feeds their model a curated dataset so that it can “learn” everything it needs to about the type of data it will analyze. Then, in the inference phase, the model can make predictions based on live data to produce actionable results.
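The two phases can be sketched with a deliberately tiny model: a straight line fitted by least squares. The `train` and `infer` function names and the data are hypothetical. Real frameworks expose their own APIs for each phase.

```python
def train(xs, ys):
    """Training phase: learn slope and intercept from a curated dataset."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def infer(model, x):
    """Inference phase: apply the already-trained model to new data."""
    return model["slope"] * x + model["intercept"]


# Training happens once, on historical (made-up) data ...
model = train(xs=[1.0, 2.0, 3.0, 4.0], ys=[3.0, 5.0, 7.0, 9.0])

# ... and inference is repeated for each new, unseen input.
print(infer(model, 5.0))  # 11.0
```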
38. What is Core ML ?
Ans. It is a suite of tools used to facilitate the process of bringing ML models to iOS and wrapping them in a standard interface so that you can easily access and make use of them in your code.
39. What are the main components required for Machine Learning(ML) ?
Ans. The main components required for this process, specifically for supervised learning, include: -
- Input data points: For image classification, we would require images of the domain we want to classify, for example, animals.
- The expected outputs for these inputs: Continuing from our previous example of image classification of animals, the expected outputs could be labels associated with each of the images, for example, cat, dog, and many more.
- An ML algorithm: This is the algorithm used to automatically learn how to transform the input data points into a meaningful output. The set of rules it derives is what we call the model, and the process of learning them is called training.
40. What is Convolutional Neural Networks(CNN) ?
Ans. A Convolutional Neural Network (CNN) is a type of artificial neural network which uses convolution in place of general matrix multiplication in at least one of its layers. It is widely used for image recognition and processing, and it eliminates the need for manual feature extraction.
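The core operation can be sketched as sliding a small kernel over an image and summing the element-wise products. As in most deep learning libraries, the kernel is not flipped here, so strictly speaking this is cross-correlation. In a real CNN the kernel values are learned, so the hand-written edge detector below is only an illustration.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    out_h = len(image) - kernel_h + 1
    out_w = len(image[0]) - kernel_w + 1
    return [
        [
            sum(image[row + i][col + j] * kernel[i][j]
                for i in range(kernel_h) for j in range(kernel_w))
            for col in range(out_w)
        ]
        for row in range(out_h)
    ]


# A 4x4 "image" that is dark on the left and bright on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hand-written kernel that responds to a dark-to-bright vertical edge.
kernel = [
    [-1, 1],
    [-1, 1],
]
print(convolve2d(image, kernel))  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The feature map is non-zero only in the column where the edge sits, which is the kind of feature extraction a CNN learns instead of requiring it to be designed by hand.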
41. What is AI Platform notebooks ?
Ans. It is a fully managed service that allows data engineers and data scientists to use a JupyterLab development environment. This service allows us to develop, evaluate, test, and deploy machine learning models.
42. What are the advantages of using AI Platform Notebooks with JupyterLab ?
Ans. Below are a few of the advantages of using AI Platform Notebooks with JupyterLab:-
- We can easily set up our pre-configured machine learning environments with the most important and useful ML libraries, such as TensorFlow, Keras, PyTorch, and others.
- We can leverage the scalability of the cloud by increasing the size of the hardware resources according to our requirements. For example, we can improve the performance of our notebook by scaling up the RAM or adding Graphics Processing Units (GPUs).
- We are allowed to access the other Google Cloud services from our notebook without the need to perform any additional configuration. From AI Platform Notebooks, we can easily leverage BigQuery, Dataflow, and Dataproc.
- We can integrate our development environment with code versioning applications such as Git.
- We can easily share our notebooks with colleagues or friends, thus making collaboration faster and increasing productivity.
43. What is TensorFlow ?
Ans. TensorFlow is an open source library that's used to develop ML models. It's very flexible and can be used to address a wide variety of use cases and business scenarios.
44. What is Inductive Logic Programming(ILP) ?
Ans. Inductive Logic Programming is a subfield of machine learning which uses logic programming as a uniform representation for examples, background knowledge and hypotheses. Inductive logic programming is particularly useful in bioinformatics and natural language processing.
45. Which of the Python libraries are currently available for use in Machine Learning ?
Ans. Below are a few of the libraries frequently used for machine learning:-
46. Which of the Java libraries are currently available for use in Machine Learning ?
Ans. Below are a few of the Java libraries frequently used for machine learning:-
- Apache Mahout
- Spark MLlib
- The Encog Machine Learning Framework
47. Which of the Go libraries are currently available for use in Machine Learning ?
Ans. Below are a few of the Go libraries frequently used for machine learning:-
48. What is Unsupervised Learning ?
Ans. As its name indicates, it is a type of machine learning technique in which models are not supervised using a labeled training dataset. Instead, they find patterns and structure in unlabeled data on their own.
49. Which of the C++ libraries are currently available for use in Machine Learning ?
Ans. Below are a few of the C++ libraries frequently used for machine learning:-
- Microsoft Cognitive Toolkit(CNTK)
- mlpack library
50. Which of the R libraries are currently available for use in Machine Learning ?
Ans. Below are a few of the R libraries frequently used for machine learning:-
52. What is Deep Learning ?
Ans. Deep learning is a subset of machine learning inspired by the layered structure of the brain. It structures algorithms in layers to create an artificial neural network, just like the networks of neurons in the brain.