#   | Question | Topic
Q1  | What are the differences between classification and clustering? | Week1
Q2  | In which cases is the normal equation a good choice for a linear regression problem (in terms of the dimension and size of the data)? | Week2
Q3  | Why do we use different types of error functions: mean squared error and cross-entropy error? | Week2
Q4  | Discuss the relation between the mixing coefficient (𝛑) and the responsibility (γ) in a GMM. | Week3
Q5  | Discuss the relation between GMM and HMM based on their corresponding parameters. | Week4
Q6  | Why do we marginalize out z_{k-1} in the derivation of the Viterbi decoding algorithm? | Week4
Q7  | What is the physical meaning of the Lagrange multipliers shown in the dual problem of the SVM? | Week5
Q8  | Kernel trick: why is it called a trick? | Week5
Q9  | What is the difference between XᵀX and the covariance matrix cov(X)? (See the sketch after the table.) | Week6
Q10 | Explain the relation between principal components and the eigenvalues/eigenvectors of the covariance matrix cov(X). | Week6
Q11 | Explain why the cross-entropy error function goes well with the softmax activation function. | Week7
Q12 | Prove the partial derivative shown in slide 33 (it is related to Q11). | Week8
Q13 | What do 1) epoch and 2) batch mean in the training of a machine learning model? | Week9
Q14 | Explain the difference between gradient descent (GD) and stochastic gradient descent (SGD). (See the sketch after the table.) | Week9
Q15 | Why is the LSTM more robust against the vanishing gradient problem than a vanilla RNN? | Week10
Q16 | Explain what dropout is and how it can be applied in an RNN. | Week11
Q17 | Discuss pros and cons of both VAE and GAN models. | Week12
Q18 | What is Batch Normalization and why do we use it? | Week13
Q19 | Explain why the action-value function is preferable to the state-value function in RL. | Week14
Q20 | How does experience replay help the convergence of DQN? | Week14
Q21 | How does entropy regularization help exploration in policy-gradient (PG) based RL algorithms? | Week15
Q22 | Your comments on this course for its future improvement. | PML
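
A minimal numpy sketch related to Q9: it contrasts the raw second-moment matrix XᵀX with the sample covariance cov(X) on a toy data matrix. The data, variable names, and sizes are illustrative assumptions, not taken from the course materials.

```python
import numpy as np

# Toy data matrix: n samples in rows, d features in columns (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(loc=2.0, scale=1.0, size=(100, 3))
n = X.shape[0]

# Raw second-moment matrix: uses the uncentered data.
second_moment = X.T @ X

# Sample covariance: subtract the column means, then normalize by (n - 1).
Xc = X - X.mean(axis=0)
cov_manual = Xc.T @ Xc / (n - 1)

# np.cov treats rows as variables by default, hence rowvar=False here.
cov_np = np.cov(X, rowvar=False)

print(np.allclose(cov_manual, cov_np))               # True: centered X^T X / (n - 1) equals cov(X)
print(np.allclose(second_moment / (n - 1), cov_np))  # False: the mean offset is still in X^T X
```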
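A second minimal sketch, related to Q14: full-batch gradient descent versus per-sample stochastic gradient descent on a toy linear regression problem. The learning rate, epoch count, and data are arbitrary illustrative choices, not the course's setup.

```python
import numpy as np

def mse_grad(w, Xb, yb):
    # Gradient of the mean squared error for linear regression on a (mini-)batch.
    return 2.0 * Xb.T @ (Xb @ w - yb) / len(yb)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=200)

lr, epochs = 0.1, 50

# Gradient descent (GD): one update per epoch, computed on the full dataset.
w_gd = np.zeros(3)
for _ in range(epochs):
    w_gd -= lr * mse_grad(w_gd, X, y)

# Stochastic gradient descent (SGD): one noisy update per sample, in shuffled order.
w_sgd = np.zeros(3)
for _ in range(epochs):
    for i in rng.permutation(len(y)):
        w_sgd -= lr * mse_grad(w_sgd, X[i:i + 1], y[i:i + 1])

print(w_gd)   # close to w_true
print(w_sgd)  # also close to w_true, but reached via noisier per-sample updates
```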