# How to Find the MLE for Theta Without Calculus

What is the formula to find moles?

Original question: How can one calculate moles? Since "moles (animal)", "moles (skin blemish)", and "moles (measurement)" have all been tagged, I'm not quite sure what you are asking. But since the last tag is "chemistry", I'll assume you are referring to "moles (measurement)".

If so, you need to use Avogadro's number. The proper definition aside (you can read it by following the link above), it is basically the number of somethings in a mole. Remember, a mole is just a number, the same as a dozen or a score. It just so happens that a dozen = 12, a score = 20, and a mole ≈ $6.022 \times 10^{23}$.

Using that value ($6.022 \times 10^{23}\ \text{mol}^{-1}$), you can convert between the number of entities of something (e.g. molecules) and the number of moles, in either direction. If you also have the molecular mass of the substance, you can convert between mass, moles, and number of molecules/atoms/whatever.

A couple of quick examples to answer the question (how one can calculate moles):

If you have 1,000,000,000,000 molecules of sucrose, the number of moles is $\frac{1{,}000{,}000{,}000{,}000}{6.022 \times 10^{23}\ \text{mol}^{-1}} \approx 1.66 \times 10^{-12}\ \text{mol}$.

Maybe you are not interested in calculating how many moles a given number of molecules is. Maybe you are more interested in the number of moles in a given mass of a given substance? Then you need a different equation: $n = \frac{m}{M}$, where $n$ is the number of moles, $m$ is the mass in grams, and $M$ is the molecular weight in grams per mole.

As an example, if you have 10 grams of sucrose, you can quickly google that its molecular weight is 342.3 g/mol. You can then calculate the number of moles by dividing the two: $\frac{10\ \text{g}}{342.3\ \text{g/mol}} \approx 0.029\ \text{mol}$.
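The two conversions above can be sketched in a few lines of Python. This is just a minimal illustration; the constant and function names are mine, not from any chemistry library:

```python
AVOGADRO = 6.022e23  # entities per mole (approximate value used in the text)

def moles_from_count(n_entities):
    """Number of moles in a given count of molecules/atoms."""
    return n_entities / AVOGADRO

def moles_from_mass(mass_g, molar_mass_g_per_mol):
    """Number of moles in a given mass: n = m / M."""
    return mass_g / molar_mass_g_per_mol

# 1e12 sucrose molecules -> roughly 1.66e-12 mol
print(moles_from_count(1_000_000_000_000))

# 10 g of sucrose (M = 342.3 g/mol) -> roughly 0.029 mol
print(moles_from_mass(10.0, 342.3))
```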

How do I find the maximum likelihood estimator for $\theta$ of uniform distribution?

(I'm not sure which interval your uniform distribution is on: $(0, \theta)$ or $(\theta, 2\theta)$? I'm assuming it's $(0, \theta)$ for my answer below.)

The likelihood function $\phi(\theta)$ for a uniform distribution is:

$\phi(\theta) = f(X_1 \mid \theta) \cdot f(X_2 \mid \theta) \cdots f(X_n \mid \theta)$

For $X_i$ between $0$ and $\theta$, $f(X_i \mid \theta) = 1/\theta$. If all the $X_i$ fall inside $(0, \theta)$, then $\phi(\theta) = 1/\theta^n$.

For $X_i > \theta$, $f(X_i \mid \theta) = 0$. If any of the $X_i$ falls outside $(0, \theta)$, then $\phi(\theta) = 0$.

Therefore, to maximize $\phi(\theta)$, the interval needs to "contain" all the $X_i$, which means $\theta \geq \max_i X_i$. And since $1/\theta^n$ is decreasing in $\theta$, the likelihood is largest at the smallest admissible value: $\hat{\theta} = \max_i X_i$. This is your MLE.

These lecture notes from MIT's 18.443 might be what you're looking for.
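A quick numerical sanity check of the argument (assuming the $(0, \theta)$ interval as above; all names here are illustrative): the likelihood is zero for any $\theta$ below the sample maximum and strictly decreasing above it, so a grid search over candidate values lands exactly on the sample maximum.

```python
import random

def uniform_likelihood(theta, xs):
    """Likelihood of theta for a sample assumed to come from Uniform(0, theta)."""
    if theta < max(xs):
        return 0.0      # some observation falls outside (0, theta)
    return theta ** (-len(xs))

random.seed(0)
xs = [random.uniform(0, 5.0) for _ in range(100)]  # true theta = 5.0

# Any theta below the sample maximum has zero likelihood:
print(uniform_likelihood(max(xs) - 0.01, xs))  # -> 0.0

# Grid search over thetas at and above the sample maximum:
grid = [max(xs) + 0.01 * k for k in range(500)]
best = max(grid, key=lambda t: uniform_likelihood(t, xs))
print(best, max(xs))  # the argmax is the sample maximum itself
```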

How do I learn Machine Learning in 10 days?

10 days? Hm, that's definitely a challenging task :). However, I think that 10 days is also definitely a time frame in which you can get a pretty good overview of the machine learning field and maybe get started applying some techniques to your problems.

After reading an introduction to the three different subfields (supervised learning, unsupervised learning, and reinforcement learning), I would probably spend the time on simple (yet useful) algorithms that are representative of these fields (and maybe save reinforcement learning for later): e.g., simple linear regression and ridge regression for regression analysis, logistic regression and k-nearest neighbors for classification, and k-means and hierarchical clustering for clustering tasks. Once you understand the goals of each algorithm and how it tries to solve a particular problem, it is fairly easy to add more algorithms and approaches to your repertoire.

However, besides algorithms, it is also important to know how to prepare your data (feature selection, transformation, and compression) and how to evaluate your models. Maybe, as a starter, you could check out our Machine Learning in scikit-learn tutorial at SciPy 2016. It's approx. 6 hours and summarizes most of the basics while introducing the scikit-learn library, which can come in handy for implementation and further studies.

If you are interested in understanding the math behind the algorithms, Andrew Ng's Coursera course (Machine Learning - Stanford University | Coursera) and my book provide a gentle introduction, but I realize that this is probably not within the scope of 10 days :).
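As a taste of one of the classifiers mentioned above, here is a minimal from-scratch k-nearest-neighbors sketch in plain Python. In practice you would reach for scikit-learn's implementation; the function name and toy data below are entirely made up for illustration:

```python
from collections import Counter
import math

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Sort training points by Euclidean distance to the query.
    dists = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    # Majority vote among the k closest labels.
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Tiny 2-D toy data set: two well-separated clusters.
train_X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
train_y = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(train_X, train_y, (0.15, 0.1)))  # -> a
print(knn_predict(train_X, train_y, (5.05, 5.1)))  # -> b
```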

What do you mean by derivative does not exist at a point?

The derivative is all about approximating a graphed function, at a given point, by a straight line. So if the derivative of a function doesn't exist at a point, it means that the function can't be approximated by a straight line at that point.

You know the steps involved in finding a derivative at a given point. The derivative at a point $x$ is defined as:

$f'(x) = \lim_{\Delta x \to 0} \dfrac{f(x + \Delta x) - f(x)}{\Delta x}$

And in the limiting case we've approximated the curved function at the point $x$ by a line $L$ that is tangent to the graphed function at $x$.

So now you understand how a derivative is defined:

- We require two points $A$ and $B$: $A$ at the given point $x$, and $B$ at a very small difference ($\Delta x$) away, in either direction (the positive or negative $x$-direction).
- Then a chord is drawn through $A$ and $B$.
- Then we make the distance between $A$ and $B$ smaller and smaller until we get a tangent at that point, approximating the function at $x$.

By direction I mean that $B$ could be on either side of $A$; the derivative exists only if the limit comes out the same whichever side $B$ approaches from.

Explain the general principle of maximum likelihood estimation. Suppose you flip a (biased) coin N times and it lands heads K times. Can you derive the ML estimator of the coin’s probability of landing heads? I really don't know how to proceed in this question....

Let the probability of obtaining HEADS by tossing the coin be $p$. We would like to find the maximum likelihood estimator of $p$.

When you toss the coin and check whether the output is HEADS, you observe a random variable $Y$ which follows a $\text{Bernoulli}(p)$ distribution. More compactly, $P(Y = t) = p^t(1-p)^{1-t}$ where $t = 0, 1$.

Now, the first step is to write the likelihood function of $p$ given your sample (call your sample $(X_1, X_2, \ldots, X_N)$). In this case, the likelihood is $L(p \mid X_1, X_2, \ldots, X_N) = p^{\sum_{i=1}^N X_i}(1-p)^{N - \sum_{i=1}^N X_i}$.

In particular, since we are given that there are $K$ heads, $\sum_{i=1}^N X_i = K$. So $L(p \mid X_1, X_2, \ldots, X_N) = p^K(1-p)^{N-K}$. Note that I have assumed all tosses to be independent.

Now, the principle of maximum likelihood estimation is to find the value of the parameter for which the likelihood is maximized, given the observed sample. By simple calculus, maximize the function $L(p \mid X_1, \ldots, X_N)$ with respect to $p$ and note that the value of $p$ at which the likelihood attains its maximum is $\hat{p} = \dfrac{K}{N}$.

Thus, given your sample, the "most likely" value of $p$ is $\dfrac{K}{N}$, which by definition is your maximum likelihood estimator of $p$.
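The "simple calculus" step can be written out explicitly by maximizing the log-likelihood, which has the same maximizer as $L$:

```latex
\begin{aligned}
\ell(p) &= \log L(p \mid X_1, \ldots, X_N) = K \log p + (N - K)\log(1 - p) \\
\ell'(p) &= \frac{K}{p} - \frac{N - K}{1 - p} = 0 \\
&\Rightarrow\ K(1 - p) = (N - K)\,p \ \Rightarrow\ \hat{p} = \frac{K}{N}
\end{aligned}
```

(The second derivative $\ell''(p) = -\frac{K}{p^2} - \frac{N-K}{(1-p)^2} < 0$ confirms this critical point is a maximum.)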

What are some common machine learning interview questions?