What is the formula to find moles?

Original question: How can one calculate moles?

Since “moles (animal)”, “moles (skin blemish)”, and “moles (measurement)” have all been tagged, I’m not quite sure what you are asking. But since the last tag is “chemistry”, I’ll assume you are referring to “moles (measurement)”.

If so, you need to use Avogadro’s number. The proper definition aside, it is basically the number of things in a mole. Remember, a mole is just a number, same as a dozen or a score. It just so happens that a dozen = 12, a score = 20, and a mole = [math]6.022 \times 10^{23}[/math] (approximately).

Using that ([math]6.022 \times 10^{23}\ \text{mol}^{-1}[/math]), you can convert between the number of moles and a count of something (e.g. molecules). If you also have the molecular mass of the substance, you can convert between mass, moles, and number of molecules/atoms/whatever.

A couple of quick examples to answer the question (how one can calculate moles):

If you have 1,000,000,000,000 ([math]10^{12}[/math]) molecules of sucrose, you can calculate the number of moles by dividing by Avogadro’s number: [math]\frac{10^{12}}{6.022 \times 10^{23}\ \text{mol}^{-1}} \approx 1.66 \times 10^{-12}\ \text{mol}[/math].

Maybe you are not interested in how many moles a given number of molecules is, and are more interested in the number of moles in a given mass of a given substance? Then you need a different equation: [math]n=\frac{m}{M}[/math], where [math]n[/math] is the number of moles, [math]m[/math] is the mass in grams, and [math]M[/math] is the molecular weight in grams per mole.

As an example, if you have 10 grams of sucrose, you can quickly google that its molecular weight is 342.3 g/mol. You can then calculate the number of moles by dividing the two: [math]\frac{10\ \text{g}}{342.3\ \text{g/mol}} \approx 0.029\ \text{mol}[/math].
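The two conversions above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and the constant uses the same rounded value of Avogadro’s number as the answer.

```python
# Rounded value of Avogadro's number, as used in the answer (per mole).
AVOGADRO = 6.022e23


def moles_from_count(n_particles):
    """Moles corresponding to a count of particles (molecules, atoms, ...)."""
    return n_particles / AVOGADRO


def moles_from_mass(mass_g, molar_mass_g_per_mol):
    """Moles from mass in grams and molecular weight in g/mol: n = m / M."""
    return mass_g / molar_mass_g_per_mol


def count_from_moles(moles):
    """Number of particles contained in a given number of moles."""
    return moles * AVOGADRO


if __name__ == "__main__":
    # The two sucrose examples from the answer.
    print(moles_from_count(1e12))       # roughly 1.66e-12 mol
    print(moles_from_mass(10, 342.3))   # roughly 0.029 mol
```

Going from mass to a particle count is then just the two steps chained: `count_from_moles(moles_from_mass(10, 342.3))`.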

How do I find the maximum likelihood estimator for [math] \theta [/math] of uniform distribution?

(I’m not sure which interval your uniform distribution is defined on: [math](0, \theta)[/math] or [math](\theta, 2\theta)[/math]? I’m assuming it’s [math](0, \theta)[/math] for my answer below.)

The likelihood function [math]\phi(\theta)[/math] for a sample [math]X_1, \dots, X_n[/math] from a uniform distribution is:

[math]\phi(\theta) = f(X_1|\theta) \cdot f(X_2|\theta) \cdots f(X_n|\theta)[/math]

For [math]X_i[/math] between [math]0[/math] and [math]\theta[/math], [math]f(X_i|\theta) = 1 / \theta[/math]. So if all the [math]X_i[/math] fall inside [math](0, \theta)[/math], [math]\phi(\theta) = 1 / \theta^n[/math].

For [math]X_i > \theta[/math], [math]f(X_i|\theta) = 0[/math]. So if any of the [math]X_i[/math] falls outside [math](0, \theta)[/math], [math]\phi(\theta) = 0[/math].

Therefore, to maximize [math]\phi(\theta)[/math], [math]\theta[/math] needs to “contain” all the [math]X_i[/math], which means:

[math]\theta \ge \max_i X_i[/math]

Within that range, [math]1 / \theta^n[/math] decreases as [math]\theta[/math] grows, so the likelihood is largest at the smallest allowed value: [math]\hat{\theta} = \max_i X_i[/math]. This is your MLE.

These lecture notes from MIT’s 18.443 might be what you’re looking for.
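To make the argument concrete, here is a small Python sketch of the likelihood and the resulting estimator. The function names are hypothetical, and the code assumes the [math](0, \theta)[/math] case from the answer, treating a sample point equal to [math]\theta[/math] as inside the interval.

```python
def uniform_likelihood(theta, xs):
    """Likelihood of theta for samples xs assumed drawn from Uniform(0, theta).

    Returns 0 if theta fails to cover every sample, else 1 / theta**n.
    """
    if theta <= 0 or max(xs) > theta:
        return 0.0
    return theta ** (-len(xs))


def uniform_mle(xs):
    """Maximum likelihood estimate of theta: the largest observation."""
    return max(xs)


if __name__ == "__main__":
    sample = [0.8, 2.5, 1.7]  # made-up data for illustration
    theta_hat = uniform_mle(sample)
    print(theta_hat)
    # Larger theta lowers the likelihood; smaller theta drives it to zero.
    print(uniform_likelihood(theta_hat, sample))
    print(uniform_likelihood(theta_hat + 0.5, sample))
    print(uniform_likelihood(theta_hat - 0.1, sample))
```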

How do I learn Machine Learning in 10 days?

10 days? Hm, that’s definitely a challenging task :). However, I think 10 days is also a time frame in which you can get a pretty good overview of the machine learning field and maybe start applying some techniques to your own problems.

After reading an introduction to the 3 different subfields (supervised learning, unsupervised learning, and reinforcement learning), I would probably spend the time on simple (yet useful) algorithms that are representative of these fields (and maybe save reinforcement learning for later). E.g., simple linear regression and ridge regression for regression analysis, logistic regression and k-nearest neighbors for classification, and k-means and hierarchical clustering for clustering tasks. Once you understand the goals of each algorithm and how it tries to solve a particular problem, it is fairly easy to add more algorithms and approaches to your repertoire.

However, besides algorithms, it is also important to know how to prepare your data (feature selection, transformation, and compression) and how to evaluate your models. Maybe, as a starter, you could check out our Machine Learning in scikit-learn tutorial at SciPy 2016. It’s approx. 6 hours and summarizes most of the basics while introducing the scikit-learn library, which can come in handy for implementation and further studies.

If you are interested in understanding the math behind the algorithms, Andrew Ng’s Coursera course Machine Learning - Stanford University | Coursera (and my book) provide a gentle introduction, but I realize that this is probably not within the scope of 10 days :).

What do you mean by derivative does not exist at a point?

A derivative is all about approximating a given graphed function, at a given point, by a straight line. So if the derivative of a function doesn’t exist at a point, it means that the function can’t be approximated by a straight line at that point.

You know the steps involved in deriving a derivative at a given point. The derivative at a point [math]x[/math] is defined as:

[math]f'(x) = \lim_{\Delta x \to 0} \frac{f(x + \Delta x) - f(x)}{\Delta x}[/math]

In the limiting case we’ve approximated the curved function at point [math]x[/math] by a line: the line L that is tangent to the graphed function at [math]x[/math].

So now you understand how a derivative is defined:

1. We require 2 points A and B: A at the given point [math]x[/math], and B a very small difference ([math]\Delta x[/math]) away in either direction (+ve or -ve along the X-axis).
2. A chord is drawn through A and B.
3. We make the distance between A and B smaller and smaller until we get a tangent at that point, approximating the function at point [math]x[/math].

By direction I mean that B could lie on either side of A (A > B or A < B).
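The remark about approaching from either direction can be checked numerically. The sketch below is an illustration added here, not part of the original answer: it computes the chord slope from the left and from the right for a small [math]\Delta x[/math]. For [math]|x|[/math] at 0 the two slopes disagree, so no single tangent line fits; for [math]x^2[/math] at 1 they agree. The helper names are hypothetical.

```python
def difference_quotient(f, x, dx):
    """Slope of the chord through (x, f(x)) and (x + dx, f(x + dx))."""
    return (f(x + dx) - f(x)) / dx


def one_sided_slopes(f, x, dx=1e-6):
    """Chord slopes with B to the left of A and to the right of A."""
    left = difference_quotient(f, x, -dx)
    right = difference_quotient(f, x, dx)
    return left, right


if __name__ == "__main__":
    # |x| has a corner at 0: the two slopes differ, so no derivative there.
    print(one_sided_slopes(abs, 0.0))
    # x**2 is smooth at 1: both slopes are close to the same value.
    print(one_sided_slopes(lambda x: x * x, 1.0))
```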

Explain the general principle of maximum likelihood estimation. Suppose you flip a (biased) coin N times and it lands heads K times. Can you derive the ML estimator of the coin’s probability of landing heads? I really don't know how to proceed in this question.

Let the probability of obtaining heads when tossing the coin be [math]p[/math]. We would like to find the maximum likelihood estimator of [math]p[/math].

When you toss the coin and check whether it lands heads, you observe a random variable [math]X[/math] which follows a [math]\text{Bernoulli}(p)[/math] distribution. More compactly, [math]P(X=t)=p^t(1-p)^{1-t}[/math] where [math]t \in \{0,1\}[/math].

Now, the first step is to write the likelihood function of [math]p[/math] given your sample (call your sample [math](X_1,X_2,\dots,X_N)[/math], with observed values [math]x_1,x_2,\dots,x_N[/math]). Assuming all tosses are independent, the likelihood is the product of the individual probabilities: [math]L(p \mid x_1,\dots,x_N)=p^{\sum_{i=1}^N x_i}(1-p)^{N-\sum_{i=1}^N x_i}[/math].

In particular, since we are given that there are [math]K[/math] heads, [math]\sum_{i=1}^N x_i=K[/math]. So [math]L(p \mid x_1,\dots,x_N)=p^K(1-p)^{N-K}[/math].

Now, the principle of maximum likelihood estimation is to find the value of the parameter for which the likelihood is maximized, given the observed sample.

By simple calculus, maximize [math]L(p \mid x_1,\dots,x_N)[/math] with respect to [math]p[/math]. It is easier to work with the logarithm, [math]\log L = K \log p + (N-K)\log(1-p)[/math]. Setting its derivative to zero gives [math]\frac{K}{p} - \frac{N-K}{1-p} = 0[/math], i.e. [math]K(1-p) = (N-K)p[/math], so the maximum is attained at [math]\hat{p}=\dfrac{K}{N}[/math].

Thus, given your sample, the "most likely" value of [math]p[/math] is [math]\dfrac{K}{N}[/math], which by definition is your maximum likelihood estimator of [math]p[/math].
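As a sanity check on the result [math]\hat{p} = K/N[/math], the log-likelihood can be evaluated on a grid of candidate values of [math]p[/math] and the best one picked out. This is only a brute-force illustration with hypothetical function names, not how one would normally compute the estimator.

```python
import math


def bernoulli_log_likelihood(p, k, n):
    """Log-likelihood of p after observing k heads in n independent tosses."""
    return k * math.log(p) + (n - k) * math.log(1 - p)


def grid_mle(k, n, steps=1000):
    """Brute-force search for the p in (0, 1) that maximizes the log-likelihood."""
    grid = [i / steps for i in range(1, steps)]
    return max(grid, key=lambda p: bernoulli_log_likelihood(p, k, n))


if __name__ == "__main__":
    k, n = 7, 10  # made-up data: 7 heads in 10 tosses
    print(grid_mle(k, n))  # lands on (or next to) k / n
    print(k / n)
```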

What are some common machine learning interview questions?

We'd ask the following types/examples of questions, not all of which are considered pass/fail, but they do give us a reasonably comprehensive picture of the candidate's depth in this area.

In general, pick one or two (that the candidate is good at) and keep going deeper and deeper, rather than going horizontally through some checklist. It is far more indicative of depth.

General mastery: when you really understand something (e.g., you've gone through the cycle of learning-doing-teaching-doing), you can express seemingly complex concepts in simple ways. Or you develop insightful views on things at a broader level and can explain them to others. E.g.:

- Discuss your views on the relationship between machine learning and statistics.
- Talk about how Deep Learning (or XYZ method) fits (or not?) within the field.
- Isn't it all just curve fitting? Talk about that.
- How are kernel methods different?
- Why do we need/want the bias term?
- Why do we call it GLM when it's clearly non-linear? (A somewhat tricky question, to be asked somewhat humorously, but extremely revealing.)
- How are neural nets related to Fourier transforms? What are Fourier transforms, for that matter?
- Etc.

ML skills specific: E.g.:

- Pick an algorithm you like and walk me through the math and then the implementation of it, in pseudo-code.
- OK, now let's pick another one, maybe more advanced.
- Discuss the meaning of the ROC curve, and write pseudo-code to generate the data for such a curve.
- Discuss how you go about feature engineering (look for both intuition and specific evaluation techniques).
- Etc.

Distributed systems (our needs): E.g.:

- Discuss MapReduce (or your favorite parallelization abstraction). Why is MapReduce referred to as a "shared-nothing" architecture (clearly the nodes have to share something, no?) What are the advantages/disadvantages of "shared-nothing"?
- Pick an algorithm. Write the pseudo-code for its parallel version.
- What are the trade-offs between closed-form and iterative implementations of an algorithm, in the context of distributed systems?
- Etc.

Other (hands-on experience, past accomplishments, etc.):

- Do you have experience with R (or Weka, Scikit-learn, SAS, Spark, etc.)? Tell me what you've done with that. Write some example data pipelines in that environment.
- Tell me about a time when you ... { worked on a project involving ML ; optimized an algorithm for performance/accuracy/etc. }
- Estimate the amount of time in your past project spent on each segment of your data mining/machine learning work.
- Etc.
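As an illustration of what an answer to the ROC question above might look like, here is one possible sketch in Python rather than pseudo-code. It is not the interviewers' reference answer. The function name and the input format (a list of classifier scores and a parallel list of 0/1 labels) are assumptions made for this example. The idea is to sort by score, lower the decision threshold one distinct score at a time, and record the false positive rate and true positive rate after each step.

```python
def roc_points(scores, labels):
    """Return the (false positive rate, true positive rate) points of a ROC curve.

    scores: classifier scores, higher means "more likely positive".
    labels: true classes, 1 for positive and 0 for negative.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")

    # Visit examples from the highest score to the lowest.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])

    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Examples with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


if __name__ == "__main__":
    # Made-up scores and labels for illustration.
    for fpr, tpr in roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]):
        print(fpr, tpr)
```

Plotting the returned points with the false positive rate on the x-axis and the true positive rate on the y-axis gives the curve. A good follow-up discussion is why tied scores have to be handled as a group.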

What is an intuitive explanation for the expectation maximization (EM) algorithm?

Note that sometimes E-M is used to describe a class of algorithms, as well as a particular algorithm.

Here's an analogy that may help (note this is more an instance of EM, but you can see the patterns here): you've never seen fruit in your life before. I dump a bunch of 50 roughly spherical fruit onto your table: everything from an inch to a foot across. I only tell you one thing: there are 5 types of fruit. But you do know that these fruit come from different types of trees, and that trees don't just produce random stuff; they tend to produce similar things.

How do you go about organizing the fruit to work this out? Well, really you have two problems to solve:

1. How do I "assign" each of the individual fruits to a particular tree type? Let's call this the missing values, or Z for short.
2. What are the characteristics of the fruit of each tree type? Let's call these the "unknown parameters", or theta for short.

But these two problems are interrelated: I can use one to help me solve the other. Here's how I do it.

1. Randomly pick any assignment of fruits to types. In other words, make a guess at Z. Let's call this Z(0). Initially, grapes are going to be mixed with watermelons, but that's OK.
2. Now that I've made a random assignment of fruits to types, I can try to answer the second question: what are the characteristics of each fruit type, assuming that they come from the same tree? Well, they have this average size, and this color, and so on. So I can work out what these are. This is the maximization step, and it gives me theta(0). Some of the "fruit types" will now be towards the smaller end of the spectrum, some will be soft, some will be hard. They'll be all over the place.
3. Now that I've worked out theta(0), I can work out a better assignment of fruits to each type, because we know that things from a single tree are similar. This is the expectation step. Here something magical happens: the grapes are more likely to end up in one group (the one characterized by small size and being soft), and the watermelons in another (the one characterized by big size and being hard). So now I have Z(1).
4. Let's go back to step 2. But instead of Z(0), let's use Z(1), and repeat!

At some point, I'll notice that things haven't changed much: Z(11) will be the same as Z(12), for example. I'll also have theta(12). But Z(12) is now a pretty good assignment of fruits to the different fruit types, and theta(12) a pretty good description of their different characteristics!
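The loop in the analogy can be sketched in Python. Several simplifications are assumed here: each fruit is described by a single number (its size), each type is summarized only by its average size, and every fruit is assigned wholly to one type. That makes this a "hard assignment" variant, closer to k-means than to full EM, which would keep a probability for each fruit/type pair. All names are hypothetical.

```python
import random


def hard_em_1d(sizes, k, seed=0, max_iter=100):
    """Alternate between estimating type averages (theta) and reassigning fruit (Z)."""
    rng = random.Random(seed)
    # Step 1: random initial assignment Z(0) of each fruit to one of k types.
    z = [rng.randrange(k) for _ in sizes]
    theta = [0.0] * k
    for _ in range(max_iter):
        # Step 2: characteristics of each type given the current assignment.
        for j in range(k):
            members = [s for s, zi in zip(sizes, z) if zi == j]
            if members:
                theta[j] = sum(members) / len(members)
            else:
                # Assumption: an empty type is restarted at a random fruit's size.
                theta[j] = rng.choice(sizes)
        # Step 3: reassign each fruit to the type whose average it is closest to.
        new_z = [min(range(k), key=lambda j: abs(s - theta[j])) for s in sizes]
        # Stop once the assignment no longer changes (Z(t) equals Z(t+1)).
        if new_z == z:
            break
        z = new_z
    return z, theta


if __name__ == "__main__":
    # Made-up sizes in inches: four grape-like fruit, four watermelon-like fruit.
    sizes = [1.0, 1.2, 0.9, 1.1, 10.0, 11.0, 9.5, 10.5]
    assignment, averages = hard_em_1d(sizes, k=2)
    print(assignment)
    print(averages)
```

With real fruit one would use several features (size, color, hardness) and a soft assignment, but the alternation between the two steps stays the same.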