Pedrinho covered the Online part of your question very well, so I'll answer the other two.
Strictly speaking, when we pose an optimisation problem, solving it means finding its global solution. If the problem is continuous, finding that solution also means satisfying the conditions for optimality. If the problem is discrete, it means proving that there is no other discrete solution better than the one we found, which we typically do with branch and bound.
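To make "conditions for optimality" concrete: in the smooth unconstrained case, for example, the first-order necessary condition is simply that the gradient of the objective vanishes at the candidate point,

$$\nabla f(x^*) = 0,$$

with constrained problems requiring the analogous KKT conditions; promoting such a point to a global solution then needs an extra argument, such as convexity of the problem.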
This type of classical solving is often not possible in practice, either because the problem is too large or because the equations are black-box.
In these cases, we would use stochastic optimisation methods, i.e., methods that seek to find some feasible solution to our problem without caring to prove much else beyond that. Examples include simulated annealing, evolutionary algorithms, particle swarm optimisation, and so on. It's a bit hard to make this definition exact, because many methods blend characteristics: depending on how we set up simulated annealing, for instance, we could also aim to satisfy the optimality conditions, as we do in local optimisation. In any case, stochastic methods will never seek to prove that a solution is globally optimal.
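To give a flavour of what such a method looks like in code, here is a minimal simulated annealing sketch for a box-constrained minimisation problem; the objective, bounds, cooling schedule, and step size are all placeholders I've chosen for illustration, not taken from any particular library.

```python
import math
import random

def simulated_annealing(f, lower, upper, n_iter=10_000, t0=1.0, seed=None):
    """Minimise f over the box [lower, upper] with a basic simulated annealing loop."""
    rng = random.Random(seed)                     # a fixed seed makes the run reproducible
    dim = len(lower)
    x = [rng.uniform(lower[i], upper[i]) for i in range(dim)]   # random starting point
    fx = f(x)
    best_x, best_fx = x[:], fx
    for k in range(1, n_iter + 1):
        t = t0 / k                                # simple cooling schedule
        # propose a random perturbation of the current iterate, clipped to the box
        y = [min(upper[i], max(lower[i], x[i] + rng.gauss(0.0, 0.1 * (upper[i] - lower[i]))))
             for i in range(dim)]
        fy = f(y)
        # always accept improvements; accept worse points with probability exp(-increase / t)
        if fy < fx or rng.random() < math.exp(-(fy - fx) / t):
            x, fx = y, fy
            if fx < best_fx:
                best_x, best_fx = x[:], fx
    return best_x, best_fx

# A nonconvex test function with many local minima (Rastrigin), minimised over [-5.12, 5.12]^2.
rastrigin = lambda x: sum(xi * xi - 10 * math.cos(2 * math.pi * xi) + 10 for xi in x)
print(simulated_annealing(rastrigin, lower=[-5.12, -5.12], upper=[5.12, 5.12], seed=42))
```

The accept/reject rule is what makes it "stochastic": worse points are occasionally accepted, which is how the method escapes local minima, but nothing in the loop ever certifies that the best point found is globally optimal.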
The main characteristic of stochastic methods is that there is an element of randomness in every run, e.g., randomly generated starting values for the variables or random perturbations of the iterates, so we can get a different result every time. Note that even this is not strictly necessary: a particle swarm method could, for instance, use a fixed random seed, which would make its results reproducible.
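Continuing with the sketch above (the hypothetical `simulated_annealing` helper and the `rastrigin` test function), fixing the seed is all it takes to make two runs coincide:

```python
# Two runs with the same seed follow exactly the same sequence of random numbers...
run_a = simulated_annealing(rastrigin, lower=[-5.12, -5.12], upper=[5.12, 5.12], seed=7)
run_b = simulated_annealing(rastrigin, lower=[-5.12, -5.12], upper=[5.12, 5.12], seed=7)
assert run_a == run_b
# ...whereas leaving the seed unset gives a (potentially) different answer on every run.
print(simulated_annealing(rastrigin, lower=[-5.12, -5.12], upper=[5.12, 5.12]))
```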
Furthermore, I have also seen people use this term to describe something totally different, namely creating and solving models that encode real-world stochasticity (e.g., financial modelling, forecasting, and so on). I haven't seen a formal definition that distinguishes between the two, but I personally use "stochastic optimisation methods" to refer to the actual algorithmic part, so that it can't be mistaken for the modelling bit.
Along similar lines, I have seen the term Robust Optimisation used in two different ways. The first is Wikipedia's definition, which says that we seek a certain level of robustness against uncertainty, e.g., configuring a power plant so that there is a 95% probability of meeting demand every day. This is very similar to stochastic modelling, the main difference being that here we have a very specific objective, which is not necessarily the case in other models that simply include stochasticity.
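Written out, that power-plant example is essentially a chance-constrained programme; the symbols below are purely illustrative:

$$\min_{x} \; c(x) \quad \text{s.t.} \quad \Pr\big[\, g(x, \xi) \ge d(\xi) \,\big] \ge 0.95,$$

where $x$ is the plant configuration, $\xi$ the uncertain data, $c$ a (hypothetical) operating cost, $g$ the power we can deliver, and $d$ the demand.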
The second definition is entirely deterministic: we seek to minimise fluctuations in a system that can be modelled deterministically. For example, I once worked on optimising aircraft wings so that we achieved maximum lift whilst ensuring they could never oscillate beyond specification due to aeroelastic effects. There was nothing stochastic there, as the aeroelasticity equations are deterministic and the turbulence models are very precise. In mechanical engineering we call that robust optimisation, since we are not simply optimising for performance: we are ensuring that the system will be robust (i.e., not break) during its operation.
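For contrast, that wing problem is closer to a worst-case formulation (again with purely illustrative symbols):

$$\max_{x} \; \mathrm{lift}(x) \quad \text{s.t.} \quad \max_{u \in U} \; \mathrm{amplitude}(x, u) \le a_{\max},$$

where $U$ is a deterministic set of operating conditions the wing must tolerate: robustness comes from the worst case over $U$, not from a probability.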