Here's the way I think of it. Recall that the entropy S of a system can be defined in terms of the number of microstates Ω consistent with its macroscopic arrangement (this sounds abstract, but an example should make it clearer). The relation between the two is S = k ln Ω, where k is Boltzmann's constant.

Now let's say our system consists of 10 coins in a box, each one identifiable by its position, and all initially heads. This arrangement has only one microstate associated with it, since only one arrangement of the 10 coins gives the situation where all 10 are heads. Therefore, the entropy of this system is S = k ln 1 = 0.

What would you expect to happen when you shake up the box? The coins will tend toward the most probable configuration: 5 heads and 5 tails. Why do they do this? Because there are more microstates associated with the 5 heads/5 tails arrangement than with any other. For example, HHHHHTTTTT, HTHTHTHTHT, HHTTHHTTHT, etc. are all different microstates belonging to the 5 heads/5 tails macrostate. In fact, there are Ω = 10!/(5! 5!) = 252 such microstates, giving an entropy S = k ln 252 ≈ 5.5k. Of course, in this example you would not be surprised to see a state with 4 heads and 6 tails, because the entropy of that macrostate (Ω = 10!/(4! 6!) = 210, S ≈ 5.3k) is very close to the entropy of the 5 heads/5 tails case. However, in thermodynamic systems you have very large numbers of particles (N ~ 10^{23}), and you would not expect to see any appreciable relative deviation from the most probable case.
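If you want to check these numbers (and the claim about large N) yourself, here's a quick Python sketch. It treats a macrostate as "n_heads heads out of n_coins coins" and counts its microstates with `math.comb`, reporting entropy in units of k (S/k = ln Ω). The function names and the ±10% window used in `fraction_near_half` are just my own illustrative choices, not anything standard.

```python
import math


def microstates(n_coins: int, n_heads: int) -> int:
    """Omega: number of distinct H/T sequences with exactly n_heads heads."""
    return math.comb(n_coins, n_heads)


def entropy_in_units_of_k(n_coins: int, n_heads: int) -> float:
    """S/k = ln(Omega) for the macrostate with n_heads heads."""
    return math.log(microstates(n_coins, n_heads))


def fraction_near_half(n_coins: int, tolerance_percent: int = 10) -> float:
    """Fraction of all 2**n_coins microstates whose heads fraction lies
    within +/- tolerance_percent of 1/2 (illustrative window choice).

    |h/n - 1/2| <= p/100  is rewritten as  50*|2h - n| <= p*n
    so the test stays in exact integer arithmetic."""
    favourable = sum(
        math.comb(n_coins, h)
        for h in range(n_coins + 1)
        if 50 * abs(2 * h - n_coins) <= tolerance_percent * n_coins
    )
    return favourable / 2**n_coins


if __name__ == "__main__":
    # Omega and S/k for every macrostate of 10 coins
    for h in range(11):
        print(f"{h:2d} heads  Omega = {microstates(10, h):3d}  "
              f"S/k = {entropy_in_units_of_k(10, h):.2f}")
    # How concentrated the microstates are near half heads as N grows
    for n in (10, 100, 1000):
        print(n, fraction_near_half(n))
```

For 10 coins this reproduces Ω = 1 for all heads, 210 for 4 heads, and 252 for 5 heads (S/k ≈ 5.35 and 5.53). The last loop shows the large-N point: the share of microstates within ±10% of half heads is exactly 672/1024 ≈ 0.66 at N = 10 but is indistinguishable from 1 by N = 1000.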

The above example illustrates that macrostates with higher entropy are more probable. The situation gets more complicated when you factor in the different energies associated with different states, but the concept remains the same. The second law of thermodynamics then becomes very intuitive: as time goes on, the universe tends toward its most probable (highest-entropy) state.
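To see the "tendency toward the most probable state" as a process in time, here's a toy simulation. How to model "shaking" is an assumption on my part: I use an Ehrenfest-urn-style rule where, at each step, one randomly chosen coin is flipped, starting from all heads. The function `shake` and its parameters are hypothetical names for illustration only.

```python
import random


def shake(n_coins: int, steps: int, seed: int = 0) -> list[int]:
    """Toy 'shaking' model (assumed, Ehrenfest-urn style): start with all
    heads, then at each step flip one coin chosen uniformly at random.

    Returns the number of heads after each step; index 0 is the initial
    all-heads state, so the list has steps + 1 entries."""
    rng = random.Random(seed)      # seeded for reproducibility
    coins = [True] * n_coins       # True = heads, False = tails
    heads = n_coins
    history = [heads]
    for _ in range(steps):
        i = rng.randrange(n_coins)     # pick one coin at random
        coins[i] = not coins[i]        # flip it
        heads += 1 if coins[i] else -1
        history.append(heads)
    return history


if __name__ == "__main__":
    h = shake(1000, 20000, seed=1)
    # Print every 2000th step: the count falls from 1000 toward ~500
    print(h[::2000])
```

With 1000 coins the heads count drops quickly from 1000 and then just jitters around 500, with typical fluctuations of order √N/2 ≈ 16. Nothing in the rule forbids a return to all heads, but it would require the walk to climb against an overwhelming majority of microstates, which is the second-law intuition in miniature.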