3
$\begingroup$

I ask this in a partly recreational, and partly research-related spirit, and I realize my problem might be ill-posed, so any suggestions for clarification might go a long way.

Succinctly, my problem can be stated as

Find the period of a "black box" pseudorandom number generator

but I suppose the following formulation might be more attractive:

Most programs for playing MP3 files (and MP3-playing devices as well) possess a "shuffle" feature that allows the player to play a list of songs in essentially "random" order.

If you run a music player set to shuffle long enough, you will note that sometimes the time interval between the instances your favorite song in the playlist can be short, or can be long.

Now, for a number of reasons, you can't take a look at what pseudorandom number generator your music player is using; you are however interested in how frequently your favorite song in the playlist would be played.

So, one now asks,

  1. Can you determine how often your favorite song in the playlist would be played based on the time intervals between the different instances your song is played?

  2. How long should your playlist be? Obviously, you can't conclude much if you only have two songs in your playlist...

  3. How long should you leave your music player playing to answer question number 1?

I should probably make the drastic assumption that all the songs in the playlist have the same length, e.g. 3 minutes or so.

  • 1
    If the algorithm is pseudorandom, then shouldn't the expected answer to number 1 be 1/N, where N is the total number of songs? Of course this doesn't apply to things like iTunes which add weights to more frequently manually chosen songs. (Also, the music playing program I currently run literally "shuffles" the play list when you hit that button: between button-presses, the player deterministically loops through all entries using the current order given by shuffling. =p )2010-11-22
  • 0
    It sounds reasonable, but in the one I'm using (mplayer on Linux), I've found that sometimes it takes a while in between the instances a favorite song of mine in my playlist has played, and sometimes it takes a few songs (e.g. my playlist has a length of 100, my favorite song was the 96th one played in one cycle, and then the third one played in the next cycle). I definitely haven't seen the last song of one cycle being the first song of the following cycle, though.2010-11-22
  • 1
    I'd agree that this is ill-posed.2010-11-22
  • 0
    Robin, what do you think is missing that could turn this into a well-posed problem? (As I said, this is in fact related to some research I'm doing, so I'd be interested in hearing how to make this soluble.)2010-11-22
  • 0
    I think it's an interesting question. Certainly a more probability/statistics question than pure mathematics. Have you tried already to determine what you would expect as answers to your questions if the shuffling is really random? Trying to determine that and then looking for possible deviations is, I think, a good approach.2010-11-22
  • 0
    In the "actual research", I have data output by the "black box"; for the "recreational" part, I've only gone so far as to note that the spacing in between instances my favorite song is played is pretty erratic.2010-11-22
  • 0
    A similar challenge comes up (finding the unknown period of a cycle) with factoring by Pollard's rho. In that case, however, observing a repeated value implies the intervening interval is an exact multiple of the cycle. That's clearly not the case with your music shuffler. Perhaps a better set of observations would be of consecutive songs (not just a single song). For the problem to be well-posed, I imagine we need to open the blackbox and give some description of its pseudo-random generator.2010-11-22
  • 0
    So, without the ability to peer into the black box (e.g. looking at the source code of the music player), the question is insoluble?2010-11-22
  • 1
    Not necesssarily looking at the source code, but making some assumptions about the PRNG and how the music shuffle is chosen. For example you have not observed the same song playing twice in a row. There may well be code that asks for "a do over" if the random choice comes up the same as the last choice, or this "feature" may be implemented in another way. If, for example, it was observed that the same song is never played within three times (so long as the playlist has four or more songs), this would suggest the music player tracks the last three songs.2010-11-23

1 Answers 1

2

Here's a practical procedure, which assumes that the Shuffler is cyclic. Construct some hash function $H$ from $k$-tuples of songs to $n$ bits (the parameters $k,n$ will be decided upon later). Store the value of $H$ on the first $k$ songs, and look for a matching value of $H$ on a future $k$-tuple. If $k$ is large enough, then the expected number of occurrences of this $k$-tuple in the entire cycle is $<1$, and so its likely that a repeat means that you've reached the end of the cycle. If $n$ is big enough (larger than $\sqrt{N}$, where $N$ is the period), then it's probable that matching hashes means matching $k$-tuples.

In order to increase your certainty, you can store a much larger prefix, and then use it whenever the hash value matches. In fact, you can probably use several hash functions this way to optimize the procedure.

If you can't assume that the Shuffler is cyclic, but rather that from some point on it will cycle, you can store all the hashes in a hash table (along with their time) and wait for a hash that appears twice. If you're allowed to read the input twice, you can use any of the Pollard rho algorithms, as hardmath commented.