It works like this. You take a phrase, such as "who would be the best man for the job?", and reduce it to one word: "who?". You do the same for "when should this job be done?". You then use these reduced phrases as nouns: "what matters most is the who, not the when". Sentences like "His no really shattered my dreams" and "I want no ifs or buts!" result from the same process.
You could add quotation marks and question marks to these questions-turned-one-word-nouns, but that is not usually done. They are mostly either left unmarked or written in italics.
The implicit question behind this "who" is rather free: it could be "whom do you consider the best man for the job?" or "who should do it?". Since there is no way to tell what the implicit question was from "who" alone, you are theoretically free to choose "who" or "whom", unless context forces one implicit question or the other. Conventionally, most people use the neutral form "who" in most cases, which is what I recommend.
The use of "who" and "when" in this fashion is informal. If you're aiming for a rather formal style, you'd better avoid such constructions. Incidentally, the current style of your quote is rather informal as well.