For a Turing test as rigorous as the one you present, I believe many (or even most) humans would fail it too.
How many humans seriously have the attention span to have a million "token" conversation with someone else and get every detail perfect without misremembering a single thing?
Response quality degrades long before you hit a million tokens.
But sure, let's say it doesn't. If you interact with someone day after day, you'll eventually hit a million tokens. Add some audio or images and you will exhaust the context much, much faster.
However, I'll grant you that Turing's original imitation game (text only, human typist, five minutes) is probably pretty close, and that's impressive enough to call intelligence (of a sort), though modern LLMs tend to show dead giveaways like "you're absolutely right!"
How do you propose to do a Turing test on a human (in a sense that is different from a machine simply passing the Turing test)?
Like failing to pick out all the motorcycles in a CAPTCHA? Or a Turing test where you have a guy chat with two people without knowing that one of them could be a computer, and the interrogator, unprompted, suggests that one of them might be a computer?