The key capability that humans have, and that I've yet to see in an LLM, is the ability to recognize when they wouldn't be able to do a task well and to refuse rather than do it poorly. The only times I've seen LLMs give up on a problem are when the prompt is explicitly crafted to elicit that kind of refusal, or after long back-and-forth exchanges with repeated feedback about unsatisfactory results. I think this has pretty dire implications for deploying them in any scenario where failure carries significant risk or where the output can't be immediately audited for correctness.