It is well established that computers have semantics of some sort. After posting the question Is non-deterministic automated reasoning a viable strategy for solving problems in mathematical logic? (PhilSE), I received a variety of yes, no, and it-depends answers, demonstrating varied viewpoints on computers and the automated proof of mathematical logic. The disagreement seemed to turn on what respondents believed an LLM is, does, and is capable of. This brings to the fore an important question about LLMs. Large language models analyze human language, such as a collection of proofs in mathematical logic, and generate output that seems to show an awareness of the syntax and semantics of that content.
Consider the following conversation with ChatGPT:
Human: Which sentence is meaningful? "Bob monkeys purple." or "Bob's monkey is purple."
ChatGPT: The sentence "Bob's monkey is purple." is meaningful. It conveys a clear and coherent statement: Bob has a monkey, and that monkey is purple. In contrast, the sentence "Bob monkeys purple." does not form a grammatically correct or meaningful statement.
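One way to make this observation concrete is to ask which of the two sentences a language model assigns higher probability to. The sketch below is a minimal illustration, assuming the Hugging Face `transformers` library and the small, openly available GPT-2 model as a stand-in (ChatGPT itself cannot be probed this way); the helper `avg_log_likelihood` is a name introduced here purely for illustration.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Load a small pretrained language model as an illustrative stand-in
# for much larger models such as ChatGPT.
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def avg_log_likelihood(sentence: str) -> float:
    """Average per-token log-likelihood the model assigns to a sentence."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels supplied, the model returns the mean
        # cross-entropy loss over the predicted tokens.
        loss = model(ids, labels=ids).loss
    return -loss.item()

for s in ["Bob monkeys purple.", "Bob's monkey is purple."]:
    print(f"{s!r}: {avg_log_likelihood(s):.3f}")
```

In practice the grammatical sentence typically receives a noticeably higher average log-likelihood than the ungrammatical one, which is the statistical regularity underlying ChatGPT's judgment above; whether that regularity amounts to awareness of meaning is precisely the question at issue.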
Given that LLMs regularly provide meaningful answers to questions about the meaning of natural language, do large language models possess some form of awareness of the semantic content of the corpus they are trained on?