I'm thinking of words that appear in sentences which exist primarily to give structure. So examples might be:

"a" "of" "and" "are"

For example, when you search Google for a phrase like "catch a butterfly", the results list bolds the exact phrase as well as isolated occurrences of "catch" or "butterfly", but it does not bold isolated occurrences of structural words like those listed above.

Is there a name for this type of word? Also, is there a list of these words somewhere? I ask because I'm implementing a similar highlighting feature and want to exclude all the single occurrences of words which aren't germane to the searched term.

1 Answer

In web search (and text search in general), these words have the technical name "stop words", and a list of such words is called a "stop list". A stop list typically contains your examples and many more words.

Basically, stop words are words that are very common, add little or no meaning to a query, and occur too widely to be useful for filtering a database or corpus.
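For the highlighting feature described in the question, a minimal sketch in Python might look like the following. Everything here is an assumption for illustration: the tiny `STOP_WORDS` set, the `highlight` function name, and the use of `<b>` tags are all hypothetical, and this is not how Google or any particular search tool actually does it. The idea is to bold the exact phrase wherever it appears, and otherwise bold only those query words that are not in the stop list.

```python
import re

# Tiny illustrative stop list (an assumption; real stop lists are much
# longer and differ from tool to tool).
STOP_WORDS = {"a", "an", "and", "are", "of", "the"}


def highlight(text, query, stop_words=STOP_WORDS):
    """Wrap the exact query phrase, or isolated non-stop query words, in <b> tags."""
    query = query.strip()
    if not query:
        return text

    # Split the query into words and drop anything in the stop list.
    words = re.findall(r"\w+", query.lower())
    content_words = [w for w in words if w not in stop_words]

    # The full phrase comes first in the alternation, so stop words
    # inside an exact phrase match are still highlighted with it.
    alternatives = [re.escape(query)] + [re.escape(w) for w in content_words]
    pattern = re.compile(r"\b(?:" + "|".join(alternatives) + r")\b", re.IGNORECASE)

    return pattern.sub(lambda m: "<b>" + m.group(0) + "</b>", text)


print(highlight("To catch a butterfly, use a net and a jar.", "catch a butterfly"))
print(highlight("A butterfly is hard to catch.", "catch a butterfly"))
```

In the second call, the isolated "A" is left alone because it is in the stop list, while "butterfly" and "catch" are bolded. Swapping in a longer stop list only requires passing a different set as `stop_words`.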

Explore more here:

https://pythonspot.com/nltk-stop-words/

https://codingcanvas.com/full-text-search-stoplist-and-stopword/

https://nlp.stanford.edu/IR-book/html/htmledition/dropping-common-terms-stop-words-1.html

Prem
  • 4,736
  • 3
    Stop words are any words in a 'stop list' and may include very common lexical items as well as function words. From Wikipedia: There is no single universal list of stop words used by all natural language processing tools, nor any agreed-upon rules for identifying stop words, and indeed not all tools even use such a list. Therefore, any group of words can be chosen as the stop words for a given purpose. – Edwin Ashworth Oct 05 '21 at 18:11
  • 1
    @EdwinAshworth, but that does not change the fact that OP is actually asking for stop words and a stop list. See his last paragraph, especially the last sentence. He just does not know the terminology. Even Google uses this terminology! – Prem Oct 05 '21 at 18:21
  • 1
    'I'm thinking of words that appear in sentences which exist primarily to give structure. So examples might be: "a" "of" "and" "are".' // But xpo6.com has: 'Here is a list of [E]nglish stop words:

    a about above across after afterwards ... hundred ... interest ... me ...'. This is not what OP's actual question is asking about. Certain stop lists, or reduced stop lists, may well be of help to them, but ELU addresses precise queries and seeks to give precise answers. And requests for lists have been off-topic for many years.

    – Edwin Ashworth Oct 05 '21 at 18:38
  • 1
    @EdwinAshworth, this is exactly what he is asking for; see his last paragraph. Even his middle paragraph talks about Google web search. But he is evidently not sure about the terminology, and so he assumes that these are some sort of structural words. – Prem Oct 05 '21 at 18:48
  • 1
    @EdwinAshworth, also, regarding your comment about lists: I have helped him with the correct terminology and given him pointers to where he can get standard lists to modify as he wishes. Precise query: "What are these words ignored in web search (which have been unknowingly called structural words, for lack of a better word) called?" Precise answer: "These are stop words in stop lists." – Prem Oct 05 '21 at 18:59
  • 1
    @EdwinAshworth, I saw your reference to https://xpo6.com/list-of-english-stop-words/ but that site seems to be by a non-native English speaker (maybe Albanian) using "autotranslate" in http://xpo6.com/what-I-do/; that list is neither good nor general. My answer contains a reference [ https://nlp.stanford.edu/IR-book/html/htmledition/dropping-common-terms-stop-words-1.html ] with a better, more general stop list. – Prem Oct 05 '21 at 19:54
  • 1
    I find this q, a, and comment thread intriguing. At first I thought "Computing-terms answer on ELU? Off-topic!" and downvoted this answer. Then I noticed that the OP was in fact describing using "stop words" as explained here and I removed my downvote. Then I thought more and added an upvote: Although the OP tagged "grammar," IMO it should have been "phrase request," and this provides the word they didn't know they were looking for, a specific semantic label for their activity. It just happens to align more to the task at hand than to a linguistic functionality. – Andy Bonner Oct 05 '21 at 20:54
  • Yes, Andy, I have answered exactly what the OP unknowingly wanted. Thanks for spending time to think it over and reversing your vote! I did try to convince @EdwinAshworth but was unsuccessful; at least I was able to convince Andy. – Prem Oct 06 '21 at 07:31
  • I repeat: 'There is no single universal list of stop words used by all natural language processing tools, nor any agreed-upon rules for identifying stop words.... Therefore, any group of words can be chosen as the stop words for a given purpose.' [Wikipedia] So the stop list I quoted is a totally valid choice. It includes words not fulfilling OP's 'What are common structural words called?' / 'I'm thinking of words that appear in sentences which exist primarily to give structure.' // 'Stop lists' may well help OP, but is not the correct answer to the question. – Edwin Ashworth Oct 06 '21 at 10:22