
How many words would be required for a comprehensive English learner's dictionary, and what level of effort would be required to create such a dictionary from scratch?

"Comprehensive" means it has enough of the most commonly used words to support someone who uses it until they reach native-level fluency.

"Level of effort" means the approximate person-hours needed to create the dictionary content (headwords, pronunciations, meanings), excluding anything technical such as building the database or system used to do the work.

"From scratch" means that it would be done in a way that leaves the resulting dictionary with no legal obligations to any other party, i.e. definitions are written from the creators' own knowledge and research and not taken from other copyrighted dictionaries.

For example, one might answer "12,000 words and 10,000 person-hours."

EDIT 3/2/2021: The original question, left above as background, was closed as "opinion-based" because questions must be "answered with facts and citations." I'd like to restate it in a form that can be answered that way: Are there any dictionaries whose publishers have disclosed both the number of entries in the work and the level of effort required to create it? I actually found one, which I can post as an answer if this question is reopened, and perhaps others would have more examples with the question posed in this way.

The Motivation:

As a software developer who has created multiple language-related websites, I have frequently found myself needing programmatic access to a free dictionary. There are tons of web-based dictionaries that claim to be free but are not really free, especially if you want to use them for a commercial project such as an English language learning website. Even for academic research, non-profit use, or similar projects, many dictionary providers require payment, sometimes for each word you look up programmatically. There are also all kinds of restrictions on how you use their material (for example, you can't store copies of anything on your computer), as well as things you have to do (copyright notices, links to their site, etc.). I've even found some that are free, but they're illegally giving access to another party's copyrighted material.

I wish there was a dictionary out there that was so free the creator would literally put a link on their homepage to download their database, including all dictionary content, with no login required and no strings attached. Literally anyone could download it and do whatever they wanted with it for free and with no obligations.

I don't think such a dictionary exists. If it does I'd be happy to know where to find it. Otherwise, I'm thinking of trying to get one created. I've got the technical ability to create such a database and make it available online, but I have zero ability to actually create the content. So, I'm trying to figure out the level of effort. If I'm told 10,000 hours would do it, it doesn't seem impossible that I could find 100 qualified people to donate 100 hours each (over the course of a year or two) to do the work... I'd donate my time to do the tech work, to publish it in real time as it is being built, and to support the type of truly free access described above.

Kairei
  • See https://en.m.wikipedia.org/wiki/Wiktionary. It is apparently based on out-of-copyright dictionaries. Also has entries created by bots. – Xanne Feb 12 '21 at 04:44
  • Thanks for the comment. I've seen Wiktionary, but it comes with a dual license, and both licenses have some of the undesirable requirements described above. I'm also not wild about the bots and dated source material. I'd want human creation and up-to-date language. – Kairei Feb 12 '21 at 06:20
  • What about WordNet as mentioned in https://english.stackexchange.com/q/8233/191178 ? I guess you can argue that it comes with “strings attached” but if that string is that you have to cite them, is that really a dealbreaker? – Laurel Feb 12 '21 at 16:46
  • Thanks for the suggestion. Yes, I reviewed WordNet as well, and yes, I consider any requirement to include a notice a deal breaker, because it means the end user has to worry about whether the original content owner (or some future owner of that copyright) might make changes or otherwise cause issues. The WordNet license page even has what I'd consider a "warning" about making sure you have an attorney review the license based on the planned use. – Kairei Feb 12 '21 at 17:05
  • While I'm pretty convinced there isn't one, I certainly appreciate suggestions of more potential "free" dictionaries. Still, some opinions on the main question, the level of effort to create one, would be very much appreciated. I've continued my own research and found something called COMLEX, which has about 35,000 entries. According to https://nlp.cs.nyu.edu/comlex/index.html, it was created "by a team of four linguistics graduate students, working half-time for approximately one year." That's about 4,000 person-hours for 35,000 entries (about 8 or 9 entries per person-hour). Sound reasonable? – Kairei Feb 13 '21 at 02:11
  • Any such work by a single person is going to be unmoderated: biased (what is essential?), lacking (OED consists of many hundreds of pages, and Wiktionary contains many more headwords), and almost certainly error-containing. – Edwin Ashworth Mar 02 '21 at 17:01
  • The recent edit turns this question from one inviting opinions into one about the existence of resources, which is off-topic. I think re-opening the question so that it can be re-closed for a different reason is unnecessary, so I'm voting against re-opening. Feel free to ask on our Meta page whether your question should be re-opened; asking there can sometimes provide the avenue for an oblique answer to the substantive question. :-) – Chappo Hasn't Forgotten Mar 03 '21 at 01:05
  • Thanks for the info. I guess I've just never quite understood the motivation to close questions that might provide useful info to the community. There are many requests for opinions and/or resources that don't get closed, e.g. there's a question here asking "what is the best dictionary for Indian English?" which is asking for both opinion and resources, and it wasn't closed. I'm guessing a number of people have found the answer to that question quite useful... but I digress. I'm not sure what the "Meta page" you mentioned is. Would it be possible to provide a link? – Kairei Mar 03 '21 at 18:35
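
The COMLEX estimate quoted in the comments, and the volunteer split mentioned in the question, can be checked with a few lines of arithmetic. This is only a rough sketch: the source says "half-time for approximately one year" without giving hours, so the figures of 20 hours per week and 50 working weeks per year are assumptions, as are all function and constant names. It also says nothing about whether the COMLEX rate would carry over to writing learner's dictionary definitions.

```python
# Assumptions (not stated in the source): "half-time" is taken as
# 20 hours per week, and a working year as 50 weeks.
HALF_TIME_HOURS_PER_WEEK = 20
WORKING_WEEKS_PER_YEAR = 50


def person_hours(people, years):
    """Total person-hours for a team working half-time for `years` years."""
    return people * HALF_TIME_HOURS_PER_WEEK * WORKING_WEEKS_PER_YEAR * years


def entries_per_hour(entries, hours):
    """Average number of entries completed per person-hour."""
    return entries / hours


def hours_per_volunteer(total_hours, volunteers):
    """Even split of a total effort across a number of volunteers."""
    return total_hours / volunteers


# COMLEX figures from the comment: four students, about one year, 35,000 entries.
comlex_hours = person_hours(people=4, years=1)
comlex_rate = entries_per_hour(35_000, comlex_hours)

# The question's hypothetical: 10,000 hours shared by 100 volunteers.
share = hours_per_volunteer(10_000, 100)

print(f"COMLEX effort: {comlex_hours} person-hours")
print(f"COMLEX rate: {comlex_rate} entries per person-hour")
print(f"Hours per volunteer: {share}")
```

Under these assumptions the result matches the "about 4,000 person hours" and "8 or 9" per hour figures in the comment; a different definition of half-time would shift both numbers proportionally.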

0 Answers