Wikipedia talk:Lists of common misspellings/For machines
This is the talk page for discussing improvements to the Lists of common misspellings/For machines page.
note
Note:
- Each line starts with a space. You can use some basic regular expressions to remove that.
- The replacement is not always suited to a simple find-and-replace, e.g. "accomadate->accommodate, -ed, -es, -tion, -tions, -ing"
r3m0t talk 13:55, Apr 3, 2005 (UTC)
- "anual->annual or anal, except "Anual" is a proper noun". How is that machine readable? Maybe it should have a way to add comments? - Omegatron 20:25, Jun 25, 2005 (UTC)
- It should have a commenting method. I will work on that. In the meantime, let me know if working on commenting interferes with any programs, etc. — MATHWIZ2020 TALK | CONTRIBS 00:48, 28 December 2005 (UTC)
- The page had no standards, so I did the following: Every line begins with a space, the misspelling, ->, the primary re-spelling, and then any additional spellings (separated by ", "). Comments appear on the next line and begin with " *". — MATHWIZ2020 TALK | CONTRIBS 01:22, 28 December 2005 (UTC)
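The line format described above (leading space, misspelling, "->", one or more respellings separated by ", ", and an optional comment on the next line starting with " *") could be read with something like the following minimal Python sketch. The `Entry` type, the `parse_list` name, and the sample text are illustrative assumptions, not part of the list page or of any existing tool.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    misspelling: str
    corrections: list[str]  # primary respelling first, then any alternatives
    comment: str            # empty string when no comment line follows


def parse_list(text: str) -> list[Entry]:
    """Parse lines of the form ' wrong->right, alt' plus optional ' *' comments."""
    entries: list[Entry] = []
    for line in text.splitlines():
        if line.startswith(" *"):
            # Comment line: attach it to the entry directly above, if any.
            if entries:
                entries[-1] = entries[-1]._replace(comment=line[2:].strip())
            continue
        if not line.startswith(" ") or "->" not in line:
            continue  # not an entry under the described format
        misspelling, _, rest = line.strip().partition("->")
        corrections = [c.strip() for c in rest.split(",") if c.strip()]
        entries.append(Entry(misspelling, corrections, ""))
    return entries


# Hypothetical sample in the described format (not copied from the list page).
sample = (
    " accomadate->accommodate\n"
    " anual->annual, anal\n"
    ' * "Anual" is a proper noun\n'
)
entries = parse_list(sample)
```

Splitting on "," and stripping each piece, rather than splitting on the exact string ", ", also tolerates entries that were written without a space after the comma.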
Format
A remark/question: most lines end in a space. Is this intentional? It seems a little odd to me; my script has to explicitly strip this out. (Note that three lines do not currently end in a space.) Lupin|talk|popups 15:37, 27 May 2006 (UTC)
- I removed all the trailing spaces. Hope it didn't break anybody's scripts. Wmahan. 02:25, 30 May 2006 (UTC)
Random notes
Here are some notes from my work on correcting spelling:
- The first thing I do is remove all entries with more than one suggested replacement, since these are more difficult to correct automatically.
- I treat each line that doesn't start with a space as a comment. That way there's no need for a special comment syntax.
- I use case-insensitive matches for the words in this list, so I can fix misspellings at the beginning of sentences. To fix case-sensitive misspellings and repeated words, I use an extended syntax where the first word is a PCRE.
And a couple of questions:
- Some of the entries are for misspelled contractions, such as didnt->didn't. The MOS says to avoid contractions except in quotations, so should we change these to didnt->did not, and so on?
- So I've collected about 1700 more misspellings that occur commonly on Wikipedia. Would anybody mind if I add them?
Wmahan. 20:37, 4 June 2006 (UTC)
- I think we shouldn't have didnt->did not in the list. This list should just correct spellings, so that machines can deal with that and not have to worry about whether something is part of a quotation or not. Changing didn't to did not could be the topic of another list. Lupin|talk|popups 21:45, 30 June 2006 (UTC)
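The workflow in the notes above (skip entries with more than one suggestion, treat lines without a leading space as comments, match case-insensitively) might look roughly like this in Python. It is only a sketch: the function names and sample data are made up for illustration, and it uses Python's `re` module rather than PCRE, so it does not reproduce the extended syntax mentioned above.

```python
import re


def load_simple_rules(text: str) -> dict[str, str]:
    """Keep only single-suggestion entries; other lines are ignored."""
    rules: dict[str, str] = {}
    for line in text.splitlines():
        if not line.startswith(" "):
            continue  # no leading space: treated as a comment
        wrong, sep, right = line.strip().partition("->")
        if not sep or not right or "," in right:
            continue  # malformed, or more than one suggested replacement
        rules[wrong.lower()] = right
    return rules


def fix_text(text: str, rules: dict[str, str]) -> str:
    """Replace whole-word misspellings, ignoring case when matching."""
    if not rules:
        return text
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(w) for w in rules) + r")\b",
        re.IGNORECASE,
    )

    def replace(match: re.Match) -> str:
        word = match.group(0)
        fixed = rules[word.lower()]
        # Preserve a leading capital so sentence-initial words stay capitalized.
        if word[0].isupper():
            fixed = fixed[0].upper() + fixed[1:]
        return fixed

    return pattern.sub(replace, text)


# Hypothetical sample list; the first line has no leading space, so it is a comment.
sample_list = (
    "This line is a comment\n"
    " definately->definitely\n"
    " anual->annual, anal\n"
)
rules = load_simple_rules(sample_list)
fixed = fix_text("Definately not. It is definately so.", rules)
```

Restoring the leading capital is one simple way to handle misspellings at the start of a sentence; words written entirely in capitals would need extra handling.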
Valid (but sometimes obscure) words removed
I've removed the following lines:
592: calender->calendar, calender..machine
1018: cypher->cipher, cypher
1043: decypher->decipher, decypher
1044: decyphered->deciphered, decyphered
1339: encypher->encipher,encypher
1680: grat->great, grat..greet
2893: protem->protem, protein
3375: staring->staring, starring
3920: wether->weather, whether, wether..sheep
These words are valid spellings, so should not be on this list. Some are obscure and very often misused, but since machines cannot easily distinguish proper from improper usage, I feel that they should not be on this list. Lupin|talk|popups 21:48, 30 June 2006 (UTC)
- I removed colour->color because it is proper spelling for Commonwealthers. Xiong Chiamiov :: contact :: 22:02, 10 September 2006 (UTC)
- dependant->dependent could be removed too --Netol (talk) 07:40, 14 May 2015 (UTC)
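Every entry removed above lists the "misspelling" itself among its own suggested corrections, which gives a cheap way to spot similar entries. The Python sketch below checks for that pattern only. It assumes that text after ".." in a suggestion is a disambiguating note rather than part of the word, and the function name is hypothetical. It would not catch valid regional spellings such as colour or dependant, which need a dictionary or human judgment.

```python
def is_self_referential(line: str) -> bool:
    """True when the left-hand word also appears among the suggestions."""
    wrong, sep, rest = line.strip().partition("->")
    if not sep:
        return False
    for suggestion in rest.split(","):
        # Assumption: anything after ".." is a note, e.g. "calender..machine".
        word = suggestion.split("..")[0].strip()
        if word == wrong.strip():
            return True
    return False


# The entries quoted in this thread, without their list line numbers.
removed = [
    "calender->calendar, calender..machine",
    "cypher->cipher, cypher",
    "decypher->decipher, decypher",
    "decyphered->deciphered, decyphered",
    "encypher->encipher,encypher",
    "grat->great, grat..greet",
    "protem->protem, protein",
    "staring->staring, starring",
    "wether->weather, whether, wether..sheep",
]
flagged = [line for line in removed if is_self_referential(line)]
```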
Swearing?
Would it be possible to use this list to censor out swearing?
Eg. Fcuk -> F**k (Italics for obvious anti-swearing reason =D)
~ G1ggy! Reply 04:28, 29 April 2007 (UTC)
- It would be possible, but no one wants to because people like that Wikipedia is not censored. Plus, it would mess up articles like Fuck. ALTON .ıl 07:50, 3 May 2007 (UTC)
Semi-Protect?
Not to go against WP:BEANS, but if a vandal somehow found this page, the destruction would be disastrous. Do you guys think it would be appropriate to request semi-protection? --YbborTalk 03:10, 27 May 2007 (UTC)
- Never mind; I guess it's already semi-protected, just missing the icon. Are there technical restrictions on adding it? --YbborTalk 22:41, 30 May 2007 (UTC)
23/05/2008: What's with the black line in the middle of the page? —Preceding unsigned comment added by 84.216.46.35 (talk) 15:39, 23 May 2008 (UTC)
Punctuation
One very common mistake that I see on Wikipedia is that of misplacing quotation marks. For example:
Wrong: They were trained to respond "at a minutes warning".
Right: They were trained to respond "at a minutes warning."
Wrong: ...minuteman Paul Revere spread the news that "the regulars are coming", but was captured...
Right: ...minuteman Paul Revere spread the news that "the regulars are coming," but was captured...
I know this is not a "spelling error" per se, but it is a typographical error nonetheless. In addition, I cannot conceive of any situation in which the following would be grammatically correct:
", ".
That being said, so long as it would be appropriate to put certain punctuation errors on this page (ones like this one that do not have exceptions--as many do), I think this would be a perfect candidate. Is punctuation appropriate material for this page? Andrew Nutter Talk | Contribs 10:13, 23 October 2008 (UTC)
- Wikipedia:MOS#Quotation_marks may explain why this "mistake" is so common. OrangeDog (talk) 01:12, 6 January 2009 (UTC)
- Alright, then I suppose that means Wikipedia believes it should use improper grammar, in which case I choose not to start a fruitless battle over it. Andrew Nutter Talk | Contribs 00:02, 21 January 2009 (UTC)
How about:
- Jill stated that he "would give Mary white roses", but never did.Smallman12q (talk) 23:04, 1 April 2009 (UTC)
Both usages are grammatically correct, depending on which English teacher you speak to. The fact that some people are so up-in-arms about one usage or another is its own issue. --King Öomie 16:34, 22 December 2009 (UTC)
British spellings
Should despatch, despatched, ... be removed because the spelling is valid in British English? PleaseStand (talk) 00:41, 28 February 2010 (UTC)
- Yes. Someone else has apparently deleted "despatch"; I just deleted "despatched". ("Despatched" is, for example, used in "Feisty ferrets help net pests", an April 2010 article on Telegraph.co.uk, an arm of the well-established British newspaper.) --Closeapple (talk) 15:09, 16 April 2010 (UTC)
Please Add
Please add labour -> labor to your list of common spelling mistakes. Thanks Cit helper (talk) 02:18, 10 June 2010 (UTC)
- Both are acceptable (British-U.S. difference, etc.) —fetch·comms 02:44, 10 June 2010 (UTC)
What about:
behaviour -> behavior
definately -> definitely
Aluminium -> Aluminum
thanx -> thanks (slang)
raitonal -> rational
realised -> realized
{{helpme}} How do I determine whether something is a dialect difference (ex: "British-U.S. difference, etc.") or a spelling error? Does Wikipedia have a specific guideline regarding this?
Thank you very very much, I will read your suggested articles before suggesting any more! Cit helper (talk) 03:01, 10 June 2010 (UTC)
Can this be added?
Came across this mistake just a few minutes ago. Not sure if it can be added to this list or not.
payed -> played
--Hintswen Talk | Contribs 02:19, 12 August 2011 (UTC)
Markup errors
Would there be a way to also include wiki errors such as this in the list? JitteryOwl (talk) 22:39, 14 November 2012 (UTC)
Coca-Cola
What about: Coca Cola->Coca-Cola ? Sander.v.Ginkel (talk) 12:48, 1 May 2013 (UTC)
- I think not many people read this, so I added it to the list. Sander.v.Ginkel (talk) 19:33, 2 May 2013 (UTC)
Redundant with Typoscan
Much of this word list is in AWB's Typos project and its JavaScript implementation in WikEd. Perhaps the remainder of the list should be merged there and this deprecated? — Dispenser 18:22, 6 May 2013 (UTC)
Capitals
Should capital words be on there with corrections? I didn't see any.
For example,
america->America
israel->Israel
google->Google
wikipedia->Wikipedia
I would suggest synchronising it with human-readable list
There are differences, and there is no point in having two separate lists. Bulwersator (talk) 09:11, 27 December 2013 (UTC)
Semi-protected edit request on 28 April 2017
This edit request has been answered. Set the |answered= or |ans= parameter to no to reactivate your request.
Please add:
colbalt->cobalt 59.100.7.215 (talk) 07:05, 28 April 2017 (UTC)
- Not done: Where do you want this added? You'll need to be more specific in your proposed changes. Sakuura Cartelet Talk 02:18, 30 April 2017 (UTC)
Remove aka from the list
Aka, a.k.a, AKA or A.K.A are all alternative spellings of the preposition and should not be included in the list of misspellings.
aka
Wiktionary:aka
RajkGuj (talk) 07:32, 18 April 2018 (UTC)
Missing apostrophes
I'm running into an issue using Lupin's Anti-Vandal Tool spellchecker: pages get flagged for possessives without apostrophes that appear inside a web link. Unfortunately the tool only checks up to the first misspelling on the list with each page save, so you'll never get any spelling errors after the misspelling in the web link. If the list were ordered with missing-apostrophe misspellings at the end, we'd be able to catch those when they actually show up as misspellings in prose, but they wouldn't interfere with other misspellings when a missing apostrophe is in a web page link. I'm going to give this a couple of days to see if anyone has an objection before I do this. VanIsaacWScont 16:13, 18 April 2020 (UTC)
- Hearing no objection after two days, Done VanIsaacWScont 16:30, 21 April 2020 (UTC)
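The reordering described above can be sketched as a stable sort that moves entries whose correction contains an apostrophe to the end and leaves everything else in its existing order. This is an illustration in Python, not the procedure actually used for the edit. The function name is hypothetical, and the "wasnt" entry in the sample is invented for the example.

```python
def apostrophes_last(lines: list[str]) -> list[str]:
    """Move entries whose correction contains an apostrophe to the end."""

    def is_apostrophe_fix(line: str) -> bool:
        _, sep, right = line.partition("->")
        return bool(sep) and "'" in right

    # sorted() is stable and False sorts before True, so the relative order
    # inside each of the two groups is preserved.
    return sorted(lines, key=is_apostrophe_fix)


# Hypothetical sample entries in list format (leading space included).
sample = [
    " didnt->didn't",
    " accomadate->accommodate",
    " wasnt->wasn't",
    " definately->definitely",
]
reordered = apostrophes_last(sample)
```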
Compress
The list is bulkier than it needs to be. For example, a single "ammend"->"amend" rule can correct the misspelled forms of:
amend, amended, amends, and amendment
There are several other examples. Can someone fix the "a", "b", "c", and all other lists?
Write your username here, and the letter you'll focus on. That way, we won't see duplicate changes. Aera23 (talk) 08:20, 19 May 2020 (UTC)
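One way a tool could apply a single stem entry to all of its inflected forms is to match the misspelled stem at the start of a word and carry over whatever follows it. The Python sketch below illustrates the idea. `fix_stem` is a hypothetical helper, and the match is case-sensitive to keep the example short.

```python
import re


def fix_stem(text: str, wrong_stem: str, right_stem: str) -> str:
    """Replace a misspelled stem at the start of a word, keeping any suffix."""
    pattern = re.compile(r"\b" + re.escape(wrong_stem) + r"(\w*)")
    return pattern.sub(lambda m: right_stem + m.group(1), text)


fixed = fix_stem(
    "They ammended the ammendment and it ammends nothing.",
    "ammend",
    "amend",
)
```

The trade-off is that stem matching can also rewrite valid words that happen to begin with the same letters, so each compressed entry would need checking before it replaces the explicit forms.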
glamourous->glamorous should be removed
Both are valid spellings. Not sure why it's added in this. Can we remove it? Amazingcaptain (talk) 18:39, 21 May 2020 (UTC)
Section on text editor integrations?
For example, I've written a plugin for Emacs based on Mickey Peterson's blog post on using this list to automatically correct spelling. Other text editor integration might also exist.
It's also totally possible that this is outside the scope of this page, which I understand. ~ acdw (talk) 14:33, 12 July 2022 (UTC)
[[{{Backwardscopyvio }]]} — Preceding unsigned comment added by 177.227.25.186 (talk) 17:38, 21 May 2023 (UTC)