Slashdot Mirror


English Wikipedia Gets Two Millionth Article

reybrujo writes to inform us of a milestone for the English-language Wikipedia: the posting of its two millionth article. At the time of this posting there is uncertainty over which article achieved the milestone. "Initial reports stated that the two millionth article written was El Hormiguero, which covers a Spanish TV comedy show. Later review of this information found that this article was most likely not the two millionth; instead, a revised list of articles created around the two million mark has been generated, and is believed to be correct to within 3 articles. The Wikimedia Foundation, which operates the site, is expected to make an announcement with a final decision, which may require review of the official servers' logs."

38 of 125 comments

  1. Likely a lot more than 2 million by suso · · Score: 4, Informative

    Mediawiki doesn't count all articles in its article count. And I'm not talking about talk or image pages either. I think it has a threshold of like 72 bytes before it counts an article as an article. So they are most likely way over 2 million. For instance, Bloomingpedia actually has 2,148 articles right now but the Mediawiki count on the front page only shows 2,106. So 42 of the articles are smaller than the threshold.

    However, if they (or anyone else) need a plugin for Mediawiki that will list the pages in order so that you can count them and determine which article was the Nth article, I wrote a plugin called Page Create Order that will put a special page called "List Pages By Creation Date" in your wiki. We developed it for Bloomingpedia originally. It's simple, but it does the job. It could easily be modified to count only articles of a certain size as well; the main purpose of this plugin is to see the order in which pages were created.
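
    The counting idea above can be sketched in a few lines. This is only an illustration, not the Page Create Order plugin itself: the Page record, the sample timestamps, and the min_bytes parameter are all hypothetical, and the real article-count rule in Mediawiki may differ from a plain size cut-off.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass(frozen=True)
class Page:
    """Hypothetical page record; not a Mediawiki data structure."""
    title: str
    created: datetime
    size_bytes: int


def nth_article(pages: Iterable[Page], n: int, min_bytes: int = 0) -> Optional[Page]:
    """Return the n-th (1-based) page in creation order, counting only
    pages of at least min_bytes; return None if there are fewer than n."""
    counted = sorted(
        (p for p in pages if p.size_bytes >= min_bytes),
        key=lambda p: p.created,
    )
    return counted[n - 1] if 1 <= n <= len(counted) else None


# Made-up sample data, listed out of creation order on purpose.
pages = [
    Page("Stub", datetime(2007, 9, 9, 4, 0), 40),
    Page("First", datetime(2007, 9, 9, 3, 0), 500),
    Page("Second", datetime(2007, 9, 9, 5, 0), 900),
]
```

    With no size filter the second page created is "Stub"; with a cut-off of 72 bytes the stub is skipped and "Second" becomes article number two, which is why the choice of threshold changes which page hits a milestone.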

    1. Re:Likely a lot more than 2 million by IBBoard · · Score: 2, Informative

      That depends on the encoding - either 72 characters in ASCII or UTF-8 or 36 characters if they go for the more multi-lingual friendly UTF-16.

      Either way, something about that length is likely to be a stub and not a 'real' article.
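
      A quick way to check the bytes-versus-characters arithmetic, as a rough sketch: the helper name encoded_size is made up for illustration, and "utf-16-le" is used so that no byte-order mark is added to the count.

```python
def encoded_size(text: str, encoding: str) -> int:
    """Number of bytes the text occupies in the given encoding."""
    return len(text.encode(encoding))


ascii_72 = "a" * 72        # 72 plain ASCII characters
ascii_36 = "a" * 36        # 36 plain ASCII characters
accented_36 = "\u00e9" * 36  # 36 copies of e-acute, a non-ASCII character

sizes = {
    "72 ASCII chars, UTF-8": encoded_size(ascii_72, "utf-8"),
    "36 ASCII chars, UTF-16": encoded_size(ascii_36, "utf-16-le"),
    "36 accented chars, UTF-8": encoded_size(accented_36, "utf-8"),
    "36 accented chars, UTF-16": encoded_size(accented_36, "utf-16-le"),
}

for label, size in sizes.items():
    print(f"{label}: {size} bytes")
```

      All four cases come out at 72 bytes, so "72 characters in UTF-8" only holds for ASCII text: characters outside ASCII take two or more bytes each in UTF-8.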

    2. Re:Likely a lot more than 2 million by adatepej · · Score: 2, Insightful

      There's a reason for only counting pages above a certain size as "articles": a heading and a sentence do not maketh a proper wikipedia article.

    3. Re:Likely a lot more than 2 million by KiloByte · · Score: 4, Insightful

      That depends on the encoding - either 72 characters in ASCII or UTF-8 or 36 characters if they go for the more multi-lingual friendly UTF-16.

      UTF-16 more multi-lingual friendly than UTF-8? Er... it has many disadvantages and not a single benefit over UTF-8.

      For example, UTF-16 needs a lot of porting effort, while UTF-8 magically works in all 8-bit-clean programs that don't need to count codepoints or tell character properties (and hey, bytes happen to _be_ 8 bits wide, so unless you do something strange, you are 8-bit-clean). Most English-speaking developers won't put in this effort, so there goes your multi-lingual friendliness.

      Or another, more insidious flaw of UTF-16: it gives people a false feeling that they can store an entire character in a single array position. This works... as long as you don't meet any character over U+FFFF (rare Han[1], etc) or characters which need to be written using a base char + combining characters (Indic scripts, etc). UTF-8 makes no such promises, and thus doesn't lead to such non-obvious bugs.
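
      Both failure cases can be shown with a small sketch. The helper utf16_code_units is a made-up name, and the two sample strings are just illustrative picks: U+20000 is a Han character outside the 16-bit range, and "e" plus U+0301 is a base letter with a combining accent.

```python
import unicodedata


def utf16_code_units(text: str) -> int:
    """Number of 16-bit units needed to store the text in UTF-16
    (little-endian, so no byte-order mark is counted)."""
    return len(text.encode("utf-16-le")) // 2


rare_han = "\U00020000"  # a CJK character above U+FFFF
combined = "e\u0301"     # 'e' followed by a combining acute accent

# One code point, but two UTF-16 array slots (a surrogate pair).
print(len(rare_han), utf16_code_units(rare_han))

# One visible character, but two code points and two UTF-16 slots.
print(len(combined), utf16_code_units(combined))

# This particular pair has a precomposed form, so NFC folds it into one
# code point; many combining sequences have no such single-code-point form.
composed = unicodedata.normalize("NFC", combined)
print(len(composed), utf16_code_units(composed))
```

      In neither case does "one array position per character" hold, which is the false promise described above.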

      UTF-16 is an abomination that needs to go. Unfortunately, it's entrenched in Windows API: you need to use BlueScreenW() instead of BlueScreenA() everywhere, and this is something people who don't need internationalization don't want to do. Even as of Vista, Microsoft still doesn't allow simply setting the system's code page to UTF-8, something which the whole Unix world[2] did years ago.

      [1]. And according to People's Murderous Commiepublic of China's laws, you need to support these (as GB18030) in any product sold in mainland China. Of course, they don't give a damn about that law unless they want to demand a favour from a company, so that they have yet another stick of non-compliance.

      [2]. All non-toy distros do this by default, and if not for a few whiners, non-UTF8 locales would probably be dropped by now.

      --
      The creatures outside looked from Alt-Right to Antifa; but already it was impossible to say which was which.
  2. That was quick by micpp · · Score: 4, Funny

    And people have already tried to delete the article for not being notable.

    1. Re:That was quick by WWWWolf · · Score: 3, Informative

      Can you be notable for being not-notable? Or famous simply for being famous? ... Before you answer "no" think of celebrities like Paris Hilton...

      Basically, the situation is this: Notability has its thresholds - either you are notable or not (though where exactly to draw the line is, at times, difficult - but we have a pretty clear picture by now). Articles about people, bands, groups, companies, websites, etc. have to have assertions of notability (i.e. "they're really big in Pakistan and have released three albums", or whatever). Notability has to be backed up by reliable sources.

      This leads to the situation that 1) people who are famous for failing at something can be considered notable enough for articles of their own (provided someone noticed and documented that in a reliable source), and 2) worthless celebrities are, alas, notable enough for articles because they probably have had verifiable media appearances.

      (Think of it this way: if I had not heard about Paris Hilton before, I'd go to the article, come to the conclusion that she's a worthless celebrity, and be done with it. If there were no articles about her, I'd probably ask "hey, this... thing is on TV all the time, what the heck has she done to get there, anyway, and why isn't there an article about her?" =)

  3. Confusion? by niceone · · Score: 2, Funny

    Can't they just check Wikipedia?

  4. The millionth by 4D6963 · · Score: 2, Funny

    Which was the millionth article then? Not that it really matters, just being curious, cause I'm like, bored..

    --
    You just got troll'd!
    1. Re:The millionth by Hachey · · Score: 2, Informative

      The 1 millionth article was Jordanhill Railway Station. Ironically, the 2 millionth article was almost a train station as well, this time just outside of Tokyo.

      --
      Please allow me to hate the creator of the 120-character limit: *HATES*. Thank you.
  5. Is it so important? by El+Lobo · · Score: 3, Insightful
    And why, oh why, is it always so important to know exactly which article was Nr X, which poster was the first one, which was the first child born in the new millennium, how many times did Al Pacino say "fuck" in Scarface and so on?...

    Do we have so few problems that we have the need to statistically know EVERYTHING? Does that matter (other than to inflate the vanity of a few)?

    --
    It's time to realise that Abble's products are the biggest abomination these days. Just say NO to the dumb iAbble way!!
    1. Re:Is it so important? by daeg · · Score: 3, Funny

      Just so you know, you're the 8th person bitching about this, and the 5th since the turn of the hour's 22nd minute, with a very high probability that future posters will bitch about it too, and will bitch about it at the 2.5 million mark, too, and the 5 million.

    2. Re:Is it so important? by Rik+Sweeney · · Score: 2, Funny

      how many times did Al Pacino say "fuck" in Scarface

      It's 207 in case anyone's interested.

  6. What I love about Wikipedia.... by Demerara · · Score: 4, Funny

    ...is their commitment to stating the obvious. At length...

    The 2,000,000 article is actually the last article to be part of the first 2,000,000 articles and the 2,000,001 is the first of the third million.

    I'm glad they cleared that up - I wondered whether the 2,000,000 article might be actually the one millionth or perhaps the 4 millionth....

    --
    Backward compatibility is over-rated
    1. Re:What I love about Wikipedia.... by fractoid · · Score: 2, Funny

      You're saying there isn't a zeroth article?

      Oh and wow, the Firefox spell checker thinks 'zeroth' is a word. Score one for Asimov (or did he not coin it? Whoever it was then, colour me curious!)

      --
      Rampant carbon sequestration destroyed the Dinosaurs' tropical paradise. I'm here to help repair the damage.
    2. Re:What I love about Wikipedia.... by everphilski · · Score: 2, Funny

      or did he not coin it? Whoever it was then, colour me curious!

      maybe a c++ programmer? :)

    3. Re:What I love about Wikipedia.... by fm6 · · Score: 2, Interesting

      In computing, zero has always been a valid index, and often makes more sense as the lower bound than 1. For example, if you have a multidimensional array stored contiguously, it's easier to calculate the memory location holding a given element if the array's lower bounds are 0.

      So "zeroth" is a perfectly good word, and Asimov (who really didn't understand computers all that well) probably didn't coin it.

      I once had a CS professor who insisted that his students number the sections in their papers from 0 instead of 1!
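
      The point about lower bounds of 0 can be made concrete with a small sketch of row-major offset arithmetic. The function names are made up for illustration, and row-major layout is an assumption; a column-major layout would walk the dimensions in the opposite order.

```python
from typing import Sequence


def offset_zero_based(index: Sequence[int], dims: Sequence[int]) -> int:
    """Flat offset of an element in a row-major array whose indices
    all start at 0. Pure multiply-and-add, one step per dimension."""
    offset = 0
    for i, n in zip(index, dims):
        if not 0 <= i < n:
            raise IndexError(f"index {i} out of range for dimension of size {n}")
        offset = offset * n + i
    return offset


def offset_with_lower_bounds(
    index: Sequence[int], dims: Sequence[int], lower: Sequence[int]
) -> int:
    """Same calculation for arbitrary lower bounds: every index needs an
    extra subtraction before the 0-based formula applies."""
    shifted = [i - lo for i, lo in zip(index, lower)]
    return offset_zero_based(shifted, dims)


# The same element of a 3x4 array, addressed with 0-based and 1-based indices.
print(offset_zero_based((1, 2), (3, 4)))
print(offset_with_lower_bounds((2, 3), (3, 4), (1, 1)))
```

      Both calls give the same offset; the 1-based version simply pays for one more subtraction per dimension on every access.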

    4. Re:What I love about Wikipedia.... by fm6 · · Score: 2, Interesting

      Your complaint about Wikipedia is a special case of my #1 complaint about Wikipedia. Which is that its content mostly lacks focus. I write technical documents for a living, and in my job it's important to structure content carefully and only put in the facts that your readers are likely to need. (The most difficult and most enjoyable aspect of my work.) Because nobody "owns" a given article, it's impossible to impose this kind of discipline on Wikipedia. To my mind, that's the biggest drawback to editing reference material on a Wiki, and a fatal flaw in the Wikipedia concept.

      Don't get me wrong. I like Wikis (I manage my department's TWiki) and I like the idea of "open-source" documentation. But the two just don't go together. Open Source allows its developers a maximum of freedom, but every good OSS project has a code nazi who makes sure that only code that actually enhances the product gets integrated. I'm reminded of that Heinlein character who said his household was a combination of fascism and anarchy, with no trace of democracy. Wikipedia has the anarchy part down. And, despite what Colbert says, it's not at all democratic. But a Wiki is incompatible with fascism.

      I often refer to Wikipedia (always with an eye to guessing what's serious content and what's some idiot's ramblings) but I never enjoy reading it. I'm enough of a dweeb to enjoy reading a real encyclopedia, which is what Wikipedia will never be.

  7. It would be interesting to know by opusman · · Score: 2, Insightful

    It would be interesting to know how many "real" articles there are. That is, if you took out the individual articles for all the fictional sci-fi characters that wikipedia seems to excel at, all the articles for individual episodes of Star Trek and Dr Who, basically all the meaningless cruft that nerds deem important - then, count how many articles there are. Far, FAR less than 2 million, I would expect.

    1. Re:It would be interesting to know by jollyreaper · · Score: 4, Insightful

      It would be interesting to know how many "real" articles there are. That is, if you took out the individual articles for all the fictional sci-fi characters that wikipedia seems to excel at, all the articles for individual episodes of Star Trek and Dr Who, basically all the meaningless cruft that nerds deem important - then, count how many articles there are. Far, FAR less than 2 million, I would expect.

      I would agree that there's no place for that sort of thing in a paper encyclopedia, there's just not enough room. If you want geek stuff, you have to buy those books separately. But wiki has no practical limitation, it can grow to be however big it needs to. So long as the information is well-written, what does it matter? The important matter is indexing the information. Without a good index, I could certainly see your point, the practical information could be lost amongst the impractical. But wiki has a good manual index and google automatically indexes the shit out of the site. So what's the rub?
      --
      Kwisatz Haderach
      Sell the spice to CHOAM
      This Mahdi took Shaddam's Throne
    2. Re:It would be interesting to know by dbolger · · Score: 2, Insightful

      You are being a bit closed minded there. What gives you the right to determine what is "real" or "important"? I'm not saying I entirely disagree with your view on the value of such things, but your argument could really be turned on its head for any point of view. Replace the Slashdot POV with Entertainment Weekly and we get something that is just as valid as your argument:

      'It would be interesting to know how many "real" articles there are. That is, if you took out the individual articles for all the boring scientific rubbish that wikipedia seems to excel at, all the articles for individual chemicals or compounds, basically all the meaningless cruft that nerds deem important'

      I wouldn't deny a peroxide-addled nitwit their juicy celebrity gossip any more than I would deny a geek his in-depth biography of Wolverine, or a nerd his scientific definitions. Just because it is unimportant to you or me does not mean that it is without merit to somebody.

    3. Re:It would be interesting to know by opusman · · Score: 2, Insightful

      The rub is that Wikipedia presents itself as a "real" encyclopedia, when it clearly isn't. If they didn't make such an issue out of the whole "notability" thing it wouldn't be so bad - as it is, it really looks like hypocrisy. I've got nothing against having all those articles up there - I've read a few of them myself. But wikipedia is presented to the world as a real encyclopedia, with high standards to match (e.g. the "accuracy competition" with Britannica) - and yet the vast majority of its material does not relate to anything real or important by any stretch of the (non-geek) imagination. When 50% of Britannica is composed of biographies of Captain Janeway and Buffy Summers then Wikipedia will be able to count itself as a real encyclopedia, but not before.

      (Just my own opinion of course, feel free to disagree)

    4. Re:It would be interesting to know by JesseMcDonald · · Score: 4, Insightful

      It seems to me (and apparently the GP as well) that you're criticizing Wikipedia for not having the same limitations as a paper encyclopedia. Who cares what proportion of the articles fall into some niche category, as long as one can still easily find all the information one is looking for? The simple fact that a physical encyclopedia has limited storage space and thus cannot contain in-depth articles on every little special-interest detail does not appear to me to somehow constitute an advantage for physical encyclopedias.

      Or were you perhaps simply protesting the direct comparison of article counts between Wikipedia and Britannica? That I could understand, since the comparison could hardly be fair. Their requirements are simply too different for any direct quantitative comparison to be meaningful.

      --
      "The state is that great fiction by which everyone tries to live at the expense of everyone else." - Bastiat
    5. Re:It would be interesting to know by Carnildo · · Score: 2, Informative

      By your definition, Wikipedia has somewhere between 1,500,000 articles (discarding *all* articles about popular culture) and 1,900,000 articles (discarding just the things you consider "cruft"). The largest group of articles are biographies (30% of the encyclopedia), followed by articles on places (25%), popular culture (25%), and history (10%).

      --
      "They redundantly repeated themselves over and over again incessantly without end ad infinitum" -- ibid.
  8. Spanglish Wiki? by Mr.+Underbridge · · Score: 4, Funny

    "...a milestone for the English-language Wikipedia:" ..."Initial reports stated that the two millionth article written was El Hormiguero, which covers a Spanish TV comedy show."

    Wow, that's ironical.

  9. Re:Just one question by Da+Fokka · · Score: 4, Insightful

    You can't quote a microwave in a college paper either, but it's certainly useful.

    But seriously, not every source has to be academic to be of use. For many subjects, wikipedia is an excellent starting point. You might want to take lemmata on controversial subjects like Palestine and evolution with a grain of salt, but for many a subject the articles on wikipedia are of excellent quality.

  10. Re:Just one question by Notquitecajun · · Score: 4, Insightful

    You have two exclusive statements...one which makes sense, the other which doesn't.

    Who cares? I mean honestly, who does?

    In the long run, this is quite a minor historical marker. We're going to see article 5 million and MAYBE that will matter a little more. Maybe.

    You can't even quote Wikipedia on a college paper, so why should anyone be using it

    Correct - it's rather dumb to use it on a college paper (like using a regular paper encyclopedia); however, Wikipedia is the fastest starting point and is a good medium on not only specific information on subjects and sources, but also on the opinions of people with education, expertise, and bias on their subjects. If you dig into some controversial topics' histories, there is actually some VERY good information to wade through and find sources on. The end result is not perfect, the system IS flawed, but the information that you can glean from digging and researching STARTING at Wikipedia is quite useful.

    Plus, the specialized wikis that are popping up and using wiki-style management for their small wikis (where REAL experts can actually post) may be the bigger genius behind wikipedia.

    If your complaint about wikipedia is that the final articles are flawed, you're right...but look at the process behind some of those articles and the histories. Dig into that, and you find what you need.

  11. Wikipedia thrives on controversial subjects by tucuxi · · Score: 3, Insightful

    Because they draw people to try to reflect their points of view; and when you read the article (say, abortion or evolution or software patents) you can gain a quick overview on almost any significant point of view on the subject, and how they relate to each other. Yes, individual viewpoints may not be perfectly reflected. But you *do* gain an incredibly broad view, which no traditional encyclopedia can deliver.

    Wikipedia is much more likely to be useful on a controversial subject where people feel inclined to participate (and correct or refactor partisan views) than on non-controversial subjects that don't scratch anybody's itch. You need to cross a certain threshold in order to contribute to an article. Articles that aren't important to you, you simply will not edit. Articles that are edited by many may not gain "quality", but will become very broad, and better starting points for further research than those that are only edited by a few not-that-motivated users.

  12. How many articles do other encyclopedias have? by Refried+Beans · · Score: 2, Interesting

    Two million does sound impressive. Congratulations, Wikipedia. But how does this compare to other encyclopedias? Does anyone have numbers for Britannica or World Book?

    1. Re:How many articles do other encyclopedias have? by LiquidCoooled · · Score: 3, Funny

      Actually, wikipedia can answer that (though I don't know how accurate it is):

      The size of the Britannica has remained roughly constant over the past 70 years, with about 40 million words on half a million topics.

      http://en.wikipedia.org/wiki/Encyclop%C3%A6dia_Britannica

      --
      liqbase :: faster than paper
  13. and then of course by sdedeo · · Score: 2, Interesting

    Nominated for deletion, amusingly enough.

    It was "speedy kept", but amusing that a stratified sample shows not only that wikipedia is filling these days with trivia, but also bureaucracy.

    (Yes, I have a bee in my bonnet about wikipedia even though I love it -- see my sig.)

    --
    Protect your liberties. Donate to the ACLU
  14. You can help review new articles by ajs · · Score: 2, Informative
    If you would like to help review newly created articles, just follow this URL:

    http://en.wikipedia.org/wiki/Special:Newpages

    This will take you to the list of the most recently created articles. If you find that you have trouble keeping up with other editors who are reviewing the same articles, you might find this link useful:

    http://en.wikipedia.org/w/index.php?title=Special:Newpages&limit=250&offset=250&namespace=0

    Which will take you to the same list, but starting from the 250th most recent article.
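
    The second link differs from the first only in its query string, so later pages of the list can be generated rather than typed. A rough sketch, using only the parameter names that appear in the URL above (title, limit, offset, namespace); the function name newpages_url is made up, and the meaning of each parameter is inferred from that URL rather than from any Mediawiki documentation:

```python
from urllib.parse import urlencode

BASE = "http://en.wikipedia.org/w/index.php"


def newpages_url(limit: int = 250, offset: int = 0, namespace: int = 0) -> str:
    """Build a Special:Newpages URL showing `limit` entries, skipping the
    `offset` most recent ones. safe=":" keeps the colon in the page title
    unescaped so the result matches the hand-written link."""
    query = urlencode(
        {
            "title": "Special:Newpages",
            "limit": limit,
            "offset": offset,
            "namespace": namespace,
        },
        safe=":",
    )
    return f"{BASE}?{query}"


# Three consecutive pages of 250 entries each.
for page in range(3):
    print(newpages_url(offset=250 * page))
```

    The call with offset=250 reproduces the link given above; raising the offset in steps of 250 walks further back through the list.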

    Typically, it's most useful to

    Anyone can do these things, and you can also just improve on any article by adding additional sources, or expanding on the article.
  15. Re:Just one question by h2g2bob · · Score: 3, Interesting
    Clicking the cite this page link on any page will tell you:

    IMPORTANT NOTE: Most educators and professionals do not consider it appropriate to use tertiary sources such as encyclopedias as a sole source for any information -- citing an encyclopedia as an important reference in footnotes or bibliographies may result in censure or a failing grade. Wikipedia articles should be used for background information, as a reference for correct terminology and search terms, and as a starting point for further research.

    As with any community-built reference, there is a possibility for error in Wikipedia's content -- please check your facts against multiple sources and read our disclaimers for more information.
  16. Yeah, but hasn't Wikipedia jumped the shark? by Medievalist · · Score: 5, Insightful

    I know a few retired rocket scientists. I'd love it if their unique knowledge didn't go to the grave with them. I'd rather be able to look up the definition of a "yardley" as a unit of pressure than a list of characters from Harry Potter. Unfortunately, wikipedia doesn't seem to be interested in anything that's "from personal knowledge or experience" these days.

    If wikipedia is only going to allow references to things already published elsewhere, and all written culture is inevitably moving online, how will wikipedia differentiate from Google? I mean, if there's no unique information in wikipedia, there's very little unique value in it. It's just a really labor-intensive presentation layer at that point, isn't it?

  17. Re:Just one question by jandrese · · Score: 3, Interesting

    Except for that Han Chauvinism article and some parts of the Islamophobia article (which was a complete mess), all of the articles you quoted look like a pretty neutral starting point for someone trying to learn about them for the first time. They cited lots of sources that a reader can go to for additional research and for the most part kept a neutral point of view. I'd wager that you'd have a tough time finding a more balanced approach to some of these topics, Islamophobia and Afrocentrism especially, from any other source. The kind of people who coin terms like that are generally less interested in neutrality than Wikipedia is.

    --

    I read the internet for the articles.
  18. Re:Just one question by carpe_noctem · · Score: 3, Insightful

    Whoever said anything about quoting wikipedia itself? I would say it is of far greater use for research papers in that you can get a good overview of a subject, and then use the citations of said article to find other, lengthier papers more suitable for academia.

    Wikipedia is a research tool, not the swiss army knife of research.

    --
    "Quoting famous computer scientists out of context is the root of all evil (or at least most of it) in programming." - K
  19. Re:Yeah, but hasn't Wikipedia jumped the shark? by Taxman415a · · Score: 5, Informative

    Wikipedia has never been interested in unique information. One of the first policies was the one against original research. That certainly doesn't mean there isn't a place for original research (those are plentiful), nor does it mean Wikipedia isn't valuable. By collating and linking vast amounts of information, Wikipedia does something google can't. It creates the presentation of the information manually. Google can only index content that is already there through an algorithm. And for a long time if not forever, there will be information that is not online. Further, Wikipedia summarizes information like Google will likely never be able to. Even if a Wikipedia article is not all right, it can give you an idea of where to go look and what to look for, which is perhaps its only truly valuable contribution until there is a way to formally peer review and freeze content so that the reader can see a version that is stabilized.

  20. Re:Research isn't what I'm talking about. by Taxman415a · · Score: 2, Informative

    Well, original research just happens to be the name of the policy, but it covers all unpublished ideas and thought. And what I was saying is that Wikipedia intentionally avoids that type of thing as a necessary evil to maintain improvement in quality. Otherwise you either need a power structure that can say yea or nay on content or you open floodgates to all the latest crackpot theories and information. You have to spend enough time on the project to realize there isn't an in between. And again, it's not like there aren't lots of other sources for publishing that other valuable non published information. That's what post-docs are for right? :)

    A manual presentation layer. I'm content-driven, personally, a slick presentation does not increase my perception of the value of information.
    - Everybody says that, but studies show time and time again that the way information is presented has drastic effects on how much information gets across and how it is perceived. Next you're going to tell us ads don't affect you.

    Right, so it's an automatic (and thus more up-to-date) presentation layer, which carries quantifiable and repeatable bias by virtue of being algorithmic.
    - What you're missing here is that google indexes links to information, it does not summarize the actual information as Wikipedia does. Even if the information you wanted was always in a google search, you still then have to collate it and judge sources, etc. Also quality information is not all or perhaps even mostly online right now. The work of summarizing the information is valuable, and if it is already done for you, it can get you further ahead on the task at hand.

    Why should a wiki be "stabilized"? Why is "formality" a virtue when wikipedia was created and gained value from non-conformance to traditional models?
    - Because the real goal is information quality. Demonstrable quality in a way useful to the reader/researcher. The non-conforming, radically open current system has been shown to be successful in producing content, a smaller portion of it of reasonably high quality. But studies and observation of Wikipedia show that it has extremely high variation in quality, from articles replaced with "YO MAMA SO PHAT..." to widely reviewed articles citing and properly summarizing all the best written material on the subject. Formal peer review can lead to higher information quality and, if that reviewed version is available as an option, default or not, can allow the best of both worlds (like the Linux kernel and most other software). Then there can be both a radically open article that may be more up to date, balanced, etc, and a stable version that is at least guaranteed not to be vandalized. The amount of stabilization could be as little as that or as much as the formally reviewed case, or both. Thus the best of both worlds: content is produced, high quality content is available, and the review processes can be demonstrated.

  21. Re:Research isn't what I'm talking about. by nothings · · Score: 2, Informative
    In order to deal with the very real threat of vandalism (let's not pretend it wasn't vandalism that sparked the changes in how wikipedia runs)

    No, the "no original research" rule was instituted to deal with physics crackpots. This is documented on wikipedia itself if you actually delve into the pages about the rule.

    There is no good way for wikipedia to differentiate between the personal experiences or knowledge of a 73-year-old rocket scientist wunderkind, a crackpot writing stuff in his garage, or a published scientist dabbling poorly outside his actual area of expertise. So wikipedia just disallows that sort of thing entirely, and draws instead on the difficulty in those people publishing their work in peer-reviewed journals or mainstream publications by setting thresholds in that direction.

    And it's not wikipedia's fault if the knowledge of a 73-year-old-Jim-Yardley knower isn't preserved. Anecdotes and anything else from him can be written down on any web page and preserved for posterity that way. (And if they get media attention because they're not crackpottery, they may make it into wikipedia someday.)

    The goal of preserving absolutely everything known by every human, but only the good stuff, is unsatisfiable, and wikipedia aims on the extremely conservative side of the problem. It may not seem like that with all the pop culture crap to be found there, but wikipedia isn't a single coherent entity, it's a teeming mass of random people following the rules to varying degrees of accuracy and with no consistency at all. Somehow people care more about following the rules when it comes to rocket science than when it comes to character summaries of last year's big TV show. And isn't that awesome?