Slashdot Mirror


User: Jamesday

Jamesday's activity in the archive.

Stories
0
Comments
325
First seen
Last seen
Profile
(view on slashdot.org)

Comments · 325

  1. Re:kwiki vs mediawiki vs twiki vs.... on Microsoft Releases FlexWiki as Open Source · · Score: 3, Informative

    I'm one of the Wikipedia team (looking after the database servers mostly) so I can't comment on the others, but MediaWiki has these properties which make it interesting:

    - Very widely used and understood (Wikipedia and many other places).
    - Uses normal words, not CamelCase, for links.
    - Supports most human languages (a broad range is in use at Wikipedia).
    - Current version supports MySQL as the database; the next version is expected to support PostgreSQL as well.
    - GPL license, PHP (including safe mode PHP).
    - Extreme caching to help scalability (Squid and Memcached for that, also a file-based cache).
    - Supports limited database load sharing in the current version, more later. Wikipedia uses a master and two slaves for normal work, a couple of slaves for backup only.
    - Proved able to scale (so far) to a top 400 web site using 5 Squid caches, 15 web servers. Three machines were enough for top 1200 or so - equipment needs start to rise very rapidly as you get into the top 1,000.

    It'll do more, we just haven't proved it in practice yet.
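    The master-plus-slaves arrangement mentioned above can be pictured with a small sketch. This is not MediaWiki's actual load-sharing code; the class name and the server names "master", "slave1" and "slave2" are made up for illustration. It only assumes the usual replication pattern: writes go to the master, reads are spread across the slaves.

```python
import itertools


class ReplicatedRouter:
    """Hypothetical router: writes to the master, reads round-robin over slaves."""

    def __init__(self, master, slaves):
        self.master = master
        # With no slaves configured, reads fall back to the master.
        self._read_cycle = itertools.cycle(slaves or [master])

    def pick(self, sql):
        """Return the name of the server that should run this statement."""
        parts = sql.split(None, 1)
        verb = parts[0].upper() if parts else ""
        if verb == "SELECT":
            return next(self._read_cycle)
        # Anything that may modify data goes to the master.
        return self.master


def make_router():
    # Server names are placeholders, not real Wikipedia hosts.
    return ReplicatedRouter("master", ["slave1", "slave2"])
```

    A real setup also has to cope with replication lag, which this sketch ignores.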

  2. Re:Cite specific revision of article on Wikipedia Founder Jimmy Wales Responds · · Score: 1

    Right. Known limitation. It's on the to do list. The current workaround is to check the previous revision and link to that if it'll do the job.

  3. Re:Synchronizing different language versions on Wikipedia Founder Jimmy Wales Responds · · Score: 1

    There are people and tools which add links to the same article in different languages. There are some tools which help people to add those links.

    The content of the articles is decided by humans and the articles often have different content. If you want to choose a language to contribute in, I recommend Dutch. The encyclopedia in Dutch is currently smaller than the English or German version, so your contributions will make a greater difference in the Dutch language version.

  4. Re:sources on Wikipedia Founder Jimmy Wales Responds · · Score: 1

    The programmers here might want to look at quicksort. A fair range of sources there. The article is largely written by professionals in the field, myself included.

    What the Wikipedia doesn't do, at least not yet, is encourage people who contribute to or review articles to indicate their professional standing and version reviewed somewhere. If anyone wants to start doing that, just go ahead and say so on the talk page. If it becomes popular we'll work out something more formal.

  5. Re:1.0 release hardcopy? on Wikipedia Founder Jimmy Wales Responds · · Score: 1

    One of the very well known Linux flavors has been talking about releasing a CD copy. Still in the early stages - too early to do more than say that it's being considered. If anyone wants to release CD or DVD copies, go right ahead. Ask for assistance if necessary.

    Remember to consider whether you might have liability as publisher, though. The Wikipedia is protected soundly by the Communications Decency Act and DMCA/OCILLA in the online version. Lack of liability for non-online sources isn't so certain. There are bound to be some copyright infringements somewhere in the Wikipedia, even though nobody wants them there and any which are noticed are taken care of. Anyone doing it commercially should probably ensure that they have publisher's insurance.

  6. Re:1.0 release hardcopy? on Wikipedia Founder Jimmy Wales Responds · · Score: 1

    Please write a page on meta.wikipedia.org saying how you do it for your laptop. Then post a reply here which points to that. Or just reply here if you prefer.

  7. Re:Backups on Wikipedia Founder Jimmy Wales Responds · · Score: 1

    There has been some discussion of that from time to time, most recently last week with someone from a project which is building releases for portable devices. If someone wants to do it, go right ahead and do it. There's talk about hosting it on the Wikimedia servers from time to time but security is a concern.

    It's in no way an official answer but I expect that it'll happen, one way or another, before the end of the year.

  8. Re:Yeah... and? on Oxford Students Hack University Network · · Score: 1

    Copying the CD from the neighbor in the US isn't copyright infringement either, if it's done using either analog or digital audio recording media. That's completely lawful. Do be sure that you use music CDs, not computer CDs.

  9. Re:The problem is adding many different jurisdicti on CeCILL: La Licence Francaise Du Logiciel Libre · · Score: 1

    Which part of CeCILL allows a program written originally and released with only the CeCILL license (no GPL component reuse at all) to be included within a GPL program?

    That appears to be absent.

  10. Re:Funding - situation, what we spent the money on on Wikipedia Hits 300,000 Articles · · Score: 1

    That should have been "James.... No /. account". :) Wasn't Mav who wrote it. :) But I should have created this account before posting, so I'd have a score 5 post to my credit. :)

  11. Dual Licensing, CC and GFDL on Wikipedia Hits 300,000 Articles · · Score: 1

    That question of dual licensing CC-ShareAlike and GFDL is something which Jimmy Wales of Wikipedia and Lawrence Lessig of Creative Commons were scheduled to discuss at a meeting in Germany a few weeks ago. Jimmy Wales is still on or just finishing a vacation, so we don't yet know the results of that discussion.

    There are various views on the practicality of adding a CC-ShareAlike license to existing articles when the original contributors weren't necessarily aware of the possibility or may object. How to do it properly (without annoying people or infringing licenses) is an interesting problem to solve. Adding CC-ShareAlike for new contributions is relatively easy but will probably annoy those of the "GPL/GFDL is the only right way" faith. Those who want to share the knowledge will probably welcome it.

    Other CC licenses (which aren't copyleft) would upset more people who are of the "copyleft is best" faith. I'm not one of those and would welcome the use of a far broader range of licenses. "Copyleft is best" seems to be a minority view in the current base of contributors. "Share the wealth" seems more common. But the copyleft view, while it seems a minority, is also a pretty large minority.

    Personally, I don't much care about derivative work creators not releasing their derivative works under the GFDL, so long as they must tell people where to find the original work so people know where to find and enhance it, or get it for their own use. As a practical matter, that's the way the Wikipedia already works - I'm not aware of even one case where it's used something from a derivative work in the article from which the work was derived. However, this sort of thing is blasphemy to the "GFDL/GPL is the only way" people. :)

    This isn't an official Wikipedia policy statement - just my views on the matters and my assessment of the current balance of views of contributors.

  12. Re:How do you help them? on Wikipedia Hits 300,000 Articles · · Score: 1

    Best way to get started is to visit #mediawiki on Freenode. That's where most of the developer talk happens.

    If you want to do some background reading, take a look at MediaWiki.

  13. Re:Why PHP? on Wikipedia Hits 300,000 Articles · · Score: 2, Informative

    Sorry, I didn't address the database part of your comment. The site often is database-bound. Relieving the current high web server load will just move the slowest point to the database, the number of visits will go up and we'll see where the next pressure point is. Today we were sufficiently database-bound that we temporarily turned off local search and used Google instead for a while.

    At the moment:

    Losing one Squid cache server hurts responsiveness, so we'll be getting at least one more so we can stand one failure there. For two months of growth that probably means at least two to keep up.

    At least several web servers are needed to remove them as a choke point for a while (my guess is that five or so dual Opterons will handle traffic growth here for two months or so).

    More database servers are needed and more load spreading between them. The load balancing work is ongoing. Since there are differences about how to handle this (how to spread the load) I'll abstain from describing possible options here until there's more general agreement. Will certainly involve slaves offloading some queries and slaves offloading search from one or more primary servers.

    There's no sign yet that the growth is growing (we're in a seasonal relatively low load period at the moment though) and that means that we'll continue to see stress points moving around.

  14. Re:Why PHP? on Wikipedia Hits 300,000 Articles · · Score: 1

    PHP because one of the objectives of the MediaWiki software project is that it be able to run on a shared server with PHP in safe mode. There are some discussions about plugin modules to handle slower tasks. More important, though, the use of memcached for object caching is gradually increasing. As memcached use increases, the amount of time spent in PHP will decrease.

  15. Re:The topic is somewhat misleading on Wikipedia Hits 300,000 Articles · · Score: 1

    English is growing fast but in percentage terms some of the others are growing much faster. You're just seeing them in an earlier stage of development than English. French is going to be hitting the 300,000 article mark soon enough. It might take as long as two years to get there. :)

    Others are at an even earlier stage of development. There's a chance to really make your mark in some of the languages which have so far had less traffic than English.

  16. Re:Copyright on Wikipedia Hits 300,000 Articles · · Score: 1

    Thanks for helping to take care of that problem. I encourage anyone else who finds a copyright issue to help the project to do the right thing as well.

  17. Re:You are the answer... on Wikipedia Hits 300,000 Articles · · Score: 1

    It's still in the growing process. Quality control and reviewing systems are on the to do list. When it comes to experts, though, you might note the comments of the experts who've posted here and what they reported finding in their own fields. If you're not comfortable with what it is today, please wait a few years and take another look.

  18. Re:Funding - situation, what we spent the money on on Wikipedia Hits 300,000 Articles · · Score: 1

    It's been discussed. Doesn't look as though people are tired of donating to keep the resource available and growing, though. If we did use Google ads, one possibility I've mentioned is displaying the ads and giving everyone a check box to turn them off if they want to. But no current plans to do that and current plans are to try to avoid doing it or anything else involving ads.

  19. Re:Funding - situation, what we spent the money on on Wikipedia Hits 300,000 Articles · · Score: 1

    Probably not open to topic sponsorships at the moment (and will probably avoid that for a long time if we can). On the other hand, the hardware pages describe what we have, who it's from and our experiences with them. It seems very likely that we'd be happy to fully identify the source of any hardware, colo space or bandwidth there if someone was to provide one or more of those things. In fact, we'd want to do that as part of describing the history of the project, I expect.

    If someone wanted to ship an off-lease bladeserver and blades to Florida we'd have no problem putting it to good use serving web pages (assumes typical three year old equipment), it would save us from spending some money and I'm pretty sure that we'd accurately describe who donated it and say thanks. Same for anything else that's reasonably current (though on the database side, it's 100GB so we are pretty much forced to use fairly recent disks).

    At present most of our giving back is giving back to people by providing the reference resource they want to see. The longer we can do that rather than ads and topic-linked sponsorships, the happier everyone will be. For now, people respond when we indicate that performance or reliability issues have to be solved with money. So long as that remains the case, all is good and we'll probably continue to do it that way and stay pretty well unencumbered with ads or other things people don't really want.

    Waffle words are because it's collective decision-making and I could always be proved wrong in my expectations about what the community will decide.

  20. Re:wiki = falsehoods? on Wikipedia Hits 300,000 Articles · · Score: 1

    Thanks for the tax mention. I've clarified it from "it levies its own taxes and is exempt from the Internal Revenue Code" and "Puerto Ricans pay no federal income tax" so that the second now has "on income from island sources" at the end. The key to an edit like this is the edit comment. Mine was: "clarify taxes per slashdot mention http://slashdot.org/comments.pl?sid=113692&cid=9631338 and US Dept. of Interior site at http://www.doi.gov/oia/Islandpages/prmain.htm". That tells people why I made the edit and provides a source which can be used to check correctness.

    If you do have a correction removed, please discuss it on the talk page and give references to support your statement. If you disagree with something, say why and provide references to support the statement. That approach is generally successful in either getting your point across or in discovering that there are conflicting facts or views which need to be accounted for.

  21. Re:Why MySQL? on Wikipedia Hits 300,000 Articles · · Score: 1

    MySQL was probably better at the time (several years ago). PostgreSQL has improved a lot and a couple of people are working on making the database wrapper functions work with PostgreSQL as well. If there are any Oracle, Sybase or whatever fans out there, feel free to add your efforts so it'll run on any database. At least some of the developers are pretty keen on the use of multiple keys in PostgreSQL queries.

    Lots of little machines is the option I favor for database scalability. Our data is naturally partitioned by language and we can offload some queries with replication even from the big ones. The technical people are still discussing the various options.
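    Partitioning by language could look something like the sketch below. Nothing here reflects a decision by the developers, who were still discussing options. The shard names and the choice to give only the largest languages their own machine are assumptions for illustration.

```python
# Hypothetical mapping: big languages get their own database server,
# everything else shares one. None of these names are real hosts.
LANGUAGE_SHARDS = {
    "en": "db-en",
    "de": "db-de",
}
SHARED_SHARD = "db-shared"


def shard_for(language_code):
    """Pick the database server holding the wiki for this language."""
    return LANGUAGE_SHARDS.get(language_code.lower(), SHARED_SHARD)
```

    Because a page view normally touches only one language's data, a lookup like this is enough to route it; no query has to span machines.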

  22. Re:One thing I've missed with Wikipedia... on Wikipedia Hits 300,000 Articles · · Score: 1

    They will be cached in the database and that's good enough to protect us from a big load spike for a particular version. View source for the old version and look at the bottom of the page. You'll see something like: Served by bayle.wikimedia.org in 1.62 secs. When not logged in I tried four times and saw rabanus reporting 0.24 seconds, bart 0.34 seconds, bart 0.34 seconds again and isidore 1.15 seconds.

    It's not as efficient as the Squid cache servers are but it's uncommon enough that it hasn't yet been put on them. Not sure whether the memcached caching for these is in place at the moment.
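    Anyone wanting to repeat the timing check above could pull the figure out of the page source with a few lines of code. This is a sketch that assumes the line looks exactly like the example quoted in the comment; the real markup may differ, and the function name is made up.

```python
import re

# Matches text like "Served by bayle.wikimedia.org in 1.62 secs".
SERVED_BY = re.compile(r"Served by (\S+) in (\d+(?:\.\d+)?) secs")


def parse_served_by(page_source):
    """Return (server, seconds) from the page source, or None if absent."""
    match = SERVED_BY.search(page_source)
    if match is None:
        return None
    return match.group(1), float(match.group(2))


def mean_seconds(timings):
    """Average a list of render times in seconds."""
    return sum(timings) / len(timings)
```

    Feeding the four logged-out timings from the comment (0.24, 0.34, 0.34 and 1.15 seconds) to `mean_seconds` gives an average of roughly half a second.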

  23. Re:Entry for pornography on Wikipedia Hits 300,000 Articles · · Score: 1

    That one lasted for about 20 minutes. Anything mentioned here is vandal bait today. You looked at it, saw it was wrong but chose not to fix it. Why? It's a public resource and you own the commons as much as anyone else. If someone is pouring salt on the grass, you've as much right and responsibility to stop them as anyone else.

    For a better view of overall quality I suggest that you use the random article link a hundred times and see how much vandalism and idiocy you find.

  24. Re:Celebration! on Wikipedia Hits 300,000 Articles · · Score: 1

    Right, Alexa is a somewhat flawed measure. Still useful. To give some idea of how skewed it can be, back in March, seeing about 200 hits/s, Wikipedia passed LiveJournal in Alexa daily charts. At the time, LiveJournal was peaking at 700-800 hits per second according to Brad's presentation at the MySQLcon. So, we learn, no surprise, that LJ has a very high percentage of AOL members who use it but don't have the Alexa toolbar installed.
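    The size of that skew can be put in numbers using only the figures quoted above. A rough sketch, assuming hits per second is a fair proxy for what Alexa was trying to rank and that the two figures are comparable:

```python
def traffic_ratio(rate_a, rate_b):
    """How many times larger rate_b is than rate_a."""
    return rate_b / rate_a


wikipedia_hits = 200            # hits/s, from the comment
livejournal_hits = (700, 800)   # peak hits/s range, from the comment

low, high = (traffic_ratio(wikipedia_hits, h) for h in livejournal_hits)
print(f"LiveJournal: {low:.1f}x to {high:.1f}x Wikipedia's hit rate")
```

    So the daily chart ranked Wikipedia above a site serving about 3.5 to 4 times as many hits at its peak.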

    When it comes to Wikipedia and Slashdot, it seems likely to me that Wikipedia has a higher AOL visit rate than Slashdot, so it's probably something between a wash and under-scoring Wikipedia. But only probably - your speculation is as good as mine.

    I'm curious about the page hit rate of Slashdot if someone wants to share some numbers. CmdrTaco is more than welcome to add them to the comparison page and it'll help everyone work out how good/bad the Alexa ranks are for us both.

  25. Re:Random page on Wikipedia Hits 300,000 Articles · · Score: 1

    There used to be a bug in the "random" selection which caused it to gradually drift in the direction of the early articles. That's been fixed and with 300,000 articles the bulk-loaded census-based articles no longer show up so frequently. It's also general policy to discourage such large automated bulk loads, so we have humans rating what is interesting to humans by what they are willing to spend their time doing.

    Anyone who doesn't open their browser window a hundred times a day is more than welcome to set their browser home page to the random page link. Please correct any errors you see. :) Yup, there's always a catch. :)
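    The comment doesn't say how the random selection works or what the bug was, so the following is only a generic sketch of one common scheme, not a description of the Wikipedia code. It assumes each article is given a fixed random key when created, and a random page is the first article whose key is at or above a freshly drawn number. The function name and sample data are hypothetical.

```python
import bisect


def pick_random(entries, r):
    """Pick the title with the smallest key >= r, wrapping to the first.

    entries: list of (key, title) pairs sorted by key, keys in [0, 1).
    r: a freshly drawn random number in [0, 1).
    """
    keys = [key for key, _ in entries]
    index = bisect.bisect_left(keys, r)
    # Past the largest key, wrap around to the first entry.
    return entries[index % len(entries)][1]
```

    In this scheme an article's chance of being picked is proportional to the gap below its key, so the result is only approximately uniform. It does show how a selection method can favor some articles without anyone intending it.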