Simetrical · Slashdot Mirror

Re:legitimate need? on Wikipedia Gears Up For Explosion In Digital Media · 2009-01-14 12:50 · Score: 2, Informative

Actually, most of these uploads should go to the Wikimedia Commons, not Wikipedia proper. Files uploaded to Commons can be used on any Wikimedia site, including any language edition of Wikipedia, Wiktionary, etc. Files uploaded to the English Wikipedia can only be used on the English Wikipedia. The Commons admins are largely a different group of people from the English Wikipedia admins, although there's some overlap. Adminship is given out on a per-project basis; only a few dozen stewards have any privileges across projects.

Re:Wikipedia Search = Sucky on Wikipedia Gears Up For Explosion In Digital Media · 2009-01-14 12:45 · Score: 3, Informative

There have been major improvements to search lately, thanks pretty much solely to the volunteer work of Robert Stojnic (rainman). You might want to try it out again. Still probably not quite up to Google levels in some ways, given the difference in budget of some billions of dollars versus ~$0, but it has better relevance than before and a lot more nice features now (e.g., "did you mean").

Re:Wikipedia=new on-line data repository on Wikipedia Gears Up For Explosion In Digital Media · 2009-01-14 12:42 · Score: 2, Insightful

Does anything on Wikipedia ever really get deleted? I thought the Mods and Admins had full access to deleted pages.

Yep, that's generally true. Anyone who can delete things can also undelete things, and there are lots of people who can do both: over 1600 on the English Wikipedia, 250 on the Wikimedia Commons -- any administrator. Hypothetically a sysop would be able to use Wikipedia as a private file store this way, since views of deleted content aren't logged, but that's probably not worth it. :)

If you upload something that even the admins shouldn't see, generally an "OMG lawsuit" kind of thing like personal information, you can get your revision oversighted -- still stored, but only restorable by someone with shell access. This doesn't currently work for uploads, though, as far as I know.

Actually, though, deletion of files was permanent for a long time, until a couple of years ago. This created a fun doomsday scenario where a rogue or compromised sysop account could run a script to delete all images on Wikipedia unrecoverably. I don't think backups were kept then either, so they'd have to be manually gotten back from mirrors and things like that. Fun stuff. Part of the new hardware setup uses ZFS snapshots to back up the files now, from what I've been told, although I haven't worked with that directly.

Not much of a milestone. on Guitar Hero III the First Game to $1 Billion In Sales · 2009-01-13 09:27 · Score: 1

Okay, so maybe it's the first to make a billion in sales. But World of Warcraft has over 10 million subscribers at $15/month each. I compute that as $1.8 billion a year. Even if we assume that those figures are somehow inflated, I think it's safe to figure that WoW has brought in a lot more money for its owner than Guitar Hero.
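
For anyone who wants to check that arithmetic, here's a quick sketch. The inputs are just the round numbers quoted in this comment, not official revenue data, and it assumes every subscriber pays the full fee all year.

```python
# Round figures taken from the comment; assumed, not official revenue data.
subscribers = 10_000_000   # "over 10 million subscribers"
monthly_fee_usd = 15       # "$15/month each"

# Twelve monthly payments per subscriber per year.
annual_revenue_usd = subscribers * monthly_fee_usd * 12
print(f"${annual_revenue_usd / 1e9:.1f} billion per year")
```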

Re:CentOS is free RHEL on Wikimedia Simplifies By Moving To Ubuntu · 2008-10-19 07:24 · Score: 1

Starting from reason one "We liked it better" there is nothing objective about this comparison.

Could be, but all that tells you is that the reasons for choosing Ubuntu over other distros weren't objective. If distro selection were dominated by objective reasons, we'd probably have fewer distros. Most people just pick the distros they like and feel comfortable with, based on past experience, and Wikipedia's admins are no exception.

"Unlike Debian stable, and like Fedora, it's updated fairly frequently so we get a decent rate of package updates for infrastructure..." Use RHEL/CentOs.

RHEL and CentOS have much slower update cycles than Ubuntu (even if a bit faster than Debian). With Ubuntu you have an upgrade path that gives you six-month-old software at worst, if you want it. If you stick only to LTS you have software as old as RHEL/CentOS, but nothing obligates you to do that: you can trade off between having to upgrade constantly and having new software easily available, upgrading every six months or every five years as you feel inclined. If you use RHEL/CentOS, you have old packages, period.

Besides Ubuntu updates have their fair share of screw ups too or did we already forget about the OpenSSL fiasco *COUGH* http://it.slashdot.org/article.pl?sid=08/05/13/1533212 *COUGH* ?

That's not upgrade-related, or at least not meaningfully. The breakage was a year and a half before the fix, and would likely have made it into the stable version of even a fairly slow release cycle. That was a case of poor maintenance by Debian, not a case of updating incautiously. Even Debian stable might have been struck if the change had happened to be made a year or two later than it was.

"...and Canonical actually puts out security updates for a decent amount of time." Have you seen the CentOS roadmap? http://dag.wieers.com/blog/files/centos-intro-1.3-en.png So that means support until 2014. That's one year longer than Ubuntu LTS, which goes to 2013!!

That point was against Fedora, not CentOS. CentOS 1) has release cycles that are too slow, 2) is not the same version of Linux that the admins are using on their desktops. Fedora, on the other hand, 1) is so cutting-edge that it can break, and 2) does not provide long-term support (in the form of security updates, not talking about paid support here). Ubuntu occupies a happy middle ground: you can upgrade quite frequently if you like (although not as frequently as Fedora), or you can upgrade quite infrequently if you like (although not as infrequently as CentOS). It's quite stable (probably as stable as CentOS), but fairly up-to-date (although not as up-to-date as Fedora).

You have, it appears, attempted to interpret points made against Fedora or Debian as being the points made against CentOS and RHEL, which they aren't. I notice you addressed four out of the five points, skipping the one substantive objection to CentOS.

I can't begin to fathom why this post is +5 informative.

Because the mods probably figured out (I'm not sure if you have) that the poster is Brion Vibber, CTO of the Wikimedia Foundation and the one ultimately responsible for the choice of Ubuntu.

Re:What amazes me... on Huge Traffic On Wikipedia's Non-Profit Budget · 2008-06-24 07:24 · Score: 1

It's probably more illuminating to look at those separately:

Slashdot reach: ~0.03% per day
Wikipedia reach: ~10% per day

Wikipedia gets 300 times the traffic that Slashdot does, according to Alexa. And that doesn't even count the sister projects. wikimedia.org gets 0.6% reach, 20 times Slashdot. Slashdot isn't even up to some of the small projects like Wiktionary and Wikibooks. To quote the Wikimedia Bugzilla's quips list,

Xirzon: are the servers up for slashdotting?
brion: we get more traffic than /. usually we don't even notice the bump on the traffic graphs
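
The ratios above are easy to sanity-check. A small sketch, using only the approximate Alexa reach percentages quoted in this comment (the variable names are mine):

```python
# Approximate daily reach, in percent, as quoted in the comment.
slashdot_reach = 0.03
wikipedia_reach = 10.0
wikimedia_org_reach = 0.6

# How many times Slashdot's reach each site has.
wikipedia_ratio = wikipedia_reach / slashdot_reach
wikimedia_ratio = wikimedia_org_reach / slashdot_reach
print(round(wikipedia_ratio), round(wikimedia_ratio))
```

With inputs this rough, "300 times" and "20 times" are about as precise as the numbers deserve.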

Re:What is the role of Open Source on Huge Traffic On Wikipedia's Non-Profit Budget · 2008-06-24 07:12 · Score: 1

Essentially all software used in the entire process of serving a web page to the user is free and open-source. The servers all run Linux; the wiki software is MediaWiki; the web servers are Apache and lighttpd; reverse proxying is done by Squid; the database is MySQL; search is done by Lucene; programming languages used in various places are PHP, Python, C, and C++; dynamic functionality is normally done using JavaScript, not Java/Flash/etc.; and so on. Non-free software is only used if there's a good reason to do so, and in the web server world, there generally isn't.

There is some non-free software used, however. The version of Lucene used is written in Java, which is not quite open-source yet (or at least I don't think the version used is). I think I've heard that the routers run non-free software. Some user-made tools that are loaded with every page run on the toolserver, which doesn't share the main project's open-source commitment and mostly runs Solaris; the tools themselves may also not necessarily be open-source.

Re:To fix wikipedia on The Battle For Wikipedia's Soul · 2008-03-10 12:30 · Score: 1

I think you're looking for Citizendium. I also think you're wrong. To cite one study (emphasis added),

In this paper we examine how contributor motivations affect the quality of contributions to the open-content online encyclopedia Wikipedia. We find that quality is associated with contributor motivations, but in a surprisingly inconsistent way. Registered users' quality increases with more contributions, consistent with the idea of participants motivated by reputation and commitment to the Wikipedia community. Surprisingly, however, we find the highest quality from the vast numbers of anonymous "Good Samaritans" who contribute only once. Our findings that Good Samaritans as well as committed "zealots" contribute high quality content to Wikipedia suggest that it is the quantity as well as the quality of contributors that positively affects the quality of open source production.

I've seen others in the same vein. I'm pretty sure one even said a majority of actual content was written by anonymous users, whereas a core of registered users made many more edits but mainly in terms of maintenance and dealing with vandalism.

As for trying to gauge expertise and then giving experts authority over their subjects, well, you could do that. Or you could require everyone to provide adequate citations and trust the community to recognize when someone has expertise, and defer to them when (but only when) it's appropriate. Wikipedia has so far chosen the latter model, and is now about as reliable as other encyclopedias by the metrics I've seen. The former model is open to anyone: Wikipedia is, after all, licensed under the GFDL, and a fair amount of Citizendium content is forked from it. Which will succeed? We'll see, I guess. I don't believe in trusting experts any more than they can back up their claims, especially not on controversial topics. "Experts" like political scientists, economists, alternative-medicine providers, etc. would have field days with their respective articles if not kept in line by amateurs with some common sense.

Re:calling BS - should be classed as phishing on Holes Remain Open in Firefox Password Manager · 2007-07-20 04:59 · Score: 1

If you had actually read even the post, or the article beyond the first sentence of the second paragraph, you would see that this is definitely not phishing: "somebody can easily create a page that steals the password as soon as the page is opened"; "it is possible to read out the [automatically] entered data via JavaScript and then submit them". No phishing involved, unless you count just getting someone to click a link as phishing, rather than, say, using the World Wide Web. The site emulates the form for the benefit of Firefox, which will happily fill in the password to be scooped up. The entire process can be totally invisible to the user (and for best effect, should be).

Re:Dump MediaWiki on Wikipedia On the Brink? Or Crying Wolf? · 2007-02-13 07:19 · Score: 1

MediaWiki is a slow lumbering beast. I ran a wikipedia mirror with MediaWiki on a PIII 900 and it was virtually unusable. Just doing a simple redirect to the new server took seconds before I cut out the wiki initialization stuff that was happening prior to the 301 redirect.

The fact is, though, that stuff would only be executed for a very small percentage of hits to Wikipedia itself. The WMF relies *very* heavily on Squid caching. A typical view by an unregistered user (which is almost all viewers) will be handled very quickly by Squids passing back what amounts to a static HTML page, with no PHP logic being executed at all.
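
To make the idea concrete, here's a toy sketch of a caching reverse proxy. It is not how Squid is actually implemented or configured; `CachingProxy` and `render_with_backend` are made-up names, and the rule "never cache logged-in views" is a simplifying assumption.

```python
from dataclasses import dataclass, field


@dataclass
class CachingProxy:
    """Toy stand-in for a reverse proxy in front of a wiki backend."""
    cache: dict = field(default_factory=dict)
    backend_hits: int = 0

    def render_with_backend(self, path: str) -> str:
        # Hypothetical placeholder for the expensive application code
        # (the PHP logic, in MediaWiki's case).
        self.backend_hits += 1
        return f"<html>rendered {path}</html>"

    def handle(self, path: str, logged_in: bool) -> str:
        # Assumption: logged-in views are personalized, so never cached here.
        if logged_in:
            return self.render_with_backend(path)
        # Anonymous views are rendered once and then served from the cache.
        if path not in self.cache:
            self.cache[path] = self.render_with_backend(path)
        return self.cache[path]


proxy = CachingProxy()
for _ in range(1000):
    proxy.handle("/wiki/Main_Page", logged_in=False)
proxy.handle("/wiki/Main_Page", logged_in=True)
print(proxy.backend_hits)
```

A thousand anonymous views cost one backend render in this sketch; only the logged-in view reaches the backend again.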

Due to this heavy caching reliance, however, MediaWiki is much slower if you don't use caching, because that never really needed to be optimized. It will probably only run optimally if you use a multiserver Apache/Squid layout similar to the WMF's. Also realize that MediaWiki uses InnoDB by default, which is typically slower than MyISAM for small sites but more effective for large request volumes, and there are likely many other little issues like that.

I think it can be fairly said that MediaWiki is something of a resource hog for small or poorly-configured wikis, but for scalability I'd be very surprised if any other wiki package comes close. The others just don't have to: almost all their users will be running small wikis, and none of them would have even a single user nearly as large as Wikipedia. But MediaWiki is designed primarily for Wikipedia and its sister sites. You get what you design for: MW is best for huge sites, other packages are much faster for small ones.

Re:Still doesn't understand the code itself on Finding New Code · 2007-02-05 14:56 · Score: 1

When Google launched their engine, I was disappointed they didn't take the extra time needed to make their parser/engine smart.

Aw, c'mon, you can't blame them. It's a beta!

. . . wait a minute, this one actually isn't. Are there others?

Re:Thanks but no thanks. on The Death Of CS In Education? · 2007-02-05 14:10 · Score: 1

A university is a seller which should pander only to its clients - the students attending it. As such, the only thing that should be taught is what the students want to learn.

Really? Damn, I wish I could convince my university to let me off those bloody humanities requirements, then.

A better way to look at it is: universities teach things that students may or may not want to be taught, to filter out those who aren't willing to learn.

Re:Viacom is being stupid on Viacom Demands YouTube Remove Videos · 2007-02-03 14:25 · Score: 1

But there's no reason Viacom shouldn't demand a cut of the ad profits Google's making from short clips of its shows, on top of the advertising. Also, how long until some joker uploads entire shows in two-minute segments?

Re:Drop them on Viacom Demands YouTube Remove Videos · 2007-02-03 14:22 · Score: 1

I'm not sure if that's legal under antitrust law, but even if it is, they can't do this kind of thing regularly, for the simple reason that it reduces the quality of their search results. If I routinely couldn't find an official page for a movie or whatnot using Google just because they had a tiff with the movie's publisher, you know what, I'd probably use some other search engine. Google isn't about to compromise its most important product for a side show. And then there's public image, too.

Re:designer not developer on CSS: The Definitive Guide · 2007-01-29 15:58 · Score: 1

If this is a valid CSS3 selector, I think it will do what you want, but I'm not totally clear on where function-style syntax is allowed: body:not([lang|=attr(lang)]) span[lang] {...}. Of course, it won't work for languages declared using anything but the lang attribute, but it might be a start.

Re:designer not developer on CSS: The Definitive Guide · 2007-01-29 15:52 · Score: 1

I realize that CSS is not a programming language. More's the pity; if it were a little more like a language then a lot of things that are very hard to do in CSS and require dynamic styles (or at least assignation of styles) to do properly would be very easy. For example, I should be able to say that an element is n% minus the width of another element... But since I can't I have to figure out how many units (typ. pixels) are in n%, then figure out how many units wide the other element is, and then set the width of the element I was trying to size accordingly. And if the element I'm trying to fit in next to is sized in ems instead of px, I have to recalculate this data and resize elements every time the font size is changed.

The current Working Draft of the CSS3 Values and Units module allows for a calc() function that permits this behavior. Generally, yes, many things would be easier if an embedded programming language were allowed, but this would potentially come at a significant speed cost. Part of the idea behind CSS is that it should not greatly slow down rendering as compared to simpler layout schemes.
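
As a rough illustration of what the browser would be doing for you, here's a sketch of resolving "n% of the container minus another element's width" to pixels. The function name and parameters are hypothetical, and it only handles px and em; it's meant to show the bookkeeping described in the parent post, not any real layout engine.

```python
def resolve_width(container_px, percent, other_width, other_unit, font_size_px):
    """Resolve 'percent% of the container minus another element's width' to pixels."""
    if other_unit == "px":
        other_px = other_width
    elif other_unit == "em":
        # An em-sized element changes its pixel width with the font size.
        other_px = other_width * font_size_px
    else:
        raise ValueError(f"unsupported unit: {other_unit}")
    return container_px * percent / 100 - other_px


# A 10em sidebar inside an 800px container, at two different font sizes.
width_at_16px = resolve_width(800, 100, 10, "em", 16)
width_at_20px = resolve_width(800, 100, 10, "em", 20)
print(width_at_16px, width_at_20px)
```

The point is that the result has to be recomputed whenever the font size changes, which is exactly the chore a declarative expression would take off the author's hands.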

Re:Ohhhh JOY Yet Another Mystery Novel (YAMN) on CSS: The Definitive Guide · 2007-01-29 15:44 · Score: 1

As an example. Using un-ordered lists to create menus. It's a complete and total hack, and I mean to use the word hack in the most derogatory manner possible. Instead of coming up with a menu framework that was designed from the ground up to be menus they used this stupid hack and think they are so cool. News for you, you're not cool, you're not smart nor are you clever.

What would you prefer? A table? Surely no better. But lists have advantages: for instance, some mobile devices will cleverly collapse lists by default, so that they don't take up valuable screen real estate. Screen readers may audibly delineate lists in some way. And so on. If you don't care about people not using the browsers you checked things out with, hey, by all means stick with tables.

If you're suggesting that a tag or somesuch be introduced, again, that's not CSS. It's the fault of HTML. The last draft of XHTML 2.0 that I looked at had a navigation list tag in it, for what it's worth.

And if the problem is just that you think any complicated style rule is a "hack", well, frankly I think it's more legitimate to say that table-based layout is a hack, since that's not what the original versions of HTML meant tables to be used for at all.

Lets take for example something that could make all of our lives easier, the basic ability to have include files.

You utterly misunderstand the entire purpose of CSS. It is meant to be for styling. Not for content inclusion or anything similar. For that, you could use XSL, maybe. Or you could use a CMS. Or, yes, you can use frames. CSS is deliberately designed to not slow down page rendering much, so something that requires you to wait for a server response is totally unacceptable for it (even reflows of already-rendered content are unacceptable).

Re:Isn't this the entire methodology of Google? on Google Defuses Googlebombs · 2007-01-26 06:16 · Score: 2, Informative

I was under the impression that the link text was the entire means by which Google created their PageRank algorithm.

Nope. It depends heavily on how many sites link to you, how highly rated those sites are, what they're about, etc. See the Wikipedia article.
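
For the curious, here's a minimal sketch of the textbook, simplified form of PageRank (plain power iteration with a damping factor). It covers only the "how many sites link to you" and "how highly rated those sites are" parts; it is certainly not Google's production ranking, and the four-page web is invented for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if not outgoing:
                # A page with no outgoing links spreads its rank over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
            else:
                # Otherwise its rank is split evenly among the pages it links to.
                for target in outgoing:
                    new_rank[target] += damping * rank[page] / len(outgoing)
        rank = new_rank
    return rank


# Hypothetical four-page web: three pages link to "hub", which links back to "a".
toy_web = {"a": ["hub"], "b": ["hub"], "c": ["hub"], "hub": ["a"]}
ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

Note that "a" ends up well ahead of "b" and "c" even though each has the same number of outgoing links: its one incoming link comes from the highly ranked "hub". Link text doesn't enter into it at all.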

Re:Worst on Google Defuses Googlebombs · 2007-01-26 06:13 · Score: 1

It doesn't matter. You can guess what "Worst president ever" links to.

No Googlebomb required.

w00t!

Yep. It links to the most prominent and popular essay on the Internet dedicated to discussing who the worst president ever is. Exactly the right result.

Re:miserable failure on Google Defuses Googlebombs · 2007-01-26 06:09 · Score: 1

Tweaking and tuning existing algorithms is not rethinking the problem.

You think Google isn't devoting substantial resources to completely rethinking the problem? They don't want to be behind the curve when the next Google comes along with a rethink of the problem. They want to get there first. They just haven't thought of anything much better than PageRank yet, and neither has anyone else.

Re:Big changes? on Google Defuses Googlebombs · 2007-01-26 06:04 · Score: 5, Informative

Are you saying that bots are getting different search results than users? Because absolute shitloads of websites serve different versions of their pages to google for a wide variety of reasons. For example some premium sites allow google to index part of their content in order to rope people into buying a subscription.

Yes, that's called "cloaking" and can get you delisted. BMW Germany's website got removed from Google a while back for doing it, and presumably less prominent ones regularly are as well. Google's official position is that you should write a decent web page and they'll be able to figure out how it should rank:

Make pages for users, not for search engines. Don't deceive your users or present different content to search engines than you display to users, which is commonly referred to as "cloaking."

Avoid tricks intended to improve search engine rankings. A good rule of thumb is whether you'd feel comfortable explaining what you've done to a website that competes with you. Another useful test is to ask, "Does this help my users? Would I do this if search engines didn't exist?"

Don't participate in link schemes designed to increase your site's ranking or PageRank. In particular, avoid links to web spammers or "bad neighborhoods" on the web, as your own ranking may be affected adversely by those links.

Don't use unauthorized computer programs to submit pages, check rankings, etc. Such programs consume computing resources and violate our Terms of Service. Google does not recommend the use of products such as WebPosition Gold that send automatic or programmatic queries to Google.

Re:I could not resist... on Catching Spam by Looking at Traffic, Not Content · 2007-01-25 09:29 · Score: 1

While I realize this was partly a joke . . .

(x) It will stop spam for two weeks and then we'll be stuck with it

Perhaps, but I suspect it will put a fairly large crimp in the business. Sender-side validation is going to be more useful than recipient-side, because the sender has more information about the sending client. Even aside from that, it has the benefit of putting a framework for easy upgrades in place.

(x) Requires immediate total cooperation from everybody at once

The same could be said for pretty much any protocol change, and it's not true. The infrastructure will be phased in, and then once it's in place and running, SMTP service will end. The only ones who will be screwed over by that are the guys who were sending e-mail from their own servers, but we can campaign to inform them (to begin with, sending alert messages that "you're still using SMTP, stop!!" in response to any SMTP e-mail sent to a prepared server).

Of course, we aren't going to block packets for being SMTP . . . just the major players will (be required to) stop accepting it. If you want to send SMTP to your own server somewhere, go ahead.

(x) Anyone could anonymously destroy anyone else's career or business

Wha? How?

(x) Lack of centrally controlling authority for email

Um, I'm suggesting we create one.

(x) Open relays in foreign countries

No difference between them and any other client, really. The entire point is that the gateways do the analysis, and open relays have to go through gateways like anyone else.

(x) Huge existing software investment in SMTP

There's a huge existing software investment in HTML, too. Let's ditch this XHTML foolishness. I mean, standards are supposed to stay static forever and ever and no new ones ever created, right?

(x) Joe jobs and/or identity theft

Not relevant, unless you're referring to the photo-ID requirement. If you are, well, the cameras would kind of pick you up, y'know, so even if you give false ID you'd suddenly be an internationally wanted criminal.

(x) Ideas similar to yours are easy to come up with, yet none have ever been shown practical

Well, yeah, I noticed. But saying "you have to prove it's practical" is kind of a conversation-killer. I was asking why it wasn't practical.

(x) Why should we have to trust you and your servers?

Why should you have to trust those damn DNS servers? And the Internet routers? They're controlled by telecoms, for crying out loud!

(x) I don't want the government reading my email

Never mentioned the government reading anyone's e-mail. Actually, you could probably use SSL or something in any new protocol people come up with, so the likelihood of the government reading your e-mail is probably decreased.

Re:Isn't this "ray" easily blocked? on US Military Tests Non-Lethal Heat Ray · 2007-01-25 06:23 · Score: 1

The source you identified states a few square inches... not the 'tiniest gap'.

Several companies make cotton fabric with a flexible metal mesh covering the entire exterior surface -- ever see modern sabre fencing? It would be trivial to make entire uniforms of this material, and a fencing-equivalent mask could be used for the eyes, nose, mouth etc.

Actually, it said "a small exposed area", "any gaps", and gave "tips of your fingers" as an example. That implies, to me, that the amount of skin that needs to be exposed is quite small. But regardless, it will definitely be a pain in the neck to dress up like that, especially since it means you'll just be shot instead of burned . . . I can't see it catching on. But we'll see, I guess.

Why not require this at the protocol level? on Catching Spam by Looking at Traffic, Not Content · 2007-01-25 05:33 · Score: 1

Here's what I've never gotten. By definition, spam is unsolicited mass e-mail. So ditch SMTP, and replace it with a protocol that has the following characteristics:

A list of approved gateways will be maintained by ICANN or some similar body. Any gateway found to not abide by the protocol will be removed from the list. To eliminate the possibility of spammers repeatedly getting approval, registration will require some kind of real-life physical registration with photo ID. Deliberate violation of the protocol by a gateway for some reason would be a criminal offense in all countries; if a country did not enforce this adequately, their citizens would be prohibited from setting up gateways (although they could still use other countries').
All e-mail must pass through an approved gateway (major e-mail providers could get approval for themselves, your dinky little server can route through one of the biggies while keeping the address). Each domain name will probably get assigned to a single gateway, but gateways can have multiple domain names.
Each gateway will track its clients for mass e-mailing. Whenever a mass e-mail is detected, as defined by the specification, every recipient's gateway will be queried for whether the recipient has whitelisted the sender. If the recipient has not, the message will not be sent to that user (and the sender will be informed). Mail clients would provide a standardized mechanism for whitelisting, such as the user clicking a particular kind of link (and confirming via popup), and would display a message at the top of whitelisted messages allowing the whitelisting to be removed.
If unsolicited e-mail is received, a "report spam" function would exist at the protocol level, which would instruct the sending gateway to deal appropriately with the user.
The regulatory body in question could require all gateways to perform some particular kind of analysis on all outgoing and/or incoming e-mails. This wouldn't be heuristic-based if at all possible, but would rather perform simple yes/no checks such as whether an attachment matches a known virus.
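
The whitelist check in the third point above could look something like this. It's only a sketch of the proposal: `Gateway`, `deliver`, and `MASS_MAIL_THRESHOLD` are hypothetical names, the threshold of three recipients is arbitrary (the real specification would define "mass e-mail"), and everything happens in memory rather than over a network.

```python
from dataclasses import dataclass, field

MASS_MAIL_THRESHOLD = 3  # hypothetical; the spec would define "mass e-mail"


@dataclass
class Gateway:
    """Hypothetical gateway holding per-recipient whitelists of senders."""
    whitelists: dict = field(default_factory=dict)  # recipient -> set of senders

    def whitelist(self, recipient: str, sender: str) -> None:
        self.whitelists.setdefault(recipient, set()).add(sender)

    def has_whitelisted(self, recipient: str, sender: str) -> bool:
        return sender in self.whitelists.get(recipient, set())


def deliver(sender, recipients, gateway_for):
    """Return (delivered, rejected) recipient lists for one message.

    gateway_for is a callable returning the Gateway responsible for an address.
    """
    is_mass_mail = len(recipients) >= MASS_MAIL_THRESHOLD
    delivered, rejected = [], []
    for recipient in recipients:
        gateway = gateway_for(recipient)
        if is_mass_mail and not gateway.has_whitelisted(recipient, sender):
            rejected.append(recipient)  # the sender would be informed of these
        else:
            delivered.append(recipient)
    return delivered, rejected


gw = Gateway()
gw.whitelist("alice@example.org", "news@example.com")
recipients = ["alice@example.org", "bob@example.org", "carol@example.org"]
delivered, rejected = deliver("news@example.com", recipients, lambda _addr: gw)
print(delivered, rejected)
```

Ordinary one-to-one mail never triggers the check in this sketch; only messages over the threshold are held back for recipients who haven't opted in.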

Currently, any computer can send out as many SMTP messages as it wants and claim that they originate from wherever it wants. This protocol would mean transparency: gateways would have to be trusted and so you couldn't fake the sender. There would be basically no change on the front end; only the hosts would have to adjust. And there would be no way to send unsolicited mass e-mails.

Of course, this isn't a cure-all. It wouldn't prevent viruses from sending spam to all their host's contacts, or prevent someone from setting up many accounts via proxies to do nothing but spam at a very low rate. But it would leave a more solid information trail than we currently have, and e-mail viruses would be halted within days at worst. Best of all, due to the centralization, improvements to the protocol could easily be made. In short, as far as I can see, it would be a win-win for the Internet, even if it would require some minor sacrifices.

Re:Isn't this "ray" easily blocked? on US Military Tests Non-Lethal Heat Ray · 2007-01-25 05:03 · Score: 1

Couldn't an organized crowd just pull the metal screens off their windows and use them as shields? Last I checked, those work great against microwaves. You could even make clothing made of flexible metal mesh to block the incoming rays.

Military tests apparently show that even the tiniest gap in the clothing will give you basically the full effect. Anyway, what are you going to do about the eyes, nose, mouth . . .

At any rate, the worst that this means is it's not totally effective and you have to shoot the guy. Oh well.

User: Simetrical

Comments · 657