Slashdot Mirror


User: Andrew+Cady

Andrew+Cady's activity in the archive.

Stories
0
Comments
615

Comments · 615

  1. Re:And an absence predisposes you to conservativis on Researchers Find a 'Liberal Gene' · · Score: 1
    Please ignore the anonymous version of this post.

    I was a liberal until I began to understand it was my money at stake, and my money is what I use to provide for my family... and distribute to charities as I see fit.

    Only collective social action can insure against unemployment, so that everyone (not just x%) can continue feeding their family, regardless of what happens on Wall Street or in China...

    Of course I realize this doesn't work. If you cannot threaten to starve a man's children, you cannot force him to husk corn. And a nation that cannot force anyone to husk corn cannot compete against China. I fully realize this.

  2. Re:And who gets to define "liberal?" on Researchers Find a 'Liberal Gene' · · Score: 2, Insightful

    To some people, a "liberal" is someone who believes the government should take care of people who have been left behind someway in the economic process, the unemployed, the homeless, those who are at a disadvantage in some way. Under that point of view, Cuba should be considered one of the most "liberal" regimes in the world.

    Sorry, but no, communism is NOT being more of a democrat than the democrats. Communist politics simply do not fit on this spectrum.

    There's a qualitative difference between saying that the underclass should have a better standard of living than they do now, and saying that the existence of an underclass should be abolished.

  3. Re:Whew... So there is hope for a cure? on Researchers Find a 'Liberal Gene' · · Score: 1

    In the USA, the GOP consistently courts the stupid demographic, while the Democrats have surrendered it. It's not that conservatism is stupid, but that the GOP actually compromises with the stupids, giving them the things they stupidly want (e.g., purely symbolic exclusion of gays, myriad forms of flag-waving), in exchange for power used for unrelated ends (e.g., corporate tax policies).

    Of course, the Democrats do the same with, say, the black demographic.

  4. Re:Weak error handling on Taco Bell Programming · · Score: 1

    "147 lines of code" which does not cover most of what we are talking about.

    It does some of what someone (maybe you) said couldn't be done with this approach, thus proving its possibility...

    To use the ingredient analogy: if wget is equivalent to a tomato and you change the wget code, it is no longer a tomato but a genetically modified tomato that can only be used in that one recipe.

    Again, so what? (And who says it can only be used in that recipe? I use the feature I added to wget all the time.)

    Does your system handle hundreds of sites without hand editing a config file or script? Does your system monitor runs to see if they complete and figure out what to do if they do not? Does your system tell the difference between a no data timeout and a slow data timeout? Have you solved the problem of coordinating multiple wgets with host spanning?

    I already answered the last question (don't use host spanning). With regard to the others, it doesn't matter. More requirements mean more coding, but the general approach of starting with wget instead of coding from scratch is going to get shit done as quickly as possible without reinventing the wheel. You can come up with features X Y and Z that aren't already simple switches (although, notice that others in this thread are listing features that are already simple switches) -- but that in itself is a poor argument for coding A, B, C... from scratch, when there's a lot already done for you.

    Now, frankly, the issues you are listing seem pretty damn trivial to me. I just don't see what the big problem is. Still, I don't want to address them point by point in this thread. (I also recognize that, in principle, more difficult features to implement could be thought up.)

    I think the real point underlying the article (IIRC...) is that perfectionism (and implementing everything yourself is a form of this) sure can waste a lot of time. If you're trying to make the most of your effort -- instead of trying to make the best piece of software possible -- you need an attitude that searches for a lazy "good enough" solution. If your business model depends on having a better web scraper than anyone else, then you might write one -- but if you're not selling a proprietary web scraper, then it probably doesn't, and you're wasting your time, losing sight of the big picture.

    BTW, the code I'm talking about retries failures infinitely, but only when specifically instructed to do so. Certainly good enough for my purposes at the time I wrote it. It would be trivial in that code to continually devote, say, x% of processes to retrying errors, if you wanted more automation. Just need to decide on x.
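
    A minimal sketch of the "x% of processes retry errors" idea, in Python rather than the perl the comment describes. The names retry_slot_count and next_job, and the two queues, are hypothetical and not taken from the original code; the only assumption is that the glue keeps one queue of fresh URLs and one of failed URLs.

```python
from collections import deque


def retry_slot_count(max_procs, retry_percent):
    """How many of the max_procs worker slots are reserved for retries."""
    return max_procs * retry_percent // 100


def next_job(fresh, retries, running_retries, retry_slots):
    """Pick the next URL to hand to a free worker slot.

    Returns (url, is_retry), or (None, False) when nothing can be started.
    """
    # Retries only get a slot while they are under their reserved share,
    # so a pile of failing URLs cannot starve the fresh queue.
    if retries and running_retries < retry_slots:
        return retries.popleft(), True
    if fresh:
        return fresh.popleft(), False
    # Either both queues are empty, or only retries remain and their
    # share is already in use; the caller waits for a slot to free up.
    return None, False
```

    With 10 processes and x = 20, retry_slot_count(10, 20) reserves 2 slots for retries; the remaining decision really is just picking x.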

    The original poster posited that everything can be done using generic unix functions with a little glue. That is patently false considering that there are many features that are part of system requirements that are not covered by standard Unix calls.

    (NB. when you say OP, you mean the article; when I say OP, I mean the OP on slashdot, who disagreed.) My point is much more specific: that wget can do a lot more than OP said. I agree that 'xargs' won't suffice to drive wget for this purpose -- unless it does. Depending on your purpose, maybe you should just run it and use the results you get, accepting limitations -- at least you would get results, and without debugging any code.

    Anyway, like I say above, although the limitations you list here are real, they still seem surmountable to me, and not with all that much effort. I certainly don't find the possibility of doing so "patently absurd", although the definition of "a little" glue is of course arbitrary. I originally got into the thread because I saw people saying things were not possible which I had already seen done.

    The biggest failing of the Taco Bel

  5. Re:Students will complain on Colleges May Start Forcing Switch To eTextbooks · · Score: 1

    Oh shit, I didn't read the article :p

  6. Re:I expect the following: on Colleges May Start Forcing Switch To eTextbooks · · Score: 1

    So pessimistic. On the contrary, I predict that it will soon be possible to download 90% of your textbooks for free, instead of the current 50%.

  7. Re:Students will complain on Colleges May Start Forcing Switch To eTextbooks · · Score: 1

    The opt-out is piracy, which this move will certainly simplify.

  8. Re:Weak error handling on Taco Bell Programming · · Score: 1

    That looks like quite a bit of code.

    The definition of "quite a bit" is a matter of opinion, but I'm talking about 147 lines of perl (including comments and blanks) and 88 lines of shell scripts serving misc. ancillary functions.

    Point 5 means that we are no longer using wget but our own version of wget.

    Sure, but so what? It's free software.

    It also does not fix the multi connection when host spanning is used. It also does not handle sites that we do not want to crawl that may be connected to sites that we do want to crawl.

    The problem can be solved.

  9. Re:Weak error handling on Taco Bell Programming · · Score: 1

    Does that mean the operator has to manually monitor the crons and restart the ones that failed?

    Here's the thing. Either you're writing code to monitor wget processes, or you're writing code to monitor your custom coded wget-replacement (or equivalent logic within an application not divided into processes). Or you're doing it manually, which may be reasonable.

    How do you schedule orbitz.com to go off and then soggy.com to go off later?

    Write code that launches wget on your schedule... Why do you think this is hard to do with wget?

    What if you are handling hundreds of different web sites? Hundreds of crons? How do you retry later on sites that are very slow at the moment? How would you know that wget timed out due to slow download?

    Having done this, I'll tell you how I did it. I don't claim it's exactly pretty, but it works, it's easy to do, and it won't cause problems if you're careful:

    I enabled wget's logging facilities and scanned the logs for failures. I kept a queue of wget processes to run, and kept a fixed number of wget processes running at a given time. (I changed the number of processes as necessary by hand, although this might have been handled heuristically to maximize resource usage.)
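
    The approach above can be sketched roughly as follows. This is a Python illustration, not the original perl glue: run_pool, wget_command and log_mentions_failure are hypothetical names, the --tries and --timeout values are arbitrary, and the failure markers are an assumption about what wget writes to its log. The sketch also uses wget's exit status to detect failures, with the log scan as a separate helper.

```python
import subprocess
import time
from collections import deque


def wget_command(url, log_path):
    """Build a wget invocation that writes its messages to log_path (-o)."""
    return ["wget", "--tries=3", "--timeout=30", "-o", log_path, url]


def log_mentions_failure(log_text, markers=("ERROR", "failed:")):
    """Crude log scan; the marker strings are assumed, adjust to your logs."""
    return any(marker in log_text for marker in markers)


def run_pool(jobs, max_procs, launch=subprocess.Popen, poll_interval=0.5):
    """Run (url, log_path) jobs with at most max_procs wget processes alive.

    Returns the jobs whose process exited with a non-zero status.
    """
    pending = deque(jobs)
    running = []  # (job, process) pairs currently alive
    failed = []
    while pending or running:
        # Top the pool back up to the fixed size.
        while pending and len(running) < max_procs:
            job = pending.popleft()
            running.append((job, launch(wget_command(*job))))
        still_running = []
        for job, proc in running:
            code = proc.poll()  # None while the process is still running
            if code is None:
                still_running.append((job, proc))
            elif code != 0:
                failed.append(job)
        running = still_running
        if running:
            time.sleep(poll_interval)
    return failed
```

    Changing the pool size by hand corresponds to changing max_procs; the failed list is what would be fed back in when a retry is requested.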

    I'm totally confident this approach could be scaled up to the size of the whole internet, because the task is so easily divided into small sections and you're going to hit bandwidth limits long before number-of-processes limits. First assume that you're using a separate process for each host (not a wget process, but the glue process that runs wget). Are there too many hosts for that many processes? No. Are any hosts too big to be handled by a wget-coordinating script? You may think so, but I know you're wrong because I've seen it...

    This is a perfect example of the 80/20 rule. The "solution" may cover 80% of the problem but that final 20% will require so much babysitting as to make it unusable. Wget is not an enterprise level web crawler.

    You're right that there is a lot of "babysitting" required; you're wrong that the solution to certain of these problems must be "unusable" -- I know because I've seen them solved. My intuition says the others would be similarly solved.

    One thing you might have to do is edit wget source.

  10. Re:which language is best? on Taco Bell Programming · · Score: 2, Informative

    Wget for crawling tens of millions of web pages using a 10 line script? He doesn't understand crawling at scale.

    Wget is made for crawling at scale.

    There's a lot more to it than just following links. For example, lots of servers will block you if you start ripping them in full, so you need to have a system in place to crawl sites over many days/weeks a few pages at a time.

    wget --random-wait

    You also want to distribute the load over several IP addresses

    The way I do this with wget is to use wget to generate a list of URLs, then launch a separate wget process with varying source IPs specified with --bind-address. It would, however, be trivial to add a --randomize-bind-address option to wget source.
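
    A rough Python sketch of the second half of that (one wget per URL, rotating the source address). It assumes the URL list has already been generated; commands_with_rotating_source is a hypothetical helper, and the addresses in the usage note are placeholders from the documentation range. Only --bind-address itself is a real wget option here.

```python
import itertools


def commands_with_rotating_source(urls, source_ips):
    """Build one wget command per URL, cycling through local source IPs."""
    ips = itertools.cycle(source_ips)
    # Each command binds its outgoing connection to the next address in turn.
    return [["wget", "--bind-address=" + next(ips), url] for url in urls]
```

    For example, commands_with_rotating_source(urls, ["192.0.2.10", "192.0.2.11"]) alternates between the two addresses; each command list can then be passed to subprocess.run or to whatever process pool drives the crawl.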

    and you need logic to handle things like auto generated/tar pits/temporarily down sites, etc.

    What makes you think you can't handle these things with wget?

    And of course you want to coordinate all that while simultaneously extracting the list of URLs that you'll hand over to the crawlers next.

    Again, why do you think wget is inadequate to this? It's not.

    Any custom-coded wget alternative will be implementing a great deal of wget. Most limitations of wget can be avoided by launching multiple wget processes, putting a bit of intelligence into the glue that does so. If that isn't enough, it probably makes sense to make minor alterations to wget source instead of coding something new.

    My point here is just that wget is way more awesome than you give it credit for.

  11. Re:"Community service" == free labor for the state on Motorcyclist Wins Taping Case Against State Police · · Score: 1

    The inmates (ahem, volunteers) are motivated by the contract they sign which allows them to be put in jail for failure to satisfy their supervisors (or, if not put in jail, then at the very least put to work for another day).

    It's not dissimilar to the incentive system at work behind minimum-wage employment. It works.

    (What are you basing your 'doubt' on, anyway? You seem unfamiliar with the system. Are you?)

  12. "Community service" == free labor for the state on Motorcyclist Wins Taping Case Against State Police · · Score: 1

    "Community service" means doing for free what the state would otherwise have to pay minimum wage to have done. The economic incentive is still there.

    Your 'reading to kids' scenario is a myth (an exceptional sort of thing that might result from negotiated plea bargains involving high-priced lawyers). For the masses, "community service" is just forced labor.

  13. Re:Bringing Claude Shannon to higher education on Bringing Convenience and Open Source Methods To Higher Education · · Score: 1
    Most of your response does not address my original posts at all. I'll address the one point that does.

    Are the best communities the product of local universities or the global village?

    How does one discern the goodness of a community?

    Arbitrarily. Here's one metric: where can I go to get a physics question answered? Who will answer my physics question fastest and in most detail? I don't think I will find the fastest answer at a university [online or not].

    But by all means blog about it after class.

    That's some smug attitude you got there, but here on slashdot, we write programs after class...

    Although I have to say, there's a lot more value in any blog that people actually read, than in a college paper written for an audience of one grader -- who will learn nothing from it.

  14. Re:Bringing Claude Shannon to higher education on Bringing Convenience and Open Source Methods To Higher Education · · Score: 1

    The internet is at every university already. Campus denizens are overrepresented in many/most/all online forums. It isn't a question of one or the other, but rather of maximizing the benefit from both styles of communication.

    OK, but I'm not talking about "styles of communication," I'm talking about the communicating communities themselves. Are the best communities the product of local universities or the global village? It is going to depend on specifics, but usually the local community -- no matter what sort -- is not going to be able to compete.

    It's just so much easier to form connections at light-speed than whatever the average speed of a human body is.

  15. Re:My memorable college experience was getting lai on Bringing Convenience and Open Source Methods To Higher Education · · Score: 1

    To me, the classic moment of college was standing up in a classroom having to defend a position that people disagree with. And then arguing about it later in the cafeteria or dorm. If you've never spent all night arguing over the existence of God, then you never had an education.

    I was doing this sort of thing when I was fifteen -- on the internet, with adults [including, by happenstance, a math professor]. There are entire internet forums devoted to arguing about god. Really, are you thinking about what you're saying? Do you realize where you are? If you want all-night arguments, the internet is going to beat any university...

    And yes we did have a few drinks or a joint. And yes it's nice to have some girls join you in your intellectual explorations.

    The only reason I ever went to university was to meet girls.

  16. Re:Consider Star Trek... on Bringing Convenience and Open Source Methods To Higher Education · · Score: 1

    It's hard to find as great a concentration of intelligent people with an interest in a certain specialisation on the Internet as at a university. It's even harder to find a place with a high concentration of intelligent people with an interest in a certain specialisation and a lot of intelligent people working in a completely different field on the Internet.

    I really don't think this is the case. Especially if you include "intelligent." For example, try to find a localized group that can compete with Undernet's #math for opportunities to talk about advanced math. I doubt one exists in the world; I certainly wouldn't expect to find one at an arbitrary university. Certainly, if I had a math question, it would make more sense to go there than to a university. Especially at 3am.

    There's a reason why so many math majors & grad students spend so much time on IRC talking about math, rather than spending that time talking about math with their local peers.

    The internet connects everyone in the whole world, so for any selection criterion, with so much larger a pool, it's almost always going to win.

  17. Re:Consider Star Trek... on Bringing Convenience and Open Source Methods To Higher Education · · Score: 1

    Both in undergrad and grad school, I learned way more from random discussions, be they with other students or professors, than I ever did during the official class time. So much of an education is had by being around others who are also interested in the same things and eager to talk about it.

    Because it's so hard to find people to talk to on the internet??

  18. Re:One sentence discredits the whole article on Bringing Convenience and Open Source Methods To Higher Education · · Score: 0

    Actually, UofP is VERY good for certain types of degrees. Computer Science being one of them. While I don't have a degree from UofP, I have worked with IT people who do, and they were smart, motivated, well educated people.

    Correlation is not causation. Perhaps the types that get a UofP "education" have been hacking since they were 12.

  19. Re:Erm.... Labs? on Bringing Convenience and Open Source Methods To Higher Education · · Score: 1

    My girlfriend just got finished telling me she's doing vet school on about $6000 a year for tuition and living expenses.

    She may have a large scholarship.

    At a lot of expensive USA private schools (most of them if the sample with which I am familiar is representative) 10%+ receive enough scholarship money to pay about what a community college costs. (Still, $8000 is on the low end of that if you actually include living expenses; $8000 is about enough to rent a single bedroom in NYC.) But there is always the other 90%.

  20. Re:One sentence discredits the whole article on Bringing Convenience and Open Source Methods To Higher Education · · Score: 3, Insightful

    The University Of Phoenix education is a complete and utter joke. What they teach is worthless at best and counterproductive at worst (and yes, I have seen some of the content of their masters programs, assignments that include algebra I was doing in 7th grade and homework questions like, "What is a MAN?")

    That doesn't matter, because what universities sell is not education but credentials.

    After all, the internet as a whole provides a much richer educational environment than any university possibly could, "internet university" or not. (Indeed, classes in ordinary universities are also a joke, if you're accustomed to learning things without being forced.)

    But just learning things won't help you get a job. I have heard perfectly competent hackers talk about going back to get another degree (in computer science) even though they know they wouldn't learn anything there, because it would help them get higher-paying jobs.

    So yeah, there's a market for credentials, and the less time you have to waste pretending to be learning what in fact you already know, the better.

  21. Re:toposhaba on Congress Mulls Research Into a Vehicle Mileage Tax · · Score: 1

    Considering the absolute dependence of all of humankind on energy production via fossil fuels, (1) it does not matter what any document says; unconstitutionality is preferable to the self-destruction of the human race, (2) the constitution does in fact authorize congress to pass laws necessary to promoting the general welfare.

    Anyway, don't worry: no governments (or anyone else) are actually doing what is necessary to prevent humans from destroying through over-consumption the environment on which they depend for survival (as they have been doing for the last 10,000+ years). Enjoy your peak oil!

  22. Re:Keep in mind on Future of NASA's Manned Spaceflight Looks Bleak · · Score: 1

    The question was, why would anyone make such an investment? I don't assume that the prize is the only return on the investment; I just don't assume that there is any other return on the investment. (It doesn't matter though, because the $180B up-front loss is too much for anyone, anyway.) But if there was such a large predictable return on the investment separate from the prize, then why would anyone need to offer the prize?

  23. Re:Keep in mind on Future of NASA's Manned Spaceflight Looks Bleak · · Score: 1

    you could probably do it for a billion or two

    Where are you getting this figure from?? (C'mon man, you're just making it up!)

    Ultimately private companies will land on the Moon once the tourist market is rich enough to sustain flights there; which won't happen until after there's a big tourist market for suborbital flights and then a big tourist market for orbital flights as costs fall.

    There's never going to be a "big market" for tourist flights to the moon because it requires too much fuel (well over 99% of humans could not afford a vacation via Concorde; most humans living now cannot afford a road-trip); and whatever "space tourism" there is will of course represent the deprivation of millions.

  24. Re:Keep in mind on Future of NASA's Manned Spaceflight Looks Bleak · · Score: 2, Insightful

    Did you actually read your Ansari X Prize link? "$10 million was awarded to the winner, but more than $100 million was invested in new technologies in pursuit of the prize."

    So apparently the prize resulted in a 90% loss of investment (in the short-term). Now take into account the fact that there are a lot more people capable of losing $90M than $180B...

  25. Re:Keep in mind on Future of NASA's Manned Spaceflight Looks Bleak · · Score: 1

    I propose we take the remaining $32 billion that NASA hasn't spent yet, and deposit it in a bank somewhere. The first American company that lands human beings on the moon, keeps them there for one day, and returns them to Earth can collect $20 billion. The second company that does this can collect $10 billion. The third can have the last $2 billion.

    Why would any investor choose to fund a company that planned to attempt to collect this money? How do you convince the investor that even $32 billion, let alone $20 billion (which is the break-even point only for first place), is enough to accomplish the mission?

    It just doesn't make sense. When you make an investment where it's possible to lose everything, you want a return on the money that's several times what you put in. Reward needs to be proportional to risk. When corporations make multi-billion dollar contracts, the agreement always includes payment through the contract for on-going activity; they never raise the money up-front, and take on all the risk through their own investors.

    It's too much money for private corporations to raise for any project, anyway.

    It's just so wrong...