unimi.it · Domains · Slashdot Mirror

Re:favorites... by utoddl · 2013-10-21 02:20 · Score: 1 · on Ask Slashdot: Do You Use Markdown and Pandoc?

Also, programmers write software, not documentation.

I've been contributing to a project over the last 20 years (which happens to be a text editor) which has 80+ pages of documentation in texinfo. I'm not specifically recommending texinfo; it's what the project started with and it works. The point is, the build process renders a plain text version of the texinfo documentation, then a script reads the formatted docs to build several of the program source files, including hash tables of the commands, static text for the internal "help" commands, enums (it's C code), etc. You literally cannot add a new command to the program without adding it first to the documentation. It's a slick way to keep the documentation in sync with the code. The same idea could surely be implemented with many other document source formats. It's a step towards Knuth's Literate Programming without going overboard.
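For anyone curious what that docs-first build step might look like, here is a minimal sketch. It is not the project's actual script: the "Command: NAME" layout of the rendered docs, the `CMD_` prefix, and the names `parse_commands` and `render_c_source` are all made up for illustration, and it assumes help strings contain nothing that needs escaping in C.

```python
import re

# Hypothetical excerpt of the rendered plain-text docs. The layout
# ("Command: NAME" followed by an indented one-line description) is an
# assumption made for this sketch.
DOCS = """\
Command: OPEN
    Load a file into the current buffer.

Command: SAVE
    Write the current buffer to disk.
"""


def parse_commands(text):
    """Return (name, help_text) pairs in the order they appear in the docs."""
    pattern = re.compile(r"^Command: (\w+)\n\s+(.+)$", re.MULTILINE)
    return pattern.findall(text)


def render_c_source(commands):
    """Emit a C enum plus a parallel table of help strings."""
    enum_body = ",\n".join(f"    CMD_{name}" for name, _ in commands)
    help_body = ",\n".join(f'    "{text}"' for _, text in commands)
    return (
        "enum command {\n" + enum_body + ",\n    CMD_COUNT\n};\n\n"
        "static const char *const command_help[CMD_COUNT] = {\n"
        + help_body + "\n};\n"
    )


commands = parse_commands(DOCS)
c_source = render_c_source(commands)
print(c_source)
```

Because the enum and the help table come from the same parsed list, a command that is missing from the docs simply never reaches the generated C.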

Programmers certainly can and should write documentation.

Article is misleading by Jookey · 2011-12-05 16:03 · Score: 1 · on New Theory Challenges Need For Dark Matter

The paper linked in the summary is essentially a rehash of this written in 2008:
http://www.mat.unimi.it/users/carati/pdf/atenemissing.pdf
This article essentially states that the far field gravitational effects of distant galaxies are not negligible. This is the essential meat of the idea.

From the article linked in the summary:
"In the literature, there are several papers which illustrate different ways in which the distribution of matter can be estimated, starting for example from the measured distribution of luminosity of the galaxies. We take instead the simpler path which consists in assuming a functional form for [local gravitational potential], with a minimal number of parameters and then trying to determine these parameters by a best fit with the observed rotation curves."

This part really bugs me. Why does the author completely ignore the luminosity distribution of the test galaxies? At the very least this means that all the pretty graphs you see when scrolling through the pdf are misleading.

Re:Replace their respective pages with a message by martin-boundary · 2011-04-04 19:53 · Score: 2 · on Yahoo! Liable In Italy For Searchable Content

Actually, Italy has a respectable record in search engine research (linky1, linky2).

Re:Congrats! by raddan · 2010-11-05 13:02 · Score: 1 · on EPIC Files Lawsuit To Suspend Airport Body Scanner Use

The 4th Amendment says nothing explicitly about privacy. It just says that the government needs to have a reason, backed up by due process. Privacy has been interpreted into the Constitution, but the 4th Amendment itself doesn't give the rationale as to why your possessions cannot be searched at the whims of a government official. We can only speculate. Given that the Exclusionary Rule was not an accepted legal principle until the early 20th century, we have to be careful when we assume we know what the Founders thought.

The 9th Amendment essentially says that the Constitution is not exhaustive, i.e., that the rights of people are not limited to those expounded in the document. I.e., they leave room for future expansion of rights. But, in fact, the Constitution was constructed in such a fashion as to imply that laws ought to be built in a "subtractive" fashion anyway, that written laws explicitly remove rights from people. The Bill of Rights was added after the fact (nearly ten years later) because some politicians thought that certain rights needed to be explicitly outlined. It was very controversial at the time. Anyhow, while you can assume that you have a "right" to something that isn't explicitly outlawed, courts are also required to consider the traditional handling of a matter (called the "Common Law doctrine") when considering a case.

Anyway, I do think that a privacy law should be explicitly drafted, but as a computer scientist, I am increasingly beginning to think that our belief in privacy as an unalienable right is a momentary blip on the radar of human history. In a profound sense, any collection of data is an erosion of privacy. Wikipedia has a really lame section on the mathematics of anonymity, but you might be interested in k-anonymity. This is a very active area of research, but generally speaking, things look bleak.

(IANAL, but I have a CS and a legal studies degree)

Why? by assertation · 2010-05-10 08:44 · Score: -1 · on Hacking Vim 7.2

'Vim Can Do Everything' but configuring it to do so is sometimes daunting.

Then why use it in the first place?

It is 2010, not 1990. You do not have to develop on *nix using CLI tools only. There are also many small and simple CLI editors (e.g. NE http://ne.dsi.unimi.it/ ) for network admins who just want to touch up config files.

People say that by learning VI you learn more unix, since much of VI is just a wrapper program for sundry *nix utilities. True, but if you invest in a good multipurpose editor centric IDE like Visual Slick Edit you will never need those things. I've been happily using Slickedit for 11 years on two different operating systems and on about half a dozen different languages.

All professionals have professional expenses. Students have to pay as much for books.
I would spend a few hundred dollars on a tool that will let you do your work without adding work to be able to use the tool.

Wow, what an idiot solution! by Jugalator · 2006-12-29 05:53 · Score: 3, Informative · on The NSFW HTML Attribute

Talk about totally missing out on the already existing adult content rating standards.

Instead of inventing something redundant here, just have browsers installed at work block access to pages rated as "breast exposure", or whatever. There is already a standard with very fine-grained control of exactly what a web page contains, if it's "visible sexual touching", language, or whatever, and the administration can then decide on exactly what they wish to allow. You can even tell that it's "nudity, but in a medical context" if you intend to loosen up the regulations in special cases.

http://www.icra.org/label/generator/

ICRA is supported by Internet Explorer, and while, strangely enough, Firefox doesn't seem to have built-in support for these website classification schemes, there should be extensions like ViQ for Firefox that add this support, although I haven't tested them.

Of course, few sites today use this system well, but that's still being vastly better off than inventing some new inflexible "nsfw" HTML attribute, and modifying the HTML standard. Wow...

To me, this issue always disturbs me by bogaboga · 2005-11-02 03:28 · Score: 5, Informative · on Windows and Linux User Interfaces

The issue is decent looking fonts. I always have to download the webFonts.sh script http://vigna.dsi.unimi.it/webFonts4Linux/webFonts.sh and turn off anti-aliasing in order to have a desktop that is a pleasure to work with. Heck, even the most recent OpenOffice.org release is uglier on Linux than it is on Windows.

Guys, we need to have an attractive desktop by default in order to make the user experience at least more appealing. In one installation of Ubuntu, I had to tweak the X.org conf file in order to have it display these fonts correctly! And believe me...it took more than 4 hours to get right! Who would have that time in the "real" world?

hopefully not an idiotic vim/emacs comment by beforewisdom · 2005-10-17 01:51 · Score: 1 · on Vim 6.4 Released

Usually with these threads there is the temptation to make the "usual comments" regarding vi or emacs.

Interface is frequently mentioned.

If you look back to when these editors were "designed", they were "designed" for doing significant development, on unix, in a pure shell environment.

There are still people who do significant development in a pure shell environment, but this sort of thing is in the minority these days.

Most command line editors in *nix are primarily used these days to edit configuration files. There is no need for a complex and alien set of key bindings.

If all you truly need a command line editor for is editing config files, you might be interested in "ne":

http://ne.dsi.unimi.it/

It has CUA ( windows, dos-like ) key bindings and it even has a DOS style drop down menu.

It is customizable, fully in the command line, and very, very small.

Desktop Linux needs the following: by bogaboga · 2005-08-27 01:03 · Score: 4, Insightful · on Vista Launch Good for Desktop Linux?

Desktop Linux will still be a long way off until applications can be installed and un-installed in an easy way. I know folks are going to mention apt-get and its sister dpkg tools. But these are not very useful unless one can configure them and is also on the internet. With the rich resources of the OSS community, one wonders why rpm dependency hell has no adopted solution. Autopackage http://www.autopackage.org/ would be a good start but all major distros are not even giving it support! From a developer's point of view, writing an application for Linux means testing the application on no more than 6 distros! In some cases, I have seen more than 17 binaries for the same applications targeting different Linux distros. In the Windows world, there could be just 1 or 2. So it follows that if we in the Linux world can make life easier for developers, then that is positive. Our egos alone will not deliver. I think we need some kind of dictatorship here.

The other thing Desktop Linux needs is good fonts. I am yet to find a desktop Linux installation that is beautiful out of the box. Often times, one has to download M$ fonts or could use the script found here: http://vigna.dsi.unimi.it/webFonts4Linux/webFonts.sh to get good fonts for the web.

Next thing is multimedia and multimedia applications. Totem in the GNOME world and Amarok in the KDE world will not play mp3s out of the box, yet there are no licensing restrictions on these formats! There are so many other examples in the multimedia field.

There is a bug/feature I found in Linux that needs attention in relation to how devices are mounted. Remember that we in the Linux world are aiming at domination. So we should attract as many users as we can. The bug is here: http://bugs.kde.org/show_bug.cgi?id=111173. I was surprised that there was a wontfix mentioned. So how are we to attract users if there will always be confusion in how devices are mounted?

Last but not least, we need publicity - good publicity. Right now, Linux is being touted as very good or good enough for the average user. What happens is that folks then have to understand that Linux is just a KERNEL and that there are many implementations associated with this kernel. To many, understanding this is a challenge. So one says "I use Linux at home, it's freely available on the net...try it out..." (and they leave it at that)! What follows is confusion as newbies find tons of distros and incompatible packages. Folks what do you think?

Re:Still ugly fonts - this works too! by bogaboga · 2005-07-31 01:08 · Score: 4, Informative · on GNOME 2.12 Previewed

1: Install Microsoft true-type fonts.

2: You could install them via this script: http://vigna.dsi.unimi.it/webFonts4Linux/webFonts.sh

Then do the following:

Configure X and Gnome to 96 dpi:

sudo cp /etc/X11/xorg.conf /etc/X11/xorg.conf.bak
sudo gedit /etc/X11/xorg.conf

Locate Section "Monitor" and add the following lines before EndSection:

# DisplaySize 270 203 # 1024x768 96dpi
# DisplaySize 338 254 # 1280x960 96dpi
# DisplaySize 338 270 # 1280x1024 96dpi
# DisplaySize 370 277 # 1400x1050 96dpi
# DisplaySize 423 370 # 1600x1400 96dpi

Uncomment the line corresponding to your current resolution.

To get other values, use the following formula:

displaysize = {pixelsize}/96*25.4

Remember:

The display size must be "right" so adjust those values till you get your size right.
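If you'd rather not do the arithmetic by hand, here is a small sketch of the formula above. The function name `display_size_mm` is just something picked for this example. It uses integer math (25.4 mm per inch written as 254/10) and truncates to whole millimetres, which matches the values in the list above.

```python
def display_size_mm(width_px, height_px, dpi=96):
    """Return (width_mm, height_mm) for an xorg.conf DisplaySize line.

    Implements displaysize = pixelsize / dpi * 25.4 using integer math,
    truncating to whole millimetres.
    """
    return (width_px * 254 // (dpi * 10), height_px * 254 // (dpi * 10))


for w, h in [(1024, 768), (1280, 960), (1400, 1050)]:
    mm_w, mm_h = display_size_mm(w, h)
    print(f"DisplaySize {mm_w} {mm_h}   # {w}x{h} 96dpi")
```

Pass a different `dpi` if you are targeting something other than 96.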

What I like about Linspire by bogaboga · 2005-07-29 10:45 · Score: 3, Interesting · on Review of Consumer-Friendly Linux Distro

There are many things to love about Linux based Operating Systems, especially on the virus/malware/adware side, but what I have come to like about Linspire is that it just works as advertised. It just works! SuSE does not (remember multimedia)? Second, it's beautiful. I love its fonts. For other distros, I have had to tweak X11 and download this script http://vigna.dsi.unimi.it/webFonts4Linux/webFonts.sh in order to see fonts as I like them. What troubles me most is that even for Debian based ones, invoking the command "apt-get update && apt-get upgrade" might leave you with an unusable system.

Now, if Linspire could adopt autopackage http://autopackage.org/, so much the better, since Linspire packages would then be able to install on any distro.

One lesson, but not specifically for linux by beforewisdom · 2005-05-14 14:33 · Score: 1 · on 25 Years After DOS - Lessons for Linux?

DOS applications, despite being shell driven have long had friendly "drop down" style menus.

The new *nix shell editor ne has such menus:
http://ne.dsi.unimi.it/

I think such menus in emacs, vi, and other shell apps would make those apps a lot more friendly.

Fastutil by BoxedFlame · 2004-12-14 22:05 · Score: 2, Interesting · on What are Some Essential Java Libraries?

fastutil is what Jakarta Commons Collection should've been: an actually competent implementation of truly type safe containers.

MG4J from the same place is pretty interesting too.

Re:Passe... by rreyelts · 2004-09-30 07:19 · Score: 2, Interesting · on Have a Nice Steaming Cup of Java 5

No, autoboxing suffers an entirely new set of problems. For example, let's take a container, which is the most common usage case of generics.

You will be forced to box and unbox upon every single access to that container. Boxing can be quite expensive, compared to simple operations, because it will usually involve a memory allocation - which can suffer even more under multiprocessor boxes due to heap contention.
Your container will use a much larger amount of memory compared to an array of primitives. Compare an int[] to a List<Integer>. The int[] will cost you 4*N bytes. The List<Integer> will cost you 16*N bytes - 4*N for the Object[], 8*N for the two-word object header for each Integer, and 4*N for the actual int value stored in each Integer.

For systems like mine, that store 100 million objects in memory, a 4X memory increase is hugely unacceptable, and the access penalty is also unacceptable. My situation is not uncommon, and it is the reason why libraries like fastutil exist and are so popular.
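To put rough numbers on that, here is a back-of-the-envelope sketch using only the per-element sizes quoted above (4-byte ints, 4-byte references, an 8-byte two-word header). Those sizes assume a 32-bit JVM, and the sketch ignores the array's own header, the list object itself, and any spare capacity; the function names are made up for illustration.

```python
INT_BYTES = 4      # one Java int
REF_BYTES = 4      # one slot in the backing Object[] (32-bit JVM assumed)
HEADER_BYTES = 8   # two-word object header on each Integer


def primitive_array_bytes(n):
    """Approximate payload of an int[] holding n values."""
    return INT_BYTES * n


def boxed_list_bytes(n):
    """Approximate payload of a List<Integer> holding n values."""
    return (REF_BYTES + HEADER_BYTES + INT_BYTES) * n


n = 100_000_000
print(primitive_array_bytes(n))  # 400000000
print(boxed_list_bytes(n))       # 1600000000
```

So at 100 million elements the boxed version needs about 1.6 GB where the primitive array needs about 400 MB, which is the 4X figure.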

I think .NET is mostly a rip-off of the JVM with very little innovation, but they seem to have a much better approach to primitives, with JIT type-specialization.

Re:Alliances... by beforewisdom · 2004-04-05 01:41 · Score: 2, Interesting · on Japan, China, S Korea Agree To Standardize Linux

This is not intended to be flame bait.

The vi vs emacs question is irrelevant to everyone but developers, and then only a small group of developers. For simple system editing you don't need to have either on your system:
NE editor:
http://ne.dsi.unimi.it/

Since the concerns of these governments are for everyday users, their concerns will be for ease of use, and so far KDE is ahead, if for nothing else than its similarity to Windows.

Just my opinion

Steve

Article Text for those too lazy to click the link by Anonymous Coward · 2003-06-12 06:42 · Score: -1, Redundant · on Computing PageRank on your PC?

Introduction

WebGraph is a framework to study the web graph. It provides simple ways to manage very large graphs, exploiting modern compression techniques. More precisely, it is currently made of:

A set of flat codes, called ζ codes, which are particularly suitable for storing web graphs (or, in general, integers with power-law distribution in a certain exponent range). The fact that these codes work well can be easily tested empirically, but we also try to provide a detailed mathematical analysis.
Algorithms for compressing web graphs that exploit referentiation (à la LINK), intervalisation and ζ codes to provide a high compression ratio: for instance, the WebBase graph (2001 crawl) is compressed at 3.08 bits per link, and a snapshot of about 18,500,000 pages of the .uk domain gathered by UbiCrawler is compressed at 2.22 bits per link (the corresponding figures for the transposed graphs are 2.89 bits per link and 1.98 bits per link). The algorithms are controlled by several parameters, which provide different tradeoffs between access speed and compression ratio.
Algorithms for accessing a compressed graph without actually decompressing it, using lazy techniques that delay the decompression until it is actually necessary.
A complete, documented implementation of the algorithms above in Java, contained in the package it.unimi.dsi.webgraph. Besides a clearly defined API, the package contains several classes that allow to modify (e.g., transpose) or recompress a graph, so to experiment with various settings. The package relies on fastutil for a type-specific, high-performance collections framework, on MG4J for bit-level I/O, on the COLT distribution for ready-to-use, efficient algorithms and on GNU getopt for line-command parsing.
Data sets for very large graph (e.g., a billion of links). These are either gathered from public sources (such as WebBase), or produced by UbiCrawler.

In the end, with WebGraph you can access and analyse a very large web graph, even on a PC with as little as 256 Mbytes of RAM. Using WebGraph is as easy as installing a few jar files and downloading a data set. This makes studying phenomena such as PageRank, distribution of graph properties of the web graph, etc. very easy.

You are welcome to use and improve WebGraph!

Installation

You just have to install the .jar file coming with the distribution, and download the jars WebGraph depends upon (i.e., fastutil, MG4J, COLT and GNU getopt). You may find useful to refer to the JPackage Project if you own an RPM-based distribution. In the same vein of the packages above, WebGraph is also distributed as a Jpackage-like RPM.

Article Text for those too lazy to click the link by Anonymous Coward · 2003-06-12 06:42 · Score: -1, Redundant · on Computing PageRank on your PC?

Introduction

WebGraph is a framework to study the web graph. It provides simple ways to manage very large graphs, exploiting modern compression techniques. More precisely, it is currently made of:

A set of flat codes, called codes, which are particularly suitable for storing web graphs (or, in general, integers with power-law distribution in a certain exponent range). The fact that these codes work well can be easily tested empirically, but we also try to provide a detailed mathematical analysis.
Algorithms for compressing web graphs that exploit referentiation ( la LINK), intervalisation and codes to provide a high compression ratio: for instance, the WebBase graph (2001 crawl) is compressed at 3.08 bits per link, and a snapshot of about 18,500,000 pages of the .uk domain gathered by UbiCrawler is compressed at 2.22 bits per link (the corresponding figures for the transposed graphs are 2.89 bits per link and 1.98 bits per link). The algorithms are controlled by several parameters, which provide different tradeoffs between access speed and compression ratio.
Algorithms for accessing a compressed graph without actually decompressing it, using lazy techniques that delay the decompression until it is actually necessary.
A complete, documented implementation of the algorithms above in Java, contained in the package it.unimi.dsi.webgraph. Besides a clearly defined API, the package contains several classes that allow to modify (e.g., transpose) or recompress a graph, so to experiment with various settings. The package relies on fastutil for a type-specific, high-performance collections framework, on MG4J for bit-level I/O, on the COLT distribution for ready-to-use, efficient algorithms and on GNU getopt for line-command parsing.
Data sets for very large graph (e.g., a billion of links). These are either gathered from public sources (such as WebBase), or produced by UbiCrawler.

In the end, with WebGraph you can access and analyse a very large web graph, even on a PC with as little as 256 Mbytes of RAM. Using WebGraph is as easy as installing a few jar files and downloading a data set. This makes studying phenomena such as PageRank, distribution of graph properties of the web graph, etc. very easy.

You are welcome to use and improve WebGraph! Installation

You just have to install the .jar file coming with the distribution, and download the jars WebGraph depends upon (i.e., fastutil, MG4J, COLT and GNU getopt). You may find useful to refer to the JPackage Project if you own an RPM-based distribution. In the same vein of the packages above, WebGraph is also distributed as a Jpackage-like RPM.

Article Text for those too lazy to click the link by Anonymous Coward · 2003-06-12 06:42 · Score: -1, Redundant · on Computing PageRank on your PC?

Introduction

WebGraph is a framework to study the web graph. It provides simple ways to manage very large graphs, exploiting modern compression techniques. More precisely, it is currently made of:

A set of flat codes, called codes, which are particularly suitable for storing web graphs (or, in general, integers with power-law distribution in a certain exponent range). The fact that these codes work well can be easily tested empirically, but we also try to provide a detailed mathematical analysis.
Algorithms for compressing web graphs that exploit referentiation ( la LINK), intervalisation and codes to provide a high compression ratio: for instance, the WebBase graph (2001 crawl) is compressed at 3.08 bits per link, and a snapshot of about 18,500,000 pages of the .uk domain gathered by UbiCrawler is compressed at 2.22 bits per link (the corresponding figures for the transposed graphs are 2.89 bits per link and 1.98 bits per link). The algorithms are controlled by several parameters, which provide different tradeoffs between access speed and compression ratio.
Algorithms for accessing a compressed graph without actually decompressing it, using lazy techniques that delay the decompression until it is actually necessary.
A complete, documented implementation of the algorithms above in Java, contained in the package it.unimi.dsi.webgraph. Besides a clearly defined API, the package contains several classes that allow to modify (e.g., transpose) or recompress a graph, so to experiment with various settings. The package relies on fastutil for a type-specific, high-performance collections framework, on MG4J for bit-level I/O, on the COLT distribution for ready-to-use, efficient algorithms and on GNU getopt for line-command parsing.
Data sets for very large graph (e.g., a billion of links). These are either gathered from public sources (such as WebBase), or produced by UbiCrawler.

In the end, with WebGraph you can access and analyse a very large web graph, even on a PC with as little as 256 Mbytes of RAM. Using WebGraph is as easy as installing a few jar files and downloading a data set. This makes studying phenomena such as PageRank, distribution of graph properties of the web graph, etc. very easy.

You are welcome to use and improve WebGraph! Installation

You just have to install the .jar file coming with the distribution, and download the jars WebGraph depends upon (i.e., fastutil, MG4J, COLT and GNU getopt). You may find useful to refer to the JPackage Project if you own an RPM-based distribution. In the same vein of the packages above, WebGraph is also distributed as a Jpackage-like RPM.

Article Text for those too lazy to click the link by Anonymous Coward · 2003-06-12 06:42 · Score: -1, Redundant · on Computing PageRank on your PC?

Introduction

WebGraph is a framework to study the web graph. It provides simple ways to manage very large graphs, exploiting modern compression techniques. More precisely, it is currently made of:

A set of flat codes, called codes, which are particularly suitable for storing web graphs (or, in general, integers with power-law distribution in a certain exponent range). The fact that these codes work well can be easily tested empirically, but we also try to provide a detailed mathematical analysis.
Algorithms for compressing web graphs that exploit referentiation ( la LINK), intervalisation and codes to provide a high compression ratio: for instance, the WebBase graph (2001 crawl) is compressed at 3.08 bits per link, and a snapshot of about 18,500,000 pages of the .uk domain gathered by UbiCrawler is compressed at 2.22 bits per link (the corresponding figures for the transposed graphs are 2.89 bits per link and 1.98 bits per link). The algorithms are controlled by several parameters, which provide different tradeoffs between access speed and compression ratio.
Algorithms for accessing a compressed graph without actually decompressing it, using lazy techniques that delay the decompression until it is actually necessary.
A complete, documented implementation of the algorithms above in Java, contained in the package it.unimi.dsi.webgraph. Besides a clearly defined API, the package contains several classes that allow to modify (e.g., transpose) or recompress a graph, so to experiment with various settings. The package relies on fastutil for a type-specific, high-performance collections framework, on MG4J for bit-level I/O, on the COLT distribution for ready-to-use, efficient algorithms and on GNU getopt for line-command parsing.
Data sets for very large graph (e.g., a billion of links). These are either gathered from public sources (such as WebBase), or produced by UbiCrawler.

In the end, with WebGraph you can access and analyse a very large web graph, even on a PC with as little as 256 Mbytes of RAM. Using WebGraph is as easy as installing a few jar files and downloading a data set. This makes studying phenomena such as PageRank, distribution of graph properties of the web graph, etc. very easy.

You are welcome to use and improve WebGraph! Installation

You just have to install the .jar file coming with the distribution, and download the jars WebGraph depends upon (i.e., fastutil, MG4J, COLT and GNU getopt). You may find useful to refer to the JPackage Project if you own an RPM-based distribution. In the same vein of the packages above, WebGraph is also distributed as a Jpackage-like RPM.

Article Text for those too lazy to click the link by Anonymous Coward · 2003-06-12 06:42 · Score: -1, Redundant · on Computing PageRank on your PC?

Introduction

WebGraph is a framework to study the web graph. It provides simple ways to manage very large graphs, exploiting modern compression techniques. More precisely, it is currently made of:

A set of flat codes, called codes, which are particularly suitable for storing web graphs (or, in general, integers with power-law distribution in a certain exponent range). The fact that these codes work well can be easily tested empirically, but we also try to provide a detailed mathematical analysis.
Algorithms for compressing web graphs that exploit referentiation ( la LINK), intervalisation and codes to provide a high compression ratio: for instance, the WebBase graph (2001 crawl) is compressed at 3.08 bits per link, and a snapshot of about 18,500,000 pages of the .uk domain gathered by UbiCrawler is compressed at 2.22 bits per link (the corresponding figures for the transposed graphs are 2.89 bits per link and 1.98 bits per link). The algorithms are controlled by several parameters, which provide different tradeoffs between access speed and compression ratio.
Algorithms for accessing a compressed graph without actually decompressing it, using lazy techniques that delay the decompression until it is actually necessary.
A complete, documented implementation of the algorithms above in Java, contained in the package it.unimi.dsi.webgraph. Besides a clearly defined API, the package contains several classes that allow to modify (e.g., transpose) or recompress a graph, so to experiment with various settings. The package relies on fastutil for a type-specific, high-performance collections framework, on MG4J for bit-level I/O, on the COLT distribution for ready-to-use, efficient algorithms and on GNU getopt for line-command parsing.
Data sets for very large graph (e.g., a billion of links). These are either gathered from public sources (such as WebBase), or produced by UbiCrawler.

In the end, with WebGraph you can access and analyse a very large web graph, even on a PC with as little as 256 Mbytes of RAM. Using WebGraph is as easy as installing a few jar files and downloading a data set. This makes studying phenomena such as PageRank, distribution of graph properties of the web graph, etc. very easy.

You are welcome to use and improve WebGraph! Installation

You just have to install the .jar file coming with the distribution, and download the jars WebGraph depends upon (i.e., fastutil, MG4J, COLT and GNU getopt). You may find useful to refer to the JPackage Project if you own an RPM-based distribution. In the same vein of the packages above, WebGraph is also distributed as a Jpackage-like RPM.

Article Text for those too lazy to click the link by Anonymous Coward · 2003-06-12 06:42 · Score: -1, Redundant · on Computing PageRank on your PC?

Introduction

WebGraph is a framework to study the web graph. It provides simple ways to manage very large graphs, exploiting modern compression techniques. More precisely, it is currently made of:

A set of flat codes, called codes, which are particularly suitable for storing web graphs (or, in general, integers with power-law distribution in a certain exponent range). The fact that these codes work well can be easily tested empirically, but we also try to provide a detailed mathematical analysis.
Algorithms for compressing web graphs that exploit referentiation ( la LINK), intervalisation and codes to provide a high compression ratio: for instance, the WebBase graph (2001 crawl) is compressed at 3.08 bits per link, and a snapshot of about 18,500,000 pages of the .uk domain gathered by UbiCrawler is compressed at 2.22 bits per link (the corresponding figures for the transposed graphs are 2.89 bits per link and 1.98 bits per link). The algorithms are controlled by several parameters, which provide different tradeoffs between access speed and compression ratio.
Algorithms for accessing a compressed graph without actually decompressing it, using lazy techniques that delay the decompression until it is actually necessary.
A complete, documented implementation of the algorithms above in Java, contained in the package it.unimi.dsi.webgraph. Besides a clearly defined API, the package contains several classes that allow to modify (e.g., transpose) or recompress a graph, so to experiment with various settings. The package relies on fastutil for a type-specific, high-performance collections framework, on MG4J for bit-level I/O, on the COLT distribution for ready-to-use, efficient algorithms and on GNU getopt for line-command parsing.
Data sets for very large graph (e.g., a billion of links). These are either gathered from public sources (such as WebBase), or produced by UbiCrawler.

In the end, with WebGraph you can access and analyse a very large web graph, even on a PC with as little as 256 Mbytes of RAM. Using WebGraph is as easy as installing a few jar files and downloading a data set. This makes studying phenomena such as PageRank, distribution of graph properties of the web graph, etc. very easy.

You are welcome to use and improve WebGraph!

Installation

You just have to install the .jar file coming with the distribution, and download the jars WebGraph depends upon (i.e., fastutil, MG4J, COLT and GNU getopt). You may find it useful to refer to the JPackage Project if you own an RPM-based distribution. In the same vein as the packages above, WebGraph is also distributed as a JPackage-like RPM.

Article Text for those too lazy to click the link by Anonymous Coward · 2003-06-12 06:42 · Score: -1, Redundant · on Computing PageRank on your PC?

Introduction

WebGraph is a framework to study the web graph. It provides simple ways to manage very large graphs, exploiting modern compression techniques. More precisely, it is currently made of:

A set of flat codes, called ζ codes, which are particularly suitable for storing web graphs (or, in general, integers with power-law distribution in a certain exponent range). The fact that these codes work well can be easily tested empirically, but we also try to provide a detailed mathematical analysis.
Algorithms for compressing web graphs that exploit referentiation (à la LINK), intervalisation and ζ codes to provide a high compression ratio: for instance, the WebBase graph (2001 crawl) is compressed at 3.08 bits per link, and a snapshot of about 18,500,000 pages of the .uk domain gathered by UbiCrawler is compressed at 2.22 bits per link (the corresponding figures for the transposed graphs are 2.89 bits per link and 1.98 bits per link). The algorithms are controlled by several parameters, which provide different tradeoffs between access speed and compression ratio.
Algorithms for accessing a compressed graph without actually decompressing it, using lazy techniques that delay the decompression until it is actually necessary.
A complete, documented implementation of the algorithms above in Java, contained in the package it.unimi.dsi.webgraph. Besides a clearly defined API, the package contains several classes that allow you to modify (e.g., transpose) or recompress a graph, so as to experiment with various settings. The package relies on fastutil for a type-specific, high-performance collections framework, on MG4J for bit-level I/O, on the COLT distribution for ready-to-use, efficient algorithms and on GNU getopt for command-line parsing.
Data sets for very large graphs (e.g., a billion links). These are either gathered from public sources (such as WebBase), or produced by UbiCrawler.
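
For readers wondering what a ζ code looks like in practice, here is a rough Python sketch, not the it.unimi.dsi.webgraph implementation. It assumes the usual published definition: for a shrinking factor k, an integer x in [2^(hk), 2^((h+1)k) - 1] is written as h + 1 in unary, followed by a minimal binary code of x - 2^(hk) within that interval. The function names, and the convention that unary is zeros followed by a one, are assumptions made for illustration.

```python
def unary(n: int) -> str:
    """Unary code of n >= 1: n - 1 zeros followed by a single one (assumed convention)."""
    return "0" * (n - 1) + "1"


def minimal_binary(x: int, z: int) -> str:
    """Minimal binary code of x in the interval [0, z - 1]."""
    if z == 1:
        return ""                      # only one possible value, no bits needed
    s = (z - 1).bit_length()           # s = ceil(log2 z)
    short = (1 << s) - z               # number of values that get an (s - 1)-bit word
    if x < short:
        return format(x, f"0{s - 1}b")
    return format(x + short, f"0{s}b")  # the remaining values use s bits


def zeta(x: int, k: int) -> str:
    """Zeta code of x >= 1 with shrinking factor k >= 1, as a string of bits."""
    if x < 1 or k < 1:
        raise ValueError("x and k must be positive")
    h = (x.bit_length() - 1) // k      # 2^(hk) <= x < 2^((h+1)k)
    low = 1 << (h * k)
    high = 1 << ((h + 1) * k)
    return unary(h + 1) + minimal_binary(x - low, high - low)


if __name__ == "__main__":
    for x in (1, 2, 3, 4, 8, 100):
        print(x, zeta(x, 3))
```

With k = 1 this reduces to Elias γ coding. Larger k spends more bits on the smallest values but grows more slowly, which is the tradeoff that would matter for power-law distributed gaps.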

In the end, with WebGraph you can access and analyse a very large web graph, even on a PC with as little as 256 Mbytes of RAM. Using WebGraph is as easy as installing a few jar files and downloading a data set. This makes studying phenomena such as PageRank, distribution of graph properties of the web graph, etc. very easy.
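
As a rough illustration of the kind of study this enables, here is a minimal PageRank power iteration in Python. It is only a sketch under stated assumptions: the graph is exposed through a hypothetical successors(node) callable that yields out-links on demand (standing in for lazy access to a compressed graph), and the damping factor 0.85 and the iteration count are conventional choices, not values taken from the article. None of the names below belong to the WebGraph API.

```python
from typing import Callable, Iterable, List


def pagerank(num_nodes: int,
             successors: Callable[[int], Iterable[int]],
             alpha: float = 0.85,
             iterations: int = 50) -> List[float]:
    """Power iteration; only the rank vectors are held in memory, never the graph."""
    rank = [1.0 / num_nodes] * num_nodes
    for _ in range(iterations):
        incoming = [0.0] * num_nodes
        dangling = 0.0                       # rank mass sitting on nodes with no out-links
        for node in range(num_nodes):
            out = list(successors(node))     # decoded on demand, one node at a time
            if not out:
                dangling += rank[node]
                continue
            share = rank[node] / len(out)
            for target in out:
                incoming[target] += share
        # Teleportation plus dangling mass, spread uniformly over all nodes.
        base = (1.0 - alpha) / num_nodes + alpha * dangling / num_nodes
        rank = [base + alpha * value for value in incoming]
    return rank


if __name__ == "__main__":
    toy = {0: [1, 2], 1: [2], 2: [0]}        # tiny hypothetical graph
    print(pagerank(3, lambda n: toy[n]))
```

The point of passing a callable rather than an adjacency matrix is that memory use stays proportional to the number of nodes, which is what would make a PC with little RAM workable.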

You are welcome to use and improve WebGraph!

Installation

You just have to install the .jar file coming with the distribution, and download the jars WebGraph depends upon (i.e., fastutil, MG4J, COLT and GNU getopt). You may find it useful to refer to the JPackage Project if you own an RPM-based distribution. In the same vein as the packages above, WebGraph is also distributed as a JPackage-like RPM.

Re:XML is NOT just text! (old school answer) by fishdan · 2003-01-30 06:38 · Score: 1 · on XML and Perl

Wow, are we arguing about what is text? Now that is an old school computing argument that I'm not sure the kids will appreciate! (No offense intended.)

My $.02: XML is composed of text because it only allows ASCII characters. That's it. Well-formed XML "the language" requires more definitions, but an XML "file" is just another text file format. You're talking about the nondeterministic finite automaton quintuple that specifies how XML is parsed, understood, etc. But within that quintuple, I is the set of all ASCII characters >= 32 and < 128. At least I think that's true. Can someone post if I'm wrong? I appreciate learning of my misconceptions.

Steam! by gandalf23atwork · 2003-01-06 06:24 · Score: 1 · on Uncle Tungsten

Bah!

I truly disagree with all of you. It's Steam! Steam forms the base of all scientific progress. Steam can save the world!

-Professor Steamhead

Re:FYI... by cscx · 2002-10-13 14:03 · Score: 2 · on Blender Is GPL

If you spend hours every day editing text, you'll want something more powerful and won't mind spending some time to use it properly. Of course, it would be great if the interface was "intuitive" enough so you wouldn't need to learn it.

Which is why I use TextPad or NEdit or ne (the latter 2 are open source, the first is $pay$ software).

Re:Sick and tired of defeatism by ChaosDiscordSimple · 2002-06-13 03:24 · Score: 2 · on Serious IIS Hole; Minor X Bug

What about the fact that we STILL don't really take advantage of gfx hardware for 2D presentation? or the fact that fonts still look like ass?

What are you talking about? Thanks to various bits of acceleration in XFree86, my desktop is zippy fast. Games and DVDs play as smoothly as I could want. Ugly fonts? Well, yes, truly free fonts tend to be a bit weaker. However, you can easily get the fonts Microsoft generously makes available for free, using the webFonts4Linux script. They won't be quite as nice as on Windows by default, thanks to a patent on the TrueType hinting engine. You can either build your own FreeType library to include the patented code, or you can use anti-aliased fonts. KDE has anti-aliased fonts and Gnome is right on its heels.

If you think we can laugh at others, check those market share figures. We have a lot of work to do.

First, it doesn't matter what our market share is. So long as the community continues to grow, there will be a future. Second, the latest market figures for servers show Linux as gaining market share. On desktops, things aren't quite so good, but we're definitely increasing our numbers. Things are looking quite good in the long run. Yes, there is a lot of work to do, and we need to remain honest about how far we have to go. But some cheerleading and hyping our strengths is key.

Re:Oh dear by KeyserDK · 2002-01-24 09:40 · Score: 1 · on Xft Support For Mozilla

AA fonts don't make your current fonts better than they are. AA is not magic.

Edgy (bad) fonts get blurry. Good fonts don't; they get sharp (AA removes the tiny visible edges).

Try using some of Microsoft's TrueType fonts (Monotype). Here they have quite a nice script for getting those fonts. Use them in KDE/GNOME and be amazed ;).

WinFlower by MoceanWorker · 2001-12-20 07:08 · Score: 1 · on All Work And No Play ...

heh.. seeing Minesweeper on the list reminded me of WinFlower and the ICBW (International Campaign to Ban Winmine)... anybody try out that game? you can find it here. btw, it's java-based

Re:Information Overload by Mr.+Slippery · 1999-08-24 01:07 · Score: 1 · on Wearable PCs

It has occurred to me that there is such a thing as "too much information".

Too much information, perhaps. Too much knowledge, doubtful - information is not knowledge.

Our external storage - first cave paintings, then clay tablets, scrolls, books, recordings, and now computers - lets us store the information and put our brains to work on the knowledge.

We don't have to remember the endless details (wish I could find a link to Feynman's "map of the cat" story!) and can spend more time on the knowledge rather than the easily recorded and retrieved information.

In net discussions, I often pause to go search for some little fact. I would love to be able to do this in real-time conversations.

(For instance, can you imagine political debates where the candidates could instantly call up, say, the federal budget figures, or their opponent's voting record, or any statistic they needed? And the folks watching at home could instantly check them on it?)
