ICFP 2001 Task
David Mentré writes: "The ICFP 2001 programming contest TASK is now available. The objective is to write an optimizer (aka compressor) for an HTML-like language. It must work for arbitrarily big inputs and in a limited wall-clock time. Can you guess what will be the winning language? ;)" We already announced the contest, but now that the task is available, it might be fun to look at. So what will the contestants come up with? Anyone think perl might be the language of choice? :)
For those who are actually participating the competition, SAVE THE DOCUMENT!!! Don't give yourself an excuse for LOSING when the page is SLASHDOTTED and you can't lookup the task again!!!!!
C
Now what was the article about ?
the winning entry will be written in Intercal :)
That would be historical. Never before has a markup language challenged the dominance of a programming language.
> ie, it doesn't have to be a functional programming language.
Of course. Go ahead, and ridiculize yourself by submitting an non-functional entry.
Those guy are functional from the bottom up, and their preaching about it. The contest is _designed_ to make non-functional languages ridiculized.
I bet that the SML/NG input to the software will be tailored to make functional software win (ie: you will have incredibely complex input, that will only be suitable for languages with built-in matching, so functional languages will be standing on the shoulders og giants).
Using C or C++ in this constest is akin using perl to write a linux kernel module.
Note that it is not a bad contest, on the contrary. It shows the areas where functional languages shine. Of course, I'll be rotfmalo the day where someone will submit a winning C entry (for instance, by having a library that would implement most of the goodies functional languages are good at)
Cheers,
--fred
not that there's anything wrong with that.
Parsing is easy. Very easy. The problem is finding possible optimisation transformation on the parsed form.
I am really short of time, but I am pretty sure that there are case where an optimisation will depend on very distant tags. This is the place where you C program will have a very hard time (in particular because the time limit imposes non-brute force heuristics).
That beeing said, 3 days is a lot of time, and the contest *looks* easier than the preceeding ones.
Cheers,
--fred
Java 1.4 has regex pattern matching in the standard libraries.
Slashdot editors suck, and that's the story i'm sticking to!
...I must admit, I've never seen someone post a reply to a reply to their own message.
It was recently discovered that research causes cancer in rats.
I see what I'm missing. Color nesting makes it hard.
Hint: There are also EM/S nesting, size nesting, and underline nesting optimizations.
Local optimizations can be done on the fly as you initially indicated. The global optimization problems for nesting are algorithmically "hard." My guess is it is NP. The global optimization problem can be stated as a path minimization problem in a fairly large state space (naive size is 2x2x2x2x2x4x10x8) with ordered waypoints. The state space is actually a bit smaller than that.
Oh, and the state transition distances are different. A <B>,</B> pair costs 7 bytes, but a <PL>,</PL> pair costs 9. I bet that linear programming will choke on this problem too. There are a bunch of local minima. Time to dust off my Numerical Recipes book and read up on simulated annealing.
Of course, what do I know?
First pass optimizations should be able to guard against stupidity. My first cut just clears out whitespace and trivial redundancies, like:
<B></B>This is not Bold<B></B>
or
<B><B>This was already Bold</B></B>
The first pass optimization will worst case include all of the original tags and should therefore pass the stupidity test. Of course, after this initial cleanup, then the real work has to start.
We don't need = (except in initializers, of course)...
Hmm... I have better things to do than figuring out an algorithm for this, really I do...
--
Barry de la Rosa,
public[at]bpdlr.org
-- /. ID is lower than Bruce Perens'!
Barry de la Rosa,
public[at]bpdlr.org
My
the corrolary to this is that O'Caml is only used by grad students who have tons of free time, while C/C++ is used by people who get paid.
--sam
--sam
--sam
Any technology distinguishable from magic is insufficiently advanced.
Doug forgot to enable the Mercury compiler's optimizations! He's fixed that now, so we now do a lot better than before.
Nonetheless, I think the Mercury folks took a prize last year though, didn't they?
We came fourth, which was good enough to rate a mention, but not good enough for a prize.
Who's with me?
Or in Objective Oberon with an object orientated design, or perl, or c, or c++, or shell with awk and sed.
i suppose if you are a real glutton you could use assembler *shakes head*
actually a language with excellent string manipulation is also Business Basic (Basis Business Basic or ProvideX Basic)
pick your favorite language
\\||//
--ooo00ooo--
[Dilbert] Not so good, you just authored IIS.
Remember: it's a "Microsoft virus", not an "email virus",
Your right to not believe: Americans United for Separation of Church and
Actually, perl is a fully functional language in a pure sense; functions are values, and the language features anonymous functions and closures. Arguments are passed as a list, meaning you can curry if you feel like it.
You just very rarely see perl being used functionally because despite the fact it has functional features, its functional features are really quite clumsy to use, and anyone prone to functional programming would probably prefer another language anyway.. in other words, you can be as functional as you like in perl. It just isn't worth the bother.
Larry Wall likes LISP, though.
Irritable, left-wing and possibly humorous bumper stickers and t-shirts
Just to point out, there's a good functional XML parser/processor available for Scheme - as everyone here knows, XML is just a verbose way of writing certain Lisp S-Expressions. This software will allow you to load any XML in as a lump of Scheme, and it can then be treated like any other Scheme data structure you've been taught how to process in CS 101. See www.lh.com/~oleg/ftp/Scheme/xml.html
for example: <WEIGHT unit="pound">
<NET certified="certified"> 67 </NET>
<GROSS> 95 </GROSS>
</WEIGHT>
becomes:
(WEIGHT (@ (unit "pound"))
(NET (@ (certified)) 67)
(GROSS 95)
)
Choice of masters is not freedom.
Java?
Anyone? Ok, I'm actually serious, don't mod me up +1 Funny.
I'm sure that there will be quite a few hacks out there using lex & yacc & C :-)
To do it with a regular expression.
I'm joining the contest. That is if this damn semantic analyzer works, I can get on to the optimization part. :P
--exa--
No, no, no . . . you got it all wrong. The winner will be in C#
Check out comment 28, it's his point.
---
---
"Of course, that's just my opinion. I could be wrong." --Dennis Miller
#!/bin/bash
gzip *.html
load "linux",8,1
Yes, and open source software sucks because if Linus et al had real jobs, they wouldn't have time to futz around in their spare time...
While that's partly true, it won't help you. The task is designed for functional languages, and it shows. Functional languages have algebraic data types and pattern matching built in. In C you'd have to implement this yourself, plus all the code required for memory management and write the optimiser, all in three days.
That's why I predict O'Caml will win. Of all the languages which have native support for precisely this kind of task, it's the one that also has the support for beating the time limit.
sub f{($f)=@_;print"$f(q{$f});";}f(q{sub f{($f)=@_;print"$f(q{$f});";}f});
Java does not have pattern matching, unless you think method dispatch is pattern matching. :-)
Having said that, Java would be a good general-purpose language to solve this problem in. I don't like your chances given the three day limit, though.
sub f{($f)=@_;print"$f(q{$f});";}f(q{sub f{($f)=@_;print"$f(q{$f});";}f});
No, flexibility will be the deciding factor, closely followed by speed of development. Speed of execution won't make a huge difference, because the judges are going to throw in a test case so large and complex that no program can find the "optimal" sequence in the given wall time on the test machine. So the winner will be the program which can make the best use of the time available.
The winner will use a set of optimisation techniques (from simple transformations to expanding the string out into decorated characters and re-encoding), and will be able to combine these techniques seamlessly. And this will be coded in three days.
This is not a test for a systems programming language like C. This is a test for a language in which you can write complex decision logic quickly.
sub f{($f)=@_;print"$f(q{$f});";}f(q{sub f{($f)=@_;print"$f(q{$f});";}f});
I predict that the winner will probably be O'Caml, and for one reason only: it will beat the time limit. Moreover, this will have little to do with the speed of the O'Caml implementation.
Your program has to complete within a given wall time (not CPU time). This is supplied as a command-line argument and environment variable. The winner will be the tool which can stop computing before the time is up and produce the output. The winning program will be one that iteratively improves the input string, but is monitored by another thread which stops the optimiser thread when the time is almost up, then grabs the best solution so far and performs the output.
O'Caml just happens to have good built-in primitives to write this kind of "interrupt the thread when the time is up and let me know what you did" computation.
sub f{($f)=@_;print"$f(q{$f});";}f(q{sub f{($f)=@_;print"$f(q{$f});";}f});
...would be my bet ;)
Thats because all of the good C engineers have jobs and are too busy to fsck around with stupid games like this. Caml programmers bored from a lack of any employment in the real world had no problem finding the time to dedicate to this problem.
Someone you trust is one of us.
No INTERCAL beers? That one could be interesting :-)
- Steeltoe
http://www.debunkingskeptics.com/
echo Server: Microsoft IIS/5.0
echo Content-Type: text/html
echo
echo <html><title>Server overloaded</title>
echo <body bgcolor="#0000FF">
echo <font face="Courier, Courier New, Fixed">
echo IIS.EXE caused a general protection fault in
echo module BLUESCREEN.DLL at 0123:45F00F00
# Yes, I know it's not exactly clean HTML
# It'll work in almost all browsers though, and
# they DID say compressed...
This message is provided under the terms outlined at http://www.bero.org/terms.html
Since most can handle it, you may even want to "compress" head and body statements... And, of course, leave out closing tags.
print "404 Not FoundThe requested URL was not found on this server.";
This message is provided under the terms outlined at http://www.bero.org/terms.html
At least for correctness. Think about it..
All you have to do, is parse the document, and then spew it back out in a smaller representation. There are lots of booleans in there, and those are as far as i can see what you really need to worry about.
Of course, I'm not gonna bother to do this one...
Anyhow, speed will be the deciding factor on this one. I wouldn't be surprised if the winner is written in plain ole C.
Stop the brainwash
Yeah, it's those french guys and their secret war of Caml vs SML. =)
You mean, they are too busy debugging their core dumps and memory leaks? ;)
C is a poor language for rapid development because it is not safe -- it's too easy to have a space leak or a myserious crashing bug. Manipulating strings and data structures in C is also rather tedious.
I would expect that a good C implementation would not be faster, either, because using a higher-level, safe language would allow you to make more optimizations to your program (since you can write it faster).
Go ahead and prove me wrong, though!
> or perl, or c, or c++, or shell with awk and sed.
Ha, good luck with those. This is not just simple string manipulation...
Well, that would be Microsoft Research, actually, and last I checked Peyton-Jones was at the Cambridge one.
Here's benchmarks:
http://www.bagley.org/~doug/shootout/craps.shtm
Lots of tests aren't implemented for Mercury, which explains its low score.. but it isn't doing to well in the ones that are implemented, anyway.
Nonetheless, I think the Mercury folks took a prize last year though, didn't they? (Don't get me wrong, I'm all for unorthodox new languages.)
I don't think lisp is the most popular functional language any more.. I would say it's probably SML or O'Caml. Lisp hasn't won in any of the previous years, I don't think.
I agree wholeheartedly, though, that Perl won't be powerful enough for this kind of AST manipulation. I like that the task will probably make slashdot kids think that perl is extremely appropriate -- that will put them in their place all right.
I am positive Brainf*ck will be on the podium.
I'm afraid I am not suffinciently advanced in these languages to bet on them, but I reckon they would give the other contenders a run for their money.
XSLT is designed not to have side effects and is instead more of a functional language meaning that you can't assign values to variables dynamically. Considering that this task may require quite a lot of storing of state before deciding which optimization to make I think it is rather unlikely that XSLT is a good language to use to solve this particular problem.
Then again I didn't look too hard at the task but it may be that the judges have designed it in such a way that it is amenable to functional programming considering that it is the being run by the International Conference on Functional Programming. Even then most functional languages do allow you to store state, even Lisp so an XSLT solution would have to be rather clever.
--
You're missing the fact that the tags have to be paired. Ensuring the tags are paired isn't a problem, but ensuring the tags are paired optimally is where the difficulty arises.
:). I'm writing my entry in Haskell and I have a fairly decent optimizer except for these cases and it's currently a 110 line Haskell program.
Consider: <B><I>Hello</I></B>World&l t;B><I>Foo</I></B>
Should be: <B><I>Hello<PL>World</PL>F oo</I></B>
But how do you know if it's more optimal to go with a PL tag vs closing the B and I? You would need the array to go further down and see that you will need the B and I again so you don't bother to close it.
This is the problem I'm running in to now
--BEGIN SIG BLOCK--
I'd rather be trolling for goatse.cx.
Things you think are in the Constitution, but are not.
wouldn't it be possible to use lex/flex to generate a C scanner that could parse the data just as easily as perl? C by itself may not be the answer, but there are some tools that use C that can make life a lot easier.
If you need me, I'll be doing a preliminary victory dance over in the QBASIC camp...
A helluva lot simpler. One form- (function-name param1 param2... paramN). Not a whole bucket of syntax rules like you find in those BCPL derived languages. Oh well, let the children speak their C, C++, and Java and leave the meat to the parents.
Working toward a usable PDA environment in the spirit of Newton OS: Dynapad
Perhaps. But C++ never was very usable for a big project to me either. So much work for so little result. Smalltalk has worked for me, but to each his own I supose.
Working toward a usable PDA environment in the spirit of Newton OS: Dynapad
No, Perl has too many capabilities to be considered a purely functional language, however it does have first-class functions, so any program that can be created in more than one functional language (say Lisp and ML) can also be created in Perl:
http://perl.plover.com/lambda/
ie, it doesn't have to be a functional programming language.
Logo! Logo! Bow before the awesome power of the turtle!
He put his boots up on the table and made a face. "The sig," he smirked. "You can waste your life in search of the sig."
As far as time limits are concerned, may I suggest Mercury. It's a functional/logical hybrid that's somewhat unorthodox for VHLLs in that it's a compile-only rather than interpreted-only or interpreted/compiled. Its designers claim that it can get similar performances to programs of similar functionality written in C, alhtough I haven't seen any benching, and I'll believe it when I see it. :)
Anyone think perl might be the language of choice? :)
No, because Perl isn't a functional language. *sigh*.
They'll simply buy out the winning team and divvy them up.
Humorless sig goes here.
Actually, they say they will test with inputs as large as ``a few megabytes''. Note that it is always legal to merely echo an input if you can't figure out how to optimize it...
Regardless of how we do in the competition, we're learning a lot about our programming language and its implementation, anyhow. The task isn't a perfect fit for our language, which is good, since it has tickled some of the dark corners already.
Two days to go!
Lisp is not just a functional language, as any functional purist will tell you with an air of disdain. It supports multiple paradigms.
I don't know Perl, so the following is a real question, not merely rhetorical: Can you create new functions on the fly in Perl as you can in Lisp? That is, can you have a function1 that returns a new function2 as its value, where function2's definition is relative to the parameters that were passed into function1 at its time of creation? That is, can you create closures in Perl? In Lisp, this is done as follows:
Just passigning functions around does not make a language functional. Heck, C can do this function pointer. Truly functional languages support the creation of closures (among other things). Can Perl do this?
why would C be inferior for solving this task?
It's a great choice. You'll kill the guys working in x86 assembler.
Just choose your favorite language amongst the ones used in the "99 Bottles of Beer on the Wall" gallery.
I'll personally offer 99 bottles of Chimay Belgian beer to the winner of he does it in BrainF***!
--
Trolling using another account since 2005.
I'd like to create a Forth Team as I consider this rather ignored language as one of the most elegant solutions in terms of language simplicity as well as interpreter efficiency.
--
Trolling using another account since 2005.
Did anyone else spot the similarity between the name of their mark-up language and another language?
No, your children are not the special ones. Nor are your pets.
It takes a little up-front learning if you're used to, say, C. The syntax is different from other languages, but actually simpler.
MSFT was among the 1999 winners, but their entry was in Haskell, not Visual Basic. Unfortunately their writeup has disappeared from the MSFT web site.
Hint: 20 levels of bracing means you should be breaking that function down into smaller, reusable functions. You don't tend to do that in C++ because it's such a drag -- cut, paste, declare, etc. In Scheme new functions are cheap and easy.
That aside, who says you have to use it for a large project before it's fun? I'm having fun with a couple of 3KLOC projects.
- PLClub submitted two separate entries using OCaml, either of which would have won the contest.
- Camls 'R Us took second place.
- Galois Connections took third with their Haskell entry.
- The Merry Mercurians took fourth with their Mercury entry.
1999:- Camls 'R Us mopped up the competition with their 3585-line OCaml entry
- The 1250-line Haskell Entry that took 2nd place was written in a mere 24 hours.
1998:for it to be Ruby, but I imagine some type of Lisp-like language would be best. Maybe Scheme as others have mentioned.
Everyone knows that Integer Basic had much better performance than Applesoft, as long as you can tolerate only having integers. 10 COLOR=3 20 PLOT 10,10 umm, what else was the program supposed to do ?
Don't let all the Irritating Parentheses fool you, LISP WILL WIN!
quis custodiet ipsos custodes - Juvenal
Pick me, pick me! :)
JOVIAL will kick some serious buttocks!
-D
Thanks to relentless innovation from Redmond, the choice is clear. A team working in Visual Basic will clearly triumph.
Art At Home
Perl hasn't a chance. As much as we all love it, it is not up to the task of writing complex programs such as this. The winner will probably be written in LISP, because it is the most popular functional language. The only other possible winners are better functional languages such as Caml or Haskell.
Even Slashdot wants to hide some things
I think the winner could quite well be SNOBOL.
:-)
It sure would be nice to hear some positive news about SNOBOLing
I know, bad joke
If it's supposed to move and doesn't, use WD-40. If it moves and it shouldn't, use duct tape.
You mean this story is thin, neat, and dresses well?
Perl does in fact have proper closures. E.g.
#!/usr/bin/perl -w
use strict;
sub make_adder {
my $addend = shift;
sub {my $x = shift; $x + $addend}}
my $adder = make_adder(3);
print &$adder(2), "\n";
IMHO not as nice as Lisp or ML, but there nonetheless.
i got it!
10 print "please have a cup of coffee"
20 compress data
30 goto 10
done!
every good
I think I'll use Scheme
Casual Games/Downloads
why would C be inferior for solving this task?
Don't quote me on this.
Everyone is welcome.
print "
<HTML>
<HEAD>
<TITLE>404 Not Found</TITLE>
</HEAD>
<BODY>
<H1>Not Found</H1>
The requested URL was not found on this server.
</BODY>
</HTML>";
"Your superior intellect is no match for our puny weapons!"
I think I will just submit cat(1). Should beat all others in the "correctness" and "speed of optimisation" department!
Hmm... actually you wouldn't need to make an array at all. My brain really isn't functioning. You could do it on the fly. Hmm... but there's a plain tag... that could make it harder... I think I'm missing something. ;-)
It is a miracle that curiosity survives formal education.-Albert Einstein
I think I'm missing something.
I see what I'm missing. Color nesting makes it hard. To get around that we can simply "move" color end tags if they're right before a color start tag to right after the next color end tag.
Actually, we need to check the next set of start tags in general to see if we can use a plain tag. Of course, this makes it more inefficient...
It is a miracle that curiosity survives formal education.-Albert Einstein
...but it might be hard to make it really fast. Maybe if you wrote it in assembly?
A somewhat simple but slow way of doing it would be to create an array in which each non-tag character has a set of flags for bold, italic, whatever and a set of fields for color, size, etc. Then simply go through the newly created array and put in tags when ever the flags of fields change.
It should be possible to optimize that... but I imagine significantly more efficient ways to do it will be found.
I think this method will create highly optimized output... I can't think of anything to screw it up off the top of my head... but maybe I'm just missing something really obvious.
Perhaps if some of the tags turn other tags off? I don't remember. That could screw it up.
Comments?
It is a miracle that curiosity survives formal education.-Albert Einstein
C alone should be enough. PERL alone can handle it. C++ should not be needed.
It seems to me that for reasons similar to the selection of SNOBOL, Icon and/or M/MUMPS/Cache would be ideal choices for this contest. Actually, I started with MUMPS and am ending up with Icon. A great, little known language. Steve Graham