Ask Slashdot: Open Source vs Proprietary GIS Solution?
New submitter rnmartinez writes "As the Project Manager for a non-profit looking to implement a tech project, I am running into a few dilemmas, and as a casual Slashdotter I could really use some help. I'll start with a brief explanation of the project. We research issues in Canadian Immigrants, and found that there was a lack of recent, unaggregated information. As we dug further, we found that some data was available, but there was no central repository. Therefore, we are building a web based service to collect this data, with the intent of having it display in Google Maps and then be downloadable as a CSV file that is readable in GIS software such as ESRI Arcsoft, so that data may be visualized."
The dilemma: "...It seems that MS SQL offers more functions with regards to geometry built in then MySQL, and my developers (good guys, but MS guys at heart) want me to switch to .net NUKE/MSSQL and ditch the open source stuff." Read on for further details.
"To date, we have relied on a LAMP box with Drupal as the frontend to help provide a more social experience. However, it seems that MS SQL offers more functions with regards to geometry built in then MySQL, and my developers (good guys, but MS guys at heart) want me to switch to .net NUKE/MSSQL and ditch the open source stuff. As a believer in open source and as a non-profit, I am having some moral issues with this (I try and run Linux and open source on everything I reasonably can)."
"So here is my dilemma: do I dump $20K into moving everything to an MS solution that in the short-medium term might make the geometry functions (i.e. show me all the hospitals within a 20km radius of this cluster of immigrants) or do I get him to spend the same amount writing something similar for MySQL? The only issue there, is that I am not too fond of Oracle having ownership of MySQL. Should I be directing $20K into replicating these functions into something like MariaDB? Might be a longshot but again, as a non-profit I'd like to see something go back to everyone, not just my group."
"Really, I am open to any flexible, creative open and reliable solutions. Sorry if my knowledge is limited or if I am grasping at straws, and if I am being terribly biased, but I trust Oracle with open source about as much as I trust MS."
MySQL might fare poorly against MSSQL's geometry support, but how does PostgreGIS stack up?
"So here is my dilemma: do I dump $20K into moving everything to an MS solution that in the short-medium term might make the geometry functions (i.e. show me all the hospitals within a 20km radius of this cluster of immigrants) or do I get him to spend the same amount writing something similar for MySQL? The only issue there, is that I am not too fond of Oracle having ownership of MySQL. Should I be directing $20K into replicating these functions into something like MariaDB? Might be a longshot but again, as a non-profit I'd like to see something go back to everyone, not just my group."
"Really, I am open to any flexible, creative open and reliable solutions. Sorry if my knowledge is limited or if I am grasping at straws, and if I am being terribly biased, but I trust Oracle with open source about as much as I trust MS."
MySQL might fare poorly against MSSQL's geometry support, but how does PostgreGIS stack up?
GIS and vector data of the world
http://downloads.cloudmade.com/
http://www.naturalearthdata.com/
It is far superior to MySQL geographic types. I'm partial to MSSQL, but PostGIS is a close second.
If you've got something generating Lat/Lon coordinates to slap into your database, the distance calculations are trivial:
http://en.wikipedia.org/wiki/Haversine_formula
I didn't see anything in a quick skim of your requirements (tl/dr) that looks like it requires a $20K package to meet. There are a number of open source mapping solutions out there, Google Earth is hard to beat for functionality and ease of programming (see: KML).
I would recommend checking out PostGIS, which works with PostgreSQL. PostGIS adds functions, data types, and projection definitions to PostgreSQL that turn PostgreSQL into a powerful spatially-enabled database server. I'ved used this combination for a number of projects, and they work great. Both are fully open sourced.
In my office we work with GIS Data, but a ESRI Desktop license is just too expensive.
We choose for desktop Manifold GIS and the beauty of that software is that you could use almos any DB Backend to store your geometries. Now we are using MS SQLServer as backend, but Manifold allow you to save everything on WKB or WKT on the database that means that you can use almost any database.
My suggestion is for the server use a combination of Mapserver+PostGIS.
For clients you could use Manifold GIS, it's not ArcMap but believe me, with a license of $900 you could do almost the same things that you could do with a ESRI Desktop license of $5000.
Have a look at this post from Sebastian Delmont on google plus. I found an excellent eye-opener to whats out there related to GIS tech that you can "roll on your own". If you are doing simply radial distance calculations than as mentioned the Haversine forumula is your friend. I added a radial search to a dealer locator for an online store in under a day with some python and a bit of time to geocode and cache all the address data via google.
jaymz
http://postgis.refractions.net/ - pretty good spatial functions based on top of PostgreSQL, and not tied to Oracle.
There are many, many open source GIS packages that you might find useful rather than implementing everything yourself. http://maptools.org/ is one place to get some pointers from.
Run, do not walk, to research the Open Source Geospatial Foundation's offerings. And be aware, neo-geos devote their efforts primarily to PostGIS, which builds on the very capable PostreSQL database, adding in geospatial capabilities that by many accounts rival the best that that Oracle Spatial and MS SQL provide.
As always, there are significant tradeoffs to evaluate in your situation. Be ready to study the many evaluations and comparisons of the various solutions.
"Ahh! I see you're in that indeterminate Schrodinger state where - oh, uh
At my place of employment we use PostgreSQL and PostGIS extensively for the exact or similar problems as you describe. We recently contributed back to a portion of the PostGIS project by extending the TSP solver for a different ending than the beginning. I'm not the one who is generally writing stuff like this, but I maintain the servers and I know how much performance can be gained. Plus, the PgSQL and PostGIS guys are very close with lots of code and advancements being contributed directly into PgSQL from the PostGIS team. We have also looked at the MS solutions and found them to be ridiculously expensive to host and scale services targeted at business with real-life budgets and not huge corporations. We have tools used in nearly all of the counties in Wisconsin processing many requests per day and second(not allowed to give numbers) with only a few servers. Personally, stay open source and stick with PostgreSQL. They have a track record for extremely stable systems that can be upgraded as advancements are made with very little downtime. You can tune the internal performance metrics to tweak everything you need with online research or many books and even consultants such as EnterpriseDB. Good Luck with your developers, go with PostgreSQL and you won't look back.
I can safely say that MySQL handles GIS data sufficiently for the type of scenario you've described on Slashdot. I've got a database I periodically play with that is all the cities / major towns of the world and can quickly query it with distance data. (as in: Give me everything within 100 miles of Lat,Long)
MySQL doesn't make all of it as easy as it should be, but with some careful design decisions it can work surprisingly well for it.
I previously had created a distance function that worked pretty good, but more recent versions of MySQL have better internal support and I was able to ditch some of my custom routines.
(Note: my routines performed efficiently, but gathered extraneous data that could later be filtered out to be more precise. The extra data was maybe 10% more than necessary, but meant the query was very fast.)
Perhaps you should be looking to upgrade your MySQL instance, or getting some procedures/functions written for MySQL to handle some of the calculations.
(It's pretty easy to calculate a rectangle for the query to run against, and then follow up with a more precise distance calculation to filter out a few stranglers that slip into the dataset. (better to include a few that are dropped in later stages than to exclude valid data up front)).
As for the moral issues: Stuff them. They aren't what you should be focusing on.
Disclaimer: I work for Esri.
In the FOSS GIS world the go to DBMS is PostgreSQL + the PostGIS extensions. Don't even consider MySQL in this regard. PostGIS owns this space and rightfully so.
SQL Server is a great database as well and their spatial types and functions are excellent. Depending on how much data you'll be storing and processing - and if you have Windows Server licenses already - you may be able to use SQL Server Express which is free as in beer for any use.
For someone in your position though there's really only one choice: go with PostgreSQL + PostGIS. It would be silly for someone in your position to pay for this functionality and the PostGIS community would welcome you with open arms.
This is an excellent question. Do you need to perform arbitrarily complex operations on geometries, or are you looking for very simple geospatial operations? If, for example, you need to find all points within a complex polygon, or calculate overlaps then it's likely to be useful either to buy into a real GIS system, or investigate something like the GDAL OGR library. If, however, all you want to do is the sort of thing you describe, you can probably get away with calculating a bounding rectangle from the set of points you have and extending that in each direction by an arbitrary amount. Lo - store grid references or Lat/Long, and you need no geospatial support in your database at all. If you *did* need geospatial support, and were persuaded that MS-SQL were the way to go, then check if you qualify for WebSiteSpark in Canada - that will get rid of most of the 20k you need to spend and get you Server 2008, SQL Sever 2008, Visual Studio 2010 &c for $100 over 3 years. How much data do you have? SQL Server 2008 Express has the spatial extensions.
There's a saying about stupid questions and answers, but never mind ... Your answer is based on several assumptions:
You're right that custom code is usually more expensive than already available boxed solutions, under the assumption that the generic solution implements only the needed code and does not need to be modified. If the generic solution implements much wider range of functionality than you actually need, the custom code may be actually much cheaper. And if you need/want to change the internals, it's going to be very expensive (outweighing the prior savings) or even impossible. Plus you usually have zero influence on future of the product.
I really don't see how the choice of MSSQL (or any other database) eliminates the need for proper testing ...
I say pay whatever it takes to switch everything to MS-only technologies, since your developers prefer that. Then, to make the budget work out, cut all their paychecks by that same amount. (Actually, you should cut their paychecks by quite a bit more, since you have to account for the continual license fees needed, and the fact that some of them may quit.)
Or offer them the choice between this and doing something lower cost that works as well and doesn't require continual license fees.
Sounds reasonable until you consider that both approaches are going to require custom work. Even though the whole point of languages like SQL is that you're supposed to be able to swap out the back end easily, in the real world, you have to do all kinds of work. So the choice isn't between some kinds of simple switchover or doing a lot of custom work, it's between doing a bunch of custom work or doing a different bunch of custom work. Oh, and the third option of finding a prepackaged solution that does what you want without swapping out such a major part of your application stack.
Disclaimer: I work for Monty Program Ab, the primary developers behind MariaDB.
MariaDB 5.3.3 (currently in RC status) has improved GIS functionality versus MySQL. While sponsoring new features could improve it even further what we are releasing right now might be enough for you. Please give it a try, and talk to us about how we could make it fit your needs better.
And why didn't you provide an answer to the submitter, rather than troll?
Oh! Oh! I have an answer. Pick me!
It's because you're trolling just as hard!
You spent 5 of 6 lines attacking the person who wrote the post, 1 line talking about some other topic and zero arguing the merits of the post.
Yay! I solved the puzzle. I'll take the cupcake for $10 and the rest on a gift certificate Pat.
I personally agree with the OP, you use the best tool for the job. If you are designing your own tools just to save money, then you're under-valuing your time as a developer. If in this case MSSQL works, then use it, if in another case MySQL works, use it.
I've been doing geospatial development since 1992 when Genasys (defunct) released the worlds first web mapping server, Web Broker (ESRI's IMS wasn't released until 5 years later). At the time the only COTS option for managing spatial data within an RDBMS was Sybase's Spatial Query Server, which provided slightly more geometry options inherent within MySQL today. MySQL geometry features should be considered primitive at best, far from a full-featured implementation. On the other hand, PostGIS, is full featured and the only open source implementation I know of that adheres to the Open Geospatial Consortium's 'Simple Features: Types and Functions 1.1' specification. It is very robust, scales well (in contrast to MS SQL) and you can't beat the price or ease-of-mind that comes with being licensed under v.2 of the GPL. Additionally, on comparable hardware, in terms of reliability, performance, and cost effectiveness, there is no comparison, regardless whether your using Oracle w/Spatial or MS SQL and ESRI's SDE.
Your developers, as good of guys at heart as they may be, should stay away from the kool aid for awhile
Firstly, you can post questions here: http://gis.stackexchange.com/ and they will be answered. It seems to be a pretty good community, I have been posting on there for a little while.
Secondly, I don't use databases a lot, but I recommend that you do what you suggest. Make a Google Map and make a CSV for folks to download.
like phosphorescent desert buttons singing one familiar song
Am I the only one who finds it a bit sad that this is considered a hard problem to solve in house? It depends on how you have your database set up, but, you could store your latitude and longitude in different fields as degrees, minutes, and seconds (do you need resolution finer than about 30 meters?, then add more fields, need coarser resolution, do the same). Then, you start at your center point and draw an appropriately sized circle (more on that after) around that point using an algorithm that gives you all the second^2, minute^2, and degree^2 (and larger and or smaller blocks as precision requires) sized blocks that fall inside or touch the circle. Then you craft a select statement for all sites that match that set of blocks. Then, after you have that set, if you don't care too, too much about precision, you're done. If you do, you take the data set that's been returned and you look at all of the sites whose block intercepts the circumference and calculate their distance to the center and throw out the ones that are too far away. If you're using a flat earth (not flat as in pancake, flat as in a perfect ellipsoid) model, then you're done at this point. If you want to consider three dimensional distances on an earth with mountains and valleys, etc. (the "appropriately sized circle" mentioned above should already be taking care of the perfect ellipsoid model), so that a site 100 meters away horizontally but at the bottom of a 1.5 km cliff isn't considered to be 100 meters away, then you need to do more work and you need the altitude of each site in your database as well. Since you can pretty much rely on a fairly low maximum amount of overhang from cliffs and so forth, all you need to do is have an inner circle and an outer "appropriately sized circle" based on some precalculated constants regarding maximum changes in altitude over the whole earth calculated by coordinate block of some given size (computing or obtaining those constants is the trickiest part, put it's not hard, it just requires the appropriate GIS data and some number crunching) and stored in a table. When you compute your inner and outer circles, you just take the local terrain into account and draw the outer circle as far out as any sites in those blocks could possibly be from the center, and the inner circle as far in as they could possibly be. Then you work the blocks from the outside of the max circle to the inside of the min circle (not bothering to search blocks bounded on the outside by other blocks where you've already determined all the sites are inside your max area).
For calculating the "appropriately sized circle" in the first place, you make use of the Haversine Formula or an appropriately modified (for altitudes) version thereof and some safe margin around the edge. Voila. Now, I know I've fudged past some of the math here, especially for the more complicated cases, but this is still pretty simple stuff, especially for the simpler cases. This is CS Major Sophmore or Junior year stuff.
What if they actually save money by being more productive?
Umm there is a cost to retrain your developrs to use new tools. If the developers like .net it is probably worth a few K to give them the tools to do what they need. .NET doesn't suck neither does MS SQL (Microsoft has upped its game in product quality the last few years). The time and cost of finding an appropriate Open Source tool and get your coders up to speed will more often then not be more expensive then using the first decent solution that is available and liked by your staff.
If something is so important that you feel the need to post it on the internet... It probably isn't that important.
This may be too obvious to be helpful, but since the submitter mentioned that they use Drupal on top of their database, why not just use one of the Drupal Google Maps modules as a starting point?
Google Maps Tools provides the very proximity function he's looking for (among other things) and there are many other easy integrations with Google Maps he can use as well.
That said, if he really wants to do the calculations at the DB layer, then switch the installation to Postgres GIS as mentioned elsewhere.
The party of stupid and the party of evil get together and do something both stupid and evil, then call it bipartisan.
I have had a fair amount of success using open source GIS tools for one of my side project to create a hunting map book. I am not too sure of how well either one would work for what you are trying to do but I would be surprised if they didn't. The 2 tools I have had the most success with are uDig GIS and GRASS GIS. Both of them will run on Window, Mac OS, or Linux just fine, they all can process shape files, and DB input as well as geo tiff files as well as other file formats. Another popular OS GIS program is Quantum GIS. When I started I found OSGeo4w which had a bunch of open source GIS programs compiled for windows with everything you need and tried out a few of them and found 2 that best suited my needs. At the time I know it included GRASS, uDig, QGIS, and a couple of other ones, you might want to check out OSGeo as well as there may have been other projects that have started or better meet your needs.
Here is an example of some of what I have done. It is a map of the MN deer areas showing the antlerless deer harvest per square mile for each area in relation to all the others during the archery season. This map is a 10 year snapshot where the darker area indicated more deer were taken per square mile.
Time to offend someone
How effing accurate does such a thing even need to be in a social networking context? Still, it wouldn't be hard to get this down to sub-meter accuracy. It could be hired out if need be on one of those programming task sites for under $1000 (far less that the $20000 initial outlay to put up a decent couple of machines to do it in an MS-SQL way).
I suspect there's a band of "we are scared of maths" programmers somewhere near the OP.
now we need to go OSS in diesel cars
Canada is finally cracking down on fake immigrants. One of the biggest problems is the abuse of the immigrant investor program which this business apparently is trying to exploit, if you actually look at their web site:
In other words, "buy your way into the country by buying some dead persons' business."
Nice scam - too bad Mr. Martinez isn't licensed. The Canadian government has required licensing of anyone doing immigration consulting since 2004 because of the number of scams and abuses.
The problem, as I see it, is that the original poster is comparing the wrong things; namely, he's looking at MySQL for some odd reason, when several other people have here already pointed out that the de-facto OSS solution to this is PostgreSQL and PostGIS. Sounds pretty much like a drop-in solution to me.
He makes good points...
If you lack technical skills then buying pre packaged software is for you, but you certainly shouldn't go around tricking people with even less technical skills into thinking that you're somehow an expert.
If however you are technically competent, then it is almost always better to have software that you can modify to suit your needs, and which you can understand inside out.
Similarly if you're a competent and well taught developer you should have little problem adapting to different platforms, and you should actually *want* to because experience of more platforms increases your value as a developer.
There is a reason why the companies with the most inhouse technical skill build their core services on open source (Think google, yahoo etc). While your typical non tech company simply buys in commercial software (and often get ripped off by overpriced low skill blag-artist consultants).
http://spamdecoy.net - free throwaway anonymous email - avoid spam!
If you ignore them, and you admit you're ignorant when it comes to them, why are you commenting on them, and providing advice regarding them? Maybe you'd be better off providing advice regarding platforms you're knowledgeable about?
Karma: Poor (Mostly affected by lame karma-joke sigs)
Both solutions are going to require custom programming. One of them chucks the existing (mostly working) solution, the other builds on the solution. That is a fairly substantial difference.
Heck, I could understand if the developers wanted to simply chuck MySQL and replace it with MS SQL Server (which apparently also works with Drupal). I personally would probably use the existing Drupal solution with PostgreSQL, but if the developers feel more comfortable with MS SQL Server I could see that being considered.
However, the developers aren't talking about moving to SQL Server to get them the database functions they need. They are talking about throwing away the existing solution completely and rewriting in DotNetNuke and MS SQL Server. That's just plain crazy unless the existing solution is worse than nothing at all, and the developers on staff have a spotless track record for delivering on time and under budget.