Ask Slashdot: Open Source vs Proprietary GIS Solution?
New submitter rnmartinez writes "As the Project Manager for a non-profit looking to implement a tech project, I am running into a few dilemmas, and as a casual Slashdotter I could really use some help. I'll start with a brief explanation of the project. We research issues in Canadian Immigrants, and found that there was a lack of recent, unaggregated information. As we dug further, we found that some data was available, but there was no central repository. Therefore, we are building a web based service to collect this data, with the intent of having it display in Google Maps and then be downloadable as a CSV file that is readable in GIS software such as ESRI Arcsoft, so that data may be visualized."
The dilemma: "...It seems that MS SQL offers more functions with regards to geometry built in then MySQL, and my developers (good guys, but MS guys at heart) want me to switch to .net NUKE/MSSQL and ditch the open source stuff." Read on for further details.
"To date, we have relied on a LAMP box with Drupal as the frontend to help provide a more social experience. However, it seems that MS SQL offers more functions with regards to geometry built in then MySQL, and my developers (good guys, but MS guys at heart) want me to switch to .net NUKE/MSSQL and ditch the open source stuff. As a believer in open source and as a non-profit, I am having some moral issues with this (I try and run Linux and open source on everything I reasonably can)."
"So here is my dilemma: do I dump $20K into moving everything to an MS solution that in the short-medium term might make the geometry functions (i.e. show me all the hospitals within a 20km radius of this cluster of immigrants) or do I get him to spend the same amount writing something similar for MySQL? The only issue there, is that I am not too fond of Oracle having ownership of MySQL. Should I be directing $20K into replicating these functions into something like MariaDB? Might be a longshot but again, as a non-profit I'd like to see something go back to everyone, not just my group."
"Really, I am open to any flexible, creative open and reliable solutions. Sorry if my knowledge is limited or if I am grasping at straws, and if I am being terribly biased, but I trust Oracle with open source about as much as I trust MS."
MySQL might fare poorly against MSSQL's geometry support, but how does PostgreGIS stack up?
"So here is my dilemma: do I dump $20K into moving everything to an MS solution that in the short-medium term might make the geometry functions (i.e. show me all the hospitals within a 20km radius of this cluster of immigrants) or do I get him to spend the same amount writing something similar for MySQL? The only issue there, is that I am not too fond of Oracle having ownership of MySQL. Should I be directing $20K into replicating these functions into something like MariaDB? Might be a longshot but again, as a non-profit I'd like to see something go back to everyone, not just my group."
"Really, I am open to any flexible, creative open and reliable solutions. Sorry if my knowledge is limited or if I am grasping at straws, and if I am being terribly biased, but I trust Oracle with open source about as much as I trust MS."
MySQL might fare poorly against MSSQL's geometry support, but how does PostgreGIS stack up?
GIS and vector data of the world
http://downloads.cloudmade.com/
http://www.naturalearthdata.com/
It is far superior to MySQL geographic types. I'm partial to MSSQL, but PostGIS is a close second.
If you've got something generating Lat/Lon coordinates to slap into your database, the distance calculations are trivial:
http://en.wikipedia.org/wiki/Haversine_formula
I didn't see anything in a quick skim of your requirements (tl/dr) that looks like it requires a $20K package to meet. There are a number of open source mapping solutions out there, Google Earth is hard to beat for functionality and ease of programming (see: KML).
I would recommend checking out PostGIS, which works with PostgreSQL. PostGIS adds functions, data types, and projection definitions to PostgreSQL that turn PostgreSQL into a powerful spatially-enabled database server. I'ved used this combination for a number of projects, and they work great. Both are fully open sourced.
In my office we work with GIS Data, but a ESRI Desktop license is just too expensive.
We choose for desktop Manifold GIS and the beauty of that software is that you could use almos any DB Backend to store your geometries. Now we are using MS SQLServer as backend, but Manifold allow you to save everything on WKB or WKT on the database that means that you can use almost any database.
My suggestion is for the server use a combination of Mapserver+PostGIS.
For clients you could use Manifold GIS, it's not ArcMap but believe me, with a license of $900 you could do almost the same things that you could do with a ESRI Desktop license of $5000.
Have a look at this post from Sebastian Delmont on google plus. I found an excellent eye-opener to whats out there related to GIS tech that you can "roll on your own". If you are doing simply radial distance calculations than as mentioned the Haversine forumula is your friend. I added a radial search to a dealer locator for an online store in under a day with some python and a bit of time to geocode and cache all the address data via google.
jaymz
http://postgis.refractions.net/ - pretty good spatial functions based on top of PostgreSQL, and not tied to Oracle.
There are many, many open source GIS packages that you might find useful rather than implementing everything yourself. http://maptools.org/ is one place to get some pointers from.
Run, do not walk, to research the Open Source Geospatial Foundation's offerings. And be aware, neo-geos devote their efforts primarily to PostGIS, which builds on the very capable PostreSQL database, adding in geospatial capabilities that by many accounts rival the best that that Oracle Spatial and MS SQL provide.
As always, there are significant tradeoffs to evaluate in your situation. Be ready to study the many evaluations and comparisons of the various solutions.
"Ahh! I see you're in that indeterminate Schrodinger state where - oh, uh
At my place of employment we use PostgreSQL and PostGIS extensively for the exact or similar problems as you describe. We recently contributed back to a portion of the PostGIS project by extending the TSP solver for a different ending than the beginning. I'm not the one who is generally writing stuff like this, but I maintain the servers and I know how much performance can be gained. Plus, the PgSQL and PostGIS guys are very close with lots of code and advancements being contributed directly into PgSQL from the PostGIS team. We have also looked at the MS solutions and found them to be ridiculously expensive to host and scale services targeted at business with real-life budgets and not huge corporations. We have tools used in nearly all of the counties in Wisconsin processing many requests per day and second(not allowed to give numbers) with only a few servers. Personally, stay open source and stick with PostgreSQL. They have a track record for extremely stable systems that can be upgraded as advancements are made with very little downtime. You can tune the internal performance metrics to tweak everything you need with online research or many books and even consultants such as EnterpriseDB. Good Luck with your developers, go with PostgreSQL and you won't look back.
I can safely say that MySQL handles GIS data sufficiently for the type of scenario you've described on Slashdot. I've got a database I periodically play with that is all the cities / major towns of the world and can quickly query it with distance data. (as in: Give me everything within 100 miles of Lat,Long)
MySQL doesn't make all of it as easy as it should be, but with some careful design decisions it can work surprisingly well for it.
I previously had created a distance function that worked pretty good, but more recent versions of MySQL have better internal support and I was able to ditch some of my custom routines.
(Note: my routines performed efficiently, but gathered extraneous data that could later be filtered out to be more precise. The extra data was maybe 10% more than necessary, but meant the query was very fast.)
Perhaps you should be looking to upgrade your MySQL instance, or getting some procedures/functions written for MySQL to handle some of the calculations.
(It's pretty easy to calculate a rectangle for the query to run against, and then follow up with a more precise distance calculation to filter out a few stranglers that slip into the dataset. (better to include a few that are dropped in later stages than to exclude valid data up front)).
As for the moral issues: Stuff them. They aren't what you should be focusing on.
Disclaimer: I work for Esri.
In the FOSS GIS world the go to DBMS is PostgreSQL + the PostGIS extensions. Don't even consider MySQL in this regard. PostGIS owns this space and rightfully so.
SQL Server is a great database as well and their spatial types and functions are excellent. Depending on how much data you'll be storing and processing - and if you have Windows Server licenses already - you may be able to use SQL Server Express which is free as in beer for any use.
For someone in your position though there's really only one choice: go with PostgreSQL + PostGIS. It would be silly for someone in your position to pay for this functionality and the PostGIS community would welcome you with open arms.
Disclaimer: I work for Monty Program Ab, the primary developers behind MariaDB.
MariaDB 5.3.3 (currently in RC status) has improved GIS functionality versus MySQL. While sponsoring new features could improve it even further what we are releasing right now might be enough for you. Please give it a try, and talk to us about how we could make it fit your needs better.
And why didn't you provide an answer to the submitter, rather than troll?
Oh! Oh! I have an answer. Pick me!
It's because you're trolling just as hard!
You spent 5 of 6 lines attacking the person who wrote the post, 1 line talking about some other topic and zero arguing the merits of the post.
Yay! I solved the puzzle. I'll take the cupcake for $10 and the rest on a gift certificate Pat.
I personally agree with the OP, you use the best tool for the job. If you are designing your own tools just to save money, then you're under-valuing your time as a developer. If in this case MSSQL works, then use it, if in another case MySQL works, use it.
Am I the only one who finds it a bit sad that this is considered a hard problem to solve in house? It depends on how you have your database set up, but, you could store your latitude and longitude in different fields as degrees, minutes, and seconds (do you need resolution finer than about 30 meters?, then add more fields, need coarser resolution, do the same). Then, you start at your center point and draw an appropriately sized circle (more on that after) around that point using an algorithm that gives you all the second^2, minute^2, and degree^2 (and larger and or smaller blocks as precision requires) sized blocks that fall inside or touch the circle. Then you craft a select statement for all sites that match that set of blocks. Then, after you have that set, if you don't care too, too much about precision, you're done. If you do, you take the data set that's been returned and you look at all of the sites whose block intercepts the circumference and calculate their distance to the center and throw out the ones that are too far away. If you're using a flat earth (not flat as in pancake, flat as in a perfect ellipsoid) model, then you're done at this point. If you want to consider three dimensional distances on an earth with mountains and valleys, etc. (the "appropriately sized circle" mentioned above should already be taking care of the perfect ellipsoid model), so that a site 100 meters away horizontally but at the bottom of a 1.5 km cliff isn't considered to be 100 meters away, then you need to do more work and you need the altitude of each site in your database as well. Since you can pretty much rely on a fairly low maximum amount of overhang from cliffs and so forth, all you need to do is have an inner circle and an outer "appropriately sized circle" based on some precalculated constants regarding maximum changes in altitude over the whole earth calculated by coordinate block of some given size (computing or obtaining those constants is the trickiest part, put it's not hard, it just requires the appropriate GIS data and some number crunching) and stored in a table. When you compute your inner and outer circles, you just take the local terrain into account and draw the outer circle as far out as any sites in those blocks could possibly be from the center, and the inner circle as far in as they could possibly be. Then you work the blocks from the outside of the max circle to the inside of the min circle (not bothering to search blocks bounded on the outside by other blocks where you've already determined all the sites are inside your max area).
For calculating the "appropriately sized circle" in the first place, you make use of the Haversine Formula or an appropriately modified (for altitudes) version thereof and some safe margin around the edge. Voila. Now, I know I've fudged past some of the math here, especially for the more complicated cases, but this is still pretty simple stuff, especially for the simpler cases. This is CS Major Sophmore or Junior year stuff.
What if they actually save money by being more productive?
This may be too obvious to be helpful, but since the submitter mentioned that they use Drupal on top of their database, why not just use one of the Drupal Google Maps modules as a starting point?
Google Maps Tools provides the very proximity function he's looking for (among other things) and there are many other easy integrations with Google Maps he can use as well.
That said, if he really wants to do the calculations at the DB layer, then switch the installation to Postgres GIS as mentioned elsewhere.
The party of stupid and the party of evil get together and do something both stupid and evil, then call it bipartisan.
Canada is finally cracking down on fake immigrants. One of the biggest problems is the abuse of the immigrant investor program which this business apparently is trying to exploit, if you actually look at their web site:
In other words, "buy your way into the country by buying some dead persons' business."
Nice scam - too bad Mr. Martinez isn't licensed. The Canadian government has required licensing of anyone doing immigration consulting since 2004 because of the number of scams and abuses.
If you ignore them, and you admit you're ignorant when it comes to them, why are you commenting on them, and providing advice regarding them? Maybe you'd be better off providing advice regarding platforms you're knowledgeable about?
Karma: Poor (Mostly affected by lame karma-joke sigs)