Ask Slashdot: Open Source vs Proprietary GIS Solution?
New submitter rnmartinez writes "As the Project Manager for a non-profit looking to implement a tech project, I am running into a few dilemmas, and as a casual Slashdotter I could really use some help. I'll start with a brief explanation of the project. We research issues in Canadian Immigrants, and found that there was a lack of recent, unaggregated information. As we dug further, we found that some data was available, but there was no central repository. Therefore, we are building a web based service to collect this data, with the intent of having it display in Google Maps and then be downloadable as a CSV file that is readable in GIS software such as ESRI Arcsoft, so that data may be visualized."
The dilemma: "...It seems that MS SQL offers more functions with regards to geometry built in then MySQL, and my developers (good guys, but MS guys at heart) want me to switch to .net NUKE/MSSQL and ditch the open source stuff." Read on for further details.
"To date, we have relied on a LAMP box with Drupal as the frontend to help provide a more social experience. However, it seems that MS SQL offers more functions with regards to geometry built in then MySQL, and my developers (good guys, but MS guys at heart) want me to switch to .net NUKE/MSSQL and ditch the open source stuff. As a believer in open source and as a non-profit, I am having some moral issues with this (I try and run Linux and open source on everything I reasonably can)."
"So here is my dilemma: do I dump $20K into moving everything to an MS solution that in the short-medium term might make the geometry functions (i.e. show me all the hospitals within a 20km radius of this cluster of immigrants) or do I get him to spend the same amount writing something similar for MySQL? The only issue there, is that I am not too fond of Oracle having ownership of MySQL. Should I be directing $20K into replicating these functions into something like MariaDB? Might be a longshot but again, as a non-profit I'd like to see something go back to everyone, not just my group."
"Really, I am open to any flexible, creative open and reliable solutions. Sorry if my knowledge is limited or if I am grasping at straws, and if I am being terribly biased, but I trust Oracle with open source about as much as I trust MS."
MySQL might fare poorly against MSSQL's geometry support, but how does PostgreGIS stack up?
"So here is my dilemma: do I dump $20K into moving everything to an MS solution that in the short-medium term might make the geometry functions (i.e. show me all the hospitals within a 20km radius of this cluster of immigrants) or do I get him to spend the same amount writing something similar for MySQL? The only issue there, is that I am not too fond of Oracle having ownership of MySQL. Should I be directing $20K into replicating these functions into something like MariaDB? Might be a longshot but again, as a non-profit I'd like to see something go back to everyone, not just my group."
"Really, I am open to any flexible, creative open and reliable solutions. Sorry if my knowledge is limited or if I am grasping at straws, and if I am being terribly biased, but I trust Oracle with open source about as much as I trust MS."
MySQL might fare poorly against MSSQL's geometry support, but how does PostgreGIS stack up?
This is a stupid question. Of course you are going to buy product that is ready to do what you want. Writing custom code is always hundreds of times more costly. These commercial products can sell them cheaper because they sell thousands, even millions of them. But if you need someone to write that code for you (which you will be doing regardless of putting it open source), it will cost.
So take the product that is required to get the job done, right now. Don't waste the money on getting some expensive coders on other product. You would also have headaches with testing and bugs. All that is done for you with MSSQL. Take it.
GIS and vector data of the world
http://downloads.cloudmade.com/
http://www.naturalearthdata.com/
There are other solutions you may want to look into. Solr has decent geography support and is built to be a search system.
MS SQL is a better product, all around. Both as an engine, and the management UI. The integrated security features, integrated XML support, and of course the GIS functions.
But it costs money.
Cost benefit, dude. That's all there is to it.
It is far superior to MySQL geographic types. I'm partial to MSSQL, but PostGIS is a close second.
If you've got something generating Lat/Lon coordinates to slap into your database, the distance calculations are trivial:
http://en.wikipedia.org/wiki/Haversine_formula
I didn't see anything in a quick skim of your requirements (tl/dr) that looks like it requires a $20K package to meet. There are a number of open source mapping solutions out there, Google Earth is hard to beat for functionality and ease of programming (see: KML).
How could it possibly cost 20k to do what you're talking about? I did the same thing last summer for a guy for about $200. And why would it cost $20k to move to MSQL. I suspect either there is a lot not being talked about in the summary for the prices are missing a decimal point.
Drop MySQL in favor of postgres.
I would recommend checking out PostGIS, which works with PostgreSQL. PostGIS adds functions, data types, and projection definitions to PostgreSQL that turn PostgreSQL into a powerful spatially-enabled database server. I'ved used this combination for a number of projects, and they work great. Both are fully open sourced.
In my office we work with GIS Data, but a ESRI Desktop license is just too expensive.
We choose for desktop Manifold GIS and the beauty of that software is that you could use almos any DB Backend to store your geometries. Now we are using MS SQLServer as backend, but Manifold allow you to save everything on WKB or WKT on the database that means that you can use almost any database.
My suggestion is for the server use a combination of Mapserver+PostGIS.
For clients you could use Manifold GIS, it's not ArcMap but believe me, with a license of $900 you could do almost the same things that you could do with a ESRI Desktop license of $5000.
This is what you end up with proprietary solutions. The proprietary vendor has to make money. They know that you will eventually need stuff, and they want to make you pay them.
Go for the open source one. Find a way to be free of vendors as much as possible. go find a proprietary solution that sits in front of a mysql db. so that in future, you may just ditch the proprietary solution and jump to another one, with the same db behind it. or, you can use that setup, while in the meantime developing your custom front/application slowly, and at one point be vendor-free.
Read radical news here
Have a look at this post from Sebastian Delmont on google plus. I found an excellent eye-opener to whats out there related to GIS tech that you can "roll on your own". If you are doing simply radial distance calculations than as mentioned the Haversine forumula is your friend. I added a radial search to a dealer locator for an online store in under a day with some python and a bit of time to geocode and cache all the address data via google.
jaymz
SQL Server makes GIS a breeze, see here http://blogs.lessthandot.com/index.php/DataMgmt/DataDesign/sql-server-2008-proximity-search-with-th
what about PostgreSQL...can you use that?
I think that the spatial geography data types that are native to MS SQL seem like they would be pretty key to what you're doing. Add to that DotNetNuke, which is free and open source, and that appears to be a pretty strong solution. DNN is also extremely extensible, very robust and it think would play well with MS SQL and your project.
http://postgis.refractions.net/ - pretty good spatial functions based on top of PostgreSQL, and not tied to Oracle.
There are many, many open source GIS packages that you might find useful rather than implementing everything yourself. http://maptools.org/ is one place to get some pointers from.
Run, do not walk, to research the Open Source Geospatial Foundation's offerings. And be aware, neo-geos devote their efforts primarily to PostGIS, which builds on the very capable PostreSQL database, adding in geospatial capabilities that by many accounts rival the best that that Oracle Spatial and MS SQL provide.
As always, there are significant tradeoffs to evaluate in your situation. Be ready to study the many evaluations and comparisons of the various solutions.
"Ahh! I see you're in that indeterminate Schrodinger state where - oh, uh
To make it easy, you store the lat/long of each item, and it is relatively easy to search quickly for a all of the matches within a rectangle, and then within that group you can use a little bit of geometry to narrow it down to an exact circle (if you wish). This might cost something like one man-day to implement. Are you sure that your developers are reasonably competent?
PS. I work in a mixed Linux/Mac/Windows environment, and we've found Windows based computers to be relatively expensive (in terms of our time) to maintain and administer.
Either Postgresql or Mysql (w/ geo extensions) will work on the back end. Geoserver makes a good, if somewhat heavyweight, middleware solution—it is fast though. It'll serve out selected areas in a variety of formats, including JSON. Arcsoft can use some of the other formats. Google Maps or OpenLayers will work well on the front end.
At my place of employment we use PostgreSQL and PostGIS extensively for the exact or similar problems as you describe. We recently contributed back to a portion of the PostGIS project by extending the TSP solver for a different ending than the beginning. I'm not the one who is generally writing stuff like this, but I maintain the servers and I know how much performance can be gained. Plus, the PgSQL and PostGIS guys are very close with lots of code and advancements being contributed directly into PgSQL from the PostGIS team. We have also looked at the MS solutions and found them to be ridiculously expensive to host and scale services targeted at business with real-life budgets and not huge corporations. We have tools used in nearly all of the counties in Wisconsin processing many requests per day and second(not allowed to give numbers) with only a few servers. Personally, stay open source and stick with PostgreSQL. They have a track record for extremely stable systems that can be upgraded as advancements are made with very little downtime. You can tune the internal performance metrics to tweak everything you need with online research or many books and even consultants such as EnterpriseDB. Good Luck with your developers, go with PostgreSQL and you won't look back.
Are you telling us that you are putting your own personal bias for open source ahead of choosing the best solution for the problem? You ought to be fired just as much as the average "Nobody ever got fired for buying M$" drone. You're not being paid to be an open source evangelist; you're being paid to do your job!
Write down the pros and cons for each solution, pick the best one from an engineering perspective, and, if some closed source solution unfortunately has the advantage, grit your teeth and implement it.
Don't lock yourself in, use a FOSS solution. If your developers don't like it, fire them. You are in control, don't let them strong-arm YOU.
These are obvious choices without better alternative even in the commercial world. All ESRI products, or other products included. ESRI doesn't advertise it, but they are also using GDAL and other open GIS standards.
https://docs.djangoproject.com/en/dev/ref/contrib/gis/tutorial/
I have 10 years of Web, and 5 of GIS experience, and doing consulting now. Let me know if you're interested, and we can have a private conversation.
I can safely say that MySQL handles GIS data sufficiently for the type of scenario you've described on Slashdot. I've got a database I periodically play with that is all the cities / major towns of the world and can quickly query it with distance data. (as in: Give me everything within 100 miles of Lat,Long)
MySQL doesn't make all of it as easy as it should be, but with some careful design decisions it can work surprisingly well for it.
I previously had created a distance function that worked pretty good, but more recent versions of MySQL have better internal support and I was able to ditch some of my custom routines.
(Note: my routines performed efficiently, but gathered extraneous data that could later be filtered out to be more precise. The extra data was maybe 10% more than necessary, but meant the query was very fast.)
Perhaps you should be looking to upgrade your MySQL instance, or getting some procedures/functions written for MySQL to handle some of the calculations.
(It's pretty easy to calculate a rectangle for the query to run against, and then follow up with a more precise distance calculation to filter out a few stranglers that slip into the dataset. (better to include a few that are dropped in later stages than to exclude valid data up front)).
As for the moral issues: Stuff them. They aren't what you should be focusing on.
*than. For the love of God.
Disclaimer: I work for Esri.
In the FOSS GIS world the go to DBMS is PostgreSQL + the PostGIS extensions. Don't even consider MySQL in this regard. PostGIS owns this space and rightfully so.
SQL Server is a great database as well and their spatial types and functions are excellent. Depending on how much data you'll be storing and processing - and if you have Windows Server licenses already - you may be able to use SQL Server Express which is free as in beer for any use.
For someone in your position though there's really only one choice: go with PostgreSQL + PostGIS. It would be silly for someone in your position to pay for this functionality and the PostGIS community would welcome you with open arms.
This is an excellent question. Do you need to perform arbitrarily complex operations on geometries, or are you looking for very simple geospatial operations? If, for example, you need to find all points within a complex polygon, or calculate overlaps then it's likely to be useful either to buy into a real GIS system, or investigate something like the GDAL OGR library. If, however, all you want to do is the sort of thing you describe, you can probably get away with calculating a bounding rectangle from the set of points you have and extending that in each direction by an arbitrary amount. Lo - store grid references or Lat/Long, and you need no geospatial support in your database at all. If you *did* need geospatial support, and were persuaded that MS-SQL were the way to go, then check if you qualify for WebSiteSpark in Canada - that will get rid of most of the 20k you need to spend and get you Server 2008, SQL Sever 2008, Visual Studio 2010 &c for $100 over 3 years. How much data do you have? SQL Server 2008 Express has the spatial extensions.
How often does the data change? I'm not talking about adding new records but changes to existing ones. If the answer is rarely then you should be putting your data in a SOLR search engine, not in a database of any sort. But if a database is called for, you really need to use PostgreSQL, either on its own or as part of a GIS system using POSTGIS. MySQL is just the wrong tool for this job.
Look at tools like TileMill that make mapping really easy.
If Oracle ownership of MySQL bothers you, then download MariaDB.
You don't have to use it right away if you don't want to, but it is a drop-in replacement for MySQL that you can substitute at any time.
CODE RED!
CODE RED!
DEPLOY MARIA!
GO!
GO!
GO!
Please tell me you are not actually thinking that having geometric functions in SQL is worth $20,000. If so, I have a really nice hammer I can sell you, only $800.
Postgres + PostGIS is the way to go for you. Has any function you will find in Oracle Spatial or MSSQL. Just have a quick look at the documentation and you'll see for yourself.
Disclaimer: I work for Monty Program Ab, the primary developers behind MariaDB.
MariaDB 5.3.3 (currently in RC status) has improved GIS functionality versus MySQL. While sponsoring new features could improve it even further what we are releasing right now might be enough for you. Please give it a try, and talk to us about how we could make it fit your needs better.
If it gives you a warm fuzzy to use open source, budget what it will take to retrain or replace your programmers and build from scratch what you could get by paying the license fees. Factor in the additional risk and schedule impact of build versus buy. Then when you're done, contribute everything you paid your team to develop back as Open Source code so anyone else who wants to do the same thing can use it.
I know this post will get swamped, but slashdot isn't really the best place to get advice about that. You should ask your question over at http://gis.stackexchange.com
OSGeo, Worldwind, QGis and GeoServer. Specifically GeoServer which would allow you to server data from which ever database you decide directly to the users. If you aren't already aware, you should investigate Web Map Servers (WMS) as they would be a better delivery mechanism than CSV.
You say that your developers are MS guys at heart? Then go with the MS stack to develop this. Your developers are your biggest expense, and Microsoft licensing for nonprofits is completely reasonable. As much as you might want to do everything open source, sometimes it's more important to get the job done in a timely fashion.
If they really are "MS guys at heart", then they're not very well qualified. This may be due to intrinsic low intelligence, but presuming you've ruled that out, it's likely due to lack of significant experience with professional-quality software. (I consider MS software fine for children and amateurs, but utterly unsuitable for anyone who even pretends to be a professional.)
This exercise would thus make an excellent teaching moment for them: they should be tasked with investigating the many fine pieces of open-source software (and data) linked to in this thread, and educating themselves to the level necessary for them to understand not only how the solution to your current problem has very likely already been coded by someone else, but why it's very likely been coded by someone else. In other words, I'm suggesting that they come to an understanding of why common/ordinary/routing tasks should not be relegated to expensive, proprietary software, but are included in free, open-source software during the normal course of development.
Meanwhile, you should consider this question: is a relational database the best way for you to attack this problem? Or should you, instead, consider alternate ways of storing/searching the data? (I make no recommendation either way; without MUCH more detail on your precise requirements, I can't.)
you might want to contact your local MS sales rep. I believe, if you're a charity, you can get MS SQL Standard edition for around $3K.
And why didn't you provide an answer to the submitter, rather than troll?
Oh! Oh! I have an answer. Pick me!
It's because you're trolling just as hard!
You spent 5 of 6 lines attacking the person who wrote the post, 1 line talking about some other topic and zero arguing the merits of the post.
Yay! I solved the puzzle. I'll take the cupcake for $10 and the rest on a gift certificate Pat.
I personally agree with the OP, you use the best tool for the job. If you are designing your own tools just to save money, then you're under-valuing your time as a developer. If in this case MSSQL works, then use it, if in another case MySQL works, use it.
First of all, it's not your developers who should do this decision. Yes, you should ask for their opinion but if they're the "we know MSSQL and we're not willing to see other options therefore everything else is worse" type then they'll choose MSSQL no matter what. So do a cost-benefit analysis, as already recommended by wasabii.
Before giving any specific recommendations, I have to mention that I'm a PostgreSQL-guy. I do like how mature and reliable the database is, how the community works, how fast the occasional bugs are fixed, etc. Therefore I'd recommend you to give PostGIS a try - AFAIK the spatial functionality is infinitely better than in MySQL, quite comparable to MSSQL. There is a nice comparison matrix of features at http://www.bostongis.com/PrinterFriendly.aspx?content_name=sqlserver2008_postgis_mysql_compare but it's very old - it may give you an idea how the products compare. You can probably get more details on the PostGIS mailing list.
You should really decide whether you want to go with an open-source solution or not, how important the benefits are for you etc.
Things to consider:
1. How many "guys" (people) are you talking about when you say they are "MS guys at heart" == (amount)
2. How much will they have to learn to be as proficient at option X versus what they are good already (MSSQL) == (time)
3. How much will they be annoyed that option X "just doesn't do this the same way" == (more time wasted chatting about how dumb there boss is for making them change)
then run that through the ol' PERT analysis (shortest time + 4xlikely time + longest time) / 6, and see if that comes up to be more than your 20k initial expense..
so lets say 3 "guys"? and the project seems rather large and its doing something new, but lets guess 6 months (960hrs.. 40hrsx4x6),
lets say they on the conservative they waste 8 hrs a week learning/complaining/whatever, 10 likely and 12 if they are really mad at you for making them learn some new stuff.. ( pert says 10 after that )
so we'll say they'll blow 10 hrs a week learning the new SQL db, 240 total hrs a piece, and we'll guess there pay at $20 being non-profit and all, but probably paid more I would hope.. $4800 per person, times 3 is $14,400. so its getting in the warm zone of $20k.. would be interesting to see real numbers..
A feature comparison between Postgis and MSSQL server
http://bostongis.com/?content_name=sqlserver2008r2_oracle11gr2_postgis15_compare#221
I've always used postgresql. Cross-platform, well documented, well supported, solid performance. What's not to like?
Also ties in quite well with other open source geospatial projects like qgis, http://qgis.org , and geoserver, http://geoserver.org.
There is also a drupal module geospatial module that uses postgis, http://www.geops.de/blog/64-spatial-data-and-drupal-7
If you are looking cloud, it costs 41% less to run your linux instance on Amazon than a windows instance.
Open Source stuff tends to, quite frankly, suck ass. Whether you are talking about Linux, Firefox, Apache, etc etc etc, there doesn't exist a single open source code project that does anything better than its closed source competition, this is why I generally recommend avoiding open source whenever possible.
http://home.techsoup.org/ --> Great site for MS software cheap for non-profits. Instead of debating which software solution is best. If you have good guys giving their time, and they want a MS solution, then give it to 'em cheap!
As someone has already pointed out, your developer is far more expensive than the software. So the more you can get done without custom code, the cheaper the solution.
Also, MS does give significant cost breaks for non-profits.
Finally. there is nothing to prevent you from opening up the source of the part your actually produce.
I've been doing geospatial development since 1992 when Genasys (defunct) released the worlds first web mapping server, Web Broker (ESRI's IMS wasn't released until 5 years later). At the time the only COTS option for managing spatial data within an RDBMS was Sybase's Spatial Query Server, which provided slightly more geometry options inherent within MySQL today. MySQL geometry features should be considered primitive at best, far from a full-featured implementation. On the other hand, PostGIS, is full featured and the only open source implementation I know of that adheres to the Open Geospatial Consortium's 'Simple Features: Types and Functions 1.1' specification. It is very robust, scales well (in contrast to MS SQL) and you can't beat the price or ease-of-mind that comes with being licensed under v.2 of the GPL. Additionally, on comparable hardware, in terms of reliability, performance, and cost effectiveness, there is no comparison, regardless whether your using Oracle w/Spatial or MS SQL and ESRI's SDE.
Your developers, as good of guys at heart as they may be, should stay away from the kool aid for awhile
Firstly, you can post questions here: http://gis.stackexchange.com/ and they will be answered. It seems to be a pretty good community, I have been posting on there for a little while.
Secondly, I don't use databases a lot, but I recommend that you do what you suggest. Make a Google Map and make a CSV for folks to download.
like phosphorescent desert buttons singing one familiar song
I'd throw in a good work for PostgreSQL as well. But an additional question: is this the last question you'll be asked of this geographic data? Over the long run, if the questions keep coming, you'll want a stable base from which to work. "Stable" means "popular" to some extent. Can you keep finding developers to work on your problems with the base of data and software that you have to work with, or does it depend on a few people knowing all the not-so-popular products you're working with?
I'm a big open source, fan, but working in a big bureaucracy, I've reluctantly ceded the wisdom of the Microsoft/ESRI "shop" as offering the best chance of quickly replacing people who move on.
Someone else has probably already posted this, but ESRI is a MS-only solution too. They ditched Unix over a decade ago with ArcGIS 8.
I'm interested to learn of other, more portable, GIS Solutions as well. I've had the opportunity to play with LuciadMap in the past, but that runs in Java (a language I hate coding in). It is more portable than ESRI and the world has no real shortage of Java Developers. They're a proprietary solution based on the commonly accepted GIS standards. Their license isn't cheap, but still cheaper than ESRI from my understanding. They also have an Android version that was in the works last time I heard from them.
Am I the only one who finds it a bit sad that this is considered a hard problem to solve in house? It depends on how you have your database set up, but, you could store your latitude and longitude in different fields as degrees, minutes, and seconds (do you need resolution finer than about 30 meters?, then add more fields, need coarser resolution, do the same). Then, you start at your center point and draw an appropriately sized circle (more on that after) around that point using an algorithm that gives you all the second^2, minute^2, and degree^2 (and larger and or smaller blocks as precision requires) sized blocks that fall inside or touch the circle. Then you craft a select statement for all sites that match that set of blocks. Then, after you have that set, if you don't care too, too much about precision, you're done. If you do, you take the data set that's been returned and you look at all of the sites whose block intercepts the circumference and calculate their distance to the center and throw out the ones that are too far away. If you're using a flat earth (not flat as in pancake, flat as in a perfect ellipsoid) model, then you're done at this point. If you want to consider three dimensional distances on an earth with mountains and valleys, etc. (the "appropriately sized circle" mentioned above should already be taking care of the perfect ellipsoid model), so that a site 100 meters away horizontally but at the bottom of a 1.5 km cliff isn't considered to be 100 meters away, then you need to do more work and you need the altitude of each site in your database as well. Since you can pretty much rely on a fairly low maximum amount of overhang from cliffs and so forth, all you need to do is have an inner circle and an outer "appropriately sized circle" based on some precalculated constants regarding maximum changes in altitude over the whole earth calculated by coordinate block of some given size (computing or obtaining those constants is the trickiest part, put it's not hard, it just requires the appropriate GIS data and some number crunching) and stored in a table. When you compute your inner and outer circles, you just take the local terrain into account and draw the outer circle as far out as any sites in those blocks could possibly be from the center, and the inner circle as far in as they could possibly be. Then you work the blocks from the outside of the max circle to the inside of the min circle (not bothering to search blocks bounded on the outside by other blocks where you've already determined all the sites are inside your max area).
For calculating the "appropriately sized circle" in the first place, you make use of the Haversine Formula or an appropriately modified (for altitudes) version thereof and some safe margin around the edge. Voila. Now, I know I've fudged past some of the math here, especially for the more complicated cases, but this is still pretty simple stuff, especially for the simpler cases. This is CS Major Sophmore or Junior year stuff.
I'm afraid I'd have to put my eggs in the longevity basket. Databases come and go. Open source seems reasonable to me for that reason alone. Another thing, if your 'guys' are MS guys it still might be security in the bag to use something java related. I said MAYBE stone throwers be still. .NET was always annoying and now its interest is waining. Would it hurt to be able to platform hop? Nope, because even managing parts its been nice switch machines and locations. Furthermore doesn't PHP and something like MYSQL click?
I'm not into GIS but I once attended a presentation about this system.
Maybe it can help, meybe not.
http://www.gvsig.org/
From the original poster:
However, it seems that MS SQL offers more functions with regards to geometry built in then MySQL, and my developers (good guys, but MS guys at heart) want me to switch to .net NUKE/MSSQL and ditch the open source stuff.
OK, what makes you think that the F/OSS stuff will all work together and integrate fine?
F/OSS isn't necessarily the cheapest way to go. You mentioned the licensing fees that MS will charge but did you mention the additional developer costs with F/OSS?
You see, the F/OSS stuff never plays nice with one another. Sure, you got source code - so you need extra developers to go and plow through the source code and figure out how it works.Then, I guarantee there will be a dependency problem somewhere. There's always dependency problems and incompatibilities with F/OSS installations. I take that back, almost - LAMP has had its ass kicked so much, that's pretty much been ironed out. BUT we're back to the geometry and working that in. Then, there's the custom coding that will be required with a F/OSS installation; as the poster said there would.
So, I guess the numbers to crunch are:
1. MS and licensing fees.
2. F/OSS and extra help.
Sorry, as much of a F/OSS fan that I am, I can't honestly say that it will be cheaper: short or long run.
If you can stomach their dog-eat-dog world of AI agents long enough to discover that their data structures are based around the same ESRI core software and that the PCA routine VAR2VEC lies at the heart of their analysis, then you'll conclude that putting your loadings in XML makes a great interface.
Argumentum ad Probabilitum
... to me is the cost of obtaining all the data in a usable form. The logic itself would be easy.
It could be done very easily in a no-SQL solution. Doing it through SQL because some SQLover is making decisions would be only slightly harder. The SQL lookup would be delivering 4/3.14159 times as much data as actually needed for a circular result, and then that would be filter by whatever front-end or back-end code to cleanly clip off the corners, of the SQL implementation didn't have the math to do it (and I might not trust SQL to do that kind of stuff as efficiently as it could).
Let my logic organize the data into grid squares in files and I could make this work very fast. But I need the data (or at least uniformly spread random sample of a few billion items) to test it.
now we need to go OSS in diesel cars
I have had a fair amount of success using open source GIS tools for one of my side project to create a hunting map book. I am not too sure of how well either one would work for what you are trying to do but I would be surprised if they didn't. The 2 tools I have had the most success with are uDig GIS and GRASS GIS. Both of them will run on Window, Mac OS, or Linux just fine, they all can process shape files, and DB input as well as geo tiff files as well as other file formats. Another popular OS GIS program is Quantum GIS. When I started I found OSGeo4w which had a bunch of open source GIS programs compiled for windows with everything you need and tried out a few of them and found 2 that best suited my needs. At the time I know it included GRASS, uDig, QGIS, and a couple of other ones, you might want to check out OSGeo as well as there may have been other projects that have started or better meet your needs.
Here is an example of some of what I have done. It is a map of the MN deer areas showing the antlerless deer harvest per square mile for each area in relation to all the others during the archery season. This map is a 10 year snapshot where the darker area indicated more deer were taken per square mile.
Time to offend someone
How effing accurate does such a thing even need to be in a social networking context? Still, it wouldn't be hard to get this down to sub-meter accuracy. It could be hired out if need be on one of those programming task sites for under $1000 (far less that the $20000 initial outlay to put up a decent couple of machines to do it in an MS-SQL way).
I suspect there's a band of "we are scared of maths" programmers somewhere near the OP.
now we need to go OSS in diesel cars
...with regards to geometry built in then MySQL...
Then MySQL what? The OP should finish their sentences if they actually want help. I have a feeling given the intentional bad grammar and illogical attacks on open source that Slashdot is the victim of yet another troll.
One of the comments above proposed a no SQL solution. See the introduction to no SQL in Wikipedia ----
http://en.wikipedia.org/wiki/NoSQL
There are good, and tried, Open Software tools to implement no SQL solutions.
Canada is finally cracking down on fake immigrants. One of the biggest problems is the abuse of the immigrant investor program which this business apparently is trying to exploit, if you actually look at their web site:
In other words, "buy your way into the country by buying some dead persons' business."
Nice scam - too bad Mr. Martinez isn't licensed. The Canadian government has required licensing of anyone doing immigration consulting since 2004 because of the number of scams and abuses.
You have identified the software that fits your needs.
The people that will be doing the job have advised you what tools they prefer
Dogbert: I'm going back to my old job as a network systems administrator.
Dilbert: Why?
Dogbert: I'm attracted by the potential for reckless abuse of power.
If you plan on creating a website using your data, perhaps exposing a public, restful api, geodjango is almost without compare. The geodjango orm makes complex queries, coordinate transformation and calculations straightforward and transparent. Check out http://www.everyblock.com/ the flagship geodjango project. GeoDjango works with most of the GIS databases, but is best coupled with PostGIS.
As a phd student working with spatial data, I cannot express how invaluable geodjango has been for me.
The only issue there, is that I am not too fond of Oracle having ownership of MySQL. Should I be directing $20K into replicating these functions into something like MariaDB?
If this is how you make your technical decisions, no amount of advice will help.
Advice: on VPS providers
I've run ESRI products on Linux and Windows platforms, and even though I'm a linux admin first, I would have to recommend Windows/MSSQL if you do choose to go with their products. Although they technically support Linux platforms, I found that their support was abysmal, and when we moved to ArcGIS on Windows, suddenly things worked much better, and their support actually had answers when we needed it.
Drupal handles queries against location by proximity with the Location module and Views, or I don't see what the issue is.
It's probably not the right time to change everything in your shop, but you could learn a lot from Geocouch and CouchDB. The technology for mobile and remote collection is very powerful using these tools, and it's worth beginning to learn how the NoSQL databases work in any case. MongoDB also has a GIS facet. For further discussions on Public Data please see Max Ogden's work: http://govinthelab.com/diy-open-data-how-to-start-a-public-data-catalog-in-your-city/
I worked for a large location-based mobile / web startup. Pretty much every web request dealt with a lat / lng. We used geokit ( http://geokit.rubyforge.org/ ) and it worked great.
20k to move to a completely different platform over one trivial problem is just wrong.
Well, I've done geospatial development using ESRI, PostGIS and MongoDB. I've evaluated Oracle Spatial, but do not have direct experience running it.
My experience is that ESRI is quite difficult to configure and run, and even when it is running, some operations are just plain slow. If you need to render your own map layers it might be useful, but if all you're looking for is spatial queries, it's way too expensive/slow/difficult to justify.
PostGIS coupled with Google Maps for map data and OpenLayers for the web front-end is what we went to after finally burning out on ESRI. That works great for our needs, but if you don't need a lot of customization, just the plain out of the box Google Maps widgets work fine too. PostGIS is easy to work with, mature and (compared to ESRI on the same hardware) very fast. And there's definitely something to be said for being able to mix spatial queries with non-spatial queries involving joins, which ESRI can't handle. This is definitely a recommended configuration. And most Linux environments have pre-build binaries for PostGIS available, so it's easy to install.
MongoDB has a GIS implementation that's pretty performant, but quite limited in features and is still pretty young. They can't use indexes to find polygons that contain a given point, but they can find points within a polygon, stuff like that. That said, they targeted the "easy" cases first for geospatial support, and their definition of easy looks a lot like what you need to do, so it might do everything you need. In general it's a transition and a different way of thinking using MongoDB instead of a more traditional RDBMS, but if I were starting a new project I'd probably go with it. I wouldn't go with it purely for the spatial support though.
Oracle Spatial looks very nice from all we've been able to read about it, just stupidly expensive, and I couldn't find anything it's supposed to do better than PostGIS anyway.
Get access to Python and download the shapely package. It rocks.
MySQL does have spatial feature support. The underlying data structure can do point-in-rectangle tests cheaply. More complex geometry is supported, but the query engine will generally construct a bounding rectangle for index lookup purposes, then sequentially test the hits against the more complex geometry.
I've tried this, and it does work. It's good enough for efficient "all gas stations within N miles of here" queries, for example.
For what it's worth, OpenStreetMap uses PostgreSQL successfully..
see stats here http://wiki.openstreetmap.org/wiki/Stats
Could that be because it's ILLEGAL to identify individual immigrants as it's an invasion of their right to privacy?
FFS, sprout a brain! The government HAS the detailed data, but you are NOT allowed to access it directly.
As to GIS, check out PostgreSQL. I hear their GIS support is pretty powerful, but I've never coded for it. Certainly I'd recommend PostgreSQL over MySQL any day of the week.
I do not fail; I succeed at finding out what does not work.
I responded to your early post before it went "live" here.
Keep in mind that the following answer only applies if you're not doing a full-blown GIS application, but only calculations like "find all the X that are there within Y miles of location Z," and "How far is it from point X to point Y?"
And if that is the case, there are software libraries (I know of some in Ruby, I know there are others) that, in conjunction with Google Maps (or Yahoo Maps and even a couple of other services), mean YOU DO NOT NEED COMPLEX DATABASE INTERACTION. So unless you're doing a complex GIS operation of some kind, database performance per se is almost completely irrelevant.
All you need to do is store two floating-point numbers: latitude and longitude. And you get these numbers in the first place (the first time you do a lookup) from Google Maps itself... even that does not need sophisticated math or database operations. You just get the numbers from Google and store them.
I did an application like this for another site not long ago, and I could do another one (stand-alone) in under 2 hours, using MySQL, with performance that is perfectly acceptable.
But if you are also truly concerned about Oracle having control of MySQL (and I would not blame you if you did), you can use MariaDB instead, which is a drop-in replacement for MySQL, written by some of the same developers who originally wrote the MySQL software, apparently because of the same concerns you have. For most purposes it is 100% compatible with MySQL, except for some performance enhancements.
.NET NUKE if you can help it. We have two projects in our shop utilizing this CMS as a framework and it's a bear trap. Also, we've accomplished a similar goal to yours using MySQL, it all boils down to math (and it can be applied to a query.)
Your developer is a Windows drone, it seems. If he only reads Windows magazines, Windows articles, have Windows-only mindset. It's like a religion, and he will try to justify everything as perfectly right because it works on Windows.
Your developer is either too slow to learn, or is afraid of learning an trying open source stuff.
Try out, and read about the tools they have for:
http://www.openstreetmap.org/
Look at their map data source that can be downloaded from multiple places. Yes there are "open source datasets" for the whole globe, or by continent, by country, by state... They also have links to numerous open source tools that work with their data. Free, most are also open source.
Watch out for using Google map data. I think they have restrictions on map data usage. Their data is proprietary, read the Google map API license again carefully!!!
Your major problem, too, is to have your servers running Windows. That in itself is a huge problem in the long run. If you run some flavor of Linux on your Servers, your developers will have to know enough of Open Source ways to do things.
Do you know that .NET stuff are being dropped by MS in favor of newer MS tools, right? Then why even bother with .NET?
Take a look at PostgreSQL with PostgreGIS as mentioned... Not only will this integrate nicely with Drupal and replace MySQL with minimal fuss, but it also avoids the issue of being associated with Oracle.
I believe there are also existing drupal modules for working with map data and specifically google maps.
Incidentally, if you're worried about MySQL because its owned by Oracle, then surely moving to MS would be even worse? At least MySQL can be forked under the terms of the GPL, MS products cannot be so you are totally beholden to a company every bit as ruthless as Oracle, and if you go to MSSQL/.NET then you will be beholden to them for your entire stack not just the database.
http://spamdecoy.net - free throwaway anonymous email - avoid spam!
fwiw, we use a combination of postgresql, postgis, geoserver, a few in-house developed servlets and apache or nginx on server side for our (rather local-oriented) geo visualization solutions (http://www.gaeaplus.si/ ; non-english, sorry). provides: standards compliance and ability to connect almost any client to it, decent performance, rich geometry support, simple administration, replication and load balancing. I never looked back or considered anything else since.
Your choosing to use Google maps? Why?
If you plan to open source your data why not use Open Street Map http://www.openstreetmap.org/
Someone references at the top the tools of cloudmade which is a direct spin-out of OSM
Open source mapping is getting better and better all the time people like microsoft bing and mapquest use OSM now.
Go have a look and at least compare google to the tools available with OSM
I remember one quote about mapping tools on OSM a couple of years ago
OSM mapping tools are either ridiculously easy or stupidly hard depending on whether you come from the GIS mapping community or are member of the public respectively. OSM mapping tools have got a lot better since then
Oh yay, another "do my homework for me" question. Why you'd even consider the options mentioned as if they were the only ones in town is more than a little beyond me. Really, lmfgify material. Wikipedia has lists of alternatives. Go check.
And if that's too hard, at the very least check out postgis. Sheesh.
I used to do work similar to this a few years ago and we used all open source software.
we used PostGIS for the database, and geodjango for the web page. geodjango had many other open source GIS projects required/capable of supporting(GDAL,proj4,pykml, etc).
You should be fired for putting your own personal wants over that of the needs of the project.
You said it yourself "MS SQL offers more functions with regards to geometry built in then MySQL, and my developers (good guys, but MS guys at heart) want me to switch to .net NUKE/MSSQL". Oh, and that quote also shows your anti-Microsoft bias, a bias which seems to extend to Oracle.
There is no "-1 offended" or "-1 you don't agree with me" mod options for a reason.
Who owns your data? Id there a lifecycle to consider? Will the data/information/results be used for collaboration, web-services ...?
Anyone can buy software and wait for patches. Anyone can own software and prioritize patches. Anyone can buy L/FOSS support for a stable product from a foundation/company.
These days the Q&A (IMO): Are you a responsible owner? What are the valuable assets for your business; virtual-material (information, data, content ...) or hardware/software... products?
Staying market competitive requires agility and responsiveness, IMO, L/FOSS provides agility and responsiveness in very stable and atypical markets.
Never choose the brand and never buy the marketing hype, look at what you need to do (resources to requirements) or solutions/success and then proceed.
Yes; L/FOSS for most, but not all, is the best path/process [Unless you like to point the finger of blame at a brand-name].
Unaccountable leaders are masters, and unrepresented people are slaves. How do US and EU fare?
I only read this part:
do I dump $20K into moving everything to an MS solution
The answer is no. The answer is always no.
Dump all the relational Database stuff and index your data textually in Apache Solr or look at one of the big data solutions.Both have considerable support for R-tree spacial searches and have tons of open source alternatives. Your problem seems better suited to this type of tool anyway. Relational databases offer data integrity in the face of a high volume of arbitrary inserts. Good for things like inventory systems and order entry. If you can batch the data in at scheduled intervals you'll have faster searches and you'll be able to scale your environment across cheaper commodity hardware.
Is it *really* impossible to answer the question "i.e. show me all the hospitals within a 20km radius of this cluster of immigrants" in MySQL?
This is an exercise in clarify requirements first, comparing them to the platforms *specific capabilities*. The requirements aren't things you can get from an "Ask Slashdot" article; you need to figure out what your use-cases actually demand. For example does the question posed above mean
(1) Finding all the hospitals within 20km of some point (e.g. centroid) used to represent the cluster?
(1b) Finding all hospitals within 20km of any point in the cluster.
(2) Finding all the hospitals that fall within a 20km buffer built around a polygon (e.g. the convex hull of points) representing the cluster of immigrants as a whole.
(3) Any of the above with *true geometry* as opposed to Cartesian approximations on some tangent plane.
Items 1 and 1b are trivial to do even without spatial extensions, although performance won't be what you'd wish. Any database that implements a reasonable subset of the OpenGIS standard should handle these neatly.
Item 2: last time I looked MySQL doesn't implement "Buffer", although this could be handled by a Web Feature Server (WFS).
Item 3: Most spatial extensions to database systems won't do this (possibly Oracle does); and I haven't checked, but it's likely most open source GIS libraries don't do this. The good news is that your app surely don't need to do this. An example of an application that would might be enforcing a no-fly zone over a large country. If you are contemplating shooting down aircraft, you need to have *very* precise answers on a geographic scale too big to approximate with Cartesian coordinates.
My guess is that you want either 1b or 2. If you want to know whether any of the immigrants in the cluster had access to a hospital, you'd use 1b. If you wanted to know whether the set of immigrants on the whole had access to a hospital, you'd choose 2.
Here's a technical issue to consider. Is the database tier the right architectural tier to address this? A Web Feature Server (WFS) could handle this too. In some ways its simpler to get started by doing things in the database layer, but a WFS has certain advantages. You can bring together multiple data sources, including some that might be on-line elsewhere. You can provide services to other (authorized) users. You can support desktop GIS for researchers and Web Map Servers (WMS). The big disadvantage is that you've got an unfamiliar interface to deal with (submit query URLs and receive XML responses which have to be un-marshalled).
The main advantage of doing all your spatial stuff in the database tier is that database servers usually have plenty of spare CPU bandwidth, it's one less complex software system to install, configure and maintain, and it uses a familiar programming interface (whatever you use to connect to the database). The biggest disadvantages of the database tier relative to a WFS is that your system is bound to be tightly bound to the peculiarities of the DBMS platform you choose. If you decide you just can't use MySQL any longer, you're going to rewrite a lot of query code. The WFS would abstract a lot of that.
I'm semi-retired now, but I do a little pro-bono consulting for non-profits and have some experience with GIS in public health apps. If you'd like help feel free to contact me. My yahoo mail user name is grumpynerd. I'm a bit rusty in the web mapping area, but I could probably help you make some of your architectural and product choices.
Post may contain irony: discontinue use if experiencing mood swings, nausea or elevated blood pressure.
Currently it is PHP/mysql, and that is why I leaned that way originally.
If it is just a set of proximity queries, then you could almost do that with search&replace on your SQL code
I.e.if you can store coordinates in UTM distance calculations become dead simple:
"select * from hospitals where (lat-targetLat)*(lat-targetLat)+(lon-targetLon)*(lon-targetLon) maxDist*maxDist"
Working with degrees makes it a bit harder, but as long as maximum distances are within a few 100 km you can get away with a simple cos(targetLat*deg2rad) scaling of the longitude values.
Terje
"almost all programming can be viewed as an exercise in caching"
I've recently switched to using MongoDB, a noSQL database that supports 'find nearest' type queries. Can be a bit of a learning curve switching to a noSQL database however.
http://www.mongodb.org/display/DOCS/Geospatial+Indexing
In general, the solution to almost all GIS software is GRASS + PostgreSQL + PostGIS
I've set these up a few different ways...
MySQL as database backend crashes much too easily, very difficult to configure, akin to lightning
M$ SQL is expensive and underwhelming performance but easy to pay someone to fix
PostgreSQL is not as 'quick' as MySQL but incredibly reliable.
It is possible to interface ESRI software products with a postgres db, which works, but then you are paying several thousand a seat for ESRI software. Also, for their web server toolkit, web server software, mobile development kit, all gets ridiculously expensive.
I believe GRASS has a few more features than ArcGIS at the expense of interface, but with the good fortune of having your own GIS development team I don't see why you would pursue a 'paid supported' software type. If your engineers are whining for M$ database (ick) fire them and hire some real engineers.
You seem to want to adopt MSsql because it offers a few handy functions which are not available in MySQL.
In general, is is best not to rely on the database to do processing since these functions are usually non-standard. It is best to stick to standard SQL and only rely on the db for standard SQL functionality.
Databases are for data. Processing should not be done in the database. I know this is a somewhat "old school" approach and I admit that I am old but there are several problems in locking yourself into a particular sql implementation just because it has a few handy functions. (Oracle is famous for this... once you check into Oracle, you can never check out)
- cost (you'll be stuck with the MSsql cost forever)
- MS may change the function
- you won't be able to move to another db
- at some point you will probably end up coding around these functions anyway since they won't be exactly what you need
As others have pointed out, there are lots of libraries which perform these functions which are portable and don't lock you into a particular implementation.
I don't read your sig. Why are you reading mine?
How about a real GIS? There are plenty of them - here's a recent list: .
I will note that GRASS, which was built for the US Army (and I used to know the project manager, back in the early nineties, when I lived in Austin), and it is industrial strength, and well-respected.
mark
PostGreSQL supports geometric ojects as used in GIS better than your options as does Oracle Unix.
If you have money you get the better supported options, it saves developement time and maintainandce and data operations, etc.
If not you do more coding, updating, maintainance yourself or with your team but things are less robust and work less well.
Take a look at Quantum GIS as well:
http://www.qgis.org/
"The Esri Nonprofit Organization Program is designed to provide conservation and humanitarian nonprofit organizations around the world an affordable means of acquiring ArcGIS software and services for organized volunteer efforts. Other types of nonprofit organizations may also be eligible for membership in the program."
http://www.esri.com/nonprofit/index.html
If you can get ArcGIS at a reasonable price, do it. Save time, save labor. Build on a solid foundation.
The MS stuff, though I admit is well integrated, has limited functionality--MS is not a GIS company. Yes it will help you today, but 2 yrs from now (if you are not planning a redesign/refactor), you'll be hating it. Been there done that.
Look into NASA Worldwind. The interface is ArcView like *and* GoogleEarth/Maps like. And you get real GIS guys that can help you out from the GIS community (from NASA nonetheless) maintaining it.
PostGIS + GRASS as a central processing repository is the way to go. You can also use QuantumGIS as a basic frontend for your day to day work. If you also have a lot of POI data, have a look at MongoDB geospatial too
You might consider looking at geocommons.com to see if it would serve your needs.
MSSQL is a poor implementation of a spatial database, based on OGR standards from 2001. The geometry data type is expressed in CLR, and you cant enhance it or do any fancy GIS with it. Use PostGresql and enhance as necessary with PostGIS. You will have SO many more options for doing data presentation then you ever will with MSSQL. The software is robust, available, and there is a large number of people willing to support it. Take that 20K and buy a support contract if that makes you feel better.
Spatial data hosting with Linux and WAY out performs any windows implementation to boot. You just cant lose here, just like any Web technology, the Linux toolkit is far ahead of Microsoft.