Geoprocessing Woes
Started by Martin Gamache, Nov 20 2006 11:11 AM
10 replies to this topic
#1
Posted 20 November 2006 - 11:11 AM
In my continuing global modelling saga I am running into several issues with both Edit Tools and ArcMap geoprocessing tools.
I am trying to union, or merge without overlaps, 3 global datasets of Protected Areas and then intersect the result with my global ecoregion data. I get nothing but unexplained errors, sometimes after 20+ hours of processing.
Does anyone have any insight into what may be causing Arc to crash and burn on basic geoprocessing tasks like these? These are big datasets. If it's a size issue, where can I find information on size limitations for these operations?
I've been reluctant to use Manifold for this process, since my past experience is that it is extremely slow for this sort of task, if it can get it done at all.
M
#2
Posted 20 November 2006 - 11:20 AM
Without knowing more details it's hard to help, other than to offer what I would do in a general situation like yours: switch to AV 3.x. For reasons unexplained, I get faster processing times on geoprocessing tasks there than in ArcMap, especially when intersecting large, complex polygon coverages.
For mega coverages, especially global ones, I would tile them and run each tile separately.
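The tiling idea above can be sketched in plain Python. This is only an illustration of generating tile extents for a global lat/long dataset; the function name `make_tiles` and the 30-degree tile size are assumptions, and clipping each dataset to a tile and running the overlay would still be done in the GIS package of your choice.

```python
def make_tiles(xmin=-180.0, ymin=-90.0, xmax=180.0, ymax=90.0, step=30.0):
    """Split an extent into tiles of `step` degrees (hypothetical helper).

    Returns a list of (xmin, ymin, xmax, ymax) tuples, row by row from the
    bottom-left corner. Tiles on the top and right edges are clipped to
    the extent.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    tiles = []
    y = ymin
    while y < ymax:
        x = xmin
        while x < xmax:
            tiles.append((x, y, min(x + step, xmax), min(y + step, ymax)))
            x += step
        y += step
    return tiles


if __name__ == "__main__":
    tiles = make_tiles(step=30.0)
    # Each tile extent would be used to clip the inputs before the overlay.
    print(len(tiles), "tiles; first:", tiles[0])
```

Smaller tiles mean more runs but less memory per run, and a failure only costs you one tile rather than the whole job.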
#3
Posted 20 November 2006 - 12:26 PM
Martin,
Really wish I could help on this one. I am a MapInfo user when it comes to geoprocessing, and I have used some very large data sets without problems.
Could there be something like a limit on the number of vertices per polygon in Arc? (If so, maybe weed the line work on your polygons beforehand, if possible.) It could also be a lat/long issue; try re-projecting into a projected coordinate system before running.
Is there a way you could break up the world into smaller parts and re-process? When I run a geoprocess on large data sets, I always test on smaller ones first (at least you don't have to wait 20 hours to find out if you did something wrong).
20 hours to run... is there some sort of progress bar to help gauge things?
Not much help, I know, but I just had to try.
Chart
#4
Posted 20 November 2006 - 12:28 PM
M,
Data in the real world meets the unwritten expectations of the geoprocessing programmer. So your expectations are correct--it should run and run faster than you describe. I run into things like this pretty frequently and here's my list of things to try:
1. Repair Geometry tool. The tool will repair:
- Null geometry: The feature will be deleted from the feature class.
- Short segment: The geometry's short segment will be deleted.
- Incorrect ring ordering: The geometry will be updated to have correct ring ordering.
- Incorrect segment orientation: The geometry will be updated to have correct segment orientation.
- Self intersections: The geometry's segments that intersect will be split at their intersection.
- Unclosed rings: The unclosed rings will be closed.
- Empty parts: The parts that are null or empty will be deleted.
3. There still may be a bad shape somewhere, so try processing a subset of the data just to verify that the model can run to completion.
4. That said, you may be running into a bug. Most bugs in this class of problem can be reduced to a small scenario with just a few features. If you think that's your case, definitely submit it to tech support; they may have other things to try that I haven't thought of.
5. And for the record, we've not had a vertex limit for a very long time (10+ years).
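To make a few of the repairs listed under item 1 concrete, here is a minimal pure-Python sketch. It is not the Repair Geometry tool's implementation; the function names, the ring representation (lists of (x, y) tuples, outer ring first), and the choice of clockwise outer rings with counter-clockwise holes are all assumptions made for illustration. Short segments and self intersections are not handled.

```python
def signed_area(ring):
    """Shoelace formula on a closed ring: positive means counter-clockwise."""
    return 0.5 * sum(x1 * y2 - x2 * y1
                     for (x1, y1), (x2, y2) in zip(ring, ring[1:]))


def repair_polygon(rings):
    """Return a cleaned list of rings, or None for a null geometry.

    Covers three of the listed cases: empty parts are dropped, unclosed
    rings are closed, and ring orientation is normalised.
    """
    if not rings:
        return None  # null geometry: the caller would delete the feature
    cleaned = []
    for ring in rings:
        if not ring:
            continue  # empty part: drop it
        ring = list(ring)
        if ring[0] != ring[-1]:
            ring.append(ring[0])  # unclosed ring: close it
        if len(ring) < 4:
            continue  # extra check: too few vertices to enclose an area
        cleaned.append(ring)
    if not cleaned:
        return None
    fixed = []
    for index, ring in enumerate(cleaned):
        is_clockwise = signed_area(ring) < 0
        want_clockwise = (index == 0)  # assumed convention, see lead-in
        fixed.append(ring if is_clockwise == want_clockwise else ring[::-1])
    return fixed


if __name__ == "__main__":
    # An unclosed, counter-clockwise unit square plus an empty part.
    print(repair_polygon([[(0, 0), (1, 0), (1, 1), (0, 1)], []]))
```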
Charlie
Charlie Frye
Chief Cartographer
Software Products Department
ESRI, Redlands, California
#5
Posted 21 November 2006 - 12:46 PM
I have done some pretty intensive models. These are a few things that I have learned.
- Make sure all your data is in the same projection.
- Make sure all your data uses simple geometries (multipart to singlepart)
- Write your output files to a different directory from your source data, preferably on a different drive. The biggest culprit of geoprocessing errors is file/schema locks. If you try again, be sure to delete ALL files in this directory and REBOOT first.
- Though ESRI does not have a vertex limit, the OS does impose file size limits. These affect both output files and intermediate files.
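As a rough way to act on the last point, the sketch below lists files in a working directory that have reached a size threshold. The helper name `oversized_files` and the 2 GiB default are assumptions for illustration; the real limit depends on your file system and data format.

```python
import os


def oversized_files(directory, limit_bytes=2 * 1024 ** 3):
    """Return (name, size) pairs for files at or above `limit_bytes`.

    The default of 2 GiB is an assumed threshold, not a documented limit
    for any particular tool.
    """
    hits = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            size = os.path.getsize(path)
            if size >= limit_bytes:
                hits.append((name, size))
    return hits


if __name__ == "__main__":
    # Check the current directory; point this at your output/scratch folder.
    for name, size in oversized_files("."):
        print(name, size)
```

Running a check like this on the output and scratch directories after a failed run can show whether an intermediate file stopped growing at a suspicious size.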
I would also recommend trying a backwards approach to working with your datasets; I use this when working with large datasets. Use Select by Location to select the ecoregions that intersect the Protected Areas, then invert your selection and remove the ecoregions that do not contain Protected Areas. This reduces the number of features that need to be processed, which sometimes helps.
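The same pre-filtering idea can be sketched outside the GIS with a bounding-box test. This is only an approximation of Select by Location (boxes can overlap where the actual shapes do not), and the function names and the dictionary-of-vertex-lists representation are assumptions for illustration.

```python
def bbox(points):
    """Bounding box (xmin, ymin, xmax, ymax) of a list of (x, y) tuples."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def boxes_overlap(a, b):
    """True if two bounding boxes touch or overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def prefilter(ecoregions, protected_areas):
    """Keep ecoregions whose box overlaps at least one protected-area box.

    Both arguments map a feature name to its list of (x, y) vertices.
    """
    pa_boxes = [bbox(pts) for pts in protected_areas.values()]
    keep = {}
    for name, pts in ecoregions.items():
        eco_box = bbox(pts)
        if any(boxes_overlap(eco_box, pa_box) for pa_box in pa_boxes):
            keep[name] = pts
    return keep


if __name__ == "__main__":
    ecoregions = {
        "eco_a": [(0, 0), (10, 0), (10, 10), (0, 10)],
        "eco_b": [(50, 50), (60, 50), (60, 60), (50, 60)],
    }
    protected = {"park_1": [(5, 5), (6, 5), (6, 6), (5, 6)]}
    print(sorted(prefilter(ecoregions, protected)))
```

Anything the box test discards cannot intersect a protected area, so the expensive overlay only has to run on what is left.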
Sometimes there is a problem with shapefiles. I would recommend "washing" the files through a 3rd-party application. Convert the shapefile to another format, then back to shapefile. I tend to use Global Mapper for this.
Good Luck,
-Tom
#6
Posted 21 November 2006 - 01:00 PM
I've managed to do the tasks I was trying to do.
The first thing I did was delete all the protected areas files and recreate them from the original zipped datasets downloaded from the source.
I did a clean restart and made sure all the memory was free, which included deleting temp files.
This allowed me to do all the merges using ET. Prior to these steps it crashed every time.
Once I had done the merges, which took almost 15 hrs, I was able to process the intersect and dissolve operations, rebooting before each one. Most of these operations did not succeed on the first try, returning either out of memory or topology errors.
m
#7
Posted 21 November 2006 - 03:17 PM
This is part of the reason why I often prefer to do things in ArcInfo Workstation:
* error messages make more sense and are easier to understand
* you don't have to struggle with the UI
* no unexpected oddities with data and temp files...
#8
Posted 21 November 2006 - 05:08 PM
Perhaps you might try the new 7x Manifold:
http://69.17.46.171/...994541106730000
There is some discussion of the improvement in performance there.
#9
Posted 21 November 2006 - 05:19 PM
My version of 7 (not sure which build) gave me an out of memory error after a couple of hours of processing!
m
#10
Posted 22 November 2006 - 01:21 AM
7.1.4.575 is the version being discussed. It was released 3 days ago.
If you've already decided it's not for you that's fine, but others might be interested. But, it sounds like your problem would be a great test - you should try it out and send them the data if it "doesn't work".
Here's a summary of the improvements in this release (from the link provided):
"We have improved the performance of the Clip Intersect and Clip Subtract operations for lines, the performance of the Split operation, and the performance of the Topology Overlays tool. The increase in the performance of the Topology Overlays tool is biggest for Union and Update overlays (a factor of 10 or more) and smallest for the Identity and Intersect overlays (a factor of 2 or more). The increase in the performance of other operations is big (a factor of 10 or more). Either way you cut it (pun intended...) the increase in performance is visible and substantial and has made the investment into improved geometry processing very worthwhile.
Plans are to close out any remaining issues not involving the geometry layer, to do full quality assurance and to issue an update as a production release shortly after the 25th of this month. So now is the time for all intrepid experimenters to hammer away at this new release with the toughest geometry processing tasks available. :-)"
#11
Posted 22 November 2006 - 02:05 PM
Martin,
I forgot one other ArcGIS/Geoprocessing tidbit. In 9.1, in the Samples toolbox, there are some geoprocessing tools that were designed for handling large data efficiently. At 9.2 we added that logic into the regular version of these tools, so it will be used when it's needed. Basically, if you're geoprocessing a relatively small number of features, one memory management approach is ideal, but that strategy does not hold up for large datasets; in fact there's a break point beyond which large data gets much slower. Another way of managing memory allows large datasets to be processed optimally, though it is not as efficient for small datasets.
Also at 9.2 (now shipping) we introduce a new, additional DBMS for the geodatabase, called File Geodatabase. This gets us past the limitations of the personal geodatabase, which was dependent on the Microsoft JET database engine. So, no more 2 GB file size limits, no more update transaction count limits, and no more being somewhat limited to WINTEL platforms. The personal geodatabase file locking issues will go away, but it is not a multi-user DBMS, so expect it to behave similarly to shapefiles with respect to locking.
Happy Thanksgiving,
Charlie
Charlie Frye
Chief Cartographer
Software Products Department
ESRI, Redlands, California