EPS text to spreadsheet

16 replies to this topic

#1
Hans van der Maarel

    CartoTalk Editor-in-Chief

  • Admin
  • 3,839 posts
  • Gender:Male
  • Location:The Netherlands
  • Interests:Cartography, GIS, history, popular science, music.
  • Netherlands

I've been asked to update a few existing Illustrator maps, which includes a few edits to the index. The index is supplied as a separate EPS file which is placed in the main Illustrator file.

However, the text in the EPS file isn't really produced very nicely:
Attached File  Screen_shot_2010_07_26_at_10.02.56.png   39.67KB   94 downloads

As you can see, each line consists of several text objects, with the index letter/number spaced out by dots. On previous projects I was able to save this as a PDF, then copy/paste the text from there into a text editor, getting single lines, and then split it up further in a spreadsheet (from there I'd paste it into text columns in Illustrator and tab-align it, with . as the tab leader character).

For some reason this isn't working on this file. Does anybody have any other suggestions? It's a few hundred lines of text, so I'm not keen on redoing it manually. It's not a nice MAPublisher file, and even if it were, the map labels are abbreviated, so I have no way of recreating the index that way...
Hans van der Maarel - Cartotalk Editor
Red Geographics
Email: hans@redgeographics.com / Twitter: @redgeographics

#2
Hans van der Maarel

Ugh... I think I got something. I can use my old trick, but only column-by-column and with manually adding in line breaks...

#3
frax

    Hall of Fame

  • Associate Admin
  • 2,299 posts
  • Gender:Male
  • Location:Stockholm, Sweden
  • Interests:music, hiking, friends, nature, photography, traveling. and maps!
  • Sweden

What does the pdf look like? Have you tried pdf2txt?
Hugo Ahlenius
Nordpil - custom maps and GIS
http://nordpil.com/

#4
Michael Schmeling

    Master Contributor

  • Validated Member
  • 204 posts
  • Gender:Male
  • Location:Kassel, Germany
  • Germany

Or OCR software? I have heard that Tesseract does a good job.
Michael Schmeling
Kassel, Germany
Arid Ocean Map Illustrations
http://maps.aridocean.com
Indie Cartographer
http://www.indiecartographer.com

#5
Hans van der Maarel

Quite frankly, I thought having the full Adobe Design suite would be more than enough to handle something like this, but apparently not...

As for OCR: tempting, and since I can input a hi-res TIFF the chances are high that it would go without a hitch, but I'd still have to double-check a few hundred street names to make absolutely sure.

#6
Nick H

    Legendary Contributor

  • Validated Member
  • 307 posts
  • Gender:Male
  • Location:Caversham, Reading, England.
  • United Kingdom

Hello Hans, perhaps the EPS file doesn't contain embedded fonts? If you import it to AI and blow it up does it pixelate? If the text is an image you could try extracting this and (if the quality is good enough) doing something with it in an image editor. Or alternatively, OCR, as suggested.

Regards, N.
Caversham, Reading, England.

#7
canvas101

    Contributor

  • Validated Member
  • 37 posts
  • United States

Hello Hans van der Maarel,

Any chance you could post an example of the file to see if someone can find a workaround?

Regards

#8
Dennis McClendon

    Hall of Fame

  • Validated Member
  • 1,079 posts
  • Gender:Male
  • Location:Chicago
  • Interests:map design, large-scale maps of cities
  • United States

Hans, I'm pretty good at this kind of text processing problem, dating back to my past as a magazine production guy. Can you send me the file or post it somewhere?

I'll let you and the board know what technique I end up using.
Dennis McClendon, Chicago CartoGraphics
chicagocarto.com

#9
Hans van der Maarel

Here's the file. Attached File  test.eps.zip   24.43KB   57 downloads

So what I want is a single spreadsheet with 2 columns: street name and grid reference. I did this the hard way and that was doable but boring.
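
For anyone who would rather script the last step, here is a minimal Python sketch of turning already-joined index lines into the two columns described above. It assumes each line is a street name, a run of leader dots, and a grid reference of one capital letter plus digits (such as B4); that reference format, the function names, and the sample lines are all assumptions for illustration, not taken from test.eps.

```python
import csv
import io
import re

# A name, a run of leader dots (possibly spaced out), then a grid reference
# such as "B4". The reference pattern is an assumption about the index format.
LINE = re.compile(r"^(?P<name>.*?)[\s.]*\.[\s.]*(?P<ref>[A-Z]\d+)\s*$")


def split_index_line(line):
    """Return (street name, grid reference), or None if the line does not fit."""
    match = LINE.match(line)
    if match is None:
        return None
    return match.group("name").strip(), match.group("ref")


def index_to_csv(lines):
    """Build two-column CSV text and collect lines that need a manual look."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["street", "grid_ref"])
    unmatched = []
    for line in lines:
        row = split_index_line(line)
        if row is None:
            unmatched.append(line)
        else:
            writer.writerow(row)
    return out.getvalue(), unmatched


# Invented sample lines, not taken from test.eps.
sample = ["Abbey Road . . . . . B4", "St. John's Road.....C12", "no reference here"]
csv_text, unmatched = index_to_csv(sample)
print(csv_text)
print("check by hand:", unmatched)
```

Lines that do not fit the pattern are returned separately rather than dropped, so they can be checked by hand. One known limitation: a name that itself ends in a period (an abbreviation like "St.") loses that period, because it is indistinguishable from the leader dots.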

#10
frax

You mean something like this? I opened the EPS file in Illy, then ran John Wundes' "Join Text Frame" script on each column (it took some time, and I couldn't run them all at once). Then I pasted the resulting text into my trusty vim text editor for some final edits (replacing dots with tabs etc.), then pasted it into Excel.

You might want to review it, to ensure that everything is ok! And you owe me one! :)

Attached Files

  • Attached File  test.zip   12.85KB   39 downloads


#11
canvas101

Hello Hans,

Here’s the approach I used (on Windows). First open the EPS in AI and select just the text of one column. Next, copy that column and paste it into Notepad. (Any text editor should work; I find that a simple application that does not try to interpret formatting is sometimes best for this type of problem.) Once the text is in Notepad, position the cursor at each point where a line should end and press Enter. I found this easier with the Notepad window expanded horizontally. I encountered one error once the line breaks were done and had to copy and paste a city name to fix it. At this point the text is back in its original format and can be copied into Excel or AI for further customization. Time to complete: less than a minute. Hope this helps.

Regards

Attached Files



#12
Hans van der Maarel

Hugo,

That's it! Thanks! I'll buy you a beer* next time we meet and I'll definitely check out that script. I have to do this from time to time and it's always a hassle. Thanks to everybody else for their comments and suggestions too.

(*) or other beverage of your choice

#13
frax

Canvas, the problem with your approach is that the pasted text does not appear in the same order as in the EPS file. Look at the places where the street names are broken up into separate boxes, for instance.

That's why you have to merge the text into one text frame first, before copy/pasting into a text editor.
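
To illustrate why the order matters, here is a small Python sketch of one way to put scattered text fragments back into reading order before joining them. It is not how the Join Text Frame script works internally; it simply assumes each fragment is known by an x position, a y position measured downward from the top of the page, and its text. The `Fragment` type, the tolerance value, and the sample data are hypothetical.

```python
from collections import namedtuple

# Hypothetical representation of one text object: position plus contents.
Fragment = namedtuple("Fragment", "x y text")


def merge_fragments(fragments, line_tolerance=2.0):
    """Group fragments into lines by y, then join each line left to right."""
    lines = []  # list of (y of first fragment in the line, [fragments])
    for frag in sorted(fragments, key=lambda f: f.y):
        if lines and abs(frag.y - lines[-1][0]) <= line_tolerance:
            lines[-1][1].append(frag)
        else:
            lines.append((frag.y, [frag]))
    return [
        "".join(f.text for f in sorted(frags, key=lambda f: f.x))
        for _, frags in lines
    ]


# Invented fragments, deliberately listed out of visual order.
fragments = [
    Fragment(60, 10.4, " Road"),
    Fragment(10, 10.0, "Abbey"),
    Fragment(120, 10.0, " . . . B4"),
    Fragment(120, 22.0, " . . . C2"),
    Fragment(10, 22.0, "Acacia Avenue"),
]
for line in merge_fragments(fragments):
    print(line)
```

This handles a single column only, which matches running the merge column by column. The tolerance would need tuning to the actual line spacing of the file.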

Hans - I am looking forward to a beer - or maybe I'll find some data conversion task for an FME specialist that I can throw at you some time...!

#14
Dennis McClendon

Sorry I was away from the forum yesterday. I just opened the test file in Illustrator, and exported a text file. Then you can open that text file in your favorite text processor (or even InDesign), where you'll discover it's all one long line with no line breaks. Either manually add the line breaks, or use GREP to put one any place that a digit is followed immediately by an uppercase letter.
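
The same search-and-replace can be sketched outside a GREP-capable editor, for example in Python. The pattern below inserts a line break at every position where a digit is immediately followed by an uppercase letter; the function name and the sample string are made up for illustration.

```python
import re

# Zero-width match: the position after a digit and before an uppercase letter.
BREAK = re.compile(r"(?<=\d)(?=[A-Z])")


def add_line_breaks(text):
    """Insert a newline wherever a digit is directly followed by a capital."""
    return BREAK.sub("\n", text)


# Invented sample: three index entries exported as one long line.
one_long_line = "Abbey Road.....B4Acacia Avenue.....C12Baker Street.....A1"
print(add_line_breaks(one_long_line))
```

The rule relies on every grid reference ending in a digit and every street name starting with a capital letter. A name that contains a digit directly followed by a capital would be split in the wrong place, so the result is still worth a quick review.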

#15
canvas101

Hello Hugo,

I'm not sure I understand your comment. The text is in paragraph format, with each name and its corresponding location on a separate line. The only issue I see is that the "B." was not carried to the next line, which was an oversight on my part, not an application issue. The two images I attached show before and after: broken text in the first, properly formatted text in the second. However, as you pointed out in your solution, tab positions still have to be set manually with both methods. Hope this clarifies.

Regards



