Using Webstractor

dcortesi (apparently) - 09:38pm Jul 14, 2004 PST
via email

> you can close up the gaps between Web pages so that the material
> is repaginated into a single seamless flow.

<http://db.tidbits.com/getbits.acgi?tbart=07739>

Well, that's what I could really use it for: make a continuous flow out
of a multi-page story like, for example, a show recap at
TelevisionWithoutPity
(http://www.televisionwithoutpity.com/show.cgi?show=76). Three, no four
big problems arose as soon as I tried it.

1) When I browse the story in sequence page 1... page 25, the browser
stacks the latest link at the top, so that the pages are in
*reverse*sequence* !! What kind of sense does that make? Who wants a
document that opens page 25, 24, 23...? The only way I could find to
reorder them was to drag them individually around in the list, becoming
a kind of clumsy human Bubble Sort.

Kudos to Webstractor that it rendered these pages flawlessly and at
least as quickly as Safari.

2) Working in the editor with 25 pages, everything was gluey-slow.
Every edit change caused a "changing layout" delay.

3) I could figure out how to select just the body text of a page and
found the awesomely great "Edit>Crop" command, brilliant! But I
couldn't find any way to make that "single seamless flow" -- no matter
what I did, I still had 25 pages, each with a bit of text on it. Well,
actually 10, because I deleted the rest to speed things up.

4) No way do I want to do this manually more than once! I need to
script it. But Webstractor as it stands apparently has no AppleScript
dictionary, or at least, the Script Editor doesn't see it.

Good Proof-of-concept; not a product yet.

Dave Cortesi



Lewis Butler (apparently) - Jul 17, 2004 6:39 pm (#1 Total: 3)  

via email
Posts: 1080
Re: Using Webstractor

On Wed, 14 Jul 2004 21:38:09 -0700, David Cortesi
<dcortesimindspring.com> wrote:
> > you can close up the gaps between Web pages so that the material
> > is repaginated into a single seamless flow.
>
> <http://db.tidbits.com/getbits.acgi?tbart=07739>
>
> Well, that's what I could really use it for: make a continuous flow out
> of a multi-page story like, for example, a show recap at
> TelevisionWithoutPity
> (http://www.televisionwithoutpity.com/show.cgi?show=76). Three, no four
> big problems arose as soon as I tried it.

That's exactly what I downloaded Webstractor for as well.

The first thing I noticed was that Webstractor immediately tries to make
a http connection to an IP address. I'm not sure what this connection
was, since I did not allow it.

I had similar problems trying to combine the TWOP pages into a single
page, and lots of problems trying to edit everything to get rid of all
the graphics. I was able to rearrange the pages in order quite quickly
by drawing the separator all the way down, hiding the page display
area. I was expecting a "Concatenate" command somewhere to combine
all the pages into one, but if it's there, I missed it.

Overall, it was MUCH faster to simply manually copy the text into a
TextEdit document than mess with Webstractor.

>
> 1) When I browse the story in sequence page 1... page 25, the browser
> stacks the latest link at the top, so that the pages are in
> *reverse*sequence* !! What kind of sense does that make?

Less than zero? Even more annoying, reordering the pages in the
Browse view does not reorder them in the Edit view (or actually,
vice versa, as I reordered in edit and then the changes did not copy
over to browse). Very odd.

> Kudos to Webstractor that it rendered these pages flawlessly and at
> least as quickly as Safari.

Looks like it is using WebCore to me.

> 2) Working in the editor with 25 pages, everything was gluey-slow.
> Every edit change caused a "changing layout" delay.

Yes, that was annoying. Even after clearing all the graphics and
cropping everything to just the text, when I exported to PDF I STILL
had 12 pages with less than half a page filled on each page, even
after selecting all of the pages and saying "fit to page".

>
> 3) I could figure out how to select just the body text of a page and
> found the awesomely great "Edit>Crop" command, brilliant!

It's basically simply selecting the text and pasting it into TextEdit.
Only slower, less useful, and not easily printable. Granted, the
one-click selection of the entire text area is marginally useful.

> But I
> couldn't find any way to make that "single seamless flow"

Neither could I. Seems it should be simple. I kept trying different
things, thinking it must be an interface failure rather than an actual
shortcoming in the program, but danged if I can figure it out.

> Good Proof-of-concept; not a product yet.

I don't even think it's a particularly good proof-of-concept. I
certainly wouldn't use it. As it stands I can't see that it's
useful at all. If you want to save part of a page, save the source
and cut out what you don't want. I may be missing something here, but
it doesn't seem very useful.

--
gkreme at gmail or kreme at kreme or syth at mac

georgewade1 (apparently) - Jul 19, 2004 9:58 am (#2 Total: 3)  

via email
Posts: 29
Re: Using Webstractor

I hope that the Webstractor folk are reading TidBITS Talk so that they
can either point us to pages in the manual or fix things up...

Another point is that Safari will be growing up with Tiger.

> QUOTE: "Save and Email Web Pages

Leon McNeill - Aug 3, 2004 5:52 am (#3 Total: 3)  

Posts: 1
Re: Using Webstractor

Hello all. Just a quick response to many of the points you have brought up about Webstractor, for which I am the lead developer.

> 1) When I browse the story in sequence page 1... page 25, the browser stacks the latest link at the top, so that the pages are in *reverse*sequence* !!


This indeed can be quite annoying, and Webstractor 1.1 will address this by allowing you to sort by any column in Browse mode ... so you can make it act exactly as you want. Among the new columns available to you in the next release there will be a Number column similar to that in iTunes playlists, so that when you're sorted by this column you can also freely reorder the captured pages however you like.

> 2) Working in the editor with 25 pages, everything was gluey-slow. Every edit change caused a "changing layout" delay.


There will always be a layout delay; however, we will minimize this as far as possible. We've made a number of speed improvements throughout Webstractor for the next release.

> But I couldn't find any way to make that "single seamless flow"


In edit mode, there are three neighboring columns of check boxes. The right-most, when checked, will make that web page entry continue from the previous entry's layout, instead of starting on a new page. You can affect multiple entries in one operation by selecting them all and either 1) clicking on one of their above-mentioned checkboxes, or 2) clicking the Info button and changing the setting ("Follow previous entry on same page") in the Info sheet.

> Webstractor immediately tries to make a http connection to an IP address.


This is Webstractor's automatic "check for updates" connection. Webstractor won't attempt this connection more than once per 24 hours. As you discovered, blocking it causes no harm.

> 3) I could figure out how to select just the body text of a page and
> found the awesomely great "Edit>Crop" command, brilliant!


> It's basically simply selecting the text and pasting it into TextEdit.


Webstractor's Crop functionality really shines when used on web pages with any kind of table-based layout -- copy/pasting from Safari into TextEdit will lose this kind of layout.

--
Leon McNeill
Senior Software Developer
Softchaos Ltd.
leonsoftchaos.com
www.softchaos.com


