TidBITS / TidBITS Talk

Web collection tools

mc (apparently) - 07:49pm Jul 13, 2004 PST
via email

Matt Neuburg writes
>Sometimes a new idea is so simple, you can't believe no one's
>thought of it before.

<http://db.tidbits.com/getbits.acgi?tbart=07739>

He's referring to Webstractor, the browser that lets you collect and
edit web pages.

We agree that this is a really cool idea. But it's not that no one's
thought of it before. So, just by way of info, in 2001/2002 a
research prototype called Hunter Gatherer was published that lets
users create collections from web page components. (Indeed, we have a
provisional patent on the collection-making approach.)

Descriptions of the tool, related papers and a video can be found at
http://shaka.dgp.toronto.edu/hg/overview

With Hunter Gatherer, the focus was on letting users collect chunks
of web pages of interest into what we call "collections" - which can
be added to and subtracted from, revised and so on.

This is the main difference between the HG approach and Webstractor.
With HG, you indicate what part of the page you'd like to add to a
collection, hit a to add, and that's it: the component is in your
collection. With Webstractor you're picking up the entire page and
*then* you prune it.

Our emphasis was on reducing the steps needed to make a collection -
something our research showed lots of people do, but do poorly because
there are too many steps involved.

<http://www2002.org/CDROM/refereed/130/index.html>

In Webstractor, it seems that to make collections, you need to copy
and paste from all of the collected documents into a new one. We
automate the collection building; we also make it possible to share
collections by URL, so you have a copy and your pals can have a copy
- and your pals can edit their version without changing yours.

Hunter Gatherer was implemented as a proxy service. It would be
fairly straightforward to translate this service to a plug-in, but as
researchers, product development wasn't our core mandate. The idea of
the proxy/plug-in was to let users do information work with their own
tools rather than new ones. It was also to push standards like XHTML,
JavaScript and XPath. If anyone would like to collaborate with us on
converting HG to a plug-in and continuing the research, we'd love to
hear from you.
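
A minimal Python sketch of the collection idea described above, for
illustration only. It is not Hunter Gatherer's actual code: the
Collection class, its method names, the sample page and the
example.com URL are all hypothetical, and the small XPath subset in
Python's standard library stands in for whatever the proxy really did.
It shows a page component being picked out by path and added in one
step, and a shared copy being edited without changing the original.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page; a real proxy would see the live XHTML instead.
PAGE = """<html><body>
<div id="nav">Home | About</div>
<div id="review"><h2>Webstractor</h2><p>Collects and edits web pages.</p></div>
</body></html>"""


class Collection:
    """Hypothetical collection of page components (not HG's real data model)."""

    def __init__(self, name, items=None):
        self.name = name
        # Each item is a (source_url, xhtml_fragment) pair.
        self.items = list(items or [])

    def add(self, source_url, page_xml, path):
        """Add the first element matching `path` to the collection."""
        root = ET.fromstring(page_xml)
        element = root.find(path)
        if element is None:
            raise ValueError(f"no component matches {path!r}")
        element.tail = None  # drop the whitespace that followed the element
        fragment = ET.tostring(element, encoding="unicode")
        self.items.append((source_url, fragment))

    def remove(self, index):
        """Subtract one component from the collection."""
        del self.items[index]

    def fork(self):
        """Return an independent copy, e.g. for someone you shared it with."""
        return Collection(self.name, self.items)


mine = Collection("web collection tools")
mine.add("http://example.com/review", PAGE, ".//div[@id='review']")

theirs = mine.fork()   # a pal gets their own copy
theirs.remove(0)       # their edit does not touch the original
print(len(mine.items), len(theirs.items))
```

Storing each component as a serialized fragment rather than a live
element keeps the copies independent, which is the property the
sharing-by-URL description relies on.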

If you'd like to try Hunter Gatherer, we're still interested in a
variety of research questions about the collection-making approach.
Send an email to <mc-res at dgp.toronto.edu> and you'll get an
automatic reply on how to set up the tool. Please note it's not
perfect; it's a concept. But the way it supports the interaction for
collection making is even easier than Webstractor's.

Thanks,
mc schraefel




mjt (apparently) - Jul 14, 2004 8:36 pm (#1 Total: 5)  

via email
Posts: 3
Re: Web collection tools

At 20:49 -0700 7/13/04, m.c. schraefel wrote:
>Matt Neuberg writes
>>Sometimes a new idea is so simple, you can't believe no one's
>>thought of it before.
>
><http://db.tidbits.com/getbits.acgi?tbart=07739>
>
>He's referring to Webstractor, the browser that lets you collect and
>edit web pages.
>
>We agree that this is a really cool idea. But it's not that no one's
>thought of it before.

In 1999-2000, a company from the Pacific Northwest called Webforia
produced Webforia Reporter, a (Windows-only -- hiss!) app that did
what Hunter Gatherer did (and, by extension, what Webstractor does).
It's cool that someone's giving yet another go at what is, at the
root, a fantastic idea.

On our cynical days, we referred to Webforia Reporter as "Webforia
Plagiarizer" because it made it so easy to build what looked like an
original presentation from other people's work. But we kept telling
ourselves that "information wants to be free..."

Cheers.

+ Michael

--
_________________________________________________
Michael J. Tardiff
mjttaoproductions.com
Seattle, Washington USA

Matt Neuburg (apparently) - Jul 14, 2004 8:36 pm (#2 Total: 5)  

via email
Posts: 2661
Re: Web collection tools

On 7/13/04 at roughly 9:58 PM, thus spake Michael J. Tardiff
<mjttaoproductions.com>:

>At 20:49 -0700 7/13/04, m.c. schraefel wrote:
>>We agree that this is a really cool idea. But it's not that no one's
>>thought of it before.
>
>In 1999-2000, a company from the Pacific Northwest called Webforia
>produced Webforia Reporter, a (Windows-only -- hiss!) app that did
>what Hunter Gatherer did (and, by extension, what Webstractor does).

And here's a note from a reader (name removed, but other stuff unchanged)
who claims that the idea was patented by Intel years ago. - However, I must
add that, while I may have let myself in for this with my verbal emphasis on
the *idea*, I regard this all as quite irrelevant. Patent schmatent, Windows
schmindows, thought schmought. This is the first program I've ever seen that
acts like this, and that's all that matters to me. I'm not saying SoftChaos
invented the idea, but you cannot deny that they brought it to my desktop
and no one else did. m.

==== forwarded material follows ====

This "simple" idea was thought of almost eight years
ago(October 1986), but never actually put into a
product. Check out this out:

US Patent 5,809,250 - Methods for creating and sharing
replayable modules representive of Web browsing
session

This is available at the US Patent website:
www.USPTO.gov, click on patent number search.

I am fairly certain Intel, who owns the patent, will
not enforce it against a small company.

Just thought you might want to credit the person who
actually invented the idea, but it won't hurt my
feelings if you don't. I really liked the article.
Keep up the great work.

==== forwarded material ends ====
--
matt neuburg, phd = matttidbits.com, http://www.tidbits.com/matt/
pantes anthropoi tou eidenai oregontai phusei
AppleScript: the Definitive Guide! NOW SHIPPING...! (Finally.)
http://www.amazon.com/exec/obidos/ASIN/0596005571/somethingsbymatt
Subscribe to TidBITS! It's free and smart. http://www.tidbits.com/

Paul Durrant (apparently) - Jul 15, 2004 8:01 pm (#3 Total: 5)  

via email - Durrant Software Limited
Posts: 21
Re: Web collection tools

At 9:36 pm -0700 14/7/04, matt neuburg wrote:
>==== forwarded material follows ====
>
>This "simple" idea was thought of almost eight years
>ago(October 1986), but never actually put into a
>product. Check out this out:
>
>US Patent 5,809,250 - Methods for creating and sharing
>replayable modules representive of Web browsing
>session

Presumably that should be October 1996. No-one was doing any web
browsing in 1986!

regards,

Paul

--
Paul Durrant, Durrant Software Limited, Reg.Co.No.: 2612332
Custom XTensions for QuarkXPress since 1991
82 Earlham Road, Norwich, Norfolk, NR2 3HA.
<http://www.durrant.co.uk/>

Mathew Lu - Jul 20, 2004 5:25 pm (#4 Total: 5)  

Posts: 1
Re: Webstractor vs. Furl

Webstractor is interesting, and I'm going to check it out, but it doesn't sound like quite what I need. What I would like is something like a local version of the FURL service (http://www.furl.net/index.jsp). I use and really like furl, but I dislike the fact that my pages are stored off my computer. (There is a way to export the pages back to yourself in a zip file, but until somebody comes up with a better way to view and search them, this seems to be of limited use.)

The great thing about furl is that whenever you see something interesting, you can just hit the furl button and the text (but not images) and link will be saved. Then later you can go back and search through your furled pages on the furl website. If there were a program that worked like this, a plug-in for my browser that would instantly save everything on screen into a searchable database that resided on my computer, I'd be thrilled. Anybody know of such a thing?

Regards, --Mathew Lu

Jeff Porten (apparently) - Aug 20, 2004 5:44 am (#5 Total: 5)  

via email
Posts: 347
Re: Web collection tools

On Jul 20, 2004, at 9:25 PM, Mathew Lu wrote:

> Webstractor is interesting, and I'm going to check it out, but it
> doesn't sound quite what I need. What I would like is something like a
> local version of the FURL service (http://www.furl.net/index.jsp).

Sorry for the late reply.

I think you can come close to this with HistoryHound from St. Clair
Software. HH indexes everything that hits your Safari browser cache,
so you can search for "where was that thing I read two weeks ago?".
You don't manually mark a page for storage; it's everything you read.
(With some controls in the preferences if you want to exclude some
domains; e.g., bank information.)

<http://db.tidbits.com/getbits.acgi?tbart=07665>

HH indexes are much smaller than the original cache files, so there's
not much of a hit to keep them around indefinitely. You'll need a lot
of space if you want to keep your caches around forever, though. The
HH search window pulls from the cache if it's still available,
otherwise you go back to the original site.
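
For anyone curious how that kind of index-plus-cache lookup can work,
here is a rough Python sketch. It is not HistoryHound's implementation:
the HistoryIndex class, its methods, and the example URLs are all
hypothetical, and a plain in-memory word index stands in for whatever
HH really stores. It shows the three behaviors described above:
indexing everything that is read, skipping excluded domains, and
falling back to the original site once the cached copy is gone.

```python
import re
from collections import defaultdict
from urllib.parse import urlparse


def words(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class HistoryIndex:
    """Hypothetical word index over visited pages (not HistoryHound's code)."""

    def __init__(self, excluded_domains=()):
        self.excluded = set(excluded_domains)
        self.postings = defaultdict(set)  # word -> URLs containing it
        self.cache = {}                   # URL -> page text, may be evicted

    def record(self, url, text):
        """Index a visited page unless its domain is excluded."""
        if urlparse(url).hostname in self.excluded:
            return
        self.cache[url] = text
        for word in set(words(text)):
            self.postings[word].add(url)

    def evict(self, url):
        """Drop the cached copy; the much smaller index entry stays."""
        self.cache.pop(url, None)

    def search(self, query):
        """Return (url, source) pairs for pages containing every query word."""
        terms = words(query)
        if not terms:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return [(u, "cache" if u in self.cache else "refetch")
                for u in sorted(hits)]


idx = HistoryIndex(excluded_domains={"bank.example"})
idx.record("http://news.example/a", "Webstractor collects web pages")
idx.record("http://bank.example/acct", "web balance")
idx.record("http://blog.example/b", "Saving web pages locally")
idx.evict("http://news.example/a")
print(idx.search("web pages"))
```

Keeping the index separate from the cache is what lets the index
outlive the cached files: evicting a page removes only its text, so a
later search still finds the URL and marks it for refetching.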

FWIW, Tiger Safari will have web page save ability, and it sounds like
that plus Spotlight is going to do the trick for you, one way or
another.

Best,
Jeff


