Files vs. databases

You're correct that Spotlight only works at the file level. iCal and
Address Book have also been adapted to work with this:
<http://arstechnica.com/reviews/os/macosx-10.4.ars/9>.

Does anyone know (or can anyone conjecture) how VoodooPad <www.flyingmeat.com> was able to link to individual entries in Address Book back in Panther? Or how it's possible to link from one page in one VoodooPad file to a specific page in another file?

About the supposed desirability of databases: It's often assumed that a decent wiki engine has to make use of a database to work properly. But pmwiki <www.pmwiki.org> eschews a database, using flat files only. I've no idea whether it would be possible to run Wikipedia <www.wikipedia.org> on pmwiki, but the speed in everyday use with moderately-sized wikis suggests that the advantages of databases are limited.



John C. Welch (apparently) - Aug 3, 2005 9:07 am (#1 Total: 7)  

Posts: 808
On 8/3/05 10:28, "Rick Lavin" <rlavinpu-kumamoto.ac.jp> wrote:

> About the supposed desirability of databases: It's often assumed that a decent
> wiki engine has to make use of a database to work properly. But pmwiki
> <www.pmwiki.org> eschews a database, using flat files only. I've no idea
> whether it would be possible to run Wikipedia <www.wikipedia.org> on
> pmwiki, but the speed in everyday use with moderately-sized wikis suggests
> that the advantages of databases are limited.

I think that sometimes we need to define "database" in discussions like
these. A lookup index and flat files can be astoundingly fast, and, well, a
database of sorts. But if you try to just say "database" like it's one,
concrete term, then we run into a lot of trouble over just what a "database"
is.
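
A minimal sketch of the "lookup index and flat files" idea, to make the
point concrete. Everything here is an assumption for illustration: the
class name FlatFileStore, the one-JSON-record-per-line layout and the
in-memory offset index are hypothetical and are not how pmwiki or any
other wiki engine stores its pages.

```python
import json
import os


class FlatFileStore:
    """Append-only flat file plus an in-memory index of byte offsets.

    Hypothetical illustration only; not taken from any real wiki engine.
    """

    def __init__(self, path):
        self.path = path
        self.index = {}  # key -> byte offset of the newest record for that key
        open(self.path, "ab").close()  # create the file if it does not exist
        self._rebuild_index()

    def _rebuild_index(self):
        # One sequential pass at startup; later reads seek directly.
        with open(self.path, "rb") as f:
            while True:
                offset = f.tell()
                line = f.readline()
                if not line:
                    break
                self.index[json.loads(line)["key"]] = offset

    def put(self, key, value):
        record = json.dumps({"key": key, "value": value}).encode("utf-8") + b"\n"
        with open(self.path, "ab") as f:
            offset = f.seek(0, os.SEEK_END)  # position where this record starts
            f.write(record)
        self.index[key] = offset  # the newest record wins

    def get(self, key):
        offset = self.index[key]  # raises KeyError for an unknown key
        with open(self.path, "rb") as f:
            f.seek(offset)
            return json.loads(f.readline())["value"]
```

A read costs one dictionary lookup and one seek, however large the file
gets, which is the sense in which this is already "a database of sorts".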

--
John C. Welch Writer/Analyst
Bynkii.com Mac and other opinions
jwelchbynkii.com



kevinv (apparently) - Aug 4, 2005 9:17 am (#2 Total: 7)  

Posts: 1370
Re: Files vs. databases

--On August 3, 2005 8:28:28 AM -0700 Rick Lavin <rlavinpu-kumamoto.ac.jp>
wrote:

> About the supposed desirability of databases: It's often assumed that a
> decent wiki engine has to make use of a database to work properly. But
> pmwiki <www.pmwiki.org> eschews a database, using flat files only. I've
> no idea whether it would be possible to run Wikipedia <www.wikipedia.org>
> on pmwiki, but the speed in everyday use with moderately-sized wikis
> suggests that the advantages of databases are limited.

Databases scale; file systems don't. Databases can model complex data
based on relationships between the data. A wiki implements a very limited
version of this. Wiki pages tend to be like pages in a book with links to
related pages.

Try doing something like Quicken as a set of flat files. Even a small scale
financial system would fall apart very quickly in a flat file system.

I'm part of a team implementing a document management system at work. Our
pilot project has about 40GB of data in 40,000+ files. The system we
selected uses a database front-end with pointers to actual file locations
(it's more complicated than that, but that's the simplest explanation). I
can do a filename search across all 40,000+ files and get the results in
less than 10 seconds. The files are stored on multiple servers, yet the
search time does not depend on how close to the file system I am: it takes
the same time to search files in California as here in KC. That won't
happen with standard file systems.

I can also do full text searches across the files the system can understand
(Office, CAD and PDF files at the moment). I can still get results in under
30 seconds.
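
A rough sketch of the "database front-end with pointers to actual file
locations" arrangement, assuming SQLite as a stand-in for whatever engine
the real product uses. The table name, column names, server names and
paths are all made up for illustration; the actual system is, as noted
above, more complicated than this.

```python
import sqlite3


def build_catalog(entries):
    """Create an in-memory catalog from (filename, server, path) tuples.

    The schema is hypothetical; it only records where each file lives.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE documents ("
        " id INTEGER PRIMARY KEY,"
        " filename TEXT NOT NULL,"
        " server TEXT NOT NULL,"
        " path TEXT NOT NULL)"
    )
    # The index lets a filename search avoid walking any file system.
    conn.execute("CREATE INDEX idx_documents_filename ON documents (filename)")
    conn.executemany(
        "INSERT INTO documents (filename, server, path) VALUES (?, ?, ?)",
        entries,
    )
    conn.commit()
    return conn


def find_by_filename(conn, filename):
    """Return (server, path) pointers for every file with this name."""
    rows = conn.execute(
        "SELECT server, path FROM documents"
        " WHERE filename = ? ORDER BY server, path",
        (filename,),
    )
    return rows.fetchall()
```

The search only touches the catalog, so its cost does not depend on which
server holds the file; the file itself is fetched afterwards via the
returned pointer.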


Nigel Stanger (apparently) - Aug 4, 2005 9:17 am (#3 Total: 7)  

Posts: 435
Re: Files vs. databases

On 4/8/2005 3:28 AM, "Rick Lavin" <rlavinpu-kumamoto.ac.jp> spake thus:

> moderately-sized wikis

Depends on what you mean by "moderately-sized". We've found with a little
experimentation that many DBMSs don't kick in with indexes until you reach
several (tens of? I can't remember offhand) thousand records. Up until the
point where the indexes kick in, you're effectively dealing with a giant
flat file system anyway (yes, I know that's an over-generalisation :)

I'm therefore not at all surprised by these results. If the wiki had several
million entries, however, *then* you would start seeing significant
differences in performance.
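
As a back-of-the-envelope illustration of why the gap only shows up at
scale, here is a crude cost model that counts key comparisons. It is an
assumption-laden sketch: it ignores disk I/O, caching and constant
factors, and it says nothing about how any particular DBMS optimizer
decides when to use an index.

```python
import math


def scan_comparisons(n):
    """Average comparisons to find one existing key by scanning n records."""
    return (n + 1) / 2


def index_comparisons(n):
    """Worst-case comparisons for a binary search over n sorted keys."""
    if n <= 0:
        return 0
    return math.floor(math.log2(n)) + 1


if __name__ == "__main__":
    for n in (100, 1_000, 1_000_000):
        print(n, scan_comparisons(n), index_comparisons(n))
```

Under this model a thousand records cost about 500 comparisons to scan
against 10 via the index, while a million cost about 500,000 against 20.
At the small end both are fast enough that a user would be unlikely to
notice the difference.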

--
Nigel Stanger, Dunedin, NEW ZEALAND.
http://public.xdi.org/=nigel.stanger


jwblist - Aug 9, 2005 12:37 pm (#4 Total: 7)  

Posts: 768
Re: Files vs. databases

On Aug 9, 2005, at 9:44 AM, Peter N Lewis wrote:

> While I'm not at all sold on the idea of one message per email (a database is a perfectly sensible thing to store data in after all!), I can't see why scanning several hundred thousand should take half an hour or more - not that it won't, I would not be surprised if it did, but it shouldn't. For example, I just did this:


It takes Retrospect about as long to *scan* the files on my disk for an incremental backup as it takes SuperDuper! to do a changed file copied clone.

It has always (back many versions and many machines) seemed to me that Retrospect scans slowly.

--John

Curtis Wilcox - Aug 9, 2005 12:43 pm (#5 Total: 7)  

Posts: 355
Re: Files vs. databases

From: TidBITS Talk [mailto:tidbits-talktidbits.com] On Behalf Of Peter N Lewis


> Whether any backup program could actually manage that, I don't know. rsync should, but rsync under Tiger is so broken that it has no chance of backing up my home directory.


The only thing I've heard is that while Tiger's version of rsync adds support for resource forks (in addition to other "extended attributes"), it always thinks resource forks have changed and copies them again. Is this what you're referring to, or are there other problems?

Curtis Wilcox (apparently) - Aug 9, 2005 10:45 pm (#6 Total: 7)  

Posts: 355
Re: Files vs. databases

> -----Original Message-----
> From: TidBITS Talk [mailto:tidbits-talktidbits.com] On
> Behalf Of jwblist

> It takes Retrospect about as long to *scan* the files on my disk for
> an incremental backup as it takes SuperDuper! to do a changed file
> copied clone.
>
> It has always (back many versions and many machines) seemed to me
> that Retrospect scans slowly.

I don't know how Retrospect works, but a possible justification for
slowness would be if it's performing a checksum on each file or some
other operation that involves actually checking whether the file
contents have changed. A file can change without its size changing, and
while the modification date typically changes when a file is modified, it
doesn't have to; it can even end up older than that of the previous copy.
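
To make the distinction concrete, here is a small sketch contrasting a
cheap metadata check with a content checksum. This is a guess at the
general technique, not a description of what Retrospect does; the
function names and the choice of SHA-256 are assumptions.

```python
import hashlib
import os


def metadata_signature(path):
    """Cheap check: size and modification time only, no file contents read."""
    st = os.stat(path)
    return (st.st_size, st.st_mtime_ns)


def file_digest(path, chunk_size=65536):
    """Expensive check: read the whole file and hash its contents."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

If a file is rewritten with different bytes of the same length and its
modification time is then reset, metadata_signature reports no change
while file_digest does. The price is that the checksum has to read every
byte of every file, which would make a scan slow.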

kevinv (apparently) - Aug 19, 2005 9:12 am (#7 Total: 7)  

Posts: 1370
Re: Files vs. databases

--On August 3, 2005 8:28:28 AM -0700 Rick Lavin
<rlavinpu-kumamoto.ac.jp> wrote:

> About the supposed desirability of databases: It's often assumed that a
> decent wiki engine has to make use of a database to work properly. But
> pmwiki <www.pmwiki.org> eschews a database, using flat files only. I've
> no idea whether it would be possible to run Wikipedia <www.wikipedia.org>
> on pmwiki, but the speed in everyday use with moderately-sized wikis
> suggests that the advantages of databases are limited.

I just finished moving a simple web app from flat-file storage to a real
database engine (MySQL). The app was a registration & payment system for a
small conference. In the first versions I stored all the registration info
in a single flat CSV file.

I moved to a database because the database engine provides features that I
just couldn't write myself using Perl and an OS file system.

One of the big features I wanted was better locking. In a web app you can't
tell how many instances of the app will be attempting to write to your data
file at the same time. I used the OS locking abilities as best I could, but
there were situations (such as assigning confirmation numbers) where I could
not really be sure I didn't hand out a duplicate confirmation number.

The database server handles this much better. I have a field that auto-numbers
and it handles the simultaneous writing.
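
A minimal sketch of the auto-numbering idea, using SQLite's
AUTOINCREMENT as a stand-in for the MySQL auto-number field described
above (MySQL's own syntax differs). The table and column names are
invented for illustration.

```python
import sqlite3


def open_registrations(path=":memory:"):
    """Open the database and make sure the (hypothetical) table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS registrations ("
        " confirmation_no INTEGER PRIMARY KEY AUTOINCREMENT,"
        " attendee TEXT NOT NULL)"
    )
    return conn


def register(conn, attendee):
    """Insert one registration and return the number the engine assigned."""
    with conn:  # commits on success, rolls back if the insert raises
        cur = conn.execute(
            "INSERT INTO registrations (attendee) VALUES (?)", (attendee,)
        )
    return cur.lastrowid
```

The application never computes the next number itself, so two
simultaneous requests cannot both read the same "last number" and hand
out a duplicate; the engine serializes the inserts.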

Along those lines, manual edits were difficult. I usually tried to do these
late at night, but there was a chance when I edited the file, someone would
come in and register and when I saved the edited file it would overwrite
those registrations.

If I had used MySQL's InnoDB tables (or a database that was truly ACID
compliant, such as PostgreSQL), I could've implemented locking down to the
row level and even established transaction handling. Transaction handling
would be useful as I have the ability for users to reserve a spot in a
session that has limited seating. With transactions I would define the
conference registration and session reservation as a single "transaction":
both would have to succeed or both fail. That way I don't have a user
confirmed for a session who isn't confirmed for the conference (or vice
versa).

I could also mention how much easier searching for items in the database is
now. I used to have to loop through every entry in the file looking for info.
Now the database server handles that for me.
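
For comparison, a small sketch of the two search styles side by side. The
CSV columns (name, email) and the function names are assumptions for
illustration, and SQLite again stands in for MySQL.

```python
import csv
import io
import sqlite3


def find_in_csv(csv_text, email):
    """Flat-file approach: read rows one by one until one matches."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["email"] == email:
            return row["name"]
    return None


def load_db(csv_text):
    """Load the same rows into an indexed in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE registrations (name TEXT, email TEXT)")
    conn.execute("CREATE INDEX idx_registrations_email ON registrations (email)")
    rows = [(r["name"], r["email"]) for r in csv.DictReader(io.StringIO(csv_text))]
    conn.executemany("INSERT INTO registrations VALUES (?, ?)", rows)
    conn.commit()
    return conn


def find_in_db(conn, email):
    """Database approach: describe what is wanted and let the engine find it."""
    row = conn.execute(
        "SELECT name FROM registrations WHERE email = ?", (email,)
    ).fetchone()
    return row[0] if row else None
```

Both functions return the same answer; the difference is that the loop
and its bookkeeping live in the application in the first case and in the
database engine in the second.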

Kevin


