TidBITS / TidBITS Talk

Files in databases

kevinv (apparently) - 08:17am Jun 27, 2006 PST
via email

I don't want to talk about WinFS specifically but is anyone aware of a
filesystem that actually stores the file data in the same database as the
metadata? Does anyone consider this a good idea?

Files are variable things. I probably have files ranging from 1KB to 4GB, so I
don't see how a database could be optimized to deal with records of
variable sizes very well (I know they can do them, I just don't think
they're good at them.)

Restoring a file now becomes a transaction rollback of a transaction that
took place days ago.

I know Microsoft's SharePoint product stores files in the database, and I
know people have struggled with it, especially with restoring files.

I for one am glad WinFS is dead, and I honestly don't think Apple will be
able to produce this type of filesystem either.


Kevin




edward (apparently) - Jun 28, 2006 12:41 pm (#1 Total: 6)  

via email

Re: Files in databases

At 08:17 06/27/06 -0700, Kevin van Haaren wrote:
>I don't want to talk about WinFS specifically

That's fine by me, since I'm not familiar with WinFS. ;-)

>but is anyone aware of a filesystem that actually stores the file data in
>the same database as the metadata?

Depends on what you mean by "database". One point of view can be that any
file system is a database, including indices, metadata, content data, and
free space allocation. Since there are generally accepted interfaces to
these kinds of databases, we might call them "filesystem databases", or
"databases with filesystem interfaces". So the context is better stated as
something like "stored in a database which is not a filesystem database".
One might also specify which of the many other potential attributes of
databases are included.

>Does anyone consider this a good idea?

In the context of the above, I would say I dislike the idea of files being
stored in a database which does not have a filesystem interface. However,
if the files are stored in a filesystem database which also has other
capabilities, that sounds like a good idea -- so long as those other
capabilities are strictly extensions and do not impose restrictions such as
the one you mention:

>Restoring a file now becomes a transaction rollback of a transaction that
>took place days ago.

That's not an acceptable restriction on a filesystem database.

There was a day, a decade or two ago, when it would have been important to
preserve the ability to go in with raw I/O tools to read and write a
damaged filesystem database. Even today, the robust state of data recovery
businesses indicates that there is still a need to read the filesystem
database without the benefit of the database software. I would not
consider this a reason to disqualify such systems; it just means the design
should ensure as much safety as existing systems, whether by documenting
the structure for data recovery companies, by ensuring adequate backups, or
by some other means.

Looking farther into the future, one might wonder whether it will
eventually be possible to define all data needing to be stored in terms of
tables and fields, completely bypassing the current concept of "file". We
are not even close to that now. We still need not only databases but text
files and code files. Flat data files, at least in concept, could mostly be
converted to database format now. Text files, code files, and some others
remain a considerably greater challenge.

Edward
Art works by Melynda Reid: http://paleo.org


Nigel Stanger (apparently) - Jun 28, 2006 12:41 pm (#2 Total: 6)  

via email - Dunedin, New Zealand

Re: Files in databases

On 28/6/2006 3:17 AM, "Kevin van Haaren" <kevinvanhaaren.net> spake thus:

> I don't want to talk about WinFS specifically but is anyone aware of a
> filesystem that actually stores the file data in the same database as the
> metadata? Does anyone consider this a good idea?
>
> Files are variable things. I probably have files ranging from 1KB to 4GB, so I
> don't see how a database could be optimized to deal with records of
> variable sizes very well (I know they can do them, I just don't think
> they're good at them.)

I can't speak for actual examples, but I can at least attack this from the
database side (more accurately, I can tell you how Oracle does it :) I
don't see any particular problem with storing the files in the database as
well, as long as the database engine is efficient. This was the big problem
with the original Be file system, IIRC, and why they dropped the database
idea in later versions. I was a little sad that they did so.

Almost every DBMS has to deal with variable size records, but in many cases
the variability in size is probably fairly small. If you're talking typical
business data with lots of text and numbers, then you're probably only
looking at a few tens of bytes per record anyway. However, there are plenty
of databases out there now with more "interesting" data in them such as
images and audio, which can cause dramatic variation in record sizes from
one table to the next.

Oracle (and I'm fairly certain a large number of other products) deals with
the issue of variable record size by ignoring it. OK, that's an
over-simplification :) It sets a fixed database block size (as distinct
from the disk block or OS block or record size), which is the smallest unit
that Oracle can read from or write to disk. From memory this defaults to
2KB, but can be any multiple of that up to about 32KB (our teaching system
is set to 8KB). If records are smaller than the block size, you get multiple
records per block, as many as will fit. If records are larger than the block
size, they get chopped into pieces that are stored one piece per block, and
chained together.
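A toy Python sketch of the layout described above may help. It assumes a hypothetical 8KB block size and ignores everything Oracle actually keeps in a block (headers, row directories, reserved free space); the names `Block` and `store` are illustrative only. Small records are packed into the current block while they fit, and a record larger than a block is cut into block-sized pieces linked through `next_block`.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

BLOCK_SIZE = 8 * 1024  # hypothetical block size in bytes (cf. the 8KB teaching system)


@dataclass
class Block:
    """One fixed-size block holding (record_id, bytes) pieces."""
    pieces: list = field(default_factory=list)
    used: int = 0
    next_block: Optional[int] = None  # index of the block with the next piece of a chained record

    def free(self) -> int:
        return BLOCK_SIZE - self.used


def store(records: dict[str, bytes]) -> list[Block]:
    """Pack records into fixed-size blocks.

    A record that fits is appended to the most recent block (or a new one);
    a record larger than a block is chopped into BLOCK_SIZE pieces, one piece
    per block, and those blocks are chained together via next_block.
    """
    blocks: list[Block] = []
    for rid, data in records.items():
        if len(data) <= BLOCK_SIZE:
            if not blocks or blocks[-1].free() < len(data):
                blocks.append(Block())  # start a new block when the record won't fit
            blocks[-1].pieces.append((rid, data))
            blocks[-1].used += len(data)
        else:
            prev_index = None
            for start in range(0, len(data), BLOCK_SIZE):
                chunk = data[start:start + BLOCK_SIZE]
                blocks.append(Block(pieces=[(rid, chunk)], used=len(chunk)))
                if prev_index is not None:
                    blocks[prev_index].next_block = len(blocks) - 1  # link previous piece to this one
                prev_index = len(blocks) - 1
    return blocks
```

With two small records (100 and 200 bytes) and one 20,000-byte record, the small ones share the first block and the large one spans three chained blocks.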

The trick then is to tune your block size to be a good fit with the typical
record size. You could base this on expected usage (e.g., large media files
vs. small text files), although that's probably difficult to predict on a
typical desktop system. Debian Linux does something like this during the
install process: you can nominate your typical disk usage scenario, and it
sets the number of inodes to match (so IIRC you can have fewer inodes on a
partition that will be mainly used for large media files, for example).
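To see why block size is a tuning decision, here is a rough Python estimate of how much allocated space actually holds data. The model is an assumption, not Oracle's allocator: small records fill blocks sequentially, oversized records get their own chained run of blocks, and block overhead is ignored. The workload in the `__main__` section is made up.

```python
from __future__ import annotations

import math


def utilisation(record_sizes: list[int], block_size: int) -> tuple[int, float]:
    """Estimate (blocks used, fraction of allocated space holding data).

    Simplified model: small records are appended to the block currently being
    filled until it is full; a record bigger than a block gets its own run of
    chained blocks. Block headers and reserved free space are ignored.
    """
    blocks = 0
    free_in_current = 0  # bytes left in the block being filled with small records
    for size in record_sizes:
        if size > block_size:
            blocks += math.ceil(size / block_size)  # one chained piece per block
        elif size <= free_in_current:
            free_in_current -= size                 # fits in the current block
        else:
            blocks += 1                             # open a fresh block
            free_in_current = block_size - size
    if blocks == 0:                                 # empty workload: nothing allocated
        return 0, 0.0
    return blocks, sum(record_sizes) / (blocks * block_size)


if __name__ == "__main__":
    # Hypothetical workload: many small text records plus one larger media record.
    workload = [100] * 100 + [20_000]
    for bs in (2 * 1024, 8 * 1024, 32 * 1024):
        used, fill = utilisation(workload, bs)
        print(f"{bs // 1024:>2}KB blocks: {used:>2} blocks, {fill:.0%} full")
```

For that made-up mix, 8KB blocks leave the most space empty (about 73% full, versus roughly 98% at 2KB and 92% at 32KB), which illustrates the point above: the right size depends on the distribution of record sizes, not on any single rule.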

However, the fixed-block-size technique falls apart when you're dealing with
really large multi-gigabyte chunks of data, especially if there's an upper
limit on the block size. In Oracle's case, even with the maximum block size
you could end up with many thousands of blocks just to hold one record. I'm
a little sketchy on this, but my understanding is that Oracle and other
DBMSs store such items (Oracle refers to them as "large objects") in a
separate part of the database that's optimised for large data. So the big
files and the small files are effectively segregated within the database and
accessed in different ways.
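To put a number on "many thousands": at a 32KB block size, a 4GB file (taking 1GB as 2^30 bytes) needs 4 × 2^30 / 2^15 = 2^17 = 131,072 blocks. The Python sketch below illustrates the segregation idea in general terms; it is not Oracle's LOB implementation. The 4000-byte cut-off, the `Store` class and the `("lob", locator)` marker are hypothetical: values at or under the limit stay in the row, larger ones move to a separate area, and the row keeps only a locator.

```python
from __future__ import annotations

import itertools
from dataclasses import dataclass, field

INLINE_LIMIT = 4000  # hypothetical cut-off (bytes) between in-row data and the large-object area


@dataclass
class Store:
    """Toy store that segregates big values from small ones."""
    rows: dict = field(default_factory=dict)      # name -> bytes, or ("lob", locator)
    lob_area: dict = field(default_factory=dict)  # locator -> bytes, kept apart from the rows
    _next_locator: itertools.count = field(default_factory=itertools.count)

    def put(self, name: str, data: bytes) -> None:
        old = self.rows.get(name)
        if isinstance(old, tuple):
            del self.lob_area[old[1]]             # free the previous out-of-line value
        if len(data) <= INLINE_LIMIT:
            self.rows[name] = data                # small value lives in the row itself
        else:
            locator = next(self._next_locator)
            self.lob_area[locator] = data         # large value goes to the separate area
            self.rows[name] = ("lob", locator)    # the row keeps only a locator

    def get(self, name: str) -> bytes:
        value = self.rows[name]
        if isinstance(value, tuple):              # follow the locator for large objects
            return self.lob_area[value[1]]
        return value
```

The two areas can then be laid out and tuned independently, which is the point of keeping big and small data apart.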

That's probably the best way to handle it: have one database for the file
system but partition it internally into areas optimised for different sizes
of files. You could even make the database partitions correspond to disk
partitions (or even separate drives). These are all standard tuning tricks
that have been around for years in large-scale databases, but have only
recently started to appear at the lower end.
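A minimal Python sketch of that routing step. The size-class boundaries, partition names and per-partition block sizes are all hypothetical; a real system would derive them from observed usage. `bisect.bisect_left` finds the first class whose (inclusive) upper bound is at least the file size.

```python
import bisect

# Hypothetical size classes: inclusive upper bounds in bytes for all but the last class.
SIZE_LIMITS = [64 * 1024, 16 * 1024 * 1024]
PARTITIONS = ["small_files", "medium_files", "large_files"]
# Hypothetical per-partition block sizes; each partition could sit on its own disk.
BLOCK_SIZES = {"small_files": 2 * 1024, "medium_files": 32 * 1024, "large_files": 1024 * 1024}


def place(size: int) -> tuple:
    """Return (partition name, block size) for a file of `size` bytes."""
    name = PARTITIONS[bisect.bisect_left(SIZE_LIMITS, size)]
    return name, BLOCK_SIZES[name]
```

Keeping the size classes as data rather than code means the boundaries can be retuned, or a partition moved to a separate drive, without touching the routing logic.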

--
Nigel Stanger, Dunedin, NEW ZEALAND.
http://xri.net/=nigel.stanger


John C. Welch (apparently) - Jul 1, 2006 11:46 am (#3 Total: 6)  

via email

Re: Files in databases

On 6/27/06 10:17, "Kevin van Haaren" <kevinvanhaaren.net> wrote:

> I don't want to talk about WinFS specifically but is anyone aware of a
> filesystem that actually stores the file data in the same database as the
> metadata? Does anyone consider this a good idea?

Depends on what you're doing. IBM's had great success with it for what,
30-40 years now? The AS/400 aka iSeries uses a database for the file system.
But the architecture is tuned around that.

If you're doing tons of database-type work, it's great. If you need
ultimate speed for, say, video or 3-D, then not so good.

--
John C. Welch Writer/Analyst
Bynkii.com Mac and other opinions
jwelchbynkii.com



kevinv (apparently) - Jul 1, 2006 11:46 am (#4 Total: 6)  

via email

Re: Files in databases

--On June 28, 2006 12:41:27 PM -0700 Edward Reid <edwardpaleo.org> wrote:

> Depends on what you mean by "database". One point of view can be that any
> file system is a database, including indices, metadata, content data, and
> free space allocation. Since there are generally accepted interfaces to
> these kinds of databases, we might call them "filesystem databases", or
> "databases with filesystem interfaces". So the context is better stated as
> something like "stored in a database which is not a filesystem database".
> One might also specify which of the many other potential attributes of
> databases are included.

For file systems I think of the directory data as a database with pointers
to fixed-size blocks that hold the data. Most of the file systems I've used
tend to separate the metadata and real data (not that I'm a file system
expert.)

I agree that these really seem to be well optimized, and I'm not really
sure what problems database filestores are supposed to fix. I do support a
file management system that uses a real database for customizable
metadata, but it too keeps the files out of the database, storing only
pointers to the
files themselves.
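The design described here, a real database for customizable metadata with the file contents left on the ordinary filesystem, can be sketched with Python's built-in `sqlite3` module. The schema (a generic key/value `metadata` table), the function names and the sample paths are hypothetical and not taken from any particular product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE assets (
    id   INTEGER PRIMARY KEY,
    path TEXT NOT NULL UNIQUE        -- pointer to the file on an ordinary filesystem
);
CREATE TABLE metadata (
    asset_id INTEGER NOT NULL REFERENCES assets(id),
    key      TEXT NOT NULL,          -- customizable field name
    value    TEXT,
    PRIMARY KEY (asset_id, key)
);
""")


def add_asset(path: str, **fields: str) -> int:
    """Register a file by path and attach arbitrary metadata fields to it."""
    cur = conn.execute("INSERT INTO assets (path) VALUES (?)", (path,))
    conn.executemany(
        "INSERT INTO metadata (asset_id, key, value) VALUES (?, ?, ?)",
        [(cur.lastrowid, k, str(v)) for k, v in fields.items()],
    )
    conn.commit()
    return cur.lastrowid


def find_paths(key: str, value: str) -> list:
    """Return the paths of files whose metadata field `key` equals `value`."""
    rows = conn.execute(
        "SELECT a.path FROM assets AS a JOIN metadata AS m ON m.asset_id = a.id "
        "WHERE m.key = ? AND m.value = ? ORDER BY a.path",
        (key, value),
    )
    return [r[0] for r in rows]
```

Because the database holds only paths, a file moved or renamed outside the system leaves a dangling pointer; that is the trade-off for keeping file contents where the filesystem already handles them well.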


Nik (apparently) - Jul 3, 2006 3:52 pm (#5 Total: 6)  

via email

Re: Files in databases

On Jun 27, 2006, at 9:17 AM, Kevin van Haaren wrote:

> I don't want to talk about WinFS specifically but is anyone aware of a
> filesystem that actually stores the file data in the same database as the
> metadata? Does anyone consider this a good idea?

The late BeOS used a database-like filesystem that incorporated
extensible metadata, instantaneous searching, and basically
everything else we've come to associate with the promise of WinFS.

From a user perspective (I never developed on Be, just tried it out
some on my Mac), the use of user-definable metadata was extremely
cool. Simple databases like address books or MP3 libraries could be
created in the filesystem itself, with no need for programming or
anything else. This metadata was then available between programs, and
was indeed used to, for example, link contacts with emails.
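User-defined attributes shared between programs can be modelled roughly as per-file attribute maps plus per-attribute indexes. This toy Python model is an illustration under stated assumptions, not BFS's on-disk structures or query language; the `AttrFS` class, the attribute names and the paths are made up. An equality query on an indexed attribute becomes a dictionary lookup rather than a scan over every file.

```python
from collections import defaultdict


class AttrFS:
    """Toy filesystem model: files carry user-defined attributes, some indexed."""

    def __init__(self, indexed):
        self.attrs = {}                                     # path -> {attribute: value}
        self.indexed = set(indexed)                         # names of indexed attributes
        self.index = defaultdict(lambda: defaultdict(set))  # attribute -> value -> paths

    def set_attr(self, path, name, value):
        file_attrs = self.attrs.setdefault(path, {})
        if name in self.indexed and name in file_attrs:
            self.index[name][file_attrs[name]].discard(path)  # drop the stale index entry
        file_attrs[name] = value
        if name in self.indexed:
            self.index[name][value].add(path)

    def query(self, name, value):
        """Equality search: index lookup if `name` is indexed, otherwise a full scan."""
        if name in self.indexed:
            return sorted(self.index[name].get(value, ()))
        return sorted(p for p, a in self.attrs.items() if a.get(name) == value)
```

For example, a contact file and a mail message that carry the same `email` value are both returned by a single query on that attribute, which is one way such a system can link contacts with emails.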

Searching was the best part, as it was truly instantaneous! I was
running BeOS on a 200 MHz Mac and could get search results in a
fraction of the time it takes Spotlight to return a search on my 1 GHz
G4!

I'd love to see this appear in Mac OS X. I can easily see the Finder
turning into something a bit more like iTunes and a little less
physical. If performance is better and I can find my files more easily,
I'm all for it.

--Nik

anjtc - Jul 6, 2006 7:12 am (#6 Total: 6)  

Re: Files in databases

Nik (apparently) said:

"I can easily see the Finder turning into something a bit more like iTunes"

Yes, I'm quite certain I'm far from the first to think of this, but one time as I was using iTunes, effortlessly locating the songs I was looking for, I remember thinking "You know, you could have something pretty much exactly like iTunes, except with all kinds of files, not just songs." So you'd use a combination of the search function and the different column headers to find the files you wanted, plus custom "playlists" and smart folders to group files together.
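The iTunes-style browser described here boils down to a catalogue of file records, a search box with sortable columns, and "smart" groups defined by saved rules. A minimal Python sketch follows; the `Item`, `browse` and `SmartFolder` names and the catalogue entries are hypothetical, and a real tool would fill the catalogue from the filesystem rather than by hand.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from typing import Callable, Iterable


@dataclass(frozen=True)
class Item:
    name: str
    kind: str      # e.g. "Song", "PDF", "Spreadsheet"
    size: int      # bytes
    modified: date


def browse(items: Iterable[Item], text: str = "", sort_by: str = "name") -> list[Item]:
    """Search-box filter on the name, then sort by one column, as in iTunes."""
    hits = [i for i in items if text.lower() in i.name.lower()]
    return sorted(hits, key=lambda i: getattr(i, sort_by))


@dataclass
class SmartFolder:
    """A saved rule; its contents are recomputed every time it is opened."""
    name: str
    rule: Callable[[Item], bool]

    def contents(self, items: Iterable[Item]) -> list[Item]:
        return browse(i for i in items if self.rule(i))
```

A smart folder stores only its rule, so its contents are recomputed each time it is opened, much like an iTunes smart playlist.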

And it wouldn't have to replace the Finder, either -- it could just be a stand-alone file browser.

- AJ


