[aklug] Re: Any MongoDB peeps on this list?

From: Shane R. Spencer <shane@bogomip.com>
Date: Thu Apr 28 2011 - 11:44:11 AKDT

On 04/28/2011 11:20 AM, Tim Johnson wrote:
> * Shane R. Spencer<shane@bogomip.com> [110428 10:40]:
>> See.. I'm unorganized about this. But I'm very interested in what other people have to
>> say. Using LIMIT with an OFFSET isn't really an option since it requires scanning from
>> the first result on.. that can be a little crazy with a few million rows. Instead I'm
>> hoping to see some ideas that use multi-column indexing like I've been using and querying
>> from the last row (via the column information) on.
>
> Shane, I am laboring over a hot keyboard right now and don't have
> a lot of time to respond, but what you say above doesn't seem
> quite right. If you have good indexing set, the OFFSET clause
> should *not* necessitate scanning the entire table. There was a
> time when I build DBMSs from scratch and my indexing/offset
> routines certainly did not operate like that and since the
> developer of MySQL (and I presume MongoDB) are light years ahead
> of me, I am sure that the same applies.
>

Setting the starting point for an index specifically by value seems to help. Offset
almost always makes horribly slow queries even if there are no filters.

Pudding #1 (skip 100000 rows): http://dpaste.com/hold/536941/
   Offset = 0.27 sec
   Range = 0.0 sec

Pudding #2 (skip 500000 rows): http://dpaste.com/hold/536946/
   Offset = 1.6 sec
   Range = 0.0 sec

The query time increases as you skip rows.. to me that seems like behind the scenes it's
spending too much time scanning the index for a specific row (minimum to maximum index
key) instead of starting at a specific point using the BTREE index as it should be used.

This is because an item count isn't stored per BTREE node.. otherwise it would be horribly
fast!

You can do a full table scan on an index if you don't tell it to start (start at 0)

- Shane
---------
To unsubscribe, send email to <aklug-request@aklug.org>
with 'unsubscribe' in the message body.
Received on Thu Apr 28 11:43:52 2011

This archive was generated by hypermail 2.1.8 : Thu Apr 28 2011 - 11:43:52 AKDT