[aklug] Re: Any MongoDB peeps on this list?

From: Shane R. Spencer <shane@bogomip.com>
Date: Thu Apr 28 2011 - 10:30:13 AKDT

Oh, I know all about normalization theory.. I'm testing a few things before I post to the
list. I've been working a lot on making queries 'streamy' vs 'bulk' in an attempt to
reduce latency at the cost of more indexing. This applies to both relational and
non-relational databases.

The biggest thing going back and forth in my head right now is the idea that you should
use different interfaces for different access methods. What I'm struggling with, it
seems, is low-latency interaction methods being used for bulk requests.

For instance, it's easy to paginate using range-based scanning (without using an offset),
and even easier to store a page ID for a specific sort order and simply query that (I
love post-processing). That all goes to poop when you start filtering the data a bit
more, which is where I've come up with some other methods: bulking range scans into
temporary tables and selecting the last row, to remove some of the network latency of
returning large pagination sets that the app subsequently ignores, because it only needed
them to scan to a certain page.
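
To make the range-based scanning concrete, here's a minimal sketch, assuming for
illustration that indexy has a unique, indexed id column (hypothetical.. it may not) and
that 12345 was the id of the last row on the current page:

-- seek past the last-seen id instead of using OFFSET
indextest=# select * from indexy where id > 12345
     order by id limit 100;

The index seeks straight to the start of the next page no matter how deep into the result
set you are, which is the whole point of skipping the offset.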

When sorting on multiple columns, you have to return those columns to the application,
including a unique ID per row as the last sort key. Then you have to select the last row
(knowing the count instantly via a separate index query) by using an offset without
sorting (very fast when the rows are stored in their natural order).
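
Roughly what I mean, assuming the filtered rows were already offloaded in their final
sort order into a temporary table.. call it filtered_tmp (a hypothetical name; building
it is sketched further down):

-- total rows in the filtered set
indextest=# select count(*) from filtered_tmp;
-- last row of the set: offset = count - 1 (say the count came back 5000),
-- fetched without sorting since the rows are stored in natural order
indextest=# select * from filtered_tmp offset 4999 limit 1;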

So I keep flip-flopping between the limitations.. there are tons of different ideas on
how to deal with paginating a filtered query, including offloading the results to a
temporary table.. but when you do that, you can end up with a VERY large set of duplicate
data if you don't apply any limits.
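
For what it's worth, the offload I keep coming back to looks roughly like this, using the
same hypothetical filtered_tmp (the filter on b is made up for illustration); the LIMIT
is exactly the guard against that unbounded duplicate set:

indextest=# create temporary table filtered_tmp as
     select * from indexy
     where b % 3 = 0        -- stand-in for the real filter
     order by a, b
     limit 10000;           -- cap the duplicated data up front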

I'm open to suggestions. I currently REALLY like using capped collections in MongoDB to
store filtered results, since they let me get the last row of a query quickly. Setting
the cap to just over the maximum size of a returned row lets MongoDB throw away
everything else quickly.. it's a shame it has to flush to disk in order to do so. (Or
does it.. ... ...)

See.. I'm unorganized about this. But I'm very interested in what other people have to
say. Using LIMIT with an OFFSET isn't really an option, since it requires scanning from
the first result onward.. that can get a little crazy with a few million rows. Instead
I'm hoping to see some ideas that use multi-column indexing like I've been using:
querying onward from the last row, via its column values.

-- next 100 rows in (a, b) order, starting at a:100, b:10
indextest=# select * from (select * from indexy where (a = 100 and b >= 10)
     order by a, b limit 100) as uno
   union all
   select * from (select * from indexy where a > 100
     order by a, b limit 100) as dos
   limit 100;

This works very, very well if I want to select all rows starting at a:100, b:10 and
continue the scan. It limits the final result to 100 items and allows either subquery to
max out at 100. However, knowing where to start when you jump to a specific page number
in your final filtered query requires quite a bit of scanning and sorting (thankfully
only needed once per new offset). Otherwise your application can simply use the last row
of the page you are viewing as the start of the next page, or the end of the previous one.
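
And for completeness, a sketch of the previous-page version of the same query.. flip the
comparisons and the sort direction, then re-sort the page for display:

indextest=# select * from (
     select * from (select * from indexy where (a = 100 and b < 10)
       order by a desc, b desc limit 100) as uno
     union all
     select * from (select * from indexy where a < 100
       order by a desc, b desc limit 100) as dos
     order by a desc, b desc limit 100) as prev_page
   order by a, b;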

How about using cache coherency? Cache up to 5 pages in both directions.. this seems like
a great idea initially, but it's difficult to determine the boundary. You could use an
in-memory key/value store with a query identifier to cache the offsets, which helps other
users get at those pages quickly.. however, you have to invalidate it with a somewhat
complex invalidator that queries for new inserts (and their relations) across all
currently open queries. Tradeoffs.. tradeoffs.. tradeoffs..

This topic itself isn't specifically MongoDB related.. I was just hoping to find a
MongoDB user in the area to bounce non-relational-database ideas off of.

On 04/28/2011 09:55 AM, Tim Johnson wrote:
> * Piet van Weel<pmvw@outwardfocus.net> [110428 09:08]:
>> Well Shane... Since the vast majority of us are IT Folk...
>> We have advanced degrees in BS, Procrastination, and various other fields.
> 1. I have BS in BS and a Masters degree in Obfuscation.
> 2. Do not start by studying a particular software package
> 3. Start by studying Normalization Theory
> 4. GOTO 1.
