[ACCEPTED]-How do databases work internally?-internals

Accepted answer
Score: 87

What does a database actually do to find 62 out what matches a select statement?

To 61 be blunt, it's a matter of brute force. Simply, it 60 reads through each candidate record in the 59 database and matches the expression to the 58 fields. So, if you have "select * from table 57 where name = 'fred'", it literally runs 56 through each record, grabs the "name" field, and 55 compares it to 'fred'.

Now, if the "table.name" field 54 is indexed, then the database will (likely, but 53 not necessarily) use the index first to 52 locate the candidate records to apply the 51 actual filter to.

This reduces the number 50 of candidate records to apply the expression 49 to, otherwise it will just do what we call 48 a "table scan", i.e. read every row.

But 47 fundamentally, however it locates the candidate 46 records is separate from how it applies 45 the actual filter expression, and, obviously, there 44 are some clever optimizations that can be 43 done.

How does a database interpret a join differently 42 to a query with several "where key1 = key2" statements?

Well, a 41 join is used to make a new "pseudo table", upon 40 which the filter is applied. So, you have 39 the filter criteria and the join criteria. The 38 join criteria is used to build this "pseudo 37 table" and then the filter is applied against 36 that. Now, when interpreting the join, it's 35 again the same issue as the filter -- brute 34 force comparisons and index reads to build 33 the subset for the "pseudo table".

How does 32 the database store all its memory?

One 31 of the keys to good database is how it manages 30 its I/O buffers. But it basically matches 29 RAM blocks to disk blocks. With the modern 28 virtual memory managers, a simpler database 27 can almost rely on the VM as its memory 26 buffer manager. The high end DB'S do all 25 this themselves.

How are indexes stored?

B+Trees 24 typically, you should look it up. It's a 23 straight forward technique that has been 22 around for years. It's benefit is shared 21 with most any balanced tree: consistent 20 access to the nodes, plus all the leaf 19 nodes are linked so you can easily traverse 18 from node to node in key order. So, with 17 an index, the rows can be considered "sorted" for 16 specific fields in the database, and the 15 database can leverage that information to 14 it benefit for optimizations. This is distinct 13 from, say, using a hash table for an index, which 12 only lets you get to a specific record quickly. In 11 a B-Tree you can quickly get not just to 10 a specific record, but to a point within 9 a sorted list.

The actual mechanics of storing 8 and indexing rows in the database are really 7 pretty straight forward and well understood. The 6 game is managing buffers, and converting 5 SQL in to efficient query paths to leverage 4 these basic storage idioms.

Then, there's 3 the whole multi-users, locking, logging, and 2 transactions complexity on top of the storage 1 idiom.

Score: 4
  • What does a database actually do to find 11 out what matches a select statement?

    DBs 10 are using indexes(see below)

  • How does a database 9 interpret a join differently to a query 8 with several "where key1 = key2" statements? Join 7 Operations can be translated to binary tree 6 operations by merging trees.

  • How does the 5 database store all its memory?

    memorymapped files for faster 4 access of their data

  • How are indexes stored?

    Internally 3 DBs are working with B-Trees for indexing.

This 2 should be explained in greater details on 1 wikipedia..



Score: 1

In addition to reading, it can be instructive 6 to use the DB tools to examine the execution 5 plan that the database uses on your queries. In 4 addition to getting insight into how it 3 is working, you can experiment with techniques 2 to optimize the queries with a better feedback 1 loop.

Score: 0

Saif, excellent link. A bird's eye overview 22 that manages to cover most topics, and provide 21 details on specific vendor implementations.

I 20 made three tries at writing an explanation, but 19 this is really too big a topic. Check out 18 the Hellerstein article (the one on the 17 berkeley server that Saif linked to), and 16 then ask about specifics.

It's worth noting 15 that only a subset of "known good ideas" is 14 implemented in any given DBMS. For example, SQLite 13 doesn't even do hash joins, it only does 12 nested loops (ack!!). But then, it's an 11 easily embeddable dbms, and it does its 10 work very well, so there's something to 9 be said for the lack of complexity.

Learning 8 about how a DBMS gathers statistics and 7 how it uses them to construct query plans, as 6 well as learning how to read the query plans 5 in the first place, is an invaluable skill 4 -- if you have to choose one "database internals" topic 3 to learn, learn this. It will make a world 2 of difference (and you will never accidentally 1 write a Cartesian product again... ;-)).

Score: 0

If you want to know more in detail, I'd 11 recommend getting the sqlite sources and 10 having a look at how it does it. It's complete, albeit 9 not at the scale of the larger open source 8 and commercial databases. If you want to 7 know more in detail I recommend The Definitive Guide to SQLite which is 6 not only a great explanation of sqlite, but 5 also one of the most readable technical 4 books I know. On the MySQL side, you could 3 learn from MySQL Performance Blog as well as on the book front 2 the O'Reilly High Performance MySQL (V2) of which the blog is 1 one of the authors.

More Related questions