[ACCEPTED]-SQL Joins vs Single Table : Performance Difference?-normalization
Keep the Database normalised UNTIL you have 9 discovered a bottleneck. Then only after 8 careful profiling should you denormalise.
In 7 most instances, having a good covering set 6 of indexes and up to date statistics will 5 solve most performance and blocking issues 4 without any denormalisation.
Using a single 3 table could lead to worse performance if 2 there are writes as well as reads against 1 it.
Michael Jackson (not that one) is famously believed to have said,
- The First Rule of Program Optimization: Don't do it.
- The Second Rule of Program Optimization – For experts only: Don't do it yet.
That was 46 probably before RDBMSs were around, but 45 I think he'd have extended the Rules to 44 include them.
Multi-table SELECTs are almost 43 always needed with a normalised data model; as 42 is often the case with this kind of question, the 41 "correct" answer to the "denormalise?" question 40 depends on several factors.
The 39 relative performance of multi- vs single-table 38 queries is influenced by the platform on 37 which your application lives: the level 36 of sophistication of the query optimisers 35 can vary. MySQL, for example, in my experience, is 34 screamingly fast on single-table queries 33 but doesn't optimise queries with multiple 32 joins so well. This isn't a real issue with 31 smaller tables (less than 10K rows, say) but 30 really hurts with large (10M+) ones.
Data 29 volume
Unless you're looking at tables in 28 the 100K+ row region, there pretty much 27 shouldn't be a problem. If you're looking 26 at table sizes in the hundreds of rows, I 25 wouldn't even bother thinking about indexing.
The 24 whole point of normalisation is to minimise 23 duplication, to try to ensure that any field 22 value that must be updated need only be 21 changed in one place. Denormalisation breaks 20 that, which isn't much of a problem if updates 19 to the duplicated data are rare (ideally 18 they should never occur). So think very 17 carefully before duplicating anything but 16 the most static data, Note that your database 15 may grow significantly
What 14 performance requirements are you trying 13 to meet? Do you have fixed hardware or a 12 budget? Sometimes a performance boost can 11 be most easily - and even most cheaply - achieved 10 by a hardware upgrade. What transaction 9 volumes are you expecting? A small-business 8 accounting system has a very different profile 7 to, say, Twitter.
One last thought strikes 6 me: if you denormalise enough, how is your 5 database different from a flat file? SQL 4 is superb for flexible data and multi-dimensional 3 retieval, but it can be an order of magnitude 2 (at least) slower than a straight sequential 1 or fairly simply indexed file.
We leave query optimisation up to the database 21 for the same reasons we leave code optimisation 20 up to the compiler.
Most modern RDBMSes 19 are pretty good in this respect these days.
Before 18 you think that denormalisation is 'ok' in 17 some cases, consider this: normally you 16 are not interested in every attribute. Therefore 15 loading unneeded data off the disk is inefficient 14 (typically the least efficient component 13 of the database). This can be much worse 12 if you have a denormalised design, with 11 lots of redundant data in a row. Even worse 10 again if you have to then update all that 9 redundant data. It can be much more efficient 8 to load some narrow tables containing only 7 the columns of interest and join them. Again, this 6 depends on the database, so without profiling 5 you have no clue.
If you are really worried 4 about performance, you're probably talking 3 scalability issues. In this case you might 2 want to look at sharding, for which proper (normalised) schema 1 design is important.
There is a cost to decomposing tables for 32 the sake of normalization. There is a performance 31 component to that cost. The performance 30 cost of decomposing tables and joining data 29 in queries can be kept low by: using a 28 good DBMS; designing tables right; designing 27 indexes right; letting the optimizer do 26 its job; and tuning the DBMS specific features 25 of physical design.
There is also a cost 24 to composing large tables that materialize 23 joins. The cost in terms of update anomalies 22 and programming difficulties is outlined 21 in good tutorials on normalization. There 20 is also a performance cost to composing 19 tables. In many DBMS products, loading 18 a very big row into memory costs more than 17 loading a smaller row. When you compose 16 very wide tables, you end up forcing the 15 DBMS to read very big rows, only to discard 14 most of the data read into memory. This 13 can slow you down even more than normalization 12 does.
In general, don't denormalize at random. When 11 necessary, use a design discipline that 10 has been tested by people who went before 9 you, even if that discipline results in 8 some denormalization. I recommend star 7 schema as such a discipline. It has a lot 6 going for it. And there are still plenty 5 of situations where a normalized design 4 works better than a star schema design.
Learning 3 more than one set of design principles and 2 learning when to use which set is the second 1 stage of learning to be an expert.
With the proper indexes set up, your joins 15 can perform very quickly. Use SQL Profiler 14 to determine what indexes need to be created 13 or altered to optimize performance of your 12 common queries. Be sure to have a maintenance 11 plan set up for your database to run once 10 a week (or every day for tables that change 9 a lot) that updates your statistics and 8 indexes.
Normalization is normally preferred 7 over keeping data in multiple locations. There 6 are scenarios where insert/update does not 5 need to occur quickly and select need to 4 occur very quickly in which case you could 3 be better off without normalization. Even 2 so, premature optimization is not recommended 1 so go with a normalized structure first.
One of the ultimate hyperoptimizations available 9 through some of the cloud sites is, in fact, using 8 a smaller number of wider, limited-capability 7 tables for efficiency. So far in the future 6 if you need to wildly scale, this is one 5 way. But it's not considered desirable practice 4 for any relational dbms (which those aren't).
If 3 you're having performance problems, there 2 are a lot of things to work on first, before 1 any kind of denormalizing.
More Related questions