[ACCEPTED]-Should I add this new column to customers table or to a separate new table?-database-design

Accepted answer
Score: 16

I would suggest that you should do this 99 as a second, separate table.

The reasoning 98 is that, as you suggest in your question, only 97 approximately 10% of your customers need 96 these "updates" and therefore 95 approximately 90% of the records from the 94 "customers" table will have a 93 field always containing a NULL value, if 92 you do it as an additional field on the 91 same customers table. Implementing this 90 as a second table avoids this issue.

This 89 isn't really a big issue since your customers 88 table is very small, but the more important 87 issue I would think about when designing 86 something like this is future-proofing.

Basically I might 85 ask myself:

"Would I, at any point in 84 the future, need to know about the customer's history of 83 updates rather than just the very last one?"

Depending 82 upon the application of this (and it sounds 81 like a business-driven program from what 80 you're saying) there may well be a need 79 to examine customer update history. Think 78 Management Information, Reporting, Yearly 77 summaries etc.)

In pretty much all of the 76 business applications I've ever wrote, I've 75 had to retain pretty much everything for 74 at least a few years (it then usually gets 73 archived off to a data warehouse or separate 72 database) for these exact purposes.

Even 71 if you're not interested in the customers 70 history of updates, I personally, prefer 69 the 2 table approach as it certainly allows 68 for historical record keeping, and offers 67 a better designed approach (as only some 66 records from the customers table will need 65 records in the 2nd "updates" table). That 64 said, however, see my EDIT below for further 63 information. If I knew that a history would 62 never be required to this data, I'd implement 61 as a single additional field on the existing 60 customer's table.

Also, don't worry about 59 having a "bunch of very small tables" in your database. There can usually 58 be very good reason for having them, and 57 is a part of the object-relational impedance mismatch and is usually overcome 56 by a more "cohesive" object-oriented 55 design in your application's code.


(In response 54 to the comments on my answer).

Aaron Bertrand makes a very 53 valid point in that, if you were to have 52 many instances of additional pieces of data 51 just like this, and you continually used 50 a separate table each time linked via primary 49 key, you would end up with a plethora of 48 very small tables to hold data about a single 47 one of your customers. In querying the 46 database to extract a complete set of data 45 for even one customer, this could become 44 exceptionally cumbersome and overburdened 43 with excessive and inefficient JOINS over many 42 tables.

Depending upon the nature of the 41 "additional" pieces of data, one 40 has to make a pragmatic decision as to how 39 it will be implemented. Aaron suggests 38 that, in the case of the "LastUpdate" date 37 field, having lots of NULL's in 90% of the customer 36 table is no bad thing, and I do agree with 35 him here in that, from the perspective of 34 NULL's, this isn't a bad thing. My own suggestion 33 to utilise a 2 table approach was not so 32 much based upon the desire to remove the 31 NULL's (although it does accomplish that), but 30 rather to ensure that a history of "LastUpdate" dates 29 can be maintained.

Of course, if keeping 28 a history is entirely not required (and 27 bear in mind that what is not required today 26 may well be required tomorrow) that implementing 25 this "LastUpdate" date as an additional 24 field on the same "Customer" table 23 would be fine. In fact, if there was only 22 ever a direct one-to-one relationship between 21 a single customer and a single "last 20 update" date, splitting it out into 19 2 tables would be the wrong thing to do. In 18 this case, I'd implement it as an additional 17 field on the customers table, as it's now 16 a scalar valued property of that customer.

Of 15 course, you can always implement today as 14 a single field on the customer table, and 13 if it becomes a requirement (let's say a year 12 down the line) to now keep a history of "last 11 update" dates, you can always refactor 10 your database to split this into a 2nd table, but 9 bear in mind that you're only collecting 8 historical data from that point on as you 7 will have no record of the previous year's 6 last update dates.

If you do (or will) require 5 a history of "last update" dates 4 (i.e. one-to-many relationship between customer 3 records and "last update" date 2 records) then using a 2nd table approach, linked 1 by primary key is your only choice.

Score: 7

"When in doubt, do the simplest thing that 17 could possibly work" - Ward Cunningham

There's 16 times where I'd say "add the second table" but 15 in this case I don't feel it's justified. As 14 I understand it there's no requirement to 13 maintain a history of values for this attribute. The 12 table is small. And, ultimately, what you've 11 got is an attribute of a customer. Sure, not 10 all of them will be populated but to me 9 that's a secondary consideration. Lots 8 of fields have NULL values in the majority 7 of cases, but that doesn't mean you should 6 necessarily create a second table to hold 5 them. Keep it as simple as possible (and 4 as normalized) as possible, but no simpler 3 (or normal-er :-). So if it was me I'd 2 add these fields to the CUSTOMERS table. YMMV.

Share 1 and enjoy.

Score: 3

I would go with option 2.

I don't like 24 having columns like SendUpdates. IMO, it's better 23 to have that stored in the existance of 22 a row in a different table.

SELECT * FROM customer_updates_sending;

is simpler 21 and faster than

SELECT * FROM  customers WHERE SendUpdates = 1;

Further thoughts in reaction 20 to comment:

Yes, I would advocate for creating 19 additional tables for additional attributes 18 provided that each attribute is associated with a different and unrelated task or action. Adding attributes that aren't associated 17 with a separate task should be added to 16 the first table (middle name of the buyer 15 for example).

In this case, the attribute 14 (timestamp) is associated with a task (contacting 13 the customer). All information related 12 to contacting the customer should go in 11 that table. (contact address for example).

I'm 10 not sure what you mean "you have to change 9 the source query in every table." A table 8 is a set of information. You don't save 7 queries in tables.

You aren't going to 6 have big queries with 15 joins because you 5 only need the join that has to do with whatever 4 task you're currently doing. Any time you're 3 not sending out the letter, you don't need 2 that information. And when you do need that 1 information, it's 1 join away.

Score: 1

Because of the relatively small size of 108 the database, the two-table approach seems more appropriate as it is a structure that 107 is more normalized, and more likely to support 106 possible extensions. Would there be performance 105 concerns, we'd tend to denormalize, i.e. use 104 the one table approach.

In general with this 103 type of question one consideration, maybe 102 the most important one, is the typical usage patterns for various parts of the data.
A plausible 101 guess is that the Customer table is heavily 100 used for [mostly] read-only purposes; such 99 usage can be supported by keeping the updated 98 (and not frequently queried) data such as 97 the timestamp of the last notification separately. Would 96 the info in the related table be more frequently 95 included in queries, then it may be appropriate 94 to place such info in the main table instead.

On concerns about the sparse usage of the update timestamp column
(a mute point, since the two table approach will likely be choosen, but in general...)
The 93 fact that only 10% of records would have 92 some info in the timestamp column, hints 91 at some "wastefulness" would we choose option 90 1. In actuality the sparse usage of this 89 column have little bearing on the database 88 size and performance in general. For example, if 87 the table readily includes variable length 86 column, the size overhead is effectively 85 zero; if this is the first nullable or 84 variable length column, a minimal size tax 83 will be levied but should have little bearing. (Also 82 in more recent versions of SQL server, 2005, I 81 think, one can use sparse column, although 80 this is hardly worth with thousand or even 79 tens-of-thousand-records sized database.)

On the "sendudpate" column
It 78 is also a good idea to drop the "sendupdate" boolean 77 column from the main table, putting all 76 update related info in the related table. I 75 suggest however that the fact that a customer 74 receives updates should not be implicit 73 to the underlying customerid having a record 72 in the related table. Instead introduce 71 the "sendupdate" column in the related, maybe 70 not as a simple boolean, but as a frequency 69 code for example (eg 0 = no updates, 1 68 = update daily, 7 = update weekly etc.) This 67 doesn't mean that I suggest that all customers 66 should have an entry in the related table, but 65 that the fact that they have such a record 64 be a necessary but not sufficient condition, for 63 example allowing to temporarily disable 62 updates etc. etc.

On whether the one table approach "breaks" and rules of normalization
It is important to distinguish 61 between the physical and the logical database 60 designs. It is quite possible to have a 59 physical schema which doesn't break any 58 normalization rules but which isn't logically 57 normalized. In very broad terms, one basic 56 mantra of normalization is to store only 55 one type of Entity per table. So long as 54 there is no duplication of data, one could 53 put in the same table, in one wide record, info 52 actually pertaining to two logical entities.
Illustrated 51 with the customer database in the question, one 50 can decide that physical records will include 49 the date of last update sent to a given 48 customer. Fair enough, physically no normalization 47 rule broken... Logically, however, one 46 can argue that "Customer administrative 45 info" (Name, address...) and "Update info" (date 44 of last update sent) are two distinct entities, even 43 though they seem to be on a 1-1 relationship 42 at the moment.
Hence the state of normalization of the 41 logical model, where the entities are effectively 40 defined, is often in the eyes of the beholder 39 since one can (sometimes very fairly and 38 reasonnably) argue that elements of data 37 which all have a 1-1 relationship belong 36 to the same concept (entity).

Q: What is logical and physical normalization? ... Aren't the normalization rules straight forward ? I attempted 35 to explain this above.
"physical normalization" (or 34 rather normal form of the physical schema) looks 33 at the very factual composition of the tables 32 and their relationships, and applies simple 31 rules to know what normal form such schema 30 satisfies.
"logical normalization" (or rather 29 normal form of the data model) looks at 28 the effective entities found in the system.
So, to 27 provide another example, when designing 26 a simple database of house for sale listing 25 one may decide to have a single concept 24 of a "HOUSE" stored in a single table with 23 columns like "Address", "Kitchen_area", "Living_Room_area" etc. And 22 such a table would "work" and be technically 21 of a particular normal form; it would be 20 somewhat impractical, preventing the listing 19 of houses with two kitchen etc. Alternatively 18 one could see the house as a "Location" (the 17 address, and maybe other admin info), and 16 "Rooms" (the type, the surface, the flooring 15 info...), whereby each concept (location, room) is 14 stored in a separate table, with one location 13 being associated with several rooms.
Whereby 12 both of these models may be put into a physical 11 schema that is normal, the first model can 10 be said to be denormalized (at logical level), because 9 of the fact that it doesn't capture properly 8 the entities effectively present.

Q: I don't understand how you suggest explicitly mark the fact that customer receives updates? A:

 SELECT whatever
 FROM Customers
 JOIN NotificationTable N on N.CustomerId = C.CustomerId
 WHERE N.notificationFrequency > 0

In 7 the above,
- the JOIN capture the first 6 condition for a customer to be notified: there 5 need to be a corresponding record in Notification 4 table.
- the WHERE N.notificationFrequency 3 > 0 predicate captures the very explicit 2 condition, that of the notificaionFrequency 1 column to be positive.

Score: 1

@CraigTP then again, on to your second point 17 too. there are much better and more complete 16 ways to mainatin history of records. As 15 for that matter, it can be done (and should 14 be done) on full Customer table. That is 13 basically called Auditing. You have triggers 12 for that. You have quite simple approaches 11 in hibernate to do that out of the box for 10 that. Lastly, I would go with adding a new 9 column in existing table design

EDIT I came back 8 to this and think that I missed SendUpdate 7 part too. IMHO, that columns is also not 6 justified (in any table as such) If you 5 keep that, its a transitive dependecy which 4 you should normalize in 3NF. But anyways 3 I believe keeping the history out and keeping 2 the extension out, adding a new column is 1 the way to go w/o SendUpdate column.

Score: 0

I think you've made an excellent start on 9 understanding the issue and coming up with 8 options. Both are reasonably sound designs, which 7 should work reasonably well.

I've seen approach 6 #1 grow out of control - with each new configuration 5 value being added as a new field to the 4 point where each user has many empty config 3 values that apply to only a handful of the 2 population.

I would prefer option 2, as it 1 seems the more intuitive method to me.

Score: 0

I would want a simple way to have a daily 11 record of what updates I sent. This is not 10 another table of "customer" data, but a 9 table of dates each customer was sent an 8 update. The question you are going to be 7 asked is, "Did Customer 'A' get their update 6 on last Tuesday?" The only answer the single 5 table solution can provide is, "I don't 4 know, but they got it on Friday." May not 3 be acceptable. Without a history, you can't 2 resend an update that the customer didn't 1 get.

More Related questions