SQL Database Normalization design recommendations

I'm creating an internal inventory program and setting up the database tables using SQL Server. I would like to normalize the table structure to 3NF or BCNF standards. One issue I have is completely understanding what this all entails. There seems to be some debate and confusion based on the definitions and simple examples to be found around the internet on this.

I'm posting a table diagram that I have and would like some suggestions if I'm doing it correctly.
Some things about what I have:

  1. Most child tables (inv_locations, manufacturers, grades, teams, vendors, etc..) are just two columns, id (primary key and the main attribute column). This seems to be overdone, but my reasoning is for data integrity, and while we won't have millions of rows, it will save space. For example, vendors, I want to prevent misspelled or subtle differences in names which causes things like; Acme Co, Amce Co, Acme when they should just be one.

  2. inv_images - This was on purpose to separate the heavy sized image varbinary data on its own. I did this instead of creating a filestream.

  3. My main table, inv_items, I have a hard time understanding what normalized form it's in. In there is itemDescription, itemNumber, partNumber (these last two are different), do these go against 3rd Normal Form or Boyd-Codd Normal Form? What determines what according to normal form definition?

  4. Just to point out, I've set up primary keys, foreign keys and indexing, uniqueness, etc..

1 Like

Going by my admittedly dodgy knowledge of it, I would say that the stockLocator column may be an issue, but I don't know what the column is used for, or if it's even necessary to split it to another table.

Outside of that, as long as each column is dependent on the key (inv_items.id), then I'd say you look good to my-- also admittedly dodgy-- eye.

Would you ever want to allow multiple images per item? As it stands currently, you wouldn't be able to do that.

This is probably just personal preference, but I like to avoid generic id column names where I can when dealing with a lot of FKs. I prefer to match the names on both sides of the FK (e.g. - inv_items table id column would become itemId, etc). Makes it easier to intuitively understand the relationships without having to keep track of table names too.

3 Likes

Yes, stockLocator is also dependent on the primary key as well, so should be good there.

After more researching, I came across this article that explained it in a another manner to me which verified what I was going towards. Normal Forms Explained

Takeaways:

  1. Second normal form is violated when a non-key field is a fact about a subset of a key (only relevant when using a composite key - multiple columns forming primary key).
  1. Third normal form is violated when a non-key field is a fact about another non-key field

Also found this, an inventory tutorial in MySQL that has some similarity.
Inventory Database - MySQL

You bring up a good point. With the images, would this be where I would create another table, something like - "inv_items_images" which consists of two foreign key columns, itemsId and imagesId?

With id primary key column naming, some years ago, a few years back, I've read debates both ways and went with primary key "id" in it's native table.

When you have two tables that have a column with the same name (e.g. ID) but those columns have very different semantic meanings, you have an attractive nuisance when it comes to natural joins. People will want to do a natural join and it won't give the results they think it will.

Other than that, there's the camp of "always use a synthetic column like ID" and the camp of "always use a natural primary key, even if that means multiple columns". I don't think those two camps will ever stop fighting. :wink:

3 Likes

tbh, I find BCNF is the easier of the two, and any relation in BCNF is also in 3NF and 2NF.

1 Like

You would only need to create another table if you need the ability to do a many-to-many relationship model. Would you ever want to use the same image for two different itemId's? If not, then I would actually remove the imageId column from the inv_items table and change the inv_images table to have a 3rd column for itemId. Then just flip around your FK so that the itemId in inv_images is linked to your id column in the inv_items table. id in inv_images is still your primary key and allows you to have multiple records where itemId is the same value. Just note that joining inv_images to inv_items can cause duplicate records for inv_items if you have multiple images per item.

We do actually have the same image for the different items. The reason is we have items of different quality grades and use the same picture. We currently don't have multiple pictures of each item, but will keep it in mind, then we could go the junction table route.

Why are inv orders and inv order details separated?
Can users changes usernames, needing ids?

Order Details allows to track the individual items added to a cart with all their separate details. The orders table is what the items are all on so we can track the single date, username, comments to that whole order. We track this as a single transfer where multiple items can be on one invoice.

As simple as I wanted to create this with few tables, it turned into modeling it after this: Inventory Database Design. While this database is for internal use only, we still need to track certain things that make it important to break it into enough tables to do its job properly.

I'm also including a relationship with warehouse or site and a location table, where an item may be located at a major warehouse or site (city, etc..), the location would be exactly where it's at (there may be several places used for storage).

1 Like

For minor entity tables I would recommend a column for Name (location, manufacturer, etc.), Friendly Name, and Description. That can help with dropdown lists, tool tips, etc.
I would also include Created By (user id), Created On (datetime), Updated On (user id), and Updated By (datetime) columns on each. That will help when you have to explain "what happened..." to the boss.
If this will ever grow beyond workgroup size then store all datetimes as UTC and convert to local time in the presentation layer. You might consider storing the TZ in the location table too.
If the data will ever be replicated across multiple databases or extracted to a data warehouse then I would strongly consider using GUIDs for the id column. Everyone will hate that of course (too hard to type, database engines are more efficient with integer indexes, etc.) but once you figure out that you have PK collisions on replication or ETL it will be too late to do anything about it.

1 Like