Getting max run rate

Zach_Larson · March 12, 2024, 8:19pm

What is the best way to capture the maximum run rate for the performance part of OEE?

Is there a better way than piggybacking a state events table?

Zach_Larson · March 20, 2024, 4:16pm

I am leaning toward Option 1.
I am adding tracking the Sku so I have the maximum run rate for OEE.

Option 1
One table

Columns:
ndx, t_stamp, location, machine, event code, quantity, sku

Triggers
When a new state event or reject event happens, the sku is logged.
When a sku changes, the state is logged

Option 2
3 event tables

Stop events columns:
ndx, t_stamp, location, machine, stop code

Reject events columns:
ndx, t_stamp, location, machine, reject code, quantity

Sku events columns:
ndx, t_stamp, location, machine, sku

Option 1 query

SELECT 
    sku,
    SUM(CASE WHEN event_code = 'Stop' THEN 1 ELSE 0 END) AS total_stops,
    SUM(CASE WHEN event_code = 'Reject' THEN quantity_or_sku ELSE 0 END) AS total_rejects
FROM 
    your_table
WHERE 
    t_stamp BETWEEN '2024-03-20 08:00:00' AND '2024-03-20 09:00:00'
GROUP BY 
    sku;

Option 2 query

SELECT 
    sk.sku,
    SUM(CASE WHEN s.event_code = 'Stop' THEN 1 ELSE 0 END) AS total_stops,
    SUM(CASE WHEN r.reject_code IS NOT NULL THEN 1 ELSE 0 END) AS total_rejects
FROM 
    sku_events sk
LEFT JOIN 
    stop_events s ON sk.ndx = s.ndx AND s.t_stamp BETWEEN '2024-03-20 08:00:00' AND '2024-03-20 09:00:00'
LEFT JOIN 
    reject_events r ON sk.ndx = r.ndx AND r.t_stamp BETWEEN '2024-03-20 08:00:00' AND '2024-03-20 09:00:00'
GROUP BY 
    sk.sku;

I need help evaluating benefits of Option 2.
I only know that it may have faster indexed inserts, not sure.

I learned to go with a single table before with the events tables.
I had used a single table with states and rejects in one table before.

Though I did see some comments about separating those tables.
I still don't understand why to separate them, or is it just the insert speed?
Organizing SQL tables and named queries efficiently - #11 by zacharyw.larson]

DNice · March 21, 2024, 2:55pm

I would lean toward option 1 as well - you named the columns well to capture both types of events. Since it seems this table will primarily serve the purpose of performance OEE tracking, it makes sense to group it. The only tweak I would make is the same one that @pturmel made in your other linked thread:

For efficiency, I would use a numeric foreign key for machine, where a separate, relatively static table of machines has that as a primary key...

I too would like to undserstand the benefit of the 2 tables too if there is one. I have designed MSSQL databases with both types of tables (millions of rows) and have not seen much performance differences between querying or insrting into either. Maybe once I get into the 100MM records ...

Zach_Larson · March 21, 2024, 3:05pm

I thought my index would be like:
t_stamp, location, machine, event code

What is the benefit of a numeric foreign key for the machine?

I planned to use two indexes on the static table of fault codes as:
Location, machine, code
location, machine, sub-code

pturmel · March 21, 2024, 3:06pm

More efficient indices. If you rename a machine, you don't lose history.

Ideally, you wouldn't have location in the raw data table either--that would be a column in the table of machines.

Zach_Larson · March 21, 2024, 3:14pm

I want to keep the history for sure.

Don't numeric foreign keys lose efficiency because each query would require fetching the machine number?

Denormalized tables make me feel nervous and apprehensive.
I frequently worry about the dependencies between the tables.
The tables will need identical redundant data, inserts, and updates.
Someone typed "testName" as a machine name in a denormalized table.
It is still there, and obstructs me only because it is a denormalized table.

I think if I have normalized tables, I'm calm.
I can get my excel file where I had the static data and recreate the static table.
Nothing bad happens because I don't depend on generated keys.
A row "testName" isn't allowed to exist as it has no place.
There is no generated id for it to exist.
The errant or test row has no place holder or gap when removed if normalized.

pturmel · March 21, 2024, 4:10pm

Sounds to me like your understanding of normalized and denormalized is backwards.

Repeating a text machine name in every raw data row is denormalized. Placing a small integer key in each row with a separate table for key versus text is normalized.

Normalized is generally good. Best for space, and usually best for performance (simply by not needing as much disk traffic).

Denormalized is generally bad. Uses more space, and is computationally more difficult to do things like grouping.

DNice · March 21, 2024, 5:29pm

This is almost reason enough alone to me. This has happened a handful of times for me and instead of changing every occurance of [machine name] in every table that it is recorded, you change one record in the MachineDefininition table.
As mentioned, (noticeably) more performant too join on integers vs strings as well!

Zach_Larson · March 21, 2024, 6:17pm

Are normalized tables the ones without a foreign key?
I might have used the terms wrong.

I think whichever name it is:
Keeping inserts, updates, and removals matched in multiple tables seems harder.
Keeping inserts, updates, and removals matching in a single table seems easier.

There seems to be debate on which way is faster.
I could see a single static table having to be split up anyway.
I can't currently see how joining more tables would be faster though.

Tstamp would be a datetime.
Sku will probably need to be a string.
I mean for location, machine, code, qty to be integers.

pturmel · March 21, 2024, 7:03pm

"Normalizing a table" means "reducing replicated data by moving repeats to single rows of an auxiliary table, accessed as needed with joins".

Removing the replicated strings is almost always a performance and space utilization win. Yes, slightly more complicated. But almost always worth it, for the simpler forms.

There's whole books on various ways, and various levels, of normalization. With various tradeoffs as the forms get more complex.

Start here:

Zach_Larson · March 21, 2024, 7:16pm

Can that style benefit from the indexing still?

What about the maintenance needed per table?

Thanks for helping me with understanding how the normal forms work.

I am looking at something like these options effectively.

Option 1
Key, MachineId, Sku, maxRunRate
Key, MachineId, MachineStringName
Key, MachineId, Infeed, Outfeed
(T_stamp, location, machineId, event code, quantity, sku) not sure of alternative

Option 2
Location, MachineId, MachineStringName, Sku, maxRunRate
Location, MachineId, Infeed, Outfeed
T_stamp, location, machineId, event code, quantity, sku

DNice · March 22, 2024, 3:52pm

Indexing is a whole other discussion. Within MSSQL PK's are automatically indexed and you can add more columns to index by for a given table (you can add your MachineId as an addition index column for example). So the answer to first question is yes.

Not sure exactly what you're getting at with "maintenance per table". I guess you/someone will need to be the one to add a MachineID to be able to refernece that ID in other tables yes.

I'd lean toward your original Option 1 with the following change:

The 2nd table would be your MachineDef table which I would include everything related to that machine like Site (or SIteID), Line (or location), type, etc... and of course your MachineID index.

Is your SKU defined anywhere else alreadylike in your MES or ERP system? Is MaxRunRate available there? Otherwise, probly not a big deal adding this 1 extra column to your performance table.

3rd table with infeed/outfeed columns makes sense unless or is this just a tag that should be historized?

Zach_Larson · March 22, 2024, 4:10pm

In the 70s Codd was using normalized forms to make queries efficient.
In the late 80s and early 90s, indexes were developed.
Is Codd's technique still faster than the alternative after indexes were developed?

Option 1
Key, MachineID, Sku, maxRunRate
Key, MachineID, Location
Key, MachineID, MachineStringName
Key, t_stamp, MachineID, Infeed, Outfeed
Key, t_stamp, MachineID, eventcode, quantity, sku

Option 2
Location, MachineId, MachineStringName, Sku, maxRunRate
T_stamp, location, MachineId, Infeed, Outfeed
T_stamp, location, machineId, event code, quantity, sku

Option 3
MachineID, Sku, maxRunRate
MachineID, Location
MachineID, MachineStringName
t_stamp, MachineID, Infeed, Outfeed
t_stamp, MachineID, eventcode, quantity, sku

I want to understand how option 1 would be faster.

I keep thinking option 2 has fewer joins so it will be faster as long as each table has an index.

Maybe I understand indexes wrong.
I thought indexes made keys obsolete.

Keys are made of row counts and indexes make column(s) into a key, I thought.

I can see how the total storage is smaller when not using option 2.
Option 2 will rows per combination, yet each of those combinations will be indexed before a join, which is why I thought option 2 would be faster.

I don't have good way to measure the speeds.

I also have been reading about OLAP and OLTP.
Snowflake uses OLAP. Though I don't know if it use MOLAP, ROLAP, or HOLAP.
It is not clear to me on the level of consideration to give to that information in the design if I know I will be using Snowflake.

Zach_Larson · April 18, 2024, 7:24pm

I more thoroughly understood the many numbers of ways and numbers of tradeoffs today.
It had not hit me so hard as today.
I think I read it initially, but the weight of it was difficult to grasp at first.

I had kept thinking I could isolate a best way until today.
I liked these notes.
Different Types of Normal Form | HackerNoon]