Run time query math

I am in the process of designing a screen for the supervisors that would show runtimes and efficiencies. I was having some trouble coming up with a SQL query that would do what I wanted, so I found a workaround that seems to work, but I wanted to present it here to see if there is anything that might come back and bite me in the behind.

First off, the table that I needed to acquire the time frame from is quite large and grows more every day; this is the historical table. As I have come to find out, the larger the table, the more sluggish a query is when run against it, even with indexes and keys.

So the first thing I did was create a view on the SQL server. This view only looks at the past 48 hours of historical data. So now, instead of over 1 million records, it is under 50K.
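The view is something like this (a sketch with simplified names, to give the idea):

CREATE VIEW PRODTRACKLIVE AS
SELECT *
FROM PRODTRACK                                 -- the big historical table
WHERE T_STAMP >= DATEADD(hh, -48, GETDATE())   -- only the last 48 hours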

Then, in my FSQL, for each group that required the time, I created a query that runs against this view and stores the information back into the main historical table.

It seems to run OK, and thus far our SQL server resources have been running at about 50%, which is kind of high; most sources seem to say it should not go over 30%. So does that mean that this approach is incorrect? Is there a better way?

Thanks and have a great day.

I couldn’t really follow what you’re doing there (inserting data into a historical table based on a view of that same historical table??)

But, as a general rule of thumb, I’ve found database performance problems are most frequently due to running polling queries against large (1M+ rows) tables without proper indexes. On a large table, you don’t want to include any columns in the WHERE clause of a query that aren’t indexed.
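For example, an index covering the WHERE-clause columns might look like this (a sketch; the table and column names here are purely illustrative):

CREATE INDEX IX_BIGTABLE_LOOKUP
ON BIGTABLE (MACHINE, SHIFT, T_STAMP)

With that in place, a query filtering on those columns can seek on the index instead of scanning the whole table.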

You can use the SQL Server Activity Monitor to get an idea of what queries are going through the system, and it can help you find queries that are taking a long time.

Ok let me try a different approach to describing it.

  1. We do have the columns used in the WHERE clauses of the runtime queries, for each machine location, indexed.

  2. Even with them being indexed, when you run a batch of queries against a 1M-plus-row table it takes time. Correct?

  3. In order to cut that time down, I went and created a view. This view gets its data from the historical table and narrows it down to only the most recent 48 hours.

  4. This view is now less than 50K records long. So now the queries run even faster because of the indexes and the much smaller size.

  5. In FSQL I created an action item that is a SQL query. This query runs against the created view. It does a SUM(DATEDIFF(ss, start, stop)) with a small set of conditions in the WHERE clause (see the sketch below). Again, it should run much faster because of the smaller number of records and the index.

  6. Once this query runs, via the given trigger in the FSQL group, it writes the data it generates back down to the historical table.

So in essence, yes, I am inserting data into the historical table from a view of that same historical table. But since the working set has been shrunk by using the view, the SQL to obtain the information seems to run much faster. But is this the best approach? Hence the reason I am posting here to see what feedback I can obtain.
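For reference, the action item query is along these lines (a sketch; the actual WHERE conditions vary per group, so the values are placeholders):

SELECT SUM(DATEDIFF(ss, START, STOP)) AS runtime
FROM PRODTRACKLIVE                              -- the 48-hour view
WHERE MACHINE = {SOME VALUE} AND SHIFT = {SOME VALUE}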

Hope this helps. Have a great day.

[quote=“Carl.Gould”]I couldn’t really follow what you’re doing there (inserting data into a historical table based on a view of that same historical table??)

But, as a general rule of thumb, I’ve found database performance problems are most frequently due to running polling queries against large (1M+ rows) tables without proper indexes. On a large table, you don’t want to include any columns in the WHERE clause of a query that aren’t indexed.

You can use the SQL Server Activity Monitor to get an idea of what queries are going through the system, and it can help you find queries that are taking a long time.[/quote]

Ok, I get that, and that sounds fine. The thing that was confusing me is that you’re calculating data based on calculated data, which sounds like a bit of an ouroboros, but I’m sure you understand your data better than I do. If it’s giving you the information you need, then the approach sounds fine.

Martin - Something about this feels terribly wrong. Ultimately, if your current scheme meets your requirements (even at a high CPU utilization) that’s fine.

I don’t think that views get cached in SQL Server - meaning that the query on the view is really a query on top of a query with the millions of rows. If you noticed a huge performance difference, then my assumption was obviously wrong. Using views is a clever approach to simplifying FSQL Action Items - I’ve never thought of that. A few ideas:

  1. Consider archiving/deleting old data. There are many ways of approaching this one - I would have FactorySQL “DELETE records older than…” on the working table (equivalent of your existing view); see the sketch after this list.

  2. Is this the only way data gets entered into the million+ record historical table?

  3. Consider where the repetition is occurring. For example, you can easily write back important running totals or database counts to the PLC prior to logging instead of logging then querying your log table for this info. Ideally, the PLC should track it for many iterations, then you can log useful information as opposed to lots of raw data.

  4. Measure the performance of consolidating queries - I’m curious how running that repeated query on a view performs compared to a single consolidated query.
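For item 1, the cleanup can be a simple scheduled statement like this (a sketch - PRODTRACK_WORKING is a hypothetical name for the working table):

DELETE FROM PRODTRACK_WORKING
WHERE T_STAMP < DATEADD(dd, -2, GETDATE())   -- drop anything older than 48 hours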

Nathan, yes, I do agree that something feels wrong about doing it this way, which is why I posted the information here to see what other alternatives there are. Here is what I am trying to accomplish:

Creating a report that, at the end of a shift, will display (with the option of printing) a summary sheet showing which machines ran, what job they ran, the number of pieces they did, the time they spent in a run cycle, and the average they did per hour.

Thus far I can write a query, which I have, that will query the table in SQL and show me what has run on what machine. With the query that I have, this is the type of data I can get:

machine   job/task     pieces ran
1         1234567890   6790
2         2345678291   11220
etc…

This is accomplished by running a query that checks for a 1 in a column that designates end of run.

Now in this same table I have the start and stop times. It looks something like:

d_start              d_end                status
6/6/2008 10:10:50    6/6/2008 10:15:05    RUN
6/6/2008 10:15:05    6/6/2008 10:16:30    IDLE

And the list goes on. In a run/stop column I also have a bit that is 1 during run and 0 during idle.

What I need to do is filter through this information and pull a DATEDIFF on these times with the proper WHERE clause. However, I was unable to find a way to have one query pull up all the information and put it in an order like this example:

machine   job         pieces ran   time ran (in sec)
1         123456790   6790         11002

And so on. If I had this information to start with, I could use some of the math functions in the reporting plugin to do calculations and produce all the data as outlined above. The only way I could get it to work so far was the way I have presented here. But again, it makes me nervous because you have all these queries firing off, which could bog down the SQL server.

So what other options do I have? I know the data, I know how it is stored, and I know the end results; I am just not 100% sure the way it is currently done is the best approach.

As for deleting the data, this has been discussed, but a choice has not been made on how long to keep the data current and where to draw the line between current and archived.

[quote=“nathan”]Martin - Something about this feels terribly wrong. Ultimately, if your current scheme meets your requirements (even at a high CPU utilization) that’s fine.

I don’t think that views get cached in SQL Server - meaning that the query on the view is really a query on top of a query with the millions of rows. If you noticed a huge performance difference, then my assumption was obviously wrong. Using views is a clever approach to simplifying FSQL Action Items - I’ve never thought of that. A few ideas:

  1. Consider archiving/deleting old data. There are many ways of approaching this one - I would have FactorySQL “DELETE records older than…” on the working table (equivalent of your existing view).

  2. Is this the only way data gets entered into the million+ record historical table?

  3. Consider where the repetition is occurring. For example, you can easily write back important running totals or database counts to the PLC prior to logging instead of logging then querying your log table for this info. Ideally, the PLC should track it for many iterations, then you can log useful information as opposed to lots of raw data.

  4. Measure performance consolidating queries - I’m curious how running that repeated query on a view performs compared to a single consolidated query.[/quote]

So you have literally millions of entries, each of which tracks thousands of pieces run on a single machine?

Where do the dates come from that you want to run the DATEDIFF on? Different rows? What are the criteria? Please provide the actual queries that you’re running.

I am going to attempt to answer this as clearly as possible.

Yes we have a table, yes the table is quite large, currently over 1M records. Each record is a time span for an event. The time/date is supplied by the PLC.

Here would be the heading of the columns:

machine . job/task . d_start . d_end . counted . accum . operator . etc…

So for a run time you would see a d_start (date/time start) and a d_end (date/time end). Then, if the machine went to idle, you would see a new record whose d_start has the same value as the d_end from the previous record, with a new d_end spanning some amount of time.

The d_start and d_end columns are datetime fields in Microsoft SQL. These are the ones I am running the DATEDIFF statement on.

The actual query that is working is:

SELECT MACHINE, ACCUM, JOBTASK, D_END FROM PRODTRACK WHERE T_STAMP = { SOME VALUE } AND SHIFT = {SOME VALUE} AND END = 1

This gives me the initial data as specified earlier in the post. I did try using the DATEDIFF in this query, but it kept blowing up and coming up with error messages. The two error messages were a GROUP BY error and a “query returned more than one row” error - never at the same time; they seemed to alternate.

This is the query that I tried and failed at:

SELECT MACHINE AS MACHINE, ACCUM, JOBTASK, D_END, (SELECT DATEDIFF(SS,D_START,D_END) FROM PRODTRACK WHERE MACHINE = MACHINE AND SHIFT = SHIFT AND T_STAMP = {SOME VALUE}) FROM PRODTRACK WHERE [color=#FF0000][SEE ABOVE FOR WHERE CLAUSE][/color]

I did try moving things around, thinking that it had to select the machine numbers first, so I put that as the inside query, but again that did not work either. So that is why I have done what I have, even though it makes me nervous. Hope this helps and clears things up.

PS: I also posted a question concerning DB queries in the database section of this forum. It basically asks about indexing and keys. One of the DB people here said that in order to function properly you needed to call the index in the query. Is that true with the IA software?

Thanks again and have a great day.

[quote=“nathan”]So you have literally millions of entries, each of which tracks thousands of pieces run on a single machine?

Where do the dates come from that you want to run the DATEDIFF on? Different rows? What are the criteria? Please provide the actual queries that you’re running.[/quote]

Ok, while perusing this site I came across the following code and have been trying to adapt it to work in the application that I have described in this forum post.

As per notes from Nathan, I chose to change FSQL so that not so many queries were being run at one time.

As per Colby, I changed my groups so that when I updated to 4.2.7 it would not error out the groups.

So far both of these ideas have worked out quite well. One of the new features in the newer version that was not in the version I had been running is a datediff in the expressions. So what I have done is the following:

I took out the SQL query in the FSQL action item and instead am using the datediff expression on the two date fields, start and stop, which are written to SQL. So now, instead of running a query against a table, I am running it on the two OPC items that write to SQL.

Now for the query that I found in the database section of this forum:

SELECT t.model, count(t.model) as qty FROM (select distinct seqno, mid(model,30,1) as model from schedule) t group by t.model

Now I changed the word count to SUM, the word model to RUNTIME, and seqno to MACHINE. It pulls up the information but does not sum the runtime field; instead it takes the most recent record and just displays that value. So if the most recent value is 10 minutes, it displays that - it won't sum up the column.
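For reference, my adaptation ended up looking roughly like this (reconstructed from memory and simplified, so the details may be off - the table name here is a guess):

SELECT t.RUNTIME, SUM(t.RUNTIME) AS qty
FROM (SELECT DISTINCT MACHINE, RUNTIME FROM prodtracklive) t
GROUP BY t.RUNTIME
-- Grouping by the very column being summed leaves (mostly) one row per
-- group, so SUM just hands each value back instead of totaling the column.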

I think in a previous post I posted what the data looks like in the SQL table and what I want the output to look like. Any thoughts or suggestions?

As for the posting I made in the database section: I was not clear. Our database person was talking to me and made mention of the index. After my post and Carl's reply, I went back and tried to get more information about what the database person was asking, and I found out that there is some software, such as FoxPro and others, that has this kind of requirement. So basically I was asking if your software requires it, and Carl's answer was quite clear. Thanks for all the info and keep up the great work.

Now all I need to do is get this query working.

Here is what I am trying to do and I hope this clears things up considerably. I currently have the following query:

SELECT MACHINE, START, SHIFT, JOBTASK, ACCUM FROM PRODTRACKLIVE WHERE OVERIDE = 1

Overide is simply a marker that says end of file.

The data coming into the data set with the above query looks like:

Machine   jobtask     shift   accum   start
1         123456768   1       4567    {date}
2         123645789   1       5678    {date}
3         897367482   1       11678   {date}
Etc……

There is one final field in the database: runtime. What I want to do is add a summing feature to the query that will sum the runtime column where the conditions are machine, jobtask, shift, and possibly date. So the revised query would produce the following results:

Machine   jobtask     shift   accum   start    runtime
1         123456768   1       4567    {date}   4567
2         123645789   1       5678    {date}   9853
3         897367482   1       11678   {date}   37482
Etc……

As I said earlier in the thread, I found that query in the database section of your site and tried to adapt it, but instead of summing the runtime field it simply pulled up the last record and displayed that value. Here is the query that I attempted to use:

SELECT JOBTASK, MACHINE, ACCUM, SUM(runtime) AS runtime, t_stamp AS time
FROM (SELECT JOBTASK, MACHINE, ACCUM, t_stamp, runtime
FROM prodtracklive
WHERE (OVERIDE = 1)) AS i
GROUP BY JOBTASK, MACHINE, ACCUM, t_stamp

So where did I go wrong in the query? Thanks and have a great day.

Ok, I think I’m getting the picture. Before I delve into queries, let me regurgitate what I think you want, and you tell me if I’m on the right track.

For each given unique combination of: Machine, Jobtask, Shift [and possibly date? please decide…] you want:

  1. machine
  2. jobtask
  3. shift
  4. accum from the last row (where override=1)
  5. sum of runtime

You go, Carl. That is correct. The date part I already have working as well; it is just this dang SUM that is not functioning as it appears it should per the SQL help file.

The date part can be added if you like. I have a drop-down calendar option that selects a date, and that is working, so I think I can add that in the future. It's just, like I said, this summing feature that is not working.

Thanks and have a great day.

FYI, the first 4 items on this 5-item list can, as noted, be brought in with one query that checks the overide column. But then you probably already knew that. Just thought I would add this as a disclaimer or whatever they call it.

[quote=“Carl.Gould”]Ok, I think I’m getting the picture. Before I delve into queries, let me regurgitate what I think you want, and you tell me if I’m on the right track.

For each given unique combination of: Machine, Jobtask, Shift [and possibly date? please decide…] you want:

  1. machine
  2. jobtask
  3. shift
  4. accum from the last row (where override=1)
  5. sum of runtime[/quote]

One last question: what is the value of “accum” for all of the rows of a given (Machine, Jobtask, Shift) combination where override is NOT 1?

i.e.: are they all 0? are they all less than the value on the override=1 row? something else?

There are two fields in the database, as you can see: one is called COUNTER and the other is called ACCUM. Counter is the event count for the piece count. If you look at the table you will see that between the start and stop times there were X number of pieces done. Accum is, as the name implies, the accumulated field; it is the total of all pieces done on that machine.

So to answer your question, all previous values of accum are progressively less. The only accum value I am interested in is the one where overide = 1; all others can be ignored.

Hope this helps clear things up.

[quote=“Carl.Gould”]One last question: what is the value of “accum” for all of the rows of a given (Machine, Jobtask, Shift) combination where override is NOT 1?

i.e.: are they all 0? are they all less than the value on the override=1 row? something else?[/quote]

That's what I thought. In that case, the following query should get you what you want:

SELECT machine, jobtask, shift, MAX(accum), SUM(runtime) FROM prodtrack GROUP BY machine, jobtask, shift

or, if you want to add the date as a uniqueness parameter:

SELECT machine, jobtask, shift, date, MAX(accum), SUM(runtime) FROM prodtrack GROUP BY machine, jobtask, shift, date

Note that all columns in the GROUP BY should have indexes on them.
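Incidentally - and this is an educated guess, since I can't run anything against your data - I think I can see why the two earlier attempts misbehaved. The correlated subquery version blew up because, without table aliases, MACHINE = MACHINE and SHIFT = SHIFT compare each inner row to itself (always true), so the subquery returns a whole set of rows instead of one value. With aliases it would look something like this:

SELECT p.MACHINE, p.ACCUM, p.JOBTASK, p.D_END,
       (SELECT SUM(DATEDIFF(ss, i.D_START, i.D_END))
        FROM PRODTRACK i
        WHERE i.MACHINE = p.MACHINE   -- correlate the inner rows to the outer row
          AND i.SHIFT = p.SHIFT) AS runtime
FROM PRODTRACK p

And the SUM(runtime) version didn't aggregate because t_stamp (and accum) were in the GROUP BY - every row has a unique timestamp, so each “group” was a single row and SUM had nothing to add up.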

Nope, that did not seem to do the trick. If you look at the file with the name “results” in it, you see the query that ran and the results. The only thing I did was add the WHERE clause for shift and date. It ran the same with and without this WHERE clause.

Anyway, if you look at the results screenshot you see that machine 1 is shown as having done 987, but if you look in the actual table - well, in this case a view - you see that machine 1 did a total of 4442. Also, if I were to provide all the information in the database and you added up the runtime column, that is off as well. The same problems persist with all the other machines listed in the report screenshot.

The information that was correct was which machines ran, the jobs that ran on them, and the shift number. The other fields were wrong.

Any other ideas or suggestions?

Thanks and have a great day.

[quote=“Carl.Gould”]That's what I thought. In that case, the following query should get you what you want:

SELECT machine, jobtask, shift, MAX(accum), SUM(runtime) FROM prodtrack GROUP BY machine, jobtask, shift

or, if you want to add the date as a uniqueness parameter:

SELECT machine, jobtask, shift, date, MAX(accum), SUM(runtime) FROM prodtrack GROUP BY machine, jobtask, shift, date

Note that all columns in the GROUP BY should have indexes on them.[/quote]

What are the datatypes for the accum and runtime columns in the PRODTRACK table?

I bet they are strings (varchar) - MAX is doing an alphabetical sort, instead of a numerical one. Make them integers, and it should work.
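A quick illustration of the symptom (a sketch you can run on any SQL Server):

SELECT MAX(x) FROM (SELECT '987' AS x UNION SELECT '4442') t   -- returns '987', since '9' sorts after '4'
SELECT MAX(x) FROM (SELECT 987  AS x UNION SELECT 4442) t      -- returns 4442

which would match machine 1 showing 987 when the real total is 4442.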

I'd take you up on that bet, but I already know the answer. They are not VARCHAR; they are indeed INT. I know this because we contracted out a company to write a report off of the same data source. When they started working on it, yes, the two columns were both VARCHAR, but they requested that we change our data types to make it easier for Crystal Reports to do its job. Here is what we have each of the fields set to:

ProdTrack View

MACHINE int
JOBTASK int
START datetime
STOP datetime
COUNTER int
ACCUM int
SHIFT int
OPERATOR int
ASSIST1 int
ASSIST2 int
ASSIST3 int
FLOATER int
PRODMON int
SUPERVISOR int
MECH1 int
MECH2 int
SPECIALOP int
SPECIAL_A1 int
LOGGED varchar(50)
STATUS varchar(50)
CPM int
EFF int
MECHPG int
UTILPG int
MTRLPG int
QCPG int
BATCHEN int
DETCHK int
SPVOVR int
TIMESTD int
LUNCH int
PAPERWRK int
BTCHDN int
STDRUN int
OVERIDE int
FAULT1 int
FAULT2 int
FAULT3 int
FAULT4 int
FAULT5 int
FAULT6 int
FAULT7 int
FAULT8 int
FAULT9 int
FAULT10 int
FAULT11 int
FAULT12 int
FAULT13 int
FAULT14 int
STPBAR int
T_STAMP datetime
QUALITY_CODE int
SERIAL int
RUNTIME int

So this is how it is currently set up, and if you'd like I can post another screenshot where I expand the table so you can see the data meets these data types.

Anything else? Willing to entertain any ideas.

Something is not adding up. MAX() is certainly behaving like an alphabetical sort. Maybe the “prodtracklive” view is doing something that is making those columns act like strings? What is the column spec for the accum and runtime columns in the prodtrack table?
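One quick way to see what data types the view actually exposes (standard SQL Server metadata; adjust the view name if yours differs):

SELECT COLUMN_NAME, DATA_TYPE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'PRODTRACKLIVE'   -- views are listed here too
  AND COLUMN_NAME IN ('ACCUM', 'RUNTIME')

If the view applies a CONVERT or string function to those columns, the types it exposes can differ from the base table's.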