Table Indexing

TimE · April 15, 2010, 4:23pm

When a group is created and run for the first time the table is created automatically. Could an option be provided to automatically create a clustered index on the t_stamp column? For non DBA’s like me it would save a huge learning curve in trying to figure out how to improve performance.

If this is something you consider valuable could it be added to the Ignition platform as well?

Thanks,

Colby.Clegg · April 16, 2010, 4:21pm

Well, it currently creates an index on that column, but whether or not it’s “clustered” (and what that even means) is going to depend on the database.

Unfortunately, we always have to try and do things that will work for all databases. We have the idea of “database translators” that help us navigate differences, but it’s not fine-grained enough to say “use this syntax for indices” and “use this syntax for the timestamp column of a table”. The difference would be necessary in this case because you can only have one clustered index per table, and in many cases we create multiple indices for a given table.

I think it’s worth noting that your original performance problem of having lots of indices on that table (include the 4-way index) was the result of someone’s actions- FactorySQL doesn’t create multiple-column indices.

TimE · April 16, 2010, 4:41pm

Yeah, I understand that creating a software package that works on multiple platforms with multiple databases puts limitations on what can be done automatically.

In terms of the performance improvements. Simply deleting the indices that IT created did not make that much of an improvement. As I understand from this experience; MS SQL created the clustered index on the PK column automatically and the real performance increase was to change this have the t_stamp column clustered.

Am I interpreting this correctly?

Colby.Clegg · April 16, 2010, 6:38pm

Yeah, I can believe that. That’s another trick about indexes… it’s really quite important that they’re constructed based on how they’ll be used.

With a historical table, the most common query is “where t_stamp>x” or “t_stamp>x and t_stamp<y”. Having the t_stamp column indexed as part of a multi-column index probably won’t help, since you’re not using those other columns in the query. This is a huge misconception about multi-column indices that people often make. They think that they can just throw any column they might want to query on into a single index and get some benefit. The exact result is dependent on the database (and in fact the data engine, in a situation like mysql that supports multiple), but it generally will not be helpful- and in fact could hurt- if you’re not querying on all of the columns together (all columns in the where clause).

So, creating a single index on t_stamp in your case was a good idea. As for clustering- a clustered index dictates how data is stored. For a historical table like this where you’re querying ranges of dates on the t_stamp, it makes a lot of sense to store that similar data close together. Of course, practically speaking your ndx column should have lined up pretty well with the order of your timestamps (because the data is likely being inserted in order), but there may be some subtleties of how the db can optimize on the fact that you’re filtering based on the actual column that’s clustered.