Sorting a table or sorting incoming data?

I am working on a project that, when boiled down, just needs to search a DB. I have the DB set up and have data coming in. But without going into the weeds too much, the incoming data is very dense and disorganized, looking like this...

It's a long string consisting of a Tag ID, Sector, and Time, repeated for multiple entries. I need this data to be searchable, specifically by Tag ID. Someone is going to enter a Tag ID number on a website, and it will tell them the sector the tag is in and the last time it was seen. It needs to select only the most recent entry for that Tag ID, and the Tag ID, Sector, and Time need to stick together because each set belongs to a unique tag.

My question is: can I write a single query that does all of this, or should I organize the data first? I'm leaning towards organizing the data. Can I do that to the incoming data, i.e., take the message and parse it out into separate columns and rows for each tag? Or should I let the data land in the table as-is and then have a script create a separate table from that one, and that's the table I search?

Also, know that changing how the data is sent is not an option at this point. It was an uphill battle to get here, and any change or variation will have a massive impact that will mess everything up. I need a working product as soon as possible; an improvement or V2 is always possible later. Right now, I have to work with what I've got.

I've tried searching the forum and watching the Inductive University videos, but I was going in so many different directions I wasn't getting anywhere. I'm hoping someone can straighten me out and provide some guidance. So far everyone here has helped me a tremendous amount and I'm so thankful this resource is here. TIA.

I'd recommend going with a second table. It's cleaner, your queries will be easy, and you'll have indexes, so it will be fast.

If you have access to the database then create the new table with the columns you require and then, on the existing table, create a trigger to run before or after a record is inserted, and have the SQL parse the data and stuff it into the new table. The existing data won't feel a thing!

If you need further help on that then mention the SQL flavour you are using.


Ok, that's kind of along the lines of what I was thinking. I'm using the built-in SQLite DB in Ignition, and I've done most of the scripting in Python 2.7 because that's the version Ignition uses.

I started to try to do this, but I don't know how to make the trigger start the parsing. I'm still very new to Ignition and I've been drinking from a firehose for months.

Ewww. That falls over if you push it even a little. (Single-threaded, not designed for production sample storage, just configuration.) Use a real database. (MariaDB or PostgreSQL if you need free.)

I agree with Phil. Get a grown-up's database.

The SQL trigger will be an SQL query - no Python involved - and will run (even if Ignition isn't running) any time a record is added to the original data table.

Post a couple of lines of your existing data and format it as code. (See https://forum.inductiveautomation.com/t/wiki-how-to-post-code-on-this-forum/77831.) There are a few of us who would be interested in the challenge of coming up with an SQL query to parse the data into columns. Make sure it's clear to us what parts of the data are required.


Should I just look up how to set one of those up, or is it built into Ignition? I've never messed with DBs before. I don't know if it's easy to set up; I'm hoping for something self-contained so I don't have to build something else.

Ignition doesn't ship with a database engine so that you have the choice of which to use. Phil would recommend PostgreSQL. I'm more familiar with MariaDB.

  1. You find their download page, run the installer, and create the required passwords. You'll generally need to add a username / password for Ignition to use (rather than the administrator account); see the sketch below.
  2. You make a database connection from the Ignition gateway Config page. You will get an indication there showing the success or failure of the connection.

If you have anyone with DB experience to assist it will save you some pain. It's worth the effort though.
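
For PostgreSQL, step 1 boils down to something like this in psql (a minimal sketch; "ignition" and "tag_tracking" are example names, so substitute your own):

-- Run in psql as the postgres superuser.
CREATE ROLE ignition WITH LOGIN PASSWORD 'pick-a-strong-password';
CREATE DATABASE tag_tracking OWNER ignition;

Then use that username, password and database name in the gateway connection in step 2.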

Ok, so I downloaded PostgreSQL, set it up in Ignition, connected it to my device, and have a table in it where the raw data is going. I set up a separate table that the parsed data would go into, but I'm having trouble with that part. I don't know how to set it up to run a script or query to extract and parse the data.

Here's what I have so far, but I don't think it's working...

INSERT INTO parsed_data (tag_id, sector, time_stamp)
SELECT 
  split_part(split_part(split_part(data, 'Tag ID: ', 2), ' was identified', 1), ':', 2) AS tag_id,
  trim(split_part(split_part(data, 'sector ', 2), ' at', 1)) AS sector,
  trim(split_part(split_part(data, 'at ', 2), 'Tag ID:', 1)) AS time_stamp
FROM raw_data
WHERE 
  split_part(split_part(split_part(data, 'Tag ID: ', 2), ' was identified', 1), ':', 2) <> '' AND
  split_part(split_part(data, 'sector ', 2), ' at', 1) <> '' AND
  split_part(split_part(data, 'at ', 2), 'Tag ID:', 1) <> '';

And the raw data looks like this...

Tag ID: 0 was identified in sector purple at 10:24 AM, May 29, 2024

Tag ID: 1 was identified in sector purple at 10:24 AM, May 29, 2024

Tag ID: 2 was identified in sector purple at 10:24 AM, May 29, 2024

Tag ID: 3 was identified, but it is not in any sector at 10:24 AM, May 29, 2024

Tag ID: 24 was identified in sector purple at 10:24 AM, May 29, 2024

Tag ID: 25 was identified in sector purple at 10:24 AM, May 29, 2024

I don't think how I'm doing it is going to work, because it's looking for one instance of Tag ID or Sector when there will be dozens. I almost need it to read through the string and separate the records out one at a time. And most importantly, it needs to keep the relevant data together; I can't have a tag being misidentified.

Any suggestions?

If you are using the DB to do the work, you can only use SQL syntax.

If you query into Ignition with one statement, process it, then insert into the DB with a separate statement, you can use jython syntax.

The latter is typical if there are two separate databases.

I don't fully understand your comment. I'm thinking it would be easier to have two separate tables: one where the device dumps raw data, and one that has the organized information. The plan is then to point the customer-facing search at the organized table, making it easier to search. They would just enter the Tag ID number, and it pops up the sector and timestamp. No need to parse the raw data table every time.

Are you allowed to create a new table in the database with the existing records? The solution depends on the answer to this.

I don't see why not? Am I not supposed to be able to, or is that not normal?

Right, I got that. Good plan.

But when you use Insert Into someTable (...) Select ... From otherTable, that's a single SQL statement where all of the data transfer is done in the database. All that Ignition will see is a row count as a result. All of your parsing has to use SQL syntax and functions in this case, and both tables have to be reachable from the one database.

With Ignition, you can break this into three steps:

  • Select ... From otherTable to get fresh data into Ignition,

  • Loop through the result, parsing with jython and assembling a list of row data for insert,

  • Loop through the parsed list using Insert Into someTable (...) Values (...), perhaps optimizing with multi-row inserts.

When Ignition is in the middle like this, someTable and otherTable can be in different databases.

In both approaches, you have to use a WHERE clause of some kind to avoid doubling up any rows you've already processed.

Also, you probably don't want to end up with a single string. Make columns in otherTable to store filterable information (like TagId) separately, preferably in an efficient form (not strings).
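
Roughly, in a gateway script, something like this (table, column, and connection names are made up, and parseMessage is a placeholder for whatever jython parsing you use):

rows = system.db.runPrepQuery(
	"SELECT id, data FROM otherTable WHERE processed = 0", [], "SourceDB")

rowData = []
for row in rows:
	# parseMessage stands in for your own jython parsing; assume it returns
	# a list of (tagId, sector, timeStamp) tuples, one per record in the string.
	rowData.extend(parseMessage(row["data"]))

for tagId, sector, timeStamp in rowData:
	system.db.runPrepUpdate(
		"INSERT INTO someTable (tag_id, sector, time_stamp) VALUES (?, ?, ?)",
		[tagId, sector, timeStamp], "TargetDB")

The hypothetical "processed" column is one way to satisfy the WHERE-clause requirement above.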

My question was whether you have user permissions to modify the database. (If it is a database belonging to some proprietary system then you might not.)
It sounds like you have access rights.

If you can stay in a single database, another option would be to define an Insert Trigger for someTable that parses and inserts to otherTable on the fly. Then Ignition only has to query otherTable. The data will show up there without Ignition doing anything.

This is particularly handy as you don't need to worry about previously parsed rows. Insert triggers are handed just the new ones.

Phil, the trigger operation is a little tricky. All the sample data in post #8 could be on one row, so the trigger would have to loop through the row looking for the start of each record ("Tag ID: "), parse from there, and assemble the rows to post to the new table. Can PostgreSQL do that?

PostgreSQL triggers are fully procedural--local variables allowed, et cetera.
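
For example, something along these lines (a sketch only: it assumes raw_data(data) and parsed_data(tag_id, sector, time_stamp) as table layouts, the sample format from post #8, and PostgreSQL 11+ for EXECUTE FUNCTION):

CREATE OR REPLACE FUNCTION parse_raw_data() RETURNS trigger AS $$
DECLARE
    rec text[];
BEGIN
    -- Walk every "Tag ID: ..." record in the inserted string; capture
    -- group 2 is NULL for the "not in any sector" variant.
    FOR rec IN
        SELECT regexp_matches(
            NEW.data,
            'Tag ID: (\d+) was identified(?: in sector (\w+)|, but it is not in any sector) at (\d{1,2}:\d{2} [AP]M, \w+ \d{1,2}, \d{4})',
            'g')
    LOOP
        INSERT INTO parsed_data (tag_id, sector, time_stamp)
        VALUES (rec[1]::int,
                coalesce(rec[2], 'N/A'),
                to_timestamp(rec[3], 'HH12:MI AM, Mon DD, YYYY'));
    END LOOP;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER raw_data_parse
AFTER INSERT ON raw_data
FOR EACH ROW EXECUTE FUNCTION parse_raw_data();

Any record the regex doesn't match is silently skipped, so test it against real messages first.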

I tend to be more of a scripting guy, as it's easier for me to troubleshoot, and there are a lot of splits going on.

Parsing example:

import re

message = """Tag ID: 0 was identified in sector purple at 10:24 AM, May 29, 2024

Tag ID: 1 was identified in sector purple at 10:24 AM, May 29, 2024

Tag ID: 2 was identified in sector purple at 10:24 AM, May 29, 2024

Tag ID: 3 was identified, but it is not in any sector at 10:24 AM, May 29, 2024

Tag ID: 24 was identified in sector purple at 10:24 AM, May 29, 2024

Tag ID: 25 was identified in sector purple at 10:24 AM, May 29, 2024"""

for row in message.split('\n\n'):
	# Split each record into [tag part, sector part, timestamp part].
	newRow = re.split(' was identified| at ', row)

	# The tag ID is the last word of the first part.
	newRow[0] = int(newRow[0].split(' ')[-1])

	# The sector name is the last word of the middle part, unless the tag
	# wasn't in any sector, in which case that last word is literally 'sector'.
	sector = newRow[1].split(' ')[-1]
	if sector == 'sector':
		newRow[1] = 'N/A'
	else:
		newRow[1] = sector

	# Note: 'dd' is day-of-month; 'DD' would be day-of-year and gives wrong dates.
	newRow[2] = system.date.parse(newRow[2], 'h:mm a, MMM dd, yyyy')

	print newRow

Output:

[0, 'purple', Wed May 29 10:24:00 EDT 2024]
[1, 'purple', Wed May 29 10:24:00 EDT 2024]
[2, 'purple', Wed May 29 10:24:00 EDT 2024]
[3, 'N/A', Wed May 29 10:24:00 EDT 2024]
[24, 'purple', Wed May 29 10:24:00 EDT 2024]
[25, 'purple', Wed May 29 10:24:00 EDT 2024]

To expand on Phil's Ignition-in-the-middle method, I would (well, have...)

  • Add a column to the raw table to hold a flag for rows that have been processed (e.g. 'ack' or 'parsed' or whatever).

  • Query up to a limited number of rows from the raw table, as we don't know how many will need to be processed.

  • Parse and insert into the parsed table.

  • Set ack to 1 in the raw table where the row index is in the dataset you just queried.

  • Trigger the script from a gateway timer event.
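
Sketched out, with assumed names throughout (raw_data(id, data, ack), parsed_data(tag_id, sector, time_stamp), and an Ignition database connection called "MyDB"), the timer script might look like:

import re

BATCH = 50  # cap rows per pass so the timer event stays quick

rows = system.db.runPrepQuery(
	"SELECT id, data FROM raw_data WHERE ack = 0 ORDER BY id LIMIT ?",
	[BATCH], "MyDB")

for row in rows:
	# Each raw row may hold many records; split on the start of each one.
	for rec in row["data"].split("Tag ID: ")[1:]:
		parts = re.split(" was identified| at ", "Tag ID: " + rec.strip())
		tagId = int(parts[0].split(" ")[-1])
		sector = parts[1].split(" ")[-1]
		if sector == "sector":  # the "not in any sector" variant
			sector = "N/A"
		timeStamp = system.date.parse(parts[2], "h:mm a, MMM dd, yyyy")
		system.db.runPrepUpdate(
			"INSERT INTO parsed_data (tag_id, sector, time_stamp) VALUES (?, ?, ?)",
			[tagId, sector, timeStamp], "MyDB")
	# Flag the raw row so the next pass skips it.
	system.db.runPrepUpdate(
		"UPDATE raw_data SET ack = 1 WHERE id = ?", [row["id"]], "MyDB")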

Of course, my opinion and 5 dollars will get you a cup of coffee... :wink:


Where would I put this code, or how would I execute it? I don't mean to come off as helpless, but I sort of am. Would I put this in the same table as raw_data, or create a new table and pull from the raw_data table?

Could you give an example or go into greater detail? If we use raw_data and parsed_data as my table names, what would that look like? While I'm familiar with loops, I don't know how to make a loop in this instance. Where does it go? Is it triggered? Is it in the DB query, or a scripting option? I'm imagining there is something I can have running in the background constantly parsing data from raw_data to parsed_data, but I don't know how to do that. I'm thinking it's something in the Gateway Events, but I don't know how to start it, stop it, etc. Thank you.