[BUG 6926] Ignition 8.1.23 - writeAsync with remote tags doesn't works anymore when using custom Audit Log module

mazeyrat · December 14, 2022, 3:27pm

@PGriffith
We have a possible regression since we upgraded to Ignition 8.1.23,

when we write to remote tag with system.tag.writeAsync, the tag engine seems to be blocked after it.

We restart the gateway (the one where the writeAsync was executed)
When we write to remote tag with system.tag.writeBlocking, all is fine...

Nota : Tag is a memory we no access restriction.

The issue doesn't seems to be related of the dataype of tags (reproduced with float, dataset, document)

Kevin.Herron · December 14, 2022, 3:48pm

Is this reliably reproducible?

mazeyrat · December 14, 2022, 4:07pm

Yes, I believe. We have made the test 2 or 3 times, restarting the front gateway.
We tested with different tags and it seems to be reproductible.
There is no issue with local tags.

avaughn · December 14, 2022, 4:11pm

I wasn't able to reproduce this with a pair of fresh 8.1.23 gateways. It's possible there's something in your setup that I'm missing in my replication attempt though. Reaching out to support so they can take a look at your system would probably be a good idea.

Kevin.Herron · December 14, 2022, 4:12pm

Do you have auditing configured?

mazeyrat · December 14, 2022, 4:15pm

Yes
I will try to disable it

ggross · December 14, 2022, 4:24pm

I am also unable to replicate with what I have been trying, including testing with auditing on and off for a project (remote and local auditing profiles).

Some other questions:

Are all gateways on 8.1.23? If not, which Gateway has 8.1.23 (one with the tag, or the one making the call?) and what is the version of Ignition running on the other Gateway?
Where is the system function being called from? Script Console, Perspective, Vision, Gateway Event Script?

Thanks,
Garth

Kevin.Herron · December 14, 2022, 4:25pm

A thread dump after everything is locked up might be useful.

mazeyrat · December 14, 2022, 4:32pm

I don't have the issue when I disable the audit.
All the gateway are in 8.1.23.
The script is executed by a Perspective message handler of a view.

I suspect something relative to the audit.

We use a custom audit module build by Travis initially for 7.9 to support audit splitting to multiple destination.

We have purchased the source of this module and we have adapted it for 8.0 and 8.1.
I will search if the issue could be on our side and try to reproduce the issue installing IA audit module.

mazeyrat · December 14, 2022, 4:54pm

the thread dump right after the issue:

Kevin.Herron · December 14, 2022, 4:59pm

There's a thread blocked in what looks like your custom audit log splitting module:

    {
      "name": "pool-3-thread-1",
      "id": 177,
      "state": "WAITING",
      "daemon": false,
      "system": "None",
      "scope": "Gateway",
      "cpuUsage": 0,
      "lockedSynchronizers": [
        "java.util.concurrent.ThreadPoolExecutor$Worker@34d1a746"
      ],
      "waitingFor": {
        "lock": "java.util.concurrent.CompletableFuture$Signaller@219bc82c"
      },
      "stacktrace": [
        "java.base@11.0.16.1/jdk.internal.misc.Unsafe.park(Native Method)",
        "java.base@11.0.16.1/java.util.concurrent.locks.LockSupport.park(Unknown Source)",
        "java.base@11.0.16.1/java.util.concurrent.CompletableFuture$Signaller.block(Unknown Source)",
        "java.base@11.0.16.1/java.util.concurrent.ForkJoinPool.managedBlock(Unknown Source)",
        "java.base@11.0.16.1/java.util.concurrent.CompletableFuture.waitingGet(Unknown Source)",
        "java.base@11.0.16.1/java.util.concurrent.CompletableFuture.get(Unknown Source)",
        "com.bouyguesenergiesservices.ignition.gateway.audit.AuditLogSplitter.createActionTargetEquipement(AuditLogSplitter.java:457)",
        "com.bouyguesenergiesservices.ignition.gateway.audit.AuditLogSplitter._audit(AuditLogSplitter.java:735)",
        "com.bouyguesenergiesservices.ignition.gateway.audit.AuditLogSplitter.audit(AuditLogSplitter.java:352)",
        "app//com.inductiveautomation.ignition.gateway.script.GatewayTagUtilities.lambda$audit$10(GatewayTagUtilities.java:698)",
        "app//com.inductiveautomation.ignition.gateway.script.GatewayTagUtilities$$Lambda$4812/0x000000080198ac40.accept(Unknown Source)",
        "java.base@11.0.16.1/java.util.stream.Streams$RangeIntSpliterator.forEachRemaining(Unknown Source)",
        "java.base@11.0.16.1/java.util.stream.IntPipeline$Head.forEach(Unknown Source)",
        "app//com.inductiveautomation.ignition.gateway.script.GatewayTagUtilities.audit(GatewayTagUtilities.java:689)",
        "app//com.inductiveautomation.ignition.gateway.script.GatewayTagUtilities.lambda$writeAsyncImpl$3(GatewayTagUtilities.java:257)",
        "app//com.inductiveautomation.ignition.gateway.script.GatewayTagUtilities$$Lambda$4670/0x0000000801a11440.accept(Unknown Source)",
        "java.base@11.0.16.1/java.util.concurrent.CompletableFuture.uniWhenComplete(Unknown Source)",
        "java.base@11.0.16.1/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(Unknown Source)",
        "java.base@11.0.16.1/java.util.concurrent.CompletableFuture.postComplete(Unknown Source)",
        "java.base@11.0.16.1/java.util.concurrent.CompletableFuture$AsyncSupply.run(Unknown Source)",
        "java.base@11.0.16.1/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)",
        "java.base@11.0.16.1/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)",
        "java.base@11.0.16.1/java.lang.Thread.run(Unknown Source)"
      ]
    }

Does this remain blocked indefinitely?

mazeyrat · December 14, 2022, 5:08pm

I have disabled the audit splitter module and replace with IA audit module and I don't reproduce this issue.

I will investigate and active more logs to see what is wrong with the audit splitter module.

What is strange is that if I change my script from:

system.tag.writeBlocking([tagPath],[value])|
system.tag.writeAsync([tagPath],[value])|

I don't have the issue with my audit splitter module.
Is there something that can explain this behavior ?

AuditLogSplitter.java:457

CompletableFuture<List<QualifiedValue>> future = this.context.getTagManager().readAsync(readTags,SecurityContext.systemContext());
List<QualifiedValue> results = future.get();

I read some tags information to complete audit data.

Kevin.Herron · December 14, 2022, 5:18pm

So, I'm pretty sure I understand why this is happening.

First, 8.1.23 now actually audits writeAsync calls that happen from Perspective.

Since this is an async call, the audit is happening in the callback from TagProvider::writeAsync. For a remote provider, there's a single thread callbacks are executing on. Your audit logic is now being reached inside the callback while on that single thread.

You're then doing a blocking read inside that callback, presumably for a tag on the same remote provider, but that single thread is currently busy and you've essentially just live-locked yourself.

This doesn't happen with the writeBlocking call because the audit call happens after TagProvider::writeAsync has returned results, not inside the callback that it completed.

We'll need to discuss internally what should change, if anything, but for now you'll have to somehow not block on a result from the same provider in your audit logic.

Kevin.Herron · December 14, 2022, 5:27pm

Yeah, I think we'll be fixing this. It's essentially the same problem fixed for standard tag providers back in 8.1.13, under the release note:

Fixed a bug that lead to timeout when calling system.tag.readBlocking inside the callback for system.tag.readAsync.

and you can reproduce this without any audit logic at all, custom or not, by doing a system.tag.readBlocking for a tag in the same remote provider inside the async callback for the system.tag.writeAsync to a tag in that remote provider.

mazeyrat · December 14, 2022, 6:23pm

Thanks a lot for these detailed explanation and for pointing me to the root cause of this issue.
I will try to decouple and call readBlocking ouside the writeAsync context.

mazeyrat · December 14, 2022, 6:38pm

public class AuditLogSplitter implements AuditProfile {

public void audit(AuditRecord record)
	call _audit

...

private void _audit(Datasource ds, AuditRecord record, int retentionDays, int actionValueMaxSize) {
        try {

            if (logger.isDebugEnabled()) {
                logger.debug("Logging event [" + record + "] to [" + ds.getName() + "] with query: ["
                        + insertQuery + "]");
            }

            // Ctrl actionValue size:
            String actionValue = record.getActionValue();

            if (actionValue != null){
                if (actionValue.length() > actionValueMaxSize){
                    actionValue = actionValue.substring(0,actionValueMaxSize);
                    logger.debug("actionValue column with length = {} truncated to actionValueMaxSize = {}",actionValue.length(),actionValueMaxSize);
                }
            }

            context.getHistoryManager()
                    .storeHistory(ds.getName(), new AuditRecordSFData(ds.getName(), insertQuery,
                            record.getTimestamp(), record.getOriginatingSystem(),
                            record.getOriginatingContext(), record.getActor(), record.getActorHost(),
                            record.getAction(),
                            record.getActionTarget(),
                            createActionTargetEquipement(record),
                            createActionTargetDesignation(record),
                            actionValue, truncateActionValueDetail(createActionValueDetail(record)), record.getStatusCode()));

            long now = System.currentTimeMillis();
            if (now - lastRetentionClear > TIME_BETWEEN_RETENTION_CLEARS) {
                lastRetentionClear = now;

                SRConnection con = ds.getConnection();
                try {
                    int affected = con.runPrepUpdate(deleteQuery, new Date(now - (retentionDays * MILLIS_PER_DAY)));
                    if (affected > 0 && logger.isDebugEnabled()) {
                        logger.debug("Deleted " + affected + " old audit events from [" + ds.getName() + "]");
                    }
                } finally {
                    try {
                        con.close();
                    } catch (SQLException ignored) {
                    }
                }
            }
        } catch (Exception ex) {
            logger.error("Error auditing [" + record + "] to [" + ds.getName() + "], due to underlying exception.", ex);
        }
    }

createActionTargetEquipement(record)
createActionTargetDesignation(record)

called in storeHistory use readblocking

Perhaps I can put this _audit code in:

CompletableFuture<U> supplyAsync(Supplier<U> supplier, Executor executor)

with a dediacted executor ?

@Kevin.Herron do you think it could solve my issue ?

mazeyrat · December 15, 2022, 10:53am

I finally added a ConcurrentLinkedQueue to stack the audit records and process them outside the audit function called by writeAsync.