Configure Wrapper Log for single-line messages

joshua.rowlison · April 30, 2021, 1:22pm

I’m looking to find out if this problem has been encountered and has a known solution, but it’s hard to search for given the common keywords.

We’ve set up a log forwarding whereby we essentially tail the log file and send that off to a server for storage etc.

The problem arises when stack traces for errors are broken across multiple lines and other properties are repeated for each line. As a result tailing the file results in 10+ messages per entry, the vast majority of which are a single line from the middle of a stack trace. This creates a lot of overhead, makes the logs impossible to parse effectively and just generally sucks.

Intuitively I expect that the underlying Java is sending messages individually, and I simply need to find a way to configure the Logback and Log4J I see referenced in the conf directory to remove newlines or replace them with something suitable.

My hope is that someone here has seen this before and can offer a solution before I have to dive into Logback and try to quickly learn how to stem the tide before the team receiving the logs cut me off.

paul-griffith · April 30, 2021, 4:41pm

Looks like you might be able to construct logback filters to do so:

joshua.rowlison · May 4, 2021, 7:44pm

I believe you are correct, however we replaced
<pattern>%.-1p [%-30c{1}] [%d{HH:mm:ss,SSS}]: %m %X%n</pattern>

with

<pattern>%replace(%m){'[\r\n]+', ''}%nopex%n</pattern>

In logback.xml and it did not have the desired effect.
Nothing I’ve done actually seems to change the logs at all, which makes me doubt that I understand the system in place here.

kcollins1 · May 4, 2021, 9:28pm

What are you using to tail the logs and then forward to your logging service? Promtail, as an example, has a multiline transform stage that is designed to aggregate multi-line output (such as stack traces) into single messages.

kcollins1 · May 4, 2021, 10:09pm

I’ll also add a bit of other information for folks wanting to further understand some of the current logging configuration within Ignition. Logback is where logs are originally emitted to stdout (Console) and to logs/system_logs.idb. The Java Service Wrapper is what consumes the stdout from Logback and then outputs to logs/wrapper.log. The format for the logs/wrapper.log can be customized by looking at the wrapper.logfile.* logging configuration properties that you can configure in data/ignition.conf. The wrapper also emits its own buffered console output, it isn’t ideal to consume from this for a log forwarder due to how it buffers the output (you might not get the most recent messages right away, especially if log frequency is sparse). In our Docker image, we actually disable the wrapper console logging and redirect logs/wrapper.log → /dev/stdout directly. We also customize the default output via wrapper.logfile.format="PTM".

There are also additional Appenders that you can add to the data/logback.xml configuration to further customize how log output is produced.

joshua.rowlison · May 5, 2021, 12:30pm

We’re using NXLog. I haven’t found a way to re-condense the log files once they’re in the log file. especially not with the timestamps etc being inserted every line.

The issue I’ve been having is that I make changes to the appender pattern and they are not reflected in the log file. I believe altering SysOutAppender should allow me to change the format of wrapper.log, such as removing new lines from exceptions. I’ve referenced %m and %ex at different times with no results. I’m about to try another change but I’m not sure how I’d even do it other than trial and error.

joshua.rowlison · May 5, 2021, 2:23pm

Do you happen to know if anything other than logback is feeding the wrapper file? Or alternatively, if anything other than %m and %ex are are needed in the filter?

I used
<pattern>X %replace(%ex%m){'[\r\n]+', ''}%nopex Z</pattern>

To try and identify if my pattern was even applying and while there are lines that have the literals and are compressed to one line, the majority of the log file is still unaltered stack traces, as if my pattern was never applie to them in the first place.

kcollins1 · May 5, 2021, 2:28pm

I think you’d probably have better luck in your situation with one of two options:

Go straight to file with Logback and a FileAppender (bypassing the extra step of processing done by the Java wrapper). Then tail this file with NXLog.
Add a SyslogAppender to Logback and point it at a reconfigured NXLog that is listening for Syslog messages on the input side (instead of tailing the logs). This might be helpful for mitigating the multi-line aspects, as each log entry (with stack-trace, etc) will already be packaged.

It sounds like the formatting being done by the Java wrapper (which is consuming the sysout of Logback and then doing its own writing of that content out to the wrapper.log) is probably getting in your way.

joshua.rowlison · May 5, 2021, 2:32pm

That makes sense, I suppose. If I can get away from the pipe-delimiting in the wrapper I’ll have an easier time breaking it down in Loggly later anyway.

Just gotta wrap my head around Logback farther than I was planning today.

kcollins1 · May 5, 2021, 2:38pm

Below is an example Logback file that I’m using to achieve a similar function:

<?xml version="1.0" encoding="UTF-8"?>

<!-- For assistance related to logback-translator or configuration  -->
<!-- files in general, please contact the logback user mailing list -->
<!-- at http://www.qos.ch/mailman/listinfo/logback-user             -->
<!--                                                                -->
<!-- For professional support please see                            -->
<!--    http://www.qos.ch/shop/products/professionalSupport         -->
<!--                                                                -->
<configuration debug="true">
  <appender name="SysoutAppender" class="ch.qos.logback.core.ConsoleAppender">
    <encoder>
      <pattern>ignition %date{"yyyy-MM-dd HH:mm:ss.SSS",UTC}Z %level {%thread} [%logger] - %message%n</pattern>
    </encoder>
  </appender>
  <appender name="rsyslog" class="ch.qos.logback.classic.net.SyslogAppender">
    <syslogHost>rsyslog</syslogHost>
    <port>514</port>
    <facility>SYSLOG</facility>
    <suffixPattern>ignition %date{"yyyy-MM-dd",UTC}T%date{"HH:mm:ss.SSS",UTC}Z %level {%thread} [%logger] - %message%ex{full}</suffixPattern>
    <stackTracePattern>  </stackTracePattern>
  </appender>
  <appender name="DB" class="com.inductiveautomation.logging.SQLiteAppender">
    <dir>logs</dir>
    <!--
      Maintenance Settings
      entryLimit: The maximum number of entries in the database. However, at any given time, there may be more than this number, due to how cleanup works.
      maxEventsPerMaintenance: The number of event that can happen before a maintenance cycle occurs.
      minTimeBetweenMaintenance: The minimum time (max frequency) between maintenance events. Takes precedent over max events.
      vacuumFrequency: The number of maintenance cycles before a "vacuum" is performed, to recover disk space.

      On disk, most log events are between 600-800 bytes.
    <entryLimit>50000</entryLimit>
    <maxEventsPerMaintenance>5000</maxEventsPerMaintenance>
    <minTimeBetweenMaintenance>60000</minTimeBetweenMaintenance>
    <vacuumFrequency>3</vacuumFrequency>
    -->
  </appender>
  <appender name="SysoutAsync" class="ch.qos.logback.classic.AsyncAppender" queueSize="1000" discardingThreshold="0">
    <appender-ref ref="SysoutAppender" />
  </appender>
  <appender name="rsyslogAsync" class="ch.qos.logback.classic.AsyncAppender" queueSize="1000" discardingThreshold="0">
    <appender-ref ref="rsyslog" />
  </appender>
  <appender name="DBAsync" class="ch.qos.logback.classic.AsyncAppender" queueSize="100000" discardingThreshold="0">
    <appender-ref ref="DB" />
  </appender>
  <root level="INFO">
    <appender-ref ref="SysoutAsync"/>
    <appender-ref ref="rsyslogAsync" />
    <appender-ref ref="DBAsync"/>
  </root>
</configuration>

In the example above I’m using rsyslog as a UDP syslog receiver and TCP syslog forwarder (to promtail). It looks like NXLog might be able to receive UDP syslog directly…

EDIT: you can ignore the SysoutAppender config above, I was just using that to debug the format before I got the rsyslog one working.

joshua.rowlison · May 5, 2021, 3:36pm

I didn’t realize logback could send to network directly, I could possibly drop nxlog and go straight to the external server this way.

joshua.rowlison · May 5, 2021, 6:36pm

I’ve actually pretty nearly gotten this working, we are able to forward logs directly to our external setup bypassing any log file at all. The format shows up exactly as desired.

Unfortunately it appears as if somehow data is being sent to wrapper logs that doesn’t come through logback.

My wrapper logs are rapidly churning with exceptions coming from device timeouts and the like, I can’t get any such thing to appear in the logs processed by my syslog appender, but I don’t see any sort of filter or setting that would cause me to be excluding such traffic. It feels like I’m an inch away from this, I just need to work out where the rest of these exceptions are coming from.

kcollins1 · May 5, 2021, 6:42pm

Do you get the first line of the associated error message (which are then followed by a stack trace that you're seeing in the wrapper.log)? Or nothing from those particular errors? What is the pattern layout that you're using for your syslog appender?

joshua.rowlison · May 5, 2021, 6:57pm

We’re forwarding to Loggly, and after it starts up it fires off a bunch of non-exception entries and then it’s just silent.

The entire appender block appears below

<appender name="LogstashAppender" class="ch.qos.logback.classic.net.SyslogAppender">
	<syslogHost>logstash</syslogHost>
	<port>514</port>
	<facility>syslog</facility>
	<suffixPattern>date=%date,level=%level,message=%message,exception=%exception</suffixPattern>
	<stackTracePattern>  </stackTracePattern>
  </appender>

joshua.rowlison · May 5, 2021, 7:02pm

I actually set up on my test server, a 5 second script that just tries to access a missing attribute on a variable to generate known traffic. That is my canary currently because I know it’s happening, I can see it in the wrapper logs.

EDIT: And upon inspection I see that those get squashed automatically. Guess that won’t help.

joshua.rowlison · May 5, 2021, 7:18pm

Continuing the investigation, I was able to cause a tag script to generate an error that looks like this in the wrapper log.

INFO   | 2021/05/05 15:12:35 | com.inductiveautomation.ignition.common.script.JythonExecException: Traceback (most recent call last):
INFO   | 2021/05/05 15:12:35 |   File "<tagevent:valueChanged>", line 6, in valueChanged
INFO   | 2021/05/05 15:12:35 | AttributeError: 'int' object has no attribute 'name'
INFO   | 2021/05/05 15:12:35 | 
INFO   | 2021/05/05 15:12:35 | 	at org.python.core.Py.AttributeError(Py.java:207)
INFO   | 2021/05/05 15:12:35 | 	at org.python.core.PyObject.noAttributeError(PyObject.java:1032)
INFO   | 2021/05/05 15:12:35 | 	at org.python.core.PyObject.__getattr__(PyObject.java:1027)
INFO   | 2021/05/05 15:12:35 | 	at org.python.pycode._pyx5.valueChanged$1(<tagevent:valueChanged>:6)
INFO   | 2021/05/05 15:12:35 | 	at org.python.pycode._pyx5.call_function(<tagevent:valueChanged>)
INFO   | 2021/05/05 15:12:35 | 	at org.python.core.PyTableCode.call(PyTableCode.java:171)
INFO   | 2021/05/05 15:12:35 | 	at org.python.core.PyBaseCode.call(PyBaseCode.java:308)
INFO   | 2021/05/05 15:12:35 | 	at org.python.core.PyFunction.function___call__(PyFunction.java:471)
INFO   | 2021/05/05 15:12:35 | 	at org.python.core.PyFunction.__call__(PyFunction.java:466)
INFO   | 2021/05/05 15:12:35 | 	at org.python.core.PyFunction.__call__(PyFunction.java:461)
INFO   | 2021/05/05 15:12:35 | 	at com.inductiveautomation.ignition.common.script.ScriptManager.runFunction(ScriptManager.java:821)
INFO   | 2021/05/05 15:12:35 | 	at com.inductiveautomation.ignition.common.script.ScriptManager.runFunction(ScriptManager.java:805)
INFO   | 2021/05/05 15:12:35 | 	at com.inductiveautomation.ignition.gateway.project.ProjectScriptLifecycle$TrackingProjectScriptManager.runFunction(ProjectScriptLifecycle.java:687)
INFO   | 2021/05/05 15:12:35 | 	at com.inductiveautomation.ignition.gateway.tags.scripting.TagScriptManagerImpl$FunctionInvokerImpl.run(TagScriptManagerImpl.java:527)
INFO   | 2021/05/05 15:12:35 | 	at com.inductiveautomation.ignition.gateway.tags.scripting.events.AbstractTagScript.invoke(AbstractTagScript.java:34)
INFO   | 2021/05/05 15:12:35 | 	at com.inductiveautomation.ignition.gateway.tags.scripting.TagScriptManagerImpl$Task.invoke(TagScriptManagerImpl.java:476)
INFO   | 2021/05/05 15:12:35 | 	at com.inductiveautomation.ignition.gateway.tags.scripting.TagScriptManagerImpl$TagScriptDispatcher.run(TagScriptManagerImpl.java:439)
INFO   | 2021/05/05 15:12:35 | 	at com.inductiveautomation.ignition.common.execution.impl.BasicExecutionEngine$ThrowableCatchingRunnable.run(BasicExecutionEngine.java:518)
INFO   | 2021/05/05 15:12:35 | 	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
INFO   | 2021/05/05 15:12:35 | 	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
INFO   | 2021/05/05 15:12:35 | 	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
INFO   | 2021/05/05 15:12:35 | 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
INFO   | 2021/05/05 15:12:35 | 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
INFO   | 2021/05/05 15:12:35 | 	at java.base/java.lang.Thread.run(Unknown Source)
INFO   | 2021/05/05 15:12:35 | Caused by: org.python.core.PyException: Traceback (most recent call last):
INFO   | 2021/05/05 15:12:35 |   File "<tagevent:valueChanged>", line 6, in valueChanged
INFO   | 2021/05/05 15:12:35 | AttributeError: 'int' object has no attribute 'name'
INFO   | 2021/05/05 15:12:35 | 
INFO   | 2021/05/05 15:12:35 | 	... 24 common frames omitted

In Loggly on the other end, I find it to be broken out line-by-line, ignoring the pattern entirely.

kcollins1 · August 26, 2021, 1:44pm

EDIT: I came back to this thread by-reference months later and the reply below was sitting here unsubmitted (nice work, Discourse )… I do remember spinning up a Loggly trial and getting some logs flowing in–not sure why I didn’t submit this reply. I’m going to go ahead and do it now with the disclaimer that there might have been some further follow-up I was hoping to do… Alas, here it is, fwiw:

It looks like you’re using Logstash to receive Syslog from Logback? I was able to get Logback->rsyslogd->Loggly to get mostly there using this Logback appender:

  <appender name="rsyslog" class="ch.qos.logback.classic.net.SyslogAppender">
    <syslogHost>rsyslog</syslogHost>
    <port>514</port>
    <facility>SYSLOG</facility>
    <suffixPattern>ignition %d{ISO8601,UTC} %p %t %c %M - %m%ex{full}</suffixPattern>
    <stackTracePattern>  </stackTracePattern>
  </appender>

… and this rsyslogd configuration to receive the UDP syslog traffic locally and then forward via TCP w/TLS to Loggly:

# run UDP syslog on syslog port 514
module(load="imudp")
input(type="imudp" port="514")

# ref: https://www.rsyslog.com/doc/v8-stable/configuration/input_directives/rsconf1_escapecontrolcharactersonreceive.html
$EscapeControlCharactersOnReceive off

# Setup disk assisted queues
$WorkDirectory /var/spool/rsyslog # where to place spool files
$ActionQueueFileName fwdRule1   # unique name prefix for spool files
$ActionQueueMaxDiskSpace 1g    # 1gb space limit (use as much as possible)
$ActionQueueSaveOnShutdown on   # save messages to disk on shutdown
$ActionQueueType LinkedList    # run asynchronously
$ActionResumeRetryCount -1    # infinite retries if host is down

#RsyslogGnuTLS
$DefaultNetstreamDriverCAFile /etc/ssl/certs/ca-certificates.crt

template(name="LogglyFormat" type="string"
string="<%pri%>%protocol-version% %timestamp:::date-rfc3339% %HOSTNAME% %app-name% %procid% %msgid% [xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx@41058 tag=\"RsyslogTLS\"] %msg%\n"
)

# Send messages to Loggly over TCP using the template.
action(type="omfwd" protocol="tcp" target="logs-01.loggly.com" port="6514" template="LogglyFormat" StreamDriver="gtls" StreamDriverMode="1" StreamDriverAuthMode="x509/name" StreamDriverPermittedPeers="*.loggly.com")

The above was largely adapted from their online documentation.

joshua.rowlison · August 26, 2021, 2:05pm

I appreciate you sharing this.

I’m able to get logs over pretty reliably, it’s more about getting control of the logs that come through and forcing them into a reliable format so I can get Loggly to break them down for me.

I think one of my issues at the time may have been failure to filter my prod servers out of the loggly traffic I was watching. It’s a bit of a cross-team effort and I wasn’t up-to-speed on the handling of data on the Logstash server at the time, so I didn’t realize it was mixed. I’ve since corrected this.

I’ve got the following as my LogStashAppender, IP redacted obviously.

 <appender name="LogstashAppender" class="ch.qos.logback.classic.net.SyslogAppender">
	<syslogHost>1.2.3.4</syslogHost>
	<port>514</port>
	<facility>syslog</facility>
	<suffixPattern>SUFF: date=%date,level=%level,message=%message,exception=%exception XX </suffixPattern>
	<stackTracePattern>STACK: date=%date,level=%level,message=%message,exception=%exception |~ </stackTracePattern>
  </appender>

I’ve juggled them around adding and removing the stack trace pattern or fiddling with the suffix pattern, but it’s tricky getting my test server to reliably generate an exception so that I can see how it winds up. And since I’m in doubt about my previous approach, I’m in a weird place.

If I can get it to generate real logs, I’ll respond with whatever I get, just for posterity, if nothing else.

joshua.rowlison · August 26, 2021, 7:42pm

<suffixPattern>SUFF:,DATE=%date,LEVEL=%level,MESSAGE=%message,EXCEPTION=%ex{full} END</suffixPattern>
	<stackTracePattern>STACK: END</stackTracePattern>

gives me logs that look like

STMASV-IGNAP01 STACK: ENDat org.python.core.SequenceIndexDelegate.checkIdxAndGetItem(SequenceIndexDelegate.java:61)

when I force an exception
if my stackTracePattern includes any variant of %ex that would prevent the auto-append of the stack trace, the exceptions simply don’t get logged at all.

joshua.rowlison · August 26, 2021, 8:50pm

After setting up a syslog server locally just to be sure the trip through logstash to Loggly wasn’t somehow causing the problem, I’ve discovered that it was causing a problem at least.

When I send the logs locally, I can add %ex or it’s ilk to the stackTracePattern and at least get the stack trace.

However, any time the StackTracePattern appears at all, I get an entry per line of the stack trace, with each line appended to the output of the pattern. If I include %ex%nopex, I get, for example, the entire stack trace appended with line 1 of the stack trace. Then I get the entire stack trace again, appended with line 2 of the stack trace and so on.

Logback-classic seems to have encountered this some time ago
https://jira.qos.ch/browse/LOGBACK-353

I’m trying to figure out if this is even fixable on my end, I just want to include some meaningful bit of stack trace without getting my logs flooded with duplicate atomized traces.