Hello, I’m looking for clarification on sending/returning datasets with the new ProtoBuf rpc interface.
Reference manual is here (thank you @paul-griffith): ModuleRPCFactory to ProtoBuf
Background:
- Migrating existing modules from 8.1 to 8.3
- At least one module is designer-based
- At least one scripting module calls another module
- Perspective is our main focus, but automated tests are in vision at the moment
- Many scripting functions return Dataset
Focusing just on scripting module functions returning Dataset for this topic.
Best practice #5:
In general, avoid sending datasets directly over RPC. Datasets are fundamentally awkward to work with from Java due to their total lack of compile time safety. If you need to return a dataset for backwards compatibility reasons (say, in a scripting function implementation), consider returning a strongly typed list of record objects from your RPC interface, and "adapting" that to a dataset in the local scope. Look at com.inductiveautomation.ignition.common.rpc.impl.SwingRpc.SwingSessionInfo for an example of this pattern.
The scripting-function sample code on github includes a similar comment:
`In this case, we need RPC to retrieve this information from the gateway, but we don't want to deal with an arbitrary dataset over RPC, so instead we use our custom Metadata record class and just adapt that into a DS`
The sample code and guide both reference ProtoRpcSerializer.DEFAULT_INSTANCE, which, judging from the builder and value sections of the guide, does NOT appear to support transferring datasets directly.
- Is this correct or should the fallback GSON converter be able to convert a dataset by default?
- What guidance can you give for avoiding problems with dataset serialization?
- Do any of the sample projects have a dataset being sent over rpc?
You're correct, ProtoRpcSerializer by default deliberately does not support encoding datasets.
It's alluded to in the other guidance you found, but the explicit reason is this:
Datasets are fundamentally incompatible with the 'closed world' model we want to assume as part of the move away from Java serialization towards other technologies like Protobuf.
A dataset can contain any number of columns of any type - there's no way to restrict what types it contains. That means every place you accept datasets on the receiving side you have to "trust" the side channel communication about what classes to construct to "rehydrate" the dataset from whatever serialized form.
Trusting the remote side's serialization information is the classic problem with Java's built in serialization from a security standpoint - the majority of the reason Java serialization is considered insecure is because of how many times a malicious payload has been used to "rehydrate" some random Java class that has a capability that can be exploited.
For anyone not interested in the module dev side who is reading this and is scared of use of Java serialization in Ignition, note that we have also for years now had an automatically applied block list of classes that cannot be recreated from serialization, protecting us from known attacks of this kind even where Java serialization is used.
The best guidance is what you've already quoted - unless you absolutely have to, don't send datasets over RPC. In most cases within Ignition, while we were migrating things over, a scripting function that returned a dataset could be transparently rewritten to rely on an RPC method with a specific known payload shape. Think of something like system.util.getSessionInfo - the 'return shape' is always a dataset with a fixed column count and fixed column types. So the natural representation of that (one that's compatible with RPC) is a Java record composed of those primitive/common column types. That combination is naturally GSON-serializable, so you don't have to do any custom serialization logic. Then you send a list of those "row" objects over RPC, and on the scripting implementation side use DatasetBuilder to turn that list of rows back into a "real" dataset to return from your scripting endpoint.
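A minimal sketch of that record-over-RPC pattern, under stated assumptions: the `SessionRow` record, its column names, and the `toRows` adapter are illustrative names invented for this example, not IA's API. In a real module, `toRows` would instead use Ignition's `DatasetBuilder` (from `com.inductiveautomation.ignition.common.util`) to produce an actual `Dataset`; the plain row-major list here is just a stand-in so the sketch is self-contained.

```java
import java.util.List;

public class SessionRpcSketch {

    // Illustrative RPC payload: one row of getSessionInfo-style data.
    // A record of primitives/Strings is naturally GSON-serializable,
    // so no custom serialization logic is needed on the RPC layer.
    public record SessionRow(String username, String project, int clientCount, boolean designer) {}

    // Fixed column metadata - the "known payload shape" the RPC contract guarantees.
    static final String[] COL_NAMES = {"username", "project", "clientCount", "designer"};

    // Stand-in for the DatasetBuilder step: turn the typed rows back into a
    // row-major table. In an Ignition module you would call
    // DatasetBuilder.newBuilder().colNames(...).colTypes(...), addRow(...) per
    // record, then build(), and return that Dataset from the scripting endpoint.
    static List<Object[]> toRows(List<SessionRow> rows) {
        return rows.stream()
                .map(r -> new Object[] {r.username(), r.project(), r.clientCount(), r.designer()})
                .toList();
    }

    public static void main(String[] args) {
        // Pretend this list just arrived over RPC as a strongly typed payload.
        List<SessionRow> fromRpc = List.of(new SessionRow("admin", "demo", 2, true));
        List<Object[]> table = toRows(fromRpc);
        System.out.println(COL_NAMES.length + " cols, " + table.size() + " row(s)");
    }
}
```

The point of the shape: the RPC interface only ever sees a `List<SessionRow>`, which is compile-time safe; the dataset only exists in the local scope of the scripting implementation.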
The other side of things is the more unfortunate one - what if you do need to return an arbitrary dataset, e.g. from something like system.util.sendMessage or system.db.runQuery?
Well, then your options are either:
- Make a 'best effort' to serialize using something other than Java serialization, accepting that there will be edge cases that cannot be directly serialized.
- Accept the security risk of using Java serialization, ideally for as limited a scope as possible, i.e. a single RPC method or payload object. Take a look at NamedQueryRPC for an example of this - Java-serialized data is "smuggled" inside an outer Protobuf envelope.
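A self-contained sketch of the "smuggling" idea, with assumptions labeled: the `Envelope` record here is a stand-in for the real Protobuf message that NamedQueryRPC uses (only the shape of the idea matches), and the helper names are invented for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Arrays;

public class SmuggledPayloadSketch {

    // Stand-in for a Protobuf envelope with a `bytes` field carrying the
    // opaque Java-serialized payload.
    public record Envelope(String contentType, byte[] payload) {}

    // Serialize an arbitrary object to bytes and wrap it in the envelope.
    static Envelope wrap(Serializable value) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
                out.writeObject(value);
            }
            return new Envelope("application/x-java-serialized-object", buf.toByteArray());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Rehydrate on the receiving side. In real code this is exactly the step
    // you would guard with a serialization filter / class block list.
    static Object unwrap(Envelope env) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(env.payload()))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Envelope env = wrap(new int[] {1, 2, 3});
        int[] roundTripped = (int[]) unwrap(env);
        System.out.println(Arrays.toString(roundTripped)); // [1, 2, 3]
    }
}
```

Keeping the Java-serialized bytes confined to one envelope field is what limits the blast radius: the rest of the RPC traffic stays in the closed-world Protobuf model.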
You can also offer end users a 'switch' to migrate between the two - for the sake of backwards compatibility we added this to 8.3 for system.util.sendMessage/sendRequest - upgrading users default to Java serialization, while new users use a json-over-protobuf approach with fixed type support.
I'll also add that you don't have to use our ProtoRpcSerializer at all. If you choose to, you can completely sidestep our serialization and use ObjectInputStream/ObjectOutputStream directly - you'll have to implement the core RPC interfaces, but there's nothing preventing you from doing this. Ideally you would only do this as a short-term development 'hack' to get working on 8.3.
No, not to my knowledge. Again, that's pretty deliberate.
> The other side of things is the more unfortunate one - what if you do need to return an arbitrary dataset, e.g. from something like system.util.sendMessage or system.db.runQuery? Well, then your options are either: Make a 'best effort' to serialize using something other than Java serialization, accepting that there will be edge cases that cannot be directly serialized.
maybe this is a dumb question, but isn’t the classic solution to this problem ‘send it as a csv assuming the output types are serializable independently, and deserialize it at the other end into a dataset type’?
That's essentially what's done, except CSV would be way too lossy (not to mention inefficient) of a format.
For instance, how do you distinguish between a string and a UUID via CSV? Or a datetime in epoch milliseconds and a long integer? Or even something as basic as a double and a float? Or the string "true" and a literal boolean true?
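A tiny demonstration of that lossiness, assuming nothing beyond the JDK: a naive CSV writer renders distinct Java types to identical text, so the receiving side has no way to know which type to reconstruct. The `csvCell` helper is invented for this example.

```java
import java.util.UUID;

public class CsvAmbiguityDemo {

    // What a naive CSV writer does with every cell: stringify it.
    static String csvCell(Object value) {
        return String.valueOf(value);
    }

    public static void main(String[] args) {
        // A boolean true and the literal string "true" serialize identically...
        System.out.println(csvCell(true).equals(csvCell("true"))); // true

        // ...and so do a UUID and its string form - the type is simply gone.
        UUID id = UUID.randomUUID();
        System.out.println(csvCell(id).equals(csvCell(id.toString()))); // true
    }
}
```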
Ultimately the fundamental "primitive" used for IA's first party RPC data model is this Protobuf class:
```protobuf
message Value {
  // An associated identifier that will be used to pick up custom
  // deserialization logic on the receiving side. If no identifier is
  // supplied, the value is decoded as whatever underlying Java type.
  optional string identifier = 1;

  oneof value {
    bool bool_value = 2;
    sint32 int_value = 3;
    sint64 long_value = 4;
    float float_value = 5;
    double double_value = 6;
    string string_value = 7;
    bytes binary_value = 8;
    ValueCollection collection_value = 9;
  }

  message ValueCollection {
    repeated Value value = 1;
    optional Implementation implementation = 2;

    enum Implementation {
      LIST = 0;
      SET = 1;
      MAP = 2;
    }
  }
}
```
Which, while not perfect, allows a "good enough" representation of fundamental Java types, plus the identifier escape hatch to allow pre-registered tagging. For instance, you send a string plus the identifier "uuid" down the wire, and on the other side have a registered deserializer for "uuid" that can turn a string back into a UUID.
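The identifier escape hatch can be sketched as a small registry, with heavy caveats: the `TaggedValue` record and the `DESERIALIZERS` map below are illustrative stand-ins for the `Value` message's `identifier`/`string_value` fields and IA's registration mechanism, not the actual API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;
import java.util.function.Function;

public class IdentifierRegistrySketch {

    // Stand-in for a wire Value carrying a string payload plus optional tag.
    public record TaggedValue(String identifier, String stringValue) {}

    // Pre-registered deserializers, keyed by identifier. Only classes that
    // were explicitly registered can ever be constructed - the closed world.
    static final Map<String, Function<String, Object>> DESERIALIZERS = new HashMap<>();
    static {
        DESERIALIZERS.put("uuid", UUID::fromString); // "uuid" tag -> UUID
    }

    static Object decode(TaggedValue value) {
        Function<String, Object> fn = value.identifier() == null
                ? null
                : DESERIALIZERS.get(value.identifier());
        // No identifier (or no registered handler): fall back to the
        // underlying type, here the raw string.
        return fn == null ? value.stringValue() : fn.apply(value.stringValue());
    }

    public static void main(String[] args) {
        TaggedValue wire = new TaggedValue("uuid", "123e4567-e89b-12d3-a456-426614174000");
        System.out.println(decode(wire).getClass().getSimpleName()); // UUID
    }
}
```

Note the contrast with Java serialization: the sender's tag can only select from deserializers the receiver registered ahead of time, so an unexpected tag degrades to a plain string instead of instantiating an arbitrary class.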
In my bad old days of development I would have said 'Hungarian notation in column headers'... but yep, I take your point for sure.