Proper caching in scripts?

I'm working with some database objects that are long-standing and rarely, if ever, change, but are referenced a lot in code. To make things simpler, I am trying to implement some basic caching of the rows, using classes for standard/reused behaviors. I want to make sure I am doing things the “right” way: not leaking memory, etc.

Here’s an example with one table.

PRODUCTS = system.db.runQuery("SELECT * FROM dbo.Products")

class Product(object):
    __slots__ = ('idx', 'name', 'type_', 'element')

    def __init__(self, idx, name, type_, element):
        self.idx = idx
        self.name = name
        self.type_ = type_
        self.element = element

PROD_REPO = {}

def populateProdRepo():
    for row in PRODUCTS:
        # Type is the easiest abbreviation I can use for reference
        PROD_REPO[row['Type']] = Product(row['idx'], row['Name'], row['Type'], row['Element'])

def getSingleProduct(product):
    # product referenced by Type like OXY or NIT or ARG
    # This is how I would get a product in my other production code to use the class etc
    return PROD_REPO.get(product, None)

populateProdRepo()

This would make my coding easier, as I need to use the Name/Type/Element of a product at different points in my code. But I don’t know that this is the “right” way to cache. Am I going to cause a memory leak like this? @pturmel you talk a lot about caching top-level variables and the like. Is this reasonable? What are the best practices, and what pitfalls should I avoid here?

This is for a Vision application, for what that matters.

First thing that jumps out at me is that you are running a database query right at the top level. This dramatically raises the odds of corrupting races in v8.1, and will likely stall your entire project's scripting in v8.3.

Do not assign to PRODUCTS in the top level at all. Run that query in your populate function, which should be launched with system.util.invokeAsynchronous.
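
A minimal sketch of that restructuring, assuming the same table and column names as the original post. The helper fetchProductRows is hypothetical and only stands in for system.db.runQuery so the sketch runs outside Ignition; likewise threading.Thread stands in for system.util.invokeAsynchronous.

```python
import threading

class Product(object):
    __slots__ = ('idx', 'name', 'type_', 'element')

    def __init__(self, idx, name, type_, element):
        self.idx = idx
        self.name = name
        self.type_ = type_
        self.element = element

# Starts empty; nothing slow happens at import time.
PROD_REPO = {}

def fetchProductRows():
    # Hypothetical stand-in with made-up rows. In a project script this would
    # be system.db.runQuery("SELECT * FROM dbo.Products").
    return [
        {'idx': 1, 'Name': 'Oxygen', 'Type': 'OXY', 'Element': 'O'},
        {'idx': 2, 'Name': 'Nitrogen', 'Type': 'NIT', 'Element': 'N'},
    ]

def populateProdRepo():
    global PROD_REPO
    fresh = {}
    for row in fetchProductRows():
        fresh[row['Type']] = Product(row['idx'], row['Name'], row['Type'], row['Element'])
    # Rebind once at the end so readers see the old dict or the new one,
    # never a half-filled one.
    PROD_REPO = fresh

def getSingleProduct(product):
    # Returns None until the background load has finished.
    return PROD_REPO.get(product)

# In Ignition: system.util.invokeAsynchronous(populateProdRepo)
loader = threading.Thread(target=populateProdRepo)
loader.start()
loader.join()  # demo only; a real top level would not wait here
```

Building the new dictionary off to the side and rebinding the name in one step is a design choice of this sketch, not something the thread prescribes.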

If early callers to getSingleProduct cannot tolerate nulls due to the async still running, you may need some locking.
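
One possible shape for that locking, as a sketch only: a threading.Event that the loader sets once, with readers waiting a bounded time. The timeout parameter and the _repoReady name are assumptions for illustration, and the rows are made up. This is meant for callers running in background scripts, not for anything evaluated by a Vision expression.

```python
import threading

PROD_REPO = {}
_repoReady = threading.Event()  # set once the first load completes

def populateProdRepo(rows):
    # rows: iterable of dicts with a 'Type' key (stand-in for query results)
    global PROD_REPO
    PROD_REPO = dict((row['Type'], row) for row in rows)
    _repoReady.set()

def getSingleProduct(product, timeout=2.0):
    # Wait up to `timeout` seconds for the first load, then give up with None.
    if not _repoReady.wait(timeout):
        return None
    return PROD_REPO.get(product)

# Called before any load has run: times out and returns None.
early = getSingleProduct('OXY', timeout=0.01)

# Simulate the asynchronous load finishing a moment later.
loader = threading.Timer(0.05, populateProdRepo,
                         args=([{'Type': 'OXY', 'Name': 'Oxygen'}],))
loader.start()

# Blocks briefly until the loader sets the event.
late = getSingleProduct('OXY')
```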

Do not allow any delays or network-involved operations in the top level of a project script module, nor within any function called from the top level.

Be aware that in Vision, each client will have their own cache.

I should mention that I highly recommend my VarMap class and globalVarMap() function for this sort of task, particularly if you need the results in expressions. No need to make your own class to get obj.idx, obj.name and similar to work in scripts, while also supporting expressions' use of square brackets to retrieve those same elements.

The populate script could then call gvm.refresh() at the end to make all expression references to that GVM re-execute.

So turns out I do have an issue with early callers not tolerating async. What sort of locking should be done?

My issue is I have a function

def getProductDS(i):
    product = PROD_REPO.get(i, None)
    return product.getAsRow()

and this gets called on some runScript functions (with polling 0) where I need a dataset to work with other expressions.

However, at startup, PROD_REPO is not populated for a short while, until my populateRepo (which is called asynchronously) has finished, so the first thing I get is an error that NoneType does not have getAsRow.

What’s the safest way to make this function “wait” until REPO is populated before returning the value?

Can you coalesce it with a static default dataset?
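
A rough sketch of that idea, assuming getAsRow returns a one-row structure. The getAsRow body here is a hypothetical version of the poster's method (in Ignition it would likely build a real dataset with system.dataset.toDataSet), and the placeholder values are made up.

```python
PROD_REPO = {}

# Placeholder returned while the cache is still empty (values are made up).
DEFAULT_ROW = [[-1, 'Loading', '', '']]

class Product(object):
    __slots__ = ('idx', 'name', 'type_', 'element')

    def __init__(self, idx, name, type_, element):
        self.idx = idx
        self.name = name
        self.type_ = type_
        self.element = element

    def getAsRow(self):
        # Hypothetical: one row in the same column order as DEFAULT_ROW.
        return [[self.idx, self.name, self.type_, self.element]]

def getProductDS(i):
    product = PROD_REPO.get(i)
    if product is None:
        # Cache not populated yet (or unknown key): fall back to the default.
        return DEFAULT_ROW
    return product.getAsRow()
```

The caller still needs some trigger to re-run once the real data arrives, which is the concern raised in the replies.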

Making the function wait is always an anti-pattern, as it is not safe for Vision to wait for expression execution.

The re-execution without needing polling is precisely the point of the GVM's .refresh() method. (The toolkit exists to solve problems that are otherwise ridiculously evil.)

Unfortunately, for this customer and project I will not be allowed to use any third-party modules, so I need to think of a different way.

@robertm I do like your idea, except I feel like I would then need runScript on polling just to grab a value from a dictionary. That feels very icky to me, though I know it should work. I guess I could do it with scripting, though I much prefer an expression answer.

@pturmel Any ideas on how to do it with first party tools only?

FWIW, this particular data may only update once a year at most, and on a project update. But I like having the classes and whatnot for scripting and want to make sure I am doing this the right way for other situations.

I sincerely hope you convince the client to change their policy, because I have nothing to offer. I decline contracts that forbid use of this module.

Consider placing the dataset in a dataset memory tag, updated from gateway scope. Subscribe to this memory tag in a client expression tag, to keep it "live" in every client. Access the dataset with lookup expressions sourcing the client tag.

OK, if you don’t mind sanity checking, here is how I implemented what you said -

I have two gateway tags

Common/Products_SQL - a SQL Query binding dataset tag that runs SELECT * FROM Products
Common/Products - memory dataset tag

On tag change of Common/Products_SQL, if the row count is > 0, I write to the memory tag, to avoid any times when it would have zero rows.

Then there is a client dataset tag Common/Products that subscribes to gateway tag Common/Products via an expression.

So now, for GUI purposes, lookup will be the preferred way to get a single value; to get a whole row, I have a simple function I can call from runScript.

But in my common.Products script I still have the Product constants I use in other business logic areas, like

def stepInProcess():
    if partOfProcess in (common.Products.PRODUCT_A, common.Products.PRODUCT_B):
        foobar()
    elif partOfProcess in (common.Products.PRODUCT_C,):
        boofar()

So what is the safe way to populate these constants? Is it still system.util.invokeAsynchronous(populateRepo), or do I get any extra assurance from having the client memory tag, where I can do something like
PRODUCT_A = Product(getProduct(system.tag.readBlocking(['[client]common/Product'])[0].value, 'A'))
at the top level of the script?

Though on second thought: if I am doing the GUI entirely through lookups/runScripts on a client tag that would update from 0 to 4 rows and retrigger runScript on data change, those would be fine. The business logic scripts don’t run immediately, so an async populateRepo would also be fine at startup. So maybe there is nothing wrong there.

That isn't a memory tag. That will run repeatedly and create excess tag change events.

The memory tag should be populated by a gateway startup event script. Have a client tag change script populate your constants, too. (May need a global statement on that function.)

Consider making common.Products an instance of SmartMap so you can assign to it like a dictionary, but access with your existing .something notation.

Just create Products as an empty SmartMap at the top level of common.
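
To illustrate the idea only: the class below is a hypothetical stand-in named AttrMap, not the linked SmartMap implementation. It is a dict whose keys can also be read and written as attributes, which is the behavior described above. The product codes are made up.

```python
class AttrMap(dict):
    """Hypothetical stand-in: a dict whose keys are also usable as attributes."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        # Attribute assignment stores into the dict itself.
        self[name] = value

# Empty at the top level; filled in later by whatever loads the data.
Products = AttrMap()
Products['PRODUCT_A'] = 'OXY'   # dictionary-style assignment
Products.PRODUCT_B = 'NIT'      # attribute-style assignment
```

Existing code that reads common.Products.PRODUCT_A would keep working, while the loader can assign entries by key.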

The process should then be -

  1. On gateway startup I populate my gateway memory tag manually via scripting one time
  2. Client dataset tag subscribes to the gateway memory tag via an expression
  3. I have a client tag change event watching the client dataset tag, on tag change - I can use that as my cue - if the client tag now has data in it - I can populate my common.Products.PRODUCT_A, common.Products.PRODUCT_B
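
Step 3 might look roughly like the sketch below. The function name rebuildProducts, the PRODUCT_ + Type naming of the constants, and the sample rows are all assumptions for illustration. In Ignition, the handler could pass system.dataset.toPyDataSet applied to the tag's dataset value so that rows can be indexed by column name; plain dicts stand in for that here.

```python
PRODUCTS = {}  # module-level cache, empty until the tag delivers data

def rebuildProducts(rows):
    # rows: stand-in for the client tag's dataset, one mapping per row.
    global PRODUCTS  # needed because the name is rebound, not mutated
    fresh = {}
    for row in rows:
        # Assumed naming scheme: PRODUCT_<Type>
        fresh['PRODUCT_' + row['Type']] = row['Name']
    if fresh:
        # Ignore empty updates and keep the last good data.
        PRODUCTS = fresh

rebuildProducts([])                      # tag still empty at startup
afterEmpty = dict(PRODUCTS)

rebuildProducts([
    {'Type': 'OXY', 'Name': 'Oxygen'},   # made-up sample rows
    {'Type': 'NIT', 'Name': 'Nitrogen'},
])
rebuildProducts([])                      # a later empty update changes nothing
```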

Thanks for the SmartMap link that is what I was trying to recreate lol.

Thanks for taking the time. My first time using top level variable caching on database related objects instead of just hard coded dictionaries that sometimes got updated.

FWIW, the VarMap class in my toolkit is essentially a Java re-implementation of SmartMap.

I have it mostly working, though I have intermittent problems, and in the Designer my setup kind of stinks.

I would like to convince the customer to allow this module. If I did, what would be the viable path forward for me? You say I don’t need to define my Product class anymore, though I do have other needs for doing so: it’s used as a dictionary key and in equality statements, so I need to define __hash__, __eq__, etc. Assuming I do need to define it, what are the broad strokes for leveraging your module here?

I’ve come around to thinking that it’s probably easier to convince them to use your module than to roll my own right now.

Eww! Just use the ID as the dictionary key.
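
A small sketch of what that could look like, assuming idx is the unique ID from the Products table. The tankLevels dictionary and its values are made up for illustration.

```python
class Product(object):
    __slots__ = ('idx', 'name')

    def __init__(self, idx, name):
        self.idx = idx
        self.name = name

oxygen = Product(1, 'Oxygen')
nitrogen = Product(2, 'Nitrogen')

# Key by the integer ID instead of by the Product instance, so no custom
# __hash__ or __eq__ is needed on the class.
tankLevels = {oxygen.idx: 75.0, nitrogen.idx: 40.0}

# A second object built from the same row still finds the same entry.
reloaded = Product(1, 'Oxygen')
level = tankLevels[reloaded.idx]

# Equality checks compare IDs rather than objects.
sameProduct = (reloaded.idx == oxygen.idx)
```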