Matrix-based regression calculations for polynomial coefficients

MikeAllgood · May 9, 2019, 9:39pm

I am building a small project that takes test data and is supposed to use polynomial regression to calculate the coefficients of the polynomial equation. Initial searching online led me to NumPy, but I now know it is not compatible with Ignition (Jython).
What I’ve been given is a 6x6 matrix of the Sum of the Squares of the sample data and a 5th order poly equation. The customer has dictated that the solution is to use the Sum of Squares, the Inverse Matrix of the Sum of Squares matrix and matrix multiplication of the Inverse Matrix and the result side matrix to solve for the coefficients.
It would look something like this:

Getting the sum of squares array (matrix) is somewhat trivial.
Calculating the Inverse Matrix of that is another matter altogether! Out beyond a 2x2 matrix this becomes exponentially more difficult with each increase in matrix size.
I’ve seen a few suggestions on other libraries that are Ignition (Jython) compatible, but I haven’t found one that will do this matrix based regression calculation to solve for the coefficients.
Am I just missing it?
Has anyone else had to do this and found an efficient solution?
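
For anyone following along: once the matrix and the result vector exist, the coefficients can be obtained by solving the linear system directly instead of forming the inverse first. Below is a minimal pure-Python sketch of Gaussian elimination with partial pivoting. It uses no NumPy or Java imports, so it should also run under Jython. The function name solve_linear_system and the 2x2 example values are made up for illustration and are not from the customer's data.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's lists are not modified.
    m = [[float(v) for v in row] + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every row below the pivot.
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitute from the last row upward.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


# Made-up system: 2x + y = 5 and x + 3y = 10, so x = 1 and y = 3.
solution = solve_linear_system([[2.0, 1.0], [1.0, 3.0]], [5.0, 10.0])
```

The work grows roughly with the cube of the matrix size, so a 6x6 system is small. The harder part with high-order polynomial fits tends to be numerical accuracy, because the sums of high powers of x differ by many orders of magnitude.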

Thanks,
Mike

Kevin.Herron · May 9, 2019, 9:49pm

You might find something helpful in Apache Commons Math, which we started including recently and should be available to use in Jython scripts.

KathyApplebaum · May 10, 2019, 2:38pm

Exactly what Kevin said. In fact, they even have an example of creating a matrix and getting the inverse: https://commons.apache.org/proper/commons-math/userguide/linear.html

MikeAllgood · May 10, 2019, 4:02pm

I have gotten some test code to work!
I am able to manually validate the resulting Inverse Matrix.
The problem I have now is, I think, with the underlying equations (y = a0 + a1x + a2x^2…) and the assumption that the coefficients (a) can be calculated from this Inv of Sum of Square * Y Result Vector.
I’m no math genius, but when starting with known values for x, y, and a, my resulting coefficients (a) aren’t what I get back out of this logic.

As that part is what was provided by the customer, I’ll have to bounce it back to them for clarification.

Thanks for the help!
Mike

MikeAllgood · August 8, 2019, 4:46pm

Well, I’m back!
Although I figured out the matrix math using the Apache Commons libraries, the results in the real world are lacking. Finding a 5th order polynomial (6 coefficients) this way requires a 6x6 Sum of Squares matrix and a 6x1 results vector, but I sample 7 test points, so I have to drop one of the sample points. This throws off my resulting equation.
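
A note on the matrix sizes, assuming the customer's "Sum of Squares" matrix is the usual least-squares normal-equation matrix (that is an assumption, since the customer's spreadsheet is not shown here). In that formulation every entry is a sum over all sample points, so the matrix stays 6x6 no matter how many points are sampled. A minimal sketch, with a hypothetical function name and made-up sample values:

```python
def normal_equations(xs, ys, degree):
    """Return (a, b) with a[i][j] = sum(x**(i+j)) and b[i] = sum(y * x**i)."""
    size = degree + 1
    # Each entry sums over every sample point, however many there are.
    a = [[sum(x ** (i + j) for x in xs) for j in range(size)]
         for i in range(size)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(size)]
    return a, b


# Seven made-up sample points (not the real test data).
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.0, 2.0, 4.0, 7.0, 11.0, 16.0, 22.0]

# Degree 5 gives a 6x6 matrix and a 6-entry vector from all 7 points.
a, b = normal_equations(xs, ys, 5)
```

Here a[0][0] is simply the number of points (7) and b[0] is the sum of the y values, which shows that all seven samples contribute and none has to be dropped.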

The previous solution used Excel and the LINEST function to perform a linear regression to find the coefficients. I think I can replicate that using the Apache Commons library, I’m just not sure how.
I think the org.apache.commons.math3.fitting package could be used, I’m just at a loss for how to implement it.

Can anyone give me a bit of guidance on implementation of this to solve for the polynomial coefficients?
I’m completely confused as to what to import and how to implement it.
What I have is 7 test points of flow (X) and pressure (Y) and need to evaluate the 5th order polynomial for an equation to fit these test points. That equation would look like this:

Y = C0 + C1 * X + C2 * X^2 + C3 * X^3 + C4 * X^4 + C5 * X^5

I believe the fit method of the AbstractCurveFitter class will calculate these, but I am at a loss as to how to utilize this.

Any help would be appreciated.

MikeAllgood · August 8, 2019, 5:11pm

OK, I think I got this. They really don’t document the usage of these libraries very well sometimes.
Here is what I figured out in case anyone needs this in the future (or when I forget and need a reminder!).

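# Fit a 5th order polynomial to flow/head test points with Apache Commons Math.
from org.apache.commons.math3.fitting import PolynomialCurveFitter as pcf
from org.apache.commons.math3.fitting import WeightedObservedPoints as wop

# Test points: flow (X) and head (Y).
lstFlow = [4590, 4099, 3517, 2849, 1673, 690.5, 6.221]
lstHead = [2.142, 5.962, 15.148, 18.065, 23.294, 23.873, 24.187]

fitter = pcf.create(5)  # degree 5 gives 6 coefficients
tstPoints = wop()

# Add each (flow, head) pair; add(x, y) uses a weight of 1.
for flow, head in zip(lstFlow, lstHead):
	tstPoints.add(flow, head)

# fit() returns the coefficients in increasing order of power: C0, C1, ..., C5.
coeff = fitter.fit(tstPoints.toList())

for i in range(len(coeff)):
	print coeff[i]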

It outputs the following coefficients (which match what my customer’s spreadsheet outputs):
24.1179181818
0.00230480179819
-5.50959194266e-06
3.78439176723e-09
-1.13047599678e-12
1.07539450483e-16
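
One way to sanity-check coefficients like these is to evaluate the polynomial back at the test points and look at the residuals. The sketch below uses Horner's method in plain Python. The coefficient values are copied from the output above as printed (so they are rounded), and the function name evaluate_polynomial is made up for illustration.

```python
def evaluate_polynomial(coefficients, x):
    """Evaluate c0 + c1*x + c2*x**2 + ... with Horner's method.

    Coefficients are ordered from the constant term upward.
    """
    result = 0.0
    for c in reversed(coefficients):
        result = result * x + c
    return result


# Coefficients C0..C5 as printed above (rounded by the print statement).
coeff = [24.1179181818, 0.00230480179819, -5.50959194266e-06,
         3.78439176723e-09, -1.13047599678e-12, 1.07539450483e-16]

lstFlow = [4590, 4099, 3517, 2849, 1673, 690.5, 6.221]
lstHead = [2.142, 5.962, 15.148, 18.065, 23.294, 23.873, 24.187]

# Difference between each measured head and the fitted curve at that flow.
residuals = [head - evaluate_polynomial(coeff, flow)
             for flow, head in zip(lstFlow, lstHead)]
```

Because the printed coefficients are rounded and the highest-order terms are multiplied by very large powers of flow, the residuals computed this way may differ slightly from what the full-precision values would give.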

JordanCClark · August 8, 2019, 7:28pm

Just for fun I put your data through my polyfit function. It’s nice to know the hair pulling I put into it wasn’t a waste.

MikeAllgood · August 8, 2019, 7:48pm

Is that a built-in function?
I don’t see it in the docs.
That would have saved me days of effort (and anxiety) if not weeks.

Thanks!

JordanCClark · August 8, 2019, 7:57pm

Written before we had access to Apache Commons Math.