FactoryPMI Gateway Problems

I’m having a problem on my dev machine. I just upgraded with the offline upgrade exe, and now FactoryPMI will not start at all. If I look at the FactoryPMI Gateway Control window, I see this info:

Status: Faulted
Cluster Status: Connecting
Version: 3.1.3 (Build 1621)
Message: Error connecting to cluster: failed to start protocol stack.

This was working before, and I’ve used this machine since May or so with no problem. I have been developing on a different box since mid-December, and haven’t touched this one since then. Things I’ve verified and tried:

Added DisableDHCPMediaSense to registry.
Verified the 127.0.0.1 localhost entry in the hosts file.
Added port 8080 to firewall exceptions (although the firewall is turned off)
Uninstalled and reinstalled FactoryPMI

Any ideas on what could have happened? I don’t even use this machine for anything else and it isn’t online, so it wasn’t due to other software being installed or anything like that.

It’s my understanding that an error like that means the FactoryPMI service began to start and then the clustering layer (JGroups) had issues, which happens before the web server comes up. This means that troubleshooting IP resolution, firewall ports, etc. is unnecessary. You would do that if you had connectivity problems after the Gateway config tool indicated a Running status. It’s also curious because the default is to load in “Standalone Mode”, which I had assumed doesn’t load JGroups or clustering. After reading the docs, it looks like that mode only affects the Runtime Client’s notion of where the Gateway is for NAT/firewall applications, which doesn’t affect you in this case.

  1. Check the Tomcat (web server) logs under C:\Program Files\Inductive Automation\FactoryPMI\logs. These are HTML formatted documents that can be read in a web browser.

  2. In general, I would highly recommend against uninstalling FactoryPMI unless you have a gateway backup, or don’t mind losing all your projects. The projects and configuration are stored in the internal database, which is removed by an uninstall. Contact us and we’ll get you back up without an uninstall.

  3. I’d also speculate that something might be wrong with your cluster.xml configuration file in the FactoryPMI install directory.

Please post a more descriptive error message from #1.

Thanks Nathan.

The re-install is no problem. I back up everything religiously, and had been developing on a different box for the last 5 weeks anyway. Starting from scratch didn’t bother me (not that I wouldn’t have preferred not to).

It’s odd, because I am working on three machines, and routinely move projects around with only a little tweaking. Anyway, I’ll get you that html file later.

Looking at the gateway logs would have been the thing to do here. My guess was that you had “autodetect bind interface” turned off, and your manually specified bind interface didn’t actually exist. For example, you may have specified “192.168.1.16”, when that wasn’t really your IP address (maybe it was DHCP and it changed…)
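A quick way to test that guess is to ask the operating system whether it will bind a socket to the address in question. This is only a rough sketch in Python, not anything FactoryPMI ships; the function name `local_address_exists` is made up for illustration, and 192.168.1.16 is just the example address from above.

```python
import socket


def local_address_exists(ip: str) -> bool:
    """Return True if the OS lets us bind a UDP socket to `ip`.

    Binding normally fails when the address is not assigned to any
    local network interface, which is the situation described above.
    """
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.bind((ip, 0))  # port 0: let the OS pick any free port
        return True
    except OSError:
        # Covers "address not available" as well as malformed addresses.
        return False


if __name__ == "__main__":
    for candidate in ("127.0.0.1", "192.168.1.16"):
        print(candidate, local_address_exists(candidate))
```

If the address you put in bind_addr comes back False here, the clustering layer is unlikely to be able to bind to it either.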

Ok, let’s assume that is what happened. How would I fix it? I can’t get to the configuration page, and I can’t restore the old or new project. Looking back, it is possible that it was connected on the 192.168.10.xxx network the last time I used it, but I can’t be sure. And wouldn’t a complete uninstall/reinstall set FactoryPMI back to the default settings?

Just trying to understand this. I will have the log files in a couple of hours.

Good question. All of those settings are stored in the “cluster.xml” file in your FactoryPMI installation directory. That file is full of XML entries like this:

<entry key="some.named.key">this is the value</entry>

So, if you wanted to change the “bind_addr” property, you’d look for a line like:

<entry key="cluster.bind_addr">192.168.10.20</entry>

and replace the IP address with the correct one there. This is also a convenient way to set up clustering without having to start up the FactoryPMI Gateway, for those users who aren’t afraid of editing some XML. Note: it helps if you look at this file in a text editor (like notepad, or even better, jEdit or TextPad) with “word wrap” off.
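For anyone who would rather script the change than hand-edit it, here is a minimal Python sketch of the same edit. It assumes the file uses a `<properties>` root element wrapping the `<entry>` lines (the standard Java properties XML layout); the `set_entry` helper and the sample text are hypothetical and not part of FactoryPMI.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the contents of cluster.xml.
SAMPLE = (
    "<properties>"
    '<entry key="cluster.bind_addr.autodetect">true</entry>'
    '<entry key="cluster.bind_addr">192.168.10.20</entry>'
    "</properties>"
)


def set_entry(xml_text: str, key: str, value: str) -> str:
    """Return xml_text with the <entry> named `key` set to `value`."""
    root = ET.fromstring(xml_text)
    for entry in root.iter("entry"):
        if entry.get("key") == key:
            entry.text = value
            break
    else:
        # No entry carried that key; refuse to guess where to add one.
        raise KeyError(key)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(set_entry(SAMPLE, "cluster.bind_addr", "192.168.10.124"))
```

Note that ElementTree drops the XML declaration and DOCTYPE line when it writes the document back out, so keep a copy of the original file and restore those header lines by hand if your file has them.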

Hope this helps,

I looked at the logs (I’ve forwarded them to support separately), and autodetect was set to on. Let’s take it from there.

There are two interesting lines in those logs:

Caused by: java.lang.Exception: problem creating sockets (bind_addr=/169.254.33.88, mcast_addr=239.4.4.44:45566)
...
Caused by: java.net.SocketException: IP_ADD_MEMBERSHIP failed (out of hardware filters?)

It auto-detected your bind interface to be on 169.254.33.88 - does this IP mean anything to you? Apparently it doesn’t allow UDP multicast listeners to be added to it. I suggest turning off cluster.bind_addr.autodetect (set it to “false”) and typing in the proper IP address for cluster.bind_addr.
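To see whether the operating system itself refuses the multicast join, independent of FactoryPMI, something like the following Python sketch can help. It only approximates what the Java clustering layer does: it asks for the same IP_ADD_MEMBERSHIP socket option on a plain UDP socket. The function names are made up for illustration; the group address is the mcast_addr from the log above.

```python
import socket


def build_mreq(group: str, interface_ip: str) -> bytes:
    """Pack the 8-byte ip_mreq structure: group address, then interface."""
    return socket.inet_aton(group) + socket.inet_aton(interface_ip)


def can_join_multicast(group: str, interface_ip: str) -> bool:
    """Return True if the OS accepts a multicast join on that interface."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.setsockopt(
                socket.IPPROTO_IP,
                socket.IP_ADD_MEMBERSHIP,
                build_mreq(group, interface_ip),
            )
        return True
    except OSError:
        # The join was rejected, or one of the addresses was invalid.
        return False


if __name__ == "__main__":
    print(can_join_multicast("239.4.4.44", "169.254.33.88"))
```

A False result for the interface address you plan to use as bind_addr would suggest the problem is in the network stack or adapter rather than in the FactoryPMI configuration.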

I have no idea. My PC only has one network card, and it wasn’t connected to any network at the moment.

I’ll throw one more thing out there. I added a FireWire card in early December, but I can’t say for sure whether I used FactoryPMI after this. It probably isn’t relevant, but I brought it up because technically the 1394 port does have TCP/IP properties. Could it possibly be binding to that somehow?

I’ll try your suggestions. Thanks for the help.

Step7,

That address looks like it was handed out by APIPA - Automatic Private IP Addressing, which uses the IP range 169.254.0.1 to 169.254.255.254. That would make sense if your PC wasn’t set to a static IP address and (being disconnected) it couldn’t find a DHCP server.
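If you want to check an address without memorizing the range, Python’s standard `ipaddress` module flags the whole 169.254.0.0/16 block as link-local. A small sketch (the `is_apipa` name is just for illustration):

```python
import ipaddress


def is_apipa(ip: str) -> bool:
    """Return True if `ip` is in the link-local (APIPA) address block."""
    return ipaddress.ip_address(ip).is_link_local


if __name__ == "__main__":
    for candidate in ("169.254.33.88", "192.168.10.124"):
        print(candidate, is_apipa(candidate))
```

The first address is the one from the log and reports True; the second is an ordinary private address and reports False.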

Ah, I’ve always wondered what was up with those 169.* addresses in Windows. I learn something new every day! Thanks for the tip Al.

Step7 - turn off media sense and give your ethernet adapter a static IP like 192.168.18.20 or something, and use that as your bind_addr.

Thanks, I’ll give it a try in a little while. What is strange though is that I haven’t changed anything. In fact, I have been running FactoryPMI all day on the laptop I’m typing this on right now, and I have autodetect set to 1 and no static IP assigned. That is how I’ve always had it set on my three development machines except when I connect to a real PLC, in which case the only thing I change is the static IP address.

Ok, it’s a no-go. I set the static IP to 192.168.10.124, and made the new changes in the XML file. I’ve sent the log files separately.

One thing I saw that may or may not be important is this message I got when trying to open the xml file in my browser: "The system cannot locate the resource specified. Error processing resource ‘http://java.sun.com/dtd/properties.dtd’. " Mean anything to you?

Hmm… the error that’s causing this is the java.net.SocketException: IP_ADD_MEMBERSHIP failed (out of hardware filters?)

A google search on that error seems to suggest that it occurs when IP multicast traffic is not enabled on a certain networking device. I’ve never seen this message before. Is this a strange device (VPN device?), or is there any network protection software installed, like Norton?

Don’t worry about that message you saw - don’t open up XML documents in a web browser (despite Windows’ suggestion)

No, there is no anti-virus, Norton software, or anything like that. I don’t even have the firewall enabled. I use this strictly for a dev machine, and as far as I can remember, it has never even been on the net before.

But, the good news is that I got it working. Your note about filtering made me think of something. I went to “Local Area Connection Properties”>Internet Protocol>Properties>Advanced>Options>Properties, and enabled, then disabled, the “Enable TCP/IP Filtering” option. I swear on Bill Gates’ keyboard that that is all I did, and now it works fine. It fired right up on the first try.

What can I say. Chalk that one up as something to check when absolutely nothing else works.

Windows 2000/XP packet filtering is a feature that blocks all TCP/IP communication on the selected adapters except for the TCP ports, UDP ports, and protocols that you explicitly allow. It sounds like checking that option effectively hoses your network adapter.

It was unchecked originally, but I checked it and unchecked it again. I knew the setting was there, but never had to use it before.

And, I can’t be 100% certain that it’s cured yet. I haven’t rebooted yet, but I’m going to wait a few days so I can get some work done. Stay tuned…

Wow, I’m glad you found that option…I never would have thought of that!

Interestingly enough, I just saw this a few minutes ago on another customer’s machine. Of course, I don’t believe in coincidence, so what changed? They were running 3.1.3 as well, which means it’s not the new JGroups library.

Your solution (checking and unchecking the filtering checkbox) worked in this situation as well!

Weird. Did a Microsoft auto-update get pushed down recently that messed with the TCP/IP stack?

Not on my machine, at least. Remember, my machine has never been online, and I have done no offline updates. And, I have not had a problem on my other two PCs.

I wonder if there was some sequence of events I did in restoring a project from one server to another that corrupted something (I am using three different databases, different permissions, and different IP addresses). Although the filtering option was not set, toggling it may have fixed whatever state it was in. I’ll take a look in the event viewers and see if there is a trace of anything.