Extracting data from webpage (javascript object or json?)

Angus_Sweet · January 28, 2023, 8:57pm

I'm trying to create a script that will extract data from a local webpage and write it to a PLC. I have done the following:
webpagedata = system.net.httpGet("http://testwebsite.p")
test1=webpagedata[4850:5175]
print(test1)

There was a lot of other data that came back from the webpage that I don't need, hence the [4850:5175]. This returns the following:

td>CAN1143828760002AJameson Dry & Lime6x4pk 6.3% EXPORT (MAN) 637Ejected25/01/23 12:36930072701918924/01/24300713A

(2972 of 3000) 99%

<

I can see the data I need in this unicode string (e.g. 1143828, Jameson Dry, and the number '3000'), but I'm not sure how to get it out. Is the above example in JSON format or something else? Do I need to format or compile it?

nideyijuyidong · January 30, 2023, 1:15am

Try a regular expression for an HTML page, or system.util.jsonDecode for a JSON string.
https://docs.inductiveautomation.com/display/DOC81/system.util.jsonDecode
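One quick way to settle the "is it JSON?" question is to try decoding the text and see whether that fails. The sketch below uses Python's standard json module as a stand-in for system.util.jsonDecode so that it can run outside Ignition. The helper name try_json and both sample strings are made up for illustration and are not taken from the real page.

```python
import json

def try_json(text):
    """Return (True, decoded) if text parses as JSON, otherwise (False, None)."""
    try:
        return True, json.loads(text)
    except ValueError:
        # json.loads raises ValueError on malformed input
        # (JSONDecodeError in Python 3 is a subclass of ValueError).
        return False, None

# Hypothetical samples: an HTML fragment shaped like the one in the
# question, and a JSON string for comparison.
html_fragment = "<td>CAN</td><td>1143828</td>"
json_string = '{"product": "Jameson Dry", "target": 3000}'

is_json, _ = try_json(html_fragment)
print(is_json)  # False: this is markup, not JSON

is_json, data = try_json(json_string)
if is_json:
    print(data["target"])  # 3000
```

Text that starts with td> and ends with < looks like an HTML fragment rather than JSON, which would explain why a JSON decoder does not help here.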

victordcq · January 30, 2023, 7:47am

You should probably look for an HTML parser that lets you use JS/CSS-style queries to get the right element...
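As a rough illustration of the parser approach, here is a minimal sketch that uses the event-based HTMLParser from the Python standard library (available in Jython 2.7) instead of a CSS-selector library, so it does not support selector queries. The class name CellCollector and the sample table markup are assumptions; the real page layout is not shown in this thread.

```python
try:
    from HTMLParser import HTMLParser      # Python 2 / Jython 2.7
except ImportError:
    from html.parser import HTMLParser     # Python 3

class CellCollector(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])           # start a new row
        elif tag == "td":
            if not self.rows:              # tolerate a <td> with no <tr>
                self.rows.append([])
            self.rows[-1].append("")       # start a new, empty cell
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data      # append text to the current cell

# Hypothetical markup; adjust to whatever the real page returns.
sample = ("<table><tr><td>CAN</td><td>1143828</td>"
          "<td>(2972 of 3000) 99%</td></tr></table>")

parser = CellCollector()
parser.feed(sample)
parser.close()
print(parser.rows[0][1])  # 1143828
```

Once the cells are separated like this, each value can be picked by column rather than by character offset.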

JordanCClark · January 30, 2023, 9:24am

Angus_Sweet · February 5, 2023, 10:46am

Thanks everyone for the replies. I did have a play with system.util.jsonDecode but couldn't get the result I wanted. I did some more reading and ended up with the following:

webdata = system.net.httpGet("http://testwebsite")
import re
candataraw = re.findall("CAN.*

", webdata)

candataformatted = system.dataset.toCSV(candataraw, showHeaders=True, forExport=False)
cantarget = candataformatted[-16:-12]
canint = int(cantarget)
system.tag.write("[default]Performance/N7/N7:36", canint)

The result was the number I wanted: 3000.

This works for now, but it relies on the number 3000 staying at that exact position in the returned unicode string. That could change, so I'll still try out the other methods recommended (BeautifulSoup and the HTML parser).
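One way to avoid depending on character positions is to match the "(N of M)" pattern itself. The following is a minimal sketch, assuming the count always appears in the form shown in the first post, "(2972 of 3000)". The function name extract_counts and the sample text are hypothetical, and the pattern may need adjusting for the real page.

```python
import re

# Hypothetical text in the shape shown earlier in the thread.
webdata = "CAN 1143828 Jameson Dry (2972 of 3000) 99%"

def extract_counts(text):
    """Return (current, target) from the first '(N of M)' in text, or None."""
    match = re.search(r"\((\d+)\s+of\s+(\d+)\)", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

counts = extract_counts(webdata)
if counts is not None:
    current, target = counts
    print(target)  # 3000
```

The target value could then be written to the tag the same way as in the script above. Returning None when nothing matches makes it possible to skip the write instead of sending a bad value to the PLC.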

Angus_Sweet · February 13, 2023, 9:15pm

Hi Jordan,
Thanks for the info. I managed to import BeautifulSoup 3.2.2, but I'm getting the following message:
BeautifulSoup:107: UserWarning: You are using a very old release of Beautiful Soup, last updated in 2011. If you installed the 'beautifulsoup' package through pip, you should know the 'beautifulsoup' package name is about to be reclaimed by a more recent version of Beautiful Soup which is incompatible with this version.

This will happen at some point after January 1, 2021.

If you just started this project, this is easy to fix. Install the 'beautifulsoup4' package instead of 'beautifulsoup' and start using Beautiful Soup 4.

If this is an existing project that depends on Beautiful Soup 3, the project maintainer (potentially you) needs to start the process of migrating to Beautiful Soup 4. This should be a relatively easy part of the Python 3 migration.

I have tried to find version 4 but I can only find it in the following formats:
beautifulsoup4-4.11.2.tar.gz
beautifulsoup4-4.11.2-py3-none-any.whl

Ignition is looking for .zip files, so I'm not sure how to import the new version with the .gz or .whl extension.

Any ideas?

JordanCClark · March 7, 2023, 8:37pm

Sorry for a late reply, as work stuff takes precedence.

The last version that supports Python 2 is 4.9.3. Unzip it and put all the folders in the gateway's /user-lib/pylib/site-packages. A restart of the Designer would be required, and probably of the gateway too, if you're using it from that scope.

EDIT: Oops! didn't include enough files...
bs4.zip (468.4 KB)

Otherwise, you can either ignore the message as we're on Jython 2.7 anyway, or modify the script to not generate the nag message.

Angus_Sweet · March 9, 2023, 9:50am

It worked! Thanks very much JordanCClark!!!