Help with parsing Air Quality Index from airnow.gov

I’m trying to get air quality index data from the airnow.gov website.

This one isn’t so simple, though. My output contains everything up to the second description element, which holds the only data I really need: the Air Quality Index. It seems to skip everything from that point on.

Any help would be appreciated!

My Code:

import system
import xml.dom.minidom

url = "http://feeds.enviroflash.info/rss/realtime/133.xml"

response = system.net.httpGet(url)

dom = xml.dom.minidom.parseString(response)

for tag in dom.getElementsByTagName("*"):
    print tag.firstChild.data

DATA:

<rss version="2.0">
<channel>
<title>San Francisco, CA - Current Air Quality</title>
<link>http://www.airnow.gov/</link>
<description>EnviroFlash RSS Feed</description>
<language>en-us</language>
<webMaster>
airnowdmc@sonomatech.com (AIRNow Data Management Center)
</webMaster>
<pubDate>Thu, 12 Oct 2017 08:45:10 PDT</pubDate>
<item>
<title>San Francisco, CA - Current Air Quality</title>
<link>
http://feeds.enviroflash.info/rss/realtime/133.xml?id=AC9AF12B-02F4-5A9E-BD504999C6EF606E
</link>
<description>
<!--  Format data output  -->
 <div xmlns="http://www.w3.org/1999/xhtml"> <table style="width: 350px;">    
 <tr> <td> <br> </td> </tr> <tr> <td valign="top">
 <div><b>Location:</b> San Francisco, CA</div><br /> <div> <b>Current
 Air Quality:</b> 10/12/17 8:00 AM PDT<br /><br /> <div> Unhealthy -
 156 AQI - Particle Pollution (2.5 microns)<br /> <br /> Good - 1 AQI -
 Ozone<br /> <br /> </div> </div> <div><b>Agency:</b> San Francisco Bay
 Area AQMD </div><br /> <div><i>Last Update: Thu, 12 Oct 2017 08:45:10
 PDT</i></div> </td> </tr> </table> </div>
</description>
</item>
</channel>
</rss>

My OUTPUT:

San Francisco, CA - Current Air Quality
http://www.airnow.gov/
EnviroFlash RSS Feed
en-us
airnowdmc@sonomatech.com (AIRNow Data Management Center)
Thu, 12 Oct 2017 08:45:10 PDT

San Francisco, CA - Current Air Quality
http://feeds.enviroflash.info/rss/realtime/133.xml?id=AC9AF12B-02F4-5A9E-BD504999C6EF606E

Simplest change:

for tag in dom.getElementsByTagName("*"):
	print tag.lastChild.data

The firstChild of the description element seems to be the comment <!-- Format data output --> (a comment node, not an element), so firstChild.data never reaches the text that follows it. The lastChild of the description element holds the full description value.
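To see why firstChild and lastChild differ, it can help to print every child node of the description element. The sketch below uses a hypothetical, trimmed-down copy of the feed and ASSUMES the HTML inside description arrives as a CDATA section after the comment (the thread's results, where lastChild.data returns the markup with its <div> tags as a plain string, suggest this, but the raw feed isn't shown). The helper name node_text is illustrative, not part of minidom.

```python
import xml.dom.minidom

# Hypothetical, trimmed-down feed. ASSUMPTION: the HTML inside <description>
# is delivered as a CDATA section after a comment, not as real XML elements.
SAMPLE = """<rss version="2.0"><channel><item>
<title>San Francisco, CA - Current Air Quality</title>
<description><!-- Format data output --><![CDATA[<div>Unhealthy - 156 AQI - Particle Pollution (2.5 microns)<br /> Good - 1 AQI - Ozone</div>]]></description>
</item></channel></rss>"""


def node_text(element):
    """Join the text and CDATA children of an element, skipping comments."""
    parts = []
    for child in element.childNodes:
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE):
            parts.append(child.data)
    return "".join(parts).strip()


dom = xml.dom.minidom.parseString(SAMPLE)
for desc in dom.getElementsByTagName("description"):
    # Show what kind of node each child of <description> actually is.
    for child in desc.childNodes:
        print("%s: %r" % (child.__class__.__name__, child.data))
    # Text content with the comment left out.
    print(node_text(desc))
```

Filtering on nodeType rather than picking firstChild or lastChild keeps working if the feed ever adds or moves a comment or whitespace node.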


Here’s another approach:

import re
import system
import xml.etree.ElementTree as ET

url = "http://feeds.enviroflash.info/rss/realtime/133.xml"
response = system.net.httpGet(url)

# get the inner HTML description from the XML
root = ET.fromstring(response)
inner = root.find('.//item/description')
inner_html = inner.text

# probably fragile regex to get the AQI values
# raw strings for the patterns; \s+ tolerates line breaks inside the HTML
aqi_particle = re.search(r"(\d+)\s+AQI\s+-\s+Particle Pollution", inner_html).group(1)
aqi_ozone = re.search(r"(\d+)\s+AQI\s+-\s+Ozone", inner_html).group(1)

print "AQI (Particle): ", aqi_particle
print "AQI (Ozone): ", aqi_ozone
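The regex approach above needs one pattern per pollutant. A variation, sketched below, pulls every "category - number AQI - pollutant" reading into a dict in one pass. It ASSUMES the description text keeps the format seen in this feed (each reading ends at the next HTML tag) and uses a hypothetical sample string; parse_readings is an illustrative name, not part of any library.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample. ASSUMPTION: <description> holds the HTML as CDATA text,
# so ElementTree returns it as a plain string via .text (comments are dropped).
SAMPLE = """<rss version="2.0"><channel><item>
<description><!-- Format data output --><![CDATA[<div> Unhealthy -
 156 AQI - Particle Pollution (2.5 microns)<br /> <br /> Good - 1 AQI -
 Ozone<br /> <br /> </div>]]></description>
</item></channel></rss>"""

# "<category> - <number> AQI - <pollutant>" up to the next tag;
# \s+ lets the pattern match across line breaks in the HTML.
READING = re.compile(r"([A-Za-z][A-Za-z ]*?)\s+-\s+(\d+)\s+AQI\s+-\s+([^<]+)")


def parse_readings(inner_html):
    """Return {pollutant: (aqi, category)} for every reading in the HTML."""
    readings = {}
    for category, aqi, pollutant in READING.findall(inner_html):
        readings[pollutant.strip()] = (int(aqi), category.strip())
    return readings


root = ET.fromstring(SAMPLE)
inner_html = root.find(".//item/description").text
for pollutant, (aqi, category) in sorted(parse_readings(inner_html).items()):
    print("%s: %d (%s)" % (pollutant, aqi, category))
```

A pollutant that is missing from the feed simply doesn't appear in the dict, instead of raising an error the way .group(1) on a failed search does.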

There is no JSON feed?


That helped, thanks! Is there any way to write a gateway script that feeds the value to a tag, so I can have an alarm email sent out?

Here’s the gateway script I tried, but it doesn’t work. It runs in the script editor, but when I make it a gateway script it throws IOError: connect timed out.

import re
import system
import xml.etree.ElementTree as ET

# the URL to pull data from
url = "http://feeds.enviroflash.info/rss/realtime/134.xml"

response = system.net.httpGet(url)

root = ET.fromstring(response)
inner = root.find('.//item/description')
inner_html = inner.text

# probably fragile regex to get the AQI values
aqi_particle = re.search(r"(\d+)\s+AQI\s+-\s+Particle Pollution", inner_html).group(1)
# write a number, not a string, so a numeric alarm setpoint can compare against it
currentData = int(aqi_particle)

# push data to tag
system.tag.write("Global Tags/Air Quality Index", currentData)
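If the feed is unreachable or its wording changes, re.search returns None and .group(1) raises AttributeError, so a timer script like the one above errors out on every run. A minimal sketch of a more defensive version is below; particle_aqi and update_tag are hypothetical helper names, and the write argument stands in for Ignition's system.tag.write (the call used in the script above) so the logic can be tested outside a gateway.

```python
import re

# Tolerant of line breaks between the number and the surrounding words.
PARTICLE_AQI = re.compile(r"(\d+)\s+AQI\s+-\s+Particle Pollution")


def particle_aqi(inner_html):
    """Return the particle-pollution AQI as an int, or None if it isn't found."""
    match = PARTICLE_AQI.search(inner_html or "")
    if match is None:
        return None
    return int(match.group(1))


def update_tag(inner_html, write):
    """Call write(path, value) only when an AQI value was parsed.

    `write` is a stand-in for system.tag.write; the tag path is the one
    from the gateway script above.
    """
    value = particle_aqi(inner_html)
    if value is not None:
        write("Global Tags/Air Quality Index", value)
    return value


# In the gateway script this would be: update_tag(inner_html, system.tag.write)
written = {}
update_tag("Unhealthy -\n 156 AQI - Particle Pollution (2.5 microns)", written.__setitem__)
print(written)
```

Skipping the write leaves the tag at its last good value; whether that or a bad-quality flag is preferable for the alarm is a design choice for the project.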

That approach worked. I didn’t realize the comment counted as a child node, thanks!

The output of that still includes the <div> markup, so I went with Kevin’s approach and sent the value to a window so the guys can see when it’s too smoky outside (wildfires up north).
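If you'd rather show the whole description in the window, the <div> markup from lastChild.data can be stripped first. The sketch below is a naive regex tag strip that ASSUMES a small, trusted fragment like this feed's; it is not a general HTML parser or sanitizer. html_to_text and the sample string are hypothetical.

```python
import re


def html_to_text(fragment):
    """Turn the feed's HTML fragment into plain lines of text."""
    # HTML ignores source line breaks, so collapse all whitespace first.
    text = re.sub(r"\s+", " ", fragment)
    # Treat <br>, </div> and </tr> as line breaks, then drop every other tag.
    text = re.sub(r"(?i)<br\s*/?>|</div>|</tr>", "\n", text)
    text = re.sub(r"<[^>]+>", "", text)
    lines = [line.strip() for line in text.split("\n")]
    return "\n".join(line for line in lines if line)


SAMPLE = ('<div xmlns="http://www.w3.org/1999/xhtml"> <div> Unhealthy -\n'
          ' 156 AQI - Particle Pollution (2.5 microns)<br /> <br /> Good - 1 AQI -\n'
          ' Ozone<br /> <br /> </div> </div>')
print(html_to_text(SAMPLE))
```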

Thanks again!

This script should work; I think the gateway server just doesn’t have access to the internet.