Web scraper BeautifulSoup

Looks like it's just some HTML tables

Here's a snippet of a few rows of that table:

<table class="dt dataTable no-footer" border="0" cellspacing="0" cellpadding="0" id="DataTables_Table_1" role="grid" aria-describedby="DataTables_Table_1_info">
    <thead>
      <tr role="row"><th align="left" class="sorting_disabled td-left" rowspan="1" colspan="1" style="width: 101px;">DATE / TIME<br>PDT</th><th align="right" class="sorting_disabled" rowspan="1" colspan="1" style="width: 54px;"><font color="black"><i><a href="/jspplot/jspPlotServlet.jsp?sensor_no=35256&amp;end=05/03/2023+08:10&amp;geom=small&amp;interval=2&amp;cookies=cdec01">RIV STG</a> </i></font><br><font color="black"><b>FEET</b></font></th><th align="right" class="sorting_disabled" rowspan="1" colspan="1" style="width: 40px;"><font color="black"><i><a href="/jspplot/jspPlotServlet.jsp?sensor_no=35789&amp;end=05/03/2023+08:10&amp;geom=small&amp;interval=2&amp;cookies=cdec01">FLOW</a> </i></font><br><font color="black"><b>CFS</b></font></th><th align="right" class="sorting_disabled" rowspan="1" colspan="1" style="width: 58px;"><font color="black"><i><a href="/jspplot/jspPlotServlet.jsp?sensor_no=35261&amp;end=05/03/2023+08:10&amp;geom=small&amp;interval=2&amp;cookies=cdec01">RAINTIP</a> </i></font><br><font color="black"><b>INCHES</b></font></th><th align="right" class="sorting_disabled" rowspan="1" colspan="1" style="width: 41px;"><font color="black"><i><a href="/jspplot/jspPlotServlet.jsp?sensor_no=35258&amp;end=05/03/2023+08:10&amp;geom=small&amp;interval=2&amp;cookies=cdec01">TEMP</a> </i></font><br><font color="black"><b>DEG F</b></font></th><th align="right" class="sorting_disabled" rowspan="1" colspan="1" style="width: 64px;"><font color="black"><i><a href="/jspplot/jspPlotServlet.jsp?sensor_no=35262&amp;end=05/03/2023+08:10&amp;geom=small&amp;interval=2&amp;cookies=cdec01">REL HUM</a> </i></font><br><font color="black"><b>  %</b></font></th><th align="right" class="sorting_disabled" rowspan="1" colspan="1" style="width: 60px;"><font color="black"><i><a href="/jspplot/jspPlotServlet.jsp?sensor_no=35259&amp;end=05/03/2023+08:10&amp;geom=small&amp;interval=2&amp;cookies=cdec01">WIND SP</a> </i></font><br><font color="black"><b>MPH</b></font></th><th 
align="right" class="sorting_disabled" rowspan="1" colspan="1" style="width: 62px;"><font color="black"><i><a href="/jspplot/jspPlotServlet.jsp?sensor_no=35260&amp;end=05/03/2023+08:10&amp;geom=small&amp;interval=2&amp;cookies=cdec01">WIND DR</a> </i></font><br><font color="black"><b>DEG</b></font></th><th align="right" class="sorting_disabled" rowspan="1" colspan="1" style="width: 56px;"><font color="black"><i><a href="/jspplot/jspPlotServlet.jsp?sensor_no=35257&amp;end=05/03/2023+08:10&amp;geom=small&amp;interval=2&amp;cookies=cdec01">TEMP W</a> </i></font><br><font color="black"><b>DEG F</b></font></th></tr>

    </thead>
    <tbody>
    <tr role="row" class="odd">
          <td nowrap="" align="right" class=" td-left">05/02/2023 20:15&nbsp;</td>
          
            <td align="right"><font color="#000000">13.57</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">6,396</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">55.83</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">48</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">73</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">6</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">103</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">45.7</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
                
        </tr><tr role="row" class="even">
          <td nowrap="" align="right" class=" td-left">05/02/2023 20:30&nbsp;</td>
          
            <td align="right"><font color="#000000">13.68</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">6,581</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">55.83</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">48</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">74</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">2</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">107</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">45.9</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
                
        </tr><tr role="row" class="odd">
          <td nowrap="" align="right" class=" td-left">05/02/2023 20:45&nbsp;</td>
          
            <td align="right"><font color="#000000">13.61</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">6,463</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">55.83</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">48</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">74</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">7</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">107</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">45.9</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
                
        </tr><tr role="row" class="even">
          <td nowrap="" align="right" class=" td-left">05/02/2023 21:00&nbsp;</td>
          
            <td align="right"><font color="#000000">13.62</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">6,479</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">55.83</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">48</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">78</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">7</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">142</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">46.0</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
                
        </tr><tr role="row" class="odd">
          <td nowrap="" align="right" class=" td-left">05/02/2023 21:15&nbsp;</td>
          
            <td align="right"><font color="#000000">13.58</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">6,412</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">55.83</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">48</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">78</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">4</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">106</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">46.0</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
                
        </tr><tr role="row" class="even">
          <td nowrap="" align="right" class=" td-left">05/02/2023 21:30&nbsp;</td>
          
            <td align="right"><font color="#000000">13.55</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">6,362</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">55.83</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">48</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">80</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">1</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">123</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">46.0</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
                
        </tr><tr role="row" class="odd">
          <td nowrap="" align="right" class=" td-left">05/02/2023 21:45&nbsp;</td>
          
            <td align="right"><font color="#000000">13.53</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">6,329</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">55.83</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">47</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">81</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">2</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">80</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">46.0</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
                
        </tr><tr role="row" class="even">
          <td nowrap="" align="right" class=" td-left">05/02/2023 22:00&nbsp;</td>
          
            <td align="right"><font color="#000000">13.54</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">6,346</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">55.83</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">47</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">83</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">1</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">248</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">46.0</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
                
        </tr><tr role="row" class="odd">
          <td nowrap="" align="right" class=" td-left">05/02/2023 22:15&nbsp;</td>
          
            <td align="right"><font color="#000000">13.50</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">6,279</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">55.83</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">47</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">84</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">1</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">291</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">46.0</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
                
        </tr><tr role="row" class="even">
          <td nowrap="" align="right" class=" td-left">05/02/2023 22:30&nbsp;</td>
          
            <td align="right"><font color="#000000">13.48</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">6,246</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">55.83</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">47</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">85</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">1</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">263</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">46.1</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
                
        </tr><tr role="row" class="odd">
          <td nowrap="" align="right" class=" td-left">05/02/2023 22:45&nbsp;</td>
          
            <td align="right"><font color="#000000">13.47</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">6,230</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">55.83</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">47</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">86</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">1</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">195</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">46.1</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
                
        </tr><tr role="row" class="even">
          <td nowrap="" align="right" class=" td-left">05/02/2023 23:00&nbsp;</td>
          
            <td align="right"><font color="#000000">13.42</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">6,148</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">55.83</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">47</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">86</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">2</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">61</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">46.1</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
                
        </tr><tr role="row" class="odd">
          <td nowrap="" align="right" class=" td-left">05/02/2023 23:15&nbsp;</td>
          
            <td align="right"><font color="#000000">13.45</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">6,197</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">55.83</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">46</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">90</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">2</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">126</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">46.1</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
                
        </tr><tr role="row" class="even">
          <td nowrap="" align="right" class=" td-left">05/02/2023 23:30&nbsp;</td>
          
            <td align="right"><font color="#000000">13.40</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">6,115</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">55.83</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">46</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">91</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">2</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">78</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">46.1</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
                
        </tr><tr role="row" class="odd">
          <td nowrap="" align="right" class=" td-left">05/02/2023 23:45&nbsp;</td>
          
            <td align="right"><font color="#000000">13.45</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">6,197</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">55.83</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">46</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">90</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">2</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">281</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">46.2</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
                
        </tr><tr role="row" class="even">
          <td nowrap="" align="right" class=" td-left">05/03/2023 00:00&nbsp;</td>
          
            <td align="right"><font color="#000000">13.47</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">6,230</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">55.83</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">46</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">92</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">2</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">87</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
              
            <td align="right"><font color="#000000">46.2</font><a href="/misc/flaglist.html style=text-decoration:none"> </a></td>
                
        </tr></tbody>
  </table>

What do you want to get out of this?
I have to bathe the kid and put her to bed, but I'll come back later.
In the meantime, try to explain what you want to extract, and in what format.

Also, read Beautiful Soup's docs ;D

I don't even know how to do this, since the date in the URL changes. Basically, I want to get the latest flow of the river in CFS (the website updates it every 15 minutes), send that to a tag, and display it in Ignition. I'll give the docs a read, thanks, and have fun!

The date in the url is a parameter. It tells the backend you want data for that date, so it knows what data to return to you. You'll have to format the url depending on what data you want, but that's out of beautiful soup's scope.
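
To make the date parameter concrete, here is a minimal sketch of building the URL for a given time. It assumes the d parameter always follows the 03-May-2023+08:10 pattern seen in the URL used further down this thread. build_url is a made-up helper name, and %b relies on an English locale for the month abbreviation.

```python
from datetime import datetime

BASE_URL = "http://cdec.water.ca.gov/dynamicapp/QueryF"

def build_url(station, when):
    """Return the query URL for `station` at datetime `when`."""
    # Assumed format of the `d` parameter: 03-May-2023+08:10
    stamp = when.strftime("%d-%b-%Y+%H:%M")
    return "{0}?s={1}&d={2}".format(BASE_URL, station, stamp)

url = build_url("MBG", datetime(2023, 5, 3, 8, 10))
print(url)
```

Passing datetime.now() instead of a fixed datetime should ask for the most recent readings, if the backend treats d as an end time.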

So, you got a response from a get on your url, and called BeautifulSoup on it.
Now you have a soup object, which is more or less a tree representing the html of the page.
The next step is to figure out what you want to grab from it. In your case, you want the 'flow' column from a table. So, you check the html, and see that an html table is structured by rows, that contain a cell for each column. You want the third cell of each row.
First, target the table, so you limit the scope a bit. The html tells us that the table has an id, so let's use that to make sure we target the right thing (with soup being your beautiful soup object):

table = soup.find(id="DataTables_Table_1")

To get every row from that table, you'd do this :

rows = table.find_all('tr')

So now you have a list of all the rows in the table. Let's get the third cell from every row:

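# header rows only contain <th>, so skip any row without <td> cells
cells = [row.find_all('td')[2] for row in rows if row.find('td')]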

And there you are, all the flow cells in that table.
More compact version:

soup = BeautifulSoup(your_query_response, the_parser_you_want_to_use)
rows = soup.find(id="DataTables_Table_1").find_all('tr')
cells = [row.find_all('td')[2] for row in rows if row.find('td')]

Note: this will only return the 3rd <td> item of each row. You'll still need to extract the value from them, but I leave that to you as an exercise.

There are other ways of doing this, by selecting your targets differently, but my goal here is just to give you a quick overview of how this works.
Disclaimer: I'm not testing any of this, as I don't have access to that page, but it should be close enough
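
For anyone who wants to see the steps above end to end, here is a rough sketch using bs4 and the built-in html.parser. SAMPLE_HTML is a cut-down stand-in for the real page and flow_values is a made-up name. It also assumes the table id is present in the HTML you actually receive and that every flow cell holds a number like 6,396.

```python
from bs4 import BeautifulSoup

# Tiny stand-in for the real page; only the structure matters here.
SAMPLE_HTML = """
<table id="DataTables_Table_1">
  <thead><tr><th>DATE / TIME</th><th>RIV STG</th><th>FLOW</th></tr></thead>
  <tbody>
    <tr><td>05/02/2023 20:15&nbsp;</td><td><font>13.57</font><a> </a></td><td><font>6,396</font><a> </a></td></tr>
    <tr><td>05/02/2023 20:30&nbsp;</td><td><font>13.68</font><a> </a></td><td><font>6,581</font><a> </a></td></tr>
  </tbody>
</table>
"""

def flow_values(html):
    """Return the FLOW column as a list of ints, in page order."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find(id="DataTables_Table_1")
    values = []
    for row in table.find_all("tr"):
        cells = row.find_all("td")
        if len(cells) < 3:
            continue  # header rows use <th>, so they have no <td>
        text = cells[2].get_text(strip=True)  # e.g. "6,396"
        values.append(int(text.replace(",", "")))
    return values

flows = flow_values(SAMPLE_HTML)
latest = flows[-1]  # assumes the newest reading is the last row
print(latest)
```

A cell holding a placeholder instead of a number would make int() raise, so real code would probably want a try/except around the conversion.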


With no real change to the script:

from project.BeautifulSoup import BeautifulSoup
url = 'http://cdec.water.ca.gov/dynamicapp/QueryF?s=MBG&d=03-May-2023+08:10'

soup = BeautifulSoup(system.net.httpGet(url))

# find the table
t = soup.find('table')
# get the rows
tr = t.findAll('tr')

headers = ['t_stamp', 'value']
data = []
# For each row...
for row in tr:
	# ... get the columns
	td = row.findAll('td')
	# If the column count > 0...
	if len(td) > 0:
		# ...add the row to the dataset
		# Trimmed to max 16 characters
		data.append([col.text[:16] for col in td])

dataset = system.dataset.toDataSet(headers, data)

for row in data:
	print row

Output:

[u'05/02/2023 21:00', u'12.8']
[u'05/02/2023 22:00', u'12.8']
[u'05/02/2023 23:00', u'12.7']
[u'05/03/2023 00:00', u'12.7']
[u'05/03/2023 01:00', u'12.7']
[u'05/03/2023 02:00', u'12.7']
[u'05/03/2023 03:00', u'12.6']
[u'05/03/2023 04:00', u'12.6']
[u'05/03/2023 05:00', u'12.6']
[u'05/03/2023 06:00', u'12.5']
[u'05/03/2023 07:00', u'12.6']
[u'05/03/2023 08:00', u'12.7']

How do you find the second table when they're all named the same, except that the div id is different: "DataTables_Table_1_wrapper", with a 1 instead of a 0? I always get None back no matter what I try.

The exact same method should work. Can I see a copy of its html and your code ?

Otherwise, you can select multiple tables at once, find and find_all accept regexes.
If the 2nd table comes right after the first one, you can also reach it from the first table. I don't remember exactly how... but it should be in the doc.
At least in bs4, I've never used bs3 so I don't know about it.
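
A rough bs4 sketch of both ideas, in case it helps. The wrapper divs and ids in SAMPLE_HTML are assumptions based on the names mentioned in this thread, not a copy of the real page, and none of this applies if those ids are missing from the HTML you receive.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical layout: two wrapper divs, each holding one table.
SAMPLE_HTML = """
<div id="DataTables_Table_0_wrapper"><table id="DataTables_Table_0"><tr><td>12.8</td></tr></table></div>
<div id="DataTables_Table_1_wrapper"><table id="DataTables_Table_1"><tr><td>6,396</td></tr></table></div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Match every table whose id looks like DataTables_Table_<number>.
tables = soup.find_all("table", id=re.compile(r"^DataTables_Table_\d+$"))
table_ids = [t["id"] for t in tables]

# Or hop from the first wrapper div to the next div and take its table.
first_wrapper = soup.find(id="DataTables_Table_0_wrapper")
second_table = first_wrapper.find_next_sibling("div").find("table")

print(table_ids)
print(second_table["id"])
```

find_next_sibling only looks at elements on the same level, which is why the hop starts from the wrapper div rather than from the table inside it.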

When I look at the source, I don't see these ids.

find_all doesn't work for me; it always throws this error:

Traceback (most recent call last):
File "", line 7, in
TypeError: 'NoneType' object is not callable

But here's me trying to print only the second table. I can print the first one, but the second one comes back as None.


There's "datatables_table_0" and "datatables_table_1".

That's a div name, not the table. Let me look closer.

It's not there when I look at the source. Thankfully there are only two tables. We can work with that. :)

Grabs all tables with headers and makes a list of datasets.

from project.BeautifulSoup import BeautifulSoup
url = 'http://cdec.water.ca.gov/dynamicapp/QueryF?s=MBG&d=03-May-2023+08:10'

soup = BeautifulSoup(system.net.httpGet(url))

datasetList = []

# find all the tables
tables = soup.findAll('table')

for t in tables:
	headers = []
	th = t.findAll('th')
	for header in th:
		headers.append(header.text)
	tr = t.findAll('tr')
	
	data = []
	# For each row...
	for row in tr:
		# ... get the columns
		td = row.findAll('td')
		# If the column count > 0...
		if len(td) > 0:
			# ...add the row to the dataset
			# Trimmed to max 16 characters
			data.append([col.text[:16] for col in td])
	
	datasetList.append(system.dataset.toDataSet(headers, data))

datasets created:

row | DATE / TIMEPDT   | BAT VOLVOLTS
-------------------------------------
0   | 05/02/2023 21:00 | 12.8        
1   | 05/02/2023 22:00 | 12.8        
2   | 05/02/2023 23:00 | 12.7        
3   | 05/03/2023 00:00 | 12.7        
4   | 05/03/2023 01:00 | 12.7        
5   | 05/03/2023 02:00 | 12.7        
6   | 05/03/2023 03:00 | 12.6        
7   | 05/03/2023 04:00 | 12.6        
8   | 05/03/2023 05:00 | 12.6        
9   | 05/03/2023 06:00 | 12.5        
10  | 05/03/2023 07:00 | 12.6        
11  | 05/03/2023 08:00 | 12.7        
row | DATE / TIMEPDT   | RIV STGFEET | FLOWCFS | RAINTIPINCHES | TEMPDEG F | REL HUM% | WIND SPMPH | WIND DRDEG | TEMP WDEG F
-----------------------------------------------------------------------------------------------------------------------------
0   | 05/02/2023 20:15 | 13.57       | 6,396   | 55.83         | 48        | 73       | 6          | 103        | 45.7       
1   | 05/02/2023 20:30 | 13.68       | 6,581   | 55.83         | 48        | 74       | 2          | 107        | 45.9       
2   | 05/02/2023 20:45 | 13.61       | 6,463   | 55.83         | 48        | 74       | 7          | 107        | 45.9       
3   | 05/02/2023 21:00 | 13.62       | 6,479   | 55.83         | 48        | 78       | 7          | 142        | 46.0       
4   | 05/02/2023 21:15 | 13.58       | 6,412   | 55.83         | 48        | 78       | 4          | 106        | 46.0       
5   | 05/02/2023 21:30 | 13.55       | 6,362   | 55.83         | 48        | 80       | 1          | 123        | 46.0       
6   | 05/02/2023 21:45 | 13.53       | 6,329   | 55.83         | 47        | 81       | 2          | 80         | 46.0       
7   | 05/02/2023 22:00 | 13.54       | 6,346   | 55.83         | 47        | 83       | 1          | 248        | 46.0       
8   | 05/02/2023 22:15 | 13.50       | 6,279   | 55.83         | 47        | 84       | 1          | 291        | 46.0       
9   | 05/02/2023 22:30 | 13.48       | 6,246   | 55.83         | 47        | 85       | 1          | 263        | 46.1       
10  | 05/02/2023 22:45 | 13.47       | 6,230   | 55.83         | 47        | 86       | 1          | 195        | 46.1       
11  | 05/02/2023 23:00 | 13.42       | 6,148   | 55.83         | 47        | 86       | 2          | 61         | 46.1       
12  | 05/02/2023 23:15 | 13.45       | 6,197   | 55.83         | 46        | 90       | 2          | 126        | 46.1       
13  | 05/02/2023 23:30 | 13.40       | 6,115   | 55.83         | 46        | 91       | 2          | 78         | 46.1       
14  | 05/02/2023 23:45 | 13.45       | 6,197   | 55.83         | 46        | 90       | 2          | 281        | 46.2       
15  | 05/03/2023 00:00 | 13.47       | 6,230   | 55.83         | 46        | 92       | 2          | 87         | 46.2       
16  | 05/03/2023 00:15 | 13.45       | 6,197   | 55.83         | 45        | 93       | 2          | 270        | 46.2       
17  | 05/03/2023 00:30 | 13.43       | 6,164   | 55.83         | 44        | 94       | 0          | 220        | 46.2       
18  | 05/03/2023 00:45 | 13.35       | 6,036   | 55.83         | 44        | 96       | 0          | 132        | 46.2       
19  | 05/03/2023 01:00 | 13.32       | 5,990   | 55.83         | 43        | 96       | 1          | 162        | 46.2       
20  | 05/03/2023 01:15 | 13.42       | 6,148   | 55.83         | 43        | 96       | 1          | 75         | 46.1       
21  | 05/03/2023 01:30 | 13.31       | 5,974   | 55.83         | 43        | 96       | 1          | 339        | 46.2       
22  | 05/03/2023 01:45 | 13.25       | 5,882   | 55.83         | 44        | 95       | 1          | 97         | 46.2       
23  | 05/03/2023 02:00 | 13.28       | 5,928   | 55.83         | 44        | 95       | 0          | 71         | 46.1       
24  | 05/03/2023 02:15 | 13.22       | 5,836   | 55.83         | 44        | 96       | 0          | 127        | 46.1       
25  | 05/03/2023 02:30 | 13.19       | 5,790   | 55.83         | 44        | 96       | 1          | 346        | 46.1       
26  | 05/03/2023 02:45 | 13.31       | 5,974   | 55.83         | 44        | 95       | 1          | 97         | 46.1       
27  | 05/03/2023 03:00 | 13.27       | 5,912   | 55.83         | 44        | 96       | 1          | 282        | 46.0       
28  | 05/03/2023 03:15 | 13.19       | 5,790   | 55.83         | 44        | 96       | 1          | 65         | 46.0       
29  | 05/03/2023 03:30 | 13.27       | 5,912   | 55.83         | 44        | 97       | 0          | 358        | 45.8       
30  | 05/03/2023 03:45 | 13.30       | 5,959   | 55.83         | 42        | 97       | 1          | 348        | 45.9       
31  | 05/03/2023 04:00 | 13.19       | 5,790   | 55.83         | 42        | 97       | 1          | 68         | 45.9       
32  | 05/03/2023 04:15 | 13.16       | 5,745   | 55.83         | 41        | 98       | 1          | 38         | 45.8       
33  | 05/03/2023 04:30 | 13.12       | 5,684   | 55.83         | 41        | 98       | 1          | 290        | 45.7       
34  | 05/03/2023 04:45 | 13.20       | 5,805   | 55.83         | 40        | 98       | 0          | 134        | 45.7       
35  | 05/03/2023 05:00 | 13.18       | 5,775   | 55.83         | 40        | 98       | 1          | 299        | 45.6       
36  | 05/03/2023 05:15 | 13.16       | 5,745   | 55.83         | 40        | 98       | 0          | 28         | 45.5       
37  | 05/03/2023 05:30 | 13.11       | 5,669   | 55.83         | 40        | 98       | 1          | 32         | 45.3       
38  | 05/03/2023 05:45 | 13.17       | 5,760   | 55.83         | 40        | 99       | 2          | 251        | 45.2       
39  | 05/03/2023 06:00 | 12.97       | 5,462   | 55.83         | 40        | 99       | 1          | 185        | 45.2       
40  | 05/03/2023 06:15 | 13.12       | 5,684   | 55.83         | 40        | 99       | 1          | 346        | 45.1       
41  | 05/03/2023 06:30 | 13.10       | 5,654   | 55.83         | 40        | 99       | 1          | 27         | 45.1       
42  | 05/03/2023 06:45 | 13.00       | 5,506   | 55.83         | 42        | 98       | 2          | 94         | 45.2       
43  | 05/03/2023 07:00 | 13.05       | 5,580   | 55.83         | 42        | 97       | 1          | 29         | 45.2       
44  | 05/03/2023 07:15 | 13.06       | 5,595   | 55.83         | 42        | 97       | 1          | 239        | 45.1       
45  | 05/03/2023 07:30 | 13.06       | 5,595   | 55.83         | 42        | 97       | 1          | 136        | 45.1       
46  | 05/03/2023 07:45 | 13.08       | 5,624   | 55.83         | 43        | 97       | 2          | 86         | 45.0       
47  | 05/03/2023 08:00 | 12.99       | 5,491   | 55.83         | 44        | 94       | 1          | 89         | 45.0       

I also wrote a helper script to print datasets to the console, if you find it helpful.

That's wonderful, thank you! I'm going to try to work out how to get the bottom row of the second table and export it to a tag. This is very helpful. Thanks again.
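
For reference, one way to get the bottom row of the second table might look like the sketch below. It assumes Beautiful Soup 4 (`bs4`) and the built-in `html.parser`; `SAMPLE_HTML`, the `DataTables_Table_0` id, and the helper name `last_row_of_table` are made up for illustration and stand in for the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page: two small tables.
SAMPLE_HTML = """
<table id="DataTables_Table_0">
  <tr><th>DATE / TIME</th><th>RIV STG</th></tr>
  <tr><td>05/03/2023 08:00</td><td>12.99</td></tr>
</table>
<table id="DataTables_Table_1">
  <tr><th>DATE / TIME</th><th>FLOW</th></tr>
  <tr><td>05/03/2023 07:45</td><td>5,624</td></tr>
  <tr><td>05/03/2023 08:00</td><td>5,491</td></tr>
</table>
"""


def last_row_of_table(html, table_index):
    """Return the cell texts of the last <tr> in the table at table_index."""
    soup = BeautifulSoup(html, "html.parser")
    tables = soup.find_all("table")            # every table, in document order
    rows = tables[table_index].find_all("tr")  # header and data rows alike
    # rows[-1] is the bottom row; collect the text of each of its cells.
    return [cell.get_text(strip=True) for cell in rows[-1].find_all(["td", "th"])]


# Index 1 is the second table on the page.
bottom_row = last_row_of_table(SAMPLE_HTML, 1)
print(bottom_row)
```

Note that values such as `5,491` come back as strings, so the thousands separator would need to be stripped before converting to a number.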

You need to look at the docs. How do you expect to use a library without reading its manual?

https://www.crummy.com/software/BeautifulSoup/bs3/documentation.html#The%20basic%20find%20method:%20findAll(name,%20attrs,%20recursive,%20text,%20limit,%20**kwargs)

Beautiful Soup 3 uses findAll instead of find_all.

The id he gave is on the table; he only highlighted the wrong line in the screenshot.
But indeed, if there are only 2 tables and he wants them both, then selecting tables is the way to go.
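
To make the two options concrete, here is a rough sketch of selecting one table by its id versus selecting every table. It assumes Beautiful Soup 4, where the method is spelled `find_all`; the sample markup and the `DataTables_Table_0` id are invented for the example.

```python
from bs4 import BeautifulSoup

# Hypothetical two-table page, standing in for the real one.
SAMPLE_HTML = """
<div>
<table id="DataTables_Table_0"><tr><td>a</td></tr></table>
<table id="DataTables_Table_1"><tr><td>b</td></tr></table>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Option 1: pick a single table by its id attribute.
one_table = soup.find("table", id="DataTables_Table_1")

# Option 2: take every table on the page, in document order.
all_tables = soup.find_all("table")

table_ids = [t.get("id") for t in all_tables]
print(one_table.get_text(strip=True))
print(table_ids)
```

Selecting by tag name has the advantage of not depending on an id that may differ between the browser view and the raw HTML.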