Saving a file to local path from an Amazon S3 bucket via URL

MikeAllgood · September 18, 2020, 6:06pm

In an existing project I have a Web Dev endpoint that is executing the doPost script.
The client wants to add functionality to pass in a URL of a file located in an Amazon S3 Bucket and have the doPost routine save that to a local folder.
I have never scripted the saving of a file via a URL so I’m unsure how to proceed.
Has anyone ever done that and if so, can you share how it was done?

Thanks,
Mike

PGriffith · September 18, 2020, 6:50pm

You need to retrieve the content of the file separately, via another HTTP request. Since your post is tagged 8.0.x, you can use system.net.httpClient(), which is much more ergonomic than the older system.net.httpGet() family of functions. Generally speaking, downloads are GET requests, so your hypothetical webdev endpoint would be something like this:

#request should be a GET to http://localhost:8088/data/webdev/${project}/${path}?filepath=${pathToS3}
s3Url = params["filepath"]

if s3Url is not None:
	def downloadFile(response):
		system.file.writeFile("path/to/some/file", response.body)

	request = system.net.httpClient().getAsync(s3Url)
	request.whenComplete(downloadFile)

MikeAllgood · September 18, 2020, 7:04pm

Hey Paul,

So the downloadFile function has a parameter named “response”. Where is that coming from?

Also, the URL I need to get the file (an image, in this case) from is not the Ignition project URL, but an external Amazon URL. Am I misunderstanding your commented line at the very top?

Thanks!

kgamble · September 18, 2020, 7:33pm

When the httpClient gets the data back from s3Url, it passes the result into the downloadFile function as response. So response.body holds the contents of your file.

He is saying your Amazon URL goes at the end of that request URL. In this case, pathToS3 is the file URL:
http://localhost:8088/data/webdev/${project}/${path}?filepath=${pathToS3}

MikeAllgood · September 18, 2020, 8:02pm

Hey Keith,

I was able to get this to work in the script console. Thanks for following up.
BTW, I’ve been meaning to reach out just haven’t had the chance yet.

FYI. I’m not using the Web Dev module to make this request so the URL
http://localhost:8088/data/webdev/${project}/${path}?filepath=${pathToS3}
isn’t really relevant to my case (I think, especially since I got this to work).
I’m simply executing the httpClient’s getAsync from the doPost method of my endpoint (which is called externally via the API), saving the associated file locally, and logging the destination file path to a database to associate it with a part (for later use).

Thanks for the help!

MikeAllgood · September 25, 2020, 2:08pm

So I have a follow up question on this.
Are there any results that can be used to positively identify whether the file download completed successfully or failed?
Thanks.

kgamble · September 25, 2020, 3:38pm

I think you could handle any download errors with:
request.handleException(YourFunctionToHandleFailure)

And you could specify a shorter timeout (the default is 60,000 ms) for the download when creating the httpClient, like:
request = system.net.httpClient(timeout=30000).getAsync(s3Url)

But that only covers part of the problem. Strictly speaking, neither of those tells you whether you downloaded a corrupt file or only part of the file.
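
One way to narrow that gap, as a rough sketch: when the server reports a Content-Length header, compare it with the number of bytes actually received. The function name `looks_complete` and the dict-style `headers` argument are assumptions made for illustration. This is plain Python that does not call Ignition, and it assumes the body was not transparently decompressed on the way in.

```python
def looks_complete(body, headers):
    """Compare the received byte count with the reported Content-Length.

    Returns True or False when the header is present, and None when it is
    missing (in which case this check cannot say anything).
    """
    expected = None
    for name, value in headers.items():
        if name.lower() == "content-length":
            # Some clients expose header values as lists; take the first entry.
            if isinstance(value, (list, tuple)):
                value = value[0]
            expected = int(value)
    if expected is None:
        return None
    return len(body) == expected
```

A matching length only rules out a truncated download. It says nothing about corrupted bytes, so a checksum published by the sender would be a stronger check if one is available.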

MikeAllgood · March 9, 2021, 4:13pm

It’s me again!
I have some follow up questions on the httpClient and the various objects, properties, and methods associated with it.
I’ve tried looking for more info on this, and though there seems to be plenty out there, it is cryptic to my tiny brain! I’ve tried playing with this in the Script Console and have had about as much luck. I’d like to add some exception handling. I’ve tried to implement the request.handleException method but all that happens in the console is I get something like the following:
<Promise@1602849530 isDone=false>
The doc for this method says, “and is expected to return a new fallback value for the next step in the promise chain.” I’m not sure what “the next step in the promise chain” is or means, so I assume I’m doing something wrong.
My console script looks like this (currently that is, I’ve moved it around and hacked it up a bit with no change in outcome):

request = system.net.httpClient(timeout=30000).getAsync(bag_label_url)
request.handleException(badURL)

def getResp(response):
	print str(response.body)
	print 'Response Status Code: ' + str(response.statusCode)
	system.file.writeFile(bag_label_dest, str(response.body))

def badURL(thrownError):
	print 'thrown error: ' + str(thrownError)
	
bag_label_url = 'https://docs.inductiveautomation.com/display/DOC80/system.net.httpClient'
bag_label_dest = 'C:\\OrdersAPI\\TextFiles\\httpclient_ExceptionTest.txt'

request.whenComplete(getResp)

My aim is to understand what is going on here so I can implement some quick error handling in my image download functionality. That way I don’t have to wait for a timeout on each image if it does not exist or its “viability” has expired (these S3 bucket images can only be downloaded for a limited period of time).

Any input or direction to other sources that are aimed at those cerebrally challenged would be appreciated!

PGriffith · March 9, 2021, 5:35pm

handleException gets the error, and is expected to return something that is not an error. handleException means “accept an exception, do something with it, then return some default value” - so it only makes sense to use handleException if there actually is some default/computed value that makes sense for your application.
whenComplete already exists to conditionally handle an error in the chain; so you could either combine the handling (which won’t catch an exception happening in the file write step):

bag_label_url = 'https://docs.inductiveautomation.com/display/DOC80/system.net.httpClient'
request = system.net.httpClient(timeout=30000).getAsync(bag_label_url)

def handleResponse(response, thrownError):
	bag_label_dest = 'C:\\OrdersAPI\\TextFiles\\httpclient_ExceptionTest.txt'
	if thrownError is not None:
		print 'thrown error:', thrownError
	else:
		print str(response.body)
		print 'Response Status Code: ' + str(response.statusCode)
		system.file.writeFile(bag_label_dest, str(response.body))


request.whenComplete(handleResponse)

Or you can ‘chain’ steps together, adding whenComplete at the end to handle any possible exception:

def saveFile(response):
	bag_label_dest = 'C:\\OrdersAPI\\TextFiles\\httpclient_ExceptionTest.txt'
	print str(response.body)
	print 'Response Status Code: ' + str(response.statusCode)
	system.file.writeFile(bag_label_dest, str(response.body))

def onDone(result, thrownError):
	if thrownError is not None:
		print 'thrown error:', thrownError

bag_label_url = 'https://docs.inductiveautomation.com/display/DOC80/system.net.httpClient'
request = system.net.httpClient(timeout=30000)
	.getAsync(bag_label_url)
	.then(saveFile)
	.whenComplete(onDone)

MikeAllgood · March 9, 2021, 7:10pm

Hey Paul,
The first script does not print anything in the console’s output window.
The second has errors in the last 3 lines. I get an “EOF expected” error at the line with " .getAsync…".
I assume you meant to create a promise object in the getAsync line and then use that on the next two lines.
When I did that, I was back to getting “<Promise@1732944234 isDone=false>” printed to the output window. I would expect the response.body in these functions to contain something that could be written to a file, but I’m not sure that it does. Unfortunately, nothing gets written.
Below is the modified script from your second one:

def saveFile(response):
	bag_label_dest = 'C:\\OrdersAPI\\TextFiles\\httpclient_ExceptionTest.txt'
	print str(response.body)
	print 'Response Status Code: ' + str(response.statusCode)
	system.file.writeFile(bag_label_dest, str(response.body))

def onDone(result, thrownError):
	if thrownError is not None:
		print 'thrown error:', thrownError

bag_label_url = 'https://docs.inductiveautomation.com/display/DOC80/system.net.httpClient'
request = system.net.httpClient(timeout=30000)
promise = request.getAsync(bag_label_url)
promise.then(saveFile)
promise.whenComplete(onDone)

This web stuff has gotten me completely turned sideways and upside-down.

PGriffith · March 9, 2021, 7:51pm

Whoops. My bad - I didn’t indent the last example correctly. Your rewrite doesn’t work right because the modification operators (.then, .whenComplete) return new objects - so you’re not correctly ‘chaining’ things off the original. This works for me:

def saveFile(response):
	bag_label_dest = 'C:/test/log.txt'
#	print response.text
	print 'Response Status Code: ', response.statusCode
	system.file.writeFile(bag_label_dest, response.body)

def onDone(result, thrownError):
	if thrownError is not None:
		print 'thrown error:', thrownError

bag_label_url = 'https://docs.inductiveautomation.com/display/DOC80/system.net.httpClient'
(system.net.httpClient(timeout=30000)
	.getAsync(bag_label_url)
	.then(saveFile)
	.whenComplete(onDone)
)

Or you can collapse the last lines into a single line to drop the parentheses:
system.net.httpClient(timeout=30000).getAsync(bag_label_url).then(saveFile).whenComplete(onDone)
Or you can take care to modify the promise variable at each step:

promise = system.net.httpClient(timeout=30000).getAsync(bag_label_url)
promise = promise.then(saveFile)
promise = promise.whenComplete(onDone)
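
To see why the unchained rewrite misses errors, here is a toy model in plain Python. `ToyPromise` is a hypothetical, synchronous stand-in written only for this illustration. It is not Ignition's Promise and leaves out everything asynchronous. The one behavior it copies is that then and whenComplete each return a new object.

```python
class ToyPromise(object):
    """Simplified, synchronous stand-in (hypothetical, not Ignition's Promise)."""

    def __init__(self, value=None, error=None):
        self.value = value
        self.error = error

    def then(self, callback):
        # An earlier failure skips the callback and is passed along.
        if self.error is not None:
            return ToyPromise(error=self.error)
        try:
            return ToyPromise(value=callback(self.value))
        except Exception as exc:
            # A failure inside the callback lives on the NEW promise only.
            return ToyPromise(error=exc)

    def whenComplete(self, callback):
        callback(self.value, self.error)
        return ToyPromise(self.value, self.error)


def save(response):
    # Simulate the file write step failing.
    raise IOError("disk full")


seen = []  # errors observed by on_done, in call order


def on_done(result, error):
    seen.append(error)


original = ToyPromise(value="response")

# Unchained: both steps hang off the original promise.
original.then(save)
original.whenComplete(on_done)

# Chained: on_done hangs off the promise returned by then().
original.then(save).whenComplete(on_done)
```

In the unchained version, on_done is attached to the original promise, which completed without error, so the failure inside save goes unnoticed. In the chained version, on_done is attached to the promise returned by then, which carries the error.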

MikeAllgood · March 9, 2021, 8:14pm

In my Script Console that is still not printing any information to the output window.

Is there a “dummies” guide, online preferably, to these objects/methods that you could recommend?
My lack of understanding what is going on here is a big issue. The java and python docs don’t do me any favors as they are written, in my opinion, like I should already know a lot about what they are referring to. And I know zilch!

On a slightly different front, the original code that I had working as part of the doPost script in the Web Dev endpoint is getting moved a bit. We’ve found that the volume of data the customer needs to send to the system is causing slowdowns, so we are moving the image downloads to a Gateway timed event and letting that process a handful of images at a time. When I test this on a single image download, it times out (here the timeout is one of my own making, so I can validate that the image was downloaded and update the database accordingly). I can manually download the file by pasting the URL into a browser, so, again, I don’t understand what is preventing the download.
The code I’m using looks like this:

def downloadImage(response):
	util.log(ln, 'In downloadImage function.')
	system.file.writeFile(bag_label_dest, response.body)
			
promise = system.net.httpClient(timeout=30000).getAsync(bag_label_url)
util.log(ln, 'created promise object')
promise.whenComplete(downloadImage)
		
#pause to allow image file to download
#init values to loop and check for file
begin = system.date.now()
elapsed = 0
timeout = 31000
stopLoop = False
fileExists = False
firstPass = True
#loop until file found or timeout expires
while fileExists == False and elapsed <= timeout:
	if firstPass:
		util.log(ln, 'Entering first loop of File Exists checks for file ' + bag_label_dest + '.')
		firstPass = False
	elapsed = abs(system.date.millisBetween(begin, system.date.now()))
	fileExists = os.path.exists(bag_label_dest)

I never see a log entry from the log statement inside the downloadImage function.
This is one reason I’m wanting to add in some error handling, but I’m falling flat on my face.

PGriffith · March 9, 2021, 10:21pm

Yes - that's expected. The Script Console will only capture 'direct' output. Because you're invoking your request asynchronously, the output is "leaving" the context of the Script Console and going to the general output console, where you should see it.

Unfortunately, there's no "dummies" guide that I know of. The Promise object is largely inspired by JavaScript's Promise object, but under the hood it is really just a Java CompletableFuture... and, honestly, if Promise is a moderately confusing API, CompletableFuture is the Necronomicon.

You mentioned you're downloading these images from S3 - Ignition might not be providing the right authentication (likely via a cookie) to authenticate with S3, so S3 is just refusing to give any indication that the URL is valid at all. Can you still download if you try a URL in an incognito/private session in the browser?

Also, I'm not really sure what your goal is with the while loop in your latest example. If you want to 'block' the rest of the script from executing until the file is downloaded, just avoid the async call entirely and call httpClient.get() - then you're guaranteed to have a Response that you know is valid (or not) before script execution continues. Or, if you do want to use async, fully use async; starting an async task and then blocking to wait for it immediately after is somewhat pointless.
A more idiomatic thing to do would be to get() on the promise, if you had some intermediate work to do:

def downloadImage(response):
	util.log(ln, 'In downloadImage function.')
	system.file.writeFile(bag_label_dest, response.body)
	return "File written"
		
client = system.net.httpClient(timeout=30000) # 30s timeout for the actual HTTP connection
promise = client.getAsync(bag_label_url).then(downloadImage)
util.log(ln, 'created promise object')

# do some intermediate task

output = promise.get(timeout=30000) # additional 30s for the second stage to complete
# output == 'File written'

MikeAllgood · March 16, 2021, 5:42pm

Essentially, what the system needs to do is ensure the file is downloaded to the destination folder. If the download fails for some reason, that needs to be recorded in the database so it can be tried again later. If there is a way to get the cause of the failure, that would be great to have so it could also be logged in the DB.
My while loop (which defeats the async nature of the getAsync method) is intended to ensure the file makes it to the destination before moving on to the next download. What we had been seeing is that getAsync would execute, but before the file was downloaded the next part’s data was being processed, and therefore the image file changed. Somehow the previous image (once its download finished) was being named with the next part’s image name. That is why the loop was added.
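
One plausible explanation for the misnamed files, offered as a guess rather than a diagnosis: if the callback reads the destination path from a shared variable, it sees whatever that variable holds when the download finishes, not when the request was started. The sketch below is plain Python with no Ignition calls. `run_later`, `make_saver`, and the example URLs and paths are all hypothetical stand-ins.

```python
def run_later(queue):
    """Fire the queued callbacks, as if the downloads had just finished."""
    for url, callback in queue:
        callback("content of " + url)


jobs = [
    ("https://example.invalid/a.png", "C:/labels/a.png"),
    ("https://example.invalid/b.png", "C:/labels/b.png"),
]

# Version 1: the callback reads a shared variable when it finally runs.
shared_writes = {}  # destination path -> content written there
dest = None


def save_shared(body):
    shared_writes[dest] = body  # dest is looked up at call time


queue = []
for url, dest in jobs:
    queue.append((url, save_shared))
run_later(queue)  # by now dest holds the LAST path for every callback

# Version 2: bind the destination when the request is created.
bound_writes = {}


def make_saver(path):
    def save(body):
        bound_writes[path] = body  # path is fixed per request
    return save


queue = []
for url, dest in jobs:
    queue.append((url, make_saver(dest)))
run_later(queue)
```

In the first version both downloads land on the last path, so one file overwrites the other. In the second, each callback keeps its own destination, which would remove the need to wait for one download before starting the next.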

We will be overhauling this whole order processing feature to break a lot of the functionality out of the doPost function. One of the things we are planning to do is completely separate the image download feature. The current plan is to create a timed event that will process the image downloads in blocks. The DB will be updated based on whether each download succeeded or failed. That will require some “failure” data to be processed and used to update the DB. As yet I have not had a chance to really look into how to get this failure info from the download request.
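
As a starting point for that failure data, here is a minimal sketch of turning one download attempt into a success flag and a reason string that could be stored in the DB. The function name `download_outcome` and the wording of the reasons are assumptions. It also assumes that a 4xx or 5xx reply arrives as a normal response with a statusCode rather than as a thrown error, and that S3 answers an expired or unauthorized link with 403 or 404, both of which should be verified against the real system.

```python
def download_outcome(status_code=None, error=None):
    """Return (succeeded, reason) for one download attempt."""
    if error is not None:
        # Timeouts, DNS failures and similar never produce a status code.
        return (False, "request failed: " + str(error))
    if status_code is None:
        return (False, "no response received")
    if 200 <= status_code < 300:
        return (True, "")
    if status_code in (403, 404):
        # Assumption: this is how a missing or expired link shows up.
        return (False, "HTTP %d: link missing, expired, or not authorized" % status_code)
    return (False, "HTTP %d" % status_code)
```

Because a rejected link would come back as a quick response instead of a timeout, checking the status code first would also avoid waiting out the full timeout for images that are no longer available.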