OCR REST - Jython, Web Dev?

I was trying to keep the semantics of the example you originally posted: the file parameter to post automatically streams the file's contents (read from a path local to wherever the HTTP client instance is running) to the target URL. The very last lines of the script (retrieving whatever is at OutputFileUrl from the web service) translate just as directly:

```python
#file_response = requests.get(jobj["OutputFileUrl"], stream=True)
#with open("outputDoc.doc", 'wb') as output_file:
#   shutil.copyfileobj(file_response.raw, output_file)
```

Would become:

```python
file_response = client.get(jobj["OutputFileUrl"])
system.file.writeFile("outputDoc.doc", file_response.body)
```
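One behavioural difference worth knowing: requests.get(..., stream=True) plus shutil.copyfileobj copies the body to disk in chunks, whereas .body on an httpClient response gives you the whole body as a byte array in memory. That's usually fine for an OCR output document. For comparison, here's a minimal streaming sketch in plain CPython 3 using only the standard library (urllib.request instead of requests). download_to_file is a hypothetical helper name, and this is not meant to be pasted into Ignition's Jython as-is.

```python
import os
import pathlib
import shutil
import tempfile
import urllib.request


def download_to_file(url, path, chunk_size=64 * 1024):
    """Stream the resource at `url` into the file at `path`; return its size in bytes."""
    with urllib.request.urlopen(url) as response, open(path, "wb") as output_file:
        # copyfileobj moves the data in chunk_size pieces, so memory use stays bounded
        shutil.copyfileobj(response, output_file, chunk_size)
    return os.path.getsize(path)


if __name__ == "__main__":
    # Demo against a local file:// URL so it runs without network access;
    # in real use the URL would be jobj["OutputFileUrl"].
    with tempfile.TemporaryDirectory() as tmp:
        source = pathlib.Path(tmp, "source.bin")
        source.write_bytes(b"x" * 200000)
        print(download_to_file(source.as_uri(), os.path.join(tmp, "outputDoc.doc")))
```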

If you follow the steps in my post, pip downloads it for you automatically. pip is the package manager for Python, so Step 2 ('jython -m pip install requests') does both the download and the install. There's no separate link to the source, because pip also grabs the required dependencies and may byte-compile some of them as well.

As step 3 mentions, once pip has installed requests, grab the contents of the site-packages directory and copy them into Ignition's site-packages directory. If you want to look at the source, whether you need to or just out of curiosity, it's in the requests package that pip put into that site-packages directory.
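Step 3 is just a directory copy. If you'd rather script it than drag files around, here's a minimal CPython 3 sketch. copy_site_packages is a hypothetical helper, and both paths in the example call are placeholders I made up: point the first at wherever jython -m pip installed things on your machine, and the second at your Ignition install's site-packages folder (as far as I know, user-lib/pylib/site-packages on 8.x, but check yours).

```python
import os
import shutil


def copy_site_packages(src_dir, dst_dir):
    """Copy every top-level entry of src_dir into dst_dir, merging with existing content.

    Returns the sorted names of the entries that were copied.
    """
    os.makedirs(dst_dir, exist_ok=True)
    copied = []
    for name in sorted(os.listdir(src_dir)):
        src = os.path.join(src_dir, name)
        dst = os.path.join(dst_dir, name)
        if os.path.isdir(src):
            # dirs_exist_ok (Python 3.8+) merges into an already-present package folder
            shutil.copytree(src, dst, dirs_exist_ok=True)
        else:
            shutil.copy2(src, dst)  # copies file contents plus timestamps
        copied.append(name)
    return copied


# Example call (both paths are hypothetical; adjust them for your machine):
# copy_site_packages(r"C:\jython2.7.2\Lib\site-packages",
#                    r"C:\Program Files\Inductive Automation\Ignition\user-lib\pylib\site-packages")
```

Restart or re-save scripts afterwards if Ignition doesn't pick up the new modules straight away.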

Thanks for the explanation, Kevin. I'm guessing pip can only be used in v8+?

OK, but apparently ':' is an illegal character for this method. In your code it is "C:\test_image.jpg", as per the OP, but unfortunately the computer says no...

Race you to reply with a fix whilst I make my morning coffee :slight_smile:

Get rid of the backslashes at the start of the path. I think those are only used for UNC paths pointing at a network share (\\server\share\...), not a local drive.
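Two separate things can bite with Windows paths here. As I understand it, a leading \\ makes the path parse as a UNC path (\\server\share\...), so with \\C:\... the "C:" lands where a server name should be. Separately, in a normal Python string literal a single backslash starts an escape sequence, so "C:\test_image.jpg" actually contains a tab; the script's FilePath avoids that by doubling the backslash. A quick demonstration in plain Python using the standard library's ntpath module, which applies Windows path rules on any OS:

```python
import ntpath

# In a normal string literal "\t" is an escape, so this "path" contains a TAB character
broken = "C:\test_image.jpg"

# Two correct ways to write the local path
escaped = "C:\\test_image.jpg"
raw = r"C:\test_image.jpg"

print("\t" in broken)  # True
print(escaped == raw)  # True

# ntpath applies Windows path rules regardless of the host OS
print(ntpath.splitdrive(raw))                               # ('C:', '\\test_image.jpg')
print(ntpath.splitdrive(r"\\server\share\test_image.jpg"))  # ('\\\\server\\share', '\\test_image.jpg')

# With leading backslashes the whole string is read as a UNC prefix: no local drive at all
print(ntpath.splitdrive("\\\\C:\\test_image.jpg"))          # ('\\\\C:\\test_image.jpg', '')
```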

Thanks Kevin, done, but it's still failing at line 70 with "Unable to POST".

I think I'll try to arrange a call with support tomorrow evening.

Yes, that's right. In 8 we made some improvements so that importing from site-packages/ works, which is what makes step 3 in my original post possible.

(Technically, Jython 2.5 does have pip, but it wouldn't help you in 7.9, since you wouldn't be able to import any of the installed packages directly into Ignition 7.9. Plus, the number of libraries that are compatible with Python 2.5 is pretty limited.)

Worked with Patrick to get the example code working with system.net.httpClient(). We had to move the username/password onto the actual post call instead of setting them on the client: Java's HttpClient only sends client-level credentials after the server challenges with a 401 response carrying a WWW-Authenticate header. If you specify a username and password on the request itself, they get encoded into a (basic auth) Authorization header for you.
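For reference, the header that ends up on the request is standard HTTP Basic authentication (RFC 7617): "Authorization: Basic " followed by the base64 encoding of "username:password". A minimal sketch of building it by hand in plain Python; basic_auth_header is an illustrative helper of my own, not part of Ignition's API, and you shouldn't need it when passing username/password to post.

```python
import base64


def basic_auth_header(username, password):
    """Return the value of an HTTP Basic Authorization header (RFC 7617)."""
    credentials = "%s:%s" % (username, password)
    token = base64.b64encode(credentials.encode("utf-8")).decode("ascii")
    return "Basic " + token


# RFC 7617's own example credentials
print(basic_auth_header("Aladdin", "open sesame"))  # Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
```

Basic auth is only encoded, not encrypted, so keep the service URL on https as the active RequestUrl in the script does. If you ever do need to set the header yourself, I believe post also accepts a headers dictionary, but check the httpClient docs for your version.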

```python
# import shutil  # only needed by the commented-out requests version at the end

"""
    Sample project for OCRWebService.com (REST API).
    Extract text from scanned images and PDF documents and convert it into editable formats.
    Please create a new account with ocrwebservice.com via http://www.ocrwebservice.com/account/signup and get a license code.
"""

# Provide your username and license code
UserName = '<username>'
LicenseCode = '<license code>'

"""
    You should specify OCR settings. See the full description at http://www.ocrwebservice.com/service/restguide

    Input parameters:

    [language]     - Specifies the recognition language.
                     This parameter can contain several language names separated with commas.
                     For example "language=english,german,spanish".
                     Optional parameter. By default: english

    [pagerange]    - Enter page numbers and/or page ranges separated by commas.
                     For example "pagerange=1,3,5-12" or "pagerange=allpages".
                     Optional parameter. By default: allpages

    [tobw]         - Convert image to black and white (recommended for color images and photos).
                     For example "tobw=false"
                     Optional parameter. By default: false

    [zone]         - Specifies the region on the image for zonal OCR.
                     The coordinates are in pixels relative to the top left corner, in the format top:left:height:width.
                     This parameter can contain several zones separated with commas.
                     For example "zone=0:0:100:100,50:50:50:50"
                     Optional parameter.

    [outputformat] - Specifies the output file format.
                     Up to two output formats can be specified, separated with commas.
                     For example "outputformat=pdf,txt"
                     Optional parameter. By default: doc

    [gettext]      - Specifies that extracted text will be returned.
                     For example "gettext=true"
                     Optional parameter. By default: false

    [description]  - Specifies your task description. Will be returned in the response.
                     Optional parameter.

    !!!! To get a result you must specify "gettext" or "outputformat" !!!!
"""

# Build your OCR request:

# Extract text in English (the default language)
RequestUrl = "https://www.ocrwebservice.com/restservices/processDocument?gettext=true"
client = system.net.httpClient()

# Extract text in English and German using zonal OCR
#RequestUrl = 'https://www.ocrwebservice.com/restservices/processDocument?language=english,german&zone=0:0:600:400,500:1000:150:400'

# Convert the first 5 pages of a multipage document into doc and txt
#RequestUrl = 'https://www.ocrwebservice.com/restservices/processDocument?language=english&pagerange=1-5&outputformat=doc,txt'

# Full path to the document to upload (backslash doubled inside a normal string literal)
FilePath = "C:\\test_image.jpg"

# Credentials go on the request itself so they are sent as a basic auth Authorization header
r = client.post(RequestUrl, file=FilePath, username=UserName, password=LicenseCode)

if r.statusCode == 401:
	# Please provide a valid username and license code
	print("Unauthorized request")
	exit()

# Decode the JSON response
jobj = r.json

ocrError = str(jobj["ErrorMessage"])

if ocrError != '':
	# An error occurred during recognition
	print("Recognition Error: " + ocrError)
	exit()

# Task description
print("Task Description: " + str(jobj["TaskDescription"]))

# Available pages
print("Available Pages: " + str(jobj["AvailablePages"]))

# Processed pages
print("Processed Pages: " + str(jobj["ProcessedPages"]))

# For zonal or multipage OCR: OCRText[z][p]    z - zone, p - page

# Extracted text from the first or only page
#print("Extracted Text: " + str(jobj["OCRText"][0][0]))

# Extracted text from the second page (if a multipage doc was converted)
#print("Extracted Text: " + str(jobj["OCRText"][0][1]))

# Extracted text from the first zone for each page
#print("Zone 1 Page 1 Text: " + str(jobj["OCRText"][0][0]))
#print("Zone 1 Page 2 Text: " + str(jobj["OCRText"][0][1]))

# Extracted text from the second zone for each page
#print("Zone 2 Page 1 Text: " + str(jobj["OCRText"][1][0]))
#print("Zone 2 Page 2 Text: " + str(jobj["OCRText"][1][1]))

# Download the output file (only when an output file was produced, i.e. outputformat was specified)
outputFileUrl = jobj.get("OutputFileUrl")
if outputFileUrl:
	system.file.writeFile("outputDoc.doc", client.get(outputFileUrl).body)
#file_response = requests.get(jobj["OutputFileUrl"], stream=True)
#with open("outputDoc.doc", 'wb') as output_file:
#   shutil.copyfileobj(file_response.raw, output_file)
```
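If you end up switching between the RequestUrl variants above, it can help to build the query string from the documented parameters instead of editing it by hand, and to walk OCRText generically instead of hard-coding [z][p] indices. A minimal plain CPython 3 sketch (Jython 2.7's urllib.urlencode has no safe argument, so it would need adapting for Ignition). build_request_url and iter_ocr_text are hypothetical helpers, and the sample response is a made-up dictionary shaped after the OCRText[z][p] comment in the script, not real service output. Commas and colons are left unescaped to match the hand-written URLs.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.ocrwebservice.com/restservices/processDocument"


def build_request_url(**params):
    """Build a processDocument URL from OCR settings.

    Lists are joined with commas and booleans become "true"/"false",
    matching the parameter formats described in the script's docstring.
    """
    query = {}
    for key, value in params.items():
        if isinstance(value, bool):
            value = "true" if value else "false"
        elif isinstance(value, (list, tuple)):
            value = ",".join(str(v) for v in value)
        query[key] = value
    # Keep "," and ":" literal, as in the hand-written URLs (e.g. zone=0:0:600:400,...)
    return BASE_URL + "?" + urlencode(query, safe=",:")


def iter_ocr_text(jobj):
    """Yield (zone, page, text) for every entry of OCRText[z][p], numbered from 1."""
    for z, pages in enumerate(jobj.get("OCRText") or [], start=1):
        for p, text in enumerate(pages or [], start=1):
            yield z, p, text


if __name__ == "__main__":
    print(build_request_url(language=["english", "german"],
                            zone=["0:0:600:400", "500:1000:150:400"]))

    # Made-up response shape, for illustration only
    sample = {"OCRText": [["zone 1 page 1", "zone 1 page 2"],
                          ["zone 2 page 1", "zone 2 page 2"]]}
    for zone, page, text in iter_ocr_text(sample):
        print("Zone %d Page %d Text: %s" % (zone, page, text))
```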

Great support from all, thank you!