OCR REST - Jython, Web Dev?

paul-griffith · July 21, 2020, 12:00am

I was trying to maintain the semantics of the example you originally posted - the file parameter to post automatically streams the file contents (from somewhere local to the http client instance) to the target URL. The very last lines of the file (retrieving whatever is at the outputURL from the web service) would be pretty similar:

#file_response = requests.get(jobj["OutputFileUrl"], stream=True)
#with open("outputDoc.doc", 'wb') as output_file:
#   shutil.copyfileobj(file_response.raw, output_file)

Would become:

file_response = client.get(jobj["OutputFileUrl"])
system.file.writeFile("outputDoc.doc", file_response.body)

Kevin.McClusky · July 21, 2020, 4:50am

If you follow the steps in my post, it'll automatically download it for you. "pip" is a package manager for python, so Step 2 ( ‘jython -m pip install requests’ ) does the download and install. There's no separate link to the source, since using pip will also grab dependencies, which are required, and might do some bytecode compiling too.

As step 3 mentions, just grab the site-packages directory contents after you install requests with pip and copy them into Ignition's site-packages directory. If you need to look at the source for some reason or want to see it out of curiosity, you can look at the requests package that pip has provided to you in that site-packages directory after it's downloaded it.
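After copying the files, a quick check along these lines can confirm that the package is actually visible to the interpreter. This is only a sketch: `can_import` and `site_packages_dirs` are hypothetical helper names, not part of Ignition or pip, and the output depends entirely on the local install.

```python
import sys


def can_import(module_name):
    """Return True if the named module can be imported, False otherwise."""
    try:
        __import__(module_name)
        return True
    except ImportError:
        return False


def site_packages_dirs():
    """List the sys.path entries that end in a site-packages directory."""
    return [str(p) for p in sys.path
            if str(p).rstrip("/\\").endswith("site-packages")]


# Report whether "requests" is importable and where the interpreter looks.
print("requests importable: %s" % can_import("requests"))
for path in site_packages_dirs():
    print("searching: %s" % path)
```

If `requests` reports as not importable, comparing the printed directories against the place the files were copied to is a reasonable first thing to check.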

Matrix_Engineering · July 21, 2020, 5:18am

Thanks Kevin for the explanation. I’m guessing Pip can only be used in V8+?

Matrix_Engineering · July 21, 2020, 5:29am

OK, but apparently ':' is an illegal character for this method. In your code it is "C:\test_image.jpg", as per the OP, but unfortunately the computer says no...

Race you to reply with a fix whilst I make my morning coffee

Kevin.Herron · July 21, 2020, 1:23pm

Get rid of the backslashes at the start of a path. I think those are only used when pointing to a network share, not a local drive.
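A related pitfall worth ruling out (an assumption, since the exact string that failed isn't shown in the thread): in a normal Python string literal, a single backslash followed by `t` is a tab character, so a Windows path has to be written with doubled backslashes or as a raw string. A minimal sketch using the file name from the example:

```python
import ntpath  # Windows path rules, usable on any platform

plain = "C:\test_image.jpg"      # "\t" is parsed as a TAB character here
escaped = "C:\\test_image.jpg"   # doubled backslash: a literal backslash
raw = r"C:\test_image.jpg"       # raw string: backslashes kept as typed

print("\t" in plain)         # True: the path silently contains a tab
print(escaped == raw)        # True: both spell the same path
print(ntpath.basename(raw))  # test_image.jpg
```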

Matrix_Engineering · July 21, 2020, 2:27pm

Thanks Kevin, done and now still failing at line 70 “Unable to POST”.

Think I’ll try and arrange a call with support tomorrow evening.

Kevin.McClusky · July 21, 2020, 5:38pm

Yes, that's right. In 8 we made some improvements so site-packages/ works for imports, which allows step 3 in my original post to work.

(Technically, Jython 2.5 does have pip, but it wouldn't help you in 7.9, since you wouldn't be able to import any of the packages directly into Ignition 7.9. Plus, the number of libraries that are compatible with Python 2.5 is pretty limited.)

paul-griffith · July 22, 2020, 4:43pm

Worked with Patrick to get the example code working with system.net.httpClient(). We had to move the username/password to the actual post call instead of setting them on the client: Java's HttpClient only sends credentials if the server responds with a specific WWW-Authenticate header. If you specify a username and password on the actual request, they'll get encoded into a (basic auth) Authorization header for you.
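For context, this is roughly what a basic-auth Authorization header contains. It's a generic sketch of the scheme (RFC 7617), not Ignition's internal implementation; `basic_auth_header` is a hypothetical helper and the credentials are placeholders.

```python
import base64


def basic_auth_header(username, password):
    """Build the value of a basic-auth Authorization header."""
    # Credentials are joined with a colon, then base64-encoded.
    credentials = ("%s:%s" % (username, password)).encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return "Basic " + token


print(basic_auth_header("user", "pass"))  # Basic dXNlcjpwYXNz
```

Base64 is an encoding, not encryption, which is one reason to keep the request URL on https. The full listing follows.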

# import shutil

"""
	Sample project for OCRWebService.com (REST API).
	Extract text from scanned images and PDF documents and convert into editable formats.
	Please create new account with ocrwebservice.com via http://www.ocrwebservice.com/account/signup and get license code
"""

# Provide your username and license code
UserName = '<username>'
LicenseCode = '<license code>'

"""
	You should specify OCR settings. See full description http://www.ocrwebservice.com/service/restguide

	Input parameters:

	[language]     - Specifies the recognition language.
		This parameter can contain several language names separated with commas.
		For example "language=english,german,spanish".
		Optional parameter. By default: english

	[pagerange]    - Enter page numbers and/or page ranges separated by commas.
		For example "pagerange=1,3,5-12" or "pagerange=allpages".
		Optional parameter. By default: allpages

	[tobw]         - Convert image to black and white (recommended for color images and photos).
		For example "tobw=false"
		Optional parameter. By default: false

	[zone]         - Specifies the region on the image for zonal OCR.
		The coordinates are in pixels relative to the top left corner, in the following format: top:left:height:width.
		This parameter can contain several zones separated with commas.
		For example "zone=0:0:100:100,50:50:50:50"
		Optional parameter.

	[outputformat] - Specifies the output file format.
		Up to two output formats can be specified, separated with commas.
		For example "outputformat=pdf,txt"
		Optional parameter. By default: doc

	[gettext]      - Specifies that extracted text will be returned.
		For example "gettext=true"
		Optional parameter. By default: false

	[description]  - Specifies your task description. Will be returned in response.
		Optional parameter.

	!!!!  To get a result you must specify "gettext" or "outputformat"  !!!!
"""

# Build your OCR:

# Extract text with English language by default
RequestUrl = "https://www.ocrwebservice.com/restservices/processDocument?gettext=true"
client = system.net.httpClient()

# Extract text with English and german language using zonal OCR
#RequestUrl = 'http://www.ocrwebservice.com/restservices/processDocument?language=english,german&zone=0:0:600:400,500:1000:150:400';

# Convert first 5 pages of multipage document into doc and txt
# RequestUrl = 'http://www.ocrwebservice.com/restservices/processDocument?language=english&pagerange=1-5&outputformat=doc,txt';

#Full path to uploaded document
FilePath = "C:\\test_image.jpg"
	
r = client.post(RequestUrl, file=FilePath, username=UserName, password=LicenseCode)

if r.statusCode == 401:
	#Please provide valid username and license code
	print("Unauthorized request")
	exit()

# Decode Output response
jobj = r.json

ocrError = str(jobj["ErrorMessage"])

if ocrError != '':
	#Error occurs during recognition
	print ("Recognition Error: " + ocrError)
	exit()


# Task description
print("Task Description:" + str(jobj["TaskDescription"]))

# Available pages 
print("Available Pages:" + str(jobj["AvailablePages"]))

# Processed pages 
print("Processed Pages:" + str(jobj["ProcessedPages"]))

# For zonal or multipage OCR: OCRText[z][p]    z - zone, p - pages

# Extracted text from first or single page
# print("Extracted Text:" + str(jobj["OCRText"][0][0]))

# Extracted text from second page (if multipage doc converted)
#print("Extracted Text:" + str(jobj["OCRText"][0][1]))

# Get extracted text from First zone for each page
# print("Zone 1 Page 1 Text:" + str(jobj["OCRText"][0][0]))
# print("Zone 1 Page 2 Text:" + str(jobj["OCRText"][0][1]))

# Get extracted text from Second zone for each page
#print("Zone 2 Page 1 Text:" + str(jobj["OCRText"][1][0]))
#print("Zone 2 Page 2 Text:" + str(jobj["OCRText"][1][1]))

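#Download output file (if outputformat was specified)
# Assumption: OutputFileUrl is empty when no outputformat was requested, so skip the download in that case
outputUrl = jobj["OutputFileUrl"]
if outputUrl:
	system.file.writeFile("outputDoc.doc", client.get(outputUrl).body)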
#file_response = requests.get(jobj["OutputFileUrl"], stream=True)
#with open("outputDoc.doc", 'wb') as output_file:
#   shutil.copyfileobj(file_response.raw, output_file)
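
The options described in the listing's docstring all end up as query-string parameters on the request URL. Rather than hand-editing the string, one possible approach is a small builder like the sketch below. `build_request_url` is a hypothetical helper, not part of the OCR service or Ignition, and keeping commas and colons unescaped is an assumption based on the example URLs in the listing.

```python
try:
    from urllib import quote        # Python 2 / Jython 2.7
except ImportError:
    from urllib.parse import quote  # Python 3

BASE_URL = "https://www.ocrwebservice.com/restservices/processDocument"


def build_request_url(base_url, options):
    """Append (name, value) pairs to base_url as a query string.

    Commas and colons are left as-is because the service's own examples
    use them literally (e.g. language lists and zone coordinates).
    """
    pairs = []
    for name, value in options:
        pairs.append("%s=%s" % (name, quote(str(value), safe=",:")))
    if not pairs:
        return base_url
    return base_url + "?" + "&".join(pairs)


# English and German with two OCR zones, as in the commented-out example.
print(build_request_url(BASE_URL, [
    ("language", "english,german"),
    ("zone", "0:0:600:400,500:1000:150:400"),
]))
```

Values are passed as strings ("true" rather than True) so they reach the service spelled the way the docstring shows them.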

Matrix_Engineering · July 22, 2020, 5:01pm

Great support from all, thank you!