Compare commits

...

2 Commits

Author SHA1 Message Date
Eugen Ciur 7ddb02dcb5 PEP8 formatting 2021-01-18 07:47:08 +01:00
Eugen Ciur eb98ef1329 Add comments about Step class limitations 2021-01-18 07:42:08 +01:00
1 changed file with 39 additions and 1 deletion


@@ -1,5 +1,43 @@
class Step:
    # Q: What is ``Step`` and why was it a bad decision to introduce it?
    #
    # A: The ``Step`` class is closely related to zooming in/zooming out
    # of a specific page of the document in the frontend (JavaScript code).
    #
    # When the user opens a document in the document viewer, he/she actually
    # sees an image with text over it (a text overlay). The text overlay is
    # created from hOCR data. A very important point here is that the hOCR
    # data corresponds to the (extracted, JPEG) image of the page with the
    # VERY SAME width/height. Again, the hOCR file and the respective image
    # file of the page MUST HAVE THE SAME WIDTH AND HEIGHT.
    #
    # Each step is meant to be a specific zoom value of the page. For
    # example, step 2 corresponds to LIST[2]% = 75% of the page's initial
    # logical width of WIDTH_100p = 1240, i.e. 930.
    # When the user zooms in/zooms out, a new hOCR file corresponding to
    # that zoom step is downloaded. As you may guess, the user can zoom only
    # to 125%, 100%, 75% and 50%. The value of 10% corresponds to the
    # thumbnail of the document and does not count as a 'real' step.
    #
    # Instead of doing this step thingy, it would have been better to drop
    # the entire step concept. A much better solution for zoom in/zoom out
    # would have been to download one SVG file for each page (instead of
    # hOCR), where the SVG file of the page contains the embedded image
    # (base64-encoded JPEG; yes, the SVG format allows embedding binary
    # images!) and a correctly mapped text overlay (built from the hOCR
    # file). The user could then zoom in/zoom out using SVG transformations
    # in the frontend!
    #
    # The good things about the SVG solution are:
    #
    # * 4x less OCR would be required (one hOCR per page instead of one
    #   per step, not counting the thumbnail/10% step)
    # * it would simplify the frontend code, as the SVG (= hOCR + JPEG)
    #   would be generated on the server side
    # * it would eliminate the concept of Step entirely
    #   (there would be only one SVG file per page)
    # * it would increase frontend and backend performance, as only one SVG
    #   file would be sent back and forth (from backend to frontend)
    #
    # Width of a document when displayed at 100%.
    WIDTH_100p = 1240
    PERCENT = 100
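The comment above ties each zoom step to a page width derived from WIDTH_100p, and requires the hOCR overlay to match the image size exactly. The sketch below illustrates that arithmetic. It is a hypothetical illustration, not code from this module: the full contents of LIST are an assumption (the comment only implies LIST[2] == 75), and the helpers step_width and scale_bbox are invented names. It also shows why an overlay is tied to one image size. hOCR bounding boxes are absolute pixel coordinates, so reusing them on an image of a different width would require rescaling them.

```python
# Hypothetical sketch: LIST, step_width and scale_bbox are assumptions,
# not part of the original Step class.
WIDTH_100p = 1240  # logical page width at 100% zoom (value from the source)
PERCENT = 100

# Assumed zoom list; the source comment only implies LIST[2] == 75.
# Index 0 (10%) is the thumbnail, which is not a 'real' step.
LIST = [10, 50, 75, 100, 125]


def step_width(step: int) -> int:
    """Width of the page image (and its hOCR) for a given zoom step."""
    return int(WIDTH_100p * LIST[step] / PERCENT)


def scale_bbox(bbox, from_width: int, to_width: int):
    """Rescale an hOCR bbox (x0, y0, x1, y1) from one page width to another."""
    factor = to_width / from_width
    return tuple(round(coord * factor) for coord in bbox)


if __name__ == "__main__":
    for step, percent in enumerate(LIST):
        print(f"step {step}: {percent}% -> width {step_width(step)}")
    # A word box taken from the 100% image, mapped onto the 75% image.
    print(scale_bbox((124, 200, 372, 240), step_width(3), step_width(2)))
```

With the assumed list, step 2 gives a width of 930 and the sample box maps to (93, 150, 279, 180).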
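The comment proposes replacing per-step hOCR files with one SVG per page that embeds the JPEG and carries the text overlay. The following is a minimal sketch of such a server-side generator, assuming the words have already been parsed from the page's hOCR into (text, bbox) pairs (parsing is not shown). The function name page_to_svg and its signature are hypothetical. Each word is drawn as invisible text (fill-opacity 0) stretched over its OCR box with textLength, so it stays selectable and searchable. The plain href attribute on image is SVG 2 syntax, and older renderers may need xlink:href instead.

```python
import base64
from xml.sax.saxutils import escape


def page_to_svg(jpeg_bytes: bytes, width: int, height: int, words) -> str:
    """Hypothetical helper: one SVG per page = embedded JPEG + text layer.

    ``words`` is an iterable of (text, (x0, y0, x1, y1)) in the image's
    pixel coordinates, e.g. taken from the page's hOCR bbox values.
    """
    data = base64.b64encode(jpeg_bytes).decode("ascii")
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'viewBox="0 0 {width} {height}" width="{width}" height="{height}">',
        # The page image, embedded as a base64 data URI.
        f'<image x="0" y="0" width="{width}" height="{height}" '
        f'href="data:image/jpeg;base64,{data}"/>',
    ]
    for text, (x0, y0, x1, y1) in words:
        # Invisible but selectable text; the baseline sits at the box bottom
        # and textLength stretches the word to the OCR'd box width.
        parts.append(
            f'<text x="{x0}" y="{y1}" font-size="{y1 - y0}" '
            f'textLength="{x1 - x0}" lengthAdjust="spacingAndGlyphs" '
            f'fill-opacity="0">{escape(text)}</text>'
        )
    parts.append("</svg>")
    return "\n".join(parts)
```

Because the image and the overlay share one viewBox, zooming in the browser only changes the rendered size (for example the width/height attributes or a CSS transform), and the text stays aligned with the image at every zoom level without downloading another file.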