Edit OCR Text in Metadata Module

Is there a way to edit the raw text that is available in the Metadata module? I know you can copy and paste the text, but it would be nice to be able to make corrections so that any text files or searchable PDFs are more accurate.

Dear j_morris,

Is something like this what you are looking for?


On GitHub: http://thorstenv.github.io/PoCoTo/

I don’t have a “have to have it” situation but something like that would be nice. I like the ability in the CIS tool to do batch corrections, but even having the ability to manually correct text would be great. Consider this an enhancement idea.

To follow up on this – are .txt files editable in the “Edit OCR Results” window? I can’t find clarification on that in the docs. I am able to see the results of the text files under the “Show OCR results for this page” window, but not in the “Edit OCR Results” window – and I’m therefore unable to do any editing.

Hey all,

The full-text editing area that is part of the metadata editor actually edits ALTO files, not plaintext files. Goobi workflow and Goobi viewer do not really need or use plaintext, because plaintext carries no coordinate information.

In the metadata editor it looks like this:

There you can see how I edited the original word with its coordinates and added more text as a “correction”.

Converting the internal ALTO files to plaintext is then easy. This is what the parallel view in the metadata editor shows automatically (if no plaintext files are available in the file system, as those would be used instead):

These (adapted) ALTO files are then the ones used for PDF generation, where the full text is embedded as well.
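
For anyone who wants to reproduce the ALTO-to-plaintext step outside of Goobi, here is a minimal Python sketch. It is not how Goobi does the conversion internally; it only assumes the usual ALTO layout in which each `TextLine` element contains `String` elements with a `CONTENT` attribute. The function name `alto_to_plaintext` and the sample document are made up for illustration, and joining words with a single space is a simplification (hyphenation and spacing elements are ignored).

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip the XML namespace so the code works across ALTO schema versions."""
    return tag.rsplit("}", 1)[-1]


def alto_to_plaintext(alto_xml: str) -> str:
    """Return one line of text per ALTO TextLine (hypothetical helper)."""
    root = ET.fromstring(alto_xml)
    lines = []
    for elem in root.iter():
        if local_name(elem.tag) != "TextLine":
            continue
        # Collect the recognized words; coordinates (HPOS, VPOS, ...) are dropped.
        words = [
            child.get("CONTENT", "")
            for child in elem
            if local_name(child.tag) == "String"
        ]
        # Assumption: a single space between words is good enough.
        lines.append(" ".join(words))
    return "\n".join(lines)


# Made-up sample document, only for demonstration.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page><PrintSpace><TextBlock>
    <TextLine>
      <String CONTENT="Edit" HPOS="10" VPOS="10" WIDTH="40" HEIGHT="12"/>
      <SP/>
      <String CONTENT="OCR" HPOS="55" VPOS="10" WIDTH="35" HEIGHT="12"/>
    </TextLine>
    <TextLine>
      <String CONTENT="text" HPOS="10" VPOS="30" WIDTH="38" HEIGHT="12"/>
    </TextLine>
  </TextBlock></PrintSpace></Page></Layout>
</alto>"""

if __name__ == "__main__":
    print(alto_to_plaintext(SAMPLE))
```

The sketch also shows why plaintext alone is of limited use here: the position attributes on each `String` are exactly what gets lost in the conversion.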

Does this help you and answer your questions?

Best,

Steffen

Got it, that makes sense. The ALTO files for some of our older materials were simply not usable, even down to line detection, so we were hoping to jump-start the process by transcribing into raw text files. We can work around this for a while, though.

In the meantime, this plugin could help you with simple text:

Best,

Steffen