Cannot convert from PAGE to ALTO #59

Open
stweil opened this issue Sep 11, 2024 · 2 comments

@stweil
Contributor

stweil commented Sep 11, 2024

I created PAGE XML with the following command: ocrd-tesserocr-recognize -I DEFAULT -O PAGE_GERMAN_PRINT -P segmentation_level region -P textequiv_level word -P find_tables true -P model german_print

Then I tried to transform the PAGE XML files to ALTO with ocrd-fileformat-transform. It fails with this error: "The PAGE-XML to transform contains neither Border nor PrintSpace"

@kba
Member

kba commented Sep 12, 2024

Try with

-P script-args '--no-check-border'

(cf https://github.com/kba/page-to-alto/blob/master/ocrd_page_to_alto/convert.py#L191-L211)

We should switch the default, you're not the first to complain about that behavior :(

@stweil
Contributor Author

stweil commented Sep 12, 2024

Thanks a lot! It works with this additional parameter. And yes, it should be the default, or at least be mentioned in the OCR-D workflow documentation. I therefore suggest keeping this issue open until that is done.
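
For anyone landing here with the same error, the workaround can be sketched as a complete call. This is only a sketch under a few assumptions: the input file group is the PAGE_GERMAN_PRINT group from the recognition step above, the output group name ALTO_GERMAN_PRINT is made up for illustration, and the from-to value "page alto" is assumed to be the wanted conversion (check the processor's --help for the values your version accepts).

```shell
#!/bin/sh
# Input group comes from the ocrd-tesserocr-recognize call in this issue.
IN_GRP=PAGE_GERMAN_PRINT
# Hypothetical output group name; pick whatever fits your workspace.
OUT_GRP=ALTO_GERMAN_PRINT

# Wrap the conversion so it can be reused; run it from the workspace directory.
page_to_alto() {
    ocrd-fileformat-transform -I "$IN_GRP" -O "$OUT_GRP" \
        -P from-to "page alto" \
        -P script-args "--no-check-border"
}

# Only attempt the conversion where the processor is actually installed.
if command -v ocrd-fileformat-transform >/dev/null 2>&1; then
    page_to_alto
else
    echo "ocrd-fileformat-transform not found; nothing converted" >&2
fi
```

The script-args value is passed through to the underlying converter, so --no-check-border only disables the Border/PrintSpace check that produced the error; the other parameters are unchanged.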
