OCR issue: skipping text? #18

Open
hzadeh17 opened this issue May 7, 2020 · 1 comment

hzadeh17 commented May 7, 2020

I was looking further into Bcc's and found that there are indeed quite a few emails in the drive (particularly in deqs 1, 2, and 4) where Bcc: is included in the email header. In many instances this is not reflected in the OCR text. Perhaps more importantly, in the email output format where Bcc's do show up, entire email bodies are skipped by OCR.

For example: this text file has the headers but not the bodies of the emails included in deq01_Part316 in the drive. The same is true of others like it, such as deq01_Part385 and this file.

(Note that as far as Bcc: goes, it does sometimes show up, as in this text file.)

Maybe this is not something we can fix, but it is still probably good to know. I wonder if there is a way to check how often this is happening? The body text that is being skipped is light blue, so maybe that is why, but that doesn't explain why some of the header text that was black (i.e. Bcc:) is also being skipped.
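One rough way to check how often this happens might be to scan the OCR output and flag files whose non-blank lines are almost all header lines. This is only a sketch under assumptions: it assumes the OCR output is a folder of `.txt` files, the header field list and the 0.8 threshold are guesses, and `header_fraction` and `flag_header_only` are hypothetical helper names, not anything in the repo.

```python
import re
from pathlib import Path

# Header field names typical of printed emails; this list is an assumption.
HEADER_RE = re.compile(r"^\s*(From|To|Cc|Bcc|Sent|Date|Subject)\s*:", re.IGNORECASE)


def header_fraction(text):
    """Return the fraction of non-blank lines that look like email headers."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return 0.0
    headers = sum(1 for ln in lines if HEADER_RE.match(ln))
    return headers / len(lines)


def flag_header_only(ocr_dir, threshold=0.8):
    """List names of OCR text files whose non-blank lines are mostly headers."""
    flagged = []
    for path in sorted(Path(ocr_dir).rglob("*.txt")):
        # errors="replace" keeps the scan going if a file has bad bytes.
        text = path.read_text(encoding="utf-8", errors="replace")
        if header_fraction(text) >= threshold:
            flagged.append(path.name)
    return flagged
```

Files flagged this way would still need a manual look against the source PDFs, since a short forwarded email can legitimately be mostly headers.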

@hzadeh17

Another instance of skipped text: it appears that deq09, which from the drive seems to have only a few emails and a bunch of attachment pages, did not OCR correctly. I think that of the 30 files, only one has any text (this one), and it's not much.
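To confirm counts like this without opening each file, something like the following could tally how many text files in a folder contain a meaningful amount of text. It is a sketch, assuming each deq has its own folder of `.txt` OCR outputs. The 50-character cutoff and the names `count_chars` and `summarize_ocr_dir` are made up for illustration.

```python
from pathlib import Path


def count_chars(text):
    """Count non-whitespace characters in a string."""
    return sum(1 for ch in text if not ch.isspace())


def summarize_ocr_dir(ocr_dir, min_chars=50):
    """Return (total_files, files_with_text) for the .txt files in ocr_dir.

    A file counts as having text when it holds at least min_chars
    non-whitespace characters; the cutoff is an arbitrary assumption.
    """
    total = 0
    with_text = 0
    for path in Path(ocr_dir).glob("*.txt"):
        total += 1
        text = path.read_text(encoding="utf-8", errors="replace")
        if count_chars(text) >= min_chars:
            with_text += 1
    return total, with_text
```

Running this per deq folder would give a quick list of batches where OCR produced little or nothing.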
