The Problem about ISO-8859-1 #202

Closed
drunkenQCat opened this issue Jun 18, 2023 · 7 comments
@drunkenQCat

drunkenQCat commented Jun 18, 2023

The problem

When I wrote some Chinese metadata to a WAV file, the metadata that was written came out garbled. I tried to decipher the garbled text and found that it was encoded as ISO-8859-1 but decoded as UTF-8. Besides that, all the Chinese metadata written to my bext chunk turned into question marks, which is byte 0x3F. I am wondering why. Is there any way to avoid writing garbled text?

Environment

Tested in a codespace and on Windows, with .NET 7.

Details

image

@Zeugma440 Zeugma440 self-assigned this Jun 18, 2023
@Zeugma440
Owner

Hi and thanks for your feedback.

Could you please explain which field you're saving to? WAV has many chunks that follow different specifications.

@drunkenQCat
Author

drunkenQCat commented Jun 18, 2023

    public void WriteMetaData()
    {
        foreach (var item in LogList)
        {
            foreach (var bwf in item.bwfList)
            {
                Track tr = new(bwf.FullName);
                // iXML fields are written through AdditionalFields with an "ixml." prefix.
                WriteAdditional(tr, "ixml.SCENE", item.scn + "-" + item.sht);
                WriteAdditional(tr, "ixml.TAKE", item.tk.ToString());
                WriteAdditional(tr, "ixml.NOTE", item.scnNote + "," + item.shtNote);
                WriteAdditional(tr, "ixml.CIRCLED", (item.okTk == TkStatus.ok) ? "TRUE" : "FALSE");
                WriteAdditional(tr, "ixml.TAKE_TYPE", (item.okTk == TkStatus.bad) ? "NO_GOOD" : "DEFAULT");
                WriteAdditional(tr, "ixml.WILD_TRACK", item.tkNote.Contains("wild") ? "TRUE" : "FALSE");
                // Description ends up in the bext chunk, Title in LIST INFO.
                tr.Description = item.tkNote;
                tr.Title = item.shtNote;
                tr.Save();
            }
        }
    }

    void WriteAdditional(Track tr, string tag, string content)
    {
        // The indexer adds the key if it is missing and overwrites it otherwise.
        tr.AdditionalFields[tag] = content;
    }

The garbled text appeared in ixml.NOTE, and the question marks in Description and Title.

@drunkenQCat
Author

drunkenQCat commented Jun 19, 2023

I tried to modify the source so that it can write the UTF-8 information I need:
#203
That fixed it. The picture I showed in Details turned out to be a problem with Wave Agent. The UTF-8 information is displayed correctly in metadata management software. Here is an example in REAPER:

image

The title is still garbled in File Explorer because the default encoding of my system is GB2312.

image

That's the problem. I read CharsetDetector/UTF-unknown#143 and learned that it may be related. So is it caused by Settings.DefaultTextEncoding not covering the other fields?

@Zeugma440
Owner

Zeugma440 commented Jun 19, 2023

> I tried to decipher the garbled text and found that it was encoded as ISO-8859-1 but decoded as UTF-8

The places where you found garbled text are read and written using ISO-8859-1, which does not support CJK characters.

I did that because of what the specifications say:

  • BEXT (used for the description field): the specification says string fields should be written using ASCII. However, ASCII being a subset of UTF-8, we can switch to UTF-8 without any issue 👍
  • LIST INFO (used for the title field): the specification says string fields should be written using ASCII. However, ASCII being a subset of UTF-8, we can switch to UTF-8 without any issue 👍
  • The other fields you've written use the iXML structure, which is already UTF-8-encoded 😄
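To illustrate the "ASCII is a subset of UTF-8" argument, here is a minimal sketch that compares the bytes produced by the three encodings discussed here. The class and method names are made up for this example and are not part of the library; `Encoding.Latin1` assumes .NET 5 or later.

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical helper, for illustration only.
public static class EncodingSubsetDemo
{
    // Returns true when the text encodes to exactly the same bytes
    // in ASCII, ISO-8859-1 and UTF-8.
    public static bool SameBytesEverywhere(string text)
    {
        byte[] ascii = Encoding.ASCII.GetBytes(text);
        byte[] latin1 = Encoding.Latin1.GetBytes(text);
        byte[] utf8 = Encoding.UTF8.GetBytes(text);
        return ascii.SequenceEqual(latin1) && ascii.SequenceEqual(utf8);
    }
}
```

Pure ASCII text such as `"Scene 1"` gives the same bytes in all three encodings, so a reader that expects ASCII is not affected by the switch. Chinese text such as `"场景"` does not, which is why the choice of encoding only matters for non-ASCII metadata.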

> The title is still garbled in File Explorer because the default encoding of my system is GB2312.

Precisely. Western versions of Windows use Windows-1252, a close relative of ISO-8859-1, as their default encoding. They assume WAV metadata is encoded that way, which works because WAV metadata is usually encoded using ASCII, which is a subset of both.

Your version of Windows might be expecting GB2312, which is not compatible with UTF-8, hence the garbled characters displayed in Explorer.
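A rough sketch of that mismatch, assuming the viewer simply decodes the stored UTF-8 bytes with its own default encoding. The names below are hypothetical. Note that on .NET Core and .NET 5+, `Encoding.GetEncoding("GB2312")` only works after the code pages provider has been registered.

```csharp
using System;
using System.Text;

// Hypothetical helper, for illustration only.
public static class MojibakeDemo
{
    // Encodes the text as UTF-8, then decodes those bytes with another
    // encoding, which is what a viewer does when it assumes the wrong charset.
    public static string Misread(string text, Encoding assumed)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
        return assumed.GetString(utf8Bytes);
    }

    public static Encoding GetGb2312()
    {
        // Code-page encodings such as GB2312 are not available by default
        // on .NET Core / .NET 5+; registering the provider enables them.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding("GB2312");
    }
}
```

With this, `MojibakeDemo.Misread("场景", MojibakeDemo.GetGb2312())` no longer equals `"场景"`, while ASCII-only text survives unchanged.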

=> Another way of fixing that issue and making Windows happy would be to use Settings.DefaultTextEncoding instead of UTF-8 in the library code, and set Settings.DefaultTextEncoding to System.Text.Encoding.GetEncoding("GB2312") in your application code.
That would fix the issue on your Windows machine, but it would completely deviate from the BEXT and LIST INFO specifications and make the text you save unreadable on a Western computer. That's why I'd rather hardcode UTF-8, as suggested above.

Do you agree with me on that one?

> I read CharsetDetector/UTF-unknown#143 and learned that it may be related.

This has nothing to do with WAV files. UTF-unknown is only used by the library to detect the encoding of CUE sheets.

@drunkenQCat
Author

drunkenQCat commented Jun 20, 2023

Thanks for your detailed explanation; it answered a lot of my questions. I have to apologize for my ambiguous description. I totally agree with your answer. The garbled text in Windows Explorer doesn't actually matter in sound production, and I have felt the benefit of UTF-8, especially when I cooperate with others who use macOS.

Besides, I finally found that the most important bug:

> all the Chinese metadata written to my bext chunk turned into question marks, which is byte 0x3F

is actually caused by

WavHelper.writeFixedTextValue(description, 256, w);

which uses Latin1Encoding to encode the UTF-8 text. I inferred that Latin1Encoding.GetBytes(utf8Text) may return 3F (question mark) for characters that are out of range.


I verified the problem:
image
It is actually caused by GetBytes.
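The behaviour can be reproduced outside the library with a small sketch like the following. The class and method names are hypothetical and `Encoding.Latin1` assumes .NET 5 or later. The strict variant is only there to show how the silent substitution could be turned into an error. It is not what the library does.

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical helper, for illustration only.
public static class Latin1FallbackDemo
{
    // Encodes with ISO-8859-1 using the encoding's default fallback,
    // which substitutes '?' (0x3F) for Chinese characters.
    public static byte[] EncodeLossy(string text) => Encoding.Latin1.GetBytes(text);

    // Encodes with ISO-8859-1 but throws EncoderFallbackException
    // instead of silently substituting a character.
    public static byte[] EncodeStrict(string text)
    {
        Encoding strict = Encoding.GetEncoding(
            "ISO-8859-1",
            EncoderFallback.ExceptionFallback,
            DecoderFallback.ExceptionFallback);
        return strict.GetBytes(text);
    }
}
```

`Latin1FallbackDemo.EncodeLossy("场景")` yields the two bytes 0x3F 0x3F, matching the question marks seen in the bext chunk, whereas `EncodeStrict("场景")` throws.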

@Zeugma440 Zeugma440 added the bug label Jun 20, 2023
@Zeugma440
Owner

Perfect, thanks for confirming 👍

I'm gonna publish a fixed version in the following days. Stay tuned~

@Zeugma440
Owner

Fix is available on today's v4.34
