Two to three years ago I uploaded my files from my local HDD to a Backblaze B2 bucket.
Today I need to download those files from my B2 bucket back to my local HDD.
When I download the files using the b2 CLI tool, the download fails with: UnicodeEncodeError: "charmap" can't encode character "\u0c7f" in position 132: character maps to <undefined>
This error occurs for several of my files, each with a different unencodable character (not only \u0c7f).
In the comments, I was speaking to [ppolewicz](https://github.com/ppolewicz), and he correctly pointed out that the error occurs because the b2 Python CLI tool cannot encode my file names to cp1252.
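For reference, the same class of error can be reproduced in plain Python without the b2 CLI. This is only a minimal sketch: the file name `report_\u0c7f.txt` and the helper `try_encode_cp1252` are made up for illustration and are not part of the b2 tool.

```python
def try_encode_cp1252(name: str):
    """Return the error text if `name` cannot be encoded to cp1252, else None."""
    try:
        name.encode("cp1252")  # strict mode: raises on any unmappable character
        return None
    except UnicodeEncodeError as exc:
        return str(exc)


# Hypothetical file name containing U+0C7F, the character from the error above.
message = try_encode_cp1252("report_\u0c7f.txt")
print(message)
```

A name made only of characters that cp1252 can represent returns `None` from the same helper.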
Screenshots of the errors from the B2 CLI tool
Here are some screenshots of the errors using the b2 CLI tool
What I have tried
I thought that if I could identify and generate a list of the filenames which could not be encoded to cp1252, I could rename each filename to an acceptable filename which the b2 CLI tool could handle.
I wrote a C# console app (I can share the code if needed) that uses B2.NET, a C# .NET library that queries the B2 API.
My console app performs the following steps:
1.) Downloads all file names (and folders)
2.) Encodes the file names from UTF-16 to cp1252. The B2 API returns the file names as UTF-8 or UTF-16; I believe it's UTF-16.
3.) If a file name cannot be encoded to cp1252, an exception is thrown, the error is logged, and the file is marked as error within my console app.
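Steps 2 and 3 could be sketched in Python roughly as follows. This is an illustration under assumptions, not the console app itself: the function names and the sample file names are hypothetical, and the list of names would in practice come from the B2 API.

```python
def _encodable(ch: str, encoding: str) -> bool:
    """True if a single character can be encoded strictly in `encoding`."""
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def find_unencodable(names, encoding="cp1252"):
    """Return (name, offending characters) for each name that fails strict encoding."""
    problems = []
    for name in names:
        bad = sorted({ch for ch in name if not _encodable(ch, encoding)})
        if bad:
            problems.append((name, bad))
    return problems


# Hypothetical sample; real names would be listed from the bucket.
sample_names = ["docs/café.txt", "photos/\u0c7f_scan.jpg", "plain.txt"]
for name, bad in find_unencodable(sample_names):
    print(ascii(name), [f"U+{ord(ch):04X}" for ch in bad])
```

Printing with `ascii()` keeps the report itself safe to write to a cp1252 console.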
Expected Result:
4.) I expected the file names that the b2 CLI tool cannot encode to cp1252 to also throw an error within my own console app.
The problematic files would then be marked as error within my console app, so that I could identify them and rename them using the B2 API.
Actual Result:
4.) No encoding errors occurred; all file names were mapped from UTF-8 or UTF-16 to cp1252 within my console app.
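A possible explanation (an assumption, since the console app's source is not shown here): in .NET, `Encoding.GetEncoding(1252)` uses a replacement fallback by default, so unmappable characters become `?` instead of raising an exception, whereas Python encodes in strict mode by default. The Python sketch below shows the difference between the two modes; the file name and the helper `encodes_losslessly` are hypothetical.

```python
def encodes_losslessly(name: str, encoding: str = "cp1252") -> bool:
    """True only if `name` survives an encode/decode round trip unchanged."""
    encoded = name.encode(encoding, errors="replace")  # lenient: never raises
    return encoded.decode(encoding) == name


name = "invoice_\u0c7f.pdf"  # hypothetical file name

# Lenient mode silently substitutes '?' for the unmappable character.
lenient = name.encode("cp1252", errors="replace")

# Strict mode (Python's default) raises instead.
try:
    name.encode("cp1252")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

print(lenient, strict_failed, encodes_losslessly(name))
```

A round-trip comparison like `encodes_losslessly` detects the problem files even when the encoder is configured not to throw.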
Conclusion
At this point, I am somewhat at a loss and looking for direction on how I or we should fix this. I can provide anything you need to help reach a solution (I would love to help too; I also wrote a console app to help fix this).
I can share anything that you guys need including:
My console app source code
List of all my file names
etc, whatever you guys need to help diagnose this.
Thanks!
🤔 can you write a snippet (bash, c# or b2-sdk-python maybe) which would upload a single empty file with a problematic filename? It would make things much easier to debug. Alternatively if one of the files in question is not a secret, maybe you could create a key limited to a prefix (so in reality to just one object) so that I can inspect it on your bucket?
This would be way easier to debug if I could reproduce it locally
@ppolewicz sure thing, let me give it a go and see if I can narrow it down as you suggested.
I will let you know if I have any issues narrowing down the problematic file names with the method you suggested, and will post my findings here.
PS: If I need to create a script, it'll be C# (I like brackets)
Hello!
I discovered this issue and originally thought it was a 'Connection Pool' issue which I commented here.
(There are a few comments in that link that pertain to this issue, #778.)