UnicodeDecodeError: Can't decode from latin1 #8

Open
pabloab opened this issue Dec 26, 2023 · 0 comments

I was getting

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd1 in position 24: invalid continuation byte

0xd1 is "Ñ" in latin1, which is consistent with the output of dbfview -t -b myfile.dbf > myfile.txt; dbfile myfile.txt.
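
To illustrate why the byte is rejected, here is a minimal Python sketch. The sample value `PE\xd1A` is hypothetical (it is not taken from myfile.dbf), and `try_decode` is a helper made up for this example; it only shows how a lone 0xd1 byte behaves under each codec.

```python
# Hypothetical field value containing the byte 0xd1 (not read from the real file).
raw = b"PE\xd1A"


def try_decode(data, encoding):
    """Return the decoded text, or a short error description if decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"error: {exc.reason}"


# In UTF-8, 0xd1 starts a two-byte sequence, so the following "A" is not a valid continuation.
utf8_result = try_decode(raw, "utf-8")
# In latin1 every byte maps to exactly one character, so 0xd1 decodes to "Ñ".
latin1_result = try_decode(raw, "latin1")

print(utf8_result)
print(latin1_result)
```

The UTF-8 attempt fails with the same "invalid continuation byte" reason as the traceback above, while the latin1 attempt succeeds.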

I tried running

dbf2csv -ie 'latin1'  'myfile.dbf'

which I believe should fix the issue, but it doesn't.

git clone https://github.com/akadan47/dbf2csv.git
cd dbf2csv
pip install -r requirements.txt
python setup.py install

dbf2csv --version
# dbf2csv 1.3

Workaround

I ended up converting the file with LibreOffice:

sudo apt install libreoffice-base
libreoffice --headless --convert-to csv myfile.dbf  # Will generate myfile.csv
iconv -f ISO-8859-15 -t UTF-8 myfile.csv > myfile-utf8.csv

I validated the generated CSV with frictionless.

For multiple files:

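# Restrict the search to .dbf files so other files are not passed to LibreOffice
find . -type f -iname '*.dbf' -execdir libreoffice --headless --convert-to csv "{}" +
# Strip only the .csv suffix so names containing extra dots are preserved
for FILE in *.csv; do iconv -f ISO-8859-15 -t UTF-8 "$FILE" -o "${FILE%.csv}-utf8.csv"; done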
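
If iconv is not available, the re-encoding step could probably be done in Python instead. This is only a sketch under assumptions: the function names `convert_to_utf8` and `convert_all` are made up for this example, and the source encoding is assumed to be ISO-8859-15, as in the iconv call above.

```python
from pathlib import Path


def convert_to_utf8(src, source_encoding="iso8859_15"):
    """Write a UTF-8 copy of src next to it, named <stem>-utf8.csv."""
    dst = src.with_name(src.stem + "-utf8.csv")
    # Work on raw bytes so line endings are left exactly as they were.
    text = src.read_bytes().decode(source_encoding)
    dst.write_bytes(text.encode("utf-8"))
    return dst


def convert_all(directory):
    """Convert every *.csv in directory, skipping files that are already -utf8 copies."""
    sources = [
        p for p in sorted(Path(directory).glob("*.csv"))
        if not p.stem.endswith("-utf8")
    ]
    return [convert_to_utf8(p) for p in sources]
```

Calling `convert_all(".")` would then mirror the loop above for the current directory. Skipping names that already end in -utf8 keeps a second run from re-encoding the converted files.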