Non ascii characters in data. #25

Open
neslekkim opened this issue Sep 1, 2017 · 1 comment

Comments

@neslekkim

The Norwegian characters æ, ø, å and so on are not handled properly.
For some reason, it seems that one can use SpssOption when writing files, but not when reading?
It would be nice if we could change the encoding when reading, or even detect it.

The version used to save the file I'm trying to read is: "IBM SPSS STATISTICS 64-bit MS Windows 23.0.0.2"
I don't have access to SPSS at all; I only have some files to read.
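On the question of detecting the encoding: a minimal sketch of one common fallback approach, written in Python only for illustration. It is not part of this library, and the function name `decode_with_fallback` and its parameters are hypothetical. The idea is to decode strictly with the declared encoding first and fall back to a single-byte encoding only if that fails.

```python
from typing import Tuple


def decode_with_fallback(
    raw: bytes, declared: str = "utf-8", fallback: str = "latin-1"
) -> Tuple[str, str]:
    """Decode raw bytes, returning (text, encoding_actually_used).

    Hypothetical helper: tries the declared encoding strictly and only
    falls back when the bytes are not valid in that encoding.
    """
    try:
        # Strict decoding raises instead of silently inserting U+FFFD.
        return raw.decode(declared), declared
    except UnicodeDecodeError:
        # latin-1 maps every byte 0-255 to a code point, so it never fails.
        return raw.decode(fallback), fallback


# A value stored as ISO-8859-1 bytes even though UTF-8 is declared.
text, used = decode_with_fallback("blåbær".encode("latin-1"))
print(text, used)
```

This is a heuristic, not real detection: Latin-1 accepts any byte sequence, so the fallback always "succeeds" even if the data is actually in some other single-byte code page.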

@neslekkim
Author

I have been looking around in the code, trying to find out whether I could get the encoding to work.
I found that there is already a mechanism for reading the encoding, at least in the GitHub version; I didn't check the code for the NuGet version I was testing first.
The CharacterEncodingRecord.FillInfo method reads "UTF-8" from the file and sets the encoding to that, but in my case the file's data is not valid in that encoding, so I get a diamond symbol, code 65533 (U+FFFD, the replacement character), which indicates a UTF-8 decoding error.
If I try ASCII instead, I get question marks.
But if I use ISO-8859-1/Latin-1, I get the correct values. So I wonder: why does the file have a UTF-8 indicator when the data is in some other encoding?
The character code for the character I was struggling with in this case is 248 (0xF8), ø, which is outside the ASCII range.
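The symptoms above can be reproduced outside the library. This is a small Python sketch, not the library's own code, showing what happens to the single byte 248 under each decoding:

```python
# The byte stored in the file for "ø" when the data is ISO-8859-1.
raw = bytes([248])  # 0xF8

# 0xF8 is not a valid UTF-8 start byte, so a lenient decoder substitutes
# U+FFFD, the replacement character usually drawn as a diamond.
as_utf8 = raw.decode("utf-8", errors="replace")
print(ord(as_utf8))  # 65533

# ISO-8859-1 maps byte 248 directly to code point 248.
as_latin1 = raw.decode("latin-1")
print(as_latin1)  # ø

# In real UTF-8 the same character takes two bytes.
print("ø".encode("utf-8"))  # b'\xc3\xb8'
```

Note that Python's ASCII decoder with `errors="replace"` also yields U+FFFD rather than a question mark; the "?" seen above is how .NET's ASCII decoding reports bytes above 127.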

Is the format of SPSS files described anywhere?
