Backport an old TIL about XML/JSON
alexwlchan committed May 26, 2024
1 parent dda32c4 commit 74e57fe
Showing 1 changed file with 64 additions and 0 deletions.
64 changes: 64 additions & 0 deletions src/_til/2023/2023-11-30-prefer-xml-in-the-wikimedia-apis.md
@@ -0,0 +1,64 @@
---
layout: til
date: 2023-11-30 00:27:25 +0100
title: Why I prefer XML to JSON in the Wikimedia Commons APIs
summary: |
The XML-to-JSON conversion leads to some inconsistent behaviour, especially in corner cases of the API.
tags:
- wikimedia-commons
---
The Wikimedia APIs I've used can return results in three formats: HTML, JSON, and XML.
Initially I was using the JSON APIs because JSON is easy and familiar, and there are built-in methods for it in my HTTP client libraries.

It seems like at least some of the APIs are doing an automated XML-to-JSON translation, which has inconsistent results in certain corner cases.
This is why I'm gradually leaning towards the XML APIs, which seem to be more consistent in how they behave.

This is a useful example of the risks of automated XML-to-JSON conversion in general.

## The `languagesearch` API

First, let's use the [Languagesearch API](https://www.mediawiki.org/wiki/API:Languagesearch) to find a list of languages which match the query "english":

<div class="language-console highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">$</span><span class="w"> </span>curl <span class="s1">'https://en.wikipedia.org/w/api.php?action=languagesearch&amp;search=english&amp;format=json'</span> | jq <span class="nb">.</span>
<span class="go">{
  "languagesearch": {
    "en": "english",
    "en-us": "english sa america",
    "en-au": "english sa australia"
  }
}

</span><span class="gp">$</span><span class="w"> </span>curl <span class="s1">'https://en.wikipedia.org/w/api.php?action=languagesearch&amp;search=english&amp;format=xml'</span> | xmllint <span class="nt">--format</span> -
<span class="go">&lt;?xml version="1.0"?&gt;</span><span class="w">
</span><span class="go">&lt;api&gt;</span><span class="w">
</span><span class="go"> &lt;
languagesearch
en="english"
en-us="english sa america"
en-au="english sa australia"
</span><span class="go"> /&gt;</span><span class="w">
</span><span class="go">&lt;/api&gt;</span><span class="w">
</span></code></pre></div></div>

The JSON contains an object which maps each language ID to its name; the XML puts the same data on a single `languagesearch` element, with language IDs as attribute names and language names as attribute values.
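
To make the two shapes concrete, here is a minimal sketch of reading the same mapping from either format. It assumes Python with only the standard library; the function names are made up for illustration, and the sample strings are copied from the output above rather than fetched from the API.

```python
import json
import xml.etree.ElementTree as ET

# Sample responses copied from the console output above (not fetched live).
JSON_RESPONSE = """{
  "languagesearch": {
    "en": "english",
    "en-us": "english sa america",
    "en-au": "english sa australia"
  }
}"""

XML_RESPONSE = """<?xml version="1.0"?>
<api>
  <languagesearch
    en="english"
    en-us="english sa america"
    en-au="english sa australia"
  />
</api>"""


def languages_from_json(text: str) -> dict[str, str]:
    """Read the ID-to-name mapping from the JSON object (hypothetical helper)."""
    return json.loads(text)["languagesearch"]


def languages_from_xml(text: str) -> dict[str, str]:
    """Read the ID-to-name mapping from the attributes of <languagesearch>."""
    root = ET.fromstring(text)  # root is the <api> element
    return dict(root.find("languagesearch").attrib)


# For a query with matches, both formats yield the same dictionary.
assert languages_from_json(JSON_RESPONSE) == languages_from_xml(XML_RESPONSE)
```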

Now let's try again, with a query that won't return any results:

<div class="language-console highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">$</span><span class="w"> </span>curl <span class="s1">'https://en.wikipedia.org/w/api.php?action=languagesearch&amp;search=doesnotexist&amp;format=json'</span> | jq <span class="nb">.</span>
<span class="go">{
  "languagesearch": []
}

</span><span class="gp">$</span><span class="w"> </span>curl <span class="s1">'https://en.wikipedia.org/w/api.php?action=languagesearch&amp;search=doesnotexist&amp;format=xml'</span> | xmllint <span class="nt">--format</span> -
<span class="go">&lt;?xml version="1.0"?&gt;</span><span class="w">
</span><span class="go">&lt;api&gt;</span><span class="w">
</span><span class="go"> &lt;languagesearch/&gt;</span><span class="w">
</span><span class="go">&lt;/api&gt;</span><span class="w">
</span></code></pre></div></div>

Notice that the structure of the JSON response has changed slightly -- where previously it returned an object, now it returns an array.
Meanwhile the XML response keeps the same structure as before: the `languagesearch` element is still there, just without any attributes.

This broke my JSON-using code, because I was assuming the `languagesearch` value would always be a mapping, and that worked until I tested the empty case.
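
One way to guard against this is sketched below. It is not the code that broke: it assumes Python with only the standard library, and `safe_languages_from_json` and `languages_from_xml` are hypothetical names. The JSON helper needs a special case for the empty array; the XML helper does not, because an element with no attributes already gives an empty mapping.

```python
import json
import xml.etree.ElementTree as ET


def safe_languages_from_json(text: str) -> dict[str, str]:
    """Return the ID-to-name mapping, treating the empty array as 'no matches'."""
    value = json.loads(text)["languagesearch"]
    if isinstance(value, dict):
        return value
    if value == []:
        # The empty result arrives as an array rather than an object.
        return {}
    raise ValueError(f"unexpected languagesearch value: {value!r}")


def languages_from_xml(text: str) -> dict[str, str]:
    """Return the ID-to-name mapping; no attributes simply means an empty dict."""
    element = ET.fromstring(text).find("languagesearch")
    return dict(element.attrib)


# Empty responses, shaped like the ones shown above.
print(safe_languages_from_json('{"languagesearch": []}'))    # {}
print(languages_from_xml("<api><languagesearch/></api>"))     # {}
```

Raising on any other shape is a deliberate choice in this sketch: if the API ever returns something else, it seems better to fail loudly than to guess.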
