Feature request: Use natural language processing to parse opening hours #531

RayBB · 2023-01-27T01:34:16Z

Hello,

I'll keep this short and sweet.
Would you be open to using NLP (https://wiki.openstreetmap.org/wiki/Natural_language_processing) to try to parse opening hours and convert them to the correct format for OSM?
Something that's pretty much like this? https://www.webmapping.cyou/WebToOSMOH/

I think it would be really handy, as entering hours (especially more complex ones) is one of the tasks I dread most, even though I think you've made the UI great.

If you're not considering supporting it, I might try to make a PR to the repo of the above example to add mobile support so I can use it on my phone. OSM-de/WebToOSMOH#12

I think the ideal workflow would be something like:

I take a photo
I copy/paste the text from photo using OCR (or everydoor does it)
Everydoor parses the language into the format for OSM
I verify/adjust the opening hours

Cheers and thanks so much for your hard work on this app, I really love it :)

mnalis · 2023-01-27T03:00:22Z

Discussions in similar projects that you may want to read, about the scope of the problem: streetcomplete/StreetComplete#4222, streetcomplete/StreetComplete#1186, or bryceco/GoMap#227

In short, it is likely to be incredibly complex:

firstly you need to have very good OCR, capable of handling skewing, rotation, glass reflections, blurring, tons of different fonts, handwriting, etc. Preferably offline on a phone. We do not seem to be anywhere near that technologically. There are a lot of pictures online, so feel free to try for yourself how your favorite OCR handles them.
assuming that such OCR had at least a 95%+ success rate (although to be usable it would have to be at least 99.9% accurate, or you'll be spending more time verifying and fixing the result than typing it in from scratch), you then have to have AI which will be able to parse the formatting: how the table is laid out vertically / horizontally, which data fits which (often invisible!) columns, which text is unrelated, like a phone number, etc.
then when you have all the correct text (step 1) in the correct order (step 2), you have to have something which will parse and understand it, like the https://www.webmapping.cyou/WebToOSMOH/ which you mention, which has been shown (see the threads linked above) to be woefully inadequate even in very simple cases (much less complex ones!). Kudos to the programmer, but it is an incredibly complex task in itself. There is literally one correct answer in millions of possible combinations.
then you have to program how to map that knowledge from the previous step to the opening_hours format (which might turn out to be impossible anyway, even in very simple cases like "even dates 08:00-15:00, odd dates 16:00-20:00")

But, if someone wants to try doing it, by all means, go for it! It would be quite interesting to see the results and limitations, even if it turns out to be too unreliable for actually making inputting the data easier.
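
To give a feel for the last two steps (understanding already-clean text and mapping it to opening_hours syntax), here is a minimal Python sketch. It is not how WebToOSMOH or any of the linked projects work; the names `DAYS`, `RULE`, `to_24h` and `text_to_opening_hours` are made up for illustration, and it assumes English day names, perfectly recognised text, and one "days + time range" rule per entry. Anything outside that narrow pattern is silently skipped, which is exactly the kind of limitation described above.

```python
import re

# Assumption: English day names only, matched by their first three letters.
DAYS = {"mon": "Mo", "tue": "Tu", "wed": "We", "thu": "Th",
        "fri": "Fr", "sat": "Sa", "sun": "Su"}

DAY = r"(mon|tue|wed|thu|fri|sat|sun)[a-z]*"
TIME = r"(\d{1,2})(?::(\d{2}))?\s*(am|pm)?"

# One rule: a day or day range, then either "closed" or "start - end".
RULE = re.compile(
    rf"\b{DAY}(?:\s*-\s*{DAY})?\s*:?\s*(?:(closed)|{TIME}\s*-\s*{TIME})",
    re.IGNORECASE,
)


def to_24h(hour, minute, ampm):
    """Convert a matched time to zero-padded 24-hour HH:MM."""
    h = int(hour)
    m = int(minute or 0)
    if ampm:
        ampm = ampm.lower()
        if ampm == "pm" and h != 12:
            h += 12
        elif ampm == "am" and h == 12:
            h = 0
    return f"{h:02d}:{m:02d}"


def text_to_opening_hours(text):
    """Turn simple English text into an opening_hours-style string.

    Returns an empty string when nothing in the text fits the pattern.
    """
    rules = []
    for match in RULE.finditer(text):
        d1, d2, closed, h1, m1, ap1, h2, m2, ap2 = match.groups()
        days = DAYS[d1.lower()]
        if d2:
            days += "-" + DAYS[d2.lower()]
        if closed:
            rules.append(f"{days} off")
        else:
            rules.append(f"{days} {to_24h(h1, m1, ap1)}-{to_24h(h2, m2, ap2)}")
    return "; ".join(rules)


print(text_to_opening_hours("Mon-Fri 9am-5pm, Sat 10:00-14:00, Sun closed"))
# Mo-Fr 09:00-17:00; Sa 10:00-14:00; Su off
```

Feeding it the "even dates 08:00-15:00, odd dates 16:00-20:00" example returns an empty string, because nothing in that text matches a day name.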

Zverik · 2023-01-27T09:06:55Z

Thank you for the suggestion, and thanks Matija for the explanations. I'm against using NLP for the simple reason that it would be much, much slower than what we have now. Instead of pressing 6-10 buttons on screen, you would need to:

Launch phone camera (which also drains the battery)
Make a good, discernible photo (given that opening hours are often printed on transparent doors with dim indoor lighting, or in the sun, which is also bad)
Feed the photo to the parser (which also would have trouble with multiple languages and so on, but let's say it works most of the time)
Edit the result, because it will be imperfect most of the time (off-by-one errors, missing letters and such).

As we know from editing geometry in OSM, changing something that's there is much harder and takes more time than drawing something from scratch.

Each of these steps would take much more time than I would like. The goal is to spend at most 20 seconds on each POI, and toying with the parser would definitely take more than a minute just for opening hours.

Zverik closed this as not planned · Jan 27, 2023
