This repository has been archived by the owner on Jan 5, 2023. It is now read-only.

Datasets are not searchable by location #523

Closed
thejuliekramer opened this issue Nov 24, 2020 · 10 comments

@thejuliekramer
Contributor

Related to #511

How to reproduce

  1. Go to the datasets page and search using the location area dropdown

Screen Shot 2020-11-24 at 10 46 44 AM

Expected behavior

Should return datasets from that location

Actual behavior

Always returns 0 datasets

@avdata99
Contributor

If we stop rolling up extras and use valid GeoJSON polygons for the spatial extra, like this:

{
  "type":"Polygon",
  "coordinates":[
      [[2.05827, 49.8625],[2.05827, 55.7447], [-6.41736, 55.7447], [-6.41736, 49.8625], [2.05827, 49.8625]]
  ]
}

Then the search function works:

image
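
For reference, here is a minimal Python sketch of what "valid GeoJSON polygon" means for the example above: the exterior ring has to be closed, and its bounding box can then be read directly from the coordinates. The helper name `polygon_bbox` is hypothetical and is not part of CKAN or ckanext-spatial.

```python
import json


def polygon_bbox(spatial):
    """Return (minx, miny, maxx, maxy) for a GeoJSON Polygon string.

    Hypothetical helper for illustration only; raises ValueError
    when the value is not a closed polygon.
    """
    geom = json.loads(spatial)
    if geom.get("type") != "Polygon":
        raise ValueError("expected a GeoJSON Polygon")
    ring = geom["coordinates"][0]  # exterior ring
    # A valid ring has at least 4 positions and ends where it starts.
    if len(ring) < 4 or ring[0] != ring[-1]:
        raise ValueError("exterior ring must be closed")
    xs = [point[0] for point in ring]
    ys = [point[1] for point in ring]
    return min(xs), min(ys), max(xs), max(ys)


# The polygon from the comment above.
spatial = """
{
  "type": "Polygon",
  "coordinates": [
      [[2.05827, 49.8625], [2.05827, 55.7447], [-6.41736, 55.7447],
       [-6.41736, 49.8625], [2.05827, 49.8625]]
  ]
}
"""
print(polygon_bbox(spatial))
```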

@avdata99
Contributor

avdata99 commented Dec 3, 2020

To make datasets geo-searchable, we need to fix/transform the old spatial data already stored in existing datasets.
For new datasets, this transformation happens automatically.

To QA this, the spatial data in the datasets must first be fixed so that they become searchable.
I will wait for the deploy and then prepare this.
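
A minimal sketch of the kind of transformation meant here, assuming (the issue does not say this) that an old spatial value is a comma-separated `minx,miny,maxx,maxy` string. The helper `bbox_string_to_polygon` is hypothetical and is not the ckanext-geodatagov implementation.

```python
import json


def bbox_string_to_polygon(value):
    """Convert 'minx,miny,maxx,maxy' into a GeoJSON Polygon string.

    Assumption: the old spatial extra is a four-number bounding box.
    Other legacy formats would need their own handling.
    """
    parts = [float(part) for part in value.split(",")]
    if len(parts) != 4:
        raise ValueError("expected four comma-separated numbers")
    minx, miny, maxx, maxy = parts
    # Closed exterior ring: the first and last positions are identical.
    ring = [
        [minx, miny],
        [minx, maxy],
        [maxx, maxy],
        [maxx, miny],
        [minx, miny],
    ]
    return json.dumps({"type": "Polygon", "coordinates": [ring]})


print(bbox_string_to_polygon("-6.41736,49.8625,2.05827,55.7447"))
```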

@ghost

ghost commented Dec 3, 2020

@avdata99 Is this ready to be moved to QA?

@avdata99
Contributor

avdata99 commented Dec 3, 2020

> @avdata99 Is this ready to be moved to QA?

Only the deploy and the last changes to the sandbox datasets are still required, @cmugisha

@avdata99
Contributor

avdata99 commented Dec 4, 2020

Everything is deployed to the sandbox.
A dataset was saved to update its Solr fields.
Searching by location in the sandbox now raises an error:

==> /var/log/ckan/gunicorn.log <==
2020-12-04 12:43:00,947 INFO  [ckanext.geodatagov.plugins] Added FQ to collection_package_id

2020-12-04 12:43:00,962 ERROR [pysolr] Solr responded with an error (HTTP 500): [Reason: None]

{
	"responseHeader": {
		"status": 500,
		"QTime": 1,
		"params": {
			"mm": "2<-1 5<80%",
			"facet.field": ["groups", "vocab_category_all", "metadata_type", "tags", "res_format", "organization_type", "organization", "publisher", "bureauCode"],
			"bf": "div(mul(mul(max(0,sub(min(-4.21875,maxx),max(-167.34375,minx))),max(0,sub(min(75.6721973906,maxy),max(-44.5904671813,miny)))),2),add(19617.8471583,mul(sub(maxy,miny),sub(maxx,minx))))",
			"fl": "id validated_data_dict",
			"start": "0",
			"sort": "views_recent desc",
			"fq": [
                                   " -dataset_type:harvest -collection_package_id:[\"\" TO *]", 
                                   "{!frange incl=false l=0 u=1}div(mul(mul(max(0,sub(min(-4.21875,maxx),max(-167.34375,minx))),max(0,sub(min(75.6721973906,maxy),max(-44.5904671813,miny)))),2),add(19617.8471583,mul(sub(maxy,miny),sub(maxx,minx))))", 
                                   "+site_id:\"geo.gov\"", 
                                   "+state:active"
                                 ],
			"rows": "21",
			"fq_list": "{!frange incl=false l=0 u=1}div(mul(mul(max(0,sub(min(-4.21875,maxx),max(-167.34375,minx))),max(0,sub(min(75.6721973906,maxy),max(-44.5904671813,miny)))),2),add(19617.8471583,mul(sub(maxy,miny),sub(maxx,minx))))",
			"facet.limit": "50",
			"q": "*:*",
			"tie": "0.1",
			"defType": "edismax",
			"qf": "name^4 title^4 tags^2 groups^2 text",
			"facet.mincount": "1",
			"wt": "json",
			"facet": "true"
		}
	}

"error":{"trace":"java.lang.UnsupportedOperationException
	at org.apache.lucene.queries.function.FunctionValues.floatVal(FunctionValues.java:44)
	at org.apache.solr.search.ValueSourceParser$18$1.func(ValueSourceParser.java:274)
	at org.apache.lucene.queries.function.valuesource.DualFloatFunction$1.floatVal(DualFloatFunction.java:60)
	at org.apache.lucene.queries.function.valuesource.ProductFloatFunction.func(ProductFloatFunction.java:39)
	at org.apache.lucene.queries.function.valuesource.MultiFloatFunction$1.floatVal(MultiFloatFunction.java:82)
	at org.apache.lucene.queries.function.valuesource.SumFloatFunction.func(SumFloatFunction.java:39)
	at org.apache.lucene.queries.function.valuesource.MultiFloatFunction$1.floatVal(MultiFloatFunction.java:82)
	at org.apache.lucene.queries.function.valuesource.DivFloatFunction.func(DivFloatFunction.java:40)
	at org.apache.lucene.queries.function.valuesource.DualFloatFunction$1.floatVal(DualFloatFunction.java:60)
	at org.apache.lucene.queries.function.FunctionValues$5.matches(FunctionValues.java:200)
	at org.apache.lucene.queries.function.ValueSourceScorer$1.matches(ValueSourceScorer.java:54)
	at org.apache.lucene.search.TwoPhaseIterator$1.doNext(TwoPhaseIterator.java:66)
	at org.apache.lucene.search.TwoPhaseIterator$1.nextDoc(TwoPhaseIterator.java:54)
	at org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll(Weight.java:219)
	at org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:172)
	at org.apache.lucene.search.BulkScorer.score(BulkScorer.java:39)
	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:821)
	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:535)
	at org.apache.solr.search.DocSetUtil.createDocSetGeneric(DocSetUtil.java:107)
	at org.apache.solr.search.DocSetUtil.createDocSet(DocSetUtil.java:96)
	at org.apache.solr.search.SolrIndexSearcher.getDocSetNC(SolrIndexSearcher.java:1398)
	at org.apache.solr.search.SolrIndexSearcher.getPositiveDocSet(SolrIndexSearcher.java:1073)
	at org.apache.solr.search.SolrIndexSearcher.getProcessedFilter(SolrIndexSearcher.java:1245)
	at org.apache.solr.search.SolrIndexSearcher.getDocListAndSetNC(SolrIndexSearcher.java:1852)
	at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1628)
	at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:587)
	at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:524)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:272)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:155)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:2102)
	at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:654)
	at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:460)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:208)
	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
	at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
	at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
	at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
	at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
	at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
	at org.eclipse.jetty.server.Server.handle(Server.java:499)
	at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
	at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
	at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
	at java.lang.Thread.run(Thread.java:748)
","code":500}}

@avdata99
Contributor

avdata99 commented Dec 7, 2020

Probably related to ckanext-spatial#218
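
Read as plain arithmetic, the `bf` and `fq` expression in the failing request above appears to score each dataset by how much its bounding box overlaps the searched box: twice the overlap area divided by the sum of the two areas. A minimal Python sketch of that formula follows. The function names are hypothetical, and the query box values are taken from the logged request. The constant `19617.8471583` in the log matches the area of that query box.

```python
def bbox_area(box):
    """Area of a (minx, miny, maxx, maxy) box."""
    minx, miny, maxx, maxy = box
    return (maxx - minx) * (maxy - miny)


def overlap_score(query, doc):
    """Mirror of the logged Solr function:

    div(mul(mul(overlap_width, overlap_height), 2),
        add(query_area, doc_area))
    """
    q_minx, q_miny, q_maxx, q_maxy = query
    minx, miny, maxx, maxy = doc
    overlap_width = max(0, min(q_maxx, maxx) - max(q_minx, minx))
    overlap_height = max(0, min(q_maxy, maxy) - max(q_miny, miny))
    return (overlap_width * overlap_height * 2) / (
        bbox_area(query) + bbox_area(doc)
    )


# Query box from the log: (minx, miny, maxx, maxy).
query = (-167.34375, -44.5904671813, -4.21875, 75.6721973906)

print(bbox_area(query))
print(overlap_score(query, query))
print(overlap_score(query, (10.0, 10.0, 20.0, 20.0)))
```

With `incl=false l=0 u=1`, the `frange` filter keeps only datasets scoring above 0 and up to 1, so datasets that do not overlap the searched box are excluded. The calculation needs numeric `minx`, `miny`, `maxx` and `maxy` values for every indexed dataset.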

@avdata99
Contributor

avdata99 commented Dec 7, 2020

The fields

<field name="bbox_area" type="float" indexed="true" stored="true" />
<field name="maxx" type="float" indexed="true" stored="true" />
<field name="maxy" type="float" indexed="true" stored="true" />
<field name="minx" type="float" indexed="true" stored="true" />
<field name="miny" type="float" indexed="true" stored="true" />

are not present in schema.xml on the Solr server. This is the cause of the error.
Reviewing the deploy process.
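
A quick way to confirm this on a given schema file is to list which of the five geo fields are missing. The sketch below uses only the Python standard library. The function name `missing_geo_fields` and the sample schema string are hypothetical; a real check would read the deployed schema.xml instead.

```python
import xml.etree.ElementTree as ET

# Field names taken from the list above.
GEO_FIELDS = {"bbox_area", "maxx", "maxy", "minx", "miny"}


def missing_geo_fields(schema_xml):
    """Return the geo fields not declared in a Solr schema.xml string."""
    root = ET.fromstring(schema_xml)
    declared = {field.get("name") for field in root.iter("field")}
    return sorted(GEO_FIELDS - declared)


# Tiny stand-in schema that declares only one of the geo fields.
sample_schema = """
<schema name="ckan" version="1.6">
  <fields>
    <field name="id" type="string" indexed="true" stored="true" />
    <field name="minx" type="float" indexed="true" stored="true" />
  </fields>
</schema>
"""

print(missing_geo_fields(sample_schema))
```

An empty list means all five fields are declared.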

@avdata99
Contributor

avdata99 commented Dec 7, 2020

I added the fields manually to the schema.xml file and the error disappeared.

The geo fields live in the schema.xml file, and this file is copied with Ansible:

- name: Copy solr schema file
  action: >-
    copy src={{ item }}/home/solr/ckan/conf/schema.xml
    dest={{ solr_home }}/data/{{ item }}/conf/schema.xml
    mode=0644
    force=yes
  with_items: "{{ solr_cores }}"
  notify: restart solr

I updated Solr for catalog-next in the sandbox:

pipenv run ansible-playbook solr.yml --limit datagov-solr1tf.internal.sandbox.datagov.us

Log:

TASK [gsa.datagov-deploy-solr : Copy solr schema file] ***************************************************************************************************************************************
ok: [datagov-solr1tf.internal.sandbox.datagov.us] => (item=catalog)
changed: [datagov-solr1tf.internal.sandbox.datagov.us] => (item=catalog-next)
ok: [datagov-solr1tf.internal.sandbox.datagov.us] => (item=inventory)
ok: [datagov-solr1tf.internal.sandbox.datagov.us] => (item=inventory-next)

This rolled back my changes and deleted the manually added geo fields.

If I run the command a second time, nothing changes:

TASK [gsa.datagov-deploy-solr : Copy solr schema file] ***************************************************************************************************************************************
ok: [datagov-solr1tf.internal.sandbox.datagov.us] => (item=catalog)
ok: [datagov-solr1tf.internal.sandbox.datagov.us] => (item=catalog-next)
ok: [datagov-solr1tf.internal.sandbox.datagov.us] => (item=inventory)
ok: [datagov-solr1tf.internal.sandbox.datagov.us] => (item=inventory-next)

I assume the file is being updated correctly, but the version we deploy is a bad one (it lacks the geo fields), so I created a new PR.

@avdata99
Contributor

avdata99 commented Dec 8, 2020

Working in the sandbox

image

How to QA?

  • Simple version: Go to the sandbox, draw a polygon on the map, and look for results
  • Complex version: All datasets with spatial data must be re-indexed. This can be done with the geodatagov update-dataset-geo-fields command and will probably require a new issue
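
The simple check could also be scripted against the search API instead of the map. The sketch below only builds the request URL. It assumes the deployment exposes the `ext_bbox` parameter that ckanext-spatial documents for `package_search`; the host name is a placeholder and `bbox_search_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def bbox_search_url(base_url, minx, miny, maxx, maxy, rows=5):
    """Build a package_search URL filtered by a bounding box.

    Assumption: the site supports ckanext-spatial's ext_bbox parameter,
    given as minx,miny,maxx,maxy.
    """
    params = {
        "ext_bbox": f"{minx},{miny},{maxx},{maxy}",
        "rows": rows,
    }
    return f"{base_url.rstrip('/')}/api/action/package_search?{urlencode(params)}"


# Placeholder host; replace with the sandbox catalog URL.
url = bbox_search_url("https://ckan.example.org", -6.41736, 49.8625, 2.05827, 55.7447)
print(url)
```

Fetching that URL and reading `result.count` from the JSON response would show whether any datasets match the box.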

@kimwdavidson kimwdavidson self-assigned this Dec 8, 2020
@avdata99
Contributor

Also working in production (it still requires an update to cover all datasets)

image.png

@ghost ghost closed this as completed Jan 8, 2021