A shell script to create a mapping (schema) in an Elasticsearch DB. If you do not specify a mapping, Elasticsearch generates one dynamically by default when it detects new fields in documents during indexing. However, dynamic mapping comes with a few caveats: the detected types might not be correct, and the fields are indexed and searched with default analyzers and settings.
The schema in Elasticsearch describes the fields in the JSON documents along with their data types, as well as how they should be indexed in the underlying Lucene indexes. In Elasticsearch terms, this schema is usually called a “mapping”.
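To make the idea of an explicit mapping concrete, here is a minimal sketch that writes a small mapping file. The file name `mapping-example.json` and the fields `title`, `author`, and `published_at` are hypothetical and are not taken from this repository's mapping files. The layout assumes Elasticsearch 7 or later, where mappings have no type name.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical file name, chosen so that no real mapping file is overwritten.
EXAMPLE_MAPPING="${EXAMPLE_MAPPING:-mapping-example.json}"

# Each field gets an explicit type instead of relying on dynamic detection.
# The field names are illustrative assumptions only.
cat > "$EXAMPLE_MAPPING" <<'EOF'
{
  "mappings": {
    "properties": {
      "title":        { "type": "text", "analyzer": "english" },
      "author":       { "type": "keyword" },
      "published_at": { "type": "date" }
    }
  }
}
EOF

echo "wrote $EXAMPLE_MAPPING"
```

Declaring `author` as `keyword` keeps it as a single exact value for filtering and aggregations, while `title` is analyzed as English full text. Dynamic mapping would not make these choices for you.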
The mapping-create.sh script should be used on a newly created database only. If you need to update an existing DB which already has some data, please use the mapping-update.sh script (described below).
For the content index:
env "ES_URL=https://xxx:[email protected]" "ES_INDEX=content" "MAPPING_FILE=mapping-content.json" ./mapping-create.sh
And for the twitter index:
env "ES_URL=https://xxx:[email protected]" "ES_INDEX=twitter" "MAPPING_FILE=mapping-twitter.json" ./mapping-create.sh
A script to update an index with a specified mapping. The script redefines an existing index in several steps: it creates a temporary index with the new schema, copies the old data into it (using _reindex), deletes the original index and creates it again with the new schema, and finally copies the data back into the recreated index.
For the content index:
env "ES_URL=https://xxx:[email protected]" "ES_INDEX=content" "MAPPING_FILE=mapping-content.json" ./mapping-update.sh
And for the twitter index:
env "ES_URL=https://xxx:[email protected]" "ES_INDEX=twitter" "MAPPING_FILE=mapping-twitter.json" ./mapping-update.sh
The temporary index and the mapping file can be set through ES_INDEX_TMP and MAPPING_FILE if required.
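The update sequence described above can be sketched roughly as follows. This is not the actual mapping-update.sh, only an illustration of the same round trip with the Elasticsearch create index, `_reindex`, and delete index APIs. The `DRY_RUN` switch, the `es` and `reindex_body` helpers, and the default values are assumptions added for the example. It also assumes that the mapping file holds a complete create-index body.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; the real mapping-update.sh may differ.
set -euo pipefail

ES_URL="${ES_URL:-http://localhost:9200}"
ES_INDEX="${ES_INDEX:-content}"
ES_INDEX_TMP="${ES_INDEX_TMP:-${ES_INDEX}_tmp}"    # assumed default
MAPPING_FILE="${MAPPING_FILE:-mapping-content.json}"
DRY_RUN="${DRY_RUN:-1}"   # hypothetical switch: 1 prints requests, 0 sends them

# Build the JSON body for POST /_reindex (source index, destination index).
reindex_body() {
  printf '{"source":{"index":"%s"},"dest":{"index":"%s"}}' "$1" "$2"
}

# Send one request to Elasticsearch, or just print it in dry-run mode.
es() {
  local method="$1" path="$2" data="${3:-}"
  if [ "$DRY_RUN" = "1" ]; then
    echo "$method $path${data:+ $data}"
    return 0
  fi
  if [ -n "$data" ]; then
    curl -fsS -X "$method" -H 'Content-Type: application/json' \
      "$ES_URL$path" --data-binary "$data"
  else
    curl -fsS -X "$method" "$ES_URL$path"
  fi
}

update_mapping() {
  es PUT    "/$ES_INDEX_TMP" "@$MAPPING_FILE"    # temporary index, new schema
  es POST   "/_reindex?refresh=true" "$(reindex_body "$ES_INDEX" "$ES_INDEX_TMP")"
  es DELETE "/$ES_INDEX"                         # drop the old definition
  es PUT    "/$ES_INDEX" "@$MAPPING_FILE"        # recreate with new schema
  es POST   "/_reindex?refresh=true" "$(reindex_body "$ES_INDEX_TMP" "$ES_INDEX")"
}

update_mapping
```

With the default `DRY_RUN=1` the sketch only prints the five requests in order, which makes the sequence easy to review. Setting `DRY_RUN=0` would send them. Because `set -e` aborts on the first failed request, the original index is not deleted unless the copy into the temporary index succeeded.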
This script creates a light version of the pdf_documents index. The original index contains big blobs of text and is very heavy to process and render in Kibana.
env "ES_URL=https://xxx:[email protected]" "ES_INDEX=pdf_documents" "ES_INDEX_DEST=pdf_documents_light" ./pdf-documents-light.sh
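One possible way to build such a light copy is a `_reindex` request whose script removes the heavy field from every document. The sketch below only prints the request body. Whether pdf-documents-light.sh works this way is an assumption, and the field name `text` (via the hypothetical `HEAVY_FIELD` variable) is a placeholder for whatever field actually holds the large text.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; not the actual pdf-documents-light.sh.
set -euo pipefail

ES_INDEX="${ES_INDEX:-pdf_documents}"
ES_INDEX_DEST="${ES_INDEX_DEST:-pdf_documents_light}"
HEAVY_FIELD="${HEAVY_FIELD:-text}"   # hypothetical field name

# Print a _reindex body that copies source ($1) to destination ($2)
# and removes the field ($3) from each document on the way.
light_reindex_body() {
  cat <<EOF
{
  "source": { "index": "$1" },
  "dest": { "index": "$2" },
  "script": { "lang": "painless", "source": "ctx._source.remove('$3')" }
}
EOF
}

light_reindex_body "$ES_INDEX" "$ES_INDEX_DEST" "$HEAVY_FIELD"
```

The printed body could then be piped to `curl -fsS -X POST -H 'Content-Type: application/json' "$ES_URL/_reindex" --data-binary @-` to run the copy.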
A Python script based on Locust to stress test the search performance of an ES database.
You can run the Locust web UI, where you can manage your stress tests:
locust -f stresstest.py --host=http://localhost:9200
Alternatively, you can run it headless:
env "ES_INDEX=content" locust -f stresstest.py --host=http://localhost:9200 -c 200 -r 50 --run-time 5m --no-web
where -c specifies the number of Locust users to spawn, and -r specifies the hatch rate (number of users to spawn per second).