Replies: 14 comments
-
Hello @ntmhung,

When a user enters a query with several words, short words such as "and", "or", "to", "for", "are", "be", "by", "in", … usually occur very frequently in the catalog, so they can carry a strong weight in the search results despite their lack of relevance.

Most search engines (Elasticsearch included) let you filter these words out using a static list of "stop words": the search engine ignores any word in this list.

ElasticSuite takes a different approach: the search engine calculates the frequency of each word in the catalog and ignores the words whose frequency is above a given limit, called the Cutoff Frequency. If all words in the request are […]

You can go further with the official documentation here: ElasticSearch Cutoff Frequency

The Cutoff Frequency can be set up in the back-office:
The default value for the […]

Also, check our Wiki for more information about Fulltext base settings and Cutoff frequency configuration.

So you need to run some tests with the cutoff frequency value and check whether the default value is suitable for your purposes or whether it should be higher or lower. Also, pay attention to the […]

BR,
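To make the idea of a cutoff frequency concrete, here is a minimal Python sketch. It is a toy model of the description above, not Elasticsearch's or ElasticSuite's actual query execution; the function names and the five-title catalog are made up for illustration.

```python
from collections import Counter


def frequent_terms(documents, cutoff_frequency):
    """Return the terms found in more than `cutoff_frequency` of the documents."""
    doc_count = len(documents)
    doc_freq = Counter()
    for text in documents:
        # Count each term once per document (document frequency).
        doc_freq.update(set(text.lower().split()))
    return {term for term, n in doc_freq.items() if n / doc_count > cutoff_frequency}


def significant_query_terms(query, documents, cutoff_frequency):
    """Keep only the query terms that stay at or below the cutoff frequency."""
    too_frequent = frequent_terms(documents, cutoff_frequency)
    return [t for t in query.lower().split() if t not in too_frequent]


# Hypothetical catalog: "and" appears in 3 of 5 titles (60%).
catalog = [
    "lidocaine and epinephrine injection",
    "lidocaine with epinephrine cartridge",
    "gauze and tape",
    "gloves and masks",
    "syringe with needle",
]

print(significant_query_terms("lidocaine and epinephrine", catalog, cutoff_frequency=0.5))
# ['lidocaine', 'epinephrine']
```

In this toy catalog "with" appears in only 2 of 5 titles, so it would still count as a regular word at a cutoff of 0.5. That is why the cutoff value has to be tested against the real index.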
-
Thanks @vahonc. Let me try editing those settings.
-
Hello @ntmhung,

I slightly updated @vahonc's comment. Basically, the cut-off frequency is expressed as a percentage of indexed documents, not of indexed top entities. In the catalog product index, there are documents that contain relevant text data and those that do not.

So, in a default Magento2 installation, you have 4 customer groups, hence 4 prices per product top document. If you have a lot of products, you probably also have a lot of categories, so you increase the number of "categories".

Now, if you have a lot of customer prices or a multi-stock feature where you end up with, say, 300 documents per indexed product on average, with a majority of those (say 250) not containing text, then the cut-off frequency must be made 25 times lower than 0.15, so around 0.006.

PS: You can find the total number of documents for a given index through Elasticsuite > System > Indices.
-
Hey @rbayet,

Is there a formula to calculate the cut-off frequency based on the number of indexed documents? I don't really understand how the part "the cut-off frequency must be made 25 times lower than 0.15, so around 0.006" is calculated.
-
Hello @ntmhung,

I said 25 times because, in my example, we go from 10 documents per product (upper bracket on a Luma sample catalog) to 250 - 300 documents per product, with most of those not containing actual text data (being only stock and price data related to physical shops).

A more accurate formula would be "Divide by […]

Another example: imagine you have 10K products indexed and, for each product, price and stock availability for 100 physical stores, for an approximate total of 2 050 000 indexed documents (10K x (1 top document + ~4 category records + 100 stock records + 100 price records) = 10K x 205). So the ratio of documents with actual text data is "50K / 2 050K = 1/41".

I hope it's a bit clearer.

Regards,
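The arithmetic of the 100-store example can be checked with a short Python sketch. The function name and its parameters are hypothetical and exist only for illustration; the sketch reproduces the document counts and the text ratio, nothing more, and it assumes (as the example does) that only the top document and the category records carry text.

```python
def document_counts(products, text_docs_per_product, other_docs_per_product):
    """Return (documents with text, total documents) for an index."""
    text_docs = products * text_docs_per_product
    total_docs = products * (text_docs_per_product + other_docs_per_product)
    return text_docs, total_docs


# 10K products: 1 top document + ~4 category records carry text,
# 100 stock records + 100 price records do not.
text_docs, total_docs = document_counts(10_000, 1 + 4, 100 + 100)

print(text_docs, total_docs)    # 50000 2050000
print(total_docs / text_docs)   # 41.0, i.e. a text ratio of 1/41
```

The real totals are best read from Elasticsuite > System > Indices rather than estimated, since the number of category records per product is only approximate.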
-
Thanks @rbayet,

To confirm that I understand correctly: my store has 6600 products, 7 customer groups, and 1 physical store. So the total number of indexed documents should be 6600 x (1 top document + ~4 category records + 7 stock records + 7 price records) = 125 400, and the ratio of documents with actual text data is "33K / 125 400 ~= 1/4". Is this correct?
-
Hi @rbayet,

It looks like the cutoff frequency doesn't work in my case. I asked Google Bard for a solution and I am wondering whether this could work: https://g.co/bard/share/aba745335e37 (see the code toward the bottom).
-
Hello @ntmhung,

Have you seen […]

Nope, it will not work as described by Bard; I'd say it's a stupid suggestion.

BR,
-
Hi @vahonc,

Is there any way I can add that query programmatically, e.g. by creating a plugin on an Elasticsuite class? I don't know the correct class to add that query to.
-
Hello @ntmhung,

If you're not using something like Retailer Suite with the concept of shops inside a store […]

So in your case, the approximate number of documents should be 6600 x (1 top document + ~4 category records + 7 price records) = 79 200.

But if 0.0375 is already not working, then the valid cut-off frequency must be lower still.

As a side note, have you changed the "minimum should match" in "Elasticsuite > Search Relevance > Fulltext base settings"?

Regards,
-
Hi @rbayet,

I tried changing the "minimum should match" setting to 5%, but the results were not as expected. For example, when I search for "lidocaine and epinephrine" on the site https://shop.tccpharma.com/, it only shows results for that exact match, whereas we would like it to show results for "lidocaine and epinephrine", "lidocaine with epinephrine", and "lidocaine epinephrine".

Regards,
-
The setting Minimum Should Match allows you to choose the minimum percentage of words that should match in the user query in order to suggest a result. Stopwords are not taken into account when calculating this percentage. When a user enters a query with several words, several approaches could be used:
or
The first approach is better for relevancy but may return no result for queries with a lot of words. We recommend using a value of 100% (all words should match). Optionally, a slightly lower value may improve the results for long queries. EXAMPLE:
BR,
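The effect of Minimum Should Match described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not ElasticSuite's implementation: the percentage is rounded down, at least one word is always required, matching is done on whole words of a product title, and the stopword set is passed in by hand (in ElasticSuite it would come from the cutoff frequency). All names are hypothetical.

```python
def required_matches(term_count, minimum_should_match_pct):
    """Number of query words that must match (assumption: rounded down, minimum 1)."""
    return max(1, (term_count * minimum_should_match_pct) // 100)


def matches(query, title, minimum_should_match_pct, stopwords=frozenset()):
    """True if enough non-stopword query words appear in the title."""
    # Stopwords are left out before the percentage is applied.
    terms = [t for t in query.lower().split() if t not in stopwords]
    title_terms = set(title.lower().split())
    hits = sum(1 for t in terms if t in title_terms)
    return hits >= required_matches(len(terms), minimum_should_match_pct)


title = "lidocaine with epinephrine cartridge"

# 100%, "and" counted as a regular word: the title lacks "and", so no match.
print(matches("lidocaine and epinephrine", title, 100))                    # False
# 100%, "and" treated as a stopword: both remaining words match.
print(matches("lidocaine and epinephrine", title, 100, {"and", "with"}))   # True
# 30%: one matching word is enough, so loosely related products match too.
print(matches("lidocaine and epinephrine", "epinephrine auto injector", 30))  # True
```

The last call shows the trade-off: lowering the percentage returns more products, including ones that share only a single word with the query.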
-
Hi @vahonc,

I tried setting that value to 30%, and the number of suggested products increased. However, only products that match the search text were shown on the search result page. Using the same approach, can we increase the number of products on the search result page?

Regards,
-
Hello @ntmhung,

Could you clarify your question?

Just to be clear: the Minimum Should Match applies to both autocomplete and fulltext search. So yes, lowering the Minimum Should Match usually increases the number of products found, but they might not be exactly the ones the user is searching for (since a product titled "red plane" will match, as well as a "yellow race car").

PS: moving this issue to a discussion as there is no technical problem reported.
-
We are using version 2.4.0 of Magento and ElasticSuite for searches. We would like our search to exclude stopwords like "with" and "and". Is there any configuration or adjustment in the code we can do?
Ex :
For example, the search "lidocaine epinephrine", "lidocaine with epinephrine", and "lidocaine and epinephrine" should all show the same exact search results.
Additional context
This is our website: https://shop.tccpharma.com/