REGEXP Query for SEF

  • Posts: 31
  • Thank you received: 0
10 years 10 months ago #146882

Latest HikaShop seems to have added some queries for SEF. Our test site slowed down even more with the latest version. MySQL logs man queries which only return 1 or 2 results and the queries are taking 1 minute or more to complete.

This one took 1 minute and 27 seconds to complete:
SELECT product_id FROM prfx_hikashop_product
WHERE product_alias REGEXP '^ *chic.+pencil.+skirt.+ash.+gray.+womens.+petite *$'
OR
product_name REGEXP '^ *chic.+pencil.+skirt.+ash.+gray.+womens.+petite *$'

Mysql shows the query examined 2598890 rows to return 1 result. Is this how this portion of hikashop supposed to work? Is there something in the config we can change to speed this up or skip this query? It wasn't in the previous versions.

Thanks.

Please Log in or Create an account to join the conversation.

  • Posts: 2334
  • Thank you received: 403
10 years 10 months ago #146906

:S Hi there,

You can try to disable the option "Remove products and categories id in URLs" in System>Configuration>Features>SEF.
IT will basically remove this query and you should gain a lot of time :)

Another option would be to add an index on the product_alias and product_name table, it should also improve the speed.

Last edit: 10 years 10 months ago by Eliot.

Please Log in or Create an account to join the conversation.

  • Posts: 31
  • Thank you received: 0
10 years 10 months ago #147285

Ouch! Removing those id numbers from the URLs is excactly what we wanted. When we do a CSV import and new products and categories have been inserted, those id numbers don't match previous numbers and google gets 404s when indexing the site.

Please Log in or Create an account to join the conversation.

  • Posts: 26171
  • Thank you received: 4030
  • MODERATOR
10 years 10 months ago #147306

Hi,

At this moment the best is to change the queries in the "router.php" in HikaShop in order to not make these REGEX on the field.

We are thinking about adding a new column in the product table for the "product_sef_name".
The idea is to store the sef name returned by the SEF system into this column. At this moment the query in the router will be faster because it will just use a simple comparison and not a REGEX.
The problem is that the different SEF systems doesn't transform the text in the same way, there is not unique rule ; that's why we put a REGEX in order to have something which will work with all of them.

Regards,


Jerome - Obsidev.com
HikaMarket & HikaSerial developer / HikaShop core dev team.

Also helping the HikaShop support team when having some time or couldn't sleep.
By the way, do not send me private message, use the "contact us" form instead.

Please Log in or Create an account to join the conversation.

  • Posts: 31
  • Thank you received: 0
10 years 9 months ago #148415

REGEXP seems to be much slower than LIKE in SQL. We ran the slow queries at the mysql prompt. Then converted the queries to use LIKE instead of REGEXP and they run much faster. Not knowing exactly what the REGEXP is trying to do, but here's what we get:

Using REGEXP:
SELECT product_id FROM prefx_hikashop_product WHERE product_alias REGEXP '^ *the.+love.+that.+was.+lost.+womens.+long.+sleeve.+hooded.+t.+shirt *$' OR product_name REGEXP '^ *the.+love.+that.+was.+lost.+womens.+long.+sleeve.+hooded.+t.+shirt *$'

+


+
| product_id |
+
+
| 126875 |
+
+

48.853 seconds

Using LIKE:
SELECT product_id FROM prefx_hikashop_product WHERE product_alias LIKE 'the%love%that%was%lost%womens%long%sleeve%hooded%t%shirt%' OR product_name LIKE 'the%love%that%was%lost%womens%long%sleeve%hooded%t%shirt%'

+
+
| product_id |
+
+
| 126875 |
+
+

2.783 seconds

These two queries bring back the same product_id value. When would LIKE not work and bring back the wrong value(s) for product_id?

Please Log in or Create an account to join the conversation.

  • Posts: 26171
  • Thank you received: 4030
  • MODERATOR
10 years 9 months ago #148417

Hi,

Yes and no.
".+" and " +" for REGEX and not the same than "%" for LIKE.

The result are different and that's why we use REGEX and not LIKE.
But we are thinking about improvements so, please do not use the feature if it does not work as you expected.

Regards,


Jerome - Obsidev.com
HikaMarket & HikaSerial developer / HikaShop core dev team.

Also helping the HikaShop support team when having some time or couldn't sleep.
By the way, do not send me private message, use the "contact us" form instead.

Please Log in or Create an account to join the conversation.

  • Posts: 31
  • Thank you received: 0
10 years 9 months ago #148454

We understand regular expressions. REGEXP & LIKE are different. We are trying to point out that you are using REGEXP and the power of REGEXP may not be necessary in this situation.

We even added the % at the beginning of the string thinking we might be missing something that your ^ * picks up with REGEXP, but the results are identical so far. We've been testing for 48 hours with 12500 urls hit so far with 100% identical results. We were hoping you could quickly point out situations when our LIKE version wouldn't work.

There are some really neat builtin functions like REGEXP and RAND(), but REGEXP and order by RAND() get really slow once the number of rows scanned grows.

Please Log in or Create an account to join the conversation.

  • Posts: 83050
  • Thank you received: 13409
  • MODERATOR
10 years 9 months ago #148525

Suppose that you have a product called "test" and another one called "tester".

In the case of the "test" product link to its product page, your query will be:
SELECT product_id FROM prefx_hikashop_product WHERE product_alias LIKE 'test%' OR product_name LIKE 'test%'
Suppose that the product "tester" has been created first, the returned product will be the one with the name "tester" instead of the one with the name "test".

That's an example of why we had to use a regex on that query. Of course, we would have used a LIKE if it would have been possible but that was not enough in order to handle all the special cases that we found in our tests.

Please Log in or Create an account to join the conversation.

  • Posts: 31
  • Thank you received: 0
10 years 9 months ago #148575

Thanks for the reply. We found this very behaviour last night and posted in the bugs section:
hikashop.com/forum/3-bug-report/866686-r...-in-urls-broken.html

This fails with REGEXP. I think you will have to come up with an entirely different strategy to remove the IDs from the URLs.

As far as LIKE vs REGEXP, our testing shows that LIKE is much faster although both bring back the identical INCORRECT results. One interesting note is that Google Merchant rejected our site b/c the pages were loading too slow. We switched to our LIKE implementation and Google Merchant reapproved the site.

We are eagerly awaiting the next release in hopes the ID removal from URLs works correctly and is not a big slow down to the sites.

Thx.

Please Log in or Create an account to join the conversation.

  • Posts: 83050
  • Thank you received: 13409
  • MODERATOR
10 years 9 months ago #148590

Hi,

Next version of HikaShop will have an smart system to better handle that.

Please Log in or Create an account to join the conversation.

Time to create page: 0.084 seconds
Powered by Kunena Forum