MySQL Tip: Find the N-th element in a delimiter-separated string

I recently had to monitor a MySQL field containing rather long URI strings with 7 GET parameters, and after some research I found a way to extract each parameter in separate SELECT fields, thanks to a somewhat unrelated Stack Overflow answer.


The Solution

Input string

http://domain.ext/page?param1=value1&param2=value2&param3=value3&param4=value4...

Intended result

param1param2param3param4...
value1value2value3value4...

Final query

SELECT
    SUBSTRING_INDEX(SUBSTRING_INDEX(SUBSTRING_INDEX(string, '&', 1), '&', -1), '=', -1) AS param1,
    SUBSTRING_INDEX(SUBSTRING_INDEX(SUBSTRING_INDEX(string, '&', 2), '&', -1), '=', -1) AS param2,
    SUBSTRING_INDEX(SUBSTRING_INDEX(SUBSTRING_INDEX(string, '&', 3), '&', -1), '=', -1) AS param3,
    SUBSTRING_INDEX(SUBSTRING_INDEX(SUBSTRING_INDEX(string, '&', 4), '&', -1), '=', -1) AS param4,
    ...

Breakdown

MySQL natively offers very few functions to manage delimiter-separated strings. There's no split function that would effectively transform a string in an array that we would also have trouble to handle in a query. This is expected behavior from a database software whose very job is to store lists of values.

Fortunately, the SUBSTRING_INDEX(string, delimiter, count) allows to manipulate a delimiter-separated string. In essence, it returns a substring of string until the count-th delimiter has been found in string. If count is positive, it starts from the left of the string. If it's negative, it starts from the right.

A chain of two such functions allows to get the N-th element in a delimiter-separated string, the third one is there only to remove the GET parameter name from the output.

For example, to get param3 value:
The first call makes sure that the third parameter is at the end of the string.

> SUBSTRING_INDEX(string, '&', 3)
http://domain.ext/page?param1=value1&param2=value2&param3=value3

This is where the magic happens. Once the third parameter is at the end of the string, we just have to extract the last part of the string delimited by ampersands ( & ).

> SUBSTRING_INDEX(SUBSTRING_INDEX(string, '&', 3), '&', -1)
param3=value3

This is just the cherry on the cake to extract only the righthand part of param3=value3.

> SUBSTRING_INDEX(SUBSTRING_INDEX(SUBSTRING_INDEX(string, '&', 3), '&', -1), '=', -1)
value3

Astute readers will point out that I could skip the second SUBSTRING_INDEX call and directly take what's right of the last =, but this wouldn't make a good general example on how to handle delimiter-separated strings in MySQL.

Leave a comment

10 Comments

  • I wanted to see if I could do the same with Microsoft SQL Server. Unfortunately there is no equivalent to the behavior of the SUBSTRING_INDEX function. So I tried to write my own query even if I would recommend to create a dedicated function for better readability. This query is using CTE so it is only compatible with SQL Server 2005+.

    DECLARE @T VarChar(256)
    SET @T = 'http://totostaki.fr/page.html?param1=1&param2=22&param3=333&param4=4444'
    
    ; WITH SplitUrlParams
    AS (
      SELECT CAST(NULL AS VarChar(256)) AS Value
       , CHARINDEX('=', REVERSE(@T)) AS ValueDelimiterIdx
       , CHARINDEX('&', REVERSE(@T)) AS ParamDelimiterIdx
       , REVERSE(@T) AS RemainingValue
       , 0 AS InvertedOrder
      UNION ALL
      SELECT REVERSE(SUBSTRING(RemainingValue, 1, ValueDelimiterIdx - 1)) AS Value
       , CHARINDEX('=', SUBSTRING(RemainingValue, ParamDelimiterIdx + 1, LEN(RemainingValue))) AS ValueDelimiterIdx
       , CASE 
          WHEN CHARINDEX('&', SUBSTRING(RemainingValue, ParamDelimiterIdx + 1, LEN(RemainingValue))) > 0 THEN CHARINDEX('&', SUBSTRING(RemainingValue, ParamDelimiterIdx + 1, LEN(RemainingValue)))
          ELSE CHARINDEX('?', SUBSTRING(RemainingValue, ParamDelimiterIdx + 1, LEN(RemainingValue)))
         END
       AS ParamDelimiterIdx
       , SUBSTRING(RemainingValue, ParamDelimiterIdx + 1, LEN(RemainingValue)) AS RemainingValue
       , InvertedOrder + 1
      FROM SplitUrlParams
      WHERE ParamDelimiterIdx > 0
    )
    SELECT Value
    FROM SplitUrlParams
    WHERE InvertedOrder > 0
    ORDER BY InvertedOrder DESC
    • Line 1 : Here I'm using a variable but you could incorporate this query within a query to target columns.
    • Line 2 : This query works even if there is no url parameter or if the value is NULL.
    • Line 4 : I'm using a recursive CTE to process all url parameters.
    • Line 6, 9 : The first SELECT is the recursion initialization. The RemainingValue column is the remaining part of the url to process, on the initial state the url is reversed to be in a situation of value-first.
    • Line 7 : ValueDelimiterIdx gives the first position of the character '=' in RemainingValue.
    • Line 8 : ParamDelimiterIdx gives the first position of the character '&' in RemainingValue.
    • Line 10 : Since the url string is reversed, the first values extracted the last, this index will allow to reorder properly the values. Index 0 also denote the initial row which do not contains any value and must be ignored.
    • Line 14-17 : On the last chunk, the character '&' is not present anymore, in such occasion we need to search for the '?' character index.
    • Line 22 : The recursion is broken when the character '&' and '?' is not found in the remaining chunk. Meaning all parameters have been processed.
    • Line 26 : ignore the initialization row marked with InvertedOrder = 0 as it does not contains any value.
    • Line 27 : reverse the order of the rows to have the values in their original orders.

    Just to help the comprehension here is the raw results given for an url with four parameters :

    ValueValueDelimiterIdxParamDelimiterIdxRemainingValueInvertedOrder
    NULL5124444=4marap&333=3marap&22=2marap&1=1marap?lmth.egap/rf.ikatsotot//:ptth0
    4444411333=3marap&22=2marap&1=1marap?lmth.egap/rf.ikatsotot//:ptth1
    33331022=2marap&1=1marap?lmth.egap/rf.ikatsotot//:ptth2
    22291=1marap?lmth.egap/rf.ikatsotot//:ptth3
    100lmth.egap/rf.ikatsotot//:ptth4
  • Boy am I glad not to have to work with SQL Server. Thanks for trying though!

  • Like I said if you factorize the logic in a function it became way simpler and easier to use.
    As a matter of fact, in a function I would not use the same algorithm because here the challenge is to do the job in one query/instruction while in a function I can use loop and such smile

  • Still, I prefer a one-liner over a SQL function any time of day! ouf

  • I'm not sure what you meant by that since in this case you used a (built-in) function to achieve your goals which SQL Server allows you to do the same once you created your user function.

    Even better you can create functions that will returns a table-like result rather than a scalar value. And if you're really hardcore you can create your function in a .NET DLL (using C# or VB.NET rather than T-SQL) and importing it in SQL Server to create CLR functions. Though I'm not using that because I'm a little concerned performance-wise and the importation requires high privileges that usually are not provided if you're using a shared hosting platform.

  • In that case, I needed a quick way to monitor a table being fed complete URLs. I didn't need to do any processing on those URL, so writing a complete SQL function would have been rather overkill.

  • Experience proved to me that sometimes you should really spend the time to build debug/monitoring tools in order to save times in subsequent debugging/monitoring sessions. Especially if you found yourself repeating a lot of times those actions. smile

  • Wise words I can abide by, however in this particular case it was a one-time task and MySQL proved itself up to the challenge.

  • Fair enough. Still that gave me some kind of challenge to make with SQL Server instead of working on that pain-in-the-ass project that come back every year for Christmas and on which I(we) should have coded some utility functions years ago. To which I, finally, decided to do this year.

Leave a comment

You need to be logged in to comment. Log in now or sign up !