10 0 MySQL Tip: Find the N-th element in a delimiter-separated string

Hypolite Petovan
more than 11 years ago
Public

I recently had to monitor a MySQL field containing rather long URI strings with 7 GET parameters, and after some research I found a way to extract each parameter in separate SELECT fields, thanks to a somewhat unrelated Stack Overflow answer.

The Solution

Input string

http://domain.ext/page?param1=value1&param2=value2&param3=value3&param4=value4...

Intended result

param1	param2	param3	param4	...
value1	value2	value3	value4	...

Final query

SELECT
    SUBSTRING_INDEX(SUBSTRING_INDEX(SUBSTRING_INDEX(string, '&', 1), '&', -1), '=', -1) AS param1,
    SUBSTRING_INDEX(SUBSTRING_INDEX(SUBSTRING_INDEX(string, '&', 2), '&', -1), '=', -1) AS param2,
    SUBSTRING_INDEX(SUBSTRING_INDEX(SUBSTRING_INDEX(string, '&', 3), '&', -1), '=', -1) AS param3,
    SUBSTRING_INDEX(SUBSTRING_INDEX(SUBSTRING_INDEX(string, '&', 4), '&', -1), '=', -1) AS param4,
    ...

Breakdown

MySQL natively offers very few functions to manage delimiter-separated strings. There's no split function that would effectively transform a string in an array that we would also have trouble to handle in a query. This is expected behavior from a database software whose very job is to store lists of values.

Fortunately, the SUBSTRING_INDEX(string, delimiter, count) allows to manipulate a delimiter-separated string. In essence, it returns a substring of string until the count-th delimiter has been found in string. If count is positive, it starts from the left of the string. If it's negative, it starts from the right.

A chain of two such functions allows to get the N-th element in a delimiter-separated string, the third one is there only to remove the GET parameter name from the output.

For example, to get param3 value:
The first call makes sure that the third parameter is at the end of the string.

> SUBSTRING_INDEX(string, '&', 3)
http://domain.ext/page?param1=value1&param2=value2&param3=value3

This is where the magic happens. Once the third parameter is at the end of the string, we just have to extract the last part of the string delimited by ampersands ( & ).

> SUBSTRING_INDEX(SUBSTRING_INDEX(string, '&', 3), '&', -1)
param3=value3

This is just the cherry on the cake to extract only the righthand part of param3=value3.

> SUBSTRING_INDEX(SUBSTRING_INDEX(SUBSTRING_INDEX(string, '&', 3), '&', -1), '=', -1)
value3

Astute readers will point out that I could skip the second SUBSTRING_INDEX call and directly take what's right of the last =, but this wouldn't make a good general example on how to handle delimiter-separated strings in MySQL.

10 Comments

Aka Guymelef more than 11 years ago *

I wanted to see if I could do the same with Microsoft SQL Server. Unfortunately there is no equivalent to the behavior of the SUBSTRING_INDEX function. So I tried to write my own query even if I would recommend to create a dedicated function for better readability. This query is using CTE so it is only compatible with SQL Server 2005+.

DECLARE @T VarChar(256)
SET @T = 'http://totostaki.fr/page.html?param1=1&param2=22&param3=333&param4=4444'

; WITH SplitUrlParams
AS (
  SELECT CAST(NULL AS VarChar(256)) AS Value
   , CHARINDEX('=', REVERSE(@T)) AS ValueDelimiterIdx
   , CHARINDEX('&', REVERSE(@T)) AS ParamDelimiterIdx
   , REVERSE(@T) AS RemainingValue
   , 0 AS InvertedOrder
  UNION ALL
  SELECT REVERSE(SUBSTRING(RemainingValue, 1, ValueDelimiterIdx - 1)) AS Value
   , CHARINDEX('=', SUBSTRING(RemainingValue, ParamDelimiterIdx + 1, LEN(RemainingValue))) AS ValueDelimiterIdx
   , CASE 
      WHEN CHARINDEX('&', SUBSTRING(RemainingValue, ParamDelimiterIdx + 1, LEN(RemainingValue))) > 0 THEN CHARINDEX('&', SUBSTRING(RemainingValue, ParamDelimiterIdx + 1, LEN(RemainingValue)))
      ELSE CHARINDEX('?', SUBSTRING(RemainingValue, ParamDelimiterIdx + 1, LEN(RemainingValue)))
     END
   AS ParamDelimiterIdx
   , SUBSTRING(RemainingValue, ParamDelimiterIdx + 1, LEN(RemainingValue)) AS RemainingValue
   , InvertedOrder + 1
  FROM SplitUrlParams
  WHERE ParamDelimiterIdx > 0
)
SELECT Value
FROM SplitUrlParams
WHERE InvertedOrder > 0
ORDER BY InvertedOrder DESC

Line 1 : Here I'm using a variable but you could incorporate this query within a query to target columns.
Line 2 : This query works even if there is no url parameter or if the value is NULL.
Line 4 : I'm using a recursive CTE to process all url parameters.
Line 6, 9 : The first SELECT is the recursion initialization. The RemainingValue column is the remaining part of the url to process, on the initial state the url is reversed to be in a situation of value-first.
Line 7 : ValueDelimiterIdx gives the first position of the character '=' in RemainingValue.
Line 8 : ParamDelimiterIdx gives the first position of the character '&' in RemainingValue.
Line 10 : Since the url string is reversed, the first values extracted the last, this index will allow to reorder properly the values. Index 0 also denote the initial row which do not contains any value and must be ignored.
Line 14-17 : On the last chunk, the character '&' is not present anymore, in such occasion we need to search for the '?' character index.
Line 22 : The recursion is broken when the character '&' and '?' is not found in the remaining chunk. Meaning all parameters have been processed.
Line 26 : ignore the initialization row marked with InvertedOrder = 0 as it does not contains any value.
Line 27 : reverse the order of the rows to have the values in their original orders.

Just to help the comprehension here is the raw results given for an url with four parameters :

Value	ValueDelimiterIdx	ParamDelimiterIdx	RemainingValue	InvertedOrder
NULL	5	12	`4444=4marap&333=3marap&22=2marap&1=1marap?lmth.egap/rf.ikatsotot//:ptth`	0
4444	4	11	`333=3marap&22=2marap&1=1marap?lmth.egap/rf.ikatsotot//:ptth`	1
333	3	10	`22=2marap&1=1marap?lmth.egap/rf.ikatsotot//:ptth`	2
22	2	9	`1=1marap?lmth.egap/rf.ikatsotot//:ptth`	3
1	0	0	`lmth.egap/rf.ikatsotot//:ptth`	4

Hypolite Petovan more than 11 years ago *
#2

Boy am I glad not to have to work with SQL Server. Thanks for trying though!
Aka Guymelef more than 11 years ago *
#3

Like I said if you factorize the logic in a function it became way simpler and easier to use.
As a matter of fact, in a function I would not use the same algorithm because here the challenge is to do the job in one query/instruction while in a function I can use loop and such
Hypolite Petovan more than 11 years ago *
#4

Still, I prefer a one-liner over a SQL function any time of day!
Aka Guymelef more than 11 years ago *
#5

I'm not sure what you meant by that since in this case you used a (built-in) function to achieve your goals which SQL Server allows you to do the same once you created your user function.

Even better you can create functions that will returns a table-like result rather than a scalar value. And if you're really hardcore you can create your function in a .NET DLL (using C# or VB.NET rather than T-SQL) and importing it in SQL Server to create CLR functions. Though I'm not using that because I'm a little concerned performance-wise and the importation requires high privileges that usually are not provided if you're using a shared hosting platform.
Hypolite Petovan more than 11 years ago *
#6

In that case, I needed a quick way to monitor a table being fed complete URLs. I didn't need to do any processing on those URL, so writing a complete SQL function would have been rather overkill.
Aka Guymelef more than 11 years ago *
#7

Experience proved to me that sometimes you should really spend the time to build debug/monitoring tools in order to save times in subsequent debugging/monitoring sessions. Especially if you found yourself repeating a lot of times those actions.
Hypolite Petovan more than 11 years ago *
#8

Wise words I can abide by, however in this particular case it was a one-time task and MySQL proved itself up to the challenge.
Aka Guymelef more than 11 years ago *
#9

Fair enough. Still that gave me some kind of challenge to make with SQL Server instead of working on that pain-in-the-ass project that come back every year for Christmas and on which I(we) should have coded some utility functions years ago. To which I, finally, decided to do this year.
Hypolite Petovan more than 11 years ago *
#10

Good for you! \o/

You need to be logged in to comment. Log in now or sign up !

10 0 MySQL Tip: Find the N-th element in a delimiter-separated string

The Solution

Input string

Intended result

Final query

Breakdown

10 Comments

Leave a comment

by Hypolite Petovan

Tags

10 0 MySQL Tip: Find the N-th element in a delimiter-separated string

The Solution

Input string

Intended result

Final query

Breakdown

10 Comments

Leave a comment

by Hypolite Petovan

Tags

More like this

Debunking the template language hype

5 good reasons Dell Alienware is more expensive than ASUS Republic of Gamers