首页 > Sql-Mysql > SQL Server 中使用正则表达式Regular

SQL Server 中使用正则表达式Regular

来源: http://www.codeproject.com/KB/database/xp_pcre.aspx

An Extended Stored Procedure to use regular expressions in T-SQL.
一个使用正则表达式的存储过程.

存储过程下载地址:

Download source and DLL – 219 Kb

Sql Server 不如Oracle,自带正则表达式,不过还是有办法实现的。使用这个SP就可以啦。

使用方法:

1.复制 xp_pcre.dll Program FilesMicrosoft SQL ServerMSSQLbinn 目录下;

2.运行安装脚本 INSTALL.SQL;

3.然后就可以使用啦,主要有以下6个SP:

  • xp_pcre_match
  • xp_pcre_match_count
  • xp_pcre_replace
  • xp_pcre_format
  • xp_pcre_split
  • xp_pcre_show_cache

————————————————————–

Introduction

xp_pcre is a follow-up to my extended stored procedure xp_regex. Both allow you to use regular expressions in T-SQL on Microsoft SQL Server 2000. This version was written because xp_regex uses the .NET Framework, which many people were reluctant to install on their SQL Servers. (It turns out they were the smart ones: although installing the .NET Framework on a SQL Server is no cause for concern, I’ve been informed by several people that hosting the CLR inside the SQL Server process is a Bad Idea�. Please see the warnings on the xp_regex page for more information.)

xp_pcre is so named because it uses the “Perl Compatible Regular Expressions” library. This library is available at www.pcre.org. (You don’t need to download the PCRE library in order to use xp_pcre. The library is statically linked.)

Overview

There are six extended stored procedures in the DLL:

  • xp_pcre_match
  • xp_pcre_match_count
  • xp_pcre_replace
  • xp_pcre_format
  • xp_pcre_split
  • xp_pcre_show_cache

The parameters of all of these procedures can be CHAR, VARCHAR or TEXT of any SQL Server-supported length. The only exception is the @column_number parameter of xp_pcre_split, which is an INT.

If any required parameters are NULL, no matching will be performed and the output parameter will be set to NULL. (Note: This is different than the previous version which left the parameters unchanged.)

1. xp_pcre_match

Syntax:

Collapse
EXEC master.dbo.xp_pcre_match @input, @regex, @result OUTPUT
  • @input is the text to check.
  • @regex is the regular expression.
  • @result is an output parameter that will hold either ‘0’, ‘1’ or NULL.

xp_pcre_match checks to see if the input matches the regular expression. If so, @result will be set to ‘1’. If not, @result is set to ‘0’. If either @input or @regex is NULL, or an exception occurs, @result will be set to NULL.

For example, this will determine whether the input string contains at least two consecutive digits:

Collapse
DECLARE @out CHAR(1)
EXEC master.dbo.xp_pcre_match ‘abc123xyz’, ‘d{2,}’, @out OUTPUT
PRINT @out

prints out:

Collapse
1

This one will determine whether the input string is entirely comprised of at least two consecutive digits:

Collapse 
DECLARE @out CHAR(1)
EXEC master.dbo.xp_pcre_match 'abc123xyz', '^d{2,}$', @out OUTPUT
PRINT @out

prints out:

Collapse
0

2. xp_pcre_match_count

Syntax:

Collapse
EXEC master.dbo.xp_pcre_match_count @input, @regex, @result OUTPUT
  • @input is the text to check.
  • @regex is the regular expression.
  • @result is an output parameter that will hold the number of times the regular expression matched the input string (or NULL in the case of NULL inputs and/or an invalid regex).

xp_pcre_match_count tells you how many non-overlapping matches were found in the input string. The reason for making this a separate procedure than xp_pcre_match is for efficiency. In xp_pcre_match, as soon as there is one match, the procedure can return. xp_pcre_match_count needs to continually attempt a match until it reaches the end of the input string.

For example, this will determine how many times a separate series of numbers (of any length) appears in the input:

Collapse
DECLARE @out VARCHAR(20)
EXEC master.dbo.xp_pcre_match_count ‘123abc4567xyz’, ‘d+’, @out OUTPUT
PRINT @out

prints out:

Collapse
2

3. xp_pcre_replace

Syntax:

Collapse
EXEC master.dbo.xp_pcre_replace @input, @regex, @replacement, @result OUTPUT
  • @input is the text to parse.
  • @regex is the regular expression.
  • @replacement is what each match will be replaced with.
  • @result is an output parameter that will hold the result.

xp_pcre_replace is a search-and-replace function. All matches will be replaced with the contents of the @replacement parameter.

For example, this is how you would remove all white space from an input string:

Collapse
DECLARE @out VARCHAR(8000)
EXEC master.dbo.xp_pcre_replace ‘one  two    three four ‘, ‘s+’, ”, @out OUTPUT
PRINT ‘[‘ + @out + ‘]’

prints out:

Collapse
[onetwothreefour]

To replace all numbers (regardless of length) with “###”:

Collapse
DECLARE @out VARCHAR(8000)
EXEC master.dbo.xp_pcre_replace
   '12345 is less than 99999, but not 1, 12, or 123',
   'd+',
   '###',
   @out OUTPUT

PRINT @out

prints out:

Collapse
### is less than ###, but not ###, ###, or ###

Capturing parentheses is also supported. You can then use the captured text in your replacement string by using the variables $1, $2, $3, etc. For example:

Collapse
DECLARE @out VARCHAR(8000)
EXEC master.dbo.xp_pcre_replace
   ‘one two three four five six seven’,
   ‘(w+) (w+)’,
   ‘$2 $1,’,
   @out OUTPUT

PRINT @out

prints out:

Collapse
two one, four three, six five, seven

If you need to include a literal $ in your replacement string, escape it with a . Also, if your replacement variable needs to be followed immediately by a digit, you’ll need to put the variable number in braces. ${1}00 would result in the first capture followed by the literal characters 00. For example:

Collapse
DECLARE @out VARCHAR(8000)
EXEC master.dbo.xp_pcre_replace
   '75, 85, 95',
   '(d+)',
   '$${1}00',
   @out OUTPUT

PRINT @out

prints out:

Collapse
$7500, $8500, $9500

4. xp_pcre_format

Syntax:

Collapse
EXEC master.dbo.xp_pcre_format @input, @regex, @format, @result OUTPUT
  • @input is the text to match.
  • @regex is the regular expression.
  • @format is the format string.
  • @result is an output parameter that will hold the result.

xp_pcre_format behaves exactly like Regex.Result() in .NET or string interpolation in Perl (i.e., $formatted_phone_number = "($1) $2-$3")

For example, the regex (d{3})[^d]*(d{3})[^d]*(d{4}) will parse just about any US-phone-number-like string you throw at it:

Collapse
DECLARE @out VARCHAR(100)

DECLARE @regex VARCHAR(50)
SET @regex = '(d{3})[^d]*(d{3})[^d]*(d{4})'

DECLARE @format VARCHAR(50)
SET @format = '($1) $2-$3'

EXEC master.dbo.xp_pcre_format
   '(310)555-1212',
   @regex,
   @format,
   @out OUTPUT
PRINT @out

EXEC master.dbo.xp_pcre_format
   '310.555.1212',
   @regex,
   @format,
   @out OUTPUT
PRINT @out

EXEC master.dbo.xp_pcre_format
   ' 310!555 hey! 1212 hey!',
   @regex,
   @format,
   @out OUTPUT
PRINT @out

EXEC master.dbo.xp_pcre_format
   ' hello, ( 310 ) 555.1212 is my phone number. Thank you.',
   @regex,
   @format,
   @out OUTPUT
PRINT @out

prints out:

Collapse
(310) 555-1212
(310) 555-1212
(310) 555-1212
(310) 555-1212

The capturing and escaping conventions are the same as with xp_pcre_replace.

5. xp_pcre_split

Syntax:

Collapse
EXEC master.dbo.xp_pcre_split @input, @regex, @column_number, @result OUTPUT
  • @input is the text to parse.
  • @regex is a regular expression that matches the delimiter.
  • @column_number indicates which column to return.
  • @result is an output parameter that will hold the formatted results.

Column numbers start at 1. An error will be raised if @column_number is less than 1. In the event that @column_number is greater than the number of columns that resulted from the split, @result will be set to NULL.

This function splits text data on some sort of delimiter (comma, pipe, whatever). The cool thing about a split using regular expressions is that the delimiter does not have to be as consistent as you would normally expect.

For example, take this line as your source data:

Collapse
one ,two|three : four

In this case, our delimiter is either a comma, pipe or colon with any number of spaces either before or after (or both). In regex form, that is written: s*[,|:]s*.

For example:

Collapse
DECLARE @out VARCHAR(8000)

DECLARE @input VARCHAR(50)
SET @input = 'one  ,two|three  : four'

DECLARE @regex VARCHAR(50)
SET @regex = 's*[,|:]s*'

EXEC master.dbo.xp_pcre_split @input, @regex, 1, @out OUTPUT
PRINT @out

EXEC master.dbo.xp_pcre_split @input, @regex, 2, @out OUTPUT
PRINT @out

EXEC master.dbo.xp_pcre_split @input, @regex, 3, @out OUTPUT
PRINT @out

EXEC master.dbo.xp_pcre_split @input, @regex, 4, @out OUTPUT
PRINT @out

prints out:

Collapse
one
two
three
four

6. xp_pcre_show_cache

Syntax:

Collapse
EXEC master.dbo.xp_pcre_show_cache

In order to prevent repeated regex recompilation, xp_pcre keeps a cache of the last 50 regular expressions it has processed. (Look at the bottom of RegexCache.h to change this hard-coded value.) xp_pcre_show_cache returns a result set containing all of the regular expressions currently in the cache. There’s really no need to use it in the course of normal operations, but I found it useful during development. (I figured I would leave it in since it may be helpful for anyone who is looking at this to learn more about extended stored procedure programming.)

7. fn_pcre_match, fn_pcre_match_count, fn_pcre_replace, fn_pcre_format and fn_pcre_split

These are user-defined functions that wrap the stored procedures. This way you can use the function as part of a SELECT list, a WHERE clause, or anywhere else you can use an expression (like CHECK constraints!). To me, using the UDFs is a much more natural way to use this library.

Collapse
USE pubs
GO

SELECT master.dbo.fn_pcre_format(
   phone,
   '(d{3})[^d]*(d{3})[^d]*(d{4})',
   '($1) $2-$3'
   ) as formatted_phone
FROM
   authors

This would format every phone number in the “authors” table.

Please note, you’ll either need to create the UDFs in every database that you use them in or remember to always refer to them using their fully-qualified names (i.e., master.dbo.fn_pcre_format). Alternatively, you can follow bmoore86‘s advice at the bottom of this page in his post entitled “You don’t have to put the functions in every database”.

Also note that user-defined functions in SQL Server are not very robust when it comes to error handling. If xp_pcre returns an error, the UDF will suppress it and will return NULL. If you are using the UDFs and are getting NULLs in unexpected situations, try running the underlying stored procedure. If xp_pcre is returning an error, you’ll be able to see it.

8. Installation

  1. Copy xp_pcre.dll into your Program FilesMicrosoft SQL ServerMSSQLbinn directory.
  2. Run the SQL script INSTALL.SQL. This will register the procedures and create the user-defined functions in the master database.
  3. If you’d like to run some basic sanity checks/assertions, run TESTS.SQL and ERROR_TESTS.SQL. These scripts also serve to document the behavior of the procedures in cases of invalid input.

9. Unicode support

Unfortunately, this version does not support Unicode arguments. Potential solutions include:

  1. Use xp_regex. Internally, the CLR and .NET Framework are 100% Unicode. This option is not recommended, however, due to the potential problems with hosting the CLR inside the SQL Server process.
  2. Use the Boost Regex++ library. Unfortunately, this means giving up a lot of the newer regular expression functionality (zero-width assertions, cloistered pattern modifiers, etc.).
  3. Have xp_pcre convert to UTF-8, which is supported by PCRE. Since I don’t use Unicode data in SQL Server, I haven’t implemented it. We’ll leave this as the dreaded “exercise for the reader”. 🙂
  4. Use CAST, CONVERT or implicit conversions in the UDFs to coerce the arguments to ASCII. This probably won’t work for you because the reason you’re using NVARCHAR/NTEXT columns in the first place is because your data cannot be represented using ASCII.

10. Misc

To build the code, you’ll need to have the Boost libraries installed. You can download them from www.boost.org. Just change the “Additional Include Directories” entry under the project properties in VS.NET. It’s under Configuration Properties | C/C++ | General.

Comments/corrections/additions are welcome. Feel free to email me…you can find my email address in the header of any of the source files. Thanks!

11. History

  • 16 Mar 05 (v. 1.3.1):
    • Added xp_pcre_match_count and fn_pcre_match_count.
  • 20 Feb 05 (v. 1.3):
    • All PCRE++ code was removed and rewritten from scratch. It wasn’t thread safe and was too inefficient (in my opinion) when doing splitting and replacing. This should hopefully improve concurrency (since I no longer have to do any locking on the PCRE objects). Also, since I started from scratch, I was able to make the behavior of splitting and replacing/formatting much closer to what Perl produces (especially in cases when there is a zero-width match.)
    • Added xp_pcre_format.
    • Parameter validation and error handling have been improved.
    • Updated TESTS.sql and added ERROR_TESTS.sql.
  • 14 Feb 05 (v. 1.2):
    • Fixed the issue where splitting on a regex that matched a zero-width string (i.e., ‘s*’) would cause xp_pcre to loop infinitely.
    • Error conditions will now cause the output parameter to be set to NULL. The old version left the value unchanged.
    • Matching using the pcrepp::Pcre objects are now protected by a CRITICAL_SECTION. Although PCRE++ objects can be reused, they don’t appear thread-safe. If anyone feels this is adversely affecting scalability, please let me know. We can probably modify the cache to allow multiple instances of the same regular expression.
    • Created TESTS.sql as a way to document/verify expected results in both normal and error conditions.
    • This version statically links against PCRE 5. The previous version used PCRE 4.3 DLL. I built both a Debug (pcre5_d.lib) and a Release (pcre5.lib) version of PCRE. xp_pcre will link against the appropriate version when it is built.
    • Parameter data types and sizes are checked both more rigorously and proactively. Previously, I just waited for a generic failure error when trying to read or write parameter values.
    • If the output parameter cannot hold the entire result, an error message will be returned to SQL Server indicating how large the variable is required to be. The value of the parameter will be set to NULL.
    • catch(...) handlers have been added where applicable to prevent an unhandled exception from propagating back to SQL Server.
  • 6 Oct 03 – Updated ZIP to include xp_pcre.dll. Mentioned the Boost requirement in the Misc section. Cleaned up the documentation a bit.
  • 10 Aug 03 – Initial release.
  1. polar f4 12月 8th, 2010 @ 22:54 | #1

    Hello, I discovered your blog in a new directory of blogs. I dont know how your blog came up. Your weblog looks good. Have a good day.

  2. online store 12月 11th, 2010 @ 04:59 | #2

    I usually do not understand until the end of what on earth is written here, because I have no idea English so well. You may write it more plainly?

  3. T Shirt Template 12月 15th, 2010 @ 03:20 | #3

    this is such a useful post of hints that I’ve signed up for a subscription. The comments I hate most on my weblog are the ones by people who don’t anything more than thanks, but your are unique, thank you!

  4. Michael jackson beat it 12月 20th, 2010 @ 01:36 | #4

    thanks! seems faster at first look, hopefully it continues to function like this.

  5. Tagged home 12月 25th, 2010 @ 01:50 | #5

    Thank you for giving us the opportunity to see how you work and learn so much!

  6. Samsung Soul 12月 26th, 2010 @ 06:22 | #6

    I just cant stop reading this. Its so cool, so full of information that I just didnt know. Im glad to see that people are actually writing about this issue in such a smart way, showing us all different sides to it. Youre a great blogger. Please keep it up. I cant wait to read whats next.

  7. Rock and roll hall of fame 12月 27th, 2010 @ 14:16 | #7

    I’ve bookmarked this because I found it interesting. I would be very interested to hear more news on this. Thanks!

  8. Bucknell University 12月 28th, 2010 @ 16:42 | #8

    I very adore your own posting choice, very remarkable.

  9. Baby looney tunes 12月 29th, 2010 @ 06:02 | #9

    I usually don’t post in Blogs but your blog forced me to, amazing work.. beautiful …

  10. Bucknell University 12月 29th, 2010 @ 08:21 | #10

    This is amazing, I don’t know what else to say. Perfect execution of a cool idea!

  11. Braun rasierer 12月 29th, 2010 @ 21:44 | #11

    Lot of thanx. This templates are very useful in everyday work

  12. Ipl live score 12月 31st, 2010 @ 17:51 | #12

    Excellent article, a great deal of valuable information.

  13. Christmas Songs Lyrics 1月 1st, 2011 @ 02:24 | #13

    I love it! Could perhaps be a tad more polished, but it’s far better than what we use at the moment, nonetheless

  14. Diario Extra 1月 1st, 2011 @ 18:48 | #14

    Hey, I attempted to email you about this article that i’ve a few inquires, but can’t seem to reach you. Please email me when have a minute. Thanks.

  15. Google sketchup 1月 3rd, 2011 @ 00:57 | #15

    very use full information. thank you.

  16. Denon Receivers 1月 4th, 2011 @ 13:16 | #16

    Someone I work with visits your blog frequently and recommended it to me to read too. The writing style is great and the content is top-notch. Thanks for the insight you provide the readers!

评论提交中, 请稍候...

留言

可以使用的标签: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>
Trackbacks & Pingbacks ( 0 )
  1. 还没有 trackbacks