Regular expression to match common SQL syntax?
Regular expressions can match only the languages a finite state automaton can parse, which is very limited, whereas SQL has a full, recursive syntax. It can be demonstrated that you can't validate SQL with a regex. So, you can stop trying.
SQL is a type-2 (context-free) grammar; it is too powerful to be described by regular expressions. It's the same as if you decided to generate C# code and then validate it without invoking a compiler. A database engine is, in general, too complex to be easily stubbed.
That said, you may try ANTLR's SQL grammars.
As far as I know this is beyond regex, and you're getting close to the dark arts of BNF and compilers.
The same thing happens to people who want to do correct syntax highlighting. You start cramming things into regex and then you end up writing a compiler...
I had the same problem. An approach that would work for most standard SQL statements would be to spin up an in-memory SQLite database and issue the query against it; if you get back a "table does not exist" error, then your query parsed properly.
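A minimal sketch of that idea in Python, using the standard sqlite3 module. The function name parses_in_sqlite is made up for illustration, and the check relies on SQLite wording its errors as "no such table: ..." and "no such column: ..." rather than "table does not exist". It only tells you whether SQLite's dialect accepts the statement, not whether your target database would.

```python
import sqlite3


def parses_in_sqlite(sql):
    """Return True if SQLite parsed the statement (hypothetical helper).

    A "no such table" / "no such column" error means the parser accepted
    the statement and only name resolution failed.
    """
    conn = sqlite3.connect(":memory:")  # empty throwaway database
    try:
        conn.execute(sql)
    except sqlite3.OperationalError as exc:
        message = str(exc)
        return message.startswith(("no such table", "no such column"))
    except (sqlite3.Error, sqlite3.Warning):
        # Anything else (e.g. several statements in one string) is rejected.
        return False
    finally:
        conn.close()
    return True


if __name__ == "__main__":
    for query in ("select * from non_existant_table",
                  "select * frm non_existant_table"):
        print(query, "->", parses_in_sqlite(query))
```

Matching on message text is brittle, so treat the prefixes as an assumption to re-check against the SQLite version you actually ship.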
Off the top of my head: couldn't you pass the generated SQL to a database, use EXPLAIN on it, and catch any exceptions, which would indicate poorly formed SQL?
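Here is roughly what the EXPLAIN route could look like, sketched with SQLite because it ships with Python. The users table and the explain_ok helper are invented for the example. In SQLite, prefixing a statement with EXPLAIN compiles it and returns the bytecode listing instead of running it, so an INSERT checked this way should not change any data. Other engines treat EXPLAIN differently, so check yours.

```python
import sqlite3


def explain_ok(conn, sql):
    """Return True if the engine can compile `sql` (hypothetical helper)."""
    try:
        # EXPLAIN makes SQLite prepare the statement without executing it.
        conn.execute("EXPLAIN " + sql).fetchall()
    except sqlite3.Error:
        return False
    return True


# Assumed schema, only here so that name resolution has something to check.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

if __name__ == "__main__":
    print(explain_ok(conn, "INSERT INTO users (name) VALUES ('ann')"))
    print(explain_ok(conn, "INSERT INTO users (name VALUES ('ann')"))
    print(conn.execute("SELECT COUNT(*) FROM users").fetchone()[0])
```

Because the schema exists here, this also catches wrong column names, which a pure syntax check would let through.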
Have you tried lazy (non-greedy) quantifiers? Rather than match as much as possible, they match as little as possible, which is probably what you need for quotes.
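To make the greedy/lazy difference concrete, here is a small Python sketch. The function names are just for illustration. It also shows where the lazy version still falls over: SQL escapes a quote inside a string by doubling it, and a lazy match splits such a literal in two. The third pattern handles that one case, but it is still not a SQL parser.

```python
import re


def greedy_literals(sql):
    # '.*' grabs everything from the first quote to the last one.
    return re.findall(r"'.*'", sql)


def lazy_literals(sql):
    # '.*?' stops at the nearest closing quote.
    return re.findall(r"'.*?'", sql)


def doubled_quote_literals(sql):
    # Either a non-quote character or an escaped '' pair inside the literal.
    return re.findall(r"'(?:[^']|'')*'", sql)


if __name__ == "__main__":
    sql = "SELECT 'a', 'b' FROM t"
    print(greedy_literals(sql))   # ["'a', 'b'"]
    print(lazy_literals(sql))     # ["'a'", "'b'"]

    tricky = "SELECT 'it''s'"
    print(lazy_literals(tricky))           # ["'it'", "'s'"]
    print(doubled_quote_literals(tricky))  # ["'it''s'"]
```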
To validate the queries, just run them with SET NOEXEC ON; that is how Enterprise Manager does it when you parse a query without executing it.
Besides, if you are using regex to validate SQL queries, you can be almost certain that you will miss some corner cases, or that the query is not valid for other reasons, even if it's syntactically correct.
I suggest creating a database with the same schema, possibly using an embedded SQL engine, and passing the SQL to that.
I don't think that you even need to have the schema created to be able to validate the statement, because the system will not try to resolve object names etc. until it has successfully parsed the statement.
With Oracle as an example, you would certainly get an error if you did:
select * from non_existant_table;
In this case, "ORA-00942: table or view does not exist".
However if you execute:
select * frm non_existant_table;
Then you'll get a syntax error, "ORA-00923: FROM keyword not found where expected".
It ought to be possible to classify errors into syntax parsing errors that indicate incorrect syntax and errors relating to table names, permissions, etc.
Add to that the problem of different RDBMSs and even different versions allowing different syntaxes, and I think you really have to go to the DB engine for this task.
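The classification idea above could be sketched like this in Python. It works purely on the error message text, so it does not need an Oracle connection. The function name and the two code sets are assumptions for illustration; they contain only the two error codes quoted in this answer, and a real table would have to be filled in from the Oracle error reference.

```python
import re

# Only the two codes quoted above; extend from the Oracle documentation.
SYNTAX_CODES = {"ORA-00923"}  # FROM keyword not found where expected
OBJECT_CODES = {"ORA-00942"}  # table or view does not exist


def classify_oracle_error(message):
    """Map an Oracle error message to 'syntax', 'object' or 'unknown'."""
    match = re.search(r"ORA-\d{5}", message)
    if match is None:
        return "unknown"
    code = match.group(0)
    if code in SYNTAX_CODES:
        return "syntax"
    if code in OBJECT_CODES:
        return "object"
    return "unknown"


if __name__ == "__main__":
    print(classify_oracle_error(
        "ORA-00923: FROM keyword not found where expected"))
    print(classify_oracle_error(
        "ORA-00942: table or view does not exist"))
```

A test could then treat "object" as "parsed fine, names unresolved" and fail only on "syntax".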
There are ANTLR grammars to parse SQL, but it's really a better idea to use an in-memory database or a very lightweight database such as SQLite. It seems wasteful to me to test whether the SQL is valid from a parsing standpoint, and much more useful to check the table and column names and the specifics of your query.
The best way is to validate the parameters used to create the query, rather than the query itself. A function that receives the variables can check the length of the strings, valid numbers, valid emails or whatever. You can use regular expressions to do these validations.
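A rough sketch of that kind of parameter check in Python. The field names, the length limit and the deliberately loose email pattern are all assumptions picked for the example, not rules from any particular application.

```python
import re

# Deliberately loose: something@something.something, no whitespace.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")
AGE_RE = re.compile(r"\d{1,3}")
MAX_NAME_LENGTH = 50  # assumed limit, e.g. the column width


def validate_user_params(name, age, email):
    """Return a list of problems; an empty list means the inputs look sane."""
    errors = []
    if not 1 <= len(name) <= MAX_NAME_LENGTH:
        errors.append("name must be 1 to 50 characters")
    if AGE_RE.fullmatch(age) is None:
        errors.append("age must be a number with at most 3 digits")
    if EMAIL_RE.fullmatch(email) is None:
        errors.append("email does not look like an address")
    return errors


if __name__ == "__main__":
    print(validate_user_params("Ann", "30", "ann@example.com"))
    print(validate_user_params("", "abc", "nope"))
```

Validated values are still best handed to the database as bound parameters rather than pasted into the SQL string.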