[ACCEPTED]-Random string that matches a regexp-random

Accepted answer
Score: 19

Welp, just musing, but the general question 57 of generating random inputs that match a 56 regex sounds doable to me for a sufficiently 55 relaxed definition of random and a sufficiently 54 tight definition of regex. I'm thinking 53 of the classical formal definition, which 52 allows only ()|* and alphabet characters.

Regular 51 expressions can be mapped to formal machines 50 called finite automata. Such a machine is a directed graph 49 with a particular node called the final 48 state, a node called the initial state, and 47 a letter from the alphabet on each edge. A 46 word is accepted by the regex if it's possible 45 to start at the initial state and traverse 44 one edge labeled with each character through 43 the graph and end at the final state.

One 42 could build the graph, then start at the 41 final state and traverse random edges backwards, keeping 40 track of the path. In a standard construction, every 39 node in the graph is reachable from the 38 initial state, so you do not need to worry 37 about making irrecoverable mistakes and 36 needing to backtrack. If you reach the 35 initial state, stop, and read off the path 34 going forward. That's your match for the 33 regex.

There's no particular guarantee about 32 when or if you'll reach the initial state, though. One 31 would have to figure out in what sense the 30 generated strings are 'random', and in what 29 sense you are hoping for a random element 28 from the language in the first place.

Maybe 27 that's a starting point for thinking about 26 the problem, though!

Now that I've written 25 that out, it seems to me that it might be 24 simpler to repeatedly resolve choices to 23 simplify the regex pattern until you're 22 left with a simple string. Find the first 21 non-alphabet character in the pattern. If 20 it's a *, replicate the preceding item some 19 number of times and remove the *. If it's 18 a |, choose which of the OR'd items to preserve 17 and remove the rest. For a left paren, do 16 the same, but looking at the character following 15 the matching right paren. This is probably 14 easier if you parse the regex into a tree 13 representation first that makes the paren 12 grouping structure easier to work with.

To 11 the person who worried that deciding if 10 a regex actually matches anything is equivalent 9 to the halting problem: Nope, regular languages 8 are quite well behaved. You can tell if 7 any two regexes describe the same set of 6 accepted strings. You basically make the 5 machine above, then follow an algorithm 4 to produce a canonical minimal equivalent 3 machine. Do that for two regexes, then 2 check if the resulting minimal machines 1 are equivalent, which is straightforward.

Score: 19

String::Random in Perl will generate a random string from 1 a subset of regular expressions:


use strict;
use warnings;

use String::Random qw/random_regex/;

print random_regex('[A-Za-z]{3}[0-9][A-Z]{2}[!@#$%^&*]'), "\n";
Score: 6

If you have a specific problem, you probably 14 have a specific regular expression in mind. I 13 would take that regular expression, work 12 out what it means in simple human terms, and 11 work from there.

I suspect it's possible to create 10 a general regex random match generator, but 9 it's likely to be much more work than just handling 8 a specific case - even if that case changes 7 a few times a year.

(Actually, it may not 6 be possible to generate random matches in 5 the most general sense - I have a vague 4 memory that the problem of "does any string 3 match this regex" is the halting problem 2 in disguise. With a very cut-down regex 1 language you may have more luck though.)

Score: 4

I have written Parsley, which consist of a Lexer 5 and a Generator.

  • Lexer is for converting a regular expression-like string into a sequence of tokens.
  • Generator is using these tokens to produce a defined number of codes.
$generator = new \Gajus\Parsley\Generator();

 * Generate a set of random codes based on Parsley pattern.
 * Codes are guaranteed to be unique within the set.
 * @param string $pattern Parsley pattern.
 * @param int $amount Number of codes to generate.
 * @param int $safeguard Number of additional codes generated in case there are duplicates that need to be replaced.
 * @return array
$codes = $generator->generateFromPattern('FOO[A-Z]{10}[0-9]{2}', 100);

The above example will generate 4 an array containing 100 codes, each prefixed 3 with "FOO", followed by 10 characters 2 from "ABCDEFGHKMNOPRSTUVWXYZ23456789" haystack 1 and 2 numbers from "0123456789" haystack.

Score: 3

This PHP library looks promising: ReverseRegex

Like all 3 of these, it only handles a subset of regular 2 expressions but it can do fairly complex 1 stuff like UK Postcodes:

([A-PR-UWYZ]([0-9]([0-9]|[A-HJKSTUW])?|[A-HK-Y][0-9]([0-9]|[ABEHMNPRVWXY])?) ?[0-9][ABD-HJLNP-UW-Z]{2}|GIR0AA)


B6 6SB
P9 7EX
N9 2DH
GQ28 4UL
Score: 1

You'd need to write a string generator that 9 can parse regular expressions and generate 8 random members of character ranges for random 7 lengths, etc.

Much easier would be to write 6 a random password generator with certain 5 rules (starts with a lower case letter, has 4 at least one punctuation, capital letter 3 and number, at least 6 characters, etc) and 2 then write your regex so that any passwords 1 created with said rules are valid.

Score: 0

Presuming you have both a minimum length 6 and 3-of-4* (or similar) requirement, I'd 5 just be inclined to use a decent password 4 generator.

I've built a couple in the past 3 (both web-based and command-line), and have 2 never had to skip more than one generated 1 string to pass the 3-of-4 rule.

  • 3-of-4: must have at least three of the following characteristics: lowercase, uppercase, number, symbol
Score: 0

It is possible (for example, Haskell regexp 6 module has a test suite which automatically 5 generates strings that ought to match certain 4 regexes).

However, for a simple task at hand 3 you might be better off taking a simple 2 password generator and filtering its output 1 with your regexp.

More Related questions