[ACCEPTED]-How do Markov Chain Chatbots work?-markov-chains

Accepted answer
Score: 140

I made a Markov chain chatbot for IRC in 71 Python a few years back and can shed some 70 light how I did it. The text generated does 69 not necessarily make any sense, but it can 68 be really fun to read. Lets break it down 67 in steps. Assuming you have a fixed input, a 66 text file, (you can use input from chat 65 text or lyrics or just use your imagination)

Loop 64 through the text and make a Dictionary, meaning 63 key - value container. And put all pair 62 of words as keys and the word following 61 as a value. For example: If you have a 60 text "a b c a b k" you start with "a b" as 59 key and "c" as value, then "b c" and "a" as 58 value... the value should be a list or any 57 collection holding 0..many 'items' as you 56 can have more than one value for a given 55 pair of words. In the example above you 54 will have "a b" two times followed fist 53 by "c" then in the end by "k". So in the 52 end you will have a dictionary/hash looking 51 like this: {'a b': ['c','k'], 'b c': ['a'], 'c a': ['b']}

Now you have the needed structure 50 for building your funky text. You can choose 49 to start with a random key or a fixed place! So 48 given the structure we have we can start 47 by saving "a b" then randomly taking a following 46 word from the value, c or k, so the first 45 save in the loop, "a b k" (if "k" was the 44 random value chosen) then you continue by 43 moving one step to the right which in our 42 case is "b k" and save a random value for 41 that pair if you have, in our case no, so 40 you break out of the loop (or you can decide 39 other stuff like start over again). When 38 to loop is done you print your saved text 37 string.

The bigger the input, the more values 36 you will have for you keys (pair of words) and 35 will then have a "smarter bot" so you can 34 "train" your bot by adding more text (perhaps 33 chat input?). If you have a book as input, you 32 can construct some nice random sentences. Please 31 note that you don't have to take only one 30 word that follows a pair as a value, you 29 can take 2 or 10. The difference is that 28 your text will appear more accurate if you 27 use "longer" building blocks. Start with 26 a pair as a key and the following word as 25 a value.

So you see that you basically can 24 have two steps, first make a structure where 23 you randomly choose a key to start with 22 then take that key and print a random value 21 of that key and continue till you do not 20 have a value or some other condition. If 19 you want you can "seed" a pair of words 18 from a chat input from your key-value structure 17 to have a start. Its up to your imagination 16 how to start your chain.

Example with real 15 words:

"hi my name is Al and i live in a box that i like very much and i can live in there as long as i want"

"hi my" -> ["name"]

"my name" -> ["is"]

"name is" -> ["Al"]

"is Al" -> ["and"]


"and i" -> ["live", "can"]


"i can" -> ["live"]


Now construct a loop:

Pick a random 14 key, say "hi my" and randomly choose a value, only 13 one here so its "name" (SAVING "hi my name").
Now move one step 12 to the right taking "my name" as the next 11 key and pick a random value... "is" (SAVING "hi my name is").
Now 10 move and take "name is" ... "Al" (SAVING "hi my name is AL").
Now take 9 "is Al" ... "and" (SAVING "hi my name is Al and").


When you come to "and 8 i" you will randomly choose a value, lets 7 say "can", then the word "i can" is made 6 etc... when you come to your stop condition 5 or that you have no values print the constructed 4 string in our case:

"hi my name is Al and i can live in there as long as i want"

If you have more values 3 you can jump to any keys. The more values 2 the more combinations you have and the more 1 random and fun the text will be.

Score: 9

The bot chooses a random word from your 30 input and generates a response by choosing 29 another random word that has been seen to 28 be a successor to its held word. It then 27 repeats the process by finding a successor 26 to that word in turn and carrying on iteratively 25 until it thinks it’s said enough. It reaches 24 that conclusion by stopping at a word that 23 was prior to a punctuation mark in the training 22 text. It then returns to input mode again 21 to let you respond, and so on.

It isn’t very 20 realistic but I hereby challenge anyone 19 to do better in 71 lines of code !! This 18 is a great challenge for any budding Pythonists, and 17 I just wish I could open the challenge to 16 a wider audience than the small number of 15 visitors I get to this blog. To code a bot 14 that is always guaranteed to be grammatical 13 must surely be closer to several hundred 12 lines, I simplified hugely by just trying 11 to think of the simplest rule to give the 10 computer a mere stab at having something 9 to say.

Its responses are rather impressionistic 8 to say the least ! Also you have to put 7 what you say in single quotes.

I used War 6 and Peace for my “corpus” which took a couple 5 of hours for the training run, use a shorter 4 file if you are impatient…

here is the trainer

import pickle
for line in b:
    for word in line.split():
        text.append (word)
for l in range(len(textset)):
    for w in range(len(text)-1):
        if check==text[w] and text[w][-1] not in '(),.?!':

Here 3 is the bot:

import pickle,random
def nextword(a):
    if a in successorlist:
        return random.choice(successorlist[a])
        return 'the'
while speech!='quit':
    while True:
        response+=' '+neword
        if neword[-1] in ',?!.':
    print response

You tend to get an uncanny feeling 2 when it says something that seems partially 1 to make sense.

Score: 0

You could do like this: Make a order 1 markov 17 chain generator, using words and not letters. Everytime 16 someone post something, what he posted is 15 added to bot database. Also bot would save 14 when he gone to chat and when a guy posted 13 the first post (in multiples of 10 seconds), then 12 he would save the amount of time this same 11 guy waited to post again (in multiples of 10 10 seconds)... This second part would be 9 used to see when the guy will post, so he 8 join the chat and after some amount of time 7 based on a table with "after how many 10 6 seconds the a guy posted after joining the 5 chat", then he would continue to post with 4 the same table thinking "how was the amount 3 of time used to write the the post that 2 was posted after a post that he used X seconds 1 to think about and write"

More Related questions