############################################################################ ## GeneralizedGouldenJackson Save this file as GeneralizedGouldenJackson # ## stay in the same directory. # ## Get into Maple (by typing: maple ) # ## and then type: read GeneralizedGouldenJackson # ## Then follow the instructions given there # ## # ## Written by Beth Kupin and Debbie Yuster, Rutgers University, # ## [ekupin, yuster] at math dot rutgers dot edu. # ############################################################################ print(`This is the package GeneralizedGouldenJackson, written by Beth Kupin and Debbie Yuster`): Help:=proc(): if args=NULL then print(`This is the package GeneralizedGouldenJackson, written by Beth Kupin and Debbie Yuster`): print(``): print(`This Maple Package extends the Goulden Jackson Cluster Method to keeping track of (i.e. assigning formal weights to) single letters, ordered letter pairs and ordered letter triples.`); print(`~~~`); print(`This package contains the following generalizations of the original cluster method: SingleGJ, DoubleGJ, TripleGJ, SingleGJST, DoubleGJST.`); print(`In addition, it contains the following programs designed specifically for probability weights: ProbDoubleGJ, ProbDoubleGJST, ProbDoubleGJIF, ProbTripleGJ.`); print(`As an alternative to using the Goulden-Jackson method, this package also includes two programs that rely solely on recursion: RecursiveDouble and RecursiveProbDouble.`); print(`For making small, faster computations, the package contains the following programs: SingleGJseries, DoubleGJseries, TripleGJseries, ProbDoubleGJseries, and ProbTripleGJseries. These programs compute term by term the same generating functions as their regular counterparts, and can be faster than the traditional method when the alphabet is large and only a small number of terms are needed..`); print(`To test these programs, the package also contains some simpler, slower programs: CleanWords, EmpiricalSingleGJseries, EmpiricalDoubleGJseries, EmpiricalTripleGJseries, EmpiricalProbDoubleGJseries, EmpiricalProbTripleGJseries. These first compute the set of allowable words, and then from there compute the first terms of the generating function.`); print(`For descriptions and help on individual programs type "Help(procedure_name)"`); fi: ###Single Letter GJ Functions### if nops([args])=1 and op(1,[args])=SingleGJ then print(`SingleGJ(A,F,t,V) takes in an alphabet A (as a list), a set of forbidden words F, and implements the Goulden-Jackson Cluster method keeping track of the weights of individual letters.`); print(`V is a list (in the same order as A) with the individual letter weights. For example, in this setting the weight of CAT is t^3*V[c]*V[a]*V[t].`); print(`Try SingleGJ([a,b],{[a,b],[b,a]},t,[1/3,2/3])`); fi: if nops([args])=1 and op(1,[args])=SingleGJST then print(`SingleGJST(A,F,s,t,V) takes in an alphabet A (as a list), a set of forbidden words F, and implements the Goulden-Jackson Cluster method keeping track of the weights of individual letters. V is a list (in the same order as A) with the individual letter weights.`); print(`The generating function computed includes the variable s, whose exponent counts how many mistakes are in a given word. For example, the coefficient of s^m*t^n in the Taylor expansion of a computed generating function is the sum of weights of n-letter words containing exactly m mistakes. For example, in this setting the weight of CAT is t^3*V[c]*V[a]*V[t].`); print(`Try SingleGJST([a,b],{[a,b],[b,a]},t,[1/3,2/3])`); fi: ###Double Letter GJ Functions### if nops([args])=1 and op(1,[args])=DoubleGJ then print(`DoubleGJ(A,F,t,M) takes in an alphabet A (as a list) and a set of forbidden words F, and implements the Goulden-Jackson Cluster method, keeping track of the weights of digraphs (ordered letter pairs) and the weights of single letters.`); print(`M is a table that contains the digraphs weights, as well as the single letter weights. The program MakeTable(R,A) takes in an array R, of a specific form, and initializes a table that can be used in DoubleGJ. Type Help(MakeTable) for more information. To run DoubleGJ symbolically it is not necessary to initialize a table.`); print(`The weight of a word under DoubleGJ will be the product of the weights of the digraphs and the weights of the single letters the word contains. For example, the weight of CAT is t^3*M[ID,c]*M[ID,a]*M[ID,t]*M[c,a]*M[a,t] (if the variable M is not initialized, M[ID,c] will be the weight of the single letter 'c' and M[c,a] the weight of the digraph "ca").`); print(`Try DoubleGJ([a,b],{[a,b],[b,a]},t,M)`); fi: if nops([args])=1 and op(1,[args])=DoubleGJST then print(`DoubleGJST(A,F,s,t,M) is the s,t-analog of DoubleGJ. It takes in an alphabet A (as a list) and a set of forbidden words F, and implements the Goulden-Jackson Cluster method, keeping track of the weights of digraphs (ordered letter pairs). The generating function will be in two variables, 's' and 't'.`); print(`M is a table that contains the digraphs weights, as well as the single letter weights. Any M for DoubleGJ will work for DoubleGJST, and vice versa - their format is identitical. The program MakeTable(R,A) takes in an array R, of a specific form, and initializes a table that can be used in DoubleGJST. Type Help(MakeTable) for more information. To run DoubleGJST symbolically it is not necessary to initialize a table.`); print(`The weight of a word under ProbDoubleGJST will be the product of the digraphs it contains as well as the single letter weight for the first letter. For example, the weight of CAT is t3*M[ID,c]*M[c,a]*M[a,t], where M[ID,c] is the probability of the single letter 'c'.`); print(` The generating function computed includes the variable s, whose exponent counts how many mistakes are in a given word. For example, the coefficient of s^m*t^n in the Taylor expansion of a computed generating function is the sum of weights of n-letter words containing exactly m mistakes. `); print(`Try DoubleGJST([a,b],{[a,b],[b,a]},s,t,M)`); fi: if nops([args])=1 and op(1,[args])=ProbDoubleGJ then print(`ProbDoubleGJ(A,F,t,M) takes in an alphabet A (as a list) and a set of forbidden words F, and implements the Goulden-Jackson Cluster method, keeping track of the probabilities of digraphs (ordered letter pairs) and initial single letter probabilities.`); print(`M is a table that contains the digraph probabilities, as well as the single letter probabilities. Any M for DoubleGJ will work for ProbDoubleGJ, and vice versa - their format is identitical. The program MakeTable(R,A) takes in an array R, of a specific form, and initializes a table that can be used in ProbDoubleGJ. Type Help(MakeTable) for more information. To run ProbDoubleGJ symbolically it is not necessary to initialize a table.`); print(`The weight of a word under ProbDoubleGJ will be the product of probabilities of the digraphs it contains, as well as the single letter probability for the first letter. For example, the weight of CAT is t^3*M[ID,c]*M[c,a]*M[a,t], where M[ID,c] is the probability of the single letter 'c'. We interpret M[a,b] as a conditional probability, i.e. the probability that, given an occurrence of "a", the letter following it is "b".`); print(`Try ProbDoubleGJ([a,b],{[a,b],[b,a]},t,M)`); fi: if nops([args])=1 and op(1,[args])=ProbDoubleGJIF then print(`ProbDoubleGJIF(A,F,t,M) takes in an alphabet A (as a list) and a set of forbidden words F, and implements the Goulden-Jackson Cluster method, keeping track of the weights of digraphs (ordered letter pairs) and the weights of the first and last letters of the word.`); print(`M is a table that contains the digraphs weights, as well as the single letter weights. The program MakeTableF(R,A) takes in an array R, of a specific form, and initializes a table that can be used in DoubleGJ. Type Help(MakeTableF) for more information. This is NOT the same table as can be used for DoubleGJ and ProbDoubleGJ. To run ProbDoubleGJIF symbolically it is not necessary to initialize a table.`); print(`The weight of a word under ProbDoubleGJIF will be the product of the weights of the digraphs it contains, as well as the weight of the first and last letters. For example, the weight of CAT is t^3*M[ID,c]*M[c,a]*M[a,t]*M[FIN,t]. M[ID,c] is the probability that a word begins with c, M[FIN,t] is the probability that a word ends with t, and M[c,a] is the probability of the digraph "ca".`); print(`Note that in this model we are allowed to assign different values to M[ID,a] and M[FIN,a]. In essence, we are allowed to change the weights based not only on the letter in question, but also on its position within the word. If we set M[FIN,a]=1 for all a in A, we recover ProbDoubleGJ.`); print(`Try ProbDoubleGJIF([a,b],{[a,b],[b,a]},t,M)`); fi: if nops([args])=1 and op(1,[args])=ProbDoubleGJST then print(`ProbDoubleGJST(A,F,s,t,M) is the s,t-analog of ProubDoubleGJ. It takes in an alphabet A (as a list) and a set of forbidden words F, and implements the Goulden-Jackson Cluster method, keeping track of the probabilities of digraphs (ordered letter pairs). The generating function will be in two variables, 's' and 't'.`); print(`M is a table that contains the digraph probabilities, as well those of the single letters. Any M for DoubleGJ will work for ProbDoubleGJST, and vice versa - their format is identitical. The program MakeTable(R,A) takes in an array R, of a specific form, and initializes a table that can be used in ProbDoubleGJST. Type Help(MakeTable) for more information. To run ProbDoubleGJST symbolically it is not necessary to initialize a table.`); print(`The weight of a word under ProbDoubleGJST will be the product of the probabilities of the digraphs it contains, as well as the single letter probability for the first letter. For example, the weight of CAT is t3*M[ID,c]*M[c,a]*M[a,t], where M[ID,c] is the probability of the single letter 'c'. We interpret M[a,b] as a conditional probability, i.e. the probability that, given an occurrence of "a", the letter following it is "b".`); print(`The generating function computed includes the variable s, whose exponent counts how many mistakes are in a given word. For example, the coefficient of s^m*t^n in the Taylor expansion of a computed generating function is the sum of weights of n-letter words containing exactly m mistakes.`); print(`Try ProbDoubleGJST([a,b],{[a,b],[b,a]},s,t,M)`); fi: ###Triple Letter GJ Functions### if nops([args])=1 and op(1,[args])=TripleGJ then print(`TripleGJ(A,F,t,M) takes in an alphabet A (as a list) and a set of forbidden words F, and implements the Goulden-Jackson Cluster method, keeping track of the weights of trigraphs (ordered letter triples), digraphs (ordered letter pairs), and single letters.`); print(`M is a table that contains the trigraph weights, as well as the digraph and single letter weights. The program MakeTable2(R,A) takes in an array R, of a specific form, and initializes a table that can be used in TripleGJ. Type Help(MakeTable2) for more information. To run TripleGJ symbolically it is not necessary to initialize a table.`); print(`The weight of a word under TripleGJ will be the product of the trigraphs it contains as well as the weight the digraphs and single letters. If the word is only one letter long, its weight will be the weight of that single letter. For example, the weight of CAT is t^3*M[ID,ID,c]*M[ID,ID,a]*M[ID,ID,t]*M[ID,c,a]*M[ID,a,t]*M[c,a,t] (if M is not initialized, M[c,a,t] will be the weight of the trigraph "cat", M[ID,c,a] the weight of the digraph "ac", and M[ID,ID,c] the weight of the single letter "c").`); print(`Try TripleGJ([a,b],{[a,b],[b,a]},t,M)`); fi: if nops([args])=1 and op(1,[args])=ProbTripleGJ then print(`ProbTripleGJ(A,F,t,M) takes in an alphabet A (as a list) and a set of forbidden words F, and implements the Goulden-Jackson Cluster method, keeping track of the weights of trigraphs (ordered letter triples), as well as initial digraph or single letter weights.`); print(`M is a table that contains the trigraph weights, as well as the digraph and single letter weights. The program MakeTable2(R,A) takes in an array R, of a specific form, and initializes a table that can be used in ProbTripleGJ. Any M for TripleGJ will work for ProbTripleGJ, and vice versa - their format is identitical. Type Help(MakeTable2) for more information. To run ProbTripleGJ symbolically it is not necessary to initialize a table.`); print(`The weight of a word under ProbTripleGJ will be the product of the trigraphs it contains, as well as the weight of the first digraph. If the word is only one letter long, its weight will be the weight of that single letter. For example, the weight of CATTY is t^5*M[ID,c,a]*M[c,a,t]*M[a,t,t]*M[t,t,y], where M[ID,c,a] is the probability of the digraph "ca".`); print(` We interpret the weights as follows: M[c,a,t] is the probability that, given an occurrence of the digraph "ca", the next letter is "t". M[ID,c,a] is the probability of seeing "ca", and M[ID,ID,c] is the probability of seeing just "c".`); print(`Try ProbTripleGJ([a,b],{[a,b],[b,a]},t,M)`); fi: ###Recursive Functions### if nops([args])=1 and op(1,[args])=RecursiveDouble then print(` RecursiveDouble(A,F,t,M) inputs an alphabet A, a set of dirty words F, and returns the generating function for the words in that alphabet that avoid all dirty words in F with the same weights as DoubleGJ. RecursiveDouble uses generating functions, not the Goulden-Jackson cluster method, and is included as an alternative to DoubleGJ.`); print(`M is a table containing the monograph and digraph weights. Any M that works for DoubleGJ will work for RecursiveDouble, and vice versa - the formats are identical. Given an array R of a certain form, MakeTable will correctly initialize this table. Type Help(MakeTable) for more information. The program will run symbolically without an initialized table.`); print(`Try RecursiveDouble([a,b],{[a,b],[b,a]},t,M).`); fi: if nops([args])=1 and op(1,[args])=RecursiveProbDouble then print(` RecursiveProbDouble(A,F,t,M) inputs an alphabet A, a set of dirty words F, and returns the generating function for the words in that alphabet that avoid all dirty words in F with the same weights as ProbDoubleGJ. RecursiveProbDouble uses generating functions, not the Goulden-Jackson cluster method, and is included as an alternative to ProbDoubleGJ.`); print(`M is a table containing the monograph and digraph weights. Any M that works for ProbDoubleGJ will work for RecursiveProbDouble, and vice versa - the formats are identical. Given an array R of a certain form, MakeTable will correctly initialize this table. Type Help(MakeTable) for more information. The program will run symbolically without an initialized table.`); print(`Try RecursiveProbDouble([a,b],{[a,b],[b,a]},t,M).`); fi: ###Series Functions### if nops([args])=1 and op(1,[args])=SingleGJseries then print(`SingleGJseries(A,F,x,M,L) takes in an alphabet A (as a list), a set of forbidden words F, and implements the Goulden-Jackson Cluster method keeping track of the weights of individual letters. Unlike SingleGJ, SingleGJseries does not compute the whole generating function, but only the first L terms.`); print(`M is a table with the single letter weights. The program MakeTableS(R,A) will initialize a table in a form that can be used by this program. Type Help(MakeTableS) for more information. SingleGJseries will also run symbolically, without initializing a table.`); print(`The code for SingleGJseries is essentially the same as the code for the program wGJseries, written by John Noonan and Doron Zeilberger, available online at http://www.math.rutgers.edu/~zeilberg/gj.html under the package GJseries. We have modified the format and input to make it match the rest of the series programs.`); print(`For example, try SingleGJseries([a,b],{[a,b],[b,a]},t,M,5)`); fi: if nops([args])=1 and op(1,[args])=DoubleGJseries then print(`DoubleGJseries(A,F,t,M,L) takes in an alphabet A (as a list) and a set of forbidden words F, and implements the Goulden-Jackson Cluster method, keeping track of the weights of digraphs (ordered letter pairs) and the weights of single letters. Unlike DoubleGJ, DoubleGJseries does not compute the whole generating function, but only the first L terms.`); print(`M is a table that contains the digraphs weights, as well as the single letter weights. The program MakeTable(R,A) takes in an array R, of a specific form, and initializes a table that can be used in DoubleGJseries. Type Help(MakeTable) for more information. To run DoubleGJseries symbolically it is not necessary to initialize a table.`); print(`The weight of a word under DoubleGJseries will be the product of the weights of the digraphs and the weights of the single letters the word contains. For example, the weight of CAT is t^3*M[ID,c]*M[ID,a]*M[ID,t]*M[c,a]*M[a,t] (if the variable M is not initialized, M[ID,c] will be the weight of the single letter 'c' and M[c,a] the weight of the digraph "ca").`); print(`Try DoubleGJseries([a,b],{[a,b],[b,a]},t,M,5)`); fi: if nops([args])=1 and op(1,[args])=ProbDoubleGJseries then print(`ProbDoubleGJseries(A,F,t,M,L) takes in an alphabet A (as a list) and a set of forbidden words F, and implements the Goulden-Jackson Cluster method, keeping track of the weights of digraphs (ordered letter pairs) and the weights of single letters. Unlike ProbDoubleGJ, ProbDoubleGJseries does not compute the whole generating function, but only the first L terms.`); print(`M is a table that contains the digraphs weights, as well as the single letter weights. The program MakeTable(R,A) takes in an array R, of a specific form, and initializes a table that can be used in ProbDoubleGJseries. Type Help(MakeTable) for more information. To run ProbDoubleGJseries symbolically it is not necessary to initialize a table.`); print(`The weight of a word under ProbDoubleGJseries will be the product of the weights of the digraphs the word contains and the weight of the first single letter. For example, the weight of CAT is t^3*M[ID,c]*M[c,a]*M[a,t]. We interpret M[ID,a] as the probability of the occurrence of the letter 'a', and M[a,b] as a conditional probability: given an occurrence of 'a', M[a,b] is the probability that the letter immediately following 'a' is 'b'.`); print(`Try ProbDoubleGJseries([a,b],{[a,b],[b,a]},t,M,5)`); fi: if nops([args])=1 and op(1,[args])=TripleGJseries then print(`TripleGJseries(A,F,t,M,L) takes in an alphabet A (as a list) and a set of forbidden words F, and implements the Goulden-Jackson Cluster method, keeping track of the weights of trigraphs (ordered letter triples), digraphs (ordered letter pairs), and single letters. Unlike TripleGJ, TripleGJseries does not compute the whole generating function, but just the first L terms.`); print(`M is a table that contains the trigraph weights, as well as the digraph and single letter weights. The program MakeTable2(R,A) takes in an array R, of a specific form, and initializes a table that can be used in TripleGJseries. Type Help(MakeTable2) for more information. To run TripleGJseries symbolically it is not necessary to initialize a table.`); print(`The weight of a word under TripleGJseries will be the product of the trigraphs it contains, as well as the weight of the digraphs and single letters is contains. For example, the weight of CAT is: t^3*M[ID,ID,c]*M[ID,ID,a]*M[ID,ID,t]*M[ID,c,a]*M[ID,a,t]*M[c,a,t] (if M is not initialized, M[c,a,t] will be the weight of the trigraph "cat", M[ID,c,a] the weight of the digraph "ac", and M[ID,ID,c] the weight of the single letter "c").`); print(`Try TripleGJseries([a,b],{[a,b],[b,a]},t,M,5)`); fi: if nops([args])=1 and op(1,[args])=ProbTripleGJseries then print(`ProbTripleGJseries(A,F,t,M) takes in an alphabet A (as a list) and a set of forbidden words F, and implements the Goulden-Jackson Cluster method, keeping track of the weights of trigraphs (ordered letter triples), digraphs (ordered letter pairs), and single letters. Unlike ProbTripleGJ, ProbTripleGJseries does not compute the whole generating function, but just the first L terms.`); print(`M is a table that contains the trigraph weights, as well as the digraph and single letter weights. The program MakeTable2(R,A) takes in an array R, of a specific form, and initializes a table that can be used in ProbTripleGJseries. Type Help(MakeTable2) for more information. To run ProbTripleGJseries symbolically it is not necessary to initialize a table.`); print(`The weight of a word under ProbTripleGJseries will be the product of the trigraphs it contains, as well as the weight of the first digraph. For example, the weight of CATTY is: t^5*M[ID,c,a]*M[c,a,t]*M[a,t,t]*M[t,t,y]. We interpret M[ID,c,a] to be the probability of the digraph 'ca', and M[c,a,t] as a conditional probability: given an occurrence of 'ca, M[c,a,t] is the probability that the next letter will be 't. If the word has length one, for example 'c', its weight will be t*M[ID,ID,c], the probability of the single letter 'c' appearing. `); print(`Try ProbTripleGJseries([a,b],{[a,b],[b,a]},t,M,5)`); fi: ###Table Making Functions### if nops([args])=1 and op(1,[args])=MakeTable then print(`MakeTable(R,A) inputs an n+1 by n array, where n is the length of the alphabet, A, and outputs a table that can be used by DoubleGJ, ProbDoubleGJ, ProbDoubleGJSeries, and most of the double letter programs.`); print(`The array R must be in a specific form:`); print(`The n columns and the first n rows each correspond to the letters of the alphabet (in the same order as they appear in A). The entry in the row corresponding to "a" and column corresponding to "b" is the weight of the digraph "ab". The last row encodes the monograph weights. The entry in the n+1 row and column corresponding to "c" is the weight of the single letter "c".`); print(`Try MakeTable(array(1..3,1..2,[[1/2,1/2],[1/2,1/2],[1/2,1/2]]),[a,b])`); fi: if nops([args])=1 and op(1,[args])=MakeTableF then print(`MakeTableF(R,A) inputs an n+2 by n array, where n is the length of the alphabet, A, and outputs a table that can be used by ProbDoubleGJIF.`); print(`The array R must be in a specific form:`); print(`The n columns and the first n rows each correspond to the letters of the alphabet (in the same order as they appear in A). The entry in the row corresponding to "a" and column corresponding to "b" is the probability of the digraph "ab". The second to last (n+1) row encodes the initial monograph probabilities. The last row encodes the final monograph probabilities.`); print(`Try MakeTableF(array(1..4,1..2,[[1/2,1/2],[1/2,1/2],[1/2,1/2],[1/2,1/2]),[a,b])`); fi: if nops([args])=1 and op(1,[args])=MakeTable2 then print(`MakeTable2(R,A) inputs an n^2+n+1 by n array, where n is the length of the alphabet, A, and outputs a table that can be used by TripleGJ and all of the triple letter programs.`); print(`The array R must be in a specific form:`); print(`The n columns each correspond to the letters of the alphabet (in the same order as they appear in A). The first n^2 rows correspond to all the digraphs of A, ordered lexicographically, i.e. if "a" comes before "b" in A, then "aa" comes before "ab". The entry in the row corresponding to "ab" and the column corresponding to "c" is the weight of the trigraph "abc".`); print(`The next n rows correspond to the single letters, in the same order as A. The entry in the row corresponding to "b", the second letter in A, (i.e. in row n^2+2) and column corresponding to "c" is the weight of the digraph "bc". The last row encodes the monograph weights. The entry in row n^2n+1 and column corresponding to "t" is the weight of the single letter "t".`); print(`Try R:=array(1..7,1..2,[seq([1/2,1/2],i=1..4),[1/2,1/6],[1/6,1/6],[1/3,2/3]]),`); print(`and then try MakeTable2(R,[a,b])`); fi: if nops([args])=1 and op(1,[args])=MakeTableS then print(`MakeTableS(L,A) inputs list of length n, where n is the length of the alphabet, A, and outputs a table that can be used by SingleGJseries. Each entry in the list corresponds to the weight of a single letter, and the list must be in the same order as A.`); print(`Try MakeTableS([1/3,1/3,1/3],[a,b,c])`); fi: ###Test Programs### if nops([args])=1 and op(1,[args])=CleanWords then print(` CleanWords(A,F,n) inputs an alphabet A, a set of forbidden (dirty) words F, and an integer n, and outputs the set of n-letter words in A that have no subwords in F.`); fi: if nops([args])=1 and op(1,[args])=EmpiricalSingleGJseries then print(`EmpiricalSingleGJseries(A,F,M,L) takes in an alphabet A (as a list), a set of forbidden words F, weights table M (which can be left uninitialized as a symbol), and positive integer L. It computes, empirically, the first L coefficients of the Taylor expansion of SingleGJ(A,F,t,M).`); print(`WARNING: This program is very slow and should be used for checking purposes only!`); print(`For example, try EmpiricalSingleGJseries([a,b],{[a,b],[b,a]},M,5)`); fi: if nops([args])=1 and op(1,[args])=EmpiricalDoubleGJseries then print(`EmpiricalDoubleleGJseries(A,F,M,L) takes in an alphabet A (as a list), a set of forbidden words F, weights table M (which can be left uninitialized as a symbol), and positive integer L. It computes, empirically, the first L coefficients of the Taylor expansion of DoubleGJ(A,F,t,M).`); print(`WARNING: This program is very slow and should be used for checking purposes only!`); print(`For example, try EmpiricalDoubleGJseries([a,b],{[a,b],[b,a]},M,5)`); fi: if nops([args])=1 and op(1,[args])=EmpiricalTripleGJseries then print(`EmpiricalTripleGJseries(A,F,M,L) takes in an alphabet A (as a list), a set of forbidden words F, weights table M (which can be left uninitialized as a symbol), and positive integer L. It computes, empirically, the first L coefficients of the Taylor expansion of TripleGJ(A,F,t,M).`); print(`WARNING: This program is very slow and should be used for checking purposes only!`); print(`For example, try EmpiricalTripleGJseries([a,b],{[a,b],[b,a]},M,5)`); fi: if nops([args])=1 and op(1,[args])=EmpiricalProbDoubleGJseries then print(`EmpiricalProbDoubleleGJseries(A,F,M,L) takes in an alphabet A (as a list), a set of forbidden words F, weights table M (which can be left uninitialized as a symbol), and positive integer L. It computes, empirically, the first L coefficients of the Taylor expansion of ProbDoubleGJ(A,F,t,M).`); print(`WARNING: This program is very slow and should be used for checking purposes only!`); print(`For example, try EmpiricalProbDoubleGJseries([a,b],{[a,b],[b,a]},M,5)`); fi: if nops([args])=1 and op(1,[args])=EmpiricalProbTripleGJseries then print(`EmpiricalProbTripleGJseries(A,F,M,L) takes in an alphabet A (as a list), a set of forbidden words F, weights table M (which can be left uninitialized as a symbol), and positive integer L. It computes, empirically, the first L coefficients of the Taylor expansion of ProbTripleGJ(A,F,t,M).`); print(`WARNING: This program is very slow and should be used for checking purposes only!`); print(`For example, try EmpiricalProbTripleGJseries([a,b],{[a,b],[b,a]},M,5)`); fi: end: ##################################################################### ##################################################################### ##################################################################### ##################################################################### #Overlap takes in two "words" in the same alphabet, w and v, and outputs all the ways in which a tail of w can be a head of v. The format is a set of numbers, each corresponding to the size of a possible overlap. #i.e Overlap([0,1,0,1],[1,0,1,0]) is {1,3}; #Note: This program is called in ALL main Goulden Jackson programs! Overlap:=proc(w,v) local k,Tailsw,Headsv: Tailsw:={seq([op(k..nops(w),w)],k=2..nops(w))}: Headsv:={seq([op(1..k,v)],k=1..nops(v)-1)}: map(nops,Tailsw intersect Headsv): end: ##################################################################### ##################################################################### ##################################################################### ##################################################################### #SingleGJ - Goulden Jackson, taking into account single letter weights ##################################################################### #The equation for W[f]: the generating function in t, for those clusters that start out #with the dirty word f, taking into account single letter weights. #Words and initial word fragments are weighted by their length as well as a weight for each single letter they contain. #Weight("abcd")=t^4*V[a]*V[b]*V[c]*V[d] SingleFindEq:=proc(A,F,f,t,W,V) local e,g,S,coe,s,n,Index,i,j,k: n:=nops(f); for i from 1 to nops(A) do Index[A[i]]:=i; od; #This Index function needed because V is a vector, indexed by numbers, but it encodes weights of letters (they are assumed to appear in the same order as A). I need to be able to look up the weight of a letter - Index gives me a way to bridge the indexing gap. #It would be more consistent with the other programs to have V be a table indexed by letters, which gets rid of the whole problem (and then I wouldn't have to pass A all the way through to this function!). However, I find that having V a vector gives me more flexibility - I don't have to initialize a whole table just to run a test. e:=W[f]+t^n*mul(V[Index[f[j]]],j=1..n): for g in F do S:=Overlap(f,g): coe:=0; for s in S do coe:=coe+t^(n-s)*mul(V[Index[f[k]]],k=1..(n-s)): od; e:=e+coe*W[g]: od: e: end: ################################################################ #SingleCgf inputs a set of dirty words F, and a variable (symbol t), and outputs the cluster generating function in that variable. #Names modified to correspond to SingleGJ, but the operations are the same as in the original Cgf. SingleCgf:=proc(A,F,t,V) local var,eq,W,f,var1: var:={seq(W[f], f in F)}: eq:={seq(SingleFindEq(A,F,f,t,W,V), f in F)}: var1:=solve(eq,var): normal(add(subs(var1,v), v in var)): end: ################################################################## #SingleGJ(A,F,t,V): The Goulden-Jacskon Cluster method for an alphabet A (as a list) and a set of dirty words F in the variable t, with single letter weights encoded in a vector V. #V is indexed by numbers, not letters, so we assume that the weights in V are in the same order as the letters in A. The first number in V is the weight of the first letter in A, and so on. SingleGJ:=proc(A,F,t,V): normal(1/(1-t*(add(V[i], i=1..nops(A)))-SingleCgf(A,F,t,V))): end: ##################################################################### #SingleGJST: Goulden Jackson with s and t, taking into account single letter frequencies, number of occurrences of forbidden words. ##################################################################### # SingleFindEqST((A,F,f,s,t,W,V): The s,t-analog of SingleFindEq. See code for SingleFindEq for more info on the function. #This function differs in the following ways: #(1) The local variable s has been renamed z. #(2) There is an additional input parameter, the variable s (see comments for SingleGJST for more info on s). SingleFindEqST:=proc(A,F,f,s,t,W,V) local e,g,S,coe,z,n,Index,i,j,k: n:=nops(f); for i from 1 to nops(A) do Index[A[i]]:=i; od; #This Index function is sort of a hack - V is a vector, indexed by numbers, but it encodes probabilities of letters (they are assumed to appear in the same order as A). I need to be able to look up the probability of a letter - Index gives me a way to bridge the indexing gap. #It would be easier, and more consistent with the other two programs, to have V be a table indexed by letters, which gets rid of the whole problem (and then I wouldn't have to pass A all the way through to this function!). Somehow I find that having V a vector gives me more flexibility - I don't have to initialize a whole table just to run a test - but in the future I may change this. e:=W[f]+(-s+1)*t^n*mul(V[Index[f[j]]],j=1..n): for g in F do S:=Overlap(f,g): coe:=0; for z in S do coe:=coe+t^(n-z)*mul(V[Index[f[k]]],k=1..(n-z)): od; e:=e+(-s+1)*coe*W[g]: od: e: end: ################################################################ # SingleCgfST(A,F,s,t,V): The s,t-analog of SingleCgf. Includes the variable s as an input parameter. For more info on this function, see code for SingleCgf. For more info on the role of s, see comments in SingleGJST. SingleCgfST:=proc(A,F,s,t,V) local var,eq,W,f,var1: var:={seq(W[f], f in F)}: eq:={seq(SingleFindEqST(A,F,f,s,t,W,V), f in F)}: var1:=solve(eq,var): normal(add(subs(var1,v), v in var)): end: ################################################################## # SingleGJST(A,F,s,t,V): The s,t-analog of SingleGJ. #The Goulden-Jacskon Cluster method for an alphabet A and a set of dirty words F in the variables t and s, with known single letter frequencies encoded in a vector V. #In the generating function produced as output, the coefficient of s^m*t^n will represent the probability of an n-letter word containing exactly m "mistakes". # For more information about this function, see the comments in SingleGJ. #V is indexed by numbers, not letters, so we assume that the probabilities in V are in the same order as the letters in A. The first number in V is the probability of the first letter in A, etc. SingleGJST:=proc(A,F,s,t,V): normal(1/(1-t*(add(V[i], i=1..nops(A)))-SingleCgfST(A,F,s,t,V))): end: ################################################################ ################################################################ ################################################################ ################################################################ #Programs for Goulden Jackson, taking into account digraph and monograph weights ################################################################### #MakeTable inputs a n+1 by n matrix (or array), where n is the length of the alphabet, and outputs a table that can be used by DoubleGJ, ProbDoubleGJ, and most of the double letter programs. #The matrix M must be in a specific form: #The n columns and the first n rows each correspond to the letters of the alphabet (in the same order as they appear in A). The entry in the row corresponding to "a" and column corresponding to "b" is the weight of the digraph "ab". #The last row encodes the monograph weights. The entry in the n+1 row and column corresponding to "c" is the weight of the single letter "c". MakeTable:=proc(M,A) local n: n:=nops(A); table([ #initializes digraph weights: seq(seq( (A[i],A[j]) = M[i,j] ,i=1..n),j=1..n), #initializes single letter weights: seq((ID,A[i])=M[n+1,i],i=1..n) ]); end: ################################################################### #MakeTableF inputs a n+2 by n matrix (or array), where n is the length of the alphabet, and outputs a table that can be used by ProbDoubleGJIF. #The matrix M must be in a specific form: #The n columns and the first n rows each correspond to the letters of the alphabet (in the same order as they appear in A). The entry in the row corresponding to "a" and column corresponding to "b" is the weight of the digraph "ab". #The second to last row encodes the initial letter probabilities. The entry in the n+1 row and column corresponding to "c" is the probability a word begins with "c". #The last row encodes the final letter probabilities. The entry in the n+2 row and the column corresponding to "c" is the probability that a word ends with "c". MakeTableF:=proc(M,A) local n: n:=nops(A); table([ #initializes digraph probabilities: seq(seq( (A[i],A[j]) = M[i,j] ,i=1..n),j=1..n), #initializes initial letter probabilities: seq((ID,A[i])=M[n+1,i],i=1..n), #initializes final letter probabilities seq((FIN,A[i])=M[n+2,i],i=1..n) ]); end: ################################################################## #DoubleGJ, the first generalization of Goulden Jackson (keeps track of all monographs and digraph weights). ################################################################## #DoubleGJclusters (A,F,t,W,M,End): The equation for W[f]: the generating function in t, for those clusters that start out with the dirty word f, taking into account digraph and monograph weights. #This is the similar to the original FindEq program, except words and initial word fragments now are weighted by the single letters they contain, and the digraphs they contain. #For example, the weight of CAT is M[ID,c]*M[ID,a]*M[ID,t]*M[c,a]*M[a,t]. #To deal with the fact that the weight of a letter may depend on the letter before it, I have made two changes to FindEq: #1.To account for the fact that in a cluster we are attaching a word to the end of a different word, but the weights carry over, I have added some extra terms to coe. #2.We will eventually want to attach things to the ends of clusters, but to do this we have to know what the last letter of a given cluster is. I accomplish this by adding a sort of flag, End, to the end of each cluster. Later we will remove that, and replace it with the appropriate weights (which will change depending on what exactly will follow the cluster). DoubleGJclusters:=proc(A,F,t,W,M,End) local eq,f,g,n,e,S,s,coe,var; eq:={}; for f in F do n:=nops(f); #Initializes e with the weight of the cluster consisting of the single forbidden word f. e:=W[f]+t^(n)*mul(M[(f[i],f[i+1])],i=1..n-1)*mul(M[ID,f[i]],i=1..n)*End[f[n]]: for g in F do S:=Overlap(f,g): coe:=0; #Adds in all the potentially overlapping other forbidden words, that could extend the cluster. for s in S do: coe:=coe + t^(n-s)*mul(M[(f[i],f[i+1])],i=1..(n-s))*mul(M[ID,f[i]],i=1..n-s); od: e:=e+coe*W[g]: od: eq:=eq union {e}; od: var:={seq(W[f],f in F)}; solve(eq,var); end: ################################################################ #DoubleGJmarked(A,F,t,X,M): The equation for X[a]: the generating function in t, for marked words that begin with the letter "a". This is a major structural change from the original SingleGJ program. #Marked words beginning with "a" have a nice recursive decomposition, which is where the equations come from. Let M(a) be the set of marked words beginning with "a". Then we have M(a)={the single letter "a"} + {words beginning with a, a is not part of a cluster} + {clusters where the first forbidden word begins with "a"} #The first two sets are fairly easy to define recursively with X[a], the third is a bit more complicated. Alph breaks up the forbidden words, based on their first letter. We almost get that the third set is the generating functions for the cluster multiplied by X[c,d], but that assumes that at least one letter follows the cluster. The last few lines of the eq variable are adding in the case where our word consists of a single cluster. DoubleGJmarked:=proc(A,F,t,X,M) local cluster,a,b,c,eq,var,Alph,f,End,W: cluster:=DoubleGJclusters(A,F,t,W,M,End); #breaks up the forbidden words based on their first letter. for a in A do Alph[a]:={}; for f in F do if f[1]=a then Alph[a]:=Alph[a] union {f}; fi: od; od; #encodes the recursive formula for X[a]. The first two terms correspond to the single digraph "a" and the case when we can peel the letter a off without touching a cluster. eq:={seq( X[a]=t*M[ID,a]+ add(M[a,b]*M[ID,a]*t*X[b],b in A)+ #this term corresponds to the case where we have a cluster beginning with "a", followed by a marked word with length at least one. add(add( subs( {seq(End[c]=M[(c,b)],c in A)},subs(cluster,W[f]))*X[b] ,b in A) ,f in Alph[a])+ #this terms corresponds to the case where the words is just a single cluster. add( subs( {seq(End[c]=1, c in A)},subs(cluster,W[f])) ,f in Alph[a]) ,a in A)}; var:={seq(X[a], a in A)}; solve(eq,var); end: ################################################################ #DoubleGJ(A,F,t,M): The Goulden-Jacskon Cluster method for an alphabet A and a set of dirty words F in the variable t, with digraph and monograph weights encoded in a table M. #There are two types of entries in M: #M[a,b] is the weight of the digraph "ab". #M[ID,a] is the weight of the monograph "a". #Given an array, MakeTable will correctly initialize this table. DoubleGJ:=proc(A,F,t,M) local marked,X,a; marked:=DoubleGJmarked(A,F,t,X,M); #To put everything together, we use the fact that if a marked word isn't empty, it has some first letter (and so is included in X[a] for some a). normal(1+add(subs(marked,X[a]),a in A)); end: ################################################################## #DoubleGJST, a generalization of Goulden Jackson (keeps track of all monographs and digraph wweights), with an additional variable s, that keeps track of how many occurrences of a bad word there are. ################################################################## #DoubleGJclustersST(A,F,s,t,W,M,End): The equation for W[f]: the generating function in t, for those clusters that start out with the dirty word f, taking into account digraph and monograph weights. #This is the similar to the original FindEq program, except words and initial word fragments now are weighted by the single letters they contain, and the digraphs they contain. #For example, the weight of CAT is M[ID,c]*M[ID,a]*M[ID,t]*M[c,a]*M[a,t]. #To deal with the fact that the weight of a letter may depend on the letter before it, I have made two changes to FindEq: #1.To account for the fact that in a cluster we are attaching a word to the end of a different word, but the weights carry over, I have added some extra terms to coe. #2.We will eventually want to attach things to the ends of clusters, but to do this we have to know what the last letter of a given cluster is. I accomplish this by adding a sort of flag, End, to the end of each cluster. Later we will remove that, and replace it with the appropriate weights (which will change depending on what exactly will follow the cluster). DoubleGJclustersST:=proc(A,F,s,t,W,M,End) local eq,f,g,n,e,S,z,coe,var; eq:={}; for f in F do n:=nops(f); #Initializes e with the weight of the cluster consisting of the single forbidden word f. e:=W[f]+(-s+1)*t^(n)*mul(M[(f[i],f[i+1])],i=1..n-1)*mul(M[ID,f[i]],i=1..n)*End[f[n]]: for g in F do S:=Overlap(f,g): coe:=0; #Adds in all the potentially overlapping other forbidden words, that could extend the cluster. for z in S do: coe:=coe + (-s+1)*t^(n-z)*mul(M[(f[i],f[i+1])],i=1..(n-z))*mul(M[ID,f[i]],i=1..n-z); od: e:=e+coe*W[g]: od: eq:=eq union {e}; od: var:={seq(W[f],f in F)}; solve(eq,var); end: ################################################################ #DoubleGJmarkedST(A,F,s,t,W,M,End): The equation for X[a]: the generating function in t, for marked words that begin with the letter "a". This is a major structural change from the original SingleGJ program. #Marked words beginning with "a" have a nice recursive decomposition, which is where the equations come from. Let M(a) be the set of marked words beginning with "a". Then we have M(a)={the single letter "a"} + {words beginning with a, a is not part of a cluster} + {clusters where the first forbidden word begins with "a"} #The first two sets are fairly easy to define recursively with X[a], the third is a bit more complicated. Alph breaks up the forbidden words, based on their first letter. We almost get that the third set is the generating functions for the cluster multiplied by X[c,d], but that assumes that at least one letter follows the cluster. The last few lines of the eq variable are adding in the case where our word consists of a single cluster. DoubleGJmarkedST:=proc(A,F,s,t,X,M) local cluster,a,b,c,eq,var,Alph,f,End,W: cluster:=DoubleGJclustersST(A,F,s,t,W,M,End); #breaks up the forbidden words based on their first letter. for a in A do Alph[a]:={}; for f in F do if f[1]=a then Alph[a]:=Alph[a] union {f}; fi: od; od; #encodes the recursive formula for X[a]. The first two terms correspond to the single digraph "a" and the case when we can peel the letter a off without touching a cluster. eq:={seq( X[a]=t*M[ID,a]+ add(M[a,b]*M[ID,a]*t*X[b],b in A)+ #this term corresponds to the case where we have a cluster beginning with "a", followed by a marked word with length at least one. add(add( subs( {seq(End[c]=M[(c,b)],c in A)},subs(cluster,W[f]))*X[b] ,b in A) ,f in Alph[a])+ #this terms corresponds to the case where the words is just a single cluster. add( subs( {seq(End[c]=1, c in A)},subs(cluster,W[f])) ,f in Alph[a]) ,a in A)}; var:={seq(X[a], a in A)}; solve(eq,var); end: ################################################################ #DoubleGJST(A,F,s,t,M): The Goulden-Jacskon Cluster method for an alphabet A and a set of dirty words F in the variables t and s, with digraph and monograph weights encoded in a table M. #There are two types of entries in M: #M[a,b] is the weight of the digraph "ab". #M[ID,a] is the weight of the monograph "a". #Given an array, MakeTable will correctly initialize this table. DoubleGJST:=proc(A,F,s,t,M) local marked,X,a; marked:=DoubleGJmarkedST(A,F,s,t,X,M); #To put everything together, we use the fact that if a marked word isn't empty, it has some first letter (and so is included in X[a] for some a). normal(1+add(subs(marked,X[a]),a in A)); end: ############################################################################ #Double Goulden Jackson, but specifically set up for probability applications (only keeps track of initial monograph probabilities, all digraph probabilities). ############################################################################ #ProbDoubleGJclusters(A,F,tW,M,End) The equation for W[f]: the generating function in t, for those clusters that start out with the dirty word f, taking into account digraph and monograph weights (designed for probability applications). #This is the similar to the original FindEq program, except words and initial word fragments now are weighted by their initial single letter, and the digraphs they contain. For example, the weight of CAT is M[ID,c]*M[c,a]*M[a,t]. #To deal with the fact that the weight of a letter may depend on the letter before it, I have made three changes to FindEq: #1.Technically, M[ID,f[1]]*W[f] is the generating function for a cluster beginning with f. But by ignoring the first single letter, we allow for the fact that we may want to attach a cluster to the end of something else - this lets us fill in the probability of f[1] as we need to. #2.To account for the fact that in a cluster we are attaching a word to the end of a different word, but the weights carry over, I have added some extra terms to coe. #3.We will eventually want to attach things to the ends of clusters, but to do this we have to know what the last letter of a given cluster is. I accomplish this by adding a sort of flag, End, to the end of each cluster. Later we will remove that, and replace it with the appropriate weights (which will change depending on what exactly will follow the cluster). ProbDoubleGJclusters:=proc(A,F,t,W,M,End) local eq,f,g,n,e,S,s,coe,var; eq:={}; for f in F do n:=nops(f); #Initializes e with the weight of the cluster consisting of the single forbidden word f. e:=W[f]+t^(n)*mul(M[(f[i],f[i+1])],i=1..n-1)*End[f[n]]: for g in F do S:=Overlap(f,g): coe:=0; #Adds in all the potentially overlapping other forbidden words, that could extend the cluster. for s in S do: coe:=coe + t^(n-s)*mul(M[(f[i],f[i+1])],i=1..(n-s)); od: e:=e+coe*W[g]: od: eq:=eq union {e}; od: var:={seq(W[f],f in F)}; solve(eq,var); end: ################################################################ #ProbDoubleGJmarked(A,F,t,X,M): The equation for X[a]: (almost) the generating function in t, for marked words that begin with the letter "a". This is a major structural change from the original SingleGJ program. #Technically, M[ID,a]*X[a] is the generating function for marked words beginning with the letter "a", but since we will want to attach this to the end of other objects it's helpful to omit the initial probability, and fill it in later as we need to. #Marked words beginning with "a" have a nice recursive decomposition, which is where the equations come from. Let M(a) be the set of marked words beginning with "a". Then we have M(a)={the single letter "a"} + {words beginning with a, a is not part of a cluster} + {clusters where the first forbidden word begins with "a"} #The first two sets are fairly easy to define recursively with X[a], the third is a bit more complicated. Alph breaks up the forbidden words, based on their first letter. We almost get that the third set is the generating functions for the cluster multiplied by X[c,d], but that assumes that at least one letter follows the cluster. The last few lines of the eq variable are adding in the case where our word consists of a single cluster. ProbDoubleGJmarked:=proc(A,F,t,X,M) local cluster,a,b,c,eq,var,Alph,f,End,W: cluster:=ProbDoubleGJclusters(A,F,t,W,M,End); #breaks up the forbidden words based on their first letter. for a in A do Alph[a]:={}; for f in F do if f[1]=a then Alph[a]:=Alph[a] union {f}; fi: od; od; #encodes the recursive formula for X[a]. The first two terms correspond to the single digraph "a" and the case when we can peel the letter a off without touching a cluster. #Note how we aren't multiplying anything by the initial probabilities! eq:={seq( X[a]=t+ add(M[a,b]*t*X[b],b in A)+ #this term corresponds to the case where we have a cluster beginning with "a", followed by a marked word with length at least one. add(add( subs( {seq(End[c]=M[(c,b)],c in A)},subs(cluster,W[f]))*X[b] ,b in A) ,f in Alph[a])+ #this terms corresponds to the case where the words is just a single cluster. add( subs( {seq(End[c]=1, c in A)},subs(cluster,W[f])) ,f in Alph[a]) ,a in A)}; var:={seq(X[a], a in A)}; solve(eq,var); end: ################################################################ #ProbDoubleGJ(A,F,t,M): The Goulden-Jacskon Cluster method for an alphabet A and a set of dirty words F in the variable t, with known digraph probabilities (as well as monograph probabilities) encoded in a table M, designed for probability applications. #There are two types of entries in M: #M[a,b] is the probability that the letter b follows the letter a. #M[ID,a] is the probability of the letter a. #Given an array, MakeTable will correctly initialize this table. ProbDoubleGJ:=proc(A,F,t,M) local marked,X,a; marked:=ProbDoubleGJmarked(A,F,t,X,M); #To put everything together, we use the fact that if a marked word isn't empty, it has some first letter (and so is covered by X[a] for some a). Note that we multiply the initial probabilities back in here, as well. normal(1+add(subs(marked,X[a]*M[ID,a]),a in A)); end: ########################################################################### #Double Goulden Jackson with digraph probabilities, initial and final monograph probabilites ########################################################################### #ProbDoubleGJIF uses the same cluster generating functions as ProbDoubleGJ - we do not recopy them here. #The only difference between this program and ProbDoubleGJmarked is putting in the final monograph frequencies. A marked word must either end with a single letter, not part of a cluster (hence we multiply the first term in the equation by a final letter probability), or it ends with a cluster. We can handle the cluster case with a clever substitution and the End[a]. ProbDoubleGJmarkedIF:=proc(A,F,t,X,M) local cluster,a,b,c,eq,var,Alph,f,End,W: cluster:=ProbDoubleGJclusters(A,F,t,W,M,End); #breaks up the forbidden words based on their first letter. for a in A do Alph[a]:={}; for f in F do if f[1]=a then Alph[a]:=Alph[a] union {f}; fi: od; od; #encodes the recursive formula for X[a]. The first two terms correspond to the single digraph "a" and the case when we can peel the letter a off without touching a cluster. #Note that we add the final probability here. eq:={seq( X[a]=t*M[Fin,a]+ add(M[a,b]*t*X[b],b in A)+ #this term corresponds to the case where we have a cluster beginning with "a", followed by a marked word with length at least one. add(add( subs( {seq(End[c]=M[(c,b)],c in A)},subs(cluster,W[f]))*X[b] ,b in A) ,f in Alph[a])+ #this terms corresponds to the case where the words is just a single cluster. #Note that we include the final probability here. add( subs( {seq(End[c]=M[Fin,c], c in A)},subs(cluster,W[f])) ,f in Alph[a]) ,a in A)}; var:={seq(X[a], a in A)}; solve(eq,var); end: ################################################################ #ProbDoubleGJIF (A,F,t,M): The Goulden-Jacskon Cluster method for an alphabet A and a set of dirty words F in the variable t, with known digraph probabilities (as well as monograph probabilities) encoded in a table M; a variant of ProbDoubleGJ. #There are three types of entries in M: #M[a,b] is the probability that the letter b follows the letter a, given an occurrence of a. #M[ID,a] is the probability that a word begins with the letter a. #M[FIN,a] is the probability that a word ends with the letter a. #Given an array, MakeTable will NOT correctly initialize this table. Instead, use MakeTableF. ProbDoubleGJIF:=proc(A,F,t,M) local marked,X,a; marked:=ProbDoubleGJmarkedIF(A,F,t,X,M); #To put everything together, we use the fact that if a marked word isn't empty, it has some first letter (and so is covered by X[a] for some a. normal(1+add(subs(marked,X[a]*M[ID,a]),a in A)); end: ############################################################################ #ProbDoubleGJST - The s,t analog for ProbDoubleGJ ################################################################## #ProbDoubleGJclustersST(A,F,s,t,W,M,End): This function is the s,t-analog of the ProbDoubleGJclusters function. See comments in that function for details about the overall workings of the function. #The changes made in this version are: #(1) the local variable s from that function has been renamed z in this one. #(2) s has been added as a parameter and appears twice in the code. #See comments to ProbDoubleGJST for more info on the role of s. ProbDoubleGJclustersST:=proc(A,F,s,t,W,M,End) local eq,f,g,n,e,S,z,coe,var; eq:={}; for f in F do n:=nops(f); #Initializes e with the weight of the cluster consisting of the single forbidden word f. e:=W[f]+(-s+1)*t^(n)*mul(M[(f[i],f[i+1])],i=1..n-1)*End[f[n]]: for g in F do S:=Overlap(f,g): coe:=0; #Adds in all the potentially overlapping other forbidden words, that could extend the cluster. for z in S do: coe:=coe + t^(n-z)*mul(M[(f[i],f[i+1])],i=1..(n-z)); od: e:=e+(-s+1)*coe*W[g]: od: eq:=eq union {e}; od: var:={seq(W[f],f in F)}; solve(eq,var); end: ################################################################ #ProbDoubleGJmarkedST(A,F,s,t,X,M): This function is identical to the function ProbDoubleGJmarked except for the addition of s as a parameter and the fact that it calls ProbDoubleGJclustersST instead of ProbDoubleGJclusters. #For more info on this function see the comments in ProbDoubleGJmarked. #For more info about the role of s, see the comments in ProbDoubleGJST. ProbDoubleGJmarkedST:=proc(A,F,s,t,X,M) local cluster,a,b,c,eq,var,Alph,f,End,W: cluster:=ProbDoubleGJclustersST(A,F,s,t,W,M,End); #breaks up the forbidden words based on their first letter. for a in A do Alph[a]:={}; for f in F do if f[1]=a then Alph[a]:=Alph[a] union {f}; fi: od; od; #encodes the recursive formula for X[a]. The first two terms correspond to the single digraph "a" and the case when we can peel the letter a off without touching a cluster. #Note how we aren't multiplying anything by the initial probabilities! eq:={seq( X[a]=t+ add(M[a,b]*t*X[b],b in A)+ #this term corresponds to the case where we have a cluster beginning with "a", followed by a marked word with length at least one. add(add( subs( {seq(End[c]=M[(c,b)],c in A)},subs(cluster,W[f]))*X[b] ,b in A) ,f in Alph[a])+ #this terms corresponds to the case where the words is just a single cluster. add( subs( {seq(End[c]=1, c in A)},subs(cluster,W[f])) ,f in Alph[a]) ,a in A)}; var:={seq(X[a], a in A)}; solve(eq,var); end: ################################################################ #ProbDoubleGJST(A,F,s,t,M): This function is nearly identical to the function ProbDoubleGJ, except that it includes the variable s as an input parameter. The function also calls the ST-version of ProbDoubleGJmarked. #In the Taylor expansion of the generating function produced as output, the coefficient of s^m*t^n will represent the probability of an n-letter word containing exactly m "mistakes". #For more information about this function, see the comments in ProbDoubleGJ. ProbDoubleGJST:=proc(A,F,s,t,M) local marked,X,a; marked:=ProbDoubleGJmarkedST(A,F,s,t,X,M); #To put everything together, we use the fact that if a marked word isn't empty, it has some first letter #and so is covered by X[a] for some a. normal(1+add(subs(marked,X[a]*M[ID,a]),a in A)); end: ############################################################################ ############################################################################ ############################################################################ ############################################################################ #Programs for Goulden Jackson, taking into account trigraph, digraph, and monograph weights ############################################################################### #MakeTable2 takes in a n^2+n+1 by n array (where n is the length of the alphabet), #initializes a table that can be used in TripleGJ and all the three-letter applications. #The array must be in a specific form: #The columns of the table correspond to the letters of the alphabet (in the same order as in A), and the the first n^2 rows correspond to the letter pairs, in lexicographic order. The entry in the column corresponding to c and the row corresponding to the digraph "ab" is the weight of the trigraph "abc". #The next n rows correspond to the letters of the alphabet. The entry in the row corresponding to b (if b is the second letter in A, this is row n^2+2) and in the column corresponding to c is the weight of the digraph "bc". #The last row encodes the monograph weights. The entry in the n^2+n+1 row, column corresponding to "a", is the weight of the monograph "a". MakeTable2:=proc(M,A) local n: n:=nops(A); table([ #initializes trigraph weights: seq(seq(seq( (A[i],A[j],A[k]) = M[(i-1)*n+j,k], k=1..n),j=1..n),i=1..n), #initializes digraph weights: seq(seq( (ID,A[i],A[j])=M[n^2+i,j],j=1..n),i=1..n), #initializes monograph weights: seq( (ID,ID,A[i])=M[n^2+n+1,i], i=1..n) ]); end: ################################################################ #TripleGJ, the basic extension of Goulden Jackson (keeps track of all trigraph, digraph and monograph weights. ################################################################ #TripleGJclusters(A,F,t,W,M,End): The equation for W[f]: the generating function in t, for those clusters that start out with the dirty word f, taking into account trigraph, digraph, and monograph weights. #This is the similar to the original FindEq program, except words and initial word fragments now are weighted by which monographs, digraphs, and trigraphs they contain, in addition to their length. For example, the weight of CAT is now: t^3*M[ID,ID,c]*M[ID,ID,a]*M[ID,ID,t]*M[ID,c,a]*M[ID,a,t]*M[c,a,t]. #To deal with the fact that the weight of a letter may depend on the two letters before it, I have made two changes to FindEq: #1.To account for the fact that in a cluster we are attaching a word to the end of a different word, but the weights carry over, I have added some extra terms to coe. #2.We will eventually want to attach things to the ends of clusters, but to do this we have to know what the last letters of a cluster are. I accomplish this by adding a sort of flag, End, to the end of each cluster. Later we will remove that, and replace it with the appropriate weights (which will change depending on what exactly will follow the cluster). TripleGJclusters:=proc(A,F,t,W,M,End) local eq,var,f,g,i,j,k,n,s,S,coe,e; eq:={}; for f in F do n:=nops(f); #Initializes e with the weight of the cluster consisting of the single forbidden word f. e:=W[f]+t^n*mul(M[(f[i],f[i+1],f[i+2])],i=1..(n-2))*mul(M[ID,f[j],f[j+1]],j=1..(n-1)) *mul(M[ID,ID,f[k]],k=1..n)*End[f[n-1],f[n]]: for g in F do S:=Overlap(f,g): coe:=0; #Adds in all the potentially overlapping other forbidden words, that could extend the cluster. for s in S do: if s>1 then coe:=coe + t^(n-s)*mul(M[f[i],f[i+1],f[i+2]],i=1..(n-s))* mul(M[ID,f[j],f[j+1]],j=1..(n-s))*mul(M[ID,ID,f[k]],k=1..(n-s)); else coe:=coe + t^(n-s)*mul(M[f[i],f[i+1],f[i+2]],i=1..(n-s)-1)*M[f[n-s],f[n],g[2]]* mul(M[ID,f[j],f[j+1]],j=1..(n-s))*mul(M[ID,ID,f[k]],k=1..(n-s)); fi: od: e:=e+coe*W[g]: od: eq:=eq union {e}; od: var:={seq(W[f],f in F)}; solve(eq,var); end: ################################################################ #TripleGJmarked(A,F,t,X,M): The equation for X[a,b]: the generating function in t for marked words that begin with the digraph "ab". This is a major structural change from the original SingleGJ program. #Marked words beginning with ab have a nice recursive decomposition, which is where the #equations come from. Let M(ab) be the set of marked words beginning with "ab". Then we have M(ab)={the digraph "ab} + {words beginning with ab, a is not part of a cluster} + {clusters where the first forbidden word begins with "ab"} #The first two sets are fairly easy to define recursively with X[ab], the third is a bit more complicated. Alph breaks up the forbidden words, based on their first two letters. We almost get that the third set is the generating functions for the cluster multiplied by X[c,d], but that assumes that at least two letters follow the cluster. The last few lines of the eq variable are adding in the cases where we have a single cluster, or a cluster followed by a single letter. TripleGJmarked:=proc(A,F,t,X,M) local eq,var,a,b,c,d,p,q,f,Alph,End,cluster; cluster:=TripleGJclusters(A,F,t,W,M,End); #breaks up the forbidden words, based on their first two letters: for a in A do for b in A do Alph[a,b]:=select(FirstTwo,F,a,b); od: od: #encodes the recursive formula for X[a,b]. The first two terms correspond to the single digraph "ab" and the case when we can peel the letter a off without touching a cluster. eq:={seq(seq( X[a,b]= M[ID,a,b]*M[ID,ID,a]*M[ID,ID,b]*t^2+ add(M[ID,ID,a]*M[ID,a,b]*M[a,b,c]*t*X[b,c],c in A)+ #this term is the case where we begin with a cluster that begins with "ab", and then have at least two letters following that cluster (i.e., then we have another marked word beginning with "cd"). add(add(add( subs( seq(seq(End[p,q]=M[p,q,c]*M[q,c,d]*M[ID,q,c],q in A), p in A),subs(cluster,W[f]))*X[c,d] ,d in A), c in A) ,f in Alph[a,b])+ #this term adds in the case that our cluster, beginning "ab", was only followed by a single letter, q. add(add( subs( seq(seq(End[p,q]=M[p,q,c]*M[ID,q,c],p in A),q in A),subs(cluster,W[f]*t*M[ID,ID,c])) ,c in A),f in Alph[a,b])+ #this term adds in the case that our cluster, beginning "ab", was the end of the word as well as the beginning. add(subs( seq(seq(End[p,q]=1,p in A),q in A),subs(cluster,W[f]) ) ,f in Alph[a,b]) ,b in A),a in A)}; var:={seq(seq(X[a,b], b in A), a in A)}; solve(eq,var); end: ################################################################ #TripleGJ(A,F,t,M): The Goulden-Jacskon Cluster method for an alphabet A and a set of dirty words F in the #variable t, with known trigraph, digraph, and monograph weights encoded in a table M. #There are three types of entries in M: #M[a,b,c] is the weight of the trigraph "abc". #M[ID,a,b] is the weight of the digraph "ab". #M[ID,ID,a] is the weight of the monograph "a". #Given an array, MakeTable2 will correctly initialize this table. TripleGJ:=proc(A,F,t,M) local A1,F1,a,b,f,g,marked; A1:=A; F1:=F; #Running this program with single letter forbidden words is equivalent to running it with a smaller alphabet. Therefore our first step is to remove the single letter forbidden words. This enables us to assume that all forbidden words have length at least 2, which simplifies TripleGJclusters and TripleGJmarked. for f in F do if nops(f)=1 then A1:=Delete(A1,f[1]); F1:=remove(Contains,F1,f); fi: od; marked:=TripleGJmarked(A1,F1,t,X,M); #When we put things back together, we use the fact that every marked word is either the empty word, a singleton word, or begins with at least two letters (and so is covered by X[a,b] for some a and b). normal(1+ t*add(M[ID,ID,a], a in A1)+ add(add( subs(marked,X[a,b]) ,b in A1),a in A1) ); end: ############################################################################## #ProbTripleGJ, Triple Goulden Jackson, specifically set up for probability weights (keeps track of trigraph and initial digraph probabilities). ############################################################################## #ProbTripleGJclusters(A,F,t,W,M,End): The equation for W[f]: the generating function in t, for those clusters that start out with the dirty word f, taking into account known trigraph probabilities (as well as digraph and monograph probabilities). The weights are designed for probability applications. #This is the similar to the original FindEq program, except words and initial word fragments now are weighted by their initial digraph and trigraphs they contain, in addition to their length. For example, the weight of CAT is now: t^3*M[ID,c,a]*M[c,a,t]. #To deal with the fact that the probability of a letter depends on the two letters before it, I have made three changes to FindEq: #1.Technically, M[ID,f[1],f[2]]*W[f] is the generating function for a cluster beginning with f. But by ignoring the first digraph, we allow for the fact that we may want to attach a cluster to the end of something else - this lets us fill in the probability of f[1] and f[2] as we need to. #2.To account for the fact that in a cluster we are attaching a word to the end of a different word, but the probabilities carry over, I have added some extra terms to coe. #3.We will eventually want to attach things to the ends of clusters, but to do this we have to know what the last letters of a cluster are. I accomplish this by adding a sort of flag, End, to the end of each cluster. Later we will remove that, and replace it with the appropriate probabilities (which will change depending on what exactly will follow the cluster). ProbTripleGJclusters:=proc(A,F,t,W,M,End) local eq,var,f,g,i,n,s,S,coe,e; eq:={}; for f in F do n:=nops(f); #Initializes e with the weight of the cluster consisting of the single forbidden word f. e:=W[f]+t^n*mul(M[(f[i],f[i+1],f[i+2])],i=1..n-2)*End[f[n-1],f[n]]: for g in F do S:=Overlap(f,g): coe:=0; #Adds in all the potentially overlapping other forbidden words, that could extend the cluster. for s in S do: if s>1 then coe:=coe + t^(n-s)*mul(M[f[i],f[i+1],f[i+2]],i=1..(n-s)); else coe:=coe + t^(n-s)*mul(M[f[i],f[i+1],f[i+2]],i=1..(n-s)-1)*M[f[n-s],f[n],g[2]] fi: od: e:=e+coe*W[g]: od: eq:=eq union {e}; od: var:={seq(W[f],f in F)}; solve(eq,var); end: ################################################################ #ProbTripleGJmarked(A,F,t,X,M): The equation for X[a,b]: (almost) the generating function in t, for marked words that begin with the digraph "ab" (designed for probability applications). This is a major structural change from the original SingleGJ program. #Technically, M[ID,a,b]*X[a,b] is the generating function for marked words beginning with the digraph "ab", but since we will want to attach this to the end of other objects it's helpful to omit the first probabilities, and fill them in later as we need to. #Marked words beginning with ab have a nice recursive decomposition, which is where the equations come from. Let M(ab) be the set of marked words beginning with "ab". Then we have M(ab)={the digraph "ab} + {words beginning with ab, a is not part of a cluster} + {clusters where the first forbidden word begins with "ab"} #The first two sets are fairly easy to define recursively with X[ab], the third is a bit more complicated. Alph breaks up the forbidden words, based on their first two letters. We almost get that the third set is the generating functions for the cluster multiplied by X[c,d], but that assumes that at least two letters follow the cluster. The last few lines of the eq variable are adding in the cases where we have a single cluster, or a cluster followed by a single letter. ProbTripleGJmarked:=proc(A,F,t,X,M) local eq,var,a,b,c,d,p,q,f,Alph,End,cluster; cluster:=ProbTripleGJclusters(A,F,t,W,M,End); #breaks up the forbidden words, based on their first two letters: for a in A do for b in A do Alph[a,b]:={}; for f in F do if f[1]=a and f[2]=b then Alph[a,b]:=Alph[a,b] union {f}; fi: od: od: od: #encodes the recursive formula for X[a,b]. The first two terms correspond to the single digraph "ab" and the case when we can peel the letter a off without touching a cluster. #Note how we aren't multiplying anything by the initial probabilities! eq:={seq(seq( X[a,b]= t^2+ add(M[a,b,c]*t*X[b,c],c in A)+ #this term is the case where we begin with a cluster that begins with "ab", and then have at least two letters following that cluster (i.e., then we have another marked word beginning with "cd". add(add(add( subs( seq(seq(End[p,q]=M[p,q,c]*M[q,c,d],q in A), p in A),subs(cluster,W[f]))*X[c,d] ,d in A), c in A) ,f in Alph[a,b])+ #this term adds in the case that our cluster, beginning "ab", was only followed by a single letter, q. add(add( subs( seq(seq(End[p,q]=M[p,q,c],p in A),q in A),subs(cluster,W[f]*t)) ,c in A),f in Alph[a,b])+ #this term adds in the case that our cluster, beginning "ab", was the end of the word as well as the beginning. add(subs( seq(seq(End[p,q]=1,p in A),q in A),subs(cluster,W[f]) ) ,f in Alph[a,b]) ,b in A),a in A)}; var:={seq(seq(X[a,b], b in A), a in A)}; solve(eq,var); end: ################################################################ #ProbTripleGJ(A,F,t,M): The Goulden-Jacskon Cluster method for an alphabet A and a set of dirty words F in the variable t, with known trigraph probabilities (as well as digraph and monograph probabilities) encoded in a table M, designed for probability applications. #There are three types of entries in M: #M[a,b,c] is the probability that the letter c follows the digraph "a,b". #M[ID,a,b] is the probability that a word contains the digraph "a,b". #M[ID,ID,a] is the probability that a word contains the monograph "a". #Given an array, MakeTable2 will correctly initialize this table. ProbTripleGJ:=proc(A,F,t,M) local A1,F1,a,b,f,g,marked; A1:=A; F1:=F; #Running this program with single letter forbidden words is equivalent to running it with a smaller alphabet. Therefore our first step is to remove the single letter forbidden words. This enables us to assume that all forbidden words have length at least 2, which simplifies ProbTripleGJclusters and ProbTripleGJmarked. for f in F do if nops(f)=1 then A1:=Delete(A1,f[1]); F1:=remove(Contains,F1,f[1]); fi: od; marked:=ProbTripleGJmarked(A1,F1,t,X,M); #When we put things back together, we use the fact that every marked word is either the empty word, a singleton word, or begins with at least two letters (and so is covered by X[a,b] for some a and b). normal(1+ t*add(M[ID,ID,a], a in A1)+ add(add( subs(marked,X[a,b]*M[ID,a,b]) ,b in A1),a in A1) ); end: ################################################################ #Helper Functions, called by TripleGJ and ProbTripleGJ ################################################################ #Given a list A, and an element a, Positions(A,a) returns a list (in increasing order) of indicies i, such that A[i]=a. Positions:=proc(A,a) local i, spot; spot:=[]; for i from 1 to nops(A) do if A[i]= a then spot:=[op(spot), i]; fi: od: spot; end: ################################################################ #Given a list A, and an element a, Delete(A,a) removes all the occurrences of a from A. For example, Delete([1,2,3,1,2],2) should return [1,3,1]. Delete:=proc(A,a) local i,B; B:=Positions(A,a); [op(1..B[1]-1,A), seq( op(B[i]+1..B[i+1]-1,A), i=1..nops(B)-1), op(B[nops(B)]+1..nops(A),A)]; end: ################################################################ #Given two lists, S and w, Contains returns true if S contains w as a contiguous subword, and false otherwise. Contains:=proc(S,w) local n,i; n:=nops(w); for i from 1 to nops(S)-n+1 do if evalb([seq(S[i+j],j=0..n-1)]=w) then return true; fi: od: return false; end: ###################################################### FirstTwo:=proc(W,a,b); if W[1]=a and W[2]=b then return true; fi: false: end: ##################################################################### ##################################################################### ##################################################################### ##################################################################### # Series programs- used to compute the first L terms in the Taylor expansions of the regular Goulden-Jackson methods #################################################################### #MakeTableS(L,A) inputs list of length n, where n is the length of the alphabet, A, and outputs a table that can be used by SingleGJseries. Each entry in the list corresponds to the weight of a single letter, and the list must be in the same order as A. MakeTableS:=proc(L,A): table([seq( A[i]=L[i] ,i=1..nops(A))]); end: ################################################################### #SingleGJseries(A,F,x,M,L) takes in an alphabet A (as a list), a set of forbidden words F, and implements the Goulden-Jackson Cluster method keeping track of the weights of individual letters. x is the variable of the function, M is a table with the single letter weights, and L is the number of terms we will compute. #This program follows closely program wGJseries, available at http://www.math.rutgers.edu/~zeilberg/programs.html #The changes I have made are simply to make it match the other series programs included here: I have changed variable names, condensed some for loops into map/select/remove statements, and modified the woverlapmat1 program. However, the underlying structure is identical. SingleGJseries:=proc(A,F,x,M,L) local V, a, MAXOVER, i, j, k, CU, NUM, pip, CUU, KHADAS, overlap, r, u, v, TOT, TOV, i1, khad, MISTAKES,SUMLETTERS, n: MISTAKES:=Hakten(F): TOT:=[]: NUM:=nops(MISTAKES): V:=overlapmat1(MISTAKES); #MAXOVER is the size of the largest overhang when one forbidden word overlaps with another. MAXOVER:=max(0,seq(seq(seq(V[u,v][i],i=1..nops(V[u,v])),u in MISTAKES),v in MISTAKES)); #CU will eventually hold the weights of clusters, and is used in constructing TOT #CU[i][f] will be the weight of all the clusters that begin with a specific forbidden word f, and have exactly k-i letters. CU:=[seq([seq(0,j=1..NUM)],i=1..MAXOVER)]; for k from 1 to L do KHADAS:=[]: for v in MISTAKES do n:=nops(v); if nops(v)=k then pip:=-mul(M[v[i]],i=1..nops(v))*x^k; else pip:=0: fi: for j from 1 to NUM do overlap:=V[v,op(j, MISTAKES)]: pip:=pip-add(x^(r)*mul(M[v[i]],i=1..r)*CU[r][j],r in overlap); od: KHADAS:=[op(KHADAS),expand(pip)]: od: #TOT takes the place of the cluster generating function. #TOT[k] corresponds to the weights of clusters that have EXACTLY k letters. TOT:=[op(TOT),convert(KHADAS,`+`)]: CUU:=[KHADAS,seq(op(i,CU),i=1..MAXOVER-1)]; CU:=CUU; od: #This is actually the heart of the program - once we have the cluster generating function TOT (which is the point of the first for loop), this is how we put it all together: SUMLETTERS:=add(M[a]*x, a in A): TOV:=[1]: for r from 1 to L do #The first term of khad corresponds to adding an arbitrary letter to the beginning of words of length r-1 to create a new word. #The second term corresponds to the case when adding a first letter creates a forbidden word. khad:=SUMLETTERS*op(nops(TOV),TOV)+add(op(r-i1+1,TOV)*op(i1,TOT),i1=2..r): TOV:=[op(TOV),expand(khad)]: od: TOV: end: ############################################################################# #DoubleGJseries(A,F,x,M,L) takes in an alphabet A (as a list) and a set of forbidden words F, and implements the Goulden-Jackson Cluster method, keeping track of the weights of digraphs (ordered letter pairs) and the weights of single letters. x is the variable of the function, M is a table with the weights, and L is the number of terms the program computes. #This program is analogous to DoubleGJ, except that it doesn't compute the whole generating function, merely the first L terms. So in structure it is actually much closer to SingleGJseries. #The main difference is that we break the words up based on their first letter. This means that we have to treat the empty word separately (it has no first letter), which is why TOV must be initialized in a new way. #We also separate the clusters based on their first word (this is actually inherent in SingleGJseries, but we get rid of it - in this case we want to keep it), and then break up the forbidden words according to their first letters, so we have control over the clusters. #The trick with END and substitution (to keep track of the ends of clusters) is from DoubleGJ - see the notes there for more information. DoubleGJseries:=proc(A,F,x,M,L) local W, a,b, MAXOVER, i, j, k, CU, NUM, pip, CUU, KHADAS, overlap, r, u, v, TOT, TOV, i1, khad, MISTAKES, Alph, f,n,END: MISTAKES:=Hakten(F): NUM:=nops(MISTAKES): W:=overlapmat1(MISTAKES); #MAXOVER is the size of the largest overhang when one forbidden word overlaps with another. MAXOVER:=max(0,seq(seq(seq(W[u,v][i],i=1..nops(W[u,v])),u in MISTAKES),v in MISTAKES)); #CU will eventually hold the weights of clusters, and is used in constructing TOT #CU[i][f] will be the weight of all the clusters that begin with a specific forbidden word f, and have exactly k-i letters. CU:=[seq([seq(0,j=1..NUM)],i=1..MAXOVER)]; for k from 1 to L do KHADAS:=[]: for v in MISTAKES do n:=nops(v); if nops(v)=k then pip:=mul(M[v[i],v[i+1]],i=1..n-1)*mul(M[ID,v[i]],i=1..n)*x^n*(-1)*END[v[k]]; else pip:=0: fi: for j from 1 to NUM do overlap:=W[v,op(j, MISTAKES)]: pip:=pip-add(x^(r)*mul(M[ID,v[i]],i=1..r) *mul(M[v[i],v[i+1]],i=1..r)*CU[r][j],r in overlap); od: KHADAS:=[op(KHADAS),expand(pip)]: od: #TOT takes the place of the cluster generating function. #TOT[f,k] corresponds to the weights of clusters that begin with the forbidden word f and have EXACTLY k letters. for i from 1 to NUM do TOT[MISTAKES[i],k]:=KHADAS[i]: od: CUU:=[KHADAS,seq(op(i,CU),i=1..MAXOVER-1)]; CU:=CUU; od: #This breaks up the set of forbidden words into groups based onto their first letters. for a in A do Alph[a]:=select(First,MISTAKES,a); od; #This is actually the heart of the program - once we have the cluster generating function, TOT (which was the point of the whole first for loop), this is how we put it all together. TOV:=[[1],[seq(x*M[ID,a],a in A)]]: for r from 2 to L do #This term corresponds to adding every possible first letter to create an unmarked word khad:=seq( add(x*M[ID,a]*M[a,A[i]]*TOV[nops(TOV)][i],i=1..nops(A)) #This term corresponds to the case when we add a new first letter and create a marked word, where the cluster has < r letters. +add(add(add( subs({seq(END[b]=M[b,A[j]],b in A)},TOT[f,i1])*TOV[r-i1+1][j], j=1..nops(A)),f in Alph[a]),i1=2..r-1) #This term corresponds to the case when we add a new first letter and create a marked word that is one cluster of length exactly r. +add(subs({seq( (END[b]=1) ,b in A)},TOT[f,r]),f in Alph[a]) ,a in A); TOV:=[op(TOV),[khad]]: od: [seq( convert(TOV[i],`+`),i=1..L+1)]: end: ############################################################################# #ProbDoubleGJseries(A,F,x,M,L) takes in an alphabet A (as a list) and a set of forbidden words F, and implements the Goulden-Jackson Cluster method, keeping track of the weights of digraphs (ordered letter pairs) and the weights of single letters. x is the variable of the function, M is a table with the weights, and L is the number of terms the program computes. #This program is analogous to ProbDoubleGJ, except that it doesn't compute the whole generating function, merely the first L terms. So in structure it is actually much closer to DoubleGJseries. #The main difference from DoubleGJseries is that we only want to include the weight of the first single letter. This is done by leaving it off entirely for most of the program, and multiplying it in at the end. #Although the structure is very different, this modification to get from DoubleGJseries to ProbDoubleGJseries is basically the same adjustment we used to get from DoubleGJ to ProbDoubleGJ. ProbDoubleGJseries:=proc(A,F,x,M,L) local W, a,b, MAXOVER, i, j, k, CU, NUM, pip, CUU, KHADAS, overlap, r, u, v, TOT, TOV, i1, khad, MISTAKES, Alph, f,n,END: MISTAKES:=Hakten(F): NUM:=nops(MISTAKES): W:=overlapmat1(MISTAKES); #MAXOVER is the size of the largest overhang when one forbidden word overlaps with another. MAXOVER:=max(0,seq(seq(seq(W[u,v][i],i=1..nops(W[u,v])),u in MISTAKES),v in MISTAKES)); #CU will eventually hold the weights of clusters, and is used in constructing TOT #CU[i][f] will be the weight of all the clusters that begin with a specific forbidden word f, and have exactly k-i letters. CU:=[seq([seq(0,j=1..NUM)],i=1..MAXOVER)]; for k from 1 to L do KHADAS:=[]: for v in MISTAKES do n:=nops(v); if nops(v)=k then pip:=mul(M[v[i],v[i+1]],i=1..n-1)*x^n*(-1)*END[v[k]]; else pip:=0: fi: for j from 1 to NUM do overlap:=W[v,op(j, MISTAKES)]: pip:=pip-add(x^(r)*mul(M[v[i],v[i+1]],i=1..r)*CU[r][j],r in overlap); od: KHADAS:=[op(KHADAS),expand(pip)]: od: #TOT takes the place of the cluster generating function. #TOT[f,k] corresponds to the weights of clusters that begin with the forbidden word f and have EXACTLY k letters. for i from 1 to NUM do TOT[MISTAKES[i],k]:=KHADAS[i]: od: CUU:=[KHADAS,seq(op(i,CU),i=1..MAXOVER-1)]; CU:=CUU; od: #This breaks up the set of forbidden words into groups based onto their first letters. for a in A do Alph[a]:=select(First,MISTAKES,a); od; #This is actually the heart of the program - once we have the cluster generating function TOT (which is the point of the first big for loop), this is how we put it all together: TOV:=[[1],[seq(x,a in A)]]: for r from 2 to L do #The first term corresponds to adding every possible first letter to words of length r-1 to form words of length r. khad:=seq( add(x*M[a,A[i]]*TOV[nops(TOV)][i],i=1..nops(A)) #This term corresponds to the case when we add a new first letter and create a marked word, where the cluster has < r letters. +add(add(add( subs({seq(END[b]=M[b,A[j]],b in A)},TOT[f,i1])*TOV[r-i1+1][j], j=1..nops(A)),f in Alph[a]),i1=2..r-1) #This term corresponds to the case when we add a new first letter and create a marked word that is one cluster of length exactly r. +add(subs({seq( (END[b]=1) ,b in A)},TOT[f,r]),f in Alph[a]) ,a in A); TOV:=[op(TOV),[khad]]: od: [1,seq(normal(add( TOV[i][j]*M[ID,A[j]],j=1..nops(A))),i=2..L+1)]: end: ############################################################################# #TripleGJseries(A,F,x,M,L) takes in an alphabet A (as a list) and a set of forbidden words F, and implements the Goulden-Jackson Cluster method, keeping track of the weights of trigraphs, digraphs and single letters. x is the variable of the function, M is a table with the weights, and L is the number of terms the program computes. #This program is analogous to TripleGJ, except that it doesn't compute the whole generating function, merely the first L terms. So in structure it is actually much closer to DoubleGJseries. #The main difference is that we break the words up based on their first two letter. This means that we have to treat the empty word and one letter words separately, which is why TOV must be initialized in a new way. This also contributes to the number of different cases we have to consider in the definition of khadas. #We separate the clusters based on their first word (this is actually inherent in SingleGJseries, but we get rid of it - in this case we want to keep it), and then break up the forbidden words according to their first two letters, so we have control over the clusters. #The trick with END and substitution (to keep track of the ends of clusters) is from TripleGJ - see the notes there for more information. #TripleGetTOT is a separate program simply for ease in reading and debugging - it is exactly the trigraph analog of the first large for loop in SingleGJseries and DoubleGJseries. TripleGJseries:=proc(alphabet,F,x,M,L) local W,A, a,b,c,d, MAXOVER, i, j,l, k, CU, NUM, pip, CUU, KHADAS, overlap, r, u, v, TOT, TOV, i1, khad, MISTAKES, Alph,num, f,h,n,END,maybe,Set,g: A:=alphabet; MISTAKES:=F; #Removes 1-letter forbidden words for f in F do if nops(f)=1 then A:=Delete(A,op(f)); MISTAKES:=remove(Contains, MISTAKES, f); fi: od: #This will be used to initialize TOV, and ensure that TOV doesn't include forbidden words of length 2. maybe:=[seq( [seq( x^2*M[ID,ID,A[i]]*M[ID,ID,A[j]]*M[ID,A[i],A[j]], j=1..nops(A))],i=1..nops(A))]; for f in MISTAKES do if nops(f)=2 then i:=op(Positions(A,f[1])); j:=op(Positions(A,f[2])); maybe:=[op(1..i-1,maybe),[op(1..j-1,maybe[i]),0,op(j+1..nops(A),maybe[i])],op(i+1..nops(A),maybe)]; fi: od: TOV:=[[1],[seq(x*M[ID,ID,a],a in A)],[seq(seq(maybe[i][j],j=1..nops(A)),i=1..nops(A))]]; #Separates the forbidden words into groups based on their first two letters. for a in A do for b in A do Alph[a,b]:=map2(Positions,MISTAKES,select(FirstTwo,MISTAKES,a,b)); od; od; NUM:=nops(MISTAKES): TOT:=TripleGetTOT(A,MISTAKES,x,M,L,END); for r from 3 to L do khad:=[seq(seq( #This term corresponds to adding an arbitrary first letter to words of length r-1. add(x*M[ID,ID,A[k]]*M[ID,A[k],A[l]]*M[A[k],A[l],A[i]]*TOV[nops(TOV)][(l-1)*nops(A)+i] ,i=1..nops(A)) #This term corresponds to the case where adding a first letter creates a cluster, and the cluster has length < r-1 +add(add(add(add( subs({seq(seq(END[c,d]=M[c,d,A[i]]*M[d,A[i],A[j]]*M[ID,d,A[i]],d in A),c in A)},TOT[i1][op(Alph[A[k],A[l]][num])]) *TOV[r-i1+1][(i-1)*nops(A)+j] ,j=1..nops(A)),i=1..nops(A)),num=1..nops(Alph[A[k],A[l]])),i1=2..r-2) #This term corresponds to the case where adding a first letter creates a cluster, and the cluster has length = r-1 +add(add( subs({seq(seq( (END[c,d]=M[c,d,A[i]]*M[ID,d,A[i]]), c in A), d in A)},x*M[ID,ID,A[i]]*TOT[r-1][op ( Alph[A[k],A[l]][num])]) ,num=1..nops( Alph[A[k],A[l]])),i=1..nops(A)) #This term corresponds to the case where adding a first letter creates a cluster, and the cluster has length = r +add(subs({seq(seq( (END[c,d]=1) ,c in A),d in A)},TOT[r][ op(Alph[A[k],A[l]][num])]),num=1..nops (Alph[A[k],A[l]])) ,l=1..nops(A)),k=1..nops(A))]; TOV:=[op(TOV),map(expand,khad)]: od: [seq( convert(TOV[i],`+`),i=1..L+1)]: end: ######################## #TripleGetTOT(A,MISTAKES,x,M,L,END): The trigraph analog of the first large for loop in SingleGJseries and DoubleGJseries. TripleGetTOT:=proc(A,MISTAKES,x,M,L,END) local NUM, MAXOVER, CU, u, v, k, KHADAS, pip, i, n, r, CUU, TOT; TOT:=[]; NUM:=nops(MISTAKES): #MAXOVER is the size of the largest overhang when one forbidden word overlaps with another. MAXOVER:=max(0,seq(seq( op( overlap1(u,v)),u in MISTAKES),v in MISTAKES)); #CU will eventually hold the weights of clusters, and is used in constructing TOT #CU[i][f] will be the weight of all the clusters that begin with a specific forbidden word f, and have exactly k-i letters. CU:=[seq([seq(0,j=1..NUM)],i=1..MAXOVER)]; for k from 1 to L do KHADAS:=[]: for v in MISTAKES do n:=nops(v); if nops(v)=k then pip:=mul(M[ID,v[i],v[i+1]],i=1..n-1)*mul(M[ID,ID,v[i]],i=1..n)* mul(M[v[i],v[i+1],v[i+2]],i=1..n-2)*x^n*(-1)*END[v[n-1],v[n]]; else pip:=0: fi: pip:=pip-add(add( x^(r)*mul(M[ID,ID,v[i]],i=1..r)*mul(M[ID,v[i],v[i+1]],i=1..r)* mul(M[v[i],v[i+1],v[i+2]],i=1..r-1)*M[v[r],MISTAKES[j][1],MISTAKES[j][2]]*CU[r][j] ,r in overlap1(v,MISTAKES[j])),j=1..NUM); KHADAS:=[op(KHADAS),expand(pip)]: od: TOT:=[op(TOT),KHADAS]; CUU:=[KHADAS,seq(op(i,CU),i=1..MAXOVER-1)]; CU:=CUU; od: TOT; end: ############################################################################# #ProbTripleGJseries(A,F,x,M,L) takes in an alphabet A (as a list) and a set of forbidden words F, and implements the Goulden-Jackson Cluster method, keeping track of the weights of digraphs (ordered letter pairs) and the weights of single letters. x is the variable of the function, M is a table with the weights, and L is the number of terms the program computes. #This program is analogous to ProbTripleGJ, except that it doesn't compute the whole generating function, merely the first L terms. So in structure it is actually much closer to TripleGJseries. #The main difference from TripleGJseries is that we only want to include the weight of the first digraph. This is done by leaving it off entirely for most of the program, and multiplying it in at the end. #Although the structure is very different, this modification to get from TripleGJseries to ProbTripleGJseries is basically the same adjustment we used to get from TripleGJ to ProbTripleGJ. #ProbTripleGetTOT is a separate program simply for ease in reading and debugging - it is exactly the trigraph analog of the first large for loop in ProbDoubleGJseries. #We cannot use TripleGetTOT here, because even though the structure is the same, the weights are different. ProbTripleGJseries:=proc(alphabet,F,x,M,L) local W,A, a,b,c,d, MAXOVER, i, j,l, k, CU, NUM, pip, CUU, KHADAS, overlap, r, u, v, TOT, TOV, i1, khad, MISTAKES, Alph,num, f,h,n,END,maybe,Set,g: A:=alphabet; MISTAKES:=F; #Removes 1 letter forbidden words for f in F do if nops(f)=1 then A:=Delete(A,op(f)); MISTAKES:=remove(Contains, MISTAKES, f); fi: od: #This will be used to initialize TOV, and ensure that TOV doesn't have any forbidden words of length 2. maybe:=[seq( [seq( x^2, j=1..nops(A))],i=1..nops(A))]; for f in MISTAKES do if nops(f)=2 then i:=op(Positions(A,f[1])); j:=op(Positions(A,f[2])); maybe:=[op(1..i-1,maybe),[op(1..j-1,maybe[i]),0,op(j+1..nops(A),maybe[i])],op(i+1..nops(A),maybe)]; fi: od: TOV:=[[1],[seq(x*M[ID,ID,a],a in A)],[seq(seq(maybe[i][j],j=1..nops(A)),i=1..nops(A))]]; #Breaks up the forbidden words into groups based on their first two letters for a in A do for b in A do Alph[a,b]:=map2(Positions,MISTAKES,select(FirstTwo,MISTAKES,a,b)); od; od; NUM:=nops(MISTAKES): #TOT will take the place of the cluster generating function. TOT:=ProbTripleGetTOT(A,MISTAKES,x,M,L,END); for r from 3 to L do khad:=[seq(seq( #This term corresponds to adding an arbitrary first letter to words of length r-1. x*add(M[A[k],A[l],A[i]]*TOV[nops(TOV)][(l-1)*nops(A)+i],i=1..nops(A)) #This term corresponds to the case where adding a first letter creates a cluster, and the cluster has length < r-1 +add(add(add(add( subs({seq(seq(END[c,d]=M[c,d,A[i]]*M[d,A[i],A[j]],d in A),c in A)},TOT[i1][op(Alph[A[k],A[l]][num])]) *TOV[r-i1+1][(i-1)*nops(A)+j] ,j=1..nops(A)),i=1..nops(A)),num=1..nops(Alph[A[k],A[l]])),i1=2..r-2) #This term corresponds to the case where adding a first letter creates a cluster, and the cluster has length = r-1 +add(add( subs({seq(seq( (END[c,d]=M[c,d,A[i]]), c in A), d in A)},x*TOT[r-1][op( Alph[A[k],A[l]][num])]) ,num=1..nops( Alph[A[k],A[l]])),i=1..nops(A)) #This term corresponds to the case where adding a first letter creates a cluster, and the cluster has length = r +add(subs({seq(seq( (END[c,d]=1) ,c in A),d in A)},TOT[r][ op(Alph[A[k],A[l]][num])]),num=1..nops (Alph[A[k],A[l]])) ,l=1..nops(A)),k=1..nops(A))]; TOV:=[op(TOV),map(expand,khad)]: od: if L<=1 then RETURN([seq(convert(TOV[i],`+`),i=1..L+1)]); else RETURN([seq( convert(TOV[i],`+`),i=1..2), seq(normal(add(add( TOV[i][(j-1)*nops(A)+k]*M[ID,A[j],A[k]], k=1..nops(A)),j=1..nops (A))),i=3..L+1)]): fi: end: ######################## #ProbTripleGetTOT(A,MISTAKES,x,M,L,END): The trigraph analog of the first large for loop in SingleGJseries and DoubleGJseries, but with weights designed for probability applications. ProbTripleGetTOT:=proc(A,MISTAKES,x,M,L,END) local NUM, MAXOVER, CU, u, v, k, KHADAS, pip, i, n, r, CUU, TOT; TOT:=[]; NUM:=nops(MISTAKES): #MAXOVER is the size of the largest overhang when one forbidden word overlaps with another. MAXOVER:=max(0,seq(seq( op( overlap1(u,v)),u in MISTAKES),v in MISTAKES)); #CU will eventually hold the weights of clusters, and is used in constructing TOT #CU[i][f] will be the weight of all the clusters that begin with a specific forbidden word f, and have exactly k-i letters. CU:=[seq([seq(0,j=1..NUM)],i=1..MAXOVER)]; for k from 1 to L do KHADAS:=[]: for v in MISTAKES do n:=nops(v); if nops(v)=k then pip:=mul(M[v[i],v[i+1],v[i+2]],i=1..n-2)*x^n*(-1)*END[v[n-1],v[n]]; else pip:=0: fi: pip:=pip-add(add( x^(r)* mul(M[v[i],v[i+1],v[i+2]],i=1..r-1) *M[v[r],MISTAKES[j][1],MISTAKES[j][2]]*CU[r][j] ,r in overlap1(v,MISTAKES[j])),j=1..NUM); KHADAS:=[op(KHADAS),expand(pip)]: od: TOT:=[op(TOT),KHADAS]; CUU:=[KHADAS,seq(op(i,CU),i=1..MAXOVER-1)]; CU:=CUU; od: TOT; end: ######################################################### #Helper Functions, called by the series programs ######################################################### #overlap1(u,v) finds the list of places i in v s.t. v[1],...,v[i] is a suffix of u, accompanied by the length of the leftover prefix of u. overlap1:=proc(u,v) local i,gu: gu:=[]: for i from 1 to nops(v)-1 do if nops(u)-i+1>1 and [op(1..i,v)]=[op(nops(u)-i+1..nops(u),u)] then gu:=[op(gu),nops(u)-i]: fi: od: gu: end: ######################################################### #Given a set of mistakes, overlapmat1(MISTAKES) is a table A such that A[u,v] is overlap1(u,v) overlapmat1:=proc(MISTAKES) local A,u,v; A:=table([seq(seq( (u,v)=overlap1(u,v), v in MISTAKES), u in MISTAKES)]); end: ######################################################### #superfluous(B,w) given a set of bad words, and one of its members, finds whether it is superfluous, that is, whether it is a factor of another one. superfluous:=proc(B,w) local i,v: if not type(B,set) or not type(w,list) then ERROR(`Bad input`): fi: if not member(w,B) then ERROR(`Second argument must belong to first arg.`): fi: for i from 1 to nops(B) do v:=op(i,B): if v<>w and Contains(v,w) then RETURN(true): fi: od: false: end: ######################################################### hakten:=proc(B) local w,i: for i from 1 to nops(B) do w:=op(i,B): if superfluous(B,w) then RETURN(B minus {w}): fi: od: B: end: ######################################################### #Hakten(B) removes all superfluous words in a set B of forbidden words. #For example, there is no need to have both 12221 and 1222 as forbidden subwords - any word that avoids 1222 must also avoid 12221. Hakten:=proc(B) local B1,B2: B1:=B: B2:=hakten(B1): while B1<>B2 do B1:=B2: B2:=hakten(B2): od: B2: end: ######################################################################## First:=proc(W,a); if W[1]=a then return true: fi: false: end: ################################################################# ################################################################# ################################################################## ################################################################## #Recursive Methods for Subword Avoidance - alternatives to the Goulden-Jackson Cluster Method programs ################################################################## #RecursiveDouble takes in an alphabet A1, a set of forbidden words F1, a variable t and a table of single and double letter weights, M. #RecursiveDouble returns the generating function in t that is the weight enumerator of the words on A1 that avoid words in F1 as subwords. #With the same input, this function should have the same output as DoubleGJ. In addition, any table with frequencies or wieghts that can be used by DoubleGJ can be used here. See MakeTable for more information on what this table must be. RecursiveDouble:=proc(A1,F1,t,M) local A,F,f,W,a,eq,var,S,ToDo,Done,elt,Sets,G,g; A:=A1; F:=F1; #Gets rid of one letter forbidden words in F1 by removing them from the alphabet. for f in F1 do if nops(f)=1 then A:=Delete(A,op(f)); F:=remove(Contains, F, f); fi: od: #Sets up the system of equations and variables to be solved. Note that we program in W[a,{[a]}]=0 directly (there are no words that simultaneously begin with a yet avoid "a" as an initial subword). eq:={seq(W[[a,{[a]}]],a in A)}; var:={seq(W[[a,{}]],a in A),seq(W[[a,{[a]}]],a in A)}; ToDo:={seq([a,{}],a in A)}; Done:={seq([a,{[a]}],a in A)}; while ToDo<>{} do for elt in ToDo do Sets:=table([seq(a={}, a in A)]); #Guillotine(F,a) returns the set of initial forbidden subwords we create when we chop off the first letter a. G:=Guillotine({op(F),op(elt[2])},elt[1]); for g in G do Sets[g[1]]:=Sets[g[1]] union {g}; od: Done:=Done union {elt}; ToDo:=ToDo union {seq([a,Sets[a]],a in A)} minus Done; var:=var union {seq(W[[a,Sets[a]]],a in A)}; eq:= eq union {W[elt]=M[ID,elt[1]]*t + add(M[ID,elt[1]]*M[elt[1],a]*t*W[[a,Sets[a]]],a in A)}; od: od: S:=solve(eq,var); 1+add( subs(S,W[[a,{}]]), a in A); end: ################################################################# #RecursiveProbDouble takes in an alphabet A1, a set of forbidden words F1, a variable t and a table of single and double letter weights, M. #RecursiveDouble returns the generating function in t that is the weight enumerator of the words on A1 that avoid words in F1 as subwords. #With the same input, this function should have the same output as ProbDoubleGJ. In addition, any table with frequencies or wieghts that can be used by DoubleGJ can be used here. See MakeTable for more information on what this table must be. RecursiveProbDouble:=proc(A,F,t,M) local W,a,eq,var,S,ToDo,Done,elt,Sets,G,g; #Sets up the system of equations and variables to be solved. Note that we program in W[a,{[a]}]=0 directly (there are no words that simultaneously begin with a yet avoid "a" as an initial subword). eq:={seq(W[[a,{[a]}]],a in A)}; var:={seq(W[[a,{}]],a in A),seq(W[[a,{[a]}]],a in A)}; ToDo:={seq([a,{}],a in A)}; Done:={seq([a,{[a]}],a in A)}; while ToDo<>{} do for elt in ToDo do Sets:=table([seq(a={}, a in A)]); #Guillotine(F,a) returns the set of initial forbidden subwords we create when we chop off the first letter a. G:=Guillotine({op(F),op(elt[2])},elt[1]); for g in G do Sets[g[1]]:=Sets[g[1]] union {g}; od: Done:=Done union {elt}; ToDo:=ToDo union {seq([a,Sets[a]],a in A)} minus Done; var:=var union {seq(W[[a,Sets[a]]],a in A)}; eq:= eq union {W[elt]=t + add(M[elt[1],a]*t*W[[a,Sets[a]]],a in A)}; od: od: S:=solve(eq,var); 1+add( subs(S,W[[a,{}]]*M[ID,a]), a in A); end: ################################################################### #Guillotine(F,a) takes in a set of forbidden words (on some alphabet), and a letter "a" (also assumed to be in this alphabet. #Guillotine(F,a) returns a set of words that are the forbidden subwords we must avoid as initial subwords, if we want to be able to add "a" as the new first letter without creating forbidden subwords. This is important in the recursive programs RecursiveDouble and RecursiveProbDouble. Guillotine:=proc(F,a) local G,f,g,h,Extra: G:={}; for f in F do if nops(f)>1 and f[1]=a then h:=[op(2..nops(f),f)]; #Once the subwords are created, we want to make sure that we end up with the smallest set possible. We do this by arguing that if a word avoids 112 as an initial subword, then it must also avoid 1122 as an initial subword. We're then free to remove the extra subwords - this reduces the number of equations we need in the Recursive programs. Extra:={}; for g in G do if [op(1..min(nops(g),nops(h)),g)]=[op(1..min(nops(g),nops(h)),h)] then Extra:=Extra union {g}; h:=[op(1..min(nops(g),nops(h)),h)] fi: od: G:=G union {h} minus Extra; fi: od: G: end: ################################################################## ################################################################## ################################################################## ################################################################## #Test Programs - alternate methods to test the Goulden Jackson Programs. ################################################################## #CleanWords generates the OK words themselves, to verify that the Goulden Jackson programs are doing exactly what they ought to be doing. ################################################################## # IsSubWord(w,v): inputs a word w, and another word v, and outputs true if and only if v is a subword of w in the weak sense, i.e. contiguous subwords only. IsSubWord := proc(w,v) local i, offset: for offset from 0 to nops(w)-nops(v) do for i from 1 to nops(v) do if not w[offset+i]=v[i] then break: fi: if i=nops(v) then return(true): fi: od: od: return(false): end: ################################################################### #NoDirties(w,D): inputs a word w, and a set of "dirty" words D, and outputs true if and only if w does not contain any of these dirty words. NoDirties := proc(w,D) local d: for d in D do if IsSubWord(w,d) then return(false): fi: od: return(true): end: ################################################################## #CleanWords(A,F,n): inputs an alphabet A, and a set of dirty words F, and an integer n, and outputs the set of n-letter words in A, that avoid the dirty words of F CleanWords := proc(A,F,n) local a,T,t, c,candidates, clean_guys: if n=0 then return({[]}): fi: T:= CleanWords(A,F,n-1): candidates:= seq(seq([op(t),a], a in A), t in T): clean_guys:= {}: for c in candidates do if NoDirties(c,F) then clean_guys:= clean_guys union {c}: fi: od: return(clean_guys): end: ##################################################### # Empirical GJ Series programs: ##################################################### # EmpiricalSingleGJseries(A,F,V,L): Empirically finds the first L coefficients in the Taylor expansion of SingleGJ(A,F,t,V), given alphabet A, forbidden words F, and weights list V. # WARNING: This method is slow and should only be used for checking purposes!! EmpiricalSingleGJseries:= proc(A,F,M,L) local Words,n,w,thelist, coeff,i: thelist:= []; for n from 1 to L do coeff:=0: Words:= CleanWords(A,F,n): for w in Words do coeff:= coeff+mul(M[w[i]], i=1..n); od: thelist:= [op(thelist),coeff]: od: thelist: end: ##################################################### # EmpiricalDoubleGJseries(A,F,M,L): Empirically finds the first L coefficients in the Taylor expansion of DoubleGJ(A,F,t,M), given alphabet A, forbidden words F, and matrix of digraph and single letter weights, M. See MakeTable for more information on M. # WARNING: This method is slow and should only be used for checking purposes!! EmpiricalDoubleGJseries:= proc(A,F,M,L) local Words,n,w,thelist, coeff,i: thelist:= []; for n from 1 to L do coeff:=0: Words:= CleanWords(A,F,n): for w in Words do coeff:= coeff+mul(M[w[i],w[i+1]], i=1..n-1)*mul(M[ID,w[i]],i=1..n); od: thelist:= [op(thelist),coeff]: od: thelist: end: ##################################################### # EmpiricalProbDoubleGJseries(A,F,M,L): Empirically finds the first L coefficients in the Taylor expansion of ProbDoubleGJ(A,F,t,M), given alphabet A, forbidden words F, and matrix of digraph and single letter weights, M. See MakeTable for more information on M. # WARNING: This method is slow and should only be used for checking purposes!! EmpiricalProbDoubleGJseries:= proc(A,F,M,L) local Words,n,w,thelist, coeff,i: thelist:= []; for n from 1 to L do coeff:=0: Words:= CleanWords(A,F,n): for w in Words do if n=1 then coeff:= coeff + M[ID,w[1]]; else coeff:= coeff+mul(M[(w[i],w[i+1])],i=1..(n-1))*M[ID,w[1]]; fi: od: thelist:= [op(thelist),coeff]: od: thelist: end: ##################################################### # EmpiricalProbTripleGJseries(A,F,M,L): Empirically finds the first L coefficients in the Taylor expansion of ProbTripleGJ(A,F,t,M), given alphabet A, forbidden words F, and matrix of trigraph, digraph and single letter weights, M. See MakeTable2 for more information on M. # WARNING: This method is slow and should only be used for checking purposes!! EmpiricalTripleGJseries:= proc(A,F,M,L) local Words,n,w,thelist, coeff,i: thelist:= []; for n from 1 to L do coeff:=0: Words:= CleanWords(A,F,n): for w in Words do coeff:= coeff+mul(M[(w[i],w[i+1],w[i+2])],i=1..(n-2))*mul(M[ID,w[j],w[j+1]],j=1..(n-1)) *mul(M[ID,ID,w[k]],k=1..n); od: thelist:= [op(thelist),coeff]: od: thelist: end: ##################################################### # EmpiricalProbTripleGJseries(A,F,M,L): Empirically finds the first L coefficients in the Taylor expansion of ProbTripleGJ(A,F,t,M), given alphabet A, forbidden words F, and matrix of trigraph, digraph and single letter weights, M. See MakeTable2 for more information on M. # WARNING: This method is slow and should only be used for checking purposes!! EmpiricalProbTripleGJseries:= proc(A,F,M,L) local Words,n,w,thelist, coeff,i: thelist:= []; for n from 1 to L do coeff:=0: Words:= CleanWords(A,F,n): for w in Words do if n=1 then coeff:= coeff + M[ID,ID,w[1]]; elif n=2 then coeff:= coeff + M[ID,w[1],w[2]]; else coeff:= coeff+mul(M[(w[i],w[i+1],w[i+2])],i=1..(n-2))*M[ID,w[1],w[2]]; fi: od: thelist:= [op(thelist),coeff]: od: thelist: end: