From Pseudocode to Prototype

In my last practicum-related blog post, I adumbrated some of the preparatory procedures involved when setting out to develop a software application. You can read this post here. In my case, the application is a text differentiation tool (which highlights the differences between two text strings) for the new, revised version of the Versioning Machine, due to be rolled out sometime later this summer.

Now that I have drafted and revised my pseudocode to a point that I am relatively happy with, it is time to (hopefully) realise it as a working JavaScript prototype. Attempting to build the application directly inside the actual Versioning Machine source code would be extremely difficult. Therefore, careful not to put the cart before the horse, I decided that the best starting point would be to write some rudimentary HTML against which my JS can be more easily tested.

The first step is to create two HTML strings whose content I can change with my JavaScript code:

<html>
<body>
<p id="string1">The quick brown fox jumps over the lazy dog</p>

<p id="string2">The quick brown fox jumped over the lazy dog</p>
</body>
</html>

The two strings above, each enclosed in <p> tags and given its own unique id attribute, serve as proxies for the VM’s HTML sample files, with which I will hopefully be working at a later date, provided my prototype is in working order.

Once the HTML has been set up, it is now time to ‘get’ the correct HTML elements that we need to change with our JavaScript. This is currently the first step in my pseudocode.
In this instance, because the two strings each have an id attribute (namely, “string1” and “string2”), the best way to access these elements is to use JavaScript’s getElementById() method. Rather than simply accessing the two elements, however, I also need to create a variable in which to store each one. Note that getElementById() returns the element itself rather than its text; the text inside the element is available through its textContent property.

var String1 = document.getElementById("string1");

var String2 = document.getElementById("string2");

One thing I need to keep in mind, however, is that the VM code may not lend itself to being accessed through the getElementById() method, and so I may need to figure out another way of grabbing the appropriate elements if I were to begin implementing my JS code within the VM. All I am doing here is creating a working HTML and JS environment in order to experiment with different JS methods and to see if they may be of use for the application I intend to build.
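As a minimal sketch of this first step, the helper below fetches an element by its id and returns the text inside it. The name getTextById and the idea of passing the document in as a parameter are my own assumptions for illustration (passing it in makes the function easy to try outside a browser); they are not part of the VM code.

```javascript
// Returns the trimmed text inside the element with the given id,
// or null if no such element exists.
// `doc` is normally the browser's `document` object.
function getTextById(doc, id) {
  const element = doc.getElementById(id);
  if (element === null) {
    return null;
  }
  // textContent is the text between the opening and closing tags.
  return element.textContent.trim();
}

// In the browser this would be used as:
//   const text1 = getTextById(document, "string1");
//   const text2 = getTextById(document, "string2");
```

Checking for null matters because getElementById() returns null when no element has the requested id, which may well happen once the code is pointed at the VM's own HTML.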

Once I get the correct HTML elements and have stored them in variables, I will need to figure out how to “split” the strings by each word. This is because my text comparison tool needs to compare the two strings word by word. As yet, the text of each HTML string is held as a series of characters, not word by word. The computer at the moment sees each string as a single unit made up of characters such as letters and whitespace (which is also a character). In order to make the computer read the strings word for word, closer to how a human parses a sentence, we need to break, or split, each string up.

As I mentioned, whitespace is in and of itself a character, just like a letter. Therefore, if we split each string at every whitespace character, we capture each word that sits between two areas of whitespace. Don’t forget, though, that these words are still strings (series of characters); we have just found a way for the computer to recognise each of these groups of characters as distinct entities. Splitting a string by whitespace is very easy. All you have to do is call the string’s split() method with (" "). The " " between the parentheses, a single space inside quotation marks, is the separator: it tells the computer to break the string at every space. (An empty string, "", would instead split the string into its individual characters.) If, instead, we called split("b"), the string would be broken each time the computer comes across the letter “b”, and the “b” itself would be dropped from the result.

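If, then, I were to take the text of the first string and split it up by whitespace:

// textContent gives the text inside the <p>; trim() removes stray spaces at either end
var String1 = document.getElementById("string1").textContent.trim();
// split(" ") breaks the text at every space, giving an array of words
var myArray1 = String1.split(" ");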

The result would be that it is broken up, like this:

The, quick, brown, fox, jumps, over, the, lazy, dog

If I were to do String1.split("b"), it would then come out like this:

The quick ,rown fox jumps over the lazy dog

Employing the split() method, then, on both var String1 and var String2 means that we have two arrays of substrings, so to speak, made up of blocks of characters that are essentially words. As with the example of String1.split(" ") above, the resulting array of substrings is stored in the variable myArray1. We would follow exactly the same procedure for String2, and we could call that array myArray2.
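To make the behaviour of split() concrete, here is a small sketch that can be run on its own. The variable names are mine, chosen only for illustration.

```javascript
const sentence = "The quick brown fox jumps over the lazy dog";

const words = sentence.split(" ");   // one entry per word
const letters = sentence.split("");  // one entry per character
const aroundB = sentence.split("b"); // two pieces; the "b" itself is dropped

console.log(words.length);   // 9
console.log(letters.length); // 43
console.log(aroundB);        // [ 'The quick ', 'rown fox jumps over the lazy dog' ]

// Text taken from a web page often carries stray spaces. Trimming first and
// then splitting on runs of whitespace avoids empty entries in the array.
const tidy = "  The quick   brown fox ".trim().split(/\s+/);
console.log(tidy); // [ 'The', 'quick', 'brown', 'fox' ]
```

The last two lines show one reason a plain split(" ") may not be enough for real text: two spaces in a row would otherwise produce an empty string in the array.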

Arrays in JS are essentially a special type of object that can store multiple values in a single variable. Each of these values has an index number. Index numbers work the same way for every array: they begin at 0 and run up to one less than the array’s length. The indexes for myArray1, then, are:

Index [0] = The
Index [1] = quick
Index [2] = brown
Index [3] = fox
Index [4] = jumps
Index [5] = over
Index [6] = the
Index [7] = lazy
Index [8] = dog

For our other array, myArray2, which stores the words of String2 following the use of the split() method, the value at each index is identical, except for Index [4], which instead of ‘jumps’ is ‘jumped’.
The next phase, then, will require comparing these two arrays, myArray1 and myArray2, getting the computer to determine which index is not the same (Index [4]) and then HIGHLIGHTING this difference. The highlighting can be done very easily with the simple CSS background-color property. So, whenever the computer comes across a difference, it will need to somehow apply this background-color highlight to the word at the index in question.
At the moment, I need to do more research into how to compare two arrays, and then, once a difference is picked up, I need to find out how to take this ‘difference’ out of the array so that it can be highlighted. I intend to keep you all posted on my progress.
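One possible starting point, offered only as a rough sketch, is to compare the two arrays position by position and wrap any word that differs in a <span> carrying the background-color style. The function name highlightDifferences and the choice of yellow are my own assumptions. This is a naive comparison rather than a true diff algorithm: it assumes both strings have the same number of words, so a single inserted or deleted word would throw every later position out of step.

```javascript
// Compares two word arrays index by index and returns the second sentence
// as an HTML string, with each differing word wrapped in a highlighted <span>.
// Assumes the two arrays line up word for word.
function highlightDifferences(wordsA, wordsB) {
  const marked = wordsB.map(function (word, index) {
    if (word === wordsA[index]) {
      return word; // same word at the same index: leave it alone
    }
    return '<span style="background-color: yellow">' + word + "</span>";
  });
  return marked.join(" ");
}

const myArray1 = "The quick brown fox jumps over the lazy dog".split(" ");
const myArray2 = "The quick brown fox jumped over the lazy dog".split(" ");

console.log(highlightDifferences(myArray1, myArray2));
// The quick brown fox <span style="background-color: yellow">jumped</span> over the lazy dog
```

In the browser, the returned string could then be assigned to the innerHTML of the second <p> element so that the highlight appears on the page.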

Beginning the Text Comparison Application

Since I have been given the task of implementing a new text comparison feature for the Versioning Machine 5, it is clear that many questions are going to have to be answered before I can begin even thinking about writing the code and embedding it into the software.

In a VM survey carried out a few months ago, 17 out of 21 survey participants noted that they used the VM for the comparison of prose texts, 10 for poetry and 3 for drama.

The VM, however, lacks any capability for highlighting differences. In its current iteration it can only highlight the same line across each of the different witnesses. Its inability to show differences seems something of a deficit, given that users expressly stated in the survey that they would like to use the VM for longer texts in the future, and would therefore find a comparison function that highlights the differences within each witness more useful – as with Juxta’s heat map view.

Only in the last couple of weeks have I been set the task, by my mentor, of beginning development of the text comparison application. The lifecycle of such a task doesn’t begin, however, with coding. There is a lengthy incubation process in which I must break the task down into a list of steps. These steps can be thought of as something like detailed descriptions of each function that will be requisite for successful implementation of the app.

In order to begin my development task, I needed to start by writing what is commonly referred to in IT as pseudo-code. Pseudo-code is not functioning code in and of itself, but it is an important aid in web development and software engineering, and it helps the human developer to better understand the complexity and semantics of their software.

Being a novice at best at JavaScript, it has been extremely tricky for me to identify how I should begin drafting my code. For instance, in order to even begin creating a text comparison diff. (difference) utility, I will need to be able to grab the correct parts of the DOM elements from the Versioning Machine’s HTML files, as this is the only way that I will be able to get the code to parse through each line that has the same line number from each different witness. The pseudo-code, for the time being anyway, begins like this:

1. Loop over all lines of all witnesses

2. Get all witnesses for one line
a. get a class or line ID to find all readings of the same line across different witnesses
b. using the class or line ID, get the different readings from the HTML DOM
c. store the readings in either a JS array or JS object
d. return the JS array or object
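Steps 2c and 2d above could look something like the following sketch. It assumes the readings have already been pulled out of the DOM into plain records; the record shape (witness, lineId, text), the sample values and the function name groupReadingsByLine are all hypothetical and are not taken from the VM's actual markup.

```javascript
// Hypothetical input: one record per reading, as it might look once
// gathered from the DOM by class or line ID.
const readings = [
  { witness: "v1", lineId: "l1", text: "This is a test string" },
  { witness: "v2", lineId: "l1", text: "This is a string test" },
  { witness: "v1", lineId: "l2", text: "A second line" },
  { witness: "v2", lineId: "l2", text: "A 2nd line" }
];

// Groups the readings by line ID so that every witness's version of the
// same line ends up in one array, then returns the resulting object.
function groupReadingsByLine(allReadings) {
  const byLine = {};
  allReadings.forEach(function (reading) {
    if (!Object.prototype.hasOwnProperty.call(byLine, reading.lineId)) {
      byLine[reading.lineId] = []; // first reading seen for this line
    }
    byLine[reading.lineId].push({ witness: reading.witness, text: reading.text });
  });
  return byLine;
}

const readingsByLine = groupReadingsByLine(readings);
console.log(Object.keys(readingsByLine)); // [ 'l1', 'l2' ]
console.log(readingsByLine.l1.length);    // 2
```

An object keyed by line ID is used here so that all readings of one line can be looked up directly; an array of arrays would serve equally well if the lines only ever need to be processed in order.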

The pseudo-code above, however, is only one possible way to begin writing a diff. utility tool. There are already many text comparison tools and APIs available on the internet, like prettydiff <prettydif.com> or <stringjs.com>, each with their own idiosyncrasies and differing capabilities. Another issue I will have to chew over is whether it may be easier to embed one of these open-source APIs into the VM instead of building the tool from scratch.

The project team will need to discuss whether these APIs are capable of delivering what is needed for the VM. They may not, for instance, be able to compare four or five different texts side by side (usually they go as far as two). It may be better in the long run to develop a bespoke differentiation application, since it would be easier to debug as the VM passes through later iterations down the line.

Other ontologically-based problems at hand are questions such as: should the text comparison app deal with mark-up-based characters such as punctuation, or should it just highlight different words? Should the diff. comparison be viewed as a heat map, in which the colour highlighting each word becomes darker, say, to indicate where there is more variation across the separate witness documents? Or should there be a side-by-side view where one text is offset against another? If it were decided to highlight differences across three, four, or five different witnesses side by side at the same time, how would this work? Would the user need to be able to toggle on and off a base text of choice against which all the other witnesses are compared?

As I continue to revise and refine the pseudo-code I can begin to test it out, incrementally, by writing small pieces of code (like functions and methods) through which I can learn how to implement my ideas through the JavaScript syntax.

The meetings I have had with my mentor have been incredibly useful so far, as he has shown me how to create a workable JavaScript environment on my computer (which involves creating an HTML file through which JavaScript code can be tried and tested). My mentor has suggested that I write a small sample webpage consisting of four strings, something like:

<div class='apparatus v1'>The is a test string</div>
<div class='apparatus v2'>This is a test string.</div>
<div class='apparatus v3'>This is a string test</div>
<div class='apparatus v4'>This is, a string test.</div>

and through this I can learn how to write a simple import or ‘get strings’ function, or cycle through the <div> tags with for or do...while loops. Writing these functions will be the first step in learning how to get, grab and manipulate the HTML code for the Versioning Machine.
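A ‘get strings’ function of this kind might be sketched as follows. The function name getStrings, its parameters and the selector "div.apparatus" are assumptions based on the sample page above, not code from the Versioning Machine.

```javascript
// Collects the text of every element matching `selector` into an array.
// `doc` is normally the browser's `document` object.
function getStrings(doc, selector) {
  const elements = doc.querySelectorAll(selector);
  const strings = [];
  // Cycle through the matched elements with a plain for loop.
  for (let i = 0; i < elements.length; i++) {
    strings.push(elements[i].textContent.trim());
  }
  return strings;
}

// In the browser, with the four sample <div> tags on the page:
//   const witnesses = getStrings(document, "div.apparatus");
//   // witnesses[0] would then be "The is a test string"
```

querySelectorAll() is used here because the sample elements share a class rather than each having a unique id; it returns a list that has a length and can be indexed like an array.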

JavaScript Module Pattern: Anonymous Functions and Implied Globals

Recently, I have been assigned the task of looking into the JavaScript Module Pattern. Restructuring the VM JS code in this way may make it easier to separate different components of the code into discrete functions, so as to obviate the possibility of variable duplication and to control the scope of variables so that they remain local. A very good introductory article to the JS module pattern can be found here <http://www.adequatelygood.com/JavaScript-Module-Pattern-In-Depth.html>.

To quote Ben Cherry, “the fundamental construct that makes it all possible” is the JavaScript anonymous function. Cherry describes this code as a closure, that is: everything that runs inside the anonymous function is discrete and is isolated from the rest of the surrounding JS code.

Explanatory YouTube video about anonymous functions: <https://www.youtube.com/watch?v=JRCJ0zmooJE>.

None of the variables or functions declared inside an anonymous function have global scope, and Cherry uses the term ‘privacy’ to describe this. Likewise, the code within the anonymous function is afforded ‘state’. What I understand ‘state’ to mean is that variables inside the anonymous function keep their values for as long as the functions it contains are in use, and that changing them will not affect any of the code outside it.

Click here to find out more about JavaScript state: <http://www.dofactory.com/javascript/state-design-pattern>.

Below is an example Cherry gives of an anonymous function, which shows that globals can still be accessed from inside the wrapper, but any variables or functions declared within the anonymous function are contained within its scope:

(function () {
    // ... all vars and functions are in this scope only
    // still maintains access to all globals
}());

Cherry adds that the () around the anonymous function is required, since statements that begin with the token function are considered to be function declarations. Wrapping the function in () creates a function expression instead.
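To see privacy and state working together, here is a minimal sketch of a module built on this construct. The names counterModule, count, increment and current are invented for illustration and do not come from Cherry's article or from the VM code.

```javascript
const counterModule = (function () {
  // Private: visible only to the functions defined inside this wrapper.
  let count = 0;

  function increment() {
    count += 1;
    return count;
  }

  function current() {
    return count;
  }

  // Only what is returned here can be reached from outside.
  return { increment: increment, current: current };
}());

counterModule.increment();
counterModule.increment();
console.log(counterModule.current()); // 2
console.log(counterModule.count);     // undefined
```

The count variable keeps its value between calls (state), yet it cannot be read or overwritten from outside the module (privacy), which is the property that should help avoid duplicated or clashing variables.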

A function expression is different to a function declaration in that a function can be assigned to a variable in a function expression. When a function expression has been stored in a variable, the variable can then be used as a function.

Here is an example from W3schools: <http://www.w3schools.com/js/tryit.asp?filename=tryjs_function_expression_variable>

The example above is in fact an anonymous function, as the function has not been assigned a name.

The example above that Cherry gives of an anonymous function can also be more accurately described as an anonymous self-invoking function. These functions do not have to be called, since they invoke themselves. You cannot self-invoke a function declaration, which is why a self-invoking function needs to be a function expression. Example of self-invoking expressions from W3schools: <http://www.w3schools.com/js/tryit.asp?filename=tryjs_function_expression_self>.

Of course, one of the main differences between a self-invoking function and a function declaration is the fact that a function declaration needs to be called to be executed (i.e. the function is stored and saved for later use).
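The three forms discussed above can be put side by side in a short sketch. The function and variable names are hypothetical and chosen only to show the contrast.

```javascript
// Function declaration: stored under a name and run only when it is called.
function greetDeclared(name) {
  return "Hello, " + name;
}

// Function expression: an anonymous function assigned to a variable,
// which can then be used as a function.
const greetExpressed = function (name) {
  return "Hello, " + name;
};

// Self-invoking function expression: runs once, immediately, and here
// its return value (not the function itself) is stored in the variable.
const greeting = (function (name) {
  return "Hello, " + name;
}("VM"));

console.log(greetDeclared("reader"));  // Hello, reader
console.log(greetExpressed("reader")); // Hello, reader
console.log(greeting);                 // Hello, VM
```

Note that greeting holds a string rather than a function, because the self-invoking expression has already run by the time the assignment takes place.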