Beginning the Text Comparison Application

Since I have been given the task of implementing a new text comparison feature for the Versioning Machine 5, it is clear that many questions are going to have to be answered before I can begin even thinking about writing the code and embedding it into the software.

In a VM survey carried out a few months ago, 17 out of 21 survey participants noted that they used the VM for the comparison of prose texts, 10 for poetry and 3 for drama.

The VM, however, lacks any real comparison capability. In its current iteration it can only highlight the same line across each of the different witnesses. Its inability to show differences is a shortcoming, given that users expressly stated in the survey that they would like to use the VM for longer texts in the future, and would therefore find more useful a comparison function that highlights the differences within each witness, as Juxta’s heat map view does.

Only in the last couple of weeks have I been set the task, by my mentor, of beginning development of the text comparison application. The lifecycle of such a task doesn’t begin, however, with coding. There is a lengthy incubation process in which I must break down the code into a list of steps. These steps can be thought of as something like detailed descriptions of each function that will be needed for a successful implementation of the app.

In order to begin my development task, I needed to start by writing what is commonly referred to in IT as pseudo-code. Pseudo-code is not functioning code in and of itself, but it is an important aid in web development and software engineering, and helps the human developer to better understand the complexity and semantics of their software.

Being a novice at best at JavaScript, it has been extremely tricky for me to identify how I should begin drafting my code. For instance, in order to even begin creating a text comparison diff. (difference) utility, I will need to be able to grab the correct parts of the DOM elements from the Versioning Machine’s HTML files, as this is the only way that I will be able to get the code to parse through each line that has the same line number from each different witness. The pseudo-code, for the time being anyway, begins like this:

1. Loop over all lines of all witnesses

2. Get all witnesses for one line
a. get a class or line ID to find all readings of the same line across different witnesses
b. using the class or line ID, get the different readings from the HTML DOM
c. store the readings in either a JS array or JS object
d. return the JS array or object
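The steps above could be sketched roughly as follows. This is only a sketch under assumptions: the class name reading and the attributes data-line and data-witness are hypothetical placeholders of my own, not the Versioning Machine’s actual markup, and the function takes the DOM root as a parameter so that it can be tried out without a browser.

```javascript
// Hypothetical markup assumed for this sketch:
//   <span class="reading" data-line="1" data-witness="v1">...</span>
// The class and attribute names are placeholders, not the VM's real ones.
function collectReadings(root) {
  var readingsByLine = {};
  var elements = root.querySelectorAll(".reading");

  // Step 1: loop over all lines of all witnesses
  for (var i = 0; i < elements.length; i++) {
    var element = elements[i];
    var lineId = element.getAttribute("data-line");
    var witnessId = element.getAttribute("data-witness");

    // Steps 2a-2c: keep all readings of the same line together
    if (!readingsByLine[lineId]) {
      readingsByLine[lineId] = {};
    }
    readingsByLine[lineId][witnessId] = element.textContent;
  }

  // Step 2d: return the object
  return readingsByLine;
}
```

In a browser this would be called as collectReadings(document), giving an object keyed first by line and then by witness.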

The pseudo-code above, however, is only one possible way to begin writing a diff. utility tool. There are already many text comparison tools and APIs available on the internet, like prettydiff <prettydif.com> or <stringjs.com>, each with its own idiosyncrasies and capabilities. Another issue I will have to chew over is whether it may be easier to embed one of these open-source APIs into the VM instead of building a tool from scratch.

The project team will need to discuss whether these APIs are capable of delivering what is needed for the VM. They may not, for instance, be able to compare four or five different texts side by side (usually they go as far as two). It may be better in the long run to develop a bespoke comparison application, since it would be easier to debug as the VM passes through later iterations.

Other ontologically-based problems at hand are questions such as: should the text comparison app deal with mark-up based characters such as punctuation, or should it just highlight different words? Should the diff. comparison be viewed as a heat map, in which the colour gradient highlighting each word becomes darker, say, to indicate where there is more variation across the separate witness documents? Or should there be a side-by-side view where one text is offset against another? If it were decided to highlight difference across three, four, or five different witnesses side by side at the same time, how would this work? Would there need to be a choice for the user to toggle on and off a base text of choice against which all the other witnesses are compared?
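To make these questions a little more concrete, here is a deliberately naive sketch of a base-text comparison with a punctuation switch. The function names tokenize and heatCounts and the four short sample strings are my own inventions for illustration. It compares words position by position without any alignment, so it is not a real diff algorithm: a single inserted word would shift everything after it.

```javascript
// Split a reading into words; optionally strip common punctuation first.
function tokenize(text, ignorePunctuation) {
  var cleaned = ignorePunctuation ? text.replace(/[.,;:!?]/g, "") : text;
  return cleaned.split(/\s+/).filter(function (word) {
    return word.length > 0;
  });
}

// For each word of the base text, count how many of the other witnesses
// have a different word at the same position (naive: no alignment).
function heatCounts(baseText, otherTexts, ignorePunctuation) {
  var baseWords = tokenize(baseText, ignorePunctuation);
  var others = otherTexts.map(function (text) {
    return tokenize(text, ignorePunctuation);
  });
  return baseWords.map(function (word, position) {
    var differing = others.filter(function (words) {
      return words[position] !== word;
    }).length;
    return { word: word, differing: differing };
  });
}

var witnesses = [
  "The is a test string",
  "This is a test string.",
  "This is a string test",
  "This is, a string test."
];

// Use the second witness as the base text and ignore punctuation
var heat = heatCounts(
  witnesses[1],
  [witnesses[0], witnesses[2], witnesses[3]],
  true
);
console.log(heat.map(function (entry) { return entry.differing; }));
// [ 1, 0, 0, 2, 2 ]
```

The differing count could then be mapped onto a colour gradient for a heat map, and swapping which witness is passed as the base text would correspond to the toggle described above.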

As I continue to revise and refine the pseudo-code I can begin to test it out, incrementally, by writing small pieces of code (like functions and methods) through which I can learn how to implement my ideas through the JavaScript syntax.

The meetings I have had with my mentor have been incredibly useful so far, as he has shown me how to create a workable JavaScript environment on my computer (which involves creating an HTML file through which JavaScript code can be tried and tested). My mentor has suggested that I write a small sample webpage consisting of four strings, something like:

<div class='apparatus v1'>The is a test string</div>
<div class='apparatus v2'>This is a test string.</div>
<div class='apparatus v3'>This is a string test</div>
<div class='apparatus v4'>This is, a string test.</div>

and through this I can learn how to write a simple import or ‘get strings’ function, or cycle through the <div> tags with for or do…while loops. Writing these functions will be the first step in learning how to get, grab and manipulate the HTML code for the Versioning Machine.
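A first attempt at such a ‘get strings’ function might look something like the sketch below. The function names getStrings and getStringsDoWhile are hypothetical, and the DOM root is passed in as a parameter (document in a browser) so that the functions can also be tested without a webpage.

```javascript
// Collect the text of every element carrying the class "apparatus".
// `root` is anything with getElementsByClassName, e.g. `document`.
function getStrings(root) {
  var divs = root.getElementsByClassName("apparatus");
  var strings = [];
  for (var i = 0; i < divs.length; i++) {
    strings.push(divs[i].textContent);
  }
  return strings;
}

// The same traversal with a do...while loop. Because the body always runs
// at least once, an empty list has to be guarded against first.
function getStringsDoWhile(root) {
  var divs = root.getElementsByClassName("apparatus");
  var strings = [];
  var i = 0;
  if (divs.length === 0) {
    return strings;
  }
  do {
    strings.push(divs[i].textContent);
    i++;
  } while (i < divs.length);
  return strings;
}
```

On the sample page, getStrings(document) should return the four test strings in document order. The for loop is probably the safer choice of the two, since it needs no special case for an empty page.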

Javascript Module Pattern: Implied Globals and Module Augmentation

 

Global Import:

Another important aspect of JS that makes the Module Pattern possible is the implied globals feature. There is a good video describing the process of implied globals here: <https://www.youtube.com/watch?v=6VxkOC65Msk>. Basically, if a variable is declared with the var keyword inside a function, then that variable is inaccessible to any code outside of that function. If, however, the var keyword is left out and a value is simply assigned to a name that has not been declared anywhere, then the interpreter treats it as a global variable, and so the variable and its value can be accessed outside of the function.
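A small experiment makes the difference visible. The names withVar, withoutVar, localNote and impliedNote are made up for this sketch. Note that implied globals only exist in non-strict code; in strict mode (or in an ES module) the undeclared assignment throws a ReferenceError instead, which is why the second call is wrapped in try/catch.

```javascript
function withVar() {
  var localNote = "inside";   // declared with var: local to withVar
  return localNote;
}

function withoutVar() {
  impliedNote = "leaked";     // no var: an implied global in non-strict code
}

withVar();
console.log(typeof localNote);  // "undefined": not visible out here

var outcome;
try {
  withoutVar();
  outcome = typeof impliedNote; // "string" when run as non-strict code
} catch (err) {
  outcome = err.name;           // "ReferenceError" when run as strict code
}
console.log(outcome);
```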

Cherry suggests that passing globals as parameters is a clearer and faster way of importing globals into the code than relying on implied globals:

(function ($, YAHOO) {
// now have access to globals jQuery (as $) and YAHOO in this code
}(jQuery, YAHOO));

Cherry’s next step in expounding the ins and outs of the module pattern in JS is declaring globals from within the anonymous function, rather than simply importing them as parameter values. Using the function’s return value means that only what we want to expose is passed out of the function and assigned to a global object. Everything else remains well hidden within the wrapper.

Cherry’s example:

var MODULE = (function () {
    var my = {},
        privateVariable = 1;

    function privateMethod() {
        // …
    }

    my.moduleProperty = 1;
    my.moduleMethod = function () {
        // …
    };

    return my;
}());

 

In the above code, Cherry has created a namespace, MODULE, so that the properties and methods attached to the my object can be accessed once it has been returned.
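To convince myself of what stays hidden and what is exported, I find it helpful to fill the same skeleton with something that can actually be run. COUNTER, count, increment and next are hypothetical names used only for this sketch.

```javascript
var COUNTER = (function () {
  var my = {};
  var count = 0;            // private: not reachable from outside

  function increment() {    // private helper
    count += 1;
  }

  my.next = function () {   // public method, exported on `my`
    increment();
    return count;
  };

  return my;
}());

console.log(COUNTER.next());            // 1
console.log(COUNTER.next());            // 2
console.log(COUNTER.count);             // undefined
console.log(typeof COUNTER.increment);  // "undefined"
```

Only next is visible on the returned object, yet it can still read and change count, because the private variable lives on inside the wrapper function’s scope.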

In order to gain a more comprehensive understanding of Cherry’s code, it may be worth looking at this shorter segment of code by Todd Motto <http://toddmotto.com/mastering-the-module-pattern/>:

 

var Module = (function () {
    var privateMethod = function () {
        // do something
    };
})();

 

In the above code a function is declared: privateMethod. This is locally contained within the new scope of the anonymous function.
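One consequence worth checking is that, because nothing is returned from the wrapper here, Module itself ends up holding undefined. The short sketch below repeats the snippet with a return value added inside privateMethod purely for illustration.

```javascript
var Module = (function () {
  var privateMethod = function () {
    return "only callable inside the wrapper";
  };
  // nothing is returned, so the wrapper evaluates to undefined
})();

console.log(typeof Module);         // "undefined"
console.log(typeof privateMethod);  // "undefined"
```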

Using return within a module’s scope hands an object back to the variable the module was assigned to, “var Module” (the namespace). The methods on that returned object are then reachable from anywhere that Module itself is in scope, which, if Module is declared at the top level, means globally:

 

var Module = (function () {
    return {
        publicMethod: function () {
            // code
        }
    };
})();

 

As it is an object literal that is being returned, we can then call its methods anywhere in the code in this way:

Module.publicMethod();

Motto gives us a fuller example of how several methods can be returned in one object literal, so that each can then be called globally, for example as Module.publicMethodOne():

 

var Module = (function () {
    var privateMethod = function () {};
    return {
        publicMethodOne: function () {
            // I can call `privateMethod()` you know…
        },
        publicMethodTwo: function () {
        },
        publicMethodThree: function () {
        }
    };
})();

 

Motto also demonstrates how to make use of private methods from outside the module if we want to do so. The private methods themselves stay hidden; all we need to do is call them from within public methods of the module:

 

var Module = (function () {
    var privateMethod = function (message) {
        console.log(message);
    };
    var publicMethod = function (text) {
        privateMethod(text);
    };
    return {
        publicMethod: publicMethod
    };
})();

// Example of passing data into a private method
// the private method will then `console.log()` 'Hello!'

Module.publicMethod('Hello!');

 

Note that defining publicMethod as a private function and then exposing it with publicMethod: publicMethod in the returned object is what is known as the JS Revealing Module Pattern.

Another interesting feature of the module pattern is the ability to augment modules, which basically means importing an existing module (possibly defined in a different file) into an anonymous function as a parameter. We can then add to the imported module before handing it back via the return keyword:

 

var MODULE = (function (my) {
    my.anotherMethod = function () {
        // added method…
    };
    return my;
}(MODULE));

We could access this method globally with the statement:

MODULE.anotherMethod();
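Putting the two stages next to each other shows the whole round trip. In this sketch both parts sit in one file, whereas in practice they would probably live in separate files loaded one after the other; the return values "original" and "added" are there only so that there is something to print.

```javascript
// First part (imagine a file loaded first): defines the module
var MODULE = (function () {
  var my = {};
  my.moduleMethod = function () {
    return "original";
  };
  return my;
}());

// Second part (imagine a file loaded afterwards): augments the module
var MODULE = (function (my) {
  my.anotherMethod = function () {
    return "added";
  };
  return my;
}(MODULE));

console.log(MODULE.moduleMethod());   // "original"
console.log(MODULE.anotherMethod());  // "added"
```

This form of augmentation depends on load order: if the second part ran first, my would be undefined and the assignment to my.anotherMethod would throw a TypeError.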

 

Loose Augmentation:

One particular pattern we can use when augmenting a module is Loose Augmentation. Cherry describes how, with this method, scripts can be loaded asynchronously, allowing us to create “flexible multi-part modules that can load themselves in any order”:

 

var MODULE = (function (my) {
    // add capabilities…

    return my;
}(MODULE || {}));

 

So, what is happening in the above code? If we resolve it into its component parts we see that the code is an anonymous self-invoking function that takes an object as its “my” parameter. This means that changes are made to “my” within the code and then returned to MODULE.

The value passed in for the “my” parameter is (MODULE || {}).

This expression means: if MODULE is already defined, use it; otherwise, create a new empty object.
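A quick way to see the “any order” claim in action is to write two parts that each use the same wrapper. APP, partA and partB are hypothetical names for this sketch, and both parts are shown in one file although they are meant to stand for separate scripts.

```javascript
// Part A (imagine a separate script file)
var APP = (function (my) {
  my.partA = function () {
    return "A";
  };
  return my;
}(APP || {}));

// Part B (imagine another script file); swapping the two parts
// around gives the same result
var APP = (function (my) {
  my.partB = function () {
    return "B";
  };
  return my;
}(APP || {}));

console.log(APP.partA());  // "A"
console.log(APP.partB());  // "B"
```

The var in front of APP matters in every part: it declares the name, so that APP || {} sees undefined on the first run and falls back to the empty object. Without the declaration, referring to an undeclared APP would throw a ReferenceError.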

Javascript Module Pattern: Anonymous Functions and Implied Globals

Recently, I have been assigned the task of looking into the JavaScript Module Pattern. Restructuring the VM JS code in this way may make it easier to separate different components of the code into discrete functions, so as to avoid duplicated variable names and to control the scope of variables so that they remain local. A very good introductory article to the JS module pattern can be found here <http://www.adequatelygood.com/JavaScript-Module-Pattern-In-Depth.html>.

To quote Ben Cherry, “the fundamental construct that makes it all possible” is the JavaScript anonymous function. Cherry describes this code as a closure, that is: everything that runs inside the anonymous function is discrete and is isolated from the rest of the surrounding JS code.

Explanatory Youtube video about anonymous functions: <https://www.youtube.com/watch?v=JRCJ0zmooJE>.

None of the code inside an anonymous function has global scope, and Cherry uses the term ‘privacy’ to describe this. Likewise, the code within the anonymous function is afforded ‘state’. What I understand by ‘state’ is that if any code is altered or changed within the anonymous function, this will not affect any of the code outside it.

More about JavaScript state: <http://www.dofactory.com/javascript/state-design-pattern>.

Below is an example Cherry gives of an anonymous function, which shows that globals can still be accessed within the wrapper, but any variables or functions declared inside the anonymous function are contained within its scope:

(function () {
    // … all vars and functions are in this scope only
    // still maintains access to all globals
}());

Cherry adds that the () around the anonymous function is required, since statements that begin with the token function are considered to be function declarations. The () creates a function expression instead.

A function expression is different to a function declaration in that a function can be assigned to a variable in a function expression. When a function expression has been stored in a variable, the variable can then be used as a function.
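The difference can be tried out in a few lines. The names square and twice are my own. One extra detail shows up here: a function declaration is hoisted, so it can be called before the line on which it appears, whereas a function expression only becomes usable once the assignment has run.

```javascript
// Function declaration: hoisted, so it can be called before this point
console.log(square(3));  // 9

function square(n) {
  return n * n;
}

// Function expression: an anonymous function stored in a variable
var twice = function (n) {
  return n * 2;
};

// The variable can now be used as a function
console.log(twice(4));   // 8
```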

Here is an example from W3schools: <http://www.w3schools.com/js/tryit.asp?filename=tryjs_function_expression_variable>

The example above is in fact an anonymous function, as the function has not been assigned a name.

The example above that Cherry gives of an anonymous function can also be more accurately described as an anonymous self-invoking function. These functions do not have to be called, since they invoke themselves. You cannot self-invoke a function declaration, which is why a self-invoking function needs to be a function expression. Example of self-invoking expressions from W3schools: <http://www.w3schools.com/js/tryit.asp?filename=tryjs_function_expression_self>.

Of course, one of the main differences between a self-invoking function and a function declaration is the fact that a function declaration needs to be called to be executed (i.e. the function is stored and saved for later use).
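Recording the order in which things run makes this difference easy to see. The array log and the function name declared are hypothetical names for this sketch.

```javascript
var log = [];

// Declaration: stored for later use, runs only when it is called
function declared() {
  log.push("declared");
}

// Self-invoking function expression: runs once, immediately
(function () {
  log.push("self-invoked");
}());

console.log(log);  // [ 'self-invoked' ]

declared();
console.log(log);  // [ 'self-invoked', 'declared' ]
```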