Beginning the Text Comparison Application

Since I have been given the task of implementing a new text comparison feature for the Versioning Machine 5, it is clear that many questions are going to have to be answered before I can begin even thinking about writing the code and embedding it into the software.

In a VM survey carried out a few months ago, 17 out of 21 survey participants noted that they used the VM for the comparison of prose texts, 10 for poetry and 3 for drama.

The VM, however, cannot currently show the differences between witnesses. In its current iteration it can only highlight the same line across each of the different witnesses. This inability to show differences looks like a shortcoming, given that users expressly stated in the survey that they would like to use the VM for longer texts in the future, and would therefore find a comparison function that highlights the differences within each witness more useful, as with Juxta’s heat map view.

Only in the last couple of weeks has my mentor set me the task of beginning development of the text comparison application. The lifecycle of such a task doesn’t begin with coding, however. There is a lengthy incubation process in which I must break the problem down into a list of steps. These steps can be thought of as detailed descriptions of each function that will be required for a successful implementation of the app.

To begin my development task, I needed to start by writing what is commonly referred to in IT as pseudo-code. Pseudo-code is not functioning code in and of itself, but it is an important aid in web development and software engineering, and it helps the human developer to better understand the complexity and semantics of their software.

Being a novice at best at JavaScript, I have found it extremely tricky to work out how I should begin drafting my code. For instance, to even begin creating a text comparison diff. (difference) utility, I will need to be able to grab the correct DOM elements from the Versioning Machine’s HTML files, as this is the only way that I will be able to get the code to work through each line that has the same line number in each of the witnesses. The pseudo-code, for the time being anyway, begins like this:

1. Loop over all lines of all witnesses

2. Get all witnesses for one line
a. get a class or line ID to find all readings of the same line across different witnesses
b. using the class or line ID, get the different readings from the HTML DOM
c. store the readings in either a JS array or JS object
d. return the JS array or object
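Steps 2c and 2d can be tried out before touching the DOM at all. The following is only a minimal sketch: the function name groupReadingsByLine and the record shape ({ lineId, witness, text }) are my own assumptions for illustration, not anything the Versioning Machine provides.

```javascript
// Hypothetical input: one plain object per reading. In the browser these
// records would be built from the VM's DOM; here they are written by hand.
const sampleReadings = [
  { lineId: 'line_1', witness: 'v1', text: 'The is a test string' },
  { lineId: 'line_1', witness: 'v2', text: 'This is a test string.' },
  { lineId: 'line_2', witness: 'v1', text: 'A second line' },
  { lineId: 'line_2', witness: 'v2', text: 'A second line!' }
];

// Steps 2c/2d: store the readings of each line in a JS object keyed by
// witness, and return the whole collection as a Map keyed by line ID.
function groupReadingsByLine(readings) {
  const byLine = new Map();
  for (const { lineId, witness, text } of readings) {
    if (!byLine.has(lineId)) {
      byLine.set(lineId, {});
    }
    byLine.get(lineId)[witness] = text;
  }
  return byLine;
}

// Step 1: loop over all lines and look at every witness for each one.
const grouped = groupReadingsByLine(sampleReadings);
for (const [lineId, witnesses] of grouped) {
  console.log(lineId, witnesses);
}
```

A Map keeps the lines in the order they were first seen, which should make it easier to walk through a text from top to bottom later on.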

The pseudo-code above, however, is only one possible way to begin writing a diff. utility. Many text comparison tools and APIs are already available on the internet, like prettydiff <prettydif.com> or <stringjs.com>, each with its own idiosyncrasies and differing capabilities. Another issue I will have to chew over is whether it may be easier to embed one of these open-source APIs into the VM instead of building the utility from scratch.

The project team will need to discuss whether these APIs are capable of delivering what is needed for the VM. They may not, for instance, be able to compare four or five different texts side by side (usually they compare no more than two). It may be better in the long run to develop a bespoke comparison application, since it will be easier to debug as the VM passes through later iterations down the line.

Other, more conceptual problems at hand are questions such as: should the text comparison app take punctuation into account, or should it just highlight different words? Should the diff. comparison be viewed as a heat map, in which the colour highlighting each word becomes darker, say, to indicate where there is more variation across the separate witness documents? Or should there be a side-by-side view where one text is set against another? If it were decided to highlight differences across three, four, or five different witnesses side by side at the same time, how would this work? Would the user need an option to toggle a base text of their choice on and off, against which all the other witnesses are compared?
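Two of these questions, punctuation and the base text, can be explored with a small experiment. The sketch below is one possible approach rather than a decision for the VM: it compares a witness against a chosen base text word by word using a longest-common-subsequence table, with a switch for ignoring punctuation. The function names (tokenize, lcsTable, differingWords) and the list of punctuation marks are assumptions of mine.

```javascript
// Split a reading into words. With ignorePunctuation set, common punctuation
// marks are stripped first, so "is," and "is" count as the same word.
function tokenize(text, ignorePunctuation) {
  const cleaned = ignorePunctuation ? text.replace(/[.,;:!?]/g, '') : text;
  return cleaned.split(/\s+/).filter(Boolean);
}

// t[i][j] holds the length of the longest common subsequence of
// a.slice(i) and b.slice(j).
function lcsTable(a, b) {
  const t = Array.from({ length: a.length + 1 },
    () => new Array(b.length + 1).fill(0));
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      t[i][j] = a[i] === b[j]
        ? t[i + 1][j + 1] + 1
        : Math.max(t[i + 1][j], t[i][j + 1]);
    }
  }
  return t;
}

// Return the words of the witness that have no counterpart in the base text.
function differingWords(base, witness, ignorePunctuation = false) {
  const a = tokenize(base, ignorePunctuation);
  const b = tokenize(witness, ignorePunctuation);
  const t = lcsTable(a, b);
  const changed = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) {
      i++;
      j++;
    } else if (t[i + 1][j] >= t[i][j + 1]) {
      i++; // base word missing from the witness
    } else {
      changed.push(b[j]); // witness word missing from the base
      j++;
    }
  }
  while (j < b.length) {
    changed.push(b[j++]);
  }
  return changed;
}

const base = 'This is a test string.';
const witness = 'This is, a string test.';
console.log(differingWords(base, witness));       // [ 'is,', 'string', 'test.' ]
console.log(differingWords(base, witness, true)); // [ 'test' ]
```

Running the same function once per witness against a single base text would give a first, rough answer to the side-by-side question, and counting how many witnesses disagree at each word could feed a heat map.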

As I continue to revise and refine the pseudo-code I can begin to test it out, incrementally, by writing small pieces of code (like functions and methods) that teach me how to express my ideas in JavaScript syntax.

The meetings I have had with my mentor have been incredibly useful so far, as he has shown me how to set up a workable JavaScript environment on my computer (which involves creating an HTML file through which JavaScript code can be tried and tested). My mentor has suggested that I write a small sample webpage consisting of four strings, something like:

<div class='apparatus v1'>The is a test string</div>
<div class='apparatus v2'>This is a test string.</div>
<div class='apparatus v3'>This is a string test</div>
<div class='apparatus v4'>This is, a string test.</div>

and through this I can learn how to write a simple import or ‘get strings’ function, or cycle through the <div> tags with for or do…while loops. Writing these functions will be the first step in learning how to get, grab and manipulate the HTML code for the Versioning Machine.
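A first attempt at such a ‘get strings’ function might look something like the sketch below. It assumes the sample page marks each reading with the class apparatus, as in the four divs above. The name getStrings is my own, and the function takes the document as a parameter so that it can also be tried outside a browser.

```javascript
// Collect the text of every <div class="apparatus ..."> under `root`.
// `root` is expected to be the page's document (or any element).
function getStrings(root) {
  const strings = [];
  const divs = root.querySelectorAll('div.apparatus');
  // A plain for loop over the NodeList returned by querySelectorAll.
  for (let i = 0; i < divs.length; i++) {
    strings.push(divs[i].textContent);
  }
  return strings;
}

// In a browser, pass in the global document object. Outside a browser
// there is no document, so this part is simply skipped.
if (typeof document !== 'undefined') {
  console.log(getStrings(document));
}
```

The script has to run after the divs have been loaded, for example from a script tag placed at the end of the page body, otherwise the function will find nothing to collect.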
