Wednesday, 10 October 2012

Sakai Development: Post Nine

Before actually starting to write the code to do the deposit itself, I need to set up and include the SWORD2 Java client libraries. If you're not used to github, you might take a while to see the download button, which you can use to download the library as a zip file. Unzip it, cd to the created directory, and run

mvn clean package

to compile (and download a large number of new library files). This should hopefully end up with:

[INFO] [jar:jar {execution: default-jar}]
[INFO] Building jar: /home/simon/work/swordapp-JavaClient2.0-420485d/target/sword2-client-0.9.2.jar
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESSFUL
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 3 minutes 24 seconds
[INFO] Finished at: Wed Sep 26 15:14:32 BST 2012
[INFO] Final Memory: 27M/121M
[INFO] ------------------------------------------------------------------------

and then it's a question of copying the jar file to somewhere where it can be picked up by Sakai. This requires two things (assuming my understanding of how maven POM files work is correct):

  • Add a dependency to the relevant pom.xml file, which will be picked up on compilation, so that maven will attempt to download the relevant jar file, and, if it can't, will ask for it to be installed locally by hand. The relevant file is in the content/content-tool/tool directory, and needs the following added (with line numbering):

  • Import the necessary classes into the ResourcesAction.java file so that the library can be used in this script. This is a simple pair
import org.swordapp.client.SWORDClient;
import org.swordapp.client.SWORDClient.*;
at line 135 of the file.
The code which will use this is based on doFinalizeDelete (lines 6545-6675), which follows doDeleteConfirm when the confirmation is selected. I haven't yet worked out where the actual display of the confirmation request happens, so this is not the last change by any means. The confirmation step could also include the opportunity to select from a list of repositories, and from the collections which exist at the chosen repository (as obtained from the repository SWORD2 service document). But that is a complication I don't really want to get into at this stage; I just want to be able to carry out a simple deposit. So I'm going to have both the repository and the collection set in the configuration file for the servlet.

The process is relatively simple; however, I should point out that I have just noticed in the SWORD2 client documentation that multi-part deposit is not supported, and the way I have been thinking has assumed that it is. So I will have to make a zip file or something of the items to be deposited (as their nature as a collection is important). java.util.zip is a core package, but not one I've ever used before; I'll start by adding the import for the package at the top of the file (new line 52).

The steps for producing a SWORD2 deposit from the selected files are:

  • Get archive details from configuration (some will always need to be obtained from configuration, but a service description could be downloaded on the fly to get information about collections etc. - just not in this version as I'm already overrunning the schedule);
  • Prepare metadata, using data from the confirmation form, which should basically be a subset of DC terms for use in the deposit (bearing in mind that it's possible for the depositor to go to the repository and change the metadata later if necessary);
  • Prepare files - create a zip as a file stream;
  • Authenticate to repository using userid/password provided in confirmation form;
  • Make deposit and (hopefully) get back a receipt.
While the information given in the swordapp documentation at first looks pretty complete, it is missing some details which I need, as I discover on starting to put the code together. I'll need to look at the source code for the app to get them.

The first issue is with the metadata. Dublin Core is not the only metadata used in SWORD2; there is some basic information which is part of the Atom profile: title, author, last modification date, and so on, as seen in the example here. The documentation gives no information about how to set this, and in fact I can't find anything useful in the source code (the string "updated", which is the element name for the last modification date, does not appear anywhere in the client code). I'm not particularly familiar with Atom, so it is possible that these are optional. I'll ignore this for the moment and see what happens. I'm also going to assume that in order to give multiple entries, I just repeatedly add a term: this needs to be supported for DC contributor - I think this should work, but I haven't actually gone through the Apache Abdera library, which the swordapp code uses, to check this.

Just at this point Andrew Martin put up a useful blog post which details his journey to working with the SWORD2 java library. He's not doing exactly the same thing, though we have already been in contact. While I need to go deeper into the coding, his post is probably a very useful resource for anyone reading this one.

The next thing to sort out is creating a ZIP file (virtually) to contain the items selected for archiving. I've not done this before, and the ZIP creation stuff I can find online, as well as in my ancient Java book, concentrates on making a ZIP from files (this looks pretty useful for that, and is likely to form the framework I'll use) rather than from the Sakai content platform, where the content may not even be stored in a filesystem. So I need to work out how to get the content out of the files as a Java input stream. I'll start by looking through the ResourcesAction.java code, and then move on to other files in the same directory if I can't find anything. All the input streams in ResourcesAction are for making changes to content rather than reading it - makes sense, as reading is not an action which affects the resource. But this code from FilePickerAction.java (lines 1710-13) makes it look very simple:

InputStream contentStream = resource.streamContent();
String contentType = resource.getContentType();
String filename = Validator.getFileName(itemId);
String resourceId = Validator.escapeResourceName(filename);

I just need to work back through the context to be sure that this code is doing what it appears to be doing. Although it doesn't appear to be (because it's in a method for updating the resource), this is what it is in fact doing, as I eventually discover when I find the relevant javadocs (ContentHostingService, BaseContentService, and ContentResource - not from the current release, though). To re-use this code, the ContentResource class needs to be loaded, which it already is, and the content service needs to be set up (outside the loop which runs through the selected items):

ContentHostingService contentService = (ContentHostingService) toolSession.getAttribute (STATE_CONTENT_SERVICE);

The first problem, then, is that what I have is a ListItem object, when what I want is the itemID (which is a String); this is simple, as id is a property of the ListItem object, so I can just get it. I'll also need to protect against the itemid being null, which I don't think should happen. I'm not quite sure what the correct thing would be to do if it does, so I'll just log a warning if that happens. So the code I add is (lines 6741-4):

String itemid = item.id;
ContentResource resource = null;
if (itemid != null)
{

and then in the conditional block (6752-6774),

resource = contentService.getResource(itemid);
InputStream contentStream = resource.streamContent();

byte[] buf = new byte[1024];

//get filename and add to zip entry
String fileName = item.getName();

if (fileName != null)
{
  zip.putNextEntry(new ZipEntry(fileName));
}
else
{
  zip.putNextEntry(new ZipEntry("Unnamed resource with ID " + itemid));
}

int len;
while ((len = contentStream.read(buf)) > 0) {
  zip.write(buf, 0, len);
}
zip.closeEntry();     
contentStream.close(); 
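To check the copy loop in isolation, here is a minimal, self-contained sketch of the same idea outside Sakai: each named InputStream becomes one ZIP entry, with no filesystem involved. The class and method names (ZipFromStreams, zipStreams) are my own inventions for illustration, the in-memory byte arrays stand in for resource.streamContent(), and it uses try-with-resources, so it assumes Java 7 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not part of Sakai or the SWORD2 client.
public class ZipFromStreams {

    /** Writes each named stream as one entry of a ZIP archive sent to 'out'. */
    public static void zipStreams(Map<String, InputStream> sources, OutputStream out)
            throws IOException {
        byte[] buf = new byte[1024];
        // Closing the ZipOutputStream writes the central directory and closes 'out'.
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            for (Map.Entry<String, InputStream> source : sources.entrySet()) {
                zip.putNextEntry(new ZipEntry(source.getKey()));
                try (InputStream in = source.getValue()) {
                    int len;
                    // read() returns -1 at end of stream
                    while ((len = in.read(buf)) != -1) {
                        zip.write(buf, 0, len);
                    }
                }
                zip.closeEntry();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-ins for the content of two selected resources.
        Map<String, InputStream> sources = new LinkedHashMap<>();
        sources.put("notes.txt",
                new ByteArrayInputStream("first item".getBytes(StandardCharsets.UTF_8)));
        sources.put("data.csv",
                new ByteArrayInputStream("a,b\n1,2\n".getBytes(StandardCharsets.UTF_8)));

        ByteArrayOutputStream zipped = new ByteArrayOutputStream();
        zipStreams(sources, zipped);
        System.out.println("ZIP size in bytes: " + zipped.size());
    }
}
```

One thing this makes easy to see: putNextEntry throws a ZipException if two entries share a name, so two selected resources with the same display name would need distinct entry names.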

Some preparation has to happen before this, using piped streams to turn the zip output into the input for the SWORD library, calculating the MD5 hash we want for the deposit stage on the way:

MessageDigest md = MessageDigest.getInstance("MD5");
DigestInputStream dpins = new DigestInputStream(pins, md);
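The piped-stream arrangement can be tried out on its own with the sketch below. It is only an illustration under my own assumptions: the names PipedDigestSketch, md5ThroughPipe and toHex are made up, a plain byte array stands in for the zip output, and the writing side runs on its own thread, because a PipedOutputStream blocks once the pipe's small internal buffer is full and nothing is reading from the other end.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of Sakai or the SWORD2 client.
public class PipedDigestSketch {

    /** Hex-encodes a digest, two lowercase hex characters per byte. */
    public static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    /** Pushes 'payload' through a pipe and returns the MD5 of what the reader saw. */
    public static String md5ThroughPipe(final byte[] payload)
            throws IOException, NoSuchAlgorithmException, InterruptedException {
        final PipedOutputStream pouts = new PipedOutputStream();
        PipedInputStream pins = new PipedInputStream(pouts);
        MessageDigest md = MessageDigest.getInstance("MD5");

        // The writer gets its own thread so it cannot deadlock against the reader.
        Thread writer = new Thread(() -> {
            try (PipedOutputStream out = pouts) {
                out.write(payload);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        writer.start();

        // Every byte read through the DigestInputStream also updates 'md'.
        byte[] buf = new byte[1024];
        try (InputStream dpins = new DigestInputStream(pins, md)) {
            while (dpins.read(buf) != -1) {
                // a real consumer would use the bytes here
            }
        }
        writer.join();
        // Only now, with the stream fully read, is the digest complete.
        return toHex(md.digest());
    }

    public static void main(String[] args) throws Exception {
        byte[] payload = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(md5ThroughPipe(payload));
    }
}
```

The point worth noticing is the ordering: md.digest() only reflects the data once the consumer has read the stream to the end. If the hash is needed before the data is sent, it would have to be computed from a buffered copy (a byte array or temporary file, say) rather than on the fly.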

And now we should be in a position to set up the deposit with the SWORD2 library.

It's also occurred to me that the solution to the problem of multiple archives is to embed them in the confirmation web form - the user selects the archive there from a drop down list, and the script reaps the URL used for deposit. So the URL to use is then just a form parameter. Except - the sword client readme file suggests that a collection object (derived from the collection description in a service document) is needed for deposit, so I need to check in the source code to see if there's a method with a deposit URL as an alternative. Turns out that there is, so I'll use that. So we have (ignoring a whole load of exceptions which will surely need to be caught, for the moment):

// Set up authentication
AuthCredentials auth = new AuthCredentials(params.getString("userid"),params.getString("password"));

// Make deposit and (hopefully) get back a receipt
Deposit deposit = new org.swordapp.client.Deposit();
deposit.setFile(dpins);
dpins.close();
pins.close();
pouts.close();
byte[] fileMD5 = md.digest();

deposit.setMimeType("application/zip");
deposit.setFilename(params.getString("title") + ".zip");
deposit.setPackaging(UriRegistry.PACKAGE_SIMPLE_ZIP);
deposit.setMd5(fileMD5);
deposit.setEntryPart(ep);
deposit.setInProgress(true);

// And now make the deposit
DepositReceipt receipt = client.deposit(params.getString("depositurl"), deposit, auth);

The next issue is what to do with the receipt, and how to alert the user to the success or failure of the deposit. The confirmation web page should still be available (especially if it has an indication that the status of the archiving will be displayed there), so the result can be displayed there, dynamically. For the moment, I'll just hold on to the receipt and revisit this code when I've written the appropriate Velocity file.

There's just one final bit of code to add to this file, which is to add a call to the archive confirmation method as a new state resolution function (lines 7083-6):

else if(ResourceToolAction.ARCHIVE.equals(actionId))
{
  doArchiveconfirm(data);
}

Then I can start work on the Velocity file. I should say at this point that I don't expect this code to compile without error. I'm absolutely certain there will be exceptions which haven't been caught, and I may well have confused some of the variable names in the course of this. But I want to get on to the next stage before coming back here.

Wednesday, 3 October 2012

Sakai Development: Post Eight

This post follows straight on from the last, especially as I've missed a constant setting which goes with the ones listed at its end, a new line 857:

ACTIONS_ON_MULTIPLE_ITEMS.add(ActionType.ARCHIVE);

Looking at the next section I might need to change, where permissions are sorted out at lines 1721-1801, I don't think anything needs to be altered, because this draws in constant values which I have already altered. However, things do need to be changed where permissions are set for items. This is done with new lines 2203 and 2209:

boolean canArchive = ContentHostingService.allowArchiveResource(id);
item.setCanArchive(canArchive);

Of course, there will need to be corresponding alterations in the ContentHostingService class. I just need to find it - there is no ContentHostingService.java file in the source code. And searching online doesn't find anything useful. It's time to email those in the know, and meanwhile I can carry on with other bits of code. But I got a really quick answer, before I had a chance to carry on, which tells me that I'm on the right lines:

"Yeah this one is in the kernel. The CHS Api lives there.
Sounds like what you are doing is correct. You'll need to add a method there that, most likely, checks some permission and returns true or false. If you want it set via a permission of course. You could just make it always available if that is what you wanted to, then you may not need to mod the kernel.

Then you could do item.canaddarchive(true)"

Thinking a bit more about this, I feel that perhaps I don't need a new kernel method, but I can be more sophisticated than making the service always available. To archive an item is basically making a copy, so what I really should do is to tie the archiving permission to the copy permission. So instead of the lines 2203 and 2209 above, I'll just check the canRead permission which is already there. (It strikes me, though, that in this world of DRM, read and copy are not necessarily the same thing, but never mind.) At least, I'm now at the end of the setting of permission booleans - the next bit of code should actually do something. (And no, I still have no idea why none of the kernel appears to be in the source code I downloaded, but this difficulty is one of the factors in my decision.)

What I want to work on now is to build the three archiving intermediate pages, which should be almost precisely like existing pages. The code for this starts at line 3961, which is where the delete confirmation page building code begins. To remind you, the three intermediate pages are the confirmation, collection metadata entry, and finish; the first and third will basically be copies of the equivalent delete pages and will therefore be built much in the same way. (I expected that most of the work for this little project would consist of copying and then amending existing code, nothing terribly difficult, and this is exactly what is being done here.) Since the code being copied is quite long, I'm not going to quote it all here. The two routines are very similar, so it looks as though copy and paste has already been used. I don't think I have time to look into the lines which are commented "//%%%% FIXME". I'll also need a third copy for MODE_ADD_ARCHIVE_METADATA. The new code becomes lines 4063 to 4207.

The code to call one of these new routines comes next, also a simple copy and modification of existing code. This comes in the context for the main page being re-displayed, as it will be on MODE_ARCHIVE_FINISH: lines 4823-7:

else if (mode.equals (MODE_ARCHIVE_FINISH))
{
  // build the context for the basic step of archive confirm page
  template = buildArchiveFinishContext (portlet, context, data, state);
}

This gives a total of about 200 lines of boilerplate code copied, modified slightly, and inserted back into the class. From this point onwards, we start getting into more exciting development (though there is still a little bit more to add, to call the code to create the context for the metadata form and for the confirmation page - which, it has just occurred to me, could sensibly be the same thing...I may want to revisit some of the above changes to do this, but for the moment I'll just leave things as they are, as it shouldn't do any harm to create constants but not use them).

The model I have used so far for these changes is the existing code for item deletion. Now, there is a slight problem: the actual deletion code appears to delete items one at a time when more than one is selected for deletion, and we can't do this for the archiving; we want one SWORD2 transaction whether there are one or more items. We now need to copy and modify two routines - doCopy, and doDeleteconfirm - which set the application state to doCopy or Deleteconfirm respectively. The first will set the doArchive state, and the second will set Archiveconfirm state, in both cases processing the list of items selected for archiving into a vector format, and these states should then be processed to make the actual archiving or the display of the confirmation page happen. This gives another hundred or so lines of code modified which doesn't really do much.

The next place where something needs to be added is now line 6288. The doDispatchItem method, of which this switch block forms part, is, like most of the rest of this class, sparsely commented, but appears to be the part which determines what to do - hence the switch block, with cases corresponding to the different actions. The question is whether the ARCHIVE case should be like the COPY case or the DELETE case. It's hard to tell without more documentation. The COPY case basically adds the ID of a selected item to a list of items to copy, while the DELETE case actually calls deleteItem - a method we have already decided isn't appropriate to copy directly (as we don't want to break down the archiving of a collection of items into a sequence of archiving actions on the individual items). So the ARCHIVE case needs to be something in between, something like this (I hope):

case ARCHIVE:
  List items_to_be_archived = new ArrayList();
  if(selectedItemId != null)
  {
    items_to_be_archived.add(selectedItemId);
  }
  state.removeAttribute(STATE_ITEMS_TO_BE_MOVED);
  state.setAttribute(STATE_ITEMS_TO_BE_ARCHIVED, items_to_be_archived);
  if (state.getAttribute(STATE_MESSAGE) == null)
  {
    // need new context
    state.setAttribute (STATE_MODE, MODE_ARCHIVE_FINISH);
  }
  break; 

which forms the new lines 6288 to 6301.