CaPRéT: Basic Analytics

A couple of weeks back we pushed live the initial analytics that we’re developing for CaPRéT. (Yeah, I’m late getting this out.)

These are divided into two sets: realtime analytics, where you can see use as it occurs, and a usage log, where you can gain a bit more insight into what’s being cut and pasted.

(It’s worth mentioning that we’re developing this code on the assumption that implementers either run it themselves, and hence can suppress public display of the analytics, or don’t mind their use being out in the open. So we’re not building a user system with logins, etc.)

Realtime Analytics

The realtime analytics are based on Hummingbird. (“Hummingbird lets you see how visitors are interacting with your website in real time.”) Hummingbird requires a WebSocket-enabled server and browser (as of early September 2011, only “Firefox 7/8 and Chrome 14” support WebSocket).
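Since the realtime pages only work in a WebSocket-capable browser, a page could check for support before trying to connect. What follows is a minimal sketch, not code from CaPRéT or Hummingbird: the helper name supportsWebSocket is made up for illustration, and the MozWebSocket check is there because Firefox shipped WebSocket under that prefixed name at the time.

```javascript
// Hypothetical helper (not part of CaPRéT or Hummingbird).
// Returns true when the given global object exposes a WebSocket constructor,
// either under the standard name or the prefixed name Firefox used in 2011.
function supportsWebSocket(globalObject) {
  return Boolean(globalObject.WebSocket || globalObject.MozWebSocket);
}

// In a browser you would pass `window`, for example:
//   if (!supportsWebSocket(window)) { /* show a "browser not supported" note */ }
```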

The realtime analytics display:

  • Live views of cuts showing frequency and geographic location (“Who’s using CaPRéT right now?” – realtime.html)
  • Stats on use in the last hour, day and week (“CaPRéT use in the last hour, day and week” – aggregates.html)
  • Stats on cuts by day of the week (“CaPRéT use by day” – counts.html)

(I’ll agree that the last one might not be that useful; it came for free when Justin built on Hummingbird.)
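To make the three views concrete, here is a rough sketch of how counts like these could be derived from a list of cut timestamps. It is not how Hummingbird computes them internally; the function name summarizeCuts and the shape of the result are assumptions for illustration, and weekdays are bucketed in UTC to keep the example deterministic.

```javascript
const HOUR_MS = 60 * 60 * 1000;
const DAY_MS = 24 * HOUR_MS;
const WEEK_MS = 7 * DAY_MS;

// Hypothetical helper: timestamps and now are milliseconds since the epoch.
// Returns counts for the last hour, day and week, plus totals per weekday
// (index 0 = Sunday ... 6 = Saturday, in UTC).
function summarizeCuts(timestamps, now) {
  const summary = {
    lastHour: 0,
    lastDay: 0,
    lastWeek: 0,
    byWeekday: [0, 0, 0, 0, 0, 0, 0],
  };
  for (const ts of timestamps) {
    const age = now - ts;
    if (age >= 0 && age <= HOUR_MS) summary.lastHour += 1;
    if (age >= 0 && age <= DAY_MS) summary.lastDay += 1;
    if (age >= 0 && age <= WEEK_MS) summary.lastWeek += 1;
    // Every cut counts toward its weekday, however old it is.
    summary.byWeekday[new Date(ts).getUTCDay()] += 1;
  }
  return summary;
}
```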

Usage Logs – Updated

Next up on our list of things to do is to implement a tabular display of more details (and CSV export). Initially the following data is being stored by CaPRéT in its database:

  • Timestamp: When the text was copied.
  • ID: ID string for each cut that is appended to the tracking gif and stored in the database.
  • Source URL: Location where the text that’s being used resides.
  • User IP: Yes, there are privacy issues, but for the moment, and for testing, we’ll be capturing the IP address of the user copying the text. We won’t display it publicly.
  • Text Cut: We’ll be storing 100 characters of the text that’s cut. There’s probably a limit on how much data we can pass through the tracking gif, and we’re a little concerned about the space on the “free” Amazon micro instance we’re using. The 100 characters should be sufficient for a human to find the text on the page. If total data is indeed limited, we’ve talked about getting tricky and storing N characters plus a total character count, so a user or program could recreate the whole text, assuming it didn’t change in the meantime.
  • Last Modified Date: Using env.lmod = document.lastModified, we’re grabbing the last modified date of the page from which the text was cut.
  • Use IP: After the text is copied, and if it’s pasted into a compatible destination, every time the tracking image gets displayed we can read a unique string that’s added as a parameter to the URL. We can then match this string with the original text that was cut and its source.
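The fields above can be sketched as a query string on the tracking gif, with a lookup on the other end that matches a displayed image back to the original cut. This is only an illustration under stated assumptions, not CaPRéT’s actual code: the parameter names (id, src, text, len, lmod), the function names, and the example.org URL are all made up, and the “first 100 characters plus total count” part is the idea we’ve talked about rather than something confirmed to be implemented.

```javascript
// Assumed cap on how much cut text travels with the tracking gif request.
const MAX_TEXT_CHARS = 100;

// Hypothetical: build the tracking gif URL for one cut.
// cut = { id, sourceUrl, text, lastModified } (field names are assumptions).
function buildTrackingUrl(gifUrl, cut) {
  const params = new URLSearchParams({
    id: cut.id,                              // unique string for this cut
    src: cut.sourceUrl,                      // page the text was copied from
    text: cut.text.slice(0, MAX_TEXT_CHARS), // first 100 characters only
    len: String(cut.text.length),            // total count, to locate the full text later
    lmod: cut.lastModified,                  // e.g. the value of document.lastModified
  });
  return gifUrl + "?" + params.toString();
}

// Hypothetical: when the tracking image is displayed somewhere, read the
// unique string from its URL and look up the cut it belongs to.
// cutsById is a Map from id to stored cut; returns null when nothing matches.
function findCutForUse(cutsById, requestedUrl) {
  const id = new URL(requestedUrl).searchParams.get("id");
  return id !== null && cutsById.has(id) ? cutsById.get(id) : null;
}
```

In the browser, lastModified would come from document.lastModified as in the bullet above; the sketch takes it as a plain string so it also runs outside a page.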

One of the other requests we’ve heard from the JISC OER-discuss mailing list is for the last modified date of the file that is getting served. We’re still working on this one.