AndrewPearson.org

Andrew Pearson's Little Corner of the Internet...

Saturday, November 13, 2010

Vibe Vault

My friend Sanders and I just launched an awesome, free app on the Android marketplace. It's called Vibe Vault and it lets you stream and download songs from archive.org. Archive.org has over 84,000 free recordings of concerts. The collection spans over 8,000 Grateful Dead shows, and includes a diverse collection of other acts from Lotus to 311 to Elliott Smith.

I use this app every day. On my way to class, I stream shows. I download entire shows (and sometimes single songs) and listen to them on the subway when I have no cell connection. Who needs to buy music or use an iPod when you can listen to FREE music by great artists on your phone? Check it out and let me know what you think.

If you want to check out my buddy Sanders' page on Vibe Vault, you can find it here.

Monday, September 13, 2010

Melodotron

I officially now have an app on the Android market. It is called the Melodotron.



The Melodotron turns an Android phone into a dynamic, user-interactive musical instrument. The sounds that it produces are completely dynamic and synthetic (no sampling or anything). As of right now (Version 0.5.2 at the time of this posting), the Melodotron can play notes across 4 octaves, using 3 different wavetypes (sine, sawtooth, square), all with an orientation-activated tremolo effect. I expect to add more wavetypes, effects, and other improvements as regularly and often as possible. The source code is entirely free.
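For a sense of what "synthetic" means here, below is a minimal sketch of the math behind the three wavetypes and a tremolo (a slow wobble in volume). This is not the Melodotron's actual source: the class name WaveSketch, the method names, the 44100 Hz sample rate, and the way the tremolo depth is applied are all assumptions made for illustration.

```java
/** Illustrative waveform math only; not the Melodotron's actual source code. */
final class WaveSketch {
    /** Sine wave. phase is the position within one cycle, in [0, 1). */
    static double sine(double phase) {
        return Math.sin(2.0 * Math.PI * phase);
    }

    /** Sawtooth wave: ramps linearly from -1 up toward 1 over one cycle. */
    static double sawtooth(double phase) {
        return 2.0 * phase - 1.0;
    }

    /** Square wave: 1 for the first half of the cycle, -1 for the second half. */
    static double square(double phase) {
        return phase < 0.5 ? 1.0 : -1.0;
    }

    /** Position within the current cycle for sample n of a tone at frequencyHz. */
    static double phaseAt(int n, double frequencyHz, double sampleRateHz) {
        double cycles = n * frequencyHz / sampleRateHz;
        return cycles - Math.floor(cycles);
    }

    /** Tremolo: scales a sample by a gain that swings between (1 - depth) and 1. */
    static double tremolo(double sample, double lfoPhase, double depth) {
        double gain = 1.0 - depth * 0.5 * (1.0 + sine(lfoPhase));
        return sample * gain;
    }

    public static void main(String[] args) {
        double sampleRate = 44100.0; // assumed sample rate
        double a4 = 440.0;           // concert A
        for (int n = 0; n < 5; n++) {
            double p = phaseAt(n, a4, sampleRate);
            System.out.println(sine(p) + " " + sawtooth(p) + " " + square(p));
        }
    }
}
```

In an app like this, a value read from the orientation sensor could plausibly be mapped onto the depth argument, but that mapping is a guess on my part.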

Check out the Melodotron's main page. The program is completely open source. Better yet, check it out on the Android app market.

Friday, July 30, 2010

Android HTML Parsing

Another common task in smartphone app development is parsing a webpage. Maybe you want to display a subset of the data on the page to the user of your app. Maybe you are parsing the webpage for information that your app will use internally. Either way, the Android API does not provide an easy way to do this; thus the necessity of this blogpost.

There are many approaches that one can take to accomplish this task. Some (see: idiots) advocate parsing HTML pages like long strings, using regexes or some other "roll-your-own" approach. Some prescribe using a SAX parser (treating HTML like XML), which is bug-prone (if the HTML isn't properly formed). I recommend using a free HTML parsing library. A good choice is the aptly, yet unoriginally, named HtmlCleaner. Though it doesn't fully support XPath features (more on this in a bit) like its competitor, TagSoup, it is a bit smaller (this is important because you have to include the library in your app). If you want to use TagSoup instead of HtmlCleaner, I would bet that the steps in the rest of this tutorial are more or less the same, though I have not tested them.

Anyway, let's outline exactly what we want to do.
  • Open up some webpage
  • Programmatically extract some information from it.
  • Do something with that information.

As an example of a webpage to parse, I will once again draw from the development of my archive.org app. Below is a screenshot of the type of page that we will be parsing. The information that we are looking for is the URL and title for each song listed on the page.

We can see the information that we want in the table titled "Audio Files", which itself is in a table titled "Individual Files" a little bit down the page. Viewing the HTML source for the page reveals a tangle of <tr> and <td> tags, all with various attributes. Though it might appear that it would be difficult to sort through this mess, we can clean things up with just a few lines of code. Below is the code that will parse this page for exactly what we want:

import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URLConnection;
import java.util.ArrayList;
import android.util.Log;
import org.htmlcleaner.CleanerProperties;
import org.htmlcleaner.HtmlCleaner;
import org.htmlcleaner.TagNode;
import org.htmlcleaner.XPatherException;

// Create HtmlCleaner object to turn the page into
// XML that we can analyze to get the songs from the page.
HtmlCleaner pageParser = new HtmlCleaner();
CleanerProperties props = pageParser.getProperties();
props.setAllowHtmlInsideAttributes(true);
props.setAllowMultiWordAttributes(true);
props.setRecognizeUnicodeChars(true);
props.setOmitComments(true);

try {
    // "url" is a URL[] that is already in scope; url[0] is the page to parse.
    URLConnection conn = url[0].openConnection();
    TagNode node = pageParser.clean(new InputStreamReader(conn.getInputStream()));

    // XPath string for locating download links...
    // XPath says "Select every 'tr' element inside any 'table' element
    // whose attribute 'class' is equal to 'fileFormats'..."
    String xPathExpression = "//table[@class='fileFormats']//tr";
    try {
        // Stupid API returns Object[]... Why not TagNodes? We'll cast it later
        Object[] downloadNodes = node.evaluateXPath(xPathExpression);

        // Iterate through the nodes selected by the XPath statement...
        boolean reachedSongs = false;
        for (Object linkNode : downloadNodes) {
            TagNode row = (TagNode) linkNode;
            TagNode[] cells = row.getChildTags();
            if (cells.length == 0) {
                continue; // Nothing to look at in an empty row.
            }
            // Read the first cell up front so that both branches below can
            // see it (declaring it inside the first branch does not compile).
            String s = pageParser.getInnerHtml(cells[0]);

            // The song titles and locations are listed between two rows.
            // Ignore all other rows to save a little time and battery...
            if (!reachedSongs) {
                if (s.equals("Audio Files")) {
                    reachedSongs = true;
                }
                continue;
            } else if (s.equals("Information") || s.equals("Other Files")) {
                break;
            }

            // Recursively find all nodes which have "href" (link) attributes. Then, store
            // the link values in an ArrayList. Create a new ArchiveSongObj with these links
            // and the title of the track, which is the inner HTML of the first child node.
            TagNode[] links = row.getElementsHavingAttribute("href", true);
            ArrayList<String> stringLinks = new ArrayList<String>();
            for (TagNode t : links) {
                stringLinks.add(t.getAttributeByName("href"));
            }
            String title = s.trim();
            System.out.println(title);
            System.out.println(stringLinks);
        }
    } catch (XPatherException e) {
        Log.e("ERROR", e.toString());
    }
} catch (IOException e) {
    Log.e("ERROR", e.toString());
}

The first thing that we do is set up an HtmlCleaner object. We set a few properties for it, and then are ready to use it. We call its clean() method on the URL's input stream. This returns a TagNode for the root node in the document. A TagNode is a crucial part of the HtmlCleaner API: it represents a node in an XML document and you can use the API to work with its elements, attributes, and child nodes.

The next step greatly simplifies the amount of processing that we have to do on the webpage. Instead of having to worry about EVERY subnode of the root node of the document, we can use an XPath String to ask for only a subset of these nodes. We define the String xPathExpression to be "//table[@class='fileFormats']//tr". Calling evaluateXPath() with this String basically says "return every tr element (table row) found anywhere inside a table element whose class attribute is equal to fileFormats". We receive an array of Objects (which are really TagNodes) from this method.
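If you want to play with the XPath expression away from the phone, here is a small sketch that runs the same expression against a tiny, well-formed stand-in for the page. It uses the XPath support built into the JDK (javax.xml.xpath) rather than HtmlCleaner, so it only works on well-formed markup. The class name XPathSketch, the rowTexts helper, and the sample markup are made up for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathSketch {
    // Made-up, well-formed stand-in for the real archive.org page.
    static final String PAGE =
        "<html><body>"
        + "<table class='other'><tr><td>ignored</td></tr></table>"
        + "<table class='fileFormats'>"
        + "<tr><td>Audio Files</td></tr>"
        + "<tr><td>Track One</td></tr>"
        + "<tr><td>Information</td></tr>"
        + "</table>"
        + "</body></html>";

    /** Returns the text of every node that the XPath expression selects. */
    static List<String> rowTexts(String xml, String expression) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, new InputSource(new StringReader(xml)),
                            XPathConstants.NODESET);
            List<String> texts = new ArrayList<String>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Only the rows of the 'fileFormats' table are selected.
        System.out.println(rowTexts(PAGE, "//table[@class='fileFormats']//tr"));
    }
}
```

The row from the table with class 'other' never shows up in the result, which is the whole point: the expression narrows the document down to the rows we care about before we write any loop.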

Now we have a collection of TagNodes which makes up the table with the information that we want. The problem is that the table also has lots of extraneous information that we don't want. In fact, we don't care about anything in the table before the "Audio Files" subheading, and we don't care about anything after those files have been listed. Instead of wasting time (battery power) processing these TagNodes, I define a boolean called reachedSongs that I use to skip over nodes until we get to where we care about the information. The "Audio Files" subheading will be the inner HTML of the first child of one of the nodes returned from our XPath evaluation. After the files, there is a subheading called "Information". We know to break out of our loop after that.
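The skipping logic is easier to see with the HTML stripped away. The sketch below applies the same reachedSongs idea to a plain list of strings, where each string stands in for the first cell of a table row. The class name RowFilterSketch, the method name, and the sample row labels other than the subheadings are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RowFilterSketch {
    /** Keeps only the entries between "Audio Files" and the next subheading. */
    static List<String> rowsBetween(List<String> firstCells) {
        List<String> songs = new ArrayList<String>();
        boolean reachedSongs = false;
        for (String cell : firstCells) {
            if (!reachedSongs) {
                // Still above the songs: skip, but note when the heading goes by.
                if (cell.equals("Audio Files")) {
                    reachedSongs = true;
                }
                continue;
            }
            // Past the songs: nothing further down is interesting, so stop.
            if (cell.equals("Information") || cell.equals("Other Files")) {
                break;
            }
            songs.add(cell);
        }
        return songs;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
                "Whole Item", "Audio Files", "Track 1", "Track 2", "Information", "Metadata");
        System.out.println(rowsBetween(rows));
    }
}
```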

In between the "Audio Files" and "Information" subheadings is where we have to actually analyze our nodes. Each node represents a tr (table row) element. Each row has several td elements: the inner HTML of the first td element is the song title, and any element inside the row with an href attribute is a link to a particular version of the song (64kb, VBR, FLAC, etc.). We grab this information for each song.
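To make the per-row work concrete, here is a sketch that pulls the title and the links out of a single row. Again it uses the JDK's built-in XPath on well-formed markup instead of HtmlCleaner, and the class name SongRowSketch, the sample row, and the download paths in it are all hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class SongRowSketch {
    // Made-up row: first cell is the title, later cells hold the links.
    static final String ROW =
        "<tr><td> Track One </td>"
        + "<td><a href='/download/show/t1_64kb.mp3'>64Kbps MP3</a></td>"
        + "<td><a href='/download/show/t1.flac'>Flac</a></td></tr>";

    /** The song title is the trimmed text of the first td in the row. */
    static String title(String rowXml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate("/tr/td[1]",
                    new InputSource(new StringReader(rowXml))).trim();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Every href attribute anywhere in the row, in document order. */
    static List<String> links(String rowXml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList hrefs = (NodeList) xpath.evaluate("//@href",
                    new InputSource(new StringReader(rowXml)), XPathConstants.NODESET);
            List<String> result = new ArrayList<String>();
            for (int i = 0; i < hrefs.getLength(); i++) {
                result.add(hrefs.item(i).getNodeValue());
            }
            return result;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(title(ROW));
        System.out.println(links(ROW));
    }
}
```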

Tuesday, July 20, 2010

Android: Why To Use JSON And How To Use It

It is a pretty common task in smartphone application development to query a website for some information, do a bit of processing, and present it to the user in a different form. There are two main ways of going about this:

1.) Download a whole webpage and use an HTML or XML parser to try to extract the information that you want.
2.) Use the website's API, if it has one. Some websites have APIs that answer queries with XML, JSON, or some other data format instead of a webpage.

Clearly, option 2 (if available) makes a lot more sense. Instead of downloading a large webpage (wasting data), parsing the entire thing (wasting battery), and then trying to analyze it (wasting your time going through often improperly written HTML), you can download much smaller, easier to manage text which conforms to XML, JSON, or whatever it is that the API provides. (I will be posting another tutorial about option 1 soon).

You might wonder why you would want to use JSON instead of XML for your smartphone application. After all, XML has been a much-ballyhooed technology buzzword for many years. There is, however, a good (and simple) reason. XML is (usually) bigger. The closing tags in XML do not exist in JSON, and therefore save you a few bytes for each tag that you don't need. JSON can generally express the same data using fewer characters, thus saving the phone from having to transfer more data every time there is a query to a website. This makes JSON a natural choice for quick and efficient website querying (for websites which offer it).
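A quick way to convince yourself is to write the same record both ways and count the bytes. In the sketch below, the JSON record is modeled on one of the archive.org results shown later in this post, while the XML layout is something I made up for comparison (it is not archive.org's actual XML output). The class name PayloadSizeSketch is also invented.

```java
import java.nio.charset.StandardCharsets;

final class PayloadSizeSketch {
    // Hypothetical XML layout for one search result.
    static final String XML =
        "<doc>"
        + "<title>Lotus Live at Mr. Smalls Theatre on 2006-02-17</title>"
        + "<mediatype>etree</mediatype>"
        + "<identifier>lotus2006-02-17.matrix</identifier>"
        + "</doc>";

    // The same three fields expressed as JSON.
    static final String JSON =
        "{\"title\":\"Lotus Live at Mr. Smalls Theatre on 2006-02-17\","
        + "\"mediatype\":\"etree\","
        + "\"identifier\":\"lotus2006-02-17.matrix\"}";

    /** Number of bytes the text occupies when sent as UTF-8. */
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println("XML bytes:  " + utf8Length(XML));
        System.out.println("JSON bytes: " + utf8Length(JSON));
    }
}
```

The field values are identical in both strings, so any difference in size comes purely from the markup around them.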

One such website is www.archive.org. Among many other things, archive.org allows people to upload recordings from concerts for other people to download for free. It's pretty awesome. They also have an API which allows you to query their system which will return results in XML, JSON, or a variety of other formats.

I am currently writing an application for browsing archive.org from your phone to find shows and then either download or stream the tracks. I'll show you how I do the first part (finding shows) using JSON and just a few lines of code.

First, you need your JSON query. I am going to query archive.org for "Lotus," asking for a JSON result containing 10 items with their respective date, format, identifier, mediatype, and title. According to the archive.org search API, my query should look like this:

String archiveQuery = "http://www.archive.org/advancedsearch.php?q=Lotus&fl[]=date&fl[]=format&fl[]=identifier&fl[]=mediatype&fl[]=title&sort[]=createdate+desc&sort[]=&sort[]=&rows=10&page=1&output=json&callback=callback&save=yes";
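Hardcoding the query string is fine for a tutorial, but in an app the search term comes from the user and needs to be URL-encoded. Below is a sketch of one way to assemble the query. The class name ArchiveQuerySketch and its method names are invented, and for brevity it leaves out the sort, callback, and save parameters from the query above (add them back the same way if you need them). Note that URLEncoder also percent-encodes the square brackets in fl[], which is an equivalent spelling of the same parameter name.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class ArchiveQuerySketch {
    /** URL-encodes one parameter name or value as UTF-8. */
    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to exist on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    /** Builds a search URL asking for the given fields and number of rows as JSON. */
    static String build(String searchTerm, String[] fields, int rows) {
        StringBuilder sb =
                new StringBuilder("http://www.archive.org/advancedsearch.php?q=");
        sb.append(enc(searchTerm));
        for (String field : fields) {
            sb.append('&').append(enc("fl[]")).append('=').append(enc(field));
        }
        sb.append("&rows=").append(rows);
        sb.append("&page=1&output=json");
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] fields = {"date", "format", "identifier", "mediatype", "title"};
        System.out.println(build("Lotus", fields, 10));
    }
}
```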

Now that we have our query, we simply open an HTTP connection using the query, grab an input stream of bytes, and read it into a String (which we will turn into a JSON object in a moment). On a side note, notice that I am using a BufferedInputStream: it pulls data from the underlying stream in big chunks and serves read() calls out of an internal buffer. Without it, every read() call goes straight to the underlying stream, so small reads pester the OS much more often, which is slower and wastes more processing power (which in turn wastes battery life).
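You can watch the buffering at work with a stream that counts how many read calls actually reach it. In the sketch below, the class names BufferingSketch and CountingStream are invented, and an in-memory byte array stands in for the network connection.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

final class BufferingSketch {
    /** Wraps a stream and counts how many read calls reach it. */
    static final class CountingStream extends FilterInputStream {
        int calls = 0;

        CountingStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            calls++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return super.read(b, off, len);
        }
    }

    /** Reads size bytes one at a time; returns the calls seen by the source. */
    static int callsWhenReadingByteByByte(int size, boolean buffered) {
        CountingStream source =
                new CountingStream(new ByteArrayInputStream(new byte[size]));
        InputStream in = buffered ? new BufferedInputStream(source) : source;
        try {
            while (in.read() != -1) {
                // Discard the data; only the call count matters here.
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return source.calls;
    }

    public static void main(String[] args) {
        System.out.println("unbuffered: " + callsWhenReadingByteByByte(10000, false));
        System.out.println("buffered:   " + callsWhenReadingByteByByte(10000, true));
    }
}
```

Both runs make the same number of read() calls from the caller's point of view. The difference is how many of them have to travel all the way down to the source.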

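import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;
import android.util.Log;

InputStream in = null;
String queryResult = "";
try {
    URL url = new URL(archiveQuery);
    HttpURLConnection httpConn = (HttpURLConnection) url.openConnection();
    httpConn.setAllowUserInteraction(false);
    httpConn.connect();
    in = new BufferedInputStream(httpConn.getInputStream());
    // Collect the whole response in memory, up to 512 bytes at a time.
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    byte[] buffer = new byte[512];
    int read;
    while ((read = in.read(buffer)) != -1) {
        baos.write(buffer, 0, read);
    }
    // Decode explicitly as UTF-8 instead of relying on the platform default.
    queryResult = new String(baos.toByteArray(), "UTF-8");
} catch (MalformedURLException e) {
    // DEBUG
    Log.e("DEBUG: ", e.toString());
} catch (IOException e) {
    // DEBUG
    Log.e("DEBUG: ", e.toString());
} finally {
    // Always release the stream, even if reading failed partway through.
    if (in != null) {
        try {
            in.close();
        } catch (IOException ignored) {
            // Nothing useful to do if close() fails.
        }
    }
}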

At this point, our JSON response is stored in the String queryResult. It looks kind of like this:

callback({
"responseHeader": {
... *snip* ...
}, "response": {
"numFound": 1496,
"start": 0,
"docs": [{
"mediatype": "audio",
"title": "The Disco Biscuits At Starscape 2010",
"identifier": "TheDiscoBiscuitsAtStarscape2010",
"format": ["Metadata", "Ogg Vorbis", "VBR MP3"]
}, {
"title": "Lotus Live at Bonnaroo Music & Arts Festival on 2010-06-10",
"mediatype": "etree",
"date": "2010-06-10T00:00:00Z",
"identifier": "Lotus2010-06-10TheOtherStageBonnarooMusicArtsFestivalManchester",
"format": ["Checksums", "Flac", "Flac FingerPrint", "Metadata", "Ogg Vorbis", "Text", "VBR MP3"]
}, {
"title": "Lotus Live at Mr. Smalls Theatre on 2006-02-17",
"mediatype": "etree",
"date": "2006-02-17T00:00:00Z",
"identifier": "lotus2006-02-17.matrix",
"format": ["64Kbps M3U", "64Kbps MP3", "64Kbps MP3 ZIP", "Checksums", "Flac", "Flac FingerPrint", "Metadata", "Ogg Vorbis", "Text", "VBR M3U", "VBR MP3", "VBR ZIP"]
}, {
... *snip* ...

We see that the information we want is stored in an array whose key is "docs" and is contained in an item called "response". We can grab this information VERY easily using the JSONObject class provided by Android as shown below:

// Needs: org.json.JSONArray, org.json.JSONException, org.json.JSONObject, android.util.Log
JSONObject jObject;
try {
    jObject = new JSONObject(queryResult.replace("callback(", "")).getJSONObject("response");
    JSONArray docsArray = jObject.getJSONArray("docs");
    for (int i = 0; i < 10; i++) {
        // Look each result up once instead of once per field.
        JSONObject doc = docsArray.getJSONObject(i);
        if (doc.optString("mediatype").equals("etree")) {
            String title = doc.optString("title");
            String identifier = doc.optString("identifier");
            String date = doc.optString("date");
            System.out.println(title + " " + identifier + " " + date);
        }
    }
} catch (JSONException e) {
    // DEBUG: log the raw response along with the parse error.
    Log.e("DEBUG: ", queryResult);
    Log.e("DEBUG: ", e.toString());
}

The first thing that I do is create a JSONObject from "queryResult", which is the JSON response from archive.org. Note that I remove "callback(" from the string first. Because the query includes callback=callback, archive.org wraps its response in a JSONP-style function call, and that wrapper is not actually part of the JSON (I realized this when I was catching JSONException errors).
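Replacing "callback(" works for this response, but it leaves the closing parenthesis behind and would also mangle any title that happened to contain the text "callback(". A slightly more careful approach is to cut out exactly what sits between the first "(" and the final ")". The sketch below shows the idea. The class name JsonpSketch and the method stripJsonp are invented, and it assumes the wrapper is a single function call around the whole response.

```java
final class JsonpSketch {
    /**
     * Returns the JSON inside a wrapper such as callback({...}).
     * Text that already starts with { or [ is returned unchanged.
     */
    static String stripJsonp(String response) {
        String s = response.trim();
        // Some servers end the call with a semicolon; drop it if present.
        if (s.endsWith(";")) {
            s = s.substring(0, s.length() - 1).trim();
        }
        if (s.startsWith("{") || s.startsWith("[")) {
            return s; // Already plain JSON.
        }
        int open = s.indexOf('(');
        int close = s.lastIndexOf(')');
        if (open > 0 && close == s.length() - 1) {
            return s.substring(open + 1, close);
        }
        return s; // Not a shape we recognize; leave it alone.
    }

    public static void main(String[] args) {
        System.out.println(stripJsonp("callback({\"response\":{\"numFound\":1496}})"));
    }
}
```

The result can then be handed to new JSONObject(...) exactly as in the snippet above.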

After that, we are ready to do some JSON parsing. Since this is just a tutorial, I hardcode "10" into the for loop because I requested 10 items. This would be a bad idea in production code (if you don't know why, you are a huge noob and should not be writing production code). I only want items whose mediatype is "etree", and for each of these items I print the title, identifier, and date.

Voila, you now know how to use JSON in Android.