I’ve been thinking a lot about automated URI assignment lately. In particular, the scheme we’ve been using (relying on the database to maintain a “next available number” and incrementing it) is fraught with potential problems. However, I really don’t like the GUID style either, with its large, unwieldy, and mostly unnecessarily long strings.
I did some back-of-the-envelope thinking and came up with the following recommendations. After the fact, I decided to search the web to see what I could find. I found some excellent material, but not this approach in particular, nor anything that seemed to rule it out. Of note, Phil Archer has some excellent guidelines here: http://philarcher.org/diary/2013/uripersistence/. His scope is much broader than what I’m doing here, but it is very good. He even has “avoid auto increment” as one of his top 10 recommendations.
The points in this paper don’t apply to hand-crafted URIs (as you would typically have for your classes and properties, and even some of your hand-curated special instances). This applies to URIs that a system needs to generate when it finds it needs to mint a new resource. Here is a quick survey of the approaches and who uses them:
- Hand curate all: DBpedia essentially has the author create a URI when they create a new topic.
- Modest-sized number: DBpedia page IDs and page revision IDs look like next-available-number types.
- Type + longish number: YAGO has URIs like yago:Horseman110185793 (a class name plus a number of up to nine digits, so up to a billion values; I’m not sure whether a next available number is behind this, but it looks like there is).
- GUIDs: Cyc identifies everything with a long string like Mx4rvkS9GZwpEbGdrcN5Y29ycA.
- GUIDs: Microsoft uses 128-bit GUIDs to identify system components, such as {21EC2020-3AEA-4069-A2DD-08002B30309D}. The random version reserves 6 bits to mark it as random, leaving 122 random bits and a namespace on the order of 10^36 (2^122 is about 5.3 × 10^36), thought to be large enough that the probability of generating the same number twice is negligible.
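To make the GUID arithmetic concrete, here is a minimal Python sketch. It uses the standard library's `uuid.uuid4()` to mint a random GUID and computes the size of the 122-bit namespace. The names `RANDOM_BITS`, `NAMESPACE`, and `collision_probability` are my own, and the birthday-bound approximation is an assumption I've added for illustration, not something taken from the Microsoft scheme itself.

```python
import math
import uuid

# A random (version 4) GUID has 128 bits, 6 of which are fixed
# (4 version bits + 2 variant bits), leaving 122 random bits.
RANDOM_BITS = 128 - 6
NAMESPACE = 2 ** RANDOM_BITS


def collision_probability(n: int, space: int = NAMESPACE) -> float:
    """Approximate chance that at least two of n random IDs collide.

    Uses the standard birthday-bound approximation
    1 - exp(-n(n-1) / (2 * space)); illustrative only.
    """
    return -math.expm1(-n * (n - 1) / (2 * space))


if __name__ == "__main__":
    print(uuid.uuid4())                    # a freshly minted random GUID
    print(f"{NAMESPACE:.2e}")              # size of the namespace
    print(collision_probability(10 ** 9))  # a billion minted IDs
```

Even with a billion identifiers minted, the approximate collision probability stays far below anything worth worrying about, which is why the strings can feel longer than they need to be.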
Being a pragmatist, I wanted to figure out whether there is an optimal size for generated URIs and an optimal way to generate them.