This is a guest post by Leigh Dodds, who advises on open data and is affiliated with the Open Data Institute (ODI). The post describes Copenhagen's "City Data Exchange" initiative, which brings together data from a range of open and closed sources and acts as a kind of marketplace for data. At Open Knowledge Danmark we are particularly sceptical of the idea of taking open data published on open platforms (such as Copenhagen's open data portal, which is built on CKAN) and republishing it behind a login with restrictions on reuse. The post was originally published on Leigh Dodds's own blog and reflects the author's own views.
First Impressions of Copenhagen’s City Data Exchange
The first thing I did was read the terms of service, and then explore the publishing and consuming options.
As of today, 21st May, there are 56 datasets on the site. All of them are free.
The majority seem to have been uploaded by Hitachi and are copies of datasets from Copenhagen’s open data portal.
Compare, for example, this dataset on the exchange with the same one on the open data portal. The open version has better metadata, clearer provenance, a wider choice of formats and a download process that doesn't require a registration step. The open data portal also has more datasets than the exchange.
Datasets on the exchange can apparently be downloaded as a "one time download" or purchased under a subscription model. However, I've downloaded a few, and the downloads aren't restricted to being one-time, at least not currently.
I've also subscribed to a free dataset. My expectation was that this would give me direct access to an API. It turns out that the developer portal is actually a completely separate website. After subscribing to a dataset, I was emailed a username and password (in clear text!) with instructions to log into that portal.
The list of subscriptions in the developer portal didn't quite match what I had on the main site, as one that I'd cancelled was still active. It seems you can unsubscribe from them separately there, but it's not clear what the implications of that might be.
There's also a prominent "close your account" button in the developer portal, which seems a little odd. It feels like two different products or services have been grafted together.
The developer portal is very, very basic. The APIs exposed by each dataset are:
- a download API that gives you the entire dataset
- a “delta” API that gives you changes made between specific dates.
There are no filtering or search options. No format options. Really there’s very little value-add at all.
Essentially, subscribing to a dataset gives you a URL from which you can fetch the dataset on a regular basis rather than having to download it manually. There's no obvious help or support for developers creating useful applications against these APIs.
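To make the "fetch on a regular basis" workflow concrete, here is a minimal Python sketch of how a subscriber might combine the two APIs: one full download first, then delta requests covering the period since the last sync. The base URL, the `/download` and `/delta` paths and the `from`/`to` parameter names are hypothetical placeholders; the exchange's actual endpoint layout isn't documented here. Nothing is sent over the network; the code only builds URLs.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional
from urllib.parse import urlencode

# Hypothetical base URL: the exchange's real endpoint layout is not documented here.
BASE_URL = "https://developer.example.com/datasets"


@dataclass
class SyncState:
    dataset_id: str
    last_synced: Optional[datetime] = None  # None means nothing fetched yet


def next_request_url(state: SyncState, now: datetime) -> str:
    """Build the URL for the next scheduled fetch of a subscribed dataset."""
    if state.last_synced is None:
        # First fetch: the "download" API returns the entire dataset.
        return f"{BASE_URL}/{state.dataset_id}/download"
    # Later fetches: the "delta" API returns changes between two dates.
    params = urlencode({"from": state.last_synced.isoformat(), "to": now.isoformat()})
    return f"{BASE_URL}/{state.dataset_id}/delta?{params}"


if __name__ == "__main__":
    state = SyncState("example-dataset")
    start = datetime(2024, 1, 1, 12, 0)
    print(next_request_url(state, start))  # full download URL
    state.last_synced = start
    print(next_request_url(state, start + timedelta(days=1)))  # one-day delta URL
```

In this sketch the client has to keep track of its own last sync time, since the only input the delta API is described as taking is a date range.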
Authorising access to an API is done via an API key which is added as a URL parameter. They don’t appear to be using OAuth or similar to give extra security.
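The difference between a key in the URL and a credential in a header can be shown without sending anything. In this sketch (the endpoint, the `apikey` parameter name and the key are all hypothetical), the first request carries the key in its query string, so it is recorded wherever full URLs are logged; the second passes it in an `Authorization` header, in the bearer-token style that OAuth 2.0 uses. This only illustrates where the secret travels; it is not an OAuth implementation.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint and key, for illustration only.
ENDPOINT = "https://developer.example.com/datasets/example-dataset/download"
API_KEY = "not-a-real-key"


def key_in_query(url: str, key: str) -> Request:
    """Key as a URL parameter: it appears wherever the full URL is recorded."""
    return Request(f"{url}?{urlencode({'apikey': key})}")


def key_in_header(url: str, key: str) -> Request:
    """Key in an Authorization header, bearer-token style: the URL stays clean."""
    return Request(url, headers={"Authorization": f"Bearer {key}"})


if __name__ == "__main__":
    for build in (key_in_query, key_in_header):
        req = build(ENDPOINT, API_KEY)
        # Nothing is sent; we only inspect what each request would expose.
        print(build.__name__, req.full_url, req.header_items())
```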
To publish data, you need to provide a contact phone number and address. You can then provide some basic configuration for your dataset:
- Period of update: one-off, hourly, daily, weekly, monthly, annual
- Whether you want to allow it to be downloaded and, if so, whether it's free or paid
- Whether you want to allow API access and, if so, whether it's free or paid
Pricing is in Danish kroner, and you can set a price per download or a monthly price for API access (such as it is).
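Those publishing options amount to a small configuration model. Here is a hypothetical sketch of how it might be represented and checked. The class and field names are my own, not the exchange's. A missing price is treated as "free", a price in Danish kroner as "paid", and the price in the usage example is made up.

```python
from dataclasses import dataclass
from decimal import Decimal
from enum import Enum
from typing import Optional


class UpdatePeriod(Enum):
    ONE_OFF = "one-off"
    HOURLY = "hourly"
    DAILY = "daily"
    WEEKLY = "weekly"
    MONTHLY = "monthly"
    ANNUAL = "annual"


@dataclass
class AccessOption:
    enabled: bool
    price_dkk: Optional[Decimal] = None  # None = free; a value = paid

    def validate(self, label: str) -> None:
        if not self.enabled and self.price_dkk is not None:
            raise ValueError(f"{label}: a price is set but access is disabled")
        if self.price_dkk is not None and self.price_dkk <= 0:
            raise ValueError(f"{label}: a paid option needs a positive price")


@dataclass
class DatasetConfig:
    title: str
    update_period: UpdatePeriod
    download: AccessOption    # price is per download
    api_access: AccessOption  # price is per month

    def validate(self) -> None:
        self.download.validate("download")
        self.api_access.validate("api_access")


if __name__ == "__main__":
    config = DatasetConfig(
        title="Example sensor readings",
        update_period=UpdatePeriod.DAILY,
        download=AccessOption(enabled=True),  # free download
        api_access=AccessOption(enabled=True, price_dkk=Decimal("50")),  # made-up monthly price
    )
    config.validate()  # raises ValueError if the options are inconsistent
```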
To provide your data you can either upload a file or give the data exchange access to an API. It looks like there’s an option to discuss how to integrate your API with their system, or you can provide some configuration options:
- Type – this has one option “Restful”
- Response Type – this has one option “JSON”
- Endpoint URL
- API Key
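Connecting a publisher API comes down to those four settings. The sketch below makes a hypothetical assumption: the exchange calls the endpoint with the supplied key and expects a JSON array of records back. The post doesn't say how the key is actually transmitted, so the fetch step is passed in as a function, which also lets the example run without a network.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class ApiSource:
    endpoint_url: str
    api_key: str
    api_type: str = "Restful"    # the only "Type" option offered
    response_type: str = "JSON"  # the only "Response Type" option offered

    def __post_init__(self) -> None:
        if self.api_type != "Restful" or self.response_type != "JSON":
            raise ValueError("only RESTful endpoints returning JSON are supported")


# A fetcher takes (endpoint URL, API key) and returns the raw response body.
Fetcher = Callable[[str, str], str]


def harvest(source: ApiSource, fetch: Fetcher) -> List[Dict[str, Any]]:
    """Call the publisher's endpoint once and decode the body as JSON records."""
    data = json.loads(fetch(source.endpoint_url, source.api_key))
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of records")
    return data


if __name__ == "__main__":
    def fake_fetch(url: str, key: str) -> str:
        # Stand-in for a real HTTP call, so the sketch runs offline.
        return '[{"id": 1, "value": 3.5}]'

    source = ApiSource("https://publisher.example.com/api/readings", "publisher-key")
    print(harvest(source, fake_fetch))
```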
When uploading a dataset, you can tell it a bit about the structure of the data, specifically:
- Whether it contains geographical information, and which columns include the latitude and longitude.
- Whether it’s a time series and which column contains the timestamp
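Those two hints are what let a platform turn plain CSV text into typed geographic or time-series records. Here is a minimal sketch that assumes CSV input and ISO 8601 timestamps. The `StructureHints` name and the column names in the usage are made up for illustration.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterator, Optional


@dataclass
class StructureHints:
    lat_column: Optional[str] = None        # set if the data is geographical
    lon_column: Optional[str] = None
    timestamp_column: Optional[str] = None  # set if the data is a time series


def typed_rows(csv_text: str, hints: StructureHints) -> Iterator[Dict[str, object]]:
    """Yield CSV rows with the declared columns converted to float / datetime."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        out: Dict[str, object] = dict(row)
        if hints.lat_column and hints.lon_column:
            out[hints.lat_column] = float(row[hints.lat_column])
            out[hints.lon_column] = float(row[hints.lon_column])
        if hints.timestamp_column:
            # Assumes ISO 8601 timestamps; real data may need another format.
            out[hints.timestamp_column] = datetime.fromisoformat(row[hints.timestamp_column])
        yield out


if __name__ == "__main__":
    sample = "sensor,lat,lon,observed_at,count\nA1,55.6761,12.5683,2024-01-01T08:00:00,42\n"
    for record in typed_rows(sample, StructureHints("lat", "lon", "observed_at")):
        print(record)
```

Columns the publisher hasn't described (such as `count` in the sample) are left as strings, because the upload form only asks about location and time.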
This is as far as I've tested with publishing, but it looks like there's a basic workflow for draft and published datasets. I got stuck when trying to publish and map a dataset that I'd just downloaded from the exchange itself.
The Terms of Service
There are a number of interesting things to note there:
Section 7, Payments: “we will charge Data Consumers Service Delivery Charges based on factors such as the volume of the Dataset queried and downloaded as well as the frequency of usage of the APIs to query for the Datasets”
It’s not clear what those service delivery charges will be yet. The platform doesn’t currently provide access to any paid data, so I can’t tell. But it would appear that even free data might incur some charges. Hopefully there will be a freemium model?
It seems likely, though, that the platform is designed to generate revenue for Hitachi through ongoing use of the APIs. But if they want to increase traffic, they need to make the APIs a lot more powerful.
Section 7, Payments: “As a Data Consumer your account must always have a positive balance with a minimum amount as stated at our Website from time to time”
Well, this isn't currently required either at registration or when subscribing to an API. However, I'm concerned that I need to let Hitachi hold my money even if I'm not actively using the service.
I'll also note that in Section 8 they say that on termination, "Any positive balance on your account will be payable to you provided we receive payment instructions." Given that the two payment options are PayPal and invoice, you'd think they might at least offer to refund money via PayPal for those using that option.
Section 8, Restrictions in use of the Services or Website: You may not “access, view or use the Website or Services in or in connection with the development of any product, software or service that offers any functionality similar to, or competitive with, the Services”
So I can’t, for example, take free data from the service and offer an alternative catalogue or hosting option? Or provide value-added services that enrich the freely available datasets?
This is purely about protecting the platform, not enabling consumers or innovation.
Section 12, License to use the Dataset: “Subject to your payment of any applicable fees, you are granted a license by the Data Provider to use the relevant Dataset solely for the internal purposes and as otherwise set out under section 14 below. You may not sub-license such right or otherwise make the Dataset or any part thereof available to third parties.”
Data reuse rights are also addressed in Section 13 which includes the clause: “You shall not…make the Dataset or any part thereof as such available to any third party.”
Section 14, meanwhile, explains that as a consumer you may "(i) copy, distribute and publish the result of the use of the Dataset, (ii) adapt and combine the Dataset with other materials and (iii) exploit commercially and noncommercially" and that "The Data Provider acknowledges that any separate work, analysis or similar derived from the Dataset shall vest in the creator of such".
So, while they've clearly given some thought to the creation of derived works and products, which is great, the data can only be used for "internal purposes", which are not clearly defined, especially with respect to the other permissions.
I think this precludes using the data in a number of useful ways. You certainly don’t have any rights to redistribute, even if the data is free.
This is not an open license. I’ve written about the impacts of non-open licenses. It appears that data publishers must agree to these terms too, so you can’t publish open data through this exchange. This is not a good outcome, especially if the city decides to publish more data here and on its open data portal.
The data that Hitachi have copied into the site is now under a custom licence. If you access the data through the Copenhagen open data portal, then you are given more rights. Amusingly, the data in the exchange isn't properly attributed, so it breaks the terms of the open licence. I assume Hitachi have sought explicit permission to use the data in this way?
Overall, I'm extremely underwhelmed by the exchange and the developer portal. Even allowing for it being at an early stage, it's a very thin offering. I built more than this with a small team of a couple of people over a few months.
It’s also not clear to me how the exchange in its current form is going to deliver on the vision. I can’t see how the exchange is really going to unlock more data from commercial organisations. The exchange does give some (basic) options for monetising data, but has nothing to say about helping with all of the other considerations important to data publishing.