Open Data Submission Guidelines
The Open Data Portal of the Ministry of Health Sri Lanka recommends that all data publishers read the Open Data Submission Guidelines before submitting their datasets to the Open Data Portal.
Publishers who need any clarification regarding these guidelines are advised to contact the Ministry of Health.
Personally Identifiable Information
The data should not contain any personally identifiable information. The submitted dataset should be sufficiently anonymised using a suitable anonymisation method. Datasets that contain personally identifiable information will be rejected and won’t be published on the open data portal. You can use a free and open source tool such as ARX or Amnesia to anonymise your datasets before submission.
Personally identifiable information that is not accepted on the open data portal includes:
- Address (addresses containing information about a geographical location smaller than an MoH area)
- Telephone number
- Fax number
- Email address
- NIC / Passport number
- Personal health identifiers (PHN)
- BHT number
- Account number
- Certificate or licence number (SLMC number)
- Vehicle identification number (vehicle license plate number, chassis number, etc.)
- Driving license number
- Biometric data (fingerprint, voice)
- Photographs that uniquely identify an individual
- Any other data element that uniquely identifies an individual
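As a rough pre-submission aid, the list above can be turned into a simple screening script. The sketch below is only illustrative: the header keywords, the email pattern and the function name `find_pii_columns` are assumptions of this example, not part of the portal's review process, and a clean result does not prove that a dataset is anonymised.

```python
import csv
import io
import re

# Hypothetical header keywords drawn from the list above; extend as needed.
PII_HEADER_HINTS = {"address", "telephone", "phone", "fax", "email", "nic",
                    "passport", "phn", "bht", "account", "slmc"}

# Deliberately simple, illustrative pattern for email addresses.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

def find_pii_columns(csv_text):
    """Return the names of columns that look like they hold identifiers."""
    reader = csv.DictReader(io.StringIO(csv_text))
    flagged = set()
    # Flag columns whose header contains one of the hint words.
    for name in reader.fieldnames or []:
        tokens = set(re.split(r"[^a-z0-9]+", name.lower()))
        if tokens & PII_HEADER_HINTS:
            flagged.add(name)
    # Flag columns whose values contain something shaped like an email.
    for row in reader:
        for name, value in row.items():
            if name is not None and isinstance(value, str) and EMAIL_RE.search(value):
                flagged.add(name)
    return sorted(flagged)
```

Any column the script flags should be removed or anonymised, for example with ARX or Amnesia, and the whole dataset should still be reviewed manually.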
Machine Readable File Datasets
The dataset should be in a machine readable format. This means that users of the data should be able to download and use the dataset for analysis without significant effort. Machine readable file formats include XML, JSON, CSV, Microsoft Excel and HTML. PDF and image file formats such as JPEG and PNG do not qualify as machine readable file formats, and publishers are requested not to submit datasets in a non machine readable file format.
Recommendation to publishers: submit datasets in one of the machine readable file formats listed above.
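A minimal sketch of a format check a publisher might run before submission is shown here. The mapping from the formats named in this section to file extensions, and the function name `is_machine_readable`, are assumptions of this example.

```python
from pathlib import Path

# Extensions assumed to correspond to XML, JSON, CSV, Microsoft Excel and HTML.
MACHINE_READABLE = {".xml", ".json", ".csv", ".xls", ".xlsx", ".html", ".htm"}

def is_machine_readable(filename):
    """Return True if the file extension is one of the accepted formats."""
    return Path(filename).suffix.lower() in MACHINE_READABLE
```

For example, `is_machine_readable("report.pdf")` returns `False`, signalling that the data should be exported to a format such as CSV first.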
Ethical Clearance in Research Data
If you are submitting data from a research study, the study should have obtained ethical clearance from the respective board of study. Publishers are requested to submit documents of ethical clearance together with the dataset. Research data collected without proper ethical clearance will not be published on the open data portal.
If the dataset was composed by multiple authors, the publishers should obtain consent from all the authors of the dataset prior to submitting the dataset to the open data portal. Publishers should submit the letters of consent together with the datasets when submitting data to the open data portal.
Submit High Value Datasets
While all Open Data publication is encouraged, publishers may have limited resources, and therefore it is important to prioritize datasets that are submitted to the open data portal.
High value datasets are datasets whose re-use is associated with important benefits for society, the environment and the economy.
We encourage all publishers to submit high value datasets to the open data portal that will be beneficial for all users.
Select Suitable Dataset License
All datasets submitted to the open data portal should be associated with an open data license. This gives the end user a clear understanding of the conditions under which the data can be reused after publication.
It is recommended to use the Creative Commons Attribution 4.0 (CC BY 4.0) license. Under CC BY 4.0, end users can:
- Share - Copy and redistribute the material in any medium or format.
- Use - Reuse the data and combine with other datasets and build upon the published material for any purpose including commercial purposes.
Under the CC BY 4.0 license, users must acknowledge the source of the information in their product or application. Data sourced from the open data portal, data.health.gov.lk, can be acknowledged by providing the URL (link) to the dataset in the open data portal.
All datasets submitted to the open data portal will be published under CC BY 4.0 unless an alternate open data license is provided with the submission. Publishers are encouraged to use the least restrictive licensing that encourages reuse.
Provide Comprehensive Metadata
Metadata provides additional information regarding datasets for users to better understand the meaning. The metadata may include the structure of the data, data quality, data access methods, update frequency etc.
The provision of comprehensive metadata when submitting a dataset to the open data portal improves the discoverability of the dataset, helping the users to find the datasets.
Recommendations to the publishers
- Provide quality description on the dataset.
- If the dataset is associated with a research publication, provide the abstract of the publication as the description.
- Select suitable categories for the dataset.
- Select suitable keywords associated with the dataset.
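The recommendations above can be checked mechanically before submission. In the sketch below, the field names (`title`, `description`, `categories`, `keywords`) and the function `missing_metadata` are hypothetical; the portal's actual submission form may use different names.

```python
# Hypothetical metadata fields a publisher is expected to fill in.
REQUIRED_FIELDS = ("title", "description", "categories", "keywords")

def missing_metadata(record):
    """Return the required fields that are absent or empty in the record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]

# Example record with an empty description and no keywords.
record = {
    "title": "Dengue cases by MoH area",
    "description": "",
    "categories": ["Epidemiology"],
    "keywords": [],
}
```

Calling `missing_metadata(record)` on this example reports `description` and `keywords` as still needing attention.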
Provide Accurate Timeframe Metadata
Timeframe information associated with a dataset helps users understand the period of time the dataset covers, when it was published and when it was last modified.
It is recommended for publishers to include metadata about the timeframe the datasets cover when submitting the datasets to the open data portal.
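One way to keep timeframe metadata consistent is to store the dates in ISO 8601 form and check their ordering. This is a sketch under assumptions: the keys `coverage_start`, `coverage_end`, `published` and `last_modified` are hypothetical names chosen for this example.

```python
from datetime import date

def timeframe_problems(meta):
    """Return a list of inconsistencies found in the timeframe metadata."""
    start = date.fromisoformat(meta["coverage_start"])
    end = date.fromisoformat(meta["coverage_end"])
    published = date.fromisoformat(meta["published"])
    modified = date.fromisoformat(meta["last_modified"])
    problems = []
    if start > end:
        problems.append("coverage_start is after coverage_end")
    if modified < published:
        problems.append("last_modified is before published")
    return problems
```

An empty list means the dates are mutually consistent; it does not confirm that they are factually correct.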
Ensure Data is Kept Up to Date
Users’ trust improves when the data within a dataset is kept up to date. What constitutes ‘old’ data is a subjective matter. For example, data published annually can only be updated once a year and ages slowly, while real time data is updated frequently and ages very rapidly.
Publishers are recommended to:
- State the frequency of data updates in the metadata.
- Ensure data is kept up-to-date in accordance with the update frequency.
- Automate data publication and harvesting process.
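As an illustration of checking a dataset against its declared update frequency, the sketch below flags a dataset as stale when more time has passed than one update interval. The frequency labels and the interval lengths are assumptions made for this example.

```python
from datetime import date, timedelta

# Hypothetical frequency labels and the maximum gap allowed for each.
UPDATE_INTERVALS = {
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
    "monthly": timedelta(days=31),
    "annually": timedelta(days=366),
}

def is_stale(last_modified, frequency, today):
    """Return True if the dataset has missed its declared update interval."""
    return today - last_modified > UPDATE_INTERVALS[frequency]
```

A scheduled job could run such a check over all published datasets and notify the publisher of any that are overdue.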
Provide Granular Data
The value a dataset can deliver improves with the granularity of the dataset, as it allows detailed analysis and specific results. Providing granular data allows users to carry out detailed analysis and provide fine-grained services and applications based on the data.
For example, if data is generated every minute but published only as daily aggregates, the aggregation causes a loss of valuable information.
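The loss can be seen in a small sketch. The counts below are invented purely for illustration, and the function names are assumptions of this example.

```python
# Hypothetical counts recorded at four consecutive intervals in one day.
interval_counts = [2, 0, 9, 1]

def daily_total(counts):
    """Aggregate fine-grained counts into a single daily figure."""
    return sum(counts)

def peak_interval(counts):
    """Return the index of the busiest interval; this needs granular data."""
    return counts.index(max(counts))
```

Both `[2, 0, 9, 1]` and `[3, 3, 3, 3]` give the same daily total of 12, so a user who receives only the total cannot tell that the first series contains a sharp peak in the third interval.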
A balance has to be struck between providing granular data and not breaching legal restrictions around personal, confidential or sensitive data. Open Data should never infringe upon a person’s right under data protection regulations.
Data publishers are recommended to:
- Submit data in the most granular form.
- Ensure the provision of granular data does not breach legal restrictions.
- Ensure data is available at different levels of granularity.
Provide Documentation Where Possible
It is important to include documentation with datasets where necessary, so that users do not need domain expertise to understand and use the published datasets. The documentation can be linked from the dataset’s metadata, or it can be submitted as a separate file together with the dataset when submitting to the open data portal.
Recommendations to the publishers
- Provide, alongside the dataset, as much documentation as a user will require to fully understand and use the data.
- Where possible, include the documentation in the dataset’s metadata.
- Provide reference data for each dataset, if applicable, e.g. data dictionary, standards used, etc.
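A data dictionary is most useful when it covers every column in the dataset. The sketch below, in which the dictionary layout and the function name `undocumented_columns` are assumptions, lists columns that have no entry.

```python
def undocumented_columns(columns, data_dictionary):
    """Return dataset columns that have no entry in the data dictionary."""
    return [column for column in columns if column not in data_dictionary]

# Hypothetical data dictionary mapping column names to descriptions.
data_dictionary = {
    "moh_area": "Medical Officer of Health area in which the case was reported",
    "year": "Calendar year of the report",
}
```

Here `undocumented_columns(["moh_area", "year", "cases"], data_dictionary)` returns `["cases"]`, showing that the `cases` column still needs a description.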
Ensure All URLs Are Functional
If your datasets contain links pointing to other datasets or resources, it is important to make sure the links are operational.
Recommendations for publishers
- Ensure all dataset URLs are operational.
- Update the URL on Open Data Portal if the source URL changes.
- Remove datasets or links from the Open Data Portal if they are no longer available at source.
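Link checks of this kind can be automated. The following is a minimal sketch using only the Python standard library; it assumes that a response to a HEAD request with a status below 400 means the link is operational, which may not hold for servers that reject HEAD requests.

```python
import urllib.request

def url_is_operational(url, timeout=10):
    """Return True if the URL answers a HEAD request with a status below 400."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (OSError, ValueError):
        # Covers HTTP errors, connection failures, timeouts and malformed URLs.
        return False

def broken_links(urls, check=url_is_operational):
    """Return the URLs for which the check fails."""
    return [url for url in urls if not check(url)]
```

The `check` parameter lets the network call be swapped out, for instance to retry with a GET request or to test the function offline.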
Support Non-Technical Users in Understanding Data
Providing specific support for non-technical users will improve the accessibility of the information and help a wide range of users to understand the data.
Having access to raw data is of huge value to users who wish to view, understand, analyze and reuse the information. However, being able to work with raw data requires a certain level of technical expertise. As a result, there can be a digital divide between those data-savvy users and non-technical users. In order to address this, it is good practice to provide specific support to non-technical users to help them understand the data. This can be achieved by providing easy-to-understand data previews or co-locating tools that can be used to view and interact with the data.
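As one possible illustration of an easy-to-understand data preview, the sketch below returns the header and the first few rows of a CSV file so they can be shown to a user before download. The function name `preview_rows` and the default of five rows are assumptions of this example.

```python
import csv
import io
from itertools import islice

def preview_rows(csv_text, rows=5):
    """Return the header row followed by at most `rows` data rows."""
    reader = csv.reader(io.StringIO(csv_text))
    return list(islice(reader, rows + 1))
```

A portal page could render the returned rows as a small table next to the download link.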