Dataset structured data refers to a body of structured information describing some topic(s) of interest. Its purpose is to improve the discovery of your downloadable products.
Whatever that means. The definition above does seem ambiguous, especially if English is not your primary language, which is the case for most people.
According to this data:
Out of the world’s approximately 7.5 billion inhabitants, 1.5 billion speak English — that’s 20% of the Earth’s population. However, most of those people aren’t native English speakers. Only about 360 million people speak English as their first language.
So, the definition is quite challenging to decipher. Anyway, this is what it means in layman’s terms. Dataset structured data is information shown in Google search results that points to a downloadable file. These downloadable files are often zipped and come in formats such as PDF, CSV and plain text. Because these files contain information on topics of interest, like the sciences, Google puts them in a category that makes them easy for anyone to find.
Examples of what can qualify as a dataset, according to the Google Developers website:
- A table or a CSV file with some data
- Organized collection of tables
- File in a proprietary format that contains data
- Collection of files that together constitute some meaningful dataset
- A structured object with data in some other format that you might want to load into a special tool for processing
- Images capturing data
- Files relating to machine learning, such as trained parameters or neural network structure definitions
- Anything that looks like a dataset to you
So, basically any topic of interest.
How to make your dataset easier to find in search results?
Datasets are easier to find when you provide supporting information such as their name, description, creator and distribution formats. Google’s approach to dataset discovery makes use of schema.org and other metadata standards. This information can be added to the pages that describe your datasets.
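To make that concrete, here is a minimal sketch of such markup, built as a Python dictionary and wrapped in the script tag used to embed JSON-LD in a page. Every value (the dataset name, the organization, the example.com URL) is a made-up placeholder, and the helper name to_script_tag is only for illustration.

```python
import json

# All values below are placeholders for a hypothetical dataset.
dataset = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example city rainfall measurements",
    "description": "Daily rainfall totals recorded at a hypothetical weather station.",
    "creator": {"@type": "Organization", "name": "Example Weather Lab"},
    "distribution": [
        {
            "@type": "DataDownload",
            "encodingFormat": "CSV",
            "contentUrl": "https://example.com/data/rainfall.csv",
        }
    ],
}


def to_script_tag(markup: dict) -> str:
    """Wrap the markup in the script tag used to embed JSON-LD in an HTML page."""
    body = json.dumps(markup, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


print(to_script_tag(dataset))
```

The printed tag would go into the HTML of the page that describes the dataset.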
Here is what they say:
The purpose of this markup is to improve discovery of datasets from fields such as life sciences, social sciences, machine learning, civic and government data, and more.
Best Practices in Implementing Dataset structured data.
1. Use a sitemap file to help Google find your URLs.
Using sitemap files and sameAs markup helps document how dataset descriptions are published throughout your site.
What is sameAs? Schema.org has this to say:
sameAs is the URL of a reference Web page that unambiguously indicates the item’s identity. E.g. the URL of the item’s Wikipedia page, Wikidata entry, or official website.
Simply put, it is the page describing the file being referenced as a dataset.
If you have a dataset repository, you likely have at least two types of pages. One type is the canonical (“landing”) page for each dataset; the other is a page that lists multiple datasets, or sometimes a subset of them. We recommend that you add structured data about a dataset to the canonical pages. If you add structured data to multiple copies of a dataset, use the sameAs property to link to the canonical page.
Note: Google doesn’t need every mention of the same dataset to be explicitly marked up, but if you do so for other reasons, we strongly encourage the use of sameAs.
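Here is a rough sketch of that idea: a helper that builds the markup for any page mentioning a dataset and adds sameAs only when the page is not the canonical one. The function name dataset_markup and the example.com URLs are assumptions made for illustration.

```python
CANONICAL_URL = "https://example.com/datasets/rainfall"  # hypothetical landing page


def dataset_markup(name, description, page_url, canonical_url=CANONICAL_URL):
    """Build Dataset markup for one page that describes the dataset."""
    markup = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": page_url,
    }
    # Copies of the description point back to the canonical landing page.
    if page_url != canonical_url:
        markup["sameAs"] = canonical_url
    return markup


mirror = dataset_markup(
    "Example city rainfall measurements",
    "Daily rainfall totals recorded at a hypothetical weather station.",
    "https://example.com/mirror/rainfall",
)
print(mirror["sameAs"])
```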
2. Describe the source and provenance of your dataset.
It is common for open datasets to be republished, aggregated, and to be based on other datasets. A dataset within a dataset.
Steps to implement:
A. Use the sameAs property to indicate the most canonical URLs for the original in cases when the dataset or its description is a simple republication of materials published elsewhere. See the explanation of sameAs above.
B. Use the “isBasedOn” property in cases where the republished dataset (including its metadata) has been changed significantly.
C. When a dataset derives from or aggregates several originals, use the isBasedOn property.
D. Use the identifier property to attach any relevant Digital Object Identifiers (DOIs) or Compact Identifiers.
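Steps A to D can be sketched as one small helper. This is only an illustration of the decision between sameAs and isBasedOn described above; the function name add_provenance, its parameters and all URLs (including the DOI) are hypothetical.

```python
def add_provenance(markup, originals, changed_significantly=False, identifiers=None):
    """Return a copy of `markup` with provenance properties filled in.

    `originals` lists the URLs of the dataset(s) this one republishes or builds on.
    """
    result = dict(markup)
    if len(originals) == 1 and not changed_significantly:
        # Step A: a simple republication points at the canonical original.
        result["sameAs"] = originals[0]
    elif originals:
        # Steps B and C: significantly changed, or derived from several originals.
        result["isBasedOn"] = list(originals)
    if identifiers:
        # Step D: DOIs or Compact Identifiers.
        result["identifier"] = list(identifiers)
    return result


base = {"@context": "https://schema.org/", "@type": "Dataset", "name": "Example"}
aggregated = add_provenance(
    base,
    ["https://example.org/datasets/a", "https://example.org/datasets/b"],
    identifiers=["https://doi.org/10.1000/182"],  # placeholder DOI
)
print(aggregated["isBasedOn"])
```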
3. Limit your text content to Google’s maximum length
Google recommends limiting all textual properties of a Dataset to 5000 characters or less. Google Dataset Search only uses the first 5000 characters of any textual property. Names and titles are typically a few words or a short sentence.
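One way to catch overly long text before publishing is to walk the markup and report every string longer than the limit. The sketch below assumes the markup is held as nested Python dictionaries and lists; the function name overlong_properties is made up for this example.

```python
MAX_TEXT_LENGTH = 5000  # the limit described above


def overlong_properties(node, path=""):
    """Return (property path, length) pairs for every string over the limit."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            found += overlong_properties(value, f"{path}.{key}" if path else key)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            found += overlong_properties(value, f"{path}[{index}]")
    elif isinstance(node, str) and len(node) > MAX_TEXT_LENGTH:
        found.append((path, len(node)))
    return found


sample = {
    "@type": "Dataset",
    "name": "Example dataset",
    "description": "x" * 5001,  # deliberately one character too long
}
print(overlong_properties(sample))
```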
Required properties for Dataset structured data:
Include the required properties “description” and “name” for your content to be eligible for display as a rich result. You can also include the recommended properties to add more information about your content. This could provide a better user experience.
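A quick check for the two required properties could look like the sketch below. It treats a missing, non-text or blank value as missing; that rule and the function name missing_required are assumptions for this example, not something Google specifies.

```python
REQUIRED_PROPERTIES = ("name", "description")


def missing_required(markup):
    """List the required properties that are absent or blank in the markup."""
    missing = []
    for prop in REQUIRED_PROPERTIES:
        value = markup.get(prop)
        if not isinstance(value, str) or not value.strip():
            missing.append(prop)
    return missing


draft = {"@type": "Dataset", "name": "Example dataset", "description": "   "}
print(missing_required(draft))
```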
Adding alternateName to the information also helps Google find your Dataset content in search.
The “creator” (the author of the dataset) also improves how the result is displayed in Google search. It is described with the same properties as the “Person” schema, which lets it uniquely identify an individual.
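As an illustration, the sketch below adds alternateName and a Person creator to a dataset. The names are invented, the ORCID URL is an all-zero placeholder, and the helper person is hypothetical.

```python
import json


def person(name, given_name, family_name, profile_url=None):
    """Describe a creator using properties of the schema.org Person type."""
    node = {
        "@type": "Person",
        "name": name,
        "givenName": given_name,
        "familyName": family_name,
    }
    if profile_url:
        # A profile URL (for example an ORCID page) identifies the individual.
        node["sameAs"] = profile_url
    return node


dataset = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example city rainfall measurements",
    "description": "Daily rainfall totals recorded at a hypothetical weather station.",
    "alternateName": ["ECRM", "Example rainfall data"],
    "creator": person(
        "Jane Doe", "Jane", "Doe", "https://orcid.org/0000-0000-0000-0000"
    ),
}

print(json.dumps(dataset, indent=2))
```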
You may also supply the following to improve how your dataset appears in search results:
- keywords summarizing the dataset,
- the license (if there is one),
- citation, which identifies works (such as academic articles) that should be cited along with the dataset,
- spatialCoverage, if the dataset only applies to a certain area,
- and lastly, the version and the URL of the content itself on the web.
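A small helper can merge these optional properties into existing markup while skipping any that are left empty. This is a sketch only: the function name with_optional is hypothetical and every value in the usage example is a placeholder.

```python
OPTIONAL_PROPERTIES = (
    "keywords", "license", "citation", "spatialCoverage", "version", "url",
)


def with_optional(markup, **extras):
    """Return a copy of `markup` with the non-empty optional properties added."""
    unknown = sorted(set(extras) - set(OPTIONAL_PROPERTIES))
    if unknown:
        raise ValueError(f"not in the optional list: {unknown}")
    result = dict(markup)
    result.update({key: value for key, value in extras.items() if value})
    return result


base = {"@context": "https://schema.org/", "@type": "Dataset", "name": "Example"}
enriched = with_optional(
    base,
    keywords=["rainfall", "weather"],
    license="https://creativecommons.org/publicdomain/zero/1.0/",
    spatialCoverage="Example City",
    version="1.0",
    url="https://example.com/datasets/rainfall",
    citation="",  # left empty, so it is skipped
)
print(sorted(enriched))
```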
How is Dataset structured data published?
As mentioned above, datasets are commonly published in repositories that contain many other datasets, and the same dataset can be included in more than one such repository. You can point to a data catalog that the dataset belongs to by referencing it directly with the includedInDataCatalog property. Check the full definition of DataCatalog at schema.org.
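Referencing a catalog could look like the sketch below, which uses the includedInDataCatalog property from schema.org. The catalog name, the URL and the helper in_catalog are made up for illustration.

```python
def in_catalog(markup, catalog_name, catalog_url=None):
    """Return a copy of `markup` that references the catalog it belongs to."""
    catalog = {"@type": "DataCatalog", "name": catalog_name}
    if catalog_url:
        catalog["url"] = catalog_url
    result = dict(markup)
    result["includedInDataCatalog"] = catalog
    return result


base = {"@context": "https://schema.org/", "@type": "Dataset", "name": "Example"}
listed = in_catalog(base, "Example Open Data Portal", "https://data.example.org")
print(listed["includedInDataCatalog"])
```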
The full definition of the Dataset type can be found at https://schema.org/Dataset.
The reference for this article can be found here: DataSet.
The Google Webmaster Central Help Forum for Structured Data is also a pretty good resource. It is a forum where you can ask and answer questions about structured data.
Improve the search results for your downloadable products by using Dataset structured data. It makes your content easier for Google to find and show in searches.
Let us know if this is helpful in the comment section. Give the article a thumbs up and a star.