# [Volute] r4071 - trunk/projects/dm/provenance/description

Volute commit messages volutecommits at g-vo.org
Fri May 12 00:23:50 CEST 2017

Author: kriebe
Date: Fri May 12 00:23:50 2017
New Revision: 4071

Log:

Modified:
trunk/projects/dm/provenance/description/ProvenanceDM.pdf
trunk/projects/dm/provenance/description/ProvenanceDM.tex
trunk/projects/dm/provenance/description/datamodel-description.tex
trunk/projects/dm/provenance/description/provaccess.tex

Modified: trunk/projects/dm/provenance/description/ProvenanceDM.pdf
==============================================================================
Binary file (source and/or target). No diff available.

Modified: trunk/projects/dm/provenance/description/ProvenanceDM.tex
==============================================================================
--- trunk/projects/dm/provenance/description/ProvenanceDM.tex	Thu May 11 20:51:08 2017	(r4070)
+++ trunk/projects/dm/provenance/description/ProvenanceDM.tex	Fri May 12 00:23:50 2017	(r4071)
@@ -139,12 +139,18 @@
\input{datamodel-description}

+% make sure that images/tables of the previous section are printed
+% before starting the new section
+\clearpage
%\section{Applying provenance -- Interactions with other Data models}\label{sec:dmlinks}

+% make sure that images/tables of the previous section are printed
+% before starting the new section
+\clearpage
\section{Accessing provenance information}
\input{provaccess}

@@ -165,6 +171,8 @@
% Use itemize environments.
\subsection{Changes from WD-ProvenanceDM-1.0-20161121}
\begin{itemize}
+\item Moved the figure showing relations between Provenance.Agent and Dataset.Party into Section~\ref{sec:dmlinks}.
+\item Added implementation and serialisation examples to Section~\ref{sec:serialisations}.
\item Use voprov:type and voprov:role in table with example agent roles, \ref{tab:agent-roles}, i.e. replaced prov:person by Individual and prov:organization by Organization.
\item Removed the obscore/dataset attributes from EntityDescription, since they are specific for observations only and are not applicable to configuration entities etc.
\item Renamed \emph{label} attribute to \emph{name} everywhere, for more consistency with SimDM naming scheme (\emph{label} is reserved there for SKOS labels).

Modified: trunk/projects/dm/provenance/description/datamodel-description.tex
==============================================================================
--- trunk/projects/dm/provenance/description/datamodel-description.tex	Thu May 11 20:51:08 2017	(r4070)
+++ trunk/projects/dm/provenance/description/datamodel-description.tex	Fri May 12 00:23:50 2017	(r4071)
@@ -641,10 +641,10 @@
The agent can be a single person, a group of persons (e.g. MUSE WISE Team), a
project (RAVE) or an institute.
This is also reflected in the IVOA Dataset Metadata Model, where \class{Party}
-represents an agent, and it has two subtypes: \class{Individual} and \class{Organization},
-which are explained in more detail in Table \ref{tab:agent-types}.
-Both types are also used for agent types in the W3C Provenance Data Model, though
-\class{Individual} is called \class{Person} there.
+represents an agent, and it has two types: \class{Individual} and \class{Organization},
+which are explained in more detail in Table \ref{tab:agent-types} (also see Section~\ref{sec:dmlinks} for comparison between \class{Agent} and \class{Party}).
+Both agent types are also used in the W3C Provenance Data Model, though
+\class{Individual} is called \class{Person} there.
We do not include the type \class{SoftwareAgent} from W3C, since it is not required for
our use cases.

@@ -670,18 +670,6 @@
\end{center}
\end{table}

-A definition of organizations is given in the
-IVOA Recommendation on Resource Metadata \citep{std:ResourceMeta}, hereafter
-referred to as RM: ``An organisation is [a] specific type of resource that
-brings people together to pursue participation in VO applications.''
-It also specifies further that scientific projects can be considered
-as organisations on a finer level:
-``At a high level, an organisation could be a university, observatory, or government
-agency. At a finer level, it could be a specific scientific project, space mission,
-or individual researcher. A provider is an organisation that makes data and/or services
-available to users over the network.''
-
-
\begin{table}[h]
\small
\tymax  0.5\textwidth
@@ -695,12 +683,12 @@
\textbf{name} & prov:name & string & a common name for this agent; e.g. first name and last name; project name, agency name...\\
type & prov:type & string & type of the agent: either Individual (Person) or Organization\\
% insert here the attributes dedicated to contact for a Party in DataSet Metadata DM.
-\hline
-\hline
-address &  & string & Address of the agent both for Individual (Person) and Organization\\
-phone &  & string & Contact phone number of the agent both for Individual (Person) and Organization\\
-email & & string & Contact email of the agent both for Individual (Person) and Organization\\
+% \hline
+% \multicolumn{4}{l}{Additional optional attributes from Dataset.Party subclasses:}\\
+% \hline
+% address &  & string & Address of the agent both for Individual (Person) and Organization\\
+% phone   &  & string & Contact phone number of the agent both for Individual (Person) and Organization\\
+% email   &  & string & Contact email of the agent both for Individual (Person) and Organization\\
\bottomrule
\end{tabulary}
\caption{Agent attributes}
@@ -709,7 +697,22 @@
\end{table}

-For each agent a \emph{name} should be specified, a summary of the main common attributes for Agents is given in Table~\ref{tab:agent-attributes}.
+
+A definition of organizations is given in the
+IVOA Recommendation on Resource Metadata \citep{std:ResourceMeta}, hereafter
+referred to as RM: ``An organisation is [a] specific type of resource that
+brings people together to pursue participation in VO applications.''
+It also specifies further that scientific projects can be considered
+as organisations on a finer level:
+``At a high level, an organisation could be a university, observatory, or government
+agency. At a finer level, it could be a specific scientific project, space mission,
+or individual researcher. A provider is an organisation that makes data and/or services
+available to users over the network.''
+
+
+
+For each agent a \emph{name} should be specified; a summary of the attributes for \class{Agent} is given in Table~\ref{tab:agent-attributes}.
+One could also add the optional attributes \emph{address}, \emph{phone} and \emph{email} (compare with subclasses of \emph{Party} in Section~\ref{sec:dmlinks}). However, we skip them here in this main class, since an advanced system may use permanent identifiers (e.g. ORCIDs) to identify agents and retrieve their properties from an external system.
It would also increase the value of the given
information if the (current) affiliation of the agent (and a project leader/group
leader) were specified in order to maximize the chance of finding any contact
@@ -718,25 +721,15 @@
but also in order
to know who was involved and to fulfill our ``Attribution'' requirement
(Section~\ref{sec:requirements}), so that proper credits are given to the right
-people/projects.
-
-\begin{figure}[h]
-\centering
-\includegraphics[scale=0.7]{../datamodel-diagrams/images/agent-relations.pdf}
-\caption{The relations between the \class{Agent} class within the Provenance Data Model
-(grey and yellow classes) with classes from the Dataset Metadata Model (green).}
-\label{fig:agent-relations}
-\end{figure}
+people/projects.

-The relations between \class{Agent} and other classes from the Provenance Data Model and
-the IVOA Dataset Metadata Model are detailed in Figure \ref{fig:agent-relations}. In DatasetDM, the \class{Party} class corresponds to our Agent class. The main difference is that Individual and Person are subclasses in DatasetDM, whereas we just use the same class \emph{Agent} for both and distinguish between them using the \emph{Agent.type} attribute (which can have the value ``Individual'' or ``Organization''.

It is desired to have at least one agent given for each activity (and entity), but it
is not enforced.
% , hence the multiplicity between \class{Entity}/\class{Activity} and the relations
%to the \class{Agent} starts with 0.
-There also can be more than one agent for each activity/entity with different \emph{roles}
+There can also be more than one agent for each activity/entity with different \emph{roles}
and one agent can be responsible for more than one activity or entity. This
many-to-many relationship is made explicit in our model by adding the two
following relation classes:
@@ -756,8 +749,8 @@
In order to make it clearer what an agent is useful for, we suggest the
possible roles an agent can have (along with descriptions partially taken from RM)
in Table~\ref{tab:agent-roles}.
-For comparison, SimDM contains following roles for their \emph{Contact} class:
-owner, creator, publisher and contributor.
+For comparison, SimDM contains the following roles for their contacts:
+owner, creator, publisher and contributor. Note that the \emph{Party} classes in Dataset and SimDM are very similar to the \emph{Agent} class, which is explained in more detail in Section~\ref{sec:dmlinks}.

\begin{table}[h]
@@ -774,7 +767,6 @@
editor & Individual & editor of e.g. an article, before publishing\\
creator & Individual & someone who created a dataset, creators of articles or software are rather called ``author''\\
curator & Individual & someone who checked and corrected a dataset before publishing\\
- & voprov:Organization & \\
publisher & Organization {(maybe also Individual?)}& organization (publishing house, institute) that published something\\
observer & Individual & observer at the telescope\\
operator & Individual & someone performing a given task \\ % removed executor: ambiguous

==============================================================================
--- trunk/projects/dm/provenance/description/datamodel-links.tex	Thu May 11 20:51:08 2017	(r4070)
+++ trunk/projects/dm/provenance/description/datamodel-links.tex	Fri May 12 00:23:50 2017	(r4071)
@@ -3,12 +3,11 @@
%(e.g. DatasetDM, SpectralDM (share some same classes), SimDM)
%and how provenance information can be stored.

-The Provenance Data Model can be applied without making links to any other
+The Provenance Data Model can be applied without making any links to other
IVOA data model classes. For example when the data is not yet published, provenance information
can be stored already, but a DatasetDM-description for the data may not yet exist.
But if there are data models implemented for the datasets, then it is
-very useful to connect the classes and attributes of the different models,
-which we are going to discuss in this Section. These links help to avoid
+very useful to connect the classes and attributes of the other data models with Provenance classes and attributes (if applicable), which we are going to discuss in this Section. These links help to avoid
unnecessary repetitions in the metadata of datasets, and also offer the possibility
to derive some basic provenance information from existing data model classes automatically.

@@ -36,7 +35,7 @@
\toprule
\midrule
-DataID.title      	 & Entity.name                & title of the dataset\\
+DataID.title         & Entity.name                & title of the dataset\\
DataID.collection    & HadMember.collectionId     & link to the collection to which the dataset belongs\\
DataID.creator       & Agent.name                 & name of agent\\
DataID.creatorDID    & alternative to Entity.id   & id for the dataset given by the creator, could be used if no PublisherDID exists (yet)\\
@@ -49,7 +48,7 @@
Curation.Version       & Entity.version           & version of the dataset\\
Curation.Rights        & Entity.rights            & access rights to the dataset; one of [...]\\
-Curation.Contact       & Agent.Id or name? & link to Agent with role contact\\
+Curation.Contact       & Agent                    & link to Agent with role contact\\
DataProductType  & EntityDescription.dataproduct\_type & type of a dataproduct/entity\\
DataProductSubType & EntityDescription.dataproduct\_subtype & subtype of a dataproduct/entity\\
ObsDataset.calibLevel  & EntityDescription.level & (output) calibration level, integer between 0 and 3\\\hline
@@ -60,8 +59,23 @@
\end{table}

+
+\begin{figure}[h]
+\centering
+\includegraphics[width=\textwidth]{../datamodel-diagrams/images/agent-relations.pdf}
+\caption{The relations between the \class{Agent} class within the Provenance Data Model
+(grey and yellow classes) with classes from the Dataset Metadata Model, party package (green).}
+\label{fig:agent-relations}
+\end{figure}
+
The \class{Agent} class, which is used for defining responsible persons and
-organizations, is similar to the \class{Party} class in the Dataset Metadata Model and SimDM.
+organizations in ProvenanceDM, is very similar to the \class{Party} class in the Dataset Metadata Model (and in SimDM). Its details are depicted in Figure~\ref{fig:agent-relations}.
+The main difference between \class{Agent} and \class{Party} is that \class{Individual} and \class{Person} are subclasses in DatasetDM, whereas we just use the same class \emph{Agent} for both and distinguish between them using the \emph{Agent.type} attribute (which can have the value ``Individual'' or ``Organization'').
+
+
+We imagine that services implementing both data models, \class{Dataset} and \class{ProvenanceDM}, may use just \emph{one} class: either \class{Agent} or \class{Party}, enriched with all the necessary (project-specific) attributes. Note that for Provenance queries using a ProvTAP service and for W3C compatible serializations, the name \class{Agent} for the responsible individuals/organizations is required.
+
+

In SimDM one also encounters a normalization similar to our split-up of descriptions from

Modified: trunk/projects/dm/provenance/description/provaccess.tex
==============================================================================
--- trunk/projects/dm/provenance/description/provaccess.tex	Thu May 11 20:51:08 2017	(r4070)
+++ trunk/projects/dm/provenance/description/provaccess.tex	Fri May 12 00:23:50 2017	(r4071)
@@ -1,4 +1,4 @@
-\subsection{Provenance Data Model serialization}
+\subsection{Provenance Data Model serialization}\label{sec:serialisations}
There are two possible families of ProvenanceDM metadata serializations, examples for these can be found in the implementation section (\ref{sec:usecases-implementations}) and the links therein.
\begin{itemize}
\item W3C serializations: PROV\-N, PROV\-JSON, PROV\-XML. These are serializations of the W3C provenance data model. They allow the possibility to add additional IVOA or ad hoc attributes to the basic ones in each class. This way the IVOA models can produce W3C compliant serializations.
@@ -7,7 +7,7 @@
%\TODO{TAP SCHEMA of the ProvenanceDM datamodel: Maybe Mathieu can provide us with a copy of the TAP schema he designed ?}

\item Direct VOTABLE mapping by using some ad hoc mapping based on transcription of PROV-N format: this is called PROV-VOTABLE. Moreover in the future we could also define a VO-DML \citep{std:VODML} version of the mapping.
-%The following is an example of provenance metadata in this PROV-VOTABLE format. Objects become tables, their classes are rendered by a utype. Attributes and relationships become FIELDS or PARAMS. The model attribute names also become VOTABLE utypes.
+%The following is an example of provenance metadata in this PROV-VOTABLE format. Objects become tables, their classes are rendered by a utype. Attributes and relationships become FIELDS or PARAMS. The model attribute names also become VOTABLE utypes.

\end{itemize}

@@ -261,10 +261,10 @@

\end{verbatim}

-Such serializations can be retrieved through Access protocols (see \ref{AccessPro} ) or directly integrated in datasets headers or "associated metadata" in order to provide provenance metadata for these datasets.
+Such serializations can be retrieved through Access protocols (see \ref{sec:access_protocols} ) or directly integrated in datasets headers or ``associated metadata'' in order to provide provenance metadata for these datasets.

\subsection{Graphic formats}
-\label{Graphics}
+\label{sec:graphic_formats}
The voprov python module can also provide provenance information in graphic formats: PNG, SVG and PDF.
In the above example, you have to add the following instructions in your python program:

@@ -284,7 +284,7 @@

\subsection{Access protocols}
-\label{AccessPro}
+\label{sec:access_protocols}
We envision two possible access protocols:
\begin{itemize}
\item ProvDAL: retrieve provenance information based on given id of a data entity or activity