<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Data Governance - Anjana Data</title>
	<atom:link href="https://anjanadata.com/en/category/gobierno-del-dato/feed/" rel="self" type="application/rss+xml" />
	<link>https://anjanadata.com/en</link>
	<description>Data Governance &amp; Analytics</description>
	<lastbuilddate>Thu, 14 May 2026 07:50:32 +0000</lastbuilddate>
	<language>en-GB</language>
	<sy:updateperiod>
	hourly	</sy:updateperiod>
	<sy:updatefrequency>
	1	</sy:updatefrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>

<image>
	<url>https://anjanadata.com/wp-content/uploads/2020/03/cropped-favicon_anjanadata-32x32.png</url>
	<title>Data Governance - Anjana Data</title>
	<link>https://anjanadata.com/en</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>Govern your data in accordance with DCAT-AP... in record time</title>
		<link>https://anjanadata.com/en/health-dcat-ap-spaces-portals-data/</link>
					<comments>https://anjanadata.com/en/health-dcat-ap-spaces-portals-data/#respond</comments>
		
		<dc:creator><![CDATA[Lucía Engo]]></dc:creator>
		<pubdate>Fri, 11 Jul 2025 11:46:35 +0000</pubdate>
				<category><![CDATA[Casos de uso]]></category>
		<category><![CDATA[Gobierno del dato]]></category>
		<category><![CDATA[Sin categorizar]]></category>
		<category><![CDATA[Data Portals]]></category>
		<category><![CDATA[Data Spaces]]></category>
		<category><![CDATA[Espacios de datos]]></category>
		<category><![CDATA[gobierno del dato]]></category>
		<category><![CDATA[Health DCAT-AP]]></category>
		<category><![CDATA[Interoperabilidad]]></category>
		<category><![CDATA[Portales de datos]]></category>
		<guid ispermalink="false">https://anjanadata.com/?p=9087</guid>

					<description><![CDATA[Health DCAT-AP for the governance of data spaces and portals is now a reality thanks to the collaboration between Anjana Data and DQTeam. At a time when organisations need to turn their data strategies into concrete actions, this solution enables the governance of open catalogues and health data spaces in compliance with the latest standards.]]></description>
										<content:encoded><![CDATA[<p><strong>Health DCAT-AP for data space and portal governance</strong> is now a reality thanks to the collaboration between <a href="https://anjanadata.com/en/" target="_blank" rel="noreferrer noopener">Anjana Data</a> and <a href="https://dqteam.es/" target="_blank" rel="noreferrer noopener">DQTeam</a>. At a time when organisations need to convert their data strategies into concrete actions, this solution enables the governance of open catalogues and healthcare data spaces in accordance with the most demanding European standards.</p>



<p>We are working on <strong>out-of-the-box</strong> configurations for all extensions of the <strong>DCAT-AP</strong> standard, such as Health, Geospatial, Statistics or Energy, with the aim of enabling any organisation to activate a complete solution without the need for programming. Thanks to the collaboration with <strong><a href="https://dqteam.es/" target="_blank" rel="noreferrer noopener">DQTeam</a></strong>, the first of these configurations is now available: <strong><a href="https://healthdcat-ap.github.io/" target="_blank" rel="noreferrer noopener">Health DCAT-AP</a></strong>, designed to manage open data catalogues and health data spaces in a simple, interoperable manner that complies with the European profile.</p>



<figure class="wp-block-image size-large is-style-default"><img decoding="async" src="https://anjanadata.com/wp-content/uploads/2025/07/Canva-Blog-Health-DCAT-AP-1024x512.png" alt="Visual with logos of Anjana Data, DQTeam, DCAT-AP, EHDS, and the European Commission, highlighting the focus on DATA SPACES and DATA PORTALS in accordance with European standards." class="wp-image-9091"/></figure>



<h2 class="wp-block-heading"><strong>Strategic alliance with our partners. Objective: from standard to action.</strong></h2>



<p>Thanks to the joint work with <strong>DQTeam</strong>, we have defined and parameterised a solution in Anjana Data that allows our clients to:</p>



<ul class="wp-block-list"><li>Launch <strong>governance 100% aligned with the DCAT-AP standard</strong> and its healthcare extension <strong>Health DCAT-AP</strong><strong><br></strong></li><li>Configure the <strong>official roles defined by Health DCAT-AP</strong> (publisher, creator, rights holder, contact point, etc.)<br></li><li>Activate <strong>complete metadata templates</strong> for datasets, distributions, and catalogues<br></li></ul>



<p>All this under a technical and functional nomenclature that is fully aligned with the standard, ensuring <strong>full interoperability</strong> between European institutions, regions and platforms.</p>
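<p>To give a feel for what a dataset record carrying those roles can look like, here is a minimal sketch in Python that builds a JSON-LD document from the public DCAT, Dublin Core, FOAF and vCard vocabularies. It is an illustration under stated assumptions, not Anjana Data's template format: the identifiers, names and URLs are hypothetical, the list of required properties is chosen only for this example, and Health DCAT-AP defines further properties that are not shown here.</p>

```python
import json

# Namespace IRIs of the vocabularies that DCAT-AP reuses.
CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "vcard": "http://www.w3.org/2006/vcard/ns#",
}

# Illustrative assumption: the properties this example treats as required.
# The real obligations must be checked against the profile itself.
REQUIRED = ["dct:title", "dct:description", "dct:publisher", "dcat:distribution"]


def agent(name):
    """Describe an organisation or person playing one of the roles."""
    return {"@type": "foaf:Agent", "foaf:name": name}


def build_dataset(dataset_id, title, description, publisher, creator,
                  rights_holder, contact_email, access_url):
    """Return a JSON-LD dictionary describing one dataset and one distribution."""
    return {
        "@context": CONTEXT,
        "@id": dataset_id,
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dct:description": description,
        "dct:publisher": agent(publisher),
        "dct:creator": agent(creator),
        "dct:rightsHolder": agent(rights_holder),
        "dcat:contactPoint": {
            "@type": "vcard:Kind",
            "vcard:hasEmail": {"@id": f"mailto:{contact_email}"},
        },
        "dcat:distribution": [
            {"@type": "dcat:Distribution", "dcat:accessURL": {"@id": access_url}}
        ],
    }


def missing_properties(record, required):
    """List the required properties that are absent or empty in the record."""
    return [prop for prop in required if not record.get(prop)]


# Hypothetical example values.
RECORD = build_dataset(
    dataset_id="https://example.org/dataset/hospital-admissions",
    title="Hospital admissions",
    description="Monthly hospital admissions by region.",
    publisher="Example Health Agency",
    creator="Example Statistics Unit",
    rights_holder="Example Health Agency",
    contact_email="data@example.org",
    access_url="https://example.org/files/hospital-admissions.csv",
)

if __name__ == "__main__":
    print(json.dumps(RECORD, indent=2))
    print("Missing:", missing_properties(RECORD, REQUIRED))
```

<p>Keeping the property names identical to the vocabulary terms is what makes a record like this readable by any other catalogue that follows the same profile.</p>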



<h2 class="wp-block-heading"><strong>Ready to deploy: no code, no friction</strong></h2>



<p>The solution features an <strong>out-of-the-box configuration kit</strong> that allows:</p>



<ul class="wp-block-list"><li>Implementations in record time</li><li><strong>No programming required</strong></li><li>Customisable templates to adapt metadata to the specific needs of each organisation</li></ul>



<p>Anjana Data allows you to start with a basic configuration and evolve according to your data governance needs.</p>



<h2 class="wp-block-heading"><strong>True interoperability: open APIs + metadata federation</strong></h2>



<p>Anjana Data has been designed with a <strong>100% interoperable</strong> architecture, which makes it possible to:</p>



<ul class="wp-block-list"><li>Share and federate metadata between different instances or versions of <strong>Health DCAT-AP</strong><strong><br></strong></li><li>Place transformation logic where it belongs: in <strong>plugins and processes that consume APIs</strong><strong><br></strong></li><li>Seamlessly integrate with external portals, public catalogues, and sector-specific data spaces.<br></li></ul>



<p>All thanks to its <strong>REST API documented with Swagger</strong> and its <strong>plugin development kits</strong>, designed for heterogeneous environments.</p>



<h2 class="wp-block-heading"><strong>An interface to govern your open data portals</strong></h2>



<p>Beyond the technical back-end, Anjana Data offers a <strong>simple, intuitive interface geared towards business users</strong>, where it is possible to:</p>



<ul class="wp-block-list"><li>Inventory datasets, distributions, and catalogues<br></li><li>Manage open data portals, sectoral data spaces, or federated networks.<br></li><li>Locate and apply filters to find and access actual distributions</li></ul>



<h2 class="wp-block-heading"><strong>Would you like to see it in action?</strong></h2>



<p>Here is a <strong>practical demonstration</strong> of how the Health DCAT-AP-based solution is configured and used within Anjana Data, with a comprehensive experience of searching, filtering, exploring metadata, and downloading data from an interoperable dataset:</p>



<p>📺 <strong><a href="https://www.youtube.com/watch?v=9pDZKaHZ420" target="_blank" rel="noreferrer noopener">Watch video on YouTube – Health DCAT‑AP demo with DQTeam and Anjana Data</a></strong></p>



<h2 class="wp-block-heading"><strong>Ready to govern your data with European standards?</strong></h2>



<p>If your organisation wishes to <strong>align its data strategy with DCAT-AP, DAMA, UNE or the Data Act</strong>, you can now do so immediately and without development, with a modular, extensible and 100% interoperable approach.</p>



<p>👉 <strong>Get in touch with us</strong> to learn how to activate this configuration in your environment or to try a customised demo.</p>]]></content:encoded>
					
					<wfw:commentrss>https://anjanadata.com/en/health-dcat-ap-spaces-portals-data/feed/</wfw:commentrss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Mexican Federal Telecommunications Institute and Anjana Data: How to build a Knowledge Marketplace in just 10 weeks and start generating value from day one</title>
		<link>https://anjanadata.com/en/knowledge-marketplace-ift/</link>
					<comments>https://anjanadata.com/en/knowledge-marketplace-ift/#respond</comments>
		
		<dc:creator><![CDATA[Lucía Engo]]></dc:creator>
		<pubdate>Thu, 29 May 2025 11:19:44 +0000</pubdate>
				<category><![CDATA[Casos de uso]]></category>
		<category><![CDATA[Gobierno del dato]]></category>
		<category><![CDATA[Knowledge Marketplace]]></category>
		<category><![CDATA[Administración Pública]]></category>
		<category><![CDATA[Data Marketplace]]></category>
		<category><![CDATA[gobierno del dato]]></category>
		<guid ispermalink="false">https://anjanadata.com/?p=8933</guid>

					<description><![CDATA[The Knowledge Marketplace at the IFT is already a tangible and transformative reality. In a context where data is growing out of control and strategic decisions depend on reliable information, the Mexican Federal Telecommunications Institute has managed to structure its internal knowledge, making it accessible, governed and value-oriented. Thanks to Anjana Data, the Federal [...]]]></description>
										<content:encoded><![CDATA[<figure class="wp-block-image size-large"><img decoding="async" src="https://anjanadata.com/wp-content/uploads/2025/05/IFT-Knowledge-Marketplace-1024x512.png" alt="Discover how IFT implemented its Knowledge Marketplace with Anjana Data in just 10 weeks." class="wp-image-9041"/></figure>



<p>The <strong>Knowledge Marketplace at the IFT</strong> is now a tangible and transformative reality. In a context where data is growing uncontrollably and strategic decisions depend on reliable information, the <a href="https://www.ift.org.mx/" target="_blank" rel="noreferrer noopener">Federal Telecommunications Institute of Mexico</a> has managed to structure its internal knowledge, making it accessible, governed and value-oriented.</p>



<p>Thanks to <strong>Anjana Data</strong>, the IFT has gone from having no governed assets to creating, in just 10 weeks, a complete <strong>Knowledge Marketplace</strong>: an environment where data is converted into actionable knowledge, available to all profiles and aligned with the organisation's strategic vision.</p>



<p>This milestone not only marks the beginning of a new <em>data-driven</em> culture, but also demonstrates how a public institution can generate real and early impact through a modern Data Governance ecosystem focused on the <strong>democratisation of business knowledge</strong>.</p>



<h2 class="wp-block-heading">From zero to <em>Knowledge Marketplace</em>: IFT takes an exponential leap forward with Anjana Data</h2>



<p>The IFT did not start from a mature data governance base. It started from scratch: without a centralised catalogue, without defined management flows, without a semantic structure or standardised access mechanisms. The usual approach would be to start slowly, with a basic technical catalogue or an initial Data Marketplace. But the IFT opted to <strong>take an ambitious strategic leap</strong>: building a genuine <strong>Knowledge Marketplace</strong>.</p>



<p>Thanks to <strong>Anjana Data</strong>, that leap has been both possible and successful:</p>



<ul class="wp-block-list"><li>A <strong>smart and governed catalogue</strong> with over 130 data and information assets in just 10 weeks.</li><li>A <strong>living business glossary</strong>, which links organisational concepts with technical assets to facilitate cross-functional understanding.</li><li><strong>Governance workflows</strong> enabled, ensuring compliance, traceability, versioning, and auditing.</li><li>An <strong>eCommerce experience</strong>, where any user can find, understand, and request the knowledge they need.</li></ul>



<figure class="wp-block-image size-large"><img decoding="async" src="https://anjanadata.com/wp-content/uploads/2025/05/Lanbide-Fuentes-de-datos-1024x576.png" alt="" class="wp-image-8936"/></figure>



<p>The result: an environment where data becomes useful knowledge, with context, governance and purpose, ready to be <strong>driven by all areas of the organisation</strong>.</p>



<p>➡️ <a href="https://anjanadata.com/en/from-data-marketplace-to-knowledge-marketplace-the-natural-evolution-of-enterprise-knowledge-management/" target="_blank" rel="noreferrer noopener">Would you like to know more about the Knowledge Marketplace concept? Here we explain it in detail.</a></p>



<h2 class="wp-block-heading">Flawless execution, measurable results</h2>



<p>This ambitious project was rolled out quickly and efficiently thanks to three key factors:</p>



<ul class="wp-block-list"><li><strong>Flexible and scalable technology</strong> (Anjana Data) which was adapted to the organisational and technical requirements of the IFT.</li><li><strong>An expert implementation team</strong> (<a href="https://www.managementsolutions.com/es" target="_blank" rel="noreferrer noopener">Management Solutions</a>), with in-depth knowledge of the platform and best practices in data governance.</li><li><strong>Agile and empowered project management</strong> on behalf of the IFT, with a key Project Manager to remove obstacles and speed up decisions.</li></ul>






<p>The achievements speak for themselves:</p>



<ul class="wp-block-list"><li>+130 governed assets</li><li>Strategic-value use cases activated by the close of the project</li><li>Governance implemented at all levels: technical, semantic, operational, and organisational</li><li>Active participation from all key areas: security, networks, business, and IT</li></ul>



<h2 class="wp-block-heading">The impact: much more than well-managed data</h2>



<p>The IFT has not only structured its information assets: it has also implemented <strong>a new culture based on data and shared knowledge</strong>, which:</p>



<ul class="wp-block-list"><li>Promotes <strong>data sovereignty</strong> without restricting access.</li><li>Lowers the barrier to accessing information.</li><li>Democratises the use of knowledge throughout the organisation.</li><li>Ensures regulatory compliance and traceability.</li></ul>



<p>This project is real proof that <strong>data governance does not have to be slow or complex</strong>. With the right vision and the right tools, you can <strong>go from zero to strategic in a matter of weeks</strong>.</p>



<h2 class="wp-block-heading">Would you like to know all the details?</h2>



<p><strong><a href="https://anjanadata.com/en/resources-2/descargar-caso-exito-knowledge-marketplace-ift/" target="_blank" rel="noreferrer noopener">Download the full success story here</a></strong> and discover how IFT has managed to transform its data ecosystem into a Knowledge Marketplace with Anjana Data.</p>



<p>📩 Would you like to take this leap forward in your organisation?<br>Contact us at <a>info@anjanadata.com</a>. We will be delighted to help you design your path towards modern, agile data governance that generates value.</p>]]></content:encoded>
					
					<wfw:commentrss>https://anjanadata.com/en/knowledge-marketplace-ift/feed/</wfw:commentrss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>Anjana Data available on Azure Marketplace and Microsoft App Source</title>
		<link>https://anjanadata.com/en/anjana-data-available-in-azure-marketplace-and-microsoft-app-source-2/</link>
					<comments>https://anjanadata.com/en/anjana-data-available-in-azure-marketplace-and-microsoft-app-source-2/#respond</comments>
		
		<dc:creator><![CDATA[Angela Miñana Francés]]></dc:creator>
		<pubdate>Wed, 28 Apr 2021 14:10:16 +0000</pubdate>
				<category><![CDATA[Actualidad]]></category>
		<category><![CDATA[Gobierno del dato]]></category>
		<category><![CDATA[Sin categorizar]]></category>
		<guid ispermalink="false">https://anjanadata.com/?p=4529</guid>

					<description><![CDATA[From Anjana Data we are pleased to announce that we continue to grow and meet all the expectations we had planned. More than a year ago we signed a collaboration agreement with Microsoft, and after a lot of effort we are pleased to announce that our Data Governance solution, designed to help organisations [...]]]></description>
										<content:encoded><![CDATA[<p><img fetchpriority="high" decoding="async" class="alignnone wp-image-4533 size-full" src="https://anjanadata.com/wp-content/uploads/2021/04/Captura-de-pantalla-2021-04-28-a-las-15.47.57.png" alt="" width="2880" height="1132" srcset="https://anjanadata.com/wp-content/uploads/2021/04/Captura-de-pantalla-2021-04-28-a-las-15.47.57.png 2880w, https://anjanadata.com/wp-content/uploads/2021/04/Captura-de-pantalla-2021-04-28-a-las-15.47.57-300x118.png 300w, https://anjanadata.com/wp-content/uploads/2021/04/Captura-de-pantalla-2021-04-28-a-las-15.47.57-1024x402.png 1024w, https://anjanadata.com/wp-content/uploads/2021/04/Captura-de-pantalla-2021-04-28-a-las-15.47.57-768x302.png 768w, https://anjanadata.com/wp-content/uploads/2021/04/Captura-de-pantalla-2021-04-28-a-las-15.47.57-1536x604.png 1536w, https://anjanadata.com/wp-content/uploads/2021/04/Captura-de-pantalla-2021-04-28-a-las-15.47.57-2048x805.png 2048w" sizes="(max-width: 2880px) 100vw, 2880px" /></p>
<p><span style="font-weight: 400;">At <span style="color: #ff9900;"><a style="color: #ff9900;" href="https://anjanadata.com/en/">Anjana Data</a></span>, we are pleased to announce that we continue to grow and meet all of our expectations. Over a year ago, we signed a partnership agreement with Microsoft, and after much effort, we are pleased to announce that <strong>our Data Governance solution</strong></span><span style="font-weight: 400;">, designed to assist organisations in implementing their data strategy in the era of Big Data, Multi-Cloud and Data-Driven, </span><span style="font-weight: 400;"><strong>is now available</strong> <strong>on the</strong></span><strong><span style="color: #ff9900;"><a style="color: #ff9900;" href="https://azuremarketplace.microsoft.com/es-es/marketplace/apps/anjanadatasl1583402861145.anjanadata?tab=Overview"> Azure Marketplace </a></span>and on <span style="color: #ff9900;"><a style="color: #ff9900;" href="https://appsource.microsoft.com/en-us/product/web-apps/anjanadatasl1583402861145.anjanadata">Microsoft App Source</a></span></strong><span style="font-weight: 400;">.</span><span style="font-weight: 400;"> </span></p>
<p><span style="font-weight: 400;">But <strong>what are Microsoft's Azure Marketplace and App Source?</strong> Both are online stores containing thousands of IT software applications and services created by technology providers. In both the Marketplace and App Source, <strong>Azure users have the opportunity to </strong></span><span style="font-weight: 400;"><strong>find, test and purchase certified solutions tailored to meet the needs of their organisations</strong>. For Anjana Data, appearing in these ecosystems means being able to directly offer our distinctive, innovative, and disruptive Data Governance solution to any organisation in the world working on Azure.</span></p>
<p><span style="font-weight: 400;">From a technical standpoint, Anjana Data's added value for organisations using Azure and its native cloud technologies as a data platform consists of offering an </span><i><span style="font-weight: 400;">enterprise-ready</span></i><span style="font-weight: 400;"> <strong>Data Governance</strong> </span><span style="color: #ff9900;"><a style="color: #ff9900;" href="https://anjanadata.com/en/solucion/why-anjana/"><span style="font-weight: 400;">solution</span></a></span><span style="font-weight: 400;"> with a state-of-the-art, modular and robust microservices-oriented architecture, which enables high availability and load balancing in a native and transparent way. It is compatible with a multitude of deployment and automation types, most notably <strong>Azure Kubernetes Service</strong> and <strong>Azure Resource Manager</strong>, while also maintaining compatibility with deployments based on binary distribution.</span></p>
<p><span style="font-weight: 400;">And from a more functional point of view, in addition to all the benefits offered by Anjana Data, for Azure it incorporates an </span><span style="color: #ff9900;"><a style="color: #ff9900;" href="https://anjanadata.com/en/architecture-and-integrations/%29/"><span style="font-weight: 400;">extended native integration</span></a></span><span style="font-weight: 400;"> with Azure's main data storage, processing and exploitation technologies (Data Factory, Data Lake Storage, Blob Storage, Azure SQL, Databricks, PowerBI, etc.) as well as Azure AD. All of this enables the organisation to implement proactive and preventive data governance over its data ecosystem, thanks to which advanced data governance use cases can be built, such as the creation of a “Data Marketplace” or the achievement of “Governed Data Self-Service”.</span></p>
<p><span style="font-weight: 400;">Additionally, at Anjana Data, we have included in our roadmap a series of integrations with </span><span style="color: #ff9900;"><a style="color: #ff9900;" href="https://azuremarketplace.microsoft.com/es-es/marketplace/apps/Microsoft.AzurePurviewGalleryPackage?tab=Overview"><span style="font-weight: 400;">Azure Purview</span></a></span><span style="font-weight: 400;">. Our solution is therefore also the perfect complement for organisations that are considering incorporating Azure Purview as the Data Catalogue for their Azure platform or that work in a hybrid or multi-cloud environment and need a more holistic and cross-cutting view of their data ecosystem, along with a series of advanced functionalities and value-added features for implementing effective and efficient data governance throughout the organisation.</span></p>
<p><span style="font-weight: 400;">Finally, the incorporation of the Anjana Data solution into this online space represents a further step in the strategic collaboration with </span><span style="color: #ff9900;"><a style="color: #ff9900;" href="https://anjanadata.com/en/partner/microsoft/"><span style="font-weight: 400;">Microsoft</span></a></span><span style="font-weight: 400;">, which began with the company's inclusion in the exclusive global programme </span><strong><i>Microsoft for Startups </i></strong><span style="font-weight: 400;">more than a year ago and which has recently been renewed for another year. From the outset, <strong>the importance of our collaboration</strong> has rested on <strong>defining an Anjana Data integration roadmap</strong> with Azure's native cloud technologies, establishing a <strong>joint vision of proactive and preventive data governance</strong> for organisations that want to go one step further in their </span><i><span style="font-weight: 400;">data-driven</span></i><span style="font-weight: 400;"> strategy, and <strong>having access to a range of Azure services</strong>. Furthermore, as a Microsoft partner, we will also have the opportunity to market our solution jointly.</span></p>]]></content:encoded>
					
					<wfw:commentrss>https://anjanadata.com/en/anjana-data-available-in-azure-marketplace-and-microsoft-app-source-2/feed/</wfw:commentrss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>How to obtain a complete data lineage of Spark processes thanks to Anjana Data</title>
		<link>https://anjanadata.com/en/how-to-obtain-a-complete-data-lineage-of-spark-processes-thanks-to-anjana-data/</link>
					<comments>https://anjanadata.com/en/how-to-obtain-a-complete-data-lineage-of-spark-processes-thanks-to-anjana-data/#respond</comments>
		
		<dc:creator><![CDATA[Juan Sobrino]]></dc:creator>
		<pubdate>Fri, 04 Sep 2020 09:24:08 +0000</pubdate>
				<category><![CDATA[Gobierno del dato]]></category>
		<category><![CDATA[gobierno del dato]]></category>
		<category><![CDATA[linaje del dato]]></category>
		<guid ispermalink="false">https://anjanadata.com/?p=3484</guid>

					<description><![CDATA[Knowing the lineage of data is very important from a data governance point of view: Knowing how information flows throughout the organisation Understanding the data value chain and its processes Knowing the lifecycle of data assets and how [...]]]></description>
										<content:encoded><![CDATA[<h3><img decoding="async" class="aligncenter wp-image-3492 size-full" src="https://anjanadata.com/wp-content/uploads/2020/09/linaje-de-datos-1.png" alt="" width="1024" height="512" srcset="https://anjanadata.com/wp-content/uploads/2020/09/linaje-de-datos-1.png 1024w, https://anjanadata.com/wp-content/uploads/2020/09/linaje-de-datos-1-300x150.png 300w, https://anjanadata.com/wp-content/uploads/2020/09/linaje-de-datos-1-768x384.png 768w" sizes="(max-width: 1024px) 100vw, 1024px" /></h3>
<p>&nbsp;</p>
<p><span style="font-weight: 400;">Knowing the lineage of data is very important from a data governance perspective in order to:</span></p>
<ul>
<li style="font-weight: 400;"><span style="font-weight: 400;">Know how information flows throughout the organisation</span></li>
<li style="font-weight: 400;"><span style="font-weight: 400;">Understand the data value chain and its processes</span></li>
<li style="font-weight: 400;"><span style="font-weight: 400;">Understand the lifecycle of data assets and how they are being generated</span></li>
<li style="font-weight: 400;"><span style="font-weight: 400;">Visualise dependencies between data assets and processes to manage the potential impacts generated by changes and modifications.</span></li>
<li style="font-weight: 400;"><span style="font-weight: 400;">Facilitate the search for process errors, quality issues, service degradation, etc.</span></li>
</ul>
<p><span style="font-weight: 400;"><strong>Data lineage</strong> comes in many forms, and perhaps one of the best known is technical lineage: knowing, from a technical point of view, how data moves from one place to another through the processes that are executed on it, either automatically or manually.</span></p>
<p><span style="font-weight: 400;">However, it should be noted that, even though this is technical lineage, this information must be useful for data governance. Thus, for lineage information to be valuable, it must be interpretable, and in the vast majority of cases, a string of activity logs spewed out by a machine is of little use. That is why technical lineage usually has to be captured, interpreted, and translated in order to be of any use.</span></p>
<h3><b>Obtaining the lineage</b></h3>
<p><span style="font-weight: 400;">In this context, in order to obtain the technical lineage of the processes that move data from one site to another, we can rely on several techniques:</span></p>
<ul>
<li style="font-weight: 400;"><span style="font-weight: 400;">Extracting lineage by identifying or inferring relationships between objects that are declared in the form of metadata: This is typically the case with ETLs that operate with parameters and define all the processes to be executed through metadata.</span></li>
<li style="font-weight: 400;"><span style="font-weight: 400;">Extracting lineage through source code parsing: This is no easy task, and its complexity depends greatly on both the programming language (SQL is not the same as Java) and the programmer who wrote the code (depending on the functions, methods, or variables used). Given that we are entering murky waters here and could encounter anything, the only certainty is that the guarantee that can be offered in these situations is usually very low and, in the vast majority of cases, a significant gap must be assumed.</span></li>
<li style="font-weight: 400;"><span style="font-weight: 400;">Extracting lineage from the recovery and interpretation of audit logs: This is what we at Anjana Data call dynamic lineage, and in many cases it is the only way to obtain the most complete trace possible, even though you will only be able to capture what is executed. To implement this, you need to carry out native integration with data platforms and technologies to understand how logs work, know where and how to retrieve them, and finally be able to interpret the captured information, which usually has to be translated to be valuable.</span></li>
</ul>
<p><span style="font-weight: 400;">Furthermore, given the variability of languages, platforms, technologies, etc., the spectrum we encounter becomes too broad to cover completely. That is why, when we talk about lineage, we usually have to seek compromises and apply Pareto's Law, especially in certain specific scenarios.</span></p>
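<p><span style="font-weight: 400;">To make the first of these techniques concrete, the following minimal Python sketch derives lineage from declared metadata and then uses it for impact analysis. The job declarations and asset names are hypothetical, invented for this example, and the structure is only one possible way such metadata could be represented.</span></p>

```python
from collections import defaultdict, deque

# Hypothetical job declarations, in the shape a metadata-driven ETL tool
# might expose them: each job names the assets it reads and writes.
JOBS = [
    {"job": "load_customers", "inputs": ["crm.customers"], "outputs": ["staging.customers"]},
    {"job": "load_orders", "inputs": ["erp.orders"], "outputs": ["staging.orders"]},
    {"job": "build_sales", "inputs": ["staging.customers", "staging.orders"], "outputs": ["mart.sales"]},
    {"job": "build_report", "inputs": ["mart.sales"], "outputs": ["report.monthly_sales"]},
]


def build_edges(jobs):
    """Turn job declarations into lineage edges: source -> {(job, target)}."""
    edges = defaultdict(set)
    for job in jobs:
        for source in job["inputs"]:
            for target in job["outputs"]:
                edges[source].add((job["job"], target))
    return edges


def downstream(edges, asset):
    """Breadth-first walk returning every asset affected by a change to `asset`."""
    seen = set()
    queue = deque([asset])
    while queue:
        current = queue.popleft()
        for _job, target in edges.get(current, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return sorted(seen)


EDGES = build_edges(JOBS)
IMPACTED = downstream(EDGES, "erp.orders")

if __name__ == "__main__":
    print(IMPACTED)
```

<p><span style="font-weight: 400;">Here a change to <code>erp.orders</code> is reported as affecting <code>staging.orders</code>, <code>mart.sales</code> and <code>report.monthly_sales</code>. The same graph cannot be built this easily when the relationships are buried in code or logs, which is where the other two techniques come in.</span></p>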
<h3><b>Spark and the data lineage</b></h3>
<p><span style="font-weight: 400;">As we have already seen, not all data processing technologies make it easy for us to obtain the internal lineage of their processes, and among all of them, Spark is one of the most complex.</span></p>
<p><span style="font-weight: 400;">Spark is an open-source distributed processing technology that makes fuller use of the capabilities of distributed data clusters. As it is an open-source project, different distributions are available from various technology providers, such as <a href="https://es.cloudera.com/">Cloudera</a>, AWS EMR, GCP DataProc, or <a href="https://databricks.com/">Databricks</a>, the most famous of these, which is also offered natively by <a href="https://www.microsoft.com/es-es">Microsoft</a> in its Azure cloud.</span></p>
<p><span style="font-weight: 400;">One of Spark's most distinctive features is that it does not have its own storage, but rather uses the memory of the machines that make up the cluster where it runs and is capable of retrieving data from different types of storage, such as HDFS</span><span style="font-weight: 400;">, S3, Cloud Storage, Blob Storage, etc., or streaming systems such as Kafka.</span></p>
<p><span style="font-weight: 400;">In this regard, Spark distributes tasks across the different nodes of the cluster during execution, using the memory and processors of the different machines. That is why, by its very definition, it seems difficult to obtain a complete trace of the processes executed that move or transform data. Furthermore, this is not something that is completely and natively resolved in any of the current implementations of Spark, as they are all designed and optimised for data processing and not for data governance.</span></p>
<p><span style="font-weight: 400;">When we talk about data governance and Spark appears in the picture as the technology involved, the vast majority of professionals throw up their hands in horror or assume there is a significant gap. Certainly, activating low-level log capture can provide us with a lot of information about processes, but it also penalises performance, so a compromise solution must be found.</span></p>
<h3><b>Obtaining Spark lineage with Anjana Data</b></h3>
<p><span style="font-weight: 400;">As we have seen, obtaining the trace of Spark processes is not an easy task, mainly for the following reasons:</span></p>
<ul>
<li style="font-weight: 400;"><span style="font-weight: 400;">Since Spark does not have its own storage, there is no metadata to be obtained from Spark processes at rest. When they are not running, the most we can obtain is the metadata of the datasets that may participate in those processes as inputs or outputs, but we cannot know anything about what happens in between or how the output data is generated from the input data.</span></li>
<li style="font-weight: 400;"><span style="font-weight: 400;">The processing of datasets in Spark, whether on an RDD or a Spark Dataset, can be coded in several languages (Scala, Java, Python), and operations can be masked depending on how the code is written. Therefore, parsing source code to obtain traces is not a viable option when it comes to Spark.</span></li>
<li style="font-weight: 400;"><span style="font-weight: 400;">The information that can be obtained from the execution logs is never complete or interpretable and depends largely on both the Spark distribution and the audit configuration. In many cases, the most that can be obtained is the input-output relationship at a coarse level, or, on other occasions, the volume of logs to be interpreted will be unmanageable.</span></li>
</ul>
<p><span style="font-weight: 400;">However, thanks to Anjana Data's approach and implementation, it is possible to obtain a fairly reliable picture with a level of granularity (field level and applied functions) that no other solution on the market is capable of offering. How? We can't reveal everything, but we can give you a few hints 🙂</span></p>
<p><span style="font-weight: 400;">Essentially, what Anjana Data does is apply a combination of the three techniques mentioned above, intercepting each of the processes just before they are executed.</span></p>
<p><span style="font-weight: 400;">To execute processes, Spark creates an execution plan (DAG) before dividing the process into tasks that are launched in parallel. This is the only moment when all the information is available at a single point, just before it is distributed to all the elements responsible for its execution. By including a specific agent that is invoked by default in each and every execution, all this information can be captured and extracted with the required level of detail. Furthermore, all this can be done centrally and non-invasively, without the need for programmers to include anything in their processes.</span></p>
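<p><span style="font-weight: 400;">As a rough illustration of why the plan is such a convenient capture point, the sketch below walks a toy plan tree and derives, for each output field, the source fields it comes from and the functions applied to it. It is not Anjana Data's implementation and it does not use Spark's API: the <code>Scan</code> and <code>Project</code> classes, the <code>field_lineage</code> function and the sample <code>customers</code> dataset are all hypothetical names invented for this example.</span></p>

```python
from dataclasses import dataclass


@dataclass
class Scan:
    """Hypothetical leaf node: reads columns from a named dataset."""
    dataset: str
    columns: list


@dataclass
class Project:
    """Hypothetical node that derives output columns from its child.

    exprs maps each output column to (function name, [input columns]).
    """
    child: object
    exprs: dict


def field_lineage(node):
    """Return {column: (set of source fields, list of applied functions)}."""
    if isinstance(node, Scan):
        # A scanned column originates from itself, with no function applied.
        return {c: ({f"{node.dataset}.{c}"}, []) for c in node.columns}
    upstream = field_lineage(node.child)
    result = {}
    for out_col, (func, inputs) in node.exprs.items():
        sources, funcs = set(), []
        for col in inputs:
            src, fns = upstream[col]
            sources |= src
            # Keep the order of application, skipping repeated functions.
            funcs += [f for f in fns if f not in funcs]
        result[out_col] = (sources, funcs + [func])
    return result


# Toy plan: concatenate two name fields, then upper-case the result.
plan = Project(
    child=Project(
        child=Scan("customers", ["first_name", "last_name", "country"]),
        exprs={
            "full_name": ("concat", ["first_name", "last_name"]),
            "country": ("identity", ["country"]),
        },
    ),
    exprs={"full_name_upper": ("upper", ["full_name"])},
)

lineage = field_lineage(plan)
print(lineage)
```

<p><span style="font-weight: 400;">Because the whole tree is available in one place, a single traversal is enough to link <code>full_name_upper</code> back to <code>customers.first_name</code> and <code>customers.last_name</code> through <code>concat</code> and <code>upper</code>. Once the work has been split into distributed tasks, that overall view no longer exists in any single location.</span></p>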
<p><span style="font-weight: 400;">Finally, all this information is processed and interpreted by one of the components of Anjana Data's architecture and then served to the solution's CORE, where it is cross-referenced with governed information to generate a valuable data lineage that is made available to the end user.</span></p>
<p><span style="font-weight: 400;">Anjana Data can do this and much more... Want to find out more? </span></p>
<p><strong>Request a <a href="https://anjanadata.com/en/request-a-demo/">demonstration</a> and we'll tell you all about it!</strong></p>]]></content:encoded>
					
					<wfw:commentrss>https://anjanadata.com/en/how-to-obtain-a-complete-data-lineage-of-spark-processes-thanks-to-anjana-data/feed/</wfw:commentrss>
			<slash:comments>0</slash:comments>
		
		
			</item>
		<item>
		<title>What value-added features should I look for in the technological solutions that support my Data Governance?</title>
		<link>https://anjanadata.com/en/value-added-features-for-technological-data-governance-solutions/</link>
					<comments>https://anjanadata.com/en/value-added-features-for-technological-data-governance-solutions/#respond</comments>
		
		<dc:creator><![CDATA[Mario De Francisco]]></dc:creator>
		<pubdate>Mon, 09 Mar 2020 15:32:34 +0000</pubdate>
				<category><![CDATA[Gobierno del dato]]></category>
		<category><![CDATA[artículo]]></category>
		<category><![CDATA[gobierno del dato]]></category>
		<guid ispermalink="false">https://anjanadata.com/?p=2284</guid>

					<description><![CDATA[Although Data Governance is fundamentally about cultural and organisational aspects and cannot be solved only through technology, technological solutions play a fundamental role and are more than necessary to achieve the implementation of an effective and efficient governance model. That is why in [...]]]></description>
										<content:encoded><![CDATA[<p><img decoding="async" class="aligncenter wp-image-2288 size-full" src="https://anjanadata.com/wp-content/uploads/2020/03/valor-anadido-blog-1.png" alt="value-added-blog" width="1200" height="650" srcset="https://anjanadata.com/wp-content/uploads/2020/03/valor-anadido-blog-1.png 1200w, https://anjanadata.com/wp-content/uploads/2020/03/valor-anadido-blog-1-300x163.png 300w, https://anjanadata.com/wp-content/uploads/2020/03/valor-anadido-blog-1-1024x555.png 1024w, https://anjanadata.com/wp-content/uploads/2020/03/valor-anadido-blog-1-768x416.png 768w" sizes="(max-width: 1200px) 100vw, 1200px" /></p>
<p>&nbsp;</p>
<p>Although <strong>Data Governance</strong> mainly deals with cultural and organisational aspects and cannot be resolved solely through technology, technological solutions play a fundamental role and are more than necessary to implement an effective and efficient governance model. That is why there are many solutions on the market that serve as accelerators for achieving good data governance and help organisations to build and maintain the data culture needed at all levels to become <em>data-driven</em>.</p>
<p>From <a href="https://dama.org/content/dama-dmbok-2" target="_blank" rel="nofollow noopener noreferrer"><strong>version 2 of the <em>DAMA-DMBOK</em></strong></a> we can draw some very interesting conclusions:</p>
<ul>
<li>Organisations that establish a formal Data Governance programme are much better equipped to increase the value they derive from their data assets.</li>
<li>The Data Governance function guides all other Data Management functions.</li>
<li>The purpose of Data Governance is to ensure that data is managed appropriately, in accordance with a series of policies and best practices.</li>
<li>Data governance focuses on how decisions are made about data and how processes and people are expected to behave in relation to data.</li>
<li>Data governance is not an end in itself; it needs to be directly aligned with the organisation's strategy.</li>
<li>Data governance is not a one-off exercise; it requires an ongoing programme focused on ensuring that the organisation derives value from its data and reduces data-related risks.</li>
<li>Data governance is different from IT governance.</li>
<li>The objective of Data Governance is to enable the organisation to manage data as an asset.</li>
<li>A Data Governance programme must be sustainable, embedded and measurable.</li>
<li>Data governance cannot be implemented overnight and requires planning.</li>
</ul>
<p>&nbsp;</p>
<p>In short, the fact that <strong>Data Governance</strong> is so closely linked to these cultural and organisational aspects makes evaluating the technological solutions that can support us in this area difficult and complex. This forces us to look beyond an evaluation based on the coverage of available functionalities or modules and also to consider a series of value-added features that we must incorporate into the assessment.</p>
<p>&nbsp;</p>
<h2>Features and modules</h2>
<p>On the one hand, if we consider the functionalities and modules “specific” to Data Governance, we can mention:</p>
<ul>
<li>Glossary of business terms</li>
<li>Metadata management with Dictionary and Catalogue</li>
<li>Data traceability and lineage</li>
<li>Architecture, design, and data modelling</li>
<li>Workflow and business process management</li>
<li>Master and reference data management</li>
<li>Data quality</li>
<li>Data incident management</li>
<li>Data security (policies, access and use, user roles and profiling, data obfuscation)</li>
<li>Dashboard</li>
<li>DataLabs and Sandboxes Management</li>
<li>Content Management and Publishing Portal</li>
<li>Data services management</li>
<li>Audit support</li>
</ul>
<p>&nbsp;</p>
<p>However, evaluating a solution solely on the basis of the completeness of these functionalities will mean that we only see part of the picture and may make a decision that we regret later on, especially when it is utopian to think that a single technological solution can accommodate all these functionalities in a self-contained manner. That is why, to ensure that this does not happen, we must weigh up the analysis of feature coverage alongside another type of analysis based on a series of value-added characteristics.</p>
<p>&nbsp;</p>
<h2>Value-added features</h2>
<p>These features will enable the Data Governance function to grow in a timely manner, in line with the specific needs of the organisation:</p>
<ul>
<li><strong>Automation</strong>: processes should be as automated as possible to free users from the burden of using tools.</li>
<li><strong>UX &amp; UI</strong>: The user interface, as well as its navigation and usability, must be as intuitive and user-friendly as possible for all types of audiences, so that any user feels comfortable using it.</li>
<li><strong>Interoperability</strong>: it must be able to share and exchange data with other systems; it must not be a “black box” or a closed component, allowing interconnection with different types of systems through connectors and enabling the use of standards.</li>
<li><strong>Customisation</strong>: as configurable as possible in order to support the strategy and governance model defined by the organisation.</li>
<li><strong>Modularisation</strong>: the different functionalities should be understood as independent parts, so that the use of one does not limit the use of others, allowing the necessary modules to be used without compromising the overall experience.</li>
<li><strong>Multi-environment</strong>: the ability to centrally manage multiple platforms supported by different technologies from a single instance.</li>
<li><strong>Scalability</strong>: adaptable as data volume and processing and response requirements increase, maintaining stable performance over time.</li>
<li><strong>Adaptability</strong>: it must be able to adapt to the needs and realities of the organisation over time.</li>
</ul>
<p>&nbsp;</p>
<p>Additionally, from a longer-term perspective, the characteristics that require special attention and care are:</p>
<ul>
<li><strong><em>Vendor lock-in</em></strong>: as far as possible, the solutions selected should not “tie” the organisation to a single <em>vendor</em> by creating heavy dependencies, so that migrating from one solution to another does not have major consequences.</li>
<li><strong>Learning curve</strong>: however powerful the solution, the learning curve should not be a problem for users, who should not have to invest a large number of hours in learning how to use it or require very specific and costly training and certification.</li>
<li><strong>User limit</strong>: if we want to extend data governance to the entire organisation, we must consider solutions that do not license by user, as per-user licensing can limit adoption: costs skyrocket with the number of users rather than with actual use.</li>
<li><strong>Licence cost</strong>: the cost must be flexible and scalable, tending towards pay-per-use, allowing total control over ROI without requiring a high initial investment, in order to shorten the <em>time to market</em> and the <em>time-to-value</em>.</li>
</ul>
<p>&nbsp;</p>
<h2>What can we find on the market?</h2>
<p>Looking at the market, given that we are talking about technology, the manufacturers of data storage and processing solutions themselves often offer modules geared towards data governance within their own platforms. These generally come with a biased vision and poor interoperability, which creates an integration problem between technologies and a new challenge of application and technology governance.</p>
<p>On the other hand, given the existing market need, in recent years new providers have emerged that specialise in developing specific, independent solutions with an agnostic approach to data storage and processing technologies, providing this practice with a new set of tools to facilitate its implementation. This group includes, for example, <a href="https://anjanadata.com/en/" target="_blank" rel="nofollow noopener noreferrer"><strong>Anjana Data</strong></a>.</p>
<p>Despite this, due to the complexity and breadth of the practice, solutions tend to focus on offering a series of specific functionalities and capabilities, and it seems very difficult, if not impossible, to find a single solution that covers everything. Therefore, it is advisable to look for the different pieces that help us build the puzzle of solutions that support Data Governance based on the needs of the organisation, starting with the most critical aspects.</p>
<p>In addition, the market for specific solutions for <strong>“<a href="https://anjanadata.com/en/resources-3/data-governance-metadata-centric-collaborative-approach/" target="_blank" rel="noopener noreferrer">Data Governance</a>”</strong> has not been around for very long and is not widely established, except in the US, where it does represent a high volume of business. In fact, neither Gartner nor Forrester has yet created a quadrant or curve for this area, with solutions falling under “Metadata Management”, “Master Data Management” and “Data Quality”.</p>
<p>Finally, within the spectrum of Data Governance technology solution providers, we can group vendors into different categories... but that is a topic for another article entirely.</p>]]></content:encoded>
					
					<wfw:commentrss>https://anjanadata.com/en/value-added-features-for-technological-data-governance-solutions/feed/</wfw:commentrss>
			<slash:comments>0</slash:comments>
		
		
			</item>
	</channel>
</rss>