Data Governance - Anjana Data

Govern your data in accordance with DCAT-AP... in record time

Lucía Engo — Fri, 11 Jul 2025 11:46:35 +0000

Health DCAT-AP for data space and portal governance is now a reality thanks to the collaboration between Anjana Data and DQTeam. At a time when organisations need to convert their data strategies into concrete actions, this solution enables the governance of open catalogues and healthcare data spaces in accordance with the most demanding European standards.

We are working on out-of-the-box configurations for all extensions of the DCAT-AP standard (such as Health, Geospatial, Statistics, or Energy), with the aim of enabling any organisation to activate a complete solution without the need for programming. Thanks to the collaboration with DQTeam, the first of these configurations is now available: Health DCAT-AP, designed to manage open data catalogues and health data spaces in a simple, interoperable manner that complies with the European profile.

Strategic alliance with our partners. Objective: from standard to action.

Thanks to the joint work with DQTeam, we have defined and parameterised a solution in Anjana Data that allows our clients to:

Launch governance that is 100% aligned with the DCAT-AP standard and its healthcare extension, Health DCAT-AP
Configure the official roles defined by Health DCAT-AP (publisher, creator, rights holder, contact point, etc.)
Activate complete metadata templates for datasets, distributions, and catalogues

All this under a technical and functional nomenclature that is fully aligned with the standard, ensuring full interoperability between European institutions, regions and platforms.
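
As a rough illustration of what a metadata template aligned with the standard can look like, the sketch below builds a JSON-LD style dataset record using property names from the W3C DCAT and Dublin Core vocabularies. It is not Anjana Data's configuration format, which the article does not describe: the helper names `build_dataset` and `missing_roles`, the example.org URLs and the sample values are all assumptions made for illustration, and no Health DCAT-AP specific properties are shown.

```python
import json

# Namespaces of the vocabularies that DCAT-AP builds on.
CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "vcard": "http://www.w3.org/2006/vcard/ns#",
}

# Roles named in the text -> (property, node type, name property).
ROLE_PROPERTIES = {
    "publisher": ("dct:publisher", "foaf:Agent", "foaf:name"),
    "creator": ("dct:creator", "foaf:Agent", "foaf:name"),
    "rights holder": ("dct:rightsHolder", "foaf:Agent", "foaf:name"),
    "contact point": ("dcat:contactPoint", "vcard:Kind", "vcard:fn"),
}


def build_dataset(dataset_id, title, roles, distributions):
    """Build a simplified dcat:Dataset record (hypothetical helper)."""
    record = {
        "@context": CONTEXT,
        "@id": dataset_id,
        "@type": "dcat:Dataset",
        "dct:title": title,
    }
    for role, name in roles.items():
        prop, node_type, name_prop = ROLE_PROPERTIES[role]
        record[prop] = {"@type": node_type, name_prop: name}
    record["dcat:distribution"] = [
        {
            "@type": "dcat:Distribution",
            "dcat:accessURL": {"@id": dist["url"]},
            # Simplification: DCAT-AP expects a controlled-vocabulary URI here.
            "dct:format": dist["format"],
        }
        for dist in distributions
    ]
    return record


def missing_roles(record):
    """List the roles from ROLE_PROPERTIES that the record does not fill in."""
    return [role for role, (prop, _, _) in ROLE_PROPERTIES.items() if prop not in record]


dataset = build_dataset(
    "https://example.org/dataset/hospital-admissions",  # illustrative identifier
    "Hospital admissions (illustrative)",
    {"publisher": "Example Health Agency", "contact point": "Open Data Office"},
    [{"url": "https://example.org/files/admissions.csv", "format": "CSV"}],
)
print(json.dumps(dataset, indent=2))
print("Roles still to assign:", missing_roles(dataset))
```

A completeness check like `missing_roles` is the kind of rule a template can enforce before a dataset is published to a catalogue.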

Ready to deploy: no code, no friction

The solution features an out-of-the-box configuration kit that allows:

Implementations in record time
No programming required
Customisable templates to adapt metadata to the specific needs of each organisation

Anjana Data allows you to start with a basic configuration and evolve according to your data governance needs.

True interoperability: open APIs + metadata federation

Anjana Data has been designed with a 100% interoperable architecture, which allows you to:

Share and federate metadata between different instances or versions of Health DCAT-AP
Bring the transformation logic where it needs to be: plugins and processes that consume APIs
Seamlessly integrate with external portals, public catalogues, and sector-specific data spaces

All thanks to its REST API, documented with Swagger, and its plugin development kits, designed for heterogeneous environments.

An interface to govern your open data portals

Beyond the technical back-end, Anjana Data offers a simple, intuitive interface geared towards business users, where you can:

Inventory datasets, distributions, and catalogues
Manage open data portals, sectoral data spaces, or federated networks
Search and apply filters to find and access the actual distributions

Would you like to see it in action?

Here is a practical demonstration of how the Health DCAT-AP-based solution is configured and used within Anjana Data, with a complete walkthrough of searching, filtering, exploring metadata, and downloading data from an interoperable dataset:

📺 Watch video on YouTube – Health DCAT‑AP demo with DQTeam and Anjana Data

Ready to govern your data with European standards?

If your organisation wishes to align its data strategy with DCAT-AP, DAMA, UNE or the Data Act, you can now do so immediately and without development, with a modular, extensible and 100% interoperable approach.

👉 Get in touch with us to learn how to activate this configuration in your environment or to try a customised demo.

Mexican Federal Telecommunications Institute and Anjana Data: How to build a Knowledge Marketplace in just 10 weeks and start generating value from day one

Lucía Engo — Thu, 29 May 2025 11:19:44 +0000

The Knowledge Marketplace at the IFT is now a tangible and transformative reality. In a context where data is growing uncontrollably and strategic decisions depend on reliable information, the Federal Telecommunications Institute of Mexico has managed to structure its internal knowledge, making it accessible, governed and value-oriented.

Thanks to Anjana Data, the IFT has gone from having no governed assets to creating, in just 10 weeks, a complete Knowledge Marketplace: an environment where data is converted into actionable knowledge, available to all profiles and aligned with the organisation's strategic vision.

This milestone not only marks the beginning of a new data-driven culture but also demonstrates how a public institution can generate real and early impact through a modern Data Governance ecosystem focused on the democratisation of business knowledge.

From zero to Knowledge Marketplace: IFT takes an exponential leap forward with Anjana Data

The IFT did not start from a mature data governance base. It started from scratch: without a centralised catalogue, without defined management flows, without a semantic structure or standardised access mechanisms. The usual approach would be to start slowly, with a basic technical catalogue or an initial Data Marketplace. But the IFT opted to take an ambitious strategic leap: to build a genuine Knowledge Marketplace.

Thanks to Anjana Data, that leap has been possible and successful:

A smart and governed catalogue with over 130 data and information assets in just 10 weeks.
A living business glossary, which links organisational concepts with technical assets to facilitate cross-functional understanding.
Governance workflows that ensure compliance, traceability, versioning, and auditing.
An eCommerce experience, where any user can find, understand, and request the knowledge they need.

The result: an environment where data becomes useful knowledge, with context, governance and purpose, ready to be put to use by all areas of the organisation.

➡️ Would you like to know more about the Knowledge Marketplace concept? Here we explain it in detail.

Flawless execution, measurable results

This ambitious project was rolled out quickly and efficiently thanks to three key factors:

Flexible and scalable technology (Anjana Data) that was adapted to the organisational and technical requirements of the IFT.
An expert implementation team (Management Solutions), with in-depth knowledge of the platform and best practices in data governance.
Agile and empowered project management on the IFT's side, with a key Project Manager to remove obstacles and speed up decisions.

The achievements speak for themselves:

+130 governed assets
Strategic-value use cases activated by the close of the project
Governance implemented at all levels: technical, semantic, operational, and organisational
Active participation from all key areas: security, networks, business, and IT

The impact: much more than well-managed data

The IFT has not only structured its information assets: it has also implemented a new culture based on data and shared knowledge that:

Promotes data sovereignty without restricting access.
Reduces the barrier to entry to information.
Democratises the use of knowledge throughout the organisation.
Ensures regulatory compliance and traceability.

This project is real proof that data governance does not have to be slow or complex. With the right vision and the right tools, you can go from zero to strategic in a matter of weeks.

Would you like to know all the details?

Download the full success story here and discover how IFT has managed to transform its data ecosystem into a Knowledge Marketplace with Anjana Data.

📩 Would you like to take this leap forward in your organisation?
Contact us at info@anjanadata.com. We will be delighted to help you design your path towards modern, agile data governance that generates value.

Anjana Data available on Azure Marketplace and Microsoft AppSource

Angela Miñana Francés — Wed, 28 Apr 2021 14:10:16 +0000

At Anjana Data, we are pleased to announce that we continue to grow and meet all of our expectations. Over a year ago, we signed a partnership agreement with Microsoft, and after much effort, our Data Governance solution, designed to assist organisations in implementing their data strategy in the era of Big Data, Multi-Cloud and Data-Driven, is now available in the Azure Marketplace and in Microsoft AppSource.

But what are Microsoft's Azure Marketplace and AppSource? Both are online stores containing thousands of software applications and IT services created by technology providers. In both the Marketplace and AppSource, Azure users can find, test and purchase certified solutions tailored to the needs of their organisations. For Anjana Data, appearing in these ecosystems means being able to directly offer our distinctive, innovative, and disruptive Data Governance solution to any organisation in the world working on Azure.

From a technical standpoint, Anjana Data's added value for organisations using Azure and its native cloud technologies as a data platform lies in offering an enterprise-ready Data Governance solution with a state-of-the-art, microservices-oriented, modular and robust architecture. This architecture enables high availability and load balancing natively and transparently, and it is compatible with many types of deployment and automation, notably Azure Kubernetes Service and Azure Resource Manager, while also remaining compatible with deployments based on binary distribution.

From a more functional point of view, in addition to all the benefits Anjana Data already offers, on Azure it incorporates extended native integration with the main Azure technologies for data storage, processing and exploitation (Data Factory, Data Lake Storage, Blob Storage, Azure SQL, Databricks, Power BI, etc.), as well as with Azure AD. All of this enables the organisation to implement proactive and preventive data governance over its data ecosystem, on which advanced data governance use cases can be built, such as the creation of a “Data Marketplace” or the achievement of “Governed Data Self-Service”.

Additionally, at Anjana Data we have included a series of integrations with Azure Purview in our roadmap. Our solution is therefore also the perfect complement for organisations that are considering incorporating Azure Purview as the Data Catalogue for their Azure platform, or that work in a hybrid or multi-cloud environment and need a more holistic and cross-cutting view of their data ecosystem, along with a series of advanced functionalities and value-added features for implementing effective and efficient data governance throughout the organisation.

Finally, the incorporation of the Anjana Data solution into this online space represents a further step in the strategic collaboration with Microsoft, which began with the company's inclusion in the exclusive global programme Microsoft for Startups more than a year ago and which has recently been renewed for another year. From the outset, our collaboration has centred on defining an Anjana Data integration roadmap with Azure's native cloud technologies, establishing a joint vision of proactive and preventive data governance for organisations that want to go one step further in their data-driven strategy, and having access to a range of Azure services. Furthermore, as a Microsoft partner, we will also have the opportunity to market our solution jointly.

How to obtain a complete data lineage of Spark processes thanks to Anjana Data

Juan Sobrino — Fri, 04 Sep 2020 09:24:08 +0000

Knowing the lineage of data is very important from a data governance perspective in order to:

Know how information flows throughout the organisation
Understand the data value chain and its processes
Understand the lifecycle of data assets and how they are being generated
Visualise dependencies between data assets and processes to manage the potential impacts generated by changes and modifications
Facilitate the search for process errors, quality issues, service degradation, etc.
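
The impact-management point can be sketched with a small graph traversal: if lineage is stored as directed edges between assets and processes, everything reachable downstream of a changed asset is potentially affected. This is only an illustrative sketch, not how Anjana Data stores or queries lineage; the function name `downstream_impact` and the asset names are invented for the example.

```python
from collections import deque


def downstream_impact(edges, changed):
    """Return every node reachable from `changed`, in breadth-first order."""
    graph = {}
    for source, target in edges:
        graph.setdefault(source, []).append(target)

    seen = {changed}
    queue = deque([changed])
    impacted = []
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                impacted.append(nxt)
                queue.append(nxt)
    return impacted


# Hypothetical lineage: datasets and the jobs that read and write them.
edges = [
    ("crm.customers", "job.clean_customers"),
    ("job.clean_customers", "dw.dim_customer"),
    ("dw.dim_customer", "report.churn"),
    ("erp.orders", "job.load_orders"),
    ("job.load_orders", "dw.fact_orders"),
    ("dw.fact_orders", "report.churn"),
]

print(downstream_impact(edges, "crm.customers"))
```

Reversing the direction of the edges gives the opposite question, which supports the error-search point: which upstream assets and processes could have caused a problem seen in a report.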

Data lineage comes in many forms, and perhaps one of the best known is technical lineage. That is, knowing, from a technical point of view, how data moves from one place to another through the processes that are executed on it, either automatically or manually.

However, it should be noted that, even though this is technical lineage, this information must be useful for data governance. Thus, for lineage information to be valuable, it must be interpretable, and in the vast majority of cases, a string of activity logs spewed out by a machine is of little use. That is why it is most common for technical lineage to have to be captured, interpreted, and translated in order to be of any use.

Obtaining the lineage

In this context, in order to obtain the technical lineage of the processes that move data from one site to another, we can rely on several techniques:

Extracting lineage through the identification or inference of relationships between objects that are declared in the form of metadata: This is typically the case with ETLs that operate with parameters and define all the processes to be executed through metadata.
Extracting lineage through source code parsing: This is no easy task, and its complexity depends greatly on both the programming language (SQL is not the same as Java) and the programmer who wrote the code (depending on the functions, methods, or variables used). Given that we are entering murky waters here and could encounter anything, the level of guarantee that can be offered in these situations is usually very low and, in the vast majority of cases, a significant gap must be assumed.
Extracting lineage by retrieving and interpreting audit logs: This is what we at Anjana Data call dynamic lineage, and in many cases it is the only way to obtain the most complete trace possible, even though you will only be able to capture what is actually executed. To implement this, you need native integration with data platforms and technologies to understand how their logs work, know where and how to retrieve them, and finally be able to interpret the captured information, which usually has to be translated to be valuable.
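
To make the third technique more concrete, here is a minimal sketch of interpreting audit events as lineage. The log format (one JSON object per line with `job`, `action` and `dataset` keys) is an assumption made for the example; real platforms each have their own formats, which is exactly why native integration is needed. The function name `lineage_from_audit_log` is also hypothetical.

```python
import json
from collections import defaultdict


def lineage_from_audit_log(lines):
    """Turn read/write audit events into (source, job, target) lineage edges."""
    reads = defaultdict(set)
    writes = defaultdict(set)
    for line in lines:
        event = json.loads(line)
        if event["action"] == "read":
            reads[event["job"]].add(event["dataset"])
        elif event["action"] == "write":
            writes[event["job"]].add(event["dataset"])
        # Any other action (login, delete, ...) is ignored in this sketch.

    edges = set()
    for job, targets in writes.items():
        for target in targets:
            for source in reads.get(job, ()):
                edges.add((source, job, target))
    return sorted(edges)


# Hypothetical audit log: only what was actually executed appears here.
audit_log = [
    '{"job": "daily_sales", "action": "read", "dataset": "raw.orders"}',
    '{"job": "daily_sales", "action": "read", "dataset": "raw.customers"}',
    '{"job": "daily_sales", "action": "write", "dataset": "mart.sales"}',
    '{"job": "cleanup", "action": "login", "dataset": ""}',
]

for edge in lineage_from_audit_log(audit_log):
    print(edge)
```

Note the limitation the text points out: a job that never ran leaves no events, so it cannot appear in lineage derived this way.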

Furthermore, given the variability of languages, platforms, technologies, etc., the spectrum we encounter becomes too broad to cover completely. That is why, when we talk about lineage, we usually have to seek compromises and apply Pareto's Law, especially in certain specific scenarios.

Spark and the data lineage

As we have already seen, not all data processing technologies make it easy for us to obtain the internal lineage of their processes, and among all of them, Spark is one of the most complex.

Spark is an open-source distributed processing technology that makes fuller use of the possibilities of distributed data clusters. As it is an open-source project, different distributions are available from various technology providers, such as Cloudera, AWS EMR, GCP Dataproc, or Databricks, the most famous of these, which is also offered natively by Microsoft in its Azure cloud.

One of Spark's most distinctive features is that it does not have its own storage, but rather uses the memory of the machines that make up the cluster where it runs and is capable of retrieving data from different types of storage, such as HDFS, S3, Cloud Storage, Blob Storage, etc., or streaming systems such as Kafka.

In this regard, Spark distributes tasks across the different nodes of the cluster during execution, using the memory and processors of the different machines. That is why, by its very definition, it seems difficult to obtain a complete trace of the processes executed that move or transform data. Furthermore, this is not something that is completely and natively resolved in any of the current implementations of Spark, as they are all designed and optimised for data processing and not for data governance.

When we talk about data governance and Spark appears in the picture as the technology involved, the vast majority of professionals throw up their hands in horror or assume there is a significant gap. Certainly, activating low-level log capture can provide us with a lot of information about processes, but it also penalises performance, so a compromise solution must be found.

Obtaining Spark lineage with Anjana Data

As we have seen, obtaining the trace of Spark processes is not an easy task, mainly for the following reasons:

Since it does not have its own storage, no metadata can be obtained from Spark processes at rest. When not running, the most we can obtain is the metadata of the datasets that may participate in those processes as inputs or outputs, but we cannot know anything about what happens in between or how the output data is generated from the input data.
Spark processing logic, whether it works on an RDD or a Spark Dataset, can be written in several languages (Scala, Java, Python), and operations can be obscured depending on how the code is written. Therefore, parsing source code to obtain traces is not a viable option when it comes to Spark.
The information that can be obtained from the execution logs is never complete or interpretable and depends largely on both the Spark distribution and the audit configuration. In many cases, the most that can be obtained is the input-output relationship at a rough level, or, on other occasions, the amount of logs to be interpreted will be unmanageable.

However, thanks to Anjana Data's approach and implementation, it is possible to obtain a fairly reliable picture with a level of granularity (field level and applied functions) that no other solution on the market is capable of offering. How? We can't reveal everything, but we can give you a few hints 🙂

Essentially, what Anjana Data does is apply a combination of the three techniques mentioned above, intercepting each of the processes just before they are executed.

To execute processes, Spark creates an execution plan (DAG) before dividing the process into tasks that are launched in parallel. This is the only moment when all the information is available at a single point, just before it is distributed to all the elements responsible for its execution. By including a specific agent that is invoked by default in each and every execution, all this information can be captured and extracted with the required level of detail. Furthermore, all this can be done centrally and non-invasively, without the need for programmers to include anything in their processes.
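
The article does not disclose how the agent works, so the following is only a toy sketch of the general idea of deriving field-level lineage, including applied functions, from a plan tree. The `Scan` and `Project` classes are invented stand-ins, not Spark's own plan classes, and `field_lineage` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass
class Scan:
    """Leaf of the toy plan: reads columns from a stored table."""
    table: str
    columns: list


@dataclass
class Project:
    """Toy plan node: derives output columns from the child's columns."""
    child: object
    exprs: dict  # output column -> (function name, [input columns])


def field_lineage(plan):
    """Map each output column to (source fields, functions applied in order)."""
    if isinstance(plan, Scan):
        return {col: ({f"{plan.table}.{col}"}, []) for col in plan.columns}

    child_lineage = field_lineage(plan.child)
    result = {}
    for out_col, (func, inputs) in plan.exprs.items():
        sources, funcs = set(), []
        for col in inputs:
            col_sources, col_funcs = child_lineage[col]
            sources |= col_sources
            funcs += col_funcs
        result[out_col] = (sources, funcs + [func])
    return result


# Hypothetical job: total = amount + tax, then total_eur = round(total).
plan = Project(
    child=Project(
        child=Scan("sales", ["amount", "tax", "region"]),
        exprs={"total": ("add", ["amount", "tax"])},
    ),
    exprs={"total_eur": ("round", ["total"])},
)

sources, functions = field_lineage(plan)["total_eur"]
print(sorted(sources), functions)
```

The point of the sketch is that a plan, unlike source code or scattered task logs, holds the whole transformation in one structure, so walking it once yields both the source fields and the functions applied to them.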

Finally, all this information is processed and interpreted by one of the components of Anjana Data's architecture and then served to the solution's CORE, where it is cross-referenced with governed information to generate valuable data lineage that is made available to the end user.

Anjana Data can do this and much more... Want to find out more?

Request a demonstration and we'll tell you all about it!

What value-added features should I look for in the technological solutions that support my Data Governance?

Mario De Francisco — Mon, 09 Mar 2020 15:32:34 +0000

Data Governance mainly deals with cultural and organisational aspects and cannot be resolved solely through technology. Even so, technological solutions play a fundamental role and are more than necessary to implement an effective and efficient governance model. That is why there are many solutions on the market that serve as accelerators for achieving good data governance, as well as helping organisations to build and maintain the data culture necessary at all levels to achieve the goal of becoming data-driven.

From version 2 of the DAMA-DMBOK, we can draw some very interesting conclusions:

Organisations that establish a formal Data Governance programme are much better equipped to increase the value they derive from their data assets.
The Data Governance function guides all other Data Management functions.
The purpose of Data Governance is to ensure that data is managed appropriately, in accordance with a series of policies and best practices.
Data governance focuses on how decisions are made about data and how processes and people are expected to behave in relation to data.
Data governance is not an end in itself; it needs to be directly aligned with the organisation's strategy.
Data governance is not a one-off exercise; it requires an ongoing programme focused on ensuring that the organisation derives value from its data and reduces data-related risks.
Data governance is different from IT governance.
The objective of Data Governance is to enable the organisation to manage data as an asset.
A Data Governance programme must be sustainable, embedded and measurable.
Data governance cannot be implemented overnight and requires planning.

In short, the fact that Data Governance is so closely linked to these cultural and organisational aspects makes evaluating the technological solutions that can support us in this area difficult and complex. This forces us to broaden our horizons beyond an evaluation based on the coverage of available functionalities or modules and also consider a series of value-added features that we must incorporate into the assessment.

Features and modules

On the one hand, if we consider the functionalities and modules “specific” to Data Governance, we can mention:

Glossary of business terms
Metadata management with Dictionary and Catalogue
Data traceability and lineage
Architecture, design, and data modelling
Workflow and business process management
Master and reference data management
Data quality
Data incident management
Data security (policies, access and use, user roles and profiling, data obfuscation)
Dashboard
DataLabs and Sandboxes Management
Content Management and Publishing Portal
Data services management
Audit support

However, evaluating a solution solely on the basis of the completeness of these functionalities will mean that we only see part of the picture and may make a decision that we regret later on, especially when it is utopian to think that a single technological solution can accommodate all these functionalities in a self-contained manner. That is why, to ensure that this does not happen, we must weigh up the analysis of feature coverage alongside another type of analysis based on a series of value-added characteristics.

Value-added features

These features will enable us to develop the Data Governance function in a timely manner according to the specific needs of the organisation:

Automation: processes should be as automated as possible to free users from the burden of using tools.
UX & UI: The user interface, as well as its navigation and usability, must be as intuitive and user-friendly as possible for all types of audiences, so that any user feels comfortable using it.
Interoperability: it must be able to share and exchange data with other systems; it must not be a “black box” or a closed component, allowing interconnection with different types of systems through connectors and enabling the use of standards.
Customisation: as configurable as possible in order to support the strategy and governance model defined by the organisation.
Modularisation: the different functionalities should be understood as independent parts, so that the use of one does not limit the use of others, allowing the necessary modules to be used without compromising the overall experience.
Multi-environment: the ability to centrally manage multiple platforms supported by different technologies from a single instance.
Scalability: adaptable as data volume and processing and response requirements increase, maintaining stable performance over time.
Adaptability: it must be able to adapt to the needs and realities of the organisation over time.

Additionally, from a longer-term perspective, the characteristics that require special attention and care are:

Vendor lock-in: as far as possible, the solutions selected should not “tie” the organisation to a single vendor through heavy dependencies, so that migrating from one solution to another does not have major consequences.
Learning curve: however powerful the solution, the learning curve should not be a problem for users, who should not have to invest a large number of hours in learning how to use it or require very specific and costly training and certification.
User limit: If we want to extend data governance to the entire organisation, we must consider solutions that do not license by user, as per-user licensing can limit use of the solution because costs skyrocket with the number of users rather than with actual use.
Licence cost: the cost must be flexible and scalable, tending towards pay-per-use, allowing total control over ROI without requiring a high initial investment, in order to shorten time to market and time to value.
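
One simple way to weigh feature coverage against value-added characteristics is a weighted score. The sketch below is purely illustrative: the function name `score_solution`, the 50/50 weighting, the two fictional solutions and every rating are assumptions, not figures from this article or from any analyst.

```python
def score_solution(feature_coverage, value_added, feature_weight=0.5):
    """Blend average feature coverage with average value-added rating (0 to 1)."""
    coverage = sum(feature_coverage.values()) / len(feature_coverage)
    value = sum(value_added.values()) / len(value_added)
    return feature_weight * coverage + (1 - feature_weight) * value


# Fictional solution A: covers every module but is closed and hard to adapt.
solution_a = score_solution(
    feature_coverage={"glossary": 1, "lineage": 1, "data quality": 1, "master data": 1},
    value_added={"interoperability": 0.2, "automation": 0.4, "no vendor lock-in": 0.0},
)

# Fictional solution B: covers fewer modules but is open and automated.
solution_b = score_solution(
    feature_coverage={"glossary": 1, "lineage": 1, "data quality": 0, "master data": 0},
    value_added={"interoperability": 0.9, "automation": 0.8, "no vendor lock-in": 1.0},
)

print(f"A: {solution_a:.2f}  B: {solution_b:.2f}")
```

With these made-up ratings, the solution with lower feature coverage ends up ahead, which is the kind of outcome a coverage-only comparison would hide. The weights should reflect each organisation's own priorities.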

What can we find on the market?

Looking at the market, given that we are talking about technology, the manufacturers of data storage and processing solutions themselves often offer modules geared towards data governance within their own platforms, but generally with a biased and poorly interoperable vision, representing an integration problem between technologies and resulting in a new challenge of application and technology governance.

On the other hand, given the existing market need, in recent years new providers have emerged that specialise in developing specific, independent solutions with an agnostic approach to data storage and processing technologies, providing this practice with a new set of tools to facilitate its implementation. This group includes, for example, Anjana Data.

Despite this, due to the complexity and breadth of the practice, solutions tend to focus on offering a series of specific functionalities and capabilities, and it seems very difficult, if not impossible, to find a single solution that covers everything. Therefore, it is advisable to look for the different pieces that help us build the puzzle of solutions that support Data Governance based on the needs of the organisation, starting with the most critical aspects.

In addition, the market for specific “Data Governance” solutions has not been around for very long and is not yet well established, except in the US, where it does represent a high volume of business. In fact, neither Gartner nor Forrester has yet created a quadrant or curve for this area, with solutions falling under “Metadata Management”, “Master Data Management” and “Data Quality”.

Finally, within the spectrum of Data Governance technology solution providers, we can group vendors into different categories... but that is a topic for another article entirely.