How can you avoid data conflicts between engineers and architects?
Learn from the community’s knowledge. Experts are adding insights into this AI-powered collaborative article, and you could too.
This is a new type of article that we started with the help of AI, and experts are taking it forward by sharing their thoughts directly into each section.
— The LinkedIn Team
Data conflicts are a common source of frustration and inefficiency in data engineering projects. They occur when engineers and architects have different expectations, assumptions, or definitions of the data they work with. Data conflicts can lead to errors, delays, rework, or even failures in the data pipeline. How can you avoid them or resolve them quickly and effectively? Here are some tips to help you.
Before you start building or modifying the data pipeline, make sure you understand the data requirements of the project. What are the sources, formats, quality, and volume of the data? What are the business goals, use cases, and performance criteria of the data? How will the data be accessed, processed, and delivered? Aligning on these questions with the architects and other stakeholders will help you avoid data conflicts due to mismatched expectations or assumptions.
-
Ali Shamaei
Head of Data Engineering & Analytics | Data Architecture & Governance | BI & Data Science | Data Cloud Modernization
Before diving into the project, it's essential for both Architects and Engineers to step back and see things from the business stakeholders' perspective. By getting on the same page with the value the project brings to the customers, they can make sure their work hits the mark and satisfies everyone involved, resulting in less conflict among technology and business team members.
-
Hemanth Kumar
Software Engineer at Ford || GCP Certified Cloud Engineer || Azure Certified Data Engineer || AWS || AI/ML || ML-Ops || MuleSoft || Salesforce || KAFKA
Some of the best ways to avoid data conflicts between engineers and architects are: 1. Establish a clear data governance policy that outlines the roles and responsibilities of both engineers and architects in data engineering. 2. Encourage regular communication and collaboration between engineers and architects to ensure everyone is on the same page regarding data standards, quality, and security. 3. Develop clear data management processes and protocols so that everyone is aligned on data integration and validation. 4. Use data visualisation tools to help both engineers and architects understand and interpret data more effectively. 5. Provide regular training on best practices.
-
Sonali Majumdar
Senior Data Engineer at Digit Insurance
1. The value an architect adds is a design that is robust, cost-efficient, and performant, while giving equal importance to the business requirements. 2. It all depends on the goals of the business: collaboration between the client/end user, business analysts, data architects, and data engineers is of utmost importance, because knowing where the business is heading helps the architects design, and improve the performance of, the existing pipelines so that they scale along with the business. 3. Such business-critical collaboration calls for efficient and effective communication, task management, agility, and documentation.
As you work on the data pipeline, you may need to make changes to the data schema, structure, or content. For example, you may add, remove, or rename columns, tables, or files. You may also transform, filter, or aggregate the data to meet the requirements. Whenever you make such changes, document them clearly and communicate them to the architects and other engineers. This will help you avoid data conflicts due to outdated or inconsistent information.
-
Ganesh DG
Top Data Engineering Voice | VP - Sr Technical Manager with Expertise in Financial and Manufacturing Domain | Mentor | Views expressed are my own
This can be achieved through two means (though both are part of regular project management practice): 1. Workshops: many organizations focus on individual roles and responsibilities but overlook the importance of collaborative workshops or meetings between data engineers and data architects. These sessions can be used to share knowledge and brainstorm ideas. 2. Documenting decisions: data engineering and data architecture involve many decisions about data storage, processing, and infrastructure. Often, these decisions are not adequately documented, leading to confusion or disputes down the line. Documenting decisions and their rationale provides a reference point for both parties and helps prevent conflicts.
-
Justin DoBosh
Experienced Engineering Leader
In my experience, maintaining these diagrams and keeping the necessary individuals informed about changes after the initial project kickoff is the harder problem. I encourage my teams to create these diagrams as code (using either PlantUML or MermaidJS) and commit them to Git alongside their code. Keeping the diagrams and code together:
- increases discoverability
- makes them easier to maintain
- makes it easier to solicit feedback and inform others (via pull requests)
If you need to share these diagrams with stakeholders outside of engineering, you can use Graphviz to convert PlantUML diagrams to PNGs and other document formats. Adopting the diagrams-as-code process has really helped us.
-
Sujeeth Shetty
Sr. Data Engineer @ SmartAsset | Building next-gen Data & AI/ML products on AWS/Azure | 3x AWS Certified
Establish clear data standards and documentation procedures. Consistency in data formats, naming conventions, and documentation practices can significantly reduce conflicts. By creating and adhering to a shared data framework, everyone can work more efficiently and minimize data-related issues.
One way to reduce data conflicts is to follow data standards and conventions that are agreed upon by the engineers and architects. These may include naming conventions, data types, formats, encoding, validation rules, metadata, and documentation. Using data standards and conventions will help you ensure consistency, quality, and readability of the data. It will also make it easier to troubleshoot and resolve any data conflicts that may arise.
-
Ganesh DG
Top Data Engineering Voice | VP - Sr Technical Manager with Expertise in Financial and Manufacturing Domain | Mentor | Views expressed are my own
They also need to delve into and finalize the following:
- ETL vs. ELT processes to be followed
- Versioning methods
- Data governance policies, and the impact of each function on governance
-
Henny Speelman
Visual Strategist | Data Storyteller | Graph Literacy | Data Communication Enabler | Public Speaker
The easiest win is to have clear naming conventions. In the projects I have worked on, naming conventions were key to success: everybody speaks the same data language and the data is easy to find. For example, we named our Power BI views vw_pbi_xxx, so everybody knew these belonged to Power BI. The same goes for columns, where you need to be consistent (cd versus code, or nm versus name). It also makes things easier for the business at the end of the day.
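A prefix convention like the one above can also be enforced mechanically rather than by review alone. Here is a minimal Python sketch of such a check; the patterns, view names, and column names are purely illustrative assumptions, not the article's actual conventions:

```python
import re

# Hypothetical conventions: Power BI views are prefixed "vw_pbi_",
# and column names use lowercase snake_case.
VIEW_PATTERN = re.compile(r"^vw_pbi_[a-z0-9_]+$")
COLUMN_PATTERN = re.compile(r"^[a-z][a-z0-9_]*$")

def check_names(views, columns):
    """Return every name that violates the agreed conventions."""
    bad_views = [v for v in views if not VIEW_PATTERN.match(v)]
    bad_columns = [c for c in columns if not COLUMN_PATTERN.match(c)]
    return bad_views + bad_columns

violations = check_names(
    views=["vw_pbi_sales", "SalesView"],
    columns=["customer_cd", "CustomerName"],
)
print(violations)  # ['SalesView', 'CustomerName']
```

A check like this could run in CI so that naming drift is caught at review time instead of surfacing later as a conflict between teams.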
-
Thiago de Faria
Kindness | Engineering Management & Tech Leadership Consultant| Ex-Amazon (AWS)
Using data standards and conventions is like having a rulebook for a board game. Did you ever play Monopoly and fight with your cousins and siblings? Now imagine a worse scenario where everyone has their own rules for buying property. Pandemonium, right? Similarly, in data projects, if engineers and architects don't play by the same rules – naming conventions, data types, and formats – you're setting yourself up for a data disaster. And nothing makes or breaks a project like inconsistent naming conventions. It can be like finding a needle in a haystack, except the needle keeps changing names. Stick to the standards, and you'll save yourself hours describing tables or parsing DDLs in Git.
Another way to prevent data conflicts is to implement data testing and validation throughout the data pipeline. Data testing and validation are processes that check the data for accuracy, completeness, and conformity to the requirements. They can help you identify and fix any data issues or anomalies before they cause problems downstream. You can use tools and frameworks such as PyTest, Great Expectations, or dbt to automate and simplify data testing and validation.
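As a minimal illustration of the idea (plain Python rather than any specific framework's API, so the field names and rules here are illustrative assumptions), a validation step might separate conforming rows from rejects before they flow downstream:

```python
# Sketch of a validation gate: rows that fail completeness checks are
# quarantined with a reason instead of propagating downstream.

def validate_rows(rows, required_fields):
    """Split rows into (valid, rejected) based on required fields."""
    valid, rejected = [], []
    for row in rows:
        missing = [f for f in required_fields if row.get(f) in (None, "")]
        if missing:
            rejected.append({"row": row, "missing": missing})
        else:
            valid.append(row)
    return valid, rejected

rows = [
    {"order_id": 1, "amount": 9.99},
    {"order_id": 2, "amount": None},  # fails the completeness check
]
valid, rejected = validate_rows(rows, required_fields=["order_id", "amount"])
print(len(valid), len(rejected))  # 1 1
```

In practice a framework such as Great Expectations or dbt would express the same expectations declaratively, but the principle is identical: make the rules explicit and executable, so engineers and architects are testing against the same definition of "valid".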
-
Justin DoBosh
Experienced Engineering Leader
I agree with the importance of adding data testing and validation tools to identify data issues as early as possible. Data source schemas change or get disconnected more often than you expect, and without an automated process in place to alert you, you will be constantly firefighting data quality issues. I'm a fan of dbt, but I think it's less about the tool and more about adopting the mindset that automated tests and validation checks are necessary to build a reliable data pipeline! It doesn't have to be anything crazy; adding some data freshness, unique, and not-null checks goes a long way! 🙂
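The three cheap checks mentioned above (freshness, unique, not-null) can each be sketched in a few lines of plain Python. The threshold and the sample values are illustrative assumptions; in a real pipeline these would typically live as dbt tests or similar rather than ad-hoc functions:

```python
from datetime import datetime, timedelta, timezone

def check_freshness(latest_loaded_at, max_age_hours=24):
    """Data is fresh if the newest record is recent enough."""
    age = datetime.now(timezone.utc) - latest_loaded_at
    return age <= timedelta(hours=max_age_hours)

def check_unique(values):
    """A key column should contain no duplicates."""
    return len(values) == len(set(values))

def check_not_null(values):
    """A required column should contain no missing values."""
    return all(v is not None for v in values)

order_ids = [101, 102, 103]
print(check_unique(order_ids), check_not_null(order_ids))  # True True
```

Even checks this simple catch the two most common silent failures: an upstream feed that stopped arriving, and a join key that started duplicating rows.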
-
Sukesh Kumar
Power BI Developer at Deloitte
Here's how you can incorporate data testing and validation into your data pipeline:
1. Define data quality metrics: start by defining data quality metrics and criteria specific to your project, including expectations for data accuracy, completeness, consistency, and other relevant factors.
2. Choose testing frameworks: select testing frameworks or tools that align with your project's needs.
3. Automate testing: create automated data validation tests that check the incoming data at various points in the pipeline.
4. Custom validation rules: implement custom validation rules tailored to your specific data requirements.
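The custom validation rules mentioned above could be organized as a small rule registry, so that new rules are added in one place and applied uniformly. This is only a sketch; the rule names, fields, and thresholds are illustrative assumptions:

```python
# Hypothetical rule registry: each rule takes a record and returns
# True if the record passes. Rules register themselves by name.

RULES = {}

def rule(name):
    """Decorator that registers a validation rule under a name."""
    def register(fn):
        RULES[name] = fn
        return fn
    return register

@rule("positive_amount")
def positive_amount(record):
    return record.get("amount", 0) > 0

@rule("iso_country")
def iso_country(record):
    return len(record.get("country", "")) == 2

def failed_rules(record):
    """Names of all registered rules the record violates."""
    return [name for name, fn in RULES.items() if not fn(record)]

print(failed_rules({"amount": -5, "country": "US"}))  # ['positive_amount']
```

Keeping rules named and centralized also gives engineers and architects a shared vocabulary when a record is rejected: the failure report cites the agreed rule, not one person's interpretation.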
-
Melbin E.
Cloud-Centric Big Data Engineer | SPARK | HIVE | Azure-Databricks | HBase | Sqoop | Python | SQL | ETL Pipelines
Using data testing and validation in data engineering offers significant advantages, including data quality assurance, error detection, compliance, data integrity, consistency, and enhanced decision-making. It also improves efficiency, data security, data lineage, and data documentation. In complex data ecosystems, data testing is crucial for maintaining reliable and trustworthy data, supporting data-driven decision-making, and meeting regulatory requirements.
Finally, one of the best ways to avoid data conflicts is to collaborate and review your data work with the architects and other engineers. Collaboration and review can help you get feedback, insights, and suggestions on how to improve your data work. They can also help you catch and resolve any data conflicts early and efficiently. You can use tools and platforms such as GitHub, Databricks, or Airflow to facilitate collaboration and review of your data work.
-
Kagiso Sebego
Through my professional experience, I've observed that the following practices serve as effective resolutions:
- Implementation of standardized data formats
- Regular coordination meetings
- Collaborative platforms: utilizing shared platforms or tools to facilitate joint work, ensuring visibility into and accessibility of each team's efforts. This enables real-time updates, reducing discrepancies due to outdated information.
- Emphasis on documentation and version control
- Establishment of conflict resolution protocols: creating predefined processes for addressing differences, ensuring quick and efficient resolution without escalating conflicts.
- Integration of feedback loops
-
Hakim HASSANI
Data Science & Machine Learning Specialist at eGreen | Leveraging Data for Intelligent Solutions
Also encouraging cross-training sessions or knowledge-sharing activities between engineers and architects helps to ensure that everyone has a comprehensive understanding of the project's requirements, architecture, and data processes.
-
Melbin E.
Cloud-Centric Big Data Engineer | SPARK | HIVE | Azure-Databricks | HBase | Sqoop | Python | SQL | ETL Pipelines
Collaborating and reviewing data work in data engineering are vital for quality assurance, knowledge sharing, error detection, process optimization, data governance, cross-team understanding, scalability, enhanced documentation, data security, and informed decision-making. These practices ensure reliable and compliant data, supporting the success of data projects and data-driven initiatives.
-
Xhorxhina Taraj
Cloud Advisor at Accenture Microsoft Business Group | UNESCO and Woman@Dior Mentee at Christian Dior Couture
I think fostering clear communication and collaboration is crucial. Establishing well-defined protocols for data sharing and documentation can help ensure consistency across teams. Implementing a centralized data management system, where both engineers and architects can access and update information in real-time, promotes transparency and reduces the likelihood of conflicting data. Regular cross-functional meetings and project reviews provide opportunities for teams to align on objectives and address any emerging discrepancies promptly. Additionally, investing in training programs that enhance data literacy among team members can contribute to a more harmonized understanding and usage of project data.
-
Ganesh DG
Top Data Engineering Voice | VP - Sr Technical Manager with Expertise in Financial and Manufacturing Domain | Mentor | Views expressed are my own
The following aspects also need to be decided:
- Agile and iterative approach: embracing an agile and iterative approach to data engineering and architecture can help identify and resolve conflicts early in the process. Regular feedback and adjustments based on real-world implementation experience can mitigate potential issues.
- Performance metrics: establishing performance metrics and key performance indicators (KPIs) for data solutions can help align the goals of data engineers and data architects. This ensures they work towards common objectives and that their efforts are evaluated objectively.
-
Gerhard Van Deventer
Looking for data engineers with >= 4 years of experience
A shared sense of ownership and a blameless culture always helps. Instead of "us" vs "them", it's more helpful to have a mentality of "we, together" from the get-go.