What is the best format to create a data dictionary for the company to align everyone on terms so we speak the same language?
I'd consider a format that is easily accessible and easy to keep up to date. As such, I'd suggest something like a company wiki and/or a direct link from your BI tool or whichever platform teams across your company use to access data and analytics. When creating the data dictionary, I'd spend time upfront getting agreement from all the key stakeholders who touch data that this will be the single source of truth, and I'd align on the process, timing and owners for updates so the content stays relevant.
This answer depends on the tools you have available. I think a Google Doc with key terms and definitions is always the quickest solution. Wherever you can incorporate these terms, doing so will help reinforce alignment across the company. Beyond sharing a Google Doc, take those terms and make sure they show up in all your presentations, your help text in SFDC, and the training and enablement of your field team.
In most documentation efforts, internal and cross-functional alike, the RevOps objective is typically to achieve widespread adoption, relevance and freshness at the same time.
For a data dictionary, various options are possible: from a static PDF, to a live collaborative piece (Word, Wiki, GDoc, spreadsheet - enabling comments, revisions, use cases/queries/code logging), all the way to a data governance platform, which is typical for large orgs with complex schemas. There are many choices here: long-form vs. spreadsheet, collaborative vs. static, systematic vs. 3P app-supported. To settle on an answer, consider your tech stack and org complexity and balance those with end users' experience and desired outcomes.
For widespread adoption, ensure buy-in from users, and give them sufficient airtime and opportunity to provide input. Success here means all teams in the revenue engine have this tool in their bookmarks (and ideally you track their usage). For relevance, make sure it covers the key dimensions of the go-to-market model (at a high, top-down level) while also describing the most used tables and fields/variables in clear language (for consultation). For freshness, assign ownership and drive accountability: either you or an appointee must regularly review, gather feedback, update, and re-publish.
Finally, make a call fast; you can always adjust later. The alignment benefits are company-wide and often outweigh rebuild costs by a wide margin.
The best data dictionaries live in a system of record connected to the data source itself. Tools like Unity Catalog on Databricks or Alation surface the dictionary and governance on the same platform as the end query. This makes it more efficient for users and makes it more transparent how the dictionary is kept up to date.
That being said, I have found it useful to start with a simple, collaborative tool like Google Sheets to get things started. Use this as your structure for a few months until the necessary information is curated and you align on the data elements that really matter. You can do your research on a more robust system at the same time as you generate version 1.
This is as much about company culture as anything. At Intercom, we would have a Coda page because that is our main tool for documenting and sharing work like a data dictionary. There's nothing wrong with a Word doc or Excel file if that is something people will consistently rely on for this. Your job is to figure out what the medium is that the company will respond to.
Whatever format you pick, spending time to socialize your ideas and get buy-in from the other key stakeholders will matter a lot too. Where I have seen efforts like this fail is when one group in isolation decides to make a data dictionary or define metrics and impose that output on the rest of the company. They probably don't understand the full business or the nuances and needs of different teams. Only if you get the right parties together to agree will it be a successful effort.
The best way to create a data dictionary is to store it in a shared, accessible document or in a database. Every item in the dictionary should include its name, definition, source and how it is used. Google Sheets, Smartsheet, Confluence or a dedicated metadata management platform are tools that come in handy here. Make certain it is easy for users to understand, and categorize entries by function or department for quick searching. It is crucial to keep this document updated, with clear ownership maintained. That keeps it relevant and ensures all stakeholders remain on the same page when it comes to data terms and their meaning.
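As a rough illustration of the entry structure described above (name, definition, source, usage, plus a department for searching and an owner for upkeep), here is a minimal Python sketch. The class name, field names and sample entries are all hypothetical and only meant to show the shape of a record, not a recommended schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DictionaryEntry:
    """One row of the data dictionary (field names are illustrative)."""
    name: str
    definition: str
    source: str
    usage: str
    department: str  # used to categorize entries for quick searching
    owner: str       # who is accountable for keeping the entry current


def group_by_department(entries):
    """Return {department: sorted list of entry names}."""
    grouped = defaultdict(list)
    for entry in entries:
        grouped[entry.department].append(entry.name)
    return {dept: sorted(names) for dept, names in grouped.items()}


# Hypothetical sample entries, invented for this example.
ENTRIES = [
    DictionaryEntry("Win rate", "Closed-won deals divided by all closed deals.",
                    "CRM opportunity records", "Sales performance reviews",
                    "Sales", "Sales Ops"),
    DictionaryEntry("ARR", "Annualized value of active recurring contracts.",
                    "Billing system", "Board reporting and forecasting",
                    "Finance", "RevOps"),
    DictionaryEntry("Pipeline coverage", "Open pipeline divided by quota.",
                    "CRM opportunity records", "Forecast calls",
                    "Sales", "Sales Ops"),
]

if __name__ == "__main__":
    for department, names in group_by_department(ENTRIES).items():
        print(department, names)
```

The same columns map directly onto a spreadsheet header row, so starting in a sheet and moving to a database later does not require rethinking the structure.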
I tend to use a Google Sheet to build this because it gives you a table-based structure. Capture not only the attribute name and definition, but also the source of the data and the systems that store, edit and create it.
In my experience this is the easiest format to manage, and separate tabs let you cover the attributes of different tables (customers, product, activity, etc.).
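To make the tab-per-table layout concrete, the sketch below renders each tab as CSV text that could be pasted into or imported by a spreadsheet. The column names, tab names and sample rows are assumptions made up for the example; only the Python standard library is used.

```python
import csv
import io

# Illustrative column set: attribute, definition, source, and the systems
# that store and edit the value.
COLUMNS = ["attribute", "definition", "source", "stored_in", "edited_by"]

# One key per tab; rows are hypothetical examples.
TABS = {
    "customers": [
        {"attribute": "account_id", "definition": "Unique customer identifier",
         "source": "CRM", "stored_in": "CRM; warehouse", "edited_by": "Sales Ops"},
        {"attribute": "segment", "definition": "Size band assigned at signup",
         "source": "CRM", "stored_in": "CRM", "edited_by": "RevOps"},
    ],
    "product": [
        {"attribute": "plan_name", "definition": "Commercial name of the plan",
         "source": "Billing system", "stored_in": "Billing system; warehouse",
         "edited_by": "Finance"},
    ],
}


def render_tab(rows):
    """Return one tab as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    for tab_name, rows in TABS.items():
        print(f"--- {tab_name} ---")
        print(render_tab(rows))
```

Keeping the same columns on every tab makes it easier to search across tables and to migrate the content into a catalog tool later.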
A data dictionary holds remarkable importance, often overlooked because it's perceived as low-value documentation. However, I've found that neglecting it can lead to critical issues. Definitions, KPIs, and leading indicators may become consolidated in a few minds within the organization, posing significant risk. Alternatively, as organizations scale, maintaining consistency becomes challenging, hindering alignment across functions.
There's no one-size-fits-all format for creating a data dictionary; what matters is starting the process.
The best format is one that's accessible, easily digestible by various audiences, and easily updatable.
A data dictionary isn't static; it must evolve alongside changing definitions. New metrics are introduced and old ones are deprecated, which makes it a living document that is essential for meeting the evolving reporting needs across the organization.
The most effective data dictionaries I've encountered were built using simple tools like Google Docs or Sheets. By logging different terminologies, their definitions, calculation methods, usage contexts, and examples, you create a foundational resource. Incorporating version control and regular updates, along with stakeholder collaboration, ensures alignment and consistency in language across the organization.
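One way to picture the combination of definitions, calculation methods, usage contexts, examples and version control is a term record that keeps its own change history. This is only a sketch under assumed names (the `Term` class, its fields and the sample metric are hypothetical); in a Google Doc or Sheet the built-in revision history plays the same role.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Term:
    """A dictionary term with a simple change log (illustrative only)."""
    name: str
    definition: str
    calculation: str
    usage_context: str
    example: str
    status: str = "active"
    # Each history item: (date of change, who changed it, previous definition)
    history: list = field(default_factory=list)

    @property
    def version(self):
        """Version 1 is the original; each logged change adds one."""
        return len(self.history) + 1

    def revise(self, new_definition, changed_on, changed_by):
        """Record the old definition, then replace it."""
        self.history.append((changed_on, changed_by, self.definition))
        self.definition = new_definition

    def deprecate(self, changed_on, changed_by):
        """Mark the term as deprecated while keeping it for reference."""
        self.history.append((changed_on, changed_by, self.definition))
        self.status = "deprecated"


if __name__ == "__main__":
    term = Term(
        name="Qualified lead",
        definition="Lead that meets the scoring threshold.",
        calculation="Count of leads with score at or above the threshold",
        usage_context="Weekly funnel review",
        example="A lead that requested a demo and matches the target profile",
    )
    term.revise("Lead accepted by sales after scoring.", date(2024, 1, 15), "RevOps")
    print(term.name, "is at version", term.version)
```

Deprecated terms are kept rather than deleted in this sketch so that older reports which reference them can still be interpreted.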