FAQ
Vision and Purpose
How is this different from the MiDataHub?
- Local Control vs. Broader Access: The MiDataHub is locally controlled and managed, while the MiGreatDataLake is expected to have governance allowing for data access across districts when permitted.
- Data Management: The MiGreatDataLake supports the management of more data, in both structured and unstructured forms, and will facilitate faster, seamless access with vendor, local, and state systems using APIs.
- Transactional vs. Analytical: The MiDataHub is more transactional in nature and good for sharing statistics. In contrast, the MiGreatDataLake is more analytical, designed to help teachers and educators make real-time decisions to support learners.
- Data Scope: The MiDataHub handles a select amount of structured data. The MiGreatDataLake, however, creates more data sources, taking in both structured and unstructured data (e.g., student artwork) to provide broader support to teachers and students.
- Purpose and Functionality: MiDataHub moves data safely between systems and translates non-aligned data into a standards-based format. MiGreatDataLake is specifically built to handle the analytic layer with all the data, making that data usable for decision-making, innovation, and AI-powered learning.
Why don't we just purchase a DataLake solution?
- Customization and Focus: Off-the-shelf products are rarely exclusively focused on education and may require more customization, leading to higher costs and less control. An internally engineered platform allows for specific thought to be put into the data and needs of the community.
- Cost Containment and Ownership: A not-for-profit, community-controlled effort can lead to more responsible cost containment. There are concerns about the high costs associated with vendors and the difficulty in getting everyone on the same page. Ownership of the data is also a key consideration.
- Control and Vendor Lock-In: Purchasing from a vendor can lead to a lack of control and potential vendor lock-in.
- Data Security: There are concerns about the security of data when using a vendor solution.
- Stronger Support System and Common Threads: An internal platform allows for a stronger support system and the ability to see common threads of data, which might be harder to achieve with a vendor solution.
Technology and Governance
Why can't I just collect my own data and use AI tools myself?
- Security and Maintenance: There are significant security concerns if everyone puts their data into their own AI. Additionally, keeping all this data up-to-date would be an overwhelming maintenance burden, raising questions about who would be responsible for it. The MiGreatDataLake, in contrast, allows for secure access across districts and the state.
- Limited Scope and Collaboration: Personal AI tools can lead to siloed data, limiting the ability to work with larger groups and learn from other schools. The power of a data lake lies in its ability to facilitate shared learning.
- Cost Limitations: Free AI tools have their limitations, and the costs associated with managing and securing data in individual chatbots can quickly become prohibitive.
- AI Tools Are Not Data Management Systems: An AI tool is primarily an interface for summarizing or answering questions; it is not designed to manage the complex work of data governance (who owns what, access, consent), data quality and lineage (origin, transformation, accuracy), and compliance and protection (FERPA, HIPAA, etc.).
- Risk of Hallucination: Feeding raw or sensitive data to an AI tool without structured metadata and validation layers can lead to "hallucinations" or false answers, and such a tool cannot guarantee security or transparency. The MiGreatDataLake provides a secure, well-governed backbone for clean data to power AI tools.