In and of itself, a data lake is a collection of data stored in its native format on a server, either on-premises or in the cloud. There doesn’t seem to be a widely accepted definition of “data lake platform,” but ancillary services are required to manage the servers, provide security and storage, and make the data available for extraction and use. In other words, the data lake can be thought of as the data itself, and the data lake platform as the servers, other hardware and software used to operate and maintain it.
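To make the distinction concrete, here is a minimal sketch in Python. It assumes a local directory stands in for the lake's storage and a plain list of dictionaries stands in for the platform's catalog. The names `ingest`, `build_demo_lake`, the `raw/<source>/<file>` layout and the sample files are all hypothetical, chosen only for illustration; a real platform would add security, access control and much more.

```python
import hashlib
import tempfile
from pathlib import Path


def ingest(lake_root: Path, source_name: str, filename: str, payload: bytes) -> dict:
    """Store the payload unchanged (its native format) and return a catalog entry.

    The file on disk is the "data lake" part; the returned metadata is a tiny
    stand-in for what a "data lake platform" keeps so the data can be found later.
    """
    target = lake_root / "raw" / source_name / filename
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(payload)  # no parsing and no schema: bytes are stored as received
    return {
        "path": target.relative_to(lake_root).as_posix(),
        "format": target.suffix.lstrip(".") or "unknown",
        "bytes": len(payload),
        "sha256": hashlib.sha256(payload).hexdigest(),
    }


def build_demo_lake() -> list:
    """Ingest two files of different formats into a temporary lake directory."""
    with tempfile.TemporaryDirectory() as tmp:
        lake_root = Path(tmp)
        catalog = [
            ingest(lake_root, "crm", "customers.csv", b"id,name\n1,Ada\n"),
            ingest(lake_root, "web", "events.json", b'{"event": "click"}'),
        ]
        # Confirm that every stored file is byte-identical to what was ingested.
        for entry in catalog:
            stored = (lake_root / entry["path"]).read_bytes()
            assert hashlib.sha256(stored).hexdigest() == entry["sha256"]
    return catalog
```

Calling `build_demo_lake()` returns one catalog entry per file. The CSV and JSON files sit side by side in their original formats, and only the catalog knows what each one is. That is roughly the division of labor described above, reduced to a toy scale.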
Most resources that describe best practices for developing a data lake are really describing best practices for any major technology undertaking in a large organization:
1. Gather relevant stakeholders and decide on your goals.
2. Develop an action plan and assign ownership of the project.
3. Evaluate the methods available.
4. Select the best server architecture for your needs.
5. Pick a vendor.
6. Ensure your organization’s data governance, security and privacy standards are maintained.