Infraspeak’s Data Lake is a product that provides unparalleled access to Infraspeak data, enhancing customers’ scalability and flexibility, increasing team performance, improving operational efficiency, and enabling a centralized data stream.
The Infraspeak Data Lake has a simplified architecture that enables a new generation of data-related use cases. It lets you choose how to ingest your Infraspeak data, is compatible with any tech stack, and gives you access to your data backups.


Data Lake Structure:

In brief, the Infraspeak Data Lake has the following structure:

  • For each Infraspeak platform entity, a directory is generated containing folders with the .csv files produced by the ETL process.

  • Each file is named with a timestamp and the corresponding object.

  • Infraspeak customers can retrieve the S3 files via the AWS API, using the IAM credentials shared with them.

  • By using the Athena connector with any analytics tool, customers can select which tables (from the available data) they want to import.

  • Athena has a workgroup for each customer, which defines the specific S3 output folder where query results are stored.
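The file-retrieval steps above can be sketched with boto3. The bucket name, the per-entity prefix layout, and the exact filename pattern (`<timestamp>_<object>.csv`) are assumptions for illustration; check your Data Lake configuration for the real values.

```python
# Sketch: listing and downloading Data Lake exports with boto3.
# The prefix layout and filename pattern below are assumptions based on
# the structure described above, not a guaranteed Infraspeak contract.
import re
from datetime import datetime

# Hypothetical filename pattern: <timestamp>_<object>.csv
FILENAME_RE = re.compile(r"^(?P<ts>\d{8}T\d{6})_(?P<obj>[a-z_]+)\.csv$")

def parse_export_filename(name: str) -> tuple:
    """Split an export filename into its timestamp and object name."""
    m = FILENAME_RE.match(name)
    if not m:
        raise ValueError(f"unexpected filename: {name}")
    return datetime.strptime(m.group("ts"), "%Y%m%dT%H%M%S"), m.group("obj")

def download_entity_files(bucket: str, entity: str, dest: str) -> None:
    """Download every .csv exported for one platform entity (e.g. 'failure')."""
    import boto3  # authenticates with the IAM credentials shared by Infraspeak

    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    # Assumes each entity's files live under a prefix named after the entity.
    for page in paginator.paginate(Bucket=bucket, Prefix=f"{entity}/"):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith(".csv"):
                s3.download_file(bucket, key, f"{dest}/{key.split('/')[-1]}")
```

For example, `download_entity_files("my-datalake-bucket", "failure", "/tmp")` (a hypothetical bucket name) would fetch all Work Order exports.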



Data Lake Tables:

In Infraspeak Data Lake, you will find data from the following objects:

  • Data Lake objects/tables (note that only the Data Lake "__view" tables should be used for analytics):

    • buy_order → Purchases (of type Material and Services)

    • buy_order_material → Material Purchase Lines

    • buy_order_registry → Buy order’s actions registry

    • buy_order_service → Service Purchase Lines

    • category → Maintenance Categories

    • category_meterings → Maintenance Categories meterings

    • category_meterings_catalog → Catalog options of category meterings

    • characteristic → Maintenance Categories characteristics

    • client → Clients

    • client_operator → Client’s associated users

    • cost_center → Cost Centers

    • element → Assets

    • element_characteristic → Asset characteristics

    • element_economic → Asset economic data

    • element_other_cost → Other Costs registry in Assets

    • element_registry → Asset registry

    • event → Work Order and Planned Job Orders events

    • event_registry → Event Registry

    • failure → Work Orders

    • failure_element → Work Order’s Assets

    • failure_other_cost → Other costs registry in Work Orders

    • failure_pause_reason → Work Order Pause reasons

    • failure_priority → Available Work Order priorities

    • failure_sla → Work Orders SLA

    • failure_sla_rule → Work Order SLA rules

    • failure_sla_rule_operator → Work Order SLA notification rules per Operator 

    • failure_sla_rule_registry → Work Order SLA rules registry

    • gatekeeper → Gatekeepers available/configured

    • gatekeeper_answer_registry → Registry of answered gatekeepers

    • gatekeeper_question → Questions available in the Gatekeeper

    • gatekeeper_question_answer →  Registry of questions answered in each gatekeeper

    • intervention → Maintenance Categories interventions

    • intervention_procedure → Intervention’s tasks

    • local_operator → Location’s associated users

    • location → Locations

    • location_building_info → Buildings

    • maintenance_procedure → Categories configured tasks

    • maintenance_procedure_metering → Tasks’ associated measurements

    • material → Materials

    • material_warehouse → Materials’ warehouse association

    • metering_registry → Metering registry

    • operator → Users

    • operator_activity → Operator activity registry

    • operator_technical_skill → Operator’s associated technical skills

    • other_cost → Other Costs

    • problem → Work Order areas and types

    • problem_technical_skill → Work Order area’s associated technical skills

    • problem_responsible → Work Order area’s associated responsibles

    • quote → Quotes

    • quote_line → Quote Lines

    • quote_request → Quote requests

    • quote_request_line → Quote request Lines

    • scheduled_work → Planned Job Orders

    • schedule_work_other_cost → Other Costs registry in Planned Job Orders

    • sell_order → Sell Orders

    • sell_order_line → Sell Order Lines

    • stock → Stock registry

    • stock_movement → Stock movement registry

    • supplier → Suppliers

    • technical_skill → Technical Skills

    • warehouse → Warehouses

    • work → Planned Jobs

    • work_intervention → Planned Jobs interventions

    • work_location → Planned Jobs’ locations

    • work_responsible → Planned Job responsibles

    • work_sla_rule → Planned Jobs SLA rules

    • work_sla_rule_operator → Planned Jobs SLA notification rules per Operator

    • work_type → Planned Job types


The data is provided sequentially; the timing of data availability depends on the contracted plan.
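Once Athena access is configured, the tables above can be queried through your customer workgroup, remembering that only the "__view" tables should be used for analytics. A minimal sketch with boto3, assuming a `<table>__view` naming convention for the analytics views (an illustration, not a guaranteed naming scheme):

```python
import time

def build_view_query(table: str, limit: int = 100) -> str:
    """Build a query against the analytics view of a Data Lake table.

    Assumes views follow a hypothetical '<table>__view' naming convention.
    """
    return f'SELECT * FROM "{table}__view" LIMIT {limit}'

def run_query(query: str, workgroup: str) -> dict:
    """Run a query in the customer's Athena workgroup and return raw results.

    The workgroup already defines the S3 output folder for results, so no
    ResultConfiguration is passed here.
    """
    import boto3  # authenticates with the shared IAM credentials

    athena = boto3.client("athena")
    qid = athena.start_query_execution(
        QueryString=query, WorkGroup=workgroup
    )["QueryExecutionId"]
    while True:  # poll until the query reaches a terminal state
        status = athena.get_query_execution(QueryExecutionId=qid)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(1)
    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query ended in state {state}")
    return athena.get_query_results(QueryExecutionId=qid)
```

For example, `run_query(build_view_query("failure"), "my-workgroup")` (with your actual workgroup name in place of the hypothetical `my-workgroup`) would return the first rows of the Work Orders view.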


Table Repository

The Data Lake repository offers crucial information about the available tables and their correlations. Access to the repository is granted automatically via an email invitation once the Data Lake is configured.

Navigating the Repository

On the main page, you'll find a comprehensive list of the tables available in the Data Lake. We recommend reviewing this list. To view the details of a specific table, click the table name in the list on the left side of the page. The documentation also includes table correlation diagrams.



The specific diagram for each table can be viewed directly on that table's page, as explained above. To access the general diagram showing all tables, click on "Diagram" at the top center of the repository's first page.