

# Assignment 4

Hey there! Welcome to the Knowledge Lens Intern Training Program.

This assignment serves as a quick refresher on using NoSQL and time-series databases.
There are three tasks in this assignment. By completing them, you'll learn:
*  How to interact with MongoDB
*  How to use Pandas DataFrames to generate your own Excel reports
*  How to ingest and query data with the KairosDB time-series database
*  How to publish and consume messages over the MQTT protocol
*  How to cache data with Redis

Happy Coding! :tada:

## :pushpin: Task 1: Working with Mongo - Advanced


### :golf: Areas covered:
- Working with NoSQL
- Working with Pandas

### :books: Description:
You are given semester details in JSON format. Write FastAPI endpoints to:

1. Accept the semester details JSON and insert it into a collection.
2. Get the sum and average of all marks, filtered by any of "student_id", "batch_id", "semester_id", or "subject_id".
3. Generate an Excel report with the columns "Student name", "Semester name", "Subject 1", "Subject 2", ..., "Total".
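For requirement 3, the core step is turning each nested semester document into one flat row per student. A minimal sketch, assuming documents shaped like the sample document in this task and using each `subject_name` as a column header (the function name `semester_report_rows` is hypothetical):

```python
from typing import Any


def semester_report_rows(semester: dict[str, Any]) -> list[dict[str, Any]]:
    """Flatten one semester document into one report row per student."""
    rows: list[dict[str, Any]] = []
    for student in semester.get("student_details", []):
        row: dict[str, Any] = {
            "Student name": student["student_name"],
            "Semester name": semester["semester_name"],
        }
        total = 0
        for subject in student.get("subject_details", []):
            # Each subject becomes its own column, keyed by its subject_name.
            row[subject["subject_name"]] = subject["score"]
            total += subject["score"]
        row["Total"] = total  # inserted last, so it is the last column
        rows.append(row)
    return rows
```

The resulting list of dicts could be loaded with `pandas.DataFrame(rows)` and written with `DataFrame.to_excel("report.xlsx", index=False)` (writing `.xlsx` needs an engine such as `openpyxl`), then returned from FastAPI with a `FileResponse`.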

Sample Document:
```json
{
  "semester_name": "Semester 3",
  "semester_id": 299,
  "semester_start_date": "2019-05-18",
  "semester_end_date": "2019-06-04",
  "student_details": [
    {
      "student_name": "Student 1",
      "student_id": 14500,
      "batch_id": 43,
      "subject_details": [
        {
          "subject_name": "Subject 1",
          "subject_id": 12,
          "score": 74
        },
        {
          "subject_name": "Subject 2",
          "subject_id": 14,
          "score": 88
        },
        {
          "subject_name": "Subject 3",
          "subject_id": 15,
          "score": 75
        },
        {
          "subject_name": "Subject 4",
          "subject_id": 5,
          "score": 85
        }
      ]
    },
    {
      "student_name": "Student 2",
      "student_id": 14523,
      "batch_id": 44,
      "subject_details": [
        {
          "subject_name": "Subject 1",
          "subject_id": 12,
          "score": 83
        },
        {
          "subject_name": "Subject 2",
          "subject_id": 14,
          "score": 98
        },
        {
          "subject_name": "Subject 3",
          "subject_id": 15,
          "score": 68
        },
        {
          "subject_name": "Subject 4",
          "subject_id": 5,
          "score": 71
        }
      ]
    }
  ]
}
```

Note: Perform all filter operations in MongoDB.
Bonus points: Use the MongoDB aggregation framework.
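One way to satisfy both the note and the bonus is to build an aggregation pipeline that filters in MongoDB and computes the sum and average there. This is a sketch, assuming the sample document's shape: `semester_id` is matched before any `$unwind`, `student_id`/`batch_id` after unwinding `student_details`, and `subject_id` after unwinding `subject_details`. The function name `marks_summary_pipeline` is hypothetical.

```python
from typing import Any, Optional

SCORE = "$student_details.subject_details.score"


def marks_summary_pipeline(
    student_id: Optional[int] = None,
    batch_id: Optional[int] = None,
    semester_id: Optional[int] = None,
    subject_id: Optional[int] = None,
) -> list[dict[str, Any]]:
    """Aggregation pipeline returning the sum and average of matching scores."""
    pipeline: list[dict[str, Any]] = []
    if semester_id is not None:
        # Top-level field: filter whole documents before unwinding.
        pipeline.append({"$match": {"semester_id": semester_id}})
    pipeline.append({"$unwind": "$student_details"})
    student_filter: dict[str, Any] = {}
    if student_id is not None:
        student_filter["student_details.student_id"] = student_id
    if batch_id is not None:
        student_filter["student_details.batch_id"] = batch_id
    if student_filter:
        pipeline.append({"$match": student_filter})
    pipeline.append({"$unwind": "$student_details.subject_details"})
    if subject_id is not None:
        pipeline.append(
            {"$match": {"student_details.subject_details.subject_id": subject_id}}
        )
    # _id: None groups every remaining (student, subject) pair together.
    pipeline.append(
        {"$group": {"_id": None, "total": {"$sum": SCORE}, "average": {"$avg": SCORE}}}
    )
    return pipeline
```

With PyMongo, the result could be fetched via `list(collection.aggregate(marks_summary_pipeline(batch_id=43)))`, where `collection` is your own `Collection` object. Matching on `semester_id` first lets MongoDB discard whole documents before the `$unwind` stages multiply them.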

### :wrench: Tools to use: 
1. PyCharm / VS Code
2. Robo3T / Studio3T / MongoDB Compass
3. PyMongo


### :mag: References:
* [Querying Documents on Mongo](https://www.mongodb.com/docs/manual/tutorial/query-documents/)
* [Quick Summary on Mongo Aggregation Stages](https://www.mongodb.com/docs/manual/reference/operator/aggregation-pipeline/)
* [Generating Excel Sheets from a Pandas Dataframe](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_excel.html)
* [How to return files on FastAPI response](https://fastapi.tiangolo.com/advanced/custom-response/#fileresponse)
* [PyMongo Official Documentation](https://pymongo.readthedocs.io/en/stable/)



_________________________________

## :pushpin: Task 2: Working with Time-series


### :golf: Areas covered:
- Time-series operations
- Working with time-series data
- Working with Pandas

### :books: Description:
You are given a temperature dataset as a CSV file. The goal is to build an API that provides the following:

1. Get daily, weekly, and monthly aggregates (min, max, and average) of the data, filtered to "good" data points, and generate a report in Excel format.
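KairosDB can compute these aggregates server-side through its `/api/v1/datapoints/query` endpoint. Below is a sketch of a query body builder, assuming a hypothetical metric name and a hypothetical `quality` tag written at ingestion time. KairosDB applies the aggregators listed under one metric in sequence (each one's output feeds the next), so the sketch requests the metric three times, once per aggregator, to get independent min, max, and average series.

```python
from datetime import datetime, timedelta, timezone
from typing import Any

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
PERIOD_UNITS = {"daily": "days", "weekly": "weeks", "monthly": "months"}


def to_epoch_ms(ts: datetime) -> int:
    """Timezone-aware datetime -> epoch milliseconds, using exact integer math."""
    return (ts - EPOCH) // timedelta(milliseconds=1)


def kairos_aggregate_query(
    metric: str, start: datetime, end: datetime, period: str
) -> dict[str, Any]:
    """Build a KairosDB query body for min/max/avg of "good" points per period."""
    sampling = {"value": 1, "unit": PERIOD_UNITS[period]}  # KeyError if unknown
    return {
        "start_absolute": to_epoch_ms(start),
        "end_absolute": to_epoch_ms(end),
        # One metric entry per aggregator, because aggregators listed under a
        # single metric are chained rather than computed independently.
        "metrics": [
            {
                "name": metric,
                "tags": {"quality": ["good"]},  # hypothetical tag set at ingestion
                "aggregators": [
                    {"name": agg, "sampling": sampling, "align_sampling": True}
                ],
            }
            for agg in ("min", "max", "avg")
        ],
    }
```

The body would be sent as JSON in a POST to `http://<kairos-host>:8080/api/v1/datapoints/query` (8080 is KairosDB's default port); the response lists one result set per metric entry, in request order. `start` and `end` must be timezone-aware, since subtracting a naive datetime from `EPOCH` raises `TypeError`.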

Sample Data:

| Datetime | Temperature | Data quality |
| -------- | ----------- | ------------ |
|2022-04-03 00:01:00.000 | 112.4 | good |
|2022-04-03 00:02:00.000 | 111.32| good |
|2022-04-03 00:03:00.000 | 114.98| bad  |
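Before querying, the CSV rows have to be written to KairosDB. A minimal sketch that converts the CSV into the JSON body accepted by KairosDB's `/api/v1/datapoints` endpoint, assuming the CSV headers match the table above, that timestamps are UTC, and that data quality is stored as a hypothetical `quality` tag (KairosDB requires at least one tag per metric). The function name and default metric name are hypothetical.

```python
import csv
import io
from datetime import datetime, timedelta, timezone
from typing import Any

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def csv_to_kairos_payload(csv_text: str, metric: str = "temperature") -> list[dict[str, Any]]:
    """Group CSV readings by data quality into KairosDB datapoint entries."""
    points_by_quality: dict[str, list[list[Any]]] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Assumes timestamps such as "2022-04-03 00:01:00.000" are in UTC.
        ts = datetime.strptime(row["Datetime"].strip(), "%Y-%m-%d %H:%M:%S.%f")
        ts_ms = (ts.replace(tzinfo=timezone.utc) - EPOCH) // timedelta(milliseconds=1)
        quality = row["Data quality"].strip()
        points_by_quality.setdefault(quality, []).append(
            [ts_ms, float(row["Temperature"])]
        )
    # One entry per quality value; the tag lets queries filter on "good".
    return [
        {"name": metric, "datapoints": points, "tags": {"quality": quality}}
        for quality, points in points_by_quality.items()
    ]
```

The returned list could then be POSTed as JSON, for example with `requests.post(f"{base_url}/api/v1/datapoints", json=payload)` where `base_url` points at your KairosDB instance. Reading a real file works the same way if its contents are opened with `open(path, newline="")`.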

### :wrench: Tools to use: 
1. PyCharm / VS Code
2. Pandas
3. KairosDB
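Pandas is handy for cross-checking the numbers KairosDB returns. The sketch below does the same grouping with only the standard library so the logic stays explicit: it keeps "good" readings and buckets them by calendar day, ISO week, or month. In Pandas, a comparable result could come from `DataFrame.resample` followed by `agg(["min", "max", "mean"])`. Bucket boundaries may not exactly match KairosDB's sampling windows, so treat this as a sanity check under that assumption; the function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean
from typing import Iterable


def bucket_key(ts: datetime, period: str) -> str:
    """Label a timestamp with its day, ISO week, or month."""
    if period == "daily":
        return ts.date().isoformat()
    if period == "weekly":
        year, week, _ = ts.isocalendar()
        return f"{year}-W{week:02d}"
    if period == "monthly":
        return f"{ts.year}-{ts.month:02d}"
    raise ValueError(f"unknown period: {period}")


def local_aggregates(
    readings: Iterable[tuple[datetime, float, str]], period: str
) -> dict[str, dict[str, float]]:
    """Min/max/average of "good" temperatures per bucket, sorted by bucket label."""
    groups: dict[str, list[float]] = defaultdict(list)
    for ts, temperature, quality in readings:
        if quality == "good":  # drop "bad" data points before aggregating
            groups[bucket_key(ts, period)].append(temperature)
    return {
        key: {"min": min(values), "max": max(values), "avg": mean(values)}
        for key, values in sorted(groups.items())
    }
```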

### :mag: References:
* [How to query Kairos DB using Metrics](https://kairosdb.github.io/docs/restapi/QueryMetrics.html)


------------------------------------------------------

## :pushpin: Task 3: Working with MQTT & Redis

### :golf: Areas covered:
- MQTT Protocol
- Caching using Redis DB

### :books: Description
Using the sample data from Task 2:

1. Publish a message to MQTT for every 5th record with "good" data quality.
2. Each message should contain stats (sum, average, and the timestamp of the latest record) for those 5 good records.
3. Store each aggregation in a separate Redis database.
4. Develop an API to fetch the saved aggregations from Redis.
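Requirements 1 and 2 describe a counter over good records. A minimal sketch, assuming non-overlapping windows of 5 good records (bad records are skipped and do not count) and reading "timestamp" as the timestamp of the 5th record in each window. The class name `GoodRecordWindow` is hypothetical.

```python
from typing import Any, Optional


class GoodRecordWindow:
    """Buffers "good" records and emits stats once every `size` of them."""

    def __init__(self, size: int = 5) -> None:
        self.size = size
        self._buffer: list[tuple[str, float]] = []

    def add(self, timestamp: str, value: float, quality: str) -> Optional[dict[str, Any]]:
        if quality != "good":
            return None  # bad records are ignored and do not advance the count
        self._buffer.append((timestamp, value))
        if len(self._buffer) < self.size:
            return None
        values = [v for _, v in self._buffer]
        stats = {
            "sum": sum(values),
            "average": sum(values) / len(values),
            "timestamp": self._buffer[-1][0],  # latest record in the window
        }
        self._buffer.clear()  # start the next, non-overlapping window
        return stats
```

Each non-`None` return value is what would be published to MQTT and stored in Redis.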


### :wrench: Tools to use: 
1. PyCharm / VS Code
2. MQTT (pip package: `paho-mqtt`)
3. Redis (pip package: `redis`)
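For requirements 3 and 4, a sketch that takes the client methods as plain callables, so the logic can be exercised without a broker or a Redis server. It assumes each window is serialized as JSON and stored under its own key; the topic name, key prefix, and function names are hypothetical.

```python
import json
from typing import Any, Callable, Optional

# Hypothetical names; choose your own topic and key scheme.
STATS_TOPIC = "temperature/good/stats"
KEY_PREFIX = "temperature:stats:"


def publish_and_cache(
    stats: dict[str, Any],
    window_index: int,
    publish: Callable[..., Any],
    store: Callable[[str, str], Any],
    topic: str = STATS_TOPIC,
) -> str:
    """Publish one window's stats and cache them under a per-window key."""
    payload = json.dumps(stats)
    publish(topic, payload, qos=1)  # e.g. a paho-mqtt Client.publish
    key = f"{KEY_PREFIX}{window_index}"
    store(key, payload)  # e.g. a redis-py Redis.set
    return key


def fetch_cached(
    window_index: int, load: Callable[[str], Optional[bytes]]
) -> Optional[dict[str, Any]]:
    """Read one window's stats back; returns None if the key is missing."""
    raw = load(f"{KEY_PREFIX}{window_index}")  # e.g. a redis-py Redis.get
    return None if raw is None else json.loads(raw)
```

With real clients this could be called as `publish_and_cache(stats, i, mqtt_client.publish, redis_client.set)` and read back with `fetch_cached(i, redis_client.get)`, where `mqtt_client` is an already-connected paho-mqtt `Client` and `redis_client = redis.Redis(host="localhost", port=6379, db=1)`. The `db` argument selects a numbered Redis database, which is one possible reading of "a separate Redis database"; a FastAPI endpoint for requirement 4 could wrap `fetch_cached`.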

### :mag: References:
* [Using MQTT in Python](https://www.emqx.com/en/blog/how-to-use-mqtt-in-python)
* [Connection to Redis in Python](https://docs.redis.com/latest/rs/references/client_references/client_python/)



