Master Thesis - Data Crawler for Autonomous Driving Data

Background 

Thanks to our daily data collection and test driving, Zenseact has a huge amount of data stored in its data centers. Presenting this data in a user-friendly way and managing it are both challenging. Today, we typically extract metadata up front, at ingestion time, by analyzing the data as it is uploaded. We would like to explore how to extract metadata from, and index, data already present in our storage systems by designing a data crawler. The crawler would need a basic understanding of our file system layout, the file formats we use, and their internal structure. That way, when we want to extract new information from our dataset, we can add new analysis logic to the crawler, and the new metadata becomes available once the crawl has completed.
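
As a rough illustration of the crawler idea described above, the following Python sketch walks a directory tree, records basic file system metadata, and applies pluggable analysis logic per file type. Everything here is an assumption made for illustration: the `register` decorator, the `EXTRACTORS` registry, and the `.txt` line-count extractor are hypothetical and do not reflect Zenseact's actual file formats or storage layout.

```python
import os
from typing import Callable, Dict, Iterator

# Hypothetical registry: file extension -> function returning extra metadata.
Extractor = Callable[[str], Dict[str, object]]
EXTRACTORS: Dict[str, Extractor] = {}


def register(extension: str):
    """Decorator that plugs new analysis logic into the crawler."""
    def decorator(func: Extractor) -> Extractor:
        EXTRACTORS[extension.lower()] = func
        return func
    return decorator


@register(".txt")
def text_line_count(path: str) -> Dict[str, object]:
    # Example extractor (an assumption, not a real Zenseact format).
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        return {"line_count": sum(1 for _ in handle)}


def crawl(root: str) -> Iterator[Dict[str, object]]:
    """Yield one metadata record per file found under root."""
    for directory, _subdirs, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(directory, name)
            stat = os.stat(path)
            record: Dict[str, object] = {
                "path": path,
                "size_bytes": stat.st_size,
                "modified": stat.st_mtime,
            }
            # Apply format-specific logic only if an extractor is registered.
            extractor = EXTRACTORS.get(os.path.splitext(name)[1].lower())
            if extractor is not None:
                record.update(extractor(path))
            yield record
```

With this shape, extracting a new kind of information means registering one more extractor and re-running the crawl; the records it yields could then be written to whatever index or database the project settles on.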

Building on the file system crawler, we also want to explore a high-performance file search algorithm. Ideally, it should return detailed results with very low response time. The algorithm could start from the index maintained by the crawler and then go on to inspect file contents efficiently. It could also include some form of keyword-based search, which could integrate tags made by teams in the company or use machine learning methods to extract relevant keywords.
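
The two-stage search idea above (index lookup first, content inspection second) could look roughly like the Python sketch below. It is only a sketch under stated assumptions: the `KeywordIndex` class and the `search` function are hypothetical names, the index is a plain in-memory inverted index rather than a real query engine, and the content check reads each candidate file in full, which a production system would replace with something more efficient.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set


class KeywordIndex:
    """Hypothetical in-memory inverted index: keyword -> set of file paths."""

    def __init__(self) -> None:
        self._postings: Dict[str, Set[str]] = defaultdict(set)

    def add(self, path: str, keywords: Iterable[str]) -> None:
        # Keywords could come from team tags or from an ML keyword extractor.
        for keyword in keywords:
            self._postings[keyword.lower()].add(path)

    def lookup(self, keywords: Iterable[str]) -> Set[str]:
        """Return the paths tagged with all of the given keywords."""
        sets = [self._postings.get(k.lower(), set()) for k in keywords]
        if not sets:
            return set()
        return set.intersection(*sets)


def search(index: KeywordIndex, keywords: Iterable[str],
           content_term: Optional[str] = None) -> List[str]:
    """Stage 1: narrow down via the index. Stage 2: inspect file contents."""
    candidates = sorted(index.lookup(keywords))
    if content_term is None:
        return candidates
    needle = content_term.encode("utf-8")
    hits = []
    for path in candidates:
        # Simplification: reads the whole file; only candidates are opened.
        with open(path, "rb") as handle:
            if needle in handle.read():
                hits.append(path)
    return hits
```

The point of the design is that the expensive step, opening files, only runs on the small candidate set the index returns, which is what keeps response time low as the dataset grows.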

A data center user interface could be the final outcome that combines the functionality of searching for and analyzing data as well as monitoring the file crawling process. Potentially the data center could be integrated into Zenseact's data platform down the line alongside our other data access user interfaces.

Depending on the interest of the candidates we are open to focusing on different parts of the above ideas, or alternative approaches within this problem domain.

Project Description 

In this master thesis project, you will focus on: 

  • Building a data crawler to extract and index metadata from files stored in parallel file systems.
  • Designing a high-performance search system with low response time based on the index.
  • Constructing a data center user interface that can be used to navigate the index and execute the data search algorithm.

Qualifications 

We are looking for 2 students with a background in Computer Science, Engineering, or a related field, preferably with good knowledge of:

  • C++ and/or Python programming
  • Distributed systems
  • Databases and querying engines
  • Data visualization

Good to have:

  • Machine Learning knowledge
  • User interface design

Further information

Please send in individual applications with CV, motivational letter, and grade transcripts. 

Planned start: January 2022, with some flexibility.

Final application date: 20 November 2021. We will screen candidates continuously, so please submit your application as soon as possible.

Duration: 30 ECTS 

For questions regarding the project, please contact erik.thoresson@zenseact.com or yuchuan.jin@zenseact.com.

Additional information

  • Remote status

    Flexible remote

Gothenburg, Sweden

Lindholmspiren 2
417 56 Göteborg

Making safe and intelligent mobility real.

At Zenseact, we lead the global movement of crafting tomorrow's mobility with the software platform of choice. Our mission is to “Make safe and intelligent mobility real, for everyone, everywhere”. This statement marks our conviction and dedication to bring autonomous driving out on the streets for real and is at the center of everything we do.

We could not dream of achieving this without our great teams of very talented people. We are on this journey together and our agile way of working is reflected throughout our entire organization; it is part of our culture and how we work, develop and grow together.
