Pattern Recognition: Find all unique segments made with 2d collinear points (python/pyspark)

Pattern recognition algorithm. Finds all unique line segments made with at least k collinear points found in a set of 2D points expressed in cartesian coordinates. k must be greater than or equal to 3 (why?, becuase any pair of points defines a line, so the concept of collinearity is not well define for two points)

Examples of the output:

Getting Started

The python code (fastSearchCollinearPoints.py) and pyspark code (pysparkSearchCollinearPoints.py) are independent from each other and each one represents an algorithm that solves this task.

The complexity of the python version is O(nln(n)) whereas the complexity of the pyspark version is O(n^2) where n is the size of the input

To run the code using your own data you must provide a plain text file in the following format

the first line indicates the number of points in the file. Inside the data folder there are many test cases to play around.

You can also use a random generator functions to generate the points at random. Search for the following snippets of code to use them

[a snippet of the code inside fastSearchCollinearPoints.py]

# * Random points generator
# Instead of read the data from files, uncomment the lines below to generate them randomly
# and set k to search for at leat k collinear points
# k = 4
# points = randomPointsGenerator.random_points(2000)

[a snippet of the code inside pysparkSearchCollinearPoints.py]

# * Random points generator
# Instead of read the data from a file, uncomment the line below to generate them randomly
# k = 4
# pts, points_to_draw = points_from_random(200)

Usage

For the python code you can use the followimg command

python fastSearchCollinearPoints.py k filename

Whete k is the number of collinear point you want to detect and filename is the name of the file you want to inspect (this file must be inside the data direcctory).

If you don't provide any arguments, the ones hardcoded inside the code will be used instead. You can use this option if you don't want to provide any argument. You can find where to select files manually in the line 100 which contains the following piece of code

except:
    k = 4
    file_name = 'input20.txt'          
    path      = Path.cwd()
    file_path = path / 'data' / file_name

For the pyspark version

spark-submit pysparkSearchCollinearPoints.py

to choose the files from the data folder you can use the function points_from_file (see the line 147 of the file pysparkSearchCollinearPoints.py for more info)

Installation

To run the python code you must have install the libraries: sys, numpy, pathlib, pygame

If you have installed anaconda, you can get pygame running the following code

pip install pygame

To run the pyspark code you must have intalled properly spark and pyspark

FAQ

I have troubles with pygame, Can I avoid used it?
- If you have troubles with pygame, you can rip off all the references to the stddraw.py file. You still have access to the list of the collinear points found and you can use this data to feed another plotting library like matplotlib.
Why do you choose pygame instead of matplotlib?
- Because in my opinion pygame offers a cleaner canvas to represent the results and it can render thousand of points and lines more efficiently than matplotlib
Does matplotlib will be supported in the future?
- Maybe, if I have time, I certainly do it
will it be useful to look inside this code or it is of that kind of unreadable and ugly code?
- Yeah, this code has some nice explanations and you certanly should see it if you want to learn to do similar things, particularly the pyspark version. The code written in python was a clever algorithm that I wrote for the Coursera course Algorithms I and I want to share it with the world. (The original version of this code was written in java, so if you are taking the Algorithms I course this python version don't will be graded by the autograder)

Acknowledgments and Credits

The test data was taken from the assignment 3 of the Coursera course Algorithms I offered by the Princeton University
The stddraw.py file was taken from the texbooksite Programming in python an interdiciplinary approach

License

MIT license

Name		Name	Last commit message	Last commit date
Latest commit History 19 Commits
data		data
images		images
.gitignore		.gitignore
LICENSE		LICENSE
LineSegment.py		LineSegment.py
Point.py		Point.py
README.md		README.md
color.py		color.py
fastSearchCollinearPoints.py		fastSearchCollinearPoints.py
mergeSort.py		mergeSort.py
mergeSortInefficient.py		mergeSortInefficient.py
pysparkSearchCollinearPoints.py		pysparkSearchCollinearPoints.py
randomPointsGenerator.py		randomPointsGenerator.py
stddraw.py		stddraw.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Pattern Recognition: Find all unique segments made with 2d collinear points (python/pyspark)

Getting Started

Usage

Installation

FAQ

Acknowledgments and Credits

License

About

Releases

Packages

Languages

License

edxz7/Pattern-Recognition-2d-collinear-points-finder-with-python-pyspark

Folders and files

Latest commit

History

Repository files navigation

Pattern Recognition: Find all unique segments made with 2d collinear points (python/pyspark)

Getting Started

Usage

Installation

FAQ

Acknowledgments and Credits

License

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages