FORRESTHUACHEN/Source-code-for-RCCSTD-problem: the source code and test data of the paper 'A solution to reconstruct cross-cut shredded text documents based on constrained seed K-means algorithm and ant colony algorithm', which was published in Expert Systems with Applications

Clustering algorithm Source code_Version 2.txt
Data To Github.rar
Junhua Chen Paper 4.pdf
Readme
test data


Due to my mistake, Test 4, Test 5, Test 14, Test 15, and Test 18 are missing from the original test data. I am really sorry.


Here is the test data of our project. As you can see, there are hundreds of fragments and a large image, which is the image of the original document.
The document named 'classifyResult.txt' is the clustering result produced by our clustering algorithm.
The document named 'classifyResultbykmeas.txt' is the clustering result produced by the K-means algorithm.
The document named 'classifyResultbyXu.txt' is the clustering result produced by the algorithm of Xu et al. (2014).

The document named 'signalResult.txt' contains the unidimensional vectors mentioned in our paper. The 181st column of each row is a flag that indicates whether the fragment is the first fragment in a row. If the value of that column is equal to '1', the fragment is not a first fragment and cannot be used as an initial clustering center. Otherwise, the fragment is one of the first fragments and can be used as a clustering center.
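
The layout of 'signalResult.txt' described above can be read with a short script. The following is only a sketch in Python and is not taken from the project's source code. It assumes that the values in each row are separated by whitespace or commas (the delimiter is not stated here) and that the first 180 columns hold the vector. The function names parse_signal_rows and seed_candidates are hypothetical.

```python
import re


def parse_signal_rows(text, n_features=180):
    """Parse rows laid out like 'signalResult.txt'.

    Assumption: values are separated by whitespace or commas.
    Returns a list of (features, can_seed) tuples, one per non-empty row.
    """
    rows = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        tokens = [t for t in re.split(r"[,\s]+", line.strip()) if t]
        if not tokens:
            continue  # skip blank lines
        if len(tokens) < n_features + 1:
            raise ValueError(
                f"row {line_no}: expected {n_features + 1} columns, got {len(tokens)}"
            )
        features = [float(t) for t in tokens[:n_features]]
        flag = float(tokens[n_features])  # the 181st column
        # A flag of 1 marks a fragment that is NOT the first in its row,
        # so it must not be used as an initial clustering center.
        rows.append((features, flag != 1))
    return rows


def seed_candidates(rows):
    """Return the indices of fragments allowed as initial clustering centers."""
    return [i for i, (_, can_seed) in enumerate(rows) if can_seed]
```

For example, seed_candidates(parse_signal_rows(open('signalResult.txt').read())) would list the row indices of the fragments that may serve as initial centers, provided the file matches the assumed layout.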

As you can see, data 6 - data 15 are the test data with just one document, and data 16 - data 24 are the test data with two mixed documents. (Unfortunately, I lost one of the two-mixed-documents data sets at some point during the last year or more.)