This is a link to a data set of imaginary people that can be used for various data manipulation tests:
https://dlptest.com/sample-data/namedobemail/
Useful resource
- richmond62
- Posts: 5288
- Joined: Sun Sep 12, 2021 11:03 am
- Location: Bulgaria
- Contact:
Useful resource
https://richmondmathewson.owlstown.net/
- richmond62
Re: Useful resource
That is a screenshot of part of the file ringed in red:
- richmond62
Re: Useful resource
https://forums.livecode.com/viewtopic.php?t=7070

Personally I don't get that sweaty about things.
CSV is the devil. It needs to die, but there are apparently enough programmers out there who hate humanity that it's still alive, so we have to write CSV parsers for it. Serious drag.

- richmond62
Re: Useful resource
My stack BLOATED like something horrible, to 368.5 MB, when it loaded the CSV file.
So I am emptying it again before I upload it here, as when empty it "weighs" 1 kilobyte.
That is seriously crude and would need a lot of refining to filter out a lot of cruft:
- Attachments
- Dirty Data.oxtstack.zip (1.28 KiB) Downloaded 116 times
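For anyone wanting to try the refining outside the stack, here is a minimal Python sketch of one possible cleanup pass. It assumes the cruft is mostly no-break spaces (U+00A0) inside the name field, which is what later posts in this thread point to, and that the file is UTF-8. The sample row is made up, not taken from the data set, and `clean_rows` is a hypothetical helper name.

```python
import csv
import io


def clean_rows(csv_text: str) -> list[list[str]]:
    """Parse CSV text, turning no-break spaces into plain spaces and trimming fields."""
    # newline="" leaves the CRLF line endings for the csv module to handle
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    return [[field.replace("\u00a0", " ").strip() for field in row] for row in reader]


# Invented sample row (assumption): name with a no-break space, DOS line ending
raw = "Jane\u00a0Doe,01/02/1980,jane@example.com\r\n".encode("utf-8")

# Decode explicitly as UTF-8 rather than relying on a platform default
rows = clean_rows(raw.decode("utf-8"))
print(rows)  # [['Jane Doe', '01/02/1980', 'jane@example.com']]
```

Decoding explicitly is the main point: if the bytes are read with a legacy single-byte encoding instead, the no-break space turns into two visible junk characters before any cleanup gets a chance to run.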
- richmond62
Re: Useful resource
I suspect that those control characters are just artifacts of the CSV document's format and will serve no purpose in data mining.
It is interesting that things look different in LibreOffice (mainly because LibreOffice automatically opens it as UTF-7; if it is opened as UTF-8 those black diamonds disappear).
And in BBEdit those control characters (if that is what they are) don't appear.
- richmond62
Re: Useful resource
This character, ¬, is 0x00AC or decimal 172.
As LibreOffice and BBEdit open the CSV file as UTF-8 without those "control characters", I really wonder what OXT Lite is doing with it.
- tperry2x
- Posts: 3522
- Joined: Tue Dec 21, 2021 9:10 pm
- Location: webtalk.tsites.co.uk
- Contact:
Re: Useful resource
Hmm, not sure where those ¬ characters and such are coming in from.
I've opened the CSV in a variety of text editors, LibreOffice, and Word.
Even nano doesn't show them. It says the text is DOS formatted, but that doesn't explain why those characters are being inserted.
Not sure what I'd use this data for. Made me laugh though: the front page of the site I downloaded that CSV from lists someone with the exact name and date of birth of one of our members of staff (and they've probably got that exact Hotmail email address too!)
edit: Ah!
If I open it in a really old version of Microsoft Excel on the Mac, I can get those strange characters to show up there too.
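As a rough way to check what "DOS formatted" covers and whether anything else is lurking in the file, the raw bytes can be tallied. A minimal Python sketch; `summarize_bytes` is a hypothetical helper, and the sample row is invented to stand in for the real file:

```python
from collections import Counter


def summarize_bytes(data: bytes) -> dict:
    """Count line-ending styles and non-ASCII byte values in raw file data."""
    crlf = data.count(b"\r\n")
    lone_lf = data.count(b"\n") - crlf
    # Any byte above 0x7F is outside ASCII and depends on the encoding used
    non_ascii = Counter(b for b in data if b > 0x7F)
    return {"crlf": crlf, "lone_lf": lone_lf, "non_ascii": dict(non_ascii)}


# Invented sample (assumption): header plus one row, DOS line endings,
# with a UTF-8 no-break space between first name and surname
sample = "Name,DOB,Email\r\nJane\u00a0Doe,01/02/1980,jane@example.com\r\n".encode("utf-8")

report = summarize_bytes(sample)
print(report)  # {'crlf': 2, 'lone_lf': 0, 'non_ascii': {194: 1, 160: 1}}
```

"DOS formatted" only refers to the CRLF line endings, so it would not by itself account for extra visible characters; bytes above 127 are the more likely suspects. For the real file, the data would come from reading it in binary mode, for example `open(path, "rb").read()` with `path` pointing at the downloaded CSV.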
-
- Posts: 34
- Joined: Mon Sep 27, 2021 1:14 pm
- Location: Sol/ Terra/ Europe/ Bavaria
- Contact:
Re: Useful resource
Hi,
seen with a hex editor there's an "Â" (A with circumflex, byte value 194) and a non-breaking space (byte value 160) between first and second name. I assume this is meant as a delimiter within the "name" field. Could be an old dBase export, or something similar, more or less mouldy ...
Actually this is the first CSV I've ever seen where they use "comma" as field delimiter. There's seriously sick ppl out there!
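One possible explanation for the different renderings reported in this thread: the byte pair 194, 160 (0xC2 0xA0) is exactly how UTF-8 encodes a single no-break space, and it only looks like two characters when read with a single-byte encoding. The Python sketch below decodes the same two bytes three ways. That OXT Lite or old Mac Excel actually reads the file as MacRoman is an assumption, not something confirmed here, but it would fit the ¬ (decimal 172) seen earlier.

```python
# Bytes reported by the hex editor between first name and surname
raw = bytes([194, 160])  # 0xC2 0xA0

as_utf8 = raw.decode("utf-8")          # one character: U+00A0 NO-BREAK SPACE
as_latin1 = raw.decode("latin-1")      # two characters: "Â" + no-break space
as_macroman = raw.decode("mac_roman")  # two characters: "¬" + "†"

for label, text in [("utf-8", as_utf8), ("latin-1", as_latin1), ("mac_roman", as_macroman)]:
    print(label, [f"U+{ord(ch):04X}" for ch in text])
# utf-8 ['U+00A0']
# latin-1 ['U+00C2', 'U+00A0']
# mac_roman ['U+00AC', 'U+2020']
```

Read that way, the "Â" in the hex editor and the "¬" in the stack would be the same UTF-8 no-break space shown through two different legacy encodings, rather than a deliberate delimiter.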