Useful resource

All sorts of amusements and nonsense unrelated to xTalk
User avatar
richmond62
Posts: 5288
Joined: Sun Sep 12, 2021 11:03 am
Location: Bulgaria
Contact:

Useful resource

Post by richmond62

This is a link to a data set of imaginary people that can be used for various data manipulation tests:

https://dlptest.com/sample-data/namedobemail/
-
SShot 2025-03-14 at 10.48.31.png
https://richmondmathewson.owlstown.net/
Re: Useful resource

Post by richmond62

Here is a screenshot of part of the file, ringed in red:
-
SShot 2025-03-14 at 10.49.56.png
Re: Useful resource

Post by richmond62

https://forums.livecode.com/viewtopic.php?t=7070
"CSV is the devil. It needs to die, but there are apparently enough programmers out there who hate humanity that it's still alive, so we have to write CSV parsers for it. Serious drag."
Personally I don't get that sweaty about things. 8-)
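For anyone who does have to write one: most of the pain is quoted fields that contain the delimiter, or doubled quotes inside a field. A minimal sketch in Python (not xTalk) using the standard `csv` module, with a made-up sample row, shows why a naive split on commas falls over:

```python
import csv
import io

# Hypothetical sample: the first field of row 2 contains a comma inside quotes,
# and the second field contains doubled quotes ("" means one literal ").
SAMPLE = 'name,note\r\n"Smith, John","said ""hi"""\r\n'


def parse_csv(text):
    """Parse CSV text into a list of rows (lists of field strings)."""
    # newline="" leaves the \r\n line endings untranslated so that the
    # csv reader handles them itself, as the csv docs recommend.
    return list(csv.reader(io.StringIO(text, newline="")))


rows = parse_csv(SAMPLE)
naive = [line.split(",") for line in SAMPLE.splitlines()]

print(rows)      # [['name', 'note'], ['Smith, John', 'said "hi"']]
print(naive[1])  # ['"Smith', ' John"', '"said ""hi"""']
```

The naive version produces three broken fields where there should be two clean ones.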
Re: Useful resource

Post by richmond62

My stack BLOATED horribly when it loaded the CSV file: up to 368.5 MB.

So I am emptying it again before uploading it here, as when empty it "weighs" 1 kilobyte.
-
SShot 2025-03-14 at 11.35.39.png
-
That is seriously crude and would need a lot of refining to filter out the cruft:
-
Attachment: Dirty Data.oxtstack.zip (1.28 KiB)
Re: Useful resource

Post by richmond62

I suspect that those control characters are just artifacts of the CSV document format and will serve no purpose in data mining:
-
SShot 2025-03-14 at 11.35.56.png
-
It is interesting that things look different in LibreOffice (mainly because LibreOffice automatically opens it as UTF-7; if it is opened as UTF-8, those black diamonds disappear):
-
SShot 2025-03-14 at 12.05.48.png
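A small illustration of that LibreOffice behaviour, assuming the odd bytes are C2 A0, which is how UTF-8 encodes a non-breaking space (U+00A0). The name is hypothetical. UTF-7 only permits 7-bit bytes, so decoding the same data as UTF-7 turns the high bytes into U+FFFD replacement characters, the usual "black diamond":

```python
# Hypothetical name with a non-breaking space (U+00A0) between the parts.
raw = "John\u00a0Smith".encode("utf-8")
assert raw == b"John\xc2\xa0Smith"  # UTF-8 encodes U+00A0 as the bytes C2 A0

# Decoded as UTF-8: the two bytes become one non-breaking space again.
as_utf8 = raw.decode("utf-8")

# Decoded as UTF-7: bytes >= 0x80 are not valid UTF-7, so with
# errors="replace" they are replaced by U+FFFD (the "black diamond").
as_utf7 = raw.decode("utf-7", errors="replace")

print(repr(as_utf8))
print(repr(as_utf7))
```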
-
And in BBEdit those control characters (if that is what they are) don't appear:
-
SShot 2025-03-14 at 12.06.31.png
Re: Useful resource

Post by richmond62

This:
-
SShot 2025-03-14 at 12.23.55.png
-
is 0x00AC, or decimal 172 (¬, the NOT SIGN).

As LibreOffice and BBEdit open the CSV file as UTF-8 without those "control characters", I really wonder what OXT Lite is doing with it.
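One plausible explanation, not confirmed anywhere in this thread: if the file is read as Mac OS Roman (the traditional 8-bit encoding of older Mac software) instead of UTF-8, the two bytes of a UTF-8 non-breaking space, C2 A0, come out as "¬" followed by "†". A Python sketch, with a hypothetical name:

```python
import unicodedata

# The character in the screenshot: code point 0xAC (decimal 172).
print(unicodedata.name(chr(0xAC)))  # NOT SIGN

# Hypothetical name joined by a UTF-8 non-breaking space (bytes C2 A0).
raw = b"John\xc2\xa0Smith"

# Read as UTF-8: a single no-break space between the names.
print(repr(raw.decode("utf-8")))

# Read as Mac OS Roman (assumed here, not verified for OXT Lite):
# byte C2 maps to "¬" (U+00AC) and byte A0 maps to "†" (U+2020).
print(raw.decode("mac_roman"))  # John¬†Smith
```

If that is what is happening, reading the file explicitly as UTF-8 should make the junk go away, as it does in LibreOffice and BBEdit.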
User avatar
tperry2x
Posts: 3522
Joined: Tue Dec 21, 2021 9:10 pm
Location: webtalk.tsites.co.uk
Contact:

Re: Useful resource

Post by tperry2x

Hmm, not sure where those ¬ characters and such are coming from.
I've opened the CSV in a variety of text editors, LibreOffice, and Word.
Even nano doesn't show them. It says the text is DOS formatted, but doesn't explain why those characters are being inserted.
screenshot.png
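nano's "DOS format" note refers to CR LF line endings. That is a separate issue from the odd characters, but a parser that only splits on LF will leave a stray CR at the end of every record. A rough Python sketch for classifying and normalising line endings (the sample data and the helper name `line_ending_style` are hypothetical):

```python
# Hypothetical two-line CSV with DOS/Windows line endings (CR LF).
raw = b"name,dob\r\nJohn Smith,1970-01-01\r\n"


def line_ending_style(data):
    """Classify line endings as 'dos' (CRLF), 'unix' (LF), 'mac' (CR) or other."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf  # LFs not preceded by CR
    cr = data.count(b"\r") - crlf  # CRs not followed by LF
    if crlf and not (lf or cr):
        return "dos"
    if lf and not (crlf or cr):
        return "unix"
    if cr and not (crlf or lf):
        return "mac"
    return "mixed or none"


print(line_ending_style(raw))  # dos

# Normalise to LF: replace CRLF first, then any remaining bare CR.
unix = raw.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
```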
Not sure what I'd use this data for. It made me laugh though, as on the front page of where I downloaded that CSV from, we have a member of staff with that exact name and date of birth (and they've probably got that exact Hotmail email address too!)

edit: Ah!
If I open it in a really old version of Microsoft Excel on the Mac, I can get those strange characters to show up there too.
Screenshot at 2025-03-14 13-39-27.png
axwald
Posts: 34
Joined: Mon Sep 27, 2021 1:14 pm
Location: Sol/ Terra/ Europe/ Bavaria
Contact:

Re: Useful resource

Post by axwald

Hi,

Seen with a hex editor, there's an "Â" (A with circumflex, byte 194 / 0xC2) and a non-breaking space (byte 160 / 0xA0) between the first and second name. I assume this is meant as a delimiter within the "name" field. Could be an old dBase export, or something similar, more or less mouldy ...

Actually, this is the first CSV I've ever seen where they use a "comma" as the field delimiter. There are seriously sick ppl out there!
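Those two bytes are worth a second look: C2 A0 is exactly how UTF-8 encodes a single non-breaking space (U+00A0). Reading the file one byte per character (as a hex editor or a Latin-1 decoder does) shows "Â" plus a space; reading it as UTF-8 shows one space. A Python sketch with a hypothetical name field, including splitting on that space:

```python
raw = b"John\xc2\xa0Smith"  # hypothetical field as seen in a hex editor

# Byte by byte (Latin-1): two characters, "Â" (U+00C2) and NBSP (U+00A0).
latin1 = raw.decode("latin-1")
# As UTF-8: the pair C2 A0 is one U+00A0 NO-BREAK SPACE.
utf8 = raw.decode("utf-8")

print(len(latin1) - len("JohnSmith"))  # 2
print(len(utf8) - len("JohnSmith"))    # 1

# Split the name field on the no-break space...
first, last = utf8.split("\u00a0")
# ...or replace it with an ordinary space for display.
clean = utf8.replace("\u00a0", " ")
```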